GOCR(1)   Linux User's Manual   14 May 2003
  • NAME
      gocr - command line OCR tool
  • SYNOPSIS
      gocr [OPTION] [-i] pnm file
  • DESCRIPTION
      gocr is an optical character recognition program that can be used from the command line. It takes input in PNM, PGM, PBM, PPM, or PCX format and writes the recognized text to stdout. If the pnm file is a single dash, PNM data is read from stdin. If gzip, bzip2, and netpbm-progs are installed and your system supports popen(3), then pnm.gz, pnm.bz2, png, jpg, jpeg, tiff, gif, bmp, ps (single pages only), and eps are also supported as input files (but not as an input stream); here pnm stands for any of ppm, pgm, and pbm.
  • OPTIONS
      -h
      show usage information
      -i file
      read input from file (or stdin if file is a single dash)
      -o file
      send output to file instead of stdout
      -e file
      send errors to file instead of stderr, or to stdout if file is a dash
      -x file
      write progress output to file (file can be a file name, a fifo name, or a file descriptor 1...255)
      -p path
      database path (including final slash, default is ./db/)
      -f format
      output format (ISO8859_1 TeX HTML UTF8 ASCII)
      -l level
      set grey level to level (0<160<=255, default: 0 for autodetect)
      -d size
      set dust size in pixels (clusters smaller than this are removed), 0 means no clusters are removed, the default is -1 for auto detection
      -s num
      set spacewidth/dots (default: 0 for autodetect)
      -v verbosity
      be verbose; verbosity is a bitfield
      -c string
      give verbose output only for characters from string
      -C string
      only recognise characters from string
      -m modes
      set operation modes; modes is a bitfield
      -n bool
      if bool is non-zero, only recognise numbers (this is now obsolete, use -C "0123456789")
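
      As a rough illustration of how the options above combine, the following sketch builds a command line that reads one file, writes the result to another, selects UTF8 output, and restricts recognition to digits with -C (the replacement for the obsolete -n). The file names scan.pgm and scan.txt are placeholders, not files shipped with gocr, and the command is only echoed rather than executed.

```shell
# Placeholder file names, chosen only for illustration.
input="scan.pgm"
output="scan.txt"
digits="0123456789"

# Assemble the command from options documented above:
#   -i input file, -o output file, -f output format, -C allowed characters
cmd="gocr -i $input -o $output -f UTF8 -C $digits"
echo "$cmd"

# To actually run it (requires gocr to be installed):
# gocr -i "$input" -o "$output" -f UTF8 -C "$digits"
```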

      The verbosity is specified as a bitfield:
      1
      print more info
      2
      list shapes of boxes (see -c)
      4
      list pattern of boxes (see -c)
      8
      print pattern after recognition
      16
      print line information
      32
      create outXX.pgm
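
      Because the verbosity is a bitfield, several of the values above can be combined by adding (or OR-ing) them. A minimal sketch using shell arithmetic follows; the variable names are made up for readability and are not identifiers used by gocr.

```shell
# Verbosity bits as listed above (names are illustrative only).
V_INFO=1            # print more info
V_SHAPES=2          # list shapes of boxes
V_PATTERN=4         # list pattern of boxes
V_AFTER=8           # print pattern after recognition
V_LINES=16          # print line information
V_OUTFILES=32       # create outXX.pgm

# Combine "print more info" with "create outXX.pgm".
verbosity=$(( V_INFO | V_OUTFILES ))
echo "gocr -v $verbosity text1.pbm"
```

      The echoed command uses the value 33, that is, 1 + 32.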

      The operation modes are:
      2
      use database (early development)
      4
      layout analysis, zoning (development)
      8
      don't compare unrecognized characters
      16
      don't divide overlapping characters
      32
      don't do context correction
      64
      character packing (development)
      130
      extend database, prompts user (128+2, early development)
      256
      switch off the OCR engine (makes sense together with -m 2)
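
      The mode values combine in the same way as the verbosity bits. The sketch below reproduces the documented value 130 (128+2) and, as an assumption about what "together with -m 2" means, forms 256+2 as a single value. The variable names are illustrative; note that 128 appears in this page only as part of 130.

```shell
# Mode bits taken from the list above (names are illustrative only).
M_DATABASE=2        # use database
M_EXTEND=128        # only documented here in the combination 128+2
M_NO_ENGINE=256     # switch off the OCR engine

# "extend database, prompts user" as documented: 128 + 2
mode_extend=$(( M_EXTEND | M_DATABASE ))

# Assumption: "together with -m 2" is expressed as one combined value.
mode_db_only=$(( M_NO_ENGINE | M_DATABASE ))

echo "gocr -m $mode_extend text1.pbm"
echo "gocr -m $mode_db_only text1.pbm"
```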

  • AUTHOR
      Joerg Schulenburg <jschulen@gmx.de>
      First version of man page by Tim Waugh <twaugh@redhat.com>
  • VERSION INFORMATION
      This man page documents gocr, version 0.37.
  • REPORTING BUGS
      Report bugs to <jschulen@gmx.de>
  • SEE ALSO
      More details can be found at /usr/share/doc/gocr-X.XX/gocr.html.
  • EXAMPLES
      gocr -v 33 text1.pbm
      output verbose information; out30.bmp is created to show details of the recognition process
      gocr -v 7 -c _YV text1.pbm
      verbose output for unknown chars and chars Y and V
      djpeg -pnm -gray text.jpg | gocr -
      convert a JPEG file to PNM format and pass it to gocr through a pipe
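
      To read values such as -v 33 or -v 7 in the examples above, it can help to split a bitfield back into its individual bits. The helper below is a small sketch for that purpose; decode_bits is a hypothetical function name and is not part of gocr.

```shell
# Print the powers of two that make up a bitfield value.
# decode_bits is a hypothetical helper, not something gocr provides.
decode_bits() {
  value=$1
  bit=1
  out=""
  while [ "$bit" -le "$value" ]; do
    if [ $(( value & bit )) -ne 0 ]; then
      # Append the bit, separated by a space if out is non-empty.
      out="$out${out:+ }$bit"
    fi
    bit=$(( bit * 2 ))
  done
  echo "$out"
}

decode_bits 33   # bits set in the first example
decode_bits 7    # bits set in the second example
```

      Each printed number can then be looked up in the verbosity table.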