Tesseract-OCR 5.0 Command Types


桔子code · Published 2021-10-22 14:31:04

Original article: http://www.juzicode.com/image-ocr-tesseract-ocr5-command

Help command: --help or -h

Running tesseract with no arguments, tesseract --help, or tesseract -h prints the basic help message:

E:\juzicode>tesseract
Usage:
  tesseract --help | --help-extra | --version
  tesseract --list-langs
  tesseract imagename outputbase [options...] [configfile...]

OCR options:
  -l LANG[+LANG]        Specify language(s) used for OCR.
NOTE: These options must occur before any configfile.

Single options:
  --help                Show this help message.
  --help-extra          Show extra help for advanced users.
  --version             Show version information.
  --list-langs          List available languages for tesseract engine.

Full help: --help-extra

E:\juzicode>tesseract --help-extra
Usage:
  tesseract --help | --help-extra | --help-psm | --help-oem | --version
  tesseract --list-langs [--tessdata-dir PATH]
  tesseract --print-fonts-table [options...] [configfile...]
  tesseract --print-parameters [options...] [configfile...]
  tesseract imagename|imagelist|stdin outputbase|stdout [options...] [configfile...]

OCR options:
  --tessdata-dir PATH   Specify the location of tessdata path.
  --user-words PATH     Specify the location of user words file.
  --user-patterns PATH  Specify the location of user patterns file.
  --dpi VALUE           Specify DPI for input image.
  --loglevel LEVEL      Specify logging level. LEVEL can be
                        ALL, TRACE, DEBUG, INFO, WARN, ERROR, FATAL or OFF.
  -l LANG[+LANG]        Specify language(s) used for OCR.
  -c VAR=VALUE          Set value for config variables.
                        Multiple -c arguments are allowed.
  --psm NUM             Specify page segmentation mode.
  --oem NUM             Specify OCR Engine mode.
NOTE: These options must occur before any configfile.

Page segmentation modes:
  0    Orientation and script detection (OSD) only.
  1    Automatic page segmentation with OSD.
  2    Automatic page segmentation, but no OSD, or OCR. (not implemented)
  3    Fully automatic page segmentation, but no OSD. (Default)
  4    Assume a single column of text of variable sizes.
  5    Assume a single uniform block of vertically aligned text.
  6    Assume a single uniform block of text.
  7    Treat the image as a single text line.
  8    Treat the image as a single word.
  9    Treat the image as a single word in a circle.
 10    Treat the image as a single character.
 11    Sparse text. Find as much text as possible in no particular order.
 12    Sparse text with OSD.
 13    Raw line. Treat the image as a single text line,
       bypassing hacks that are Tesseract-specific.

OCR Engine modes:
  0    Legacy engine only.
  1    Neural nets LSTM engine only.
  2    Legacy + LSTM engines.
  3    Default, based on what is available.

Single options:
  -h, --help            Show minimal help message.
  --help-extra          Show extra help for advanced users.
  --help-psm            Show page segmentation modes.
  --help-oem            Show OCR Engine modes.
  -v, --version         Show version information.
  --list-langs          List available languages for tesseract engine.
  --print-fonts-table   Print tesseract fonts table.
  --print-parameters    Print tesseract parameters.

Show the version: --version or -v

Run tesseract --version or tesseract -v:

E:\juzicode>tesseract --version
tesseract v5.0.0-rc1.20211030
 leptonica-1.78.0
  libgif 5.1.4 : libjpeg 8d (libjpeg-turbo 1.5.3) : libpng 1.6.34 : libtiff 4.0.9 : zlib 1.2.11 : libwebp 0.6.1 : libopenjp2 2.3.0
 Found AVX2
 Found AVX
 Found FMA
 Found SSE4.1
 Found libarchive 3.5.0 zlib/1.2.11 liblzma/5.2.3 bz2lib/1.0.6 liblz4/1.7.5 libzstd/1.4.5
 Found libcurl/7.77.0-DEV Schannel zlib/1.2.11 zstd/1.4.5 libidn2/2.0.4 nghttp2/1.31.0
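
When a script needs to check which Tesseract it is talking to, the first line of the version output above is usually enough. The following is a minimal Python sketch, not part of Tesseract itself: the function name parse_version is made up for illustration, and it assumes the first non-empty line has the form "tesseract v5.0.0-rc1.20211030" (the leading "v" is treated as optional).

```python
import re


def parse_version(output):
    """Return (version_string, major) from `tesseract --version` text.

    Hypothetical helper: assumes the first non-empty line looks like
    "tesseract v5.0.0-rc1.20211030"; returns (None, None) otherwise.
    """
    for line in output.splitlines():
        line = line.strip()
        if not line:
            continue
        match = re.match(r"tesseract\s+v?((\d+)\S*)", line)
        if match:
            return match.group(1), int(match.group(2))
        break  # the first non-empty line did not look like a version line
    return None, None


# Sample text copied from the transcript above (shortened).
sample = """tesseract v5.0.0-rc1.20211030
 leptonica-1.78.0
 Found AVX2
"""
version, major = parse_version(sample)
print(version, major)
```

To feed it real output, subprocess.run(["tesseract", "--version"], capture_output=True, text=True) could supply the text. Some builds may write the version to stderr rather than stdout, so checking both streams is a reasonable precaution.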

List installed language packs: --list-langs

E:\juzicode>tesseract --list-langs
 List of available languages (6):
 chi_sim
 eng
 mnist
 osd
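
A program that wants to verify a language pack before running OCR can parse this listing. The sketch below is an illustration only: parse_list_langs is a hypothetical name, and it assumes the output keeps the shape shown above (one header line followed by one language code per line). It does not rely on the count in the header.

```python
def parse_list_langs(output):
    """Return the language codes found in `tesseract --list-langs` text.

    Hypothetical helper: assumes a "List of available languages (N):"
    header followed by one language code per line.
    """
    lines = [line.strip() for line in output.splitlines() if line.strip()]
    # Drop the header line if it is present.
    if lines and lines[0].lower().startswith("list of available languages"):
        lines = lines[1:]
    return lines


# Sample text copied from the transcript above.
sample = """\
 List of available languages (6):
 chi_sim
 eng
 mnist
 osd
"""
langs = parse_list_langs(sample)
print(langs)
print("chi_sim" in langs)
```

With the real CLI, the text could come from subprocess.run(["tesseract", "--list-langs"], capture_output=True, text=True); which stream carries the listing may depend on the build, so that part is left as an assumption.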

Recognition command

tesseract imagename|imagelist|stdin outputbase|stdout [options...] [configfile...]

a. The options parameters:

OCR options:
  --tessdata-dir PATH   Specify the location of tessdata path.
  --user-words PATH     Specify the location of user words file.
  --user-patterns PATH  Specify the location of user patterns file.
  --dpi VALUE           Specify DPI for input image.
  --loglevel LEVEL      Specify logging level. LEVEL can be
                        ALL, TRACE, DEBUG, INFO, WARN, ERROR, FATAL or OFF.
  -l LANG[+LANG]        Specify language(s) used for OCR.
  -c VAR=VALUE          Set value for config variables.
                        Multiple -c arguments are allowed.
  --psm NUM             Specify page segmentation mode.
  --oem NUM             Specify OCR Engine mode.
NOTE: These options must occur before any configfile.
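
The ordering rule in the NOTE is easy to get wrong when a command line is assembled by hand. As a rough sketch, the helper below builds the argument list so that every option precedes the config files. The name build_tesseract_cmd, the file names sample.png and out, and the choice of the preserve_interword_spaces variable and the hocr config file are assumptions made for the example; whether hocr is available depends on the local tessdata installation.

```python
def build_tesseract_cmd(image, outputbase, langs=None, psm=None, oem=None,
                        dpi=None, config_vars=None, configfiles=()):
    """Assemble an argument list for the tesseract CLI.

    Hypothetical helper. Options are emitted before any configfile,
    as the NOTE in the help text requires.
    """
    cmd = ["tesseract", image, outputbase]
    if langs:
        cmd += ["-l", "+".join(langs)]       # -l LANG[+LANG]
    if dpi is not None:
        cmd += ["--dpi", str(dpi)]
    if psm is not None:
        cmd += ["--psm", str(psm)]
    if oem is not None:
        cmd += ["--oem", str(oem)]
    for name, value in (config_vars or {}).items():
        cmd += ["-c", f"{name}={value}"]     # multiple -c arguments are allowed
    cmd += list(configfiles)                 # config files always come last
    return cmd


cmd = build_tesseract_cmd(
    "sample.png", "out",                     # placeholder file names
    langs=["chi_sim", "eng"], psm=6, oem=1,
    config_vars={"preserve_interword_spaces": 1},
    configfiles=["hocr"],
)
print(" ".join(cmd))
```

Returning a list rather than a single string means the result can be passed straight to subprocess.run(cmd, check=True) without shell quoting concerns.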

b. Viewing the page segmentation modes (--help-psm)

E:\juzicode\image\tess>tesseract --help-psm
Page segmentation modes:
  0    Orientation and script detection (OSD) only.
  1    Automatic page segmentation with OSD.
  2    Automatic page segmentation, but no OSD, or OCR. (not implemented)
  3    Fully automatic page segmentation, but no OSD. (Default)
  4    Assume a single column of text of variable sizes.
  5    Assume a single uniform block of vertically aligned text.
  6    Assume a single uniform block of text.
  7    Treat the image as a single text line.
  8    Treat the image as a single word.
  9    Treat the image as a single word in a circle.
 10    Treat the image as a single character.
 11    Sparse text. Find as much text as possible in no particular order.
 12    Sparse text with OSD.
 13    Raw line. Treat the image as a single text line,
       bypassing hacks that are Tesseract-specific.
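
Bare numbers such as 7 or 11 are hard to read in calling code. One possible way to keep them readable is a small enumeration that mirrors the table above. This is a sketch under assumptions: the member names are chosen here for illustration (they are not defined by the command-line tool), and psm_args is a hypothetical helper.

```python
from enum import IntEnum


class PSM(IntEnum):
    """Page segmentation modes, numbered as in `tesseract --help-psm`.

    Member names are illustrative, not part of the CLI.
    """
    OSD_ONLY = 0
    AUTO_OSD = 1
    AUTO_ONLY = 2            # listed as "not implemented" in the help text
    AUTO = 3                 # default
    SINGLE_COLUMN = 4
    SINGLE_BLOCK_VERT_TEXT = 5
    SINGLE_BLOCK = 6
    SINGLE_LINE = 7
    SINGLE_WORD = 8
    CIRCLE_WORD = 9
    SINGLE_CHAR = 10
    SPARSE_TEXT = 11
    SPARSE_TEXT_OSD = 12
    RAW_LINE = 13


def psm_args(mode):
    """Return the CLI arguments for a mode; rejects numbers outside 0..13."""
    return ["--psm", str(int(PSM(mode)))]


print(psm_args(PSM.SINGLE_LINE))
print(psm_args(11))
```

Passing an unknown number such as 14 raises ValueError from the enum lookup, which surfaces a typo before tesseract is ever launched.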

c. Viewing the OCR engine modes (--help-oem)


E:\juzicode>tesseract --help-oem
OCR Engine modes:
  0    Legacy engine only.
  1    Neural nets LSTM engine only.
  2    Legacy + LSTM engines.
  3    Default, based on what is available.

List the fonts supported by a language: --print-fonts-table -l xxx

E:\juzicode>tesseract --print-fonts-table  -l chi_sim
 Tesseract fonts table:
 ID=  1: Aachen_Std_Bold is_italic=false is_bold=true is_fixed_pitch=false is_serif=false is_fraktur=false
 ID=  2: Aachen_Std_Medium is_italic=false is_bold=false is_fixed_pitch=false is_serif=false is_fraktur=false
 ID=  3: aakar_Medium is_italic=false is_bold=false is_fixed_pitch=false is_serif=false is_fraktur=false
 ID=  4: Abadi_MT_Std_Bold is_italic=false is_bold=true is_fixed_pitch=false is_serif=true is_fraktur=false
 ID=  5: Abadi_MT_Std_Bold_Italic is_italic=true is_bold=true is_fixed_pitch=false is_serif=true is_fraktur=false
 ID=  6: Abadi_MT_Std_Condensed is_italic=false is_bold=false is_fixed_pitch=false is_serif=true is_fraktur=false
 ID=  7: Abadi_MT_Std_Light is_italic=false is_bold=false is_fixed_pitch=false is_serif=true is_fraktur=false
 ID=  8: Abadi_MT_Std_Light_Condensed is_italic=false is_bold=false is_fixed_pitch=false is_serif=true is_fraktur=false
 ID=  9: Abadi_MT_Std_Light_Italic is_italic=true is_bold=false is_fixed_pitch=false is_serif=true is_fraktur=false
 ID= 10: Abadi_MT_Std_Medium_Italic is_italic=true is_bold=false is_fixed_pitch=false is_serif=true is_fraktur=false
 ID= 11: Abadi_MT_Std_Ultra-Bold is_italic=false is_bold=true is_fixed_pitch=false is_serif=true is_fraktur=false
 ID= 12: Abadi_MT_Std_Ultra-Bold_Italic is_italic=true is_bold=true is_fixed_pitch=false is_serif=true is_fraktur=false
 ID= 13: Abaton_ITC_Std_Light is_italic=false is_bold=false is_fixed_pitch=false is_serif=false is_fraktur=false
 ID= 14: Aboriginal_Sans is_italic=false is_bold=false is_fixed_pitch=false is_serif=false is_fraktur=false
 ID= 15: Aboriginal_Sans_Bold is_italic=false is_bold=true is_fixed_pitch=false is_serif=false is_fraktur=false
...
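
Each row of the fonts table has a regular shape (an ID, a font name, then key=value flags), so it can be turned into structured data. The following sketch assumes exactly the layout shown above and that font names contain no spaces; parse_fonts_table is a hypothetical name.

```python
import re

# "ID=  1: Aachen_Std_Bold is_italic=false is_bold=true ..."
FONT_LINE = re.compile(r"ID=\s*(\d+):\s*(\S+)\s+(.*)")


def parse_fonts_table(output):
    """Parse `tesseract --print-fonts-table` text into a list of dicts.

    Hypothetical helper: lines that do not match the row layout
    (header, blank lines, elisions) are skipped.
    """
    fonts = []
    for line in output.splitlines():
        match = FONT_LINE.search(line)
        if not match:
            continue
        flags = {}
        for pair in match.group(3).split():
            key, _, value = pair.partition("=")
            flags[key] = (value == "true")
        fonts.append({"id": int(match.group(1)), "name": match.group(2), **flags})
    return fonts


# Sample rows copied from the transcript above.
sample = """\
 Tesseract fonts table:
 ID=  1: Aachen_Std_Bold is_italic=false is_bold=true is_fixed_pitch=false is_serif=false is_fraktur=false
 ID=  2: Aachen_Std_Medium is_italic=false is_bold=false is_fixed_pitch=false is_serif=false is_fraktur=false
 ID=  5: Abadi_MT_Std_Bold_Italic is_italic=true is_bold=true is_fixed_pitch=false is_serif=true is_fraktur=false
"""
fonts = parse_fonts_table(sample)
bold_names = [f["name"] for f in fonts if f["is_bold"]]
print(bold_names)
```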

Show parameters: --print-parameters

E:\juzicode>tesseract --print-parameters
 Tesseract parameters:
 log_level       2147483647      Logging level
 textord_dotmatrix_gap   3       Max pixel gap for broken pixed pitch
 textord_debug_block     0       Block to do debug on
 textord_pitch_range     2       Max range test on pitch
 textord_words_veto_power        5       Rows required to outvote a veto
 textord_tabfind_show_strokewidths       0       Show stroke widths (ScrollView)
 pitsync_linear_version  6       Use new fast algorithm
 oldbl_holed_losscount   10      Max lost before fallback line used
 textord_skewsmooth_offset       4       For smooth factor
 textord_skewsmooth_offset2      1       For smooth factor
 textord_test_x  -2147483647     coord of test pt
 textord_test_y  -2147483647     coord of test pt
 textord_min_blobs_in_row        4       Min blobs before gradient counted
 textord_spline_minblobs 8       Min blobs in each spline segment
 textord_spline_medianwin        6       Size of window for spline segmentation
 textord_max_blob_overlaps       4       Max number of blobs a big blob can overlap
 textord_min_xheight     10      Min credible pixel xheight
 textord_lms_line_trials 12      Number of linew fits to do
...
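
The parameter listing is long, so looking up a single entry is easier after loading it into a dictionary. The sketch below is an illustration under assumptions: each row is taken to be "name, value, description", separated by tabs when tabs are present and by runs of whitespace otherwise (as in the transcript above). With whitespace splitting, a parameter whose value is empty would be misread, so the tab-separated form is preferable when available. The name parse_parameters is hypothetical.

```python
def parse_parameters(output):
    """Map parameter name -> (value, description) from --print-parameters text.

    Hypothetical helper: values are kept as strings, and rows with fewer
    than two fields (elisions, stray text) are skipped.
    """
    params = {}
    for raw in output.splitlines():
        line = raw.strip(" ")
        if not line or line.startswith("Tesseract parameters"):
            continue  # skip blank lines and the header
        if "\t" in line:
            parts = line.split("\t", 2)
        else:
            parts = line.split(None, 2)
        if len(parts) < 2:
            continue
        description = parts[2].strip() if len(parts) > 2 else ""
        params[parts[0]] = (parts[1], description)
    return params


# Sample rows copied from the transcript above.
sample = """\
 Tesseract parameters:
 log_level       2147483647      Logging level
 textord_test_x  -2147483647     coord of test pt
 textord_min_xheight     10      Min credible pixel xheight
"""
params = parse_parameters(sample)
print(params["textord_min_xheight"])
```

Any name found this way can be overridden on the command line with -c VAR=VALUE, as listed in the OCR options.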

Further reading:

Installing Tesseract-OCR 5.0 and its language packs (Windows)
