Environment setup

Install Tesseract

Download the 64-bit installer.


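During installation you can select extra language packs (tick the Chinese ones if you plan to recognize Chinese text); otherwise just click Next through every step.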

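Add the install directory to the PATH environment variable, then check that it worked, for example by running tesseract --version in a new terminal. Restart PyCharm afterwards, otherwise it will not find Tesseract.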

C:\Program Files (x86)\Tesseract-OCR
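If PyCharm or a script still cannot find Tesseract, it helps to check from Python where the executable actually is. The sketch below is an illustration, not part of the original post: the function name find_tesseract is hypothetical, and it simply tries PATH first (which is what pytesseract uses by default) and then the install folder shown above.

```python
import os
import shutil
from typing import Optional

# Install folder from the text above; an assumption, adjust if yours differs.
DEFAULT_DIR = r"C:\Program Files (x86)\Tesseract-OCR"


def find_tesseract(default_dir: str = DEFAULT_DIR) -> Optional[str]:
    """Return the path of the tesseract executable, or None if not found."""
    # 1. Look on PATH, which the command line and pytesseract both rely on.
    found = shutil.which("tesseract")
    if found:
        return found
    # 2. Fall back to the default Windows install location.
    candidate = os.path.join(default_dir, "tesseract.exe")
    if os.path.isfile(candidate):
        return candidate
    return None


if __name__ == "__main__":
    print(find_tesseract() or "tesseract not found; check PATH and restart the IDE")
```

If it only finds the executable through the fallback folder, PATH is not set up yet; as a workaround you can assign the returned path to pytesseract.pytesseract.tesseract_cmd before calling image_to_string.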


Install the Python libraries

pip install opencv-contrib-python -i https://pypi.doubanio.com/simple/ --trusted-host pypi.doubanio.com

pip install pytesseract -i https://pypi.doubanio.com/simple/ --trusted-host pypi.doubanio.com
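To confirm the installs landed in the interpreter your IDE is using, you can query the installed distributions. This is a small hedged sketch using the standard library's importlib.metadata (Python 3.8+); the helper name installed_versions is hypothetical, and Pillow and numpy are listed on the assumption that they arrive as dependencies of pytesseract and OpenCV.

```python
from importlib.metadata import PackageNotFoundError, version

# pip distribution names; Pillow provides the PIL module used later.
PACKAGES = ["opencv-contrib-python", "pytesseract", "Pillow", "numpy"]


def installed_versions(packages=PACKAGES):
    """Map each distribution name to its installed version, or None if missing."""
    result = {}
    for name in packages:
        try:
            result[name] = version(name)
        except PackageNotFoundError:
            result[name] = None
    return result


if __name__ == "__main__":
    for name, ver in installed_versions().items():
        print(f"{name}: {ver or 'NOT INSTALLED'}")
```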

Recognizing English letters and digits


Result:

text: import cv2 as cv
import numpy as np
import pytesseract as tess
from PIL import Image
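When the expected text is a short line of letters and digits (as in a CAPTCHA), Tesseract can be told so through pytesseract's config argument. The sketch below only builds that string; the helper name tesseract_config and the default character set are assumptions. --psm 7 asks Tesseract to treat the image as a single text line, and tessedit_char_whitelist restricts the output characters (some older Tesseract versions may ignore the whitelist).

```python
import string

# Characters expected in an English/digit CAPTCHA; an assumption, adjust as needed.
ALNUM = string.ascii_letters + string.digits


def tesseract_config(psm: int = 7, whitelist: str = ALNUM) -> str:
    """Build a config string for pytesseract.image_to_string.

    --psm 7: treat the image as a single line of text.
    tessedit_char_whitelist: only allow these characters in the result.
    """
    if not 0 <= psm <= 13:
        raise ValueError("Tesseract page segmentation modes are 0-13")
    return f"--psm {psm} -c tessedit_char_whitelist={whitelist}"


if __name__ == "__main__":
    print(tesseract_config())
```

Usage would look like tess.image_to_string(text_img, config=tesseract_config()). Whether it improves accuracy depends on the image, so compare against the plain call.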

Recognizing a CAPTCHA takes more preprocessing: the interference lines and noise have to be removed first.
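Line and noise removal in the script that follows relies on a morphological opening (cv.MORPH_OPEN with a 2×2 rectangular kernel). OpenCV morphology treats non-zero (white) pixels as foreground, and an opening keeps a white pixel only if some placement of the kernel that lies entirely inside the white region covers it. The pure-Python sketch below follows that textbook definition on a tiny 0/1 grid; it is an illustration, not OpenCV's implementation, and the name binary_open is hypothetical.

```python
def binary_open(grid, kh=2, kw=2):
    """Opening of a 0/1 grid with a kh x kw rectangle (textbook definition).

    Result = union of all kernel placements that fit inside the foreground.
    """
    h, w = len(grid), len(grid[0])
    out = [[0] * w for _ in range(h)]
    for y in range(h - kh + 1):
        for x in range(w - kw + 1):
            # Does the whole kernel fit inside the foreground at (y, x)?
            if all(grid[y + dy][x + dx] for dy in range(kh) for dx in range(kw)):
                # Then every pixel under the kernel survives.
                for dy in range(kh):
                    for dx in range(kw):
                        out[y + dy][x + dx] = 1
    return out


# 1 = white foreground. Row 1 is a 1-pixel-wide "interference line";
# rows 3-5 hold a 3x3 block standing in for a character stroke.
image = [
    [0, 0, 0, 0, 0, 0],
    [1, 1, 1, 1, 1, 1],
    [0, 0, 0, 0, 0, 0],
    [0, 1, 1, 1, 0, 0],
    [0, 1, 1, 1, 0, 0],
    [0, 1, 1, 1, 0, 0],
]

opened = binary_open(image)
for row in opened:
    print(row)
```

The thin line disappears because no 2×2 square fits inside it, while the thicker block survives unchanged. This is also why the kernel must stay smaller than the character strokes.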

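import cv2 as cv
import pytesseract as tess
from PIL import Image

img = cv.imread('code2.jpg')
if img is None:  # cv.imread returns None instead of raising
    raise FileNotFoundError('could not read code2.jpg')
cv.imshow('img', img)

# Grayscale, then binarize; THRESH_OTSU chooses the threshold automatically
gray = cv.cvtColor(img, cv.COLOR_BGR2GRAY)
ret, binary = cv.threshold(gray, 0, 255, cv.THRESH_BINARY | cv.THRESH_OTSU)
cv.imshow('bin', binary)

# Opening with a 2x2 rectangle removes white specks and lines thinner than the kernel
kernel = cv.getStructuringElement(cv.MORPH_RECT, (2, 2))
open_out = cv.morphologyEx(binary, cv.MORPH_OPEN, kernel)
cv.imshow('open', open_out)

# Invert black and white in place; Tesseract generally works best on dark text over a light background
cv.bitwise_not(open_out, open_out)
cv.imshow('open_out', open_out)

# Convert the NumPy array to a PIL image and run OCR
text_img = Image.fromarray(open_out)
text = tess.image_to_string(text_img)
print('text:', text)

cv.waitKey(0)
cv.destroyAllWindows()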
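The threshold call above passes 0 as the threshold because cv.THRESH_OTSU makes OpenCV compute it from the image histogram (the chosen value is returned in ret). A pure-Python sketch of Otsu's method follows, to show what that computation does: it tries every threshold t and keeps the one that maximizes the between-class variance of the two pixel groups. The function names are hypothetical, and OpenCV's result may differ slightly in edge cases.

```python
def otsu_threshold(pixels):
    """Return t in 0..255 maximizing the between-class variance.

    Pixels <= t form one class and pixels > t the other, matching
    THRESH_BINARY, which maps values above the threshold to 255.
    """
    hist = [0] * 256
    for p in pixels:
        hist[p] += 1
    total = len(pixels)
    sum_all = sum(i * hist[i] for i in range(256))

    w_low = 0    # number of pixels <= t
    sum_low = 0  # sum of their values
    best_t, best_var = 0, -1.0
    for t in range(256):
        w_low += hist[t]
        sum_low += t * hist[t]
        w_high = total - w_low
        if w_low == 0:
            continue
        if w_high == 0:
            break
        mean_low = sum_low / w_low
        mean_high = (sum_all - sum_low) / w_high
        # Between-class variance, scaled by total**2 (does not change the argmax)
        var = w_low * w_high * (mean_low - mean_high) ** 2
        if var > best_var:
            best_t, best_var = t, var
    return best_t


def binarize(pixels, t):
    """Same rule as cv.THRESH_BINARY with maxval=255."""
    return [255 if p > t else 0 for p in pixels]


# Toy "image": 50 dark background pixels and 50 bright text pixels.
pixels = [20] * 50 + [200] * 50
t = otsu_threshold(pixels)
print(t, binarize(pixels, t).count(255))
```

Because the threshold adapts to each image, the same script can handle CAPTCHAs with different overall brightness without hand-tuning a fixed value.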

Chinese text recognition

List the installed languages:

tesseract --list-langs
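The same check can be done from Python, for example to fail early when a language pack is missing. This sketch is an assumption-laden illustration: it calls the tesseract CLI through subprocess and parses its output, skipping the "List of available languages ..." header line; the function names are hypothetical, and the stdout/stderr fallback is there because the stream used may vary between Tesseract versions.

```python
import subprocess


def parse_lang_list(output: str):
    """Extract language codes from `tesseract --list-langs` output."""
    langs = []
    for line in output.splitlines():
        line = line.strip()
        # Skip blank lines and the "List of available languages ..." header.
        if not line or line.lower().startswith("list of available languages"):
            continue
        langs.append(line)
    return langs


def installed_langs(cmd: str = "tesseract"):
    """Run the CLI and return its language codes (raises if tesseract is missing)."""
    proc = subprocess.run([cmd, "--list-langs"], capture_output=True, text=True, check=True)
    # Prefer stdout; fall back to stderr if stdout is empty.
    text = proc.stdout if proc.stdout.strip() else proc.stderr
    return parse_lang_list(text)
```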

eng: English
chi_tra: Traditional Chinese
chi_sim: Simplified Chinese

Only one parameter needs to change: pass the language code, here chi_sim, to image_to_string.
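Tesseract also accepts several languages joined with "+", such as chi_sim+eng, which can help on pages that mix Chinese with English terms like API or OpenCV. The helper below is a hypothetical sketch that builds such a string and checks it against the codes reported by tesseract --list-langs, so a missing language pack is reported up front.

```python
def build_lang(requested, installed):
    """Join language codes with '+' for Tesseract, e.g. 'chi_sim+eng'.

    Raises ValueError if any requested code is not among the installed ones.
    """
    missing = [code for code in requested if code not in installed]
    if missing:
        raise ValueError(f"language data not installed: {', '.join(missing)}")
    return "+".join(requested)


if __name__ == "__main__":
    print(build_lang(["chi_sim", "eng"], {"eng", "chi_sim", "chi_tra"}))
```

The result is passed as tess.image_to_string(text_img, lang=...). Adding languages can also slow recognition down, so it is worth testing both ways.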


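Output:

D:\ProgramData\Anaconda3\python.exe D:/code/py/blogsolr/验证码识别.py
text: API层面
, 学会使用OpenCy 形态学与二值化API做预处理
,使用Tesseract- OCR做文字识别
4 识别率问题讨论

The recognized text reads roughly: "API level / learn to use the OpenCV morphology and binarization APIs for preprocessing / use Tesseract-OCR for text recognition / 4 Discussion of recognition accuracy". "OpenCy" is a misreading of "OpenCV".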
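Raw Chinese OCR output often needs light cleanup. Depending on the model and version, Tesseract may insert spaces or line breaks between Chinese characters. The sketch below is a hypothetical post-processing step (not part of the original code) that deletes whitespace only when it sits between two CJK ideographs in the basic U+4E00 to U+9FFF range, leaving the spaces around English words alone.

```python
import re

# Whitespace (including newlines) between two CJK ideographs.
_CJK_GAP = re.compile(r"(?<=[\u4e00-\u9fff])\s+(?=[\u4e00-\u9fff])")


def join_cjk(text: str) -> str:
    """Remove whitespace that separates adjacent Chinese characters."""
    return _CJK_GAP.sub("", text)


print(join_cjk("学 会 使用 OpenCV 形态学"))  # 学会使用 OpenCV 形态学
```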

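import cv2 as cv
import pytesseract as tess
from PIL import Image

img = cv.imread('code3.jpg')
if img is None:  # cv.imread returns None instead of raising
    raise FileNotFoundError('could not read code3.jpg')
cv.imshow('img', img)

# Same preprocessing as for the CAPTCHA: grayscale, Otsu binarization
gray = cv.cvtColor(img, cv.COLOR_BGR2GRAY)
ret, binary = cv.threshold(gray, 0, 255, cv.THRESH_BINARY | cv.THRESH_OTSU)
cv.imshow('bin', binary)

# Opening with a 2x2 rectangle removes small white noise
kernel = cv.getStructuringElement(cv.MORPH_RECT, (2, 2))
open_out = cv.morphologyEx(binary, cv.MORPH_OPEN, kernel)
cv.imshow('open', open_out)

# Invert black and white in place before OCR
cv.bitwise_not(open_out, open_out)
cv.imshow('open_out', open_out)

# The only change: ask Tesseract for Simplified Chinese
text_img = Image.fromarray(open_out)
text = tess.image_to_string(text_img, lang='chi_sim')
print('text:', text)

cv.waitKey(0)
cv.destroyAllWindows()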

Reposted from: https://my.oschina.net/ahaoboy/blog/1922309
