First, download tesseract-ocr-setup-3.02.02.exe, the installer for Tesseract, an OCR project from Google. One download link:
http://pan.baidu.com/s/1jI551Gi
After downloading, run the installer. In this example the executable ends up at: C:\Program Files (x86)\Tesseract-OCR\tesseract.exe
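Next, install the Python support libraries PIL and pytesseract. On Python 3, which the test code here uses, PIL is provided by the Pillow package:
pip install pillow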
pip install pytesseract
A test snippet for reference:
# -*- coding: utf-8 -*-

import urllib.request
from PIL import Image
import pytesseract

# Fetch the CAPTCHA image
auth_img_url = r'https://********/authImage?'
urllib.request.urlretrieve(auth_img_url, 'auth.jpg')

# Convert the image to grayscale, then binarize it
img = Image.open('auth.jpg')
img_gray = img.convert('L')

# Lookup table: gray levels below the threshold become 0, the rest 1
threshold = 140
table = []
for i in range(256):
    if i < threshold:
        table.append(0)
    else:
        table.append(1)

out = img_gray.point(table, '1')
# out.save('auth_b.jpg')

# Recognize the text in the CAPTCHA
auth = pytesseract.image_to_string(out)

print(auth)
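To make the binarization step easier to follow, here is a minimal sketch of the same lookup-table idea in plain Python, with no image library involved. The helper names build_threshold_table and binarize are hypothetical and used only for illustration; the threshold of 140 is the one from the test code.

```python
def build_threshold_table(threshold):
    """Return a 256-entry lookup table: 0 below the threshold, 1 otherwise."""
    return [0 if level < threshold else 1 for level in range(256)]


def binarize(gray_values, threshold=140):
    """Map each 8-bit gray value through the threshold lookup table."""
    table = build_threshold_table(threshold)
    return [table[value] for value in gray_values]


# A few sample gray levels on either side of the threshold (made-up values)
sample = [0, 100, 139, 140, 200, 255]
print(binarize(sample))  # [0, 0, 0, 1, 1, 1]
```

The table has one entry per possible gray level, which is the shape Image.point expects for an 'L' mode image. If recognition is poor, the threshold value is the first thing to tune.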
P.S. You may see the error "FileNotFoundError: [WinError 2] The system cannot find the file specified". The fix is:
Open the file pytesseract.py and change the value of tesseract_cmd to the full path:
# CHANGE THIS IF TESSERACT IS NOT IN YOUR PATH, OR IS NAMED DIFFERENTLY
# tesseract_cmd = 'tesseract'
tesseract_cmd = r'C:\Program Files (x86)\Tesseract-OCR\tesseract.exe'
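As an alternative to editing the library file, the same setting can be applied from your own script, since pytesseract exposes it as pytesseract.pytesseract.tesseract_cmd. The sketch below is only one possible approach: the helper resolve_tesseract_cmd is a hypothetical name, and the fallback path assumes the install location used in this post.

```python
import os
import shutil

# Assumption: the install location used in this post; adjust for your machine.
DEFAULT_WINDOWS_PATH = r'C:\Program Files (x86)\Tesseract-OCR\tesseract.exe'


def resolve_tesseract_cmd(name='tesseract', fallbacks=(DEFAULT_WINDOWS_PATH,)):
    """Return a usable path to the tesseract executable, or None if not found."""
    # Prefer whatever is already on PATH
    found = shutil.which(name)
    if found:
        return found
    # Otherwise try known install locations in order
    for path in fallbacks:
        if os.path.isfile(path):
            return path
    return None


cmd = resolve_tesseract_cmd()
if cmd is not None:
    try:
        import pytesseract
        # Override the module-level setting without touching pytesseract.py
        pytesseract.pytesseract.tesseract_cmd = cmd
    except ImportError:
        pass  # pytesseract is not installed; nothing to configure
```

Setting the value at runtime means the change survives a reinstall or upgrade of pytesseract, which an edit inside the package directory would not.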