Using PaddleOCR in .NET 6 to Recognize Text in Images
PaddleOCR
Awesome multilingual OCR toolkits based on PaddlePaddle (practical ultra lightweight OCR system, support 80+ languages recognition, provide data annotation and synthesis tools, support training and deployment among server, mobile, embedded and IoT devices)
Project repository: https://gitcode.com/gh_mirrors/pa/PaddleOCR
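I recently came across PaddleOCR, an open-source OCR project. It can recognize text in two ways: through a Hub Serving service deployed offline, or through a local package called directly from your program.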
Runtime environment: Windows 10
IDE: Visual Studio 2022
.NET version: .NET 6
Packages to install (NuGet): PaddleOCR, version 0.0.5, and PaddleOCRUtf8, version 0.0.5.
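I started with the PaddleOCR package. It recognized English and digits correctly and with high accuracy, but Chinese text came out garbled, even though both packages used the same recognition models. Switching to the PaddleOCRUtf8 package fixed the garbled Chinese.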
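The article does not explain why the plain PaddleOCR package garbles Chinese. A common cause of this symptom is an encoding mismatch: the native library returns UTF-8 bytes and the wrapper decodes them with a different code page. That this is what happens inside these packages is only an assumption. The sketch below reproduces that kind of bug using only the .NET base library. `EncodingDemo` is a hypothetical helper and is not part of either package, and Latin-1 stands in for "some other code page".

```csharp
using System.Text;

// Hypothetical helper that reproduces an encoding mismatch ("mojibake").
// Assumption: the garbling seen with the PaddleOCR package is of this kind;
// Latin-1 is only a stand-in for whatever code page is actually involved.
public static class EncodingDemo
{
    // Encode text as UTF-8, the way a native library might hand it back.
    public static byte[] ToUtf8Bytes(string text) => Encoding.UTF8.GetBytes(text);

    // Correct: decode the bytes with the same encoding they were written in.
    public static string DecodeAsUtf8(byte[] bytes) => Encoding.UTF8.GetString(bytes);

    // Wrong: decode UTF-8 bytes as Latin-1 (one char per byte), producing garbled text.
    public static string DecodeAsLatin1(byte[] bytes) => Encoding.Latin1.GetString(bytes);
}
```

In UTF-8, 公民身份号码 occupies 18 bytes (3 per character). Decoding those bytes as UTF-8 gives back the original string. Decoding them as Latin-1 gives 18 meaningless characters instead. If the text is garbled only in the console window, try setting `Console.OutputEncoding = Encoding.UTF8` before printing.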
Image used for recognition:
The core recognition code is as follows:
```csharp
using System;
using System.IO;
using System.Net.Http;
using System.Text;
using System.Text.Json;
using System.Threading.Tasks;

namespace JuCheap_Demo_OCR
{
    internal class PaddleOCRService
    {
        // Base directory of the running application (the build output folder)
        private static readonly string _basePath = AppDomain.CurrentDomain.BaseDirectory;
        // Path of the image to recognize
        private static readonly string _imagePath = Path.Combine(_basePath, "id_card.jpg");
        // Model folders: text detection (det), text recognition (rec), direction classifier (cls)
        private readonly string _detPath = Path.Combine(_basePath, "PaddleModel", "ch_ppocr_server_v2.0_det_infer");
        private readonly string _recPath = Path.Combine(_basePath, "PaddleModel", "ch_ppocr_server_v2.0_rec_infer");
        private readonly string _clsPath = Path.Combine(_basePath, "PaddleModel", "ch_ppocr_mobile_v2.0_cls_infer");
        // Character dictionary used by the recognition model
        private readonly string _charListFileListPath = Path.Combine(_basePath, "PaddleModel", "chinese_zh_dict.txt");
        // The image encoded as Base64, sent in the Hub Serving request body
        private readonly string _fileBase64 = Convert.ToBase64String(File.ReadAllBytes(_imagePath), Base64FormattingOptions.None);

        /// <summary>
        /// Recognize locally with the PaddleOCR package
        /// </summary>
        public async Task RecognizeByPaddleOCR()
        {
            WriteOneLine();
            // Local package: English and digits work, but Chinese comes out garbled
            PaddleOCR.PaddleOCR.Initialize(_detPath, _recPath, _clsPath, _charListFileListPath, 4, true);
            var result = await PaddleOCR.PaddleOCR.Recognize(_imagePath);
            foreach (var box in result.Boxes)
            {
                Console.WriteLine($"PaddleOCR local package result={box.Text}, confidence={box.Score}");
            }
        }

        /// <summary>
        /// Recognize locally with the PaddleOCRUtf8 package
        /// </summary>
        public async Task RecognizeByPaddleOCRUtf8()
        {
            WriteOneLine();
            // Same models and parameters; this package returns Chinese text correctly
            PaddleOCRUtf8.PaddleOCR.Initialize(_detPath, _recPath, _clsPath, _charListFileListPath, 4, true);
            var resultUtf8 = await PaddleOCRUtf8.PaddleOCR.Recognize(_imagePath);
            foreach (var box in resultUtf8.Boxes)
            {
                Console.WriteLine($"PaddleOCRUtf8 local package result={box.Text}, confidence={box.Score}");
            }
        }

        /// <summary>
        /// Recognize through the Hub Serving service set up with Python
        /// </summary>
        public async Task RecognizeByHubServing()
        {
            WriteOneLine();
            try
            {
                // Call the ocr_system module exposed by Hub Serving
                using var client = new HttpClient { BaseAddress = new Uri("http://127.0.0.1:8866/") };
                // Request body: {"images": ["<base64 image>"]}
                var postData = new
                {
                    images = new[] { _fileBase64 }
                };
                using var content = new StringContent(JsonSerializer.Serialize(postData), Encoding.UTF8, "application/json");
                using var response = await client.PostAsync("predict/ocr_system", content);
                response.EnsureSuccessStatusCode();
                var responseContent = await response.Content.ReadAsStringAsync();
                // OCRResponseDTO is defined elsewhere in the demo project and is not shown here
                var responseResult = JsonSerializer.Deserialize<OCRResponseDTO>(responseContent);
                if (responseResult?.Data != null)
                {
                    // Data is nested: one collection of text boxes per submitted image
                    foreach (var items in responseResult.Data)
                    {
                        foreach (var box in items)
                        {
                            Console.WriteLine($"HubServing result={box.Text}, confidence={box.Confidence}");
                        }
                    }
                }
            }
            catch (Exception ex)
            {
                Console.WriteLine($"Hub Serving recognition failed: {ex.Message}");
            }
            WriteOneLine();
        }

        // Prints a separator line between the outputs of the different methods
        private void WriteOneLine()
        {
            Console.WriteLine("--------------------------------------------------------------------------------------------------");
        }
    }
}
```
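The article does not show `OCRResponseDTO`. Below is a minimal sketch of one possible shape. It assumes the PaddleHub `ocr_system` endpoint returns JSON like `{"msg": "", "results": [[{"confidence": ..., "text": "...", "text_region": [[x, y], ...]}]], "status": "000"}`, with one inner list per submitted image. Only `Data`, `Text` and `Confidence` come from the code above. `OCRBoxDTO`, `Msg`, `Status` and `TextRegion` are hypothetical names. System.Text.Json matches property names case-sensitively by default, so `[JsonPropertyName]` maps the service's lowercase keys onto these properties.

```csharp
using System.Collections.Generic;
using System.Text.Json.Serialization;

// Hypothetical DTOs for the Hub Serving ocr_system response (assumed JSON shape).
public class OCRResponseDTO
{
    // Error message; assumed to be empty on success.
    [JsonPropertyName("msg")]
    public string? Msg { get; set; }

    // Status code string; assumed to be "000" on success.
    [JsonPropertyName("status")]
    public string? Status { get; set; }

    // One list of boxes per submitted image, exposed as Data to match
    // the property name used by RecognizeByHubServing.
    [JsonPropertyName("results")]
    public List<List<OCRBoxDTO>>? Data { get; set; }
}

public class OCRBoxDTO
{
    // Recognized text of one detected region.
    [JsonPropertyName("text")]
    public string? Text { get; set; }

    // Recognition confidence, assumed to lie in [0, 1].
    [JsonPropertyName("confidence")]
    public double Confidence { get; set; }

    // Corner points of the region as [x, y] pairs; double accepts
    // both integer and fractional coordinates.
    [JsonPropertyName("text_region")]
    public List<List<double>>? TextRegion { get; set; }
}
```

With these attributes, `JsonSerializer.Deserialize<OCRResponseDTO>(responseContent)` fills `Data` from the `results` key. Without them, `Data` would stay null, because the response has no key named `Data`.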
Recognition results:
Source code:
https://gitee.com/jucheap/demo
Run the JuCheap-Demo-OCR project in that repository to see the results.
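Summary: recognition with the local packages has some gaps. For example, 【公民身份证】 ("citizen ID card") was not recognized completely. I recommend setting up a Hub Serving service for recognition instead, since it gave more accurate results in this test.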