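I recently came across an open-source OCR project, PaddleOCR. It supports two ways of doing recognition: through an offline-deployed Hub Serving service, or through a local package.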

Runtime environment: Windows 10

Development tool: Visual Studio 2022

.NET version: .NET 6

Packages to install: PaddleOCR (version 0.0.5) and PaddleOCRUtf8 (version 0.0.5).

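At first I used the PaddleOCR package. English text and digits were recognized successfully, and with high accuracy. Chinese text, however, came out garbled, even though both packages used the same recognition models. Switching to the PaddleOCRUtf8 package fixed the garbled Chinese.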
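The article does not say why the PaddleOCR package garbles Chinese while PaddleOCRUtf8 does not. A common cause of this symptom is a native OCR library that returns UTF-8 bytes, which a .NET wrapper then decodes with a different code page. This is offered as an assumption, not something confirmed from either package's source.

The sketch below reproduces the effect:

- It writes UTF-8 bytes into unmanaged memory.
- It reads them back correctly with `Marshal.PtrToStringUTF8`.
- It also decodes the same bytes as Latin-1, which stands in here for a non-UTF-8 ANSI code page.

`EncodingDemo` and `RoundTrip` are hypothetical names used only for illustration.

```csharp
using System;
using System.Runtime.InteropServices;
using System.Text;

// Hypothetical demo: how UTF-8 text from native code becomes garbled
// when it is decoded with the wrong encoding.
public static class EncodingDemo
{
    public static (string Correct, string Garbled) RoundTrip(string text)
    {
        // Simulate a native library handing back a NUL-terminated UTF-8 buffer.
        byte[] utf8 = Encoding.UTF8.GetBytes(text);
        IntPtr buffer = Marshal.AllocHGlobal(utf8.Length + 1);
        try
        {
            Marshal.Copy(utf8, 0, buffer, utf8.Length);
            Marshal.WriteByte(buffer, utf8.Length, 0); // NUL terminator

            // Correct: interpret the native bytes as UTF-8.
            string correct = Marshal.PtrToStringUTF8(buffer) ?? string.Empty;

            // Wrong: interpret the same bytes with a single-byte code page.
            // Latin-1 is an assumed stand-in for the Windows ANSI code page.
            byte[] raw = new byte[utf8.Length];
            Marshal.Copy(buffer, raw, 0, raw.Length);
            string garbled = Encoding.Latin1.GetString(raw);

            return (correct, garbled);
        }
        finally
        {
            Marshal.FreeHGlobal(buffer);
        }
    }
}
```

`EncodingDemo.RoundTrip("公民身份证")` returns the original string as `Correct`. Its `Garbled` value is a 15-character string of Latin-1 characters, because each of the five Chinese characters takes 3 bytes in UTF-8. ASCII letters and digits come back unchanged either way, since they have the same byte values in UTF-8 and in single-byte code pages. That would be consistent with the observation that only Chinese text was affected.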

Test image:

The core recognition code is as follows:

using System;
using System.IO;
using System.Net.Http;
using System.Text;
using System.Text.Json;
using System.Threading.Tasks;

namespace JuCheap_Demo_OCR
{
    internal class PaddleOCRService
    {
        // Base directory of the running application
        private static readonly string _basePath = AppDomain.CurrentDomain.BaseDirectory;
        // Path of the image to recognize
        private static readonly string _imagePath = Path.Combine(_basePath, "id_card.jpg");

        // Model directories: text detection (det), text recognition (rec) and text direction classification (cls)
        private readonly string _detPath = Path.Combine(_basePath, "PaddleModel", "ch_ppocr_server_v2.0_det_infer");
        private readonly string _recPath = Path.Combine(_basePath, "PaddleModel", "ch_ppocr_server_v2.0_rec_infer");
        private readonly string _clsPath = Path.Combine(_basePath, "PaddleModel", "ch_ppocr_mobile_v2.0_cls_infer");
        // Character dictionary used by the recognition model
        private readonly string _charListFileListPath = Path.Combine(_basePath, "PaddleModel", "chinese_zh_dict.txt");
        // Image content as Base64, sent to Hub Serving
        private readonly string _fileBase64 = Convert.ToBase64String(File.ReadAllBytes(_imagePath), Base64FormattingOptions.None);

        // One shared HttpClient; creating a new instance per call can exhaust sockets
        private static readonly HttpClient _httpClient = new HttpClient
        {
            BaseAddress = new Uri("http://127.0.0.1:8866/")
        };

        /// <summary>
        /// Local recognition with the PaddleOCR package
        /// </summary>
        public async Task RecognizeByPaddleOCR()
        {
            WriteOneLine();

            // Recognition via the local package (English and digits work; Chinese comes out garbled)
            PaddleOCR.PaddleOCR.Initialize(_detPath, _recPath, _clsPath, _charListFileListPath, 4, true);
            var result = await PaddleOCR.PaddleOCR.Recognize(_imagePath);
            foreach (var box in result.Boxes)
            {
                Console.WriteLine($"PaddleOCR local package result={box.Text}, confidence={box.Score}");
            }
        }

        /// <summary>
        /// Local recognition with the PaddleOCRUtf8 package
        /// </summary>
        public async Task RecognizeByPaddleOCRUtf8()
        {
            WriteOneLine();

            // Same models and arguments; this package fixes the garbled Chinese output
            PaddleOCRUtf8.PaddleOCR.Initialize(_detPath, _recPath, _clsPath, _charListFileListPath, 4, true);
            var resultUtf8 = await PaddleOCRUtf8.PaddleOCR.Recognize(_imagePath);
            foreach (var box in resultUtf8.Boxes)
            {
                Console.WriteLine($"PaddleOCRUtf8 local package result={box.Text}, confidence={box.Score}");
            }
        }

        /// <summary>
        /// Recognition through a Hub Serving service built with Python
        /// </summary>
        public async Task RecognizeByHubServing()
        {
            WriteOneLine();

            try
            {
                // Recognize through the Hub Serving ocr_system module
                var postData = new
                {
                    images = new string[] { _fileBase64 }
                };

                var content = new StringContent(JsonSerializer.Serialize(postData), Encoding.UTF8, "application/json");
                var response = await _httpClient.PostAsync("predict/ocr_system", content);
                // Fail fast on a non-2xx status instead of trying to parse an error page
                response.EnsureSuccessStatusCode();
                var responseContent = await response.Content.ReadAsStringAsync();

                // OCRResponseDTO is the response model from the demo project (not shown here)
                var responseResult = JsonSerializer.Deserialize<OCRResponseDTO>(responseContent);
                if (responseResult?.Data != null)
                {
                    // Data holds one list of text boxes per submitted image
                    foreach (var imageResult in responseResult.Data)
                    {
                        foreach (var box in imageResult)
                        {
                            Console.WriteLine($"HubServing result={box.Text}, confidence={box.Confidence}");
                        }
                    }
                }
            }
            catch (Exception ex)
            {
                Console.WriteLine($"Hub Serving recognition error: {ex.Message}");
            }

            WriteOneLine();
        }

        // Prints a separator line between result groups
        private void WriteOneLine()
        {
            Console.WriteLine("--------------------------------------------------------------------------------------------------");
        }
    }
}
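The Hub Serving code deserializes the response into `OCRResponseDTO`, but the article does not show that type.

Below is a minimal sketch of what it could look like. It assumes the response shape described in the PaddleOCR hubserving documentation: a top-level object with `msg`, `status` and `results`. Under that assumption, `results` holds one list of boxes per image, and each box has `text`, `confidence` and `text_region`. Check these field names against your own server's output. `OCRBoxDTO` is a hypothetical name.

The `[JsonPropertyName]` attributes matter because `System.Text.Json` matches property names case-sensitively by default. Without them, `results` would not bind to `Data`.

```csharp
using System.Collections.Generic;
using System.Text.Json.Serialization;

// Hypothetical response model for POST predict/ocr_system.
// The JSON field names are assumptions based on the PaddleOCR hubserving docs.
public class OCRResponseDTO
{
    [JsonPropertyName("msg")]
    public string? Msg { get; set; }

    [JsonPropertyName("status")]
    public string? Status { get; set; }

    // "results": one inner list of text boxes per image in the request
    [JsonPropertyName("results")]
    public List<List<OCRBoxDTO>>? Data { get; set; }
}

// Hypothetical model for a single recognized text box
public class OCRBoxDTO
{
    [JsonPropertyName("text")]
    public string? Text { get; set; }

    [JsonPropertyName("confidence")]
    public double Confidence { get; set; }

    // Corner points as [[x, y], ...]; double accepts integer and fractional coordinates
    [JsonPropertyName("text_region")]
    public List<List<double>>? TextRegion { get; set; }
}
```

The property names `Data`, `Text` and `Confidence` match what `RecognizeByHubServing` reads, so the nested `foreach` loops in that method work unchanged with this model.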

Recognition results:

Source code:

https://gitee.com/jucheap/demo

Run the JuCheap-Demo-OCR project in that repository directly to see the results.

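Summary: recognition with the local packages is not entirely reliable. For example, the text 【公民身份证】 was not recognized completely. I recommend setting up a Hub Serving service for recognition instead, since it gave more accurate results in this comparison.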
