Data location: data/data159696/report_ex.tar

Extraction command: !tar -xf /home/aistudio/data/data159696/report_ex.tar


  └─ pngs: checkup report images, stored as .png files
  └─ txts: annotated box coordinates and the text each box contains
  └─ json: same content as txts, stored in JSON format


Rect (182.0, 1078.03125, 266.0, 1064.03125) 姓名:张某某

Rect (356.0, 1078.03125, 412.0, 1064.03125) 性别:男

Rect (516.0, 1078.03125, 572.0, 1064.03125) 年龄:40
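  • Note: the coordinates in this dataset use the bottom-left corner as the origin. For detection with PaddleOCR they must be converted to a top-left origin, and both the x and y coordinates must be multiplied by 4.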
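A minimal sketch of the conversion described in the note. It assumes a Rect is (x0, y0, x1, y1) in the bottom-left-origin system, as in the samples above, and that img_height is the pixel height of the matching PNG, which you would read from the image file. rect_to_points and is_clockwise are hypothetical helpers, not PaddleOCR functions. The corners come out clockwise from the top-left, the order PaddleOCR detection labels use.

```python
SCALE = 4  # per the note above: multiply both x and y by 4


def rect_to_points(rect, img_height, scale=SCALE):
    """Convert a bottom-left-origin Rect (x0, y0, x1, y1) into four
    top-left-origin corner points, clockwise from the top-left corner.

    Hypothetical helper; img_height must be the real height of the PNG.
    """
    x0, y0, x1, y1 = rect
    # Scale x and sort so left < right regardless of how the Rect stores them
    left, right = sorted((x0 * scale, x1 * scale))
    # Scale y, then flip the vertical axis: y_top_origin = H - y_bottom_origin
    top, bottom = sorted((img_height - y0 * scale, img_height - y1 * scale))
    return [[left, top], [right, top], [right, bottom], [left, bottom]]


def is_clockwise(points):
    """True if the polygon runs clockwise on screen (y axis pointing down)."""
    area2 = 0.0
    for (xa, ya), (xb, yb) in zip(points, points[1:] + points[:1]):
        area2 += xa * yb - xb * ya  # shoelace formula, twice the signed area
    return area2 > 0


# Example: the first sample Rect above, with a made-up image height of 5000 px
pts = rect_to_points((182.0, 1078.03125, 266.0, 1064.03125), img_height=5000)
print(pts)  # [[728.0, 687.875], [1064.0, 687.875], [1064.0, 743.875], [728.0, 743.875]]
print(is_clockwise(pts))  # True
```

Sorting both axes makes the result independent of which corner the source Rect lists first. The clockwise check is a cheap way to validate generated labels before training.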




1.1 Install the project environment


%cd ~ 
!git clone -b release/2.1 https://github.com/PaddlePaddle/PaddleOCR.git
# Install the dependencies
%cd ~/PaddleOCR
!pip install -r requirements.txt -i https://mirror.baidu.com/pypi/simple
1.2. Download the inference models and test them


! mkdir inference
# Download and extract the detection model of the ultra-lightweight Chinese OCR model
! cd inference && wget https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_mobile_v2.0_det_infer.tar && tar xf ch_ppocr_mobile_v2.0_det_infer.tar && rm ch_ppocr_mobile_v2.0_det_infer.tar
# Download and extract the recognition model of the ultra-lightweight Chinese OCR model
! cd inference && wget https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_mobile_v2.0_rec_infer.tar && tar xf ch_ppocr_mobile_v2.0_rec_infer.tar && rm ch_ppocr_mobile_v2.0_rec_infer.tar
# Download and extract the text direction classifier of the ultra-lightweight Chinese OCR model
! cd inference && wget https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_mobile_v2.0_cls_infer.tar && tar xf ch_ppocr_mobile_v2.0_cls_infer.tar && rm ch_ppocr_mobile_v2.0_cls_infer.tar
import matplotlib.pyplot as plt
from PIL import Image
%matplotlib inline

def show_img(img_path, figsize=(10, 10)):
    # Display the image at img_path (e.g. a test image) in the notebook
    img = Image.open(img_path)
    plt.figure("test_img", figsize=figsize)
    plt.imshow(img)
    plt.axis("off")
    plt.show()



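Run tools/infer/predict_system.py to recognize a report. It takes four parameters:

  • image_dir: the image to test
  • det_model_dir: the inference model of the lightweight detection model
  • rec_model_dir: the inference model of the lightweight recognition model
  • cls_model_dir: the inference model of the lightweight direction classifier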
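As a sketch, the four parameters can also be assembled in Python, which makes it easy to switch between the downloaded and the self-trained models. build_predict_cmd is a hypothetical helper. It assumes the PaddleOCR 2.x behavior where the direction classifier runs only when --use_angle_cls is true, so it adds that flag whenever cls_model_dir is given.

```python
import shlex


def build_predict_cmd(image_dir, det_model_dir, rec_model_dir, cls_model_dir=None):
    """Assemble the predict_system.py command line as a list of arguments."""
    cmd = [
        "python3", "./tools/infer/predict_system.py",
        f"--image_dir={image_dir}",
        f"--det_model_dir={det_model_dir}",
        f"--rec_model_dir={rec_model_dir}",
    ]
    if cls_model_dir is not None:
        # The direction classifier is only applied when --use_angle_cls is enabled
        cmd += ["--use_angle_cls=True", f"--cls_model_dir={cls_model_dir}"]
    return cmd


cmd = build_predict_cmd(
    "../20220623110401-0.png",
    "./inference/ch_ppocr_mobile_v2.0_det_infer",
    "./inference/ch_ppocr_mobile_v2.0_rec_infer",
    "./inference/ch_ppocr_mobile_v2.0_cls_infer",
)
print(shlex.join(cmd))  # shell-quoted string, ready to paste after "!"
# subprocess.run(cmd, check=True) would run it from the PaddleOCR directory
```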
# Quick run
!python3 ./tools/infer/predict_system.py --image_dir="../20220623110401-0.png" \
--det_model_dir="./inference/ch_ppocr_mobile_v2.0_det_infer"  \
--rec_model_dir="./inference/ch_ppocr_mobile_v2.0_rec_infer"
# Results with the trained models
# (predict_system.py loads inference models, so export the training checkpoints with tools/export_model.py first)
!python3 ./tools/infer/predict_system.py --image_dir="../20220623110401-0.png" \
--det_model_dir="./outputall/db_mv3/best_accuracy"  \
--rec_model_dir="./output/rec/best_accuracy"





2. Train the text detection model


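Note: the official ICDAR2015 dataset is stored at ~/data/data34815/icdar2015.tar and can serve as a reference if data-format questions come up later. The official data directory ~/train_data/icdar2015/text_localization contains two folders and two files: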

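  └─ icdar_c4_train_imgs/         training images of the ICDAR dataset
  └─ ch4_test_images/             test images of the ICDAR dataset
  └─ train_icdar2015_label.txt    training annotations of the ICDAR dataset
  └─ test_icdar2015_label.txt     test annotations of the ICDAR dataset


" image file name              json.dumps-encoded image annotation"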
ch4_test_images/img_61.jpg    [{"transcription": "MASA", "points": [[310, 104], [416, 141], [418, 216], [312, 179]], ...}]

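Before json.dumps encoding, the annotation is a list of dicts. In each dict, points holds the (x, y) coordinates of the four corners of the text box, listed clockwise starting from the top-left corner. transcription is the text inside the box; the text detection task does not need it.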

2.1. Data preparation

%cd /home/aistudio/
!tar -xf /home/aistudio/data/data159696/report_ex.tar
%cd /home/aistudio/report_ex/pngs
!ls -l | grep "^-" | wc -l   # 20,011 images in total


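Because the data format differs, this project needs conversion scripts that build annotation files in the PaddleOCR format. Due to time constraints the conversion code is fairly rough; readers can improve it as needed.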


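  └─ train_det_new1_hebing/           training images of the report_ex dataset
  └─ test_det_new1_hebing/            test images of the report_ex dataset
  └─ train_det_new1_hebing.txt        training annotations of the report_ex dataset
  └─ test_det_new1_hebing.txt         test annotations of the report_ex dataset
  └─ gen_data_det_reg.py              format conversion script
  └─ hebing.py                        merges the annotations
  └─ split_data.py                    splits the training and test sets
  └─ file.py                          copies the training and test images into folders
  └─ tools/train.py                   training script
  └─ tools/infer_det.py               inference script
  └─ configs/det/det_mv3_db_all.yml   config file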

2.2 Start training


# Download the MobileNetV3 pretrained model
!wget -P ./pretrain_models/ https://paddle-imagenet-models-name.bj.bcebos.com/MobileNetV3_large_x0_5_pretrained.tar
! cd pretrain_models/ && tar xf MobileNetV3_large_x0_5_pretrained.tar
# Download the ResNet50 pretrained model
!wget -P ./pretrain_models/ https://paddle-imagenet-models-name.bj.bcebos.com/ResNet50_vd_ssld_pretrained.tar
! cd pretrain_models/ && tar xf ResNet50_vd_ssld_pretrained.tar




Sample lines of the generated detection labels (image name, then the JSON annotation):

20220623110401-0.png [{"transcription":"姓名:张某某","points":[[182.0,4256.125],[266.0,4256.125],[182.0,4312.125],[266.0,4312.125]]}]

20220623110401-0.png [{"transcription":"性别:男","points":[[356.0,4256.125],[412.0,4256.125],[356.0,4312.125],[412.0,4312.125]]}]

20220623110401-0.png [{"transcription":"年龄:40","points":[[516.0,4256.125],[572.0,4256.125],[516.0,4312.125],[572.0,4312.125]]}]


Sample lines of the corresponding recognition labels (reg.txt):

20220623110401-0.png 姓名:张某某

20220623110401-0.png 性别:男

20220623110401-0.png 年龄:40

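Note: because the checkup-report dataset is large and training takes a long time, the training data below comes in two variants, 1. a partial dataset and 2. the full dataset, to make it easier to check results and debug. The script names are given in the comments; uncomment whichever you need.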

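# 1. Conversion script for the partial dataset: generates det1.txt (100-odd images after merging). Coordinates: x' = x × 4, y' = image height - y × 4. reg.txt is not used yet.
# Running it raises IndexError: list index out of range and only 20,000+ lines are generated, but this does not prevent training.
# %cd /home/aistudio/
# !python ./gen_data_det_reg.py
# 2. Conversion script for the full dataset
%cd /home/aistudio/
!python ./gen_data_all.py
# Merge the generated txt data so that all boxes of one image are on a single line, producing a new merged txt
# 1. Partial data: merge det1.txt into det_new_hebing.txt
# !python hebing.py
# 2. Full data: merge det_all.txt into det_new_hebing_all.txt
!python hebing_all.py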
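The merge step, where all boxes of one image end up on a single line, can be sketched as below. This is an illustrative re-implementation, not the project's hebing.py or hebing_all.py. It assumes each input line is the image name, a tab, then a JSON list of annotations (PaddleOCR's detection label layout); merge_label_lines is a hypothetical name.

```python
import json
from collections import OrderedDict


def merge_label_lines(lines):
    """Merge per-box label lines into one line per image.

    Each input line: "<image>\t<JSON list of annotation dicts>".
    Images keep the order in which they first appear.
    """
    merged = OrderedDict()
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue  # skip blank lines
        image, anno_json = line.split("\t", 1)
        merged.setdefault(image, []).extend(json.loads(anno_json))
    return [
        image + "\t" + json.dumps(annos, ensure_ascii=False)
        for image, annos in merged.items()
    ]
```

Reading det_all.txt with readlines() and writing the returned lines, joined with newlines, to det_new_hebing_all.txt would mirror the step above.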

Split the detection data det.txt and the recognition data reg.txt into training and validation sets, producing four files: train_det.txt, test_det.txt, train_reg.txt and test_reg.txt.

# 1. Split the partial dataset
# !python split_data.py
# 2. Split the full dataset: det_new_hebing_all.txt into train_det_hebing_all.txt and test_det_hebing_all.txt
!python split_data_all.py
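The split can be sketched as a seeded shuffle followed by a cut at a fixed ratio. The 0.9 ratio, the seed and the name split_lines are assumptions for illustration; split_data_all.py may use different values. Because the lines were merged per image beforehand, each image lands in exactly one of the two sets.

```python
import random


def split_lines(lines, train_ratio=0.9, seed=0):
    """Shuffle label lines reproducibly and split them into (train, test)."""
    lines = [line for line in lines if line.strip()]  # drop blank lines (new list)
    rng = random.Random(seed)  # local RNG, so the same seed gives the same split
    rng.shuffle(lines)
    cut = int(len(lines) * train_ratio)
    return lines[:cut], lines[cut:]
```

Writing train to train_det_hebing_all.txt and test to test_det_hebing_all.txt would then mirror the step above.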



# Edit file.py and uncomment the relevant block, then run it TWICE: once for train and once for test. This creates image folders matching the txt files above.

# !python file.py
!python file_all.py
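Instead of editing comments and running the script twice, the copy step can take the split as parameters. A minimal standard-library sketch; the function names and the assumption that each label line starts with the image file name followed by a tab are illustrative, not taken from file_all.py.

```python
import os
import shutil


def image_names(label_lines):
    """Yield the image file name at the start of each non-blank label line."""
    for line in label_lines:
        if line.strip():
            yield line.split("\t", 1)[0]


def copy_split_images(label_file, src_dir, dst_dir):
    """Copy every image listed in label_file from src_dir into dst_dir.

    Returns the number of images copied; names without a matching source file
    are skipped. Names are assumed to be plain file names (no subdirectories).
    """
    os.makedirs(dst_dir, exist_ok=True)
    copied = 0
    with open(label_file, encoding="utf-8") as f:
        for name in image_names(f):
            src = os.path.join(src_dir, name)
            if os.path.isfile(src):
                shutil.copy(src, os.path.join(dst_dir, os.path.basename(name)))
                copied += 1
    return copied


# Hypothetical usage from /home/aistudio, once per split:
# copy_split_images("train_det_hebing_all.txt", "report_ex/pngs", "report_ex/train_det_hebing_all")
# copy_split_images("test_det_hebing_all.txt", "report_ex/pngs", "report_ex/test_det_hebing_all")
```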

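This project trains DB detection models with MobileNetV3 and ResNet50 backbones. The -c option selects the training config file (e.g. configs/det/det_mv3_db.yml), and -o changes training parameters without editing the yml file.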
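To illustrate what -o does, here is a simplified sketch that merges dotted Key.sub=value overrides into a nested config dict. It is an assumption-labeled re-implementation, not PaddleOCR's own argument parser, which handles more value types; parse_value and apply_overrides are hypothetical names. Values arrive after the shell has stripped the quotes, so paths are unquoted strings.

```python
import ast


def parse_value(text):
    """Turn an override string into a Python value (simplified)."""
    text = text.strip()
    if text.lower() in ("true", "false"):
        return text.lower() == "true"
    if text.startswith("[") and text.endswith("]"):
        inner = text[1:-1].strip()
        # Naive split: nested lists or commas inside strings are not supported
        return [parse_value(item) for item in inner.split(",")] if inner else []
    try:
        return ast.literal_eval(text)  # numbers and quoted strings
    except (ValueError, SyntaxError):
        return text  # anything else, e.g. an unquoted path, stays a string


def apply_overrides(config, overrides):
    """Apply "A.B.c=value" overrides to a nested dict in place and return it."""
    for item in overrides:
        key, value = item.split("=", 1)
        *parents, leaf = key.strip().split(".")
        node = config
        for name in parents:
            node = node.setdefault(name, {})  # create missing sections
        node[leaf] = parse_value(value)
    return config


config = {"Global": {"eval_batch_step": [0, 2000]}, "Train": {"loader": {"batch_size_per_card": 16}}}
apply_overrides(config, [
    "Global.eval_batch_step=[0,300]",
    "Train.loader.batch_size_per_card=32",
    "Train.dataset.label_file_list=[./train_det_hebing_all.txt]",
])
print(config["Global"]["eval_batch_step"])  # [0, 300]
```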

# Official example: train a DB detection model with a MobileNetV3 backbone. For reference only; do not run!
# !python3 PaddleOCR/tools/train.py -c PaddleOCR/configs/det/det_mv3_db.yml -o \
# Global.eval_batch_step="[0,500]" \
# Global.load_static_weights=true \
# Global.pretrained_model='PaddleOCR/pretrain_models/MobileNetV3_large_x0_5_pretrained' \
# Train.dataset.data_dir='PaddleOCR/train_data/text_localization/' \
# Train.dataset.label_file_list=['PaddleOCR/train_data/text_localization/train_icdar2015_label.txt'] \
# Eval.dataset.data_dir='PaddleOCR/train_data/text_localization/' \
# Eval.dataset.label_file_list=['PaddleOCR/train_data/text_localization/test_icdar2015_label.txt']
!pip install lmdb
!pip install pyclipper
!pip install  Levenshtein
!pip install imgaug



# Full dataset, MobileNetV3 backbone (Global.checkpoints resumes from an earlier checkpoint)
%cd /home/aistudio/
!python3 PaddleOCR/tools/train.py -c PaddleOCR/configs/det/det_mv3_db_all.yml -o \
Global.eval_batch_step="[0,300]" \
Global.load_static_weights=true \
Global.checkpoints='./outputall/db_mv3/best_accuracy' \
Global.pretrained_model='PaddleOCR/pretrain_models/MobileNetV3_large_x0_5_pretrained' \
Train.loader.batch_size_per_card=32 \
Train.dataset.data_dir='./report_ex/train_det_hebing_all' \
Train.dataset.label_file_list=['./train_det_hebing_all.txt'] \
Eval.dataset.data_dir='./report_ex/test_det_hebing_all' \
Eval.dataset.label_file_list=['./test_det_hebing_all.txt']
# Full dataset, ResNet50_vd backbone
%cd /home/aistudio/
!python3 PaddleOCR/tools/train.py -c PaddleOCR/configs/det/det_mv3_db_all_resnet.yml -o \
Global.eval_batch_step="[0,500]" \
Global.load_static_weights=true \
Global.checkpoints='/home/aistudio/outputall/db_resnet/best_accuracy' \
Global.pretrained_model='PaddleOCR/pretrain_models/ResNet50_vd_ssld_pretrained' \
Train.loader.batch_size_per_card=16 \
Train.dataset.data_dir='./report_ex/train_det_hebing_all' \
Train.dataset.label_file_list=['./train_det_hebing_all.txt'] \
Eval.dataset.data_dir='./report_ex/test_det_hebing_all' \
Eval.dataset.label_file_list=['./test_det_hebing_all.txt']


# Partial dataset, MobileNetV3 backbone
%cd /home/aistudio/
!python3 PaddleOCR/tools/train.py -c PaddleOCR/configs/det/det_mv3_db.yml -o \
Global.eval_batch_step="[0,50]" \
Global.load_static_weights=true \
Global.pretrained_model='PaddleOCR/pretrain_models/MobileNetV3_large_x0_5_pretrained' \
Train.loader.batch_size_per_card=16 \
Train.dataset.data_dir='./report_ex/train_det_new1_hebing' \
Train.dataset.label_file_list=['./train_det_new1_hebing.txt'] \
Eval.dataset.data_dir='./report_ex/test_det_new1_hebing' \
Eval.dataset.label_file_list=['./test_det_new1_hebing.txt']
# 3. Template for training on the merged full dataset. batch_size_per_card may need adjusting. To run it, uncomment it and comment out the others.
# %cd /home/aistudio/
# !python3 PaddleOCR/tools/train.py -c PaddleOCR/configs/det/det_mv3_db.yml -o \
# Global.eval_batch_step="[0,10]" \
# Global.load_static_weights=true \
# Global.pretrained_model='PaddleOCR/pretrain_models/MobileNetV3_large_x0_5_pretrained' \
# Train.loader.batch_size_per_card=32 \
# Train.dataset.data_dir='./report_ex/train_det_hebing_all' \
# Train.dataset.label_file_list=['./train_det_hebing_all.txt'] \
# Eval.dataset.data_dir='./report_ex/test_det_hebing_all' \
# Eval.dataset.label_file_list=['./test_det_hebing_all.txt']

2.3. Test the detection results



%cd /home/aistudio/PaddleOCR
# !python3 tools/infer_det.py -c configs/det/det_mv3_db.yml -o Global.infer_img="../20220623110401-0.png" Global.pretrained_model="/home/aistudio/output1/db_mv3/best_accuracy"
!python3 tools/infer_det.py -c configs/det/det_mv3_db_all.yml -o Global.infer_img="../20220623110401-0.png" Global.pretrained_model="/home/aistudio/outputall/db_mv3/best_accuracy"
# %cd PaddleOCR/
# !python3 tools/infer_det.py -c configs/det/det_mv3_db.yml -o Global.infer_img="../20220623110401-0.png"  Global.checkpoints="./output/db_mv3/best_accuracy"


#!python3 tools/infer_det.py -c configs/det/det_mv3_db.yml -o Global.infer_img="./doc/imgs_en/"  Global.checkpoints="./output/db_mv3/best_accuracy"

3. Train the text recognition model

3.1. Data preparation


Recognition in this project uses train_reg.txt and test_reg.txt. Note: by default the image path and the image label must be separated by \t.

  • Training-set txt
" image file name              image label "

  20220623110401-0.png   姓名:张某某


    |- train_reg.txt
    |- report_ex/
        |- train_reg
            |- word_001.png
            |- word_002.jpg
            | ...
        |- test_reg
            |- word_001.png
            |- word_002.jpg
            | ...
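A minimal sketch of writing and reading recognition labels with the required tab separator. The helper names are hypothetical. A line without a tab (for example, one written with spaces) is rejected instead of being silently misread, and everything after the first tab is kept as the label text.

```python
def make_rec_label_line(image_path, text):
    """One recognition label line: image path, a tab, then the transcription."""
    if "\t" in image_path or "\n" in text:
        raise ValueError("image path must not contain a tab; text must be one line")
    return f"{image_path}\t{text}"


def parse_rec_label_line(line):
    """Return (image_path, text); raise if the tab separator is missing."""
    line = line.rstrip("\n")
    if "\t" not in line:
        raise ValueError(f"no tab separator in label line: {line!r}")
    image_path, text = line.split("\t", 1)
    return image_path, text
```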

3.2. Start training

This section uses the CRNN recognition model as an example, with the two mainstream PaddleOCR recognition backbones, MobileNetV3 and ResNet50_vd:

# Download the pretrained CRNN recognition model (MobileNetV3 backbone, BiLSTM, CTC)
%cd /home/aistudio/PaddleOCR/
!wget -P ./pretrain_models/ https://paddleocr.bj.bcebos.com/dygraph_v2.0/en/rec_mv3_none_bilstm_ctc_v2.0_train.tar
! cd pretrain_models/ && tar xf rec_mv3_none_bilstm_ctc_v2.0_train.tar
%cd /home/aistudio/
!python ./rec.py
%cd ./new_pngs
!ls -l | grep "^-" | wc -l   # 1,490,577 images in total
%cd /home/aistudio/
!python ./rec_split_data.py
# Copy the training and test images into folders for recognition training. As above, run it twice: once for train and once for test.
!python rec_file.py
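The recognition crops (e.g. 20220623110401-1001_0119.png, used in the test command below) appear to be named after the source page plus a box index. A sketch of that idea, assuming each crop is the axis-aligned bounding box of the four detection points and that the index is zero-padded to four digits; crop_box and crop_name are hypothetical and rec.py may work differently. The box follows the (left, upper, right, lower) convention of Pillow's Image.crop, so the actual cropping would be Image.open(page_path).crop(box).

```python
import math
import os


def crop_box(points, img_w, img_h, pad=0):
    """Axis-aligned (left, upper, right, lower) box around four [x, y] points,
    optionally padded and clamped to the image size."""
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    left = max(0, math.floor(min(xs)) - pad)
    upper = max(0, math.floor(min(ys)) - pad)
    right = min(img_w, math.ceil(max(xs)) + pad)
    lower = min(img_h, math.ceil(max(ys)) + pad)
    return left, upper, right, lower


def crop_name(page_path, index):
    """Name a crop after its page and box index (four-digit index is an assumption)."""
    stem = os.path.splitext(os.path.basename(page_path))[0]
    return f"{stem}_{index:04d}.png"
```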



%cd PaddleOCR/
!python3 ./tools/train.py -c ./configs/rec/rec_icdar15_train.yml -o \
Global.eval_batch_step="[0,100]" \
Global.save_epoch_step=500 \
Global.pretrained_model='./pretrain_models/rec_mv3_none_bilstm_ctc_v2.0_train/best_accuracy' \
Train.dataset.data_dir='../report_ex/train_rec' \
Train.dataset.label_file_list=['../train_rec.txt'] \
Eval.dataset.data_dir='../report_ex/test_rec' \
Eval.dataset.label_file_list=['../test_rec.txt']

3.3 Test the recognition results


# !python3 tools/infer_rec.py -c configs/rec/rec_icdar15_train.yml -o Global.checkpoints=./pretrain_models/rec_mv3_none_bilstm_ctc_v2.0_train/best_accuracy Global.infer_img=../20220623110401-0.png
!python3 tools/infer_rec.py -c configs/rec/rec_icdar15_train.yml -o Global.checkpoints=./output/rec/best_accuracy Global.infer_img=../report_ex/test_rec/20220623110401-1001_0119.png
!python3 tools/infer_rec.py -c configs/rec/rec_icdar15_train.yml -o Global.checkpoints=./output/rec_CRNN/best_accuracy Global.infer_img=./doc/imgs_words_en/


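# Copy the recognition checkpoint from dataset data164761 into outputall/rec
%cd /home/aistudio/
!mkdir -p ./outputall/rec
!cp ./data/data164761/best_accuracy.pdopt ./outputall/rec
!cp ./data/data164761/best_accuracy.pdparams ./outputall/rec
# Export the detection checkpoint as an inference model
%cd /home/aistudio/PaddleOCR/
!python tools/export_model.py -c configs/det/det_mv3_db_all.yml \
-o Global.pretrained_model="../outputall/db_mv3/best_accuracy" \
Global.save_inference_dir="./my_det_model/"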
[2022/08/17 22:11:33] root INFO: load pretrained model from ['../outputall/db_mv3/best_accuracy']
[2022/08/17 22:11:34] root INFO: inference model is saved to ./my_det_model/inference
%cd /home/aistudio/PaddleOCR/
!python tools/export_model.py -c configs/rec/ch_PP-OCRv3_rec_distillation.yml \
-o Global.pretrained_model="../outputall/rec/best_accuracy"
%cd /home/aistudio/PaddleOCR/
!python tools/export_model.py -c configs/rec/rec_icdar15_train.yml \
-o Global.pretrained_model="./output/rec/best_accuracy"
# Detection + recognition
# To test a cropped word image instead, use --image_dir="../report_ex/test_rec/20220623110401-1006_0123.png"
%cd /home/aistudio/PaddleOCR
# --det_model_dir: the detection model exported by the code above
# Other --rec_model_dir choices:
#   ./my_zj_rec_model/  the recognition model exported by the code above
#   ./my_rec_model/     the recognition model trained later
!python3 ./tools/infer/predict_system.py --image_dir="../report_ex/pngs/20220623110401-1001.png" \
--det_model_dir="./my_det_model/" \
--rec_model_dir="./inference/ch_ppocr_mobile_v2.0_rec_infer/"


%cd /home/aistudio/PaddleOCR
!python3 ./tools/infer/predict_system.py --image_dir="../report_ex/pngs/20220623110401-1001.png" \
--det_model_dir="./my_det_model/"
E0823 12:04:17.329943  1246 analysis_config.cc:80] Please compile with gpu to EnableGpu()
E0823 12:04:17.623020  1246 analysis_config.cc:80] Please compile with gpu to EnableGpu()
[2022/08/23 12:04:19] root INFO: dt_boxes num : 165, elapse : 1.5276000499725342
[2022/08/23 12:04:21] root INFO: rec_res num  : 165, elapse : 1.9995708465576172
[2022/08/23 12:04:21] root INFO: Predict time of ../report_ex/pngs/20220623110401-1001.png: 3.705s
[2022/08/23 12:04:21] root INFO: 姓名:张某某, 0.982
[2022/08/23 12:04:21] root INFO: 性别:男, 0.982
[2022/08/23 12:04:21] root INFO: 年龄:40, 0.997
[2022/08/23 12:04:21] root INFO: 浆膜腔漏出液及渗出液检查(胸腹水常规), 0.982
[2022/08/23 12:04:21] root INFO: 检查日期:2022-6-1, 0.997
[2022/08/23 12:04:21] root INFO: 检查医生:陆静, 0.990
[2022/08/23 12:04:21] root INFO: 透明度, 0.992
[2022/08/23 12:04:21] root INFO: 细菌, 0.884
[2022/08/23 12:04:21] root INFO: 清洗或, 0.995
[2022/08/23 12:04:21] root INFO: 清洗或微混, 0.963
[2022/08/23 12:04:21] root INFO: 无, 0.987
[2022/08/23 12:04:21] root INFO: 无, 0.998
[2022/08/23 12:04:21] root INFO: 比重, 0.996
[2022/08/23 12:04:21] root INFO: 0.3, 0.983
[2022/08/23 12:04:21] root INFO: 凝固, 0.963
[2022/08/23 12:04:21] root INFO: 不易凝, 0.998
[2022/08/23 12:04:21] root INFO: 不易凝固, 0.995
[2022/08/23 12:04:21] root INFO: <1.018, 0.988
[2022/08/23 12:04:21] root INFO: 糖定性, 0.998
[2022/08/23 12:04:21] root INFO: 0.1, 0.977
[2022/08/23 12:04:21] root INFO: 与血糖, 0.995
[2022/08/23 12:04:21] root INFO: 与血糖类似, 0.927
[2022/08/23 12:04:21] root INFO: 细胞计数, 0.996
[2022/08/23 12:04:21] root INFO: 109/L, 0.970
[2022/08/23 12:04:21] root INFO: <0.1*1, 0.953
[2022/08/23 12:04:21] root INFO: g/dl, 0.940
[2022/08/23 12:04:21] root INFO: 22.58, 0.889
[2022/08/23 12:04:21] root INFO: 细胞分类, 0.997
[2022/08/23 12:04:21] root INFO: 以淋巴, 0.961
[2022/08/23 12:04:21] root INFO: 蛋白定量, 0.994
[2022/08/23 12:04:21] root INFO: 2.0, 0.985
[2022/08/23 12:04:21] root INFO: 以淋巴细胞为, 0.848
[2022/08/23 12:04:21] root INFO: 阳性, 0.999
[2022/08/23 12:04:21] root INFO: 颜色, 0.999
[2022/08/23 12:04:21] root INFO: 淡黄色或黄绿, 0.997
[2022/08/23 12:04:21] root INFO: 蛋白定性, 0.992
[2022/08/23 12:04:21] root INFO: 淡黄色, 0.994
[2022/08/23 12:04:21] root INFO: 阴性, 0.996
[2022/08/23 12:04:21] root INFO: 检查小结, 0.995
[2022/08/23 12:04:21] root INFO: 本站网络现在东西精华东西, 0.999
[2022/08/23 12:04:21] root INFO: 检查日期:2022-6-1, 0.995
[2022/08/23 12:04:21] root INFO: 检查医生:盛浩, 0.989
[2022/08/23 12:04:21] root INFO: 前列腺液常规检查, 0.943
[2022/08/23 12:04:21] root INFO: 单位, 0.998
[2022/08/23 12:04:21] root INFO: 项目名称, 0.998
[2022/08/23 12:04:21] root INFO: 检查结果, 0.993
[2022/08/23 12:04:21] root INFO: 参考标识, 0.996
[2022/08/23 12:04:21] root INFO: 参考范围, 0.990
[2022/08/23 12:04:21] root INFO: 偶尔可见, 0.991
[2022/08/23 12:04:21] root INFO: 精子, 0.995
[2022/08/23 12:04:21] root INFO: 偶尔可见, 0.991
[2022/08/23 12:04:21] root INFO: 颜色, 0.997
[2022/08/23 12:04:21] root INFO: 淡乳白色稀薄液体, 0.990
[2022/08/23 12:04:21] root INFO: 淡乳白色稀薄液体, 0.993
[2022/08/23 12:04:21] root INFO: 少见,老年易见到, 0.997
[2022/08/23 12:04:21] root INFO: 少见,老年易见到, 0.995
[2022/08/23 12:04:21] root INFO: 滨粉样本, 0.828
[2022/08/23 12:04:21] root INFO: 白细胞, 0.971
[2022/08/23 12:04:21] root INFO: <10个/HP, 0.963
[2022/08/23 12:04:21] root INFO: <10个/HP, 0.972
[2022/08/23 12:04:21] root INFO: 上皮细胞, 0.990
[2022/08/23 12:04:21] root INFO: 少量, 0.989
[2022/08/23 12:04:21] root INFO: 少量, 0.986
[2022/08/23 12:04:21] root INFO: 偶尔可见, 0.991
[2022/08/23 12:04:21] root INFO: 颗粒细胞, 0.991
[2022/08/23 12:04:21] root INFO: 偶尔可见, 0.991
[2022/08/23 12:04:21] root INFO: 偶见, 0.965
[2022/08/23 12:04:21] root INFO: 红细胞, 0.992
[2022/08/23 12:04:21] root INFO: 偶见, 0.860
[2022/08/23 12:04:21] root INFO: 卵磷脂小体, 0.972
[2022/08/23 12:04:21] root INFO: 多量,均匀分布满视野(/HP), 0.956
[2022/08/23 12:04:21] root INFO: 多量,均匀分布满视野(/HP), 0.971
[2022/08/23 12:04:21] root INFO: 滴虫, 0.992
[2022/08/23 12:04:21] root INFO: 无, 0.997
[2022/08/23 12:04:21] root INFO: 无, 0.996
[2022/08/23 12:04:21] root INFO: 量, 0.998
[2022/08/23 12:04:21] root INFO: 数滴一1ml, 0.876
[2022/08/23 12:04:21] root INFO: 数滴-1ml, 0.895
[2022/08/23 12:04:21] root INFO: 检查小结, 0.996
[2022/08/23 12:04:21] root INFO: 工作城市这种分析上海, 0.998
[2022/08/23 12:04:21] root INFO: 检查日期:2022-6-1, 0.997
[2022/08/23 12:04:21] root INFO: 尿液常规, 0.978
[2022/08/23 12:04:21] root INFO: 检查医生:柯春梅, 0.952
[2022/08/23 12:04:21] root INFO: 检查结果, 0.997
[2022/08/23 12:04:21] root INFO: 单位, 0.998
[2022/08/23 12:04:21] root INFO: 项目名称, 0.998
[2022/08/23 12:04:21] root INFO: 参考标识, 0.989
[2022/08/23 12:04:21] root INFO: 参考范围, 0.995
[2022/08/23 12:04:21] root INFO: 阳性, 0.997
[2022/08/23 12:04:21] root INFO: 亚硝酸盐定性, 0.969
[2022/08/23 12:04:21] root INFO: 阴性, 0.967
[2022/08/23 12:04:21] root INFO: 5.5, 0.995
[2022/08/23 12:04:21] root INFO: 均值约6.5, 0.998
[2022/08/23 12:04:21] root INFO: 酸碱度测定, 0.985
[2022/08/23 12:04:21] root INFO: 1.1, 0.981
[2022/08/23 12:04:21] root INFO: 原比重, 0.863
[2022/08/23 12:04:21] root INFO: 1.002-1.030, 0.982
[2022/08/23 12:04:21] root INFO: 阴性, 0.991
[2022/08/23 12:04:21] root INFO: 尿潜血, 0.966
[2022/08/23 12:04:21] root INFO: 阴性, 0.992
[2022/08/23 12:04:21] root INFO: 白细胞, 0.969
[2022/08/23 12:04:21] root INFO: 阳性, 0.998
[2022/08/23 12:04:21] root INFO: 阴性, 0.996
[2022/08/23 12:04:21] root INFO: 阴性, 0.988
[2022/08/23 12:04:21] root INFO: 蛋白测定, 0.998
[2022/08/23 12:04:21] root INFO: 阴性, 0.994
[2022/08/23 12:04:21] root INFO: 阴性, 0.994
[2022/08/23 12:04:21] root INFO: 阴性, 0.992
[2022/08/23 12:04:21] root INFO: 酮体测定, 0.793
[2022/08/23 12:04:21] root INFO: 阴性, 0.991
[2022/08/23 12:04:21] root INFO: 尿胆原定性, 0.986
[2022/08/23 12:04:21] root INFO: 阴性, 0.992
[2022/08/23 12:04:21] root INFO: 阳性, 0.998
[2022/08/23 12:04:21] root INFO: 胆红素定性, 0.995
[2022/08/23 12:04:21] root INFO: 阴性, 0.993
[2022/08/23 12:04:21] root INFO: 阳性, 0.997
[2022/08/23 12:04:21] root INFO: 葡萄糖测定, 0.990
[2022/08/23 12:04:21] root INFO: 阴性, 0.967
[2022/08/23 12:04:21] root INFO: 应该一起规定发展没有喜欢, 0.998
[2022/08/23 12:04:21] root INFO: 检查小结, 0.993
[2022/08/23 12:04:21] root INFO: 生化室项目一览表, 0.981
[2022/08/23 12:04:21] root INFO: 检查日期:2022-6-1, 0.996
[2022/08/23 12:04:21] root INFO: 检查医生:王秀珍, 0.986
[2022/08/23 12:04:21] root INFO: 10., 0.815
[2022/08/23 12:04:21] root INFO: UL, 0.946
[2022/08/23 12:04:21] root INFO: 79., 0.903
[2022/08/23 12:04:21] root INFO: 快速谷草转氨, 0.959
[2022/08/23 12:04:21] root INFO: 0.00~4, 0.972
[2022/08/23 12:04:21] root INFO: a-经丁酸脱, 0.924
[2022/08/23 12:04:21] root INFO: U/L, 0.732
[2022/08/23 12:04:21] root INFO: 95.00~, 0.919
[2022/08/23 12:04:21] root INFO: 会, 0.886
[2022/08/23 12:04:21] root INFO: 111, 0.997
[2022/08/23 12:04:21] root INFO: mmoVL, 0.867
[2022/08/23 12:04:21] root INFO: 96.00~, 0.949
[2022/08/23 12:04:21] root INFO: 血清蛋白电泳, 0.993
[2022/08/23 12:04:21] root INFO: 0.1, 0.945
[2022/08/23 12:04:21] root INFO: Y:0.10, 0.877
[2022/08/23 12:04:21] root INFO: 69., 0.895
[2022/08/23 12:04:21] root INFO: 4.8, 0.962
[2022/08/23 12:04:21] root INFO: 总铁结合力, 0.998
[2022/08/23 12:04:21] root INFO: 45.00~, 0.917
[2022/08/23 12:04:21] root INFO: 总胆固醇, 0.996
[2022/08/23 12:04:21] root INFO: mmolL, 0.877
[2022/08/23 12:04:21] root INFO: umovL, 0.791
[2022/08/23 12:04:21] root INFO: 3.10~5, 0.985
[2022/08/23 12:04:21] root INFO: 119, 0.997
[2022/08/23 12:04:21] root INFO: 钠, 0.894
[2022/08/23 12:04:21] root INFO: 136.00, 0.981
[2022/08/23 12:04:21] root INFO: 肌钙蛋白!, 0.888
[2022/08/23 12:04:21] root INFO: 阳性, 0.988
[2022/08/23 12:04:21] root INFO: mmoyL, 0.919
[2022/08/23 12:04:21] root INFO: 阴性, 0.994
[2022/08/23 12:04:21] root INFO: UIL, 0.760
[2022/08/23 12:04:21] root INFO: 载脂蛋白AI, 0.895
[2022/08/23 12:04:21] root INFO: 0.90~2, 0.974
[2022/08/23 12:04:21] root INFO: 快速尿膜淀粉, 0.889
[2022/08/23 12:04:21] root INFO: 308, 0.998
[2022/08/23 12:04:21] root INFO: 1.3, 0.978
[2022/08/23 12:04:21] root INFO: 0.00~1, 0.906
[2022/08/23 12:04:21] root INFO: 检查小结, 0.992
[2022/08/23 12:04:21] root INFO: 些业务觉得以上, 0.924
[2022/08/23 12:04:21] root INFO: XX医院体检中心体检报告, 0.979
[2022/08/23 12:04:23] root INFO: The visualized image saved in ./inference_results/20220623110401-1001.png



Both a Python whl package and the command line are supported, and it is simple to use.


1 Install Layout-Parser

!pip install -U https://paddleocr.bj.bcebos.com/whl/layoutparser-0.0.0-py3-none-any.whl
!pip install "paddleocr>=2.2" --no-deps -r requirements.txt
!pip install PyMuPDF


import datetime
import os
import fitz  # fitz is provided by `pip install PyMuPDF`
import cv2
import shutil
from paddleocr import PPStructure, draw_structure_result, save_structure_res
import layoutparser as lp

image = cv2.imread('20220623110401-0.png')
image = image[..., ::-1]  # OpenCV loads BGR; convert to RGB

# Load the layout model (PubLayNet, PP-YOLOv2)
model = lp.PaddleDetectionLayoutModel(config_path="lp://PubLayNet/ppyolov2_r50vd_dcn_365e_publaynet/config",
                                label_map={0: "Text", 1: "Title", 2: "List", 3:"Table", 4:"Figure"})
# Detect layout regions
layout = model.detect(image)

# Visualize the results (named vis_img so it does not overwrite the show_img helper)
vis_img = lp.draw_box(image, layout, box_width=3, show_element_type=True)
vis_img

!python -m pip install paddlepaddle==2.1.2
table_engine = PPStructure(show_log=True)
save_folder = './result'
img_dir = './imgs'

# img_dir contains one subdirectory per group of images
for fi in os.listdir(img_dir):
    fi_d = os.path.join(img_dir, fi)
    if not os.path.isdir(fi_d):
        continue  # skip stray files
    for img_name in os.listdir(fi_d):
        img_path = os.path.join(fi_d, img_name)
        img = cv2.imread(img_path)
        result = table_engine(img)
        # Results go to save_folder/<subdirectory>/<image name>/
        save_structure_res(result, os.path.join(save_folder, fi), os.path.splitext(img_name)[0])
!tree result
└── 1
    └── 20220623110401-0
        ├── [136, 1142, 3033, 2449]_0.xlsx
        ├── [138, 2167, 3040, 3259]_0.xlsx
        ├── [140, 392, 3032, 1056]_0.xlsx
        └── res_0.txt

2 directories, 4 files



!cat result/1/20220623110401-0/res_0.txt



# Install Paddle Serving (CPU)
!pip install paddle-serving-client==0.4.0
!pip install paddle-serving-server==0.4.0 #CPU
!pip install paddle-serving-app==0.2.0
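import paddle
import paddle_serving_client.io as serving_io

# Convert the exported detection inference model into Paddle Serving server/client configs.
# The first argument is the model directory; the file names match the tools/export_model.py output.
serving_io.inference_model_to_serving("./my_det_model", serving_server="serving_server", serving_client="serving_client", model_filename="inference.pdmodel", params_filename="inference.pdiparams")
# Start the server; it runs in the foreground, so this cell does not return until the server is stopped
!python -m paddle_serving_server.serve --model serving_server --thread 10 --port 9292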

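from paddle_serving_client import Client
import numpy as np
from PIL import Image

# Connect the client to the server started above
client = Client()
client.load_client_config("serving_client/serving_client_conf.prototxt")
client.connect(["127.0.0.1:9292"])

# NOTE: this preprocessing (28x28 grayscale, range [-1, 1]) and the feed/fetch names below come
# from an MNIST-style example and do not match the DB detection model; adapt them to the
# feed/fetch variables listed in serving_client/serving_client_conf.prototxt.
def load_image(file):
    im = Image.open(file).convert('L')                        # convert to grayscale ("L" mode, pixel values 0~255)
    im = im.resize((28, 28), Image.LANCZOS)                   # high-quality resize to 28x28
    im = np.array(im).reshape(1, 28, 28).astype(np.float32)   # reshape into a numpy array matching the feed format
    im = im / 255.0 * 2.0 - 1.0                               # normalize to [-1, 1]
    return im

img = load_image('imgs/1/20220623110401-0.png')
fetch_map = client.predict(feed={"image": img}, fetch=["save_infer_model/scale_0.tmp_0"])
fetch_map = np.argsort(fetch_map['save_infer_model/scale_0.tmp_0'][0])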



# 安装环境
!pip install onnx==1.10.1 onnxruntime-gpu==1.10 paddle2onnx
!paddle2onnx --model_dir ./stac --model_filename model.pdmodel --params_filename model.pdiparams --opset_version 11 --save_file result.onnx
# Model conversion: convert the model with Paddle2ONNX; replace the directories with your own
%cd ~
!paddle2onnx \
    --model_dir my_rec_model \
    --model_filename __model__ \
    --params_filename __params__ \
    --save_file inference_model/inference.onnx \
    --opset_version 12 \
    --enable_onnx_checker True



Author's homepage: https://blog.csdn.net/qq_36816848. Questions about the project or about learning are welcome!


