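Training PaddleOCR on Your Own Dataset
1. Environment setup
The project code used here is PaddleOCR-2.7, which you can download from the official site. The environment is Linux with CUDA 11.6.
(1) Install paddle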
python -m pip install paddlepaddle-gpu==2.6.2.post116 -i https://www.paddlepaddle.org.cn/packages/stable/cu116/
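(2) Install PaddleOCR and its dependencies (for example from the requirements.txt in the repository root), then run a quick end-to-end prediction: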
python tools/infer/predict_system.py --image_dir ./ppocr_img/imgs/2.png --use_angle_cls True --use_gpu True --det_model_dir models/ch_PP-OCRv3_det_infer --rec_model_dir models/ch_PP-OCRv3_rec_infer --cls_model_dir models/ch_ppocr_mobile_v2.0_cls_infer
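Run the command above to check that the installation works. It expects the PP-OCRv3 inference models to be present under models/.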
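2. Using the PPOCRLabel tool
This tool is used to annotate OCR data. I ran it on Windows, so I set up a separate environment there; it needs paddlepaddle, numpy, opencv and related packages. Watch the numpy version: I hit an error requiring numpy below 2.0, so I installed numpy==1.24.4.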
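The numpy constraint mentioned above can be checked before launching the tool. The following is a minimal sketch, assuming the only requirement is a major version below 2; the helper name numpy_is_supported is my own and not part of PaddleOCR.

```python
from importlib import metadata


def numpy_is_supported(version: str) -> bool:
    """Return True if the version string has a major version below 2."""
    major = int(version.split(".")[0])
    return major < 2


if __name__ == "__main__":
    try:
        installed = metadata.version("numpy")
    except metadata.PackageNotFoundError:
        print("numpy is not installed")
    else:
        status = "OK" if numpy_is_supported(installed) else "too new, install numpy<2.0"
        print(f"numpy {installed}: {status}")
```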
Usage:
Enter the PPOCRLabel directory: cd D:\workspace\code\PaddleOCR2.7\PPOCRLabel
Run: python PPOCRLabel.py

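When annotation is finished, click File->Export Label to generate Label.txt, and click File->Export Recognition Result to generate crop_img and rec_gt.txt. Label.txt is used for text detection training; crop_img and rec_gt.txt are used for text recognition training.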
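To make the two exported formats concrete, here is a small sketch that parses one line of each. It assumes the usual PPOCRLabel layout: a tab-separated image path followed by a JSON list of boxes in Label.txt, and a tab-separated crop path followed by the text in rec_gt.txt. The sample lines are made up for illustration.

```python
import json


def parse_det_line(line: str):
    """Split one Label.txt line into (image_path, list of box dicts)."""
    image_path, boxes_json = line.rstrip("\n").split("\t", 1)
    return image_path, json.loads(boxes_json)


def parse_rec_line(line: str):
    """Split one rec_gt.txt line into (crop_path, text)."""
    crop_path, text = line.rstrip("\n").split("\t", 1)
    return crop_path, text


# Made-up sample lines, not taken from a real dataset.
det_sample = (
    'barcode/0001.jpg\t[{"transcription": "6901234567890", '
    '"points": [[10, 20], [200, 20], [200, 60], [10, 60]], "difficult": false}]'
)
rec_sample = "crop_img/0001_crop_0.jpg\t6901234567890"

if __name__ == "__main__":
    path, boxes = parse_det_line(det_sample)
    print(path, len(boxes), boxes[0]["points"])
    print(parse_rec_line(rec_sample))
```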



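At this point, data preparation is complete.
3. Text detection training
Before training, download the pretrained weights needed for text detection and text recognition from the official site.

3.1 Modify the configuration file
Go to PaddleOCR-2.7\configs\det\ch_PP-OCRv3 and modify the ch_PP-OCRv3_det_student.yml configuration file. The commands in 3.2 to 3.5 assume the modified copy is saved as my_ch_PP-OCRv3_det_student.yml in the same directory.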
Global:
  debug: false
  use_gpu: true
  epoch_num: 300  # number of training epochs
  log_smooth_window: 20
  print_batch_step: 10
  save_model_dir: ./output/ch_PP-OCR_V3_det/  # where models are saved
  save_epoch_step: 100
  eval_batch_step:
  - 0
  - 400
  cal_metric_during_train: true  # compute metrics during training, used for saving the best model; default is false
  pretrained_model: /home/workspace/code/PaddleOCR2.7/models/ch_PP-OCRv3_det_distill_train/student  # path to pretrained weights
  checkpoints: null  # ./output/ch_PP-OCR_V3_det/latest  # set this to resume training after an interruption
  save_inference_dir: null
  use_visualdl: false
  infer_img: ppocr_img/imgs/2.png  # doc/imgs_en/img_10.jpg  # image used for inference
  save_res_path: ./checkpoints/det_db/predicts_db.txt
  distributed: true
Architecture:
  model_type: det
  algorithm: DB
  Transform:
  Backbone:
    name: MobileNetV3
    scale: 0.5
    model_name: large
    disable_se: True
  Neck:
    name: RSEFPN
    out_channels: 96
    shortcut: True
  Head:
    name: DBHead
    k: 50
Loss:
  name: DBLoss
  balance_loss: true
  main_loss_type: DiceLoss
  alpha: 5
  beta: 10
  ohem_ratio: 3
Optimizer:
  name: Adam
  beta1: 0.9
  beta2: 0.999
  lr:
    name: Cosine
    learning_rate: 0.001
    warmup_epoch: 2
  regularizer:
    name: L2
    factor: 5.0e-05
PostProcess:
  name: DBPostProcess
  thresh: 0.3
  box_thresh: 0.6
  max_candidates: 1000
  unclip_ratio: 1.5
Metric:
  name: DetMetric
  main_indicator: hmean
Train:
  dataset:
    name: SimpleDataSet
    data_dir: /home/workspace/code/PaddleOCR2.7/myData/barcode/  # training set path
    label_file_list:
    - /home/workspace/code/PaddleOCR2.7/myData/barcode/Label.txt  # training set labels
    ratio_list: [1.0]
    transforms:
    - DecodeImage:
        img_mode: BGR
        channel_first: false
    - DetLabelEncode: null
    - IaaAugment:
        augmenter_args:
        - type: Fliplr
          args:
            p: 0.5
        - type: Affine
          args:
            rotate:
            - -10
            - 10
        - type: Resize
          args:
            size:
            - 0.5
            - 3
    - EastRandomCropData:
        size:
        - 960
        - 960
        max_tries: 50
        keep_ratio: true
    - MakeBorderMap:
        shrink_ratio: 0.4
        thresh_min: 0.3
        thresh_max: 0.7
    - MakeShrinkMap:
        shrink_ratio: 0.4
        min_text_size: 8
    - NormalizeImage:
        scale: 1./255.
        mean:
        - 0.485
        - 0.456
        - 0.406
        std:
        - 0.229
        - 0.224
        - 0.225
        order: hwc
    - ToCHWImage: null
    - KeepKeys:
        keep_keys:
        - image
        - threshold_map
        - threshold_mask
        - shrink_map
        - shrink_mask
  loader:
    shuffle: true
    drop_last: false
    batch_size_per_card: 8
    num_workers: 4
Eval:
  dataset:
    name: SimpleDataSet
    data_dir: /home/riins/workspace/code/PaddleOCR2.7/myData/barcode/  # validation set path
    label_file_list:
    - /home/riins/workspace/code/PaddleOCR2.7/myData/barcode/Label.txt  # validation set labels
    transforms:
    - DecodeImage:
        img_mode: BGR
        channel_first: false
    - DetLabelEncode: null
    - DetResizeForTest: null
    - NormalizeImage:
        scale: 1./255.
        mean:
        - 0.485
        - 0.456
        - 0.406
        std:
        - 0.229
        - 0.224
        - 0.225
        order: hwc
    - ToCHWImage: null
    - KeepKeys:
        keep_keys:
        - image
        - shape
        - polys
        - ignore_tags
  loader:
    shuffle: false
    drop_last: false
    batch_size_per_card: 1
    num_workers: 2
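The config above appears to use a Label.txt from the same barcode dataset for both Train and Eval, which means the evaluation score is measured on data the model has already seen. If a held-out validation set is wanted, the label lines can be split first. The sketch below is one simple way to do that; the function name, the 10% default ratio and the output file names train_label.txt and val_label.txt are my own assumptions, not something PaddleOCR requires.

```python
import random


def split_label_lines(lines, val_ratio=0.1, seed=0):
    """Shuffle label lines reproducibly and return (train_lines, val_lines)."""
    lines = [ln for ln in lines if ln.strip()]  # drop blank lines
    shuffled = lines[:]
    random.Random(seed).shuffle(shuffled)
    n_val = int(len(shuffled) * val_ratio)
    if len(shuffled) > 1:
        n_val = max(1, n_val)  # keep at least one validation sample
    return shuffled[n_val:], shuffled[:n_val]


if __name__ == "__main__":
    # Hypothetical label lines; in practice read them from Label.txt.
    demo = [f"barcode/{i:04d}.jpg\t[]" for i in range(20)]
    train, val = split_label_lines(demo, val_ratio=0.2)
    print(len(train), "training lines,", len(val), "validation lines")
```

The two lists can then be written to separate files (for example train_label.txt and val_label.txt) and referenced from the Train and Eval label_file_list entries.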
3.2 Run training
python tools/train.py -c configs/det/ch_PP-OCRv3/my_ch_PP-OCRv3_det_student.yml
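The Optimizer section of the detection config uses a Cosine schedule with learning_rate: 0.001 and warmup_epoch: 2. To get a feel for what that means during the run, here is a generic linear-warmup plus cosine-decay schedule. It is an illustration only, not a copy of PaddleOCR's implementation, which may differ in details such as how the decay period is counted; steps_per_epoch is a hypothetical value that depends on dataset size and batch size.

```python
import math


def lr_at_step(step, total_steps, warmup_steps, base_lr=0.001):
    """Linear warmup from 0 to base_lr, then cosine decay towards 0."""
    if warmup_steps > 0 and step < warmup_steps:
        return base_lr * step / warmup_steps
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return 0.5 * base_lr * (1.0 + math.cos(math.pi * progress))


if __name__ == "__main__":
    steps_per_epoch = 50  # hypothetical; depends on the dataset and batch size
    total = 300 * steps_per_epoch  # epoch_num: 300
    warmup = 2 * steps_per_epoch  # warmup_epoch: 2
    for epoch in (0, 1, 2, 150, 299):
        print(epoch, lr_at_step(epoch * steps_per_epoch, total, warmup))
```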
3.3 Model evaluation
python tools/eval.py --config configs/det/ch_PP-OCRv3/my_ch_PP-OCRv3_det_student.yml -o Global.checkpoints=./output/ch_PP-OCR_V3_det/latest
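The detection config sets main_indicator: hmean, so the evaluation is summarized by the harmonic mean of precision and recall. A small worked example of that formula follows; the counts are hypothetical, and how a detected box is judged to match a ground-truth box is decided by the evaluator and not shown here.

```python
def det_hmean(num_matched, num_detected, num_ground_truth):
    """Return (precision, recall, hmean) from matched/detected/ground-truth counts."""
    precision = num_matched / num_detected if num_detected else 0.0
    recall = num_matched / num_ground_truth if num_ground_truth else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    hmean = 2 * precision * recall / (precision + recall)
    return precision, recall, hmean


if __name__ == "__main__":
    # Hypothetical counts: 8 correct boxes out of 10 detections, 16 labelled boxes.
    print(det_hmean(8, 10, 16))
```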

3.4 Inference test
python tools/infer_det.py --config configs/det/ch_PP-OCRv3/my_ch_PP-OCRv3_det_student.yml

[Figure: inference result with the pretrained weights]

[Figure: inference result with the trained model]
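How tightly the predicted boxes wrap the text is influenced by unclip_ratio: 1.5 in the PostProcess section. As I understand the DB post-processing, each shrunken text region is expanded outward by a distance of area * unclip_ratio / perimeter. The sketch below only computes that distance for a made-up rectangle in plain Python; the actual polygon offsetting is done by the library and is not reproduced here.

```python
import math


def polygon_area(points):
    """Area of a simple polygon via the shoelace formula."""
    total = 0.0
    n = len(points)
    for i in range(n):
        x1, y1 = points[i]
        x2, y2 = points[(i + 1) % n]
        total += x1 * y2 - x2 * y1
    return abs(total) / 2.0


def polygon_perimeter(points):
    """Sum of the edge lengths of a closed polygon."""
    n = len(points)
    return sum(math.dist(points[i], points[(i + 1) % n]) for i in range(n))


def unclip_distance(points, unclip_ratio=1.5):
    """Expansion distance used when growing a shrunken text region."""
    return polygon_area(points) * unclip_ratio / polygon_perimeter(points)


if __name__ == "__main__":
    box = [(0, 0), (100, 0), (100, 20), (0, 20)]  # made-up 100x20 region
    print(unclip_distance(box))
```

A larger unclip_ratio gives looser boxes, which can help when characters at the edges are being cut off.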
3.5 Model export
python tools/export_model.py -c configs/det/ch_PP-OCRv3/my_ch_PP-OCRv3_det_student.yml -o Global.checkpoints=./output/ch_PP-OCR_V3_det/latest Global.save_inference_dir=./inference/ch_PP-OCRv3_det_barcode
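The -o flag overrides individual config keys from the command line. As far as I can tell, the tools declare it with nargs="+", in which case argparse keeps only the last -o group when the flag is repeated, so several overrides belong after a single -o. The sketch below reproduces that behaviour with a stand-in parser (not PaddleOCR's own class) and shows a simplified dotted-key merge; values are kept as strings here, whereas the real tools parse them as YAML.

```python
import argparse


def build_parser():
    """Stand-in parser that mimics a '-o KEY=VALUE [KEY=VALUE ...]' option."""
    parser = argparse.ArgumentParser()
    parser.add_argument("-o", "--opt", nargs="+", default=[])
    return parser


def apply_overrides(config, overrides):
    """Write each 'A.B=value' override into the nested config dict."""
    for item in overrides:
        dotted_key, value = item.split("=", 1)
        keys = dotted_key.split(".")
        node = config
        for key in keys[:-1]:
            node = node.setdefault(key, {})
        node[keys[-1]] = value
    return config


if __name__ == "__main__":
    parser = build_parser()
    repeated = parser.parse_args(["-o", "Global.checkpoints=a", "-o", "Global.save_inference_dir=b"])
    single = parser.parse_args(["-o", "Global.checkpoints=a", "Global.save_inference_dir=b"])
    print("repeated -o:", repeated.opt)
    print("single -o:  ", single.opt)
    print(apply_overrides({"Global": {"checkpoints": None}}, single.opt))
```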
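4. Text recognition training
4.1 Modify the configuration file
Go to PaddleOCR-2.7\configs\rec\PP-OCRv4 and modify the ch_PP-OCRv4_rec.yml configuration file, which is the file used by the commands in 4.2 to 4.5.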
Global:
  debug: false
  use_gpu: true
  epoch_num: 300
  log_smooth_window: 20
  print_batch_step: 10
  save_model_dir: ./output/rec_ppocr_v4
  save_epoch_step: 100
  eval_batch_step: [0, 2000]
  cal_metric_during_train: true
  pretrained_model:  # /home/riins/workspace/code/PaddleOCR2.7/models/ch_PP-OCRv4_rec_train/student  # path to pretrained weights
  checkpoints: ./output/rec_ppocr_v4/latest  # resume from this checkpoint (leave empty for a fresh run)
  save_inference_dir:
  use_visualdl: false
  infer_img: ppocr_img/imgs/ocr1_crop_0.jpg
  character_dict_path: ppocr/utils/ppocr_keys_v1.txt
  max_text_length: &max_text_length 25
  infer_mode: false
  use_space_char: true
  distributed: true
  save_res_path: ./output/rec/predicts_ppocrv3.txt
Optimizer:
  name: Adam
  beta1: 0.9
  beta2: 0.999
  lr:
    name: Cosine
    learning_rate: 0.001
    warmup_epoch: 5
  regularizer:
    name: L2
    factor: 3.0e-05
Architecture:
  model_type: rec
  algorithm: SVTR_LCNet
  Transform:
  Backbone:
    name: PPLCNetV3
    scale: 0.95
  Head:
    name: MultiHead
    head_list:
      - CTCHead:
          Neck:
            name: svtr
            dims: 120
            depth: 2
            hidden_dims: 120
            kernel_size: [1, 3]
            use_guide: True
          Head:
            fc_decay: 0.00001
      - NRTRHead:
          nrtr_dim: 384
          max_text_length: *max_text_length
Loss:
  name: MultiLoss
  loss_config_list:
    - CTCLoss:
    - NRTRLoss:
PostProcess:
  name: CTCLabelDecode
Metric:
  name: RecMetric
  main_indicator: acc
Train:
  dataset:
    name: MultiScaleDataSet
    ds_width: false
    data_dir: /home/riins/workspace/code/PaddleOCR2.7/myData/barcode/
    ext_op_transform_idx: 1
    label_file_list:
    - /home/riins/workspace/code/PaddleOCR2.7/myData/barcode/rec_gt.txt  # label file for the training images
    transforms:
    - DecodeImage:
        img_mode: BGR
        channel_first: false
    - RecConAug:
        prob: 0.5
        ext_data_num: 2
        image_shape: [48, 320, 3]
        max_text_length: *max_text_length
    - RecAug:
    - MultiLabelEncode:
        gtc_encode: NRTRLabelEncode
    - KeepKeys:
        keep_keys:
        - image
        - label_ctc
        - label_gtc
        - length
        - valid_ratio
  sampler:
    name: MultiScaleSampler
    scales: [[320, 32], [320, 48], [320, 64]]
    first_bs: &bs 192
    fix_bs: false
    divided_factor: [8, 16]  # w, h
    is_training: True
  loader:
    shuffle: true
    batch_size_per_card: *bs
    drop_last: true
    num_workers: 8
Eval:
  dataset:
    name: SimpleDataSet
    data_dir: /home/riins/workspace/code/PaddleOCR2.7/myData/barcode/
    label_file_list:
    - /home/riins/workspace/code/PaddleOCR2.7/myData/barcode/rec_gt.txt  # label file for the validation images
    transforms:
    - DecodeImage:
        img_mode: BGR
        channel_first: false
    - MultiLabelEncode:
        gtc_encode: NRTRLabelEncode
    - RecResizeImg:
        image_shape: [3, 48, 320]
    - KeepKeys:
        keep_keys:
        - image
        - label_ctc
        - label_gtc
        - length
        - valid_ratio
  loader:
    shuffle: false
    drop_last: false
    batch_size_per_card: 1
    num_workers: 4
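Two settings in this config constrain the recognition labels: character_dict_path (every character should appear in the dictionary) and max_text_length: 25. Labels that violate either may be skipped or altered during encoding without an obvious error, and the encoders may enforce a slightly stricter length limit than the configured value, so it can be worth checking rec_gt.txt beforehand. The following is a rough sketch of such a check; the function name and the digit-only dictionary in the demo are my own assumptions.

```python
def check_rec_label(text, charset, max_text_length=25, use_space_char=True):
    """Return a list of problems found in one recognition label (empty if none)."""
    problems = []
    if len(text) > max_text_length:
        problems.append(f"length {len(text)} exceeds {max_text_length}")
    allowed = set(charset)
    if use_space_char:
        allowed.add(" ")
    unknown = sorted({ch for ch in text if ch not in allowed})
    if unknown:
        problems.append("characters not in dictionary: " + "".join(unknown))
    return problems


if __name__ == "__main__":
    # Hypothetical digit-only dictionary; in practice load one character per
    # line from the file given by character_dict_path.
    digits = set("0123456789")
    for label in ("6901234567890", "69A1", "1" * 30):
        print(repr(label), check_rec_label(label, digits))
```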
4.2 Run training
python tools/train.py -c configs/rec/PP-OCRv4/ch_PP-OCRv4_rec.yml
4.3 Model evaluation
python tools/eval.py -c configs/rec/PP-OCRv4/ch_PP-OCRv4_rec.yml -o Global.checkpoints=./output/rec_ppocr_v4/latest
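The recognition config sets main_indicator: acc, which counts a sample as correct only when the whole predicted string matches the label. The sketch below shows that exact-match accuracy on made-up predictions. Ignoring spaces before comparing is an assumption on my part about the default behaviour, and the edit-distance score that the evaluator also reports is not shown.

```python
def rec_accuracy(predictions, targets, ignore_space=True):
    """Fraction of samples whose predicted text equals the label exactly."""
    if len(predictions) != len(targets):
        raise ValueError("predictions and targets must have the same length")
    if not targets:
        return 0.0
    correct = 0
    for pred, target in zip(predictions, targets):
        if ignore_space:
            pred = pred.replace(" ", "")
            target = target.replace(" ", "")
        if pred == target:
            correct += 1
    return correct / len(targets)


if __name__ == "__main__":
    # Made-up predictions and labels for illustration.
    preds = ["6901234567890", "69012 34567", "1234"]
    labels = ["6901234567890", "6901234567", "1235"]
    print(rec_accuracy(preds, labels))
```

Because one wrong character makes the whole sample count as wrong, acc tends to be a strict measure for long strings such as barcodes.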
4.4 Inference test
python tools/infer_rec.py -c configs/rec/PP-OCRv4/ch_PP-OCRv4_rec.yml
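The config uses CTCLabelDecode as its post-processor. The idea behind greedy CTC decoding is to take the most likely class at each time step, collapse consecutive repeats and drop the blank symbol. Here is a minimal sketch of that idea, assuming index 0 is the blank and the dictionary characters follow from index 1; the digit dictionary and the index sequence are made up, and the library's own decoder also handles scores and batching, which are omitted.

```python
def ctc_greedy_decode(indices, charset, blank=0):
    """Collapse repeated indices and remove blanks, then map indices to characters."""
    table = ["<blank>"] + list(charset)  # index 0 is reserved for the blank
    chars = []
    previous = None
    for idx in indices:
        if idx != blank and idx != previous:
            chars.append(table[idx])
        previous = idx
    return "".join(chars)


if __name__ == "__main__":
    digits = "0123456789"  # hypothetical dictionary
    # Per-time-step argmax indices: "6" repeated, blank, "6" again, blanks, "9" repeated.
    print(ctc_greedy_decode([7, 7, 0, 7, 0, 0, 10, 10], digits))
```

The blank between the two runs of index 7 is what lets the decoder output a genuine double character instead of merging it into one.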
4.5 Model export
python tools/export_model.py -c configs/rec/PP-OCRv4/ch_PP-OCRv4_rec.yml -o Global.checkpoints=./output/rec_ppocr_v4/latest Global.save_inference_dir=./inference/ch_PP-OCRv4_rec_barcode