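1. After downloading the PaddleOCR dataset, I found that the training and test label txt files look like this: each line is an image name followed by a series of space-separated numbers.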

2. The numbers shown above are line numbers (counted from 1) of the Chinese characters in the official dictionary.
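
To make the lookup concrete, here is a minimal sketch of decoding a single label line. The helper name `decode_label` and the toy dictionary in the note below are made up for illustration; they are not part of PaddleOCR or of the real key_dict.txt. It assumes fields are separated by whitespace and silently skips indices that fall outside the dictionary.

```cpp
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: turns "name i1 i2 ..." into "name<TAB>text",
// where each i is a 1-based line number into `dict`.
std::string decode_label(const std::string& line,
                         const std::vector<std::string>& dict) {
    std::istringstream fields(line);
    std::string name;
    if (!(fields >> name)) {
        return "";  // blank line: nothing to decode
    }
    std::string text;
    int index;
    while (fields >> index) {
        // Line 1 of the dictionary is dict[0]; out-of-range indices are skipped.
        if (index >= 1 && index <= static_cast<int>(dict.size())) {
            text += dict[index - 1];
        }
    }
    return name + "\t" + text;
}
```

With a toy dictionary whose three lines are a, b and c, the label line `img_001.jpg 3 1 2` would decode to `img_001.jpg`, a tab, then `cab`.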

3. Convert the labels into labels that can be used for training

4. Conversion code

#include <cstdlib>
#include <fstream>
#include <iostream>
#include <sstream>
#include <string>
#include <vector>
using namespace std;

int main() {
	// Load the dictionary: entry k (counted from 1) is the character on line k of key_dict.txt.
	vector<string> dict;
	ifstream dictfile("C:\\Users\\JSM-SQ\\Documents\\DataSet\\key_dict.txt");
	if (!dictfile) {
		cerr << "cannot open key_dict.txt" << endl;
		return 1;
	}
	string entry;
	while (dictfile >> entry) {  // read until end of file instead of a hard-coded count
		dict.push_back(entry);
	}
	dictfile.close();

	ifstream infile("C:\\Users\\JSM-SQ\\Documents\\DataSet\\data_train.txt");
	// Default mode overwrites the output, so re-running does not append duplicate labels.
	ofstream outfile("C:\\Users\\JSM-SQ\\Documents\\DataSet\\new_data_train.txt");
	if (!infile || !outfile) {
		cerr << "cannot open the input or output label file" << endl;
		return 1;
	}

	string line;
	while (getline(infile, line)) {  // stops cleanly at end of file
		istringstream fields(line);
		string name;
		if (!(fields >> name)) {
			continue;  // skip blank lines
		}
		outfile << name << "\t";  // image name, then a tab

		int dict_num;
		while (fields >> dict_num) {
			// Dictionary line numbers start at 1; ignore anything out of range.
			if (dict_num >= 1 && dict_num <= (int)dict.size()) {
				outfile << dict[dict_num - 1];
			}
		}
		outfile << "\n";
	}
	infile.close();
	outfile.close();
	system("pause");  // Windows only: keeps the console window open
	return 0;
}

5. Download link for key_dict.txt

Link: https://pan.baidu.com/s/1HhbCuVYcstE8XLlL-Wt6Kg
Extraction code: p7vj
 

6. If you would rather skip the conversion, download the already-converted labels directly

Link: https://pan.baidu.com/s/1JQpIwJSoIUYdrSsiK4irmw
Extraction code: 6nlu
 

 

 

 
