Converting the labels of the official PaddleOCR training data into labels that can be used for training
PaddleOCR
Awesome multilingual OCR toolkits based on PaddlePaddle (practical ultra lightweight OCR system, support 80+ languages recognition, provide data annotation and synthesis tools, support training and deployment among server, mobile, embedded and IoT devices)
Project URL: https://gitcode.com/gh_mirrors/pa/PaddleOCR
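1. After downloading the PaddleOCR dataset, I found that the training and test label txt files store numbers instead of text. Judging by how the conversion code in step 4 parses them, each line is an image name followed by space-separated numbers.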
2. Each number in the label file is the line number of the corresponding Chinese character in the official dictionary.
3. Convert the labels into labels that can be used for training: the image name, a tab, then the text itself.
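As a rough sketch of this step for a single label line, the function below rewrites `image_name idx idx ...` as `image_name<TAB>text`. The name `convertLine` is hypothetical, the dictionary is passed in as an already loaded vector, and skipping out-of-range numbers silently is an assumption made to keep the example short.

```cpp
#include <cstddef>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: turn "image_name idx idx ..." into "image_name<TAB>text".
// Each idx is a 1-based line number in the dictionary; numbers outside the
// dictionary are skipped (an assumption of this sketch).
std::string convertLine(const std::string& line,
                        const std::vector<std::string>& dict) {
    std::istringstream fields(line);
    std::string imageName;
    if (!(fields >> imageName)) {
        return "";  // blank line: nothing to convert
    }
    std::string text;
    int index = 0;
    while (fields >> index) {
        if (index >= 1 && static_cast<std::size_t>(index) <= dict.size()) {
            text += dict[index - 1];
        }
    }
    return imageName + "\t" + text;
}
```

For example, with a toy two-entry dictionary holding 你 and 好, the made-up line `img_001.jpg 1 2` would become `img_001.jpg`, a tab, then 你好.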
4. Conversion code
#include <cstdlib>
#include <fstream>
#include <iostream>
#include <sstream>
#include <string>
#include <vector>
using namespace std;

int main() {
    // Load the dictionary. The entry on line k has index k (1-based);
    // the dictionary used here has 5990 entries.
    vector<string> dict;
    ifstream dictFile("C:\\Users\\JSM-SQ\\Documents\\DataSet\\key_dict.txt");
    if (!dictFile) {
        cerr << "Cannot open key_dict.txt" << endl;
        return 1;
    }
    string entry;
    while (dictFile >> entry) {
        dict.push_back(entry);
    }
    dictFile.close();

    ifstream infile("C:\\Users\\JSM-SQ\\Documents\\DataSet\\data_train.txt");
    // Open in truncate mode (the default) so that re-running the program
    // does not append duplicate labels to an existing output file.
    ofstream outfile("C:\\Users\\JSM-SQ\\Documents\\DataSet\\new_data_train.txt");
    if (!infile || !outfile) {
        cerr << "Cannot open the label files" << endl;
        return 1;
    }

    string line;
    // Loop on getline itself; testing eof() first would process a bogus
    // extra line after the last real one.
    while (getline(infile, line)) {
        istringstream fields(line);
        string imageName;
        if (!(fields >> imageName)) {
            continue;  // skip blank lines
        }
        outfile << imageName << "\t";  // first field: image name
        int dictNum;
        while (fields >> dictNum) {  // remaining fields: dictionary indices
            if (dictNum >= 1 && dictNum <= static_cast<int>(dict.size())) {
                outfile << dict[dictNum - 1];
            } else {
                cerr << "Index out of range: " << dictNum << endl;
            }
        }
        outfile << "\n";
    }
    infile.close();
    outfile.close();
    system("pause");  // Windows only: keeps the console window open
    return 0;
}
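Before running the conversion on the full dataset, it may help to check that every number in the label file actually fits the dictionary. The following is a minimal sketch of such a check, not part of the original program: the name `findBadLines` is hypothetical, and lines that carry an image name but no numbers are treated as valid by assumption.

```cpp
#include <cstddef>
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: return the 1-based numbers of label lines that contain
// a token that is not a plain number or an index outside [1, dictSize].
std::vector<std::size_t> findBadLines(std::istream& labels, std::size_t dictSize) {
    std::vector<std::size_t> bad;
    std::string line;
    std::size_t lineNumber = 0;
    while (std::getline(labels, line)) {
        ++lineNumber;
        std::istringstream fields(line);
        std::string imageName;
        if (!(fields >> imageName)) {
            continue;  // blank line: ignore
        }
        std::string token;
        while (fields >> token) {
            // Accept only short, all-digit tokens so std::stoul cannot overflow.
            bool numeric = token.size() <= 9 &&
                           token.find_first_not_of("0123456789") == std::string::npos;
            if (!numeric) {
                bad.push_back(lineNumber);
                break;
            }
            std::size_t index = std::stoul(token);
            if (index < 1 || index > dictSize) {
                bad.push_back(lineNumber);
                break;
            }
        }
    }
    return bad;
}
```

Passing a `std::ifstream` opened on the label file together with the number of dictionary entries gives the list of lines to inspect by hand.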
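5. Download link for key_dict.txt
Link: https://pan.baidu.com/s/1HhbCuVYcstE8XLlL-Wt6Kg
Extraction code: p7vj
6. If you would rather not run the conversion yourself, download the already converted labels directly
Link: https://pan.baidu.com/s/1JQpIwJSoIUYdrSsiK4irmw
Extraction code: 6nlu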