Deploying PaddleOCR on a Kunpeng ARM Server
1. Checking the deployment environment
1.1 Operating system
$ cat /etc/os-release
PRETTY_NAME="UnionTech OS Server 20"
NAME="UnionTech OS Server 20"
VERSION_ID="20"
VERSION="20"
ID="uos"
PLATFORM_ID="platform:uel20"
HOME_URL="https://www.chinauos.com/"
BUG_REPORT_URL="https://bbs.chinauos.com/"
VERSION_CODENAME="fuyu"
$ cat /etc/issue
UnionTech OS Server 20 1050e \n \l
uos Server 20 1050e
1.2 CPU
$ lscpu
Architecture: aarch64
CPU op-mode(s): 64-bit
Byte Order: Little Endian
CPU(s): 8
Vendor ID: HiSilicon
Model name: Kunpeng-920
CPU max MHz: 2600.0000
CPU min MHz: 2600.0000
BogoMIPS: 200.00
L1d cache: 512 KiB
L1i cache: 512 KiB
L2 cache: 4 MiB
L3 cache: 256 MiB
NUMA node0 CPU(s): 0-7
Vulnerability Itlb multihit: Not affected
Vulnerability L1tf: Not affected
Vulnerability Mds: Not affected
Vulnerability Meltdown: Not affected
Vulnerability Spec store bypass: Not affected
Vulnerability Spectre v1: Mitigation; __user pointer sanitization
Vulnerability Spectre v2: Not affected
Vulnerability Srbds: Not affected
Vulnerability Tsx async abort: Not affected
Flags: fp asimd evtstrm aes pmull sha1 sha2 crc32 atomics fphp asimdhp cpuid asimdrdm jscvt fcma dcpop asimddp asimdfhm
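This is a 64-bit ARM (aarch64) CPU running at 2.6 GHz with 8 threads and 256 MiB of L3 cache. AVX is an x86 extension and is not available here; the SIMD capability shows up as asimd (NEON) in the flags.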
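To make the AVX remark concrete, here is a minimal Python sketch that inspects the Flags line from the lscpu output above. The helper name `parse_cpu_flags` is made up for this illustration; the flag string is copied from the output.

```python
def parse_cpu_flags(lscpu_flags_line):
    """Return the set of feature names from an lscpu 'Flags:' line."""
    # Everything after the first colon is a whitespace-separated flag list.
    _, _, flags = lscpu_flags_line.partition(":")
    return set(flags.split())


flags = parse_cpu_flags(
    "Flags: fp asimd evtstrm aes pmull sha1 sha2 crc32 atomics fphp asimdhp "
    "cpuid asimdrdm jscvt fcma dcpop asimddp asimdfhm"
)

# asimd is the aarch64 name for Advanced SIMD (NEON); avx* flags only exist on x86.
has_neon = "asimd" in flags
has_avx = any(flag.startswith("avx") for flag in flags)
print(has_neon, has_avx)  # True False
```

On a live machine the same function could be fed the Flags line read from `lscpu` instead of a pasted string.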
1.3 Memory
$ free -m
total used free shared buff/cache available
Mem: 14993 624 13273 25 1094 13964
Swap: 2047 0 2047
About 14 GB of memory is available (13964 MiB).
1.4 Storage
$ lsblk -d -o name,size,model
NAME SIZE MODEL
vda 20G
vdb 200G
There are two disks. No model is reported, probably because the machine is virtualized.
$ df -h
Filesystem Size Used Avail Use% Mounted on
devtmpfs 7.1G 0 7.1G 0% /dev
tmpfs 7.4G 192K 7.4G 1% /dev/shm
tmpfs 7.4G 25M 7.3G 1% /run
tmpfs 7.4G 0 7.4G 0% /sys/fs/cgroup
/dev/mapper/uos_host--10--200--4--96-root 217G 6.5G 210G 3% /
tmpfs 7.4G 192K 7.4G 1% /tmp
/dev/vda2 1014M 154M 861M 16% /boot
/dev/vda1 599M 25M 575M 5% /boot/efi
tmpfs 1.5G 0 1.5G 0% /run/user/0
tmpfs 1.5G 0 1.5G 0% /run/user/993
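The root filesystem has 210 GB free, so running out of disk space is not a concern.
2. Installing the Paddle AI framework
2.1 Installing Python
2.1.1 Checking whether python3 is installed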
$ python3 -V
Python 3.7.9
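According to the official PaddlePaddle documentation, PaddlePaddle on ARM needs at least Python 3.8, so the system Python 3.7.9 does not meet the requirement.
2.1.2 Installing Python 3.8
# Install the build dependencies for Python
yum -y install gcc make zlib zlib-devel bzip2-devel ncurses-devel libffi libffi-devel sqlite-devel
# Switch to /home
cd /home
# Download the Python 3.8.2 source
wget https://www.python.org/ftp/python/3.8.2/Python-3.8.2.tgz
# Unpack the source
tar -zxvf Python-3.8.2.tgz
# Enter the source directory
cd Python-3.8.2/
# Configure the build and set the install prefix
./configure --enable-optimizations --prefix=/usr/local/python3.8.2
# Build
make
# Install the binaries
make install
# Confirm that the interpreter starts
cd /usr/local/python3.8.2/
./bin/python3
# Edit the environment variables: append the export line below to /etc/profile
vim /etc/profile
export PATH=/usr/local/python3.8.2/bin:$PATH
# Reload the profile so that the new PATH takes effect
source /etc/profile
# Check the Python version
$ python3 -V
Python 3.8.2
Python 3.8 is now installed.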
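A source build skips optional standard-library modules when their development headers are missing, and the gap only shows up later at import time. The following is a minimal sketch for checking the new interpreter right after installation. It is not part of the original procedure: `missing_modules` is a hypothetical helper, and the module list is an assumption that mirrors the -devel packages installed above.

```python
import importlib


def missing_modules(names):
    """Return the names that cannot be imported by this interpreter."""
    missing = []
    for name in names:
        try:
            importlib.import_module(name)
        except ImportError:
            # Raised when the C extension behind the module was never built.
            missing.append(name)
    return missing


# Assumed mapping: zlib-devel, bzip2-devel, ncurses-devel, libffi-devel, sqlite-devel.
OPTIONAL_STDLIB = ["zlib", "bz2", "curses", "ctypes", "sqlite3"]
print("missing:", missing_modules(OPTIONAL_STDLIB))
```

Run it with /usr/local/python3.8.2/bin/python3. An empty list suggests that all of the listed modules were built.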
2.2 Upgrading pip3
On ARM Linux, PaddlePaddle requires pip3 20.2.2 or later.
- Check the current pip3 version
$ pip3 -V
pip 19.2.3 from /usr/local/python3.8.2/lib/python3.8/site-packages/pip (python 3.8)
The pip3 version is too old and needs to be upgraded.
- Upgrade pip3
$ python3 -m pip install --upgrade pip
Collecting pip
Downloading https://files.pythonhosted.org/packages/f4/ab/e3c039b5ddba9335bd8f82d599eb310de1d2a2db0411b8d804d507405c74/pip-24.1.1-py3-none-any.whl (1.8MB)
|████████████████████████████████| 1.8MB 10kB/s
Installing collected packages: pip
Found existing installation: pip 19.2.3
Uninstalling pip-19.2.3:
Successfully uninstalled pip-19.2.3
Successfully installed pip-24.1.1
# Upgrade succeeded
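The version requirement can also be checked programmatically. The sketch below parses the first line printed by `pip3 -V` and compares it with the 20.2.2 minimum quoted above. The function name `pip_version` is an illustrative choice, and the sample string is the output shown earlier.

```python
import re


def pip_version(pip_v_output):
    """Extract the numeric version tuple from the output of `pip3 -V`."""
    match = re.match(r"pip (\d+(?:\.\d+)*)", pip_v_output)
    if match is None:
        raise ValueError("unrecognized pip -V output: " + pip_v_output)
    return tuple(int(part) for part in match.group(1).split("."))


MINIMUM = (20, 2, 2)  # minimum pip version quoted in this section

old = pip_version(
    "pip 19.2.3 from /usr/local/python3.8.2/lib/python3.8/site-packages/pip (python 3.8)"
)
print(old, old >= MINIMUM)  # (19, 2, 3) False
```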
2.3 Installing Paddle
# Online installation
$ pip3 install paddlepaddle==2.6.1 -i https://mirror.baidu.com/pypi/simple/
Collecting https://mirror.baidu.com/pypi/simple/
Downloading https://mirror.baidu.com/pypi/simple/ (10.4 MB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 10.4/10.4 MB 8.5 MB/s eta 0:00:00
ERROR: Cannot unpack file /tmp/pip-unpack-q2uv8iqc/simple.html (downloaded from /tmp/pip-req-build-glcy3jgo, content-type: text/html); cannot detect archive format
ERROR: Cannot determine archive format of /tmp/pip-req-build-glcy3jgo
# This fails, so try an offline install from a downloaded wheel
$ pip3 install paddlepaddle-2.6.1-cp38-cp38-manylinux2014_aarch64.whl -i https://mirror.baidu.com/pypi/simple/
Looking in indexes: https://mirror.baidu.com/pypi/simple/
Processing ./paddlepaddle-2.6.1-cp38-cp38-manylinux2014_aarch64.whl
Collecting httpx (from paddlepaddle==2.6.1)
Downloading https://mirror.baidu.com/pypi/packages/41/7b/ddacf6dcebb42466abd03f368782142baa82e08fc0c1f8eaa05b4bae87d5/httpx-0.27.0-py3-none-any.whl (75 kB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 75.6/75.6 kB 240.9 kB/s eta 0:00:00
Collecting numpy>=1.13 (from paddlepaddle==2.6.1)
Downloading https://mirror.baidu.com/pypi/packages/25/6f/2586a50ad72e8dbb1d8381f837008a0321a3516dfd7cb57fc8cf7e4bb06b/numpy-1.24.4-cp38-cp38-manylinux_2_17_aarch64.manylinux2014_aarch64.whl (14.0 MB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 14.0/14.0 MB 6.5 MB/s eta 0:00:00
Collecting Pillow (from paddlepaddle==2.6.1)
Downloading https://mirror.baidu.com/pypi/packages/fa/20/34bd8b37f19d121b81f79491270c08772907837c85da1e9545a39be870d4/pillow-10.3.0-cp38-cp38-manylinux_2_28_aarch64.whl (4.3 MB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 4.3/4.3 MB 5.4 MB/s eta 0:00:00
Collecting decorator (from paddlepaddle==2.6.1)
Downloading https://mirror.baidu.com/pypi/packages/d5/50/83c593b07763e1161326b3b8c6686f0f4b0f24d5526546bee538c89837d6/decorator-5.1.1-py3-none-any.whl (9.1 kB)
Collecting astor (from paddlepaddle==2.6.1)
Downloading https://mirror.baidu.com/pypi/packages/c3/88/97eef84f48fa04fbd6750e62dcceafba6c63c81b7ac1420856c8dcc0a3f9/astor-0.8.1-py2.py3-none-any.whl (27 kB)
Collecting opt-einsum==3.3.0 (from paddlepaddle==2.6.1)
Downloading https://mirror.baidu.com/pypi/packages/bc/19/404708a7e54ad2798907210462fd950c3442ea51acc8790f3da48d2bee8b/opt_einsum-3.3.0-py3-none-any.whl (65 kB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 65.5/65.5 kB 534.3 kB/s eta 0:00:00
WARNING: Skipping page https://mirror.baidu.com/pypi/simple/protobuf/ because the GET request got Content-Type: application/octet-stream. The only supported Content-Types are application/vnd.pypi.simple.v1+json, application/vnd.pypi.simple.v1+html, and text/html
INFO: pip is looking at multiple versions of paddlepaddle to determine which version is compatible with other requirements. This could take a while.
ERROR: Ignored the following versions that require a different python version: 1.25.0 Requires-Python >=3.9; 1.25.0rc1 Requires-Python >=3.9; 1.25.1 Requires-Python >=3.9; 1.25.2 Requires-Python >=3.9; 1.26.0 Requires-Python <3.13,>=3.9; 1.26.0b1 Requires-Python <3.13,>=3.9; 1.26.0rc1 Requires-Python <3.13,>=3.9; 1.26.1 Requires-Python <3.13,>=3.9; 1.26.2 Requires-Python >=3.9; 1.26.3 Requires-Python >=3.9; 1.26.4 Requires-Python >=3.9; 2.0.0 Requires-Python >=3.9; 2.0.0b1 Requires-Python >=3.9; 2.0.0rc1 Requires-Python >=3.9; 2.0.0rc2 Requires-Python >=3.9
ERROR: Could not find a version that satisfies the requirement protobuf>=3.20.2; platform_system != "Windows" (from paddlepaddle) (from versions: none)
ERROR: No matching distribution found for protobuf>=3.20.2; platform_system != "Windows"
# Another error: no matching version is found for protobuf>=3.20.2
# Download the protobuf wheel by hand and install it first
pip3 install protobuf-3.20.2-cp38-cp38-manylinux2014_aarch64.whl -i https://mirror.baidu.com/pypi/simple/
Looking in indexes: https://mirror.baidu.com/pypi/simple/
Processing ./protobuf-3.20.2-cp38-cp38-manylinux2014_aarch64.whl
Installing collected packages: protobuf
Successfully installed protobuf-3.20.2
# Installed successfully; install paddle again
$ pip3 install paddlepaddle==2.6.1 -i https://mirror.baidu.com/pypi/simple/
...
Requirement already satisfied: protobuf>=3.20.2 in /usr/local/python3.8.2/lib/python3.8/site-packages (from paddlepaddle==2.6.1) (3.20.2)
Collecting anyio (from httpx->paddlepaddle==2.6.1)
Downloading https://mirror.baidu.com/pypi/packages/7b/a2/10639a79341f6c019dedc95bd48a4928eed9f1d1197f4c04f546fc7ae0ff/anyio-4.4.0-py3-none-any.whl (86 kB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 86.8/86.8 kB 649.0 kB/s eta 0:00:00
Collecting certifi (from httpx->paddlepaddle==2.6.1)
Downloading https://mirror.baidu.com/pypi/packages/5b/11/1e78951465b4a225519b8c3ad29769c49e0d8d157a070f681d5b6d64737f/certifi-2024.6.2-py3-none-any.whl (164 kB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 164.4/164.4 kB 565.3 kB/s eta 0:00:00
Collecting httpcore==1.* (from httpx->paddlepaddle==2.6.1)
Downloading https://mirror.baidu.com/pypi/packages/78/d4/e5d7e4f2174f8a4d63c8897d79eb8fe2503f7ecc03282fee1fa2719c2704/httpcore-1.0.5-py3-none-any.whl (77 kB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 77.9/77.9 kB 679.1 kB/s eta 0:00:00
Collecting idna (from httpx->paddlepaddle==2.6.1)
Downloading https://mirror.baidu.com/pypi/packages/e5/3e/741d8c82801c347547f8a2a06aa57dbb1992be9e948df2ea0eda2c8b79e8/idna-3.7-py3-none-any.whl (66 kB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 66.8/66.8 kB 511.6 kB/s eta 0:00:00
Collecting sniffio (from httpx->paddlepaddle==2.6.1)
Downloading https://mirror.baidu.com/pypi/packages/e9/44/75a9c9421471a6c4805dbf2356f7c181a29c1879239abab1ea2cc8f38b40/sniffio-1.3.1-py3-none-any.whl (10 kB)
Collecting h11<0.15,>=0.13 (from httpcore==1.*->httpx->paddlepaddle==2.6.1)
Downloading https://mirror.baidu.com/pypi/packages/95/04/ff642e65ad6b90db43e668d70ffb6736436c7ce41fcc549f4e9472234127/h11-0.14.0-py3-none-any.whl (58 kB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 58.3/58.3 kB 395.0 kB/s eta 0:00:00
Collecting exceptiongroup>=1.0.2 (from anyio->httpx->paddlepaddle==2.6.1)
Downloading https://mirror.baidu.com/pypi/packages/01/90/79fe92dd413a9cab314ef5c591b5aa9b9ba787ae4cadab75055b0ae00b33/exceptiongroup-1.2.1-py3-none-any.whl (16 kB)
Collecting typing-extensions>=4.1 (from anyio->httpx->paddlepaddle==2.6.1)
Downloading https://mirror.baidu.com/pypi/packages/26/9f/ad63fc0248c5379346306f8668cda6e2e2e9c95e01216d2b8ffd9ff037d0/typing_extensions-4.12.2-py3-none-any.whl (37 kB)
Installing collected packages: typing-extensions, sniffio, Pillow, numpy, idna, h11, exceptiongroup, decorator, certifi, astor, opt-einsum, httpcore, anyio, httpx, paddlepaddle
Successfully installed Pillow-10.3.0 anyio-4.4.0 astor-0.8.1 certifi-2024.6.2 decorator-5.1.1 exceptiongroup-1.2.1 h11-0.14.0 httpcore-1.0.5 httpx-0.27.0 idna-3.7 numpy-1.24.4 opt-einsum-3.3.0 paddlepaddle-2.6.1 sniffio-1.3.1 typing-extensions-4.12.2
# paddlepaddle installed successfully
So a sensible installation order is:
First download paddlepaddle-2.6.1-cp38-cp38-manylinux2014_aarch64.whl and protobuf-3.20.2-cp38-cp38-manylinux2014_aarch64.whl by hand from the PyPI repository and upload them to the server. Then run:
pip3 install protobuf-3.20.2-cp38-cp38-manylinux2014_aarch64.whl -i https://mirror.baidu.com/pypi/simple/
pip3 install paddlepaddle-2.6.1-cp38-cp38-manylinux2014_aarch64.whl -i https://mirror.baidu.com/pypi/simple/
This completes the installation.
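When downloading wheels by hand it is easy to pick a file built for the wrong interpreter or architecture. The tags are encoded in the filename (name, version, optional build tag, Python tag, ABI tag, platform tag), so a quick check is possible before uploading. This is a simplified sketch: `fits_this_server` is a hypothetical helper that only handles platform-specific wheels like the two above, not pure-Python or abi3 wheels.

```python
def parse_wheel_filename(filename):
    """Split a wheel filename into its dash-separated components."""
    if not filename.endswith(".whl"):
        raise ValueError("not a wheel: " + filename)
    parts = filename[: -len(".whl")].split("-")
    if len(parts) not in (5, 6):  # 6 when an optional build tag is present
        raise ValueError("unexpected wheel name: " + filename)
    python_tag, abi_tag, platform_tag = parts[-3:]
    return {
        "name": parts[0],
        "version": parts[1],
        "python": python_tag,
        "abi": abi_tag,
        "platform": platform_tag,
    }


def fits_this_server(filename, python_tag="cp38", machine="aarch64"):
    """Simplified check: exact interpreter tag and matching CPU architecture."""
    info = parse_wheel_filename(filename)
    return info["python"] == python_tag and info["platform"].endswith(machine)


for wheel in (
    "paddlepaddle-2.6.1-cp38-cp38-manylinux2014_aarch64.whl",
    "protobuf-3.20.2-cp38-cp38-manylinux2014_aarch64.whl",
):
    print(wheel, fits_this_server(wheel))
```

Both filenames used in this section pass the check for CPython 3.8 on aarch64.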
3. Installing PaddleOCR
3.1 Installing PaddleOCR 2.7.3
$ pip3 install paddleocr==2.7.3 -i https://mirror.baidu.com/pypi/simple/
Looking in indexes: https://mirror.baidu.com/pypi/simple/
Collecting paddleocr==2.7.3
Downloading https://mirror.baidu.com/pypi/packages/f2/55/0469ebca1d9c581a3fa740621afe96461a0ef450e489e10e278cc17a19ef/paddleocr-2.7.3-py3-none-any.whl (780 kB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 780.0/780.0 kB 2.7 MB/s eta 0:00:00
Collecting shapely (from paddleocr==2.7.3)
Downloading https://mirror.baidu.com/pypi/packages/96/ac/3b91deb96b022f3833076893ccf11eebb03d6d0bc1199334badc0515ec85/shapely-2.0.4-cp38-cp38-manylinux_2_17_aarch64.manylinux2014_aarch64.whl (2.4 MB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 2.4/2.4 MB 5.4 MB/s eta 0:00:00
Collecting scikit-image (from paddleocr==2.7.3)
Downloading https://mirror.baidu.com/pypi/packages/fa/2b/ffecc6f29b48d1d46dc3bb7b4c908490260c3a0d69ac2d248d846b90d505/scikit_image-0.21.0-cp38-cp38-manylinux_2_17_aarch64.manylinux2014_aarch64.whl (13.3 MB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 13.3/13.3 MB 8.2 MB/s eta 0:00:00
Collecting imgaug (from paddleocr==2.7.3)
Downloading https://mirror.baidu.com/pypi/packages/66/b1/af3142c4a85cba6da9f4ebb5ff4e21e2616309552caca5e8acefe9840622/imgaug-0.4.0-py2.py3-none-any.whl (948 kB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 948.0/948.0 kB 3.9 MB/s eta 0:00:00
Collecting pyclipper (from paddleocr==2.7.3)
Downloading https://mirror.baidu.com/pypi/packages/80/a5/acc0e11bf33ff178cc6d97798c15de376af149d6414ee2c5083fb8465fd5/pyclipper-1.3.0.post5-cp38-cp38-manylinux_2_17_aarch64.manylinux2014_aarch64.whl (932 kB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 932.3/932.3 kB 3.1 MB/s eta 0:00:00
Collecting lmdb (from paddleocr==2.7.3)
Downloading https://mirror.baidu.com/pypi/packages/82/af/9971570e8165a28b6eec56fe3a4fd12f62aa70a231650767a0a560743bb0/lmdb-1.4.1-cp38-cp38-manylinux_2_17_aarch64.manylinux2014_aarch64.whl (298 kB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 298.3/298.3 kB 1.1 MB/s eta 0:00:00
Collecting tqdm (from paddleocr==2.7.3)
Downloading https://mirror.baidu.com/pypi/packages/18/eb/fdb7eb9e48b7b02554e1664afd3bd3f117f6b6d6c5881438a0b055554f9b/tqdm-4.66.4-py3-none-any.whl (78 kB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 78.3/78.3 kB 1.1 MB/s eta 0:00:00
Requirement already satisfied: numpy in /usr/local/python3.8.2/lib/python3.8/site-packages (from paddleocr==2.7.3) (1.24.4)
Collecting visualdl (from paddleocr==2.7.3)
Downloading https://mirror.baidu.com/pypi/packages/ea/b5/37726c750a4f4598660998327c3566b2d2ed5a1a5f44e9f0dde875602447/visualdl-2.5.3-py3-none-any.whl (6.3 MB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 6.3/6.3 MB 7.1 MB/s eta 0:00:00
Collecting rapidfuzz (from paddleocr==2.7.3)
Downloading https://mirror.baidu.com/pypi/packages/2a/fc/fbed40f9486d2f27a146ba15bb8ec1f1ec4fc8bad7e7910ced812c75dbd8/rapidfuzz-3.9.3-cp38-cp38-manylinux_2_17_aarch64.manylinux2014_aarch64.whl (1.6 MB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 1.6/1.6 MB 3.8 MB/s eta 0:00:00
Collecting opencv-python<=4.6.0.66 (from paddleocr==2.7.3)
Downloading https://mirror.baidu.com/pypi/packages/12/5d/1527327b9f7ea13bef31377f8bf399f03dc5f4f1c9f1fb69bc56b6e24cd4/opencv_python-4.6.0.66-cp36-abi3-manylinux_2_17_aarch64.manylinux2014_aarch64.whl (39.5 MB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 39.5/39.5 MB 5.5 MB/s eta 0:00:00
Collecting opencv-contrib-python<=4.6.0.66 (from paddleocr==2.7.3)
Downloading https://mirror.baidu.com/pypi/packages/13/2d/e8580e089200a8fe84f9befca59ba2d4c92e174dc17dc37a4574ab113db0/opencv_contrib_python-4.6.0.66-cp36-abi3-manylinux_2_17_aarch64.manylinux2014_aarch64.whl (45.3 MB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 45.3/45.3 MB 6.3 MB/s eta 0:00:00
Collecting cython (from paddleocr==2.7.3)
Downloading https://mirror.baidu.com/pypi/packages/a8/b9/e01e8f09ccb91884eac5ee865edd24e62b36743d2b4bd04191e6d8c53f49/Cython-3.0.10-cp38-cp38-manylinux_2_17_aarch64.manylinux2014_aarch64.whl (3.5 MB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 3.5/3.5 MB 5.6 MB/s eta 0:00:00
Collecting lxml (from paddleocr==2.7.3)
Downloading https://mirror.baidu.com/pypi/packages/4d/72/f118145d85f02a76d3f31955c044cb913419c57413b11cd44c00359903dc/lxml-5.2.2-cp38-cp38-manylinux_2_28_aarch64.whl (4.8 MB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 4.8/4.8 MB 6.8 MB/s eta 0:00:00
Collecting premailer (from paddleocr==2.7.3)
Downloading https://mirror.baidu.com/pypi/packages/b1/07/4e8d94f94c7d41ca5ddf8a9695ad87b888104e2fd41a35546c1dc9ca74ac/premailer-3.10.0-py2.py3-none-any.whl (19 kB)
Collecting openpyxl (from paddleocr==2.7.3)
Downloading https://mirror.baidu.com/pypi/packages/30/d0/abcdb0669931be3a98881e6d7851605981693e93a7924061c67d0cd9f292/openpyxl-3.1.4-py2.py3-none-any.whl (251 kB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 251.4/251.4 kB 1.9 MB/s eta 0:00:00
Collecting attrdict (from paddleocr==2.7.3)
Downloading https://mirror.baidu.com/pypi/packages/ef/97/28fe7e68bc7adfce67d4339756e85e9fcf3c6fd7f0c0781695352b70472c/attrdict-2.0.1-py2.py3-none-any.whl (9.9 kB)
Requirement already satisfied: Pillow>=10.0.0 in /usr/local/python3.8.2/lib/python3.8/site-packages (from paddleocr==2.7.3) (10.3.0)
Collecting pyyaml (from paddleocr==2.7.3)
Downloading https://mirror.baidu.com/pypi/packages/e1/a1/27bfac14b90adaaccf8c8289f441e9f76d94795ec1e7a8f134d9f2cb3d0b/PyYAML-6.0.1-cp38-cp38-manylinux_2_17_aarch64.manylinux2014_aarch64.whl (723 kB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 723.8/723.8 kB 4.3 MB/s eta 0:00:00
Collecting python-docx (from paddleocr==2.7.3)
Downloading https://mirror.baidu.com/pypi/packages/3e/3d/330d9efbdb816d3f60bf2ad92f05e1708e4a1b9abe80461ac3444c83f749/python_docx-1.1.2-py3-none-any.whl (244 kB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 244.3/244.3 kB 1.3 MB/s eta 0:00:00
Collecting beautifulsoup4 (from paddleocr==2.7.3)
Downloading https://mirror.baidu.com/pypi/packages/b1/fe/e8c672695b37eecc5cbf43e1d0638d88d66ba3a44c4d321c796f4e59167f/beautifulsoup4-4.12.3-py3-none-any.whl (147 kB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 147.9/147.9 kB 1.4 MB/s eta 0:00:00
Collecting fonttools>=4.24.0 (from paddleocr==2.7.3)
Downloading https://mirror.baidu.com/pypi/packages/97/3c/2c9701084d5664c77dc85fdc89a9be49b1457107ab499b9ace08a744f2ae/fonttools-4.53.0-cp38-cp38-manylinux_2_17_aarch64.manylinux2014_aarch64.whl (4.7 MB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 4.7/4.7 MB 5.1 MB/s eta 0:00:00
Collecting fire>=0.3.0 (from paddleocr==2.7.3)
Downloading https://mirror.baidu.com/pypi/packages/1b/1b/84c63f592ecdfbb3d77d22a8d93c9b92791e4fa35677ad71a7d6449100f8/fire-0.6.0.tar.gz (88 kB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 88.4/88.4 kB 834.9 kB/s eta 0:00:00
Installing build dependencies ... error
error: subprocess-exited-with-error
× pip subprocess to install build dependencies did not run successfully.
│ exit code: 1
╰─> [4 lines of output]
Looking in indexes: https://mirror.baidu.com/pypi/simple/
WARNING: Skipping page https://mirror.baidu.com/pypi/simple/setuptools/ because the GET request got Content-Type: application/octet-stream. The only supported Content-Types are application/vnd.pypi.simple.v1+json, application/vnd.pypi.simple.v1+html, and text/html
ERROR: Could not find a version that satisfies the requirement setuptools>=40.8.0 (from versions: none)
ERROR: No matching distribution found for setuptools>=40.8.0
[end of output]
note: This error originates from a subprocess, and is likely not a problem with pip.
error: subprocess-exited-with-error
× pip subprocess to install build dependencies did not run successfully.
│ exit code: 1
╰─> See above for output.
note: This error originates from a subprocess, and is likely not a problem with pip.
# Error: setuptools>=40.8.0 cannot be found. Same routine as before: install it by hand
$ pip3 install setuptools-40.8.0-py2.py3-none-any.whl -i https://mirror.baidu.com/pypi/simple/
Looking in indexes: https://mirror.baidu.com/pypi/simple/
Processing ./setuptools-40.8.0-py2.py3-none-any.whl
Installing collected packages: setuptools
Attempting uninstall: setuptools
Found existing installation: setuptools 41.2.0
Uninstalling setuptools-41.2.0:
Successfully uninstalled setuptools-41.2.0
Successfully installed setuptools-40.8.0
# Continue installing paddleocr
$ pip3 install paddleocr==2.7.3 -i https://mirror.baidu.com/pypi/simple/
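# It fails with exactly the same error as before. Looking closely at the setuptools install log, version 41.2.0 was already present before the manual install, so the requirement should have been satisfied all along. Give up on the Baidu mirror and install directly from PyPI.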
$ pip3 install paddleocr==2.7.3 --timeout=3600
...
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 2.1/2.1 MB 7.1 MB/s eta 0:00:00
Downloading pyparsing-3.1.2-py3-none-any.whl (103 kB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 103.2/103.2 kB 2.0 MB/s eta 0:00:00
Downloading python_dateutil-2.9.0.post0-py2.py3-none-any.whl (229 kB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 229.9/229.9 kB 2.1 MB/s eta 0:00:00
Downloading pytz-2024.1-py2.py3-none-any.whl (505 kB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 505.5/505.5 kB 2.0 MB/s eta 0:00:00
Downloading tzdata-2024.1-py2.py3-none-any.whl (345 kB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 345.4/345.4 kB 3.3 MB/s eta 0:00:00
Downloading urllib3-2.2.2-py3-none-any.whl (121 kB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 121.4/121.4 kB 1.7 MB/s eta 0:00:00
Downloading werkzeug-3.0.3-py3-none-any.whl (227 kB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 227.3/227.3 kB 3.0 MB/s eta 0:00:00
Downloading more_itertools-10.3.0-py3-none-any.whl (59 kB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 59.2/59.2 kB 643.0 kB/s eta 0:00:00
Downloading MarkupSafe-2.1.5-cp38-cp38-manylinux_2_17_aarch64.manylinux2014_aarch64.whl (26 kB)
Downloading zipp-3.19.2-py3-none-any.whl (9.0 kB)
Building wheels for collected packages: fire, PyMuPDFb
Building wheel for fire (pyproject.toml) ... done
Created wheel for fire: filename=fire-0.6.0-py2.py3-none-any.whl size=118133 sha256=994ccbb5ea4ccfd4e66d298b86f670229de2a7160fe4fb29aef0b29f89ac8847
Stored in directory: /root/.cache/pip/wheels/f6/76/a0/afe23f6f3bc186630845efab23bc7a6e348204102070fdf465
Building wheel for PyMuPDFb (pyproject.toml) ... done
Created wheel for PyMuPDFb: filename=PyMuPDFb-1.24.6-py3-none-linux_aarch64.whl size=1030 sha256=d67b0517736cc33ba10e3da9fd17a020b2ad187a6cc9bff5e2b83741ab79f7a5
Stored in directory: /root/.cache/pip/wheels/8c/e2/9e/39ec1e4c2eec9a425032a4116757a2d2bf27ea2f2871da2cad
Successfully built fire PyMuPDFb
Installing collected packages: pytz, PyMuPDFb, pyclipper, lmdb, zipp, urllib3, tzdata, tqdm, tifffile, termcolor, soupsieve, six, shapely, scipy, rarfile, rapidfuzz, pyyaml, PyWavelets, pyparsing, PyMuPDF, pycryptodome, psutil, packaging, opencv-python-headless, opencv-python, opencv-contrib-python, networkx, more-itertools, MarkupSafe, lxml, kiwisolver, itsdangerous, imageio, future, fonttools, et-xmlfile, cython, cycler, cssselect, contourpy, click, charset-normalizer, cachetools, blinker, Babel, Werkzeug, requests, python-docx, python-dateutil, openpyxl, lazy_loader, Jinja2, importlib-resources, importlib-metadata, fire, cssutils, beautifulsoup4, bce-python-sdk, attrdict, scikit-image, premailer, pdf2docx, pandas, matplotlib, flask, imgaug, Flask-Babel, visualdl, paddleocr
Successfully installed Babel-2.15.0 Flask-Babel-4.0.0 Jinja2-3.1.4 MarkupSafe-2.1.5 PyMuPDF-1.24.7 PyMuPDFb-1.24.6 PyWavelets-1.4.1 Werkzeug-3.0.3 attrdict-2.0.1 bce-python-sdk-0.9.17 beautifulsoup4-4.12.3 blinker-1.8.2 cachetools-5.3.3 charset-normalizer-3.3.2 click-8.1.7 contourpy-1.1.1 cssselect-1.2.0 cssutils-2.11.1 cycler-0.12.1 cython-3.0.10 et-xmlfile-1.1.0 fire-0.6.0 flask-3.0.3 fonttools-4.53.0 future-1.0.0 imageio-2.34.2 imgaug-0.4.0 importlib-metadata-8.0.0 importlib-resources-6.4.0 itsdangerous-2.2.0 kiwisolver-1.4.5 lazy_loader-0.4 lmdb-1.4.1 lxml-5.2.2 matplotlib-3.7.5 more-itertools-10.3.0 networkx-3.1 opencv-contrib-python-4.6.0.66 opencv-python-4.6.0.66 opencv-python-headless-4.10.0.84 openpyxl-3.1.4 packaging-24.1 paddleocr-2.7.3 pandas-2.0.3 pdf2docx-0.5.8 premailer-3.10.0 psutil-6.0.0 pyclipper-1.3.0.post5 pycryptodome-3.20.0 pyparsing-3.1.2 python-dateutil-2.9.0.post0 python-docx-1.1.2 pytz-2024.1 pyyaml-6.0.1 rapidfuzz-3.9.3 rarfile-4.2 requests-2.32.3 scikit-image-0.21.0 scipy-1.10.1 shapely-2.0.4 six-1.16.0 soupsieve-2.5 termcolor-2.4.0 tifffile-2023.7.10 tqdm-4.66.4 tzdata-2024.1 urllib3-2.2.2 visualdl-2.5.3 zipp-3.19.2
# The install only succeeded through a proxy; with a direct connection to PyPI the download could have taken hours.
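A likely explanation for why installing setuptools by hand did not help: fire-0.6.0 is a source archive, and pip builds it in an isolated environment whose build dependencies are fetched from the index again, regardless of what is already installed. The WARNING lines show the mirror answering with Content-Type application/octet-stream for the setuptools and protobuf index pages, so pip skipped those pages and saw no versions at all. The sketch below models only the rule stated in that warning message. It is not pip's actual implementation, and `index_page_is_usable` is a made-up name.

```python
# Content types listed as supported in pip's warning message above.
SUPPORTED_TYPES = (
    "application/vnd.pypi.simple.v1+json",
    "application/vnd.pypi.simple.v1+html",
    "text/html",
)


def index_page_is_usable(content_type_header):
    """Return True if an index page with this Content-Type would be accepted."""
    # Drop parameters such as "; charset=utf-8" before comparing.
    media_type = content_type_header.split(";", 1)[0].strip().lower()
    return media_type in SUPPORTED_TYPES


print(index_page_is_usable("application/octet-stream"))  # False
print(index_page_is_usable("text/html; charset=utf-8"))  # True
```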
3.2 Verifying the PaddleOCR installation
- Check whether paddleocr was installed successfully
paddleocr -h
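This fails because the sqlite3 module is missing. sqlite3 is Python's built-in database module, and the sqlite development package was not among the dependencies installed before Python was compiled. (The dependency list earlier in this document now includes it.)
# Install the sqlite development package
$ yum install sqlite-devel -y
# Enter the source directory
$ cd /home/Python-3.8.2/
# Configure the build and set the install prefix
./configure --enable-optimizations --prefix=/usr/local/python3.8.2
# Rebuild
make
# Reinstall the binaries
make install
# Verify that the sqlite3 module is available
$ python3
>>> import sqlite3
>>>
# No error this time, so the module is installed
# Run paddleocr again
$ paddleocr -h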
-h, --help show this help message and exit
--use_gpu USE_GPU
...
# The help text is displayed, so paddleocr runs correctly
4. PaddleOCR performance tests
4.1 Testing PDF recognition with paddleocr
paddleocr --image_dir ./test.pdf --cpu_threads 8 --output rec.txt --use_gpu false --use_angle_cls false
download https://paddleocr.bj.bcebos.com/PP-OCRv4/chinese/ch_PP-OCRv4_det_infer.tar to /root/.paddleocr/whl/det/ch/ch_PP-OCRv4_det_infer/ch_PP-OCRv4_det_infer.tar
100%|███████████████████████████████████████████████████████████████████████████████████| 4.89M/4.89M [00:00<00:00, 16.1MiB/s]
download https://paddleocr.bj.bcebos.com/PP-OCRv4/chinese/ch_PP-OCRv4_rec_infer.tar to /root/.paddleocr/whl/rec/ch/ch_PP-OCRv4_rec_infer/ch_PP-OCRv4_rec_infer.tar
100%|███████████████████████████████████████████████████████████████████████████████████| 11.0M/11.0M [00:00<00:00, 28.4MiB/s]
download https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_mobile_v2.0_cls_infer.tar to /root/.paddleocr/whl/cls/ch_ppocr_mobile_v2.0_cls_infer/ch_ppocr_mobile_v2.0_cls_infer.tar
100%|███████████████████████████████████████████████████████████████████████████████████| 2.19M/2.19M [00:00<00:00, 8.54MiB/s]
--------------------------------------
C++ Traceback (most recent call last):
--------------------------------------
0 inflateReset2
----------------------
Error Message Summary:
----------------------
FatalError: `Segmentation fault` is detected by the operating system.
[TimeInfo: *** Aborted at 1719492178 (unix time) try "date -d @1719492178" if you are using GNU date ***]
[SignalInfo: *** SIGSEGV (@0x10) received by PID 95061 (TID 0xffff081e47e0) from PID 16 ***]
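# As expected, yet another error, and a baffling one at that.
A search through the GitHub issues turned up a report that is practically a twin of this one.
# Switch to paddlepaddle 2.5.2 and test again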
$ pip3 install paddlepaddle-2.5.2-cp38-cp38-manylinux2014_aarch64.whl --timeout=3600
Processing ./paddlepaddle-2.5.2-cp38-cp38-manylinux2014_aarch64.whl
Requirement already satisfied: httpx in /usr/local/python3.8.2/lib/python3.8/site-packages (from paddlepaddle==2.5.2) (0.27.0)
...
Installing collected packages: paddlepaddle
Attempting uninstall: paddlepaddle
Found existing installation: paddlepaddle 2.6.1
Uninstalling paddlepaddle-2.6.1:
Successfully uninstalled paddlepaddle-2.6.1
Successfully installed paddlepaddle-2.5.2
# Installed successfully; try again
$ paddleocr --image_dir ./test.pdf --cpu_threads 8 --output rec.txt --use_gpu false --use_angle_cls false
download https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_mobile_v2.0_cls_infer.tar to /root/.paddleocr/whl/cls/ch_ppocr_mobile_v2.0_cls_infer/ch_ppocr_mobile_v2.0_cls_infer.tar
100%|███████████████████████████████████████████████████████████████████████████████████| 2.19M/2.19M [00:01<00:00, 2.07MiB/s]
[2024/06/27 21:13:10] ppocr DEBUG: Namespace(alpha=1.0, alphacolor=(255, 255, 255), benchmark=False, beta=1.0, binarize=False, ...
[2024/06/27 21:13:10] ppocr INFO: **********./test.pdf**********
Traceback (most recent call last):
File "/usr/local/python3.8.2/lib/python3.8/site-packages/paddle/utils/lazy_import.py", line 32, in try_import
mod = importlib.import_module(module_name)
File "/usr/local/python3.8.2/lib/python3.8/importlib/__init__.py", line 127, in import_module
return _bootstrap._gcd_import(name[level:], package, level)
File "<frozen importlib._bootstrap>", line 1014, in _gcd_import
File "<frozen importlib._bootstrap>", line 991, in _find_and_load
File "<frozen importlib._bootstrap>", line 975, in _find_and_load_unlocked
File "<frozen importlib._bootstrap>", line 671, in _load_unlocked
File "<frozen importlib._bootstrap_external>", line 783, in exec_module
File "<frozen importlib._bootstrap>", line 219, in _call_with_frames_removed
File "/usr/local/python3.8.2/lib/python3.8/site-packages/fitz/__init__.py", line 2, in <module>
from pymupdf import *
File "/usr/local/python3.8.2/lib/python3.8/site-packages/pymupdf/__init__.py", line 29, in <module>
from . import extra
File "/usr/local/python3.8.2/lib/python3.8/site-packages/pymupdf/extra.py", line 10, in <module>
from . import _extra
ImportError: libmupdf.so.24.4: cannot open shared object file: No such file or directory
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/usr/local/python3.8.2/bin/paddleocr", line 8, in <module>
sys.exit(main())
File "/usr/local/python3.8.2/lib/python3.8/site-packages/paddleocr/paddleocr.py", line 794, in main
result = engine.ocr(img_path,
File "/usr/local/python3.8.2/lib/python3.8/site-packages/paddleocr/paddleocr.py", line 647, in ocr
img = check_img(img)
File "/usr/local/python3.8.2/lib/python3.8/site-packages/paddleocr/paddleocr.py", line 527, in check_img
img, flag_gif, flag_pdf = check_and_read(image_file)
File "/usr/local/python3.8.2/lib/python3.8/site-packages/paddleocr/ppocr/utils/utility.py", line 112, in check_and_read
fitz = try_import("fitz")
File "/usr/local/python3.8.2/lib/python3.8/site-packages/paddle/utils/lazy_import.py", line 41, in try_import
raise ImportError(err_msg)
ImportError: Failed importing fitz. This likely means that some paddle modules require additional dependencies that have to be manually installed (usually with `pip install fitz`).
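# "Failed importing fitz" is an error I have run into many times. It comes from the tangled relationship between the fitz and PyMuPDF packages. I will leave it for now and look into it later; for the moment, test with images instead.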
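A first diagnostic step is to list which of the packages involved are actually installed. The install log in section 3.1 shows PyMuPDF 1.24.7 alongside a locally built PyMuPDFb 1.24.6 wheel of only 1030 bytes, which may mean that the wheel does not contain the libmupdf shared library named in the traceback; that reading is an inference from the log, not something verified here. The sketch uses the standard-library `importlib.metadata` (Python 3.8+), and `installed_versions` is a hypothetical helper name.

```python
from importlib import metadata


def installed_versions(distribution_names):
    """Map each distribution name to its installed version, or None if absent."""
    versions = {}
    for name in distribution_names:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


# Distributions that can provide, or interfere with, the `fitz` import.
print(installed_versions(["PyMuPDF", "PyMuPDFb", "fitz"]))
```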
4.2 Testing image recognition with paddleocr
4.2.1 Test with a 300 DPI A4-sized PNG image (473 KB, 2478 px × 3503 px)
paddleocr --image_dir ./test-1.png --cpu_threads 8 --output rec.txt --use_gpu false --use_angle_cls false
[2024/06/27 21:20:50] ppocr DEBUG: Namespace(alpha=1.0, alphacolor=(255, 255, 255), benchmark=False, beta=1.0, binarize=False, cls_batch_num=6, ...)
[2024/06/27 21:20:51] ppocr INFO: **********.test-1.png**********
[2024/06/27 21:20:52] ppocr DEBUG: dt_boxes num : 53, elapsed : 1.1459033489227295
[2024/06/27 21:21:07] ppocr DEBUG: rec_res num : 53, elapsed : 14.777923345565796
...
[2024/06/27 21:21:07] ppocr INFO: [[[878.0, 3251.0], [918.0, 3251.0], [918.0, 3280.0], [878.0, 3280.0]], ('友好', 0.9936521053314209)]
[2024/06/27 21:21:07] ppocr INFO: [[[1416.0, 3284.0], [1545.0, 3284.0], [1545.0, 3324.0], [1416.0, 3324.0]], ('dataset数据集', 0.9795007705688477)]
[2024/06/27 21:21:07] ppocr INFO: [[[1412.0, 3339.0], [1608.0, 3339.0], [1608.0, 3375.0], [1412.0, 3375.0]], ('train/lnfer训练和推理', 0.9572830200195312)]
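Conclusion: for one 300 DPI A4-sized image, inference with PaddleOCR on the 8-thread Kunpeng 920 arm64 CPU (no AVX instructions) takes about 16 seconds (1.1 s for detection plus 14.8 s for recognition).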
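The 16-second figure is the sum of the two `elapsed` values in the DEBUG lines. A minimal sketch for extracting them from a saved log, assuming the line format shown above; `stage_times` is an illustrative name:

```python
import re

# Matches e.g. "dt_boxes num : 53, elapsed : 1.1459033489227295"
ELAPSED = re.compile(r"(dt_boxes|rec_res) num : (\d+), elapsed : ([0-9.]+)")


def stage_times(log_text):
    """Sum detection (dt_boxes) and recognition (rec_res) seconds in a log."""
    totals = {"dt_boxes": 0.0, "rec_res": 0.0}
    for stage, _count, seconds in ELAPSED.findall(log_text):
        totals[stage] += float(seconds)
    return totals


log = """\
[2024/06/27 21:20:52] ppocr DEBUG: dt_boxes num : 53, elapsed : 1.1459033489227295
[2024/06/27 21:21:07] ppocr DEBUG: rec_res num : 53, elapsed : 14.777923345565796
"""
times = stage_times(log)
total = times["dt_boxes"] + times["rec_res"]
print(round(total, 1))  # 15.9
```

Recognition accounts for most of the time, which is worth keeping in mind when reading the batch results.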
4.2.2 Test with a 96 DPI A4-sized PNG image (95 KB, 793 px × 1121 px)
paddleocr --image_dir ./test/test-1.png --cpu_threads 8 --output rec.txt --use_gpu false --use_angle_cls false
[2024/06/27 21:50:26] ppocr DEBUG: Namespace(alpha=1.0, alphacolor=(255, 255, 255), benchmark=False, beta=1.0, binarize=False, ...)
[2024/06/27 21:50:27] ppocr INFO: **********./test/test-1.png**********
[2024/06/27 21:50:28] ppocr DEBUG: dt_boxes num : 12, elapsed : 1.122035026550293
[2024/06/27 21:50:36] ppocr DEBUG: rec_res num : 12, elapsed : 7.64125394821167
...
[2024/06/27 21:50:36] ppocr INFO: [[[38.0, 384.0], [398.0, 385.0], [398.0, 405.0], [38.0, 404.0]], ('总的来说,MindSpore是一个开源的全场景Al框架。', 0.9963896870613098)]
[2024/06/27 21:50:36] ppocr INFO: [[[38.0, 419.0], [431.0, 419.0], [431.0, 439.0], [38.0, 439.0]], ('再来个导图把基本概念先理一理,其中绿底为惊喜元素,', 0.9993771910667419)]
[2024/06/27 21:50:36] ppocr INFO: [[[439.0, 419.0], [558.0, 419.0], [558.0, 439.0], [439.0, 439.0]], ('红底为懵逼元素。', 0.9684547781944275)]
[2024/06/27 21:50:36] ppocr INFO: [[[39.0, 464.0], [176.0, 464.0], [176.0, 488.0], [39.0, 488.0]], ('1.1.1基本概念', 0.9995652437210083)]
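Conclusion: for one 96 DPI A4-sized image on the same 8-thread Kunpeng 920 arm64 CPU (no AVX instructions), inference with PaddleOCR takes about 9 seconds. However, because the image itself is blurry, it yields 41 fewer detection boxes than the 53 found at 300 DPI. This image is therefore not representative, so the next test measures a batch of images.
4.2.3 Single-process batch test with 96 DPI A4-sized PNG images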
$ paddleocr --image_dir ./test/ --cpu_threads 8 --output rec.txt --use_gpu false --use_angle_cls false
[2024/06/27 21:56:18] ppocr INFO: **********./test/test-1.png**********
[2024/06/27 21:56:19] ppocr DEBUG: dt_boxes num : 12, elapsed : 1.1205856800079346
[2024/06/27 21:56:27] ppocr DEBUG: rec_res num : 12, elapsed : 7.5928943157196045
...
[2024/06/27 22:02:08] ppocr INFO: **********./test/test-9.png**********
[2024/06/27 22:02:10] ppocr DEBUG: dt_boxes num : 63, elapsed : 1.1389813423156738
[2024/06/27 22:02:43] ppocr DEBUG: rec_res num : 63, elapsed : 32.98988652229309
...
[2024/06/27 22:02:43] ppocr INFO: [[[373.0, 1061.0], [719.0, 1061.0], [719.0, 1077.0], [373.0, 1077.0]], ('#创建一个Flatten层,用于将输入展平为一维向量', 0.9904438257217407)]
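Conclusion: single-process batch recognition of 17 A4-sized PNG images at 96 DPI took 6 minutes 25 seconds in total, i.e. 385 seconds, or about 23 seconds per image on average.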
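The total can be recomputed from the first and last timestamps in the log above (21:56:18 and 22:02:43). A small sketch of that arithmetic, with the image count of 17 taken from the text:

```python
from datetime import datetime

LOG_TIME_FORMAT = "%Y/%m/%d %H:%M:%S"  # timestamp format used by the ppocr log


def elapsed_seconds(start, end):
    """Whole seconds between two ppocr log timestamps."""
    delta = datetime.strptime(end, LOG_TIME_FORMAT) - datetime.strptime(start, LOG_TIME_FORMAT)
    return int(delta.total_seconds())


total = elapsed_seconds("2024/06/27 21:56:18", "2024/06/27 22:02:43")
per_image = total / 17
print(total, round(per_image, 1))  # 385 22.6
```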
4.2.4 8-process batch test with 96 DPI A4-sized PNG images
$ paddleocr --image_dir ./test/ --enable_mkldnn false --cpu_threads 8 --output rec.txt --use_gpu false --use_angle_cls false --use_mp=True --total_process_num 8 --ir_optim true
...
[2024/06/27 22:19:25] ppocr INFO: **********./test/test-1.png**********
[2024/06/27 22:19:26] ppocr DEBUG: dt_boxes num : 12, elapsed : 1.1384954452514648
[2024/06/27 22:19:34] ppocr DEBUG: rec_res num : 12, elapsed : 7.640575885772705
...
[2024/06/27 22:25:52] ppocr INFO: [[[373.0, 1061.0], [719.0, 1061.0], [719.0, 1077.0], [373.0, 1077.0]], ('#创建一个Flatten层,用于将输入展平为一维向量', 0.9904438257217407)]
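Conclusion: 8-process batch recognition of the 17 A4-sized PNG images at 96 DPI took 6 minutes 27 seconds in total, essentially the same as the single-process run. It is unclear what the multi-process setting actually achieves here.
In addition, because the CPU in this test has no AVX instruction set, mkldnn acceleration cannot be enabled, and throughout the run only one CPU thread was doing any work. With mkldnn acceleration keeping all 8 CPU threads busy, recognition would presumably be much faster.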
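If only one thread is ever busy, one thing that might be worth trying is to split the image list into disjoint subsets and start a separate paddleocr process for each. The sketch below shows only the partitioning step. It is a hypothetical helper, it does not describe how `--use_mp` is implemented, and it was not tested as part of this write-up.

```python
def shard(files, worker_id, num_workers):
    """Return the files assigned to one worker using a simple stride split."""
    # Sorting first makes the assignment independent of directory listing order.
    return sorted(files)[worker_id::num_workers]


files = ["test-%d.png" % i for i in range(1, 18)]  # 17 images, as in the batch tests
shards = [shard(files, worker, 8) for worker in range(8)]
print([len(s) for s in shards])  # [3, 2, 2, 2, 2, 2, 2, 2]
```

Each shard could then be copied or linked into its own directory and passed to an independent `paddleocr --image_dir ...` invocation. Whether this scales on this CPU would have to be measured.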
4.2.5 8-process batch test with 300 DPI A4-sized PNG images
paddleocr --image_dir ./test2/ --cpu_threads 1 --output rec.txt --use_gpu false --use_angle_cls false --use_mp=True --total_process_num 8 --enable_mkldnn false --ir_optim true
...
[2024/06/27 22:34:30] ppocr INFO: **********./test2/test-1.png**********
[2024/06/27 22:34:31] ppocr DEBUG: dt_boxes num : 53, elapsed : 1.156975507736206
[2024/06/27 22:34:46] ppocr DEBUG: rec_res num : 53, elapsed : 14.503132581710815
...
[2024/06/27 22:41:21] ppocr INFO: [[[1165.0, 3313.0], [2242.0, 3313.0], [2242.0, 3364.0], [1165.0, 3364.0]], (')#创建一个Flatten层,用于将输入展平为一维向量', 0.9875224828720093)]
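Conclusion: 8-process batch recognition of 17 A4-sized PNG images at 300 DPI took 6 minutes 50 seconds in total, not noticeably different from the 96 DPI run. The recognition pipeline presumably rescales images automatically, so image DPI has almost no effect on recognition speed.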
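The rescaling guess can be illustrated with a small calculation. The sketch assumes that the detection step caps the longer image side at 960 pixels; that limit is an assumption about the preprocessing, not something the logs confirm. It is also simplified, because a real pipeline may additionally round the dimensions.

```python
def scaled_size(width, height, limit=960):
    """Shrink (width, height) so the longer side is at most `limit` pixels."""
    longest = max(width, height)
    if longest <= limit:
        return width, height  # already small enough, keep as is
    scale = limit / longest
    return round(width * scale), round(height * scale)


print(scaled_size(2478, 3503))  # 300 DPI page -> (679, 960)
print(scaled_size(793, 1121))   # 96 DPI page  -> (679, 960)
```

Under that assumption both page sizes reach the detector at the same resolution, which would be consistent with the nearly identical detection times (about 1.1 s) in the logs. The recognition time then depends mainly on how many text boxes are found.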
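5. Summary
- The Kunpeng 920 arm64 CPU has no AVX instruction set, so CPU acceleration cannot be enabled. In the batch tests, recognizing one A4 page takes about 23 to 24 seconds on average (385 to 410 seconds for 17 pages). Enabling multiple processes has almost no effect on recognition performance.
- One environment combination that currently runs recognition correctly on the Kunpeng 920 ARM CPU is:
  - Operating system: UOS Server V20 1050e
  - Python 3.8.2 (built with the sqlite3 module)
  - pip 20.2.2+
  - paddlepaddle 2.5.2
  - paddleocr 2.7.3
- To recognize PDFs on Python 3.8 or later, the relationship between fitz and PyMuPDF has to be sorted out by hand; a specific version needs to be installed for it to work.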