onnxruntime模型部署(一)-pythonAPI

onnxruntime

microsoft/onnxruntime: 是一个用于运行各种机器学习模型的开源库。适合对机器学习和深度学习有兴趣的人，特别是在开发和部署机器学习模型时需要处理各种不同框架和算子的人。特点是支持多种机器学习框架和算子，包括 TensorFlow、PyTorch、Caffe 等，具有高性能和广泛的兼容性。

项目地址：https://gitcode.com/gh_mirrors/on/onnxruntime

免费下载资源

战术摸鱼大师

3276人浏览 · 2024-02-29 22:51:25

战术摸鱼大师 · 2024-02-29 22:51:25 发布

onnx是由微软等多个机构联合发起的开放模型存储标准，onnxruntime是专为onnx设计的模型部署推理引擎
在这里插入图片描述

安装

python版本的安装使用pip安装即可。有nvidia gpu的可以安装gpu版本，没有的就安装cpu版本

pip install onnx
pip install onnxruntime # cpu版本
pip install onnxruntime-gpu # gpu版本
# 安装时一般都会自动匹配合适的版本，但是不排除意外情况。

安装时需要注意onnxruntime和cuda的版本对应关系，一般都不会错，出错自行百度onnxruntime和cuda版本对应关系。

python推理demo

onnxruntime提供了多种语言接口，包括Python，C++，Java等语言，常用的主要有Python和C++。

模型导出

通过pytorch训练好的模型导出为onnx格式。这个demo中是一个很简单的图像超分模型。

# Super Resolution model definition in PyTorch
import torch
import numpy as np
import torch.utils.model_zoo as model_zoo
import torch.nn as nn
import torch.nn.init as init


class SuperResolutionNet(nn.Module):
    def __init__(self, upscale_factor, inplace=False):
        super(SuperResolutionNet, self).__init__()

        self.relu = nn.ReLU(inplace=inplace)
        self.conv1 = nn.Conv2d(1, 64, (5, 5), (1, 1), (2, 2))
        self.conv2 = nn.Conv2d(64, 64, (3, 3), (1, 1), (1, 1))
        self.conv3 = nn.Conv2d(64, 32, (3, 3), (1, 1), (1, 1))
        self.conv4 = nn.Conv2d(32, upscale_factor ** 2, (3, 3), (1, 1), (1, 1))
        self.pixel_shuffle = nn.PixelShuffle(upscale_factor)

        self._initialize_weights()

    def forward(self, x):
        x = self.relu(self.conv1(x))
        x = self.relu(self.conv2(x))
        x = self.relu(self.conv3(x))
        x = self.pixel_shuffle(self.conv4(x))
        return x

    def _initialize_weights(self):
        init.orthogonal_(self.conv1.weight, init.calculate_gain('relu'))
        init.orthogonal_(self.conv2.weight, init.calculate_gain('relu'))
        init.orthogonal_(self.conv3.weight, init.calculate_gain('relu'))
        init.orthogonal_(self.conv4.weight)

# Create the super-resolution model by using the above model definition.
torch_model = SuperResolutionNet(upscale_factor=3)

# Load pretrained model weights
model_url = 'https://s3.amazonaws.com/pytorch/test_data/export/superres_epoch100-44c6958e.pth'
batch_size = 1    # just a random number

# Initialize model with the pretrained weights
map_location = lambda storage, loc: storage
if torch.cuda.is_available():
    map_location = None
torch_model.load_state_dict(model_zoo.load_url(model_url, map_location=map_location))

# set the model to inference mode
torch_model.eval()

# Input to the model
x = torch.randn(batch_size, 1, 224, 224, requires_grad=True)
torch_out = torch_model(x)

# Export the model
torch.onnx.export(torch_model,               # model being run
                  x,                         # model input (or a tuple for multiple inputs)
                  "super_resolution.onnx",   # where to save the model (can be a file or file-like object)
                  export_params=True,        # store the trained parameter weights inside the model file
                  opset_version=10,          # the ONNX version to export the model to
                  do_constant_folding=True,  # whether to execute constant folding for optimization
                  input_names = ['input'],   # the model's input names
                  output_names = ['output'], # the model's output names
                  dynamic_axes={'input' : {0 : 'batch_size'},    # variable length axes
                                'output' : {0 : 'batch_size'}})

onnxruntime推理

onnxruntime中加载onnx模型为InferenceSession对象，会自动读取模型的网络结构，不需要自己进行编写。

import numpy as np
import onnx
import onnxruntime as ort

ort_session = ort.InferenceSession("/home/wyq/hobby/model_deploy/onnx/export_onnx/super_resolution.onnx",
                                   providers=['CUDAExecutionProvider','CPUExecutionProvider'])
# onnxruntime中加载模型为InferenceSession对象，可以指定使用的provider，如CUDAExecutionProvider和CPUExecutionProvider
# 通过onnxruntime的InferenceSession对象，可以进行模型的推理
#还可以通过options参数指定模型的一些配置，如logid等

onnx通过加载模型时指定provider来选择推理设备，可以通过get_available_providers来获取可用的providers，providers是一个列表，按照先后顺序进行选择，在前一个provider无法使用的情况下会向后选择。

print(ort.get_available_providers()) # 获取onnxruntime支持的provider

onnxruntime

项目地址：https://gitcode.com/gh_mirrors/on/onnxruntime

onnx中存储数据的类型是OrtValue，可以通过ortvalue_from_numpy等函数来实现类型转换,生成ortvalue时可以添加参数设定ortvalue所在的设备，cuda代表存储于GPU，后面的0代表设备号。默认设备是cpu。

nparray = np.random.randn(1, 1, 224, 224).astype(np.float32)
X_ortvalue = ort.OrtValue.ortvalue_from_numpy(nparray,'cuda',0)
Y_tensor = torch.zeros(1, 1, 672, 672).cuda()
Y_ortvalue = ort.OrtValue.ortvalue_from_tensor(Y_tensor)

ortvalue的主要属性有：

print("ortvalue device:",ortvalue.device_name())  # 获取ortvalue对象所在的设备名称
print("ortvalue data type:",ortvalue.data_type())  # 获取ortvalue对象的数据类型
print("ortvalue shape:",ortvalue.shape())  # 获取ortvalue对象的形状
print("if ortvalue is tensor:",ortvalue.is_tensor())  # 判断ortvalue对象是否是tensor对象
np.array_equal(ortvalue.numpy(), nparray)  # 判断ortvalue对象的数据是否与numpy数组相等

有了数据和模型，onnxruntime的推理方式如下

result = ort_session.run(None, {'input': ortvalue})
# onnxruntime的InferenceSession对象的run方法可以进行模型推理，输入参数为模型的输入，返回值为模型的输出
# run方法的第一个参数为模型的输出名称，第二个参数为模型的输入，返回值为模型的输出

onnxruntime还支持另一种推理方式，即：绑定输入输出，通过io_binding来进行推理。
绑定时需要设置好输入输出的shape，否则会报错，输出的绑定可以不用管，onnxruntime会自动进行推测输出的shape并在设备上分配内存给输出。

个人理解这种io_binding做法的好处就是固定好输入输出的内存地址，避免不断allocated和free的过程，更加高效。有点像设计模式中的单例模式。

io_binding = ort_session.io_binding() # 创建io_binding对象
io_binding.bind_input('input',
                      device_type=X_ortvalue.device_name(),
                      device_id=0,
                      element_type=np.float32,
                      shape=X_ortvalue.shape(),
                      buffer_ptr=X_ortvalue.data_ptr()) # 绑定模型的输入
# buffer_ptr参数为tensor对象的数据指针，可以通过tensor对象的data_ptr方法获取
io_binding.bind_output('output') # 绑定模型的输出
#onnxruntime可以为output动态分配内存，也可以通过bind_output方法指定output的形状和数据类型
ort_session.run_with_iobinding(io_binding) # 使用io_binding方法进行模型推理

Y = io_binding.copy_outputs_to_cpu()[0] # 将模型的输出从GPU上拷贝到CPU上
print("Y shape:", Y.shape)

onnxruntime不仅可以使用ortvalue进行推理，还可以使用numpy中的array和pytorch中的tensor来进行推理。

#onnxruntime io_binding方法可以提高模型推理的效率，特别是在模型的输入和输出形状不变的情况下
#onnxruntime io_binding还可以绑定到pytorch的tensor对象上，可以通过pytorch的tensor对象的data_ptr方法获取数据指针
Y_tensor = torch.zeros(1, 1, 672, 672).cuda()
io_binding.bind_output(
    'output',
    device_type=Y_tensor.device.type,
    device_id=Y_tensor.device.index,
    element_type=np.float32,
    shape=Y_tensor.shape,
    buffer_ptr=Y_tensor.data_ptr())
ort_session.run_with_iobinding(io_binding)
print("Y_tensor shape:", Y_tensor.shape)

总结

这只是一个简单的demo，只涉及到了一些很常见的算子(operator)，对于复杂的模型，有一些算子是onnxruntime中没有的就需要我们自己去写。
后续会更新对于复杂模型，如何写一个自己的operator的教程。

点赞+收藏是作者更新的动力，感谢阅读！

阅读全文

AI总结

GitHub 加速计划 / on / onnxruntime

下载

最近提交(Master分支：7 个月前 )

554fb4ad ### Description [VitisAI EP] export InferShapes to VitisAIEP --------- Co-authored-by: Wang Chunye <chunywan@amd.com> Co-authored-by: Zhenze <zhenzew@xilinx.com> 20 小时前

b803429a ### Description Exclude WebGPU from Conv3D tests ### Motivation and Context Fix failing tests in packaging pipelines. 22 小时前

GitCode 开源社区

旨在为数千万中国开发者提供一个无缝且高效的云端环境，以支持学习、使用和贡献开源项目。

更多推荐

Dify：开源的大型语言模型应用开发平台深度解析

GitCode 开源社区

大语言模型的知识蒸馏研究综述

摘要——在大语言模型（LLMs）时代，知识蒸馏（KD）成为将GPT-4等领先专有大模型的高级能力迁移至LLaMA、Mistral等开源模型的核心方法。随着开源LLMs的蓬勃发展，KD不仅在这些模型的压缩过程中发挥关键作用，还能通过自我教学机制促进模型迭代优化。本文系统综述了KD在LLM领域的三重功能：向小模型传递高阶知识、实现模型压缩以及推动自我提升。研究围绕算法、技能和垂直领域三大支柱展开——深