Upgrading a Python Script to Rust "In Place": A Field Report on a 10x Speedup and Halved Memory

Dimension	Rust	Go	C++	Julia
Zero-cost abstractions	✅	❌	✅	✅
Seamless Python interop	✅ PyO3/maturin	✅ cgo	✅ pybind11	❌
Ecosystem (NLP)	✅ tokenizers, rust-bert	Fair	Fragmented	✅
Package management	cargo	go mod	cmake	Pkg
Learning curve	Medium	Low	High	Medium

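Conclusion: Rust can deliver the largest performance payoff without adding much cognitive load, and PyO3 lets the Python → Rust migration proceed in units as small as a single function.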

Migration strategy: minimum deliverable unit (MVP)

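Profile first: py-spy top -p $PID showed that 70% of the time went to regex matching plus string copying, 20% to spaCy, and the rest to JSON serialization. So we decided to replace the regex and aggregation logic first.
Align the interface. The original function signature on the Python side: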

   def extract_and_aggregate(texts: List[str]) -> List[Dict[str, Any]]:
       ...

The Rust side exposes a function of the same name through PyO3, so callers need no changes at all.

Migrate incrementally
1. Week 1: core regex → Rust
2. Week 2: aggregation logic → Rust
3. Week 3: replace the spaCy model with rust-bert
4. Week 4: end-to-end load testing and canary rollout

Core code snippets

Speeding up the regex

Python version (trimmed):

EMAIL_RE = re.compile(r"[a-z0-9\.\-+_]+@[a-z0-9\.\-+_]+\.[a-z]+")
for text in texts:
    emails = EMAIL_RE.findall(text.lower())

Rust version:

use std::collections::HashMap;

use regex::Regex;
use pyo3::prelude::*;
use rayon::prelude::*;

#[pyfunction]
fn extract_and_aggregate(texts: Vec<String>) -> Vec<HashMap<String, PyObject>> {
    let re = Regex::new(r"(?i)[a-z0-9.\-+_]+@[a-z0-9.\-+_]+\.[a-z]+").unwrap();
    
    // Extract emails in parallel (pure Rust; this step never touches Python objects)
    let email_lists: Vec<Vec<String>> = texts
        .into_par_iter()
        .map(|t| {
            re.find_iter(&t)
                .map(|m| m.as_str().to_lowercase())
                .collect()
        })
        .collect();
    
    Python::with_gil(|py| {
        email_lists
            .into_iter()
            .map(|emails| {
                let mut dict = HashMap::new();
                dict.insert("emails".to_string(), emails.into_py(py));
                dict
            })
            .collect()
    })
}

#[pymodule]
fn email_extractor(_py: Python<'_>, m: &Bound<'_, PyModule>) -> PyResult<()> {
    m.add_function(wrap_pyfunction!(extract_and_aggregate, m)?)?;
    Ok(())
}

Let's compile it:

[Screenshot: build output]

Next, a simple Python test script to exercise the function:

import email_extractor
texts = [
    "Contact me: john.doe@EXAMPLE.COM and alice@test.org",
    "This text contains no email address",
    "Several addresses: user1@domain.com, USER2@DOMAIN.COM, test+tag@gmail.com"
]

print("📧 Email extractor test")
print("Input texts:")
for i, text in enumerate(texts):
    print(f"  {i+1}. {text}")

# Call the Rust function
result = email_extractor.extract_and_aggregate(texts)

print("\n✅ Extraction results:")
for i, item in enumerate(result):
    emails = item["emails"]
    print(f"  Text {i+1}: {emails}")

print("\n🎉 Test passed! The module works as expected.")

Then run the test script:

[Screenshot: test run output]

Memory optimization: from `String` to `&str`

Python copies on every slice → peak 64 GB
Rust uses zero-copy &str plus shared Arc<str> → peak 26 GB
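
To make the borrowing idea concrete, here is a minimal std-only sketch. The names split_email, Contact and contacts_on_domain are illustrative assumptions, not code from the project. It shows two things: slicing a &str yields views into the original buffer without copying any bytes, and Arc<str> lets many records share a single heap allocation for a repeated value.

```rust
use std::sync::Arc;

/// Split an address into (local, domain) views that borrow from `email`.
/// No bytes are copied; both slices point into the caller's buffer.
fn split_email(email: &str) -> Option<(&str, &str)> {
    email.split_once('@')
}

/// Hypothetical record whose domain string is shared with other records.
struct Contact {
    local: String,
    domain: Arc<str>,
}

/// Build one contact per local part, all on the same domain.
/// The domain text is allocated once; each record only holds a
/// reference-counted pointer to it.
fn contacts_on_domain(locals: &[&str], domain: &str) -> Vec<Contact> {
    let shared: Arc<str> = Arc::from(domain);
    locals
        .iter()
        .map(|local| Contact {
            local: local.to_string(),
            domain: Arc::clone(&shared),
        })
        .collect()
}
```

Whether this pattern produces savings on the scale reported above depends on how often values repeat in the data; the sketch only demonstrates the mechanism.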

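Benchmark: let the numbers speak

Metric	Python 3.9	Rust 1.78	Improvement
Average runtime	240 min	18 min	13×
Peak RSS	64 GB	26 GB	2.5×
CPU utilization	400%	1400%	3.5×
Lines of code	40,000	5,500	-86%

Note: the Rust build had lto = true and codegen-units = 1 enabled; on the Python side we also tried PyPy, which brought no improvement.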

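Three deep pitfalls and how we fixed them

GIL and parallelism

Problem: a PyO3 function holds the GIL by default, so the Rayon section gains little for the process as a whole: every other Python thread stalls while it runs, and any worker that needs the GIL runs serially. Fix: release the GIL with Python::allow_threads around the pure-Rust work.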

Python::with_gil(|py| {
    py.allow_threads(|| {
        texts.into_par_iter().map(...).collect()
    })
});

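JSON serialization bottleneck

Pretty-printed serde_json output was slow (pretty printing is opt-in via to_string_pretty / to_writer_pretty, not the library default); switching to compact to_writer over a BufWriter gave a 2× speedup.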
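
The buffering half of this fix can be shown with std alone. In the sketch below, CountingSink and write_records are hypothetical stand-ins (a sink that counts how often it is called, and a hand-written line writer in place of a real serializer), so the example does not depend on serde_json. In real code, serde_json::to_writer would be handed the same BufWriter.

```rust
use std::io::{self, BufWriter, Write};

/// Hypothetical sink that records how often the underlying writer is called.
#[derive(Default)]
struct CountingSink {
    calls: usize,
    bytes: Vec<u8>,
}

impl Write for CountingSink {
    fn write(&mut self, buf: &[u8]) -> io::Result<usize> {
        self.calls += 1;
        self.bytes.extend_from_slice(buf);
        Ok(buf.len())
    }

    fn flush(&mut self) -> io::Result<()> {
        Ok(())
    }
}

/// Stand-in for a serializer: emits one compact JSON object per line
/// as several small writes, the way a streaming serializer does.
/// No escaping is done; inputs are assumed to be plain ASCII addresses.
fn write_records<W: Write>(out: &mut W, emails: &[&str]) -> io::Result<()> {
    for email in emails {
        out.write_all(b"{\"email\":\"")?;
        out.write_all(email.as_bytes())?;
        out.write_all(b"\"}\n")?;
    }
    out.flush()
}

/// Write straight to the sink: every small write reaches it.
fn write_unbuffered(emails: &[&str]) -> io::Result<CountingSink> {
    let mut sink = CountingSink::default();
    write_records(&mut sink, emails)?;
    Ok(sink)
}

/// Write through a BufWriter: small writes are coalesced in memory first.
fn write_buffered(emails: &[&str]) -> io::Result<CountingSink> {
    let mut buffered = BufWriter::new(CountingSink::default());
    write_records(&mut buffered, emails)?;
    buffered.into_inner().map_err(|e| e.into_error())
}
```

Both paths produce identical bytes; the buffered path simply reaches the underlying writer far less often, which is where the saving comes from when the sink is a file or socket.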

Memory fragmentation

jemalloc performed poorly in musl-based images; switching to mimalloc cut RSS by a further 10%.
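
Switching allocators in Rust comes down to a single #[global_allocator] declaration. The sketch below shows that mechanism using only std: CountingAlloc is a hypothetical wrapper around the system allocator, chosen so the effect is observable without extra dependencies, and it is not the project's code. With the mimalloc crate added as a dependency, the static would instead be declared with the type mimalloc::MiMalloc.

```rust
use std::alloc::{GlobalAlloc, Layout, System};
use std::sync::atomic::{AtomicUsize, Ordering};

/// Number of allocation requests seen so far.
static ALLOCATIONS: AtomicUsize = AtomicUsize::new(0);

/// Hypothetical allocator that forwards to the system allocator
/// and counts how many allocations pass through it.
struct CountingAlloc;

unsafe impl GlobalAlloc for CountingAlloc {
    unsafe fn alloc(&self, layout: Layout) -> *mut u8 {
        ALLOCATIONS.fetch_add(1, Ordering::Relaxed);
        // SAFETY: the caller upholds GlobalAlloc's contract for `layout`.
        unsafe { System.alloc(layout) }
    }

    unsafe fn dealloc(&self, ptr: *mut u8, layout: Layout) {
        // SAFETY: `ptr` was returned by `System.alloc` with this `layout`.
        unsafe { System.dealloc(ptr, layout) }
    }
}

// This one declaration swaps the allocator for the whole program.
#[global_allocator]
static GLOBAL: CountingAlloc = CountingAlloc;

/// Read the current allocation count.
fn allocation_count() -> usize {
    ALLOCATIONS.load(Ordering::Relaxed)
}
```

Because the swap is a single line, comparing allocators under load is cheap to set up: build once per allocator and compare RSS on the same workload.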

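Developer experience: the extra dividends Rust brings

Dimension	Python	Rust
Unit testing	pytest	cargo test (built in)
Documentation	Sphinx	cargo doc --open
Formatting	black	cargo fmt
Static analysis	flake8	cargo clippy
Cross-compilation	Complicated	cargo zigbuild

The most pleasant surprise was clippy: a single CI run flagged 3 potential panics, all of which were fixed before launch.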

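Lessons learned: 6 pieces of advice for those who follow

Use py-spy / cProfile to find the real bottleneck first; don't start by rewriting everything.

PyO3 + maturin make mixed Python ↔ Rust development very smooth; the result installs with a single pip install.

Always release the GIL with allow_threads before going parallel with Rayon; otherwise other Python threads stall and any GIL-dependent work runs serially.

jemalloc and mimalloc behave very differently inside containers, so remember to compare them during load testing.

Split the Rust side into its own crate, so the Python side only needs import rust_core.

Rust is no silver bullet, but in workloads that are sensitive to both CPU and memory, the payoff is very high.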