AI-assisted test writing: taking pytest coverage from 0 to 78%, step by step

lvchao7758520 · published 2026-05-14 09:51:00

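There's a job I put off for a long time: the project has a billing utility module, billing.py, 260 lines of core business logic. It handles late fees, discount calculation, and invoice dates, and it gets called dozens of times a day.

Test coverage: 0.

It's not that I didn't want to write tests. Writing good ones means constructing dozens of edge cases: dates around February in a leap year, late payments that cross a year boundary, boundary discount rates, consolidating several invoices for the same customer. My hands ache just thinking about it.

This time I handed the whole chore to Claude Code to see how much it could cover.

Starting from zero tests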

# billing.py: 260 lines, zero tests, a year in production

from datetime import datetime, timedelta
from typing import Optional

def get_due_date(invoice_date: datetime, payment_terms: int = 30) -> datetime:
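    """Return the due date given the invoice date and the payment terms in days."""
    due = invoice_date + timedelta(days=payment_terms)
    if due.weekday() == 5:  # Saturday -> push to the following Monday
        due += timedelta(days=2)
    elif due.weekday() == 6:  # Sunday -> push to the following Monday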
        due += timedelta(days=1)
    return due

def calculate_late_fee(
    due_date: datetime,
    payment_date: datetime,
    amount: float,
    daily_rate: float = 0.001
) -> float:
    """Late fee: days overdue × daily rate × amount."""
    if payment_date <= due_date:
        return 0.0
    days_overdue = (payment_date - due_date).days
    fee = days_overdue * daily_rate * amount
    return round(fee, 2)

def calculate_discount(
    amount: float,
    customer_tier: str = 'standard',
    is_first_order: bool = False
) -> float:
    """Return the order price after discount."""
    # tier discount
    tier_rates = {'standard': 0, 'silver': 0.05, 'gold': 0.10, 'platinum': 0.15}
    base_discount = tier_rates.get(customer_tier, 0)

    # first order bonus
    if is_first_order:
        base_discount += 0.05

    # cap at 20%
    base_discount = min(base_discount, 0.20)

    # large order bonus: >= 10000 gets extra 3%
    if amount >= 10000:
        base_discount += 0.03

    discount_amount = round(amount * base_discount, 2)
    return amount - discount_amount

def consolidate_invoices(
    invoices: list[dict],
    customer_id: str
) -> dict:
    """Consolidate all unpaid invoices for one customer."""
    customer_invoices = [
        inv for inv in invoices
        if inv.get('customer_id') == customer_id and inv.get('status') != 'paid'
    ]
    if not customer_invoices:
        return {'total': 0, 'count': 0, 'earliest_date': None}

    total = sum(inv.get('amount', 0) for inv in customer_invoices)
    dates = [inv.get('date') for inv in customer_invoices if inv.get('date')]
    return {
        'total': round(total, 2),
        'count': len(customer_invoices),
        'earliest_date': min(dates) if dates else None
    }

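Not many functions, but each one hides pitfalls:

get_due_date: weekend handling, month boundaries, year boundaries
calculate_late_fee: zero days, negative days (early payment), large amounts, a rate of 0
calculate_discount: tier stacking, first-order stacking, large-order stacking, the cap
consolidate_invoices: empty list, no matching customer, missing fields, inconsistent date types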
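
The last pitfall is easy to reproduce. The sketch below copies consolidate_invoices from the listing above and feeds it one invoice whose date is a datetime and one whose date is a string. The sample invoices, the customer id 'c1', and the helper mixed_dates_raise are made up for illustration. In Python 3, min() cannot order a str against a datetime, so the call raises TypeError.

```python
from datetime import datetime


def consolidate_invoices(invoices: list[dict], customer_id: str) -> dict:
    """Copy of the function from the billing.py listing above."""
    customer_invoices = [
        inv for inv in invoices
        if inv.get('customer_id') == customer_id and inv.get('status') != 'paid'
    ]
    if not customer_invoices:
        return {'total': 0, 'count': 0, 'earliest_date': None}

    total = sum(inv.get('amount', 0) for inv in customer_invoices)
    dates = [inv.get('date') for inv in customer_invoices if inv.get('date')]
    return {
        'total': round(total, 2),
        'count': len(customer_invoices),
        'earliest_date': min(dates) if dates else None,
    }


# Hypothetical sample data: same customer, two different date representations.
mixed = [
    {'customer_id': 'c1', 'amount': 100.0, 'date': datetime(2025, 1, 5)},
    {'customer_id': 'c1', 'amount': 50.0, 'date': '2025-01-03'},
]


def mixed_dates_raise() -> bool:
    """Return True if consolidating mixed str/datetime dates raises TypeError."""
    try:
        consolidate_invoices(mixed, 'c1')
    except TypeError:
        return True
    return False
```

Whether the right fix is to normalize dates on the way in or to reject mixed input is a design decision; the sketch only shows that the current code does neither.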

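Round one: basic tests, 28% coverage

I opened Claude Code and gave it explicit instructions:

Write pytest tests for every function in billing.py.

Requirements:

Start with the happy path, at least one test per function

Use pytest's parametrize to cover combinations

Run pytest --cov=billing --cov-report=term to generate a coverage report

If coverage is below 40%, analyze what is missing and keep adding tests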

• Reading billing.py (260 lines, 4 functions)
→ Writing tests/test_billing.py

  [+] test_get_due_date_normal: an ordinary Wednesday, 30 days later
  [+] test_get_due_date_weekend: Friday → skips the weekend
  [+] test_calculate_late_fee_normal: 10 days overdue
  [+] test_calculate_late_fee_on_time: paid on time, fee is 0
  [+] test_calculate_discount_standard: standard tier, no discount
  [+] test_calculate_discount_gold: gold tier, 10%
  [+] test_consolidate_invoices_basic: 2 invoices consolidated
  [+] test_consolidate_invoices_empty: empty list

→ Running: pytest --cov=billing --cov-report=term
  Name         Stmts   Miss   Cover
  billing.py     52     37     28%

Round one came in at 28%, short of 40%. Claude Code read the coverage report itself and identified the gaps:

→ Coverage gap analysis:
  - get_due_date: missing edge cases (last day of month, leap year Feb 29)
  - calculate_late_fee: missing rate=0, negative days, rounding
  - calculate_discount: missing platinum tier, cap at 20%, large order bonus
  - consolidate_invoices: missing paid filter, missing date field
→ Writing additional tests...
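
As a rough idea of what the first two items in that gap analysis look like as tests, here is a minimal sketch. It copies get_due_date and calculate_late_fee from the listing so it runs on its own; the test names and the chosen dates are illustrative and are not taken from the generated test file. One detail to keep in mind: 2024-02-29 plus 30 days is 2024-03-30, a Saturday, so with the weekend rule in the listing the expected due date is Monday 2024-04-01.

```python
from datetime import datetime, timedelta


def get_due_date(invoice_date: datetime, payment_terms: int = 30) -> datetime:
    """Copy of the billing.py function shown earlier."""
    due = invoice_date + timedelta(days=payment_terms)
    if due.weekday() == 5:      # Saturday -> following Monday
        due += timedelta(days=2)
    elif due.weekday() == 6:    # Sunday -> following Monday
        due += timedelta(days=1)
    return due


def calculate_late_fee(due_date, payment_date, amount, daily_rate=0.001):
    """Copy of the billing.py function shown earlier."""
    if payment_date <= due_date:
        return 0.0
    days_overdue = (payment_date - due_date).days
    return round(days_overdue * daily_rate * amount, 2)


def test_due_date_leap_day_rolls_past_weekend():
    # 2024-02-29 + 30 days = Saturday 2024-03-30, so the due date moves to Monday.
    assert get_due_date(datetime(2024, 2, 29)) == datetime(2024, 4, 1)


def test_due_date_crosses_year():
    # 2025-12-28 + 30 days = Tuesday 2026-01-27, no weekend adjustment.
    assert get_due_date(datetime(2025, 12, 28)) == datetime(2026, 1, 27)


def test_late_fee_early_payment_is_free():
    due = datetime(2025, 3, 10)
    assert calculate_late_fee(due, due - timedelta(days=3), 1000.0) == 0.0


def test_late_fee_zero_rate():
    due = datetime(2025, 3, 10)
    assert calculate_late_fee(due, due + timedelta(days=10), 1000.0, daily_rate=0) == 0.0


def test_late_fee_ten_days():
    due = datetime(2025, 3, 10)
    # 10 days × 0.001 × 1000 = 10.0
    assert calculate_late_fee(due, due + timedelta(days=10), 1000.0) == 10.0
```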

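Round two: filling in edge cases, 73% coverage

Claude Code started adding edge-case tests. In this round it iterated several times on its own: write tests, run coverage, analyze the uncovered lines, add more: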

→ Adding edge case tests:
  [+] test_due_date_feb29_leap: 2024-02-29 → 2024-03-30
  [+] test_due_date_year_cross: 2025-12-28 → 2026-01-27
  [+] test_late_fee_zero_rate: rate is 0, fee is 0
  [+] test_late_fee_rounding: fee rounded to the cent
  [+] test_discount_platinum: platinum tier, 15%
  [+] test_discount_capped: tier + first order exceeds 20%, capped
  [+] test_discount_large_order: >=10000 gets an extra 3%
  [+] test_consolidate_paid_filter: paid invoices are excluded
  [+] test_consolidate_missing_date: one invoice has no date field

→ Running: pytest --cov=billing --cov-report=term
  Name         Stmts   Miss   Cover
  billing.py     52     14     73%

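This round added 9 tests and coverage jumped from 28% to 73%. One entry in that log deserves a second look during review: 2024-02-29 plus 30 days is 2024-03-30, a Saturday, so get_due_date as listed returns Monday 2024-04-01. Most of the lines that had been uncovered were variations of stacked discounts: tier × first order or not × large order or not, which makes for a lot of permutations. Claude Code covered them in one go with pytest.mark.parametrize: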

import pytest

from billing import calculate_discount


@pytest.mark.parametrize('tier,first_order,amount,expected', [
    ('standard', False, 5000, 5000.00),    # no discount
    ('silver', False, 5000, 4750.00),      # 5%
    ('gold', False, 5000, 4500.00),        # 10%
    ('platinum', False, 5000, 4250.00),    # 15%
    ('standard', True, 5000, 4750.00),     # first order, 5%
    ('gold', True, 5000, 4250.00),         # 10% + 5% = 15%
    ('platinum', True, 5000, 4000.00),     # 15% + 5% = 20% (exactly at the cap)
    ('platinum', True, 10000, 7700.00),    # 15% + 5% capped at 20%, + 3% large order = 23%
])
def test_calculate_discount_combos(tier, first_order, amount, expected):
    result = calculate_discount(amount, tier, first_order)
    assert result == expected

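Look at the last case: 15% + 5% = 20% is held at the cap, but the extra 3% for large orders is added after the cap, so the effective discount is 23%. Expected value: 10000 - 2300 = 7700.

This test also surfaced a business question: should the 3% large-order bonus be able to bypass the cap? Is that a bug or intentional? The AI can't answer that, but the test put it on the table.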
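
To make the two readings concrete, here is a small sketch that computes the price under each ordering. discount_rate_bonus_after_cap follows the order of operations in the billing.py listing; discount_rate_cap_last and price_after are hypothetical names introduced here, and the cap-last variant is only one possible answer to the product question, not the author's fix.

```python
TIER_RATES = {'standard': 0, 'silver': 0.05, 'gold': 0.10, 'platinum': 0.15}


def discount_rate_bonus_after_cap(tier: str, is_first_order: bool, amount: float) -> float:
    """Same order as billing.py: cap at 20% first, then add the large-order 3%."""
    rate = TIER_RATES.get(tier, 0)
    if is_first_order:
        rate += 0.05
    rate = min(rate, 0.20)
    if amount >= 10000:
        rate += 0.03
    return rate


def discount_rate_cap_last(tier: str, is_first_order: bool, amount: float) -> float:
    """Hypothetical alternative: add every bonus, then cap the total at 20%."""
    rate = TIER_RATES.get(tier, 0)
    if is_first_order:
        rate += 0.05
    if amount >= 10000:
        rate += 0.03
    return min(rate, 0.20)


def price_after(rate: float, amount: float) -> float:
    """Apply a discount rate the same way billing.py does (round the discount to cents)."""
    return amount - round(amount * rate, 2)


current = price_after(discount_rate_bonus_after_cap('platinum', True, 10000), 10000)
capped = price_after(discount_rate_cap_last('platinum', True, 10000), 10000)
# current is 7700.0 (23% off); capped is 8000.0 (20% off)
```

The 300 difference on a single 10000 order is the size of the decision the test is asking someone to make.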

Round three: the AI flags what it should not be testing

With coverage at 73%, I told Claude Code to keep going. It came back with this analysis:

→ Remaining uncovered lines (billing.py: 14 lines):

  1. consolidate_invoices — 7 lines
     - date comparison when dates are mixed types (str vs datetime)
     - customer with mixed status invoices (some paid, some unpaid)
     → Need real-world invoice data to cover

  2. late fee — 4 lines
     - payment_date exactly equals due_date (boundary)
     - amount = 0 (zero-value invoice)
     → These are low-priority edge cases, can add

  3. discount — 3 lines
     - amount exactly 10000.00 (boundary)
     - customer_tier not in standard/silver/gold/platinum
     → These could indicate bugs in calling code, not billing.py itself

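Interesting. Claude Code sorted the remaining 27% by priority on its own: low-priority boundary values (it will add them, but says there's no rush), and questions it isn't in a position to decide, such as "other code calls billing with an invalid tier argument; that problem isn't in billing.py".

That judgment is sound. If calculate_discount receives a tier it doesn't recognize, it applies a 0% discount and returns the full amount. That may be the intended behavior (silent degradation), or it may be a bug (it should raise an error). The AI doesn't know the product intent, so it can't make that call.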
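
The two readings of an unknown tier can be written down as two small lookups. This is a sketch, not code from billing.py: tier_rate_lenient mirrors the .get(customer_tier, 0) lookup in the listing, while tier_rate_strict is a hypothetical alternative that rejects unknown tiers. Both function names are assumptions made for this example.

```python
TIER_RATES = {'standard': 0, 'silver': 0.05, 'gold': 0.10, 'platinum': 0.15}


def tier_rate_lenient(tier: str) -> float:
    """Current behaviour in the listing: unknown tiers silently get 0%."""
    return TIER_RATES.get(tier, 0)


def tier_rate_strict(tier: str) -> float:
    """Hypothetical alternative: treat an unknown tier as a caller error."""
    try:
        return TIER_RATES[tier]
    except KeyError:
        raise ValueError(f"unknown customer tier: {tier!r}") from None


def strict_rejects(tier: str) -> bool:
    """Return True if the strict lookup raises ValueError for this tier."""
    try:
        tier_rate_strict(tier)
    except ValueError:
        return True
    return False
```

Whichever behavior the product owner picks, a one-line test on 'diamond' then pins it down, and the line stops being "uncovered for a reason".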

Final result

After three rounds of iteration:

→ Final: pytest --cov=billing --cov-report=term
  Name         Stmts   Miss   Cover
  billing.py     52     11     78%

  Test file: tests/test_billing.py — 156 lines
  26 test functions, including 4 parametrized combos

→ What was covered:
  ✓ All happy paths for 4 functions
  ✓ Weekend handling, month/year transitions
  ✓ All discount tiers and their combinations
  ✓ Edge cases: zero, negative, rounding, boundary values
  ✓ Empty/null handling for consolidate_invoices

→ What wasn't covered (and why):
  ✗ 11 lines — ambiguous business logic that needs product decision
  ✗ e.g., "What should happen when tier is 'diamond'?" —
    current code silently gives 0% discount. Intentional? Bug?

From 0 to 78% coverage, plus a clear list of what isn't covered, which tells you where a product manager needs to make the call.
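
For readers who want to check the Cover column against Stmts and Miss, the arithmetic is a one-liner. The sketch assumes plain statement coverage without branch measurement; cover_percent is a name made up here, and the (Stmts, Miss) pairs are copied from the three reports above. The 28%, 73% and 78% in the logs match these ratios with the fractional part dropped.

```python
def cover_percent(stmts: int, miss: int) -> float:
    """Share of executable statements that ran at least once, in percent."""
    return (stmts - miss) / stmts * 100


# (Stmts, Miss) pairs copied from the three coverage reports in this post.
rounds = {'round 1': (52, 37), 'round 2': (52, 14), 'final': (52, 11)}
percentages = {name: cover_percent(s, m) for name, (s, m) in rounds.items()}

for name, pct in percentages.items():
    print(f"{name}: {pct:.1f}%")
```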

The whole thing took 40 minutes: 30 spent watching Claude Code loop through write tests → run coverage → analyze → add tests, and 10 spent reviewing the results.

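What AI does well when writing tests, and what it can't do

After running this workflow a few times, the pattern is clear:

What AI does well:

Happy-path tests. It can tell what a function needs tested from its signature and implementation
Boundary values: negatives, zero, empty lists, None, very long strings
Combination tests: iterating over permutations with parametrize, far faster than writing them by hand
Coverage analysis: after a run it tells you what wasn't tested and why

What AI can't do:

Tests that hinge on a product decision: "is the discount cap 20% or 30%?"
Integration tests that involve external systems: payment gateway callbacks, message queue ordering
Performance tests: expected latency in milliseconds or target QPS is not a function-level question

What humans should do:

Decide which tests have the highest priority
Review the expected values in AI-generated tests: are the numbers in each assert correct for the business?
Write the skeleton of the integration tests: AI can fill in the details, but the architecture is yours to decide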

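Three practical tips

After a few months of doing this, three prompting techniques have proven especially useful:

1. Have the AI run coverage first, then analyze.

Don't say "write more tests". Have it run pytest --cov, read the report itself, and decide what to add. It is often better than you at spotting missing code paths.

2. Tell it the rules of the test framework.

Spell the rules out. Not just "use pytest", but "use pytest's tmp_path fixture for temporary files" and "use parametrize to cover combinations". Concrete constraints like these noticeably improve test quality.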
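
For reference, this is roughly what a test written under the tmp_path rule looks like. load_invoices is a hypothetical helper that does not exist in billing.py; it is here only so the test has something file-based to exercise. tmp_path itself is a built-in pytest fixture that hands each test a fresh pathlib.Path directory, so no import is needed to use it.

```python
import json
from pathlib import Path


def load_invoices(path: Path) -> list[dict]:
    """Hypothetical helper (not part of billing.py): read invoices from a JSON file."""
    return json.loads(path.read_text(encoding='utf-8'))


def test_load_invoices_roundtrip(tmp_path):
    # pytest injects tmp_path: a unique temporary directory for this test.
    invoice_file = tmp_path / 'invoices.json'
    invoices = [{'customer_id': 'c1', 'amount': 120.5, 'status': 'unpaid'}]
    invoice_file.write_text(json.dumps(invoices), encoding='utf-8')

    assert load_invoices(invoice_file) == invoices
```

Naming the fixture in the prompt steers the model away from writing into the working directory or hard-coding /tmp paths.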

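3. Make it explain why it isn't testing something.

If a function or branch is left uncovered, have the AI tell you why. Sometimes it doesn't know the business intent, and sometimes the code itself has a problem. The code-quality issues an AI points out are often more objective than the ones you find yourself.

With AI helping write tests, can building a project from scratch be handed to AI as well? In the next post, Claude Code starts from an empty directory and builds a full-stack project: frontend, backend, database, Docker configuration, the complete workflow.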

Questions are welcome in the comments.
