基于大模型的数据库慢查询自动诊断与修复建议：从执行计划到索引优化的全链路

国医中兴

177人浏览 · 2026-06-11 10:25:51

国医中兴 · 2026-06-11 10:25:51 发布

基于大模型的数据库慢查询自动诊断与修复建议：从执行计划到索引优化的全链路

cover

一、慢查询的"黑箱诊断"：DBA 排障的效率瓶颈

数据库慢查询是生产环境最常见的性能问题。一个查询从"正常"变为"慢"，可能的原因包括：索引缺失、统计信息过期、数据倾斜、锁等待、缓冲池不足。传统的排障流程是：查看慢查询日志 → 分析执行计划 → 检查索引 → 调整参数。这个流程依赖 DBA 的经验，且每个步骤都需要手动执行。

大模型辅助的慢查询诊断，核心思路是：将查询 SQL、执行计划、表结构和索引信息输入大模型，自动分析性能瓶颈并生成修复建议。大模型可以理解执行计划中"全表扫描"与"索引扫描"的差异，判断"扫描行数远大于返回行数"意味着索引选择性差，并给出具体的索引创建语句。

二、AI 慢查询诊断的架构设计

flowchart TD
    A[慢查询日志] --> B[提取 SQL + 执行时间]
    B --> C[获取 EXPLAIN 执行计划]
    C --> D[收集表结构 + 索引信息]
    D --> E[大模型分析: 执行计划 + 上下文]
    E --> F{瓶颈类型}
    F -->|索引缺失| G[建议创建索引]
    F -->|统计信息过期| H[建议 ANALYZE TABLE]
    F -->|数据倾斜| I[建议分区或强制索引]
    F -->|锁等待| J[建议拆分事务或降低隔离级别]
    G & H & I & J --> K[生成修复 SQL + 风险评估]
    K --> L[人工审核后执行]

三、AI 慢查询诊断的代码实现

3.1 慢查询信息采集

import pymysql
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class SlowQueryInfo:
    sql_text: str
    execution_time_ms: float
    rows_examined: int
    rows_sent: int
    explain_rows: List[dict]
    table_schema: dict  # 表结构
    index_info: dict    # 索引信息

class SlowQueryCollector:
    """慢查询信息采集器：从 MySQL 获取完整的诊断上下文"""

    def collect(self, sql: str, schema: str) -> SlowQueryInfo:
        conn = pymysql.connect(host='localhost', db=schema)

        with conn.cursor() as cursor:
            # 1. 获取执行计划
            cursor.execute(f"EXPLAIN FORMAT=JSON {sql}")
            explain_json = cursor.fetchone()[0]

            # 2. 获取涉及的表结构
            tables = self._extract_tables(sql)
            table_schema = {}
            for table in tables:
                cursor.execute(f"SHOW CREATE TABLE {table}")
                table_schema[table] = cursor.fetchone()[1]

            # 3. 获取索引信息
            index_info = {}
            for table in tables:
                cursor.execute(f"SHOW INDEX FROM {table}")
                indexes = cursor.fetchall()
                index_info[table] = [self._parse_index(idx) for idx in indexes]

            # 4. 获取表统计信息
            cursor.execute(
                f"SELECT TABLE_NAME, TABLE_ROWS, DATA_LENGTH, INDEX_LENGTH "
                f"FROM information_schema.TABLES WHERE TABLE_SCHEMA = %s",
                (schema,)
            )

        return SlowQueryInfo(
            sql_text=sql,
            execution_time_ms=0,
            rows_examined=0,
            rows_sent=0,
            explain_rows=explain_json,
            table_schema=table_schema,
            index_info=index_info,
        )

3.2 AI 诊断引擎

class SlowQueryDiagnoser:
    """大模型驱动的慢查询诊断引擎"""

    def diagnose(self, query_info: SlowQueryInfo) -> dict:
        prompt = self._build_diagnosis_prompt(query_info)
        response = self.llm_client.chat(prompt)
        return self._parse_diagnosis(response)

    def _build_diagnosis_prompt(self, info: SlowQueryInfo) -> str:
        return f"""
你是数据库性能优化专家。请分析以下慢查询并给出修复建议。

## 查询 SQL
```sql
{info.sql_text}

执行计划

{info.explain_rows}

表结构与索引

{self._format_schema_info(info.table_schema, info.index_info)}

分析要求

识别执行计划中的性能瓶颈（全表扫描、临时表、文件排序等）
判断瓶颈根因（索引缺失、统计信息过期、数据倾斜等）
给出具体的修复建议（包含 SQL 语句）
评估修复风险（索引创建对写入的影响、锁等待风险等）

输出格式

bottleneck: 瓶颈描述
root_cause: 根因分析
fix_suggestion: 修复建议（包含 SQL）
risk_assessment: 风险评估
priority: HIGH/MEDIUM/LOW
"""

def _format_schema_info(self, table_schema: dict, index_info: dict) -> str:
result = []
for table, ddl in table_schema.items():
result.append(f"### {table}\nsql\n{ddl}\n")
if table in index_info:
indexes = index_info[table]
result.append(f"索引: {', '.join(idx['name'] for idx in indexes)}")
return '\n'.join(result)


### 3.3 索引建议的风险评估

```python
class IndexRiskAssessor:
    """索引创建的风险评估器"""

    def assess(self, table: str, index_columns: List[str],
               table_rows: int, write_qps: int) -> dict:
        # 1. 写入性能影响：每个索引增加约 5-10% 的写入开销
        write_impact = len(index_columns) * 0.07 * write_qps

        # 2. 存储开销：索引大小约为数据量的 20-40%
        estimated_index_size_mb = table_rows * len(index_columns) * 50 / (1024 * 1024)

        # 3. 在线创建风险：大表创建索引可能锁表
        online_ddl_safe = table_rows < 10_000_000

        return {
            'write_impact_qps': write_impact,
            'estimated_index_size_mb': round(estimated_index_size_mb, 2),
            'online_ddl_safe': online_ddl_safe,
            'recommendation': (
                '可直接在线创建' if online_ddl_safe
                else '建议使用 pt-online-schema-change 或 gh-ost 在线创建'
            ),
        }