A Complete Guide to Using LiteLLM with NVIDIA NIM (VSCode Claude Code Setup, Pitfalls, and Fixes)

This article is based on the author's own end-to-end testing. Once everything worked, the write-up was drafted with ChatGPT's assistance, then reviewed and revised by the author before publication.

If you want to expose NVIDIA NIM models (GLM / DeepSeek / Kimi, etc.) to VSCode, the OpenAI SDK, and the Claude Code extension through a single interface, LiteLLM is currently a very efficient way to do it. (At the time of writing, NIM can be used free of charge for a year.)


1. Overall Architecture

VSCode Claude Code / Continue / OpenAI SDK
                ↓
           LiteLLM Proxy
                ↓
      NVIDIA NIM / OpenAI / Claude

What this setup gives you:

1. A unified interface
2. A single API key
3. Easy switching between models (see the sketch below)
4. Compatibility with VSCode extensions
5. A building block for an internal enterprise model gateway
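
A minimal sketch of the unified-interface idea, assuming the proxy from Section 3 is already running on localhost:4000 and the model aliases from Section 5 are configured: one client, one key, and switching backends is just a matter of changing the model name.

from openai import OpenAI

# One client and one key for every backend; LiteLLM routes by model alias.
client = OpenAI(api_key="sk-1234", base_url="http://localhost:4000/v1")

for model in ["nvidia-glm4.7", "nvidia-deepseek-v3.2", "nvidia-kimi-k2.5"]:
    resp = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": "Say hi in one sentence."}],
    )
    print(model, "->", resp.choices[0].message.content)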

2. Example NVIDIA NIM Models (there are many more; browse build.nvidia.com for the full catalog)

z-ai/glm5.1
z-ai/glm5
z-ai/glm4.7
deepseek-ai/deepseek-v3.2
moonshotai/kimi-k2.5

Endpoint:

https://integrate.api.nvidia.com/v1
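
NIM itself speaks the OpenAI protocol at this endpoint, so you can sanity-check your key before LiteLLM enters the picture. A minimal sketch, assuming NVIDIA_NIM_API_KEY is exported and using one of the model IDs above:

import os
from openai import OpenAI

# Call NIM directly (no LiteLLM) to confirm the key and model ID work.
client = OpenAI(
    api_key=os.environ["NVIDIA_NIM_API_KEY"],
    base_url="https://integrate.api.nvidia.com/v1",
)
resp = client.chat.completions.create(
    model="z-ai/glm4.7",  # raw NIM model ID; the nvidia_nim/ prefix is LiteLLM-only (see Section 6)
    messages=[{"role": "user", "content": "ping"}],
)
print(resp.choices[0].message.content)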

3. Deploying LiteLLM with Docker Compose

The basic steps follow the official docs: https://docs.litellm.ai/docs/proxy/docker_quick_start

docker-compose.yml (the one the author actually uses):

services:
  litellm:
    build:
      context: .
      args:
        target: runtime
    image: ghcr.io/berriai/litellm:main-stable
    #########################################
    # Start the proxy with a config.yaml file (mounted below) ##
    volumes:
      - ./config.yaml:/app/config.yaml
    command:
      - "--config=/app/config.yaml"
    ##############################################
    ports:
      - "4000:4000" # Map the container port to the host, change the host port if necessary
    environment:
      DATABASE_URL: "postgresql://llmproxy:dbpassword9090@db:5432/litellm"
      STORE_MODEL_IN_DB: "True" # allows adding models to proxy via UI
    env_file:
      - .env # Load local .env file
    depends_on:
      - db  # Indicates that this service depends on the 'db' service, ensuring 'db' starts first
    healthcheck:  # Defines the health check configuration for the container
      test:
        - CMD-SHELL
        - python3 -c "import urllib.request; urllib.request.urlopen('http://localhost:4000/health/liveliness')"  # Command to execute for health check
      interval: 30s  # Perform health check every 30 seconds
      timeout: 10s   # Health check command times out after 10 seconds
      retries: 3     # Retry up to 3 times if health check fails
      start_period: 40s  # Wait 40 seconds after container start before beginning health checks

  db:
    image: postgres:16
    restart: always
    container_name: litellm_db
    environment:
      POSTGRES_DB: litellm
      POSTGRES_USER: llmproxy
      POSTGRES_PASSWORD: dbpassword9090
    ports:
      - "5432:5432"
    volumes:
      - postgres_data:/var/lib/postgresql/data # Persists Postgres data across container restarts
    healthcheck:
      test: ["CMD-SHELL", "pg_isready -d litellm -U llmproxy"]
      interval: 1s
      timeout: 5s
      retries: 10

  prometheus:
    image: prom/prometheus
    volumes:
      - prometheus_data:/prometheus
      - ./prometheus.yml:/etc/prometheus/prometheus.yml
    ports:
      - "9090:9090"
    command:
      - "--config.file=/etc/prometheus/prometheus.yml"
      - "--storage.tsdb.path=/prometheus"
      - "--storage.tsdb.retention.time=15d"
    restart: always

volumes:
  prometheus_data:
    driver: local
  postgres_data:
    name: litellm_postgres_data # Named volume for Postgres data persistence
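
After docker compose up -d, you can verify the proxy is alive with the same liveliness endpoint the compose healthcheck polls. A minimal sketch, assuming the proxy is reachable on localhost:

import urllib.request

# Same probe the compose healthcheck runs inside the container.
with urllib.request.urlopen("http://localhost:4000/health/liveliness") as r:
    print(r.status, r.read().decode())  # expect 200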


4. .env Configuration (the author's; just replace NVIDIA_NIM_API_KEY)

NVIDIA_NIM_API_KEY is a name you choose yourself; it only has to match what config.yaml references.

LITELLM_MASTER_KEY=sk-1234
LITELLM_SALT_KEY=sk-1234
NVIDIA_NIM_API_KEY=nvapi-xxxxxx

5. LiteLLM config.yaml (recommended, stable)

Note the drop_params and additional_drop_params settings. The official sample config omits them, but they are required with NIM: Claude Code sends an output_config field in its requests, LiteLLM passes it through, and NIM does not support it. With these two settings in place, output_config is stripped before the request reaches NIM; without them, calls to the NIM endpoint fail with an error saying output_config is unsupported.
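
The same two switches also exist at the LiteLLM SDK level (per the LiteLLM drop-params docs); a minimal sketch of what the proxy config below does, assuming the litellm package is installed and NVIDIA_NIM_API_KEY is exported:

import litellm

# drop_params strips OpenAI-style params the target provider does not accept;
# additional_drop_params extends that with arbitrary field names like output_config.
resp = litellm.completion(
    model="nvidia_nim/z-ai/glm4.7",
    messages=[{"role": "user", "content": "ping"}],
    drop_params=True,
    additional_drop_params=["output_config"],
)
print(resp.choices[0].message.content)

The full config.yaml: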

model_list:
  - model_name: nvidia-glm-5.1
    litellm_params: # all params accepted by litellm.completion() - https://docs.litellm.ai/docs/completion/input
      model: nvidia_nim/z-ai/glm5.1 ### MODEL NAME sent to `litellm.completion()` ###
      api_base: https://integrate.api.nvidia.com/v1
      api_key: os.environ/NVIDIA_NIM_API_KEY # does os.getenv("NVIDIA_NIM_API_KEY")
      rpm: 6      # [OPTIONAL] Rate limit for this deployment: in requests per minute (rpm)
      drop_params: true
      additional_drop_params: ["output_config"]
  - model_name: nvidia-glm5
    litellm_params:
      model: nvidia_nim/z-ai/glm5
      api_base: https://integrate.api.nvidia.com/v1
      api_key: os.environ/NVIDIA_NIM_API_KEY
      rpm: 6
      drop_params: true
      additional_drop_params: ["output_config"]
  - model_name: nvidia-glm4.7
    litellm_params:
      model: nvidia_nim/z-ai/glm4.7
      api_base: https://integrate.api.nvidia.com/v1
      api_key: os.environ/NVIDIA_NIM_API_KEY
      rpm: 6
      drop_params: true
      additional_drop_params: ["output_config"]
  - model_name: nvidia-deepseek-v3.2
    litellm_params:
      model: nvidia_nim/deepseek-ai/deepseek-v3.2
      api_base: https://integrate.api.nvidia.com/v1
      api_key: os.environ/NVIDIA_NIM_API_KEY
      rpm: 6
      drop_params: true
      additional_drop_params: ["output_config"]
  - model_name: nvidia-kimi-k2.5
    litellm_params:
      model: nvidia_nim/moonshotai/kimi-k2.5
      api_base: https://integrate.api.nvidia.com/v1
      api_key: os.environ/NVIDIA_NIM_API_KEY
      rpm: 6
      drop_params: true
      additional_drop_params: ["output_config"]

  # Use this if you want to make requests to `claude-3-haiku-20240307`,`claude-3-opus-20240229`,`claude-2.1` without defining them on the config.yaml
  # Default models
  # Works for ALL Providers and needs the default provider credentials in .env
  # - model_name: "*" 
  #  litellm_params:
  #    model: "*"

general_settings: 
  master_key: sk-1234 # [OPTIONAL] Only use this if you to require all calls to contain this key (Authorization: Bearer sk-1234)
  database_url: "postgresql://llmproxy:dbpassword9090@db:5432/litellm"

litellm_settings:
  num_retries: 2 # retry each call up to 2 times per model_name (e.g. zephyr-beta)
  request_timeout: 30 # raise a Timeout error if a call takes longer than 30s. Sets litellm.request_timeout
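
Once the proxy is restarted with this config, you can confirm all the aliases registered through the OpenAI-compatible model listing route. A quick check, assuming localhost and the master key from above:

from openai import OpenAI

client = OpenAI(api_key="sk-1234", base_url="http://localhost:4000/v1")

# Should list the model_name aliases from config.yaml (nvidia-glm4.7, ...).
for m in client.models.list():
    print(m.id)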


6. Why model Must Carry the nvidia_nim/ Prefix

LiteLLM uses the prefix to resolve which provider to route to; see https://docs.litellm.ai/docs/providers/nvidia_nim

Wrong:

model: z-ai/glm4.7

Error:

LLM Provider NOT provided

Correct:

model: nvidia_nim/z-ai/glm4.7
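
You can reproduce the error without any network access, since provider resolution fails before a request is ever sent. A minimal sketch using the litellm SDK:

import litellm

try:
    # No provider prefix, so LiteLLM cannot tell which backend "z-ai/glm4.7" belongs to;
    # this fails during provider resolution, before any network call is made.
    litellm.completion(model="z-ai/glm4.7", messages=[{"role": "user", "content": "ping"}])
except Exception as e:
    print(e)  # ... LLM Provider NOT provided ...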

7. Configuring the VSCode Claude Code Extension (the Important Part)

If you want the Claude Code extension to call LiteLLM, and LiteLLM to call NVIDIA NIM in turn, add the following to your VSCode settings.json:

{
  "python.defaultInterpreterPath": "/bin/python",
  "http.systemCertificates": true,
  "claudeCode.preferredLocation": "panel",
  "claudeCode.environmentVariables": [
    {
      "name": "ANTHROPIC_BASE_URL",
      "value": "http://10.11.11.124:4000"
    },
    {
      "name": "ANTHROPIC_AUTH_TOKEN",
      "value": "sk-你的LiteLLM密钥"
    },
    {
      "name": "ANTHROPIC_DEFAULT_OPUS_MODEL",
      "value": "nvidia-glm4.7"
    },
    {
      "name": "ANTHROPIC_DEFAULT_SONNET_MODEL",
      "value": "nvidia-glm4.7"
    },
    {
      "name": "ANTHROPIC_DEFAULT_HAIKU_MODEL",
      "value": "nvidia-glm4.7"
    },
    {
      "name": "CLAUDE_CODE_DISABLE_NONESSENTIAL_TRAFFIC",
      "value": "1"
    },
    {
      "name": "CLAUDE_CODE_DISABLE_EXPERIMENTAL_BETAS",
      "value": "1"
    }
  ]
}

API key: once LiteLLM is running, open http://10.11.11.124:4000/ui/ and create a key on the Virtual Keys page.
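
Before touching VSCode, you can exercise the exact path the extension will use: LiteLLM's docs describe an Anthropic-compatible /v1/messages route on the proxy. A minimal probe with the anthropic SDK (assuming it is installed) should already return GLM output:

import anthropic

# Point the Anthropic client at LiteLLM instead of api.anthropic.com,
# exactly what the extension does via ANTHROPIC_BASE_URL.
client = anthropic.Anthropic(
    base_url="http://10.11.11.124:4000",
    api_key="sk-<your-litellm-key>",
)
msg = client.messages.create(
    model="nvidia-glm4.7",
    max_tokens=256,
    messages=[{"role": "user", "content": "ping"}],
)
print(msg.content[0].text)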


8. What These Environment Variables Mean (Very Important)


1. ANTHROPIC_BASE_URL

The address the Claude Code extension sends its requests to:

http://<your-litellm-server>:4000

2. ANTHROPIC_AUTH_TOKEN

Your LiteLLM key (the master_key, or a virtual key created in the UI):

sk-xxxx

3. Default model mapping

OPUS   → nvidia-glm4.7
SONNET → nvidia-glm4.7
HAIKU  → nvidia-glm4.7

The Claude Code extension believes it is calling Claude; the requests actually go to GLM.


4. Disabling nonessential traffic

CLAUDE_CODE_DISABLE_NONESSENTIAL_TRAFFIC=1

Turns off telemetry and other nonessential requests.


5. Disabling experimental betas (strongly recommended)

CLAUDE_CODE_DISABLE_EXPERIMENTAL_BETAS=1

Prevents the extension from sending:

output_config
thinking
beta params

This is critical for NIM.


9. Common Problems


Problem 1: LLM Provider NOT provided

Cause:

model: z-ai/glm4.7

Fix:

model: nvidia_nim/z-ai/glm4.7

Problem 2: OPENAI_API_KEY missing

Cause:

This entry was enabled:

model_name: "*"

With a wildcard entry, LiteLLM falls back to OpenAI by default.

Fix: remove the wildcard model entry.


Problem 3: Unsupported parameter output_config

Cause:

The Claude Code extension speaks the Anthropic protocol and sends vendor-specific fields.

Fix:

drop_params: true
additional_drop_params:
  - output_config

Also set this in VSCode:

CLAUDE_CODE_DISABLE_EXPERIMENTAL_BETAS=1

Problem 4: LiteLLM UI error: crypto.randomUUID

Fix:

Upgrade LiteLLM:

docker pull ghcr.io/berriai/litellm:main-stable

10. Calling Through the OpenAI SDK

from openai import OpenAI

client = OpenAI(
    api_key="sk-1234",
    base_url="http://服务器IP:4000/v1"
)

resp = client.chat.completions.create(
    model="nvidia-glm4.7",
    messages=[{"role":"user","content":"你好"}]
)

print(resp.choices[0].message.content)
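
A streaming variant of the same call, using the client defined above, for chat-style incremental output:

stream = client.chat.completions.create(
    model="nvidia-glm4.7",
    messages=[{"role": "user", "content": "Hello"}],
    stream=True,
)
for chunk in stream:
    # Each chunk carries an incremental delta; guard against empty chunks.
    if chunk.choices and chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="", flush=True)
print()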

11. Production Recommendations


1. Rate limiting

rpm: 6

2. Unified multi-model management

glm4.7
glm5
deepseek-v3.2
kimi-k2.5

12. Personal Takeaways (Key Points)

If you are:

a VSCode user
who wants to use the Claude Code extension
but with Chinese / third-party models behind it

then the best setup is:

Claude Code 插件
      ↓
LiteLLM
      ↓
NVIDIA NIM(GLM)

There are only three key points:

1. Get the provider prefix right
2. Remove the wildcard model
3. Filter out output_config

13. Final Recommended Stack

VSCode Claude Code extension (frontend experience)
LiteLLM (protocol translation)
NVIDIA NIM (model capability)

14. If You Hit the Same Pitfalls

Feel free to leave a comment if you are stuck on:

LiteLLM configuration?
Claude extension parameters?
VSCode remote SSH?
Your NIM key?
