FAISS Vector Dimension Mismatch: From an AssertionError to Client-Side MRL Truncation

xujianfei2851 · Published 2026-05-05 23:02:26

Symptoms

While building a vector index with FAISS, I hit a seemingly simple AssertionError:

index.add_with_ids(vectors_np, vector_ids_np)

The error message:

File "...\faiss\class_wrappers.py", line 293, in replacement_add_with_ids
    assert d == self.d
AssertionError

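The meaning of this failed assertion is straightforward: the dimension of the vectors being added (d) differs from the dimension the index was created with (self.d).

Code background

The code at the time looked roughly like this: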

# Generate the embedding through an OpenAI-compatible API
completion = client.embeddings.create(
    model="text-embedding-qwen3-embedding-8b",
    input=text,
    dimensions=1024,  # request 1024 dimensions
    encoding_format="float"
)
vector = completion.data[0].embedding

# The FAISS index is declared with 1024 dimensions
dimension = 1024
index = faiss.IndexIDMap(faiss.IndexFlatL2(dimension))
index.add_with_ids(vectors_np, vector_ids_np)  # 💥 raises AssertionError

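Everything looks right: 1024 dimensions were requested, and the index was declared with 1024 dimensions. So why do the dimensions still not match?

Root cause analysis

1. The model actually returned 4096 dimensions

Printing vectors_np.shape showed that the actual dimension was 4096, not the requested 1024.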

vectors_np = np.array(vectors_list).astype('float32')
print(vectors_np.shape)  # Output: (4, 4096)
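The shape check above can be wrapped in a small guard that runs before add_with_ids, so that a mismatch produces a readable message instead of a bare AssertionError. This is only a sketch: check_dimension is a hypothetical helper name, not part of FAISS, and it assumes the vectors are already stacked into a 2-D NumPy array.

```python
import numpy as np


def check_dimension(vectors: np.ndarray, index_dim: int) -> None:
    """Raise a descriptive error if `vectors` does not fit an index of `index_dim`."""
    if vectors.ndim != 2:
        raise ValueError(
            f"expected a 2-D array of shape (n, d), got shape {vectors.shape}"
        )
    actual_dim = vectors.shape[1]
    if actual_dim != index_dim:
        raise ValueError(
            f"vector dimension {actual_dim} does not match index dimension {index_dim}"
        )


# Passes silently: 4 vectors of 1024 dimensions for a 1024-d index
check_dimension(np.zeros((4, 1024), dtype="float32"), 1024)
```

Calling check_dimension(vectors_np, index.d) right before index.add_with_ids would have reported "vector dimension 4096 does not match index dimension 1024" in the case described here.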

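2. Why was `dimensions=1024` ignored?

There are two layers to this:

Model level: the native output dimension of Qwen3-Embedding-8B is indeed 4096. The model was trained with Matryoshka Representation Learning (MRL), so its vectors can be truncated on the client to a smaller dimension (256, 512, 1024, and so on), but not every inference backend implements server-side truncation through the dimensions parameter of the OpenAI API.

Deployment level: the OpenAI-compatible layers of local inference tools such as LM Studio, Ollama, and vLLM often do not implement the dimensions parameter, depending on the tool and its version. In that case they return the model's raw output, and the parameter is silently ignored.

This differs from OpenAI's official text-embedding-3 series, where the truncation is done on the server before the response is returned.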
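Because the parameter is ignored silently, it may be worth probing the backend once at startup. The sketch below assumes a hypothetical embed callable that takes a text and a requested dimension and returns a list of floats; probe_dimensions is an illustrative name, not an API of the openai package or of any inference server.

```python
from typing import Callable, Sequence, Tuple


def probe_dimensions(
    embed: Callable[[str, int], Sequence[float]],
    requested_dim: int,
    probe_text: str = "dimension probe",
) -> Tuple[int, bool]:
    """Embed one probe text and report (actual_dim, whether the request was honored)."""
    vector = embed(probe_text, requested_dim)
    actual_dim = len(vector)
    return actual_dim, actual_dim == requested_dim


# Stand-in for a backend that ignores `dimensions` and always returns 4096 values
def backend_ignoring_dimensions(text: str, dim: int) -> Sequence[float]:
    return [0.0] * 4096


actual, honored = probe_dimensions(backend_ignoring_dimensions, 1024)
print(actual, honored)  # 4096 False
```

With the real client, embed could be a small wrapper such as lambda text, dim: client.embeddings.create(model=..., input=text, dimensions=dim).data[0].embedding. If honored comes back False, the client has to truncate the vectors itself or size the index to the actual dimension.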

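3. Model dimension reference

Model	Native dimension	MRL support	`dimensions` parameter honored
text-embedding-3-small (OpenAI)	1536	✅	✅ Server-side truncation
text-embedding-3-large (OpenAI)	3072	✅	✅ Server-side truncation
Qwen3-Embedding-8B	4096	✅	⚠️ Depends on the inference backend
nomic-embed-text	768	❌ (v1; v1.5 adds MRL)	❌
bge-m3	1024	❌	❌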

Solutions

Option 1: Read the dimension dynamically (fallback)

The simplest fix is to let the FAISS index dimension follow the actual vector dimension:

vectors_np = np.array(vectors_list).astype('float32')
dimension = vectors_np.shape[1]  # read the actual dimension at runtime
print(f"Actual vector dimension: {dimension}")  # 4096

index = faiss.IndexIDMap(faiss.IndexFlatL2(dimension))

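This fixes the problem immediately, but with a large model dimension (such as 4096) it wastes storage and compute: a flat index stores every component as a float32 and scans every component on each search, so 4096 dimensions cost four times as much as 1024.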
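To put a number on that cost, the raw vector storage of a flat index can be estimated as n × d × 4 bytes. The sketch below uses a hypothetical corpus of one million vectors and ignores the 8 bytes per vector that IndexIDMap adds for the ids.

```python
def flat_index_bytes(num_vectors: int, dimension: int, bytes_per_component: int = 4) -> int:
    """Raw vector storage of a flat float32 index: n * d * 4 bytes."""
    return num_vectors * dimension * bytes_per_component


num_vectors = 1_000_000  # hypothetical corpus size, for illustration only
full_bytes = flat_index_bytes(num_vectors, 4096)
truncated_bytes = flat_index_bytes(num_vectors, 1024)

print(f"4096-d: {full_bytes / 1e9:.3f} GB")       # 4096-d: 16.384 GB
print(f"1024-d: {truncated_bytes / 1e9:.3f} GB")  # 1024-d: 4.096 GB
```

The brute-force search cost of a flat index scales with the same n × d product, so the factor of four applies to query time as well.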

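Option 2: Client-side MRL truncation (recommended)

Since the model supports MRL, we can truncate the vectors on the client, which lowers the dimension while retaining most of the semantic quality.

Core logic: take the first N dimensions, then L2-normalize again.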

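import numpy as np

TARGET_DIMENSION = 1024


def truncate_and_normalize(vector, target_dim=TARGET_DIMENSION):
    """
    MRL truncation: keep the first target_dim components of a single
    vector, then L2-normalize the result again.
    Only meaningful for models trained with Matryoshka Representation Learning.
    """
    arr = np.asarray(vector, dtype='float32')
    if arr.ndim != 1 or arr.shape[0] < target_dim:
        # Fail early instead of handing FAISS a vector of the wrong size
        raise ValueError(
            f"expected a 1-D vector with at least {target_dim} components, got shape {arr.shape}"
        )
    arr = arr[:target_dim]
    # L2 normalization: makes the inner product equal to cosine similarity
    norm = np.linalg.norm(arr)
    if norm > 0:
        arr = arr / norm
    return arr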

Usage:

# Truncate before adding documents to the index
vector = truncate_and_normalize(completion.data[0].embedding)
vectors_list.append(vector)

# Truncate the query the same way
query_vector = np.array([truncate_and_normalize(query_embedding)]).astype('float32')
distances, ids = index.search(query_vector, k=2)
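When many embeddings are processed at once, the same truncation can be applied to a whole matrix instead of looping over vectors. The following is a minimal sketch under the assumption that the embeddings are stacked as an (n, d) array; truncate_and_normalize_batch is a hypothetical helper name, not something from FAISS or NumPy.

```python
import numpy as np


def truncate_and_normalize_batch(vectors, target_dim=1024):
    """Keep the first target_dim columns of an (n, d) array and L2-normalize each row."""
    matrix = np.asarray(vectors, dtype="float32")
    if matrix.ndim != 2:
        raise ValueError(f"expected shape (n, d), got {matrix.shape}")
    if matrix.shape[1] < target_dim:
        raise ValueError(
            f"vectors have {matrix.shape[1]} dimensions, fewer than target_dim={target_dim}"
        )
    truncated = matrix[:, :target_dim]
    norms = np.linalg.norm(truncated, axis=1, keepdims=True)
    norms[norms == 0] = 1.0  # all-zero rows stay all-zero instead of dividing by zero
    # FAISS expects C-contiguous float32 input
    return np.ascontiguousarray(truncated / norms)


batch = truncate_and_normalize_batch(np.ones((3, 4096)), target_dim=1024)
print(batch.shape)  # (3, 1024)
```

The result can be passed directly to index.add_with_ids or index.search, since it is already a contiguous float32 array of the target dimension.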

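Why is normalization required?

After MRL truncation the vector is no longer unit length: the dropped components carried part of the norm, and each vector loses a different share. Raw L2 distances and inner products then mix direction with length, which distorts the similarity ranking. L2 normalization rescales every vector to unit length, so the inner product equals cosine similarity, and the squared L2 distance that IndexFlatL2 computes equals 2 - 2·cos, which produces the same ranking as cosine similarity.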
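The effect can be checked numerically. The sketch below uses random unit vectors as a stand-in for real embeddings (an assumption for illustration only; real MRL embeddings concentrate more information in the leading components) and verifies that, after renormalization, the squared L2 distance equals 2 - 2·cos.

```python
import numpy as np

rng = np.random.default_rng(0)

# Two random 4096-d unit vectors standing in for full-size embeddings
full = rng.standard_normal((2, 4096)).astype("float32")
full /= np.linalg.norm(full, axis=1, keepdims=True)

# Truncating to the first 1024 components shrinks the norm below 1
prefix = full[:, :1024]
prefix_norms = np.linalg.norm(prefix, axis=1)

# Renormalize, then compare squared L2 distance with cosine similarity
unit = prefix / prefix_norms[:, None]
a, b = unit
squared_l2 = float(np.sum((a - b) ** 2))
cosine = float(np.dot(a, b))

print(prefix_norms)                # both values are below 1.0
print(squared_l2, 2 - 2 * cosine)  # the two numbers agree up to float32 rounding
```

Because squared L2 distance is a decreasing function of cosine similarity for unit vectors, sorting by ascending L2 distance and sorting by descending cosine similarity return the neighbors in the same order.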

Complete code example

import sys
import numpy as np
import faiss
from openai import OpenAI

# Force UTF-8 output so non-ASCII text is not garbled in Windows terminals
sys.stdout.reconfigure(encoding='utf-8')

client = OpenAI(api_key="lm-studio", base_url="http://localhost:1234/v1")

TARGET_DIMENSION = 1024


def truncate_and_normalize(vector, target_dim=TARGET_DIMENSION):
    arr = np.asarray(vector, dtype='float32')
    if arr.ndim != 1 or arr.shape[0] < target_dim:
        # Fail early instead of handing FAISS a vector of the wrong size
        raise ValueError(
            f"expected a 1-D vector with at least {target_dim} components, got shape {arr.shape}"
        )
    arr = arr[:target_dim]
    norm = np.linalg.norm(arr)
    if norm > 0:
        arr = arr / norm
    return arr


# Sample documents (Chinese text, kept as in the original data)
documents = [
    {"id": "doc1", "text": "迪士尼乐园门票原则上不予退换..."},
    {"id": "doc2", "text": "购买奇妙年卡可享受多次入园..."},
]

vectors_list = []
for doc in documents:
    # No `dimensions` argument: the backend ignores it anyway
    completion = client.embeddings.create(
        model="text-embedding-qwen3-embedding-8b",
        input=doc["text"],
        encoding_format="float"
    )
    # Client-side truncation + normalization
    vector = truncate_and_normalize(completion.data[0].embedding)
    vectors_list.append(vector)

# The FAISS index uses the truncated dimension
vectors_np = np.array(vectors_list).astype('float32')
index = faiss.IndexIDMap(faiss.IndexFlatL2(TARGET_DIMENSION))
# FAISS expects int64 ids; np.arange defaults to int32 on some platforms
# (for example Windows with NumPy 1.x), so set the dtype explicitly
index.add_with_ids(vectors_np, np.arange(len(documents), dtype='int64'))

print(f"Index created, dimension: {TARGET_DIMENSION}, vectors: {index.ntotal}")