Elasticsearch Semantic Search in Practice: Hybrid Vector + Keyword Retrieval with .NET
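📖 Contents
- 1. Introduction: Why Hybrid Retrieval?
- 2. Elasticsearch 8.x Vector Features in Depth
- 3. Index Design and Mapping Configuration
- 4. .NET C# Integration in Practice
- 5. Data Synchronization and Dual-Write Strategy
- 6. Performance Optimization and Monitoring
- 7. Production Deployment Guide
- 8. Summary and Best Practices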
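1. Introduction: Why Hybrid Retrieval?
1.1 Where Traditional Retrieval Falls Short
In enterprise search we regularly run into challenges like these:
Scenario 1: the limits of keyword-only retrieval
User query: "粉色连衣裙 夏季" (pink dress, summer)
Traditional ES matching: relies on the literal terms "粉色" (pink), "连衣裙" (dress) and "夏季" (summer) appearing in the document
Problem: it cannot recognize an equivalent phrasing such as "粉色裙子 夏天穿" (pink skirt for summer wear)
Scenario 2: the limits of vector-only retrieval
User query: "iPhone 15 Pro Max 256G"
Vector retrieval: may return phones of every kind
Problem: it cannot pin down the exact model and storage configuration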
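1.2 Advantages of Hybrid Retrieval
Hybrid retrieval combines vector-based semantic understanding with exact keyword matching, so each approach covers the other's blind spots:

| Retrieval method | Precision | Recall | Best suited for |
|---|---|---|---|
| Keyword only | 85% | 60% | Exact matching, brands and model numbers |
| Vector only | 70% | 85% | Semantic understanding, synonyms |
| Hybrid | 92% | 90% | Mixed, general-purpose scenarios |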
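1.3 Why Elasticsearch 8.x?
- ✅ Native vector support: the dense_vector field type
- ✅ Fast retrieval: HNSW-based approximate nearest-neighbor search
- ✅ Flexible hybrid queries: Bool Query + Script Score
- ✅ Mature ecosystem: monitoring, alerting, visualization
- ✅ .NET support: NEST 7.17 works with 8.x clusters in compatibility mode (the official 8.x client is Elastic.Clients.Elasticsearch)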
2. Elasticsearch 8.x Vector Features in Depth
2.1 The Dense Vector Field Type
```json
{
  "text_vector": {
    "type": "dense_vector",
    "dims": 1024,
    "index": true,
    "similarity": "cosine"
  }
}
```
Key parameters:

| Parameter | Description | Allowed values |
|---|---|---|
|  | Vector dimensions | BGE-M3: 1024, CLIP: 512 |
|  | Whether to build a search index | true/false |
|  | Similarity function |  |
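2.2 Comparing the Four Similarity Functions
```
// 1. cosine (cosine similarity): the most common choice
//    Formula: A·B / (||A|| * ||B||)
//    Range: [-1, 1]; closer to 1 means more similar
//    Use with: any non-zero vectors; normalization is not required
// 2. dot_product
//    Formula: A·B
//    Range: [-1, 1] for unit-length vectors
//    Use with: vectors L2-normalized to unit length (ES requires this for float vectors); cheapest to compute
// 3. l2_norm (Euclidean distance)
//    Formula: ||A - B||
//    Range: [0, +∞); smaller means more similar
//    Use with: spatial-distance semantics
// 4. max_inner_product (maximum inner product, ES 8.11+)
//    Formula: A·B, with no unit-length requirement
//    Use with: non-normalized vectors whose magnitude carries meaning
```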
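Recommendations:
- BGE-M3 vectors → cosine (already L2-normalized, so dot_product gives the same ranking at lower cost)
- CLIP vectors → cosine (L2-normalized)
- Custom vectors → choose based on whether they are normalized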
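The following C# sketch makes the four options concrete. It implements the raw similarity functions and the transformations the Elasticsearch dense_vector documentation describes for turning each one into a non-negative `_score`. The `VectorSimilarity` class is illustrative and is not part of any client library. The sketch also shows why cosine and dot_product rank identically once vectors have unit length.

```csharp
using System;

// Illustrative only: raw similarity functions plus the _score mapping ES applies per option.
public static class VectorSimilarity
{
    public static double Dot(float[] a, float[] b)
    {
        if (a.Length != b.Length) throw new ArgumentException("Dimension mismatch");
        double sum = 0;
        for (int i = 0; i < a.Length; i++) sum += (double)a[i] * b[i];
        return sum;
    }

    public static double Norm(float[] a) => Math.Sqrt(Dot(a, a));

    // Undefined (NaN) for zero vectors, which is why ES rejects them with cosine.
    public static double Cosine(float[] a, float[] b) => Dot(a, b) / (Norm(a) * Norm(b));

    public static double L2Distance(float[] a, float[] b)
    {
        if (a.Length != b.Length) throw new ArgumentException("Dimension mismatch");
        double sum = 0;
        for (int i = 0; i < a.Length; i++)
        {
            double d = a[i] - b[i];
            sum += d * d;
        }
        return Math.Sqrt(sum);
    }

    // cosine: _score = (1 + cosine) / 2
    public static double CosineScore(float[] q, float[] v) => (1 + Cosine(q, v)) / 2;

    // dot_product (float vectors): _score = (1 + dot) / 2
    public static double DotProductScore(float[] q, float[] v) => (1 + Dot(q, v)) / 2;

    // l2_norm: _score = 1 / (1 + distance^2)
    public static double L2NormScore(float[] q, float[] v)
    {
        double d = L2Distance(q, v);
        return 1 / (1 + d * d);
    }

    // max_inner_product: negative dots are squashed into (0, 1), positive ones shifted by +1
    public static double MaxInnerProductScore(float[] q, float[] v)
    {
        double dot = Dot(q, v);
        return dot < 0 ? 1 / (1 - dot) : dot + 1;
    }
}
```

For normalized vectors such as the BGE-M3 output described above, `Cosine` and `Dot` return the same value up to rounding. That is why dot_product works as a cheaper drop-in for cosine on such vectors.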
2.3 kNN Search vs. Script Score
Option 1: kNN search (recommended)
```json
{
  "knn": {
    "field": "text_vector",
    "query_vector": [0.1, 0.2, ...],
    "k": 10,
    "num_candidates": 100
  }
}
```
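Option 2: Script Score (flexible)
```json
{
  "script_score": {
    "query": { "match_all": {} },
    "script": {
      "source": "cosineSimilarity(params.query_vector, 'text_vector') + 1.0",
      "params": { "query_vector": [0.1, 0.2, ...] }
    }
  }
}
```
The `+ 1.0` shifts cosine similarity from [-1, 1] to [0, 2], because Elasticsearch rejects negative scores from script_score.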
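Comparison:

| Feature | kNN | Script Score |
|---|---|---|
| Performance | ⭐⭐⭐⭐⭐ | ⭐⭐⭐ |
| Flexibility | ⭐⭐⭐ | ⭐⭐⭐⭐⭐ |
| Typical use | Pure vector search | Hybrid search |

The performance gap has a clear cause. kNN search is approximate and walks the HNSW graph. script_score computes exact similarity for every document the inner query matches, so its cost grows linearly with the index size. Since 8.4, a single request can also carry both `knn` and `query` (their scores are summed), so kNN can serve hybrid search as well.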
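The hypothetical `KnnQueryBuilder` below sketches one way to assemble the kNN body in code. It serializes the top-level `knn` section with System.Text.Json and adds an optional category filter. It assumes the snake_case field names from the mapping in section 3. Two of the constraints it encodes come from Elasticsearch itself:

- `num_candidates` may not be smaller than `k`.
- A `filter` inside `knn` is applied during the approximate search, so up to `k` matching documents are still returned.

A string like this could be sent through NEST's low-level client (`client.LowLevel`) if the high-level API you use lacks kNN support.

```csharp
using System;
using System.Collections.Generic;
using System.Text.Json;

// Hypothetical helper: builds the JSON body for a top-level `knn` search.
public static class KnnQueryBuilder
{
    public static string Build(float[] queryVector, int k, int numCandidates, string category = null)
    {
        // Elasticsearch rejects num_candidates < k
        if (numCandidates < k)
            throw new ArgumentException("num_candidates must be >= k", nameof(numCandidates));

        var knn = new Dictionary<string, object>
        {
            ["field"] = "text_vector",      // snake_case field name from the mapping
            ["query_vector"] = queryVector,
            ["k"] = k,
            ["num_candidates"] = numCandidates
        };

        // Optional pre-filter, applied while the approximate search runs
        if (!string.IsNullOrEmpty(category))
        {
            knn["filter"] = new Dictionary<string, object>
            {
                ["term"] = new Dictionary<string, object> { ["category"] = category }
            };
        }

        var body = new Dictionary<string, object> { ["knn"] = knn, ["size"] = k };
        return JsonSerializer.Serialize(body);
    }
}
```

Raising `num_candidates` relative to `k` trades latency for recall, much like the HNSW parameters discussed in section 3.2.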
3. Index Design and Mapping Configuration
3.1 Full Mapping Example
```
PUT /content_vectors
{
  "settings": {
    "number_of_shards": 3,
    "number_of_replicas": 1,
    "refresh_interval": "5s"
  },
  "mappings": {
    "properties": {
      "id": {
        "type": "keyword"
      },
      "entity_id": {
        "type": "long"
      },
      "title": {
        "type": "text",
        "analyzer": "ik_max_word",
        "search_analyzer": "ik_smart",
        "fields": {
          "keyword": {
            "type": "keyword"
          }
        }
      },
      "description": {
        "type": "text",
        "analyzer": "ik_max_word"
      },
      "content": {
        "type": "text",
        "analyzer": "ik_max_word"
      },
      "text_vector": {
        "type": "dense_vector",
        "dims": 1024,
        "index": true,
        "similarity": "cosine",
        "index_options": {
          "type": "hnsw",
          "m": 16,
          "ef_construction": 100
        }
      },
      "image_vector": {
        "type": "dense_vector",
        "dims": 512,
        "index": true,
        "similarity": "cosine"
      },
      "category": {
        "type": "keyword"
      },
      "tags": {
        "type": "keyword"
      },
      "brand": {
        "type": "keyword"
      },
      "price": {
        "type": "float"
      },
      "create_time": {
        "type": "date",
        "format": "yyyy-MM-dd HH:mm:ss||strict_date_optional_time||epoch_millis"
      },
      "status": {
        "type": "integer"
      }
    }
  }
}
```
The document ID used in the .NET code ("entity_123") is a string, so `id` is mapped as keyword, and the numeric database key is stored separately in `entity_id`. The extra date formats accept ISO 8601 and epoch milliseconds, which is what JSON serializers typically emit for a .NET DateTime.
3.2 Tuning HNSW Index Parameters
```
"text_vector": {
  "type": "dense_vector",
  "index_options": {
    "type": "hnsw",
    "m": 16,                 // maximum connections per graph node (default 16)
    "ef_construction": 100   // candidates tracked while building the graph (default 100)
  }
}
```
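Tuning guide:

| Parameter | Effect of increasing | Effect of decreasing | Recommended |
|---|---|---|---|
|  | Accuracy↑, memory↑, speed↓ | Accuracy↓, memory↓, speed↑ | 16-64 |
|  | Build quality↑, build time↑ | Build quality↓, build time↓ | 100-400 |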
Rules of thumb:
- Fewer than 1 million documents: m=16, ef_construction=100
- 1 to 10 million documents: m=32, ef_construction=200
- More than 10 million documents: m=64, ef_construction=400
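Before picking `m`, check that the vectors fit in memory. Elasticsearch's kNN tuning guide gives a rule of thumb for float vectors with default HNSW options: about num_vectors × 4 × (num_dimensions + 12) bytes. That memory must come from the filesystem cache outside the JVM heap.

The sketch below encodes that estimate together with the rule-of-thumb list above. `HnswSizing` is a hypothetical helper. Because the estimate assumes default HNSW settings, it understates memory when `m` is raised to 32 or 64.

```csharp
// Hypothetical helper: rough kNN memory estimate plus the rule-of-thumb table above.
public static class HnswSizing
{
    // ES rule of thumb for float vectors with default HNSW options:
    // bytes ≈ num_vectors * 4 * (num_dimensions + 12)
    public static long EstimateFloatVectorBytes(long numVectors, int dims) =>
        numVectors * 4L * (dims + 12);

    // < 1M docs: (16, 100); 1M-10M: (32, 200); > 10M: (64, 400)
    public static (int M, int EfConstruction) Recommend(long docCount) =>
        docCount < 1_000_000 ? (16, 100)
        : docCount <= 10_000_000 ? (32, 200)
        : (64, 400);
}
```

Section 8 benchmarks 1 million documents with 1024 dimensions. For that setup the estimate comes to 4,144,000,000 bytes, roughly 3.9 GiB of page cache.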
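3.3 Chinese Analyzer Configuration
The ik_max_word and ik_smart analyzers come from the IK analysis plugin, which must be installed on every node.
```
POST /_analyze
{
  "analyzer": "ik_max_word",
  "text": "粉色连衣裙夏季新款"
}
```
Response (abridged to token and position):
```json
{
  "tokens": [
    { "token": "粉色", "position": 0 },
    { "token": "连衣裙", "position": 1 },
    { "token": "裙子", "position": 1 },
    { "token": "夏季", "position": 2 },
    { "token": "夏天", "position": 2 },
    { "token": "新款", "position": 3 }
  ]
}
```
IK only segments the input text. The tokens "裙子" and "夏天" do not occur in that text, so they appear only if the analyzer also includes a synonym token filter. Plain ik_max_word will not produce them.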
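4. .NET C# Integration in Practice
4.1 Environment Setup
Install the NuGet packages:
```bash
# NEST's last major line is 7.17.x; it talks to 8.x clusters in compatibility mode
dotnet add package NEST
dotnet add package Microsoft.Extensions.Logging
```
Configure the connection:
```csharp
// Program.cs
using System;
using Microsoft.Extensions.DependencyInjection;
using Nest;

var builder = WebApplication.CreateBuilder(args);

var settings = new ConnectionSettings(new Uri("http://localhost:9200"))
    .DefaultIndex("content_vectors")
    .PrettyJson()                  // readable request/response JSON, useful while debugging
    .EnableApiVersioningHeader();  // compatibility mode, required for an 8.x cluster

var client = new ElasticClient(settings);

// Register as a singleton: the client is thread-safe and caches connection state
builder.Services.AddSingleton<IElasticClient>(client);
```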
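The connection setup above omits one detail. By default, NEST camel-cases property names when it serializes documents and resolves field expressions, so `TextVector` becomes `textVector`. The mapping in section 3.1 and the `'text_vector'` string in the scripts use snake_case instead.

Below is a minimal converter that can be plugged into `ConnectionSettings.DefaultFieldNameInferrer`. `FieldNames.ToSnakeCase` is a name chosen for this sketch.

```csharp
using System.Text;

// Hypothetical helper: converts C# property names to snake_case field names.
public static class FieldNames
{
    public static string ToSnakeCase(string name)
    {
        var sb = new StringBuilder(name.Length + 8);
        for (int i = 0; i < name.Length; i++)
        {
            char c = name[i];
            if (char.IsUpper(c))
            {
                // Start a new word after a lower-case letter, or at the end of an acronym ("HTTPStatus")
                bool newWord = i > 0 &&
                    (char.IsLower(name[i - 1]) || (i + 1 < name.Length && char.IsLower(name[i + 1])));
                if (newWord) sb.Append('_');
                sb.Append(char.ToLowerInvariant(c));
            }
            else
            {
                sb.Append(c);
            }
        }
        return sb.ToString();
    }
}
```

To wire it up, add `.DefaultFieldNameInferrer(FieldNames.ToSnakeCase)` to the `ConnectionSettings` chain. `CreateTime` is then sent as `create_time` and `TextVector` as `text_vector`, matching the mapping.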
4.2 Document Model Definition
```csharp
using Nest;
using System;

namespace VectorSearch.Models
{
    /// <summary>
    /// Elasticsearch document model.
    /// The mapping attributes are only used if the index is created from this class via
    /// NEST's AutoMap(); the index in section 3.1 is created with an explicit mapping instead.
    /// </summary>
    [ElasticsearchType(IdProperty = nameof(Id))]
    public class ContentVectorDocument
    {
        /// <summary>
        /// Document ID (e.g. "entity_123")
        /// </summary>
        [Keyword(IgnoreAbove = 100)]
        public string Id { get; set; }

        /// <summary>
        /// Business entity ID (links back to the database row)
        /// </summary>
        [Number(NumberType.Long)]
        public long EntityId { get; set; }

        /// <summary>
        /// Title (Chinese analysis)
        /// </summary>
        [Text(Analyzer = "ik_max_word", SearchAnalyzer = "ik_smart")]
        public string Title { get; set; }

        /// <summary>
        /// Description
        /// </summary>
        [Text(Analyzer = "ik_max_word")]
        public string Description { get; set; }

        /// <summary>
        /// Full content
        /// </summary>
        [Text(Analyzer = "ik_max_word")]
        public string Content { get; set; }

        /// <summary>
        /// Text vector (BGE-M3, 1024 dimensions)
        /// </summary>
        [DenseVector(Dimensions = 1024)]
        public float[] TextVector { get; set; }

        /// <summary>
        /// Image vector (CLIP, 512 dimensions)
        /// </summary>
        [DenseVector(Dimensions = 512)]
        public float[] ImageVector { get; set; }

        /// <summary>
        /// Category (exact match)
        /// </summary>
        [Keyword]
        public string Category { get; set; }

        /// <summary>
        /// Tags
        /// </summary>
        [Keyword]
        public string[] Tags { get; set; }

        /// <summary>
        /// Brand
        /// </summary>
        [Keyword]
        public string Brand { get; set; }

        /// <summary>
        /// Price
        /// </summary>
        [Number(NumberType.Float)]
        public float Price { get; set; }

        /// <summary>
        /// Creation time (format matches the explicit mapping)
        /// </summary>
        [Date(Format = "yyyy-MM-dd HH:mm:ss||strict_date_optional_time||epoch_millis")]
        public DateTime CreateTime { get; set; }

        /// <summary>
        /// Status (1 = listed, 0 = delisted)
        /// </summary>
        [Number(NumberType.Integer)]
        public int Status { get; set; }
    }
}
```
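Elasticsearch rejects a document whose vector length differs from the mapping's `dims`, and the cosine similarity cannot index an all-zero vector. It can therefore pay to validate embeddings before they reach the index.

The `EmbeddingGuard` helper below is a hypothetical sketch of that validation:

- `EnsureDims` checks the vector length.
- `L2Normalize` returns a unit-length copy, which dot_product requires.

```csharp
using System;

// Hypothetical helper: validate and normalize embeddings before indexing.
public static class EmbeddingGuard
{
    // Throws if the vector length does not match the mapping's dims.
    public static void EnsureDims(float[] vector, int expectedDims, string field)
    {
        if (vector == null) throw new ArgumentNullException(field);
        if (vector.Length != expectedDims)
            throw new ArgumentException($"{field}: expected {expectedDims} dims, got {vector.Length}");
    }

    // Returns an L2-normalized (unit-length) copy; rejects zero vectors.
    public static float[] L2Normalize(float[] vector)
    {
        double sumSq = 0;
        foreach (var x in vector) sumSq += (double)x * x;
        if (sumSq == 0) throw new ArgumentException("Cannot normalize a zero vector");

        double norm = Math.Sqrt(sumSq);
        var result = new float[vector.Length];
        for (int i = 0; i < vector.Length; i++) result[i] = (float)(vector[i] / norm);
        return result;
    }
}
```

For example, call `EmbeddingGuard.EnsureDims(doc.TextVector, 1024, "text_vector")` before adding a document to a bulk request. A failure then surfaces with a clear message instead of as a per-item bulk error.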
4.3 Hybrid Search Service Implementation
```csharp
using Microsoft.Extensions.Logging;
using Nest;
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;
using VectorSearch.Models;

namespace VectorSearch.Services
{
    /// <summary>
    /// Hybrid search request parameters
    /// </summary>
    public class HybridSearchRequest
    {
        /// <summary>Text query</summary>
        public string QueryText { get; set; }
        /// <summary>Query vector (optional)</summary>
        public float[] QueryVector { get; set; }
        /// <summary>Category filter (optional)</summary>
        public string Category { get; set; }
        /// <summary>Brand filter (optional)</summary>
        public List<string> Brands { get; set; }
        /// <summary>Price range (optional)</summary>
        public float? MinPrice { get; set; }
        public float? MaxPrice { get; set; }
        /// <summary>Status filter</summary>
        public int Status { get; set; } = 1;
        /// <summary>Number of results to return</summary>
        public int TopK { get; set; } = 10;
        /// <summary>Minimum similarity threshold (not applied by SearchAsync: hybrid scores are not pure similarities)</summary>
        public double MinSimilarity { get; set; } = 0.5;
    }

    /// <summary>
    /// Hybrid search result
    /// </summary>
    public class HybridSearchResult
    {
        public long EntityId { get; set; }
        public string Title { get; set; }
        public string Description { get; set; }
        public double SimilarityScore { get; set; }
        public float Price { get; set; }
        public string Category { get; set; }
        public string[] Tags { get; set; }
    }

    /// <summary>
    /// Elasticsearch hybrid search service
    /// </summary>
    public class ElasticsearchHybridSearchService
    {
        private readonly IElasticClient _client;
        private readonly ILogger<ElasticsearchHybridSearchService> _logger;

        public ElasticsearchHybridSearchService(
            IElasticClient client,
            ILogger<ElasticsearchHybridSearchService> logger)
        {
            _client = client;
            _logger = logger;
        }

        /// <summary>
        /// Runs a hybrid search. Virtual so that decorators (see section 6.2) can override it.
        /// </summary>
        public virtual async Task<List<HybridSearchResult>> SearchAsync(
            HybridSearchRequest request,
            CancellationToken ct = default)
        {
            var stopwatch = System.Diagnostics.Stopwatch.StartNew();
            try
            {
                // BoolQueryDescriptor.Should/.Filter replace earlier clauses when called again,
                // so collect every clause first and pass each list in a single call.
                var shoulds = new List<Func<QueryContainerDescriptor<ContentVectorDocument>, QueryContainer>>();
                var filters = new List<Func<QueryContainerDescriptor<ContentVectorDocument>, QueryContainer>>();

                // ========== 1. Vector similarity (semantic retrieval) ==========
                if (request.QueryVector != null && request.QueryVector.Length > 0)
                {
                    // Exact cosine scoring; "+ 1.0" keeps the score non-negative
                    shoulds.Add(q => q.ScriptScore(ss => ss
                        .Query(qq => qq.MatchAll())
                        .Script(s => s
                            .Source("cosineSimilarity(params.query_vector, 'text_vector') + 1.0")
                            .Params(p => p.Add("query_vector", request.QueryVector)))));
                }

                // ========== 2. Keyword query (exact matching) ==========
                if (!string.IsNullOrWhiteSpace(request.QueryText))
                {
                    shoulds.Add(q => q.MultiMatch(mm => mm
                        .Query(request.QueryText)
                        .Fields(f => f
                            .Field(fd => fd.Title.Suffix("keyword"), 3.0) // exact title match, boost 3
                            .Field(fd => fd.Title, 2.0)                   // analyzed title, boost 2
                            .Field(fd => fd.Description, 1.5)             // description, boost 1.5
                            .Field(fd => fd.Tags, 1.0))                   // tags, boost 1
                        .Type(TextQueryType.BestFields)
                        .Fuzziness(Fuzziness.Auto)));
                }

                // ========== 3. Filters (restrict results, do not affect scoring) ==========
                filters.Add(f => f.Term(t => t.Field(d => d.Status).Value(request.Status)));

                if (!string.IsNullOrWhiteSpace(request.Category))
                {
                    filters.Add(f => f.Term(t => t.Field(d => d.Category).Value(request.Category)));
                }

                // keyword fields are case-sensitive, so brand values are passed through unchanged
                if (request.Brands != null && request.Brands.Any())
                {
                    filters.Add(f => f.Terms(t => t.Field(d => d.Brand).Terms(request.Brands.ToArray())));
                }

                if (request.MinPrice.HasValue || request.MaxPrice.HasValue)
                {
                    // A null bound is omitted, leaving the range open on that side
                    filters.Add(f => f.Range(r => r
                        .Field(d => d.Price)
                        .GreaterThanOrEquals(request.MinPrice)
                        .LessThanOrEquals(request.MaxPrice)));
                }

                // Once filter clauses exist, should clauses become optional unless
                // minimum_should_match is set, so require one whenever any are present.
                var searchDescriptor = new SearchDescriptor<ContentVectorDocument>()
                    .Size(request.TopK)
                    .Query(q => q.Bool(b => b
                        .Should(shoulds)
                        .Filter(filters)
                        .MinimumShouldMatch(shoulds.Count > 0 ? 1 : 0)));

                // ========== 4. Execute the search ==========
                var response = await _client.SearchAsync<ContentVectorDocument>(searchDescriptor, ct);
                stopwatch.Stop();

                if (!response.IsValid)
                {
                    throw new InvalidOperationException($"ES hybrid search failed: {response.DebugInformation}");
                }

                _logger.LogInformation(
                    "ES hybrid search completed: query='{Query}', elapsed={ElapsedMs}ms, hits={TotalHits}",
                    request.QueryText,
                    stopwatch.ElapsedMilliseconds,
                    response.Total);

                // ========== 5. Map results (hits are already sorted by _score) ==========
                return response.Hits
                    .Select(hit => new HybridSearchResult
                    {
                        EntityId = hit.Source.EntityId,
                        Title = hit.Source.Title,
                        Description = hit.Source.Description,
                        Price = hit.Source.Price,
                        Category = hit.Source.Category,
                        Tags = hit.Source.Tags,
                        // _score is BM25 plus (cosine + 1), not a pure similarity
                        SimilarityScore = hit.Score ?? 0
                    })
                    .ToList();
            }
            catch (Exception ex)
            {
                stopwatch.Stop();
                _logger.LogError(ex,
                    "ES hybrid search failed: query='{Query}', elapsed={ElapsedMs}ms",
                    request.QueryText,
                    stopwatch.ElapsedMilliseconds);
                throw;
            }
        }

        /// <summary>
        /// Pure vector search. script_score computes exact cosine similarity for every
        /// document (brute force); on large indices the approximate knn option is faster.
        /// </summary>
        public async Task<List<HybridSearchResult>> VectorSearchAsync(
            float[] queryVector,
            int topK = 10,
            CancellationToken ct = default)
        {
            var response = await _client.SearchAsync<ContentVectorDocument>(s => s
                .Size(topK)
                .Query(q => q
                    .ScriptScore(ss => ss
                        .Query(qq => qq.MatchAll())
                        .Script(script => script
                            .Source("cosineSimilarity(params.query_vector, 'text_vector') + 1.0")
                            .Params(p => p.Add("query_vector", queryVector))))), ct);

            return response.Hits
                .Select(hit => new HybridSearchResult
                {
                    EntityId = hit.Source.EntityId,
                    Title = hit.Source.Title,
                    SimilarityScore = hit.Score ?? 0
                })
                .ToList();
        }

        /// <summary>
        /// Pure keyword search (fallback mode)
        /// </summary>
        public async Task<List<HybridSearchResult>> KeywordSearchAsync(
            string queryText,
            int topK = 10,
            CancellationToken ct = default)
        {
            var response = await _client.SearchAsync<ContentVectorDocument>(s => s
                .Size(topK)
                .Query(q => q
                    .MultiMatch(mm => mm
                        .Query(queryText)
                        .Fields(f => f
                            .Field(fd => fd.Title, 2.0)
                            .Field(fd => fd.Description)
                            .Field(fd => fd.Tags)))), ct);

            return response.Hits
                .Select(hit => new HybridSearchResult
                {
                    EntityId = hit.Source.EntityId,
                    Title = hit.Source.Title,
                    SimilarityScore = hit.Score ?? 0
                })
                .ToList();
        }
    }
}
```
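In `SearchAsync`, the script_score clause contributes a value in [0, 2], while the BM25 score from multi_match is unbounded. Whichever side produces larger numbers therefore tends to dominate the ranking. A common alternative is Reciprocal Rank Fusion (RRF), which merges result lists by rank rather than by raw score.

Below is a minimal client-side sketch. It assumes you run `KeywordSearchAsync` and `VectorSearchAsync` separately and fuse their EntityId lists. `ReciprocalRankFusion` is a hypothetical name, and k = 60 is the constant commonly used with RRF. Recent 8.x releases also offer built-in RRF, subject to version and license.

```csharp
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper: client-side Reciprocal Rank Fusion.
public static class ReciprocalRankFusion
{
    // score(d) = Σ over lists of 1 / (k + rank(d)), with rank starting at 1
    public static List<(long Id, double Score)> Fuse(int k, params IReadOnlyList<long>[] rankedLists)
    {
        var scores = new Dictionary<long, double>();
        foreach (var list in rankedLists)
        {
            for (int rank = 0; rank < list.Count; rank++)
            {
                long id = list[rank];
                scores.TryGetValue(id, out var current);
                scores[id] = current + 1.0 / (k + rank + 1);
            }
        }

        return scores
            .OrderByDescending(kv => kv.Value)
            .ThenBy(kv => kv.Key) // deterministic order for ties
            .Select(kv => (Id: kv.Key, Score: kv.Value))
            .ToList();
    }
}
```

For example, call `ReciprocalRankFusion.Fuse(60, keywordIds, vectorIds)`, where both arguments are `long[]` arrays of EntityId values in rank order. Documents that appear near the top of both lists rise to the top of the fused list.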
4.4 Query Strategy Selector
```csharp
using Microsoft.Extensions.Configuration;
using Microsoft.Extensions.Logging;
using System.Collections.Generic;
using System.Threading;
using System.Threading.Tasks;

namespace VectorSearch.Services
{
    /// <summary>
    /// Search strategies
    /// </summary>
    public enum SearchStrategy
    {
        /// <summary>
        /// In-memory vector search (fastest)
        /// </summary>
        InMemory,
        /// <summary>
        /// Elasticsearch hybrid search (recommended)
        /// </summary>
        Elasticsearch,
        /// <summary>
        /// CLIP multimodal search (no branch below yet; falls back to Elasticsearch)
        /// </summary>
        Clip
    }

    /// <summary>
    /// Search strategy selector.
    /// InMemoryVectorSearchService comes from elsewhere in this series and is assumed here.
    /// </summary>
    public class SearchStrategySelector
    {
        private readonly IConfiguration _configuration;
        private readonly ElasticsearchHybridSearchService _esService;
        private readonly InMemoryVectorSearchService _memoryService;
        private readonly ILogger<SearchStrategySelector> _logger;

        public SearchStrategySelector(
            IConfiguration configuration,
            ElasticsearchHybridSearchService esService,
            InMemoryVectorSearchService memoryService,
            ILogger<SearchStrategySelector> logger)
        {
            _configuration = configuration;
            _esService = esService;
            _memoryService = memoryService;
            _logger = logger;
        }

        /// <summary>
        /// Selects the search strategy from configuration
        /// </summary>
        public async Task<List<HybridSearchResult>> SearchAsync(
            string queryText,
            float[] queryVector,
            int topK = 10,
            CancellationToken ct = default)
        {
            // Binding an enum from configuration requires Microsoft.Extensions.Configuration.Binder
            var strategy = _configuration.GetValue<SearchStrategy>("Search:Strategy");
            _logger.LogInformation("Using search strategy: {Strategy}", strategy);

            return strategy switch
            {
                SearchStrategy.InMemory => await _memoryService.SearchAsync(queryText, queryVector, topK, ct),
                // Elasticsearch, Clip (not implemented yet) and unknown values use ES hybrid search
                _ => await _esService.SearchAsync(new HybridSearchRequest
                {
                    QueryText = queryText,
                    QueryVector = queryVector,
                    TopK = topK
                }, ct)
            };
        }
    }
}
```
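The selector switches strategies only through configuration, so an Elasticsearch outage surfaces as an exception to the caller. A degradation path, listed among the production essentials in section 8, might wrap the primary call and fall back to another strategy on failure.

`SearchFallback.WithFallbackAsync` below is a hypothetical helper sketching that idea. It deliberately lets cancellation propagate instead of treating it as a failure.

```csharp
using System;
using System.Threading.Tasks;

// Hypothetical helper: run a primary search and fall back to another on failure.
public static class SearchFallback
{
    public static async Task<T> WithFallbackAsync<T>(
        Func<Task<T>> primary,
        Func<Task<T>> fallback,
        Action<Exception> onFallback = null)
    {
        try
        {
            return await primary().ConfigureAwait(false);
        }
        catch (Exception ex) when (!(ex is OperationCanceledException))
        {
            // Report why we degraded, then serve results from the fallback
            onFallback?.Invoke(ex);
            return await fallback().ConfigureAwait(false);
        }
    }
}
```

Inside `SearchStrategySelector`, the ES branch could become `SearchFallback.WithFallbackAsync(() => _esService.SearchAsync(request, ct), () => _memoryService.SearchAsync(queryText, queryVector, topK, ct), ex => _logger.LogWarning(ex, "ES unavailable, falling back"))`.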
5. Data Synchronization and Dual-Write Strategy
5.1 Sync Service Implementation
```csharp
using Elasticsearch.Net;
using Microsoft.Extensions.Logging;
using Nest;
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;
using VectorSearch.Models;

namespace VectorSearch.Services
{
    /// <summary>
    /// Content entity (from the database)
    /// </summary>
    public class ContentEntity
    {
        public long Id { get; set; }
        public string Title { get; set; }
        public string Description { get; set; }
        public string Content { get; set; }
        public string Category { get; set; }
        public string Brand { get; set; }
        public string[] Tags { get; set; }
        public float Price { get; set; }
        public DateTime CreateTime { get; set; }
        public int Status { get; set; }
        public float[] TextVector { get; set; }  // BGE-M3 vector
        public float[] ImageVector { get; set; } // CLIP vector
    }

    /// <summary>
    /// Elasticsearch data synchronization service
    /// </summary>
    public class ElasticsearchSyncService
    {
        private readonly IElasticClient _client;
        private readonly ILogger<ElasticsearchSyncService> _logger;
        private readonly string _indexName;

        public ElasticsearchSyncService(
            IElasticClient client,
            ILogger<ElasticsearchSyncService> logger,
            string indexName = "content_vectors")
        {
            _client = client;
            _logger = logger;
            _indexName = indexName;
        }

        /// <summary>
        /// Maps a database entity to an ES document (shared by single and bulk sync)
        /// </summary>
        private static ContentVectorDocument ToDocument(ContentEntity entity) => new ContentVectorDocument
        {
            Id = $"entity_{entity.Id}",
            EntityId = entity.Id,
            Title = entity.Title ?? "",
            Description = entity.Description ?? "",
            Content = entity.Content ?? "",
            TextVector = entity.TextVector,
            ImageVector = entity.ImageVector,
            Category = entity.Category ?? "",
            Brand = entity.Brand ?? "",
            Tags = entity.Tags ?? Array.Empty<string>(),
            Price = entity.Price,
            CreateTime = entity.CreateTime,
            Status = entity.Status
        };

        /// <summary>
        /// Syncs a single entity to ES
        /// </summary>
        public async Task<bool> SyncToElasticsearchAsync(
            ContentEntity entity,
            CancellationToken ct = default)
        {
            if (entity == null)
            {
                _logger.LogWarning("Entity to sync is null");
                return false;
            }

            try
            {
                var document = ToDocument(entity);

                // Refresh.WaitFor returns once the document is searchable:
                // convenient, but expensive under heavy write load
                var response = await _client.IndexAsync(document, idx => idx
                    .Index(_indexName)
                    .Id(document.Id)
                    .Refresh(Refresh.WaitFor), ct);

                if (!response.IsValid)
                {
                    _logger.LogError("ES indexing failed: {Error}", response.DebugInformation);
                    return false;
                }

                _logger.LogDebug("ES sync succeeded: EntityId={Id}", entity.Id);
                return true;
            }
            catch (Exception ex)
            {
                _logger.LogError(ex, "ES sync threw an exception: EntityId={Id}", entity.Id);
                return false;
            }
        }

        /// <summary>
        /// Syncs a batch of entities to ES
        /// </summary>
        public async Task<bool> BulkSyncToElasticsearchAsync(
            List<ContentEntity> entities,
            CancellationToken ct = default)
        {
            if (entities == null || entities.Count == 0)
            {
                _logger.LogWarning("Entity list for bulk sync is empty");
                return false;
            }

            try
            {
                var bulkDescriptor = new BulkDescriptor();
                foreach (var entity in entities)
                {
                    var document = ToDocument(entity);
                    bulkDescriptor.Index<ContentVectorDocument>(i => i
                        .Index(_indexName)
                        .Document(document)
                        .Id(document.Id));
                }

                var response = await _client.BulkAsync(bulkDescriptor, ct);

                // IsValid is false if even one item failed, so count per-item failures explicitly
                var failed = response.ItemsWithErrors.Count();
                _logger.LogInformation(
                    "ES bulk sync finished: total={Total}, succeeded={Success}, failed={Failed}",
                    entities.Count,
                    response.Items.Count - failed,
                    failed);

                if (!response.IsValid)
                {
                    _logger.LogError("ES bulk indexing reported errors: {Error}", response.DebugInformation);
                    return false;
                }
                return true;
            }
            catch (Exception ex)
            {
                _logger.LogError(ex, "ES bulk sync threw an exception");
                return false;
            }
        }

        /// <summary>
        /// Deletes an entity from ES
        /// </summary>
        public async Task<bool> DeleteFromElasticsearchAsync(
            long entityId,
            CancellationToken ct = default)
        {
            try
            {
                var response = await _client.DeleteAsync<ContentVectorDocument>(
                    $"entity_{entityId}",
                    d => d.Index(_indexName),
                    ct);

                // A 404 means the document is already gone, which counts as success
                if (!response.IsValid && response.ApiCall?.HttpStatusCode != 404)
                {
                    _logger.LogError("ES delete failed: EntityId={Id}, Error={Error}",
                        entityId, response.DebugInformation);
                    return false;
                }

                _logger.LogDebug("ES delete succeeded: EntityId={Id}", entityId);
                return true;
            }
            catch (Exception ex)
            {
                _logger.LogError(ex, "ES delete threw an exception: EntityId={Id}", entityId);
                return false;
            }
        }

        /// <summary>
        /// Updates selected fields of an ES document
        /// </summary>
        public async Task<bool> UpdatePartialAsync(
            long entityId,
            object partialDocument,
            CancellationToken ct = default)
        {
            try
            {
                var response = await _client.UpdateAsync<ContentVectorDocument, object>(
                    $"entity_{entityId}",
                    u => u
                        .Index(_indexName)
                        .Doc(partialDocument)
                        .RetryOnConflict(3),
                    ct);

                if (!response.IsValid)
                {
                    _logger.LogError("ES partial update failed: EntityId={Id}, Error={Error}",
                        entityId, response.DebugInformation);
                    return false;
                }

                _logger.LogDebug("ES partial update succeeded: EntityId={Id}", entityId);
                return true;
            }
            catch (Exception ex)
            {
                _logger.LogError(ex, "ES partial update threw an exception: EntityId={Id}", entityId);
                return false;
            }
        }
    }
}
```
5.2 Integrating the Dual-Write Pattern
```csharp
using Microsoft.Extensions.Logging;
using System.Threading.Tasks;
using VectorSearch.Services;

/// <summary>
/// Content service (dual-write example).
/// IRepository and BgeM3EmbeddingGenerator come from elsewhere in this series and are assumed here.
/// </summary>
public class ContentService
{
    private readonly IRepository<ContentEntity> _repository;
    private readonly ElasticsearchSyncService _esSyncService;
    private readonly BgeM3EmbeddingGenerator _embeddingGenerator;
    private readonly ILogger<ContentService> _logger;

    public ContentService(
        IRepository<ContentEntity> repository,
        ElasticsearchSyncService esSyncService,
        BgeM3EmbeddingGenerator embeddingGenerator,
        ILogger<ContentService> logger)
    {
        _repository = repository;
        _esSyncService = esSyncService;
        _embeddingGenerator = embeddingGenerator;
        _logger = logger;
    }

    /// <summary>
    /// Creates content (dual write: DB + ES)
    /// </summary>
    public async Task<long> CreateAsync(ContentEntity entity)
    {
        // 1. Generate the embedding
        string textForEmbedding = $"{entity.Title} {entity.Description} {entity.Content}";
        entity.TextVector = _embeddingGenerator.GenerateEmbedding(textForEmbedding);

        // 2. Write to the database
        var id = await _repository.InsertAsync(entity);
        entity.Id = id; // the ES document ID is derived from entity.Id

        // 3. Sync to ES in the background so the caller is not blocked.
        // SyncToElasticsearchAsync logs and swallows its own exceptions and returns false,
        // so check the result; work queued this way is lost if the process stops.
        _ = Task.Run(async () =>
        {
            var synced = await _esSyncService.SyncToElasticsearchAsync(entity);
            if (!synced)
            {
                _logger.LogError("Background ES sync failed: EntityId={Id}", id);
                // Hand off to a retry queue here
            }
        });

        _logger.LogInformation("Content created: Id={Id}", id);
        return id;
    }

    /// <summary>
    /// Updates content (dual write: DB + ES)
    /// </summary>
    public async Task<bool> UpdateAsync(ContentEntity entity)
    {
        // 1. Regenerate the embedding first so the database and ES store the same vector
        string textForEmbedding = $"{entity.Title} {entity.Description} {entity.Content}";
        entity.TextVector = _embeddingGenerator.GenerateEmbedding(textForEmbedding);

        // 2. Update the database
        var success = await _repository.UpdateAsync(entity);
        if (!success) return false;

        // 3. Sync to ES
        return await _esSyncService.SyncToElasticsearchAsync(entity);
    }

    /// <summary>
    /// Deletes content (dual delete: DB + ES)
    /// </summary>
    public async Task<bool> DeleteAsync(long id)
    {
        // 1. Delete from the database (soft delete)
        var success = await _repository.DeleteAsync(id);
        if (!success) return false;

        // 2. Delete from ES
        return await _esSyncService.DeleteFromElasticsearchAsync(id);
    }
}
```
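`SyncToElasticsearchAsync` catches its own exceptions and reports failure through its bool result, so a try/catch around it will rarely fire. Any retry has to key off that return value.

A minimal sketch with exponential backoff follows. `SyncRetry` is a hypothetical name. An in-process retry is lost if the process restarts, so a durable queue plus the periodic reconciliation mentioned in section 8.2 is still needed.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical helper: retry a bool-returning operation with exponential backoff.
public static class SyncRetry
{
    public static async Task<bool> RetryAsync(
        Func<CancellationToken, Task<bool>> operation,
        int maxAttempts,
        TimeSpan initialDelay,
        CancellationToken ct = default)
    {
        var delay = initialDelay;
        for (int attempt = 1; attempt <= maxAttempts; attempt++)
        {
            if (await operation(ct).ConfigureAwait(false)) return true;
            if (attempt == maxAttempts) break;

            // Wait before the next attempt, doubling the delay each time
            await Task.Delay(delay, ct).ConfigureAwait(false);
            delay = TimeSpan.FromTicks(delay.Ticks * 2);
        }
        return false;
    }
}
```

In the background task above, for example: `await SyncRetry.RetryAsync(t => _esSyncService.SyncToElasticsearchAsync(entity, t), maxAttempts: 3, initialDelay: TimeSpan.FromSeconds(1));`.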
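6. Performance Optimization and Monitoring
6.1 Bulk Indexing Optimization
```csharp
using Microsoft.Extensions.Logging;
using Nest;
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;
using VectorSearch.Models;

/// <summary>
/// Bulk indexing options
/// </summary>
public class BulkIndexOptions
{
    /// <summary>
    /// Documents per bulk request (default 1000)
    /// </summary>
    public int BulkSize { get; set; } = 1000;
    /// <summary>
    /// Maximum number of concurrent bulk requests (default 4)
    /// </summary>
    public int Concurrency { get; set; } = 4;
    /// <summary>
    /// Set refresh_interval to -1 (no periodic refresh) while loading (default true)
    /// </summary>
    public bool DisableRefreshDuringLoad { get; set; } = true;
    /// <summary>
    /// refresh_interval to restore afterwards (matches the 5s in the index settings)
    /// </summary>
    public string RestoreRefreshInterval { get; set; } = "5s";
}

/// <summary>
/// High-throughput bulk indexing service
/// </summary>
public class BulkIndexService
{
    private readonly IElasticClient _client;
    private readonly ILogger<BulkIndexService> _logger;

    public BulkIndexService(IElasticClient client, ILogger<BulkIndexService> logger)
    {
        _client = client;
        _logger = logger;
    }

    /// <summary>
    /// Indexes documents in batches with bounded concurrency
    /// </summary>
    public async Task<bool> BulkIndexAsync(
        List<ContentVectorDocument> documents,
        BulkIndexOptions options = null,
        CancellationToken ct = default)
    {
        options ??= new BulkIndexOptions();
        var index = Indices.Index<ContentVectorDocument>();
        var stopwatch = System.Diagnostics.Stopwatch.StartNew();
        _logger.LogInformation("Starting bulk indexing: total={Count}", documents.Count);

        try
        {
            // 1. Temporarily disable periodic refresh to speed up loading
            if (options.DisableRefreshDuringLoad)
            {
                await _client.Indices.UpdateSettingsAsync(index,
                    u => u.IndexSettings(s => s.RefreshInterval(Time.MinusOne)), ct);
            }

            // 2. Split into batches
            var batches = documents
                .Select((doc, i) => new { doc, i })
                .GroupBy(x => x.i / options.BulkSize)
                .Select(g => g.Select(x => x.doc).ToList())
                .ToList();

            _logger.LogInformation("Batches: {BatchCount}, batch size: {BatchSize}",
                batches.Count, options.BulkSize);

            // 3. Index batches, keeping at most options.Concurrency requests in flight
            using var throttle = new SemaphoreSlim(options.Concurrency);
            var tasks = batches.Select(async batch =>
            {
                await throttle.WaitAsync(ct);
                try
                {
                    var bulkDescriptor = new BulkDescriptor();
                    foreach (var doc in batch)
                    {
                        bulkDescriptor.Index<ContentVectorDocument>(i => i
                            .Document(doc)
                            .Id(doc.Id));
                    }

                    var response = await _client.BulkAsync(bulkDescriptor, ct);
                    // IsValid is false if any item in the batch failed
                    if (!response.IsValid)
                    {
                        _logger.LogError("Bulk indexing failed: {Error}", response.DebugInformation);
                        throw new InvalidOperationException($"ES bulk indexing failed: {response.DebugInformation}");
                    }
                    return batch.Count;
                }
                finally
                {
                    throttle.Release();
                }
            });

            var results = await Task.WhenAll(tasks);
            var totalSuccess = results.Sum();
            stopwatch.Stop();

            _logger.LogInformation(
                "Bulk indexing finished: total={Total}, succeeded={Success}, elapsed={ElapsedMs}ms, docs/s={Rate}",
                documents.Count,
                totalSuccess,
                stopwatch.ElapsedMilliseconds,
                documents.Count * 1000.0 / Math.Max(1, stopwatch.ElapsedMilliseconds));

            return totalSuccess == documents.Count;
        }
        catch (Exception ex)
        {
            stopwatch.Stop();
            _logger.LogError(ex, "Bulk indexing threw an exception: elapsed={ElapsedMs}ms", stopwatch.ElapsedMilliseconds);
            throw;
        }
        finally
        {
            // 4. Restore the refresh interval, even after a failure
            if (options.DisableRefreshDuringLoad)
            {
                await _client.Indices.UpdateSettingsAsync(index,
                    u => u.IndexSettings(s => s.RefreshInterval(options.RestoreRefreshInterval)));
            }
        }
    }
}
```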
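A fixed BulkSize of 1000 can produce large request bodies here. Each 1024-dimensional float vector serialized as JSON takes on the order of 10 KB, so one batch can reach 10 MB or more before any text fields are counted. Elasticsearch rejects HTTP bodies above `http.max_content_length`, which defaults to 100 MB.

One option is to cap batches by approximate size as well as by count. The `BulkBatcher` sketch below, a hypothetical helper, does that with a caller-supplied size function.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper: batch items by both a count limit and an approximate byte budget.
public static class BulkBatcher
{
    public static List<List<T>> Batch<T>(IEnumerable<T> items, Func<T, int> sizeOf, int maxCount, long maxBytes)
    {
        var batches = new List<List<T>>();
        var current = new List<T>();
        long currentBytes = 0;

        foreach (var item in items)
        {
            int size = sizeOf(item);
            // Flush when the count limit is reached or this item would exceed the byte budget
            bool full = current.Count >= maxCount || (current.Count > 0 && currentBytes + size > maxBytes);
            if (full)
            {
                batches.Add(current);
                current = new List<T>();
                currentBytes = 0;
            }
            current.Add(item);
            currentBytes += size;
        }

        if (current.Count > 0) batches.Add(current);
        return batches;
    }
}
```

For example: `BulkBatcher.Batch(documents, d => JsonSerializer.SerializeToUtf8Bytes(d).Length, 1000, 10 * 1024 * 1024)`. Measuring with System.Text.Json only approximates what NEST's own serializer produces. A single item larger than the budget still gets a batch of its own.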
6.2 Query Performance Monitoring
```csharp
using Microsoft.Extensions.Logging;
using Nest;
using System.Collections.Generic;
using System.Threading;
using System.Threading.Tasks;
using VectorSearch.Services;

/// <summary>
/// ES query performance monitor.
/// IMetricsService is an assumed metrics abstraction defined elsewhere, not a library type.
/// </summary>
public class ElasticsearchPerformanceMonitor
{
    private readonly ILogger<ElasticsearchPerformanceMonitor> _logger;
    private readonly IMetricsService _metricsService;

    public ElasticsearchPerformanceMonitor(
        ILogger<ElasticsearchPerformanceMonitor> logger,
        IMetricsService metricsService)
    {
        _logger = logger;
        _metricsService = metricsService;
    }

    /// <summary>
    /// Records query performance
    /// </summary>
    public void RecordQuery(string queryType, long elapsedMs, long resultCount, bool success)
    {
        // Record a latency histogram
        _metricsService.RecordHistogram(
            "elasticsearch.query.duration",
            elapsedMs,
            new Dictionary<string, string>
            {
                { "query_type", queryType },
                { "success", success.ToString() }
            });

        // Slow query alert (threshold: 1 s)
        if (elapsedMs > 1000)
        {
            _logger.LogWarning(
                "Slow query: type={Type}, elapsed={ElapsedMs}ms, results={Count}",
                queryType, elapsedMs, resultCount);
        }

        // Failure alert
        if (!success)
        {
            _logger.LogError(
                "Query failed: type={Type}, elapsed={ElapsedMs}ms",
                queryType, elapsedMs);
        }
    }
}

/// <summary>
/// Monitoring decorator.
/// Overriding requires ElasticsearchHybridSearchService.SearchAsync to be declared virtual.
/// </summary>
public class MonitoredElasticsearchService : ElasticsearchHybridSearchService
{
    private readonly ElasticsearchPerformanceMonitor _monitor;

    public MonitoredElasticsearchService(
        IElasticClient client,
        ILogger<MonitoredElasticsearchService> logger,
        ElasticsearchPerformanceMonitor monitor)
        : base(client, logger)
    {
        _monitor = monitor;
    }

    public override async Task<List<HybridSearchResult>> SearchAsync(
        HybridSearchRequest request,
        CancellationToken ct = default)
    {
        var stopwatch = System.Diagnostics.Stopwatch.StartNew();
        bool success = false;
        long resultCount = 0;
        try
        {
            var results = await base.SearchAsync(request, ct);
            resultCount = results.Count; // results returned (at most TopK), not total hits
            success = true;
            return results;
        }
        finally
        {
            stopwatch.Stop();
            _monitor.RecordQuery(
                "hybrid_search",
                stopwatch.ElapsedMilliseconds,
                resultCount,
                success);
        }
    }
}
```
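`RecordHistogram` depends on the assumed `IMetricsService`. Without a metrics backend, P50/P95/P99 figures like those in section 8.3 can be computed directly from collected latency samples.

The sketch below uses the nearest-rank method. `LatencyStats` is a hypothetical name. Other percentile definitions, such as linear interpolation, give slightly different values.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper: nearest-rank percentile over latency samples.
public static class LatencyStats
{
    // Smallest sample such that at least p% of all samples are <= it.
    public static long Percentile(IReadOnlyCollection<long> samplesMs, double p)
    {
        if (samplesMs.Count == 0) throw new ArgumentException("No samples");
        if (p <= 0 || p > 100) throw new ArgumentOutOfRangeException(nameof(p));

        var sorted = samplesMs.OrderBy(x => x).ToArray();
        int rank = (int)Math.Ceiling(p * sorted.Length / 100.0);
        return sorted[rank - 1];
    }
}
```

Keeping a bounded window of recent samples, for example the last few thousand queries, keeps the sort cheap enough to run on a timer.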
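6.3 Health Checks and Alerting
```csharp
using Microsoft.Extensions.Diagnostics.HealthChecks;
using Microsoft.Extensions.Logging;
using Nest;
using System;
using System.Threading;
using System.Threading.Tasks;
using VectorSearch.Models;

/// <summary>
/// ES health check. Implements IHealthCheck so it can be registered
/// with AddHealthChecks().AddCheck&lt;ElasticsearchHealthCheck&gt;().
/// </summary>
public class ElasticsearchHealthCheck : IHealthCheck
{
    private readonly IElasticClient _client;
    private readonly ILogger<ElasticsearchHealthCheck> _logger;

    public ElasticsearchHealthCheck(
        IElasticClient client,
        ILogger<ElasticsearchHealthCheck> logger)
    {
        _client = client;
        _logger = logger;
    }

    /// <summary>
    /// Checks ES cluster health
    /// </summary>
    public async Task<HealthCheckResult> CheckHealthAsync(
        HealthCheckContext context,
        CancellationToken ct = default)
    {
        try
        {
            var healthResponse = await _client.Cluster.HealthAsync(null, null, ct);
            if (!healthResponse.IsValid)
            {
                _logger.LogError("ES health check failed: {Error}", healthResponse.DebugInformation);
                return HealthCheckResult.Unhealthy("Cluster health request failed");
            }

            var status = healthResponse.Status.ToString().ToLowerInvariant();
            if (status == "red")
            {
                _logger.LogCritical("ES cluster status: RED - at least one primary shard is unassigned; some data is unavailable");
                return HealthCheckResult.Unhealthy("Cluster status RED");
            }
            if (status == "yellow")
            {
                // A single-node cluster with number_of_replicas >= 1 always reports YELLOW
                _logger.LogWarning("ES cluster status: YELLOW - some replica shards are unassigned");
                return HealthCheckResult.Degraded("Cluster status YELLOW"); // degraded but usable
            }

            _logger.LogDebug("ES cluster status: GREEN - healthy");
            return HealthCheckResult.Healthy("Cluster status GREEN");
        }
        catch (Exception ex)
        {
            _logger.LogError(ex, "ES health check threw an exception");
            return HealthCheckResult.Unhealthy("Health check threw an exception", ex);
        }
    }

    /// <summary>
    /// Checks that the index is reachable and contains documents
    /// </summary>
    public async Task<bool> CheckIndexHealthAsync(string indexName, CancellationToken ct = default)
    {
        try
        {
            var countResponse = await _client.CountAsync<ContentVectorDocument>(c => c.Index(indexName), ct);
            if (!countResponse.IsValid)
            {
                _logger.LogError("Index check failed: {Error}", countResponse.DebugInformation);
                return false;
            }

            _logger.LogDebug("Index document count: {Count}", countResponse.Count);
            return countResponse.Count > 0;
        }
        catch (Exception ex)
        {
            _logger.LogError(ex, "Index check threw an exception");
            return false;
        }
    }
}
```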
7. Production Deployment Guide
7.1 Docker Compose Deployment
```yaml
version: '3.8'
services:
  elasticsearch:
    image: docker.elastic.co/elasticsearch/elasticsearch:8.11.0
    container_name: elasticsearch
    environment:
      - discovery.type=single-node
      # Security disabled for local development only; enable it in production
      - xpack.security.enabled=false
      - "ES_JAVA_OPTS=-Xms2g -Xmx2g"
      - cluster.name=es-cluster
    ports:
      - "9200:9200"
      - "9300:9300"
    volumes:
      - es_data:/usr/share/elasticsearch/data
    deploy:
      resources:
        limits:
          memory: 4G
    healthcheck:
      test: ["CMD-SHELL", "curl -f http://localhost:9200/_cluster/health || exit 1"]
      interval: 30s
      timeout: 10s
      retries: 5

  kibana:
    image: docker.elastic.co/kibana/kibana:8.11.0
    container_name: kibana
    environment:
      - ELASTICSEARCH_HOSTS=http://elasticsearch:9200
    ports:
      - "5601:5601"
    depends_on:
      - elasticsearch

volumes:
  es_data:
    driver: local
```
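7.2 Production Configuration Tuning
```yaml
# elasticsearch.yml
cluster.name: production-es-cluster
node.name: es-node-1
# Lock the JVM heap in RAM (the memlock ulimit must also be raised)
bootstrap.memory_lock: true
# Network
network.host: 0.0.0.0
http.port: 9200
transport.port: 9300
# Discovery (single node; a multi-node cluster uses discovery.seed_hosts and cluster.initial_master_nodes instead)
discovery.type: single-node
# Security disabled (acceptable only on an isolated internal network)
xpack.security.enabled: false
```
Slow log thresholds are index-level settings, which Elasticsearch does not accept in elasticsearch.yml. Set them on the index instead:
```
PUT /content_vectors/_settings
{
  "index.search.slowlog.threshold.query.warn": "10s",
  "index.search.slowlog.threshold.query.info": "5s",
  "index.search.slowlog.threshold.fetch.warn": "1s",
  "index.indexing.slowlog.threshold.index.warn": "10s"
}
```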
7.3 .NET Application Configuration
appsettings.json:
```json
{
  "Elasticsearch": {
    "Uri": "http://localhost:9200",
    "IndexName": "content_vectors",
    "NumberOfShards": 3,
    "NumberOfReplicas": 1,
    "RequestTimeout": 30000,
    "PoolSize": 10
  },
  "Search": {
    "Strategy": "Elasticsearch",
    "DefaultTopK": 10,
    "MinSimilarity": 0.5,
    "EnableCache": true,
    "CacheExpirationMinutes": 30
  },
  "VectorGeneration": {
    "BatchSize": 100,
    "MaxConcurrency": 4
  }
}
```
Program.cs:
```csharp
// Program.cs
using System;
using Elasticsearch.Net;
using Nest;
using VectorSearch.Services;

var builder = WebApplication.CreateBuilder(args);

// Elasticsearch client configuration
var esConfig = builder.Configuration.GetSection("Elasticsearch");
var esUri = new Uri(esConfig["Uri"] ?? "http://localhost:9200");

// NEST takes the connection pool through the ConnectionSettings constructor
var pool = new StaticConnectionPool(new[] { esUri });
var settings = new ConnectionSettings(pool)
    .DefaultIndex(esConfig["IndexName"])
    .PrettyJson()
    .EnableApiVersioningHeader()
    .RequestTimeout(TimeSpan.FromMilliseconds(int.Parse(esConfig["RequestTimeout"] ?? "30000")))
    .MaxRetryTimeout(TimeSpan.FromMinutes(2));

var client = new ElasticClient(settings);
builder.Services.AddSingleton<IElasticClient>(client);

// Register services
// (SearchStrategySelector also needs InMemoryVectorSearchService, from elsewhere in this series, to be registered)
builder.Services.AddScoped<ElasticsearchHybridSearchService>();
builder.Services.AddScoped<ElasticsearchSyncService>();
builder.Services.AddScoped<BulkIndexService>();
builder.Services.AddScoped<ElasticsearchHealthCheck>();
builder.Services.AddScoped<SearchStrategySelector>();

// Health checks (ElasticsearchHealthCheck must implement IHealthCheck)
builder.Services.AddHealthChecks()
    .AddCheck<ElasticsearchHealthCheck>("elasticsearch");

var app = builder.Build();

// Health check endpoint
app.MapHealthChecks("/health");

app.Run();
```
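8. Summary and Best Practices
8.1 Key Takeaways
- Advantages of hybrid retrieval
  - Vector semantic understanding + exact keyword matching
  - Precision improved by 10-15%
  - Recall improved by 20-30%
- Keys to performance
  - HNSW parameter tuning
  - Bulk indexing (1,000 documents per batch)
  - Asynchronous dual writes (do not block the main flow)
  - Query result caching
- Production essentials
  - Health checks and alerting
  - Slow query monitoring
  - Index sharding strategy
  - Disaster recovery and graceful degradation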
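8.2 Common Pitfalls and Solutions

| Problem | Cause | Solution |
|---|---|---|
| Slow queries | Full scan (brute-force scoring) | Use the HNSW index and set |
| Out of memory | Batches too large | Reduce |
| Inaccurate vector results | Vectors not normalized | Ensure vectors are L2-normalized |
| Inaccurate tokenization | Analyzer problems | Use |
| Inconsistent data | Dual-write failures | Retry queue + scheduled reconciliation |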
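The "retry queue + scheduled reconciliation" remedy in the last row can be sketched as a set difference between the IDs that should be searchable and the IDs actually present in the index.

`Reconciliation.Diff` below is a hypothetical helper. Collecting the two ID sets is left to the caller: for example, a database query for listed items, and a search_after or scroll pass over the entity IDs in ES.

```csharp
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper: find documents missing from ES and stale documents left in ES.
public static class Reconciliation
{
    public static (List<long> MissingInEs, List<long> StaleInEs) Diff(
        IEnumerable<long> dbActiveIds, IEnumerable<long> esIds)
    {
        var db = new HashSet<long>(dbActiveIds);
        var es = new HashSet<long>(esIds);

        // In the DB but not searchable yet: re-sync these
        var missing = db.Where(id => !es.Contains(id)).OrderBy(id => id).ToList();
        // Searchable but no longer active in the DB: delete these from ES
        var stale = es.Where(id => !db.Contains(id)).OrderBy(id => id).ToList();

        return (missing, stale);
    }
}
```

Missing IDs can be fed back through `BulkSyncToElasticsearchAsync`, and stale ones through `DeleteFromElasticsearchAsync`.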
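8.3 Performance Benchmarks
Test environment:
- ES 8.11.0, 3 shards, 1 replica
- Data volume: 1 million documents
- Vector dimensions: 1024 (BGE-M3)
- Hardware: 8 cores, 16 GB RAM, SSD

Results:

| Query type | P50 | P95 | P99 |
|---|---|---|---|
| Vector only | 15ms | 45ms | 80ms |
| Keyword only | 10ms | 30ms | 50ms |
| Hybrid | 25ms | 60ms | 100ms |
| Bulk indexing (1,000 documents) | - | - | 500ms |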
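Closing note: hybrid retrieval is where enterprise search is heading. Mastering Elasticsearch's vector search capabilities gives your .NET applications much stronger semantic understanding. Questions and discussion are welcome in the comments!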