Integrating DeepSeek with Spring, Method 3: Local Ollama Deployment + Spring AI Ollama
Overview
This article shows how to deploy a DeepSeek model locally with Ollama and access it through Spring AI's Ollama module. This approach is fully on-premises: no data leaves your machine, which suits scenarios with strict data-privacy requirements.
Prerequisites
- Java 17+
- Spring Boot 3.2+
- macOS, Linux, or Windows
- Sufficient hardware (depends on model size; at least 16 GB RAM and 8 GB+ of GPU VRAM recommended)
- Docker (optional, for containerized deployment)
Part 1: Local Ollama Deployment
1. Install Ollama
macOS
# Install with Homebrew
brew install ollama
# Or download the installer from https://ollama.com/download
Linux
# Use the official install script
curl -fsSL https://ollama.com/install.sh | sh
Windows
# Install with winget
winget install Ollama.Ollama
# Or download the installer from https://ollama.com/download
2. Verify the Installation
# Check the Ollama version
ollama --version
# Start the Ollama server
ollama serve
3. Download a DeepSeek Model
DeepSeek models are available in the Ollama model library, but the exact names and tags change over time (for example, the general chat model has been published as deepseek-llm, and newer releases as deepseek-r1); check https://ollama.com/library for the current tags. This article uses deepseek-chat and deepseek-coder throughout; substitute whatever tag you actually pulled:
# Download the DeepSeek-Coder model (7B parameters)
ollama pull deepseek-coder
# Download the DeepSeek chat model (7B parameters)
ollama pull deepseek-chat
# List downloaded models
ollama list
# Show model details
ollama show deepseek-chat
4. Create a Custom DeepSeek Model
To use a different variant of a DeepSeek model, you can build a custom model from a Modelfile.
Create a Modelfile
Create a file named DeepSeekCoder.Modelfile:
FROM deepseek-coder:latest
# Sampling parameters
PARAMETER temperature 0.7
PARAMETER top_p 0.9
PARAMETER num_ctx 4096
# System prompt
SYSTEM You are a helpful AI coding assistant.
Build the Custom Model
# Build the model
ollama create my-deepseek-coder -f DeepSeekCoder.Modelfile
# Run the custom model
ollama run my-deepseek-coder
5. Test the Model
# Interactive chat
ollama run deepseek-chat
# One-off query
echo "Hello, please introduce yourself" | ollama run deepseek-chat
# API test
curl http://localhost:11434/api/generate -d '{
  "model": "deepseek-chat",
  "prompt": "Hello"
}'
6. Ollama Configuration
Ollama is configured through environment variables rather than a configuration file. The most commonly used ones:
- OLLAMA_HOST: bind address and port (default 127.0.0.1:11434; set 0.0.0.0:11434 to accept connections from other machines)
- OLLAMA_MODELS: directory where models are stored
- OLLAMA_KEEP_ALIVE: how long a model stays loaded in memory after a request (default 5m)
- OLLAMA_NUM_PARALLEL: number of requests a loaded model serves concurrently
- OLLAMA_DEBUG: enable verbose logging
Example (Linux/macOS):
export OLLAMA_HOST=0.0.0.0:11434
export OLLAMA_KEEP_ALIVE=10m
ollama serve
7. Docker Deployment (Optional)
Running Ollama with Docker
# Pull the Ollama image
docker pull ollama/ollama
# Run the Ollama container (--gpus=all requires the NVIDIA Container Toolkit)
docker run -d \
  --gpus=all \
  -p 11434:11434 \
  -v ollama:/root/.ollama \
  --name ollama \
  ollama/ollama
# Download a model inside the container
docker exec -it ollama ollama pull deepseek-chat
Docker Compose Deployment
Create docker-compose.yml:
version: '3.8'
services:
  ollama:
    image: ollama/ollama
    container_name: ollama
    ports:
      - "11434:11434"
    volumes:
      - ollama_data:/root/.ollama
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: all
              capabilities: [gpu]
    restart: unless-stopped

volumes:
  ollama_data:
Start the services:
docker-compose up -d
# Download a model
docker exec -it ollama ollama pull deepseek-chat
Part 2: Spring AI Ollama Integration
Project Dependencies
Maven (pom.xml)
<dependencies>
    <!-- Spring Boot Starter Web -->
    <dependency>
        <groupId>org.springframework.boot</groupId>
        <artifactId>spring-boot-starter-web</artifactId>
    </dependency>
    <!-- Spring AI Ollama Starter (version managed by the BOM) -->
    <dependency>
        <groupId>org.springframework.ai</groupId>
        <artifactId>spring-ai-ollama-spring-boot-starter</artifactId>
    </dependency>
    <!-- Lombok (optional) -->
    <dependency>
        <groupId>org.projectlombok</groupId>
        <artifactId>lombok</artifactId>
        <optional>true</optional>
    </dependency>
</dependencies>

<!-- Note: dependencyManagement must be a sibling of <dependencies>, not nested inside it -->
<dependencyManagement>
    <dependencies>
        <dependency>
            <groupId>org.springframework.ai</groupId>
            <artifactId>spring-ai-bom</artifactId>
            <version>1.0.0-M4</version>
            <type>pom</type>
            <scope>import</scope>
        </dependency>
    </dependencies>
</dependencyManagement>

<!-- Milestone builds are published to the Spring Milestone repository -->
<repositories>
    <repository>
        <id>spring-milestones</id>
        <url>https://repo.spring.io/milestone</url>
    </repository>
</repositories>
Gradle (build.gradle)
plugins {
    id 'java'
    id 'org.springframework.boot' version '3.2.0'
}

repositories {
    mavenCentral()
    maven { url 'https://repo.spring.io/milestone' } // Spring AI milestone builds
}

dependencies {
    implementation 'org.springframework.boot:spring-boot-starter-web'
    implementation 'org.springframework.ai:spring-ai-ollama-spring-boot-starter:1.0.0-M4'
    compileOnly 'org.projectlombok:lombok'
    annotationProcessor 'org.projectlombok:lombok'
}
Configuration
application.yml
spring:
  ai:
    ollama:
      # Ollama server address
      base-url: http://localhost:11434
      # Default chat model and options
      chat:
        options:
          model: deepseek-chat
          temperature: 0.7
          num-predict: 4096
          top-k: 40
          top-p: 0.9
application.properties
# Ollama server address
spring.ai.ollama.base-url=http://localhost:11434
# Chat model options
spring.ai.ollama.chat.options.model=deepseek-chat
spring.ai.ollama.chat.options.temperature=0.7
spring.ai.ollama.chat.options.num-predict=4096
spring.ai.ollama.chat.options.top-k=40
spring.ai.ollama.chat.options.top-p=0.9
Basic Usage
Simple Chat Example
package com.example.deepseek.controller;

import org.springframework.ai.chat.client.ChatClient;
import org.springframework.web.bind.annotation.*;

@RestController
@RequestMapping("/api/ollama")
public class OllamaChatController {

    private final ChatClient chatClient;

    // Spring AI auto-configures a ChatClient.Builder backed by the Ollama chat model
    public OllamaChatController(ChatClient.Builder builder) {
        this.chatClient = builder.build();
    }

    @GetMapping("/chat")
    public String chat(@RequestParam String message) {
        return chatClient.prompt().user(message).call().content();
    }

    @PostMapping("/chat")
    public String chatPost(@RequestBody String message) {
        return chatClient.prompt().user(message).call().content();
    }
}
Using the ChatModel API
package com.example.deepseek.service;

import org.springframework.ai.chat.messages.Message;
import org.springframework.ai.chat.messages.UserMessage;
import org.springframework.ai.chat.model.ChatModel;
import org.springframework.ai.chat.model.ChatResponse;
import org.springframework.ai.chat.prompt.Prompt;
import org.springframework.ai.chat.prompt.SystemPromptTemplate;
import org.springframework.ai.ollama.api.OllamaOptions;
import org.springframework.stereotype.Service;

import java.util.List;

@Service
public class OllamaChatService {

    private final ChatModel chatModel;

    public OllamaChatService(ChatModel chatModel) {
        this.chatModel = chatModel;
    }

    /** Simple chat. */
    public String chat(String message) {
        return chatModel.call(message);
    }

    /** Chat with a system prompt. */
    public String chatWithSystemPrompt(String userMessage, String systemPrompt) {
        Message systemMessage = new SystemPromptTemplate(systemPrompt).createMessage();
        Prompt prompt = new Prompt(List.of(systemMessage, new UserMessage(userMessage)));
        // getContent() was renamed getText() in Spring AI 1.0 GA
        return chatModel.call(prompt).getResult().getOutput().getContent();
    }

    /** Use the DeepSeek-Coder model for code questions. */
    public String codeChat(String codeQuestion) {
        OllamaOptions options = OllamaOptions.builder()
                .withModel("deepseek-coder")
                .withTemperature(0.2) // lower temperature for code generation
                .build();
        Prompt prompt = new Prompt(new UserMessage(codeQuestion), options);
        return chatModel.call(prompt).getResult().getOutput().getContent();
    }

    /** Full response including metadata. */
    public ChatResponse chatWithMetadata(String message) {
        return chatModel.call(new Prompt(new UserMessage(message)));
    }

    /** Multi-turn conversation. */
    public String multiTurnChat(List<Message> messages) {
        return chatModel.call(new Prompt(messages)).getResult().getOutput().getContent();
    }
}
Streaming Responses
package com.example.deepseek.controller;

import org.springframework.ai.chat.messages.UserMessage;
import org.springframework.ai.chat.model.ChatModel;
import org.springframework.ai.chat.prompt.Prompt;
import org.springframework.http.MediaType;
import org.springframework.web.bind.annotation.*;
import org.springframework.web.servlet.mvc.method.annotation.SseEmitter;

import java.io.IOException;

@RestController
@RequestMapping("/api/ollama/stream")
public class OllamaStreamingController {

    private final ChatModel chatModel;

    public OllamaStreamingController(ChatModel chatModel) {
        this.chatModel = chatModel;
    }

    /** Streaming chat over Server-Sent Events. */
    @GetMapping(value = "/chat", produces = MediaType.TEXT_EVENT_STREAM_VALUE)
    public SseEmitter streamChat(@RequestParam String message) {
        SseEmitter emitter = new SseEmitter(60_000L);
        Prompt prompt = new Prompt(new UserMessage(message));

        chatModel.stream(prompt).subscribe(
                chunk -> {
                    try {
                        // Final chunks may carry no content; guard before sending
                        String content = chunk.getResult() != null
                                ? chunk.getResult().getOutput().getContent()
                                : null;
                        if (content != null && !content.isEmpty()) {
                            emitter.send(SseEmitter.event().data(content));
                        }
                    } catch (IOException e) {
                        emitter.completeWithError(e);
                    }
                },
                emitter::completeWithError,
                emitter::complete
        );
        return emitter;
    }
}
Model Management
Spring AI's OllamaApi focuses on the generation endpoints; the model-management endpoints (/api/tags, /api/show, /api/pull, /api/delete) are part of Ollama's own REST API and are simplest to call directly:
package com.example.deepseek.service;

import org.springframework.beans.factory.annotation.Value;
import org.springframework.core.ParameterizedTypeReference;
import org.springframework.http.HttpMethod;
import org.springframework.stereotype.Service;
import org.springframework.web.client.RestClient;

import java.util.List;
import java.util.Map;

@Service
public class OllamaModelService {

    private final RestClient restClient;

    public OllamaModelService(@Value("${spring.ai.ollama.base-url:http://localhost:11434}") String baseUrl) {
        this.restClient = RestClient.builder().baseUrl(baseUrl).build();
    }

    /** List installed models (GET /api/tags). */
    @SuppressWarnings("unchecked")
    public List<Map<String, Object>> listModels() {
        Map<String, Object> body = restClient.get()
                .uri("/api/tags")
                .retrieve()
                .body(new ParameterizedTypeReference<Map<String, Object>>() {});
        return body == null ? List.of() : (List<Map<String, Object>>) body.get("models");
    }

    /** Show model details (POST /api/show). */
    public Map<String, Object> getModelInfo(String modelName) {
        return restClient.post()
                .uri("/api/show")
                .body(Map.of("name", modelName))
                .retrieve()
                .body(new ParameterizedTypeReference<Map<String, Object>>() {});
    }

    /** Pull a model (POST /api/pull); stream=false returns a single status object. */
    public void pullModel(String modelName) {
        restClient.post()
                .uri("/api/pull")
                .body(Map.of("name", modelName, "stream", false))
                .retrieve()
                .toBodilessEntity();
    }

    /** Delete a model (DELETE /api/delete). */
    public void deleteModel(String modelName) {
        restClient.method(HttpMethod.DELETE)
                .uri("/api/delete")
                .body(Map.of("name", modelName))
                .retrieve()
                .toBodilessEntity();
    }

    /** Check whether a model is installed locally. */
    public boolean modelExists(String modelName) {
        return listModels().stream()
                .anyMatch(model -> modelName.equals(model.get("name")));
    }
}
Custom Options
package com.example.deepseek.service;

import org.springframework.ai.chat.messages.UserMessage;
import org.springframework.ai.chat.model.ChatModel;
import org.springframework.ai.chat.prompt.Prompt;
import org.springframework.ai.ollama.api.OllamaOptions;
import org.springframework.stereotype.Service;

@Service
public class OllamaOptionsService {

    private final ChatModel chatModel;

    public OllamaOptionsService(ChatModel chatModel) {
        this.chatModel = chatModel;
    }

    /** Custom temperature and top-p. */
    public String chatWithTemperature(String message, Double temperature, Double topP) {
        // Note: the builder method names changed to model(...)/temperature(...) in 1.0 GA
        OllamaOptions options = OllamaOptions.builder()
                .withModel("deepseek-chat")
                .withTemperature(temperature)
                .withTopP(topP)
                .build();
        return call(message, options);
    }

    /** Custom context window size. */
    public String chatWithContext(String message, Integer numCtx) {
        OllamaOptions options = OllamaOptions.builder()
                .withModel("deepseek-chat")
                .withNumCtx(numCtx)
                .build();
        return call(message, options);
    }

    /** Custom top-k. */
    public String chatWithTopK(String message, Integer topK) {
        OllamaOptions options = OllamaOptions.builder()
                .withModel("deepseek-chat")
                .withTopK(topK)
                .build();
        return call(message, options);
    }

    /** Custom maximum number of generated tokens. */
    public String chatWithNumPredict(String message, Integer numPredict) {
        OllamaOptions options = OllamaOptions.builder()
                .withModel("deepseek-chat")
                .withNumPredict(numPredict)
                .build();
        return call(message, options);
    }

    private String call(String message, OllamaOptions options) {
        Prompt prompt = new Prompt(new UserMessage(message), options);
        return chatModel.call(prompt).getResult().getOutput().getContent();
    }
}
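To build intuition for what temperature and top-p actually do before the request reaches Ollama: a minimal plain-Java sketch (no Spring, illustrative only) that rescales logits by temperature and then applies nucleus (top-p) filtering. The three-token logit vector is made up for the demo.

```java
import java.util.Arrays;

// Illustrative only: shows how temperature and top-p reshape a token distribution.
public class SamplingDemo {

    // Softmax with temperature: lower T sharpens, higher T flattens the distribution.
    static double[] softmax(double[] logits, double temperature) {
        double[] p = new double[logits.length];
        double max = Arrays.stream(logits).max().orElse(0);
        double sum = 0;
        for (int i = 0; i < logits.length; i++) {
            p[i] = Math.exp((logits[i] - max) / temperature);
            sum += p[i];
        }
        for (int i = 0; i < p.length; i++) p[i] /= sum;
        return p;
    }

    // Top-p (nucleus) filtering: keep the smallest set of tokens whose cumulative
    // probability reaches topP, zero out the rest, then renormalize.
    static double[] topP(double[] probs, double topP) {
        Integer[] idx = new Integer[probs.length];
        for (int i = 0; i < idx.length; i++) idx[i] = i;
        Arrays.sort(idx, (a, b) -> Double.compare(probs[b], probs[a]));
        double[] out = new double[probs.length];
        double cum = 0;
        for (int i : idx) {
            out[i] = probs[i];
            cum += probs[i];
            if (cum >= topP) break;
        }
        double sum = Arrays.stream(out).sum();
        for (int i = 0; i < out.length; i++) out[i] /= sum;
        return out;
    }

    public static void main(String[] args) {
        double[] logits = {2.0, 1.0, 0.1};
        System.out.println(Arrays.toString(softmax(logits, 0.2))); // sharp: best token dominates
        System.out.println(Arrays.toString(softmax(logits, 2.0))); // flat: more randomness
        System.out.println(Arrays.toString(topP(softmax(logits, 1.0), 0.9)));
    }
}
```

This is why the codeChat example above lowers the temperature to 0.2: a sharper distribution makes generated code more deterministic.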
Conversation Memory
package com.example.deepseek.service;

import org.springframework.ai.chat.messages.AssistantMessage;
import org.springframework.ai.chat.messages.Message;
import org.springframework.ai.chat.messages.UserMessage;
import org.springframework.ai.chat.model.ChatModel;
import org.springframework.ai.chat.prompt.Prompt;
import org.springframework.stereotype.Service;

import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.UUID;
import java.util.concurrent.ConcurrentHashMap;

@Service
public class OllamaMemoryService {

    private final ChatModel chatModel;
    // In-memory store; use external storage (e.g. Redis) in production
    private final Map<String, List<Message>> conversationHistory = new ConcurrentHashMap<>();

    public OllamaMemoryService(ChatModel chatModel) {
        this.chatModel = chatModel;
    }

    /** Create a new conversation and return its id. */
    public String createConversation() {
        String conversationId = UUID.randomUUID().toString();
        conversationHistory.put(conversationId, new ArrayList<>());
        return conversationId;
    }

    /** Send a message, keeping the conversation context. */
    public String chat(String conversationId, String userMessage) {
        List<Message> messages = conversationHistory.get(conversationId);
        if (messages == null) {
            throw new IllegalArgumentException("Conversation not found: " + conversationId);
        }
        // Append the user message
        messages.add(new UserMessage(userMessage));
        // Call the model with the full history
        String response = chatModel.call(new Prompt(messages))
                .getResult().getOutput().getContent();
        // Append the assistant reply
        messages.add(new AssistantMessage(response));
        return response;
    }

    /** Send a message, passing only the most recent maxHistory messages to the model. */
    public String chatWithHistoryLimit(String conversationId, String userMessage, int maxHistory) {
        List<Message> messages = conversationHistory.get(conversationId);
        if (messages == null) {
            throw new IllegalArgumentException("Conversation not found: " + conversationId);
        }
        messages.add(new UserMessage(userMessage));
        // Sliding window over the history
        List<Message> limitedMessages = messages.size() > maxHistory
                ? messages.subList(messages.size() - maxHistory, messages.size())
                : messages;
        String response = chatModel.call(new Prompt(limitedMessages))
                .getResult().getOutput().getContent();
        messages.add(new AssistantMessage(response));
        return response;
    }

    /** Discard a conversation's history. */
    public void clearConversation(String conversationId) {
        conversationHistory.remove(conversationId);
    }
}
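The history-limiting logic above is just a sliding window over the message list. The same idea in plain Java, stripped of the Spring types (illustrative; the string list stands in for the Message list):

```java
import java.util.ArrayList;
import java.util.List;

// Keeps only the most recent maxHistory entries, preserving order.
public class HistoryWindow {

    static List<String> trim(List<String> messages, int maxHistory) {
        if (messages.size() <= maxHistory) {
            return new ArrayList<>(messages);
        }
        // Copy instead of returning the subList: a subList is a live view of the
        // original list, so later additions to `messages` would shift the view.
        return new ArrayList<>(messages.subList(messages.size() - maxHistory, messages.size()));
    }

    public static void main(String[] args) {
        List<String> history = new ArrayList<>(List.of("m1", "m2", "m3", "m4", "m5"));
        System.out.println(trim(history, 3)); // [m3, m4, m5]
    }
}
```

Trimming by message count is the simplest policy; for long messages a token-budget window fits the model's num_ctx limit more precisely.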
Complete Controller
Streaming is already served by the OllamaStreamingController shown earlier, so this controller composes the remaining services:
package com.example.deepseek.controller;

import com.example.deepseek.service.*;
import org.springframework.ai.chat.model.ChatResponse;
import org.springframework.http.ResponseEntity;
import org.springframework.web.bind.annotation.*;

import java.util.List;
import java.util.Map;

@RestController
@RequestMapping("/api/ollama")
public class OllamaController {

    private final OllamaChatService chatService;
    private final OllamaModelService modelService;
    private final OllamaOptionsService optionsService;
    private final OllamaMemoryService memoryService;

    public OllamaController(OllamaChatService chatService,
                            OllamaModelService modelService,
                            OllamaOptionsService optionsService,
                            OllamaMemoryService memoryService) {
        this.chatService = chatService;
        this.modelService = modelService;
        this.optionsService = optionsService;
        this.memoryService = memoryService;
    }

    // Basic chat
    @PostMapping("/chat")
    public String chat(@RequestBody String message) {
        return chatService.chat(message);
    }

    // Code assistant
    @PostMapping("/code")
    public String codeChat(@RequestBody String question) {
        return chatService.codeChat(question);
    }

    // Chat with a system prompt
    @PostMapping("/chat/system")
    public String chatWithSystem(@RequestBody Map<String, String> request) {
        return chatService.chatWithSystemPrompt(
                request.get("message"),
                request.get("systemPrompt")
        );
    }

    // Full response including metadata
    @PostMapping("/chat/full")
    public ChatResponse chatFull(@RequestBody String message) {
        return chatService.chatWithMetadata(message);
    }

    // Model management
    @GetMapping("/models")
    public List<Map<String, Object>> listModels() {
        return modelService.listModels();
    }

    @GetMapping("/models/{name}")
    public Map<String, Object> getModelInfo(@PathVariable String name) {
        return modelService.getModelInfo(name);
    }

    @PostMapping("/models/pull")
    public ResponseEntity<String> pullModel(@RequestParam String modelName) {
        modelService.pullModel(modelName);
        return ResponseEntity.ok("Model " + modelName + " pulled successfully");
    }

    @DeleteMapping("/models/{name}")
    public ResponseEntity<String> deleteModel(@PathVariable String name) {
        modelService.deleteModel(name);
        return ResponseEntity.ok("Model " + name + " deleted successfully");
    }

    // Custom options
    @PostMapping("/chat/options")
    public String chatWithOptions(@RequestBody Map<String, Object> request) {
        String message = (String) request.get("message");
        Double temperature = request.containsKey("temperature")
                ? ((Number) request.get("temperature")).doubleValue()
                : null;
        Double topP = request.containsKey("topP")
                ? ((Number) request.get("topP")).doubleValue()
                : null;
        return optionsService.chatWithTemperature(message, temperature, topP);
    }

    // Conversation management
    @PostMapping("/conversations")
    public ResponseEntity<String> createConversation() {
        return ResponseEntity.ok(memoryService.createConversation());
    }

    @PostMapping("/conversations/{id}/messages")
    public String chatInConversation(@PathVariable String id, @RequestBody String message) {
        return memoryService.chat(id, message);
    }

    @DeleteMapping("/conversations/{id}")
    public ResponseEntity<String> clearConversation(@PathVariable String id) {
        memoryService.clearConversation(id);
        return ResponseEntity.ok("Conversation cleared");
    }
}
Health Check
Requires spring-boot-starter-actuator on the classpath:
package com.example.deepseek.health;

import org.springframework.beans.factory.annotation.Value;
import org.springframework.boot.actuate.health.Health;
import org.springframework.boot.actuate.health.HealthIndicator;
import org.springframework.stereotype.Component;
import org.springframework.web.client.RestClient;

@Component
public class OllamaHealthIndicator implements HealthIndicator {

    private final RestClient restClient;

    public OllamaHealthIndicator(@Value("${spring.ai.ollama.base-url:http://localhost:11434}") String baseUrl) {
        this.restClient = RestClient.builder().baseUrl(baseUrl).build();
    }

    @Override
    public Health health() {
        try {
            // Listing models is a cheap way to verify the Ollama server is reachable
            restClient.get().uri("/api/tags").retrieve().toBodilessEntity();
            return Health.up()
                    .withDetail("service", "Ollama")
                    .withDetail("status", "Connected")
                    .build();
        } catch (Exception e) {
            return Health.down()
                    .withDetail("service", "Ollama")
                    .withDetail("error", e.getMessage())
                    .build();
        }
    }
}
Test Example
These tests require a running Ollama server with the models already pulled:
package com.example.deepseek;

import com.example.deepseek.service.OllamaChatService;
import org.junit.jupiter.api.Test;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.boot.test.context.SpringBootTest;

import static org.junit.jupiter.api.Assertions.*;

@SpringBootTest
class OllamaServiceTest {

    @Autowired
    private OllamaChatService chatService;

    @Test
    void testSimpleChat() {
        String response = chatService.chat("Hello");
        assertNotNull(response);
        assertFalse(response.isEmpty());
    }

    @Test
    void testCodeChat() {
        String response = chatService.codeChat("Write a Hello World program in Java");
        assertNotNull(response);
        assertTrue(response.contains("public") || response.contains("class"));
    }
}
Performance Tuning
1. GPU Acceleration
Make sure Ollama can use your GPU:
# Check GPU availability
nvidia-smi
# Linux: for Docker, make sure the NVIDIA Container Toolkit is installed
# macOS: Metal acceleration is supported on Apple Silicon (M-series)
2. Model Quantization
Use a quantized model to reduce memory usage:
# Pull a quantized variant (if one is published for the model)
ollama pull deepseek-chat:7b-q4_0
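As a rough rule of thumb, weight memory is parameter count times bytes per weight: FP16 stores 2 bytes per parameter, while 4-bit quantization (q4_0) stores roughly 4.5 bits (about 0.56 bytes) per parameter once the per-block scale factors are included. A quick sanity check (plain Java; the bytes-per-weight figures are approximations, and the runtime also needs extra memory for the KV cache and activations):

```java
// Rough weight-memory estimate: params * bytesPerWeight.
public class ModelMemoryEstimate {

    static double gigabytes(double params, double bytesPerWeight) {
        return params * bytesPerWeight / 1e9;
    }

    public static void main(String[] args) {
        double params = 7e9; // a "7B" model

        // FP16: 2 bytes per weight -> about 14 GB just for the weights
        System.out.printf("fp16: %.1f GB%n", gigabytes(params, 2.0));

        // q4_0: ~0.56 bytes per weight incl. scale factors -> about 4 GB
        System.out.printf("q4_0: %.1f GB%n", gigabytes(params, 0.5625));
    }
}
```

This is why the quantized 7B row in the hardware table below fits in 6 GB of VRAM while the full-precision version needs 8 GB or more.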
3. Batch Processing
For batches of requests, consider asynchronous processing:
package com.example.deepseek.service;

import org.springframework.ai.chat.messages.UserMessage;
import org.springframework.ai.chat.model.ChatModel;
import org.springframework.ai.chat.prompt.Prompt;
import org.springframework.stereotype.Service;

import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.stream.Collectors;

@Service
public class OllamaBatchService {

    private final ChatModel chatModel;

    public OllamaBatchService(ChatModel chatModel) {
        this.chatModel = chatModel;
    }

    /** Process a batch of messages concurrently; results keep the input order. */
    public List<String> batchChat(List<String> messages) {
        List<CompletableFuture<String>> futures = messages.stream()
                .map(message -> CompletableFuture.supplyAsync(() -> {
                    Prompt prompt = new Prompt(new UserMessage(message));
                    return chatModel.call(prompt).getResult().getOutput().getContent();
                }))
                .collect(Collectors.toList());
        return futures.stream()
                .map(CompletableFuture::join)
                .collect(Collectors.toList());
    }
}
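One caveat: CompletableFuture.supplyAsync with no executor argument runs on the shared ForkJoinPool, which can flood a single Ollama instance whose request parallelism is limited. A sketch of bounding concurrency with a fixed-size executor; processOne is a hypothetical stand-in for the chatModel.call(...) step so the pattern can run without a server:

```java
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.function.Function;
import java.util.stream.Collectors;

// Bounds in-flight requests with a fixed pool instead of the unbounded common pool.
public class BoundedBatch {

    static List<String> run(List<String> inputs,
                            Function<String, String> processOne,
                            int maxConcurrent) {
        ExecutorService pool = Executors.newFixedThreadPool(maxConcurrent);
        try {
            // At most maxConcurrent tasks execute at once; the rest queue in the pool
            List<CompletableFuture<String>> futures = inputs.stream()
                    .map(in -> CompletableFuture.supplyAsync(() -> processOne.apply(in), pool))
                    .collect(Collectors.toList());
            // Joining in submission order preserves input order in the result
            return futures.stream().map(CompletableFuture::join).collect(Collectors.toList());
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        // Stand-in for the model call: uppercase each input
        List<String> out = run(List.of("a", "b", "c"), String::toUpperCase, 2);
        System.out.println(out); // [A, B, C]
    }
}
```

In the Spring service above, the same effect comes from passing a shared, bounded ExecutorService bean as the second argument of supplyAsync.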
Pros and Cons
Pros
- Data privacy: everything is processed locally; nothing is uploaded to the cloud
- No network dependency: works offline
- Cost control: no per-call API fees
- Full control: model parameters and behavior are fully customizable
- Low latency: local inference can respond faster, hardware permitting
Cons
- High hardware requirements: needs a capable CPU/GPU and plenty of memory
- Model updates: models must be updated manually
- Limited scalability: hard to scale horizontally
- Maintenance cost: you operate the Ollama server and models yourself
- Feature gaps: some advanced capabilities may lag behind cloud APIs
When to Use It
- Scenarios with strict data-privacy requirements (e.g. healthcare, finance)
- Offline, air-gapped, or intranet environments
- Deep customization of model behavior
- Limited budget but ample hardware
- Real-time applications that need low latency
Hardware Recommendations
| Model size | RAM | GPU VRAM | Typical use |
|---|---|---|---|
| 7B (quantized) | 8GB | 6GB | Basic chat |
| 7B (full) | 16GB | 8GB | Code generation |
| 14B | 32GB | 16GB | Complex tasks |
| 33B | 64GB | 24GB | Enterprise workloads |