Streamlit+Qwen3.5构造多模态对话助手

大靠山

335人浏览 · 2026-03-20 22:12:59

大靠山 · 2026-03-20 22:12:59 发布

🌈

本文介绍了一种基于本地的qwen3.5-9b、Z-Image-Turbo模型以及搜索功能的多模态对话助手实现方法。

今天分享的是我基于Streamlit与本地的qwen3.5-9b、Z-Image-Turbo模型以及搜索功能构建的多模态对话助手。该助手具有对话交流，自动意图识别，自动生成图片，对图片内容进行理解，自动联网搜索，上下文记忆、多轮对话功能。为了解决本地显存不足的问题，采用了多卡模式，不足时自动清理显存。

实现效果

实现代码

import streamlit as stimport torchfrom transformers import (    AutoModelForCausalLM,    AutoTokenizer,    AutoModelForImageTextToText,    AutoProcessor,    TextIteratorStreamer)from diffusers import DiffusionPipelinefrom threading import Threadfrom PIL import Imagefrom datetime import datetimeimport ioimport gcimport reimport osimport jsonimport randomtry:    from FreeKnowledge_AI import knowledge_center    SEARCH_AVAILABLE = Trueexcept ImportError:    SEARCH_AVAILABLE = False# -------------------- 页面配置 --------------------st.set_page_config(    page_title="多模态对话助手（多卡优化版）",    page_icon="🤖",    layout="wide",    initial_sidebar_state="expanded")# -------------------- 自定义样式 --------------------st.markdown("""<style>    .stChatMessage {        padding: 0.8rem 1rem;    }    .generated-image-container {        max-width: 420px;        margin: 0.5rem 0;        border-radius: 12px;        overflow: hidden;        box-shadow: 0 2px 8px rgba(0,0,0,0.1);    }    .memory-status {        font-size: 0.8rem;        color: #888;        text-align: center;        padding: 4px 0;    }    div[data-testid="stImage"] img {        border-radius: 8px;    }    .intent-badge {        display: inline-block;        padding: 2px 10px;        border-radius: 12px;        font-size: 0.75rem;        margin-bottom: 6px;    }    .search-result-box {        background: rgba(100, 149, 237, 0.08);        border-left: 3px solid #6495ed;        padding: 8px 12px;        margin: 6px 0;        border-radius: 0 8px 8px 0;        font-size: 0.9rem;    }    .poem-title {        font-style: italic;        color: #5a4a3a;        margin-bottom: 4px;        font-size: 0.95rem;    }    .keyword-tag {        display: inline-block;        background: rgba(100, 149, 237, 0.15);        color: #3366cc;        padding: 2px 8px;        border-radius: 10px;        font-size: 0.8rem;        margin: 2px 3px;    }</style>""", unsafe_allow_html=True)# -------------------- 宽高比配置 --------------------ASPECT_RATIO_OPTIONS = {    "9:16 (576×1024)": (576, 1024),    "1:1 (1024×1024)": (1024, 1024),    "3:4 (768×1024)": (768, 1024),    "4:3 (1024×768)": (1024, 768),    "16:9 (1024×576)": (1024, 576),    "2:3 (682×1024)": (682, 1024),    "3:2 (1024×682)": (1024, 682),    "1:2 (512×1024)": (512, 1024),    "2:1 (1024×512)": (1024, 512),}# -------------------- 对话历史限制常量 --------------------MAX_TEXT_HISTORY_TURNS = 20MAX_VL_HISTORY_TURNS = 6MAX_VL_IMAGES = 4MAX_SEARCH_HISTORY_TURNS = 10# -------------------- 初始化会话状态 --------------------if"messages"notin st.session_state:    st.session_state.messages = []if"uploaded_image"notin st.session_state:    st.session_state.uploaded_image = Noneif"loaded_model"notin st.session_state:    st.session_state.loaded_model = {        "name": None,        "model": None,        "tokenizer/processor": None,        "device": None    }if"memory_file"notin st.session_state:    st.session_state.memory_file = "memory.md"# -------------------- 模型路径配置（默认值） --------------------DEFAULT_TEXT_PATH = r"E:\Qwen\Qwen3.5-9B\models"DEFAULT_VL_PATH = r"E:\Qwen\Qwen3.5-9B\models"DEFAULT_GEN_PATH = r"E:\Qwen\Z-Image-Turbo"if"text_model_path"notin st.session_state:    st.session_state.text_model_path = DEFAULT_TEXT_PATHif"vl_model_path"notin st.session_state:    st.session_state.vl_model_path = DEFAULT_VL_PATHif"gen_model_path"notin st.session_state:    st.session_state.gen_model_path = DEFAULT_GEN_PATHif"device_text"notin st.session_state:    st.session_state.device_text = "auto"if"device_vl"notin st.session_state:    st.session_state.device_vl = "auto"if"device_gen"notin st.session_state:    st.session_state.device_gen = "auto"if"aspect_ratio"notin st.session_state:    st.session_state.aspect_ratio = "1:1 (1024×1024)"# -------------------- 联网搜索配置（默认值） --------------------if"search_mode"notin st.session_state:    st.session_state.search_mode = "BAIDU"if"search_api_key"notin st.session_state:    st.session_state.search_api_key = "sk-****这里替换成在硅基流动创建的key****"if"search_api_base"notin st.session_state:    st.session_state.search_api_base = "https://api.siliconflow.cn/v1/chat/completions"if"search_api_model"notin st.session_state:    st.session_state.search_api_model = "internlm/internlm2_5-7b-chat"if"search_max_results"notin st.session_state:    st.session_state.search_max_results = 5if"search_enabled"notin st.session_state:    st.session_state.search_enabled = True# -------------------- 记忆系统 --------------------MEMORY_FILE = "memory.md"definit_memory_file():    ifnot os.path.exists(MEMORY_FILE):        withopen(MEMORY_FILE, "w", encoding="utf-8") as f:            f.write("# 对话记忆\n\n")            f.write(f"> 创建时间：{datetime.now().strftime('%Y-%m-%d %H:%M:%S')}\n\n")            f.write("---\n\n")defsave_to_memory(role, content_text, intent=None, has_image=False):    init_memory_file()    timestamp = datetime.now().strftime("%Y-%m-%d %H:%M:%S")    withopen(MEMORY_FILE, "a", encoding="utf-8") as f:        role_label = "👤 用户"if role == "user"else"🤖 助手"        f.write(f"### {role_label} [{timestamp}]\n\n")        if intent:            intent_map = {                "text": "💬 文本对话",                "generate": "🎨 图像生成",                "understand": "👁️ 图像理解",                "search": "🔍 联网搜索"            }            f.write(f"**意图识别**: {intent_map.get(intent, intent)}\n\n")        if has_image:            f.write("📎 *附带图片*\n\n")        if content_text:            clean_text = content_text.replace("\n", "\n> ")            f.write(f"> {clean_text}\n\n")        f.write("---\n\n")defload_memory_context(max_entries=20):    ifnot os.path.exists(MEMORY_FILE):        return""    withopen(MEMORY_FILE, "r", encoding="utf-8") as f:        content = f.read()    sections = content.split("---")    recent = sections[-max_entries - 1:-1] iflen(sections) > max_entries + 1else sections[1:-1]    ifnot recent:        return""    summary_parts = []    for section in recent:        lines = [line.strip() for line in section.strip().split("\n") if line.strip()]        for line in lines:            if line.startswith(">"):                summary_parts.append(line.lstrip("> ").strip())    if summary_parts:        return"以下是之前对话的摘要记忆：\n" + "\n".join(summary_parts[-10:])    return""defget_memory_stats():    ifnot os.path.exists(MEMORY_FILE):        return {"exists": False, "entries": 0, "size_kb": 0}    withopen(MEMORY_FILE, "r", encoding="utf-8") as f:        content = f.read()    entries = content.count("### ")    size_kb = os.path.getsize(MEMORY_FILE) / 1024    return {"exists": True, "entries": entries, "size_kb": round(size_kb, 1)}# -------------------- 辅助函数：显存清理 --------------------defclear_gpu_memory():    gc.collect()    if torch.cuda.is_available():        torch.cuda.empty_cache()        torch.cuda.synchronize()defunload_current_model():    if st.session_state.loaded_model["model"] isnotNone:        del st.session_state.loaded_model["model"]        del st.session_state.loaded_model["tokenizer/processor"]        st.session_state.loaded_model["model"] = None        st.session_state.loaded_model["tokenizer/processor"] = None        st.session_state.loaded_model["name"] = None        st.session_state.loaded_model["device"] = None        clear_gpu_memory()defget_target_device(device_setting):    if device_setting == "auto":        return"cuda:0"if torch.cuda.is_available() else"cpu"    else:        return device_setting if torch.cuda.is_available() else"cpu"# -------------------- 对话历史截断工具函数 --------------------deftruncate_messages_for_text(messages, max_turns=MAX_TEXT_HISTORY_TURNS):    iflen(messages) <= max_turns:        return messages    return messages[-max_turns:]deftruncate_messages_for_vl(messages, max_turns=MAX_VL_HISTORY_TURNS, max_images=MAX_VL_IMAGES):    iflen(messages) <= max_turns:        recent = messages    else:        recent = messages[-max_turns:]    image_count = 0    filtered = []    for msg inreversed(recent):        if msg["type"] in ("multimodal", "image"):            content = msg["content"]            imgs = content.get("images", [])            if image_count + len(imgs) > max_images:                text_only_content = {}                if"text"in content and content["text"]:                    text_only_content["text"] = content["text"]                if text_only_content:                    filtered.append({                        "role": msg["role"],                        "type": "text",                        "content": text_only_content.get("text", "")                    })            else:                image_count += len(imgs)                filtered.append(msg)        else:            filtered.append(msg)    filtered.reverse()    return filtereddeftruncate_messages_for_search(messages, max_turns=MAX_SEARCH_HISTORY_TURNS):    iflen(messages) <= max_turns:        return messages    return messages[-max_turns:]# -------------------- 模型加载函数 --------------------defload_text_model(model_path, device):    tokenizer = AutoTokenizer.from_pretrained(model_path, trust_remote_code=True)    model = AutoModelForCausalLM.from_pretrained(        model_path,        torch_dtype=torch.float16,        device_map=device,        trust_remote_code=True    )    model.eval()    return tokenizer, modeldefload_vl_model(model_path, device):    processor = AutoProcessor.from_pretrained(model_path, trust_remote_code=True)    model = AutoModelForImageTextToText.from_pretrained(        model_path,        torch_dtype=torch.float16,        device_map=device,        trust_remote_code=True    )    model.eval()    return processor, modeldefload_gen_model(model_path, device):    pipe = DiffusionPipeline.from_pretrained(        model_path,        torch_dtype=torch.bfloat16,        low_cpu_mem_usage=True,    )    pipe.to(device)    return pipe# -------------------- 模型切换函数 --------------------defensure_model_loaded(model_type):    if st.session_state.loaded_model["name"] == model_type:        return    unload_current_model()    device_setting = None    model_path = None    if model_type == "text":        device_setting = st.session_state.device_text        model_path = st.session_state.text_model_path    elif model_type == "vl":        device_setting = st.session_state.device_vl        model_path = st.session_state.vl_model_path    elif model_type == "gen":        device_setting = st.session_state.device_gen        model_path = st.session_state.gen_model_path    else:        raise ValueError(f"Unknown model type: {model_type}")    device = get_target_device(device_setting)    with st.spinner(f"🔄 正在加载 {model_type} 模型到 {device}..."):        if model_type == "text":            tokenizer, model = load_text_model(model_path, device)            st.session_state.loaded_model["tokenizer/processor"] = tokenizer            st.session_state.loaded_model["model"] = model        elif model_type == "vl":            processor, model = load_vl_model(model_path, device)            st.session_state.loaded_model["tokenizer/processor"] = processor            st.session_state.loaded_model["model"] = model        elif model_type == "gen":            model = load_gen_model(model_path, device)            st.session_state.loaded_model["model"] = model            st.session_state.loaded_model["tokenizer/processor"] = None    st.session_state.loaded_model["name"] = model_type    st.session_state.loaded_model["device"] = device# -------------------- 联网搜索功能 --------------------defperform_web_search(query_string):    ifnot SEARCH_AVAILABLE:        returnNone    try:        center = knowledge_center.Center()        results = center.get_response(            query_string,            True,            st.session_state.search_mode,            model=st.session_state.search_api_model,            base_url=st.session_state.search_api_base,            key=st.session_state.search_api_key,            max_web_results=st.session_state.search_max_results        )        if results andisinstance(results, str) and results.strip():            return results.strip()        elif results andisinstance(results, dict):            return json.dumps(results, ensure_ascii=False, indent=2)        elif results andisinstance(results, list):            return"\n\n".join([str(item) for item in results])        else:            returnstr(results) if results elseNone    except Exception as e:        returnf"[搜索异常: {str(e)}]"defextract_search_info(raw_results):    ifnot raw_results:        return""    text = str(raw_results)    text = re.sub(r'<[^>]+>', '', text)    text = re.sub(r'\n{3,}', '\n\n', text)    text = text.strip()    max_chars = 4000    iflen(text) > max_chars:        text = text[:max_chars] + "\n\n...（搜索结果已截断）"    return text# -------------------- 搜索关键词提取（备用，不依赖LLM） --------------------deffallback_extract_keywords(user_text):    stop_words = {        "帮我", "帮忙", "请", "请问", "你好", "可以", "能不能", "能否",        "我想", "我要", "我想要", "麻烦", "一下", "吗", "呢", "吧",        "的", "了", "在", "是", "有", "和", "与", "或", "也", "都",        "把", "被", "让", "给", "对", "从", "到", "以", "为",        "什么", "怎么", "怎样", "如何", "哪里", "哪个", "谁",        "这个", "那个", "这些", "那些", "它", "他", "她",        "搜索", "搜一下", "查一下", "查询", "检索", "帮我查", "帮我搜",        "上网", "联网", "查找", "找一下", "看看",        "告诉我", "说说", "介绍", "整理", "总结", "列出",    }    text = user_text.strip()    text = re.sub(r'[，。！？、；：""''（）【】《》\s]+', ' ', text)    words = text.split()    keywords = []    for word in words:        word = word.strip()        ifnot word:            continue        if word in stop_words:            continue        iflen(word) == 1andnot word.isdigit():            continue        keywords.append(word)    ifnot keywords:        keywords = [w.strip() for w in user_text.split() if w.strip()]    ifnot keywords:        keywords = [user_text.strip()]    return keywords# -------------------- 智能意图识别（调用语言模型） --------------------INTENT_SYSTEM_PROMPT = """你是一个意图分类器。用户会给你一段话，你需要判断用户的意图属于以下四种之一：1. "generate" — 用户希望生成、绘制、创作图片/图像。包括但不限于：   - 明确要求画画、生成图片（如"画一只猫"、"生成一张风景图"）   - 要求为某内容配图、插图（如"给这首诗配几幅图"、"帮这段话配一张图"）   - 要求创作视觉内容（如"设计一个海报"、"做一张壁纸"）   - 任何需要产出图像作为结果的请求2. "understand" — 用户上传了图片并希望理解、分析、描述图片内容。或者用户在对话中引用之前的图片要求进一步解读。3. "search" — 用户的问题需要联网搜索才能准确回答。包括但不限于：   - 询问最新新闻、时事、实时信息   - 需要查询实时数据（天气、股价、赛事比分、汇率）   - 询问你不确定或可能过时的事实性问题   - 用户明确要求搜索或查询   - 涉及具体日期、地点的近期事件4. "text" — 纯文本对话，包括问答、写作、翻译、代码、闲聊等。当问题可以用已有知识准确回答时选此项。请只回复一个JSON对象，格式如下，不要有任何其他内容：当需要生成图片时：{"intent": "generate", "prompts": ["英文提示词1", "英文提示词2", ...], "titles": ["图片标题1", "图片标题2", ...]}当需要联网搜索时：{"intent": "search", "keywords": ["关键词1", "关键词2", "关键词3", ...]}当为纯文本对话时：{"intent": "text"}当为图像理解时：{"intent": "understand"}关键规则（图片生成）：- "prompts" 数组中每个元素是适合AI绘画模型的英文提示词，描述具体场景、物体、光线、色彩、风格。- "titles" 数组中每个元素是对应图片的中文标题，与 prompts 一一对应。【titles 标题规则——非常重要】：- 如果用户要求为古诗、诗词、歌词、名句等配图，titles 必须使用原文诗句/歌词/名句本身作为标题，每张图对应一句原文。例如用户说"给静夜思配图"，titles 应为 ["床前明月光", "疑是地上霜", "举头望明月", "低头思故乡"]，而不是"静夜思意境图一"之类的描述。- 如果用户要求为某段文字、故事、场景配图，titles 应摘取或概括该文字中最核心的短句作为标题。- 如果用户只是要求画某样东西（如"画一只猫"），titles 使用简洁描述即可（如"慵懒的猫咪"）。- titles 应简短凝练，每个不超过15字。- 如果用户要求多张图，prompts 数组里应包含多个不同提示词，每个描述不同场景。最多6张，"几幅"理解为3-4幅。关键规则（联网搜索）：- "keywords" 是一个数组，包含从用户问题中提取的核心搜索关键词。- 关键词提取原则：  1. 去除所有语气词、助词、连接词（如"帮我"、"请"、"一下"、"的"、"吗"等）  2. 去除动作指令词（如"搜索"、"查一下"、"告诉我"、"整理"、"总结"等）  3. 保留实义名词、时间词、地点词、人名、专有名词、数字等核心信息词  4. 如果涉及时间，自动补充具体年份（如"今年" → "2026年"，"最近" → "2026"）  5. 每个关键词应是一个独立的信息单元，不要过长  6. 一般提取3-6个关键词- 示例：  - "帮我查一下今年植树节在哪里举办的，整理一下对应新闻稿" → ["2026年", "植树节", "举办地点", "新闻"]  - "最近英伟达的股票表现怎么样" → ["英伟达", "股票", "走势", "2026"]  - "苹果公司最新发布了什么产品" → ["苹果公司", "最新", "发布", "产品", "2026"]- 只输出JSON，不要有任何解释文字。"""defdetect_intent_with_llm(user_text, has_image, conversation_history):    if has_image:        return"understand", [], [], []    ensure_model_loaded("text")    tokenizer = st.session_state.loaded_model["tokenizer/processor"]    model = st.session_state.loaded_model["model"]    device = st.session_state.loaded_model["device"]    recent_context = ""    recent_msgs = conversation_history[-6:] iflen(conversation_history) > 6else conversation_history    for msg in recent_msgs:        role_label = "用户"if msg["role"] == "user"else"助手"        if msg["type"] == "text":            recent_context += f"{role_label}: {msg['content']}\n"        elif msg["type"] == "multimodal":            content = msg["content"]            text_part = content.get("text", "")            has_img = "images"in content andlen(content["images"]) > 0            if has_img:                recent_context += f"{role_label}: [附带图片] {text_part}\n"            else:                recent_context += f"{role_label}: {text_part}\n"        elif msg["type"] == "image":            content = msg["content"]            text_part = content.get("text", "")            recent_context += f"{role_label}: [生成了图片] {text_part}\n"        elif msg["type"] == "search":            content = msg["content"]            text_part = content.get("text", "")            recent_context += f"{role_label}: [搜索回答] {text_part[:100]}\n"    current_time = datetime.now().strftime("%Y年%m月%d日 %H:%M")    classify_messages = [        {"role": "system", "content": INTENT_SYSTEM_PROMPT},        {"role": "user", "content": f"当前时间：{current_time}\n\n对话上下文：\n{recent_context}\n\n当前用户输入：{user_text}\n\n请判断意图并输出JSON："}    ]    text = tokenizer.apply_chat_template(        classify_messages,        tokenize=False,        add_generation_prompt=True,        enable_thinking=False    )    inputs = tokenizer(text, return_tensors="pt").to(device)    with torch.no_grad():        outputs = model.generate(            **inputs,            max_new_tokens=600,            temperature=0.1,            top_p=0.9,            do_sample=True,        )    input_len = inputs["input_ids"].shape[1]    generated_ids = outputs[0][input_len:]    response_text = tokenizer.decode(generated_ids, skip_special_tokens=True).strip()    del inputs, outputs, generated_ids    clear_gpu_memory()    intent, prompts_list, titles_list, search_keywords = parse_intent_response(response_text)    if intent == "search"and (not st.session_state.search_enabled ornot SEARCH_AVAILABLE):        intent = "text"        search_keywords = []    return intent, prompts_list, titles_list, search_keywordsdefparse_intent_response(response_text):    json_match = re.search(r'\{.*\}', response_text, re.DOTALL)    if json_match:        try:            result = json.loads(json_match.group())            intent = result.get("intent", "text")            if intent notin ("generate", "understand", "text", "search"):                intent = "text"            if intent == "generate":                prompts = result.get("prompts", [])                titles = result.get("titles", [])                ifisinstance(prompts, str):                    prompts = [prompts]                ifisinstance(titles, str):                    titles = [titles]                old_prompt = result.get("prompt", None)                ifnot prompts and old_prompt:                    prompts = [old_prompt]                prompts = [p for p in prompts ifisinstance(p, str) and p.strip()]                iflen(prompts) > 6:                    prompts = prompts[:6]                    titles = titles[:6]                whilelen(titles) < len(prompts):                    titles.append(f"图片 {len(titles) + 1}")                return intent, prompts, titles, []            if intent == "search":                keywords = result.get("keywords", [])                ifisinstance(keywords, str):                    keywords = [k.strip() for k in keywords.split() if k.strip()]                old_query = result.get("query", None)                ifnot keywords and old_query:                    keywords = [k.strip() for k in old_query.split() if k.strip()]                keywords = [k for k in keywords ifisinstance(k, str) and k.strip()]                return intent, [], [], keywords            return intent, [], [], []        except json.JSONDecodeError:            pass    text_lower = response_text.lower()    if"generate"in text_lower:        return"generate", [], [], []    elif"understand"in text_lower:        return"understand", [], [], []    elif"search"in text_lower:        return"search", [], [], []    else:        return"text", [], [], []deffallback_detect_intent(user_text):    text_lower = user_text.lower().strip()    generate_patterns = [        r"(帮我|请|给我|我想|我要|能不能|可以).{0,15}(画|绘制|生成|创建|制作|配).{0,15}(图|图片|图像|照片|壁纸|头像|海报|插画|漫画)",        r"(画|绘制|生成|创建|制作)一?(张|幅|个|副|组)?.{0,20}(图|图片|图像|照片|壁纸|头像|海报|插画|漫画)",        r"(draw|paint|generate|create|make)\s+(a|an|the|me)?\s*(image|picture|photo|illustration|poster|avatar|wallpaper)",        r"(配|搭配|添加|加上).{0,10}(图|图片|插图|插画|配图)",        r"(想要|需要|来).{0,10}(图|图片|图像)",    ]    for pattern in generate_patterns:        if re.search(pattern, text_lower):            return"generate", [], [], []    search_patterns = [        r"(搜索|搜一下|查一下|查询|检索|帮我查|帮我搜|上网|联网)",        r"(最新|最近|今天|昨天|今年|本月|这周|刚刚|实时|当前).{0,10}(新闻|消息|动态|情况|进展|数据|信息|价格|天气|比分)",        r"(现在|目前|当下).{0,10}(是谁|多少|怎样|如何|什么)",        r"(news|latest|today|current|recent|search|look up|find out)",    ]    for pattern in search_patterns:        if re.search(pattern, text_lower):            if st.session_state.search_enabled and SEARCH_AVAILABLE:                keywords = fallback_extract_keywords(user_text)                return"search", [], [], keywords            else:                return"text", [], [], []    return"text", [], [], []# -------------------- 为图像生成构造英文提示词（备用/补充） --------------------defgenerate_image_prompts_with_llm(user_text, count=1):    ensure_model_loaded("text")    tokenizer = st.session_state.loaded_model["tokenizer/processor"]    model = st.session_state.loaded_model["model"]    device = st.session_state.loaded_model["device"]    if count <= 1:        system_content = """你是一个AI绘画提示词专家。用户会给你一段描述，你需要将其转换为高质量英文AI绘画提示词。请只输出一个JSON对象：{"prompts": ["英文prompt"], "titles": ["标题"]}【titles 标题规则】：- 如果用户要求为古诗、诗词、歌词、名句配图，titles 必须使用原文诗句本身，例如"床前明月光"而不是"月光图"- 如果是为故事/文字配图，titles 使用文中核心短句- 如果是自由创作，titles 使用简洁描述不要有任何其他文字。"""    else:        system_content = f"""你是一个AI绘画提示词专家。用户会给你一段描述，你需要根据描述生成{count}段不同角度/场景的高质量英文AI绘画提示词。请只输出一个JSON对象：{{"prompts": ["英文prompt1", ...], "titles": ["标题1", ...]}}【titles 标题规则——非常重要】：- 如果用户要求为古诗、诗词、歌词、名句配图，titles 必须使用原文诗句/歌词本身作为标题，每张图对应一句原文。例如用户说"给静夜思配图"，titles 应为 ["床前明月光", "疑是地上霜", "举头望明月", "低头思故乡"]- 如果是为故事/文字配图，titles 使用文中核心短句- 如果是自由创作，titles 使用简洁描述- titles 每个不超过15字不要有任何其他文字。"""    prompt_messages = [        {"role": "system", "content": system_content},        {"role": "user", "content": user_text}    ]    text = tokenizer.apply_chat_template(        prompt_messages,        tokenize=False,        add_generation_prompt=True,        enable_thinking=False    )    inputs = tokenizer(text, return_tensors="pt").to(device)    with torch.no_grad():        outputs = model.generate(            **inputs,            max_new_tokens=500,            temperature=0.7,            top_p=0.9,            do_sample=True,        )    input_len = inputs["input_ids"].shape[1]    generated_ids = outputs[0][input_len:]    response_text = tokenizer.decode(generated_ids, skip_special_tokens=True).strip()    del inputs, outputs, generated_ids    clear_gpu_memory()    json_match = re.search(r'\{.*\}', response_text, re.DOTALL)    if json_match:        try:            result = json.loads(json_match.group())            prompts = result.get("prompts", [])            titles = result.get("titles", [])            ifisinstance(prompts, str):                prompts = [prompts]            ifisinstance(titles, str):                titles = [titles]            prompts = [p for p in prompts ifisinstance(p, str) and p.strip()]            whilelen(titles) < len(prompts):                titles.append(f"图片 {len(titles) + 1}")            if prompts:                return prompts, titles        except json.JSONDecodeError:            pass    cleaned = response_text.strip('"').strip("'").strip()    return [cleaned] if cleaned else [user_text], ["图片 1"]defestimate_image_count(user_text):    num_map = {        "一": 1, "二": 2, "两": 2, "三": 3, "四": 4,        "五": 5, "六": 6, "七": 7, "八": 8, "九": 9, "十": 10,    }    match = re.search(r'(\d+)\s*[张幅个副组]', user_text)    ifmatch:        returnmin(int(match.group(1)), 6)    for cn_char, num in num_map.items():        pattern = cn_char + r'\s*[张幅个副组]'        if re.search(pattern, user_text):            returnmin(num, 6)    multi_keywords = ["几幅", "几张", "几个", "几副", "多张", "多幅", "一些", "一组", "一批", "一套", "系列"]    for kw in multi_keywords:        if kw in user_text:            return4    return1# -------------------- 搜索结果 + 语言模型整合回答 --------------------defgenerate_search_response_stream(user_text, search_results, messages, max_tokens, temperature, top_p, top_k, enable_thinking):    ensure_model_loaded("text")    tokenizer = st.session_state.loaded_model["tokenizer/processor"]    model = st.session_state.loaded_model["model"]    device = st.session_state.loaded_model["device"]    text_messages = []    memory_context = load_memory_context(max_entries=10)    system_content = "你是一个智能助手，擅长根据联网搜索结果来回答用户问题。"    if memory_context:        system_content += f"\n\n{memory_context}"    system_content += f"\n\n以下是联网搜索获取的参考资料：\n\n{search_results}\n\n请根据以上参考资料，结合你自己的知识，准确、详细地回答用户的问题。要求：\n1. 优先使用搜索结果中的最新信息\n2. 如果搜索结果包含具体的时间、地点、人物等，请准确引用\n3. 对信息进行整理和总结，以清晰易读的方式呈现\n4. 如果搜索结果不足以完整回答问题，可以补充你已有的知识，但需注明\n5. 用中文回答（除非用户用其他语言提问）"    text_messages.append({"role": "system", "content": system_content})    truncated = truncate_messages_for_search(messages[:-1])    for msg in truncated:        if msg["type"] == "text":            text_messages.append({"role": msg["role"], "content": msg["content"]})        elif msg["type"] == "multimodal":            content = msg["content"]            if"text"in content and content["text"]:                text_messages.append({"role": msg["role"], "content": content["text"]})        elif msg["type"] == "image":            content = msg["content"]            if"text"in content and content["text"]:                text_messages.append({"role": msg["role"], "content": content["text"]})        elif msg["type"] == "search":            content = msg["content"]            if"text"in content and content["text"]:                text_messages.append({"role": msg["role"], "content": content["text"]})    text_messages.append({"role": "user", "content": user_text})    text = tokenizer.apply_chat_template(        text_messages,        tokenize=False,        add_generation_prompt=True,        enable_thinking=enable_thinking    )    inputs = tokenizer(text, return_tensors="pt").to(device)    streamer = TextIteratorStreamer(tokenizer, skip_prompt=True, skip_special_tokens=True)    generate_kwargs = {        **inputs,        "max_new_tokens": max_tokens,        "temperature": temperature if temperature > 0else1.0,        "top_p": top_p,        "top_k": top_k,        "do_sample": temperature > 0,        "streamer": streamer,    }    def_generate():        with torch.no_grad():            model.generate(**generate_kwargs)    thread = Thread(target=_generate)    thread.start()    for new_text in streamer:        yield new_text    thread.join()    del inputs    clear_gpu_memory()# -------------------- 图片显示尺寸计算 --------------------defget_display_width(num_images):    if num_images == 1:        return480    elif num_images == 2:        return400    else:        return340defdisplay_images_grid(images, titles, num_images):    display_w = get_display_width(num_images)    if num_images == 1:        if titles:            st.markdown(f"**{titles[0]}**")        st.image(images[0], width=display_w)    elif num_images == 2:        cols = st.columns(2)        for idx, img inenumerate(images):            with cols[idx]:                if idx < len(titles):                    st.markdown(f"**{titles[idx]}**")                st.image(img, width=display_w)    else:        cols_per_row = 3if num_images >= 3else num_images        for row_start inrange(0, num_images, cols_per_row):            row_imgs = images[row_start:row_start + cols_per_row]            row_titles = titles[row_start:row_start + cols_per_row] if titles else []            cols = st.columns(len(row_imgs))            for idx, img inenumerate(row_imgs):                with cols[idx]:                    if idx < len(row_titles):                        st.markdown(f"**{row_titles[idx]}**")                    st.image(img, width=display_w)# -------------------- 侧边栏 --------------------with st.sidebar:    st.title("⚙️ 设置")    st.subheader("🎛️ 生成参数")    max_tokens = st.slider("最大生成令牌数", 256, 8192, 2048)    temperature = st.slider("温度", 0.0, 2.0, 0.7, step=0.1)    top_p = st.slider("Top P", 0.0, 1.0, 0.9, step=0.05)    top_k = st.slider("Top K", 1, 100, 50)    enable_thinking = st.checkbox("启用思考模式（仅文本模型）", value=False)    st.divider()    st.subheader("🖼️ 图像生成设置")    st.selectbox(        "图片宽高比",        list(ASPECT_RATIO_OPTIONS.keys()),        key="aspect_ratio",        help="选择生成图片的宽高比，图片始终以最高质量生成"    )    selected_ratio = ASPECT_RATIO_OPTIONS[st.session_state.aspect_ratio]    st.caption(f"生成分辨率: {selected_ratio[0]} × {selected_ratio[1]} 像素")    st.divider()    st.subheader("🔍 联网搜索设置")    ifnot SEARCH_AVAILABLE:        st.warning("⚠️ FreeKnowledge_AI 未安装，搜索功能不可用。\n请运行: `pip install FreeKnowledge_AI`")    with st.expander("搜索高级设置", expanded=False):        st.selectbox("搜索引擎", ["BAIDU", "DUCKDUCKGO"], key="search_mode")        st.number_input("最大搜索结果数", min_value=1, max_value=20, value=5, key="search_max_results")        st.text_input("搜索API模型", key="search_api_model")        st.text_input("搜索API地址", key="search_api_base")        st.text_input("搜索API密钥", key="search_api_key", type="password")    st.divider()    st.subheader("🧠 记忆管理")    memory_stats = get_memory_stats()    if memory_stats["exists"]:        st.markdown(f"📝 记忆条目: **{memory_stats['entries']}** 条")        st.markdown(f"💾 文件大小: **{memory_stats['size_kb']}** KB")    else:        st.markdown("📝 暂无记忆记录")    col_mem1, col_mem2 = st.columns(2)    with col_mem1:        if st.button("📥 导出记忆", use_container_width=True):            if os.path.exists(MEMORY_FILE):                withopen(MEMORY_FILE, "r", encoding="utf-8") as f:                    memory_content = f.read()                st.download_button(                    label="下载 memory.md",                    data=memory_content,                    file_name="memory.md",                    mime="text/markdown",                    use_container_width=True                )    with col_mem2:        if st.button("🗑️ 清除记忆", use_container_width=True):            if os.path.exists(MEMORY_FILE):                os.remove(MEMORY_FILE)            st.success("记忆已清除")            st.rerun()    st.divider()    uploaded_file = st.file_uploader("📤 上传图片（用于图像理解）", type=["png", "jpg", "jpeg"])    if uploaded_file isnotNone:        st.session_state.uploaded_image = Image.open(io.BytesIO(uploaded_file.getvalue())).convert("RGB")        st.success("图片已暂存")    if st.button("🗑️ 清除对话历史", type="secondary", use_container_width=True):        st.session_state.messages = []        st.session_state.uploaded_image = None        st.rerun()    st.divider()    st.subheader("🔧 显存管理")    if torch.cuda.is_available():        for i inrange(torch.cuda.device_count()):            total = torch.cuda.get_device_properties(i).total_memory / (1024 ** 3)            allocated = torch.cuda.memory_allocated(i) / (1024 ** 3)            reserved = torch.cuda.memory_reserved(i) / (1024 ** 3)            st.caption(f"GPU {i}: 已分配 {allocated:.1f}G / 已预留 {reserved:.1f}G / 总计 {total:.1f}G")    else:        st.caption("未检测到 CUDA GPU")    if st.button("🧹 手动清理显存", use_container_width=True):        clear_gpu_memory()        st.success("显存缓存已清理")# -------------------- 主界面 --------------------st.title("🤖 多模态对话助手（多卡优化版）")st.markdown("支持文本对话、多图生成、图像理解、联网搜索 | 自动管理显存 | 多卡分配 | 上下文记忆 | LLM智能意图识别")st.markdown("---")for msg in st.session_state.messages:    with st.chat_message(msg["role"]):        if msg["type"] == "text":            st.markdown(msg["content"])        elif msg["type"] == "multimodal":            content = msg["content"]            if"text"in content and content["text"]:                st.markdown(content["text"])            if"images"in content:                for img in content["images"]:                    st.image(img, width=400)        elif msg["type"] == "image":            content = msg["content"]            if"text"in content:                st.markdown(content["text"])            if"images"in content:                img_list = content["images"]                titles = content.get("titles", [])                display_images_grid(img_list, titles, len(img_list))        elif msg["type"] == "search":            content = msg["content"]            if"search_keywords"in content and content["search_keywords"]:                kw_tags = " ".join([f"`{k}`"for k in content["search_keywords"]])                st.caption(f"🔍 搜索关键词: {kw_tags}")            if"text"in content and content["text"]:                st.markdown(content["text"])# -------------------- 文本对话流式生成 --------------------defgenerate_text_stream(messages, max_tokens, temperature, top_p, top_k, enable_thinking):    ensure_model_loaded("text")    tokenizer = st.session_state.loaded_model["tokenizer/processor"]    model = st.session_state.loaded_model["model"]    device = st.session_state.loaded_model["device"]    text_messages = []    memory_context = load_memory_context(max_entries=15)    if memory_context:        text_messages.append({"role": "system", "content": memory_context})    truncated = truncate_messages_for_text(messages)    for msg in truncated:        if msg["type"] == "text":            text_messages.append({"role": msg["role"], "content": msg["content"]})        elif msg["type"] == "multimodal":            content = msg["content"]            if"text"in content and content["text"]:                text_messages.append({"role": msg["role"], "content": content["text"]})        elif msg["type"] == "image":            content = msg["content"]            if"text"in content and content["text"]:                text_messages.append({"role": msg["role"], "content": content["text"]})        elif msg["type"] == "search":            content = msg["content"]            if"text"in content and content["text"]:                text_messages.append({"role": msg["role"], "content": content["text"]})    text = tokenizer.apply_chat_template(        text_messages,        tokenize=False,        add_generation_prompt=True,        enable_thinking=enable_thinking    )    inputs = tokenizer(text, return_tensors="pt").to(device)    streamer = TextIteratorStreamer(tokenizer, skip_prompt=True, skip_special_tokens=True)    generate_kwargs = {        **inputs,        "max_new_tokens": max_tokens,        "temperature": temperature if temperature > 0else1.0,        "top_p": top_p,        "top_k": top_k,        "do_sample": temperature > 0,        "streamer": streamer,    }    def_generate():        with torch.no_grad():            model.generate(**generate_kwargs)    thread = Thread(target=_generate)    thread.start()    for new_text in streamer:        yield new_text    thread.join()    del inputs    clear_gpu_memory()# -------------------- 图像理解流式生成 --------------------defgenerate_vl_stream(messages, max_tokens, temperature, top_p, top_k):    ensure_model_loaded("vl")    processor = st.session_state.loaded_model["tokenizer/processor"]    model = st.session_state.loaded_model["model"]    device = st.session_state.loaded_model["device"]    vl_messages = []    images = []    memory_context = load_memory_context(max_entries=10)    if memory_context:        vl_messages.append({"role": "system", "content": memory_context})    truncated = truncate_messages_for_vl(messages)    for msg in truncated:        if msg["type"] == "text":            vl_messages.append({"role": msg["role"], "content": msg["content"]})        elif msg["type"] == "multimodal":            content = msg["content"]            content_list = []            if"images"in content:                for img in content["images"]:                    content_list.append({"type": "image", "image": img})                    images.append(img)            if"text"in content and content["text"]:                content_list.append({"type": "text", "text": content["text"]})            vl_messages.append({"role": msg["role"], "content": content_list})        elif msg["type"] == "image":            content = msg["content"]            content_list = []            if"images"in content:                for img in content["images"]:                    content_list.append({"type": "image", "image": img})                    images.append(img)            if"text"in content and content["text"]:                content_list.append({"type": "text", "text": content["text"]})            vl_messages.append({"role": msg["role"], "content": content_list})    prompt = processor.apply_chat_template(vl_messages, add_generation_prompt=True, tokenize=False)    inputs = processor(text=prompt, images=images if images elseNone, return_tensors="pt", padding=True)    inputs = {k: v.to(device) for k, v in inputs.items()}    streamer = TextIteratorStreamer(processor, skip_prompt=True, skip_special_tokens=True)    generate_kwargs = {        **inputs,        "max_new_tokens": max_tokens,        "temperature": temperature if temperature > 0else1.0,        "top_p": top_p,        "top_k": top_k,        "do_sample": temperature > 0,        "streamer": streamer,    }    def_generate():        with torch.no_grad():            model.generate(**generate_kwargs)    thread = Thread(target=_generate)    thread.start()    for new_text in streamer:        yield new_text    thread.join()    del inputs, images    clear_gpu_memory()# -------------------- 图像生成（单张，支持自定义宽高） --------------------defgenerate_image(prompt, seed=None, num_inference_steps=9, guidance_scale=0.0, img_width=1024, img_height=1024):    ensure_model_loaded("gen")    pipe = st.session_state.loaded_model["model"]    device = st.session_state.loaded_model["device"]    if seed isNone:        seed = random.randint(0, 2147483647)    generator = torch.Generator(device=device).manual_seed(seed)    with torch.no_grad():        image = pipe(            prompt=prompt,            height=img_height,            width=img_width,            num_inference_steps=num_inference_steps,            guidance_scale=guidance_scale,            generator=generator,        ).images[0]    return image# -------------------- 处理用户输入 --------------------if prompt := st.chat_input("输入您的问题或描述..."):    has_image = st.session_state.uploaded_image isnotNone    user_image = st.session_state.uploaded_image    st.session_state.uploaded_image = None    if has_image:        user_msg = {            "role": "user",            "type": "multimodal",            "content": {"text": prompt, "images": [user_image]}        }        with st.chat_message("user"):            if prompt:                st.markdown(prompt)            st.image(user_image, width=400)    else:        user_msg = {"role": "user", "type": "text", "content": prompt}        with st.chat_message("user"):            st.markdown(prompt)    st.session_state.messages.append(user_msg)    with st.chat_message("assistant"):        with st.spinner("🧠 正在分析意图..."):            try:                intent, prompts_list, titles_list, search_keywords = detect_intent_with_llm(                    prompt, has_image, st.session_state.messages                )            except Exception as e:                st.warning(f"LLM意图识别异常，启用备用规则: {str(e)}")                intent, prompts_list, titles_list, search_keywords = fallback_detect_intent(prompt)        intent_labels = {            "text": "💬 文本对话",            "generate": "🎨 图像生成",            "understand": "👁️ 图像理解",            "search": "🔍 联网搜索"        }        st.caption(f"识别意图: {intent_labels.get(intent, intent)}")        save_to_memory("user", prompt, intent=intent, has_image=has_image)        try:            if intent == "search":                ifnot search_keywords:                    search_keywords = fallback_extract_keywords(prompt)                search_query_string = " ".join(search_keywords)                kw_display = " ".join([f"`{k}`"for k in search_keywords])                st.info(f"🔍 提取关键词: {kw_display}\n\n📡 搜索语句: **{search_query_string}**")                with st.spinner("🌐 正在联网搜索，请稍候..."):                    raw_results = perform_web_search(search_query_string)                if raw_results:                    clean_results = extract_search_info(raw_results)                    with st.expander("📄 查看搜索原始结果", expanded=False):                        st.text(clean_results[:2000] + ("..."iflen(clean_results) > 2000else""))                    st.markdown("---")                    st.markdown("**📝 基于搜索结果生成回答：**")                    response = st.write_stream(                        generate_search_response_stream(                            prompt,                            clean_results,                            st.session_state.messages,                            max_tokens,                            temperature,                            top_p,                            top_k,                            enable_thinking                        )                    )                    assistant_msg = {                        "role": "assistant",                        "type": "search",                        "content": {                            "text": response,                            "search_keywords": search_keywords,                            "search_query": search_query_string,                        }                    }                    st.session_state.messages.append(assistant_msg)                    save_to_memory("assistant", f"[搜索: {search_query_string}]\n{response}")                else:                    st.warning("搜索未返回有效结果，将使用本地模型直接回答。")                    response = st.write_stream(                        generate_text_stream(                            st.session_state.messages,                            max_tokens,                            temperature,                            top_p,                            top_k,                            enable_thinking                        )                    )                    assistant_msg = {"role": "assistant", "type": "text", "content": response}                    st.session_state.messages.append(assistant_msg)                    save_to_memory("assistant", response)            elif intent == "generate":                ifnot prompts_list:                    estimated_count = estimate_image_count(prompt)                    with st.spinner(f"🎨 正在生成 {estimated_count} 条绘画提示词..."):                        prompts_list, titles_list = generate_image_prompts_with_llm(prompt, count=estimated_count)                num_images = len(prompts_list)                gen_width, gen_height = ASPECT_RATIO_OPTIONS[st.session_state.aspect_ratio]                st.markdown(f"📋 将生成 **{num_images}** 张图片（{st.session_state.aspect_ratio}，{gen_width}×{gen_height}）：")                for i, (title, p) inenumerate(zip(titles_list, prompts_list)):                    st.caption(f"  {i + 1}. **{title}** — {p}")                generated_images = []                progress_bar = st.progress(0, text="正在生成图片...")                for i, img_prompt inenumerate(prompts_list):                    progress_bar.progress(                        (i) / num_images,                        text=f"🖼️ 正在生成第 {i + 1}/{num_images} 张: {titles_list[i] if i < len(titles_list) else ''}"                    )                    seed = random.randint(0, 2147483647)                    img = generate_image(                        img_prompt,                        seed=seed,                        img_width=gen_width,                        img_height=gen_height                    )                    generated_images.append(img)                    clear_gpu_memory()                progress_bar.progress(1.0, text=f"✅ 全部 {num_images} 张图片生成完成！")                display_images_grid(generated_images, titles_list, num_images)                prompts_summary = " | ".join([f"{t}: {p}"for t, p inzip(titles_list, prompts_list)])                response_text = f"根据「{prompt}」生成了 {num_images} 张图片（{prompts_summary}）"                assistant_msg = {                    "role": "assistant",                    "type": "image",                    "content": {                        "text": response_text,                        "images": generated_images,                        "titles": titles_list                    }                }                st.session_state.messages.append(assistant_msg)                save_to_memory("assistant", response_text)            elif intent == "understand":                response = st.write_stream(                    generate_vl_stream(                        st.session_state.messages,                        max_tokens,                        temperature,                        top_p,                        top_k                    )                )                assistant_msg = {"role": "assistant", "type": "text", "content": response}                st.session_state.messages.append(assistant_msg)                save_to_memory("assistant", response)            else:                response = st.write_stream(                    generate_text_stream(                        st.session_state.messages,                        max_tokens,                        temperature,                        top_p,                        top_k,                        enable_thinking                    )                )                assistant_msg = {"role": "assistant", "type": "text", "content": response}                st.session_state.messages.append(assistant_msg)                save_to_memory("assistant", response)        except Exception as e:            st.error(f"发生错误: {str(e)}")            unload_current_model()# -------------------- 底部信息 --------------------st.markdown("---")memory_stats = get_memory_stats()memory_info = f"记忆: {memory_stats['entries']} 条"if memory_stats["exists"] else"记忆: 无"search_status = "搜索: ✅"if (st.session_state.search_enabled and SEARCH_AVAILABLE) else"搜索: ❌"st.markdown(    f"""    <div style='text-align: center; color: gray; font-size: 0.85rem;'>        <p>多模态对话助手 | 多卡优化 | 显存自动管理 | LLM智能意图识别 | 多图生成 | 关键词搜索 | 📝 {memory_info} | {search_status}</p>    </div>    """,    unsafe_allow_html=True)

准备模型

我们的模型从modelscope上下载。 1、Qwen3.5-9B模型下载地址

https://modelscope.cn/models/Qwen/Qwen3.5-9B

下载方法

git clone https://www.modelscope.cn/Qwen/Qwen3.5-9B.git

2、Z-IMAGE-Turbo模型下载地址

https://modelscope.cn/models/Tongyi-MAI/Z-Image-Turbo

下载方法

git clone https://www.modelscope.cn/Tongyi-MAI/Z-Image-Turbo.git

运行环境

nvcc: NVIDIA (R) Cuda compiler driverCopyright (c) 2005-2024 NVIDIA CorporationBuilt on Fri_Jun_14_16:44:19_Pacific_Daylight_Time_2024Cuda compilation tools, release 12.6, V12.6.20Build cuda_12.6.r12.6/compiler.34431801_0
``````plaintext
Package                   Version------------------------- ------------accelerate                1.13.0addict                    2.4.0altair                    6.0.0annotated-doc             0.0.4annotated-types           0.7.0anyio                     4.12.1attrs                     25.4.0beautifulsoup4            4.14.3blinker                   1.9.0cachetools                7.0.5certifi                   2026.2.25charset-normalizer        3.4.5click                     8.3.1colorama                  0.4.6contourpy                 1.3.3cycler                    0.12.1diffusers                 0.37.0distro                    1.9.0einops                    0.8.2filelock                  3.25.2fonttools                 4.62.1FreeKnowledge_AI          0.3.1fsspec                    2026.2.0gitdb                     4.0.12GitPython                 3.1.46h11                       0.16.0hf-xet                    1.4.2httpcore                  1.0.9httpx                     0.28.1huggingface_hub           1.7.1idna                      3.11importlib_metadata        8.7.1jieba                     0.42.1Jinja2                    3.1.6jiter                     0.13.0jsonschema                4.26.0jsonschema-specifications 2025.9.1kiwisolver                1.5.0markdown-it-py            4.0.0MarkupSafe                3.0.3matplotlib                3.10.8mdurl                     0.1.2modelscope                1.35.0mpmath                    1.3.0narwhals                  2.18.0networkx                  3.6.1numpy                     1.26.4openai                    2.28.0packaging                 26.0pandas                    2.3.3pillow                    12.1.1pip                       26.0.1protobuf                  6.33.5psutil                    7.2.2pyarrow                   23.0.1pydantic                  2.12.5pydantic_core             2.41.5pydeck                    0.9.1Pygments                  2.19.2pyparsing                 3.3.2python-dateutil           2.9.0.post0pytz                      2026.1.post1PyYAML                    6.0.3referencing               0.37.0regex                     2026.2.28requests                  2.32.5rich                      14.3.3rpds-py                   0.30.0safetensors               0.7.0setuptools                65.5.0shellingham               1.5.4six                       1.17.0smmap                     5.0.3sniffio                   1.3.1soupsieve                 2.8.3streamlit                 1.55.0sympy                     1.13.1tenacity                  9.1.4tokenizers                0.22.2toml                      0.10.2torch                     2.6.0+cu126torchvision               0.21.0+cu126tornado                   6.5.5tqdm                      4.67.3transformers              5.3.0typer                     0.24.1typing_extensions         4.15.0typing-inspection         0.4.2tzdata                    2025.3urllib3                   2.6.3watchdog                  6.0.0zipp                      3.23.0

搜索配置

需要在硅基流动网站实名认证一下，并在下方控制台生成免费的Key，搜索后端将默认使用上海书生浦语大模型进行处理

https://cloud.siliconflow.cn/me/account/ak

同时安装搜索包

pip install FreeKnowledge-AI

学AI大模型的正确顺序，千万不要搞错了

🤔2026年AI风口已来！各行各业的AI渗透肉眼可见，超多公司要么转型做AI相关产品，要么高薪挖AI技术人才，机遇直接摆在眼前！

有往AI方向发展，或者本身有后端编程基础的朋友，直接冲AI大模型应用开发转岗超合适！

就算暂时不打算转岗，了解大模型、RAG、Prompt、Agent这些热门概念，能上手做简单项目，也绝对是求职加分王🔋

在这里插入图片描述

📝给大家整理了超全最新的AI大模型应用开发学习清单和资料，手把手帮你快速入门！👇👇

学习路线:

✅大模型基础认知—大模型核心原理、发展历程、主流模型（GPT、文心一言等）特点解析
✅核心技术模块—RAG检索增强生成、Prompt工程实战、Agent智能体开发逻辑
✅开发基础能力—Python进阶、API接口调用、大模型开发框架（LangChain等）实操
✅应用场景开发—智能问答系统、企业知识库、AIGC内容生成工具、行业定制化大模型应用
✅项目落地流程—需求拆解、技术选型、模型调优、测试上线、运维迭代
✅面试求职冲刺—岗位JD解析、简历AI项目包装、高频面试题汇总、模拟面经

以上6大模块，看似清晰好上手，实则每个部分都有扎实的核心内容需要吃透！

我把大模型的学习全流程已经整理📚好了！抓住AI时代风口，轻松解锁职业新可能，希望大家都能把握机遇，实现薪资/职业跃迁～