Turning Novels into Video Storyboards Automatically with a Coze Workflow: How I Saved 80% of My Production Time with AI

小丶舟

小丶舟 · Published 2026-04-30 20:40:05

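Preface

Anyone who has produced video content knows that the most time-consuming step is not shooting but writing the storyboard script.

A one-minute video often needs 10-20 shots, and each shot has to specify the visual description, camera movement, dialogue, and duration. For story-driven content you also have to think about character consistency, scene transitions, and emotional pacing.

Today I'm sharing the approach I use in practice: a Coze workflow that automatically converts novel text into video storyboards. In my own tests it cut storyboard production time by 80%.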

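Core Idea

The whole pipeline has three stages:

Upstream: JSON splitting → Midstream: character/scene look design → Downstream: storyboard generation

Each stage runs independently, and every record is bound to an ID so that it can be traced through the whole chain. This is the foundation for production at scale: any stage can be optimized or replaced on its own without affecting the others.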
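
As a minimal sketch of what ID binding could look like, each stage can key its output by the scene ID assigned upstream. The record layout and the names `scenes`, `looks`, `storyboard`, and `trace` are illustrative assumptions, not the actual Coze workflow schema.

```python
# Hypothetical stage outputs, all keyed by the scene ID assigned upstream.
scenes = {"S001": {"content": "李玄站在断崖边。"}}
looks = {"S001": {"characters": ["李玄"], "location": "断崖"}}
storyboard = {"S001": {"shot": "full shot, static camera", "duration": "2-3 seconds"}}


def trace(scene_id):
    """Collect what each stage produced for one scene ID (None if missing)."""
    return {
        "id": scene_id,
        "upstream": scenes.get(scene_id),
        "midstream": looks.get(scene_id),
        "downstream": storyboard.get(scene_id),
    }


record = trace("S001")
print(record["midstream"]["characters"])
```

Because every stage only depends on the ID, one stage's output can be regenerated without touching the other two dictionaries.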

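Part 1. Upstream: JSON Splitting

1.1 Why split into JSON first

A novel is continuous text, but a video storyboard needs independent visual units. A single complete scene description may contain:

Environment description
Character actions
Dialogue
Inner thoughts

These should not all go into one shot, or the picture becomes cluttered.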

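1.2 Splitting rules

From my experience:

Keep each JSON entry to 35-100 characters
Each entry contains one complete visual image
Each entry is an independent plot unit
Start a new entry when the scene changes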
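
The length rule is the only one of these that can be checked mechanically. The sketch below flags entries outside the 35-100 character range; the function name `check_scene_lengths` and the sample data are assumptions made for illustration. Note that `len()` counts every character, punctuation included.

```python
def check_scene_lengths(scenes, min_length=35, max_length=100):
    """Return (id, length) pairs for entries outside the recommended range."""
    problems = []
    for scene in scenes:
        length = len(scene["content"])
        if length < min_length or length > max_length:
            problems.append((scene["id"], length))
    return problems


sample = [
    {"id": "S001", "content": "夜" * 60},   # within range
    {"id": "S002", "content": "短句。"},     # too short
    {"id": "S003", "content": "长" * 120},  # too long
]
print(check_scene_lengths(sample))
```

The other three rules (one visual image, one plot unit, new entry on scene change) depend on meaning and still need a human or a model to judge.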

1.3 Code implementation

import json

def split_novel_to_scenes(text, max_length=100, min_length=35):
    """
    Split novel text into independent scenes.

    Args:
        text: raw novel text
        max_length: maximum length of one entry (80-100 characters recommended)
        min_length: minimum length of one entry (35 characters recommended);
            accepted but not enforced by this implementation
    Returns:
        list: scene dicts, each with an "id" and a "content" field
    """
    # Split on line breaks and drop empty lines
    paragraphs = [p.strip() for p in text.split('\n') if p.strip()]

    scenes = []
    current_scene = ""
    scene_id = 1

    for para in paragraphs:
        # Treat a paragraph that opens with a quotation mark as dialogue
        is_dialogue = para.startswith(('"', '“', '「'))

        # Would appending this paragraph push the entry past max_length?
        too_long = len(current_scene) + len(para) > max_length

        # Close the current entry before an over-long append or before a
        # dialogue paragraph. Narration that follows a dialogue paragraph
        # may still be appended to that dialogue entry.
        if current_scene and (too_long or is_dialogue):
            scenes.append({
                "id": f"S{scene_id:03d}",
                "content": current_scene.strip()
            })
            scene_id += 1
            current_scene = ""

        current_scene += para + " "

    # Save the last entry
    if current_scene.strip():
        scenes.append({
            "id": f"S{scene_id:03d}",
            "content": current_scene.strip()
        })

    return scenes

# Usage example
novel_text = '''
夜色如墨，明月高悬。李玄站在断崖边，衣袂被山风吹得猎猎作响。
他低头看着手中的半块玉佩，那是师父临终前交给他的唯一信物。
"三年了。"他喃喃自语，"师父，徒儿终于找到了进入仙门的办法。"
身后传来轻微的脚步声。
"李玄，你果然在这里。"
'''

scenes = split_novel_to_scenes(novel_text)
print(json.dumps(scenes, ensure_ascii=False, indent=2))

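1.4 Sample output

Running the example in 1.3 prints the following. Paragraphs inside one entry are joined with a space, and the narration line that follows the first dialogue line is appended to that dialogue entry:

[
  {
    "id": "S001",
    "content": "夜色如墨，明月高悬。李玄站在断崖边，衣袂被山风吹得猎猎作响。 他低头看着手中的半块玉佩，那是师父临终前交给他的唯一信物。"
  },
  {
    "id": "S002",
    "content": "\"三年了。\"他喃喃自语，\"师父，徒儿终于找到了进入仙门的办法。\" 身后传来轻微的脚步声。"
  },
  {
    "id": "S003",
    "content": "\"李玄，你果然在这里。\""
  }
]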
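
The function in 1.3 accepts a `min_length` argument but never uses it, so short entries pass through unchanged. One possible post-processing pass is sketched below. It is an assumption about how the rule could be enforced, not part of the original workflow, and the name `merge_short_scenes` is hypothetical. It folds an entry shorter than `min_length` into the previous entry when the combined text still fits within `max_length`, then renumbers the IDs.

```python
def merge_short_scenes(scenes, min_length=35, max_length=100):
    """Merge short entries into the previous one when the result still fits."""
    merged = []
    for scene in scenes:
        text = scene["content"]
        fits_previous = (
            bool(merged)
            and len(text) < min_length
            and len(merged[-1]) + len(text) <= max_length
        )
        if fits_previous:
            merged[-1] += text
        else:
            merged.append(text)
    # Renumber so that IDs stay consecutive after merging.
    return [
        {"id": f"S{i:03d}", "content": text}
        for i, text in enumerate(merged, start=1)
    ]


demo = [
    {"id": "S001", "content": "甲" * 60},
    {"id": "S002", "content": "乙" * 10},
    {"id": "S003", "content": "丙" * 50},
]
result = merge_short_scenes(demo)
print([(s["id"], len(s["content"])) for s in result])
```

The trade-off is that a short dialogue line can be merged back into the narration before it, which works against keeping dialogue in its own entry. Whether that is acceptable depends on how strictly you want the length rule applied.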

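Part 2. Midstream: Character Look Design

2.1 What look design is for

The worst thing that can happen to a video storyboard is a character's face falling apart between shots: one second she is a heroine with long flowing hair, the next she is a short-haired muscleman.

The look-design stage has to solve two problems:

Character consistency - describe each character's appearance in one unified way
Scene continuity - keep environment descriptions consistent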

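2.2 Character description template

[Character name]:
[Role]: [role in the story]
[Appearance]: [facial features, hairstyle, build, clothing details]
[Personality]: [personality traits]
[Props]: [important items carried]
[Backstory]: [background story in 50 characters or fewer]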
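
The template can be filled from a plain dictionary. The sketch below assumes the Chinese keys that the example dictionary in 2.3 uses (`name`, `身份`, `外貌`, `道具`) plus `性格` and `小传` for the remaining template fields. The function name `render_character_card` and the English labels are illustrative choices.

```python
MAX_BACKSTORY_CHARS = 50

# (dictionary key, label printed on the card)
FIELDS = [
    ("身份", "Role"),
    ("外貌", "Appearance"),
    ("性格", "Personality"),
    ("道具", "Props"),
    ("小传", "Backstory"),
]


def render_character_card(character):
    """Fill the character template; missing fields are left blank."""
    backstory = character.get("小传", "")
    if len(backstory) > MAX_BACKSTORY_CHARS:
        raise ValueError(f"backstory exceeds {MAX_BACKSTORY_CHARS} characters")
    lines = [f"{character.get('name', '')}:"]
    for key, label in FIELDS:
        lines.append(f"{label}: {character.get(key, '')}")
    return "\n".join(lines)


card = render_character_card({"name": "李玄", "身份": "仙门弃徒", "道具": "半块玉佩"})
print(card)
```

Rejecting an over-long backstory instead of truncating it keeps the card from silently losing text.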

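2.3 Generating the look-design prompt

def generate_costume_prompt(character, image_ratio="3:4"):
    """
    Build a look-design prompt for an AI image generator.

    Args:
        character: character dict
        image_ratio: aspect ratio of the image
    Returns:
        str: the assembled prompt
    """
    # Pull out the fields the prompt needs
    appearance = character.get('外貌', '')
    outfit = character.get('服装', '')
    expression = character.get('表情', '表情平静')

    # Assemble the prompt. The template text here is Chinese; translate it
    # to English if your image model responds better to English prompts.
    # The trailing --ar / --v flags follow Midjourney-style syntax.
    prompt = f"""
{appearance}.
{outfit}.
{expression}.
月光映照下轮廓分明。
高分辨率，极其详细，大师级画作，8k画质，精致的面部细节，中国古风风格。
--ar {image_ratio} --v 6.0
""".strip()

    return prompt

# Usage example
character = {
    "name": "李玄",
    "身份": "仙门弃徒",
    "外貌": "约二十五六岁，面容清瘦，眉宇间带着沧桑与坚毅。黑色长发被山风吹乱",
    "服装": "身穿灰色布袍（已被江湖风霜磨损），腰间无佩剑",
    "表情": "神情复杂，目光深邃如渊",
    "道具": "右手紧握半块古玉玉佩"
}

prompt = generate_costume_prompt(character)
print(prompt)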

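Part 3. Downstream: Storyboard Generation

3.1 Storyboard elements

A complete video storyboard entry should include:

Element	Description	Example
Camera	Shot size and movement	Medium shot, slow push-in
Visual	Environment + character description	Cliff edge, bright moon overhead, a man standing
Dialogue	Spoken lines or narration	“三年了…” ("Three years…")
Duration	Suggested duration	3-5 seconds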
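
The four elements in the table map naturally onto a small record type. The sketch below uses a dataclass; the class name `Shot` and its field names are assumptions chosen for illustration.

```python
from dataclasses import dataclass, asdict


@dataclass
class Shot:
    camera: str    # shot size and camera movement
    visual: str    # environment plus character description
    dialogue: str  # spoken line or narration; empty string if none
    duration: str  # suggested duration


shot = Shot(
    camera="medium shot, slow push-in",
    visual="cliff edge, bright moon overhead, a man standing",
    dialogue="三年了…",
    duration="3-5 seconds",
)
print(asdict(shot))
```

`asdict` converts the record to a plain dictionary, which makes it straightforward to serialize alongside the scene JSON from Part 1.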

3.2 Storyboard generation code

def generate_storyboard(scene_data, character_prompts, scene_prompts):
    """
    Generate the video storyboard.

    Args:
        scene_data: list of scene dicts from the splitting stage
        character_prompts: dict of character look-design prompts
        scene_prompts: dict of scene look-design prompts

    Note: extract_scene_key and detect_character are helpers that this
    article does not define; supply your own implementations.
    """
    storyboard = []

    for scene in scene_data:
        scene_id = scene['id']
        content = scene['content']

        # Classify the content (the action keywords match Chinese text)
        is_dialogue = '"' in content or '「' in content
        has_action = any(kw in content for kw in ['走', '站', '转身', '抬头'])

        # Suggest a shot type
        if is_dialogue:
            shot = "medium close-up, slow push-in"
        elif has_action:
            shot = "full shot with close-up cuts"
        else:
            shot = "full shot, static camera"

        # Combine the visual description
        scene_key = extract_scene_key(content)  # extract the scene keyword
        related_scene = scene_prompts.get(scene_key, "")
        related_char = detect_character(content, character_prompts)

        storyboard.append({
            "scene_id": scene_id,
            "shot": shot,
            "description": content,
            "visual_prompt": f"{related_scene} {related_char}",
            "duration": estimate_duration(len(content))
        })

    return storyboard

def estimate_duration(char_count):
    """Estimate the shot duration from the character count."""
    if char_count < 50:
        return "2-3 seconds"
    elif char_count < 100:
        return "3-5 seconds"
    else:
        return "5-8 seconds"
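
`generate_storyboard` calls two helpers, `extract_scene_key` and `detect_character`, that the article does not define. The sketch below shows one simple way they might work, using plain substring matching. The keyword table `SCENE_KEYWORDS`, its contents, and both function bodies are assumptions, not the author's implementation.

```python
# Hypothetical keyword table: scene key -> words that signal that location.
SCENE_KEYWORDS = {
    "断崖": ["断崖", "悬崖", "崖边"],
    "仙门": ["仙门", "山门"],
}


def extract_scene_key(content):
    """Return the first scene key whose keywords appear in the text, else ''."""
    for key, keywords in SCENE_KEYWORDS.items():
        if any(word in content for word in keywords):
            return key
    return ""


def detect_character(content, character_prompts):
    """Join the look-design prompts of every character named in the text."""
    found = [
        prompt for name, prompt in character_prompts.items() if name in content
    ]
    return " ".join(found)


text = "李玄站在断崖边，衣袂被山风吹得猎猎作响。"
prompts = {"李玄": "young man in a worn gray robe", "师父": "elderly master"}
print(extract_scene_key(text))
print(detect_character(text, prompts))
```

Substring matching only finds characters mentioned by name, so a sentence that refers to someone as "他" matches nothing. A production workflow would more likely ask a model to resolve who appears in each scene.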

3.3 Sample Sora storyboard prompt

AI video generation tools such as Sora need a more detailed visual description:

{
  "scene_id": "S001",
  "sora_prompt": "A young man in tattered gray robes stands at the edge of a desolate cliff at night. The full moon casts silver light across his weathered face. He holds half of a jade pendant in his right hand, gazing into the distance. His long black hair is blown by the mountain wind. Thick clouds swirl in the abyss below. Cinematic lighting, dramatic atmosphere, Chinese xianxia aesthetic. Slow camera push-in. 4K, film grain, cinematic color grading."
}
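
A prompt like the one above can be assembled from separate descriptive parts so that each part can be edited on its own. The sketch below is an illustrative assumption: the function name `build_sora_prompt` and its parameters are hypothetical, and it only builds a string; it does not call any Sora API.

```python
import json


def build_sora_prompt(scene_id, subject, setting, style, camera, quality=""):
    """Join the descriptive parts into one prompt string, skipping empty ones."""
    parts = [subject, setting, style, camera, quality]
    prompt = " ".join(p.strip() for p in parts if p and p.strip())
    return {"scene_id": scene_id, "sora_prompt": prompt}


entry = build_sora_prompt(
    "S001",
    subject="A young man in tattered gray robes stands at the edge of a cliff at night.",
    setting="The full moon casts silver light across his weathered face.",
    style="Cinematic lighting, dramatic atmosphere, Chinese xianxia aesthetic.",
    camera="Slow camera push-in.",
)
print(json.dumps(entry, ensure_ascii=False, indent=2))
```

Keeping the subject text separate makes it easier to reuse the same character description across shots, which is the consistency goal of the look-design stage.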