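Background

How can you quickly edit Word documents with JavaScript? How can an AI help polish Word formatting?

docx-edit was designed to answer these questions. It is a JavaScript library for parsing and modifying .docx files. Compared with traditional Word parsing libraries, docx-edit aims to offer friendlier support for dynamic modification of Word documents, more efficient and more accurate operations, and a better fit for AI agents.

The current version of docx-edit supports the following operations:

  • Virtual-tree diff / patch for documents
  • Style-level modeling of paragraphs and runs
  • Adding, modifying, and clearing styles
  • Style migration between components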

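Features of docx-edit

  • Parses the body, headers, footers, comments, footnotes, and endnotes
  • Recognizes paragraph, run, text, table, table-row, table-cell, hyperlink, and text-box nodes
  • Reads and writes back whole-paragraph text even when a paragraph spans multiple w:t elements
  • Supports true virtual-tree diff / patch
  • Supports modeling, modifying, and migrating paragraph styles and run styles
  • Saves the modified .docx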

Installation

npm install docx-edit

The library currently uses CommonJS exports; Node.js >= 18 is recommended.

    Quick start

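    const { loadDocx } = require("docx-edit");

    async function main() {
      // Load the document, replace every occurrence, then write a new file.
      const doc = await loadDocx("./sample.docx");

      doc.replaceAll("old term", "new term");
      await doc.saveAs("./sample.modified.docx");
    }

    // Report load/save failures instead of leaving an unhandled rejection.
    main().catch((error) => {
      console.error(error);
      process.exitCode = 1;
    });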
    

      Virtual tree model

      A document is parsed into a virtual tree. A typical structure looks like this:

      document
        body
          paragraph
            run
              text
          table
            table-row
              table-cell
                paragraph
        header
          paragraph
        footer
          paragraph
        comments
          comment
            paragraph
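
To make the tree shape concrete, here is a minimal sketch that walks a hand-built tree of the same shape and collects its paragraph nodes. It uses plain objects with type, props, and children fields (the fields named for createVNode() in this document). The names sampleTree and walk are illustrative and are not part of docx-edit.

```javascript
// Hand-built stand-in for a parsed document tree (not produced by docx-edit).
const sampleTree = {
  type: "document",
  props: {},
  children: [
    {
      type: "body",
      props: {},
      children: [
        { type: "paragraph", props: { text: "Intro" }, children: [] },
        {
          type: "table",
          props: {},
          children: [
            {
              type: "table-row",
              props: {},
              children: [
                {
                  type: "table-cell",
                  props: {},
                  children: [
                    { type: "paragraph", props: { text: "Cell" }, children: [] },
                  ],
                },
              ],
            },
          ],
        },
      ],
    },
    {
      type: "header",
      props: {},
      children: [{ type: "paragraph", props: { text: "Header" }, children: [] }],
    },
  ],
};

// Depth-first, pre-order traversal that yields every node with its type path.
function* walk(node, path = []) {
  const here = [...path, node.type];
  yield { node, path: here };
  for (const child of node.children || []) {
    yield* walk(child, here);
  }
}

const paragraphs = [...walk(sampleTree)]
  .filter(({ node }) => node.type === "paragraph")
  .map(({ node, path }) => `${path.join(" > ")}: ${node.props.text}`);

console.log(paragraphs);
```

Because paragraphs can sit inside table cells, headers, or comments, a recursive walk like this is usually more reliable than indexing body.children directly.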

      Currently supported node types:

      • document
      • body
      • header
      • footer
      • footnotes
      • endnotes
      • comments
      • paragraph
      • run
      • text
      • table
      • table-row
      • table-cell
      • hyperlink
      • tab
      • break
      • text-box
      • comment
      • footnote
      • endnote

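      Internal write flow

      Whether you call the legacy controller interface or use doc.patch(nextTree) directly, the internal flow is the same:

      1. Generate a new copy of the virtual tree from the current document
      2. Modify the target nodes on the copy
      3. Call doc.patch(nextTree)
      4. The patch engine executes INSERT / REMOVE / REPLACE / MOVE / PROPS/TEXT_UPDATE operations
      5. Sync the result back to the underlying OOXML
      6. Rebuild the tree and its indexes from the XML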
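
The copy, modify, and diff steps can be illustrated with a toy example. This is only a sketch under stated assumptions: it compares children position by position, handles only the text prop, and does not detect MOVE. The operation names come from the list above, but the shape of each operation object (type, path, text) is hypothetical and is not docx-edit's actual format.

```javascript
// Toy diff between two trees of { type, props, children } nodes.
// Children are compared by position; there is no key matching or MOVE detection.
function diff(prev, next, path = [], ops = []) {
  if (prev.type !== next.type) {
    ops.push({ type: "REPLACE", path });
    return ops;
  }
  const prevText = prev.props?.text;
  const nextText = next.props?.text;
  if (prevText !== nextText) {
    ops.push({ type: "TEXT_UPDATE", path, text: nextText });
  }
  const prevKids = prev.children || [];
  const nextKids = next.children || [];
  const shared = Math.min(prevKids.length, nextKids.length);
  for (let i = 0; i < shared; i++) {
    diff(prevKids[i], nextKids[i], [...path, i], ops);
  }
  for (let i = shared; i < nextKids.length; i++) {
    ops.push({ type: "INSERT", path: [...path, i] });
  }
  // Remove from the end so earlier indexes stay valid while applying.
  for (let i = prevKids.length - 1; i >= shared; i--) {
    ops.push({ type: "REMOVE", path: [...path, i] });
  }
  return ops;
}

const current = {
  type: "document",
  props: {},
  children: [
    {
      type: "body",
      props: {},
      children: [
        { type: "paragraph", props: { text: "First" }, children: [] },
        { type: "paragraph", props: { text: "Second" }, children: [] },
      ],
    },
  ],
};

// Steps 1 and 2: copy the tree, then edit the copy.
const nextTree = structuredClone(current);
nextTree.children[0].children[0].props.text = "First (edited)";
nextTree.children[0].children.pop();

// Step 4: derive the operations a patch engine would have to apply.
const ops = diff(current, nextTree);
console.log(ops);
```

Editing a copy rather than the live tree is what makes the diff possible: the untouched original serves as the baseline.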

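      For paragraph text changes, the existing ParagraphTextModel strategy is retained:

      • Preserve the original w:r / w:t elements where possible
      • Preserve tab / break elements where possible
      • Only redistribute the new text back into the original text nodes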
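
One possible redistribution policy is sketched below. It is an assumption for illustration only, since the source does not describe how ParagraphTextModel actually splits the text: each existing segment keeps its original length where it can, and the last segment absorbs whatever remains. The function name redistribute is hypothetical.

```javascript
// Redistribute newText over the existing text segments of a paragraph.
// Each segment keeps its original length where possible; the last one takes the remainder.
// Slicing is by UTF-16 code unit, so this sketch ignores surrogate pairs.
function redistribute(segments, newText) {
  const result = [];
  let offset = 0;
  segments.forEach((segment, index) => {
    const isLast = index === segments.length - 1;
    const end = isLast ? newText.length : offset + segment.length;
    result.push(newText.slice(offset, end));
    offset = Math.min(end, newText.length);
  });
  return result;
}

// Three w:t segments that originally read "Hello, world!".
const original = ["Hello", ", ", "world!"];

const longer = redistribute(original, "Howdy, everyone!");
const shorter = redistribute(original, "Hi");

console.log(longer);
console.log(shorter);
```

The segment count never changes, so every original run (and its formatting) survives even when some segments end up empty.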

      Style model

      Two levels of style modeling are currently supported:

      • paragraph.props.style corresponds to w:pPr
      • run.props.style corresponds to w:rPr

      Supported paragraph style fields

      {
        styleId: "BodyText",
        alignment: "center",
        keepNext: true,
        keepLines: true,
        pageBreakBefore: false,
        spacing: {
          before: "120",
          after: "240",
          line: "360",
          lineRule: "auto",
        },
        indent: {
          left: "240",
          right: "120",
          firstLine: "240",
          hanging: "240",
        },
      }
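
For readers unfamiliar with w:pPr, the sketch below shows how fields like these map onto standard WordprocessingML elements (w:pStyle, w:keepNext, w:keepLines, w:pageBreakBefore, w:spacing, w:ind, w:jc). It is not docx-edit's serializer: the function names are hypothetical, and the way an explicit false is written (w:val="0") is one valid choice among several.

```javascript
// Build ` w:name="value"` pairs, skipping fields that are not set.
function attrs(fields) {
  return Object.entries(fields)
    .filter(([, value]) => value !== undefined)
    .map(([name, value]) => ` w:${name}="${value}"`)
    .join("");
}

// Toggle elements: present means on, w:val="0" means explicitly off.
function onOff(tag, value) {
  if (value === undefined) return "";
  return value ? `<w:${tag}/>` : `<w:${tag} w:val="0"/>`;
}

// Serialize a style object into a w:pPr fragment.
// No XML escaping is done; values are assumed to be simple tokens.
function paragraphStyleToXml(style) {
  const parts = [
    style.styleId ? `<w:pStyle w:val="${style.styleId}"/>` : "",
    onOff("keepNext", style.keepNext),
    onOff("keepLines", style.keepLines),
    onOff("pageBreakBefore", style.pageBreakBefore),
    style.spacing ? `<w:spacing${attrs(style.spacing)}/>` : "",
    style.indent ? `<w:ind${attrs(style.indent)}/>` : "",
    style.alignment ? `<w:jc w:val="${style.alignment}"/>` : "",
  ];
  return `<w:pPr>${parts.join("")}</w:pPr>`;
}

const xml = paragraphStyleToXml({
  styleId: "BodyText",
  alignment: "center",
  pageBreakBefore: false,
  spacing: { before: "120", after: "240" },
});
console.log(xml);
```

The child elements are emitted in the order the OOXML schema defines for w:pPr, which matters because Word can reject documents whose property elements are out of sequence.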

      Supported run style fields

      {
        styleId: "Emphasis",
        bold: true,
        italic: true,
        underline: "single",
        color: "FF0000",
        highlight: "yellow",
        fontSize: "28",
        fontFamily: {
          ascii: "Calibri",
          hAnsi: "Calibri",
          eastAsia: "宋体",
          cs: "Arial",
        },
      }
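
The numeric strings in both field lists look like raw OOXML units. Assuming the library passes them through unchanged (an assumption, not something the source states), the usual WordprocessingML conventions apply: font size is in half-points, spacing and indents are in twentieths of a point, and line is in 240ths of a line when lineRule is "auto". The helper names below are hypothetical.

```javascript
// Convert human-friendly values into the unit strings used by WordprocessingML.
const halfPoints = (pt) => String(Math.round(pt * 2)); // w:sz
const twips = (pt) => String(Math.round(pt * 20)); // w:spacing before/after, w:ind
const autoLine = (multiple) => String(Math.round(multiple * 240)); // w:spacing line with lineRule "auto"

const runStyle = {
  bold: true,
  fontSize: halfPoints(14), // 14 pt
};

const paragraphStyle = {
  spacing: {
    before: twips(6), // 6 pt
    after: twips(12), // 12 pt
    line: autoLine(1.5), // 1.5 line spacing
    lineRule: "auto",
  },
  indent: { firstLine: twips(12) },
};

console.log(runStyle, paragraphStyle);
```

Under that reading, the sample values fontSize "28", after "240", and line "360" correspond to 14 pt text, 12 pt after the paragraph, and 1.5 line spacing.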

      Exported API

      The entry point is defined in src/index.js.

      const {
        loadDocx,
        VirtualWordDocument,
        VNode,
        createVNode,
        cloneVNode,
        DocumentPartController,
        ParagraphController,
        RunController,
        TableController,
        TableRowController,
        TableCellController,
        TextBoxController,
        StructuredEntryController,
      } = require("docx-edit");

      Document API

      loadDocx(input)

      Loads a .docx file.

      • input: string | Buffer
      • Returns: Promise<VirtualWordDocument>

      const doc = await loadDocx("./sample.docx");

      doc.toComponentTree()

      Returns a copy of the current document's virtual tree. You can modify this tree and then pass it to doc.patch().

      const tree = doc.toComponentTree();
      console.log(tree.type); // document

      doc.patch(nextTree)

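      Applies a patch for the complete virtual tree and syncs the result to the underlying XML.

      • The root node type must be document
      • Supports text updates and structural insertion, deletion, replacement, and reordering
      • Supports paragraph style and run style changes
      • Returns the patch result, which includes the list of operations executed

      const tree = doc.toComponentTree();
      const body = tree.children.find((node) => node.type === "body");
      body.children[0].props.text = "New first paragraph";

      // await is harmless if patch() is synchronous and required if it returns a promise.
      const result = await doc.patch(tree);
      console.log(result.operations);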

      doc.toBuffer()

      Returns the binary content of the modified .docx.

      const buffer = await doc.toBuffer();
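
Before writing or uploading the returned bytes, a quick sanity check can catch obviously broken output. A .docx file is a ZIP archive, so it begins with the ZIP local file header signature "PK\x03\x04". The sketch below assumes toBuffer() resolves to a Node.js Buffer (or any Uint8Array); looksLikeZip is a hypothetical helper, not a docx-edit function.

```javascript
// Return true when the bytes start with the ZIP local file header signature.
function looksLikeZip(bytes) {
  return (
    bytes.length >= 4 &&
    bytes[0] === 0x50 && // "P"
    bytes[1] === 0x4b && // "K"
    bytes[2] === 0x03 &&
    bytes[3] === 0x04
  );
}

// Stand-in byte sequences; in real use, pass the result of doc.toBuffer().
const zipLike = Buffer.from([0x50, 0x4b, 0x03, 0x04, 0x14, 0x00]);
const plainText = Buffer.from("hello world", "utf8");

console.log(looksLikeZip(zipLike), looksLikeZip(plainText));
```

This only confirms the container format; it does not validate the XML parts inside the archive.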

      doc.saveAs(outputPath)

      Saves the document to the given path.

      await doc.saveAs("./sample.modified.docx");

      Document-level query methods

      doc.getParts();
      doc.getBody();
      doc.getHeaders();
      doc.getFooters();
      doc.getParagraphs();
      doc.getParagraph(0);
      doc.getTables();
      doc.getTextBoxes();
      doc.getFootnotes();
      doc.getEndnotes();
      doc.getComments();

      doc.replaceAll(searchValue, replacement, options?)

      Replaces paragraph text across the whole document.

      • searchValue: string | RegExp
      • replacement: string | Function
      • options.partTypes?: string[]

      doc.replaceAll("event", "themed event");
      doc.replaceAll(/2025/g, "2026");
      doc.replaceAll("Header", "New header", { partTypes: ["header"] });
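
Since searchValue and replacement accept the same kinds of values as JavaScript's built-in String.prototype.replaceAll, it can help to preview a replacement on plain strings first. The sketch assumes doc.replaceAll follows the built-in's conventions, including the callback signature for function replacements; the source does not confirm this. The names preview and bumpYear are illustrative.

```javascript
// Preview a replacement on a plain string before running it against a document.
// The built-in replaceAll throws a TypeError when a RegExp lacks the g flag.
function preview(text, searchValue, replacement) {
  return text.replaceAll(searchValue, replacement);
}

// Function replacement: receives the matched text and returns its substitute.
const bumpYear = (match) => String(Number(match) + 1);

const fixedYear = preview("Plan 2025, budget 2025", /2025/g, "2026");
const nextYears = preview("FY2024 and FY2025", /\d{4}/g, bumpYear);
// A string searchValue is matched literally, so "." is not a wildcard here.
const literalDot = preview("a.b.c", ".", "-");

console.log(fixedYear, nextYears, literalDot);
```

Previewing this way is cheap insurance against an overly broad pattern rewriting text in headers or comments that you did not intend to touch.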

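      Controller API

      The legacy controller API is still available, but internally it has been migrated to virtual-tree patch.

      DocumentPartController

      Common ways to obtain one: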

      const body = doc.getBody();
      const header = doc.getHeaders()[0];
      const footer = doc.getFooters()[0];

      Available methods:

      body.toComponentTree();
      body.getParagraphs();
      body.getParagraph(0);
      body.getTables();
      body.getTable(0);
      body.getTextBoxes();
      body.replaceAll("old term", "new term");

      For comments / footnotes / endnotes parts, you can also call:

      const commentsPart = doc.getParts().find((part) => part.type === "comments");
      commentsPart.getEntries();
      commentsPart.getEntries({ includeSpecial: true });

      ParagraphController

      const paragraph = doc.getBody().getParagraph(0);
      
      paragraph.getText();
      paragraph.setText("New paragraph content");
      paragraph.replace("old term", "new term");
      paragraph.replaceAll("youth", "young students");
      paragraph.getStyle();
      paragraph.setStyle({ alignment: "center" });
      paragraph.patchStyle({ spacing: { after: "240" } });
      paragraph.getRuns();
      paragraph.getRun(0);
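
The method names suggest that setStyle replaces the whole style while patchStyle merges into the existing one. That reading is an assumption, as the source does not spell out the semantics. The sketch below shows the difference on plain objects; mergeStyle is a hypothetical helper, not part of docx-edit.

```javascript
const isPlainObject = (value) =>
  value !== null && typeof value === "object" && !Array.isArray(value);

// Recursively merge patch into a copy of base.
// Nested plain objects are merged; every other value is overwritten.
function mergeStyle(base, patch) {
  const result = { ...base };
  for (const [key, value] of Object.entries(patch)) {
    result[key] =
      isPlainObject(value) && isPlainObject(base[key])
        ? mergeStyle(base[key], value)
        : value;
  }
  return result;
}

const existing = { alignment: "left", spacing: { before: "120", after: "120" } };

// Replace semantics: fields that are not mentioned are dropped.
const replaced = { alignment: "center" };

// Merge semantics: only spacing.after changes; everything else is kept.
const merged = mergeStyle(existing, { spacing: { after: "240" } });

console.log(replaced, merged);
```

If the library behaves this way, patchStyle is the safer choice for small tweaks because it cannot silently discard spacing or indent settings.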

      RunController

      const run = doc.getBody().getParagraph(0).getRun(0);
      
      run.getText();
      run.getStyle();
      run.setStyle({
        bold: true,
        color: "FF0000",
        fontSize: "28",
      });
      run.patchStyle({
        italic: true,
        underline: "single",
      });

      Style migration

      const paragraphA = doc.getBody().getParagraph(0);
      const paragraphB = doc.getBody().getParagraph(1);
      
      paragraphB.copyStyleFrom(paragraphA);
      
      const runA = paragraphA.getRun(0);
      const runB = paragraphB.getRun(0);
      runB.copyStyleFrom(runA);

      TableController

      const table = doc.getTables()[0];
      
      table.getRows();
      table.getRow(0);
      table.getCell(1, 2);
      
      table.fill(
        [
          ["Event", "Date", "Owner", "Notes"],
          ["Sharing session", "2026-03-24", "Zhang San", "Confirmed"],
        ],
        { startRow: 0 },
      );

      TableRowController

      const row = doc.getTables()[0].getRow(0);
      
      row.getCells();
      row.getCell(0);

      TableCellController

      const cell = doc.getTables()[0].getCell(1, 0);
      
      cell.getParagraphs();
      cell.getParagraph(0);
      cell.getText();
      cell.setText("New cell content");

      TextBoxController

      const textBox = doc.getTextBoxes()[0];
      
      textBox.getParagraphs();
      textBox.getText();

      StructuredEntryController

      Used for comment / footnote / endnote entries.

      const comment = doc.getComments()[0];
      
      comment.getParagraphs();
      comment.getText();
      comment.replaceAll("original text", "new text");

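      Using the virtual tree API

      1. Modify the text of an existing paragraph

      const { loadDocx } = require("docx-edit");

      // CommonJS has no top-level await, so the calls are wrapped in an async function.
      async function main() {
        const doc = await loadDocx("./sample.docx");
        const tree = doc.toComponentTree();
        const body = tree.children.find((node) => node.type === "body");

        body.children[0].props.text = "This is the updated first paragraph";

        await doc.patch(tree);
        await doc.saveAs("./sample.modified.docx");
      }

      main().catch(console.error);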

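      2. Insert a new paragraph

      const { createVNode, loadDocx } = require("docx-edit");

      // CommonJS has no top-level await, so the calls are wrapped in an async function.
      async function main() {
        const doc = await loadDocx("./sample.docx");
        const tree = doc.toComponentTree();
        const body = tree.children.find((node) => node.type === "body");

        // Insert the new paragraph at index 1, after the first child of body.
        body.children.splice(
          1,
          0,
          createVNode({
            type: "paragraph",
            props: { text: "This is the newly inserted paragraph" },
            children: [],
          }),
        );

        await doc.patch(tree);
        await doc.saveAs("./sample.modified.docx");
      }

      main().catch(console.error);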

      3. Delete a paragraph

      const doc = await loadDocx("./sample.docx");
      const tree = doc.toComponentTree();
      const body = tree.children.find((node) => node.type === "body");
      
      body.children.splice(0, 1);
      
      await doc.patch(tree);

      4. Use keys for stable reordering

      If you frequently reorder sibling nodes, setting a key on each of them is recommended.

      const tree = doc.toComponentTree();
      const body = tree.children.find((node) => node.type === "body");
      
      body.children[0].key = "first";
      body.children[1].key = "second";
      body.children[2].key = "third";
      
      await doc.patch(tree);
      
      const nextTree = doc.toComponentTree();
      const nextBody = nextTree.children.find((node) => node.type === "body");
      nextBody.children = [nextBody.children[2], nextBody.children[0], nextBody.children[1]];
      
      await doc.patch(nextTree);
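
Why keys help can be seen in a small sketch: with keys, a diff can recognize that the same nodes appear in a new order and report moves, instead of treating every position as changed content. This is an illustrative algorithm, not docx-edit's patch engine. The MOVE name comes from the operation list in this document, while the from and to fields and the keyedMoves function are hypothetical.

```javascript
// Compare two sibling lists by key and report how each keyed node moved.
// A production diff would usually minimize the number of moves; this sketch lists them all.
function keyedMoves(prevChildren, nextChildren) {
  const prevIndex = new Map(prevChildren.map((node, index) => [node.key, index]));
  const moves = [];
  nextChildren.forEach((node, to) => {
    const from = prevIndex.get(node.key);
    if (from !== undefined && from !== to) {
      moves.push({ type: "MOVE", key: node.key, from, to });
    }
  });
  return moves;
}

const before = [
  { key: "first", type: "paragraph" },
  { key: "second", type: "paragraph" },
  { key: "third", type: "paragraph" },
];

// Same reordering as the example above: third, first, second.
const after = [before[2], before[0], before[1]];

const moves = keyedMoves(before, after);
console.log(moves);
```

Without keys, a positional comparison of the same two lists would see three paragraphs whose text changed, which loses the identity (and potentially the formatting) of each original node.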

      5. Modify a paragraph style

      const tree = doc.toComponentTree();
      const body = tree.children.find((node) => node.type === "body");
      
      body.children[0].props.style = {
        styleId: "BodyText",
        alignment: "center",
        spacing: {
          before: "120",
          after: "240",
        },
      };
      
      await doc.patch(tree);

      6. Modify a run style

      const tree = doc.toComponentTree();
      const body = tree.children.find((node) => node.type === "body");
      const firstRun = body.children[0].children[0];
      
      firstRun.props.style = {
        bold: true,
        italic: true,
        color: "FF0000",
        underline: "single",
      };
      
      await doc.patch(tree);

      7. Migrate styles between components

      const tree = doc.toComponentTree();
      const body = tree.children.find((node) => node.type === "body");

      // Copy the style objects so the two nodes do not share one reference.
      const sourceParagraphStyle = body.children[0].props.style;
      body.children[1].props.style = structuredClone(sourceParagraphStyle);

      const sourceRunStyle = body.children[0].children[0].props.style;
      body.children[1].children[0].props.style = structuredClone(sourceRunStyle);

      await doc.patch(tree);
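
When a style object is handed from one node to another, it matters whether the two nodes end up sharing one object or holding separate copies. Whether sharing causes problems depends on how the patch engine treats the tree, which the source does not document. The sketch below uses plain objects rather than docx-edit nodes to show the difference.

```javascript
// Plain stand-ins for tree nodes; only props.style matters here.
const source = {
  props: { style: { alignment: "center", spacing: { after: "240" } } },
};
const aliasTarget = { props: {} };
const copyTarget = { props: {} };

// Aliasing: both nodes now point at the same style object.
aliasTarget.props.style = source.props.style;

// Copying: structuredClone (Node.js 17+) makes an independent deep copy.
copyTarget.props.style = structuredClone(source.props.style);

// A later edit to the source style leaks into the alias but not into the copy.
source.props.style.spacing.after = "0";

console.log(aliasTarget.props.style.spacing.after);
console.log(copyTarget.props.style.spacing.after);
```

Copying is the safer default when the source style may be edited again before the patch is applied.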

      8. Modify headers, footers, comments, and text boxes

      const tree = doc.toComponentTree();
      
      const header = tree.children.find((node) => node.type === "header");
      const footer = tree.children.find((node) => node.type === "footer");
      const comments = tree.children.find((node) => node.type === "comments");
      
      header.children[0].props.text = "New header";
      footer.children[0].props.text = "New footer";
      comments.children[0].children[0].props.text = "New comment content";
      
      await doc.patch(tree);

      About createVNode()

      createVNode() creates a new node manually.

      const node = createVNode({
        type: "paragraph",
        key: "intro",
        props: { text: "Intro paragraph" },
        children: [],
      });

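      Parameters:

      • type: node type
      • key: optional; recommended when siblings need stable reordering
      • props: node properties
      • children: array of child nodes

      Notes:

      • The root node must be document
      • A patch must keep the existing parts unchanged; do not remove part roots such as body/header/footer/comments
      • New nodes must follow the currently supported parent-child relationships
      • For style changes, write directly to paragraph.props.style or run.props.style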
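
The first two notes can be checked mechanically before calling doc.patch(). The sketch below is a hypothetical pre-flight check over plain trees, not a docx-edit API. It assumes the part roots are the direct children of document whose type is body, header, footer, footnotes, endnotes, or comments, as the node type list in this document suggests.

```javascript
const PART_TYPES = new Set([
  "body",
  "header",
  "footer",
  "footnotes",
  "endnotes",
  "comments",
]);

// List the part root types of a tree, in order.
function partsOf(tree) {
  return (tree.children || [])
    .map((node) => node.type)
    .filter((type) => PART_TYPES.has(type));
}

// Return a list of problems; an empty list means the tree passed both checks.
function checkTree(prevTree, nextTree) {
  const problems = [];
  if (nextTree.type !== "document") {
    problems.push("root node must be of type document");
  }
  const before = partsOf(prevTree).join(",");
  const after = partsOf(nextTree).join(",");
  if (before !== after) {
    problems.push(`part roots changed: [${before}] -> [${after}]`);
  }
  return problems;
}

const prevTree = {
  type: "document",
  children: [
    { type: "body", children: [] },
    { type: "header", children: [] },
  ],
};

// A modified copy that (incorrectly) drops the header part.
const badTree = { type: "document", children: [{ type: "body", children: [] }] };

console.log(checkTree(prevTree, prevTree));
console.log(checkTree(prevTree, badTree));
```

Running a check like this on the edited tree gives a readable error message up front, rather than a failure somewhere inside the patch step.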