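A project I am working on requires processing a large number of Word documents.

After going through a lot of material, I found many blog posts covering this. There are two main options: jacob and poi. Jacob drives Word through COM and therefore only works on Windows; since servers today are mostly deployed on Linux, jacob is essentially not an option. So the work here is done mainly with poi.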

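Apache POI's HWPF module is dedicated to reading and writing Word .doc files. In HWPF, an HWPFDocument represents one Word .doc document. HWPFDocument works with the following concepts:

 Range: a span of the document. It can be the whole document, a single section (Section), a single paragraph (Paragraph), or a stretch of text that shares the same formatting (CharacterRun).

 Section: a section of the Word document. A document can consist of several sections.

 Paragraph: a paragraph of the Word document. A section can consist of several paragraphs.

 CharacterRun: a stretch of text with identical formatting. A paragraph can consist of several CharacterRuns.

 Table: a table.
 TableRow: a row of a table.
 TableCell: a cell of a table.

Section, Paragraph, CharacterRun and Table all extend Range.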

1. Basic replacement

        // modulePath (template .doc), outPath (result file) and maps
        // (placeholder name -> value) are supplied by the surrounding code.
        // try-with-resources closes both streams even if an exception is thrown.
        try (InputStream inputStream = new FileInputStream(modulePath);
             OutputStream outputStream = new FileOutputStream(outPath)) {
            HWPFDocument document = new HWPFDocument(inputStream);
            Range range = document.getRange();
            for (Map.Entry<String, String> entry : maps.entrySet()) {
                // Placeholders are written as @key@ in the template.
                range.replaceText("@" + entry.getKey() + "@", entry.getValue());
            }
            document.write(outputStream);
        }
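
To try the @key@ placeholder convention without a Word template at hand, here is a minimal sketch that applies the same loop to a plain String. The class name PlaceholderDemo and the method fill are made up for this illustration; they are not part of POI or of the code above.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical demo class: plain-String stand-in for the range.replaceText loop.
class PlaceholderDemo {

    // Replaces every @key@ marker in the template with the mapped value.
    static String fill(String template, Map<String, String> values) {
        String result = template;
        for (Map.Entry<String, String> entry : values.entrySet()) {
            // String.replace performs a literal (non-regex) replacement of
            // all occurrences.
            result = result.replace("@" + entry.getKey() + "@", entry.getValue());
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, String> values = new LinkedHashMap<>();
        values.put("name", "Alice");
        values.put("date", "2018-01-01");
        // prints: Dear Alice, signed on 2018-01-01.
        System.out.println(fill("Dear @name@, signed on @date@.", values));
    }
}
```

A key that does not occur in the template is simply skipped, so the same map can be reused across several templates.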

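This approach is already widely used online, but most examples are based on POI 3.9. POI is now at 3.17, and later releases drop Java 6 support and require at least Java 8. So we use 3.17 here for text replacement and bookmark operations on Word files.

We mainly use two classes here. (Both are based mainly on the demo at http://www.jb51.net/article/101910.htm.) Note that both classes use the XWPF API, so they operate on .docx files rather than the .doc files handled by HWPF.

BookMark: a wrapper class for a bookmark in a Word file; it holds the bookmark's definition and the operations on it.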

package com;
import java.util.List;
import java.util.Stack;

import org.apache.poi.xwpf.usermodel.XWPFParagraph;
import org.apache.poi.xwpf.usermodel.XWPFRun;
import org.apache.poi.xwpf.usermodel.XWPFTable;
import org.apache.poi.xwpf.usermodel.XWPFTableCell;
import org.apache.poi.xwpf.usermodel.XWPFTableRow;
import org.apache.xmlbeans.XmlException;
import org.openxmlformats.schemas.wordprocessingml.x2006.main.CTBookmark;
import org.openxmlformats.schemas.wordprocessingml.x2006.main.CTText;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

/**
 *
 * Wrapper class for a bookmark in a Word file; holds its definition and the operations on it.
 *
 * @author
 *
 * <p>Modification History:</p>
 * <p>Date       Author      Description</p>
 * <p>------------------------------------------------------------------</p>
 * <p> </p>
 * <p>  </p>
 */
public class BookMark {

    // Constants

    /** When inserting text, place it after the bookmark. **/
    public static final int INSERT_AFTER = 0;

    /** When inserting text, place it in front of the bookmark. **/
    public static final int INSERT_BEFORE = 1;

    /** Replace the bookmark's content with the text. **/
    public static final int REPLACE = 2;

    /** Names of the docx nodes and attributes referenced below. **/
    public static final String RUN_NODE_NAME = "w:r";
    public static final String TEXT_NODE_NAME = "w:t";
    public static final String BOOKMARK_START_TAG = "bookmarkStart";
    public static final String BOOKMARK_END_TAG = "bookmarkEnd";
    public static final String BOOKMARK_ID_ATTR_NAME = "w:id";
    public static final String STYLE_NODE_NAME = "w:rPr";

    /** The underlying bookmark definition. **/
    private CTBookmark _ctBookmark = null;

    /** The paragraph that contains the bookmark. **/
    private XWPFParagraph _para = null;

    /** The table cell that contains the bookmark, if any. **/
    private XWPFTableCell _tableCell = null;

    /** The bookmark's name. **/
    private String _bookmarkName = null;

    /** Whether the bookmark is inside a table. **/
    private boolean _isCell = false;

    /**
     * Constructor.
     * @param ctBookmark
     * @param para
     */
    public BookMark(CTBookmark ctBookmark, XWPFParagraph para) {
        this._ctBookmark = ctBookmark;
        this._para = para;
        this._bookmarkName = ctBookmark.getName();
        this._tableCell = null;
        this._isCell = false;
    }

    /**
     * Constructor for a bookmark located inside a table cell.
     * @param ctBookmark
     * @param para
     * @param tableCell
     */
    public BookMark(CTBookmark ctBookmark, XWPFParagraph para, XWPFTableCell tableCell) {
        this(ctBookmark, para);
        this._tableCell = tableCell;
        this._isCell = true;
    }

    public boolean isInTable() {
        return this._isCell;
    }

    public XWPFTable getContainerTable() {
        return this._tableCell.getTableRow().getTable();
    }

    public XWPFTableRow getContainerTableRow() {
        return this._tableCell.getTableRow();
    }

    public String getBookmarkName() {
        return  this._bookmarkName;
    }

    /**
     * Insert text into the Word document in the location indicated by this
     * bookmark.
     *
     * @param bookmarkValue An instance of the String class that encapsulates
     * the text to insert into the document.
     * @param where A primitive int whose value indicates where the text ought
     * to be inserted. There are three options controlled by constants; insert
     * the text immediately in front of the bookmark (BookMark.INSERT_BEFORE),
     * insert text immediately after the bookmark (BookMark.INSERT_AFTER) and
     * replace any and all text that appears between the bookmark's square
     * brackets (BookMark.REPLACE).
     */
    public void insertTextAtBookMark(String bookmarkValue, int where) {

        // Handle the bookmark differently depending on where it lives.
        if(this._isCell) {
            this.handleBookmarkedCells(bookmarkValue, where);
        } else {

            // Ordinary bookmark: create a new run to hold the text.
            XWPFRun run = this._para.createRun();
            run.setText(bookmarkValue);
            switch(where) {
                case BookMark.INSERT_AFTER:
                    this.insertAfterBookmark(run);
                    break;
                case BookMark.INSERT_BEFORE:
                    this.insertBeforeBookmark(run);
                    break;
                case BookMark.REPLACE:
                    this.replaceBookmark(run);
                    break;
            }
        }
    }

    /**
     * Inserts some text into a Word document in a position that is immediately 
     * after a named bookmark. 
     *
     * Bookmarks can take two forms, they can either simply mark a location 
     * within a document or they can do this but contain some text. The 
     * difference is obvious from looking at some XML markup. The simple 
     * placeholder bookmark will look like this; 
     *
     * <pre>
     *
     * <w:bookmarkStart w:name="AllAlone" w:id="0"/><w:bookmarkEnd w:id="0"/>
     *
     * </pre>
     *
     * Simply a pair of tags where one tag has the name bookmarkStart, the other 
     * the name bookmarkEnd and both share matching id attributes. In this case, 
     * the text will simply be inserted into the document at a point immediately 
     * after the bookmarkEnd tag. No styling will be applied to the text; it
     * will simply inherit the document's defaults.
     *
     * The more complex case looks like this; 
     *
     * <pre>
     *
     * <w:bookmarkStart w:name="InStyledText" w:id="3"/>
     *   <w:r w:rsidRPr="00DA438C">
     *     <w:rPr>
     *       <w:rFonts w:hAnsi="Engravers MT" w:ascii="Engravers MT" w:cs="Arimo"/>
     *       <w:color w:val="FF0000"/>
     *     </w:rPr>
     *     <w:t>text</w:t>
     *   </w:r>
     * <w:bookmarkEnd w:id="3"/>
     *
     * </pre>
     *
     * Here, the user has selected the word 'text' and chosen to insert a 
     * bookmark into the document at that point. So, the bookmark tags 'contain' 
     * a character run that is styled. Inserting any text after this bookmark, 
     * it is important to ensure that the styling is preserved and copied over 
     * to the newly inserted text. 
     *
     * The approach taken to dealing with both cases is similar but slightly 
     * different. In both cases, the code simply steps along the document nodes 
     * until it finds the bookmarkEnd tag whose ID matches that of the 
     * bookmarkStart tag. Then, it will look to see if there is one further node 
     * following the bookmarkEnd tag. If there is, it will insert the text into 
     * the paragraph immediately in front of this node. If, on the other hand, 
     * there are no more nodes following the bookmarkEnd tag, then the new run 
     * will simply be positioned at the end of the paragraph. 
     *
     * Styles are dealt with by 'looking' for a 'w:rPr' element whilst iterating 
     * through the nodes. If one is found, its details will be captured and 
     * applied to the run before the run is inserted into the paragraph. If 
     * there are multiple runs between the bookmarkStart and bookmarkEnd tags 
     * and these have different styles applied to them, then the style applied 
     * to the last run before the bookmarkEnd tag - if any - will be cloned and 
     * applied to the newly inserted text. 
     *
     * @param run An instance of the XWPFRun class that encapsulates the text 
     * that is to be inserted into the document following the bookmark. 
     */
    private void insertAfterBookmark(XWPFRun run) {
        Node nextNode = null;
        Node insertBeforeNode = null;
        Node styleNode = null;
        int bookmarkStartID = 0;
        int bookmarkEndID = -1;

        // Capture the id of the bookmarkStart tag. The code will step through 
        // the document nodes 'contained' within the start and end tags that have 
        // matching id numbers. 
        bookmarkStartID = this._ctBookmark.getId().intValue();

        // Get the node for the bookmark start tag and then enter a loop that
        // will step from one node to the next until the bookmarkEnd tag with
        // a matching id is found.
        nextNode = this._ctBookmark.getDomNode();
        while (bookmarkStartID != bookmarkEndID) {

            // Get the next node along and check to see if it is a bookmarkEnd
            // tag. If it is, get its id so that the containing while loop can
            // be terminated once the correct end tag is found. Note that the
            // id will be obtained as a String and must be converted into an
            // integer. This has been coded to fail safely so that if an error
            // is encountered converting the id to an int value, the while loop
            // will still terminate.
            nextNode = nextNode.getNextSibling();
            if (nextNode.getNodeName().contains(BookMark.BOOKMARK_END_TAG)) {
                try {
                    bookmarkEndID = Integer.parseInt(
                            nextNode.getAttributes().getNamedItem(
                                    BookMark.BOOKMARK_ID_ATTR_NAME).getNodeValue());
                } catch (NumberFormatException nfe) {
                    bookmarkEndID = bookmarkStartID;
                }
            } else {
                // Not a bookmarkEnd node. If it is a run node, it MAY contain
                // styling information; if so, capture it from the run.
                if (nextNode.getNodeName().equals(BookMark.RUN_NODE_NAME)) {
                    styleNode = this.getStyleNode(nextNode);
                }
            }
        }

        // After the while loop completes, it should have located the correct
        // bookmarkEnd tag. The DOM API offers insertBefore but no insertAfter,
        // so the node that follows the bookmarkEnd tag is needed.
        insertBeforeNode = nextNode.getNextSibling();

        // Style the newly inserted text. Note that the code copies or clones 
        // the style it found in another run, failure to do this would remove the 
        // style from one node and apply it to another. 
        if (styleNode != null) {
            run.getCTR().getDomNode().insertBefore(
                    styleNode.cloneNode(true), run.getCTR().getDomNode().getFirstChild());
        }

        // Finally, check to see if there was a node after the bookmarkEnd
        // tag. If there was, then this code will insert the run in front of
        // that node. If there was no node following the bookmarkEnd tag then the
        // run stays at the end of the paragraph, where createRun() placed it.
        if (insertBeforeNode != null) {
            this._para.getCTP().getDomNode().insertBefore(
                    run.getCTR().getDomNode(), insertBeforeNode);
        }
    }

    /**
     * Inserts some text into a Word document immediately in front of the 
     * location of a bookmark. 
     *
     * This case is slightly more straightforward than inserting after the
     * bookmark because the DOM API only allows a new node to be inserted in
     * front of an existing node. When inserting after the bookmark, the end node
     * had to be located whereas, in this case, the node is already known: it is
     * the CTBookmark itself. The only information that must be discovered is
     * whether there is a run immediately in front of the bookmarkStart tag and
     * whether that run is styled. If there is and if it is, then this style
     * must be cloned and applied to the text which will be inserted into the
     * paragraph.
     *
     * @param run An instance of the XWPFRun class that encapsulates the text
     * that is to be inserted into the document in front of the bookmark.
     */
    private void insertBeforeBookmark(XWPFRun run) {
        Node insertBeforeNode = null;
        Node childNode = null;
        Node styleNode = null;

        // Get the dom node from the bookmarkStart tag and look for another 
        // node immediately preceding it. 
        insertBeforeNode = this._ctBookmark.getDomNode();
        childNode = insertBeforeNode.getPreviousSibling();

        // If a node is found, try to get the styling from it. 
        if (childNode != null) {
            styleNode = this.getStyleNode(childNode);

            // If that previous node was styled, then apply this style to the 
            // text which will be inserted. 
            if (styleNode != null) {
                run.getCTR().getDomNode().insertBefore(
                        styleNode.cloneNode(true), run.getCTR().getDomNode().getFirstChild());
            }
        }

        // Insert the text into the paragraph immediately in front of the 
        // bookmarkStart tag. 
        this._para.getCTP().getDomNode().insertBefore(
                run.getCTR().getDomNode(), insertBeforeNode);
    }

    /**
     * Replace the text - if any - contained between the bookmarkStart and its
     * matching bookmarkEnd tag with the text specified. The technique used will
     * resemble that employed when inserting text after the bookmark. In short,
     * the code will iterate along the nodes until it encounters a matching
     * bookmarkEnd tag. Each node encountered will be deleted unless it is the
     * final node before the bookmarkEnd tag is encountered and it is a
     * character run. If this is the case, then it can simply be updated to
     * contain the text the user wishes to see inserted into the document. If
     * the last node is not a character run, then it will be deleted, a new run
     * will be created and inserted into the paragraph between the bookmarkStart
     * and bookmarkEnd tags.
     *
     * @param run An instance of the XWPFRun class that encapsulates the text
     * that is to replace the content between the bookmark's tags.
     */
    private void replaceBookmark(XWPFRun run) {
        Node nextNode = null;
        Node styleNode = null;
        Node lastRunNode = null;
        Node toDelete = null;
        NodeList childNodes = null;
        Stack<Node> nodeStack = null;
        boolean textNodeFound = false;
        boolean foundNested = true;
        int bookmarkStartID = 0;
        int bookmarkEndID = -1;
        int numChildNodes = 0;

        nodeStack = new Stack<Node>();
        bookmarkStartID = this._ctBookmark.getId().intValue();
        nextNode = this._ctBookmark.getDomNode();
        nodeStack.push(nextNode);

        // Loop through the nodes looking for a matching bookmarkEnd tag 
        while (bookmarkStartID != bookmarkEndID) {
            nextNode = nextNode.getNextSibling();
            nodeStack.push(nextNode);

            // If an end tag is found, does it match the start tag? If so, end 
            // the while loop. 
            if (nextNode.getNodeName().contains(BookMark.BOOKMARK_END_TAG)) {
                try {
                    bookmarkEndID = Integer.parseInt(
                            nextNode.getAttributes().getNamedItem(
                                    BookMark.BOOKMARK_ID_ATTR_NAME).getNodeValue());
                } catch (NumberFormatException nfe) {
                    bookmarkEndID = bookmarkStartID;
                }
            }
            //else { 
            // Place a reference to the node on the nodeStack
            //    nodeStack.push(nextNode); 
            //} 
        }

        // The stack holds the bookmarkStart tag at the bottom and the matching
        // bookmarkEnd tag at the top. Anything in between is bookmark content
        // that has to be removed.
        if (nodeStack.size() > 2) {

            // Check the last node before the bookmarkEnd tag. If it is a run,
            // get its style - if any - and apply it to the run that will be
            // replacing it. (Calling peek() here would return the bookmarkEnd
            // tag itself, so the style would never be found.)
            lastRunNode = nodeStack.elementAt(nodeStack.size() - 2);

            if ((lastRunNode.getNodeName().equals(BookMark.RUN_NODE_NAME))) {
                styleNode = this.getStyleNode(lastRunNode);
                if (styleNode != null) {
                    run.getCTR().getDomNode().insertBefore(
                            styleNode.cloneNode(true), run.getCTR().getDomNode().getFirstChild());
                }
            }

            // Delete any and all nodes that were found in between the start and
            // end tags. This is slightly safer than trying to delete the nodes
            // as they are found while stepping through them in the loop above.
            this.deleteChildNodes(nodeStack);
        }

        // Place the text into position, between the bookmark tags. 
        this._para.getCTP().getDomNode().insertBefore(
                run.getCTR().getDomNode(), nextNode);
    }

    /**
     * When replacing the bookmark's text, it is necessary to delete any nodes 
     * that are found between matching start and end tags. Complications occur 
     * here because it is possible to have bookmarks nested within bookmarks to 
     * almost any level and it is important to not remove any inner or nested 
     * bookmarks when replacing the contents of an outer or containing 
     * bookmark. This code successfully handles the simplest occurrence - where 
     * one bookmark completely contains another - but not more complex cases 
     * where one bookmark overlaps another in the markup. That is still to do. 
     *
     * @param nodeStack An instance of the Stack class that encapsulates 
     * references to any and all nodes found between the opening and closing 
     * tags of a bookmark. 
     */
    private void deleteChildNodes(Stack<Node> nodeStack) {
        Node toDelete = null;
        int bookmarkStartID = 0;
        int bookmarkEndID = 0;
        boolean inNestedBookmark = false;

        // The first element in the list will be a bookmarkStart tag and that 
        // must not be deleted. 
        for(int i = 1; i < nodeStack.size(); i++) {

            // Get an element. If it is another bookmarkStart tag then
            // again, we do not want to delete it, its matching end tag
            // or any nodes that fall in between.
            toDelete = nodeStack.elementAt(i);
            if(toDelete.getNodeName().contains(BookMark.BOOKMARK_START_TAG)) {
                bookmarkStartID = Integer.parseInt(
                        toDelete.getAttributes().getNamedItem(BookMark.BOOKMARK_ID_ATTR_NAME).getNodeValue());
                inNestedBookmark = true;
            }
            else if(toDelete.getNodeName().contains(BookMark.BOOKMARK_END_TAG)) {
                bookmarkEndID = Integer.parseInt(
                        toDelete.getAttributes().getNamedItem(BookMark.BOOKMARK_ID_ATTR_NAME).getNodeValue());
                if(bookmarkEndID == bookmarkStartID) {
                    inNestedBookmark = false;
                }
            }
            else {
                if(!inNestedBookmark) {
                    this._para.getCTP().getDomNode().removeChild(toDelete);
                }
            }
        }
    }

    /**
     * Recover styling information - if any - from another document node. Note 
     * that it is only possible to accomplish this if the node is a run (w:r) 
     * and this could be tested for in the code that calls this method. However, 
     * a check is made in the calling code as to whether a style has been found 
     * and only if a style is found is it applied. This method always returns 
     * null if it does not find a style making that checking process easier. 
     *
     * @param parentNode An instance of the Node class that encapsulates a 
     * reference to a document node. 
     * @return An instance of the Node class that encapsulates the styling 
     * information applied to a character run. Note that if no styling 
     * information is found in the run OR if the node passed as an argument to 
     * the parentNode parameter is NOT a run, then a null value will be 
     * returned. 
     */
    private Node getStyleNode(Node parentNode) {
        Node childNode = null;
        Node styleNode = null;
        if (parentNode != null) {

            // If the node represents a run and it has child nodes then
            // it can be processed further. Note, whilst testing the code, it
            // was observed that although it is possible to get a list of a node's
            // children, even when a node did have children, trying to obtain this
            // list would often return a null value. This is the reason why the
            // technique of stepping from one node to the next is used here.
            if (parentNode.getNodeName().equalsIgnoreCase(BookMark.RUN_NODE_NAME)
                    && parentNode.hasChildNodes()) {

                // Step through the child nodes until either a style node
                // (w:rPr) is found or all child nodes have been checked.
                // Leaving the loop with break avoids calling getNextSibling()
                // on a null reference once the style node has been found.
                childNode = parentNode.getFirstChild();
                while (childNode != null) {
                    if (childNode.getNodeName().equals(BookMark.STYLE_NODE_NAME)) {
                        styleNode = childNode;
                        break;
                    }
                    childNode = childNode.getNextSibling();
                }
            }
        }
        return (styleNode);
    }

    /**
     * Get the text - if any - encapsulated by this bookmark. The creator of a
     * Word document can choose to select one or more items of text and then
     * insert a bookmark at that location. The highlighted text will appear
     * between the square brackets that denote the location of a bookmark in the
     * document's text and it will be returned by a call to this method.
     *
     * @return An instance of the String class encapsulating any text that
     * appeared between the opening and closing square bracket associated with
     * this bookmark.
     * @throws XmlException Thrown if a problem is encountered parsing the XML
     * markup recovered from the document in order to construct a CTText
     * instance which may be required to obtain the bookmark's text.
     */
    public String getBookmarkText() throws XmlException {
        StringBuilder builder = null;
        // Are we dealing with a bookmarked table cell? If so, the entire 
        // contents of the cell - if anything - must be recovered and returned. 
        if(this._tableCell != null) {
            builder = new StringBuilder(this._tableCell.getText());
        }
        else {
            builder = this.getTextFromBookmark();
        }
        return(builder == null ? null : builder.toString());
    }

    /**
     * There are two types of bookmarks. One is a simple placeholder whilst the
     * second is still a placeholder but it 'contains' some text. In the second
     * instance, the creator of the document has selected some text and then
     * chosen to insert a bookmark there and the difference is obvious when
     * looking at the XML markup.
     *
     * The simple case; 
     *
     * <pre>
     *
     * <w:bookmarkStart w:name="AllAlone" w:id="0"/><w:bookmarkEnd w:id="0"/>
     *
     * </pre>
     *
     * The more complex case; 
     *
     * <pre>
     *
     * <w:bookmarkStart w:name="InStyledText" w:id="3"/>
     *   <w:r w:rsidRPr="00DA438C">
     *     <w:rPr>
     *       <w:rFonts w:hAnsi="Engravers MT" w:ascii="Engravers MT" w:cs="Arimo"/>
     *       <w:color w:val="FF0000"/>
     *     </w:rPr>
     *     <w:t>text</w:t>
     *   </w:r>
     * <w:bookmarkEnd w:id="3"/>
     *
     * </pre>
     *
     * This method assumes that the user wishes to recover the content from any 
     * character run that appears in the markup between a matching pair of 
     * bookmarkStart and bookmarkEnd tags; thus, using the example above again, 
     * this method would return the String 'text' to the user. It is possible 
     * however for a bookmark to contain more than one run and for a bookmark to 
     * contain other bookmarks. In both of these cases, this code will return 
     * the text contained within any and all runs that appear in the XML markup 
     * between matching bookmarkStart and bookmarkEnd tags. The term 'matching
     * bookmarkStart and bookmarkEnd tags' here means tags whose id attributes
     * have matching values.
     *
     * @return An instance of the StringBuilder class encapsulating the text 
     * recovered from any character run elements found between the bookmark's 
     * start and end tags. If no text is found then a null value will be 
     * returned. 
     * @throws XmlException Thrown if a problem is encountered parsing the XML 
     * markup recovered from the document in order to construct a CTText 
     * instance which may be required to obtain the bookmark's text.
     */
    private StringBuilder getTextFromBookmark() throws XmlException {
        int startBookmarkID = 0;
        int endBookmarkID = -1;
        Node nextNode = null;
        Node childNode = null;
        CTText text = null;
        StringBuilder builder = null;
        String rawXML = null;

        // Get the ID of the bookmark from its start tag, the DOM node from the
        // bookmark (to make looping easier) and initialise the StringBuilder.
        startBookmarkID = this._ctBookmark.getId().intValue();
        nextNode = this._ctBookmark.getDomNode();
        builder = new StringBuilder();

        // Loop through the nodes held between the bookmark's start and end 
        // tags. 
        while (startBookmarkID != endBookmarkID) {

            // Get the next node and, if it is a bookmarkEnd tag, get its ID
            // as matching ids will terminate the while loop.
            nextNode = nextNode.getNextSibling();
            if (nextNode.getNodeName().contains(BookMark.BOOKMARK_END_TAG)) {

                // Get the ID attribute from the node. It is a String that must 
                // be converted into an int. An exception could be thrown and so 
                // the catch clause will ensure the loop ends neatly even if the 
                // value might be incorrect. Must inform the user. 
                try {
                    endBookmarkID = Integer.parseInt(
                            nextNode.getAttributes().
                                    getNamedItem(BookMark.BOOKMARK_ID_ATTR_NAME).getNodeValue());
                } catch (NumberFormatException nfe) {
                    endBookmarkID = startBookmarkID;
                }
            } else {
                // This is not a bookmarkEnd node and can be processed for any
                // text it may contain. Note the check for both conditions: it
                // must be a run and it must contain children. Interestingly, it
                // seems as though the node may contain children and yet the call
                // to nextNode.getChildNodes() will still return an empty list,
                // hence the need to step through the child nodes.
                if (nextNode.getNodeName().equals(BookMark.RUN_NODE_NAME)
                        && nextNode.hasChildNodes()) {
                    // Get the text from the child nodes. 
                    builder.append(this.getTextFromChildNodes(nextNode));
                }
            }
        }
        return (builder);
    }

    /**
     * Iterates through all and any children of the Node whose reference will be
     * passed as an argument to the node parameter, and recovers the contents of
     * any text nodes. Testing revealed that a node can be called a text node
     * and yet report its type as being something different, an element node
     * for example. Calling the getNodeValue() method on a text node will return
     * the text the node encapsulates but doing the same on an element node will
     * not. In fact, the call will simply return a null value. As a result, this
     * method will test the node's name to catch all text nodes - those whose
     * name is 'w:t' - and then its type. If the type is reported to be a text
     * node, it is a trivial task to get at its contents. However, if the type
     * is not reported as a text type, then it is necessary to parse the raw XML
     * markup for the node to recover its value.
     *
     * @param node An instance of the Node class that encapsulates a reference 
     * to a node recovered from the document being processed. It should be 
     * passed a reference to a character run - 'w:r' - node. 
     * @return An instance of the String class that encapsulates the text
     * recovered from the node's children, if they are text nodes.
     * @throws XmlException Thrown if a problem is encountered parsing the XML
     * markup recovered from the document in order to construct the CTText
     * instance which may be required to obtain the bookmark's text.
     */
    private String getTextFromChildNodes(Node node) throws XmlException {
        NodeList childNodes = null;
        Node childNode = null;
        CTText text = null;
        StringBuilder builder = new StringBuilder();
        int numChildNodes = 0;

        // Get a list of child nodes from the node passed to the method and
        // find out how many children there are in the list.
        childNodes = node.getChildNodes();
        numChildNodes = childNodes.getLength();

        // Iterate through the children one at a time - it is possible for a
        // run to contain zero, one or more text nodes - and recover the text
        // from any text type child nodes.
        for (int i = 0; i < numChildNodes; i++) {

            // Get a node and check its name. If this is 'w:t' then process as
            // text type node.
            childNode = childNodes.item(i);

            if (childNode.getNodeName().equals(BookMark.TEXT_NODE_NAME)) {

                // If the node reports its type as text, then simply call the
                // getNodeValue() method to get at its text.
                if (childNode.getNodeType() == Node.TEXT_NODE) {
                    builder.append(childNode.getNodeValue());
                } else {
                    // Correct the type by parsing the node's XML markup and 
                    // creating a CTText object. Call the getStringValue() 
                    // method on that to get the text. 
                    text = CTText.Factory.parse(childNode);
                    builder.append(text.getStringValue());
                }
            }
        }
        return (builder.toString());
    }

    private void handleBookmarkedCells(String bookmarkValue, int where) {
        // Remove every existing paragraph from the table cell. Iterate from 
        // the last index down so that each removal does not shift the 
        // paragraphs that still have to be removed. 
        for (int i = this._tableCell.getParagraphs().size() - 1; i >= 0; i--) {
            this._tableCell.removeParagraph(i);
        }
        // Add a single new paragraph that holds the replacement text. 
        XWPFParagraph para = this._tableCell.addParagraph();
        para.createRun().setText(bookmarkValue);
    }
} 
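
To make the idea behind getTextFromChildNodes easier to try out, here is a minimal sketch that uses only the JDK's DOM parser, with no POI on the classpath. The class RunTextSketch and its two methods are hypothetical names made up for this illustration. In the DOM, a node named `w:t` is an element rather than a text node, so the sketch reads its text with `getTextContent()` instead of going through CTText.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

// Hypothetical helper for illustration only; not part of POI or of BookMark.
class RunTextSketch {

    // Parses an XML fragment and returns its root element (assumed to be a w:r run).
    static Node parseRoot(String xml) {
        try {
            return DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)))
                    .getDocumentElement();
        } catch (Exception e) {
            throw new IllegalStateException("could not parse XML fragment", e);
        }
    }

    // Concatenates the text of every direct w:t child of the run.
    // Other children, such as w:rPr, are skipped.
    static String textOfRun(Node run) {
        StringBuilder builder = new StringBuilder();
        NodeList children = run.getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            Node child = children.item(i);
            if ("w:t".equals(child.getNodeName())) {
                builder.append(child.getTextContent());
            }
        }
        return builder.toString();
    }

    public static void main(String[] args) {
        String xml = "<w:r xmlns:w=\"http://schemas.openxmlformats.org/wordprocessingml/2006/main\">"
                + "<w:rPr><w:b/></w:rPr><w:t>Hello, </w:t><w:t>POI</w:t></w:r>";
        System.out.println(textOfRun(parseRoot(xml))); // prints "Hello, POI"
    }
}
```

A run can hold several `w:t` children, which is why the text is accumulated in a StringBuilder rather than returned from the first match.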
BookMarks: a wrapper around POI's Word operations, written for the docx format
package com;
import org.apache.poi.xwpf.usermodel.XWPFDocument;
import org.apache.poi.xwpf.usermodel.XWPFParagraph;
import org.apache.poi.xwpf.usermodel.XWPFTable;
import org.apache.poi.xwpf.usermodel.XWPFTableCell;
import org.apache.poi.xwpf.usermodel.XWPFTableRow;
import org.openxmlformats.schemas.wordprocessingml.x2006.main.CTBookmark;
import java.util.HashMap;
import java.util.Iterator;
import java.util.List;
import java.util.Collection;
import java.util.Set;
import org.w3c.dom.NamedNodeMap;
import org.w3c.dom.Node;

/**
 *
 * Wraps POI operations on Word files in the docx format
 *
 * @author
 *
 * <p>Modification History:</p>
 * <p>Date       Author      Description</p>
 * <p>------------------------------------------------------------------</p>
 * <p> </p>
 * <p>  </p>
 */
public class BookMarks {

    /** Bookmarks defined in the Word file, keyed by bookmark name **/
    private HashMap<String, BookMark> _bookmarks = null;

    /**
     * Constructor; analyzes the document and collects all of its bookmarks
     *
     * @param document  Word OOXML document instance.
     */
    public BookMarks(XWPFDocument document) {

        // Initialize the bookmark cache
        this._bookmarks = new HashMap<String, BookMark>();

        // First collect the bookmarks in the document's ordinary paragraphs
        this.procParaList(document.getParagraphs());

        // Then walk every table cell by cell to collect the bookmarks inside
        // tables; this approach is verbose but simple
        List<XWPFTable> tableList = document.getTables();

        for (XWPFTable table : tableList) {
            // Get the rows of the table
            List<XWPFTableRow> rowList = table.getRows();
            for (XWPFTableRow row : rowList){
                // Get the cells in the row
                List<XWPFTableCell> cellList = row.getTableCells();
                for (XWPFTableCell cell : cellList) {
                    // Collect the bookmarks one cell at a time
                    //this.procParaList(cell.getParagraphs(), row);
                    this.procParaList(cell);
                }
            }
        }
    }


    /**
     * Looks up a bookmark by name; returns null if it does not exist
     * @param bookmarkName   name of the bookmark
     * @return    the wrapped bookmark object
     */
    public BookMark getBookmark(String bookmarkName) {
        BookMark bookmark = null;
        if(this._bookmarks.containsKey(bookmarkName)) {
            bookmark = this._bookmarks.get(bookmarkName);
        }
        return   bookmark;
    }

    /**
     * Returns all bookmarks found in the document
     *
     * @return the cached collection of bookmarks
     */
    public Collection<BookMark> getBookmarkList() {
        return(this._bookmarks.values());
    }

    /**
     * Returns an iterator over the bookmark names in the document
     * @return  an iterator over the keys of the map
     */
    public Iterator<String> getNameIterator() {
        return(this._bookmarks.keySet().iterator());
    }


    private void procParaList(XWPFTableCell cell){
        List<XWPFParagraph> paragraphList = cell.getParagraphs();

        for(XWPFParagraph paragraph : paragraphList){
            // Get the bookmark start markers in the paragraph
            List<CTBookmark> bookmarkList = paragraph.getCTP().getBookmarkStartList();
            for (CTBookmark bookmark : bookmarkList ) {
                this._bookmarks.put(bookmark.getName(),
                        new BookMark(bookmark, paragraph, cell));
            }
        }
    }
    /**
     * Collects the bookmarks inside a table
     * @param paragraphList   the paragraphs to scan
     * @param tableRow   the table row that contains them
     */
    private void procParaList(List<XWPFParagraph> paragraphList, XWPFTableRow tableRow) {

        NamedNodeMap attributes = null;
        Node colFirstNode = null;
        Node colLastNode = null;
        int firstColIndex = 0;
        int lastColIndex = 0;

        // Loop over the paragraphs and collect their bookmarks
        for (XWPFParagraph paragraph : paragraphList) {
            // Get the bookmark start markers in the paragraph
            List<CTBookmark> bookmarkList = paragraph.getCTP().getBookmarkStartList();

            for (CTBookmark bookmark : bookmarkList ) {
                // With a bookmark in hand, test to see if the bookmarkStart tag
                // has w:colFirst or w:colLast attributes. If it does, we are
                // dealing with a bookmarked table cell. This will need to be
                // handled differently - I think by a different concrete class
                // that implements the Bookmark interface!!
                attributes = bookmark.getDomNode().getAttributes();
                if(attributes != null) {

                    // Get the colFirst and colLast attributes. If both - for
                    // now - are found, then we are dealing with a bookmarked
                    // cell.
                    colFirstNode = attributes.getNamedItem("w:colFirst");
                    colLastNode = attributes.getNamedItem("w:colLast");
                    if(colFirstNode != null && colLastNode != null) {

                        // Get the index of the cell (or cells later) from them.
                        // First convert the String values both return to primitive
                        // int value. TO DO, what happens if there is a
                        // NumberFormatException.
                        firstColIndex = Integer.parseInt(colFirstNode.getNodeValue());
                        lastColIndex = Integer.parseInt(colLastNode.getNodeValue());
                        // if the indices are equal, then we are dealing with a
                        // cell and can create the bookmark for it.
                        if(firstColIndex == lastColIndex) {
                            this._bookmarks.put(bookmark.getName(),
                                    new BookMark(bookmark, paragraph,
                                            tableRow.getCell(firstColIndex)));
                        }
                        else {
                            System.out.println("This bookmark " + bookmark.getName() +
                                    " identifies a number of cells in the "
                                    + "table. That condition is not handled yet.");
                        }
                    }
                    else {
                        this._bookmarks.put(bookmark.getName(),
                                new BookMark(bookmark, paragraph,tableRow.getCell(1)));
                    }
                }
                else {
                    this._bookmarks.put(bookmark.getName(),
                            new BookMark(bookmark, paragraph,tableRow.getCell(1)));
                }
            }
        }
    }

    /**
     * Collects the bookmarks in ordinary paragraphs
     * @param paragraphList  the paragraphs to scan
     */
    private void procParaList(List<XWPFParagraph> paragraphList) {

        for (XWPFParagraph paragraph : paragraphList) {
            List<CTBookmark>  bookmarkList = paragraph.getCTP().getBookmarkStartList();
            // Add each bookmark to the cache
            for (CTBookmark bookmark : bookmarkList) {
                this._bookmarks.put(bookmark.getName(),
                        new BookMark(bookmark, paragraph));
            }
        }
    }
} 
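
The table-row variant of procParaList leaves a TO DO about what happens when `w:colFirst` or `w:colLast` is not a valid number. One possible way to handle that is sketched below with the JDK's DOM parser only. The class BookmarkColumnSketch, its methods, and the convention of returning -1 for "not a single-cell bookmark" are assumptions made for this illustration, not something the original code defines.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.NamedNodeMap;
import org.w3c.dom.Node;

// Hypothetical helper for illustration only; not part of POI or of BookMarks.
class BookmarkColumnSketch {

    // Parses an XML fragment and returns its root element (assumed to be w:bookmarkStart).
    static Node parseRoot(String xml) {
        try {
            return DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)))
                    .getDocumentElement();
        } catch (Exception e) {
            throw new IllegalStateException("could not parse XML fragment", e);
        }
    }

    // Returns the column index when the bookmark covers exactly one cell.
    // Returns -1 when the attributes are missing, malformed, or span several cells.
    static int singleColumnIndex(Node bookmarkStart) {
        NamedNodeMap attributes = bookmarkStart.getAttributes();
        if (attributes == null) {
            return -1;
        }
        Node first = attributes.getNamedItem("w:colFirst");
        Node last = attributes.getNamedItem("w:colLast");
        if (first == null || last == null) {
            return -1;
        }
        try {
            int firstIndex = Integer.parseInt(first.getNodeValue().trim());
            int lastIndex = Integer.parseInt(last.getNodeValue().trim());
            return firstIndex == lastIndex ? firstIndex : -1;
        } catch (NumberFormatException e) {
            // Malformed attribute value: treat it like a bookmark without column info.
            return -1;
        }
    }

    public static void main(String[] args) {
        Node bookmark = parseRoot("<w:bookmarkStart w:id=\"0\" w:name=\"Table\""
                + " w:colFirst=\"2\" w:colLast=\"2\"/>");
        System.out.println(singleColumnIndex(bookmark)); // prints 2
    }
}
```

Folding every failure case into one sentinel value keeps the caller simple; a caller that needs to tell "malformed" from "multi-cell" would want a richer return type.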

The utility class used: MSWordTool

package com;

import java.io.File;
import java.io.FileNotFoundException;
import java.io.FileOutputStream;
import java.io.IOException;
import java.math.BigInteger;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.Iterator;
import java.util.List;
import java.util.Map;
import java.util.Map.Entry;

import org.apache.poi.POIXMLDocument;
import org.apache.poi.xwpf.usermodel.ParagraphAlignment;
import org.apache.poi.xwpf.usermodel.XWPFDocument;
import org.apache.poi.xwpf.usermodel.XWPFParagraph;
import org.apache.poi.xwpf.usermodel.XWPFRun;
import org.apache.poi.xwpf.usermodel.XWPFTable;
import org.apache.poi.xwpf.usermodel.XWPFTableCell;
import org.apache.poi.xwpf.usermodel.XWPFTableRow;
import org.openxmlformats.schemas.wordprocessingml.x2006.main.CTHeight;
import org.openxmlformats.schemas.wordprocessingml.x2006.main.CTRow;
import org.openxmlformats.schemas.wordprocessingml.x2006.main.CTTrPr;
import org.w3c.dom.Node;

/**
 * Performs Word operations using POI
 *
 *
 * @author    xuyu
 *
 * <p>Modification History:</p>
 * <p>Date       Author      Description</p>
 * <p>------------------------------------------------------------------</p>
 * <p> </p>
 * <p>  </p>
 */
public class MSWordTool {

	/** The document object used internally **/
	private XWPFDocument document;

	private BookMarks    bookMarks = null;

	/**
	 * Sets the template for the document
	 * @param templatePath  path of the template file
	 */
	public void setTemplate(String templatePath) {
		try {
			this.document = new XWPFDocument(
					POIXMLDocument.openPackage(templatePath));

			bookMarks = new BookMarks(document);
		} catch (IOException e) {
			// TODO Auto-generated catch block
			e.printStackTrace();
		}
	}


	/**
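	 * Example of bookmark replacement: in the Map passed in, each key is a bookmark name and each value is the replacement text
	 * @param indicator  map from bookmark name to replacement text
	 */
	public void  replaceBookMark(Map<String,String> indicator) {
		// Replace each bookmark in turn
		Iterator<String> bookMarkIter = bookMarks.getNameIterator();
		while (bookMarkIter.hasNext()) {
			String bookMarkName = bookMarkIter.next();

			// Look up the bookmark by name
			BookMark bookMark = bookMarks.getBookmark(bookMarkName);

			// Perform the replacement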
			if (indicator.get(bookMarkName)!=null) {
				bookMark.insertTextAtBookMark(indicator.get(bookMarkName), BookMark.INSERT_BEFORE);
			}

		}

	}

	public void fillTableAtBookMark(String bookMarkName,List<Map<String,String>> content) {

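		// rowNum records which table row contains the bookmark
		int rowNum = 0;

		// First look up the bookmark
		BookMark bookMark = bookMarks.getBookmark(bookMarkName);
		Map<String, String> columnMap = new HashMap<String, String>();
		Map<String, Node> styleNode = new HashMap<String, Node>();

		// Check that the bookmark exists and is inside a table
		if(bookMark != null && bookMark.isInTable()){

			// Get the Table and Row objects that contain the bookmark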
			XWPFTable table = bookMark.getContainerTable();
			XWPFTableRow row = bookMark.getContainerTableRow();
			CTRow ctRow = row.getCtRow();
			List<XWPFTableCell> rowCell = row.getTableCells();
			for(int i = 0; i < rowCell.size(); i++){
				columnMap.put(i+"", rowCell.get(i).getText().trim());
				//System.out.println(rowCell.get(i).getParagraphs().get(0).createRun().getFontSize());
				//System.out.println(rowCell.get(i).getParagraphs().get(0).getCTP());
				//System.out.println(rowCell.get(i).getParagraphs().get(0).getStyle());

				// Get the XML of the cell's first paragraph; this is the root node
				Node node1 = rowCell.get(i).getParagraphs().get(0).getCTP().getDomNode();

				// Walk all child nodes of the root node
				for (int x=0;x<node1.getChildNodes().getLength();x++) {
					if (node1.getChildNodes().item(x).getNodeName().equals(BookMark.RUN_NODE_NAME)) {
						Node node2 = node1.getChildNodes().item(x);

						// Walk the children of each "w:r" node to find the node named "w:rPr"
						for (int y=0;y<node2.getChildNodes().getLength();y++) {
							if(node2.getChildNodes().item(y).getNodeName().endsWith(BookMark.STYLE_NODE_NAME)){

								// Store the "w:rPr" node (the font formatting) in the HashMap
								styleNode.put(i+"", node2.getChildNodes().item(y));
							}
						}
					} else {
						continue;
					}
				}
			}

			// Compare row by row to find the position of this row, then delete it
			for(int i = 0; i < table.getNumberOfRows(); i++){
				if(table.getRow(i).equals(row)){
					rowNum = i;
					break;
				}
			}
			table.removeRow(rowNum);

			for(int i = 0; i < content.size(); i++){
				// Create a new row; it gets as many cells as the first row of the table,
				// so the cell count has to be checked later when the data is filled in
				XWPFTableRow tableRow = table.createRow();
				CTTrPr trPr = tableRow.getCtRow().addNewTrPr();
				CTHeight ht = trPr.addNewTrHeight();
				ht.setVal(BigInteger.valueOf(360));
			}

			// Get the number of rows in the table
			int rcount = table.getNumberOfRows();
			for(int i = rowNum; i < rcount; i++){
				XWPFTableRow newRow = table.getRow(i);

				// Check whether newRow has as many cells as the bookmarked row
				if(newRow.getTableCells().size() != rowCell.size()){

					// Absolute difference between the cell counts of newRow and the bookmarked row.
					// If newRow has more cells than the bookmarked row, this method cannot handle it; use text replacement in the table instead.
					// If newRow has fewer cells than the bookmarked row, the missing cells must be added.
					int sub= Math.abs(newRow.getTableCells().size() - rowCell.size());
					// Add the missing cells
					for(int j = 0;j < sub; j++){
						newRow.addNewTableCell();
					}
				}

				List<XWPFTableCell> cells = newRow.getTableCells();

				for(int j = 0; j < cells.size(); j++){
					XWPFParagraph para = cells.get(j).getParagraphs().get(0);
					XWPFRun run = para.createRun();
					if(content.get(i-rowNum).get(columnMap.get(j+"")) != null){

						// Set the cell value; header cells are left unchanged
						run.setText(content.get(i-rowNum).get(columnMap.get(j+""))+"");

						// Apply the original cell's font formatting to the new run,
						// but only if a "w:rPr" node was captured for this column
						Node style = styleNode.get(j+"");
						if(style != null){
							run.getCTR().getDomNode().insertBefore(style.cloneNode(true), run.getCTR().getDomNode().getFirstChild());
						}
					}

					para.setAlignment(ParagraphAlignment.CENTER);
				}
			}
		}
	}

	public void replaceText(Map<String,String> bookmarkMap, String bookMarkName) {

		// First look up the bookmark
		BookMark bookMark = bookMarks.getBookmark(bookMarkName);
		// Get the table that the bookmark marks
		XWPFTable table = bookMark.getContainerTable();
		// Get all tables
		//Iterator<XWPFTable> it = document.getTablesIterator();

		if(table != null){
			// Get the number of rows in the table
			int rcount = table.getNumberOfRows();
			for(int i = 0 ;i < rcount; i++){
				XWPFTableRow row = table.getRow(i);

				// Get all cells in this row
				List<XWPFTableCell> cells = row.getTableCells();
				for(XWPFTableCell c : cells){
					for(Entry<String,String> e : bookmarkMap.entrySet()){
						if(c.getText().equals(e.getKey())){

							// Remove the cell's content
							c.removeParagraph(0);

							// Set the cell's new value
							c.setText(e.getValue());
						}
					}
				}
			}
		}
	}

	public void saveAs() {
		File newFile = new File("e:\\test\\Word模版_REPLACE.docx");
		// try-with-resources closes the stream even if write() fails, and
		// write() is never called when the file cannot be opened
		try (FileOutputStream fos = new FileOutputStream(newFile)) {
			this.document.write(fos);
			fos.flush();
		} catch (IOException e) {
			e.printStackTrace();
		}
	}
}
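
The core of fillTableAtBookMark is a lookup: the text of each cell in the bookmarked row acts as a column key, and every map in the content list supplies the values for one new row. The sketch below shows just that mapping with plain Java collections, leaving out POI, styles and row heights. The class TableFillSketch and the method fillRows are hypothetical names for this illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper for illustration only; it models table rows as lists of strings.
class TableFillSketch {

    // templateCells: the text of each cell in the bookmarked row, used as column keys.
    // content: one map per output row, keyed by those column keys.
    // A cell whose key has no value in the map is left empty.
    static List<List<String>> fillRows(List<String> templateCells,
                                       List<Map<String, String>> content) {
        List<List<String>> rows = new ArrayList<>();
        for (Map<String, String> record : content) {
            List<String> row = new ArrayList<>();
            for (String key : templateCells) {
                String value = record.get(key.trim());
                row.add(value == null ? "" : value);
            }
            rows.add(row);
        }
        return rows;
    }

    public static void main(String[] args) {
        Map<String, String> record = new HashMap<>();
        record.put("MONTH", "Jan");
        record.put("SALE_DEP", "75");
        record.put("TOTAL", "85");

        List<List<String>> rows = fillRows(
                Arrays.asList("MONTH", "SALE_DEP", "TOTAL"),
                Arrays.asList(record, record));
        System.out.println(rows); // prints [[Jan, 75, 85], [Jan, 75, 85]]
    }
}
```

The column order comes from the template row rather than from the map, so an unordered HashMap works as input.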

Test method

/**
	 * @param args
	 */
	public static void main(String[] args) {
		long startTime = System.currentTimeMillis();
		MSWordTool changer = new MSWordTool();
		changer.setTemplate("E:\\test\\Word.docx");
		Map<String,String> content = new HashMap<String,String>();
		content.put("Principles", "格式规范、标准统一、利于阅览");
		content.put("Purpose", "规范会议操作、提高会议质量");
		content.put("Scope", "公司会议、部门之间业务协调会议");

		content.put("customerName", "**有限公司");
		content.put("address", "机场路2号");
		content.put("userNo", "3021170207");
		content.put("tradeName", "水泥制造");
		content.put("price1", "1.085");
		content.put("price2", "0.906");
		content.put("price3", "0.433");
		content.put("numPrice", "0.675");

		content.put("company_name", "**有限公司");
		content.put("company_address", "机场路2号");
		changer.replaceBookMark(content);


		// Fill the tables at the given bookmarks
		List<Map<String ,String>> content2 = new ArrayList<Map<String, String>>();
		Map<String, String> table1 = new HashMap<String, String>();

		table1.put("MONTH", "*月份");
		table1.put("SALE_DEP", "75分");
		table1.put("TECH_CENTER", "80分");
		table1.put("CUSTOMER_SERVICE", "85分");
		table1.put("HUMAN_RESOURCES", "90分");
		table1.put("FINANCIAL", "95分");
		table1.put("WORKSHOP", "80分");
		table1.put("TOTAL", "85分");

		for(int i = 0; i < 3; i++){
			content2.add(table1);
		}
		changer.fillTableAtBookMark("Table" ,content2);
		changer.fillTableAtBookMark("month", content2);

		// Replace text inside a table
		Map<String, String> table = new HashMap<String, String>();
		table.put("CUSTOMER_NAME", "**有限公司");
		table.put("ADDRESS", "机场路2号");
		table.put("USER_NO", "3021170207");
		table.put("tradeName", "水泥制造");
		table.put("PRICE_1", "1.085");
		table.put("PRICE_2", "0.906");
		table.put("PRICE_3", "0.433");
		table.put("NUM_PRICE", "0.675");
		changer.replaceText(table,"Table2");

		// Save the Word file after replacement
		changer.saveAs();
		System.out.println("time=="+(System.currentTimeMillis() - startTime));

	}

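The Word document used in this article also comes from the project at (http://www.jb51.net/article/101910.htm), and it works without problems in testing.

The main difference is that the original project uses poi 3.9; switching the dependency to 3.17 produces errors because some methods have changed.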


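Because of those changes, some methods can no longer be used and new jars have to be added. The 3.17 package can be downloaded from the poi website.

Download address: https://poi.apache.org/download.html

After downloading and unzipping, the contents look like the figure below: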


The ooxml-lib directory is the part that did not exist before, or rather was split out after the changes. Reference it in the project.


Of course, the poi jars themselves also have to be added. I have tested this and it works.
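
A quick way to confirm that the jars ended up on the classpath is to try loading a few of the classes that the code above imports. The sketch below does this with plain reflection. The class PoiClasspathCheck is a hypothetical name, and the three class names are taken from the import statements earlier in this article; which jar supplies each of them depends on your setup.

```java
// Hypothetical helper for illustration only.
class PoiClasspathCheck {

    // Returns true if the named class can be found by this class's class loader.
    // The class is located but not initialized.
    static boolean isOnClasspath(String className) {
        try {
            Class.forName(className, false, PoiClasspathCheck.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            return false;
        }
    }

    public static void main(String[] args) {
        String[] required = {
            "org.apache.poi.xwpf.usermodel.XWPFDocument",
            "org.openxmlformats.schemas.wordprocessingml.x2006.main.CTBookmark",
            "org.apache.xmlbeans.XmlException"
        };
        for (String name : required) {
            System.out.println(name + " -> " + (isOnClasspath(name) ? "found" : "MISSING"));
        }
    }
}
```

If any line reports MISSING, the corresponding jar has probably not been added to the project yet.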

If you need it, you can clone the code and take a look: https://github.com/cocoforgod/J2W

