Java 使用 POI 3.17根据Word 模板替换、操作书签
由于项目的需求,需要对大量的word文档进行处理。
查找了大量的文档发现很多的博客对这个进行了介绍,主要有2种方案做处理,jacob 和poi。但是现在的服务器基本上是部署在Linux上,所以jacob基本上是不可行的。所以呢,主要是使用poi来进行这些操作。
Apache poi的hwpf模块是专门用来对word doc文件进行读写操作的。在hwpf里面我们使用HWPFDocument来表示一个word doc文档。在HWPFDocument里面有这么几个概念:
Range:它表示一个范围,这个范围可以是整个文档,也可以是里面的某一小节(Section),也可以是某一个段落(Paragraph),还可以是拥有共同属性的一段文本(CharacterRun)。
Section:word文档的一个小节,一个word文档可以由多个小节构成。
Paragraph:word文档的一个段落,一个小节可以由多个段落构成。
CharacterRun:具有相同属性的一段文本,一个段落可以由多个CharacterRun组成。Table:一个表格。
TableRow:表格对应的行。
TableCell:表格对应的单元格。
Section、Paragraph、CharacterRun和Table都继承自Range。
1、基本的替换方法
InputStream inputStream = new FileInputStream(modulePath);
HWPFDocument document = new HWPFDocument(inputStream);
Range range = document.getRange();
for (Map.Entry<String, String> entry : maps.entrySet()) {
range.replaceText("@" + entry.getKey() + "@", entry.getValue());
}
OutputStream outputStream = new FileOutputStream(outPath);
document.write(outputStream);
this.closeStream(outputStream);
this.closeStream(inputStream);
这些在网上已经有很普遍的使用了,但是这些基本上是基于3.9poi进行使用的,目前poi的版本已经更新到了3.17了,而且后续的就不会对Java6的支持了,最低支持Java8的,所以我们要使用3.17来进行对word进行文本的替换,书签的操作。
我们这里主要使用了两个类。(这两个类主要是参考http://www.jb51.net/article/101910.htm)中的dome的fang
BookMarkWord 文件中标签的封装类,保存了其定义和内部的操作
package com;
import java.util.List;
import java.util.Stack;
import org.apache.poi.xwpf.usermodel.XWPFParagraph;
import org.apache.poi.xwpf.usermodel.XWPFRun;
import org.apache.poi.xwpf.usermodel.XWPFTable;
import org.apache.poi.xwpf.usermodel.XWPFTableCell;
import org.apache.poi.xwpf.usermodel.XWPFTableRow;
import org.apache.xmlbeans.XmlException;
import org.openxmlformats.schemas.wordprocessingml.x2006.main.CTBookmark;
import org.openxmlformats.schemas.wordprocessingml.x2006.main.CTText;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
/**
*
* Word 文件中标签的封装类,保存了其定义和内部的操作
*
* @author
*
* <p>Modification History:</p>
* <p>Date Author Description</p>
* <p>------------------------------------------------------------------</p>
* <p> </p>
* <p> </p>
*/
public class BookMark {
//以下为定义的常量
/** 替换标签时,设于标签的后面 **/
public static final int INSERT_AFTER = 0;
/** 替换标签时,设于标签的前面 **/
public static final int INSERT_BEFORE = 1;
/** 替换标签时,将内容替换书签 **/
public static final int REPLACE = 2;
/** docx中定义的部分常量引用 **/
public static final String RUN_NODE_NAME = "w:r";
public static final String TEXT_NODE_NAME = "w:t";
public static final String BOOKMARK_START_TAG = "bookmarkStart";
public static final String BOOKMARK_END_TAG = "bookmarkEnd";
public static final String BOOKMARK_ID_ATTR_NAME = "w:id";
public static final String STYLE_NODE_NAME = "w:rPr";
/** 内部的标签定义类 **/
private CTBookmark _ctBookmark = null;
/** 标签所处的段落 **/
private XWPFParagraph _para = null;
/** 标签所在的表cell对象 **/
private XWPFTableCell _tableCell = null;
/** 标签名称 **/
private String _bookmarkName = null;
/** 该标签是否处于表格内 **/
private boolean _isCell = false;
/**
* 构造函数
* @param ctBookmark
* @param para
*/
public BookMark(CTBookmark ctBookmark, XWPFParagraph para) {
this._ctBookmark = ctBookmark;
this._para = para;
this._bookmarkName = ctBookmark.getName();
this._tableCell = null;
this._isCell = false;
}
/**
* 构造函数,用于表格中的标签
* @param ctBookmark
* @param para
* @param tableCell
*/
public BookMark(CTBookmark ctBookmark, XWPFParagraph para, XWPFTableCell tableCell) {
this(ctBookmark, para);
this._tableCell = tableCell;
this._isCell = true;
}
public boolean isInTable() {
return this._isCell;
}
public XWPFTable getContainerTable() {
return this._tableCell.getTableRow().getTable();
}
public XWPFTableRow getContainerTableRow() {
return this._tableCell.getTableRow();
}
public String getBookmarkName() {
return this._bookmarkName;
}
/**
* Insert text into the Word document in the location indicated by this
* bookmark.
*
* @param bookmarkValue An instance of the String class that encapsulates
* the text to insert into the document.
* @param where A primitive int whose value indicates where the text ought
* to be inserted. There are three options controlled by constants; insert
* the text immediately in front of the bookmark (Bookmark.INSERT_BEFORE),
* insert text immediately after the bookmark (Bookmark.INSERT_AFTER) and
* replace any and all text that appears between the bookmark's square
* brackets (Bookmark.REPLACE).
*/
public void insertTextAtBookMark(String bookmarkValue, int where) {
//根据标签的类型,进行不同的操作
if(this._isCell) {
this.handleBookmarkedCells(bookmarkValue, where);
} else {
//普通标签,直接创建一个元素
XWPFRun run = this._para.createRun();
run.setText(bookmarkValue);
switch(where) {
case BookMark.INSERT_AFTER:
this.insertAfterBookmark(run);
break;
case BookMark.INSERT_BEFORE:
this.insertBeforeBookmark(run);
break;
case BookMark.REPLACE:
this.replaceBookmark(run);
break;
}
}
}
/**
* Inserts some text into a Word document in a position that is immediately
* after a named bookmark.
*
* Bookmarks can take two forms, they can either simply mark a location
* within a document or they can do this but contain some text. The
* difference is obvious from looking at some XML markup. The simple
* placeholder bookmark will look like this;
*
* <pre>
*
* <w:bookmarkStart w:name="AllAlone" w:id="0"/><w:bookmarkEnd w:id="0"/>
*
* </pre>
*
* Simply a pair of tags where one tag has the name bookmarkStart, the other
* the name bookmarkEnd and both share matching id attributes. In this case,
* the text will simply be inserted into the document at a point immediately
* after the bookmarkEnd tag. No styling will be applied to the text, it
* will simply inherit the documents defaults.
*
* The more complex case looks like this;
*
* <pre>
*
* <w:bookmarkStart w:name="InStyledText" w:id="3"/>
* <w:r w:rsidRPr="00DA438C">
* <w:rPr>
* <w:rFonts w:hAnsi="Engravers MT" w:ascii="Engravers MT" w:cs="Arimo"/>
* <w:color w:val="FF0000"/>
* </w:rPr>
* <w:t>text</w:t>
* </w:r>
* <w:bookmarkEnd w:id="3"/>
*
* </pre>
*
* Here, the user has selected the word 'text' and chosen to insert a
* bookmark into the document at that point. So, the bookmark tags 'contain'
* a character run that is styled. Inserting any text after this bookmark,
* it is important to ensure that the styling is preserved and copied over
* to the newly inserted text.
*
* The approach taken to dealing with both cases is similar but slightly
* different. In both cases, the code simply steps along the document nodes
* until it finds the bookmarkEnd tag whose ID matches that of the
* bookmarkStart tag. Then, it will look to see if there is one further node
* following the bookmarkEnd tag. If there is, it will insert the text into
* the paragraph immediately in front of this node. If, on the other hand,
* there are no more nodes following the bookmarkEnd tag, then the new run
* will simply be positioned at the end of the paragraph.
*
* Styles are dealt with by 'looking' for a 'w:rPr' element whilst iterating
* through the nodes. If one is found, its details will be captured and
* applied to the run before the run is inserted into the paragraph. If
* there are multiple runs between the bookmarkStart and bookmarkEnd tags
* and these have different styles applied to them, then the style applied
* to the last run before the bookmarkEnd tag - if any - will be cloned and
* applied to the newly inserted text.
*
* @param run An instance of the XWPFRun class that encapsulates the text
* that is to be inserted into the document following the bookmark.
*/
private void insertAfterBookmark(XWPFRun run) {
Node nextNode = null;
Node insertBeforeNode = null;
Node styleNode = null;
int bookmarkStartID = 0;
int bookmarkEndID = -1;
// Capture the id of the bookmarkStart tag. The code will step through
// the document nodes 'contained' within the start and end tags that have
// matching id numbers.
bookmarkStartID = this._ctBookmark.getId().intValue();
// Get the node for the bookmark start tag and then enter a loop that
// will step from one node to the next until the bookmarkEnd tag with
// a matching id is fouind.
nextNode = this._ctBookmark.getDomNode();
while (bookmarkStartID != bookmarkEndID) {
// Get the next node along and check to see if it is a bookmarkEnd
// tag. If it is, get its id so that the containing while loop can
// be terminated once the correct end tag is found. Note that the
// id will be obtained as a String and must be converted into an
// integer. This has been coded to fail safely so that if an error
// is encuntered converting the id to an int value, the while loop
// will still terminate.
nextNode = nextNode.getNextSibling();
if (nextNode.getNodeName().contains(BookMark.BOOKMARK_END_TAG)) {
try {
bookmarkEndID = Integer.parseInt(
nextNode.getAttributes().getNamedItem(
BookMark.BOOKMARK_ID_ATTR_NAME).getNodeValue());
} catch (NumberFormatException nfe) {
bookmarkEndID = bookmarkStartID;
}
} // If we are not dealing with a bookmarkEnd node, are we dealing
// with a run node that MAY contains styling information. If so,
// then get that style information from the run.
else {
if (nextNode.getNodeName().equals(BookMark.RUN_NODE_NAME)) {
styleNode = this.getStyleNode(nextNode);
}
}
}
// After the while loop completes, it should have located the correct
// bookmarkEnd tag but we cannot perform an insert after only an insert
// before operation and must, therefore, get the next node.
insertBeforeNode = nextNode.getNextSibling();
// Style the newly inserted text. Note that the code copies or clones
// the style it found in another run, failure to do this would remove the
// style from one node and apply it to another.
if (styleNode != null) {
run.getCTR().getDomNode().insertBefore(
styleNode.cloneNode(true), run.getCTR().getDomNode().getFirstChild());
}
// Finally, check to see if there was a node after the bookmarkEnd
// tag. If there was, then this code will insert the run in front of
// that tag. If there was no node following the bookmarkEnd tag then the
// run will be inserted at the end of the paragarph and this was taken
// care of at the point of creation.
if (insertBeforeNode != null) {
this._para.getCTP().getDomNode().insertBefore(
run.getCTR().getDomNode(), insertBeforeNode);
}
}
/**
* Inserts some text into a Word document immediately in front of the
* location of a bookmark.
*
* This case is slightly more straightforward than inserting after the
* bookmark. For example, it is possible only to insert a new node in front
* of an existing node. When inserting after the bookmark, then end node had
* to be located whereas, in this case, the node is already known, it is the
* CTBookmark itself. The only information that must be discovered is
* whether there is a run immediately in front of the boookmarkStart tag and
* whether that run is styled. If there is and if it is, then this style
* must be cloned and applied the text which will be inserted into the
* paragraph.
*
* @param run An instance of the XWPFRun class that encapsulates the text
* that is to be inserted into the document following the bookmark.
*/
private void insertBeforeBookmark(XWPFRun run) {
Node insertBeforeNode = null;
Node childNode = null;
Node styleNode = null;
// Get the dom node from the bookmarkStart tag and look for another
// node immediately preceding it.
insertBeforeNode = this._ctBookmark.getDomNode();
childNode = insertBeforeNode.getPreviousSibling();
// If a node is found, try to get the styling from it.
if (childNode != null) {
styleNode = this.getStyleNode(childNode);
// If that previous node was styled, then apply this style to the
// text which will be inserted.
if (styleNode != null) {
run.getCTR().getDomNode().insertBefore(
styleNode.cloneNode(true), run.getCTR().getDomNode().getFirstChild());
}
}
// Insert the text into the paragraph immediately in front of the
// bookmarkStart tag.
this._para.getCTP().getDomNode().insertBefore(
run.getCTR().getDomNode(), insertBeforeNode);
}
/**
* Replace the text - if any - contained between the bookmarkStart and it's
* matching bookmarkEnd tag with the text specified. The technique used will
* resemble that employed when inserting text after the bookmark. In short,
* the code will iterate along the nodes until it encounters a matching
* bookmarkEnd tag. Each node encountered will be deleted unless it is the
* final node before the bookmarkEnd tag is encountered and it is a
* character run. If this is the case, then it can simply be updated to
* contain the text the users wishes to see inserted into the document. If
* the last node is not a character run, then it will be deleted, a new run
* will be created and inserted into the paragraph between the bookmarkStart
* and bookmarkEnd tags.
*
* @param run An instance of the XWPFRun class that encapsulates the text
* that is to be inserted into the document following the bookmark.
*/
private void replaceBookmark(XWPFRun run) {
Node nextNode = null;
Node styleNode = null;
Node lastRunNode = null;
Node toDelete = null;
NodeList childNodes = null;
Stack<Node> nodeStack = null;
boolean textNodeFound = false;
boolean foundNested = true;
int bookmarkStartID = 0;
int bookmarkEndID = -1;
int numChildNodes = 0;
nodeStack = new Stack<Node>();
bookmarkStartID = this._ctBookmark.getId().intValue();
nextNode = this._ctBookmark.getDomNode();
nodeStack.push(nextNode);
// Loop through the nodes looking for a matching bookmarkEnd tag
while (bookmarkStartID != bookmarkEndID) {
nextNode = nextNode.getNextSibling();
nodeStack.push(nextNode);
// If an end tag is found, does it match the start tag? If so, end
// the while loop.
if (nextNode.getNodeName().contains(BookMark.BOOKMARK_END_TAG)) {
try {
bookmarkEndID = Integer.parseInt(
nextNode.getAttributes().getNamedItem(
BookMark.BOOKMARK_ID_ATTR_NAME).getNodeValue());
} catch (NumberFormatException nfe) {
bookmarkEndID = bookmarkStartID;
}
}
//else {
// Place a reference to the node on the nodeStack
// nodeStack.push(nextNode);
//}
}
// If the stack of nodes found between the bookmark tags is not empty
// then they have to be removed.
if (!nodeStack.isEmpty()) {
// Check the node at the top of the stack. If it is a run, get it's
// style - if any - and apply to the run that will be replacing it.
//lastRunNode = nodeStack.pop();
lastRunNode = nodeStack.peek();
if ((lastRunNode.getNodeName().equals(BookMark.RUN_NODE_NAME))) {
styleNode = this.getStyleNode(lastRunNode);
if (styleNode != null) {
run.getCTR().getDomNode().insertBefore(
styleNode.cloneNode(true), run.getCTR().getDomNode().getFirstChild());
}
}
// Delete any and all node that were found in between the start and
// end tags. This is slightly safer that trying to delete the nodes
// as they are found while stepping through them in the loop above.
// If we are peeking, then this line can be commented out.
//this._para.getCTP().getDomNode().removeChild(lastRunNode);
this.deleteChildNodes(nodeStack);
}
// Place the text into position, between the bookmark tags.
this._para.getCTP().getDomNode().insertBefore(
run.getCTR().getDomNode(), nextNode);
}
/**
* When replacing the bookmark's text, it is necessary to delete any nodes
* that are found between matching start and end tags. Complications occur
* here because it is possible to have bookmarks nested within bookmarks to
* almost any level and it is important to not remove any inner or nested
* bookmarks when replacing the contents of an outer or containing
* bookmark. This code successfully handles the simplest occurrence - where
* one bookmark completely contains another - but not more complex cases
* where one bookmark overlaps another in the markup. That is still to do.
*
* @param nodeStack An instance of the Stack class that encapsulates
* references to any and all nodes found between the opening and closing
* tags of a bookmark.
*/
private void deleteChildNodes(Stack<Node> nodeStack) {
Node toDelete = null;
int bookmarkStartID = 0;
int bookmarkEndID = 0;
boolean inNestedBookmark = false;
// The first element in the list will be a bookmarkStart tag and that
// must not be deleted.
for(int i = 1; i < nodeStack.size(); i++) {
// Get an element. If it is another bookmarkStart tag then
// again, we do not want to delete it, it's matching end tag
// or any nodes that fall inbetween.
toDelete = nodeStack.elementAt(i);
if(toDelete.getNodeName().contains(BookMark.BOOKMARK_START_TAG)) {
bookmarkStartID = Integer.parseInt(
toDelete.getAttributes().getNamedItem(BookMark.BOOKMARK_ID_ATTR_NAME).getNodeValue());
inNestedBookmark = true;
}
else if(toDelete.getNodeName().contains(BookMark.BOOKMARK_END_TAG)) {
bookmarkEndID = Integer.parseInt(
toDelete.getAttributes().getNamedItem(BookMark.BOOKMARK_ID_ATTR_NAME).getNodeValue());
if(bookmarkEndID == bookmarkStartID) {
inNestedBookmark = false;
}
}
else {
if(!inNestedBookmark) {
this._para.getCTP().getDomNode().removeChild(toDelete);
}
}
}
}
/**
* Recover styling information - if any - from another document node. Note
* that it is only possible to accomplish this if the node is a run (w:r)
* and this could be tested for in the code that calls this method. However,
* a check is made in the calling code as to whether a style has been found
* and only if a style is found is it applied. This method always returns
* null if it does not find a style making that checking process easier.
*
* @param parentNode An instance of the Node class that encapsulates a
* reference to a document node.
* @return An instance of the Node class that encapsulates the styling
* information applied to a character run. Note that if no styling
* information is found in the run OR if the node passed as an argument to
* the parentNode parameter is NOT a run, then a null value will be
* returned.
*/
private Node getStyleNode(Node parentNode) {
Node childNode = null;
Node styleNode = null;
if (parentNode != null) {
// If the node represents a run and it has child nodes then
// it can be processed further. Note, whilst testing the code, it
// was observed that although it is possible to get a list of a nodes
// children, even when a node did have children, trying to obtain this
// list would often return a null value. This is the reason why the
// technique of stepping from one node to the next is used here.
if (parentNode.getNodeName().equalsIgnoreCase(BookMark.RUN_NODE_NAME)
&& parentNode.hasChildNodes()) {
// Get the first node and catch it's reference for return if
// the first child node is a style node (w:rPr).
childNode = parentNode.getFirstChild();
if (childNode.getNodeName().equals("w:rPr")) {
styleNode = childNode;
} else {
// If the first node was not a style node and there are other
// child nodes remaining to be checked, then step through
// the remaining child nodes until either a style node is
// found or until all child nodes have been processed.
while ((childNode = childNode.getNextSibling()) != null) {
if (childNode.getNodeName().equals(BookMark.STYLE_NODE_NAME)) {
styleNode = childNode;
// Note setting to null here if a style node is
// found in order order to terminate any further
// checking
childNode = null;
}
}
}
}
}
return (styleNode);
}
/**
* Get the text - if any - encapsulated by this bookmark. The creator of a
* Word document can chose to select one or more items of text and then
* insert a bookmark at that location. The highlighted text will appear
* between the square brackets that denote the location of a bookmark in the
* document's text and they will be returned by a call to this method.
*
* @return An instance of the String class encapsulating any text that
* appeared between the opening and closing square bracket associated with
* this bookmark.
* @throws XmlException Thrown if a problem is encountered parsing the XML
* markup recovered from the document in order to construct a CTText
* instance which may required to obtain the bookmarks text.
*/
public String getBookmarkText() throws XmlException {
StringBuilder builder = null;
// Are we dealing with a bookmarked table cell? If so, the entire
// contents of the cell - if anything - must be recovered and returned.
if(this._tableCell != null) {
builder = new StringBuilder(this._tableCell.getText());
}
else {
builder = this.getTextFromBookmark();
}
return(builder == null ? null : builder.toString());
}
/**
* There are two types of bookmarks. One is a simple placeholder whilst the
* second is still a placeholder but it 'contains' some text. In the second
* instance, the creator of the document has selected some text and then
* chosen to insert a bookmark there and the difference if obvious when
* looking at the XML markup.
*
* The simple case;
*
* <pre>
*
* <w:bookmarkStart w:name="AllAlone" w:id="0"/><w:bookmarkEnd w:id="0"/>
*
* </pre>
*
* The more complex case;
*
* <pre>
*
* <w:bookmarkStart w:name="InStyledText" w:id="3"/>
* <w:r w:rsidRPr="00DA438C">
* <w:rPr>
* <w:rFonts w:hAnsi="Engravers MT" w:ascii="Engravers MT" w:cs="Arimo"/>
* <w:color w:val="FF0000"/>
* </w:rPr>
* <w:t>text</w:t>
* </w:r>
* <w:bookmarkEnd w:id="3"/>
*
* </pre>
*
* This method assumes that the user wishes to recover the content from any
* character run that appears in the markup between a matching pair of
* bookmarkStart and bookmarkEnd tags; thus, using the example above again,
* this method would return the String 'text' to the user. It is possible
* however for a bookmark to contain more than one run and for a bookmark to
* contain other bookmarks. In both of these cases, this code will return
* the text contained within any and all runs that appear in the XML markup
* between matching bookmarkStart and bookmarkEnd tags. The term 'matching
* bookmarkStart and bookmarkEndtags' here means tags whose id attributes
* have matching value.
*
* @return An instance of the StringBuilder class encapsulating the text
* recovered from any character run elements found between the bookmark's
* start and end tags. If no text is found then a null value will be
* returned.
* @throws XmlException Thrown if a problem is encountered parsing the XML
* markup recovered from the document in order to construct a CTText
* instance which may be required to obtain the bookmarks text.
*/
private StringBuilder getTextFromBookmark() throws XmlException {
int startBookmarkID = 0;
int endBookmarkID = -1;
Node nextNode = null;
Node childNode = null;
CTText text = null;
StringBuilder builder = null;
String rawXML = null;
// Get the ID of the bookmark from it's start tag, the DOM node from the
// bookmark (to make looping easier) and initialise the StringBuilder.
startBookmarkID = this._ctBookmark.getId().intValue();
nextNode = this._ctBookmark.getDomNode();
builder = new StringBuilder();
// Loop through the nodes held between the bookmark's start and end
// tags.
while (startBookmarkID != endBookmarkID) {
// Get the next node and, if it is a bookmarkEnd tag, get it's ID
// as matching ids will terminate the while loop..
nextNode = nextNode.getNextSibling();
if (nextNode.getNodeName().contains(BookMark.BOOKMARK_END_TAG)) {
// Get the ID attribute from the node. It is a String that must
// be converted into an int. An exception could be thrown and so
// the catch clause will ensure the loop ends neatly even if the
// value might be incorrect. Must inform the user.
try {
endBookmarkID = Integer.parseInt(
nextNode.getAttributes().
getNamedItem(BookMark.BOOKMARK_ID_ATTR_NAME).getNodeValue());
} catch (NumberFormatException nfe) {
endBookmarkID = startBookmarkID;
}
} else {
// This is not a bookmarkEnd node and can processed it for any
// text it may contain. Note the check for both type - it must
// be a run - and contain children. Interestingly, it seems as
// though the node may contain children and yet the call to
// nextNode.getChildNodes() will still return an empty list,
// hence the need to step through the child nodes.
if (nextNode.getNodeName().equals(BookMark.RUN_NODE_NAME)
&& nextNode.hasChildNodes()) {
// Get the text from the child nodes.
builder.append(this.getTextFromChildNodes(nextNode));
}
}
}
return (builder);
}
/**
* Iterates through all and any children of the Node whose reference will be
* passed as an argument to the node parameter, and recover the contents of
* any text nodes. Testing revealed that a node can be called a text node
* and yet report it's type as being something different, an element node
* for example. Calling the getNodeValue() method on a text node will return
* the text the node encapsulates but doing the same on an element node will
* not. In fact, the call will simply return a null value. As a result, this
* method will test the nodes name to catch all text nodes - those whose
* name is to 'w:t' and then it's type. If the type is reported to be a text
* node, it is a trivial task to get at it's contents. However, if the type
* is not reported as a text type, then it is necessary to parse the raw XML
* markup for the node to recover it's value.
*
* @param node An instance of the Node class that encapsulates a reference
* to a node recovered from the document being processed. It should be
* passed a reference to a character run - 'w:r' - node.
* @return An instance of the String class that encapsulates the text
* recovered from the nodes children, if they are text nodes.
* @throws XmlException Thrown if a problem is encountered parsing the XML
* markup recovered from the document in order to construct the CTText
* instance which may be required to obtain the bookmarks text.
*/
private String getTextFromChildNodes(Node node) throws XmlException {
NodeList childNodes = null;
Node childNode = null;
CTText text = null;
StringBuilder builder = new StringBuilder();
int numChildNodes = 0;
// Get a list of chid nodes from the node passed to the method and
// find out how many children there are in the list.
childNodes = node.getChildNodes();
numChildNodes = childNodes.getLength();
// Iterate through the children one at a time - it is possible for a
// run to ciontain zero, one or more text nodes - and recover the text
// from an text type child nodes.
for (int i = 0; i < numChildNodes; i++) {
// Get a node and check it's name. If this is 'w:t' then process as
// text type node.
childNode = childNodes.item(i);
if (childNode.getNodeName().equals(BookMark.TEXT_NODE_NAME)) {
// If the node reports it's type as txet, then simply call the
// getNodeValue() method to get at it's text.
if (childNode.getNodeType() == Node.TEXT_NODE) {
builder.append(childNode.getNodeValue());
} else {
// Correct the type by parsing the node's XML markup and
// creating a CTText object. Call the getStringValue()
// method on that to get the text.
text = CTText.Factory.parse(childNode);
builder.append(text.getStringValue());
}
}
}
return (builder.toString());
}
private void handleBookmarkedCells(String bookmarkValue, int where) {
List<XWPFParagraph> paraList = null;
List<XWPFRun> runs = null;
XWPFParagraph para = null;
XWPFRun readRun = null;
// Get a list if paragraphs from the table cell and remove any and all.
paraList = this._tableCell.getParagraphs();
for(int i = 0; i < paraList.size(); i++) {
this._tableCell.removeParagraph(i);
}
para = this._tableCell.addParagraph();
para.createRun().setText(bookmarkValue);
}
}
BookMarks:
利用POI进行Word文件相关的操作,针对docx形式的封装
package com;
import org.apache.poi.xwpf.usermodel.XWPFDocument;
import org.apache.poi.xwpf.usermodel.XWPFParagraph;
import org.apache.poi.xwpf.usermodel.XWPFTable;
import org.apache.poi.xwpf.usermodel.XWPFTableCell;
import org.apache.poi.xwpf.usermodel.XWPFTableRow;
import org.openxmlformats.schemas.wordprocessingml.x2006.main.CTBookmark;
import java.util.HashMap;
import java.util.Iterator;
import java.util.List;
import java.util.Collection;
import java.util.Set;
import org.w3c.dom.NamedNodeMap;
import org.w3c.dom.Node;
/**
*
* 利用POI进行Word文件相关的操作,针对docx形式的封装
*
* @author
*
* <p>Modification History:</p>
* <p>Date Author Description</p>
* <p>------------------------------------------------------------------</p>
* <p> </p>
* <p> </p>
*/
public class BookMarks {
/** 保存Word文件中定义的标签 **/
private HashMap<String, BookMark> _bookmarks = null;
/**
* 构造函数,用以分析文档,解析出所有的标签
*
* @param document Word OOXML document instance.
*/
public BookMarks(XWPFDocument document) {
//初始化标签缓存
this._bookmarks = new HashMap<String, BookMark>();
// 首先解析文档普通段落中的标签
this.procParaList(document.getParagraphs());
//利用繁琐的方法,从所有的表格中得到得到标签,处理比较原始和简单
List<XWPFTable> tableList = document.getTables();
for (XWPFTable table : tableList) {
//得到表格的列信息
List<XWPFTableRow> rowList = table.getRows();
for (XWPFTableRow row : rowList){
//得到行中的列信息
List<XWPFTableCell> cellList = row.getTableCells();
for (XWPFTableCell cell : cellList) {
//逐个解析标签信息
//this.procParaList(cell.getParagraphs(), row);
this.procParaList(cell);
}
}
}
}
/**
* 根据标签名称,获得标签的相关定义,如果不存在,则返回空
* @param bookmarkName 标签名称
* @return 返回封装好的对象
*/
public BookMark getBookmark(String bookmarkName) {
BookMark bookmark = null;
if(this._bookmarks.containsKey(bookmarkName)) {
bookmark = this._bookmarks.get(bookmarkName);
}
return bookmark;
}
/**
* 得到所有的标签信息集合
*
* @return 缓存的标签信息集合
*/
public Collection<BookMark> getBookmarkList() {
return(this._bookmarks.values());
}
/**
* 返回文档中的标签名称迭代器
* @return 由Map KEY 转换的迭代器
*/
public Iterator<String> getNameIterator() {
return(this._bookmarks.keySet().iterator());
}
private void procParaList(XWPFTableCell cell){
List<XWPFParagraph> paragraphList = cell.getParagraphs();
for(XWPFParagraph paragraph : paragraphList){
//得到段落中的标签标记
List<CTBookmark> bookmarkList = paragraph.getCTP().getBookmarkStartList();
for (CTBookmark bookmark : bookmarkList ) {
this._bookmarks.put(bookmark.getName(),
new BookMark(bookmark, paragraph, cell));
}
}
}
/**
* 解析表格中的标签
* @param paragraphList 传入的段落列表
* @param tableRow 对应的表格行对象
*/
private void procParaList(List<XWPFParagraph> paragraphList, XWPFTableRow tableRow) {
NamedNodeMap attributes = null;
Node colFirstNode = null;
Node colLastNode = null;
int firstColIndex = 0;
int lastColIndex = 0;
//循环判断,解析段落中的标签
for (XWPFParagraph paragraph : paragraphList) {
//得到段落中的标签标记
List<CTBookmark> bookmarkList = paragraph.getCTP().getBookmarkStartList();
for (CTBookmark bookmark : bookmarkList ) {
// With a bookmark in hand, test to see if the bookmarkStart tag
// has w:colFirst or w:colLast attributes. If it does, we are
// dealing with a bookmarked table cell. This will need to be
// handled differnetly - I think by an different concrete class
// that implements the Bookmark interface!!
attributes = bookmark.getDomNode().getAttributes();
if(attributes != null) {
// Get the colFirst and colLast attributes. If both - for
// now - are found, then we are dealing with a bookmarked
// cell.
colFirstNode = attributes.getNamedItem("w:colFirst");
colLastNode = attributes.getNamedItem("w:colLast");
if(colFirstNode != null && colLastNode != null) {
// Get the index of the cell (or cells later) from them.
// First convefrt the String values both return to primitive
// int value. TO DO, what happens if there is a
// NumberFormatException.
firstColIndex = Integer.parseInt(colFirstNode.getNodeValue());
lastColIndex = Integer.parseInt(colLastNode.getNodeValue());
// if the indices are equal, then we are dealing with a#
// cell and can create the bookmark for it.
if(firstColIndex == lastColIndex) {
this._bookmarks.put(bookmark.getName(),
new BookMark(bookmark, paragraph,
tableRow.getCell(firstColIndex)));
}
else {
System.out.println("This bookmark " + bookmark.getName() +
" identifies a number of cells in the "
+ "table. That condition is not handled yet.");
}
}
else {
this._bookmarks.put(bookmark.getName(),
new BookMark(bookmark, paragraph,tableRow.getCell(1)));
}
}
else {
this._bookmarks.put(bookmark.getName(),
new BookMark(bookmark, paragraph,tableRow.getCell(1)));
}
}
}
}
/**
* 解析普通段落中的标签
* @param paragraphList 传入的段落
*/
private void procParaList(List<XWPFParagraph> paragraphList) {
for (XWPFParagraph paragraph : paragraphList) {
List<CTBookmark> bookmarkList = paragraph.getCTP().getBookmarkStartList();
//循环加入标签
for (CTBookmark bookmark : bookmarkList) {
this._bookmarks.put(bookmark.getName(),
new BookMark(bookmark, paragraph));
}
}
}
}
使用的工具类:MSWordTool
package com;
import java.io.File;
import java.io.FileNotFoundException;
import java.io.FileOutputStream;
import java.io.IOException;
import java.math.BigInteger;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.Iterator;
import java.util.List;
import java.util.Map;
import java.util.Map.Entry;
import org.apache.poi.POIXMLDocument;
import org.apache.poi.xwpf.usermodel.ParagraphAlignment;
import org.apache.poi.xwpf.usermodel.XWPFDocument;
import org.apache.poi.xwpf.usermodel.XWPFParagraph;
import org.apache.poi.xwpf.usermodel.XWPFRun;
import org.apache.poi.xwpf.usermodel.XWPFTable;
import org.apache.poi.xwpf.usermodel.XWPFTableCell;
import org.apache.poi.xwpf.usermodel.XWPFTableRow;
import org.openxmlformats.schemas.wordprocessingml.x2006.main.CTHeight;
import org.openxmlformats.schemas.wordprocessingml.x2006.main.CTRow;
import org.openxmlformats.schemas.wordprocessingml.x2006.main.CTTrPr;
import org.w3c.dom.Node;
/**
* 使用POI,进行Word相关的操作
*
*
* @author xuyu
*
* <p>Modification History:</p>
* <p>Date Author Description</p>
* <p>------------------------------------------------------------------</p>
* <p> </p>
* <p> </p>
*/
public class MSWordTool {
/** 内部使用的文档对象 **/
private XWPFDocument document;
private BookMarks bookMarks = null;
/**
* 为文档设置模板
* @param templatePath 模板文件名称
*/
public void setTemplate(String templatePath) {
try {
this.document = new XWPFDocument(
POIXMLDocument.openPackage(templatePath));
bookMarks = new BookMarks(document);
} catch (IOException e) {
// TODO Auto-generated catch block
e.printStackTrace();
}
}
/**
* 进行标签替换的例子,传入的Map中,key表示标签名称,value是替换的信息
* @param indicator
*/
public void replaceBookMark(Map<String,String> indicator) {
//循环进行替换
Iterator<String> bookMarkIter = bookMarks.getNameIterator();
while (bookMarkIter.hasNext()) {
String bookMarkName = bookMarkIter.next();
//得到标签名称
BookMark bookMark = bookMarks.getBookmark(bookMarkName);
//进行替换
if (indicator.get(bookMarkName)!=null) {
bookMark.insertTextAtBookMark(indicator.get(bookMarkName), BookMark.INSERT_BEFORE);
}
}
}
public void fillTableAtBookMark(String bookMarkName,List<Map<String,String>> content) {
//rowNum来比较标签在表格的哪一行
int rowNum = 0;
//首先得到标签
BookMark bookMark = bookMarks.getBookmark(bookMarkName);
Map<String, String> columnMap = new HashMap<String, String>();
Map<String, Node> styleNode = new HashMap<String, Node>();
//标签是否处于表格内
if(bookMark.isInTable()){
//获得标签对应的Table对象和Row对象
XWPFTable table = bookMark.getContainerTable();
XWPFTableRow row = bookMark.getContainerTableRow();
CTRow ctRow = row.getCtRow();
List<XWPFTableCell> rowCell = row.getTableCells();
for(int i = 0; i < rowCell.size(); i++){
columnMap.put(i+"", rowCell.get(i).getText().trim());
//System.out.println(rowCell.get(i).getParagraphs().get(0).createRun().getFontSize());
//System.out.println(rowCell.get(i).getParagraphs().get(0).getCTP());
//System.out.println(rowCell.get(i).getParagraphs().get(0).getStyle());
//获取该单元格段落的xml,得到根节点
Node node1 = rowCell.get(i).getParagraphs().get(0).getCTP().getDomNode();
//遍历根节点的所有子节点
for (int x=0;x<node1.getChildNodes().getLength();x++) {
if (node1.getChildNodes().item(x).getNodeName().equals(BookMark.RUN_NODE_NAME)) {
Node node2 = node1.getChildNodes().item(x);
//遍历所有节点为"w:r"的所有自己点,找到节点名为"w:rPr"的节点
for (int y=0;y<node2.getChildNodes().getLength();y++) {
if(node2.getChildNodes().item(y).getNodeName().endsWith(BookMark.STYLE_NODE_NAME)){
//将节点为"w:rPr"的节点(字体格式)存到HashMap中
styleNode.put(i+"", node2.getChildNodes().item(y));
}
}
} else {
continue;
}
}
}
//循环对比,找到该行所处的位置,删除改行
for(int i = 0; i < table.getNumberOfRows(); i++){
if(table.getRow(i).equals(row)){
rowNum = i;
break;
}
}
table.removeRow(rowNum);
for(int i = 0; i < content.size(); i++){
//创建新的一行,单元格数是表的第一行的单元格数,
//后面添加数据时,要判断单元格数是否一致
XWPFTableRow tableRow = table.createRow();
CTTrPr trPr = tableRow.getCtRow().addNewTrPr();
CTHeight ht = trPr.addNewTrHeight();
ht.setVal(BigInteger.valueOf(360));
}
//得到表格行数
int rcount = table.getNumberOfRows();
for(int i = rowNum; i < rcount; i++){
XWPFTableRow newRow = table.getRow(i);
//判断newRow的单元格数是不是该书签所在行的单元格数
if(newRow.getTableCells().size() != rowCell.size()){
//计算newRow和书签所在行单元格数差的绝对值
//如果newRow的单元格数多于书签所在行的单元格数,不能通过此方法来处理,可以通过表格中文本的替换来完成
//如果newRow的单元格数少于书签所在行的单元格数,要将少的单元格补上
int sub= Math.abs(newRow.getTableCells().size() - rowCell.size());
//将缺少的单元格补上
for(int j = 0;j < sub; j++){
newRow.addNewTableCell();
}
}
List<XWPFTableCell> cells = newRow.getTableCells();
for(int j = 0; j < cells.size(); j++){
XWPFParagraph para = cells.get(j).getParagraphs().get(0);
XWPFRun run = para.createRun();
if(content.get(i-rowNum).get(columnMap.get(j+"")) != null){
//改变单元格的值,标题栏不用改变单元格的值
run.setText(content.get(i-rowNum).get(columnMap.get(j+""))+"");
//将单元格段落的字体格式设为原来单元格的字体格式
run.getCTR().getDomNode().insertBefore(styleNode.get(j+"").cloneNode(true), run.getCTR().getDomNode().getFirstChild());
}
para.setAlignment(ParagraphAlignment.CENTER);
}
}
}
}
public void replaceText(Map<String,String> bookmarkMap, String bookMarkName) {
//首先得到标签
BookMark bookMark = bookMarks.getBookmark(bookMarkName);
//获得书签标记的表格
XWPFTable table = bookMark.getContainerTable();
//获得所有的表
//Iterator<XWPFTable> it = document.getTablesIterator();
if(table != null){
//得到该表的所有行
int rcount = table.getNumberOfRows();
for(int i = 0 ;i < rcount; i++){
XWPFTableRow row = table.getRow(i);
//获到改行的所有单元格
List<XWPFTableCell> cells = row.getTableCells();
for(XWPFTableCell c : cells){
for(Entry<String,String> e : bookmarkMap.entrySet()){
if(c.getText().equals(e.getKey())){
//删掉单元格内容
c.removeParagraph(0);
//给单元格赋值
c.setText(e.getValue());
}
}
}
}
}
}
public void saveAs() {
File newFile = new File("e:\\test\\Word模版_REPLACE.docx");
FileOutputStream fos = null;
try {
fos = new FileOutputStream(newFile);
} catch (FileNotFoundException e) {
// TODO Auto-generated catch block
e.printStackTrace();
}
try {
this.document.write(fos);
fos.flush();
fos.close();
} catch (IOException e) {
// TODO Auto-generated catch block
e.printStackTrace();
}
}
}
测试方法
/**
* @param args
*/
public static void main(String[] args) {
long startTime = System.currentTimeMillis();
MSWordTool changer = new MSWordTool();
changer.setTemplate("E:\\test\\Word.docx");
Map<String,String> content = new HashMap<String,String>();
content.put("Principles", "格式规范、标准统一、利于阅览");
content.put("Purpose", "规范会议操作、提高会议质量");
content.put("Scope", "公司会议、部门之间业务协调会议");
content.put("customerName", "**有限公司");
content.put("address", "机场路2号");
content.put("userNo", "3021170207");
content.put("tradeName", "水泥制造");
content.put("price1", "1.085");
content.put("price2", "0.906");
content.put("price3", "0.433");
content.put("numPrice", "0.675");
content.put("company_name", "**有限公司");
content.put("company_address", "机场路2号");
changer.replaceBookMark(content);
//替换表格标签
List<Map<String ,String>> content2 = new ArrayList<Map<String, String>>();
Map<String, String> table1 = new HashMap<String, String>();
table1.put("MONTH", "*月份");
table1.put("SALE_DEP", "75分");
table1.put("TECH_CENTER", "80分");
table1.put("CUSTOMER_SERVICE", "85分");
table1.put("HUMAN_RESOURCES", "90分");
table1.put("FINANCIAL", "95分");
table1.put("WORKSHOP", "80分");
table1.put("TOTAL", "85分");
for(int i = 0; i < 3; i++){
content2.add(table1);
}
changer.fillTableAtBookMark("Table" ,content2);
changer.fillTableAtBookMark("month", content2);
//表格中文本的替换
Map<String, String> table = new HashMap<String, String>();
table.put("CUSTOMER_NAME", "**有限公司");
table.put("ADDRESS", "机场路2号");
table.put("USER_NO", "3021170207");
table.put("tradeName", "水泥制造");
table.put("PRICE_1", "1.085");
table.put("PRICE_2", "0.906");
table.put("PRICE_3", "0.433");
table.put("NUM_PRICE", "0.675");
changer.replaceText(table,"Table2");
//保存替换后的WORD
changer.saveAs();
System.out.println("time=="+(System.currentTimeMillis() - startTime));
}
文案中使用的word文档也是从(http://www.jb51.net/article/101910.htm)中项目中获得的使用,测试完全可以
这里主要的区别就是,他使用的是poi3.9的,但是引用3.17的话就会报错,有些方法进行了修改。
它的修改之后,我们有一些方法不能使用。需要引入新的包。我们可以在poi的官网上进行下载3.17的包
这里附上下载的地址:https://poi.apache.org/download.html
下载解压后如下图所示:
其中ooxml-lib就是之前没有的或者说是修改后分离出来的。在项目中引用就可以了。
当然了,poi相关的也要添加进来。本人测试可行。
如有需要的童鞋,可以去克隆下来看一下https://github.com/cocoforgod/J2W
更多推荐
所有评论(0)