java - Docx to Pdf conversion using docx4j produces an artifact in a numbered list - Stack Overflow


I am trying to perform a straightforward conversion of a docx document to PDF without changing its content. I am using the 'export-FO' approach, as the 'Microsoft Graph' and 'documents4j' approaches do not meet my requirements. My document contains a numbered list that produces an artifact in the resulting PDF: the first number of the list is always overlaid with the number that would follow the last item (last + 1).

What causes this kind of behavior? What can I do to fix it?

Here is the link to the representative image of this artifact

This is the sample code I use to convert documents:

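import java.io.*;

import org.docx4j.Docx4J;
import org.docx4j.fonts.BestMatchingMapper;
import org.docx4j.fonts.Mapper;
import org.docx4j.openpackaging.packages.WordprocessingMLPackage;

public class Main {
    public static void main(String[] args) throws Exception {
        WordprocessingMLPackage wordMLPackage;
        try (InputStream templateInputStream = new FileInputStream("document.docx")) {
            wordMLPackage = WordprocessingMLPackage.load(templateInputStream);
        }

        // Substitute fonts available on this machine for those the docx uses
        Mapper fontMapper = new BestMatchingMapper();
        wordMLPackage.setFontMapper(fontMapper);

        try (OutputStream os = new FileOutputStream("document.pdf")) {
            Docx4J.toPDF(wordMLPackage, os);
        }
    }
}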

The dependencies in my sample project:

<dependency>
    <groupId>org.docx4j</groupId>
    <artifactId>docx4j-core</artifactId>
    <version>11.5.2</version>
</dependency>

<dependency>
    <groupId>org.docx4j</groupId>
    <artifactId>docx4j-export-fo</artifactId>
    <version>11.5.2</version>
</dependency>

<dependency>
    <groupId>org.docx4j</groupId>
    <artifactId>docx4j-JAXB-ReferenceImpl</artifactId>
    <version>11.5.2</version>
</dependency>

<dependency>
    <groupId>org.apache.xmlgraphics</groupId>
    <artifactId>fop</artifactId>
    <version>2.10</version>
</dependency>

And the source docx document (Google Drive link here).


asked Feb 1 at 6:18 by Anatoly Sokolov

1 Answer


This seems to be caused by the PP_COMMON_CONTAINERIZATION feature.

It groups the list items into a content control and then seems to number the content control as well, incorrectly. That extra, container-level number is presumably what shows up on top of the first list item.

You need to turn that feature off, but Docx4J.toPDF doesn't give you that option.

Instead, you can use:

        FOSettings foSettings = Docx4J.createFOSettings();
        foSettings.setOpcPackage(wordMLPackage);
        // Disable the preprocessing step that wraps the list in a content control
        foSettings.getFeatures().remove(ConversionFeatures.PP_COMMON_CONTAINERIZATION);

        Docx4J.toFO(foSettings, os, Docx4J.FLAG_EXPORT_PREFER_XSL);

Or

        FOSettings foSettings = Docx4J.createFOSettings();
        foSettings.setOpcPackage(wordMLPackage);
        Docx4J.toFO(foSettings, os, Docx4J.FLAG_EXPORT_PREFER_NONXSL); // NONXSL ignores content controls
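For reference, here is a minimal sketch that puts both options into one runnable class, assuming the docx4j 11.5.2 and FOP 2.10 dependencies from the question are on the classpath. The class name ListFixConversion, the convert method and the "nonxsl" command-line switch are hypothetical, added only for illustration; the docx4j calls are the ones used in the snippets above.

```java
import java.io.File;
import java.io.FileOutputStream;
import java.io.OutputStream;

import org.docx4j.Docx4J;
import org.docx4j.convert.out.ConversionFeatures;
import org.docx4j.convert.out.FOSettings;
import org.docx4j.fonts.BestMatchingMapper;
import org.docx4j.openpackaging.packages.WordprocessingMLPackage;

// Hypothetical helper class wrapping the two workarounds from the answer.
public class ListFixConversion {

    /**
     * Converts a .docx file to PDF via docx4j's export-FO path.
     *
     * @param preferNonXsl true: use the non-XSL exporter (ignores content controls);
     *                     false: use the XSL exporter with PP_COMMON_CONTAINERIZATION removed
     */
    public static void convert(File docx, File pdf, boolean preferNonXsl) throws Exception {
        WordprocessingMLPackage wordMLPackage = WordprocessingMLPackage.load(docx);
        wordMLPackage.setFontMapper(new BestMatchingMapper());

        FOSettings foSettings = Docx4J.createFOSettings();
        foSettings.setOpcPackage(wordMLPackage);

        int flags;
        if (preferNonXsl) {
            flags = Docx4J.FLAG_EXPORT_PREFER_NONXSL;
        } else {
            // Stop the list items from being grouped into a (numbered) content control
            foSettings.getFeatures().remove(ConversionFeatures.PP_COMMON_CONTAINERIZATION);
            flags = Docx4J.FLAG_EXPORT_PREFER_XSL;
        }

        // No MIME type is set on foSettings, so the FOP renderer's default output (PDF) is used
        try (OutputStream os = new FileOutputStream(pdf)) {
            Docx4J.toFO(foSettings, os, flags);
        }
    }

    public static void main(String[] args) throws Exception {
        boolean preferNonXsl = args.length > 0 && args[0].equals("nonxsl");
        convert(new File("document.docx"), new File("document.pdf"), preferNonXsl);
    }
}
```

Run it with no arguments for the XSL route with containerization disabled, or with "nonxsl" to try the non-XSL exporter. Keeping both behind one switch makes it easy to compare the two PDFs for the numbered list.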

This is now being tracked at https://github.com/plutext/docx4j/issues/607
