How to Create a Word Document in RPG
Date Posted: March 10, 2011 12:00 AM

In a recent thread in the System iNetwork Forums, someone asked how to produce Microsoft Word documents from IBM i. He wanted to create a document in Word but then insert data from a DB2 for i database into the document with an RPG program.

I advised that he should investigate the .docx format that became Word's standard format starting with Office 2007. I knew that it was based on XML, and so you should be able to create it from any programming language, but I didn't elaborate, because I hadn't done it myself. Now I've decided it's time to experiment with it. In this article, I tell you what I discovered, and I demonstrate how to insert data into a Word document from RPG.

Microsoft Word and the .docx Format

With the release of Microsoft Office 2007, Microsoft changed its file formats dramatically. Word, Excel, and PowerPoint all have new native formats for storing their data. They still support the old formats, of course, but their standard formats are now the XML-based ones.

Microsoft Word is no exception. The older format uses the extension .doc to denote that it's a Word document, and the new format uses the extension .docx, which denotes that it's an XML variant of the Word document. Because I know that XML is plain text, and I know I can read and write XML from RPG, I should be able to work with the XML format, right?

Turns out, it's a little more complex than that. A .docx file isn't a single XML file; it's actually a .zip file that contains a whole directory structure. Within that structure are many XML files.

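To experiment, I opened Microsoft Word and created the following Word document:

[Screenshot: the ACME.docx template as it appears in Word, with ==== placeholders where the data will go]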

Notice that I've put placeholders where I wanted to insert data from my RPG program. For the date, I put ====DATE====. I figure that my RPG program can search the XML document for that string and replace it with the actual date. Likewise, I have placeholders like ====RECIP==== and ====TITLE==== for their corresponding fields from my RPG program. I chose the = character because it has no special meaning in XML, it works across all character sets, and it's unlikely that four consecutive = characters would appear in a normal business letter.

I saved this document to my PC as ACME.docx. I made certain to use the Office 2007 .docx format, since I know that's a zipped XML document. I used FTP in binary mode to upload this document to an IFS directory on my system that runs IBM i.

Next, I used the InfoZip Utility in PASE to unzip the Word document. (Actually, my first attempt used 7-Zip, which worked well for unzipping, but when I zipped up my result, it didn't work. Apparently, 7-Zip creates .zip files that Microsoft Word doesn't understand. InfoZip doesn't have as many features as 7-Zip but seems to be compatible with Word.) From QShell, I typed the following command:

unzip ACME.docx

And the command's output looked like this:

Archive:  ACME.docx
  inflating: [Content_Types].xml
   creating: _rels/
  inflating: _rels/.rels
   creating: docProps/
  inflating: docProps/core.xml
  inflating: docProps/app.xml
  inflating: docProps/custom.xml
   creating: word/
   creating: word/_rels/
  inflating: word/_rels/document.xml.rels
  inflating: word/_rels/settings.xml.rels
  inflating: word/document.xml
  inflating: word/footnotes.xml
  inflating: word/endnotes.xml
  inflating: word/header1.xml
   creating: word/theme/
  inflating: word/theme/theme1.xml
  inflating: word/settings.xml
  inflating: word/webSettings.xml
  inflating: word/styles.xml
  inflating: word/numbering.xml
  inflating: word/fontTable.xml

At this point, you may be wondering how that worked. After all, it was a .docx file, not a .zip file, so how was I able to unzip it? In truth, it was a .zip file all along. That's what today's Word documents are: .zip files that contain a particular directory structure. The files inside that structure are XML documents that contain the layout of the Word document.

I was amazed at how much data is stored inside a Word document. The files whose names contain "rels" are relationship documents that describe how the files relate to one another. Most of the others, including styles.xml, fontTable.xml, settings.xml, and theme1.xml, are XML documents that describe how the document looks. What are the fonts? How is everything laid out? For now, I'm content to let Word figure all that out.

The only file I'm interested in is the document.xml file in the word subdirectory. It contains the actual document, including my ==== placeholders. If I load it up into my RPG program, I should be able to find those placeholders, insert my own text, save it back to disk, and re-zip it.

The Document.xml File

The document.xml file is, of course, an XML file. You can open it and look at its contents, and you'll see that it contains the text you typed into Word. I opened mine with the Firefox web browser, since Firefox formats XML nicely on the screen, making it easy to read. Here's an excerpt from the document.xml file:

<w:document>
  <w:body>
    .
    .
    <w:p w:rsidR="00BD0BBB" w:rsidRPr="00BD0BBB" w:rsidRDefault="001E5AC9" w:rsidP="00BD0BBB">
      <w:pPr>
        <w:pStyle w:val="Date"/>
      </w:pPr>
      <w:r>
        <w:t>====DATE====</w:t>
      </w:r>
    </w:p>
    .
    .

As you can see, it's an XML file, but what are all the elements in it? What does <w:pPr> do, for example? What is w:rsidR="00BD0BBB"? I certainly don't know. Fortunately, I don't have to worry too much about them. I just need to replace ====DATE==== with data from my RPG program, and then I can save the rest of it back to disk unchanged.

So I did that. I wrote an RPG program that follows these steps:

  1. Calls InfoZip to unzip the .docx file into a temporary directory.
  2. Reads the document.xml file into a character string in my RPG program.
  3. Uses the %SCAN and %REPLACE BIFs to replace my placeholders with data from my program.
  4. Saves the document.xml file back to the IFS.
  5. Calls InfoZip to .zip the XML files again, creating a new .docx file.

One Tricky Problem

It partially worked. All the fields were replaced except my ====STATE==== and ====POSTAL==== fields. For some reason, they didn't get replaced! It took a while, but I eventually found the problem. In my document.xml, I was expecting to find this:

<w:t>====CITY====, ====STATE====  ====POSTAL====</w:t>

However, I didn't find that. Instead, I found this:

<w:t>====CITY====, ====STATE===</w:t></w:r><w:proofErr 
w:type="gramStart"/><w:r><w:t>=  =</w:t></w:r>
<w:proofErr w:type="gramEnd"/><w:r><w:t>===POSTAL====</w:t>

It appears that Word decided that my placeholders were bad grammar, so it inserted "proofErr" tags to show me where my grammar error started and ended. Because it happened to be in the middle of my ====STATE==== and ====POSTAL==== placeholders, my RPG program couldn't find the strings and therefore failed to replace them properly.

Once I finally realized this, I went into Word and disabled its spelling and grammar checking and tried again. This time, it worked!
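
A program can also guard against this kind of silent miss. The following is a minimal sketch and not part of the original program: hasLeftovers is a hypothetical subprocedure that would be called after all replacements are done, and it reports whether any ==== marker is still present in the document text. It assumes the document fits in a 65535-byte varying string and that the replacement data itself never contains four consecutive = characters.

```
     D hasLeftovers    PR              n
     D   doc                      65535a   varying const

     P hasLeftovers    B
     D                 PI              n
     D   doc                      65535a   varying const
      /free
         // A placeholder that Word split across runs is never
         // replaced, so a run of ==== usually survives in the text.
         return (%scan('====': doc) > 0);
      /end-free
     P                 E
```

A caller could test the result with something like `if hasLeftovers(vbuf);` after the last replacement and report the problem, rather than producing a letter with stray markers in it. A split that leaves fewer than four = characters together would slip past this check, so it is a safety net rather than a guarantee.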

The RPG Code

How does the RPG code work? It works by calling the QCMDEXC API to invoke QShell, and it uses QShell to unzip the .docx file.

         // ------------------------------------------------------
         // Extract the DOCX Template to a temporary
         // directory and mark the document.xml file w/CCSID 1208
         // ------------------------------------------------------

         cmd = 'QSH CMD(''export PATH=$PATH:/usr/local/bin +
                       && mkdir "' + TMPDIR + '" +
                       && cd "' + TMPDIR + '" +
                       && unzip "' + Template + '" +
                       && setccsid 1208 word/document.xml'')';
         QCMDEXC(cmd: %len(cmd));

I found I had to set the CCSID of document.xml to 1208 (UTF-8) in order for the IFS APIs to perform proper translation of the data when my program reads it in. In the preceding code, I used QShell's setccsid utility to do this. The CHGATR CL command is another good way to change the CCSID, but since I was already in QShell, I opted for QShell's command.
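
For comparison, the CHGATR route might look like the following when run through the same QCMDEXC call. This is only a sketch: it assumes cmd and TMPDIR are the same fields used in the code above, and that the template has already been unzipped into TMPDIR.

```
         // Mark document.xml as UTF-8 (CCSID 1208) with the CL
         // command instead of QShell's setccsid utility
         cmd = 'CHGATR OBJ(''' + TMPDIR + '/word/document.xml'') '
             + 'ATR(*CCSID) VALUE(1208)';
         QCMDEXC(cmd: %len(cmd));
```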

Now that my .docx file has been unzipped, I use the IFS APIs to load it into a variable in my RPG program.

     D buf             s          65535a
     D vbuf            s          65535a   varying
         .
         .
         // ------------------------------------------------------
         //   Load the document.xml file from the template
         //   into an RPG variable.
         // ------------------------------------------------------
         IfsPath = TMPDIR + '/word/document.xml';
         fd = open( IfsPath
                  : O_RDONLY + O_CCSID + O_TEXTDATA
                  : 0
                  : 0 );
         if (fd = -1);
            // handle error here
         endif;

         len = read(fd: %addr(buf): %size(buf));
         callp close(fd);

         if (len < 1);
            // handle read error or empty file here
         endif;

         vbuf = %subst(buf:1:len);

Since I decided I wanted my code to remain V5R4 compatible, I used an alphanumeric field that's only 65535 bytes long. It was more than large enough for my simple Word document. However, it's easy to imagine a situation where I might want to handle larger documents. In IBM i 6.1, you can change the size of buf and vbuf to much larger sizes, up to 16MB. I leave that change as an exercise for the reader.
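
As a starting point for that exercise, here is a sketch of what the changed declarations might look like on IBM i 6.1 or later. The 1000000-byte size is an arbitrary value chosen for illustration. Two details matter once the field grows past 65535 bytes: a varying field of that size gets a 4-byte length prefix instead of a 2-byte one, and %ADDR with *DATA (available from 6.1) returns the address of the data portion, so the %addr(vbuf)+VARPREF arithmetic in the write call is no longer needed.

```
     D buf             s        1000000a
     D vbuf            s        1000000a   varying
         .
         .
         // When writing, let the compiler locate the data portion
         // of the varying field rather than adding a prefix size
         callp write(fd: %addr(vbuf: *data): %len(vbuf));
```

Any subprocedure that receives the string by reference, such as the scan-and-replace routine, would need its parameter declared with the same length.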

Because I'm working with V5R4 code, I can't use RPG's new Scan and Replace (%SCANRPL) BIF, either, so I wrote myself a subprocedure to perform scanning and replacing.

     P scanrpl         B
     D                 PI
     D   vbuf                     65535a   varying
     D   oldval                     100a   varying const
     D   newval                     100a   varying const
     D pos             s             10i 0
      /free
         pos = %scan( oldval: vbuf );
         dow pos > 0;
            vbuf = %replace( newval: vbuf: pos: %len(oldval));
            pos = %scan( oldval: vbuf: pos+%len(newval) );
         enddo;
      /end-free
     P                 E

With this procedure, I can easily scan for my placeholders and replace them with data from my RPG program.

     D WordRepl_fields_t...
     D                 ds                  qualified
     D   date                        10a   varying
     D   recip                       30a   varying
     D   recipnm                     30a   varying
     D   title                       30a   varying
     D   company                     30a   varying
     D   address                     30a   varying
     D   city                        20a   varying
     D   state                        2a
     D   postal                      10a   varying

     D my              ds                  likeds(WordRepl_fields_t)
         .
         .
         scanrpl( vbuf : '====DATE===='    : my.date    );
         scanrpl( vbuf : '====RECIP===='   : my.recip   );
         scanrpl( vbuf : '====RECIPNM====' : my.recipnm );
         scanrpl( vbuf : '====TITLE===='   : my.title   );
         scanrpl( vbuf : '====COMPANY====' : my.company );
         scanrpl( vbuf : '====ADDRESS====' : my.address );
         scanrpl( vbuf : '====CITY===='    : my.city    );
         scanrpl( vbuf : '====STATE===='   : my.state   );
         scanrpl( vbuf : '====POSTAL===='  : my.postal  );
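
On IBM i 7.1 and later, where the %SCANRPL BIF is available, the same replacements could be written without the scanrpl subprocedure. The lines below are a sketch under that assumption, using the same vbuf and my fields. They are not taken from the V5R4-compatible program.

```
         // %SCANRPL( scan string : replacement : source string )
         // returns the source with every occurrence replaced
         vbuf = %scanrpl( '====DATE===='  : my.date  : vbuf );
         vbuf = %scanrpl( '====RECIP====' : my.recip : vbuf );
         vbuf = %scanrpl( '====STATE====' : my.state : vbuf );
```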

Before the scanrpl calls run, I set values for recipient, title, company, and so on in the data structure named my. I use simple variable assignments to hard-code these values in my RPG program. In a real-world program, however, you'd probably want to get this information either from the user or from a database file.

The result is that the placeholders are replaced with data from variables in my program. Now my vbuf variable contains the final XML document, with the data already filled in. I need to write it out to the IFS using the IFS APIs:

         fd = open( IfsPath
                  : O_TRUNC + O_WRONLY + O_TEXTDATA + O_CCSID
                  : 0
                  : 0 );
         if (fd = -1);
            // handle error here
         endif;

         // VARPREF skips the length prefix that precedes the
         // data in a varying-length field
         callp write(fd: %addr(vbuf)+VARPREF: %len(vbuf));
         callp close(fd);

And the final step is to create a new .docx file by zipping up my temporary directory. I used QShell and InfoZip to do this.

         cmd = 'QSH CMD(''export PATH=$PATH:/usr/local/bin +
                       && cd "' + TMPDIR + '" +
                       && zip -r "' + NewDoc + '" *'')';
         QCMDEXC(cmd: %len(cmd));

The Result

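When I open my new .docx file with Microsoft Word, it looks like this:

[Screenshot: the finished document in Word, with the placeholders replaced by data from the RPG program]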

Code Download

Download the RPG code described in this article.

Download a copy of InfoZip that has been compiled for PASE.


  • taford@us.ibm.com
    May 04, 2011

    I've done something very similar using HTML, which doesn't have to be zipped or unzipped. I do the merge using a basic HTML template with tags marking where the data goes. When I open it in Word, behold, a nicely mail-merged doc appears.

  • jacobusbezemer
    Mar 23, 2011

    Yes, jar will work just fine. To run the command from a normal command line or from an RPG or CL program (that is, without going into QShell interactively), set the following environment variable first so that you get an error message if the command fails. The snippet below sets the variable and then zips with jar:


    ADDENVVAR ENVVAR(QIBM_QSH_CMD_ESCAPE_MSG) VALUE(Y) +
    REPLACE(*YES)

    cd &fromdir

    CHGVAR VAR(&CMD) VALUE('jar -cvf ' || &TOFILE |< +
    ' ' || &FROMFILE)
    STRQSH CMD(&CMD)

  • tips@scottklement.com
    Mar 12, 2011

    @dale_berta: I didn't try the jar utility, because I don't like using jar for .ZIP files. Back in the mid to late 90s when I learned Java, I did use jar for .ZIP files, and I showed many people how to do it. But, I got frustrated, because it was so slow, and didn't support so many of the features of .ZIP. Then, I found InfoZip, and later, 7-zip, both of which are far faster and more versatile.



    Having said that, jar might work for this. If so, it should be trivial for you to modify the RPG code to call jar instead of zip and unzip if you'd rather not install InfoZip on your system.

  • dale_berta@compaid.com
    Mar 11, 2011

    Did you try QSH 'jar' utility to create the zip file? (I'm still using Office 2003, or I'd try it myself.) If the zip that jar creates is acceptable to Word, you wouldn't need any extra products to deal with zip and unzip.
