Insert Some Data into a Stream File
Date Posted: September 11, 2008 12:00 AM

Q: Can you shed some light on how I can add some data into a stream file stored in the IFS? I'd like to use the IFS APIs from my RPG program, but every attempt I've made has caused problems. I only need to insert an 8-digit number near the start of the document, but when I do that, it overwrites part of the next record! I tried to solve the problem by loading the whole thing into an array, but different XML documents seem to exceed the size of the fields in my array. How would you code something like this?

A: It's important to remember that stream files, including XML files, are not record-oriented files. Instead of thinking of them as a series of records, try thinking of them as one long string.

Why Did You Overwrite the Second Record?

Technically, that's all an IFS file is: one long string of bytes. (It's usually called a "stream" of data rather than a "string," but in this context they mean the same thing.) It's not really divided into records. Instead, you look for special characters (by convention, the carriage return, CR, and linefeed, LF) that denote the end of one record and the start of another. There's no magic to these characters, however. You can divide up your data any way you like in a stream file; it's up to the program logic to do that and handle it properly. The operating system knows nothing about it, except that it's a big string of bytes.

For a moment, forget about XML or the fact that you're working with a file. Think about the following character string:

MyName = 'Scott / Klement / Mr.';

In this case, I had only one string to store data in, but I had more than one value to store in it. I wanted to fit a first name, a last name, and a title into one variable, so my code inserted slashes between the values. When my code wants to extract the individual subfields, all it has to do is scan for the slashes and substring out the values.

However, let's say I decided to change the "first name" part of the string to be a longer value, such as Christopher. I might do something like this:

%subst(MyName:1:11) = 'Christopher';

That wouldn't work, would it? I'd end up with this:

'Christopherment / Mr.'

The system knows only that MyName is a string; it doesn't know that I divided it into subfields with slashes. Since it doesn't know the format of the data, it does exactly what I told it to: It overwrites the first 11 bytes of the string with the new value, and that overlays my first slash and part of the second field.
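To change the first name safely, the string has to be rebuilt rather than overlaid: find the first slash, then join the new name to everything from that slash onward. Here is a minimal sketch of that idea, assuming MyName is declared as a VARYING field. The declarations and the slash variable are illustrative, not taken from a real program.

```rpgle
     D MyName          s             50a   varying
     D slash           s             10i 0
      /free
        MyName = 'Scott / Klement / Mr.';

        // Find the delimiter that ends the first subfield.
        slash = %scan('/' : MyName);

        if slash > 0;
           // Keep everything from the slash onward and put the
           // new, longer first name in front of it.
           MyName = 'Christopher ' + %subst(MyName : slash);
        endif;

        // MyName now contains 'Christopher / Klement / Mr.'
        *inlr = *on;
      /end-free
```

The same principle applies to the stream file: to make the data longer, everything after the insertion point has to be written again after the new data.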

In the XML stream file, let's say the first line of data (terminated by CR/LF) is 50 characters long. Your application may have code to read until the CR/LF and to think of that as a "record," but the operating system doesn't know that. As far as the OS is concerned, your whole file is one big string, and the CR and LF characters are just ordinary bytes within that big string. If you insert your number into the string and make it 58 bytes long, you'll do the same thing that my failed %subst BIF did--you'll overwrite the first 58 bytes of the file. That includes the original 50-byte "record," followed by the CR and LF characters, followed by another 6 bytes, which you may perceive as being part of the next record--but the OS just perceives them as being bytes in a file.

Why Your Array Idea Didn't Work

The idea of loading the whole file into an array and then making your changes in the array and rewriting the whole file from the array is a good one because it lets you insert data into the file in memory, and then you can rewrite the file without messing up subsequent data.

The problem with this approach, however, is that XML files aren't really divided into records. There's no requirement that an XML file have any CR or LF characters, or any other delimiters other than the XML tags themselves. When you code an array of alphanumeric strings in RPG, you have to tell it how long each string is, and there's really no way to make your XML file fit that mold.

Instead of using an array, I'd use a single block of memory. Just load the whole file into one block of memory (dynamically allocated with the %ALLOC BIF), and then you can treat the whole file as one big string. That way, you don't have to worry about the CR or LF characters or whether they exist.

Using one big block of memory will actually require the use of less memory than an array would because you won't have wasted space in the form of trailing blanks at the end of each "record" that you'd have loaded into an array.

XML and Character Sets

Although you didn't mention it, another common mistake when working with XML is attempting to translate it to EBCDIC. XML data isn't typically stored in EBCDIC. Instead, it's encoded in some form of ASCII or Unicode. Translating that to EBCDIC can be problematic because there may be characters in the file that don't exist in EBCDIC.

Fortunately, RPG has excellent support for the UCS-2 dialect of Unicode. You can use UCS-2 in RPG nearly as easily as you would use EBCDIC.

The IFS APIs have the capability to translate your data from whichever CCSID the file is marked with to any other CCSID. That means that (provided that your XML file is marked correctly) the IFS APIs can easily convert it to UCS-2 for you to deal with in RPG, and then convert it back when you write your data back to the file.

Example

I've provided a lot of explanation so far, without any code to illustrate it. Now it's time to show you how to code all this. For the sake of example, let's say I have an XML file that has a fileheader tag. I want to insert an idnum attribute on this fileheader tag to assign an 8-digit ID number.

The original XML file looks like this:

<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<fileheader name="whutzit.xml" />
...more data here...

The fileheader tag will always be near the start of the XML document, but it might not be in the same byte position each time. For the sake of this example, we assume that it will always be within the first 16 KB of the file.

When you insert the idnum attribute, you want the result to look like this:

<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<fileheader idnum="12345678" name="whutzit.xml" />
...more data here...

I'd start by opening the file so that it will read the data and translate it into UCS-2. The open() API has a flag called O_TEXTDATA that tells the system to translate the data for you as it's read or written. It also has a flag called O_CCSID that lets you tell it the CCSID of your program's data. We'll use 13488, which is the CCSID for UCS-2.

     D CCSID_UCS2      c                   13488
     D fd              s             10i 0 inz(-1)
     D filename        s           5000a   varying
       .
       .
        filename = '/tmp/test.xml';

        fd = open( %trimr(filename)
                 : O_RDWR + O_LARGEFILE
                    + O_TEXTDATA + O_CCSID
                    + O_SHARE_NONE
                 : 0
                 : CCSID_UCS2 );
        if (fd = -1);
           // open() failed!  Check errno to find out why...
           return;
        endif;

The O_SHARE_NONE flag used in the preceding code will guarantee that this program has exclusive use of the file. Since I'm planning to read the whole file into memory and then rewrite the whole file, it's a good idea to make sure nobody else is trying to use the data!
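The open() example leaves the errno check as a comment. One common way to retrieve the error from RPG is to call the C runtime's __errno() and strerror() functions. The following is only a sketch under stated assumptions, not part of the original program: the errorText helper and its 80-byte result length are hypothetical, and the program is assumed to be compiled so that it binds to the QC2LE binding directory.

```rpgle
     H dftactgrp(*no) bnddir('QC2LE')

     D sys_errno       pr              *   extproc('__errno')
     D strerror        pr              *   extproc('strerror')
     D   errnum                      10i 0 value
     D errorText       pr            80a   varying

      * Hypothetical helper: returns the text for the current errno.
     P errorText       b
     D errorText       pi            80a   varying
     D p_errno         s               *
     D errno           s             10i 0 based(p_errno)
      /free
        // __errno() returns a pointer to the job's error number.
        p_errno = sys_errno();

        // strerror() returns a pointer to a null-terminated
        // message; %str converts it to an RPG string.
        return %str(strerror(errno));
      /end-free
     P                 e
```

With a helper like this in place, the failure branch after open() could capture the result of errorText() and log it before returning.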

To reserve memory for the whole file, I'm going to call the fstat() API to get the file size in bytes. Then I'll call the %ALLOC BIF to reserve a block of memory. Since the file might be in a single-byte character set, and I need to translate it to a double-byte character set (since that's what UCS-2 is), I'll want to allocate twice as much memory.

     
     D UCS_CHAR        s              1C
     D info            ds                  likeds(statds)
     D maxsize         s             10i 0
     D memblock        s          16383c   based(p_memblock)
        .
        .
        if fstat(fd: info) = -1;
           // handle error.
           return;
        endif;

        maxsize = info.st_size * %size(UCS_CHAR);
        p_memblock = %alloc(maxsize);

The fstat() API takes a file descriptor (the one returned by the open() API) and uses it to get information about the file. It returns this information in a data structure, which I've named info in this case. That structure contains a field named st_size that holds the size of the file in bytes.

Because a 1C (UCS-2) field is 2 bytes long, multiplying the file's byte length (from info.st_size) by %size(UCS_CHAR) doubles the size of the buffer. I'm using %size(UCS_CHAR) instead of a literal 2 only because it makes the code a little more self-describing.

Now that I've done that, I can read the file into my allocated memory.

        lseek(fd: 0: SEEK_SET);
        len = read(fd: p_memblock: maxsize);

The lseek() API is sort of like a SETLL in traditional RPG. It positions the file's cursor to a byte position within the file. In this case, I'm telling it to position to byte zero, which is the start of the file.

The read() API will read data from the file into my p_memblock block of memory and return the number of bytes it read.

Because the memblock variable is based on the p_memblock pointer, it can view the memory stored at that pointer location. Since the memblock field is 16,383 long, it can view the first 16,383 characters that I loaded from the file. I can certainly load XML documents that are much larger than 16 KB, because my %ALLOC BIF is what reserved the actual memory--but I'll be able to scan for <fileheader> only within the first 16 KB.

     D HDRTAG          c                   %ucs2('<fileheader')
     D scansize        s             10i 0
     D pos             s             10i 0
     D len             s             10i 0
        .
        .
        scansize = %div(len : %size(UCS_CHAR));
        if scansize > %len(memblock);
            scansize = %len(memblock);
        endif;

        pos = %scan( HDRTAG : memblock );

        if (pos<1 or pos>scansize);
           // handle error -- tag not found.
           return;
        endif;

This code uses the %SCAN BIF to search for the <fileheader string. The position found will be placed in the pos variable.

Now that I know where the file header is, I can simply insert the idnum attribute right after it.

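      * VARPREF: size of the 2-byte length prefix that precedes
      * the data in a VARYING field.
     D VARPREF         c                   2
     D idnum           s             20c   varying
        .
        .
        idnum = %ucs2(' idnum="12345678"');

        // Convert the character position of the end of the tag
        // into a byte offset within the memory block.
        pos = (pos + %len(HDRTAG) - 1) * %size(UCS_CHAR);
        callp lseek(fd: 0: SEEK_SET);
        callp ftruncate(fd: 0);
        callp write(fd: p_memblock: pos);
        callp write( fd
                   : %addr(idnum) + VARPREF
                   : %len(idnum) * %size(UCS_CHAR) );
        callp write(fd: p_memblock+pos: len-pos);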

I can use the ftruncate() API to clear any data currently in the file, without having to close the file and reopen it. That works nicely in this case because I can clear the file and still maintain exclusive use of the file.

The first call to the write() API writes everything up to and including the <fileheader string to disk. The second call to write() writes the idnum attribute right after the <fileheader string. The %addr(idnum) + VARPREF expression skips the 2-byte length prefix that RPG stores at the start of a VARYING field, so only the character data is written. The final call to the write() API writes everything that came after the <fileheader string. That way, the new copy of the file will contain everything, including the new idnum="12345678" attribute.
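Once the three writes have completed, the program no longer needs the memory block or the file descriptor. Here is a minimal sketch of the cleanup, assuming close() is prototyped alongside the other IFS APIs. The prototype shown is the conventional one and is not taken from the article's code; omit it if your copy member already supplies it.

```rpgle
     D close           pr            10i 0 extproc('close')
     D   fildes                      10i 0 value
        .
        .
        // Release the memory reserved earlier by %alloc.
        dealloc p_memblock;

        // Closing the descriptor also ends the exclusive
        // (O_SHARE_NONE) hold on the file.
        callp close(fd);
```

The same two steps would also belong in any error branch that returns after the %ALLOC call, so the memory and the exclusive open aren't left behind.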

Code Download

You can download the complete code example that I've discussed in this article from the following link:
http://www.pentontech.com/IBMContent/Documents/article/57165_640_XmlInsert.zip

