There's no debating that XML has become the de facto industry standard for data exchange between organizations as well as data exchanges between software applications. Given this fact, application developers need solutions that let them store and process the XML values from these data exchanges.
Following in the footsteps of the DB2 family of products, the DB2 for i 7.1 release includes new features that make it easier for developers to store and process XML as well as generate XML values for new data exchange requirements. Three key components are at the core of the enhanced XML support in IBM i 7.1:
- A new XML data type that simplifies the storage and retrieval of XML
- Annotated decomposition of XML documents to allow applications to extract and store just the business data encapsulated within an XML document
- SQL XML publishing functions to easily generate XML values from existing DB2 for i database objects
Some of these capabilities will sound familiar to developers experienced with the DB2 XML Extender product. Although the XML Extender product is supported and available on IBM i 7.1, IBM intends to eventually replace that product with these new integrated XML capabilities. So without further ado, let's investigate these new capabilities so you will be ready to use them.
Storing, Retrieving, and Validating XML Documents
The XML data type is the building block for all the XML support in DB2 for i 7.1. The new type can obviously be used when defining a column to store an XML document, as Figure 1 shows. In addition, the XML data type can also be used for parameters on SQL routines, for host variables in HLL programs, and as the data type for the new SQL support for arrays and global variables. From embedded SQL to JDBC, IBM made enhancements to all the SQL programming interfaces to support the new XML data type. In contrast, no enhancements were made to the native record-level I/O interfaces, so the use of SQL is required when storing and retrieving values from an XML column.
One thing that might appear a little strange in the XML column example in Figure 1 is that no length value is specified on the resDoc column definition. Similar to variable-length columns, DB2 for i just allocates storage dynamically based on the size of the XML value being stored. The DB2 XML type maximum size limit is 2GB. One advantage of storing XML documents within an XML column is that all your XML documents will now get backed up and restored using your usual database backup and recovery process. When no Coded Character Set Identifier (CCSID) value is specified on the column definition, as in this example, DB2 uses a value of 1208 to allow the XML data to be stored with the UTF-8 encoding. UTF-8 is the preferred encoding scheme from a performance perspective when DB2 performs parsing and validation operations on an XML value.
DB2 implicitly parses the input XML value on all Inserts and Updates that don't specify the XMLPARSE function explicitly. This XML parsing process is necessary to guarantee that only well-formed XML documents are stored within a column. The Insert statements in Figure 2 show examples of both implicit and explicit XML parsing.
Validation of XML documents is never performed automatically by DB2. Instead, the developer decides whether XML validation is necessary. This developer control is demonstrated in Figure 2, which shows the first insertion of an XML document done without any validation, and shows the second Insert statement with the XMLVALIDATE function included to specify that validation be performed on the XML value being inserted. The validation is performed based on the XML Schema Definition (XSD) specified by the programmer. The ACCORDING TO clause on the XMLVALIDATE function controls which XML schema definition is used by DB2 to validate the XML value being inserted.
In addition to the XMLVALIDATE and XMLPARSE functions, these Insert examples use another new function, GET_XML_FILE, which is one of several integrated utilities provided to make XML integration easier. Figure 3 contains a complete list of these utilities. The GET_XML_FILE function makes it extremely simple for developers to access and use XML stored in a stream file or a source physical file member by allowing them to just specify the file name and let DB2 handle all the data retrieval.
Before an XML schema can be used for validation, the XML schema must be registered and stored within the DB2 XML Schema Repository (XSR). This prerequisite step is accomplished by again using the integrated XML utilities. Figure 4 shows how the registration is done by utilizing two system-stored procedures: XSR_REGISTER and XSR_COMPLETE.
The first two parameters on the XSR_REGISTER procedure are for the library and object name where DB2 will store the XML schema. In this case, the XML schema will be stored in an object named RESSCHEMA in the MYLIB library. The RESSCHEMA object will have an object type of *SQLXSR. These two parameters are also important because they're the same identifiers that the developer will need to use on the XMLVALIDATE function to direct DB2's usage of the correct XML schema document. The fourth parameter is where the actual XML schema document is passed to DB2. Once again, the handy GET_XML_FILE function is used to retrieve the XML schema from the specified file. The third and fifth parameters are set to the null value since they're optional parameters. I don't believe the optional parameters provide any real value. Consult the DB2 for i SQL Reference (available in the DB2 for i manuals section of the IBM i5/OS Information Center) to make your own decision.
A call to the XSR_COMPLETE procedure is necessary to complete the registration of the XSD, because you can combine multiple XML schemas into a single XSD by using the XSR_ADDSCHEMADOC procedure. The first two parameters should be obvious. Null again is passed for the optional third parameter. The fourth and final parameter is used to tell DB2 whether the registered XML schema definition will be used for validation or for decomposition of an XML document. The value of 0 is passed since validation is the purpose of this XML schema document. A value of 1 is passed when the schema document is to be used for decomposition.
Whether you're storing validated or non-validated XML data, applications and reports need to be able to quickly locate XML documents containing specific business values. The IBM OmniFind Text Search Server provides this high-speed text-search capability. For information about the XML search features of the IBM OmniFind server, read my article titled "Rev Up XML Searches with IBM OmniFind" (May 2010, article ID 64984 at SystemiNetwork.com).
The retrieval of values from an XML column is straightforward. A stored XML value can be fetched into a character variable in your application or copied into an external file. The XML data residing in an external file makes it easy to FTP the stored XML to another server or to display the XML on a website. When retrieving data from an XML column, the XSLTRANSFORM function can be used to convert the stored XML into HTML format. This function makes it easy to make your XML available on the web.
XML file reference variables are the easiest way to get the XML data copied into an external file. File reference variables were first introduced to support the export of character large object (CLOB) and binary large object (BLOB) values into external files, and the XML support just builds on that foundation. The example in Figure 5 demonstrates how little effort is required to copy a stored XML value into an external file. The declaration of an XML file reference variable results in the compiler generating a structure with the four fields that compose a file reference variable. The first step is setting the name and name length fields of the MY_FILE file reference variable to the output file path ('/outdir/out1.xml') and the character length of that file path value(16). Next, the file operation field (MY_FILE_FO) is assigned a constant value (SQFCRT) that tells DB2 to create an output file with the specified name and populate that file with the contents retrieved into the file reference variable. The final step is then using an SQL SELECT or FETCH statement to copy the XML column (resDoc) into the XML file reference variable.
Annotated Decomposition of XML Documents
Annotated decomposition of an XML document is an alternative to having to store the entire XML document. Annotated decomposition is just a fancy term used to describe the process of extracting the raw business values from an XML document and storing them in DB2 columns. Some developers may choose this approach because they don't want to waste storage space storing all those XML tags. This decision may also depend on whether audit requirements dictate the need to produce the original XML document. These are just some of the factors to consider when you're trying to decide on storing an XML document versus decomposing the document.
Figure 6 shows the decomposition process at a high level. The programmer reviews the XML document and maps the XML elements and attributes to columns within the DB2 databases. The developer formally defines this mapping in an annotated XML schema document. Once the mapping definitions are completed, the XML schema document needs to be stored and registered within the DB2 XML Schema Repository by using procedure calls almost identical to those contained in Figure 4. With the mapping defined, the application simply needs to invoke the XDBDECOMPXML stored procedure to decompose an XML document into the specified DB2 columns.
Figure 7 contains a sample XML document (ship1.xml) that will be decomposed using the annotated XML schema document in Figure 8. The mapping contained in the annotated XML schema will result in a single row being inserted into a table named Authors and two rows inserted into the Books table. Figure 9 shows the Insert statements generated by DB2 to perform the decomposition.
Although the annotated mapping syntax can be overwhelming, focus on the db2-xdb:rowset and db2-xdb:column tags. These two tags identify the target DB2 table and column that will receive the specified value from the XML document. For instance, the value of the name element will be copied to a column with the same identifier (Name) in the Authors table. The defaultSQLSchema element specifies the schema or library where the Authors table resides—in this case, MYLIB. If the defaultSQLSchema element isn't specified, the resolution of the unqualified table name follows the rules of the specified naming convention: SQL or System (*SYS) naming.
Notice that the ID attribute for the author element actually has two different columns associated with its annotated mapping. This mapping enables the author ID attribute value to be copied to both of the specified columns. This capability is useful when decomposing XML into tables with a parent-child relationship since there needs to be a shared key value between the two tables. The annotation support also gives the programmer the flexibility to ignore values in the XML document. In this example, the annotated schema contains no reference to the publisher element.
With the annotated XML schema document stored and registered within the DB2 XSR, the actual decomposition is performed with the XDBDECOMPXML stored procedure. Figure 9 contains a sample invocation of this procedure. The first two parameters identify the library and name of the annotated XML schema to use for the decomposition. The third parameter identifies the XML value to be decomposed, and the final parameter is another optional parameter in which I recommend just passing the Null value. Hopefully, this basic example gives you a good understanding of what can be accomplished with the annotated decomposition support. More details on the annotated syntax are found in the SQL documentation available in the DB2 for i manuals section of the IBM i5/OS Information Center.
SQL XML Publishing Functions
The SQL XML publishing functions (Figure 10) are the final piece of the DB2 for i 7.1 XML support. These publishing functions make it easy for applications and queries to generate XML values. The publishing functions can reference any column value or expression. For example, they can reference a column containing data entered from green-screen applications or a column storing data from decomposed XML documents. Reviewing a couple of examples is the best way to understand how these SQL XML publishing functions can be leveraged by your application.
The Select statement in Figure 11 leverages the XMLELEMENT and XMLATTRIBUTES functions to generate two rows of XML output (Figure 12 shows the output) that represent employees who have reached their 25th year of employment with the company. The name parameter on the XMLELEMENT function not surprisingly supplies the XML element name—the first element name in this example is <employee>. The next arguments of the XMLELEMENT function can be used to optionally supply a namespace or attribute value for the generated XML element using the XMLNAMESPACES and XMLATTRIBUTES functions, respectively. The XMLATTRIBUTES function supplies the empno column as the ID attribute value for the <employee> element. The final argument for the XMLELEMENT function is the element value. Given the hierarchical nature of XML, this may be an actual element value or a nested XML element, such as the <Name> element in this example.
Instead of just referencing a column value, the XMLELEMENT function for the <Name> element consumes the result of an SQL expression. The expression concatenates the employee's first and last name together with a blank in between before providing the concatenated value to the Name element. The remaining two XMLELEMENT invocations for the <Extension> and <DeptNo> elements are self-explanatory. The XMLSERIALIZE function wrappers the whole XMLELEMENT invocation to convert the generated XML value into a character string value that's easier for applications to consume and is displayable on text-based interfaces such as a 5250 emulator. Notice that the XML publishing functions are part of a Select statement that also includes join and selection conditions. This flexibility allows developers to add XML generation to the most complicated queries.
Be aware that the XML publishing output examples have been manually formatted for readability. The SQL XML publishing functions don't add any white space or new line characters for readability.
When the generated XML value contains a sequence or series of XML elements, the XMLFOREST function can be used as shown in Figure 13 to simplify the syntax of the XML publishing request. The XMLELEMENT function calls for the <Name>, <Extension>, and <DeptNo> elements from the previous example are all combined into a single XMLFOREST call that produces the exact same sequence of XML elements. This XMLFOREST example produces the same result (Figure 12) as the previous SQL XML publishing example.
A closer look at the output from the two publishing functions in Figure 12 reveals that each generated XML value contains only the values from a single row. When requirements state that the XML value needs to encapsulate data from multiple rows, developers can employ the XMLGROUP and XMLAGG aggregate functions. These aggregate functions perform processing similar to the GROUP BY clause. In fact, these XML aggregations can be used in conjunction with the GROUP BY clause, as Figure 14 shows.
This example uses the XMLAGG function to group the parts within the specified part types (C01 and E21) under the <Parts> root element with its part type as an attribute. The outer XMLELEMENT function is used to wrap the aggregated row values with values from the grouping column(s)—in this case, the parttype column. Nested under this root <Parts> element are all the <pid> elements containing the part identifier value for all of the parts in each part type. The generated <pid> elements are sorted by part weight within each part type. Figure 15 contains the outputted XML value paired with the count of parts within each part category.
Ease Your XML Exchanges
This article has merely scratches the surface of DB2's new XML capabilities. However, you should now have a good foundation of knowledge about how to leverage these new features to meet your organization's requirements for processing, generating, and storing XML data.
Kent Milligan is a senior DB2 for i specialist on IBM's ISV Enablement team. He spent the first eight years of his IBM career as a member of the DB2 development group in Rochester, Minnesota. He speaks and writes regularly about relational database topics.