V6R1 Journal Recovery Enhancements, Part 2
Date Posted: July 01, 2008 12:00 AM

In "V6R1 Journal Recovery Enhancements, Part 1" (April 2008, article ID 21229 at SystemiNetwork.com), you learned that V6R1 comes with a number of journal enhancements that might help address concerns and frustrations that you currently have or have had in the past. In Part 2, we continue our discussion of a few more of the enhancements, delving into the history of the problems they address and discussing how V6R1 handles them.

Mind Your Meter Maid

The date was 9/12/01. Larry was scheduled to fly from Minnesota to Florida to attend an IBM technical conference. It was the day after 9/11. There was uncertainty regarding how soon air traffic would be restored, so he dutifully set his alarm for 4:00 a.m. and by 5:00 was on the road, headed to the Minneapolis airport. You can probably guess the rest of the story. He never got off the ground that day, nor did he ever get to the conference.

Rules regarding airports, air travel, and unloading zones in front of airports have never been the same since — a fact that was driven home months later when Larry tried pulling up to the loading/unloading zone at another airport. Along came an officer who in no uncertain terms advised him to "move along and be quick about it."

What does this story have to do with V6R1? It turns out that for a number of releases, i5/OS has harbored a similar meter maid: a low-level microcode task that monitors for the presence of unwritten journal entries accumulating in main memory and schedules their departure to disk on a timely basis. We call it the journal cache sweeper task. Its mission is to ensure that folks who employ the journal caching feature (i.e., the optional i5/OS feature that helps improve performance in a journal-intensive environment) don't have their cached journal entries linger in main memory too long. But how long is too long? That's the tough question!

Flush this main memory buffer too frequently, and you thwart the performance benefits achieved by caching. Flush this main memory buffer only occasionally, and you risk (especially in a remote journal environment) sending recent database changes to the target machine too infrequently.

In the earliest releases in which this meter maid was introduced, the decision regarding how frequently to have her swing by and empty the journal cache was a hard-coded value established in the lab — and we deliberately erred on the conservative side, not allowing her to show up so often that she could be accused of being a CPU hog. We quickly got the message that not all customers appreciated our heavy-handed approach. They wanted to exercise some control over the frequency with which this meter maid uttered the words "move along!" And so, for a few releases there were some rather obscure and poorly advertised APIs one could use to give the meter maid a kick in the pants.

Most folks thought these APIs were tough to work with and not very obvious. They wanted a more natural way to control the frequency with which cached journal entries are flushed from main memory.

V6R1 addresses that concern by letting you customize how often the meter maid shows up. You do so with the Change Journal Attributes (CHGJRNA) command, which lets you select a value between 1 and 600 seconds, indicating how frequently you want this sweeping behavior to occur. Simply insert your own customized value for Cache wait time (the second parameter on the screen in Figure 1).
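As a rough sketch, setting the sweep interval from a command line might look like the following. The keyword CACHEWAIT is our assumption for the Cache wait time prompt, and the value of 30 seconds is purely illustrative; prompt the command with F4 to confirm the keyword on your system.

```cl
/* Assumed keyword CACHEWAIT; 30 is an illustrative value in seconds */
/* and must fall within the 1 to 600 range described above.          */
CHGJRNA CACHEWAIT(30)
```

A smaller value gets cached entries to disk (and to a remote journal target) sooner, at the cost of some of the caching benefit; a larger value does the reverse.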

The Two Shall Become One

Larry's 82-year-old mom faces a challenge when an electrical storm knocks out power to her residence. She calls Larry and asks, "What time is it?" Throughout her townhome are multiple gadgets that attempt to display the current time. There's the clock on the nightstand, the clock built into the stove, the clock on the microwave, the clock on the VCR, and probably more that Larry doesn't know about. If they all tell the same time, life is good. When they don't, Mom is troubled, and Larry's phone rings. You can imagine the chaos at Mom's place when the U.S. Congress changed the date for switching out of daylight saving time last fall. Some devices seemed to have a mind of their own: some fell back an hour two weeks early, and some didn't move at all. It all led to confusion.

Confusion crops up for journal users when there are multiple commands for achieving the same objective and the commands behave differently. Yet for the past couple of i5/OS releases, that's what our customers faced: two competing commands for use when they wanted to apply a set of journal entries.

There was the traditional journal-apply command — APYJRNCHG — and the newer apply-extended command — APYJRNCHGX. If you used the first, you'd fail to replay certain types of objectwide journal entries. If you used the second, you'd restrict yourself to database objects and ignore the other types of objects that deserved journal protection (e.g., data areas, data queues). Confusion reigned.

What we needed was a single command — the same way Mom's life would be less stressful if she had a single clock! V6R1 achieves that objective.

Sure, we could have come up with yet another apply command and claimed that it was the culmination of this evolution in journal-apply technology, but doing so probably would have added to the confusion. Instead, we simply elected to enhance the granddaddy command (APYJRNCHG) to be as smart as the young whippersnapper command (APYJRNCHGX). Hence, the two have effectively become one. Whereas we often suggested that sophisticated shops use APYJRNCHGX in releases prior to V6R1, the new advice is for everyone to begin moving to APYJRNCHG. In time, IBM will likely withdraw support for APYJRNCHGX. While we were at it, we also beefed up how the whole apply process works, so that

  • logical files are no longer second-class citizens
  • creation of new data queues and data areas now produces journal entries that can be replayed

As a consequence, you'll begin to see some new flavors of journal entries. Their purpose is to help ensure that changes to both physical files and the descriptive properties of logical files are all treated even-handedly and that all get replayed properly.
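To make the single-command idea concrete, here is a minimal sketch of an apply that covers every journaled object type in a library. The library and journal names (LIB1, MYJRNLIB/J1) are placeholders, and the range of entries applied is left to the command's defaults, which you should review before running anything like this against real data.

```cl
/* Placeholder names. OBJ((LIB1/*ALL *ALL)) asks for all journaled    */
/* objects of all types in LIB1, not just database files.             */
APYJRNCHG JRN(MYJRNLIB/J1) OBJ((LIB1/*ALL *ALL))
```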

Inheritance, It's a Great Thing

To end up with some of these new journal behaviors and enable both IPL recovery and APYJRNCHG to strut their stuff, you must flag the surrounding library itself as a journaled object. The name of this new property is inheritance, and it is managed with the new Start Journal Library (STRJRNLIB) command.

Beginning in V6R1, when a library associates itself with a journal, all journal-eligible objects (e.g., physical files, logical files, data areas, data queues) created thereafter and therein inherit the same journal characteristics as the surrounding library. For example, if you issued the new Start Journal Library (STRJRNLIB) command, designating library Lib1 as the library that you want to behave in this new fashion and journal J1 as the matching journal of interest, any CRTPF issued thereafter that placed the resulting PF into Lib1 would correspondingly enable journal protection for the PF at birth. Hence, the new object inherits journal protection from the surrounding library and routes its changes to journal J1.

Why does this behavior matter? It's especially attractive in a high availability (HA) environment, in which you tend to have applications on the production machine that create new physical files or data areas as they are executed. Without inheritance, the HA software provided by your business partner has to monitor the audit journal to notice that a new object has been created. It's then a foot race to see whether the third-party HA software can snag a copy of the new PF, prime it to the target machine, and enable journaling before your application begins to populate the new PF. As you can probably guess, the HA software often loses the foot race.

This new support, called library inheritance, eliminates the foot race, letting you ensure that every new PF has journaling initiated at birth. This assurance, in turn, reduces the likelihood that you'll end up with missing, overlooked, or out-of-sync objects when you need to switch to the target machine.

In addition, the inheritance rules can be highly customized to influence which types of objects behave in this new fashion. You can also customize the conditions under which journaling is automatically started for an object. For example, you might want this behavior to occur when a new object of the designated type is created but not when it is restored from tape, or you might want automatic journal protection kicked off when an existing object is moved from a test library to a production library. For complete details about all the parameters, see the System i InfoCenter (ibm.com/systems/i/infocenter).

You can see these choices on the new STRJRNLIB command. For example, to direct i5/OS to use journal J1 for all objects created/moved/restored hereafter into library LIB1, you could issue the following command:

STRJRNLIB LIB(LIB1) JRN(MYJRNLIB/J1) INHRULES((*ALL *ALLOPR *INCLUDE *OBJDFT *OBJDFT))
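One way to convince yourself that inheritance is working is to create a throwaway file in the library and look at its journaling status. This is only a sketch: the file name TESTPF and the record length are made up for illustration.

```cl
/* TESTPF and RCDLEN(80) are illustrative values only.                */
CRTPF FILE(LIB1/TESTPF) RCDLEN(80)
/* The full object description includes the object's journaling       */
/* information, which should name journal J1 if the file inherited.   */
DSPOBJD OBJ(LIB1/TESTPF) OBJTYPE(*FILE) DETAIL(*FULL)
```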

With One Wave of Your Hand

Designating preferred behavior (inheritance) that you want to happen hereafter is good, but what if you already have many objects residing in an existing library and want to enable journaling for all of them? Well . . . it used to be a pain. That's because, before V6R1, if you wanted to start journaling for every physical file in library Lib1, you had to know the name of each PF and issue a separate Start Journal Physical File (STRJRNPF) command for each object!

It reminds Larry of the chore he faces each autumn when the leaves begin to fall. Larry's fishing shack in northern Minnesota is surrounded by giant oaks. They provide shade on hot days in July and August, but when late October rolls around, he yearns for a means to clean up the yard with a gigantic wave of his hand.

In a sense, that's what V6R1 provides. It lets you issue a single STRJRNPF command but with a new option — one that suggests that you know what you're doing and truly do want ALL PFs within the designated library to have journal protection started (Figure 2). Such a command would look something like this:

STRJRNPF FILE(LIB1/*ALL) JRN(MYJRNLIB/J1)

Notice that you could even provide a generic name (perhaps aimed at including your production files because they start with PROD* but leaving out the work files).
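A generic-name variant might look like this sketch, which assumes your production files share the PROD prefix and reuses the same placeholder library and journal names:

```cl
/* Starts journaling for every physical file in LIB1 whose name       */
/* begins with PROD; work files with other names are left alone.      */
STRJRNPF FILE(LIB1/PROD*) JRN(MYJRNLIB/J1)
```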

Timely Responses

With all this journaling going on, talking about good housekeeping makes sense. Some of the best housekeeping happens when you simply refuse to let junk pile up. Journal receivers that stick around long past their useful life can become such clutter. They take up precious space, and it's unlikely anyone is going to use them. On numerous occasions, we've encountered customers who have complained about the need to purchase more and more disk and yet felt that their business had not grown by nearly the rate of disk consumption. What did we often discover? Journal receivers that were months old. Timely response to do a little housekeeping on such journal receivers would have been prudent.

Larry heard about one of the best examples of prompt response when he lived in northern Illinois and taught in a school district near a penitentiary. Among his students were the warden's daughters. The penitentiary was proud of the culinary skills it had imparted to some of the inmates and loved to show them off. Special guests were ushered into a formal dining room, where the stewards were inmates. The superintendent of the school district was one of those guests and told Larry how he had positioned the linen napkin on his lap, it had slipped off, and before he had time to reach down for it, an inmate was at his side with a fresh one. That's a timely response!

Journal users, beginning in V6R1, can orchestrate a similar timely response, because now there is an opportunity to register an exit routine to be invoked each time a journal receiver is detached. You can accomplish this task with a command such as this one:

ADDEXITPGM EXITPNT(QIBM_QJO_CHG_JRNRCV) FORMAT(CRCV0100) PGMNBR(*LOW) PGM(MYLIB/MYPGM)

As you probably realize, journal receivers are like buckets, and journals are like funnels placed over those buckets. Hence, any database row images passed along to a journal for safekeeping travel through the funnel and end up in the bucket. The longer the bucket stays in place, the fuller it becomes. Eventually, it nears its capacity, and it's time to swap buckets. This is a two-step process known as detaching the old journal receiver and attaching a new one.
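If you manage receivers yourself, the bucket swap can be requested by hand. In this sketch MYLIB/MYJRN is a placeholder journal name, and JRNRCV(*GEN) asks the system to generate the name of the new receiver:

```cl
/* Detaches the current receiver and attaches a newly generated one.  */
CHGJRN JRN(MYLIB/MYJRN) JRNRCV(*GEN)
```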

After a journal receiver is detached, it's time for some housekeeping. Although you've been able, for years, to use commands such as CHGJRN JRN(MYLIB/MYJRN) DLTRCV(*YES) to instruct the operating system to automatically delete the journal receiver as soon as it's detached, such automated behavior has felt too abrupt for some users. It meant that the journal receiver vanished before they had a chance to react. Therefore, few users capitalized on this automated delete receiver behavior. Instead, they had certain housekeeping chores they wanted to perform before the journal receiver disappeared. Managing that concern has just become easier.

Such housekeeping chores might include saving a copy of the receiver to a save file. With the preceding new exit routine interface, you could register a simple CL program that responds to the detach operation by saving the receiver before you delete it. That way, you ensure not only that you have a second copy available for the nightly save but also that old journal receivers don't linger in perpetuity.

Want to take this further? Save the detached receiver to a save file, ship a copy of the save file to a distant machine, and then clean up the space occupied by the save file and the journal receiver. This procedure becomes particularly important as part of a timely space-management strategy if your applications tend to slurp up multiple gigabytes of journal receiver space per hour.
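The commands such an exit program might issue could look roughly like the sketch below. All object names are hypothetical, and the receiver name is hard-coded for illustration; a real exit program would take the name of the detached receiver from the data passed to it by the exit point, a layout that we don't show here.

```cl
/* Hypothetical names: MYLIB, RCVSAVF, and MYRCV0001 are placeholders. */
CRTSAVF FILE(MYLIB/RCVSAVF)
SAVOBJ OBJ(MYRCV0001) LIB(MYLIB) DEV(*SAVF) OBJTYPE(*JRNRCV) SAVF(MYLIB/RCVSAVF)
/* Ship MYLIB/RCVSAVF to the distant machine here, by whatever means   */
/* your shop already uses, before cleaning up.                         */
DLTJRNRCV JRNRCV(MYLIB/MYRCV0001)
DLTF FILE(MYLIB/RCVSAVF)
```

Saving before deleting matters: it keeps the receiver recoverable even though the detached copy is removed from disk promptly.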

Sometimes housekeeping can't be delayed until the end of the day. If that's your need, the new V6R1 support should make your chore easier.

Having a Really Bad Day

We've all had them: days that started out looking good only to take an unexpected turn for the worse. One of Larry's came in December 1976, when he had completed graduate school and was looking for a job. Larry had taken summer assignments over the years at a variety of places, including the telephone company in Chicago (a branch of AT&T at the time known as Illinois Bell). His work there had been interesting, but he wanted to move into a development lab, so his buddies at Illinois Bell pulled some strings and got him an interview in a Chicago suburb at the Bell Labs site.

This was in the days before laptops, e-mail, and electronic résumés. Larry's interview had been hastily arranged, and he'd had no opportunity to send his paper résumé ahead via snail mail. Instead, the Bell Labs HR office had told him over the phone to bring along a typed copy of his résumé. That sounded simple enough, so Larry put a fresh ribbon in the typewriter, used the finest cotton fiber paper he could afford, and carefully typed his one and only copy of a résumé.

Larry drove to the Bell Labs location and was careful to keep the one-page résumé crisp and unfolded. He arrived, announced his presence to the receptionist, and spotted a nearby washroom. Knowing that each interviewer tended to offer the interviewee a cup of java, he elected to be proactive and ensure that he entered the fray with nothing on his mind but the interview. Seeing no convenient shelf on which to park his résumé, Larry improvised and tucked it under his chin as he went about his business. You know what's about to ensue, don't you?

He was doing fine until he moved to the wash basin and an old college chum who now worked at Bell Labs sauntered in, recognized Larry, and spoke his name. Instinctively Larry's head turned and, well, Larry ended up with one soggy résumé and sensed that his pretty-good day had suddenly turned sour.

What does all this have to do with V6R1? In the System i world, there can be days that seem to be going well only to have a hardware or software glitch suddenly bring the machine down. Depending on which database files are open at the time, and especially on how many SQL indexes or keyed logical files are affected, you can end up with a rather long recovery/IPL. Such long-duration IPLs are about as pleasant as a soggy résumé.

The good news is that built into i5/OS is a crucial piece of support (think of it as a safety net) aimed at making your IPL duration not nearly so painful. It's System-Managed Access-Path Protection (SMAPP), a subtle type of journaling specifically for access paths.

This SMAPP support has evolved over the years and now has screens for letting you take a peek inside the machine and find out what's going on. In particular, you can display a screen that estimates how long an abnormal IPL would spend rebuilding keyed access paths if the machine were to go down abruptly. There are also companion screens that help you understand which indexes are currently protected and which are not. It's this set of screens that has been beefed up for V6R1. Before V6R1, there were only two; now there are three. Getting to know these screens and how to find them can help you extract the data you need to make better SMAPP decisions.

To gain access to these screens and sample the current status of your SMAPP protection and/or revise such protection, you can use the Edit Recovery for Access Paths (EDTRCYAP) command. From the EDTRCYAP main screen, you can use function keys to navigate to the display screen of interest. The three SMAPP status screens of particular interest are

  • the one that reveals the list of access paths that are not eligible to be protected: F13 (Figure 3)
  • the one that reveals the list of currently protected access paths: F14 (Figure 4)
  • the one that reveals the list of currently "exposed" access paths that qualify for protection but are currently unprotected (often because they're too small): F15 — this one is new for V6R1 (Figure 5)

What can these screens teach us about the behavior of the automated SMAPP support? If you discover that lots of access paths with very short rebuild durations are protected, it probably means that your SMAPP setting is too low and that you're wasting precious CPU cycles for very little payback. Discovering that some substantial-sized access paths are flagged as ineligible for SMAPP protection probably suggests that you have far more risk of long-duration recovery outages than you realize and that you should try to coax these access paths back onto the straight and narrow. Finding that lots of eligible access paths are on the unprotected screen might make you rethink your SMAPP settings. Perhaps you're still locked into a choice made a decade ago and need to modernize.
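If you'd rather look first and change later, the companion display and change commands offer a non-interactive route. The sketch below assumes a system-wide target of 60 minutes, which is an illustrative number and not a recommendation:

```cl
/* Display the current recovery-time estimates without changing them. */
DSPRCYAP
/* Illustrative target: ask SMAPP to hold system-wide access path     */
/* recovery to roughly 60 minutes.                                    */
CHGRCYAP SYSRCYTIME(60)
```

After changing the target, revisit the EDTRCYAP screens to see how the lists of protected and exposed access paths shift.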

V6R1 Journal Protection Has You Covered

V6R1 journal protection offers you many new goodies. The enhancements that we covered in Part 1 (i.e., more even-handed journal treatment, better monitoring for potential traffic jams, timely detection and enhanced prevention for garbling, and increased bandwidth for remote journal needs) plus the enhancements that we discussed here will help simplify your journal management chores. You can see that the new features are in keeping with a major V6R1 theme: It's a release that takes high availability seriously!

After more than 30 years of experience leading the design efforts for System i journal support at IBM, Larry Youngren recently retired from IBM and now lectures and consults on high availability issues.

Robert Andrews is an advisory software engineer at IBM and focuses on database and journaling technologies on the System i.

