Reprinted with Permission by Quest Software Feb.  2002

 

Programs and Packages, Plans and Collections
Robert Catterall

The ins and outs of DB2 program preparation and execution.

If you are a db2 for os/390 systems programmer, DBA, or application developer (or a consultant who works with such folks), you are probably very familiar with the terms package, collection, and plan. But could you explain these elements of program preparation and execution to a DB2 newbie? And do you know how something as seemingly innocuous as a plan's package list can affect a program's performance? Or how DB2 really finds what it needs when it's time to execute a static SQL statement? This stuff really is more interesting than you might think. Don't believe me? Read on, and see for yourself.

The Basics

In this column, I'll talk about programs through which static (prebound) SQL statements are executed. Static SQL statements are generally preferable to dynamic SQL statements (prepared by DB2 at execution time) because they typically make more efficient use of server processing resources and can allow for more stringent rules concerning data access authorization. The issue of static vs. dynamic SQL, however, is a topic for another time and another issue of this magazine.

The first step in DB2 program preparation is, of course, writing a program that contains embedded SQL statements. You could use COBOL, C (or C++), Assembler, FORTRAN, PL/I, or Java. (You may think that Java implies only dynamic SQL, but it doesn't if you take advantage of SQLJ. For a good introduction to SQLJ, see the Web Database column in the Spring 1999 issue.) You could write the program to run on the system on which the DB2 for OS/390 subsystem resides, or to access the DB2 database from a remote client via distributed relational database architecture (DRDA).

Once you've written the program, the DB2 precompiler processes it and generates two outputs:

1. A modified program source module. The precompiler comments out each of the program's embedded SQL statements, and inserts a call to DB2 for each statement.

2. A database request module (DBRM). A DBRM contains the SQL statements found in the program source.

The precompiler places a unique identifier, called a consistency token, into each of these outputs. I'll say more about consistency tokens when I get to program execution.

Following the precompile process, you compile and link-edit the modified source program into an executable load module and bind the associated DBRM. In the DB2 for OS/390 bind process, such tasks as access path selection (optimization), access authorization, and database object validation are performed. The output of the bind process is a control structure that DB2 will use to execute the SQL statements when the application program is run. The control structure will either be part of a plan (if the DBRM is bound directly into a plan) or contained within a package that will be executed via a plan.

How many of you work at a DB2 for OS/390 site that still binds DBRMs directly into plans? I wish I could see a show of hands. I would expect (and hope) that few hands would be raised. The DB2 package bind process has been around for quite some time (since DB2 version 2 release 3), and it offers some important advantages over plan-direct binds.

At CheckFree Corp., we've been using package bind exclusively for quite some time. I strongly recommend that you do the same if you're not already doing so, for these reasons:

If you use the package bind process, you have to bind the package into what is called a collection. How do you create a collection? Pretty simple: You bind a package into it.

Of course, even if you're using packages, you still need to bind one or more plans if the program in question will run on the local or a remote DB2 for OS/390 subsystem. Programs that run on other remote clients and access DB2 via DRDA use a default plan called DISTSERV. You can execute a particular package using a plan if the collection into which you've bound the package appears in what is called the plan's package list (a list of one or more collections specified via the PKLIST option of the BIND PLAN command). The program, in turn, invokes the plan through a specification in the resource control table (or a DB2ENTRY if you're using resource definition online) for a CICS transaction, via the application program load module name for an IMS transaction, or with a control statement in the job control language (JCL) for a batch job.

When you execute the application program, each call to DB2 directs the database manager to execute the corresponding prebound SQL statement in the package associated with the program. (Recall that the precompiler comments out SQL statements in the source program and adds calls to DB2.) DB2 searches for the package in one or more collections using as search criteria the package name (same as the program name) and the consistency token accompanying the call. (Recall that the consistency token, generated at precompile time, is carried in both the application program and the related package.) When a match is found, the statement is executed and control passes back to the application program until the next DB2 call is issued.

Finding the Package Isn't always simple

Sometimes, finding the right package to execute is a very straightforward process. Suppose, for example, that a plan has only one collection in its package list. In this case, it's obvious that DB2 will use packages bound into this collection.

Now, take the situation in which a plan's package list contains multiple collections. Why might this be the case? Two possible reasons are database segmentation and package versioning. By database segmentation, I refer to the practice of dividing at least a portion of the data among sets of otherwise identical tables within a single DB2 for OS/390 subsystem (or a single data sharing group). The division of data could be along geographic lines (for example, REGION1 and REGION2 tables) or by client or groups of client organizations (COMPANY1 and COMPANY2 tables, for example). Some DB2 shops with very large databases (including CheckFree) undertake such segmentation for database manageability and availability reasons. You distinguish tables in the various database segments by way of a segment-related high-level qualifier. For example, sales data would be divided between tables with fully qualified names such as REGION1.SALES, REGION2.SALES, REGION3.SALES, and so on. In this type of scenario, package bind allows you to develop one program that you can use in each database segment, as follows:

First, the program is written using unqualified table names (for example, SELECT TERRITORY FROM SALES). Then the program is bound into multiple collections, one for each database segment (REGION1 collection, REGION2 collection, and so on). For each of these multiple bind operations, you specify the appropriate high-level qualifier by way of the QUALIFIER option of the BIND PACKAGE command. Thus, the package bound into the REGION1 collection will, when executed, access tables in the REGION1 segment of the database.

Given this database and application architecture, accessing the right data means executing the right package; in other words, the package bound into the collection associated with the database segment of interest. Given a plan with a multicollection package list, how does DB2 know where to look for the package when executing an SQL statement? The search process is as follows (assuming that programs are not bound directly into the plan):

Given a situation in which multiple collections appear in a plan's package list and all packages are bound into each collection, some people assume that the first collection listed will serve, in effect, as the default collection. In other words, if the REGION1 collection is listed first, the package from that collection will be selected if the value of CURRENT PACKAGESET is blank when an application program issues a call to DB2. The problem with this assumption is that it does not take into account the possibility that the package might already be allocated to the thread, as I mentioned earlier. If, for example, DB2 needs to find package PROGABC, and if the version of that package bound into the REGION3 collection is already allocated to the program's DB2 thread, that version of the PROGABC package - and not the version in the REGION1 collection - will be used for SQL statement execution if CURRENT PACKAGESET is blank, even though the REGION1 collection is listed first in the plan's package list. If, for performance reasons, you drive thread reuse and bind your most frequently executed programs with RELEASE(DEALLOCATE), you'd better not think of any collection in a multicollection package list as the default. Instead, explicitly direct DB2 to the desired collection by way of SET CURRENT PACKAGESET. Even if all of your packages are bound with RELEASE(COMMIT), you should probably use SET CURRENT PACKAGESET to get where you want to go, in terms of collections. It's simply good programming practice.

Collection Order's Impact on Program Performance

Let's look at multicollection package lists from another angle. Suppose that, for example, you use a single plan in a production DB2 environment, which is a viable possibility with package bind. Then suppose that you group different packages into different collections by functionality. If each program is bound into only one collection, you might think that the order of collections in the "big plan's" package list is not very important - that DB2 will eventually find the package it needs as it searches the collections in package list order. In fact, collection sequence can matter, certainly in terms of program performance.

At CheckFree, we used to have a transaction that invoked a DB2 plan for which we'd specified a multicollection package list. We noticed that the execution of this transaction drove a large number of GETPAGEs for buffer pool BP0, in spite of the fact that the transaction executed nothing in the way of dynamic SQL (our BP0 basically is used only for the DB2 catalog and directory - user objects are assigned to other buffer pools). Upon investigating further, we noticed that a package the transaction executed very frequently was in a collection that was several spots behind the front of the plan's package list. Could these BP0 GETPAGEs be DB2 directory accesses associated with unsuccessful searches for the frequently executed package in the first few collections in the package list? To find out, we moved the collection with the frequently executed package to the front of the plan's collection list, and voila! The BP0 GETPAGEs dropped dramatically. Fewer GETPAGEs means less CPU and elapsed time.

So, in a multicollection situation such as the one I've just described, put the collections holding the most frequently executed packages at the front of the package list. Alternatively, you can use SET CURRENT PACKAGESET to point DB2 right at the collection that you should search. This, as I've said previously, is good programming practice. The practice may take some time to implement, as is often the case with even relatively simple code changes. A reordering of collections in a plan's package list could serve as a productive interim step.

A Few Words on Package Versioning

Allowing more than one version of a package (say, the newest and the next-most-recent versions) to exist within the package list of a plan can help in providing a quick backout capability in the event that a new version of a package causes problems. If such a situation occurs, you simply free the just-added version of the package (the one associated with a new and problematic version of the application program). When the previous (and well-behaved) version of the program is executed, the right package will be found because the previous version of the package is already in the collection list. If the previous version of the package had been removed from the collection list upon the bind of the new version, falling back to the old version would have required a rebind - an additional step that could delay the backout process and perhaps make the process more error-prone.

If you want to use package versioning in this way, you have at least two choices:

1.      You could keep the two versions of the package in two separate collections, with the newer version going in the collection listed first in the package list of the plan. (I will refer to the two collections as CURRENT and FALLBACK.) Before a new version of a package is bound, the version of the package in the FALLBACK collection is freed. Then, you move the version of the package in the CURRENT collection from that collection to the FALLBACK collection. Then you bind the new version of the package into the CURRENT collection. If a problem necessitating a fallback occurs, the version of the package in the CURRENT collection is freed. When you execute the previous version of the program, DB2 will search for the package in the CURRENT collection, since it is listed first in the plan's package list. On not finding the package in the collection (because it was freed), DB2 will search for the package in the FALLBACK collection, resulting in the previous version of the package being found and utilized for SQL statement execution.

2.      Alternatively, you could keep multiple versions of the package in the same collection, with the version ID (specified at precompile time) serving to distinguish one version of the package from another. (Version ID is intended primarily for user management of package versions; DB2 for OS/390 uses the consistency token to find the correct version of a package at program execution time, as I mentioned previously.) In this case, you'd want to ensure that you periodically remove old and unneeded package versions (those older than the current and next-most-recent versions) from the collection. You could accomplish this goal with a user-written program, in which case you could opt for a DB2-supplied version ID. (Specifying VERSION AUTO at precompile time causes DB2 to use a timestamp value as the package version ID.) The program would use the version ID to identify and free packages older than the current and next-most-recent versions. If you want to remove old package versions manually, you might want to go for a version ID that is shorter and easier to specify than the timestamp value generated by DB2 with VERSION AUTO. (You can specify your own version ID when a program is precompiled.)

I think both approaches are viable, so go with the package versioning approach that looks best to you - there isn't a right or wrong answer.

Exploitation: It's a Good Thing

I find the subject of packages, collections, and plans quite interesting and could go on and on. But before you tell me to get a life, I'd better wrap up. By taking a closer look at these and other tried-and-true DB2 for OS/390 capabilities, you can learn how to really exploit the product to benefit your business.

Robert Catterall is a strategic technology engineer with Atlanta-based CheckFree Corp. where he analyzes and evaluates database technology in order to facilitate the integration of new database capabilities with CheckFree application systems. You can reach him at rcatterall@checkfree.com.