|
|
What Aunt Louise's grocery shopping method reveals about index usage.
There are three stories that I am asked to tell again and again. One is the brick laying (or dish washing) tale about my two children, Scott and Beth, who do asynchronous and synchronous I/O, respectively. Another is the story of the large box of balls at the airport that I use to illustrate one of the many negatives of complex predicates. Both of the stories have been told (see "One Brick at a Time" and "Predicate Evaluation, Part I" listed in Resources). Recently, I was asked by a friend at the IBM DB2 Information Management Technical Conference to please document the third story, the story of my Aunt Louise's grocery buying habits. So, here goes.
Ages ago, when I first began using DB2 in a mainframe environment, there was only one method by which DB2 used an index. But, as of V2R1 of DB2 for MVS (now OS/390 and z/OS), there have been two methods: the original technique, which works much like we humans work when we use an index at the back of a book, and the newer, weird and wonderful technique, which works much like my Aunt Louise when she buys groceries.
Basic Index Information
One of the primary reasons for using an index in a book is to avoid having to read the entire book to locate just the bits of information you want to read. Likewise, one of the primary purposes of a DB2 index is to filter out the table rows that you wish to read from those in which you have no interest, thereby avoiding a tablespace scan of the whole table. The filtering is accomplished by applying at least one SQL WHERE clause predicate to an index. I discussed index filtering using both matching and screening in my prior columns on Predicate Evaluation (see Resources).
Just as an index in the back of a book contains values followed by page numbers, a DB2 index contains column values followed by one or more ROWIDs. These ROWIDs are pointers that consist of a table page number and an ID map number. The ID map is a chain of two-byte entries at the end of each table page. Each of these two-byte entries contains a Relative Byte Address (RBA) pointing to the beginning location of a row; there is an entry for each row on that particular page. These RBAs can be changed as needed so that rows can be moved within a single page (perhaps to allow enough contiguous space for a brand new row or an updated and longer row to fit back where it came from). The purpose of this ID map with its RBA is to place the responsibility for keeping track of the exact physical location of the row on the shoulders of the table and not the index. This design keeps index maintenance (and the accompanying I/O and CPU overhead) to a minimum, and it gives the table total authority for the exact physical location of the row on the page.
In fact, if a row expands and absolutely will not fit back on the same page, the ID map may contain an RBA that points to a location on the table page that contains another ROWID pointing to a totally different table page and ID map entry. This is called an offset pointer. Offset pointers are counted by the RUNSTATS utility, and the counts are stored in the CATALOG columns NEARINDREF and FARINDREF.
So, indexes consist of column values and ROWID pointers. The column values of an index are kept in order. We expect book indexes to be in order. Think of a phonebook: We expect the directory entries to be in order to facilitate lookup. A telephone directory would be totally useless if it weren't in order. And so would a DB2 index.
So each index is in its own order. The index on lastname, firstname, middleinitial is in order by lastname, firstname, middleinitital, and so on. But what order is the table in?
If we haven't taken charge of this responsibility, then the order of the table itself is determined by the order of the data at batch LOAD. However, we can decide to take responsibility and take advantage of the order of the table data by choosing one index per table to be the CLUSTER index, the index in charge of and aligned with the order of the table. If we have decided to take responsibility for the order of the table data, then DB2 will honor our choice and heed our request when a row is inserted, when the table is reorganized, and when ONLINE LOAD (LOAD SHRLEVEL CHANGE) is used.
The relationship between the order of the index (determined by the sort order of the columns in that index) and the order of the table (determined by the index that was anointed with the keyword CLUSTER at CREATE or ALTER time) is documented by the RUNSTATS utility in a CATALOG column called CLUSTERRATIO. The higher the CLUSTERRATIO the more aligned the table is to the order of the index; the lower the CLUSTERRATIO the less aligned the table is to the order of the index. The RUNSTATs utility can determine this alignment simply by reading the (ordered) index from beginning to end and looking at the order of the ROWIDs. If they, too, are (mostly) in order, then the index has a high CLUSTERRATIO; conversely, if they are not in order but rather point to randomly located table pages, then the index and the table are in disaccord and the CLUSTERRATIO is low.
The Original Technique
When you look up a value, such as DB2, in a book index you know instinctively where to start: For DB2, you'd look in the Ds — you wouldn't start at the beginning of the index. You use your eyes and your ordering skills to skip as quickly as possible to the Ds. And so does DB2. DB2 uses pathing pages (in a way, indexes to the index itself) to bypass unneeded pages and probe directly to the portion of the index that's needed.
So, if we're interested in RDBMSs and decide to do some research on DB2, "Oracle," and "Sybase" in a tome on RDBMSs, we first find the Ds. Then, holding a finger in the index, we toggle back and forth, index entry to book page, index entry to book page, until we have read all the information about DB2. We continue with Oracle, positioning in the Os and toggling back and forth until we read the information about Oracle. We do likewise with Sybase. If the book is not in RDBMS order, we may find that the information about DB2 is on pages 2, 8, 14, and 53. The information about Oracle is on 2, 10, 15, and 53. And the information about Sybase is on pages 2, 8, 17, and 54. We therefore will touch the index 12 times and read 12 pages of the table: 2, 8, 14, 53, 2 (again), 10, 15, 53 (again), 2 (yet again), 8 (again), 17, and 54.
But having used the index to facilitate and drive the read of the table, we discover that our information from the table is obtained in order by product. We learn all the book has to offer about DB2 before we learn what it says about Oracle or Sybase. In other words, our information is gathered and organized in index order. If that were important to us (SELECT ... FROM TOME WHERE PRODUCT IN ('DB2', 'ORACLE', 'SYBASE') ORDER BY PRODUCT), then our information would be returned in product order WITHOUT SORTING. In other words, if an index is used the original way, toggling back and forth from index entry to table row, the data will be returned in the order of the index (in our example, by PRODUCT).
You might have noticed that our index is in order by PRODUCT but our table is obviously in some other order, maybe by system feature or quality. Our index has a low CLUSTERRATIO. If the index had had a high CLUSTERRATIO (in other words, if the index were in the same order as the table) our read would have been much easier and more disciplined, with less random and thrashing I/O. But, we're allowed to name only one CLUSTER index per table and the CLUSTERRATIOs of all of the other indexes just are what they are. A REORG will not correct them; DB2 just has to deal with the misalignment.
Of course, DB2 indexes have more than just page numbers in their ROWIDs. Therefore, if we translated the book example into a real index example, we would touch the index 12 times and read 12 pages/rows of the table: 2-3, 8-4, 14-1, 53-2, 2-2 (page 2 again), 10-1, 15-3, 53-1 (page 53 again), 2-1 (page 2 yet again), 8-2 (page 8 again), 17-2, and 54-1.
The Aunt Louise LIST PREFETCH Technique
When we used our index the traditional way, we often read the same page more than once. This happens when the index is not in the same order as the table. And this is where my Aunt Louise comes in.
When I was a child, my brother and I would visit my Aunt Louise (and my six cousins) on a farm in Georgia. My Aunt Louise was a good cook, and fresh fruits and vegetables were plentiful. She raised chickens, pigs, and cows, and we tagged along with our cousins, gathering eggs, picking peaches, and shelling butterbeans. Aunt Louise would put us to work squeezing blackberries through cheesecloth so that she could make jelly and then she would let us have battles throwing the pulp at each other until we were purple from head to toe. This was definitely not work.
But Aunt Louise hated to go to the grocery store. She was quite content staying on her farm. Therefore, when we ran out of toothpaste, she would add toothpaste to her ever-growing grocery list, hand us the baking soda, and smile. When we ran out of shampoo, she would add shampoo to her list, and hand us the dishwashing liquid. But when she would hand us the Sears & Roebuck catalog after adding toilet paper to her list, my cousins, my brother, and I would rebel. No more egg gathering, no more peach picking, and no more shelling of butterbeans until a visit was made to the store.
Poor Aunt Louise would find her grocery list and morosely sit down at her kitchen table. She would stare at this incredible list she had been creating for months. Then her eyes would light up and she would get a second piece of paper. She would slowly transfer her list from the original piece of paper to the second. Her eyes would twinkle and she would hum contently as she worked there at her table.
I once asked my aunt why she spent so much attention on her list instead of just going to the store and getting it over with. She smiled and said, "Bonnie Kay, my original list was not in the same order as the grocery store, but my new list is." She knew the order of the grocery store in Cary, Ga., and she knew that she only wanted to go down each store aisle one time. She wanted to skip the aisles in which she had no interest and go down the other aisles one (and only one) time. And she wanted to do this as quickly as possible so that she could get back to her precious farm. (And, by the way, the grocery store manager in Cary doesn't dare rearrange his grocery store for fear of the wrath of my Aunt Louise.)
I have no doubt that one day there was a developer buying groceries using this same technique when a light bulb went on over his head. DB2 has lists (indexes) that aren't in the same order as the store (the table). We can use the same technique to write down all the ROWIDs that we want and then sort the RID list to be in the table page (store) order. Using the same example above, DB2 would go first to the index, apply the predicate on PRODUCT and write down all 12 RIDs in a list: 2-3, 8-4, 14-1, 53-2, 2-2, 10-1, 15-3, 53-1, 2-1, 8-2, 17-2, 54-1. DB2 could then take its figurative finger out of the index and organize the list to make the trip to the table more efficient.
DB2 uses a space called the RIDPOOL for its lists and sorts those lists to be in store/table/page order before going to the store/table/page. After the RID sort, the new list would be 2-1, 2-2, 2-3, 8-2, 8-4, 10-1, 14-1, 15-3, 17-2, 53-1, 53-2, 54-1.
DB2 can now use this new, ordered list to read the table pages sequentially, actually skip sequentially, and more efficiently with less I/O and CPU overhead. Before going to disk, DB2 makes a list of up to 32 page numbers (not RIDs, but page numbers). In our example, DB2's list would be 2, 8, 10, 14, 15, 17, 53, 54. There may have been 12 RIDs but there are only eight page numbers. DB2 can then take this list of eight page numbers, go to disk with a shopping cart, pick up all eight pages, and bring them back into the bufferpool so that they will be available as soon as the GET PAGE request is issued by the Data Manager. In this way, DB2 is doing a form of asynchronous I/O, retrieving the pages and stationing them in the bufferpool ahead of time.
When this technique is used, the information won't come back in the order of the index. We lost that advantage when we did the RID sort to put the RIDs in table page sequence. Rather the data will come back in the order of the table (in other words, the grocery store). And, if we said ORDER BY PRODUCT, DB2 will have to do a data sort to put the data back in order by product.
In other words, when DB2 uses the index the traditional way, the rows come back in the same order as the index. When DB2 uses the index the Aunt Louise way, otherwise known as LIST PREFETCH, the rows come back in the table order.
Why One Instead of the Other?
If you only need a row or two, there's no point in making a list. DB2 normally uses the index the traditional way. But if you need a lot of rows, then DB2 is concerned about random vs. sequential reads to the table. CLUSTERRATIO becomes very important as each possible access path is considered.
Factored into DB2's access path decision are:
Which One Did The Optimizer Choose?
As you can see, each technique has its strengths and its weaknesses. If you'd like to know which technique DB2 chose, you can run EXPLAIN on the query or the package and look in the plan_table. Whenever you see an L in the PREFETCH column, Aunt Louise is influencing the use of your index via LIST PREFETCH. And, as you'll remember from prior columns, whenever you see an S in the PREFETCH column, Scott is using the index the traditional way and prefetching pages into the bufferpool sequentially using SEQUENTIAL PREFETCH. And, whenever you see a blank (/b ) in the PREFETCH column, Beth is doing your I/O synchronously without any PREFETCH at all.
About the Author
Bonnie Baker is a consultant and educator specializing in application performance issues on the DB2 OS/390 and z/OS platforms. She is an IBM DB2 Gold Consultant, a five-time winner of the IDUG Best Speaker award, and a member of the IDUG Speakers' Hall of Fame. She is best known for her series of seminars entitled "Things I Wish They'd Told Me Eight Years Ago" and the Programmers Only column in DB2 Magazine. You can reach her through Bonnie Baker Corp. at 813-876-0124 or bkbaker@bonniebaker.com.
"Predicate Evaluation, Part I"
"Predicate Evaluation, Part II"