![]() |
![]() |
© 2000 RevealNet, Inc.
Many kinds of data are in effect for a time period. When modified, a new row is stored with the new information. For good performance and ease of query, it is important to design the table with the correct choice of effective date, or a start and stop date. Index design is also critical to performance.
The first design that comes to mind is to use an effective date - a row of data is in effect from the effective date until the next date of another row in the table.
The alternative is to use a start and stop date in each row. A row is in
effect between these dates.
Neither choice is always best. It depends on many factors including the number
of rows (for a given key), how frequently the data changes, is the latest row
the one that is usually wanted, and are there time periods when no data is in
effect?
More often than not, having both a start & stop date can result in a simpler
query, and more efficient access, at the cost of slightly more code for the
table maintanence functions. Adding a new row requires updating the previous row
as well as inserting the new one. With only one date, only the new row must be
inserted. However, in both cases, previous rows must be checked to ensure there
is no overlap of dates, or other incorrect action.
For retrieval, what is most often needed is either the current row or a row in
effect on a specific date, so I will concentrate on these situations.
To get the most recent effective date in a single statement (i.e., end-user
query) you need a subselect to ensure it is the highest date:
SELECT data FROM Table T1 WHERE key=value AND EFFDT =
(SELECT MAX(EFFDT) FROM Table T2 WHERE T1.key=T2.key)
In a program, the simpler and more efficient method is to have the index on the
key plus EFFDT DESC, and to use a cursor without the subselect, using just one
FETCH to obtain the most recent row. This way the DBMS has to only look at one
row to return your desired result.
To get the row in effect on a specific date you just need
SELECT data FROM Table T1 WHERE key=value AND
START <= date_wanted and STOP >= date_wanted
This simpler statement can be used by an end-user or in a program.
Index design must be thought out carefully to ensure good performance. If
there are only a few rows for each key, then this is not a significant issue,
but if there are many rows per key, then it is very significant (this applies to
many other situations in which part of the key is a range other than dates).
Assuming that the most recent row is needed more often, the DBA naturally
decides to make the index descending on the start date. However, this can lead
to poor performance. A much better choice is to have the index descending on the
stop date. Logically the two are equivalent (if we do not allow overlapping date
ranges). But, let's think about how the DBMS will access the data.
If the index is descending on STOPDATE, the predicate STOPDATE >
date_wanted will enable the dbms to go directly to the most recent row meeting
this criteria. If there is a row in effect on that date, the start date will
also meet the predicate STARTDATE < date_wanted. The next row in the index
will fail the predicate & we are done. Quick and easy.
If the index is descending on STARTDATE, the first predicate STARTDATE <
date_wanted will be met by every row for the given key (i.e., all rows started
before today's date). The DBMS must retrieve and examine every row for the key
to see if the stop date is > date_wanted. Only the first one (the most
recent) will meet it, but the DBMS doesn't know this. It must examine them all.
So, if you usually want the most recent row, use STOPDATE DESC in the index. If you usually want the earliest row, use STARTDATE ASC in the index.