|
|
Choosing to Use Dynamic SQL
Topic Extracted from Knowledge Xpert for DB2
In several cases, a
specific variation of dynamic SQL may be only one of several alternatives
available. These alternatives include the use of static SQL with various
BIND parameters as well as the deployment of stored procedures. This topic
explores these possibilities and introduces some of the issues that must be
considered before the appropriate choice is made.
Choosing Between Static and Dynamic SQL
Consider an application program that must return a list of employees for a
wide variety of user requirements. For example, the user may wish to list all
employees for a department or a list of all employees whose last name starts
with some set of characters. In general, the user requirements can be captured
via a screen that looks something like the one shown below.
Using this approach we code and process a cursor for each possible
combination. Since the user may specify any or all criteria, the number of
such cursors is prohibitively large (1 with no criterion specified, 7 with
exactly 1 criterion specified, 21 with exactly 2 criteria specified, etc.,
for a total of 128 without considering the sort order) which makes the code
cumbersome, since we need a DECLARE, OPEN, FETCH, and CLOSE for each such
cursor. Using this approach, you code a SELECT statement with all criteria
specified as a set of BETWEEN predicates. Ordering the data needs to be
handled by other means, or not handled at all. Here, the SQL looks like
this:
While this approach is simple to develop and uses static SQL, the
performance of such a statement is generally far from optimal, since the
access path is unaware of the criteria entered by the user. Substantial
performance improvement can generally be obtained using the REOPT(VARS) bind
option. Depending on the criteria entered and the sort sequence requested by the
user, a dynamic SQL statement is built and processed. For example, if the
user requested all employees in department ‘E21’ whose last name starts
with ‘S’ earning more than $10,000.00 in commission, sorted by last name
and first name, the SQL looks like this:
At least in cases like this, where a highly flexible application is needed,
dynamic SQL should be seriously considered as a possible solution. You should also consider the following factors before deciding
whether or not dynamic SQL is appropriate for your needs.
Static SQL can be made more flexible by using host
variables instead of literals. For example, the following code
fragment in COBOL can be used to update the salaries by various amounts for
various employees determined at execution time (both perhaps read form a
file):
In general, dynamic SQL is more complex to develop than static SQL.
However, the degree of complexity depends on the type of dynamic SQL that
must be handled. Non-SELECT statements without parameter markers add only
slightly to the complexity while the most complex variation (varying-list
SELECT with parameter markers) adds a considerable amount of complexity,
especially in COBOL. However, the additional complexity of dynamic SQL
should not be the determining factor in whether or not to choose for dynamic
SQL. Reasonably skilled programmers should have no problems coding dynamic
SQL programs. In general, a dynamic SQL statement must be prepared before it is
executed. The resources consumed for this step have the potential to make
dynamic SQL perform worse than static SQL. However, caching of dynamic SQL
can substantially reduce this overhead. Furthermore, when the data is
skewed, dynamic SQL can actually perform better than static SQL as explained
in the next section.
One important consideration for performance is the
cost of preparing the statement as compared to the cost of running it.
For example, if the statement is to be prepared once (say 100 ms of CPU) and
run once (many seconds of CPU), the prepare cost is negligible. When static SQL is used for accessing non-uniform data, the access path
determined at BIND time can be sub-optimal. Consider, for example, static
SQL that accesses all employees for a specified department (as a host
variable). If the department has a very small number of employees, an index
access may be more efficient, while a table space scan may be more efficient
for a department with a substantial number of employees.
Since dynamic SQL knows the value of the department during the prepare
phase (assuming literal are passed in the text instead of parameter
markers), it can produce a more efficient access path and the cost of the
prepare can be more than offset. In cases like this, static SQL can still be
used and the plan or package bound with REOPT(VARS). See "Choosing
Between REOPT(VARS) and Dynamic SQL" below. There are two security aspects you need to
consider: When using dynamic SQL, a user may require additional authorizations
to the objects accessed. The use of the DYNAMICRULES bind option and the
use of secondary authorization IDs can help to alleviate the
authorization maintenance cost. Let us quickly review the minimum authorization needed for a user to
run a program containing static SQL. In this case, the user only needs
an EXECUTE privilege on the plan or package containing the SQL. The user
does not require authorization to execute these statements outside the
control of the program. For example, consider a program containing
static SQL that updates the salary information for any employee. A user
with EXECUTE authority on the plan or package can issue these statements
by running the program, but will not be able to do so via QMF, SPUFI or
any external means. For dynamic SQL, the user normally has explicit
authorization to UPDATE the EMPLOYEE table (unless the bind option
DYNAMICRULES(BIND) is used). This applies only to embedded dynamic SQL.
This way the user can also access the EMPLOYEE table outside the
program, for example via SPUFI. Unlike static SQL that exists in source libraries, dynamic SQL is built
at runtime. The source of such SQL may be external and controlled (for
example, a file or a DB2 table) but it is impossible to ensure that the
statement is not modified prior to its execution. While methods to trap SQL
statements once they have executed exist, it is not possible to determine,
in advance, all such statements. If you have very
stringent audit requirements for certain applications, do not implement them
using dynamic SQL. Besides the resources consumed, the fact that the access path is
determined at runtime for dynamic SQL presents another challenge. For static
SQL, it is possible to obtain information about the access path by using the
EXPLAIN option at bind time at the package or plan level. This information
is not readily available for dynamic SQL, however it can be obtained.
Analysis of this information can then complement the similar analysis done
for static SQL as a means of quality assurance of embedded dynamic SQL. Since the exact SQL statement that will be executed is not known in
advance, how do you prevent a runaway query? Predictive governing is the
ability to estimate the resources needed for a statement and take
appropriate action based on thresholds (allow processing, warning, stop
execution before the statement starts to execute). In addition, execution of
dynamic SQL can be terminated if it consumes more resources than a specified
limit. To prevent runaway dynamic SQL queries from using too much system
resources, implement the DB2 governor functionality. In a complex operational system, with thousands of tables and programs to
manage, it is of paramount importance to keep track of where a certain table
or column is referred to. This may be needed, for example, to assess the
impact of a change to the length of a column. When dynamic SQL is used, it
is difficult (sometimes impossible) to know this without any uncertainty.
This means a potential outage if the complete impact is not assessed
properly.
Another important type of dependency is the use of indexes. Using static
SQL exclusively, it is easy to determine whether or not an index is used and
how effective it is. With the presence of dynamic SQL, this determination
involves collecting and analyzing performance data which still cannot
accurately predict the impact on dynamic SQL that is infrequently executed
(a batch job run quarterly or annually, for example), unless the data
collection spans a long period of time. The BIND option REOPT(VARS) can work in conjunction with dynamic SQL. Here,
we discuss its use with static SQL as an alternative to using dynamic SQL.
To overcome the potentially sub-optimal access path that can result for
static SQL in the presence of non-uniform data or highly correlated data, the
REOPT(VARS) bind option can be used. This makes static SQL behave like dynamic
SQL since the access path is selected at runtime after the values of the host
variables are known. If the sole reason why dynamic SQL is being considered is
to overcome the performance issues arising from non-uniform distribution of data
or column correlation affecting static SQL, you should consider the use of
REOPT(VARS) with static SQL first.
Also note that dynamic SQL can be written at the
statement level. The REOPT(VARS) bind option generally applies to all
statements within a package (unless optimization hints are used).
Choosing Between Dynamic SQL and Stored Procedures
One of the reasons why people are often reluctant to use dynamic SQL is
because of the way authorization works. By default, every user needs to be
granted access to all objects they access via dynamic SQL. Administering these
authorizations can be time consuming and once users have access to the objects,
they can access the objects “outside” the application as well.
In some cases, using stored procedures can be an alternative. If the
applications needs the use of dynamic SQL, for example, because you cannot code
all potential combinations using static SQL, you can get around some of the
security issues by using a stored procedure to encapsulate the logic that
requires the use of dynamic SQL. When using a stored procedure, with the correct
setup, you only have to authorize users to call the stored procedure and no
longer grant them access on the individual objects.
Importance of Catalog Statistics for Dynamic SQL
Catalog statistics influence the access of static as well as dynamic SQL.
Here we briefly point out why they are especially important for the performance
of dynamic SQL.
There are a few important considerations:
On the other hand, it is not just the statistics
themselves that should be considered important. During the prepare
process, DB2 has to retrieve these statistics from the DB2 catalog. Therefore
the performance of queries accessing catalog tables is equally important. Make
sure your catalog is in good shape when you use dynamic SQL by executing REORGs
of catalog table spaces when appropriate.
Examples of Interactive Dynamic SQL Programs
DB2 comes with a set of programs to process dynamic SQL which are used as a
productivity aid. The table below summarizes how they can be used.
1 Note that you can also use the
REORG UNLOAD EXTERNAL and UNLOAD (Version 7) utility to unload data. Those
usually provide much better performance than the sample DSNTIAUL program.
General Recommendations on the Use of Dynamic SQL
These are some recommendations on the use of dynamic SQL:
FROM TO SORT ORDER
LAST NAME ________ ________ ________
FIRST NAME ________ ________ ________
WORK DEPT. ________ ________ ________
HIRE DATE ________ ________ ________
SALARY ________ ________ ________
BONUS ________ ________ ________
COMMISSION ________ ________ ________
There are three main alternatives to developing an application to process such
requests and we discuss the merits of each approach below:
SELECT ...
FROM DSN8710.EMP
WHERE LASTNAME BETWEEN :LO-LASTNAME AND :HI-LASTNAME
AND FIRSTNAME BETWEEN :LO-FIRSTNME AND :HI-FIRSTNME
AND WORKDEPT BETWEEN :LO-WORKDEPT AND :HI-WORKDEPT
AND SALARY BETWEEN :LO-SALARY AND :HI-SALARY
AND BONUS BETWEEN :LO-BONUS AND :HI-BONUS
AND COMMISSION BETWEEN :LO-COMMISSION AND :HI-COMMISSION
Prior to the execution of this cursor, the host variables are populated
based on user requirements. For example, if a user specifies the range of
$50,000 to $100,000 for the salary, LO-SALARY contains 50,000 and HI-SALARY
contains 100,000. If the user specifies no criteria for salary, LO-SALARY
contains 0 and HI-SALARY contains some maximum permissible number.
SELECT ...
FROM DSN8710.EMP
WHERE WORKDEPT = ‘E21’
AND LASTNAME LIKE ‘S%’
AND COMMISION > 10000
ORDER BY LASTNAME, FIRSTNME
A different statement is built for a different request. This is completely
flexible and the access path is optimal (but it incurs the cost of PREPARE
multiple times).
Choosing Between REOPT(VARS) and Dynamic SQL
EXEC SQL
UPDATE DSN8710.EMP
SET SALARY = :NEW-SALARY
WHERE EMPNO = :EMPNO
END-EXEC.
However, if the user can update any column (perhaps of any table) or can
select one or many columns of a table (perhaps in a different sequence), the
number of variations of static SQL to handle all such combinations becomes
prohibitively large. Such business needs that must be highly flexible can
only be met by using dynamic SQL.
While this scenario is harmless in an environment with no dynamic SQL,
execution of RUNSTATS can trigger a change in the access path and cause a
degradation of performance in an environment containing dynamic SQL.
Program
name
Language
Purpose
and types of statements processed
Shipped
as
DSNTEP2
PL/I
All statements (SELECT and
non-SELECT)
Source and object module
DSNTIAD
Assembler
Processes all non-SELECT
statements
Source
DSNTIAUL
Assembler
Unload data. Also processes
all non-SELECT statements1
Source
SPUFI
PL/X
All statements (SELECT and
non-SELECT)
Load modules