Reprinted with Permission by Quest Software Oct. 2005


Putting the MySQL information_schema to Use

Roland Bouman  

The MySQL information schema database was introduced in the September issue of the newsletter. We've seen that the information schema allows us to access metadata in a much more flexible way than the MySQL specific SHOW commands. We owe this to the fact that the information schema allows us to query and format metadata using arbitrary SELECT statements. Let's explore this feature, and try to put that to good use.

In the introduction that appeared in the September issue of the newsletter, we stated that metadata provided by the information schema allows us to recreate the structure of any database schema. In the MySQL case, this is not particularly useful though. There are much more sophisticated, freely available tools that do just that, such as the Backup/Recovery functionality in MySQL Administrator, or the mysqldump utility. Also, MySQL supports the SHOW CREATE syntax, which can be used to generate complete DDL statements almost effortlessly. So, using the information schema in this way would be quite instructive, but otherwise rather useless.

Having said this, there's actually a use case that quite resembles recreation of database structure that's not quite covered by a built in language construct or a tool such as MySQL Administrator. We'll explore this use case as an illustration of the possibilities of the information schema.

The first half of this article describes the requirements for the history database, and a generic design to implement it. The second half describes the stepwise construction of code-generator that creates the SQL to construct and load the history database. The code-generator is driven by the information schema, so in this part, some features of the information schema are discussed in detail.

Use Case: A History Database

Sometimes, the need arises to maintain the history of changes of otherwise volatile data. The history of changes could be used for auditing purposes, or to feed a datawarehouse. Usually, a separate database is created to store the historic data: a history database. We will refer to the system that feeds the history database as the operational database.

Although one could modify the operational database to maintain history data, a separate database is usually created. Doing so minimizes the impact of collecting history data on the operational database and the applications that depend on it. Also, the storage requirements for a history database differ dramatically from those of the operational system. The same goes for backup/recovery management. So, keeping them separate allows for easier maintenance of both databases.

Typically, only INSERTs occur against a history database. Occasionally, the data in the history database may be removed to be archived or removed permanently. So, each time data is INSERTed, UPDATEd or DELETEd in the operational database, corresponding data should be automatically INSERTed into the history database, along with some extra history data. As for this extra data, a timestamp needs to be recorded at the very least, or else it wouldn't be a history database. For auditing purposes, we'll need some data that identifies the user that performed the action as well. If we plan on periodically removing data from our history database, it's probably a good idea to record whether the entry in the history database was caused by a INSERT, UPDATE, or DELETE in the operational database. This gives us a chance to discover whether a particular record history is closed, that is, whether a particular record has been deleted from the operational database. This could be quite useful when we want to clean the history database while maintaining history for 'active' records.

History for the Pers Database

Let's make all this tangible. Let's try and figure out how we could maintain change history for an arbitrary database schema.

(In the remainder of this article, we'll send some commands to the MySQL server. We did this using MySQL 5.0.13 version of the Server and the standard MySQL command line client tool, mysql.exe. You can use any client tool you like to do this, as long as it allows you to send arbitrary SQL commands to the server. The MySQL Server should have at least version 5.0.2 )

    -- setup the personnel database
    drop database if exists pers
    ;
    create schema pers
    ;
    use pers
    ;
    -- setup the employees table
    create table emps(
        emp_id   int            not null auto_increment primary key
    ,   emp_name varchar(30)
    ,   dept_id  int
    ,   salary   decimal(7,2)
    );    
    -- setup the departments table
    create table dept (
        dept_id     int         not null auto_increment primary key
    ,   description varchar(30)
    );

At a glance, it seems as if we don't need to do a whole lot of work to maintain history for this database. Of course, we would need to create the history database itself:

    -- setup the history database
    -- to mirror the personell database
    create schema history_pers
    ;
    use history_pers
    ;

Now, for each table in the operational system, we need an essentially similar table in the history database, with just a few extra column definitions. We will refer to these extra columns as the history columns. The history columns will store information regarding change events to individual records in the tables of the operational system.

By definition, we must have some column to record the date/time of the change event. Assuming want to be able to use our history database for auditing purposes, we should have a column to store the name of the current user as well. Also, we'll need to purge the history database now and then, and if we do so, we'd like to be able to remove only closed histories. This means that we need a column to store the action that occurred in the operational database: a INSERT, UPDATE or DELETE.

We must not recreate those features that make the records in the operational database unique: unique indexes, primary keys, unique constraints or auto_increment attributes. Doing so would prevent us from storing multiple versions of the operational database's equivalent version of 'the same record'. Still, we'd like to be able to identify a single record in our history database. So, we have to come up with a new primary key.

We could assume that the original primary key from the operational database combined with our timestamp are always unique. However, we can't really be sure. The smallest possible interval between any two MySQL timestamps is one second, and it's not hard to see that that is way too much time to assume iniquity. In a sense, the resolution of a timestamp can never be small enough for this purpose.

We'll employ a practical solution to tackle this problem: we add yet another extra column to the tables in the history database, which we declare to be AUTO_INCREMENT and PRIMARY KEY. An AUTO_INCREMENT gives us at least the chance to select records according to their insertion order in case the timestamp and 'natural' record key are not sufficient to identify a single record. Let's see how this would work out for the dept table:

-- set up a table to mirror the pers.dept table
    create table history_pers.dept (
        -- artificial primary key maintains insertion order
        history_auto_increment
            int unsigned 
            not null 
            auto_increment 
            primary key
        -- records date/time of change event occurring in pers.dept
    ,   history_timestamp                    
            timestamp    
            not null 
            default current_timestamp
        -- records the name of the user that initiates the change event 
    ,   history_user
            varchar(16)  
            not null
        -- records the nature of the change event 
    ,   history_action
            enum('DELETE','INSERT','UPDATE')
            not null    
        -- value of pers.dept.dept_id when change event occurred
    ,   dept_id                
            int          
            not null 
        -- value of pers.dept.description when change event occurred
    ,   description
            varchar(30)
    );

For each table in the operational system, we'll have to create such a corresponding table to store history. So, in case of the pers database, we have to write a statement exactly like this for the emps table as well.

Loading the history database

Once we've written equivalent statements for the other tables, we have to come up with a way of actually loading the history database with data. There's two ways to go about this. We could schedule the operating system to run a script overnight, recording only the last status in the operational database. Alternatively, we could add triggers to the tables of the operational system that will record the occurrences of every single INSERT, UPDATE or DELETE. We can even think of a combined solution that records only DELETEs using a trigger, but relies on the overnight script for UPDATEs and INSERTs.

All these solutions have their merits and their disadvantages. We won't dwell too long on this though. Instead, we'll focus on the trigger solution. This is a rather arbitrary choice, but hey, we're not so much trying to explain how one should create a history database as illustrating how one could use the information schema to do it.

As for the trigger solution: we need to record each INSERT, UPDATE or DELETE operation for each table in the operational system. That's three triggers for each table. These triggers would all perform very similar jobs: inserting the record as it exists in the operational system into the history database, providing values for the history columns where appropriate. Here's how this would look for the INSERT operation on the dept table in the pers database:

    delimiter $$

    create trigger air_dept
    after insert on pers.dept
    for each row 
    begin
        insert
        into    history_pers.dept (
                history_timestamp
        ,       history_user
        ,       history_action
        ,       dept_id
        ,       description
        ) values (
                CURRENT_TIMESTAMP
        ,       CURRENT_USER
        ,       'INSERT'
        ,       new.dept_id
        ,       new.description
        );
    end
    $$

It's not hard to see how this would work out for the other two triggers for this table. The trigger that handles the UPDATE operation is essentially a duplicate of this trigger. Obviously, it would be a AFTER UPDATE trigger instead of AFTER INSERT, and it would be called aur_dept instead of air_dept. Also, it would insert the value 'UPDATE' into the history_action column, and not 'INSERT'.

The trigger that handles the DELETE operation is only slightly different. Obviously, it would be a AFTER DELETE trigger and it would be called adr_dept. Of course, it would insert the value 'DELETE' into the history_action column. So far, these permutations are similar to the ones we made for the AFTER UPDATE trigger. The only real difference with both other triggers would be, that DELETE triggers cannot reference the NEW values; they can only reference the OLD values. Therefore, this trigger would insert OLD.dept_id and OLD.description rather than NEW.dept_id and OLD.dept_id.

Note that these three triggers perform essentially the same INSERT, only the values vary. It would be nice if we could create a single procedure for each table that would perform the insert for us. The triggers would then just call the procedure, passing it the appropriate values. (At the time of writing, calling a procedure from a trigger was not yet supported by MySQL. It is supported in MySQL version 5.0.12, but this paper does not explore this scenario any further).

So, do you feel it itching yet? Remember, we've only done one table so far. Lucky for us, the pers database only has two tables, with a little as 4 columns at the most. Imagine doing all this for a real-life database counting, say 30 to 60 or 100 tables, each having maybe 10, 20 or even more columns: not at all a nice thing to code for manually. We haven't even touched on the subject of recreating useful indexes or stored procedures to purge a particular history table! So, it's about time we find a solution that will write all this code for us. This is where we will put the information schema to our use.

Writing a Code-Generator

It has become quite clear that it's not really that hard to create a history database along these lines. It's bound to be a rather tedious, time consuming job though. It's a good thing that lot of code we've seen so far can be derived almost completely from the original table definitions. Having observed this, let's try to figure out if we actually can derive code from the table definitions.

One approach would be to issue the necessary SHOW CREATE SCHEMA and SHOW CREATE TABLE statements and then modify the results either by hand or with some text-macro oriented tool. This would not write the triggers for us, but it would give us a head start. However, there's another, more sophisticated approach. Because we can query the information schema using arbitrary SELECT statements, we can try and write one or several queries that SELECT a text expression that happens to be exactly the DDL statements we need. This is exactly what we will be doing the rest of this section.

For each type of statement that we want to generate, we will first outline how the statement that we want to generate should look like. In such an outline, we will mark varying bits of text by enclosing it between question marks. These bits will also be typeset in italics like so: ?varying bit of text?.

(Since the purpose of these statements is to generate other SQL statements, it would be nice if the query output was already exactly formatted without columns headings etc. When you are using the mysql command-line utility, you can achieve this by starting it with the following options: -N -B -r to skip column headings, set the batch mode and generate unescaped value output respectively. )

Generating CREATE SCHEMA

The first statement that needs to be executed to create the history database, is the CREATE SCHEMA command. Therefore we should likewise start off with a query that selects a string of text that corresponds exactly to a appropriate CREATE SCHEMA command. So, let's take a look in the MySQL manual and see how the CREATE SCHEMA command is defined. We find something equivalent to:

    CREATE SCHEMA 
    IF NOT EXISTS ?schema_name?
    DEFAULT CHARACTER SET ?character_set_name?
    DEFAULT COLLATE ?collation_name?
    ;

It certainly isn't hard to write a query that selects a string like this once we know what we should write instead of the varying bits of text. Let's pretend we have some actual variables that contain the appropriate values, and give it a go:

    SELECT   concat(
                 'CREATE SCHEMA',char(10)
             ,   'IF NOT EXISTS ',@schema_name,char(10)
             ,   'DEFAULT CHARACTER SET ',@character_set_name,char(10)
             ,   'DEFAULT COLLATE ',@character_set_name,char(10)
             ,   ';'
             ,   'USE ',@schema_name,char(10)
             ,   ';'
             );

Note that this would also generate a statement that makes the new schema the current database.

There's two things that might need some explaining: the CONCAT and the CHAR function.

The CONCAT function accepts a bunch of string arguments, and concatenates them to one big string. In MySQL, we favor CONCAT over the SQL Standard concatenation operator || because CONCAT works regardless of any particular setting of the sql_mode parameter.

The CHAR function accepts one or more integers. The integers are interpreted as character codes, and converted accordingly. It returns the concatenation of those characters as a string. We'll be using a lot of CHAR(10) calls as a argument to the CONCAT function in order to generate line breaks into the DDL statements that we generate.

Now, all we need to do is find out what sensible values we should use for the variables. The schema name is of course arbitrary, be we'd like it very much if we could somehow derive it from the name of the schema for which we are creating the history database. Doing this allows a cooperative database administrator to see at a glance that the operational database and it's history database are related. As for the character set name and the collation, we want those to match the exact default character set and collation of the operational database. The SCHEMATA table in the information schema contains all this information. So, assuming that the operational database is the current database, the following query would generate exactly the statements we need:

    SELECT   concat(
                 'CREATE SCHEMA',char(10)
             ,   'IF NOT EXISTS history_',s.schema_name,char(10)
             ,   'DEFAULT CHARACTER SET ',s.default_character_set_name,char(10)
             ,   'DEFAULT COLLATE ',s.default_collation_name,char(10)
             ,   ';',char(10)
             ,   'USE history_',s.schema_name,char(10)
             ,   ';',char(10)
             )
    FROM     information_schema.schemata s
    WHERE    s.schema_name = SCHEMA()
    ;

The SCHEMATA table in the information schema contains a row for each schema (database) residing in the mysql server, provided that the database is accessible to the current user. The values in the SCHEMA_NAME column of the SCHEMATA table are unique, so we can use this column to identify a single row. We assumed that the current database is the operational database for which we want to generate the history database. The SCHEMA() function is a synonym of the DATABASE() function. Both return the name of the current database. So, explicitly requiring that the SCHEMA_NAME column exactly equals the value returned by the SCHEMA() function selects only the row that describes the database for which we want to create a history database.

Generating CREATE TABLE

Let's proceed in this manner, and see if we can generate the statements to create the tables as well. According to the manual, the CREATE TABLE syntax is something equivalent to:

    CREATE TABLE ?table_name? (
        ?definition?
    )
    ENGINE=?engine?
    DEFAULT CHARACTER SET ?character_set_name?
    COLLATE ?collation_name?
    ;

Obviously, the ?table_name? bit is the name of the history table. As we are creating the corresponding tables in a separate database, we can use the original table names.

The ?storage_engine? bit specifies the MySQL storage engine. We would like to use the MyISAM storage engine where possible, mainly because of it's excellent performance, Also, we don't need any transaction support for the history database. ...or do we? If we plan to use the trigger solution, and the operational system's tables support transactions, than we really must support them too for our history tables. This is because any rollbacks that occur in the operational system should also rollback the triggered INSERT's into the history database. If we choose for the scheduled script solution, and assume that no transactions are pending during script execution, we can do without transaction support.

The ?character_set_name? bit specifies the default character set, and the ?collation_name? bit specifies the default collation. We want the character set and collation to match those of the corresponding table in the operational system. So, we want to recreate these options from our original tables.

The ?definitions? bit is essentially a list of columns. For the history tables, we want to recreate each column in the original table, including datatype, character set, collation, and not null constraint. We do not want to recreate the AUTO_INCREMENT attribute if present. Also, we want to add our list of history columns to this list. Syntactically, the ?definitions? bit could also include a primary key definition, one or more unique constraint definitions, one or more index definitions, and one or more foreign key definitions. For our history tables, we would be interested in recreating the indexes at some point. We might want to recreate the indexes underlying the primary key and any unique keys, but not as unique indexes though. We definitely do not want to recreate any foreign keys.

Again, we try to find an information schema table that would give us all this information. Taking a naive approach, we see that there is a table called TABLES. It contains a TABLE_SCHEMA column, which corresponds to the schema wherein the table resides. It also has a TABLE_NAME column, which corresponds to the ?table_name? bit. Other columns that we can match are the ENGINE and TABLE_COLLATION columns. Let's see how far this gets us:

    SELECT       concat(
                     'CREATE TABLE ',tables.table_name,'(',char(10)
                 ,   '    history_auto_increment',char(10)
                 ,   '        int unsigned',char(10)
                 ,   '        not null',char(10)
                 ,   '        auto_increment',char(10)
                 ,   '        primary key',char(10)
                 ,   ',   history_timestamp',char(10)
                 ,   '        timestamp',char(10)
                 ,   '        not null',char(10)
                 ,   ',   history_user',char(10)
                 ,   '        varchar(16)',char(10)
                 ,   '        not null',char(10)
                 ,   ',   history_operation',char(10)
                 ,   '        enum(''DELETE'',''INSERT'',''UPDATE'')',char(10)
                 ,   '        not null',char(10)
                 ,   ')',char(10)
                 ,   'ENGINE='
                 ,   case engine
                         when 'InnoDB' then 'InnoDB' 
                         else 'MyISAM' 
                     end,char(10)
                 ,   'COLLATE ',tables.table_collation,char(10)
                 ,   ';',char(10)
                 )
    FROM         information_schema.tables
    WHERE        tables.table_schema = SCHEMA()
    ;

This doesn't look bad, but some things are definitely wrong. First of all, if our operational database contains views, this statement will yield a CREATE TABLE statement for those as well. This is just the way the TABLES table is defined: it contains a row describing each table object accessible to the current user. In this context, a table is just about every schema object that can be used in a FROM list of a SELECT expression, including views and temporary tables. Also, We do generate a COLLATE option, but the DEFAULT CHARACTER SET option is missing. Therefore, this is not a valid statement according to MySQL. Foremost, We're not recreating any of the columns present in the original table, which is obviously not good at all.

As for preventing statements being generated for views, there's a quick fix for that. Upon inspection of the TABLES table, we see a TABLE_TYPE column which we can use to restrict the query to tables of the type 'BASE TABLE' only. As for the character set option, the TABLE_COLLATION column can be used to join the TABLES table to the COLLATIONS table, and the COLLATIONS table contains a CHARACTER_SET_NAME column. This pretty much solves the problem of generating the correct DEFAULT CHARACTER SET option. Ignoring the columns problem for now, our query will look like this:

    SELECT       concat(
                     'CREATE TABLE ',tables.table_name,'(',char(10)
                 ,   '    history_auto_increment',char(10)
                 ,   '        int unsigned',char(10)
                 ,   '        not null',char(10)
                 ,   '        auto_increment',char(10)
                 ,   '        primary key',char(10)
                 ,   ',   history_timestamp',char(10)
                 ,   '        timestamp',char(10)
                 ,   '        not null',char(10)
                 ,   ',   history_user',char(10)
                 ,   '        varchar(16)',char(10)
                 ,   '        not null',char(10)
                 ,   ',   history_operation',char(10)
                 ,   '        enum(''DELETE'',''INSERT'',''UPDATE'')',char(10)
                 ,   '        not null',char(10)
                 ,   ')',char(10)
                 ,   'ENGINE='
                 ,   case engine
                         when 'InnoDB' then 'InnoDB' 
                         else 'MyISAM' 
                     end,char(10)
                 ,   'DEFAULT CHARACTER SET ',collations.character_set_name,char(10)
                 ,   'COLLATE ',tables.table_collation,char(10)
                 ,   ';',char(10)
                 )
    FROM         information_schema.tables
    INNER JOIN   information_schema.collations
    ON           tables.table_collation = collations.collation_name
    WHERE        tables.table_schema    = SCHEMA()
    AND          tables.table_type      = 'BASE TABLE'
    ;

There's some explaining to do. No doubt about it: we can join the COLLATIONS table to the TABLES table. What's not obvious though is that we can do so safely using an INNER JOIN. The TABLE_COLLATION column is in fact nullable, so, in case there is a row in the TABLES table having a NULL 'value' for the TABLE_COLLATION column, that row would be removed from the result, unless we'd use a LEFT JOIN.

Well, formally, this is all very true. However, some inspection of the data in the TABLES table shows that under these circumstances, it is in fact perfectly safe to use an INNER JOIN. Although the TABLE_COLLATION column really does contain NULL values now and then, it's never NULL for the rows where the TABLE_TYPE column equals 'BASE TABLE'. If you really want to be sure, use the LEFT JOIN. You could even INNER JOIN tables to SCHEMATA and use the DEFAULT_CHARACTER_SET_NAME and DEFAULT_COLLATION columns in case the TABLE_COLLATION column really is NULL. In this example however we'll just assume the TABLE_COLLATION is in fact NOT NULL, provided we're requiring that TABLE_TYPE equals 'BASE TABLE'.

Generating the Columns List

Now, let's solve that columns problem. To recreate the column definitions, we dig into the manual again, and look up the exact syntax for specifying a column. We find something equivalent to this:

?column_name? ?column_type? ?character_set_name? ?collation_name? ?not null?

This needs almost no explanation. Obviously, the ?column_name? bit corresponds to name of the column. The ?column_type? bit is the datatype (and corresponding modifiers) of the values to be stored in the column. The ?character_set_name? and ?collation_name? pair apply to columns with some string datatype, and define the character set and collation for the column. The ?not null? bit specifies whether a column could contain NULL 'values'.

The majority of cooperative readers will have noticed the information schema's COLUMNS table by now. They're right in thinking that this table provides all the necessary information we need to recreate the column definitions. The ?column_name? corresponds to the COLUMN_NAME column from the COLUMNS table. The ?column_type? corresponds to the COLUMN_TYPE column from the COLUMNS table. This is a non-standard column that contains a concatenation of the data type name, precision and scale (where applicable) and any other data type related modifiers, such as UNSIGNED and ZEROFILL. Of course, the CHARACTER_SET_NAME and COLLATION_NAME columns match the ?character_set_name? and ?collation_name? items. The ?not null? corresponds to the IS_NULLABLE column from the COLUMNS table. The IS_NULLABLE column contains either the string 'YES' or 'NO', indicating whether the described column is nullable, or not nullable respectively.

We can see that it's not at all difficult to generate column definitions by querying the COLUMNS table:

    SELECT       concat(
                     columns.column_name,' '
                 ,   columns.column_type,' '
                 ,   case 
                         when columns.character_set_name is null then ''
                         else concat(
                             ' CHARACTER SET ',columns.character_set_name,
                         ,   ' COLLATE ',columns.collation_name
                         )
                     end
                 ,   case columns.is_nullable
                         when 'NO' then ' NOT NULL'
                         else ''
                     end
                 ,   char(10)
                 )
    FROM         information_schema.tables
    INNER JOIN   information_schema.columns
    ON           tables.table_schema    = columns.table_schema
    AND          tables.table_name      = columns.table_name
    WHERE        tables.table_schema    = SCHEMA()
    AND          tables.table_type      = 'BASE TABLE'
    ;

Now, when we want to combine this query with the previous one in order to generate complete CREATE TABLE statements, we run into a slight problem. Our previous query returned one row for each row in the TABLES table, whereas the latter query returned one row for each row in the COLUMNS table. Simple merging the CONCAT expression of both queries would have the effect of the leading 'CREATE TABLE ' bit being repeated for every column; the same goes for the trailing 'ENGINE=' bit. We'd like to rewrite the query so as to have the leading 'CREATE TABLE ' bit being generated just once per table, followed by a list of one column definition for each column in the table, and then again followed just once by the trailing 'ENGINE=' bit. In standard SQL, there's two ways to go about it.

We could create three separate queries: one for the leading bit, one for the list of column definitions, and one for the trailing bit. We could simply place these different queries in the right order, and execute them consecutively

-- The leading bit, once per table
    SELECT concat(
              ...
              ...
           )
    FROM   information_schema.tables
    WHERE  ...
    ;
    -- The middle bit, once per column in the table
    SELECT concat(
              ...
              ...
           )
    FROM   information_schema.columns
    WHERE  ...
    ;
    -- The trailing bit, once per table
    SELECT concat(
              ...
              ...
           )
    FROM   information_schema.tables
    WHERE  ...
    ;

The problem with this approach is that we can only do this for one table. For each table that we want to recreate, we'd have to repeat this sequence of statements. We could work around this by adding up the query results by using the UNION ALL set operator. This allows us to use just one query yielding all statements. In this case, we must force the right order using ORDER BY. The ORDER BY does the job of keeping the corresponding leading, middle and trailing bits together:

    SELECT  statement_text
    FROM   ( -- The leading bit, once per table
             SELECT table_name
             ,      1 as section
             ,      1 as line
             ,      concat(
                        ...
                        ...
                    )
             FROM   information_schema.tables
             ...
             UNION ALL
             -- The middle bit, once per column in the table
             SELECT table_name
             ,      2 as section
             ,      columns.ordinal_position as line
             ,      concat(
                        ...
                        ...
                    )
             FROM   information_schema.columns
             ...
             UNION ALL
             -- The trailing bit, once per table
             SELECT table_name
             ,      3 as section
             ,      1 as line
             ,      concat(
                        ...
                        ...
                    )
             FROM   information_schema.tables
             ...
    ) as statement_text
    ORDER BY table_name
    ,        section
    ,        line
    ;

The solution works fine, but does have some drawbacks. In particular, we need to repeat some of the JOINs and their conditions. Also, we need to pay close attention to the section and line columns. Because these are interspersed throughout the entire query, we could easily make a mistake there.

If we only would be able to spot the first and last rows from the COLUMNS table query! This would allow us to write a conditional expression to generate the leading and trailing bits accordingly. Well, we can in fact do this. It requires just a little bit of effort. For starters, our COLUMNS table contains an ORDINAL_POSITION column. This column contains an integer indicating the position of the column within the table. If we order the columns by this ordinal position then we know that the first column has an ordinal position equal to 1, so, we can write something like this:

    SELECT   concat(
                 -- The leading  bit
                 case columns.ordinal_position 
                     when 1 then 'CREATE TABLE ...'
                     else ''
                 end
                 -- The middle bit
             ,   ....
             )
    FROM     information_schema.columns 
    ...
    ORDER BY columns.table_name
    ,        columns.ordinal_position
    ;

Now, as for the last column, we can solve this by using MAX(ORDINAL_POSITION) or COUNT(ORDINAL_POSITION). Because we want to compare MAX(ORDINAL_POSITION) to the current ORDINAL_POSITION, we will need two instances of the COLUMNS table. One to retrieve the list of columns, and one to calculate MAX(ORDINAL_POSITION). It would look something like this:

    SELECT      concat(
                    -- The leading  bit
                   case columns.ordinal_position 
                       when 1 then 'CREATE TABLE ...'
                       else ''
                   end
                   -- The middle bit
               ,   ....
                   -- The trailing bit
               ,   case columns.ordinal_position 
                       when max(grouped_columns.ordinal_position) then ') ENGINE=...'
                       else ''
                   end
               )
    FROM       information_schema.columns columns
    INNER JOIN information_schema.columns grouped_columns
    ON         columns.table_schema = grouped_columns.table_schema
    AND        columns.table_name   = grouped_columns.table_name
    ...
    GROUP BY   columns.table_name
    ,          columns.ordinal_position
    ORDER BY   columns.table_name
    ,          columns.ordinal_position
    ;

This solution is fine. It does exactly what we want without having to repeat JOINs and conditions as was the case with the UNION ALL solutions. However, in MySQL, there's yet another, more elegant solution.

MySQL supports a wonderful feature that seems to be cut out exactly for this type of problem. This solution requires even less code, and allows for a even faster query. We're referring to the GROUP_CONCAT function here. Like MIN, MAX and COUNT, GROUP_CONCAT is an aggregate function. We can use GROUP_CONCAT to aggregate a group of strings into a separated list of values. The solution would look something like this:

    SELECT     concat(
                   -- The leading  bit
                   'CREATE TABLE ...'
                   -- The middle bit
               ,   GROUP_CONCAT(
                       ...
                       ORDER BY column.ordinal_position                       
                   )
                   -- The trailing bit
               ,   ') ENGINE=...'
               )
    FROM       information_schema.columns columns
    ...
    GROUP BY   columns.table_name
    ;

Note that this query is quite different from the previous one. Here, only one row is generated for each group of TABLE_NAME, instead of one row for each row in the COLUMNS table.

By default, GROUP_CONCAT() separates values by comma's, which is fine for now. We can override the default behavior be explicitly specifying the separator. When using GROUP_CONCAT(), one should know that a maximum length applies to the concatenated result. The maximum length is determined by the value of the group_concat_max_len server variable:

    SET @@group_concat_max_len = 32768;

This statement sets the maximum to 32k characters, which is probably enough for most purposes. Finally, we're in a position to present a convincing solution to generate the CREATE TABLE statements:

    SELECT       concat(
                     'CREATE TABLE ',tables.table_name,'(',char(10)
                 ,   '    history_auto_increment',char(10)
                 ,   '        int unsigned',char(10)
                 ,   '        not null',char(10)
                 ,   '        auto_increment',char(10)
                 ,   '        primary key',char(10)
                 ,   ',   history_timestamp',char(10)
                 ,   '        timestamp',char(10)
                 ,   '        not null',char(10)
                 ,   ',   history_user',char(10)
                 ,   '        varchar(16)',char(10)
                 ,   '        not null',char(10)
                 ,   ',   history_operation',char(10)
                 ,   '        enum(''DELETE'',''INSERT'',''UPDATE'')',char(10)
                 ,   '        not null',char(10)
                 ,   ','
                 ,   group_concat(
                         concat(
                             '   '
                         ,   columns.column_name,' '
                         ,   columns.column_type,' '
                         ,   case 
                                 when columns.character_set_name is null then ''
                                 else concat(
                                     ' CHARACTER SET ',columns.character_set_name
                                 ,   ' COLLATE ',columns.collation_name
                                 )
                             end
                         ,   case columns.is_nullable
                                 when 'NO' then ' NOT NULL'
                                 else ''
                             end
                         ,   char(10)
                         )
                         order by columns.ordinal_position
                         separator ','
                     )
                 ,   ')',char(10)
                 ,   'ENGINE='
                 ,   case tables.engine
                         when 'InnoDB' then 'InnoDB' 
                         else 'MyISAM' 
                     end,char(10)
                 ,   'DEFAULT CHARACTER SET ',collations.character_set_name,char(10)
                 ,   'COLLATE ',tables.table_collation,char(10)
                 ,   ';',char(10)
                 )
    FROM         information_schema.tables
    INNER JOIN   information_schema.columns
    ON           tables.table_schema    = columns.table_schema
    AND          tables.table_name      = columns.table_name
    INNER JOIN   information_schema.collations
    ON           tables.table_collation = collations.collation_name
    WHERE        tables.table_schema    = SCHEMA()
    AND          tables.table_type      = 'BASE TABLE'
    GROUP BY     tables.table_name
    ,            tables.engine
    ,            tables.table_collation
    ,            collations.character_set_name
    ;

So, it took some effort, but the bonus is that we can easily reuse this approach.

Generating the triggers

We already saw that all our triggers perform essentially the same INSERT statement. Generating those is a lot like generating lists of column definitions. It is even simpler than that, because for the INSERT statements, we only have to name the columns, not define them.

    select     concat(
                   'create trigger air_',tables.table_name,char(10)
               ,   'after insert on ',tables.table_schema,'.',tables.table_name,char(10)
               ,   'for each row ',char(10)
               ,   'begin',char(10)
               ,   '    insert',char(10)
               ,   '    into    history_',tables.table_schema,'.',tables.table_name,'(',char(10)
               ,   '            history_timestamp',char(10)
               ,   '    ,       history_user',char(10)
               ,   '    ,       history_action',char(10)
               ,   '    ,'
               ,   group_concat(
                       concat(
                           '       '
                       ,   columns.column_name
                       ,   char(10)
                       ,   '    '
                       )
                   )
               ,   ') values (',char(10)
               ,   '            CURRENT_TIMESTAMP',char(10)
               ,   '    ,       CURRENT_USER',char(10)
               ,   '    ,       ''INSERT''',char(10)
               ,   '    ,'
               ,   group_concat(
                       concat(
                           '       '
                       ,   'new.',columns.column_name
                       ,   char(10)
                       ,   '    '
                       )
                   )
               ,   ');',char(10)
               ,   'end',char(10)
               ,   '$$',char(10)
               )
    from       information_schema.tables       
    inner join information_schema.columns
    on         tables.table_schema        = columns.table_schema
    and        tables.table_name          = columns.table_name 
    where      tables.table_schema        = SCHEMA()
    and        tables.table_type          = 'BASE TABLE'
    group by   tables.table_schema
    ,          tables.table_name
    ;

Of course, this query must be rewritten twice more to generate the triggers that handle UPDATEs and DELETEs. We already discussed this when we first presented the trigger, and we won't repeat that here.

Conclusion

We learned that the information schema is an excellent toolkit for generate the code to create a database. We've used a couple of tables in the information schema in particular: SCHEMATA, TABLES, COLUMNS and COLLATIONS. We can extend the knowledge we now have to perform other generic tasks in a similar matter.

 
General info on the information schema from the MySQL reference
http://dev.mysql.com/doc/mysql/en/information-schema.html
MySQL grant tables
http://dev.mysql.com/doc/mysql/en/adding-users.html
MySQL Information Schema Forum
http://forums.mysql.com/list.php?101
A Diagram of the MySQL information schema
http://www.xcdsql.org/Misc/MySQL_INFORMATION_SCHEMA.html
Showing index selectivity
http://www.jpipes.com/article/mysql-5-stored-procedures-and-information_schema
Fuzzy foreign key matching
http://leithal.cool-tools.co.uk/?p=8