|

Reprinted with Permission by Quest
Software Aug. 2006
|
Ten Tips for a Successful Oracle Warehouse Builder Project
Mark Rittman, www.rittman.net
I worked for a client the other week that asked me to come up with some tips
and best practices for Oracle Warehouse Builder. Without giving the game away
(otherwise you'd never pay to hire me) here's ten tips for working with Oracle
Warehouse Builder 10g.
- Don't skimp on the design and analysis phase. Just because Oracle
Warehouse Builder is a graphical tool used for designing data structures and
mappings, it doesn't mean that you can skip the boring bit at the start where
you do all the analysis and design. You still need to understand your source
data, and you still need to specify how the data maps and how you deal with
the transformations and cleansing. This is especially important when you
consider that some customers use OWB as an opportunity to put the data
migration work into the hands of less experienced developers, who won't
necessarily be aware of design best practices and who won't be aware of all
the nuances of the source data. Every project I've ever worked on has said
that they wished they'd been able to spend even more time on analysis and
design, so don't think that OWB means you can miss this stage out.
- One the same subject, for each mapping produce a specification and test
plan on to the developer so that they know, item by item, what bit of data
maps on to where, and how to test whether their mapping is working ok. It
doesn't matter where you do this, it can be on a piece of paper, in a Word
document, in a spreadsheet or in a PowerPoint slide.
- Make sure from day one that developer machines are sufficiently powerful.
You need at least 1GB of RAM and a 1GHz CPU, and ideally your Design
Repository database will be on a server with locally attached disks within
your department, not on a SAN that's being shared with your E-Business Suite
implementation. Oracle Warehouse Builder is effectively an OLTP application
that stores each change to the warehouse model as a transaction against the
repository tables, and this application like any other OLTP application
suffers when the disks it's using are suffering contention.
- Use the Flashback technologies in Oracle 9i and 10g to implement mapping
and process flow "transactions", so that your mapping or process flow stores
the SCN (System Change Number) at the start of the mapping, and then rolls
either the tables within the mapping, or the whole database, back using
FLASHBACK TABLE or FLASHBACK DATABASE if the mapping or process flow fails.
What we do is use FLASHBACK table to roll back the tables within a process
flow if it fails, and use FLASHBACK DATABASE if we want to roll back the
entire ETL process. It's certainly quicker than performing a point-in-time
recovery if the whole ETL process goes belly-up and it gives us the ability to
pull together a number of mappings and processes into a single atomic package
of work which we can reverse out if need be.
- Use collections to give yourself the ability to create daily ETL releases.
Each day, collect together into a collection all of those mappings and other
objects that have changed and been "checked out" by the developers, and use
this to export those objects into the "Daily Build" environment...
- Create a "Daily Build" environment which contains the ETL process as at
the end of the previous day, and which includes all the changed items in the
collection you created yesterday. Then, using an automated deployment process
built using OMB*Plus scripts, deploy all of the objects and mappings and run a
"smoke test" to check that the build hasn't been broken. If it has, get it
fixed, and if it hasn't, you know that you can create an ETL release if you
need to. Thanks to Jon Mead and Donna Kelly for the last two tips.
- Record and review the run times of your various mappings and process
flows. Once a week, list these out in order of run time, longest at the top,
and in addition identify those whose run times have varied the most over time.
Use the ALL_RT_AUDIT_EXECUTIONS and ALL_RT_AUDIT_MAP_RUNS runtime repository
views to obtain your run times, and bring in additional statistics from
statspack and the Unix server running the database to add to your diagnostic
data.
- When tuning a mapping, look for the simple answers first. Assuming that
you've designed your mapping properly in the first place, the reason it's
running slowly is probably because an index isn't getting used, or you need to
increase the size of HASH_AREA_SIZE, or because you're joining too many tables
together. Only when you've tried the obvious try techniques such as tracing,
the danger with trying the complicated first is that you get drowned in
diagnostic data and miss the obvious solution.
- Keep mappings simple, do one thing at a time. The danger with loading
multiple tables, or loading one table after another after another, is that
you'll never be able to work out what's gone wrong if the test figures come
out wrong, and you'll have the devil of a job tuning the mapping if you've got
multiple levels of INSERT INTO .. SELECT in your mapping. Keep It Simple is
the watchword here.
- If you're starting out on a big project, you've got lots of data to
integrate and aggressive deadlines, hire someone who's done it before to give
you a hand before it all gets out of control. OWB only goes so far when it
comes to a data warehouse project, and the real skill comes in apply software
development best practices to what you're doing, and knowing when to use
Warehouse Builder and when to step outside of the product and build your own
framework. Better to spend a few days upfront than learn all the mistakes the
hard way.
Thanks to Jon and Donna for providing input and ideas along the way.
Mark Rittman is a consultant working with Oracle's BI & Data
Warehousing technologies, based in Brighton, UK. Mark runs a blog at
http://www.rittman.net and is a regular speaker at user group events in the
UK and USA.