BI Decision

Tuesday, July 18, 2006

Tips for folks switching between DB2 and Oracle

Some Oracle and DB2 differences that are hard to find in the manuals:

Generic 1 row table for fast queries:
Oracle: dual
DB2: sysibm.sysdummy1

Current date and time for the server:
Oracle: select to_char(sysdate, 'yyyy-mm-dd hh24:mi:ss') from dual;
DB2: select current_time, current_date from sysibm.sysdummy1;

Delete a whole table, fast:
Oracle: truncate table x
DB2: load an empty file into the table

Monday, July 17, 2006

Ab Initio's Web Site

Ab Initio has an interesting corporate style. From what I've seen and heard, they make a heck of an ETL tool. But they are surprisingly private. Here is their website: Ab Initio

Notice the lack of the traditional "about us" tab that every other private software company has. No information on board, investors, or management team. No brag list of revenue growth or customers. No public forums or downloadable demos.

Intriguing.

Saturday, July 15, 2006

Agile development and BI

Interesting conversation going on at Slashdot about the rise of rapid prototyping tools and the increasing emphasis on fast delivery.
If somebody came to you and said "hey, I've got this great new way to build a bridge! Instead of making up plans, we'll just start building it! We'll build it out of popsicle sticks first, and then we'll go in and add some steel beams, and toss some pavement on top of that," you'd say they were insane. Nobody does stuff like that in the real world -- yet that's exactly what a lot of poorly-managed 'agile' software projects are doing. They're getting short-term prototyping gains but at the cost of maintainability and probably stability as well.
Slashdot is mostly focused on web development tools, but the same ideas apply in the Business I ntelligence world. Tools like Ab Initio, Informatica, and Ascential (IBM Websphere Information Integration? IBM still has their touch for catchy naming) enable stunningly fast development.

Fast development tempts us to try to do the rest of the project fast as well. Faster is better, except when it comes at the cost of building to the wrong requirements or writing low quality code.

Wednesday, July 12, 2006

Inserting instead of updating

Why don't they teach this in school?

During prototyping or early development, I often want to add a column to a table that already has data in it.

The direct approach is alter table, update column.

Once the table gets bigger than a few hundred thousand rows, the update approach gets slow.

The faster approach is to create a new_table as select * from old_table, drop the old_table, and rename the new_table to the old name.

Why is that faster? Over to Tom Kyte.

Tuesday, July 11, 2006

Nullable fields in fact tables

Why should fact table fields not be nullable?

Three big reasons:
  1. Null key fields can’t have descriptions from a domain table attached (without special processing in the report/query)
  2. Nulls can cause wrong group by/drilling results.
  3. Using natural keys gives the optimizer more information to choose faster query paths

Kimball has thoughts on cases where fact table fields should be nullable in his latest design tip.

Friday, July 07, 2006

Kimball's Design Tips

Ralph Kimball publishes a periodic design tip. I recommend the email list.

Space is cheap, but every byte in a fact row counts

Replacing natural fields with synthetic integer keys reduces the row length. This can really speed up queries that scan significant numbers of rows.

For example, on a table I recently worked on, switching from natural fields to integer keys reduced the average row size from 122 bytes to 50 bytes. Switching the integer keys to smallints further reduced the row length to 32 . On a 200 Million row table, that reduced the core data storage from 24.4 GB to 6.4 GB (a 74% reduction).

Some quick testing on our prototyping box, a DB2 on XP machine, showed a pretty dramatic performance difference in full table scans. A table scan on the large table took 18 minutes, on the small table, 8 minutes.