Advertisements
Learn how to go beyond the wizards to build dimensional objects.
Oracle Warehouse Builder 11g Release 2 enables the entire spectrum of operations on dimensional objects, from defining them in a visual environment and then deploying them to either a relational schema or an analytical workspace in an Oracle database, to populating them with data, based on the logic you’ve defined in a mapping. Even a complex object, such as a cube, can be built within a few minutes with the help of a wizard, mostly with a few clicks of the mouse. Later, you can always use a respective object editor to edit that wizard-built object according to your requirements.
Although the wizards available in Oracle Warehouse Builder can do a lot of tedious work for you, the needs of your particular project may require you to roll up your sleeves and do some manual work, including coding. For example, you may need to specify the PL/SQL code that is part of a table function performing transformations or producing a set of source rows on the fly.
This article presents a few common tasks in Oracle Warehouse Builder that go beyond what you can accomplish with the wizards. In particular, it describes how you might add time series analysis at hour, day, month, and year levels to a data warehouse for analyzing Web traffic data. The article extends the example originally presented in "Cube Development for Beginners", which describes how to build a data warehouse designed to analyze the outgoing traffic of a Web site. To work through the examples here, you will need to complete the steps from that article as a starting point.
Organizing & Reorganizing Data with Dimensions
If you recall, a Time dimension that is created using the Warehouse Builder's Time Dimension wizard allows you to perform time series analysis at the year, quarter, month, and day levels. However, you may need to perform time analysis not only at the year-to-day levels but also down to the hour, minute and even second. For example, adding the hour level may sound reasonable if your data warehouse concentrates on traffic statistics, just as in the example in "Cube Development for Beginners". To achieve this, you might try to modify a wizard-built time dimension. That would not be easy, though. Instead, we will create another time dimension containing only the hour level data. The following steps describe how to do that:
In the Projects Navigator, expand the MY_PROJECT->Databases->Oracle->TARGET_MDL node and right-click the Dimensions node. In the popup menu select New Dimension.
On the Name and Description screen of the Create Dimension master, specify the name for the dimension: CLOCK_DM.
On the Storage Type screen, select ROLAP: Relational storage.
On the Dimension Attributes screen, modify the Length property for the NAME attribute to 2.
On the Levels screen, specify the only level: HOUR.
After the Create Dimension wizard is complete, the CLOCK_DM object should appear under the MY_PROJECT->Databases->Oracle->TARGET_MDL->Dimensions node in the Projects Navigator. Now double-click this object to open it in the Dimension Editor.
In the Dimension Editor, move on to the Hierarchies tab and delete the STANDARD hierarchy.
Finally, deploy the CLOCK_DM dimension as well as the objects created along with it by Warehouse Builder implicitly. Thus, you have to deploy the CLOCK_DM_TAB table, the CLOCK_DM_SEQ sequence, and then the CLOCK_DM dimension.
The next step is to load the newly created dimension with data. As you might guess, the CLOCK_DM dimension discussed here is supposed to contain only 24 members (one for each hour in the day).
As usual, loading a dimension with data can be done with a mapping in which you define the data flow and transformations from the source to the target. So, the first thing you need to do is define a source. In this case, unlike the mappings in the previous article, we do not need an external data source to know the 24 hours of the day. We will instead generate the dimension members for the CLOCK_DM dimension on the fly: 24 two-char values, each of which represents an hour of the 24-hour clock.
When we think of some nonpersistent data to be generated on the fly by the database, selecting from the DUAL table usually comes to mind. If you recall, however, the DUAL table contains only one row. So, to generate 24 values, you might want to put it in a loop. This might be accomplished with a table function that will generate and return those 24 values as a collection of rows; actually, as a varray. The following steps describe how you might define that varray and then the table function:
In the Projects Navigator, expand node MY_PROJECT->Databases->Oracle->TARGET_MDL->User Defined Types and then right-click the Varrays node. In the popup menu, select New Varray.
In the Create Varray dialog specify the name, say, HOURS_VAR, and then click OK. As a result, the Varray editor should appear, in which move on to the Details tab and change the settings as follows:
Data Type: VARCHAR2
Length: 2
Size: 24
Now deploy the newly created Varray. For that, in the Projects Navigator, right-click node MY_PROJECT->Databases->Oracle->TARGET_MDL->User Defined Types->HOURS_VAR, and select Deploy… in the popup menu.
You are ready now to create the table function. In the Projects Navigator, expand node MY_PROJECT->Databases->Oracle->TARGET_MDL->Transformations and right-click Table Functions. In the popup menu, select New Table Function. As a result, the first screen of the Create Table Function wizard should appear.
On the Name and Description screen, provide a name for the table function being created; for example, HOURS_GEN.
On the Return Type screen, select the HOURS_VAR varray created and deployed earlier. This should be found under the Collections->Private->TARGET_MDL->Varrays node.
On the Implementation screen, click the Code Editor button, and modify the table function implementation as follows:
--initialize variables here
s VARCHAR2(2);
-- main body
BEGIN
NULL;
FOR i IN 0..23 LOOP
--Fetch rows from the input cursor.
--Process the rows.
--Assign the row to the output_buffer record
-- Return rows with piperow statement.
SELECT TRIM(LEADING ' ' FROM TO_CHAR('00' + i, '09')) INTO s FROM DUAL;
PIPE ROW (s);
END LOOP
RETURN;
EXCEPTION
WHEN OTHERS THEN
NULL; -- enter any exception code here
RETURN;
END;
After the Create Table Function wizard has been completed, in the Projects Navigator, right-click the newly created table function, which should appear under the node MY_PROJECT->Databases->Oracle->TARGET_MDL->Transformations->Table Functions, and select Deploy… in the popup menu.
Now that you have the table function HOURS_GEN created and deployed, you can move on and design the mapping that will load the CLOCK_DM dimension. Here are the steps to follow:
In the Projects Navigator, expand the MY_PROJECT->Databases->Oracle->TARGET_MDL node and right-click Mappings. In the popup menu, select New Mapping to launch the Create Mapping dialog. In this dialog, specify the mapping name; for example, CLOCK_MAP, and click OK. As a result, the Mapping Editor canvas should appear.
From the Projects Navigator, select the HOURS_GEN table function object and drag it onto the Mapping canvas. Do the same operation to the CLOCK_DM dimension object.
In the mapping canvas, connect the VALUE attribute of the HOURS_GEN operator to the NAME attribute of the CLOCK_DM operator. Then, connect the VALUE attribute of the HOURS_GEN operator to the DESCRIPTION attribute of the CLOCK_DM operator.
Now you can deploy the mapping. In the Project Navigator, right-click the CLOCK_MAP object under the MY_PROJECT->Databases->Oracle->TARGET_MDL->Mapping node and select Deploy… from the popup menu.
After a successful deployment, you can execute the mapping, which should populate the CLOCK_DM dimension with data. To do this, right-click the CLOCK_MAP object and select Start… from the popup menu.
Altering Cube Definitions
Now that you have the CLOCK_DM dimension created and populated, how can you include it to the Traffic cube? Oracle Warehouse Builder provides you the Cube Editor to handle the tasks like this. Moreover, you’ll need to alter the underlying fact table too, doing it in the Table Editor. The following steps describe how you might do it:
In the Projects Navigator, expand the MY_PROJECT->Databases->Oracle->TARGET_MDL->Cubes node and double-click the TRAFFIC object. This should open the object in the Cube Editor.
In the Cube Editor, move on to the Dimensions tab, and add the CLOCK_DM dimension to the list of the dimensions already chosen. Then, chose the level for the CLOCK_DM dimension: HOUR.
The next step is to alter the traffic_tab underlying table. In the Project Navigator, double-click the TRAFFIC_TAB object under the MY_PROJECT->Databases->Oracle->TARGET_MDL->Tables node, to open this object in the Table Editor.
In the Table Editor, move on to the Columns tab, and add column CLOCK_DM of type NUMBER, setting Precision and Scale to 0.
In the Table Editor, move on to the Keys tab, on which click the Add Constraint button. Add the TRAFFIC_CLOCK_DM_FK key, defining it as of type Foreign Key. Next, click the Reference field and then the ellipsis button at the right of this field to invoke the Key Selector dialog. In this dialog, select CLOCK_DM_T_DIMENSION_KEY_PK under the CLOCK_DM_TAB node, and click OK. Having returned to the Keys tab of the Table Editor, click the Local Columns field to select CLOCK_DM.
In the Table Editor, move on to the Indexes tab, on which click the Add Index button. As a result a new index should appear in the list. Make sure the Key Columns’ value for the newly created index is set to CLOCK_DM. Also, set the Type field to Bitmap.
In the Project Navigator, right-click the TRAFFIC_TAB object, and select Deploy... in the popup menu. If you see a massage that the deployment has been aborted due to a problem with generating a valid upgrade plan, you may need to execute the grant_upgrade_privileges.sql script that can be found at the OWB_ORACLE_HOME/owb/rtp/sql folder. You can run this script from within an SQL*Plus session, being connected as sysdba and specifying the target schema as the parameter (it should be owbtarget in this particular example, if you followed the names suggested here).
Return to the Cube Editor and move on to the Physical Bindings tab. In the mapping canvas, connect the CLOCK_DM attribute of the TRAFFIC operator to the CLOCK_DM attribute of the TRAFFIC_TAB operator.
In the Project Navigator, right-click the TRAFFIC object, and select Deploy... in the popup menu.
Now that you have the CLOCK_DM dimension included to the cube structure, you still have to re-populate the cube, based on this changes. If you recall, the ETL logic for the traffic cube has been defined in the TRAFFIC_MAP mapping. We will alter this mapping as required.
In the Projects Navigator, expand the MY_PROJECT->Databases->Oracle->TARGET_MDL->Mappings node and double-click TRAFFIC_MAP to see its objects on the mapping canvas.
From the Component Palette, drop an Expression operator onto the mapping canvas.
In the mapping canvas, draw a line from the DATETIME output attribute of the JOINER operator to the INGRP1 group of the EXPRESSION operator.
In the mapping canvas, double-click the header of the EXPRESSION operator and move on to the Output Attributes tab of the Expression Editor dialog, where define attribute HOUR of type VARCHAR with length 2 and specify the following expression: TO_CHAR(INGRP1.DATETIME, ‘HH24’).
In the mapping canvas, find the EXPRESSION operator located between the JOINER and AGGREGATOR operators and double-click the header of that EXPRESSION operator to open the Aggregator Editor dialog. Then, move on to the Input Attributes tab of the dialog, in which define the HOUR attribute of type VARCHAR with the length of 2. Next, move on to the Output Attributes tab, in which define the HOUR attribute of the same type and specify the expression: INGRP1.HOUR.
In the mapping canvas, draw a line from the HOUR output attribute of the EXPRESSION operator to the HOUR input attribute of the AGGREGATOR operator.
In the mapping canvas, find the TRAFFIC operator bound to the cube. Right-click the header of that operator and select Synchronize in the popup menu. In the Synchronize dialog, click OK to start synchronization.
The next step is to deploy the modified mapping. For that, in the Project Navigator, right-click the TRAFFIC_MAP object and select Deploy…
The final step is to execute the mapping. In the Project Navigator, right-click the TRAFFIC_MAP object and select Start…
As a result of the above steps, you should have the Traffic cube altered to include the CLOCK_DM dimension, and reloaded with the updated dataset.
Getting to Answers with SQL
Having a warehouse is a nice thing, but how can you take advantage of all those dimensional objects created and populated with data? After all, the final goal is to let you answer business questions. As with most objects stored in the database, the primary access tool for dimensional objects is SQL. Oracle Database offers advanced capabilities to convert business queries into SQL deriving answers from a data warehouse. Moreover, Oracle Database's query optimizer applies powerful optimization techniques when dealing with join queries issued against a cube’s fact table and the dimension tables joined to it. Creating such SQL queries can be significantly simplified with the GROUP BY's extensions: CUBE, ROLLUP, GROUPING, and GROUPING SETS.
Let's take a look at one example. Suppose you need to know the traffic value generated within each hour of day for several days on end in order to determine the heavy loaded hours. The following query demonstrates how you might answer that:
SELECT g.region_name region
, c.name hour
, TO_CHAR(SUM(tr.out_traffic), '999G999G990') traffic
FROM time_dm_tab t
, clock_dm_tab c
, traffic_tab tr
, geography_dm_tab g
WHERE t.day_start_date = tr.time_dm
AND c.id = tr.clock_dm
AND g.country_id = tr.geography_dm
AND tr.time_dm BETWEEN TO_DATE('01-JAN-2011','dd-MON-yyyy')
AND TO_DATE('11-JAN-2011','dd-MON-yyyy')
GROUP BY GROUPING SETS(g.region_name, c.name)
ORDER BY c.name;
The above query uses the GROUP BY's extension GROUPING SETS so that you can see only the summary information generated at the hour level, along with the traffic value totals within the specified time frame, per region. The results might look like this:
REGION HO TRAFFIC
------------------------- --- --------------------
00 32,096,531
01 512,096
02 48,032
03 512,096
04 48,032
05 28,457
06 48,032
07 560,128
08 48,032
09 76,489
10 560,128
11 48,032
12 48,032
13 48,032
14 28,457
15 48,032
16 1,024,192
17 48,032
18 28,457
19 144,096
20 32,560,595
21 96,064
22 56,914
23 48,032
EU 34,546,935
AS 924,809
NA 33,293,276
27 rows selected.
Conclusion
In this article, you looked at some examples of how to go beyond the Warehouse Builder wizards to define your own time dimension and corresponding ETL logic. In particular, you looked at how to alter the wizard-built dimensionality of a cube. Finally, you looked at an example of how to convert a business query into SQL, aggregating over multiple dimensions of data in a single SQL query.