Vyoms OneStopTesting.com - Testing EBooks, Tutorials, Articles, Jobs, Training Institutes etc.
OneStopGate.com - Gate EBooks, Tutorials, Articles, FAQs, Jobs, Training Institutes etc.
OneStopMBA.com - MBA EBooks, Tutorials, Articles, FAQs, Jobs, Training Institutes etc.
OneStopIAS.com - IAS EBooks, Tutorials, Articles, FAQs, Jobs, Training Institutes etc.
OneStopSAP.com - SAP EBooks, Tutorials, Articles, FAQs, Jobs, Training Institutes etc.
OneStopGRE.com - of GRE EBooks, Tutorials, Articles, FAQs, Jobs, Training Institutes etc.
Bookmark and Share Rss Feeds

Managing Data Quality | Articles | Recent Articles | News Article | Interesting Articles | Technology Articles | Articles On Education | Articles On Corporate | Company Articles | College Articles | Articles on Recession
Sponsored Ads
Hot Jobs
Fresher Jobs
Experienced Jobs
Government Jobs
Walkin Jobs
Placement Section
Company Profiles
Interview Questions
Placement Papers
Resources @ VYOMS
Companies In India
Consultants In India
Colleges In India
Exams In India
Latest Results
Notifications In India
Call Centers In India
Training Institutes In India
Job Communities In India
Courses In India
Jobs by Keyskills
Jobs by Functional Areas
Learn @ VYOMS
GATE Preparation
GRE Preparation
GMAT Preparation
IAS Preparation
SAP Preparation
Testing Preparation
MBA Preparation
News @ VYOMS
Freshers News
Job Articles
Latest News
India News Network
Interview Ebook
Get 30,000+ Interview Questions & Answers in an eBook.
Interview Success Kit - Get Success in Job Interviews
  • 30,000+ Interview Questions
  • Most Questions Answered
  • 5 FREE Bonuses
  • Free Upgrades

VYOMS TOP EMPLOYERS

Wipro Technologies
Tata Consultancy Services
Accenture
IBM
Satyam
Genpact
Cognizant Technologies

Home » Articles » Managing Data Quality

Managing Data Quality








Article Posted On Date : Wednesday, March 21, 2012


Managing Data Quality
Advertisements

Oracle Warehouse Builder 10g Release 2 handles the truth.

How much confidence do you have in the quality of your data? How about the data your business partners are feeding into your data warehouse? As reported in a November 2005 Computerworld article, "Dirty Data Blights the Bottom Line," Gartner estimates that more than 25 percent of critical data in Fortune 1000 companies will remain flawed through 2007 ( computerworld.com/printhis/2005/0,4814,105928,00.html ).

It may be impossible to tell the good data from the bad, because both are reported identically through the same interface. And if you can't tell which data is good and which is bad, all of the data is suspect.

So, how can data quality be improved in the organization?

1. Involve users. Data quality is ultimately a business problem, so people in the business must be involved. People frequently enter the data being used, so they are the first line of defense. People are also the final consumers in most cases and provide the last line of defense.

2. Monitor processes. Bad data actually might have been accurate at one time but has since decayed. For example, prospect lists get outdated. The more outdated the information, the more time and money is wasted trying to sell goods or services to the wrong people. Business processes can ensure timely and accurate updates to data. Streamlining processes where possible can reduce the number of hands touching data, thereby reducing the chances of manual data corruption.

3. Use Oracle Warehouse Builder. In addition to offering database design and extract, transform, and load (ETL) features, it includes the ability to profile, cleanse, and audit data, based on data rules. This technology provides an umbrella over the data warehouse, using predefined rules to catch critical mistakes before they make their way into the decision-making process.

This article demonstrates how Oracle Warehouse Builder 10g Release 2 can profile datasources, determine data rules, and generate corrections. To run the examples in this article, install and configure Oracle Warehouse Builder 10g Release 2 and create a repository by using the Repository Assistant. See Oracle Warehouse Builder Installation and Configuration Guide for Microsoft Windows and UNIX , for setup instructions.

Introducing Oracle Warehouse Builder

Oracle Warehouse Builder is a design tool for modeling and maintaining a data warehouse. Its datasources are not limited to Oracle databases. Oracle Warehouse Builder also supports other databases, including DB2, SQL Server, Sybase, Informix, and Teradata. It also works with file and enterprise application datasources, so it supports the complete collection of data for the organization.

Oracle Warehouse Builder provides profiling and auditing capabilities through its user interface, allowing for complete data quality management when combined with the previously mentioned user participation and process improvements. Historically, profiling and cleansing data required good knowledge of SQL and PL/SQL, but the Oracle Warehouse Builder integrated profiler and Data Correction wizard make this task relatively easy. Auditing capabilities apply consistent data rules to the incoming data stream, making the cleansing of new data a very simple process.

Oracle Warehouse Builder core components are now bundled with Oracle Database 10g Release 2 at no additional charge.

Oracle Warehouse Builder Setup

To see profiling and data cleansing for yourself, run the LOAD_DATA.sql script, available for download at oracle.com/technology/oramag/oracle/06-nov/o66owb.zip , in the same database as your Oracle Warehouse Builder repository owner. The script creates users named STUDENTS and STUDENT_TARGET and creates a table called STUDENT_TESTS, with the following definition:

Name          Null?     Type
------        --------- --------------            
TEST_ID       NOT NULL  NUMBER(10)
TEST_NAME     NOT NULL  VARCHAR2(30)
CONTENT_AREA  NOT NULL  VARCHAR2(10)
GRADE         NOT NULL  VARCHAR2(2)
STATE         NOT NULL  VARCHAR2(2)
SCHOOL_ID     NOT NULL  VARCHAR2(15)
FIRST_NAME    NOT NULL  VARCHAR2(30)
MIDDLE_NAME             VARCHAR2(30)
LAST_NAME     NOT NULL  VARCHAR2(30)
ETHNICITY               VARCHAR2(1)
NEW_TO_SCHOOL           VARCHAR2(1)
AGE_TESTED    NOT NULL  VARCHAR2(3)
GENDER        NOT NULL  VARCHAR2(1)
TOTAL_SCORE   NOT NULL  NUMBER(3)
PERFORMANCE   NOT NULL  NUMBER(1)



The LOAD_DATA.sql script then populates this table with sample test results for use in the steps in this article.

Next, create a repository user as follows:

1. To start the repository assistant, select Start -> All Programs -> OWB Home -> Warehouse Builder -> Administration -> Repository Assistant .
2. Select Advanced Setup and click Next .
3. Click Next .
4. Select Manage Warehouse Builder repository users and click Next .
5. Select Create the registration of one or more Warehouse Builder repository users and click Next .
6. Select your repository owner, supply the password, and click Next .
7. Select the STUDENTS and STUDENT_TARGET users and click Next .
8. Supply the passwords for STUDENTS (which is STUDENTS by default) and STUDENT_TARGET (which is STUDENT_TARGET); click Next ; and click Finish .

With user, data, and repository user configuration complete, you can import the data.

Project Definition

The Oracle Warehouse Builder Design Center organizes metadata by project. Nodes are listed under each project and include items such as databases, data profiles, and data rules.

When you open the Design Center and are logged in as the repository owner, you will see a seed project called MY_PROJECT. You can either use this project or create your own. Perform the following steps to create your own project:

1. Right-click in the Project Explorer and click New .
2. Enter STUDENT_PROJECT as the new project name.
3. Expand the STUDENT_PROJECT project.
4. Expand the Databases node.
5. Right-click Oracle and click New .
6. Enter STUDENT_SOURCE as the module name, click Data Source , and click Next .
7. Select Edit next to the default.

    Provide all connection information, including the STUDENTS / STUDENTS user name and password.
    Make sure you select the correct database version.
    Click Test Connect to verify your connection information.



8. Click Finish .

At this point, the datasource connection is established but you must still select the objects in the datasource. The Import Metadata wizard opens automatically and prompts you for filter information. Complete the wizard as follows:

1. Uncheck everything except Table and click Next .
2. Expand Tables and select STUDENT_TESTS for import.
3. Click Finish .

The datasource is now defined. Next, create the target:

1. Right-click Oracle in the Databases node and click New .
2. Provide STUDENT_TARGET as the module name, click Warehouse Target , and click Next .
3. Select Edit next to the default.

    Provide all connection information, including the STUDENT_TARGET / STUDENT_TARGET user name and password.
    Make sure you select the correct database version.
    Click Test Connect to verify your connection information.

4. Click Finish .

The data is ready for profiling.

Profiling Datasources

Profiling identifies data attributes such as data structure, common formats, common values, and minimum/maximum values. From these findings, Oracle Warehouse Builder determines domain values, or values it believes should be allowed in that column. Not all findings are absolute, however. If your data includes the abbreviations for only half of the states in the United States, for example, only those 25 states will be shown as existing values. You must enter the other valid values before using the values to establish a rule.

To see this in action, do the following:

1. Under the STUDENT_PROJECT project in the Design Center, right-click Data Profiles , and select New .
2. Enter STUDENT_PROFILE for the name and click Next .
3. Select the STUDENT_TESTS table and click Finish .

The Data Profile Editor opens, but the data has not been profiled yet. The Profile Objects window (top left) shows the STUDENT_TESTS table being analyzed. Below that is the Property Inspector window. Modifying the properties in this window modifies the way the profiler analyzes the data.

To start profiling, navigate to Profile on the navigation menu at the top of the Data Profile Editor and click the Profile option. If you have never created a profile before, you will be prompted to create a schema to hold profile data. Follow the wizard's instructions, and continue to create the profile. When you're finished, Oracle Warehouse Builder submits a job that does the work, so even though it returns to the main window, it is working behind the scenes. You can monitor the status of the job in the Monitor Panel at the bottom left of the Data Profile Editor.

When the job completes, you are asked whether you want to retrieve the results now. Click Yes , and the profile results appear in the Profile Results Canvas . Click the Domain tab, and you will see the results shown in Figure 1.

If there is more than one of any value in the source data, it will appear in the Found Domain column. Also note the percentage displayed in the % Compliant column. This reflects the percentage of the source data that complies with the identified domains. CONTENT_AREA , for example, shows READING and MATH . Click the value in the Found Domain column, and examine the Data Drill Panel in the region beneath it to see how these values were derived, as shown in Figure 1.

SCIENCE and WRITING are in the table, but they occur only once, so they are not included in the Found Domain column by default. You'll also notice a value of WRITEING (misspelled) in the list. Click this value, and the source row is shown to the right.

Defining Data Rules

Now that the profiling is finished, it is time to create some data rules. With the value in the Found Domain column and the CONTENT_AREA row still selected, click the Derive Data Rule button to launch the wizard. Follow these steps:

1. Enter CONTENT_AREA_CHECK as the name for the data rule.
2. On the Define Rule page, choose SCIENCE as a selected value.
3. Type WRITING as a fourth value and click Finish .

The rule is displayed with a check box beside it in the Data Rule Panel , as shown in Figure 2. If additional rules are defined in the profile, they are listed here as well.

Data Correction Wizard

The data rule can be used to create an automated correction strategy. To create a correction, use the Data Correction wizard as follows:

1. Select Profile -> Create Correction from the menu at the top of the Data Profile Editor.
2. Choose Select an existing module , choose STUDENT_TARGET , and click Next .
3. Verify that the STUDENT_TESTS table is selected and click Next .
4. Verify that CONTENT_AREA_CHECK is selected as a data rule and click Next .
5. Click Next .
6. Choose Cleanse as the action and Soundex Match as the cleanse strategy, as shown in Figure 3, and click Next .
7. Complete the wizard by accepting the defaults on all remaining screens and click Finish .

The wizard generates the mappings and functions that are required to perform the correction. To see the mappings, tables, and functions created by the wizard, click the Corrected Modules tab, as shown in Figure 4. These mappings show how Oracle Warehouse Builder processes the data through temporary and staging tables to complete the correction.

Double-click the M_STUDENT_TESTS mapping to open the Mappings Editor. The tables may appear one behind the other in the window. To see how the mappings are laid out, separate the tables and organize them as shown in Figure 5.

figure 1
Figure 1: Domain tab in the Profile Results Canvas
figure 2
Figure 2: Data Rule Panel
figure 3
Figure 3: Cleansing with the Data Correction wizard
figure 4
Figure 4: Corrected Modules tab
figure 5
Figure 5: Viewing tables in the Mappings Editor


To see the function created by the Data Correction wizard, return to the Data Profile Editor and click the SDX_CONTENT_AREA function. Click the Implementation tab, and the function in Listing 1 is displayed.

Code Listing 1: SDX_CONTENT_AREA function

type t_arr is table of varchar2(2000) index by binary_integer;
s t_arr;
begin
  s(0):= 'MATH';  s(1):= 'READING';  s(2):= 'SCIENCE';  s(3):= 'WRITING';

  -- check for equality first. Function may be called when there is no error.
  for i in s.first..s.last loop
    if (s(i) = CONTENT_AREA) then
      return CONTENT_AREA;
    end if;
  end loop;

  for i in s.first..s.last loop
    if (soundex(s(i)) = soundex(CONTENT_AREA)) then
      return s(i);
    end if;
  end loop;
  return CONTENT_AREA;
end;



Next Steps




Conclusion

Oracle Warehouse Builder's Data Profile Editor simplifies rule definition. Simply specify the source and target, and all of your data is analyzed. Domain values are derived from existing data, providing a great starting point for rule definition. After you create rules, Oracle Warehouse Builder's Data Correction wizard makes generating corrections easy. Specify the rules that should be applied as well as the action to take when a rule is violated, and all mappings, tables, and functions are generated automatically. 






Sponsored Ads



Interview Questions
HR Interview Questions
Testing Interview Questions
SAP Interview Questions
Business Intelligence Interview Questions
Call Center Interview Questions

Databases

Clipper Interview Questions
DBA Interview Questions
Firebird Interview Questions
Hierarchical Interview Questions
Informix Interview Questions
Microsoft Access Interview Questions
MS SqlServer Interview Questions
MYSQL Interview Questions
Network Interview Questions
Object Relational Interview Questions
PL/SQL Interview Questions
PostgreSQL Interview Questions
Progress Interview Questions
Relational Interview Questions
SQL Interview Questions
SQL Server Interview Questions
Stored Procedures Interview Questions
Sybase Interview Questions
Teradata Interview Questions

Microsof Technologies

.Net Database Interview Questions
.Net Deployement Interview Questions
ADO.NET Interview Questions
ADO.NET 2.0 Interview Questions
Architecture Interview Questions
ASP Interview Questions
ASP.NET Interview Questions
ASP.NET 2.0 Interview Questions
C# Interview Questions
Csharp Interview Questions
DataGrid Interview Questions
DotNet Interview Questions
Microsoft Basics Interview Questions
Microsoft.NET Interview Questions
Microsoft.NET 2.0 Interview Questions
Share Point Interview Questions
Silverlight Interview Questions
VB.NET Interview Questions
VC++ Interview Questions
Visual Basic Interview Questions

Java / J2EE

Applet Interview Questions
Core Java Interview Questions
Eclipse Interview Questions
EJB Interview Questions
Hibernate Interview Questions
J2ME Interview Questions
J2SE Interview Questions
Java Interview Questions
Java Beans Interview Questions
Java Patterns Interview Questions
Java Security Interview Questions
Java Swing Interview Questions
JBOSS Interview Questions
JDBC Interview Questions
JMS Interview Questions
JSF Interview Questions
JSP Interview Questions
RMI Interview Questions
Servlet Interview Questions
Socket Programming Interview Questions
Springs Interview Questions
Struts Interview Questions
Web Sphere Interview Questions

Programming Languages

C Interview Questions
C++ Interview Questions
CGI Interview Questions
Delphi Interview Questions
Fortran Interview Questions
ILU Interview Questions
LISP Interview Questions
Pascal Interview Questions
Perl Interview Questions
PHP Interview Questions
Ruby Interview Questions
Signature Interview Questions
UML Interview Questions
VBA Interview Questions
Windows Interview Questions
Mainframe Interview Questions


Copyright © 2001-2024 Vyoms.com. All Rights Reserved. Home | About Us | Advertise With Vyoms.com | Jobs | Contact Us | Feedback | Link to Us | Privacy Policy | Terms & Conditions
Placement Papers | Get Your Free Website | IAS Preparation | C++ Interview Questions | C Interview Questions | Report a Bug | Romantic Shayari | CAT 2024

Fresher Jobs | Experienced Jobs | Government Jobs | Walkin Jobs | Company Profiles | Interview Questions | Placement Papers | Companies In India | Consultants In India | Colleges In India | Exams In India | Latest Results | Notifications In India | Call Centers In India | Training Institutes In India | Job Communities In India | Courses In India | Jobs by Keyskills | Jobs by Functional Areas

Testing Articles | Testing Books | Testing Certifications | Testing FAQs | Testing Downloads | Testing Interview Questions | Testing Jobs | Testing Training Institutes

Gate Articles | Gate Books | Gate Colleges | Gate Downloads | Gate Faqs | Gate Jobs | Gate News | Gate Sample Papers | Gate Training Institutes

MBA Articles | MBA Books | MBA Case Studies | MBA Business Schools | MBA Current Affairs | MBA Downloads | MBA Events | MBA Notifications | MBA FAQs | MBA Jobs
MBA Job Consultants | MBA News | MBA Results | MBA Courses | MBA Sample Papers | MBA Interview Questions | MBA Training Institutes

GRE Articles | GRE Books | GRE Colleges | GRE Downloads | GRE Events | GRE FAQs | GRE News | GRE Training Institutes | GRE Sample Papers

IAS Articles | IAS Books | IAS Current Affairs | IAS Downloads | IAS Events | IAS FAQs | IAS News | IAS Notifications | IAS UPSC Jobs | IAS Previous Question Papers
IAS Results | IAS Sample Papers | IAS Interview Questions | IAS Training Institutes | IAS Toppers Interview

SAP Articles | SAP Books | SAP Certifications | SAP Companies | SAP Study Materials | SAP Events | SAP FAQs | SAP Jobs | SAP Job Consultants
SAP Links | SAP News | SAP Sample Papers | SAP Interview Questions | SAP Training Institutes |


Copyright ©2001-2024 Vyoms.com, All Rights Reserved.
Disclaimer: VYOMS.com has taken all reasonable steps to ensure that information on this site is authentic. Applicants are advised to research bonafides of advertisers independently. VYOMS.com shall not have any responsibility in this regard.