Case Study Part 2 Assignment

The assignment for part 2 of the case study extends your learning experience from units 6 and 7 with new Pentaho features and transformation concepts. You will use most of the Pentaho steps and transformation concepts from assignment 4. You will use new steps for adding constants and updating tables. You will extend transformation processing to capture rejected change data in an error log (Table Output step), use a one-sided outer join to delay rejection of non-matching rows, use the Oracle pseudocolumn RowNum in SELECT statements, and use output steps (Insert/Update and Update) to check referential integrity rather than a Merge Join step. In addition, you will work with a larger amount of change data.

As preparation for this assignment, you should review the guided tutorial about Pentaho Data Integration for Oracle and your graded assignment in unit 7. You should also read the case study documents about data integration background (unit 13) and data warehouse design (unit 12). Since this assignment only uses a subset of the data warehouse, you do not need to have detailed understanding of the unused parts of the data warehouse design.

Since the requirements for this assignment are detailed, you should plan to read the assignment and related background documents carefully. You may want to develop the transformation incrementally for each major requirement. You should use the test data warehouse tables before you apply your transformation to the production data warehouse tables.

Validation Requirements
Your Pentaho transformation should perform five types of validation. Change rows should be rejected for each violation of a validation rule.
- Null values: reject a change row if any column except the primary key (Change_Id) contains a null value. You do not need to perform a null value check on the primary key column.

- Foreign key references: reject invalid foreign key references for location (Location_Id), customer (Customer_Id_Ordered_By), sales agent (Sales_Agent_Id), and sales class (Sales_Class_Id).

- Date references: reject invalid dates in date columns (Contract_Date, Date_Promised, Date_Shipped_By). The dates are stored as text in the job change data table with a format of yyyy-mm-dd. The combination of month, day, and year should be a valid date that exists in the W_Time_D table. Note that invalid date and foreign key references will be captured in the same error log step as explained in the section about using the Insert/Update step.

- Business day differences: reject a job change row if differences between dates do not satisfy difference constraints. Since this constraint involves differences in business days, the difference is computed by subtracting row numbers associated with Time_Id values in the W_Time_D table. Note that you cannot just subtract the Time_Id values because of the business day requirements. The Additional Tips section provides background about retrieving row numbers.

o The row number of the Date_Promised must be between 14 and 30 days after the row number of the Contract_Date.
o The row number of the Date_Shipped_By must be 2 to 7 days before the row number of the Date_Promised.
o You will need a Calculator step to compute the difference in days after retrieving row number values associated with each date column in the job change data table.
o Note that the time dimension table contains a custom calendar with only working days; weekends are excluded. Holidays that fall on weekdays are included in the time dimension table, however. Note that you cannot use a date difference function to calculate the difference in days because the constraint involves differences in working days, not calendar days. A brief SQL sketch of row-number retrieval appears after this list.

- Lead validation: The Lead_Id in the job change data table must exist in the W_Lead_F table. In addition, the Created_Date in the W_Lead_F table should be chronologically before the Contract_Date in the job change data table. This validation checking occurs after inserting a row into the W_Job_F table. The other validation rules are checked either before or during the insertion into the W_Job_F table.
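The following minimal sketch (not part of the assignment materials) illustrates the row-number idea behind the business day validation; the alias BusDayNum is an illustrative name.

    -- Assign a business-day row number to each calendar row.
    -- RowNum numbers the rows of the ordered inline view, so consecutive
    -- business days in W_Time_D receive consecutive numbers.
    SELECT Time_Id, RowNum AS BusDayNum
    FROM (SELECT Time_Id
          FROM W_Time_D
          ORDER BY Time_Id);

Because weekend rows do not exist in W_Time_D, subtracting the BusDayNum values retrieved for two dates yields their difference in business days.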

Inserting into the Job Fact Table and Updating the Lead Fact Table

You should use the Insert/Update step to insert a row into the W_Job_F table. You should use an Oracle sequence object to generate the primary key of the W_Job_F table using an Add Sequence step before the Insert/Update step. An item in the Additional Tips section explains the usage of the Insert/Update step.
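If a suitable sequence does not already exist in your schema, one can be created as follows; the name W_Job_F_Seq is an assumption for illustration, not a name mandated by the assignment.

    -- Hypothetical sequence for generating W_Job_F primary key values.
    CREATE SEQUENCE W_Job_F_Seq
        START WITH 1
        INCREMENT BY 1;

In the Add Sequence step, select the option to retrieve the value from a database sequence and supply the connection and sequence name.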

You should use the Insert/Update step to check foreign key errors for location (Location_Id), customer (Customer_Id_Ordered_By), sales agent (Sales_Agent_Id), and sales class (Sales_Class_Id), as well as the Time_Id values associated with the date columns. Since the Merge Join step does not provide error flows, you cannot use it to check foreign key errors: it rejects non-matching rows but does not allow them to be sent to another step for error processing.

To update the W_Lead_F table, you should use the Update step, not the Insert/Update step. Although you did not use the Update step in unit 6, it works much like the Insert/Update step. You should update the W_Lead_F table after inserting a row into the W_Job_F table. If the validation in the Update step succeeds, you need to update the Job_Id and Success columns of the W_Lead_F table. The Job_Id should be assigned the generated Job_Id in the stream. The Success column should be set to "Y".
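Conceptually, the Update step issues the equivalent of the following statement for each stream row; the lookup key Lead_Id and the bind variable names are assumptions for illustration.

    -- Equivalent SQL effect of the Update step on W_Lead_F.
    -- :JobId holds the generated Job_Id from the stream; :LeadId identifies
    -- the lead row to update.
    UPDATE W_Lead_F
    SET    Job_Id  = :JobId,
           Success = 'Y'
    WHERE  Lead_Id = :LeadId;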

Rejected rows will not update the W_Lead_F table. However, after inserting a row into the W_Job_F table, a validation error may occur when updating the W_Lead_F table. There is no way to avoid this anomalous situation in which a row is inserted into the job fact table, but the associated lead fact table row is not updated.

Error Log Insertion

If a row in the job change data table fails a validation rule, a row must be inserted into the error log table. For all validation errors, the error log row should contain the generated Log_Id, the Change_Id of the rejected row, and an error message (Note column) indicating the reason for rejection. For errors related to updating the W_Lead_F table, the error log row should also contain the Job_Id of the related W_Job_F row. The Log_Id value should be generated by an Oracle sequence and available in the stream. Table 1 summarizes error handling using the error log.

Table 1: Error Handling using the W_Error_Log Table

Type of Error                         | Columns                 | Comments
--------------------------------------|-------------------------|-------------------------------------------------------------------------
Null value errors                     | Change_Id, Note         | Log_Id generated using a sequence. Job_Id is null in the error log row.
Date difference errors                | Change_Id, Note         | Log_Id generated using a sequence. Job_Id is null in the error log row.
Foreign key errors including date FKs | Change_Id, Note         | Log_Id generated using a sequence. Job_Id is null in the error log row.
Lead update errors                    | Change_Id, Job_Id, Note | Log_Id generated using a sequence. Job_Id should come from the stream.
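For reference, each error log insertion corresponds to SQL such as the sketch below; the sequence name W_Error_Log_Seq and the message text are assumptions. In the transformation itself, the Log_Id typically arrives in the stream from an Add Sequence step rather than from NEXTVAL in the statement.

    -- Error log row for a rejected change row; Job_Id stays null except
    -- for lead update errors, where it comes from the stream.
    INSERT INTO W_Error_Log (Log_Id, Change_Id, Job_Id, Note)
    VALUES (W_Error_Log_Seq.NEXTVAL, :ChangeId, NULL,
            'Invalid foreign key reference');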

You can send a rejected row to another step from the Filter Rows, Insert/Update, and Update steps. The Merge Join step does not have a feature for rejecting rows. In a Filter Rows step, you should route rows that evaluate to false to another step that inserts rows into the W_Error_Log table. In the Insert/Update and Update steps, you specify error handling using the Define Error Handling item (right-click on the step and select this item near the bottom of the menu).

To facilitate error messages in the error log table, you should add an "Add constants" step near the beginning of the transformation. You need to assign values to constants for the error messages and for the Success column in the W_Lead_F table. You need to specify the data type for each constant field in the step. You should create a separate field in the "Add constants" step for each type of validation error: invalid foreign key references, invalid date differences, null values, and lead table errors.

Case Study Part 3 Assignment

The assignment for part 3 of the case study extends your learning experience from units 8 and 9. You will be challenged to develop SQL statements for major business intelligence needs. Some of the statements will be more complex than you encountered in unit 9 due to the larger schema size and relationships among fact tables.

As preparation for this assignment, you should review material about the GROUP BY clause as well as analytic functions. You need to sharpen your skills for query formulation involving row summaries as a foundation for this assignment. You should also review concepts and examples involving analytic functions in unit 9.

For this assignment, you should understand the business intelligence needs and relate them to the data warehouse tables. For each need, you should read the associated document providing background about analytical query formulation. After you are clear about the database representation for a problem, you should begin writing SQL statements. To structure your query formulations, first create basic statements without the analytic function elements. After your basic statements execute correctly, you can revise them to add the analytic function elements.

Base Queries
The base queries involve two broad areas: revenues/costs and quality control. To facilitate reuse of base queries with analytic function requirements, CREATE VIEW statements are required for some of the base queries.

To calculate the difference in business days in the base queries for contractual delays (BQ5 and BQ6), you should use the getBusDaysDiff function as shown in Appendix A. You can use this function to calculate the difference in business days between two time identifier values. After compiling this function, you can use it in a SELECT statement just as you use Oracle built-in functions. Without this function, calculating the difference in business days involves much more complex SQL coding.
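As an illustration only, a call could look like the sketch below; the fact table columns used as arguments are assumptions, since the function's exact signature appears in Appendix A.

    -- Hypothetical usage: business days between contract and promised dates.
    SELECT Job_Id,
           getBusDaysDiff(Contract_Date_Id, Date_Promised_Id) AS PromisedDelay
    FROM   W_Job_F;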

To assist with formulation of analytic queries, you should create views (not materialized views) for the queries involving revenue summary, cost summary, date promised delays, and shipped by date delays. You can use these views just like base tables in queries for analytic functions.
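A skeleton for one such view appears below; the view name, measure column, and grouping are placeholders rather than the assignment's required formulation.

    -- Skeleton only: wrap a base summary query in a view for reuse.
    CREATE VIEW Revenue_Summary_V AS
        SELECT Location_Id,
               SUM(Job_Amount) AS TotalRevenue  -- measure name is an assumption
        FROM W_Job_F
        GROUP BY Location_Id;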

Queries involving Analytic Functions

The base queries can be used in many queries involving analytic functions. This assignment involves a small subset of possible analytic queries using the base queries. In some analytic queries, you should extend base queries with analytic functions. In other analytic queries, you should use a view in the FROM clause instead of directly extending base queries.

Analytic queries involving customer revenue trends

You should write two analytic queries extending the base query for job revenue trends as summarized in Table 2. Both analytic queries involve window comparisons.
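For orientation, a window comparison over a revenue view could take the following shape; the view and column names are placeholders, not the required query.

    -- Compare each period's revenue to the prior period for the same customer.
    SELECT Customer_Id, SalesYear, TotalRevenue,
           TotalRevenue - LAG(TotalRevenue)
               OVER (PARTITION BY Customer_Id ORDER BY SalesYear) AS RevenueChange
    FROM   Customer_Revenue_V;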
