Write the SQL necessary to extract and process the data


Overview

A complete data warehouse with a snowflake schema should be designed to address key management objectives in a common business situation, e.g., the hotel industry. Two operational databases for that domain are evaluated, and a series of SQL procedures is written to extract, transform and load data from the two operational databases into the data warehouse. The SQL needed to address management's key objectives using the fact and dimension tables of the data warehouse is created, along with a description of the results suitable for presentation to management.

Assignment Resources

1) ER diagrams for two operational databases for the business situation to review:

Corp1ERD-Revised-Ayyaz.jpg, Corp2ERD-Revised-Ayyaz.jpg

2) Data dictionaries for two operational databases for the business situation to review:

Corp1Data Dictionary.xls, Corp2Data Dictionary Revised.xls

3) Slides from the DW project Web Ex:

DataWarehouseProjectOverview.pptx

4) IBM Website Description of Fact Table Grain:

https://www.ibm.com/support/knowledgecenter/SS9UM9_9.1.0/com.ibm.datatools.dimensional.ui.doc/topics/c_dm_design_cycle_2_idgrain.html

Business Situation Description:

You work for a large corporation that has just purchased two hotel and resort corporations, each consisting of over 100 hotels. Each corporation operates a custom database. You are provided the data dictionary and ER diagram for each of the two operational databases. Management would like you to design a data warehouse that allows them to achieve the greatest possible competitive advantage.

(Note: The databases you will evaluate come from student groups in another class responding to the Hokie Resort problem. This problem also is provided for your review.)

Please name the document "Leslie_Alexandra _Project2.docx."

1 Review the Hokie Resort situation description to understand the problem domain (see Business Situation Description above).

2 Write the three most important questions that management must answer to achieve a competitive advantage in the hotel market.
Submit the questions and a brief (1-2 pages) explanation/justification of why these are the most important questions.

3 Design a Data Warehouse Star or Snowflake schema that is sufficient for addressing these questions.
Submit an ER Model of your data warehouse schema.
See this example of a schema from the Hokie Hospital problem: HospitalDWSchema.vsdx
Example as pdf: Visio-HospitalDWSchemav2.pdf

4 Implement your Data Warehouse Star or Snowflake schema in MySQL or other DBMS.
Submit MySQL or other DDL that implements your data warehouse schema.
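As a rough illustration of what step 4 asks for, the sketch below creates a small star schema. It uses Python's built-in sqlite3 module only so that the DDL can be run without a server; in MySQL the same CREATE TABLE statements would be submitted as a .sql script. Every table and column name here (TimeDim, HotelDim, RoomTypeDim, BookingFact and their columns) is an assumption made for illustration and is not taken from the Corp1 or Corp2 data dictionaries.

```python
import sqlite3

# Hypothetical star schema: all table and column names are assumptions,
# not names from the Corp1/Corp2 data dictionaries.
DDL = """
CREATE TABLE TimeDim (
    time_key   INTEGER PRIMARY KEY,   -- surrogate key such as 20240131
    full_date  TEXT NOT NULL UNIQUE,
    year       INTEGER NOT NULL,
    quarter    INTEGER NOT NULL,
    month      INTEGER NOT NULL
);
CREATE TABLE HotelDim (
    hotel_key    INTEGER PRIMARY KEY,
    source_corp  TEXT NOT NULL,       -- 'Corp1' or 'Corp2'
    source_id    TEXT NOT NULL,       -- hotel id in the operational DB
    hotel_name   TEXT NOT NULL,
    region       TEXT NOT NULL,
    UNIQUE (source_corp, source_id)
);
CREATE TABLE RoomTypeDim (
    room_type_key INTEGER PRIMARY KEY,
    room_type     TEXT NOT NULL UNIQUE
);
-- Assumed grain: one fact row per day, hotel and room type.
CREATE TABLE BookingFact (
    time_key      INTEGER NOT NULL REFERENCES TimeDim(time_key),
    hotel_key     INTEGER NOT NULL REFERENCES HotelDim(hotel_key),
    room_type_key INTEGER NOT NULL REFERENCES RoomTypeDim(room_type_key),
    nights_sold   INTEGER NOT NULL,
    revenue       REAL NOT NULL,
    PRIMARY KEY (time_key, hotel_key, room_type_key)
);
"""


def create_schema(conn):
    """Run the DDL script against an open connection."""
    conn.executescript(DDL)


conn = sqlite3.connect(":memory:")
create_schema(conn)
tables = [row[0] for row in conn.execute(
    "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name")]
print(tables)
```

The composite primary key on the fact table states the grain explicitly, which is the point the IBM resource on fact table grain addresses.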

5 Populate MySQL or other DB with sufficient rows to demonstrate your Data Warehouse

This will require you to create rows for all dimension (including time) and fact tables in your Data Warehouse. You must insert sufficient rows in the DW to be able to execute the ROLLUP queries for the final step in this assignment.
This script is from the textbook and loads a sample data warehouse: DW-DBINIT.SQL
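One possible way to create the time dimension rows is to generate one row per calendar day over the period covered by the facts. The sketch below does this with Python's built-in sqlite3 module; the TimeDim layout, the yyyymmdd surrogate key and the populate_time_dim helper are assumptions for illustration, and the textbook script DW-DBINIT.SQL may take a different approach.

```python
import sqlite3
from datetime import date, timedelta


def populate_time_dim(conn, start, end):
    """Insert one TimeDim row per calendar day from start to end inclusive."""
    rows = []
    day = start
    while day <= end:
        rows.append((
            day.year * 10000 + day.month * 100 + day.day,  # key like 20240131
            day.isoformat(),
            day.year,
            (day.month - 1) // 3 + 1,                      # quarter 1..4
            day.month,
        ))
        day += timedelta(days=1)
    conn.executemany("INSERT INTO TimeDim VALUES (?, ?, ?, ?, ?)", rows)
    return len(rows)


conn = sqlite3.connect(":memory:")
# Hypothetical TimeDim layout, assumed for this sketch only.
conn.execute("""
    CREATE TABLE TimeDim (
        time_key  INTEGER PRIMARY KEY,
        full_date TEXT NOT NULL UNIQUE,
        year      INTEGER NOT NULL,
        quarter   INTEGER NOT NULL,
        month     INTEGER NOT NULL
    )""")
inserted = populate_time_dim(conn, date(2024, 1, 1), date(2024, 12, 31))
print(inserted)
```

Loading a full year gives the ROLLUP queries in the final step enough months and quarters to produce meaningful subtotals.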

6 Analyze the ER diagram and data dictionary from both of the operational databases to determine if the two operational hotel databases have the data needed for your data warehouse.

For each DB, create a mapping that shows the tables from that DB that are used to create rows in your data warehouse tables. For each data warehouse table, describe how the operational data is aggregated to create a row in the table. Submit your mapping and aggregation summary in the following format.

Data Warehouse Table   Operational DB Table     Aggregation/Sum
PatientDim             Corp1: Patient           No aggregation; each row is an instance in the DW.
PatientDim             Corp2: Customer          No aggregation; each row is an instance in the DW.
MedicationsFact        Corp1: Prescriptions     Count and average amount of drug given are created.
MedicationsFact        Corp2: Drugs Provided    Count and average amount of drug given are created.

(Note: If an operational database does not contain the data needed for your data warehouse design, then propose revisions to the existing tables in the DB or define additional tables to be populated in the DB so that it will contain the data needed for your data warehouse). (5 points)

7 Write the SQL necessary to extract and process the data from the two operational databases so that it will be suitable for the elements of your data warehouse.

The mapping you made in step 6 should help in this process. This requires writing SQL procedures that SELECT from the operational DBs and INSERT INTO the data warehouse tables (Note: you do not have to make the procedures work). You should extract and load data for two of your dimension tables and one of your fact tables from each operational DB. In addition, you should show the population of the time dimension and include its key on your fact table rows. Pay attention to the correct grouping and aggregation necessary to transform the operational data into the form needed for your data warehouse. An example of the procedure file is provided from a previous semester using the Hokie Hospital problem in this file: . Your task is to make a similar procedure that will extract the data from the operational databases into your data warehouse design.

Submit your procedures with a .sql file extension so that we can review them within Notepad++. While any logical organization of procedures is acceptable (provide a brief justification/rationale for others), the preferred approach is to create one SQL procedure for each dimension table and one for the fact table from your data warehouse. These procedures should access both operational DBs and be as simple as possible.

Find an example of the procedures needed here:
Dimension table: Q5_SampleProcedure_Hokie_Hospital_DimensionV2.sql
Fact table: Q5_SampleProcedure_Hokie_Hospital_FactV2.sql
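The extract-and-load pattern described in step 7 can be sketched as follows. This is not the Hokie Hospital sample procedure; it is a minimal illustration, run through Python's built-in sqlite3 module, of INSERT INTO ... SELECT statements that read both operational DBs, with one function per warehouse table. The operational tables (corp1.Hotel, corp1.Reservation, corp2.Property, corp2.Booking), the warehouse tables and all sample rows are hypothetical. In MySQL each statement would sit inside a CREATE PROCEDURE body and refer to the two operational schemas by name.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Two attached in-memory databases stand in for the operational schemas.
conn.execute("ATTACH DATABASE ':memory:' AS corp1")
conn.execute("ATTACH DATABASE ':memory:' AS corp2")
conn.executescript("""
-- Hypothetical operational tables and sample rows (not from the real ERDs).
CREATE TABLE corp1.Hotel (hotel_id INTEGER PRIMARY KEY, name TEXT, region TEXT);
CREATE TABLE corp1.Reservation (res_id INTEGER PRIMARY KEY, hotel_id INTEGER,
                                stay_date TEXT, nights INTEGER, amount REAL);
CREATE TABLE corp2.Property (prop_code TEXT PRIMARY KEY, title TEXT, area TEXT);
CREATE TABLE corp2.Booking (booking_no INTEGER PRIMARY KEY, prop_code TEXT,
                            checkin TEXT, night_count INTEGER, total_charge REAL);
INSERT INTO corp1.Hotel VALUES (1, 'Hokie Inn', 'East');
INSERT INTO corp1.Reservation VALUES
    (10, 1, '2024-01-05', 2, 300.0),
    (11, 1, '2024-01-05', 1, 150.0),
    (12, 1, '2024-01-06', 3, 420.0);
INSERT INTO corp2.Property VALUES ('P7', 'Resort Seven', 'West');
INSERT INTO corp2.Booking VALUES
    (500, 'P7', '2024-01-05', 4, 800.0),
    (501, 'P7', '2024-01-05', 1, 200.0);

-- Hypothetical warehouse tables; fact grain is one row per hotel per day.
CREATE TABLE TimeDim (time_key INTEGER PRIMARY KEY, full_date TEXT UNIQUE);
CREATE TABLE HotelDim (hotel_key INTEGER PRIMARY KEY AUTOINCREMENT,
                       source_corp TEXT, source_id TEXT,
                       hotel_name TEXT, region TEXT,
                       UNIQUE (source_corp, source_id));
CREATE TABLE BookingFact (time_key INTEGER, hotel_key INTEGER,
                          booking_count INTEGER, nights_sold INTEGER,
                          revenue REAL, PRIMARY KEY (time_key, hotel_key));
""")


def load_hotel_dim(conn):
    """Dimension load: no aggregation, one row per operational hotel."""
    conn.execute("""
        INSERT INTO HotelDim (source_corp, source_id, hotel_name, region)
        SELECT 'Corp1', CAST(hotel_id AS TEXT), name, region FROM corp1.Hotel
        UNION ALL
        SELECT 'Corp2', prop_code, title, area FROM corp2.Property""")


def load_time_dim(conn):
    """Time dimension load: one row per distinct stay date in either DB."""
    conn.execute("""
        INSERT INTO TimeDim (time_key, full_date)
        SELECT CAST(strftime('%Y%m%d', d) AS INTEGER), d
        FROM (SELECT stay_date AS d FROM corp1.Reservation
              UNION
              SELECT checkin FROM corp2.Booking)""")


def load_booking_fact(conn):
    """Fact load: group operational rows to the hotel-per-day grain."""
    conn.execute("""
        INSERT INTO BookingFact
        SELECT t.time_key, h.hotel_key,
               COUNT(*), SUM(s.nights), SUM(s.amount)
        FROM (SELECT 'Corp1' AS corp, CAST(hotel_id AS TEXT) AS src_id,
                     stay_date AS d, nights, amount
              FROM corp1.Reservation
              UNION ALL
              SELECT 'Corp2', prop_code, checkin, night_count, total_charge
              FROM corp2.Booking) AS s
        JOIN HotelDim h ON h.source_corp = s.corp AND h.source_id = s.src_id
        JOIN TimeDim t ON t.full_date = s.d
        GROUP BY t.time_key, h.hotel_key""")


load_hotel_dim(conn)
load_time_dim(conn)
load_booking_fact(conn)
fact_rows = conn.execute("""
    SELECT h.hotel_name, t.full_date, f.booking_count, f.nights_sold, f.revenue
    FROM BookingFact f
    JOIN HotelDim h ON h.hotel_key = f.hotel_key
    JOIN TimeDim t ON t.time_key = f.time_key
    ORDER BY h.hotel_name, t.full_date""").fetchall()
```

Dimensions are loaded before the fact table so that the fact load can look up surrogate keys through joins; the GROUP BY is what turns individual reservations into rows at the warehouse grain.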

8 Write the SQL that will be needed to answer the three most important questions using your data warehouse. This requires writing 3 SQL statements/procedures.
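A management question such as "how much revenue does each region earn per year, with subtotals?" maps naturally onto a ROLLUP query over the fact and dimension tables. In MySQL this would typically be written with GROUP BY region, year WITH ROLLUP. The sketch below runs on Python's built-in sqlite3 module, and SQLite has no ROLLUP, so the subtotal and grand-total rows are emulated with UNION ALL; this is a stand-in for the MySQL feature and not the same syntax. The tables, columns and sample figures are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Hypothetical, already-populated warehouse tables.
CREATE TABLE HotelDim (hotel_key INTEGER PRIMARY KEY, region TEXT);
CREATE TABLE TimeDim (time_key INTEGER PRIMARY KEY, year INTEGER);
CREATE TABLE BookingFact (time_key INTEGER, hotel_key INTEGER, revenue REAL);
INSERT INTO HotelDim VALUES (1, 'East'), (2, 'West');
INSERT INTO TimeDim VALUES (20230101, 2023), (20240101, 2024);
INSERT INTO BookingFact VALUES
    (20230101, 1, 100.0), (20240101, 1, 150.0),
    (20230101, 2, 200.0), (20240101, 2, 250.0);
""")

# Emulated rollup: detail rows, per-region subtotals, then a grand total.
# NULL in a grouping column marks a subtotal row, as WITH ROLLUP does in MySQL.
ROLLUP_SQL = """
WITH base AS (
    SELECT h.region AS region, t.year AS year, f.revenue AS revenue
    FROM BookingFact f
    JOIN HotelDim h ON h.hotel_key = f.hotel_key
    JOIN TimeDim t ON t.time_key = f.time_key
)
SELECT region, year, revenue FROM (
    SELECT region, year, SUM(revenue) AS revenue FROM base GROUP BY region, year
    UNION ALL
    SELECT region, NULL, SUM(revenue) FROM base GROUP BY region
    UNION ALL
    SELECT NULL, NULL, SUM(revenue) FROM base
)
ORDER BY region IS NULL, region, year IS NULL, year
"""

rollup_rows = conn.execute(ROLLUP_SQL).fetchall()
for row in rollup_rows:
    print(row)
```

Each of the three questions from step 2 would get its own statement of this shape, grouping by whichever dimension attributes the question is about.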

Attachment:- Project Instructions.rar
