INSS 422 - Business Intelligence - Identify Data Anomalies


Q1. Given the following file for assignment worker.com, identify the data anomalies that must be removed before the data can be loaded into the data warehouse.

Worker_assignment (on the course web site)

The file is available in SAKAI.

Assignment_worker(assignment_no, assignment_date, emp_number, chg_hour, assigned_hour, charges)

where
Assignment_no is the number assigned to an assignment
Assignment_date is the date the assignment started
Emp_number is the number of the employee assigned to that assignment
Chg_hour is the amount paid per hour to that employee for that assignment
Assigned_hour is the number of hours assigned to that employee for that assignment
Charges is the total charge for that employee for that assignment (calculated as Chg_hour * Assigned_hour)

Rules:
- Assignment numbers always start with a letter followed by a 1 and are ALWAYS four characters long
  (e.g., A123, Z178)

- Emp_number is ALWAYS three characters long

- An employee cannot work more than 40 hours on a given project
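
The three rules above can also be written as small checks, which may help when deciding what each Excel formula has to test. The following Python is only a sketch: the function names are hypothetical, and it assumes that the two characters after the leading letter and the 1 are digits (as in A123 and Z178), which the rules do not actually state.

```python
import re

# Assumption: the last two characters are digits, as in the examples A123 and Z178.
# The rule itself only fixes the first letter, the following "1", and the length.
ASSIGNMENT_NO_PATTERN = re.compile(r"[A-Za-z]1\d{2}")

MAX_HOURS = 40  # an employee cannot work more than 40 hours on a project


def valid_assignment_no(value):
    """True if value is a letter, then 1, then two digits (four characters in total)."""
    return bool(ASSIGNMENT_NO_PATTERN.fullmatch(str(value)))


def valid_emp_number(value):
    """True if the employee number is exactly three characters long (like LEN in Excel)."""
    return len(str(value)) == 3


def valid_assigned_hours(hours):
    """True if the assigned hours do not exceed the 40-hour limit."""
    return hours <= MAX_HOURS
```

For example, valid_assignment_no("A223") is False because the second character is not 1, and valid_emp_number("1001") is False because the value is four characters long.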

Requirement:
Count (using Excel formulas such as IF and COUNTIF, as done in class) four types of errors:

- Missing data
- Incorrect format
  o To check the length of emp_number, you can use LEN(cell address) to get the length of the item in that cell
  o Check the assignment number format (BONUS +1 point)
- Zero values
- Incorrect calculations
  o Check the charges: charges = chg_hour * assigned_hour
  o Check for employees working more than 40 hours
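
To make the counting logic concrete, here is a minimal Python sketch of the same checks. It is not a substitute for the required Excel worksheet. The sample rows are invented for illustration (they are not from the Worker_assignment file), None stands for a blank cell, and the function and key names are hypothetical. The 40-hour check is counted separately here, although the list above groups it under incorrect calculations.

```python
# Hypothetical rows in the column order of Assignment_worker:
# (assignment_no, assignment_date, emp_number, chg_hour, assigned_hour, charges)
rows = [
    ("A101", "2020-01-05", "101", 20.0, 10.0, 200.0),   # clean row
    ("A102", "2020-01-06", None, 25.0, 8.0, 200.0),     # missing emp_number
    ("A103", "2020-01-07", "1002", 30.0, 5.0, 150.0),   # emp_number is not 3 characters
    ("A104", "2020-01-08", "103", 0.0, 12.0, 0.0),      # zero values
    ("A105", "2020-01-09", "104", 15.0, 4.0, 75.0),     # charges do not equal 15 * 4
    ("A106", "2020-01-10", "105", 10.0, 45.0, 450.0),   # more than 40 hours
]


def count_anomalies(rows):
    """Count the rows affected by each type of anomaly."""
    counts = {"missing": 0, "bad_emp_format": 0, "zero_values": 0,
              "bad_charges": 0, "over_40_hours": 0}
    for row in rows:
        assignment_no, assignment_date, emp_number, chg_hour, assigned_hour, charges = row
        # Missing data: any blank cell in the row
        if any(value is None or value == "" for value in row):
            counts["missing"] += 1
        # Incorrect format: emp_number present but not three characters long
        if emp_number not in (None, "") and len(str(emp_number)) != 3:
            counts["bad_emp_format"] += 1
        # Zero values in any numeric column
        numbers = [v for v in (chg_hour, assigned_hour, charges) if v is not None]
        if any(v == 0 for v in numbers):
            counts["zero_values"] += 1
        # Incorrect calculation: charges should equal chg_hour * assigned_hour
        if len(numbers) == 3 and abs(charges - chg_hour * assigned_hour) > 0.005:
            counts["bad_charges"] += 1
        # Rule violation: more than 40 hours on one assignment
        if assigned_hour is not None and assigned_hour > 40:
            counts["over_40_hours"] += 1
    return counts


anomaly_counts = count_anomalies(rows)
```

With these sample rows, each of the five counters ends up at 1. The resulting counts are the numbers that would feed the pie or line chart.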

Once counted:

- Draw a pie chart or line chart of the data anomalies, and
- Discuss which errors can be corrected and how (submit in Word).

You must submit the Excel worksheet in which the errors are calculated and the graph is drawn.

Q2.

Data integrity is a required feature of data warehouses. P&G is building a data warehouse and has run into data integration problems. They need to get data from two different sources and combine them in a way that maintains data integrity in their data warehouse.

The sources are:

- Asia region
- North American region

The two regions store their data in different formats in two different files (employee_asia and emp_NA).

Both tables are available in account Aggarwal as READ ONLY. You must create a copy in your account before using them.

Or you can create your own tables.

SHOW ALL QUERIES AND OUTPUTS

1. CLEAN the data into the required format (for gender, country of origin, job_class and seniority)
   a. Employee gender should be standardized, i.e., male should be changed to m and female to f
   b. Country should be spelled out completely, i.e., USA should be spelled out as United States of America
   c. Ceylon no longer exists; change the name to Sri Lanka
   d. Name is a single attribute in the dimension table; combine the name as last, first. For example, Bora (last) and Lakshmi (first) should be modified to Bora, Lakshmi
   e. Calculate both job_class and seniority

2. Create CLEAN_ASIA table

3. Create CLEAN_NA table

4. Combine the two using UNION to create the following table:

EMPLOYEE_DIM (Employee Id, Employee_name, seniority, gender, country, job_class)

5. Show the contents and structure of EMPLOYEE_DIM table.
6. Give a count of male and female employees
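
Steps 1 to 6 can be sketched end to end as follows. This is only an illustration under several assumptions: it uses Python's built-in sqlite3 module rather than the course database, the source column names (emp_id, first_name, last_name, gender, country) and the sample rows are hypothetical, and job_class and seniority are left out because the assignment text does not say how they are calculated.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical source layout; the real employee_asia and emp_NA columns may differ.
source_columns = "(emp_id INTEGER, first_name TEXT, last_name TEXT, gender TEXT, country TEXT)"
conn.execute(f"CREATE TABLE employee_asia {source_columns}")
conn.execute(f"CREATE TABLE emp_NA {source_columns}")
conn.executemany("INSERT INTO employee_asia VALUES (?, ?, ?, ?, ?)", [
    (1, "Lakshmi", "Bora", "female", "Ceylon"),
    (2, "Raj", "Mehta", "male", "India"),
])
conn.executemany("INSERT INTO emp_NA VALUES (?, ?, ?, ?, ?)", [
    (101, "Ann", "Lee", "F", "USA"),
])

# Steps 1a to 1d as one SELECT template (job_class and seniority omitted, see above).
clean_select = """
    SELECT emp_id AS employee_id,
           last_name || ', ' || first_name AS employee_name,
           CASE LOWER(TRIM(gender))
               WHEN 'male' THEN 'm'
               WHEN 'female' THEN 'f'
               ELSE LOWER(TRIM(gender))
           END AS gender,
           CASE TRIM(country)
               WHEN 'USA' THEN 'United States of America'
               WHEN 'Ceylon' THEN 'Sri Lanka'
               ELSE TRIM(country)
           END AS country
    FROM {source}
"""

# Steps 2 and 3: materialize the cleaned copies.
conn.execute("CREATE TABLE clean_asia AS " + clean_select.format(source="employee_asia"))
conn.execute("CREATE TABLE clean_na AS " + clean_select.format(source="emp_NA"))

# Step 4: UNION the two cleaned tables (UNION also drops exact duplicate rows).
conn.execute("""
    CREATE TABLE employee_dim AS
    SELECT employee_id, employee_name, gender, country FROM clean_asia
    UNION
    SELECT employee_id, employee_name, gender, country FROM clean_na
""")

# Step 5: contents and structure (PRAGMA table_info is SQLite-specific).
contents = conn.execute("SELECT * FROM employee_dim ORDER BY employee_id").fetchall()
structure = [row[1] for row in conn.execute("PRAGMA table_info(employee_dim)")]

# Step 6: count of male and female employees.
gender_counts = dict(conn.execute(
    "SELECT gender, COUNT(*) FROM employee_dim GROUP BY gender"))
```

With the sample rows above, gender_counts holds two f and one m. The CASE expressions, the UNION, and the GROUP BY are standard SQL and should carry over to other database systems with little change. The string concatenation operator and the command for showing a table's structure can differ between systems.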

Q3. Revise the data warehouse based on the new requirements (same as what we did in class).

Attachment:- Assignment.rar

Attachment:- archive 2.zip
