Create and explain an erd to go with this data your erd


Assignment

This assignment starts with the script, "Week6_business_units.sql" . This script should create a table called "business_unit" and a table called "Product_BU." Unfortunately, the metadata descriptions have been lost, so you will need to figure out what you can from the SQL script. The only thing you know about the metadata is that the company runs several individual strategic business units, such as "On The Go" and "Snack." Each of these business units is run under an umbrella designation, such as "Growth" or "Decline." The company will run marketing for growth products differently than it would run marketing for products on the decline.

You also have product order files from 2012, 2013, and 2014. They are attached as .csv files titled

• "2012_product_data_students.csv"
• "2013_product_data_students.csv"
• "2014_product_data_students.csv"

Your job is to use SQL to perform an ETL which will accomplish the following:

1. Extract data from the 2012, 2013, and 2014 order files

2. Transform the data according to the given rules

3. Load it into one final table

4. Export your final output table under the name "GX_output_final.csv" . (You may create as many or as few data objects as you like in your work, but the data in the .csv file named "GX_output_final.csv" will be the data evaluated.

Please name the computer files you submit for this assignment with a "GX" prefix, where "X" is your group number. For example, if you are in Group 3, you might create an SQL script named "G3_extract_2012.sql" (Ensure your group number and group member names are commented in any script you turn in as well.)

This is so when we grade the work, it's clear which bit came from which group. You should get credit for your good work!

You may write one large SQL script to accomplish the entire process. You may also break your SQL commands into smaller groups, interspersed with MySQL GUI commands. If you do this, your notes should reflect what you did (for example, in the Appendix you could say "We created database YYY, and then used the "import" button on the MySQL GUI interface to upload the .csv file into Table Z. Then we ran the script shown in Figure X ...")

Please only use MySQL in our VDA in this assignment. The only exceptions here are minor edits made using Notepad or Excel, such as putting headers on column names. Document these carefully in your Appendix; if your SQL script doesn't write column headers, but your output file magically has them, we want to know how they got there. You can just say something like "After we did << XXX >> to export the data, we used Notepad to insert Row 1, which are the header names.")

Remember, you have learned how to download and run a .sql script in the Virtual Lab. And in Week 4, we learned how to use FileZilla to retrieve the results of an outfile. You will need both of these skills this week.

A note about outfile names: we know the SQL server and FileZilla don't let you easily overwrite an output file, so you may find yourself going through several iterations of output file names, such as "outfile_01" and "outfile_02." It's okay if you need to manually rename your final output file from "outfile_99" to the name requested above just before you turn it in. Just make a note if you did this. (You don't need to hold your breath and hope you get the code to run perfectly the very first time.)

Detailed Instructions:

Extraction: Your extracted data should meet the following criteria for each of the 2012, 2013, and 2014 data sets.

1. Business Unit Designations are for "Growth" and "Mature" only; do not choose any orders which are associated with a "Decline" designation

2. You will need to make a business decision about whether you want to extract records with a quantity of 0 or an order total of 0. Please note your decision and the logic behind it in your management memo.

Transformation: Your output file should follow this format, for loading into the data mart. A sample of some output is given below; note that your data may or may not match these numbers.

1. BU Designation - this is Growth and Mature; please roll up by this field
2. BU Name - no transformations; roll up by this field within BU Designation
3. Product - no transformations; roll up by this field within BU Name
4. Region - no transformations; roll up by this field within Product
5. Year - no transformations; roll up by this field within Region
6. Month - no transformations; roll up by this field within Year

7. Sum of Quantity - this reflects the sum of the "Quantity" field in the relevant data. For example, in the data below, the first line indicates that for Growth/Energy/Purple Pain/Eastern/2012/April, there was a total of 20 Purple Pain packets sold. This could reflect twenty 1-packet sales, four 5-packet sales, or one sale of 20 packets.

8. Sum of Order Total - this reflects the sum of the "Order total" field in the relevant data. For example, in the data below, the first line indicates that for Growth/Energy/Purple Pain/Eastern/2012/April, there was a total of 6960 cents in revenue from the 20 Purple Pain packets sold. (This implies a price of 6960/20 = 348 cents, or $3.48 per Purple Pain Packet in 2012.) You can assume pricing is stable throughout a calendar year, and any price changes happen instantaneously at midnight on December 31 and apply to the entire next year.

Load: Your deliverable is a single .csv file with the applicable data in it. It should contain only the fields listed above, and should be sorted alphabetically (or numerically) ascending in each field, with the leftmost fields having precedence. For example, you should first sort on BU Designation, and within that, sort on BU_Name. Your one data file should contain the data from all three years (2012, 2013, and 2014). Make sure to use your .csv editor (such as Notepad or Excel) to insert the field names on your .csv file after you have exported from SQL.

Management Memo

Your team writes a memo to management outlining your answers to the following questions:

1. Create and explain an ERD to go with this data. Your ERD should describe the business situation in existence as best as you can infer it. Since your input files are not necessarily in the best shape, your ERD should not simply map the input files. Your output file is by definition a flat file with no major database schema, so your ERD shouldn't map that either. As a hint, consider this: based on the data here, what relationship can you infer exists between BU Designation and Product? One to one? One to many? Must-have or may-have? Use ER Assistant to do your ERD, and incorporate a screenshot of your ERD in the management memo. (You do not need to attach the ER Assistant file.)

2. Document your ETL process. Which functions did you use, and what logic did you follow? This should be at the level that your boss, who has an MBA but not an IT/database background, can follow it. Do not use "computer-ese" here; use regular business English.

3. Give metadata for your final deliverable file. The analysts who follow you will thank you.

4. Your boss has a question for you. "We think this is about the right level of granularity for our data mart. What do you think? Should we extract more detailed information, and if so, what? Or would you recommend going to a coarser level of granularity, and if so, what fields would you recommend we drop?" Give your rationale. Think critically, and demonstrate a good understanding of data management.

5. Your boss wants to know the answer to this business question: "We believe our Growth segment should show at least 10% year over year growth in either quantity sold or order total. We also believe our Mature segment should remain pretty much the same in terms of quantity and order totals. If I give the final data file you produced to Ramon (an expert analyst), can he run queries to answer this?" (You may wish to run a query or two as proof of concept.) Tell the boss if you believe the data as laid out like it is will easily support Ramon in that sort of analysis. If it will, what about it makes it easy? If it won't, how could it change to support this analysis?

6. Your boss has another question: "Our database folks have suggested we use a different format for the ETL if I'm so interested in growth. It's copied below. It's the exact same data, just a little differently arranged. What do you think of it? Bobby, one of my IT people, thinks a data mart with this layout is a brilliant answer to the growth question. But Susie, another one of my IT people, has concerns that this data layout will make it hard to query on any other dimension, such as whether a particular product is doing well or poorly in a given region, regardless of year, or monthly seasonal trends. Am I missing anything here? What do you recommend? If we had to go with just one layout of our data mart, which layout should it be?"

Existing layout:

Proposed layout:

A successful memo will meet the following criteria:

• Times New Roman, double spaced, 12-point font, with 1-inch margins
• Contain a cover page with your group's number and all group members' names on it
• Contain a bibliography in APA format citing appropriate references (you may need to only cite this classroom and the Reference Manual; if you look up other sources cite them too.)
• Pass a Turnitin check for plagiarism
• Be in memo form, addressed to your boss, in business English (not computer-ese). Technical talk goes in the Appendix.
• Be of reasonable length. There are no page minimums or maximums, but please be reasonable. Something on the order of 10 pages or less for the written memo should probably suffice; the Appendix may run longer.

• An Appendix with any technical information you want to include. This could be technical explanations of how you used the GUI, or other nerdspeak.

• Classic APA formatting calls for all figures, exhibits, and tables to be in the Appendix. I'm relaxing this requirement here. If a diagram (for example, a flowchart of something) would make more sense in the body of your paper, put it in the body. If it would make more sense in the Appendix, leave it in the Appendix.

Attachment:- Attachments.rar

Solution Preview :

Prepared by a verified Expert
PL-SQL Programming: Create and explain an erd to go with this data your erd
Reference No:- TGS02347397

Now Priced at $35 (50% Discount)

Recommended (95%)

Rated (4.7/5)