Data warehousing distance learning spring 2017 -


Module Learning Outcomes

Module Learning Outcomes are the official statement of what you are intended to gain from the module. ("On successful completion of the module the student will be able to ....."). All module specifications carry these statements.

In the table below we list the three Module Learning Outcomes for Data Warehousing. Then in purple italics we state what this assignment asks you to do in order to demonstrate your achievement of each MLO.

- Apply concepts and justify decisions when modelling, designing and constructing practical examples or paper descriptions of applications in this area.

You are provided with data and a working project definition for a Simple Star. Demonstrate your ability to design, apply and discuss FOUR sets of changes to implement more advanced concepts.

A range of suggestions for concept topic areas is given.

- Describe and critically evaluate the role and relevance of data warehousing and analytical investigation to the solution of business information problems.
and ...

- Explain the concepts that underpin the subject area of data warehousing, making reference to main established concepts and some developing areas.

Using online and / or printed literature sources, critically evaluate your implementation, in particular addressing the question "Except for scale (number of rows of data), what is different about your implementation compared to what you would expect for a full industrial* DW implementation?"

Part A Amend/extend an existing Data Warehousing implementation.

You are provided with all the data and a working project definition for a Simple Star. Demonstrate your ability to design, apply and discuss FOUR sets of changes to implement more advanced concepts.

A range of suggestions for concept topic areas is given.

During the module you have been provided with an SQL Database "SalesStarAssignment" containing data tables forming a Simple Star, plus a Visual Studio (aka Data Tools) Project to generate a Data Cube. The teaching notes also provided small number of NewFacts tables, each representing three months and one day's further information on sales.

Amend and/or extend this application in FOUR of the following ways

- Ensuring data quality

The datasets provided for this assignment have been checked to ensure that all data is valid. Even within the provided FactUpdate tables there are no incorrect dates, and no references to items that are not listed in the Dimension tables.

In real implementations it is very unsafe to assume that all entries in NewFacts are valid, because these tables are usually compiled from data from operational data sources, and those sources may have low quality control.

- Use literature (cite your sources, of course) to find what you believe to be the most common data quality problems that occur with new facts.

- Edit one of the provided FactUpdate tables to include examples of the problems you list.
Implement processes to detect and resolve the "problems" you introduced into the data set. In your report illustrate and explain your processes.

- Implementing additional Calculated Measures and Key Performance Indicators (KPIs)

The teaching notes for this module show how to implement Calculated Measures and KPIs.

- Identify further situations where Calculated Measures and KPIs would be useful for data analysis and implement your own examples. In your report illustrate and explain your work. Higher marks are available for examples that go beyond the module's basic teaching notes - cite the literature sources for all techniques you use that go beyond the course notes.

- Integration of data from different sources including data conformation

Nearly every Data Warehouse for analytic systems draws data from multiple sources. For example, sales could be made in a number of different countries and reported in slightly different forms (eg different currencies, or using different codes/names for the same items).

Assume the "SalesStarDemo" company is receiving its sales data from two franchised outlets, each of which report its data in formats that are not encoded in the same way as the provided Facts table. For example, for some fields the two sources don't even the same labels for the same things (they might sell the same product, but under different branded names).

- Create two tables, each representing a day's sales for one of the franchises (the number of records does not need to be large). Implement and explain processes to generate a unified NewFacts table that is appropriate for then uploading into the Production star.

- Explain any scripts and Staging area tables (eg lookup tables) you create. Wherever you use techniques you read about in the literature, cites sources.

- Cubes with multiple fact tables

The above example discusses a situation where multiple sources are effectively all about the same things (sales by our company), but coded inconsistently. During the Extract-Transform-Load processes, these sources are integrated and recoded into a single NewFacts table, which can then be appended to the Simple Star held within SalesStarDemo.

In other situations the multiple sources may be about related but different things. In this case it is often appropriate to generate Complex Stars with multiple Facts tables, and new Dimension tables.

Assume that our case study company is able (legally! perhaps though a market research company) to get hold of a summary of competitors' sales data on a monthly aggregated basis (i.e. once a moth we receive a report listing totals of HOW MUCH each competitor has sold in that month, but not to which customer or exactly what date). You may need to make other assumptions about the contents or level of detail in the summary report.

- Show and explain the following: Change the provided Data Warehouse to implement the Competitor data as a separate Facts table. Populate the new Facts table with suitable data. Rebuild the Cube such that it now has two Facts tables. Demonstrate the use of the Cube. If you make other design decisions, explain these. Cite any literature sources of help.

- Other (Eg Visualisation, Slowly Changing Dimensions, Use of Tabular facilities, your own choice)

- The module teaching notes make brief mention of quite a number of other techniques not listed in the "dot" titles above. Learn about one of these from literature (eg online tutorials; give references) and apply the approach to the "SalesDemo" dataset.

Marks will be awarded according to the extent of independent learning you demonstrate.

The total writing for Part A should be around 1200 words. Words beyond 1400 will not be read.

Part B

Using literature sources, critically evaluate your implementation, in particular addressing the question "Except for scale (number of rows of data), what is different about your implementation compared to what you would expect for a full industrial DW implementation?"

To answer Part B well you will have to establish what methods of implementation there are for full commercial or production analytical data warehousing.

You can get this information by reading books, journals and company white papers. You can use video lectures, tutorials, people's online blogs, company adverts too - but do not rely only on non-peer reviewed sources.

Many items can be accessed online books through the Library "Gateway" facility - see menu bars of Blackboard.

Almost certainly you will discuss what several authors say, highlighting the similarity or differences between their answers (try to analyse why they differ - eg what perspective are the authors taking? when was their document was written? does it have a bias towards a particular application/usage sector, etc), and you will use this comparison to review some pieces of your implementation, to explain to what extent your work is representative of what DW industry or researchers say is the topic of "data warehousing".

It is recommended that you pick a small number of topics (eg three, or four) and discuss these in detail rather than take a large number of topics and only discuss each in trivial depth.

Examples of topics you might discuss are:-

- What methodologies are used to structure data warehousing projects ("Inmon vs Kimball" is a good search starting point). What does the DW industry use? The approach you have used is closest to which?

- Industry-scale Data Warehouses probably use many tools to help automate routine processes. What are the main tools? Which are the processes most often covered by tools? How far do your scripts/processes illustrate the PRINCIPLES covered by the tools?

- Why does real data need so much data cleansing? What is industries' practice about data cleansing? In what ways does (or could) your assignment solution simulate what industry does?

- Results of Analytical data investigation are often presented visually (eg via graphs or displays). Why? You will have used a particular tool for your implementation (probably Excel). How representative of analytic visualisation tools is your assignment?

- Managing metadata is important. What is metadata? How does metadata help? What examples of metadata are there in your implementation?

Notice that we recommend that you discuss THREE areas, yet we have already listed FIVE topic examples. This is to illustrate that there is a wide range to choose. You can choose other topic areas yourself.

The total writing for Part B should be around 1200 words. Words beyond 1400 will not be read.

Citations and References

You MUST use APA-style for citations and referencing. This is SHU requirement (not just a module requirement).

Request for Solution File

Ask an Expert for Answer!!
Database Management System: Data warehousing distance learning spring 2017 -
Reference No:- TGS02264881

Expected delivery within 24 Hours