Techniques and Interpretation for Advanced Statistical Research Assignment

Module - Description of Numerical and Categorical Data

Assignment:

1. Select one of the following:

a. Identify any application area that interests you and select any sample data that addresses a business problem. Understand the significance of each data set and describe the value of each data type. Describe the structure and organization of the data set and explain why it is organized that way. Can you identify any patterns or derive any anticipated data values from this set? If not, why not? What are the hidden singular occurrences of data in the data set? Do you think these exceptional data values should be ignored? What are the pitfalls in inferences if they are included?

b. Find a doctoral thesis of interest to you that includes a statistical analysis. Assess the sample data that was used. Does the data set contain numerical or categorical values? Can you apply the concepts of this chapter to review the thesis? If yes, explain how they are applied and whether the approach can be improved. How, and in what manner, can it be improved? Support your observations. If no, explain why and support your explanation with solid references.

2. Compare two data sets from any application area - one that consists of numerical data and the other of categorical data. In your opinion, which data set describes the problem better? Why? Support your observations logically.

3. 4M Growth Industries - For the data for this assignment, refer to the website mentioned above in Additional resources (Chapter 3 - 03_4m_growth_ind). (Ref. Chapter 3, Page 48, Problem number 59)

The U.S. Department of Commerce tracks the number of workers employed in various industries in the United States. It summarizes these data in the Statistical Abstract of the United States. This regularly appearing volume contains a rich collection of tables and charts that describe the country. The data shown in the following table appear in Table 620 in the 2012 edition, available online at www.census.gov. This table summarizes a categorical variable from each of two data tables. These categorical variables identify the type of industry that employs the worker. Each row in the underlying data tables represents an employee, either in 2000 or 2010.

Industry                    2000      2010
Agriculture                2,464     2,206
Construction               9,931     9,077
Manufacturing             19,644    14,081
Transportation             7,380     7,134
Education                 11,255    13,155
Health                    14,933    18,907
Hospitality               11,186    12,530
Retail Trade              15,763    15,934
Finance                    9,374     9,350
Professional Services     13,649    15,253
Each value shown in this frequency table is a rounded count given in thousands. For example, about 2,464,000 workers were employed in agriculture and related industries in 2000.

Motivation

(a) A firm that sells business insurance that covers the health of employees is interested in how it should allocate its sales force over the different industries. Why would this firm be interested in a table such as this?

Method

(b) What type of plot would you recommend to show the distribution of the workforce over industries in each year?

(c) Management is particularly interested in knowing which industries have experienced the most growth in the number of employees. What graph do you recommend to show changes in the size of the workforce in the different industries?

Mechanics

(d) Prepare the chart that you chose in part (c). Explain the order you chose to show the categories.

(e) How does the chart highlight industries that have smaller employment in 2000 than in 2010?

(f) Do you think you should show all ten categories, or should some rows be combined or set aside?

Message

(g) What message does your chart convey about the changing nature of employment?

(h) By focusing on the growth and using a single chart, what's hidden?
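As a starting point for the growth questions in problem 3, the following minimal Python sketch computes the change and percent change in employment for each industry, using the counts (in thousands) from the table in that problem. The function name `growth_table` and the choice to sort by absolute change are assumptions made for illustration; the sketch does not choose or draw the chart, and ordering by percent change is an equally defensible option.

```python
# Employment counts in thousands as (2000, 2010), copied from the table in problem 3.
employment = {
    "Agriculture": (2464, 2206),
    "Construction": (9931, 9077),
    "Manufacturing": (19644, 14081),
    "Transportation": (7380, 7134),
    "Education": (11255, 13155),
    "Health": (14933, 18907),
    "Hospitality": (11186, 12530),
    "Retail Trade": (15763, 15934),
    "Finance": (9374, 9350),
    "Professional Services": (13649, 15253),
}


def growth_table(counts):
    """Return (industry, change, percent_change) rows, largest change first.

    `growth_table` is a hypothetical helper name, not part of the assignment.
    """
    rows = []
    for industry, (y2000, y2010) in counts.items():
        change = y2010 - y2000            # change in thousands of workers
        pct = 100.0 * change / y2000      # change relative to the 2000 level
        rows.append((industry, change, pct))
    return sorted(rows, key=lambda r: r[1], reverse=True)


rows = growth_table(employment)
for industry, change, pct in rows:
    print(f"{industry:<22} {change:>+7,} {pct:>+7.1f}%")
```

The sorted rows can be passed to any plotting tool; sorting before plotting keeps the industries with the largest gains and losses at the ends of the chart.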

4. 4M Credit Scores - For the data for this assignment, refer to the website mentioned above in Additional resources (Chapter 4 - 04_4m_credit_scores). (Ref. Page 76, Problem number 64)

When a customer asks to borrow money, many banks check his or her credit score before giving him or her a loan.

The credit score estimates the risk associated with making a loan to the customer. Credit scores are based on how well an individual has handled past debts. Customers who have made regular payments on time on several loans get a high score, whereas those who have been late, and perhaps defaulted on a loan, have lower scores.

The data in this question gives the credit score for 963 individuals who recently obtained small business loans from a lender. The average score is 580. The lender is considering raising its standards for getting a loan to reduce its exposure to losses. An executive proposed raising the minimum credit score for these loans to 550, saying that such a score would not lose too much business as it was "below average."

Motivation

(a) The average of these credit scores is 580. Why would it be useful for managers of the lender to examine other attributes of the data before deciding on new standards?

Method

(b) What other attributes of the data should the managers examine?

Mechanics

(c) Present a display that captures the relevant attributes of these credit scores. Include relevant summary statistics.

Message

(d) What do you think of the recommended new credit limit?
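For problem 4, one way to look beyond the average is to compute the median, quartiles, spread, and the share of borrowers who would fall below a proposed cutoff. The sketch below uses only Python's standard `statistics` module. The helper name `summarize_scores` is hypothetical, and the ten scores are made-up placeholder values chosen only so the code runs; they are not the 963 scores in 04_4m_credit_scores and say nothing about the real answer.

```python
import statistics


def summarize_scores(scores, cutoff):
    """Summarize a list of credit scores and the share below `cutoff`.

    `summarize_scores` is a hypothetical helper, not part of the assignment.
    """
    # quantiles(n=4) returns the three quartile cut points (Python 3.8+).
    q1, median, q3 = statistics.quantiles(scores, n=4)
    below = sum(1 for s in scores if s < cutoff)
    return {
        "n": len(scores),
        "mean": statistics.mean(scores),
        "median": median,
        "q1": q1,
        "q3": q3,
        "stdev": statistics.stdev(scores),
        "min": min(scores),
        "max": max(scores),
        "share_below_cutoff": below / len(scores),
    }


# Made-up placeholder values, NOT the real data from 04_4m_credit_scores.
placeholder_scores = [480, 510, 530, 545, 560, 575, 590, 620, 680, 710]
summary = summarize_scores(placeholder_scores, cutoff=550)
for name, value in summary.items():
    print(f"{name}: {value}")
```

With the real file loaded in place of `placeholder_scores`, the `share_below_cutoff` entry estimates how much of the lender's recent business a minimum score of 550 would have excluded, which the average alone cannot show.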

5. 4M Textbooks - The data for this assignment comes from research you will do using the Internet. See Chapter 2, Problem number 32, Page 24.

This exercise requires research on how books are priced on the Internet. We will divide books into three broad categories: popular books, schoolbooks or textbooks, and recreational books. Some books may fall in two or even three of these categories.

Motivation

(a) If you were able to reduce the cost of the books that you buy each year by 5 or 10%, how much do you think you might be able to save in a year?

Method

(b) Create a data table of prices for the three types of books. You will compare prices of books at two Internet stores. Your data table will have four columns: one for the name of the book, one for the type of the book, and two more columns for the prices. (If you don't have two Internet stores at which you usually shop, use amazon.com and barnesandnoble.com.) What types of data are these?

(c) Identify five books of each type. These form the rows of the data table.

Mechanics

(d) Fill in your table from prices on the Web. Did you find the prices of the books that you started with, or did you have to change the choice of books to find the books at both stores?

(e) Do your prices include costs for shipping and taxes, if any? Should they?

Message

(f) Summarize the price difference that you found for each of the three types of books. Describe any clear patterns you found. For example, if one store is always cheaper, then you have an easy recommendation. Alternatively, if one store is cheaper for textbooks but more expensive for popular books, you have a slightly more complex story to tell.
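The data table described in problem 5 can be sketched as one record per book with two categorical columns (name and type) and two numerical columns (the prices). The Python sketch below shows that layout and an average price difference per book type. The class `BookPrice`, the function `mean_difference_by_kind`, the titles, and every price are hypothetical placeholders for illustration; real rows must come from your own research at the two stores.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class BookPrice:
    title: str      # categorical: identifies the book
    kind: str       # categorical: "popular", "textbook", or "recreational"
    price_a: float  # numerical: price at the first store, in dollars
    price_b: float  # numerical: price at the second store, in dollars


def mean_difference_by_kind(rows):
    """Average of (price_a - price_b) for each book type."""
    diffs = defaultdict(list)
    for row in rows:
        diffs[row.kind].append(row.price_a - row.price_b)
    return {kind: sum(d) / len(d) for kind, d in diffs.items()}


# Placeholder rows with invented prices, NOT real store data.
rows = [
    BookPrice("Placeholder popular 1", "popular", 12.00, 13.00),
    BookPrice("Placeholder popular 2", "popular", 15.00, 16.00),
    BookPrice("Placeholder textbook 1", "textbook", 100.00, 90.00),
    BookPrice("Placeholder textbook 2", "textbook", 80.00, 70.00),
]
result = mean_difference_by_kind(rows)
for kind, diff in result.items():
    print(f"{kind}: store A minus store B = {diff:+.2f}")
```

A positive difference means the first store is more expensive on average for that type, so differing signs across types correspond to the "slightly more complex story" mentioned in part (f).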
