Business Statistics Assignment

Problem 1

Part 1

Commonwealth Health Insurance has become interested in a new type of cancer screening. Early screening of cancer reduces the risk that cancers can develop, leading to greatly improved patient health and less costly procedures. Implementing this new screening procedure costs $40 million every year. Staying with the current screening procedure is far less costly, only costing $5 million every year. Hence, Commonwealth Health Insurance wants to be careful and conduct a rigorous analysis of this important decision.

However, this new type of cancer screening has not been put into practice before, and hence the final cost reduction is uncertain. Commonwealth Health Insurance has assessed that there is an 80% chance that the screening procedure is very successful and a 20% chance that it is not very successful. If the screening procedure is very successful, it is estimated that the yearly costs related to cancer decrease by $100 million. If it is not very successful, yearly costs are reduced by only $30 million, which is the same as under the current procedure.

The decision tree associated with this problem is shown in Figure 1 below.

(a) Fill in the decision tree, i.e., where needed fill in the probabilities and the end point values, and calculate the Expected Monetary Value (EMV). Explain which decision is optimal based on EMV.

(b) Calculate the probability of the screening being very successful at which the decisions to implement and not to implement yield the same EMV.

(c) Discuss the sensitivity of the decision from (a) using the outcome of (b).
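
The EMV arithmetic behind parts (a) and (b) can be sketched as follows, using only the figures stated above. This is a minimal sketch, not a definitive solution: it assumes that the end-point value of each branch is the yearly cost reduction minus the yearly screening cost (in millions of dollars), and the function and constant names are illustrative rather than part of the assignment.

```python
# Figures from the problem statement, in millions of dollars per year.
COST_NEW = 40.0         # yearly cost of the new screening procedure
COST_CURRENT = 5.0      # yearly cost of the current screening procedure
SAVING_SUCCESS = 100.0  # cost reduction if the new screening is very successful
SAVING_BASE = 30.0      # cost reduction otherwise (same as the current procedure)


def emv_implement(p_success: float) -> float:
    """EMV of implementing, ASSUMING value = cost reduction - screening cost."""
    expected_saving = p_success * SAVING_SUCCESS + (1 - p_success) * SAVING_BASE
    return expected_saving - COST_NEW


def emv_keep_current() -> float:
    """Value of staying with the current procedure (no uncertainty involved)."""
    return SAVING_BASE - COST_CURRENT


def break_even_probability() -> float:
    """Solve emv_implement(p) == emv_keep_current() for p; the EMV is linear in p."""
    return (emv_keep_current() + COST_NEW - SAVING_BASE) / (SAVING_SUCCESS - SAVING_BASE)


if __name__ == "__main__":
    print("EMV implement at p = 0.8:", emv_implement(0.8))
    print("EMV current procedure:  ", emv_keep_current())
    print("Break-even probability: ", break_even_probability())
```

Comparing the assessed 80% with the break-even probability returned by the sketch is one way to frame the sensitivity discussion in part (c).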

Part 2

For each of the following statements determine if it is true or false. Please offer a one-sentence explanation of your answer.

(d) The table below describes the probability distribution of a random variable X. The values of X are given in the first row. The probabilities p1, p2, p3, p4, p5, and p6 are in the second row. For example, P(X = 6) = p4.

i. It is possible that the expected value of X is 13.
ii. It is possible that the expected value of X is 23.

(e) A resident of Boston is chosen completely at random. Consider the following two events:

i. The person selected is a teacher
ii. The person selected is a teacher and is a vegetarian
The probability of event (ii) can never exceed that of event (i).

(f) In a plot of a regression output, corresponding to a simple linear regression (OLS) model with one explanatory variable, it is possible that all of the training data points are above the regression line.

(g) In an optimization problem, the optimal solution is always on the boundary.

Problem 2

Jacob has recently opened a new apparel store close to the towns of Bern and Oulli. Bern and Oulli together have a total population of 10,000, of which 4,000 are from Bern and 6,000 are from Oulli. Every day, multiple customers enter the store, but Jacob is interested in counting the number of times the first customer comes from Bern. Each person is equally likely to stop at the store on any given day. Moreover, this likelihood is independent and identical across days.

a) What is the probability that the first person of the day comes from Bern?

For the first 10 days, Jacob wants to know how many times the first arrival will come from Bern. Let Y denote the number of days on which the first arrival comes from Bern.

For parts (b)-(e) you can use the answer that you calculated for part (a). (You will not be penalized if your answers to later parts change due to a calculation error in part (a).)

b) What is the distribution of Y? What are the mean and standard deviation of Y?

c) What is the probability that none of the first arrivals in the first 10 days came from Bern?

d) What is the probability that Jacob saw an equal number of first arrivals from both Bern and Oulli in the first 10 days?

e) Write Y as a sum of random variables and use the Central Limit Theorem to calculate the probability that Jacob saw at least 3 and at most 5 days on which the first arrival was from Bern.
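
The quantities asked for in parts (b) through (e) can be sketched numerically as follows. This is a minimal sketch under stated assumptions: it treats each day's first arrival as an independent draw that comes from Bern with probability 4,000/10,000, so that Y is modeled as Binomial(10, 0.4), and it applies a continuity correction in the normal approximation, which is a choice of this sketch and not something the assignment specifies. All names are illustrative.

```python
import math

N_DAYS = 10
P_BERN = 4000 / 10000  # share of the combined population living in Bern


def binom_pmf(k: int, n: int = N_DAYS, p: float = P_BERN) -> float:
    """P(Y = k) for Y ~ Binomial(n, p)."""
    return math.comb(n, k) * p**k * (1 - p) ** (n - k)


def normal_cdf(x: float, mean: float, sd: float) -> float:
    """CDF of a Normal(mean, sd^2) variable, via the error function."""
    return 0.5 * (1 + math.erf((x - mean) / (sd * math.sqrt(2))))


# (b) Mean and standard deviation of a Binomial(n, p) variable.
mean_y = N_DAYS * P_BERN
sd_y = math.sqrt(N_DAYS * P_BERN * (1 - P_BERN))

# (c) No first arrival from Bern in 10 days.
p_none = binom_pmf(0)

# (d) Equal split: 5 days from Bern and 5 from Oulli.
p_equal = binom_pmf(5)

# (e) P(3 <= Y <= 5): exact value and CLT approximation.
# The +/- 0.5 continuity correction is an assumption of this sketch.
p_exact = sum(binom_pmf(k) for k in range(3, 6))
p_clt = normal_cdf(5.5, mean_y, sd_y) - normal_cdf(2.5, mean_y, sd_y)

if __name__ == "__main__":
    print(mean_y, sd_y, p_none, p_equal, p_exact, p_clt)
```

Printing both `p_exact` and `p_clt` shows how close the normal approximation comes for only 10 days; dropping the continuity correction (using 3 and 5 as the bounds) gives a noticeably different value.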

Problem 3

In the past year, Winter Parks has opened a new recreation park on the shores of Lake Summer. Entrance to the park is free, but there are several paid attractions as well as a membership option with additional benefits. Winter Parks wants to offer coupons to their customers to encourage them to use the attractions in the new park. Steven has been tasked with analyzing which customers should be targeted. To do this, Steven wants to predict how much each customer would spend at Lake Summer. The following is a list of the variables that Steven gathered about several households that have visited before:

- expsum: expenditures when visiting Lake Summer
- visits: number of visits to Lake Summer
- ski: indicator of whether the customer waterskied, 0 if not waterskiing and 1 if waterskiing
- income: annual household income
- feesum: indicator of whether the annual membership fee for Lake Summer was paid, 0 if not paid and 1 if paid

The initial model regresses expsum on all the available independent variables.

First, Steven wants to analyze the regression output to see which variables are useful predictors.

(a) Calculate the missing coefficient of the variable ski in Model 1; use the output of Model 1.

(b) For each of the variables visits and feesum, state whether it is significant, and if it is insignificant, explain why; use the output of Model 1. Discuss whether the coefficients for the independent variables visits and feesum make sense in Model 1.

Years ago, Steven took a class and he remembers that his professor told him that it is a good custom to go back to the data and plot it. Figure 1 shows the dependent variable expsum and the independent variable visits for each data point.

(c) Do you suspect that a linear equation describes the relationship between visits and expsum? Given your answer, how would you improve the model, if at all?

Before Steven could make these changes, he was interrupted by his colleague who handed him last year's customer expenditure data for the park at Lake Weather. He decided to include the variable expwea: expenditures when visiting Lake Weather.

Additionally, Steven calculates the correlations between the variables (Table 1).

Steven looks at both the model and the correlation table and finds ways for improvement. In particular, he wants to remove a variable from the model.

(d) Explain which variable could be removed from the model first; use the output of Model 2 and Table 1.

(e) Discuss whether there is multicollinearity between the independent variables.

(f) Write explicitly the multiple linear regression equation describing expenditures at Lake Summer corresponding to Model 2.

(g) Explain which variable is the most useful in predicting expenditures at Lake Summer; use specific numbers from the output of both Model 1 and Model 2.

Problem 4

Datatronics is a consumer analytics firm that offers licenses for two different types of software packages: DAP (Data Analytics Package) and DMP (Data Modeling Package). While the first package provides data analytics tools for clients, the second focuses on modeling support and aids clients' decision making process.

Licensing agreements involve setup and initial support that Datatronics must provide to its customers. The company has two different customer support centers for providing assistance: one in the Philippines (P) and another in the United States (U) to serve its customers.

The company is considering offering at most 750 licenses of DAP (which is the demand for DAP in the next quarter) and at most 950 licenses of DMP (which is the demand for DMP in the next quarter). Because of the limited customer support personnel available, U can support only up to 800 licenses (of either kind), and P is limited to supporting 1,000 licenses (of either kind). The two facilities employ different workforces, which translates into different labor hour requirements per customer, as well as different earnings per serviced customer. The relevant information is summarized in the table below. Labor is measured in hours. For example, customer support of 1 DAP client from the center at P requires 30 hours, and the total amount of labor available over the next quarter will be 17,500 hours.

Datatronics has recently hired Anne, an undergraduate, for its internship program. Anne has formulated a linear program and solved it using Excel's Solver tool, which gave the following output. (Some of the output is intentionally left blank.)

Note that the variables UDAP and UDMP stand for the DAP and DMP licenses that should be serviced from the US, and PDAP and PDMP are the licenses that should be serviced from the Philippines.

a) Write the linear constraint corresponding to the DAP demand and the linear constraint corresponding to the capacity in the US.

b) What is the optimal number of DAP licenses that should be assigned to the servicing center in the US?

c) Is the demand constraint for DAP binding? Is the demand constraint for DMP binding?

d) What is the shadow price associated with the capacity constraint at the support center in the Philippines?

e) Datatronics can contract for 1,000 hours of additional labor in the Philippines, at a cost of $29 per hour, including benefits, overhead, etc. Is this worth doing?

Problem 5

Donatello is graduating, and his friends Leonardo, Rafael, Michelangelo, and Splinter are coming to his graduation ceremony. They will arrive a day early and will have time to tour the city. Donatello decided to plan a fun day in the city for them. He began by composing a list of attractions around the city:

Unfortunately, they will not have time to visit all of the attractions, as they only have 10 hours. Since Donatello never took this class, he came to you for help.

(a) Formulate the problem as a discrete linear optimization problem to maximize the total fun during the limited time available. What are the decision variables? What is the range for each variable? What is the objective function? What are the constraints?
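
The structure of the formulation in part (a), one binary variable per attraction, a fun objective, and a single time constraint, can be sketched as follows. This is only an illustration under loud assumptions: the attraction names, fun scores, and durations below are hypothetical placeholders (the assignment's attraction list is not reproduced here), and the sketch enumerates all 0/1 choices by brute force instead of calling an optimization solver. Only the 10-hour limit comes from the problem statement.

```python
from itertools import product

# HYPOTHETICAL attractions: (name, fun score, duration in hours).
# These placeholder values are for illustration only and are not
# taken from the assignment's attraction list.
ATTRACTIONS = [
    ("Attraction A", 8, 4.0),
    ("Attraction B", 6, 3.0),
    ("Attraction C", 5, 2.5),
    ("Attraction D", 7, 5.0),
    ("Attraction E", 3, 1.5),
]
TIME_BUDGET_HOURS = 10.0  # from the problem statement


def best_itinerary(attractions, budget):
    """Maximize total fun subject to total time <= budget.

    Decision variables: x_i in {0, 1}, one per attraction (1 = visit).
    Brute-force enumeration is used here only because the example is tiny.
    """
    best_fun, best_choice = 0, tuple(0 for _ in attractions)
    for choice in product((0, 1), repeat=len(attractions)):
        total_time = sum(x * a[2] for x, a in zip(choice, attractions))
        if total_time > budget:
            continue  # violates the time constraint
        total_fun = sum(x * a[1] for x, a in zip(choice, attractions))
        if total_fun > best_fun:
            best_fun, best_choice = total_fun, choice
    return best_fun, best_choice


if __name__ == "__main__":
    fun, choice = best_itinerary(ATTRACTIONS, TIME_BUDGET_HOURS)
    visited = [a[0] for x, a in zip(choice, ATTRACTIONS) if x]
    print(fun, visited)
```

Enumeration grows as 2^n in the number of attractions, so a realistic list would be handed to an integer programming solver; the variables, objective, and constraint keep the same form.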

(b) Donatello sent the itinerary to his friends and received a list of requests. Model each of the requests listed below as linear constraints:

• Rafael said that if they go to more than 3 museums, then they have to go to the Samuel Adams Brewery.

• Michelangelo loves to play outside and therefore asked that they spend at least 2 hours in total on outdoor activities.

• Splinter realized that the costs are getting high. He asked you to make sure that they are not spending more than $300 (in total for Donatello and his friends).

• Leonardo asked to visit at least one university or at least two museums.

You successfully modeled all of the requests and shared the itinerary with your classmate. He mentioned a website that has a better estimate of the time each attraction takes. In particular, this website models the time at each attraction as a normal random variable and provides the mean and standard deviation for each attraction, as well as the correlations between the different attractions. The data from the website is provided below.

(c) If the itinerary includes the Freedom Trail, Boston Public Garden, and Charles River Esplanade, what is the standard deviation of the length of the trip? Add a (possibly non-linear) constraint that ensures that the standard deviation of the total trip is at most 60 minutes.
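
The calculation in part (c) rests on the variance of a sum of correlated random variables: the variance of the total is the double sum of rho_ij * sigma_i * sigma_j over all selected pairs (i, j), with rho_ii = 1. A minimal sketch follows. The standard deviations and correlations in it are hypothetical placeholders, since the website data is not reproduced here, and the function names are illustrative.

```python
import math


def total_std(stds, corr, choice=None):
    """Standard deviation of the total time over the selected attractions.

    stds:   standard deviation of each attraction's duration (minutes)
    corr:   correlation matrix (corr[i][i] == 1)
    choice: 0/1 selection vector x; defaults to selecting everything
    """
    n = len(stds)
    if choice is None:
        choice = [1] * n
    variance = sum(
        choice[i] * choice[j] * corr[i][j] * stds[i] * stds[j]
        for i in range(n)
        for j in range(n)
    )
    return math.sqrt(variance)


def satisfies_std_limit(stds, corr, choice, limit_minutes=60.0):
    """Non-linear (quadratic in x) constraint: total std <= limit."""
    return total_std(stds, corr, choice) <= limit_minutes


# HYPOTHETICAL values for three attractions, for illustration only.
STDS_MIN = [20.0, 15.0, 10.0]
CORR = [
    [1.0, 0.5, 0.0],
    [0.5, 1.0, 0.2],
    [0.0, 0.2, 1.0],
]

if __name__ == "__main__":
    print(total_std(STDS_MIN, CORR))
    print(satisfies_std_limit(STDS_MIN, CORR, [1, 1, 1]))
```

Written in terms of the decision variables, the constraint is quadratic in x; squaring both sides (total variance at most 60^2) avoids the square root when passing it to a solver.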

You used your favorite solver, and solved the problem. After looking at your suggested schedule, Donatello realized that he forgot to account for the distance between the different attractions. He gave you the distances table below.

(Note that the distances table can be found at the end of the question)

(d) Rewrite your constraint from part (a) to account for the traveling time. Make sure to start your day and end it at the Marriott Cambridge hotel. What are the decision variables? What is the range for each variable? What are the new constraints?

To save some time, please consider only the first 5 attractions (Freedom Trail, Boston Public Garden, Charles River Esplanade, Boston Tea Party Ships & Museum, and Back Bay) when you write your answer to part (d).

Hints:

• The variables should be of the form x_{i,j}, indicating that they went from attraction i to attraction j.
• Make sure that you are leaving the hotel, and that you are going back to the hotel.
• Make sure that if you go to an attraction, you also leave it.
• Don't forget to link the new variables to the old ones.

Problem 6

The primary goal of the course has been to teach you some important analytics tools that we believe can make a difference in making decisions based on data. We would like to ask you to think back to your current or last job before coming to this class. Please identify a project, activity, task, or assignment that you worked on at that job where you would now have analyzed the problem differently given the knowledge you acquired from this semester.

a) Describe the project/activity/assignment.

b) Describe either the data that you had available for the project or that you now wish you had developed in order to complete the project.

c) What modeling tool(s) from this class would you have used on this project, and why do you think these tools would have been effective?

