Math 4044 - Statistics for Data Science Assignment


For this assignment you will continue to use data derived from Capital Bikeshare trip records from 2011 and 2012, this time analysing patterns in daily numbers of rentals by casual users.

References and Data Sources:

Bache, K. & Lichman, M. (2013). UCI Machine Learning Repository [https://archive.ics.uci.edu/ml]. Irvine, CA: University of California, School of Information and Computer Science.

Fanaee-T, H. & Gama, J. (2013). Event labeling combining ensemble detectors and background knowledge. Progress in Artificial Intelligence, pp. 1-15. Springer Berlin Heidelberg.

Data file for this assignment:     

The data file for this assignment is called daily.sas7bdat and contains daily counts of bike rentals for 2011 and 2012, derived from Capital Bikeshare trip history data, with additional weather and seasonal information. The data was downloaded from the UCI Machine Learning Repository. Variables in that file are as follows:

Variable      Description
Instant       Record index
dteday        Date
Season        Winter, spring, summer or fall (northern hemisphere)
Yr            0 = 2011, 1 = 2012
Month         Month (January to December)
Weekday       Day of the week (Monday to Sunday)
workingday    Working day = 1, weekend or public holiday = 0
Temp          Normalised temperature in degrees Celsius; observed temperature divided by 41 (max)
Atemp         Normalised 'feels like' temperature in degrees Celsius; values divided by 50 (max)
Hum           Normalised humidity; observed values divided by 100 (max)
Windspeed     Normalised wind speed; values divided by 67 (max)
Casual        Count of casual users
registered    Count of registered users
Count         Total count of bike rentals (casual and registered)
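
The normalised weather variables can be converted back to their original scale by multiplying by the divisors given in the descriptions above. The following is a minimal Python sketch of that arithmetic, for illustration only (the assignment itself is to be done in SAS). The function name `denormalise` and the example record are hypothetical; the record's values are made up and are not taken from daily.sas7bdat. Lower-case variable names are used, as in the assignment questions.

```python
# Divisors taken from the variable descriptions in the table above.
SCALE = {"temp": 41.0, "atemp": 50.0, "hum": 100.0, "windspeed": 67.0}


def denormalise(record):
    """Return a copy of one daily record with weather values on their original scale."""
    restored = dict(record)
    for name, factor in SCALE.items():
        if name in restored:
            restored[name] = restored[name] * factor
    return restored


# Hypothetical record, invented for illustration only.
example = {"temp": 0.5, "atemp": 0.48, "hum": 0.8, "windspeed": 0.25, "casual": 120}
restored = denormalise(example)
print(restored["temp"])  # 0.5 * 41 = 20.5 degrees Celsius
```

Variables without a divisor, such as casual, are passed through unchanged.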

Assignment tasks:          

Question 1 - Carry out a one-way analysis of variance relating casual to weekday. Use contrasts to test at least one a-priori hypothesis of your choice. Examine and comment on residuals. Also carry out appropriate post-hoc comparisons and discuss your results.
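
To make the quantities in Question 1 concrete, here is a minimal Python sketch of the arithmetic behind a one-way ANOVA F statistic, a single a-priori contrast, and the residuals. It is an illustration only, not a substitute for the SAS analysis. The helper names (`one_way_anova`, `contrast_t`, `residuals`) are hypothetical, and the data are invented toy values with three groups rather than seven weekdays, chosen to keep the example short. P-values are not computed, since they require the F and t distributions, which statistical software supplies.

```python
from statistics import mean


def one_way_anova(groups):
    """Return (F, df_between, df_within, MSE); groups maps label -> list of values."""
    all_values = [x for values in groups.values() for x in values]
    grand_mean = mean(all_values)
    ss_between = sum(len(v) * (mean(v) - grand_mean) ** 2 for v in groups.values())
    ss_within = sum((x - mean(v)) ** 2 for v in groups.values() for x in v)
    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    mse = ss_within / df_within
    return (ss_between / df_between) / mse, df_between, df_within, mse


def contrast_t(groups, coefficients, mse):
    """t statistic for a linear contrast of group means (coefficients sum to zero)."""
    estimate = sum(c * mean(groups[g]) for g, c in coefficients.items())
    variance = mse * sum(c ** 2 / len(groups[g]) for g, c in coefficients.items())
    return estimate / variance ** 0.5


def residuals(groups):
    """Observed value minus its group mean, per group."""
    return {g: [x - mean(v) for x in v] for g, v in groups.items()}


# Invented toy counts, NOT taken from daily.sas7bdat.
toy = {
    "Saturday": [10.0, 12.0, 14.0],
    "Sunday": [11.0, 13.0, 15.0],
    "Monday": [4.0, 5.0, 6.0],
}
f_stat, df_b, df_w, mse = one_way_anova(toy)
# A-priori contrast: average of the two weekend days versus Monday.
weekend_vs_monday = {"Saturday": 0.5, "Sunday": 0.5, "Monday": -1.0}
t_stat = contrast_t(toy, weekend_vs_monday, mse)
print(f_stat, df_b, df_w)  # 19.0 on (2, 6) degrees of freedom
print(round(t_stat, 2))    # about 6.12
```

The contrast coefficients sum to zero, so the contrast compares the weekend average with Monday rather than testing an overall level. In the real analysis the residuals would be inspected with plots (for example against fitted values and in a normal quantile plot) rather than printed.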

Question 2 - Use SAS to perform a one-way ANCOVA relating casual to weekday with atemp as a covariate, including appropriate post-hoc comparisons:

- Confirm that there is a linear relationship between the response variable and the covariate (a scatterplot and a correlation coefficient plus a comment will suffice);

- Check the two additional ANCOVA assumptions (report and comment only on the parts of the output most directly relevant to condition checking):

  • Independence of the covariate and the treatment effect (perform a one-way ANOVA test; there should be no statistically significant difference);
  • Equality of slopes (add and check significance of the interaction term);

- Report and briefly discuss your results.

Technical note: Make sure you obtain and examine Type III Sum of Squares (ss3). Also obtain estimates of 'least squares means' (lsmeans) which are means by treatment adjusted for the covariate.
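
The idea of a mean "adjusted for the covariate" can be sketched as follows. Under the equal-slopes ANCOVA model with one covariate, the textbook adjusted mean for a group is its raw mean minus the pooled within-group slope times the gap between the group's covariate mean and the overall covariate mean. The Python below is a minimal illustration of that formula only; the function names and the data are hypothetical, the values are invented, and the output of SAS lsmeans on the real data should be taken as the authoritative result.

```python
from statistics import mean


def pooled_slope(groups):
    """Common within-group slope of y on x; groups maps label -> list of (x, y)."""
    sxy = 0.0
    sxx = 0.0
    for pairs in groups.values():
        x_bar = mean(x for x, _ in pairs)
        y_bar = mean(y for _, y in pairs)
        sxy += sum((x - x_bar) * (y - y_bar) for x, y in pairs)
        sxx += sum((x - x_bar) ** 2 for x, _ in pairs)
    return sxy / sxx


def adjusted_means(groups):
    """Group means of y evaluated at the overall mean of the covariate x."""
    slope = pooled_slope(groups)
    grand_x = mean(x for pairs in groups.values() for x, _ in pairs)
    return {
        label: mean(y for _, y in pairs)
        - slope * (mean(x for x, _ in pairs) - grand_x)
        for label, pairs in groups.items()
    }


# Invented (atemp, casual) pairs, NOT taken from daily.sas7bdat.
toy = {
    "Saturday": [(0.2, 30.0), (0.4, 50.0), (0.6, 70.0)],
    "Wednesday": [(0.4, 20.0), (0.6, 40.0), (0.8, 60.0)],
}
adjusted = adjusted_means(toy)
print({day: round(value, 6) for day, value in adjusted.items()})
```

In this toy data the raw means are 50 and 40, a gap of 10, but the Wednesday observations happen to fall on warmer days. Once both groups are evaluated at the same covariate value, the adjusted means are 60 and 30, so the gap widens to 30. This is why the adjusted means, not the raw means, are the ones to compare in an ANCOVA.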

Question 3 -

(a) Carry out a one-way analysis of variance relating casual to season. Use contrasts to test at least one a-priori hypothesis of your choice. Also carry out appropriate post-hoc comparisons and discuss your results.

(b) Extend your analysis in part (a) to test whether there is evidence of interaction between season and the type of day (working day vs weekend or public holiday). Carry out appropriate post-hoc comparisons and discuss your results.

(c) The distribution of the number of casual users by season is actually not Normal, so a Kruskal-Wallis test may be more appropriate for relating casual to season. Carry out this test and, for post-hoc analysis, consider comparisons between summer and each of the other seasons. Discuss and compare your results to those in part (a).
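
The Kruskal-Wallis test in part (c) replaces the observations with their ranks in the pooled sample and compares rank sums across groups. The sketch below computes the H statistic in Python to show the mechanics; it is an illustration under stated assumptions, not the required SAS analysis. The function names and the data are hypothetical, the values are invented, and no tie correction is applied, so results from statistical software may differ slightly when the data contain ties.

```python
def average_ranks(values):
    """Rank values from 1 to n, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the run while the next sorted value equals the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared = (start + end) / 2 + 1  # mean of ranks start+1 .. end+1
        for k in range(start, end + 1):
            ranks[order[k]] = shared
        start = end + 1
    return ranks


def kruskal_wallis_h(groups):
    """H statistic without tie correction; groups maps label -> list of values."""
    labels = [g for g, values in groups.items() for _ in values]
    pooled = [x for values in groups.values() for x in values]
    ranks = average_ranks(pooled)
    n = len(pooled)
    total = 0.0
    for g, values in groups.items():
        rank_sum = sum(r for r, label in zip(ranks, labels) if label == g)
        total += rank_sum ** 2 / len(values)
    return 12.0 / (n * (n + 1)) * total - 3 * (n + 1)


# Invented toy counts, NOT taken from daily.sas7bdat.
toy = {
    "summer": [7.0, 8.0, 9.0],
    "spring": [4.0, 5.0, 6.0],
    "winter": [1.0, 2.0, 3.0],
}
h_stat = kruskal_wallis_h(toy)
print(round(h_stat, 4))  # 7.2
```

H is referred to a chi-square distribution with (number of groups - 1) degrees of freedom. For the post-hoc step, one common choice is a pairwise rank-based comparison of summer with each other season; with three such comparisons, a Bonferroni adjustment would compare each p-value with 0.05/3 rather than 0.05.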

Question 4 - Write a summary of your findings from Questions 1 to 3. Keep the technical details of the analyses that led you to these conclusions to the absolute minimum. Rather, focus on practical significance and present your findings in non-specialist terms. One page will be sufficient.
