Mat10251 statistical analysis project - what was the


Project Data

The data for this project can be accessed from the MySCU site for MAT10251, in Project under Assessment.

The data set provided contains 10 randomly chosen samples of size 116.

To obtain your data
(1) Click on the 'Project Data' file. This will download an Excel file.
(2) Select the 6 columns (bwt to smoke) of data for the sample specified by the last digit of your student ID number.
(3) Copy this into a new Excel file.

There are 10 sample data sets, each of 6 columns (bwt to smoke).

Your sample number matches the last digit of your SCU student ID number. For example, if your student ID number ends in 1, your sample is Sample 1 and you will be analysing the birth data in columns H to M.

Project Situation

Assume that you are a research assistant engaged to investigate and report on the particulars and circumstances of new-borns.

You will be required to use your sample of 116 births to answer several questions relating to the sample or population. For this project, you will assume that the 116 births surveyed were randomly selected from the population of all births in your particular LHD in 2002.

In each part of the project you are required to analyse your sample data in response to given questions and provide a written answer.

You can assume that the written answers are components of a longer report on the birth particulars of children born in the LHD.

To give your written task a context you should construct a scenario for the report.

Each written answer should be a Word document into which your Excel, or similar, output has been copied.

Project Preparation

You are expected to use Excel, or another spreadsheet or statistical package, when completing the project.

Your written answers presenting findings and conclusions should be considered as a part of a longer report. Each written answer should be a Word document into which your Excel, or similar, output has been copied.

In addition, your statistical workings for Parts B and C should appear as appendices to your written answers. These should include all necessary steps and appropriate Excel output.

The written answers, with appendices for Parts B and C, should each be submitted as a single Word document.

In preparing your appendices you may use one of the following formats:

- Word with Excel output added.
- Handwritten with Excel output added. This will then need to be scanned and added to your Word document.

Notes
- You should not need to read beyond the study guide and textbook to complete the project.
- You probably will not need to reference, but if you do, use any consistent referencing style.

Data Analysis Project - Part A

Purpose: To
- introduce you to the project data, questions and Excel
- use Excel to graph data and calculate summary statistics
- interpret and communicate Excel results.
Full marks for Part A will be given for successful submission of an acceptable attempt.


Part A Question
Provide information on the birth weight of babies born in 2002 in a certain Local Health District (LHD). In particular, what are the minimum and maximum weights, what is the average weight, and how do these weights vary? Also, within what estimated range would you expect a typical baby's weight to lie?

Tasks - Part A Submission
Complete the following tasks
1) Download and save your data.
2) Download the Project Part A cover sheets, name and save this file as "Family Name_First Name_Part_A_Campus".
3) Enter your Sample Number on page 2 of the Part A coversheets.
4) Statistical Tasks
Using the bwt data, explore the birth weights of the LHD children by using Excel to:
- Construct a frequency histogram or polygon
- Calculate the summary statistics
Note: bwt data is measured in grams (gms)


5) Written Task
Using the instructions given on page four of the Part A coversheets, introduce your data and the results of your preliminary investigation of birth weights for children in the LHD in 2002.
This should be one to two pages and 300 to 500 words.
Use an appropriate style, without statistical jargon and equations, to clearly communicate your results.

6) Complete Coversheets 1 and 2, then save and submit Part A of the project by the due date Tuesday 22nd November 2016.
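
The histogram in step 4 depends on classes that cover the data with no gaps between them. As a rough illustration only, the Python sketch below tallies weights into equal-width classes. The weights, the class width of 500 g and the function name `frequency_table` are invented for this example and are not taken from the project data. The project itself expects Excel, or similar, for the actual graph.

```python
def frequency_table(values, start, width):
    """Count values in contiguous classes [lower, lower + width)."""
    if width <= 0:
        raise ValueError("width must be positive")
    if min(values) < start:
        raise ValueError("start must not exceed the smallest value")
    n_classes = int((max(values) - start) // width) + 1
    counts = [0] * n_classes
    for v in values:
        counts[int((v - start) // width)] += 1
    # Each row is (lower limit, upper limit, frequency)
    return [(start + i * width, start + (i + 1) * width, c)
            for i, c in enumerate(counts)]


# Hypothetical birth weights in grams (NOT the project data)
example_bwt = [2450, 2980, 3120, 3300, 3410, 3560, 3620, 3790, 4010, 4380]

table = frequency_table(example_bwt, start=2000, width=500)
for lower, upper, count in table:
    print(f"{lower} to under {upper} g: {count}")
```

Because every class starts where the previous one ends, a histogram drawn from this table has no gaps between adjacent bars.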

Statistical Calculation
- To obtain full marks your graph must be correct, including correct labels on both axes and a title. Marks will be deducted if:
  - Graph incorrect (e.g. gaps between classes of non-zero frequency in a histogram for continuous data)
  - Excel, or similar, is not used
  - Axes incorrectly or not labelled
  - No title
  - Inappropriate classes are used
  - Scale on axes distorts graphs.
- To obtain full marks for the summary statistics copy the output table of the Descriptive Statistics command in Data Analysis. You may delete unnecessary statistics in this table; you may also include other statistics, for example quartiles.
- Marks will be deducted if this table is incorrect, so check:
  - Your sample size
  - Whether you are calculating sample statistics or population parameters.
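
The last check above, sample statistics versus population parameters, mostly comes down to which divisor is used for the standard deviation. The short Python sketch below shows the difference on a small set of invented weights (not the project data). It is only an illustration; the marked output should still come from Excel's Descriptive Statistics command.

```python
import statistics

# Hypothetical birth weights in grams (NOT the project data)
example_bwt = [2450, 2980, 3120, 3300, 3410, 3560, 3620, 3790, 4010, 4380]

n = len(example_bwt)
mean = statistics.mean(example_bwt)
minimum, maximum = min(example_bwt), max(example_bwt)

# Sample standard deviation: divides the sum of squares by n - 1
sample_sd = statistics.stdev(example_bwt)
# Population standard deviation: divides the sum of squares by n
population_sd = statistics.pstdev(example_bwt)

print(f"n = {n}, mean = {mean} g, min = {minimum} g, max = {maximum} g")
print(f"sample SD = {sample_sd:.1f} g, population SD = {population_sd:.1f} g")
```

Since the 116 births are a sample from all births in the LHD, the sample version (divisor n - 1) is the one that matches the project situation.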

Written Task - Report
- 300 to 500 words and 1 to 2 pages - marks will be deducted if this is greatly exceeded.
- To obtain full marks this must:
  - Be well structured
  - Clearly communicate the results of the Excel output in language appropriate for your audience
  - Include appropriate graph and summary statistics.
  - Provide information on average birth weight of children, how the weights vary and any pattern to these weights.
- Marks will be deducted if:
  - There is little or no comment on, or interpretation of, the Excel output
  - Unnecessary statistical jargon and equations appear
  - It is confusing or not readable
  - It is handwritten
- For each major spelling and/or grammatical error half a mark will be deducted, up to a maximum of two marks.
- Also up to two marks may be deducted for poor structure and/or presentation.

Data Analysis Project - Part B

Purpose: To
- obtain feedback on your submission in Part A and to gain experience in self-evaluation of submitted work
- apply your knowledge of statistical inference to answer questions about birth data by analysing the data and communicating the results.

Part B Submission
You should submit one Word document consisting of:
- Part B coversheets - first four pages, including completed self-marking sheet for Part A with reflection.
- Copy of your Part A submission
- Written answer for Part B - this should follow the format given on page 5 of Part B coversheets
- Appendices for Part B, which contain full statistical working for the required statistical tasks.

Tasks

Task 1 Part A Self-Marking - 5 marks

When directed to do so during Week 5, complete the following tasks:
1) Open your saved copy of your submission for Part A.
2) Replace the Part A coversheets (three pages) with the Part B coversheets (first four pages).
3) Rename and save this file as
"Family Name_First Name_Part_B_Campus".
4) Use the solution template and marking guide provided to mark your submission for Part A, and enter recommended marks on the self-marking sheet for Part A, page 3 of the file in 3) above.
5) Write a short (approximately 200 words) reflection/feedback on your submission and marking of Part A. In particular:
- consider the good aspects of your submission and what you did well,
- identify where you made mistakes, and how you would avoid them in the future,
- consider what you learnt from submitting and marking Part A.
This is to be entered in the space at the bottom of the self-marking sheet for Part A.
6) Save the file. This is to be submitted with Part B - due Tuesday 3rd January 2017.

Task 2 Part B - Appendix Statistical Inference Tasks

The following statistical tasks should appear as appendices to your written answer. This should include all necessary steps and appropriate Excel, or equivalent, output.
These appendices should come after your written answer within your single Word document for Part B.
In preparing your appendices you may use one of the following formats:
- Word with Excel output added.
- Handwritten with Excel output added. This will then need to be scanned and added to your Word document.

Statistical Inference Tasks

Choose a level of significance for any hypothesis tests and a level of confidence for any confidence intervals. Enter these values on page 2 of the Part B cover sheets along with the sample number from Part A.

Question 1 - Topic 5

You are asked for an estimate of the proportion of older women giving birth in your LHD in 2002.
To provide this estimate, use the age data and an appropriate statistical inference technique to answer the following question:

What was the proportion of babies born in your Local Health District in 2002 to women who were at least 35 years of age?
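
One common way to estimate a proportion such as this is a confidence interval based on the normal approximation. The Python sketch below shows the arithmetic under assumptions that are purely illustrative: the count of 20 mothers aged 35 or over is invented, the 95% level is just one possible choice, and the function name `proportion_ci` is hypothetical. The study guide for Topic 5 remains the authority on which technique is expected.

```python
import math
from statistics import NormalDist


def proportion_ci(successes, n, confidence=0.95):
    """Normal-approximation confidence interval for a proportion."""
    p_hat = successes / n
    # z value leaving (1 - confidence) / 2 in each tail
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    margin = z * math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat, p_hat - margin, p_hat + margin


# Hypothetical count: 20 of the 116 mothers aged 35 or over (NOT real data)
p_hat, lower, upper = proportion_ci(successes=20, n=116, confidence=0.95)
print(f"sample proportion = {p_hat:.3f}")
print(f"95% interval: {lower:.3f} to {upper:.3f}")
```

The approximation is usually considered reasonable only when both the number of "successes" and the number of "failures" in the sample are not too small, which is an assumption worth stating in the appendix.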

Question 2 - Topic 6
It has been reported that the average weight of babies of European heritage is 3.5 kg. You are asked to investigate whether this was also true for babies in your LHD in 2002.
To provide a justified answer to this question, use the bwt data and an appropriate statistical inference technique to answer the following question:

In your Local Health District in 2002 was the average birth weight significantly different to the European average?
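
Because the question asks whether the average is "different", a two-sided test of H0: mean = 3500 g against H1: mean is not 3500 g is one natural reading (3.5 kg is 3500 g, since bwt is recorded in grams). The Python sketch below computes the one-sample t statistic for a handful of invented weights. The data and the function name `one_sample_t` are hypothetical, and the choice of a t-test is an assumption rather than the prescribed method.

```python
import math
import statistics


def one_sample_t(values, mu0):
    """Return the t statistic and degrees of freedom for H0: mean = mu0."""
    n = len(values)
    sample_mean = statistics.mean(values)
    standard_error = statistics.stdev(values) / math.sqrt(n)
    t = (sample_mean - mu0) / standard_error
    return t, n - 1


# Hypothetical birth weights in grams (NOT the project data)
example_bwt = [2450, 2980, 3120, 3300, 3410, 3560, 3620, 3790, 4010, 4380]

t_stat, df = one_sample_t(example_bwt, mu0=3500)
print(f"t = {t_stat:.3f} with {df} degrees of freedom")
```

The p-value then comes from the t distribution with those degrees of freedom (in Excel, for example, via `T.DIST.2T` applied to the absolute value of t) and is compared with the chosen level of significance.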

Notes:
- You may need to transform or manipulate your sample data, before using Excel, or equivalent, for the required statistical calculations.
- Use Excel, or similar, for statistical calculations. You do not need to repeat any Excel calculations by hand. However, make sure that you define your random variables and include any steps not given by Excel. For example, in a hypothesis test include the null and alternative hypotheses, along with the decision to reject or not reject the null hypothesis.
- Mention any assumptions you need to make.
- Comment on why the test/confidence interval has been chosen.
- Make sure you interpret confidence intervals and write a conclusion to hypothesis tests.
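
The first note above, about transforming or manipulating the sample data, might look like the following in practice. This Python sketch recodes age into a 0/1 indicator for "at least 35" and converts grams to kilograms. All values are invented for illustration; in Excel the same count could be obtained with a function such as `COUNTIF`.

```python
# Hypothetical values (NOT the project data)
example_age = [22, 35, 41, 29, 34, 38]                  # mother's age in years
example_bwt_g = [3100, 3650, 2900, 3400, 3550, 3300]    # birth weight in grams

# Recode age as 1 if the mother was at least 35, otherwise 0
at_least_35 = [1 if age >= 35 else 0 for age in example_age]
count_35_plus = sum(at_least_35)

# Convert grams to kilograms so weights compare directly with 3.5 kg
bwt_kg = [grams / 1000 for grams in example_bwt_g]

print(at_least_35, count_35_plus)
print(bwt_kg)
```

Either convert the data to kilograms or express the reported average in grams; mixing the two units in one calculation is an easy mistake to make.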

Part B Written Task

For each question present the results of your calculations, with your interpretation and conclusion as part of a report.
Use the instructions given on page five of the Part B coversheets. This should be one to three pages and 200 to 400 words.
It should be submitted as a Word file with Excel output added.

Make sure you:
- Introduce each question and put it in context.
- Answer each question in non-statistical language
- Present the results of your intervals or tests without unnecessary statistical jargon
- Include conclusions which answer the given questions.

Part B

Read these marking criteria carefully and consider them when preparing Part B. See the marking and feedback sheet, page 4 of Part B coversheets, for allocation of marks.

Part A Self-Marking

Full marks will be given for an "acceptable self-marking and reflection". This means that the majority of errors (in particular major or obvious errors) are recognised and considered in the marking and reflection.
Zero marks will be given if there is no or minimal reflection and/or self-marking, or if major errors are not recognised.

Statistical Calculation
- For the intervals and tests marks will be given for:
  - Choice of appropriate statistical technique/s.
  - Random variable defined
  - Correct hypotheses for a test
  - Correct statistical calculations, including Excel
  - Correct interpretation of results.

Written Task - Report
- 200 to 400 words and one to three pages - marks will be deducted if this is greatly exceeded.
- To obtain full marks this must:
  - Be well structured and analysed
  - Answer the questions and clearly communicate the results of the Excel output in language appropriate for your audience.
  - Include an introduction to and conclusion for each question.
  - Include appropriate Excel output.

Data Analysis Project - Part C

Purpose: To answer questions about birth data by applying your knowledge of statistical inference and regression and correlation and to communicate the results.

Task 1 - Part C Appendix Statistical Inference and Regression and Correlation Tasks

The following statistical tasks should appear as appendices to your written answer. This should include all necessary steps and appropriate Excel, or equivalent, output.
These appendices should come after your written answer within your single Word document for Part C.
In preparing your appendices you may use one of the following formats:
- Word with Excel output added.
- Handwritten with Excel output added. This will then need to be scanned and added to your Word document.

Choose a level of significance for any hypothesis tests and a level of confidence for any confidence intervals. Enter these values on page 2 of the Part C cover sheets along with the sample number from Part A.
Use your sample and appropriate statistical inference and regression and correlation techniques to answer the following questions.

Question 1 Statistical Inference Topic 7

You are asked if the birth weights, in your LHD, of children born to non-smoking mothers are usually higher than those born to smokers.
Use the bwt and smoke data and an appropriate statistical inference technique to answer the following question:
On average, are the birth weights of babies born to non-smoking mothers greater than those born to smokers?
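
A comparison of two independent group means is one plausible technique here, with a one-sided alternative because the question asks whether non-smokers' babies are heavier. The Python sketch below splits weights by the smoke code (0 is non-smoker, 1 is smoker) and computes a two-sample t statistic that does not assume equal variances. The data and the function name `welch_t` are invented, and the unequal-variance form is an assumption; the study guide may prefer the pooled-variance version.

```python
import math
import statistics


def welch_t(sample_a, sample_b):
    """t statistic and approximate degrees of freedom, unequal variances."""
    na, nb = len(sample_a), len(sample_b)
    va, vb = statistics.variance(sample_a), statistics.variance(sample_b)
    se_squared = va / na + vb / nb
    t = (statistics.mean(sample_a) - statistics.mean(sample_b)) / math.sqrt(se_squared)
    # Welch-Satterthwaite approximation for the degrees of freedom
    df = se_squared ** 2 / ((va / na) ** 2 / (na - 1) + (vb / nb) ** 2 / (nb - 1))
    return t, df


# Hypothetical data (NOT the project data): weights in grams and smoke codes
bwt = [3500, 3200, 3700, 2900, 3600, 3100, 3400, 3000]
smoke = [0, 1, 0, 1, 0, 1, 0, 1]

non_smokers = [w for w, s in zip(bwt, smoke) if s == 0]
smokers = [w for w, s in zip(bwt, smoke) if s == 1]

t_stat, df = welch_t(non_smokers, smokers)
print(f"t = {t_stat:.3f}, df = {df:.1f}")
```

A positive t statistic points in the direction of heavier babies for non-smokers; whether it is significant depends on the one-tailed p-value at the chosen level of significance.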

Question 2 Simple Linear Regression model Topic 8

You are asked how the length of a pregnancy influences the birth weight.
Use gestation (independent variable, measured in days) and bwt (dependent variable, in gms) to model the relationship between length of pregnancy and birth weight.

Explore this relationship by

1. Plotting data with a scatter plot.

2. Calculating the least squares regression line, correlation coefficient and coefficient of determination.
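
The quantities in step 2 can be written out directly from their definitions. The Python sketch below does so for five invented (gestation, bwt) pairs; the data and the function name `simple_regression` are hypothetical. In Excel, the Regression tool in Data Analysis, or functions such as `SLOPE`, `INTERCEPT`, `CORREL` and `RSQ`, report the same quantities.

```python
import math


def simple_regression(x, y):
    """Return slope, intercept, correlation r and r squared."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r = sxy / math.sqrt(sxx * syy)
    return slope, intercept, r, r ** 2


# Hypothetical data (NOT the project data)
gestation = [266, 273, 280, 287, 294]     # days
bwt = [3000, 3250, 3400, 3500, 3800]      # grams

slope, intercept, r, r_sq = simple_regression(gestation, bwt)
print(f"bwt = {intercept:.1f} + {slope:.2f} * gestation")
print(f"r = {r:.3f}, r squared = {r_sq:.3f}")
```

Here the slope is the estimated change in birth weight, in grams, for each extra day of gestation. The intercept corresponds to a gestation of zero days, which lies far outside the data, so it should be interpreted with caution.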

Question 3 Multiple Linear Regression model Topic 9

You are asked what other factors may have an influence on birth weight.
Explore this by adding age (of mother, in years), height (of mother, in cms), weight (of mother before pregnancy, in kgs), and smoke (smoking status of mother, where 0 is non-smoker, 1 is smoker) as additional independent variables to the regression model developed in Question 2 and
a) Calculating the multiple regression equation, multiple correlation coefficient, and coefficient of multiple determination.
b) Using appropriate tests to determine which independent variables make a significant contribution to the regression model.
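
For part a), the multiple regression coefficients are the least squares solution with several predictors. The Python sketch below solves the normal equations by Gauss-Jordan elimination for two predictors only (gestation and smoke). The data were constructed to lie exactly on the plane bwt = -4000 + 27 * gestation - 200 * smoke, so the output simply recovers those made-up coefficients. The function names are hypothetical. The significance tests in part b) are not computed here; they would normally be read from Excel's Regression output.

```python
def fit_multiple_regression(rows, y):
    """Least squares coefficients [b0, b1, ..., bk] via the normal equations."""
    X = [[1.0] + [float(v) for v in row] for row in rows]  # intercept column first
    k = len(X[0])
    xtx = [[sum(r[i] * r[j] for r in X) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(X, y)) for i in range(k)]
    aug = [xtx[i] + [xty[i]] for i in range(k)]  # augmented matrix [X'X | X'y]
    for col in range(k):
        # Partial pivoting: use the row with the largest entry in this column
        pivot = max(range(col, k), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("predictors are collinear; cannot solve")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(k):
            if r != col:
                factor = aug[r][col] / aug[col][col]
                aug[r] = [a - factor * b for a, b in zip(aug[r], aug[col])]
    return [aug[i][k] / aug[i][i] for i in range(k)]


def r_squared(rows, y, coefs):
    """Coefficient of multiple determination for the fitted model."""
    fitted = [coefs[0] + sum(b * v for b, v in zip(coefs[1:], row)) for row in rows]
    mean_y = sum(y) / len(y)
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return 1 - ss_res / ss_tot


# Constructed example (NOT the project data): (gestation in days, smoke code)
rows = [(270, 0), (275, 1), (280, 0), (285, 1), (290, 0), (295, 1)]
bwt = [3290, 3225, 3560, 3495, 3830, 3765]

coefs = fit_multiple_regression(rows, bwt)
r_sq = r_squared(rows, bwt, coefs)
print([round(c, 3) for c in coefs], round(r_sq, 6))
```

The multiple correlation coefficient is the square root of the coefficient of multiple determination. With real data the fit will not be exact, and each coefficient is interpreted as the estimated change in birth weight for a one-unit change in that variable with the other variables held fixed.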

Task 2 - Written Answer - Report
For each question present the results of your calculations, with your interpretation and conclusion, as part of a larger report.
Use the instructions given on pages four and five of the Part C coversheets. This should be 500 to 900 words and three to seven pages.
It should be submitted as a Word file with Excel output embedded.

Make sure you:
- Introduce each question and put it in context
- Answer the questions in non-statistical language.
- Present the result of your procedures, intervals and/or tests without unnecessary statistical jargon
- Include conclusions which answer the given questions.

In particular, for Question 2
- Explain the choice of independent and dependent variables.
- Include your graph.
- From your scatter plot discuss any apparent relationship between gestation and birth weight. Comment on the strength, shape and sign of the relationship.
- Interpret the gradient and vertical intercept of the simple linear regression equation.
- Discuss and interpret the values of the correlation coefficient and coefficient of determination. In particular, are these values consistent with your graph?
- Mention any concerns you may have about the validity of your results due to a non-linear relationship, extreme values etc.

In particular, for Question 3
- Interpret the values of the multiple regression coefficients. Compare these with the corresponding values in the simple linear regression model.
- Discuss and interpret the values of the multiple correlation coefficient and coefficient of multiple determination. In particular, compare these with the corresponding values for the simple linear regression model.
- Include and justify your recommended/preferred regression model.

Written Answer - Report
- 500 to 900 words and three to seven pages - marks will be deducted if this is greatly exceeded.
- To obtain full marks this must:
  - Be well structured and analysed
  - Clearly communicate the results of the Excel output in language appropriate for your audience
  - Include an introduction to each question and your conclusions
  - Include appropriate Excel output
  - Answer the questions in non-statistical language.
- Marks will be deducted if:
  - There is little or no comment on, or interpretation of, the Excel output
  - Unnecessary statistical jargon and equations appear
  - It is confusing or not readable
- For each major spelling and/or grammatical error half a mark will be deducted, up to a maximum of two marks.
- Also up to two marks may be deducted for poor structure and presentation.

Attachment:- Project Data Session.xls
