Using the box and whiskers plot describe how age might


The coursework requires answers only to the questions asked. Please pay attention, because in some cases you need to answer only one of the remaining questions; no particular page length is required. Don't worry about the number of pages you see: it is due to the graphs and charts included to help you answer the questions. There are far fewer actual questions than pages.

Part 1:

Answer the following questions:

Q1: Using the Box and Whiskers Plot, describe how Age might influence the likelihood of receiving a tax penalty. Is the effect statistically significant? If so, at what level?
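One way to attach a significance level to a difference in ages between two groups is a two-sample permutation test on the group means. The sketch below is only an illustration. The ages are made up, `permutation_test` is a hypothetical helper, and the test behind Rattle's output is not stated here and may differ (for example, a t-test).

```python
import random
from statistics import mean


def permutation_test(group_a, group_b, n_permutations=2000, seed=0):
    """Two-sided permutation test for a difference in group means.

    Hypothetical helper for illustration. Returns (observed_difference, p_value).
    """
    rng = random.Random(seed)  # fixed seed so the result is reproducible
    observed = mean(group_a) - mean(group_b)
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    as_extreme = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)  # randomly reassign the group labels
        diff = mean(pooled[:n_a]) - mean(pooled[n_a:])
        if abs(diff) >= abs(observed):
            as_extreme += 1
    # Add-one correction: the estimated p-value can never be exactly zero.
    p_value = (as_extreme + 1) / (n_permutations + 1)
    return observed, p_value


# Made-up ages, NOT taken from the assignment data.
penalized = [52, 47, 61, 39, 58, 44, 66, 50]
not_penalized = [31, 28, 45, 36, 40, 29, 33, 38]
diff, p = permutation_test(penalized, not_penalized)
print(f"difference in mean age = {diff:.2f} years, p = {p:.4f}")
```

A p-value below 0.05 (or 0.01) would usually be read as significance at the 5% (or 1%) level.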

Q2: What information does the Histogram plot provide that might support a difference in ages between the two groups?

Q3: Using the Cumulative Distribution plot, describe where the differences between the two groups might be greatest.
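The vertical distance between two cumulative distribution curves shows how far apart the groups are at each age. The largest such distance is the two-sample Kolmogorov-Smirnov statistic. Below is a minimal sketch with made-up ages. The helpers `ecdf` and `largest_ecdf_gap` are hypothetical names, not Rattle functions. Exact fractions are used so that ties in the gap are not broken by floating-point noise.

```python
from bisect import bisect_right
from fractions import Fraction


def ecdf(sample):
    """Return the empirical CDF of `sample` as a function (exact fractions)."""
    xs = sorted(sample)
    n = len(xs)
    return lambda x: Fraction(bisect_right(xs, x), n)


def largest_ecdf_gap(a, b):
    """Return (x, gap) where the two empirical CDFs are furthest apart.

    Both ECDFs are step functions that only jump at observed values,
    so checking every observed value is enough to find the maximum.
    If several values tie, the smallest one is returned.
    """
    f_a, f_b = ecdf(a), ecdf(b)
    candidates = sorted(set(a) | set(b))
    best_x = max(candidates, key=lambda x: abs(f_a(x) - f_b(x)))
    return best_x, abs(f_a(best_x) - f_b(best_x))


# Made-up ages for two groups, NOT the assignment data.
group_1 = [25, 30, 35, 40, 45]
group_2 = [40, 45, 50, 55, 60]
x, gap = largest_ecdf_gap(group_1, group_2)
print(f"largest gap {float(gap):.2f} first reached at age {x}")
```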

Q4: Does Age follow Benford's Law?
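Benford's Law predicts that the first significant digit d (1 to 9) appears with proportion log10(1 + 1/d). Checking a variable therefore comes down to comparing observed first-digit proportions with those expected values. The sketch below assumes plain numeric values. The helper names are hypothetical, and the ages in the usage lines are invented.

```python
import math
from collections import Counter


def leading_digit(x):
    """First significant digit of a nonzero number, read from its decimal text."""
    text = str(abs(x)).lstrip("0.")  # drop leading zeros and the decimal point
    return int(text[0])


def benford_table(values):
    """Compare observed first-digit proportions with Benford's expected ones."""
    digits = [leading_digit(v) for v in values if v != 0]
    counts = Counter(digits)
    n = len(digits)
    rows = []
    for d in range(1, 10):
        observed = counts.get(d, 0) / n
        expected = math.log10(1 + 1 / d)  # Benford: P(d) = log10(1 + 1/d)
        rows.append((d, observed, expected))
    return rows


# Made-up ages, NOT the assignment data.
ages = [38, 72, 32, 33, 30, 43, 21, 57, 41, 25]
for d, obs, exp in benford_table(ages):
    print(f"{d}: observed {obs:.2f}  expected {exp:.3f}")
```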

Q5: Comparing the Box and Whiskers plots of Age and Income, which variable should concern us most with respect to outliers? Why?

Enter your answer in a red font.
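Box plots conventionally draw whiskers out to 1.5 times the interquartile range (IQR) beyond the quartiles. Points outside those fences are plotted individually as potential outliers. The sketch below assumes Rattle's plots follow this common convention. It also assumes that quartiles computed with Python's `method="inclusive"` are close enough to the plot's hinges, which may be computed slightly differently. The incomes are made up.

```python
from statistics import quantiles


def tukey_outliers(values, k=1.5):
    """Flag values beyond the box-plot fences Q1 - k*IQR and Q3 + k*IQR."""
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    flagged = [v for v in values if v < lower or v > upper]
    return flagged, (lower, upper)


# Made-up incomes, NOT the assignment data.
incomes = [20_000, 35_000, 42_000, 50_000, 58_000, 61_000, 75_000, 90_000, 480_000]
flagged, (lower, upper) = tukey_outliers(incomes)
print(flagged, lower, upper)
```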

Q6: What do you think the relationship is between Income and the IRS's assessment of additional taxes owed? Support your answer.
Enter your answer in a red font.

Q7: How are the distributions of Age and Income different?

Q8: Which group does not seem to follow Benford's Law for the Income variable?

Q9: What should you keep in mind regarding the possible relationship between Employment and audit results based on the three plots?

Q10: What should you keep in mind regarding the possible relationship between Education and audit results based on the mosaic plot?
Enter your answer in a red font.

Q11: (a) Datasets often contain variables with ambiguous values. In this case, the definitions of the levels of Marital are not clear. Find the official marital status definitions that the IRS uses for filing status, and discuss how you would map the IRS values to the levels provided.

(b) What should you keep in mind regarding the possible relationship between Marital Status and audit results based on the mosaic plot?
Enter your answer in a red font.

Q12: What should you keep in mind regarding the possible relationship between Occupation and audit results based on the mosaic plot?
Enter your answer in a red font.

Q13: What should you keep in mind regarding the possible relationship between Gender and audit results based on the mosaic plot?

Q14: Based on the Pairwise Comparison plots, is there a meaningful relationship between Age and Income? Explain why.

Q15: Based on the two box plots of Employment X Age and Employment X Income, which numerical variable is more likely to be related to Employment?

Q16: If it is determined that the Marital level labeled Absent is in fact a missing observation, what problems might we have with it and with Age?

Q17: Looking at the Gender X Income box plot, describe why females are making more money than males. Base your answer on the numerical characteristics of the distributions.

Q18: What is the direction of the relationship of the variables Age and Income based on the correlation diagram?
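The direction of a linear relationship is given by the sign of the correlation coefficient: positive means the variables tend to rise together, and negative means one tends to fall as the other rises. The sketch below assumes the correlation diagram shows Pearson correlations. The helper `pearson_r` and the (age, income) pairs are hypothetical.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation coefficient; its sign gives the direction."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


# Made-up (age, income) pairs, NOT the assignment data.
ages = [22, 29, 35, 41, 48, 56]
incomes = [28_000, 41_000, 39_000, 62_000, 58_000, 71_000]
r = pearson_r(ages, incomes)
direction = "positive" if r > 0 else "negative" if r < 0 else "no linear"
print(f"r = {r:.2f} ({direction} direction)")
```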

Q19: What might explain the high correlation between the variables Employment and Occupation?

Q20: Using the information in the dendrogram, which variable, Age or Income, is likely to influence the variable Target Adjusted the most?

Part 2:
Q1: What potential problems do you see in the Marital Status variable?
HINT: Consider the definition of Absent.

Q2: What do you consider to be the most important characteristic of the Employment distribution?
HINT: Consider whether certain categories are much larger or smaller than others.

Q3: What do you consider to be the most important characteristic of the Income distribution?
HINT: Consider the shape of the distribution and its basic parameters such as mean, median, mode, and range.

Q4: What do you consider to be the most important characteristic of the Deductions distribution?
HINT: Consider whether certain values occur much more frequently than others.

Q5: How would you summarize the relative quality of the data so far, especially with regard to missing observations?

Q6: The Age distribution summary shows a value of 28 under the heading .25. In your own words, describe what this represents.
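A value listed under a heading such as .25 in a distribution summary is a quantile. It can be computed directly, and one can then count what share of the data falls at or below it. The sketch below uses invented ages. It assumes the summary interpolates like R's default quantile rule (type 7), which Python's `method="inclusive"` reproduces.

```python
from statistics import quantiles

# Made-up ages, NOT the assignment data.
ages = [19, 23, 27, 28, 31, 35, 38, 42, 47, 51, 58, 64]

# method="inclusive" uses the same linear interpolation as R's default (type 7).
q25, q50, q75 = quantiles(ages, n=4, method="inclusive")

# Share of observations at or below the .25 quantile.
share_at_or_below_q25 = sum(a <= q25 for a in ages) / len(ages)
print(q25, q50, q75, share_at_or_below_q25)
```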

Q7: The Income variable has a mean of $84,688 and a 50% (median) value of $59,769. Why do you think the mean is so much larger?

HINT: Consider how extreme (very large and very small) values affect the mean and median.
Enter your answer in a red font.
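The hint can be checked numerically by adding one very large value to a small sample and watching how the mean and median respond. The incomes below are invented purely for illustration.

```python
from statistics import mean, median

# Made-up incomes, NOT the assignment data.
incomes = [32_000, 41_000, 48_000, 55_000, 60_000, 66_000, 72_000]
with_one_extreme = incomes + [900_000]  # add a single very large income

for label, data in [("original", incomes), ("with extreme", with_one_extreme)]:
    print(f"{label:>12}: mean = {mean(data):>10,.0f}  median = {median(data):>8,.0f}")
```

The single extreme value moves the mean by roughly a factor of three, while the median shifts only from one middle value to the average of the two middle values.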

Q8: For which one of the categorical variables (if any) would this not be an appropriate order when presenting the results of this project? Why?

HINT: This question concerns the order in which the individual values within a categorical variable are listed in the Rattle output. For example, for the categorical variable Employment, the values are listed as Consultant, Private, PSFederal, PSLocal, and so on. Look at each categorical variable and consider whether the order in which its values are listed is logical, or whether a different ordering would make more sense.

Q9: If the unit of measure of Age is years, what is the unit of measure for the Variance? For the Standard Deviation?

Q10: Describe in your own words what being 95% confident means in this circumstance.

Q11: Compare the sizes of the confidence intervals for Age and Income. How can you compare the differences?
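Age is measured in years and Income in dollars, so the raw widths of their confidence intervals cannot be compared directly. One option, which is an assumption and not necessarily what the assignment expects, is to divide each interval's half-width by its mean. The sketch uses a normal (z) approximation. For small samples a t-based interval is wider, and the interval Rattle reports may be t-based. The helper names and data are hypothetical.

```python
from math import sqrt
from statistics import NormalDist, mean, stdev


def mean_ci(values, confidence=0.95):
    """Normal-approximation confidence interval for the mean."""
    n = len(values)
    m = mean(values)
    se = stdev(values) / sqrt(n)  # standard error of the mean
    z = NormalDist().inv_cdf(0.5 + confidence / 2)  # about 1.96 for 95%
    return m - z * se, m + z * se


def relative_half_width(values, confidence=0.95):
    """Half-width of the interval divided by the mean (unit-free)."""
    lower, upper = mean_ci(values, confidence)
    return (upper - lower) / 2 / mean(values)


# Made-up samples, NOT the assignment data.
ages = [23, 31, 38, 44, 29, 52, 61, 35, 47, 40]
incomes = [18_000, 95_000, 42_000, 310_000, 61_000, 27_000, 150_000, 73_000, 39_000, 55_000]
print("Age CI:", mean_ci(ages), "relative:", relative_half_width(ages))
print("Income CI:", mean_ci(incomes), "relative:", relative_half_width(incomes))
```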

Q12: Compare the kurtosis values for Age, Income and Deductions. Which one is the most peaked distribution?

Q13: Compare the skewness values for Age, Income and Deductions. Which one is the most skewed, and in what direction?
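Skewness and kurtosis are built from the third and fourth central moments. Positive skewness indicates a longer right tail. Larger excess kurtosis indicates heavier tails and usually a sharper peak than a normal distribution, which has excess kurtosis 0. The sketch below uses the simple moment-based definitions. Rattle's summary may apply a slightly different small-sample formula, so its numbers need not match exactly. The helper name and data are hypothetical.

```python
def moment_skew_kurtosis(values):
    """Moment-based skewness (g1) and excess kurtosis (g2)."""
    n = len(values)
    m = sum(values) / n
    m2 = sum((v - m) ** 2 for v in values) / n  # population variance
    m3 = sum((v - m) ** 3 for v in values) / n
    m4 = sum((v - m) ** 4 for v in values) / n
    skewness = m3 / m2 ** 1.5
    excess_kurtosis = m4 / m2 ** 2 - 3  # a normal distribution has 0
    return skewness, excess_kurtosis


# Made-up data, NOT the assignment values.
symmetric = [1, 2, 3, 4, 5]
right_skewed = [1, 1, 1, 2, 10]
print(moment_skew_kurtosis(symmetric))
print(moment_skew_kurtosis(right_skewed))
```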

Attachment: Assignment.rar