Why are squared values used so often when calculating


Question 1. Inferential statistics (choose each correct answer)
a. Describe your dataset
b. Draw conclusions about the populations
c. Identify patterns
d. Predict future observations

Question 2. Quartiles are the 75th (Q3) and 25th (Q1) percentiles.

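Question 3. Why are squared values used so often when calculating statistics? Squaring makes every deviation from the mean non-negative, so positive and negative deviations cannot cancel each other out. It also gives larger deviations more weight, and squared quantities are easier to work with mathematically: the variance and least-squares estimates have simple closed forms, and these are the quantities used for further interpretation of results.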
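A minimal sketch of why squaring helps, using made-up numbers that are not part of the quiz: raw deviations from the mean always sum to zero, so they cannot measure spread, while squared deviations cannot cancel.

```python
# Hypothetical sample; the values are illustrative only.
data = [2, 4, 4, 5, 10]
mean = sum(data) / len(data)

# Deviations of each observation from the mean.
deviations = [x - mean for x in data]

# Positive and negative deviations cancel, so this sum is zero.
sum_dev = sum(deviations)

# Squares are non-negative, so nothing cancels.
sum_sq_dev = sum(d ** 2 for d in deviations)

# Sample variance divides by n - 1.
sample_variance = sum_sq_dev / (len(data) - 1)

print(sum_dev, sum_sq_dev, sample_variance)
```

The square root of the sample variance is the standard deviation, which is back in the original units of the data.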

Question 4. Variance is a description of
a. Range
b. Standard deviation
c. How dispersed the data is around the mean
d. Error

Question 5. An estimated mean value from your sample is inferring the true mean value of the population.

Question 6. The value describing the most frequent response of a variable is the
a. Mean
b. Median
c. Mode
d. Weighted mean

Question 7. There can be more than one mode in a dataset (T/F). TRUE

Question 8. In a symmetrically distributed dataset, the mean is the same as the median. (T/F) TRUE

Question 9. Data that is skewed is best described by the
a. Mean
b. Median
c. Mode
d. Weighted mean
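A short sketch comparing the mean and median on two made-up datasets (the numbers are assumptions chosen for illustration): in the symmetric one the two agree, while a single extreme value pulls the mean but leaves the median where it was.

```python
from statistics import mean, median

# Hypothetical data, for illustration only.
symmetric = [1, 2, 3, 4, 5]
skewed = [1, 2, 3, 4, 50]  # one large value creates a right skew

sym_mean, sym_median = mean(symmetric), median(symmetric)
skew_mean, skew_median = mean(skewed), median(skewed)

print("symmetric:", sym_mean, sym_median)
print("skewed:   ", skew_mean, skew_median)
```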

Question 10. When setting alpha for a hypothesis test, are you controlling the Type I or Type II error rate? ______ A Type I error is when we wrongly reject a true null hypothesis. Type I errors cannot be avoided entirely, because they are part of the design of the statistical test, but they can be made less common by setting the significance level (alpha, commonly 5%) lower. However, if we set the significance level lower, say to 1%, that increases the chance of a Type II error.

A Type II error is when you wrongly fail to reject a false null hypothesis.

Question 11. Which of the following statements is true?
a. Linear Regression error values have to be normally distributed, but Logistic Regression error values do not
b. Logistic Regression error values have to be normally distributed, but Linear Regression error values do not
c. Both Linear Regression and Logistic Regression error values have to be normally distributed
d. Neither Linear Regression nor Logistic Regression error values have to be normally distributed

Question 12. A boxplot graphically displays
a. Median, interquartile range, outliers
b. Mean, +/- 2 standard deviations, min and max
c. Mean, interquartile range, min and max
d. Median, +/- 1.96 standard deviations, outliers


For questions 13-15, consider a dataset of 1000 observations with the variables Treatment (placebo, drug), Response (improved, did not improve), age, weight, zip code, gender, and race.

Question 13. Write null and alternative hypothesis to test for an association between Response and Treatment:
Null hypothesis: µ1 - µ2 = 0 Alternative hypothesis: µ1 < µ2, or equivalently µ1 - µ2 < 0 [Mean reaction time is lower for the young group]

Question 14. What is the best test statistic to use for this hypothesis? H0: p = .15 Ha: p > .15

Question 15. What is the best test statistic to identify what factors influence treatment response?

First, compute p̂ = 80/400 = .20. Then z = (p̂ - p0) / sqrt(p0(1 - p0)/n) = (.20 - .15) / sqrt(.15(.85)/400) = .05/.01785 = 2.80
________________
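The one-proportion z statistic for H0: p = .15 with 80 successes out of 400 can be checked with a few lines of Python. This is a sketch using only the standard library; the function name is hypothetical, and the normal approximation is assumed to be adequate at n = 400.

```python
import math

def one_proportion_z(successes, n, p0):
    """Return the sample proportion and the z statistic for H0: p = p0."""
    p_hat = successes / n
    # The standard error uses the null value p0, not p_hat.
    se = math.sqrt(p0 * (1 - p0) / n)
    return p_hat, (p_hat - p0) / se

p_hat, z = one_proportion_z(80, 400, 0.15)

# One-sided (upper-tail) p-value for Ha: p > p0, from the standard normal.
p_value = 0.5 * math.erfc(z / math.sqrt(2))

print(round(p_hat, 2), round(z, 2), round(p_value, 4))
```

A z of about 2.80 exceeds the one-sided 5% critical value of 1.645, so H0 would be rejected at that level.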

Question 16. Blood type (A, B, AB, and O) is what type of data?
a. Scaled
b. Categorical
c. Continuous
d. Ordinal

Question 17. Examples of a population include (choose all that apply)
a. People who received a kidney transplant in the United States
b. Medical Devices manufactured by Company X during 2017
c. Patients visiting a primary care physician whose social security number ends in '9'
d. 1000 children with Type I diabetes

Question 18. Effect size is defined as:
a. The difference you need to measure in order to reject your null hypothesis
b. The variability in your measurement of the dependent variable
c. The true difference between the parameters of your populations being compared
d. The clinically interesting difference between your populations being compared

Question 19. If the residuals (ε) from a regression model are random, they will follow a _______ distribution
a. Uniform
b. Independent
c. Normal
d. Poisson

Question 20. What does it mean that observations must be independent?
a. Observations are randomly sampled
b. The X and Y variable are uncorrelated
c. The measurement of one subject has no relationship to results from the other subjects in your sample
d. Observations are normally distributed

Question 21. Which are some actions you can take to improve the power of your analysis (choose all that apply)?
a. Increase sample size
b. Decrease effect size
c. Improve your measurement precision to decrease variance
d. Choose a different model
e. Do a one-sided test

Question 22. If the true value of (population) mean age of people who live in nursing homes is 80 years, and your sample of nursing homes in Phoenix yields an estimate of 85.3 years and confidence interval [84.1,86.4], this estimate is (choose all that apply)
a. An example of a type I error
b. An example of a type II error
c. Accurate, but not precise
d. Precise, but not accurate

Question 23. You run a study to compare 2 doses of Drug X to placebo. To test whether any dose of Drug X is better than placebo, use _____ test. In order to tell which dose is better than placebo, use __________ test.

Question 24. If you use pooled variances for an ANOVA test and the variance of group B is actually much larger than that of group A, are you more likely to generate a Type I or a Type II error? _____. How could you avoid this error?

Question 25. Suppose you have MRI results that estimate tumor size before and after treatment with chemotherapy (If the tumor is not seen on the "after" MRI, tumor size is recorded as 0). Choose the best test statistic to demonstrate chemotherapy effectiveness (if any):
a. Use a two-sided t-test to compare the mean tumor size after treatment to the mean tumor size before treatment
b. Use a one-sided t-test comparing the "after" measurement for difference from 0 because you only care if the tumor disappears
c. Use a paired t-test to compare the mean difference between before and after measurements
d. Use an ANOVA with 'before' size as the covariate and 'after' size as the dependent variable

Question 26. One method to analyze the performance of Logistic Regression is AIC, which is similar to R-Squared in Linear Regression. Which of the following is true about AIC?
a. We prefer a model with minimum AIC value
b. We prefer a model with maximum AIC value
c. Both but depend on the situation
d. None of these

You are analyzing the results of a clinical trial intended to show that your company's new drug X is superior to the standard treatment. You collect data from 100 subjects in an "open-label" study (patients and doctors know which treatment was given). You run your test, reject the null hypothesis with p = 0.023, and conclude that your drug is superior. However, two other independent double-blind trials, testing 50 subjects each, failed to show superiority. Are the following statements true or false?

Question 27. ___ Your study had a Type I error

Question 28. ___ Your study was under-powered

Question 29. ___ The competitor's studies may have been under-powered

Question 30. ___ Your study failed to account for the placebo effect

Question 31. In multiple linear regression (choose all that apply):
a. The relationship between Xi and Y is linear
b. Some transformation of Xi has a linear relationship with Y
c. Each unit change in Xi causes a unit change of βi in Y
d. The relationship between Xi and Xj must be linear
e. The intercept (β0) must be 0.

Question 32. In a coin-tossing trial with a fair coin, the probability of getting heads is 0.5. What are the odds of getting heads?
a. 0
b. 0.5
c. 1
d. 2

Question 33. What is the odds ratio for variable X if the calculated coefficient (from a logistic regression) is .037? ________

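Both odds calculations follow from two standard identities: odds = p / (1 - p), and the odds ratio for a logistic regression coefficient is exp(coefficient). A minimal sketch follows; the function names are hypothetical.

```python
import math

def odds(p):
    """Convert a probability into odds."""
    return p / (1 - p)

def odds_ratio(coef):
    """Odds ratio for a one-unit increase in X, given a logistic regression coefficient."""
    return math.exp(coef)

fair_coin_odds = odds(0.5)  # probability 0.5 corresponds to even odds
or_x = odds_ratio(0.037)    # coefficient taken from the question text

print(fair_coin_odds, round(or_x, 4))
```

An odds ratio slightly above 1 means each one-unit increase in X multiplies the odds of the event by that factor, here an increase of a little under 4%.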
List 3 reasons data might be censored:

Question 34. __________________

Question 35. __________________

Question 36. __________________

Question 37. Why is the logit link used in logistic regression? ____


For questions 38-42, consider the case where a multiple linear regression model predicting patient weight yields the following results. The variables are Age (in years), Gender (0=female, 1=male), fat intake (average daily intake, where 1 = 10-19 g/day, 2 = 20-29 g/day, 3 = 30-39 g/day, etc.), and exercise frequency (0 = <5 days a week, 1 = >5 days a week).

Variable     Parameter estimate   Standard error   t-value   Pr>|t|
Intercept    106.3                10.2             3.9       .003
Age          1.4                  .36              7.5       <0.001
Gender       20.4                 4.5              3.3       .01
Fat intake   3.8                  5.6              .5        .70
Exercise     -10.7                2.8              6.7       0.001

P=0.0078

Adjusted R-square=0.48

Question 38. Which variables should be included in the final model? ______________________________

Question 39. What is the predicted weight of a 35 year old male with fat intake=6 and exercise=1? __________________
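A prediction from a fitted linear model is the intercept plus each coefficient times its variable's value. The sketch below plugs the parameter estimates from the results table into that formula; it assumes every variable, including the non-significant fat intake term, stays in the model, and the function name is hypothetical.

```python
# Parameter estimates copied from the regression results table.
coef = {
    "intercept": 106.3,
    "age": 1.4,
    "gender": 20.4,
    "fat_intake": 3.8,
    "exercise": -10.7,
}

def predict_weight(age, gender, fat_intake, exercise):
    """Linear predictor: intercept plus the sum of coefficient * value."""
    return (coef["intercept"]
            + coef["age"] * age
            + coef["gender"] * gender
            + coef["fat_intake"] * fat_intake
            + coef["exercise"] * exercise)

# 35-year-old male (gender=1) with fat intake=6 and exercise=1.
predicted = predict_weight(35, 1, 6, 1)
print(round(predicted, 1))
```

If fat intake were dropped from the final model, the remaining coefficients would have to be re-estimated, so this value would not carry over unchanged.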

Question 40. Why is fat intake not significant even though it has a higher coefficient than age? _______

Question 41. What is the difference between the overall p-value and the individual p-values? _______________

Question 42. Based on the adjusted R-square of 0.48, is this a useful model to predict patient weight? ______

Question 43. Which of the following statements is true?

a. Linear Regression residual values must be normally distributed, but this is not true for Logistic Regression
b. Logistic Regression error values must be normally distributed, but this is not true for Linear Regression
c. Both Linear Regression and Logistic Regression error values have to be normally distributed
d. Neither Linear Regression nor Logistic Regression error values have to be normally distributed

Question 44. Examples of "time 0" in a Survival analysis could include:
a. Date of diagnosis
b. Date of surgery
c. Date of onset of disease
d. Date of consent to participate in a clinical trial

Question 45. In Cox modelling, the baseline hazard is taken from
a. Sum of Squares of the residuals
b. The intercept
c. Solving the final model using the mean value of each independent variable as each Xi
d. Exp(coef(b1))

For questions 46-48, consider a dataset Teen_Drinking with the following variables:

Drinks             0=no, 1=yes
Gender             1=male, 2=female
Depressed          0=no, 1=yes
Age
Attends college    0=no, 1=yes
Parents divorced   0=no, 1=yes

Question 46. Do any variables need to be transformed prior to performing logistic analysis? If so, explain ____________

Question 47. Write out the R code to predict Drinks (assuming no multicollinearity) ______

Question 48. In the model results from Question 47, the odds ratio for 'Depressed' is 0.5. What is the relationship between Depression and Drinking? ___

Mark the remaining statements (Questions 49-64) as true or false:

Question 49. ___ Scatterplots are a good way to tell if 2 variables are correlated

Question 50. ___ If X is correlated with Y, then a change in X causes a change in Y

Question 51. ___ A correlation of 0.2 is unimportant even if the result of a cor.test is significant

Question 52. ___ A correlation of 1.1 is a "strong" correlation

Question 53. ___ A correlation of 0 means that there is no relationship between X and Y

Regarding logistic regression,

Question 54. ___ The independent variables must be binary

Question 55. ___ Measures the probability that a person will experience the event of interest

Question 56. ___ Measures the proportion of your sample that experienced the event of interest

Question 57. ___ A negative estimated coefficient means that decreasing values of X cause a decrease in the odds of experiencing the event of interest

Question 58. ___ The outcome variable must be binary

Regarding Survival analysis,

Question 59. ___ Kaplan-Meier curves graphically present the results of a Cox model

Question 60. ___ Significance of a log-rank test to compare Kaplan-Meier curves is based on the Chi-square test.

If the Kaplan-Meier curves for a survival dataset CROSS,

Question 61. ___ The proportional hazards assumption has been violated for the grouping variable

Question 62. ___ The log-rank test can be used to test if the curves are different

Question 63. ___ The grouping variable should be excluded from Cox proportional hazards analysis

Question 64. ___ The Cox model could be run in two parts depending on where the curves cross

EXTRA CREDIT:

Question 1. In DNA or RNA sequencing, "mapping" refers to:
a. Determining which gene each read came from
b. Aligning each read to its chromosomal location according to the reference genome
c. Removing non-human sequences
d. Identifying what species your sample came from

Question 2. Phred scores are (choose all that apply)
a. A way to identify low vs. high confidence base calls
b. Based on the intensity of the fluoresced light signals from labelled nucleotides
c. On a linear scale
d. Reported as a single quality score for each read in your sample
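For reference, the standard Phred definition is Q = -10 * log10(P), where P is the estimated probability that a base call is wrong. A minimal sketch of the conversion in both directions follows; the function names are hypothetical.

```python
import math

def phred_to_error_prob(q):
    """Probability that a base call is wrong, given its Phred score."""
    return 10 ** (-q / 10)

def error_prob_to_phred(p):
    """Phred score corresponding to a base-call error probability."""
    return -10 * math.log10(p)

# Each 10-point increase in Q corresponds to a tenfold drop in error probability.
for q in (10, 20, 30, 40):
    print(q, phred_to_error_prob(q))
```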

Question 3. What are some reasons why the number of RNA reads may not be comparable across samples?
a. Samples are taken from different tissue types
b. Proteins underwent alternative splicing
c. One of the samples was degraded and read depth is low
d. The samples have a different mix of cell types
