POPH90144 Linear and Logistic Regression Assignment

SECTION A - Case-control study

The dataset "data_assessment2_ccstudy.dta" provides data from 560 patients admitted to hospital (in a region with malaria) who are part of a hypothetical nested case-control study. There are 140 cases (patients who died within 1 year of hospital admission) and 420 controls. The cases and controls were selected from a larger cohort study of 13000 patients, of whom 140 died within 1 year of follow-up. Sex and age were routinely recorded in the hospital admission records. For the case-control study, further information (haemoglobin level and malaria infection status on admission) was extracted from laboratory data records.

The variables in this dataset are:

Variable name    Description

id               Unique identifier
dead             Died within 1 year of hospital admission (0 = control, 1 = case)
age              Age at baseline (years)
haemoglobin      Haemoglobin level at baseline (g/dL)
malaria          Malaria at baseline (0 = no malaria, 1 = malaria)
male             Sex of patient (0 = female, 1 = male)

We will use multivariable logistic regression to investigate the evidence for an association between haemoglobin and death, controlling for the possibility that this association is confounded by other exposure variables that appear in the dataset.

1. Description of study sample

a) Present histograms for age and haemoglobin and describe the distribution of these variables in terms of approximate normality and appropriate measures of centrality and spread.

b) Provide a table that summarises the distribution of age, sex, haemoglobin, and malaria with separate columns for those who died (cases) and the controls (remember this is a case-control study).

c) Using the information regarding the numbers of patients who died in the 1 year follow-up period in the cohort study, estimate the odds of death for a patient in the cohort study.

d) Calculate the estimated odds of death in the case-control study. Why isn't this estimate equal to the odds of death in the cohort study (calculated in 1c)?
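
As a rough starting point for question 1, the commands below sketch one way to describe the sample and to do the odds arithmetic. This is a sketch under assumptions rather than a model answer: the counts (140 deaths among 13000 cohort patients, 140 cases and 420 controls) are taken from the study description above, the graph names hist_age and hist_hb are hypothetical, and the choice of summary statistics is only one reasonable option.

```stata
* Sketch only: assumes the dataset is in the current working directory.
use "data_assessment2_ccstudy.dta", clear

* 1a) Histograms and detailed summaries of the numerical variables
histogram age, name(hist_age, replace)
histogram haemoglobin, name(hist_hb, replace)
summarize age haemoglobin, detail

* 1b) Distributions shown separately for controls (dead == 0) and cases (dead == 1)
tabstat age haemoglobin, by(dead) statistics(n mean sd median p25 p75)
tabulate male dead, column
tabulate malaria dead, column

* 1c) Odds of death in the cohort: deaths divided by survivors
display "Cohort odds of death: " 140/(13000 - 140)

* 1d) Odds of death in the case-control sample: cases divided by controls
display "Case-control odds of death: " 140/420
```

The odds in 1d depend on how many controls were sampled per case, which is fixed by the study design rather than estimated from the population.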

2. Univariable logistic regression models

Consider the two univariable logistic regression models of the outcome dead on the variables male and malaria.

a) Present in a table the estimated odds ratios, 95% confidence intervals for the population odds ratio and p-values for the two separate simple logistic regressions.

b) Interpret the Odds Ratios for the univariable logistic regression of death and malaria (an interpretation of the p-value and confidence intervals is not required). Since the study population for this case-control study is hospital 'in-patients', what further information might you want regarding the patients without malaria infection on admission?
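
The two univariable models in question 2 could be fitted along the following lines. This is a minimal sketch, assuming the variable coding listed above (0/1 indicators with 0 as the reference level); Stata's logistic command reports odds ratios, 95% confidence intervals and p-values directly.

```stata
* Sketch only: assumes the dataset is in the current working directory.
use "data_assessment2_ccstudy.dta", clear

* Odds ratio of death for males versus females (reference: female)
logistic dead i.male

* Odds ratio of death for malaria versus no malaria (reference: no malaria)
logistic dead i.malaria
```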

3. Linear association between exposure & outcome

We must decide whether it is reasonable to assume a linear association between the numerical exposure variables, age and haemoglobin, and the log odds of death.

a) Create a new variable in the dataset containing quintiles of age using the xtile command:

xtile age_q5=age, nq(5)

Use Stata to plot the log odds of death versus age_q5.

Note: please use the following Stata options:

ciplot yscale(log) yscale(range(0.5 2)) ylabel(0.25 0.5 0.75 1 1.5 2)

[Note: for earlier versions of Stata you may need to replace "ciplot" above with "graph".]

Briefly summarise the plot, by describing whether the association looks linear.

b) Using the variable age_q5, fit separate simple logistic regression models with age_q5 as a categorical variable and as a continuously valued variable. Compare the models using the likelihood ratio test and comment on whether the association between log odds of death and age is linear.

c) Repeat parts 3a) & 3b) to investigate whether the association between haemoglobin and the log odds of death is linear. Briefly comment on whether the association is linear and state the null hypothesis being tested here.
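
The steps in question 3 might be sketched as follows. Several things here are assumptions rather than part of the assignment text: pairing the listed options with the tabodds command (which accepts ciplot and graph as options), the stored-estimate names age_cat, age_lin, hb_cat5 and hb_lin, and the quintile variable name hb_q5. The likelihood ratio test compares the model that treats the quintile score as continuous with the model that treats it as categorical; the former is nested in the latter.

```stata
* Sketch only: assumes the dataset is in the current working directory.
use "data_assessment2_ccstudy.dta", clear

* 3a) Quintiles of age, then odds of death by quintile on a log scale
xtile age_q5 = age, nq(5)
tabodds dead age_q5, ciplot yscale(log) yscale(range(0.5 2)) ylabel(0.25 0.5 0.75 1 1.5 2)

* 3b) Categorical versus linear (continuous score) coding of the quintiles
logit dead i.age_q5
estimates store age_cat
logit dead age_q5
estimates store age_lin
lrtest age_cat age_lin

* 3c) Same steps for haemoglobin (hb_q5 is a hypothetical variable name)
xtile hb_q5 = haemoglobin, nq(5)
tabodds dead hb_q5, ciplot yscale(log)
logit dead i.hb_q5
estimates store hb_cat5
logit dead hb_q5
estimates store hb_lin
lrtest hb_cat5 hb_lin
```

The axis range and labels for the haemoglobin plot are left at Stata's defaults here, since the values suggested for age may not suit it.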

4. Multivariable logistic regression models - Confounding

Now use univariable logistic regression to estimate the unadjusted odds ratios of death for haemoglobin and all three potential confounders (age, sex, and malaria). Then use multivariable logistic regression (including all four variables) to estimate the adjusted odds ratios. Include haemoglobin and age as categorical variables with the following groupings - age (< 3 & ≥ 3 years, with ≥ 3 years as the reference group) and haemoglobin (<9 (low), 9-14 (normal), >14 (high) g/dL, with the haemoglobin group 9-14 g/dL set as the reference group) [Hint - use 'gen' and 'replace' commands to create new variables].
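
Following the hint, the grouped variables could be created roughly as shown below. The variable names age_lt3 and hb_cat and the label name hb_lbl are hypothetical, and coding the reference group as 0 is a convenience so that Stata's factor-variable notation picks it as the base level. Missing values are handled explicitly because Stata treats a missing value as larger than any number.

```stata
* Sketch only: assumes the dataset is in the current working directory.
use "data_assessment2_ccstudy.dta", clear

* Age group: 0 = 3 years or older (reference), 1 = under 3 years
gen age_lt3 = 0 if age >= 3 & !missing(age)
replace age_lt3 = 1 if age < 3

* Haemoglobin group: 0 = 9-14 g/dL (reference), 1 = below 9, 2 = above 14
gen hb_cat = 0 if haemoglobin >= 9 & haemoglobin <= 14
replace hb_cat = 1 if haemoglobin < 9
replace hb_cat = 2 if haemoglobin > 14 & !missing(haemoglobin)

label define hb_lbl 0 "9-14 (normal)" 1 "<9 (low)" 2 ">14 (high)"
label values hb_cat hb_lbl

* Check the groupings against the original values
tabulate hb_cat, missing
tabstat haemoglobin, by(hb_cat) statistics(min max)
tabstat age, by(age_lt3) statistics(min max)
```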

a) Present in a table two columns - the unadjusted Odds Ratios (95% Confidence Intervals) and the adjusted Odds Ratios (95% Confidence Intervals) for the association between haemoglobin, age, sex, and malaria and the odds of death.

b) Comment on any confounding observed by considering any changes in the odds ratio of haemoglobin (categorical version) from the univariable to the multivariable logistic regression.

c) Investigate the confounding by exploring any univariable associations (in the controls only) between haemoglobin and the potential confounders.

d) Comment on the associations between the potential confounders and the outcome (after adjusting for the exposure of interest, haemoglobin). Together with what you found in 4c, comment on which variables are confounding the association between haemoglobin and death.
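
One possible layout for the analyses in question 4 is sketched below. It recreates the grouped variables compactly so that the block runs on its own; age_lt3 and hb_cat are hypothetical names, with 0 as the reference group in each case. Using chi-squared tests for the controls-only associations in 4c is an assumption; other summaries would also be reasonable.

```stata
* Sketch only: assumes the dataset is in the current working directory.
use "data_assessment2_ccstudy.dta", clear

* Grouped variables (0 = reference group in both cases)
gen age_lt3 = (age < 3) if !missing(age)
gen hb_cat = cond(haemoglobin < 9, 1, cond(haemoglobin > 14, 2, 0)) if !missing(haemoglobin)

* 4a) Unadjusted odds ratios, one model per variable
logistic dead i.hb_cat
logistic dead i.age_lt3
logistic dead i.male
logistic dead i.malaria

* 4a) Adjusted odds ratios from a single model with all four variables
logistic dead i.hb_cat i.age_lt3 i.male i.malaria

* 4c) Associations between haemoglobin and each potential confounder, controls only
tabulate hb_cat age_lt3 if dead == 0, row chi2
tabulate hb_cat male if dead == 0, row chi2
tabulate hb_cat malaria if dead == 0, row chi2
```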

5. Final presentation of results and Stata do file

a) Please write a summary (abstract) based on the analyses you performed in the previous questions to answer the research question "Is there an association between haemoglobin and death?" (maximum word count of 200). Your summary should have the headings: Aim, Study Design, Statistical Methods, Results.

b) Please provide a copy of your Stata do-file for performing the statistical analyses required for questions 1 to 5. Do not upload a second file when submitting your assignment; instead, copy and paste the Stata do-file into your Word document.

SECTION B - Cross-sectional study

The dataset "data_assessment2_lupus.dta" provides cross-sectional data from 60 women who have Systemic Lupus Erythematosus (SLE), a chronic, multisystem autoimmune disease. The treatment for SLE often involves steroid therapy. The clinical researcher is particularly interested in bone loss in SLE and the impact of steroid usage. She is seeking your assistance in analysing a dataset she has compiled consisting of bone mineral density at one location (left hip), whether steroids had ever been prescribed or not, and the patient's age and smoking history (ever/never).

The variables in the dataset are:

patid            Patient identification number
hipbmd           Bone mineral density measurement at the left hip (mg/cm²)
ster_evr         Steroid usage (0 = never, 1 = ever)
age              Age (years)
smoker           Smoking history (0 = never smoker, 1 = ever smoker)

The research question of interest is:

Is the relationship between steroid usage and bone mineral density modified by smoking and age?

6. Linear association between age & hipbmd

a) Assess both visually and statistically (by including an additional squared term of age in the model) whether it is reasonable to assume a linear association between hipbmd and age.
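
A minimal sketch of the visual and statistical checks in question 6 follows. Overlaying a lowess smoother on the scatter plot is an assumption about what "visually" might involve; the squared term is entered with factor-variable notation, so the coefficient reported for c.age#c.age is the one that tests for curvature.

```stata
* Sketch only: assumes the dataset is in the current working directory.
use "data_assessment2_lupus.dta", clear

* Visual check: scatter plot of hipbmd against age with a lowess smoother
twoway (scatter hipbmd age) (lowess hipbmd age)

* Statistical check: linear model with an additional squared term of age
regress hipbmd c.age##c.age
```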

7. Univariable and multivariable linear regression

Perform univariable linear regression to obtain the unadjusted associations between the outcome hipbmd and steroid usage, age and smoking. Following this, perform multivariable linear regression including all three covariates.

a) Present in a single table the estimates, 95% confidence intervals and p-values of the univariable and multivariable linear regression analyses with separate columns for the unadjusted and adjusted estimates.

b) Interpret the adjusted association between steroid usage and bone mineral density.

c) Investigate if the association between steroid usage and bone mineral density is modified by smoking, after controlling for age.

d) Investigate if the association between steroid usage and bone mineral density is modified by age, after controlling for smoking.
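
The models in question 7 could be set up roughly as below. This is a sketch under the variable coding listed above, with effect modification examined through interaction terms; the lincom line is an optional extra that gives the estimated steroid effect among ever smokers.

```stata
* Sketch only: assumes the dataset is in the current working directory.
use "data_assessment2_lupus.dta", clear

* 7a) Unadjusted associations, one model per covariate
regress hipbmd i.ster_evr
regress hipbmd age
regress hipbmd i.smoker

* 7a) Adjusted associations from a single model with all three covariates
regress hipbmd i.ster_evr age i.smoker

* 7c) Steroid-by-smoking interaction, controlling for age
regress hipbmd i.ster_evr##i.smoker age
lincom 1.ster_evr + 1.ster_evr#1.smoker

* 7d) Steroid-by-age interaction, controlling for smoking
regress hipbmd i.ster_evr##c.age i.smoker
```

In each interaction model, the p-value on the interaction term addresses whether the steroid association differs across levels of the modifier.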

8. (Concluding statement; 5 marks) Describe for the clinician in a single paragraph the results of your statistical analyses, in particular, addressing her research question (maximum 100 words).

Assignment link - https://www.dropbox.com/s/gs0ksj8lmmmgcpi/Assignment.zip?dl=0.
