Split the dataset into two groups by variable gender


Assignment

Instructions: Read lectures notes and sample codes I used in class carefully before starting your homework. You need to submit a .sas file on ANGEL

(Lessons > Homework > dropbox for Homework2).

1. (Column Input and Assignment Statement). Below is a small dataset of grades for 6 students.

Name

Gender

Homework

Midterm

Final

Beck

Male

90

88

75

Christal

Female

100

85

80

Gavetti

Female

80

77

71

Easton

Male

90

82

76

Mattiace

Male

60

67

53

Chen

Female

85

83

68

Table 1: Grades

(a) Use column input to read the dataset above called Grades and store it in the folder MyLibrary1

(b) Define Two new variables Score1 and Score2, where

Score1 = 40% Homework +20% Midterm + 40% Final, and Score2 = 40% Midterm + 60% Final.

(c) Calculate the final score. The final score, represented by a new variable fs, is defined to be the maximum of the following three variables:

Final, Score1 and Score2.

(d) Curve the final score fs. The curved new final score, represented by newfs, is defined as

newfs = 0.7 × fs + 30

(e) Round the new final score newfs to the nearest multiple of 5. (Hint: Use round function,

round(argument1, argument2), read SAS documentation for more details.)

(f) Print out the whole dataset, including all the new variables you created except the variable

Gender. (Usevarstatement).

2. (Formmated Input). Below is a dataset of all the games between Argentina and Germany in the FIFA World Cup. The descriptions of the variables are:

• game: the number of the game
• year: the year of the World Cup
• date: the date of the game
• stage: the stage of the game (a character variable)
• Arg: the score of Argentina
• Ger: the score of Germany
• Attendence: the number of audience

game

year

date

stage

Arg

Ger

Attendance

1

1958

08Jun1958

1st Round, Group1

1

3

31,156

2

1966

16Jul1966

1st Round, Group2

0

0

46,587

3

1986

29Jun1986

Final Game

3

2

114,600

4

1990

08Jul1990

Final Game

0

1

73,603

5

2006

30Jun2006

1/4 Finals

1

1

72,000

6

2010

03Jul2010

1/4 Finals

0

4

64,100

7

2014

13Jul2014

Final Game

0

1

74,738

Table 2: Germany vs Argentina in World Cup

(a) Use formatted input to read the dataset above called games. When reading the dataset (1) use an absolute pointer control to read year and date, and use a relative pointer controls to read stage and Attendance; (2) Use informat dateW. for variable 'date' and use informat commaW. for variable 'Attendance', where W is an integer which specifies the number of columns that variable will occupy.

(b) Set the page size to 50, line size to 80, center the output, and suppress the date and time when the output was created.

(c) Print the data set games. (1) Remember formatted input does not imply formatted output. Use format statement to format your output; (2) Add a title for this dataset.

3. (IF-THEN-ELSE Statement and Create a List Report). We have the following small dataset called HealthReport

Subj

Fname

Lname

Gender

Height

Weight

Date

Cal

1024

Alice

Smith

Female

65

125

12/01/95

2,036

1167

Maryann

White

Female

68

140

12/02/95

1,800

1168

Thomas

Jones

Male

 

190

01/05/96

2,302

1201

Benedictine

Arnold

Male

68

190

11/30/95

2,432

1302

Felicia

Ho

Female

63

115

01/01/96

1,972

1344

James

Hunter

Male

70

188

12/15/95

1,869

Table 3: Health Examination Report The columns, from left to right, represent the following variables:

Variable Name

Descriptions

Variable Type

Subj

Subject number

Numeric

Fname

First name

Character

Lname

Last name

Character

Gender

Gender

Character

Height

Height (lbs.)

Numeric

Weight

Weight (inches)

Numeric

Date

Date of measurement

Numeric

Cal

Calorie

Numeric

Table 4: Descriptions of variables

(a) Read the dataset above into SAS (pay attention to length and format of variables), and change the SAS system settings as follows:

linesize = 64, no current date and time, no page number

(b) Use assignment statement to create a new numeric variable BMI. BMI (Body mass index) is a person's weight divided by the square of his/her height. In other words, it is defined to be

BMI = Weight / Height2

where the weight unit is kilogram and the height unit is meter. So first you need to convert pound into kilogram and convert inch into meter (1 kilogram = 2.205 pounds, 1 inch = 0.0254 meter), and then calculate BMI.

(c) Use IF-TEHN-ELSE statement to create a new character variable Result. The value of variable

Result is determined by the value of BMI according to the following table.

Note: the length of the variable Result can be longer than 8. You should specify an appropriate length for variable Result with length statement. The length statement should be placed before the input statement.

Value of BMI     Value of Result
Missing             'Missing'
≤ 25                'Underweight'
> 25 and ≤       30 'Normal'
> 30                'Overweight'

(d) Use SORT procedure to sort the dataset Health Report first by the variable Gender in an ascending order, and then by the variable Lname in a descending order. The sorted dataset should be saved in another SAS dataset named Srtd Health Report.

(e) Print the the sorted dataset Srtd Health Report. When printing the dataset,

1. Use variable Lname and Fname as the identification variables (Use ID statement).

2. Split the dataset into two groups by variable Gender. Calculate the sum of Cal for each group.

3. Use LABEL statement to assign labels to all variables to make your output more readable. Particularly, use the descriptions in Table 2 as the labels. You can place the LABEL statement either in the DATA step or in the PRINT procedure. Use labels rather than the variable names when printing the dataset.

4. Add a TITLE ('Health Examination Report Sorted by Last Name') and FOOTNOTE for this dataset. The content of the FOOTNOTE is: ' BMI = Weight(kg)/(Height(m))^2'.

5. Use the correct FORMAT for variables Date and Cal.

(f) Use Where statement to print the first and last names of female students who are underweight.

Request for Solution File

Ask an Expert for Answer!!
Basic Statistics: Split the dataset into two groups by variable gender
Reference No:- TGS01525490

Expected delivery within 24 Hours