Is there any need to preprocess the data to be more, Applied Statistics

Assignment 1:

1. Using heritage data (release 1) in SQL

a. Find support for all single itemsets

b. List all itemsets with 2 elements and support of at least 0.2

c. List all itemsets with 3 elements and support at least 0.2

2. In Weka

a. Load heritage data (release 1)

b. Apply at least two association rule generation algorithms and compare results

c. Apply FPTree algorithm with at least two measures of rule metrics
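The support counts asked for in question 1 can be cross-checked outside SQL. The sketch below is only an illustration: the toy transactions and the function names `itemset_support` and `frequent_itemsets` are assumptions, not part of the heritage data or the assignment. Support is taken as the fraction of transactions that contain every item of the itemset.

```python
from collections import Counter
from itertools import combinations

def itemset_support(transactions, size):
    """Return {itemset (sorted tuple): support} for all itemsets of the given size."""
    counts = Counter()
    for items in transactions:
        # Each transaction contributes once per distinct subset of the given size.
        for subset in combinations(sorted(set(items)), size):
            counts[subset] += 1
    n = len(transactions)
    return {subset: count / n for subset, count in counts.items()}

def frequent_itemsets(transactions, size, min_support):
    """Keep only itemsets whose support is at least min_support."""
    supports = itemset_support(transactions, size)
    return {s: v for s, v in supports.items() if v >= min_support}

# Hypothetical toy transactions (one set of codes per member), not heritage data.
toy = [
    {"A", "B", "C"},
    {"A", "B"},
    {"A", "C"},
    {"B", "C"},
    {"A", "B", "C"},
]

singles = itemset_support(toy, 1)          # part a
pairs = frequent_itemsets(toy, 2, 0.2)     # part b
triples = frequent_itemsets(toy, 3, 0.2)   # part c
```

Enumerating every subset is fine for a small check like this; on the full data, an Apriori-style pruning step (only extending itemsets that are already frequent) would likely be needed.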

Assignment 2:

1. In SQL/Weka:

a. Prepare heritage data for classification learning

b. Load heritage data release 3 (preprocessed to binary representation, including demographics and output attribute(s))

c. Perform exploratory analysis

d. Create at least three classification models for predicting hospitalization based on Year 1 data.

e. Which model performs the best on year 2 data?

f. Create regression model for predicting hospitalization days.

g. What is the difference between regression and classification models?

h. Present your results in the form of a short report that includes screenshots, tables, and any needed description.
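One way to make the contrast in part g concrete: a classification model predicts a label (hospitalized or not) and is typically scored by how often the label is right, while a regression model predicts a number (days in hospital) and is scored by how far off the number is. The values below are made up for illustration and the function names are assumptions.

```python
import math

def accuracy(y_true, y_pred):
    """Fraction of class labels predicted correctly (classification)."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def rmse(y_true, y_pred):
    """Root mean squared error of numeric predictions (regression)."""
    squared = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared) / len(squared))

# Hypothetical values for illustration only.
hospitalized_true = [0, 1, 0, 0, 1]       # class label: any stay in year 2?
hospitalized_pred = [0, 1, 0, 1, 1]
days_true = [0, 3, 0, 0, 7]               # numeric target: days in hospital
days_pred = [0.5, 2.0, 0.0, 1.0, 5.0]

acc = accuracy(hospitalized_true, hospitalized_pred)
err = rmse(days_true, days_pred)
```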

Assignment 3:

Classification Part 2

1. Using heritage release 3 data prepared last assignment

a. Include drug information into data

b. Include laboratory information into data

c. Import newly created data into Weka and run classification algorithms

d. Does inclusion of the information improve predictions?

There are many ways to complete question 4, so you need to make different decisions.

Try not to overcomplicate the problem.

2. In Weka using heritage 3 dataset

a. Apply the k-means algorithm for k = 2, 3, 5, 10

b. Apply EM algorithm. What is the optimal number of clusters obtained by EM?

c. Compare the created clusters to classification based on hospitalization in year 2.
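For part 2c, one possible approach is to cross-tabulate cluster ids against the year 2 hospitalization flag and summarize the agreement with cluster purity. The sketch assumes the cluster assignments have been exported from Weka as a list; the data and the names `cross_tab` and `purity` are hypothetical.

```python
from collections import Counter, defaultdict

def cross_tab(cluster_ids, labels):
    """Count how many records of each class fall in each cluster."""
    table = defaultdict(Counter)
    for cluster, label in zip(cluster_ids, labels):
        table[cluster][label] += 1
    return table

def purity(cluster_ids, labels):
    """Share of records that belong to the majority class of their cluster."""
    table = cross_tab(cluster_ids, labels)
    majority_total = sum(max(counts.values()) for counts in table.values())
    return majority_total / len(labels)

# Hypothetical cluster assignments and year 2 hospitalization flags.
clusters = [0, 0, 0, 1, 1, 1, 1, 0]
hospitalized = [0, 0, 1, 1, 1, 0, 1, 0]

table = cross_tab(clusters, hospitalized)
score = purity(clusters, hospitalized)
```

Purity always rises as k grows, so it is best read next to the number of clusters rather than on its own.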

Assignment 4:

Using the data table shown below.

a. Calculate distance between all points in 1-norm, 2-norm and infinity-norm. Show dissimilarity matrix.

b. Is there any need to preprocess the data to be more suitable for clustering? If so, describe the operations and show the resulting data table.

c. Apply k-means clustering algorithm with k=2.

ID	Age	BMI	Gender	Total Cholesterol
1	30	24	M	180
2	70	19	M	190
3	65	26	M	220
4	40	32	F	260
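A minimal sketch of parts a to c on the table above, using only the standard library. Several choices here are assumptions rather than the required answer: gender is encoded as 0 for M and 1 for F, every column is min-max scaled to [0, 1], and k-means starts from points 1 and 4. Other encodings, scalings, or starting centroids can give different distances and clusters.

```python
# Rows from the data table: (age, bmi, gender, total cholesterol).
rows = {
    1: (30, 24, "M", 180),
    2: (70, 19, "M", 190),
    3: (65, 26, "M", 220),
    4: (40, 32, "F", 260),
}

def encode(row):
    """Assumed encoding: gender becomes 0 for M and 1 for F."""
    age, bmi, gender, chol = row
    return [float(age), float(bmi), 1.0 if gender == "F" else 0.0, float(chol)]

def min_max_scale(points):
    """Rescale every column to [0, 1] so no attribute dominates the distance."""
    cols = list(zip(*points))
    lo = [min(c) for c in cols]
    hi = [max(c) for c in cols]
    return [[(v - l) / (h - l) if h > l else 0.0 for v, l, h in zip(p, lo, hi)]
            for p in points]

def minkowski(a, b, p):
    """p = 1, 2 or float('inf') gives the 1-norm, 2-norm or infinity-norm distance."""
    diffs = [abs(x - y) for x, y in zip(a, b)]
    if p == float("inf"):
        return max(diffs)
    return sum(d ** p for d in diffs) ** (1.0 / p)

def dissimilarity_matrix(points, p):
    """Full symmetric matrix of pairwise distances."""
    return [[minkowski(a, b, p) for b in points] for a in points]

def kmeans(points, centroids, iterations=20):
    """Plain Lloyd iterations from the given starting centroids."""
    for _ in range(iterations):
        # Assignment step: nearest centroid in 2-norm.
        labels = [min(range(len(centroids)),
                      key=lambda k: minkowski(pt, centroids[k], 2))
                  for pt in points]
        # Update step: each centroid moves to the mean of its members.
        new_centroids = []
        for k in range(len(centroids)):
            members = [pt for pt, lab in zip(points, labels) if lab == k]
            if members:
                new_centroids.append([sum(c) / len(members) for c in zip(*members)])
            else:
                new_centroids.append(centroids[k])
        if new_centroids == centroids:
            break
        centroids = new_centroids
    return labels, centroids

raw = [encode(rows[i]) for i in sorted(rows)]
scaled = min_max_scale(raw)

d1_raw = dissimilarity_matrix(raw, 1)      # part a on the unscaled values
d1, d2, dinf = (dissimilarity_matrix(scaled, p) for p in (1, 2, float("inf")))

# Starting centroids are an arbitrary choice (points 1 and 4).
labels, centroids = kmeans(scaled, [scaled[0], scaled[3]])
```

On the unscaled values, age and cholesterol dominate every distance because their ranges are much larger than those of BMI, which is the usual argument for rescaling in part b. With the 0/1 gender encoding and this start, point 4 ends up in a cluster of its own, so the encoding of the categorical attribute is worth discussing as well.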

Assignment 5:

Text Mining

1. Write regular expression to:

a. detect zip codes in text

b. Find last names of all patients whose first name is John (note that regular expressions may have some false positives/false negatives).
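A sketch of both patterns in Python's `re` module. The formats are assumptions: US ZIP codes with an optional four-digit extension, and names written as "John Last" with an optional middle initial. The sample text is invented and deliberately includes a false positive for the name pattern.

```python
import re

# Assumed format: US ZIP codes, five digits with an optional "-1234" extension.
ZIP_RE = re.compile(r"\b\d{5}(?:-\d{4})?\b")

# Assumed format: "John" followed by a capitalized last name, with an optional
# middle initial such as "John Q. Public". "Last, John" order is not covered.
JOHN_RE = re.compile(r"\bJohn\s+(?:[A-Z]\.\s+)?([A-Z][a-zA-Z'-]+)")

text = ("Patient John Smith, Fairfax, VA 22030-4444, was seen with John Q. Public. "
        "Invoice 123456 was sent to St. John Hospital, ZIP 20001.")

zips = ZIP_RE.findall(text)        # ['22030-4444', '20001']
last_names = JOHN_RE.findall(text) # ['Smith', 'Public', 'Hospital']
```

The word boundaries keep the six-digit invoice number from matching, but any standalone five-digit number would still be reported as a ZIP code. "Hospital" is a false positive of the kind the question warns about, and names written as "Smith, John" would be missed.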

2. List challenges in automatically retrieving ICD-9 codes from clinical notes. Search the literature to find relevant published work. Also, include your own observations and comments.

3. Using the SMS data

a. Split data into training (80%) and testing (20%) sets

b. Build naïve Bayes classifier for detecting spam based on bag of words

i. List all words in the documents

ii. Count occurrences in spam and ham

iii. Assign likelihoods P(word|spam) and P(word|ham) for all words

iv. Convert the test data into a list of words. For each message you need two columns: message id and word.

v. Classify test data. This can be done by a series of joins with the data prepared in (iii).

vi. Evaluate your model (accuracy, precision, recall)
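Steps a and i to vi can be prototyped in Python before writing the joins. This is a sketch under stated assumptions, not the SQL solution the assignment describes: the tokenizer pattern, the add-one (Laplace) smoothing, the choice to skip words never seen in training, and the toy messages are all illustrative. It also assumes both classes appear in the training data.

```python
import math
import random
import re
from collections import Counter

def tokenize(message):
    """Lowercase bag of words; the token pattern is an assumption."""
    return re.findall(r"[a-z0-9']+", message.lower())

def split_data(rows, train_fraction=0.8, seed=0):
    """Step a: shuffle with a fixed seed and split into training and testing."""
    rows = list(rows)
    random.Random(seed).shuffle(rows)
    cut = int(len(rows) * train_fraction)
    return rows[:cut], rows[cut:]

def train(rows):
    """Steps i and ii: rows are (label, message) pairs, label 'spam' or 'ham'."""
    word_counts = {"spam": Counter(), "ham": Counter()}
    class_counts = Counter()
    for label, message in rows:
        class_counts[label] += 1
        word_counts[label].update(tokenize(message))
    vocab = set(word_counts["spam"]) | set(word_counts["ham"])
    return word_counts, class_counts, vocab

def log_likelihood(word, label, word_counts, vocab):
    """Step iii: log P(word | label) with add-one smoothing (an assumption)."""
    total = sum(word_counts[label].values())
    return math.log((word_counts[label][word] + 1) / (total + len(vocab)))

def classify(message, model):
    """Steps iv and v: score each class by log prior plus word log likelihoods."""
    word_counts, class_counts, vocab = model
    n = sum(class_counts.values())
    scores = {}
    for label in ("spam", "ham"):
        score = math.log(class_counts[label] / n)
        for word in tokenize(message):
            if word in vocab:  # words unseen in training are skipped
                score += log_likelihood(word, label, word_counts, vocab)
        scores[label] = score
    return max(scores, key=scores.get)

def evaluate(rows, model):
    """Step vi: accuracy, precision and recall with 'spam' as the positive class."""
    tp = fp = fn = tn = 0
    for label, message in rows:
        predicted = classify(message, model)
        if predicted == "spam":
            tp, fp = tp + (label == "spam"), fp + (label != "spam")
        else:
            fn, tn = fn + (label == "spam"), tn + (label != "spam")
    return {
        "accuracy": (tp + tn) / len(rows),
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        "recall": tp / (tp + fn) if tp + fn else 0.0,
    }

# Hypothetical toy messages, not the SMS data.
train_rows = [
    ("spam", "win free prize now"),
    ("spam", "free cash prize"),
    ("ham", "are you free for lunch"),
    ("ham", "see you at lunch now"),
    ("ham", "call me later"),
]
test_rows = [
    ("spam", "free prize cash"),
    ("ham", "see you at lunch"),
    ("ham", "free prize"),
]

model = train(train_rows)
metrics = evaluate(test_rows, model)
```

Working in log space avoids underflow when many small likelihoods are multiplied. The same sums map onto the joins in step v: one row per (message id, word), joined to the likelihood table and aggregated per message.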

Reference No:- TGS01625226