This problem illustrates the classification approach by, Data Structure & Algorithms

Assignment

Problem 1: This problem illustrates the classification approach using decision trees and the Lupus data (you can download the data file "sledata" from the D2L site, under course documents for week 6). The data consist of 300 patient records. Each record contains 12 elements: the first 11 stand for different symptoms, and the last indicates the diagnosis. Build a decision tree and report:

1) The decision tree and the criteria used to build it: how the best split is chosen and what the stopping condition is (for example, which impurity measure is used and the minimum number of cases per parent and child node).

2) How many nodes the final tree has and how many of them are terminal nodes.

3) What are the three most important Lupus data features in building the tree? Explain your answer.

4) Increase the minimum number of cases required for each parent and child node. What do you notice about the complexity (number of nodes) of the tree? Does it increase? Explain your answer.
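The quantities asked for in items 1, 2 and 4 can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions, not the tool or settings the course expects: Gini impurity is chosen as the impurity measure, the function names (`gini`, `best_split`, `count_nodes`) and the parameters `min_parent` and `min_child` are hypothetical, and the eight toy records are invented rather than taken from "sledata".

```python
from collections import Counter

def gini(labels):
    """Gini impurity: 1 minus the sum of squared class proportions."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def best_split(rows, labels, min_child):
    """Return (feature, threshold, weighted impurity) of the best split, or None."""
    best = None
    n = len(rows)
    for f in range(len(rows[0])):
        for t in sorted({r[f] for r in rows}):
            left = [y for r, y in zip(rows, labels) if r[f] <= t]
            right = [y for r, y in zip(rows, labels) if r[f] > t]
            # Stopping rule: both children must hold at least min_child cases
            if len(left) < min_child or len(right) < min_child:
                continue
            score = (len(left) * gini(left) + len(right) * gini(right)) / n
            if best is None or score < best[2]:
                best = (f, t, score)
    return best

def count_nodes(rows, labels, min_parent, min_child):
    """Grow a tree recursively; return (total nodes, terminal nodes)."""
    # Stopping rule: too few cases to be a parent, or node already pure
    if len(rows) < min_parent or gini(labels) == 0.0:
        return 1, 1
    split = best_split(rows, labels, min_child)
    if split is None or split[2] >= gini(labels):
        return 1, 1
    f, t, _ = split
    left = [(r, y) for r, y in zip(rows, labels) if r[f] <= t]
    right = [(r, y) for r, y in zip(rows, labels) if r[f] > t]
    ln, lt = count_nodes([r for r, _ in left], [y for _, y in left],
                         min_parent, min_child)
    rn, rt = count_nodes([r for r, _ in right], [y for _, y in right],
                         min_parent, min_child)
    return 1 + ln + rn, lt + rt

# Invented toy records: two binary "symptoms", diagnosis = 1 only if both present
rows = [(0, 0), (0, 1), (1, 0), (1, 1), (0, 0), (0, 1), (1, 0), (1, 1)]
labels = [0, 0, 0, 1, 0, 0, 0, 1]

small_limits = count_nodes(rows, labels, min_parent=2, min_child=1)
large_limits = count_nodes(rows, labels, min_parent=8, min_child=4)
print(small_limits, large_limits)
```

On this toy data, raising the parent and child minimums stops the splitting earlier, so the second tree has fewer nodes than the first. Whether the same holds for the Lupus data is for the report to show.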

Problem 2: This problem illustrates the effect of class imbalance on the accuracy of decision trees. Download the red wine quality data from the UCI machine learning repository at: https://archive.ics.uci.edu/ml/datasets/Wine+Quality

1. Report how many classes there are (treat each quality level as a different class) and what the distribution of these classes is for the red wine data.

2. Repeat Problem 1 on the red wine data.

3. Now bin the class variable so that the data are less imbalanced with respect to it. Repeat Problem 1 on the wine data with the fewer classes of the binned class variable.

4. How does the accuracy of the best classification model on the original class variable compare with the accuracy of the best classification model on the binned class variable?

5. Do you have any other ideas on how you can improve the results further?

Showing that your idea actually works will earn five extra credit points.
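Items 1 and 3 of Problem 2 come down to counting class labels and then collapsing them into fewer groups. A minimal sketch follows, assuming the quality scores have already been read into a list. The scores shown are invented for illustration and are not the real red wine data, and the cut points `low_max` and `high_min` as well as the function names are hypothetical choices, not values given by the assignment.

```python
from collections import Counter

def class_distribution(labels):
    """Map each class to (count, fraction of all records)."""
    n = len(labels)
    return {k: (c, c / n) for k, c in sorted(Counter(labels).items())}

def bin_quality(score, low_max=5, high_min=7):
    """Collapse a quality score into 'low', 'medium' or 'high'.

    The cut points are assumptions; pick them after looking at the
    real class distribution.
    """
    if score <= low_max:
        return "low"
    if score >= high_min:
        return "high"
    return "medium"

# Invented quality scores, NOT the real red wine data
quality = [5, 5, 6, 6, 6, 5, 7, 4, 5, 6, 8, 3]

original = class_distribution(quality)
binned = class_distribution([bin_quality(q) for q in quality])

for name, dist in (("original", original), ("binned", binned)):
    print(name)
    for cls, (count, frac) in dist.items():
        print(f"  {cls}: {count} ({frac:.0%})")
```

Comparing the two printed distributions shows how binning merges the rare quality levels into larger groups. Other cut points may balance the real data better.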

Problem 3: Differentiate between the following terms:

a. feature selection and feature extraction
b. training and testing
c. parametric reduction techniques and non-parametric reduction techniques
d. uniform binning and non-uniform binning
e. covariance matrix and correlation matrix.
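Two of these pairs, (d) and (e), are easy to see on small numbers. The sketch below is one possible reading, not a model answer: it treats uniform binning as equal-width bins and uses equal-frequency bins as one example of non-uniform binning, and it computes single covariance and correlation entries rather than full matrices. All function names and data are hypothetical.

```python
import math

def equal_width_bins(values, k):
    """Uniform binning: k bins of equal width (assumes values are not all equal)."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / k
    # The maximum value would land in bin k, so clamp it into the last bin
    return [min(int((v - lo) / width), k - 1) for v in values]

def equal_frequency_bins(values, k):
    """Non-uniform binning: bin widths vary so each bin holds about n/k values."""
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    bins = [0] * n
    for rank, i in enumerate(order):
        bins[i] = rank * k // n
    return bins

def covariance(x, y):
    """Sample covariance; its size depends on the units of x and y."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    return sum((a - mx) * (b - my) for a, b in zip(x, y)) / (n - 1)

def correlation(x, y):
    """Covariance rescaled by both standard deviations; unit-free, in [-1, 1]."""
    return covariance(x, y) / math.sqrt(covariance(x, x) * covariance(y, y))

# One outlier makes the difference between the two binning schemes visible
values = [1, 2, 3, 4, 5, 6, 7, 100]
width_bins = equal_width_bins(values, 2)
freq_bins = equal_frequency_bins(values, 2)

# Rescaling y changes the covariance but leaves the correlation alone
x = [1, 2, 3, 4]
y = [10, 20, 30, 40]
y_scaled = [v * 10 for v in y]
print(width_bins, freq_bins)
print(covariance(x, y), covariance(x, y_scaled))
print(correlation(x, y), correlation(x, y_scaled))
```

With the outlier present, equal-width binning puts almost every value in the first bin, while equal-frequency binning splits the values evenly. The covariance grows tenfold when `y` is rescaled, while the correlation is unchanged up to rounding.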

Reference No:- TGS01395268
