Calculate the information gain when splitting on A and B


Assignment: Data Classification and Data Association Analysis

1) Consider the following data set for a binary class problem.

 

A    B    Class Label
T    F    +
T    T    +
T    T    +
T    F    -
T    T    +
F    F    -
F    F    -
F    F    -
T    T    -
T    F    -

 
(a) Calculate the information gain when splitting on A and B. Which attribute would the decision tree induction algorithm choose?

(b) Calculate the gain in the Gini index when splitting on A and B. Which attribute would the decision tree induction algorithm choose?

(c) Both entropy and the Gini index are monotonically increasing on the range [0, 0.5] and monotonically decreasing on the range [0.5, 1]. Given this, is it possible that information gain and the gain in the Gini index favor different attributes? Explain.
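
The quantities asked for in question 1 can be checked with a short script. The following is a minimal Python sketch, not a prescribed solution method: it assumes entropy is measured in bits (log base 2) and that the gain of a split is the parent impurity minus the size-weighted impurity of the child nodes. The names ROWS, entropy, gini, and gain are illustrative choices, not part of the assignment.

```python
from collections import Counter
from math import log2

# (A, B, class label) rows transcribed from the table in question 1
ROWS = [
    ("T", "F", "+"), ("T", "T", "+"), ("T", "T", "+"), ("T", "F", "-"),
    ("T", "T", "+"), ("F", "F", "-"), ("F", "F", "-"), ("F", "F", "-"),
    ("T", "T", "-"), ("T", "F", "-"),
]

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return sum(-(c / n) * log2(c / n) for c in Counter(labels).values())

def gini(labels):
    """Gini index of a list of class labels."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def gain(impurity, rows, column):
    """Parent impurity minus the size-weighted impurity of the children."""
    parent = [r[-1] for r in rows]
    children = {}
    for r in rows:
        # Group class labels by the value of the splitting attribute
        children.setdefault(r[column], []).append(r[-1])
    weighted = sum(len(ch) / len(rows) * impurity(ch) for ch in children.values())
    return impurity(parent) - weighted

for name, col in (("A", 0), ("B", 1)):
    print(name,
          "info gain:", round(gain(entropy, ROWS, col), 4),
          "Gini gain:", round(gain(gini, ROWS, col), 4))
```

Under these assumptions the sketch reports an information gain of about 0.2813 for A versus 0.2564 for B, and a Gini gain of about 0.1371 for A versus 0.1633 for B, so the two measures rank the attributes differently on this table.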

2) For each question below provide an example of an association rule from the market basket domain that satisfies the following conditions. Also, describe whether such rules are subjectively interesting.

(a) A rule that has high support and high confidence.

(b) A rule that has reasonably high support but low confidence.

(c) A rule that has low support and low confidence.

(d) A rule that has low support and high confidence.
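
To make the two measures in question 2 concrete, here is a small Python sketch of how support and confidence could be computed for a rule X -> Y. The five transactions are invented purely for illustration (they are not part of the assignment), and the function names support and confidence are likewise hypothetical. It assumes the usual definitions: support is the fraction of transactions containing every item in X and Y, and confidence is that support divided by the support of X alone.

```python
# Hypothetical market-basket transactions, made up for this illustration only
TRANSACTIONS = [
    {"bread", "milk"},
    {"bread", "milk", "butter"},
    {"bread", "milk"},
    {"bread", "butter"},
    {"milk", "caviar", "vodka"},
]

def support(itemset, transactions):
    """Fraction of transactions that contain every item in itemset."""
    hits = sum(1 for t in transactions if itemset <= t)
    return hits / len(transactions)

def confidence(lhs, rhs, transactions):
    """Support of lhs and rhs together, divided by the support of lhs."""
    return support(lhs | rhs, transactions) / support(lhs, transactions)

rules = [({"bread"}, {"milk"}), ({"caviar"}, {"vodka"})]
for lhs, rhs in rules:
    print(sorted(lhs), "->", sorted(rhs),
          "support:", support(lhs | rhs, TRANSACTIONS),
          "confidence:", confidence(lhs, rhs, TRANSACTIONS))
```

In this toy data, {bread} -> {milk} has comparatively high support and confidence, while {caviar} -> {vodka} has low support but high confidence, which mirrors cases (a) and (d).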

3) Consider the training examples shown in the table below for a binary classification problem.

 

Instance    a1    a2    a3     Target Class
1           T     T     1.0    +
2           T     T     6.0    +
3           T     F     5.0    -
4           F     F     4.0    +
5           F     T     7.0    -
6           F     T     3.0    -
7           F     F     8.0    -
8           T     F     7.0    +
9           F     T     5.0    -

(a) What is the entropy of this collection of training examples with respect to the positive class?

(b) What are the information gains of a1 and a2 relative to these training examples?

(c) For a3, which is a continuous attribute, compute the information gain for every possible split.

(d) What is the best split (among a1, a2, and a3) according to the information gain?

(e) What is the best split (between a1 and a2) according to the classification error rate?

(f) What is the best split (between a1 and a2) according to the Gini index?
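
Parts (c) and (e) of question 3 involve the most bookkeeping, so a short Python sketch may help with checking hand calculations. It makes two assumptions that the assignment does not state: candidate thresholds for a3 are taken as midpoints between consecutive distinct values, and each split is binary (value <= threshold versus value > threshold). All names (DATA, entropy, error, weighted, labels_where) are illustrative.

```python
from collections import Counter
from math import log2

# (a1, a2, a3, target class) for instances 1..9, transcribed from the table
DATA = [
    ("T", "T", 1.0, "+"), ("T", "T", 6.0, "+"), ("T", "F", 5.0, "-"),
    ("F", "F", 4.0, "+"), ("F", "T", 7.0, "-"), ("F", "T", 3.0, "-"),
    ("F", "F", 8.0, "-"), ("T", "F", 7.0, "+"), ("F", "T", 5.0, "-"),
]

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return sum(-(c / n) * log2(c / n) for c in Counter(labels).values())

def error(labels):
    """Classification error: fraction of labels outside the majority class."""
    return 1.0 - max(Counter(labels).values()) / len(labels)

def weighted(impurity, groups):
    """Size-weighted average impurity over the non-empty child nodes."""
    n = sum(len(g) for g in groups)
    return sum(len(g) / n * impurity(g) for g in groups if g)

def labels_where(column, test):
    """Split class labels into two groups by a test on one attribute."""
    yes = [r[-1] for r in DATA if test(r[column])]
    no = [r[-1] for r in DATA if not test(r[column])]
    return [yes, no]

parent = entropy([r[-1] for r in DATA])

# Information gain for the two binary attributes
gain_a1 = parent - weighted(entropy, labels_where(0, lambda v: v == "T"))
gain_a2 = parent - weighted(entropy, labels_where(1, lambda v: v == "T"))

# Assumed convention: thresholds are midpoints between distinct a3 values
values = sorted({r[2] for r in DATA})
thresholds = [(lo + hi) / 2 for lo, hi in zip(values, values[1:])]
gain_a3 = {
    t: parent - weighted(entropy, labels_where(2, lambda v, t=t: v <= t))
    for t in thresholds
}

# Weighted classification error after splitting on a1 and on a2
err_a1 = weighted(error, labels_where(0, lambda v: v == "T"))
err_a2 = weighted(error, labels_where(1, lambda v: v == "T"))

print("entropy:", round(parent, 4))
print("gain a1:", round(gain_a1, 4), "gain a2:", round(gain_a2, 4))
for t, g in gain_a3.items():
    print("a3 <=", t, "gain:", round(g, 4))
print("error a1:", round(err_a1, 4), "error a2:", round(err_a2, 4))
```

Swapping in a Gini function for entropy in the calls to weighted would cover part (f) in the same way.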

 

 
