What is the confidence of the rule - for classification and


Question 1

Which of the following statement(s) is(are) correct?

a. In multiple linear regression, dropping predictors that are uncorrelated with the dependent variable may decrease the variance of predictions.
b. In multiple linear regression, using predictors that are actually uncorrelated with the dependent variable may decrease the variance of predictions.

c. Both a. and b.

d. Neither a. nor b.

Question 2

The prediction error for record i is defined as the difference between its actual value $y_i$ and its predicted value $\hat{y}_i$: $e_i = y_i - \hat{y}_i$. For the given expression, $\sum_{i=1}^{n} e_i^2$, please select one of the appropriate acronyms or the correct answer from the following:

a. MAPE
b. RMSE
c. MAE or MAD

d. Average Error

e. Total SSE
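
The error summaries named in these options can be compared side by side with a short sketch. It assumes the usual textbook definitions based on $e_i = y_i - \hat{y}_i$; the function name `error_metrics` and the sample values are hypothetical and chosen only for illustration. MAPE is only defined when no actual value is zero.

```python
import math


def error_metrics(actual, predicted):
    """Return common prediction-error summaries for paired actual/predicted values."""
    errors = [y - y_hat for y, y_hat in zip(actual, predicted)]
    n = len(errors)
    return {
        # Signed mean of the errors; positive and negative errors cancel out.
        "average_error": sum(errors) / n,
        # Mean absolute error (also called mean absolute deviation).
        "mae": sum(abs(e) for e in errors) / n,
        # Mean absolute percentage error; assumes no actual value is zero.
        "mape": 100.0 * sum(abs(e / y) for e, y in zip(errors, actual)) / n,
        # Root mean squared error.
        "rmse": math.sqrt(sum(e * e for e in errors) / n),
        # Total sum of squared errors.
        "total_sse": sum(e * e for e in errors),
    }


# Hypothetical actual and predicted values, for illustration only.
metrics = error_metrics([10.0, 20.0, 40.0], [12.0, 18.0, 40.0])
for name, value in metrics.items():
    print(name, round(value, 4))
```

Because the average error keeps the sign of each error, it can be close to zero even when individual errors are large, which is why the absolute and squared variants are reported alongside it.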

Question 3

Given a transaction database for mining association rule as follows:

TID   Items
100   A C D
200   B C E
300   A B C E

The number of item-sets generated from the set of items {A, B, C, D, E} that can be used to formulate association rules is

a. 26

b. 5

c. 6

d. 12
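
Counting item-sets over five items can be checked by direct enumeration. The sketch below lists the number of non-empty item-sets of each size; it assumes (this is an interpretation, not something stated in the question) that an item-set needs at least two items to form a rule, since a rule X -> Y requires a non-empty antecedent and a non-empty consequent.

```python
from itertools import combinations

items = ["A", "B", "C", "D", "E"]

# Number of non-empty item-sets of each size k = 1..5.
counts_by_size = {
    k: len(list(combinations(items, k))) for k in range(1, len(items) + 1)
}

# All non-empty item-sets, regardless of size.
total_non_empty = sum(counts_by_size.values())

# Assumption: only item-sets with two or more items can be split into
# an antecedent and a consequent, so single items are excluded here.
rule_capable = sum(count for k, count in counts_by_size.items() if k >= 2)

print(counts_by_size)
print(total_non_empty, rule_capable)
```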

Question 4

Which of the following statement(s) is(are) correct?

a. Each node in a classification tree corresponds to a dimension (column) of a data table.
b. Each node with its associated value in a classification tree is used to partition the data set along its corresponding dimension.

c. Both a. and b.
d. Neither a. nor b.

Question 5

A dataset has 1000 records and one variable with 5% of the values missing, spread randomly throughout the records in the variable column. An analyst decides to remove records that have missing values. About how many records would you expect to be removed?

Question 6
Given a database table containing weather data as follows:

Outlook   Temperature  Humidity  Windy  Class: Play
Sunny     Hot          High      False  No
Sunny     Hot          High      True   No
Overcast  Hot          High      False  Yes
Rainy     Mild         High      False  Yes
Rainy     Cool         Normal    False  Yes
Rainy     Cool         Normal    True   No
Overcast  Cool         Normal    True   Yes
Sunny     Mild         High      False  No
Sunny     Cool         Normal    False  Yes
Rainy     Mild         Normal    False  Yes
Sunny     Mild         Normal    True   Yes
Overcast  Mild         High      True   Yes
Overcast  Hot          Normal    False  Yes
Rainy     Mild         High      True   No

Where Outlook, Temperature, Humidity, and Windy are the input variables (predictors), and Play is the output variable (response or outcome).
For the given sample X = (Outlook = 'Sunny', Temperature = 'Mild', Humidity = 'High', Windy = 'False'), the naïve Bayes classification method classifies the sample as Play = 'Yes' (that is, to play).

True

False
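
The naïve Bayes quantities asked for in the weather-data questions can be reproduced with a small sketch. It assumes raw relative frequencies with no smoothing, and the function name `naive_bayes_scores` is hypothetical; the 14 records are copied from the weather table.

```python
from collections import Counter

# The 14 weather records: (Outlook, Temperature, Humidity, Windy, Play).
records = [
    ("Sunny", "Hot", "High", "False", "No"),
    ("Sunny", "Hot", "High", "True", "No"),
    ("Overcast", "Hot", "High", "False", "Yes"),
    ("Rainy", "Mild", "High", "False", "Yes"),
    ("Rainy", "Cool", "Normal", "False", "Yes"),
    ("Rainy", "Cool", "Normal", "True", "No"),
    ("Overcast", "Cool", "Normal", "True", "Yes"),
    ("Sunny", "Mild", "High", "False", "No"),
    ("Sunny", "Cool", "Normal", "False", "Yes"),
    ("Rainy", "Mild", "Normal", "False", "Yes"),
    ("Sunny", "Mild", "Normal", "True", "Yes"),
    ("Overcast", "Mild", "High", "True", "Yes"),
    ("Overcast", "Hot", "Normal", "False", "Yes"),
    ("Rainy", "Mild", "High", "True", "No"),
]


def naive_bayes_scores(records, sample):
    """Return P(X | class) * P(class) for each class (no smoothing assumed)."""
    class_counts = Counter(record[-1] for record in records)
    scores = {}
    for label, count in class_counts.items():
        score = count / len(records)  # prior P(class)
        for i, value in enumerate(sample):
            matches = sum(
                1 for record in records
                if record[-1] == label and record[i] == value
            )
            score *= matches / count  # P(attribute = value | class)
        scores[label] = score
    return scores


sample = ("Sunny", "Mild", "High", "False")
scores = naive_bayes_scores(records, sample)
predicted = max(scores, key=scores.get)
print({label: round(score, 3) for label, score in scores.items()}, predicted)
```

The class with the larger product is the predicted class; dividing each product by their sum would give posterior probabilities, but that step is not needed for the comparison.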

Question 7

The difference between the statistical regression models and the neural network model is(are)

a. The neural network model uses hidden layers.
b. The regression models have no input layer.
c. The regression models have no output layer.

d. All of a., b., and c.

Question 8

A dataset has 1000 records and 50 variables with 5% of the values missing, spread randomly throughout the records and variables. An analyst decides to remove records that have missing values. About how many records would you expect to be removed?
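
A rough way to reason about this kind of question is to treat each cell as missing independently with the same probability. Under that assumption (which the question suggests with "spread randomly" but does not state formally), a record survives only if all of its values are present. The function name `expected_removed` is hypothetical.

```python
def expected_removed(n_records, n_variables, missing_rate):
    """Expected number of records with at least one missing value,
    assuming each cell is missing independently with probability missing_rate."""
    p_complete = (1.0 - missing_rate) ** n_variables  # all cells present
    return n_records * (1.0 - p_complete)


# The same formula covers 1, 2, or 50 variables.
for n_variables in (1, 2, 50):
    print(n_variables, round(expected_removed(1000, n_variables, 0.05), 1))
```

The loss grows quickly with the number of variables, which is why dropping incomplete records is rarely practical for wide data sets.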

Question 9

Given a database table containing weather data as follows:

Outlook   Temperature  Humidity  Windy  Class: Play
Sunny     Hot          High      False  No
Sunny     Hot          High      True   No
Overcast  Hot          High      False  Yes
Rainy     Mild         High      False  Yes
Rainy     Cool         Normal    False  Yes
Rainy     Cool         Normal    True   No
Overcast  Cool         Normal    True   Yes
Sunny     Mild         High      False  No
Sunny     Cool         Normal    False  Yes
Rainy     Mild         Normal    False  Yes
Sunny     Mild         Normal    True   Yes
Overcast  Mild         High      True   Yes
Overcast  Hot          Normal    False  Yes
Rainy     Mild         High      True   No

Where Outlook, Temperature, Humidity, and Windy are the input variables (predictors), and Play is the output variable (response or outcome).
For the given sample X = (Outlook = 'Sunny', Temperature = 'Mild', Humidity = 'High', Windy = 'False'),
please compute the product P(X|PLAY='Yes') * P(PLAY='Yes') (* denotes multiplication).
Please keep 3 digits after the decimal point, for example, 0.521.

Question 10

Given a database table containing weather data as follows:

Outlook   Temperature  Humidity  Windy  Class: Play
Sunny     Hot          High      False  No
Sunny     Hot          High      True   No
Overcast  Hot          High      False  Yes
Rainy     Mild         High      False  Yes
Rainy     Cool         Normal    False  Yes
Rainy     Cool         Normal    True   No
Overcast  Cool         Normal    True   Yes
Sunny     Mild         High      False  No
Sunny     Cool         Normal    False  Yes
Rainy     Mild         Normal    False  Yes
Sunny     Mild         Normal    True   Yes
Overcast  Mild         High      True   Yes
Overcast  Hot          Normal    False  Yes
Rainy     Mild         High      True   No

Where Outlook, Temperature, Humidity, and Windy are the input variables (predictors), and Play is the output variable (response or outcome). Please compute the conditional probability P(Humidity = 'High'|PLAY='No').

Question 11
Which of the following statement(s) is(are) correct in association rule mining or affinity analysis?
a. If an itemset is frequent, then its subset is also frequent.

b. If an itemset is frequent, then its superset is also frequent.

c. If an itemset is infrequent, then its superset is also frequent.

d. If an itemset is frequent, then its subset is also infrequent.

Question 12

Which of the following statements is correct in association rule mining or affinity analysis?
a. A strong rule with high support does not necessarily lead to its high confidence.
b. A strong rule with low support leads to its low confidence.

c. A strong rule with low support leads to its high confidence.
d. A strong rule with high support always leads to its high confidence.

Question 13

Which of the following statement(s) is(are) correct?
a. Each node with its associated value in a classification tree defines a linear function.

b. A classification tree consists of many linear functions.
c. Both a. and b.

d. Neither a. nor b.

Question 14

Given a database table containing weather data as follows:

Outlook   Temperature  Humidity  Windy  Class: Play
Sunny     Hot          High      False  No
Sunny     Hot          High      True   No
Overcast  Hot          High      False  Yes
Rainy     Mild         High      False  Yes
Rainy     Cool         Normal    False  Yes
Rainy     Cool         Normal    True   No
Overcast  Cool         Normal    True   Yes
Sunny     Mild         High      False  No
Sunny     Cool         Normal    False  Yes
Rainy     Mild         Normal    False  Yes
Sunny     Mild         Normal    True   Yes
Overcast  Mild         High      True   Yes
Overcast  Hot          Normal    False  Yes
Rainy     Mild         High      True   No

Where Outlook, Temperature, Humidity, and Windy are the input variables (predictors), and Play is the output variable (response or outcome). Please compute the conditional probability P(Outlook='Sunny'|PLAY='Yes'). Please keep 3 digits after the decimal point, for example, 0.123.

Question 15

Alternatives to maximizing the accuracy of a classifier or a data mining model is(are):

a. Maximizing sensitivity subject to some minimum level of specificity.
b. Minimizing false positives subject to some maximum level of false negatives.

c. Neither a. nor b.

d. Both a. and b.

Question 16

The following questions are related to similarity measurement, please match each expression with the correct corresponding term.

[Figure 1780_figure.jpg: the expressions to be matched; image not available]

a. Mahalanobis distance
b. Correlation-based similarity
c. Maximum coordinate distance

d. Manhattan distance
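
Since the expressions in the figure are not available, the sketch below only illustrates the usual textbook definitions of three of the listed measures, which may differ in notation from the figure. The function names and sample vectors are hypothetical. Mahalanobis distance is left out because it additionally needs the inverse covariance matrix of the data.

```python
import math


def manhattan(x, y):
    """Sum of absolute coordinate differences."""
    return sum(abs(a - b) for a, b in zip(x, y))


def max_coordinate(x, y):
    """Largest absolute coordinate difference."""
    return max(abs(a - b) for a, b in zip(x, y))


def correlation(x, y):
    """Pearson correlation coefficient, used as a similarity measure."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (spread_x * spread_y)


# Hypothetical records with three measurements each.
x = [1.0, 2.0, 3.0]
y = [2.0, 4.0, 7.0]
print(manhattan(x, y), max_coordinate(x, y), round(correlation(x, y), 4))
```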

Question 17

Which of the following statement(s) is(are) correct?

a. The sensitivity of a classifier measures the true positive rate.

b. The specificity of a classifier measures the false positive rate.

c. Both a. and b.
d. Neither a. nor b.
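
The rates mentioned in these statements can be computed from the four cells of a confusion matrix. The sketch below uses the standard definitions; the counts are hypothetical and serve only as an illustration.

```python
def sensitivity(tp, fn):
    """True positive rate: share of actual positives classified as positive."""
    return tp / (tp + fn)


def specificity(tn, fp):
    """True negative rate: share of actual negatives classified as negative."""
    return tn / (tn + fp)


# Hypothetical confusion-matrix counts.
tp, fn, tn, fp = 40, 10, 45, 5

sens = sensitivity(tp, fn)
spec = specificity(tn, fp)
false_positive_rate = 1 - spec  # complement of specificity
false_negative_rate = 1 - sens  # complement of sensitivity
print(sens, spec, round(false_positive_rate, 2), round(false_negative_rate, 2))
```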

Question 18

Which of the following is(are) used to measure the impurity of data in the process of constructing CART (classification and regression tree)?

a. Gini Index
b. Entropy
c. Both a. and b.

d. Neither a. nor b.
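
Both impurity measures named above can be computed from the class counts in a node. The sketch below uses the common definitions (Gini as one minus the sum of squared class proportions, entropy in bits); the class counts are illustrative only.

```python
import math


def gini(counts):
    """Gini index: 1 minus the sum of squared class proportions."""
    total = sum(counts)
    return 1.0 - sum((c / total) ** 2 for c in counts)


def entropy(counts):
    """Entropy in bits: sum of p * log2(1 / p) over the non-empty classes."""
    total = sum(counts)
    return sum((c / total) * math.log2(total / c) for c in counts if c > 0)


# Illustrative node with 9 records of one class and 5 of another.
print(round(gini([9, 5]), 3), round(entropy([9, 5]), 3))

# A pure node (all records in one class) has zero impurity under both measures.
print(gini([7, 0]), entropy([7, 0]))
```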

Question 19

Given a transaction database for mining association rule as follows:

TID   Items
100   A C D
200   B C E
300   A B C E

Which one of the following statements is correct?

a. There are 5 item-sets that can be generated from the set of items {A, B, C, D, E}.
b. There are 5 non-empty item-sets that can be generated from the set of items {A, B, C, D, E}.
c. There are 31 non-empty item-sets that can be generated from the set of items {A, B, C, D, E}.
d. There are 32 non-empty item-sets that can be generated from the set of items {A, B, C, D, E}.

Question 20

The prediction error for record i is defined as the difference between its actual value $y_i$ and its predicted value $\hat{y}_i$: $e_i = y_i - \hat{y}_i$. For the given expression, $100\% \times \frac{1}{n} \sum_{i=1}^{n} \left| e_i / y_i \right|$, please select one of the appropriate acronyms or the correct answer from the following:

a. Total SSE
b. RMSE
c. MAPE
d. MAE or MAD
e. Average Error

Question 21

Given a transaction database for mining association rule as follows:

TID   Items
100   A C D
200   B C E
300   A B C E

For a given support count = 2, which one of the following item-sets is not a frequent item-set?

a. {B, C, E}
b. {A, C, D}
c. {C, E}
d. {B, E}
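
Support counts and rule confidence for the three transactions above can be checked with a few lines. This is a minimal sketch using the standard definitions; the helper names `support_count` and `confidence` are hypothetical.

```python
# The three transactions from the table, keyed by TID.
transactions = {
    100: {"A", "C", "D"},
    200: {"B", "C", "E"},
    300: {"A", "B", "C", "E"},
}


def support_count(itemset):
    """Number of transactions that contain every item in itemset."""
    return sum(1 for items in transactions.values() if set(itemset) <= items)


def confidence(antecedent, consequent):
    """support_count(antecedent and consequent together) / support_count(antecedent)."""
    both = set(antecedent) | set(consequent)
    return support_count(both) / support_count(antecedent)


min_support_count = 2
for candidate in ({"B", "C", "E"}, {"A", "C", "D"}, {"C", "E"}, {"B", "E"}):
    count = support_count(candidate)
    print(sorted(candidate), count, count >= min_support_count)

# Rules built from the same item-set share a support value,
# but their confidence depends on the antecedent.
print(confidence({"B", "C"}, {"E"}), round(confidence({"C"}, {"B", "E"}), 3))
```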

Question 22

Given a database of customers buying electronics, 10,000 transactions are analyzed and the data show:
6,000 of the customer transactions included computer games (game),
7,500 of them included videos (video),
4,000 of them included both computer games and videos.
The generated association rule is: video -> game [support = ?%, confidence = ?%]
What is the confidence of the rule? (Please keep 2 digits after the decimal point, for example, 0.25).

Question 23
Given a database of customers buying electronics, 10,000 transactions are analyzed and the data show:
6,000 of the customer transactions included computer games (game),
7,500 of them included videos (video),
4,000 of them included both computer games and videos.
The generated association rule is: game -> video [support = ?%, confidence = ?%]
What is the support of the rule?

Question 24

In order to achieve a given degree of reliability with a given data set and a given data mining model, what is a good rule of thumb for the ratio between the number of input variables (predictor variables) and the number of records?

a. to have 10 records for each input variable (predictor variable).

b. to have at least 6 × m × p records, m is the number of outcome classes, and p is the number of input variables.
c. Either a. or b.
d. Neither a. nor b.

Question 25

Given a database table containing weather data as follows:

Outlook   Temperature  Humidity  Windy  Class: Play
Sunny     Hot          High      False  No
Sunny     Hot          High      True   No
Overcast  Hot          High      False  Yes
Rainy     Mild         High      False  Yes
Rainy     Cool         Normal    False  Yes
Rainy     Cool         Normal    True   No
Overcast  Cool         Normal    True   Yes
Sunny     Mild         High      False  No
Sunny     Cool         Normal    False  Yes
Rainy     Mild         Normal    False  Yes
Sunny     Mild         Normal    True   Yes
Overcast  Mild         High      True   Yes
Overcast  Hot          Normal    False  Yes
Rainy     Mild         High      True   No

Where Outlook, Temperature, Humidity, and Windy are the input variables (predictors), and Play is the output variable (response or outcome). Please compute the conditional probability P(Humidity = 'High'|PLAY='Yes'). Please keep 3 digits after the decimal point, for example, 0.123.

Question 26

Given a database of customers buying electronics, 10,000 transactions are analyzed and the data show:
6,000 of the customer transactions included computer games (game),
7,500 of them included videos (video),
4,000 of them included both computer games and videos.
The generated association rule is: game -> video [support = ?%, confidence = ?%]
What is the lift of the rule? (Please keep 2 digits after the decimal point, for example, 0.25)
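
The support, confidence, and lift of the game/video rules follow directly from the four counts given above. The sketch assumes the standard definitions (lift as confidence divided by the overall share of transactions containing the consequent); the variable names are illustrative.

```python
# Counts taken from the question.
n_total = 10_000
n_game = 6_000
n_video = 7_500
n_both = 4_000

# Support is the share of all transactions containing both items.
support = n_both / n_total

# Confidence depends on the direction of the rule.
conf_game_to_video = n_both / n_game
conf_video_to_game = n_both / n_video

# Lift compares the rule's confidence with the baseline share of the consequent;
# a value below 1 suggests a negative association, above 1 a positive one.
lift = conf_game_to_video / (n_video / n_total)

print(round(support, 2), round(conf_game_to_video, 2),
      round(conf_video_to_game, 2), round(lift, 2))
```

Lift is symmetric in the two items, so computing it from video -> game gives the same value.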

Question 27

Given a database of customers buying electronics, 10,000 transactions are analyzed and the data show:
6,000 of the customer transactions included computer games (game),
7,500 of them included videos (video),
4,000 of them included both computer games and videos.
The generated association rule is: game -> video [support = ?%, confidence = ?%]
Which one of the following statements is correct?
a. It shows the video and game are not positively associated or correlated.
b. It shows the video and game are not negatively associated or correlated.
c. The customers who buy videos also buy games.
d. It shows the video and game are independent of each other.

Question 28
A good classifier will give a high lift chart

a. when the classifier acts on a lot of cases.
b. when the classifier acts on only a few cases.

c. Both a. and b.

d. Neither a. nor b.

Question 29

Given a database table containing weather data as follows:

Outlook   Temperature  Humidity  Windy  Class: Play
Sunny     Hot          High      False  No
Sunny     Hot          High      True   No
Overcast  Hot          High      False  Yes
Rainy     Mild         High      False  Yes
Rainy     Cool         Normal    False  Yes
Rainy     Cool         Normal    True   No
Overcast  Cool         Normal    True   Yes
Sunny     Mild         High      False  No
Sunny     Cool         Normal    False  Yes
Rainy     Mild         Normal    False  Yes
Sunny     Mild         Normal    True   Yes
Overcast  Mild         High      True   Yes
Overcast  Hot          Normal    False  Yes
Rainy     Mild         High      True   No

Where Outlook, Temperature, Humidity, and Windy are the input variables (predictors), and Play is the output variable (response or outcome). Please compute the conditional probability P(Outlook='Sunny'|PLAY='No'). Please keep 3 digits after the decimal point, for example, 0.123.

Question 30

Which of the following statement(s) is(are) correct?

a. Multiple linear regression can be used to predict the value of continuous dependent variable for new observation.

b. Logistic regression can be used to classify a new observation into one of the specific classes.

c. Both a. and b.

d. Neither a. nor b.

Question 31

Given a database table containing weather data as follows:

Outlook   Temperature  Humidity  Windy  Class: Play
Sunny     Hot          High      False  No
Sunny     Hot          High      True   No
Overcast  Hot          High      False  Yes
Rainy     Mild         High      False  Yes
Rainy     Cool         Normal    False  Yes
Rainy     Cool         Normal    True   No
Overcast  Cool         Normal    True   Yes
Sunny     Mild         High      False  No
Sunny     Cool         Normal    False  Yes
Rainy     Mild         Normal    False  Yes
Sunny     Mild         Normal    True   Yes
Overcast  Mild         High      True   Yes
Overcast  Hot          Normal    False  Yes
Rainy     Mild         High      True   No

Where Outlook, Temperature, Humidity, and Windy are the input variables (predictors), and Play is the output variable (response or outcome).
For the given sample X = (Outlook = 'Sunny', Temperature = 'Mild', Humidity = 'High', Windy = 'False'),
please compute the conditional probability P(X|PLAY='No'). Please keep 3 digits after the decimal point, for example, 0.521.

Question 32

Given a database table containing weather data as follows:

Outlook   Temperature  Humidity  Windy  Class: Play
Sunny     Hot          High      False  No
Sunny     Hot          High      True   No
Overcast  Hot          High      False  Yes
Rainy     Mild         High      False  Yes
Rainy     Cool         Normal    False  Yes
Rainy     Cool         Normal    True   No
Overcast  Cool         Normal    True   Yes
Sunny     Mild         High      False  No
Sunny     Cool         Normal    False  Yes
Rainy     Mild         Normal    False  Yes
Sunny     Mild         Normal    True   Yes
Overcast  Mild         High      True   Yes
Overcast  Hot          Normal    False  Yes
Rainy     Mild         High      True   No

Where Outlook, Temperature, Humidity, and Windy are the input variables (predictors), and Play is the output variable (response or outcome). Please compute the prior probability, P(PLAY='No'). Please keep 3 digits after the decimal point.

Question 33

The overfitting problem can be detected
a. When the performance on the training data set improves and validation data set deteriorates.

b. When the performance on the training data set deteriorates and validation data set improves.
c. When the performance on both the training data set and validation data set deteriorate.
d. When the performance on both the training data set and validation data set improve.

Question 34

The purpose(s) of dimension reduction is(are)

a. reducing the effects of the curse of dimensionality.

b. eliminating the input variables/predictors that are uncorrelated to the output variables/response.

c. reducing the possibility of overfitting.

d. All of a., b., and c.

Question 35

The prediction error for record i is defined as the difference between its actual value $y_i$ and its predicted value $\hat{y}_i$: $e_i = y_i - \hat{y}_i$. For the given expression, $\frac{1}{n} \sum_{i=1}^{n} |e_i|$, please select one of the appropriate acronyms or the correct answer from the following:
a. Average error
b. MAE or MAD
c. MAPE
d. RMSE
e. Total SSE

Question 36

Given a database of customers buying electronics, 10,000 transactions are analyzed and the data show:
6,000 of the customer transactions included computer games (game),
7,500 of them included videos (video),
4,000 of them included both computer games and videos.
The generated association rule is: game -> video [support = ?%, confidence = ?%]
Which one of the following statements is correct?
a. It shows the game and video are independent of each other.
b. The customers who buy computer games also buy videos.
c. It shows the game and video are not positively associated or correlated.
d. It shows the game and video are not negatively associated or correlated.

Question 37

The difference between the logistic regression model and the neural network model is(are)

a. The neural network model uses hidden layers.

b. The neural network model uses an activation function.

c. Both a. and b.

d. Neither a. nor b.

Question 38

For cluster analysis, which of the following statement(s) is(are) correct?
a. K-means clustering method is used to form the cluster into hierarchy.
b. K-means clustering method is centroid based approach.
c. K-means clustering method is not a centroid based approach.

d. K-means clustering method is a hierarchical clustering method.

Question 39

Given a database table containing weather data as follows:

Outlook   Temperature  Humidity  Windy  Class: Play
Sunny     Hot          High      False  No
Sunny     Hot          High      True   No
Overcast  Hot          High      False  Yes
Rainy     Mild         High      False  Yes
Rainy     Cool         Normal    False  Yes
Rainy     Cool         Normal    True   No
Overcast  Cool         Normal    True   Yes
Sunny     Mild         High      False  No
Sunny     Cool         Normal    False  Yes
Rainy     Mild         Normal    False  Yes
Sunny     Mild         Normal    True   Yes
Overcast  Mild         High      True   Yes
Overcast  Hot          Normal    False  Yes
Rainy     Mild         High      True   No

Where Outlook, Temperature, Humidity, and Windy are the input variables (predictors), and Play is the output variable (response or outcome). Please compute the conditional probability P(Windy = 'False'|PLAY='Yes').
Please keep 3 digits after the decimal point, for example, 0.521.

Question 40

In the neural network model, what kind of parameter(s) is(are) used to avoid getting stuck in a local optimum?

a. The learning rate

b. The momentum
c. Both a. and b.
d. Neither a. nor b.

Question 41

Given a database table containing weather data as follows:

Outlook   Temperature  Humidity  Windy  Class: Play
Sunny     Hot          High      False  No
Sunny     Hot          High      True   No
Overcast  Hot          High      False  Yes
Rainy     Mild         High      False  Yes
Rainy     Cool         Normal    False  Yes
Rainy     Cool         Normal    True   No
Overcast  Cool         Normal    True   Yes
Sunny     Mild         High      False  No
Sunny     Cool         Normal    False  Yes
Rainy     Mild         Normal    False  Yes
Sunny     Mild         Normal    True   Yes
Overcast  Mild         High      True   Yes
Overcast  Hot          Normal    False  Yes
Rainy     Mild         High      True   No

Where Outlook, Temperature, Humidity, and Windy are the input variables (predictors), and Play is the output variable (response or outcome).
For the given sample X = (Outlook = 'Sunny', Temperature = 'Mild', Humidity = 'High', Windy = 'False'),
please compute the product P(X|PLAY='No') * P(PLAY='No') (* denotes multiplication).
Please keep 3 digits after the decimal point, for example, 0.521.

Question 42

In terms of the number of variables involved in the training process of supervised learning, which of the following statements is correct?

a. The more variables we include in the trained model, the less the risk of overfitting.
b. The more variables we include in the trained model, the greater the risk of overfitting.

c. The number of variables included in the trained model has no impact on the risk of overfitting.
d. The fewer variables we include in the trained model, the greater the risk of overfitting.

Question 43

Given a transaction database for mining association rule as follows:

TID   Items
100   A C D
200   B C E
300   A B C E


For a given support count = 2, which one of the following statements is incorrect?

a. The rule B,C -> E and the rule C,E -> B have the same support value.
b. The rule B,C -> E and the rule C -> B,E have the same confidence value.
c. The rule B,C -> E and the rule B,E -> C have the same support value.

d. The rule B,C -> E and the rule C,E -> B have the same confidence value.

Question 44

Given a database table containing weather data as follows:

Outlook   Temperature  Humidity  Windy  Class: Play
Sunny     Hot          High      False  No
Sunny     Hot          High      True   No
Overcast  Hot          High      False  Yes
Rainy     Mild         High      False  Yes
Rainy     Cool         Normal    False  Yes
Rainy     Cool         Normal    True   No
Overcast  Cool         Normal    True   Yes
Sunny     Mild         High      False  No
Sunny     Cool         Normal    False  Yes
Rainy     Mild         Normal    False  Yes
Sunny     Mild         Normal    True   Yes
Overcast  Mild         High      True   Yes
Overcast  Hot          Normal    False  Yes
Rainy     Mild         High      True   No

Where Outlook, Temperature, Humidity, and Windy are the input variables (predictors), and Play is the output variable (response or outcome). Please compute the prior probability P(PLAY='Yes'). Please keep 3 digits after the decimal point, for example, 0.521.

Question 45
Given a database table containing weather data as follows:

Outlook   Temperature  Humidity  Windy  Class: Play
Sunny     Hot          High      False  No
Sunny     Hot          High      True   No
Overcast  Hot          High      False  Yes
Rainy     Mild         High      False  Yes
Rainy     Cool         Normal    False  Yes
Rainy     Cool         Normal    True   No
Overcast  Cool         Normal    True   Yes
Sunny     Mild         High      False  No
Sunny     Cool         Normal    False  Yes
Rainy     Mild         Normal    False  Yes
Sunny     Mild         Normal    True   Yes
Overcast  Mild         High      True   Yes
Overcast  Hot          Normal    False  Yes
Rainy     Mild         High      True   No

Where Outlook, Temperature, Humidity, and Windy are the input variables (predictors), and Play is the output variable (response or outcome).
For the given sample X = (Outlook = 'Sunny', Temperature = 'Mild', Humidity = 'High', Windy = 'False'),
please compute the conditional probability P(X|PLAY='Yes'). Please keep 3 digits after the decimal point, for example, 0.521.

Question 46

The purpose of normalizing data value into unit value is

a. To reduce the impact or dominance of the data with large scale value.

b. To reduce the bias caused by the data with large scale value.

c. Both a. and b.

d. Neither a. nor b.

Question 47

Given a database of customers buying electronics, 10,000 transactions are analyzed and the data show:
6,000 of the customer transactions included computer games (game),
7,500 of them included videos (video),
4,000 of them included both computer games and videos.
The generated association rule is: game -> video [support = ?%, confidence = ?%]

What is the confidence of the rule? (Please enter only the integer part of the value; for example, for 50%, enter 50.)

Question 48

A dataset has 1000 records and 2 variables with 5% of the values missing, spread randomly throughout the records and variables. An analyst decides to remove records that have missing values. About how many records would you expect to be removed?

Question 49

Which of the following statement(s) is(are) correct for multiple linear regression method in XLMiner?
a. The categorical data value must be transformed into binary data.
b. The numerical data value must be transformed into categorical value.

c. Either a. or b.
d. Neither a. nor b.

Question 50

Which of the following statement(s) is(are) correct?
a. The classification tree is used to generate class label for categorical output.

b. The regression tree is used to generate the numerical output for prediction or estimation.

c. Both a. and b.

d. Neither a. nor b.

Question 51

The prediction error for record i is defined as the difference between its actual value $y_i$ and its predicted value $\hat{y}_i$: $e_i = y_i - \hat{y}_i$. For the given expression, $\frac{1}{n} \sum_{i=1}^{n} e_i$, please select one of the appropriate acronyms or the correct answer from the following:

a. MAPE

b. Total SSE
c. RMSE
d. Average error

e. MAE or MAD

Question 52

In the neural network model, what kind of parameter(s) is(are) used to avoid overfitting?
a. The learning rate

b. The momentum
c. Both a. and b.

d. Neither a. nor b.

Question 53

The prediction error for record i is defined as the difference between its actual value $y_i$ and its predicted value $\hat{y}_i$: $e_i = y_i - \hat{y}_i$. For the given expression, $\sqrt{\frac{1}{n} \sum_{i=1}^{n} e_i^2}$, please select one of the appropriate acronyms or the correct answer from the following:

a. TOTAL SSE
b. RMSE
c. MAE or MAD
d. MAPE
e. Average Error

Question 54

Which of the following statement(s) is(are) correct?
a. Each branch from the root to a leaf node in a classification tree represents a classification rule.

b. Each branch from the root to a leaf node in a classification tree is associated with a partitioned data set with a class label.

c. Both a. and b.

d. Neither a. nor b.

Question 55

Which of the following data mining methods in XLMINER is especially suited for (and limited to) both categorical predictor and outcome variable?

a. Naïve Bayes method.
b. Regression
c. K-Nearest Neighbor method.
d. Neural Network

Question 56

In terms of input variables/predictors and output variable/response, there are four combinations in the following:

continuous input variables/predictors - continuous output variable/response
continuous input variables/predictors - categorical output variable/response

categorical input variables/predictors - categorical output variable/response
categorical input variables/predictors - continuous output variable/response

Which of the following data mining methods can be used for any one of the four combinations in XLMINER?
a. Neural network

b. Naïve Bayes method

c. Logistic regression
d. Linear regression

Question 57

Given a database table containing weather data as follows:

Outlook   Temperature  Humidity  Windy  Class: Play
Sunny     Hot          High      False  No
Sunny     Hot          High      True   No
Overcast  Hot          High      False  Yes
Rainy     Mild         High      False  Yes
Rainy     Cool         Normal    False  Yes
Rainy     Cool         Normal    True   No
Overcast  Cool         Normal    True   Yes
Sunny     Mild         High      False  No
Sunny     Cool         Normal    False  Yes
Rainy     Mild         Normal    False  Yes
Sunny     Mild         Normal    True   Yes
Overcast  Mild         High      True   Yes
Overcast  Hot          Normal    False  Yes
Rainy     Mild         High      True   No

Where Outlook, Temperature, Humidity, and Windy are the input variables (predictors), and Play is the output variable (response or outcome). Please compute the conditional probability P(Windy = 'False'|PLAY='No').

Question 58

Which of the following statement(s) is(are) correct?

a. The sensitivity of a classifier measures the false negative rate.

b. The specificity of a classifier measures the true negative rate.
c. Neither a. nor b.
d. Both a. and b.

Question 59

Given a database table containing weather data as follows:

Outlook   Temperature  Humidity  Windy  Class: Play
Sunny     Hot          High      False  No
Sunny     Hot          High      True   No
Overcast  Hot          High      False  Yes
Rainy     Mild         High      False  Yes
Rainy     Cool         Normal    False  Yes
Rainy     Cool         Normal    True   No
Overcast  Cool         Normal    True   Yes
Sunny     Mild         High      False  No
Sunny     Cool         Normal    False  Yes
Rainy     Mild         Normal    False  Yes
Sunny     Mild         Normal    True   Yes
Overcast  Mild         High      True   Yes
Overcast  Hot          Normal    False  Yes
Rainy     Mild         High      True   No

Where Outlook, Temperature, Humidity, and Windy are the input variables (predictors), and Play is the output variable (response or outcome).
Please compute the conditional probability P(Temperature = 'Mild'|PLAY='Yes'). Please keep 3 digits after the decimal point, for example, 0.123.

Question 60

Given a transaction database for mining association rule as follows:

TID   Items
100   A C D
200   B C E
300   A B C E

For a given support count = 2, which one of the following statements is incorrect?

a. The rule A->C and C->A have the same confidence value.
b. The item-set in rule C->A is a frequent item-set.
c. The item-set in rule A->C is a frequent item-set.
d. The rule A->C and C->A have the same support value.

Question 61

Given a transaction database for mining association rule as follows:

TID   Items
100   A C D
200   B C E
300   A B C E

For a given support count = 2, which one of the following statements is incorrect?

a. The rule B->C, E and the rule E->B,C have the same confidence value.
b. The rule E ->B,C and the rule C->B,E have the same confidence value.
c. The rule B,C-> E and the rule E->B,C have the same confidence value.
d. The rule B,E ->C and the rule C->B,E have the same confidence value.

Question 62

The difference between the multiple linear regression model and the neural network model is(are)?

a. The neural network model uses hidden layers.

b. The neural network model uses an activation function.

c. Both a. and b.
d. Neither a. nor b.

Question 63

Given a database table containing weather data as follows:

Outlook   Temperature  Humidity  Windy  Class: Play
Sunny     Hot          High      False  No
Sunny     Hot          High      True   No
Overcast  Hot          High      False  Yes
Rainy     Mild         High      False  Yes
Rainy     Cool         Normal    False  Yes
Rainy     Cool         Normal    True   No
Overcast  Cool         Normal    True   Yes
Sunny     Mild         High      False  No
Sunny     Cool         Normal    False  Yes
Rainy     Mild         Normal    False  Yes
Sunny     Mild         Normal    True   Yes
Overcast  Mild         High      True   Yes
Overcast  Hot          Normal    False  Yes
Rainy     Mild         High      True   No

Where Outlook, Temperature, Humidity, and Windy are the input variables (predictors), and Play is the output variable (response or outcome). Please compute the conditional probability P(Temperature = 'Mild'|PLAY='No').

Question 64

Given a transaction database for mining association rule as follows:

TID   Items
100   A C D
200   B C E
300   A B C E

For a given support count = 2 and the item-set {B, C, E}, how many valid rules can be generated from the item-set {B, C, E}?

a. 2

b. 12

c. 3

d. 6

Question 65

Which of the following statement(s) is(are) correct?
a. Outliers are the values that lie far away from the bulk of the data.
b. An outlier is a value that is more than 3 standard deviations away from the mean.

c. An outlier is an invalid data point.

d. Both a. and b.

Question 66
One of the ways to handle missing values in preprocessing of data mining is
a. to drop the columns with missing values.
b. to replace the missing values with imputed value.
c. Both a. and b.
d. Neither a. nor b.

Question 67

For classification and regression trees (CART), which of the following ways can be used to avoid overfitting?

a. Setting rules to stop tree growth.

b. Pruning the full-grown tree back to a level where it does not overfit.
c. Both a. and b.
d. Neither a. nor b.
