About how many records would you expect to be removed?


Question 1

A dataset has 1000 records and 2 variables, with 5% of the values missing, spread randomly throughout the records and variables. An analyst decides to remove records that have missing values. About how many records would you expect to be removed?
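If each value is missing independently with probability 0.05, a record is complete only when all p of its values are present. Complete-case deletion is then expected to remove n(1 - 0.95^p) records. The following is a minimal sketch of that calculation under the independence assumption. The function name `expected_removed` is illustrative.

```python
def expected_removed(n_records: int, n_vars: int, p_missing: float = 0.05) -> float:
    """Expected number of records dropped by complete-case deletion.

    Assumes each value is missing independently with probability p_missing,
    so a record is kept only if all n_vars of its values are present.
    """
    p_complete = (1 - p_missing) ** n_vars
    return n_records * (1 - p_complete)


for n_vars in (1, 2, 50):
    print(n_vars, round(expected_removed(1000, n_vars), 1))
```

For 1000 records, this gives about 97.5 removals with 2 variables, 50 with 1 variable (Question 41), and roughly 923 with 50 variables (Question 44). Complete-case deletion therefore discards data quickly as the number of variables grows.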

Question 2

Which of the following statement(s) is(are) correct?

a. The sensitivity of a classifier measures the false negative rate.

b. The specificity of a classifier measures the true negative rate.

c. Neither a. nor b.

d. Both a. and b.
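For reference, sensitivity and specificity are usually computed from confusion-matrix counts as shown below. The counts are hypothetical and only illustrate the formulas.

```python
def sensitivity(tp: int, fn: int) -> float:
    """True positive rate: share of actual positives classified as positive."""
    return tp / (tp + fn)


def specificity(tn: int, fp: int) -> float:
    """True negative rate: share of actual negatives classified as negative."""
    return tn / (tn + fp)


# Hypothetical confusion-matrix counts, for illustration only.
tp, fn, tn, fp = 40, 10, 45, 5
print("sensitivity (TPR):", sensitivity(tp, fn))
print("false negative rate:", 1 - sensitivity(tp, fn))
print("specificity (TNR):", specificity(tn, fp))
print("false positive rate:", 1 - specificity(tn, fp))
```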

Question 3

For classification and regression trees (CART), which of the following ways can be used to avoid overfitting?

a. Setting rules to stop tree growth.

b. Pruning the full-grown tree back to a level where it does not overfit.

c. Both a. and b.

d. Neither a. nor b.

Question 4

Given a database of customer electronics purchases, 10,000 transactions are analyzed, and the data show that:
6,000 of the customer transactions included computer games (game),
7,500 of them included videos (video), and
4,000 of them included both computer games and videos.
The generated association rule is: video -> game [support = ?%, confidence = ?%]
Which one of the following statements is correct?

a. The customers who buy videos also buy computer games.

b. It shows that video and game are not positively associated or correlated.

c. It shows that video and game are independent of each other.

d. It shows that video and game are not negatively associated or correlated.
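The support, confidence, and lift asked for in this and the later game/video questions follow from the usual textbook definitions:

- support(X -> Y) = n(X and Y) / N
- confidence(X -> Y) = n(X and Y) / n(X)
- lift = confidence(X -> Y) / (n(Y) / N)

Here is a short sketch using the counts given above:

```python
N = 10_000       # transactions analyzed
n_game = 6_000   # transactions containing computer games
n_video = 7_500  # transactions containing videos
n_both = 4_000   # transactions containing both

support = n_both / N                     # identical for game -> video and video -> game
conf_game_to_video = n_both / n_game
conf_video_to_game = n_both / n_video
lift = support / ((n_game / N) * (n_video / N))  # symmetric in the two items

print(f"support = {support:.2%}")
print(f"confidence(game -> video) = {conf_game_to_video:.2%}")
print(f"confidence(video -> game) = {conf_video_to_game:.2%}")
print(f"lift = {lift:.2f}")
```

A lift below 1 (here about 0.89) means games and videos are bought together less often than they would be if the purchases were independent.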

Question 5

The following question relates to similarity measurement. Match each expression with the correct corresponding term.

a. Mahalanobis distance

b. Maximum coordinate distance

c. Correlation-based similarity

d. Manhattan distance
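The expressions to be matched are not preserved here, so the following sketch shows one common form of each measure. The Mahalanobis function takes a hypothetical inverse covariance matrix as input rather than estimating one from data, and the function names are illustrative.

```python
import math
from statistics import mean


def manhattan(x, y):
    """Sum of absolute coordinate differences."""
    return sum(abs(a - b) for a, b in zip(x, y))


def max_coordinate(x, y):
    """Maximum coordinate distance: the largest absolute coordinate difference."""
    return max(abs(a - b) for a, b in zip(x, y))


def correlation_similarity(x, y):
    """Pearson correlation between the two records' measurement profiles."""
    mx, my = mean(x), mean(y)
    num = sum((a - mx) * (b - my) for a, b in zip(x, y))
    den = math.sqrt(sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y))
    return num / den


def mahalanobis(x, y, inv_cov):
    """sqrt((x - y)^T S^-1 (x - y)), with S^-1 supplied as a list of rows."""
    d = [a - b for a, b in zip(x, y)]
    n = len(d)
    return math.sqrt(sum(d[i] * inv_cov[i][j] * d[j] for i in range(n) for j in range(n)))


x, y = [1.0, 2.0, 3.0], [2.0, 4.0, 7.0]
print(manhattan(x, y))        # 7.0
print(max_coordinate(x, y))   # 4.0
print(correlation_similarity(x, y))
# Hypothetical diagonal inverse covariance (variances 1, 4, 16), for illustration only.
print(mahalanobis(x, y, [[1, 0, 0], [0, 0.25, 0], [0, 0, 1 / 16]]))
```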

Question 6

Given a database table containing weather data as follows:

Outlook   Temperature  Humidity  Windy  Play (class)
Sunny     Hot          High      False  No
Sunny     Hot          High      True   No
Overcast  Hot          High      False  Yes
Rainy     Mild         High      False  Yes
Rainy     Cool         Normal    False  Yes
Rainy     Cool         Normal    True   No
Overcast  Cool         Normal    True   Yes
Sunny     Mild         High      False  No
Sunny     Cool         Normal    False  Yes
Rainy     Mild         Normal    False  Yes
Sunny     Mild         Normal    True   Yes
Overcast  Mild         High      True   Yes
Overcast  Hot          Normal    False  Yes
Rainy     Mild         High      True   No

Here, Outlook, Temperature, Humidity, and Windy are the input variables (predictors), and Play is the output variable (response or outcome).

For the given sample X = (Outlook = 'Sunny', Temperature = 'Mild', Humidity = 'High', Windy = 'False'), classifying the sample with the naïve Bayes method indicates Play = 'Yes'.

True
False
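The same 14-record weather table is reused in Questions 6, 8, 15, 18, 27, 31, 39, 43, 48, 49, 51, 52, 59, 66, and 67, so one small script can reproduce every requested quantity. This is a sketch of the plain naïve Bayes estimate using relative frequencies with no Laplace smoothing. The no-smoothing choice is an assumption, since the questions do not mention smoothing, and the helper names are illustrative.

```python
# Rows of the weather table: (Outlook, Temperature, Humidity, Windy, Play)
DATA = [
    ("Sunny", "Hot", "High", "False", "No"),
    ("Sunny", "Hot", "High", "True", "No"),
    ("Overcast", "Hot", "High", "False", "Yes"),
    ("Rainy", "Mild", "High", "False", "Yes"),
    ("Rainy", "Cool", "Normal", "False", "Yes"),
    ("Rainy", "Cool", "Normal", "True", "No"),
    ("Overcast", "Cool", "Normal", "True", "Yes"),
    ("Sunny", "Mild", "High", "False", "No"),
    ("Sunny", "Cool", "Normal", "False", "Yes"),
    ("Rainy", "Mild", "Normal", "False", "Yes"),
    ("Sunny", "Mild", "Normal", "True", "Yes"),
    ("Overcast", "Mild", "High", "True", "Yes"),
    ("Overcast", "Hot", "Normal", "False", "Yes"),
    ("Rainy", "Mild", "High", "True", "No"),
]
FEATURES = ("Outlook", "Temperature", "Humidity", "Windy")


def prior(cls: str) -> float:
    """P(Play = cls): share of records with that class label."""
    return sum(row[-1] == cls for row in DATA) / len(DATA)


def cond_prob(feature: str, value: str, cls: str) -> float:
    """P(feature = value | Play = cls) as a relative frequency (no smoothing)."""
    i = FEATURES.index(feature)
    rows = [row for row in DATA if row[-1] == cls]
    return sum(row[i] == value for row in rows) / len(rows)


def likelihood(x: tuple, cls: str) -> float:
    """P(X | Play = cls), assuming predictors are conditionally independent."""
    p = 1.0
    for feature, value in zip(FEATURES, x):
        p *= cond_prob(feature, value, cls)
    return p


x = ("Sunny", "Mild", "High", "False")
scores = {cls: likelihood(x, cls) * prior(cls) for cls in ("Yes", "No")}
for cls, score in scores.items():
    print(f"P(X|{cls}) = {likelihood(x, cls):.3f}, P(X|{cls})P({cls}) = {score:.3f}")
print("naive Bayes prediction:", max(scores, key=scores.get))
```

Under these assumptions, the No score (about 0.027) exceeds the Yes score (about 0.014), so the classifier predicts Play = 'No' for X. The same helpers give the individual conditional probabilities. For example, cond_prob("Windy", "False", "Yes") returns 6/9.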

Question 7

Which of the following statement(s) is(are) correct?

a. Each branch from the root to a leaf node in a classification tree represents a classification rule.

b. Each branch from the root to a leaf node in a classification tree is associated with a partitioned data set with a class label.

c. Both a. and b.

d. Neither a. nor b.

Question 8

Given a database table containing weather data as follows:

Outlook   Temperature  Humidity  Windy  Play (class)
Sunny     Hot          High      False  No
Sunny     Hot          High      True   No
Overcast  Hot          High      False  Yes
Rainy     Mild         High      False  Yes
Rainy     Cool         Normal    False  Yes
Rainy     Cool         Normal    True   No
Overcast  Cool         Normal    True   Yes
Sunny     Mild         High      False  No
Sunny     Cool         Normal    False  Yes
Rainy     Mild         Normal    False  Yes
Sunny     Mild         Normal    True   Yes
Overcast  Mild         High      True   Yes
Overcast  Hot          Normal    False  Yes
Rainy     Mild         High      True   No

Here, Outlook, Temperature, Humidity, and Windy are the input variables (predictors), and Play is the output variable (response or outcome). Please compute the conditional probability P(Windy = 'False' | PLAY = 'Yes').
Please keep 3 digits after the decimal point, for example, 0.521.

Question 9

Given the following transaction database for mining association rules:

TID  Items
100  A C D
200  B C E
300  A B C E
400  B E

Which one of the following statements is correct?

a. There are 32 non-empty item-sets that can be generated from the set of items {A, B, C, D, E}.

b. There are 5 non-empty item-sets that can be generated from the set of items {A, B, C, D, E}.

c. There are 31 non-empty item-sets that can be generated from the set of items {A, B, C, D, E}.

d. There are 5 item-sets that can be generated from the set of items {A, B, C, D, E}.
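Counting the non-empty subsets of a 5-item set by brute force gives the same number as the closed form 2^5 - 1:

```python
from itertools import combinations

items = ["A", "B", "C", "D", "E"]
# Enumerate every non-empty subset, of size k = 1 .. 5.
itemsets = [frozenset(c) for k in range(1, len(items) + 1) for c in combinations(items, k)]
print(len(itemsets), 2 ** len(items) - 1)
```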

Question 10

Which of the following statement(s) is(are) correct?

a. In multiple linear regression, dropping predictors that are uncorrelated with the dependent variable may decrease the variance of predictions.

b. In multiple linear regression, using predictors that are actually uncorrelated with the dependent variable may decrease the variance of predictions.

c. Both a. and b.

d. Neither a. nor b.

Question 11

Which of the following statement(s) is(are) correct?

a. When the number of neurons in the hidden layer increases, the chance that the neural network overfits the training data decreases.

b. When the number of neurons in the hidden layer increases, the chance that the neural network overfits the training data increases.

c. When the number of neurons in the hidden layer decreases, the chance that the neural network overfits the training data increases.

d. All of a., b., and c. are correct.

Question 12

Given the following transaction database for mining association rules:

TID  Items
100  A C D
200  B C E
300  A B C E
400  B E

For a minimum support count of 2, which one of the following statements is incorrect?

a. The rules A -> C and C -> A have the same confidence value.

b. The item-set in rule A -> C is a frequent item-set.

c. The rules A -> C and C -> A have the same support value.

d. The item-set in rule C -> A is a frequent item-set.
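With only four transactions, support counts and rule confidences can be checked mechanically. The sketch below uses illustrative helper names and also covers the rules in Questions 54, 64, and 65. The support of X -> Y is the support count of X ∪ Y, so A -> C and C -> A always share support, while their confidences divide by different counts.

```python
# Transaction database from the question: TID -> set of items.
TRANSACTIONS = {
    100: {"A", "C", "D"},
    200: {"B", "C", "E"},
    300: {"A", "B", "C", "E"},
    400: {"B", "E"},
}
MIN_SUPPORT_COUNT = 2

# Items are single letters, so a string such as "BC" doubles as the itemset {B, C}.


def support_count(itemset: str) -> int:
    """Number of transactions containing every item in the itemset."""
    return sum(set(itemset) <= items for items in TRANSACTIONS.values())


def confidence(lhs: str, rhs: str) -> float:
    """conf(lhs -> rhs) = support_count(lhs with rhs) / support_count(lhs)."""
    return support_count(lhs + rhs) / support_count(lhs)


def is_frequent(itemset: str) -> bool:
    return support_count(itemset) >= MIN_SUPPORT_COUNT


print(support_count("AC"), confidence("A", "C"), confidence("C", "A"))
print(confidence("BC", "E"), confidence("C", "BE"), confidence("CE", "B"))
print(is_frequent("ACD"), is_frequent("BCE"))
```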

Question 13

Given a database of customer electronics purchases, 10,000 transactions are analyzed, and the data show that:
6,000 of the customer transactions included computer games (game),
7,500 of them included videos (video), and
4,000 of them included both computer games and videos.
The generated association rule is: game -> video [support = ?%, confidence = ?%]

What is the confidence of the rule? (Enter only the integer part; for example, for 50%, enter 50.)

Question 14

Which of the following statement(s) is(are) correct?

a. When the number of neurons in the hidden layer increases, the chance that the neural network underfits the training data increases.

b. When the number of neurons in the hidden layer decreases, the chance that the neural network underfits the training data decreases.

c. When the number of neurons in the hidden layer increases, the chance that the neural network underfits the training data decreases.

d. All of a., b., and c. are correct.

Question 15

Given a database table containing weather data as follows:

Outlook   Temperature  Humidity  Windy  Play (class)
Sunny     Hot          High      False  No
Sunny     Hot          High      True   No
Overcast  Hot          High      False  Yes
Rainy     Mild         High      False  Yes
Rainy     Cool         Normal    False  Yes
Rainy     Cool         Normal    True   No
Overcast  Cool         Normal    True   Yes
Sunny     Mild         High      False  No
Sunny     Cool         Normal    False  Yes
Rainy     Mild         Normal    False  Yes
Sunny     Mild         Normal    True   Yes
Overcast  Mild         High      True   Yes
Overcast  Hot          Normal    False  Yes
Rainy     Mild         High      True   No

Here, Outlook, Temperature, Humidity, and Windy are the input variables (predictors), and Play is the output variable (response or outcome). Please compute the conditional probability P(Outlook = 'Sunny' | PLAY = 'Yes'). Please keep 3 digits after the decimal point, for example, 0.123.

Question 16

The prediction error for record i is defined as the difference between its actual value and its predicted value: e_i = y_i - ŷ_i.

Select the appropriate acronym or the correct answer from the following:

a. MAPE

b. RMSE

c. MAE or MAD

d. Total SSE

e. Average Error
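The acronyms listed above correspond to standard summaries of the errors e_i = y_i - ŷ_i. This sketch assumes RMSE divides the sum of squared errors by n, and the data are hypothetical. MAPE is undefined whenever an actual value is 0.

```python
import math


def error_metrics(actual, predicted):
    """Common prediction-error summaries, using e_i = y_i - yhat_i."""
    e = [y - yhat for y, yhat in zip(actual, predicted)]
    n = len(e)
    return {
        "Average Error": sum(e) / n,
        "MAE/MAD": sum(abs(ei) for ei in e) / n,
        "MAPE (%)": 100 * sum(abs(ei / y) for ei, y in zip(e, actual)) / n,
        "RMSE": math.sqrt(sum(ei ** 2 for ei in e) / n),
        "Total SSE": sum(ei ** 2 for ei in e),
    }


# Hypothetical actual and predicted values, for illustration only.
print(error_metrics([10, 20, 30, 40], [12, 18, 33, 40]))
```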

Question 17

Which of the following statement(s) is(are) correct?

a. Outliers are the values that lie far away from the bulk of the data.

b. An outlier is a value that lies more than 3 standard deviations away from the mean.

c. An outlier is an invalid data point.

d. Both a. and b.
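The 3-standard-deviation rule of thumb from statement b. can be sketched as follows. The sketch assumes the population standard deviation, and the data are hypothetical.

```python
from statistics import mean, pstdev


def flag_outliers(values, k=3.0):
    """Return the values lying more than k standard deviations from the mean."""
    m, s = mean(values), pstdev(values)
    return [v for v in values if abs(v - m) > k * s]


data = [10, 12, 11, 9, 10, 11, 13, 10, 9, 12, 11, 10, 10, 9, 11, 12, 10, 11, 10, 100]
print(flag_outliers(data))
```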

Question 18

Given a database table containing weather data as follows:

Outlook   Temperature  Humidity  Windy  Play (class)
Sunny     Hot          High      False  No
Sunny     Hot          High      True   No
Overcast  Hot          High      False  Yes
Rainy     Mild         High      False  Yes
Rainy     Cool         Normal    False  Yes
Rainy     Cool         Normal    True   No
Overcast  Cool         Normal    True   Yes
Sunny     Mild         High      False  No
Sunny     Cool         Normal    False  Yes
Rainy     Mild         Normal    False  Yes
Sunny     Mild         Normal    True   Yes
Overcast  Mild         High      True   Yes
Overcast  Hot          Normal    False  Yes
Rainy     Mild         High      True   No

Here, Outlook, Temperature, Humidity, and Windy are the input variables (predictors), and Play is the output variable (response or outcome).
For the given sample X = (Outlook = 'Sunny', Temperature = 'Mild', Humidity = 'High', Windy = 'False'),
please compute the conditional probability P(X | PLAY = 'Yes').
Please keep 3 digits after the decimal point, for example, 0.521.

Question 19

If the probable nature of the clusters is unknown, which cluster distance function is a good choice for clustering the data?

a. Single linkage distance

b. Complete linkage distance

c. Centroid distance

d. Average linkage distance

Question 20

The prediction error for record i is defined as the difference between its actual value and its predicted value: e_i = y_i - ŷ_i. For the given expression, select the appropriate acronym or the correct answer from the following:

a. MAE or MAD

b. RMSE

c. Total SSE

d. MAPE

e. Average Error

Question 21

The difference(s) between the statistical regression models and the neural network model is(are):

a. The neural network model uses hidden layers.

b. The regression models have no input layer.

c. The regression models have no output layer.

d. All of a., b., and c.

Question 22

Given a database of customer electronics purchases, 10,000 transactions are analyzed, and the data show that:
6,000 of the customer transactions included computer games (game),
7,500 of them included videos (video), and
4,000 of them included both computer games and videos.
The generated association rule is: video -> game [support = ?%, confidence = ?%]

What is the confidence of the rule? (Please keep 2 digits after the decimal point, for example, 0.25.)

Question 23

Which of the following statement(s) is(are) correct?

a. Each node in a classification tree corresponds to a dimension (column) of a data table.

b. Each node, with its associated value, in a classification tree is used to partition the data set along its corresponding dimension.

c. Both a. and b.

d. Neither a. nor b.

Question 24

In terms of input variables/predictors and output variable/response, there are four possible combinations:

continuous input variables/predictors - continuous output variable/response
continuous input variables/predictors - categorical output variable/response
categorical input variables/predictors - categorical output variable/response
categorical input variables/predictors - continuous output variable/response

Which of the following data mining methods in XLMiner can be used for any one of the four combinations?

a. Neural network

b. Linear regression

c. Naïve Bayes method

d. Logistic regression

Question 25

Given a database of customer electronics purchases, 10,000 transactions are analyzed, and the data show that:
6,000 of the customer transactions included computer games (game),
7,500 of them included videos (video), and
4,000 of them included both computer games and videos.
The generated association rule is: game -> video [support = ?%, confidence = ?%]
What is the lift of the rule? (Please keep 2 digits after the decimal point, for example, 0.25.)

Question 26

Which of the following statement(s) is(are) correct?

a. Each node with its associated value in a classification tree defines a linear function.

b. A classification tree consists of many linear functions.

c. Both a. and b.

d. Neither a. nor b.

Question 27

Given a database table containing weather data as follows:

Outlook   Temperature  Humidity  Windy  Play (class)
Sunny     Hot          High      False  No
Sunny     Hot          High      True   No
Overcast  Hot          High      False  Yes
Rainy     Mild         High      False  Yes
Rainy     Cool         Normal    False  Yes
Rainy     Cool         Normal    True   No
Overcast  Cool         Normal    True   Yes
Sunny     Mild         High      False  No
Sunny     Cool         Normal    False  Yes
Rainy     Mild         Normal    False  Yes
Sunny     Mild         Normal    True   Yes
Overcast  Mild         High      True   Yes
Overcast  Hot          Normal    False  Yes
Rainy     Mild         High      True   No

Here, Outlook, Temperature, Humidity, and Windy are the input variables (predictors), and Play is the output variable (response or outcome). Please compute the conditional probability P(Outlook = 'Sunny' | PLAY = 'No'). Please keep 3 digits after the decimal point, for example, 0.123.

Question 28

Alternatives to maximizing the accuracy of a classifier or data mining model include:

a. Maximizing sensitivity subject to some minimum level of specificity.

b. Minimizing false positives subject to some maximum level of false negatives.

c. Neither a. nor b.

d. Both a. and b.

Question 29

The difference(s) between the multiple linear regression model and the neural network model is(are):

a. The neural network model uses hidden layers.

b. The neural network model uses an activation function.

c. Both a. and b.

d. Neither a. nor b.

Question 30

The prediction error for record i is defined as the difference between its actual value and its predicted value: e_i = y_i - ŷ_i.
Select the appropriate acronym or the correct answer from the following:

a. TOTAL SSE

b. RMSE

c. MAE or MAD

d. MAPE

e. Average Error

Question 31

Given a database table containing weather data as follows:

Outlook   Temperature  Humidity  Windy  Play (class)
Sunny     Hot          High      False  No
Sunny     Hot          High      True   No
Overcast  Hot          High      False  Yes
Rainy     Mild         High      False  Yes
Rainy     Cool         Normal    False  Yes
Rainy     Cool         Normal    True   No
Overcast  Cool         Normal    True   Yes
Sunny     Mild         High      False  No
Sunny     Cool         Normal    False  Yes
Rainy     Mild         Normal    False  Yes
Sunny     Mild         Normal    True   Yes
Overcast  Mild         High      True   Yes
Overcast  Hot          Normal    False  Yes
Rainy     Mild         High      True   No

Here, Outlook, Temperature, Humidity, and Windy are the input variables (predictors), and Play is the output variable (response or outcome). Please compute the conditional probability P(Windy = 'False' | PLAY = 'No').

Question 32

Given a database of customer electronics purchases, 10,000 transactions are analyzed, and the data show that:
6,000 of the customer transactions included computer games (game),
7,500 of them included videos (video), and
4,000 of them included both computer games and videos.
The generated association rule is: game -> video [support = ?%, confidence = ?%]

What is the support of the rule?

Question 33

For the given item-set {A, B, C, D}, how many valid association rules can be generated?

a. 15

b. 11

c. 50

d. 4
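How many rules count as "valid" depends on which itemsets may be split into rules:

- If every sub-itemset of {A, B, C, D} with at least two items may be split, the total equals the closed form 3^d - 2^(d+1) + 1 with d = 4.
- If only the full 4-itemset is split, the total is 2^4 - 2.

Both counts can be enumerated as below. The function name is illustrative.

```python
from itertools import combinations


def count_rules(items):
    """Count rules X -> Y where X and Y are non-empty, disjoint subsets of items."""
    total = 0
    for k in range(2, len(items) + 1):   # a rule needs an itemset of at least 2 items
        for _ in combinations(items, k):
            total += 2 ** k - 2          # any non-empty proper subset can be the LHS
    return total


print(count_rules("ABCD"))  # all sub-itemsets of {A, B, C, D}
print(2 ** 4 - 2)           # splitting only the full itemset {A, B, C, D}
```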

Question 34

Which of the following statement(s) is(are) correct?

a. Multiple linear regression can be used to predict the value of a continuous dependent variable for a new observation.

b. Logistic regression can be used to classify a new observation into one of the specific classes.

c. Both a. and b.

d. Neither a. nor b.

Question 35

Which of the following data mining methods in XLMiner is especially suited to (and limited to) categorical predictors and a categorical outcome variable?

a. Neural Network

b. K-Nearest Neighbor method.

c. Regression

d. Naïve Bayes method.

Question 36

For cluster analysis, which of the following statement(s) is(are) correct?

a. The K-means clustering method is not a centroid-based approach.

b. The K-means clustering method is a centroid-based approach.

c. The K-means clustering method is used to organize the clusters into a hierarchy.

d. The K-means clustering method is a hierarchical clustering method.
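The centroid-based loop behind k-means can be sketched minimally on 1-D data with user-chosen starting centroids. This is not XLMiner's implementation. Each point is assigned to its nearest centroid, and each centroid then moves to the mean of its assigned points. The loop repeats until the centroids stop changing.

```python
def kmeans_1d(points, centroids, n_iter=100):
    """Minimal k-means for 1-D data, starting from the given centroids."""
    centroids = list(centroids)
    for _ in range(n_iter):
        # Assignment step: each point joins the cluster of its nearest centroid.
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)), key=lambda j: abs(p - centroids[j]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its points (keep it if empty).
        updated = [sum(c) / len(c) if c else centroids[j] for j, c in enumerate(clusters)]
        if updated == centroids:
            break
        centroids = updated
    return centroids, clusters


print(kmeans_1d([1, 2, 3, 10, 11, 12], centroids=[1.0, 10.0]))
```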

Question 37

One way to handle missing values during data preprocessing for data mining is:

a. to drop the columns with missing values.

b. to replace the missing values with imputed value.

c. Both a. and b.

d. Neither a. nor b.
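Option b. can be sketched minimally by replacing missing entries with the median of the observed values. The sketch assumes missing entries are represented as None. Mean imputation and model-based imputation are common alternatives.

```python
from statistics import median


def impute_median(column):
    """Replace missing entries (None) with the median of the observed values."""
    observed = [v for v in column if v is not None]
    fill = median(observed)
    return [fill if v is None else v for v in column]


print(impute_median([4, None, 7, 1, None, 10]))
```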

Question 38

Which of the following statements is correct in association rule mining or affinity analysis?

a. A strong rule with low support leads to its high confidence.

b. A strong rule with high support does not necessarily lead to its high confidence.

c. A strong rule with high support always leads to its high confidence.

d. A strong rule with low support leads to its low confidence.

Question 39

Given a database table containing weather data as follows:

Outlook   Temperature  Humidity  Windy  Play (class)
Sunny     Hot          High      False  No
Sunny     Hot          High      True   No
Overcast  Hot          High      False  Yes
Rainy     Mild         High      False  Yes
Rainy     Cool         Normal    False  Yes
Rainy     Cool         Normal    True   No
Overcast  Cool         Normal    True   Yes
Sunny     Mild         High      False  No
Sunny     Cool         Normal    False  Yes
Rainy     Mild         Normal    False  Yes
Sunny     Mild         Normal    True   Yes
Overcast  Mild         High      True   Yes
Overcast  Hot          Normal    False  Yes
Rainy     Mild         High      True   No

Here, Outlook, Temperature, Humidity, and Windy are the input variables (predictors), and Play is the output variable (response or outcome). Please compute the prior probability P(PLAY = 'Yes'). Please keep 3 digits after the decimal point, for example, 0.521.

Question 40

The following question relates to distance measurement between two clusters. Match each expression with the correct corresponding term.

where d(p, p') is the distance between two data points such that p belongs to cluster C_i and p' belongs to cluster C_j, and D(C_i, C_j) is the distance between cluster C_i and cluster C_j.

where d(p, p') is the distance between two data points such that p belongs to cluster C_i and p' belongs to cluster C_j, and D(C_i, C_j) is the distance between cluster C_i and cluster C_j.

where m_i is the center (centroid) of cluster C_i, m_j is the center (centroid) of cluster C_j, d(m_i, m_j) is the distance between m_i and m_j, and D(C_i, C_j) is the distance between cluster C_i and cluster C_j.

where d(p, p') is the distance between two data points such that p belongs to cluster C_i and p' belongs to cluster C_j, D(C_i, C_j) is the distance between cluster C_i and cluster C_j, n_i is the number of data points in cluster C_i, and n_j is the number of data points in cluster C_j.

A. Centroid Distance
B. Single Linkage Distance
C. Average Distance
D. Complete Linkage Distance
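The four inter-cluster distances can be sketched as follows. The sketch assumes Euclidean distance d(p, p') between points and writes C_i and C_j for the two clusters, m_i for a centroid, and n_i for a cluster size. This notation and the example clusters are assumptions.

```python
import math
from itertools import product


def single_linkage(ci, cj):
    """Minimum distance over all pairs (p, p') with p in ci and p' in cj."""
    return min(math.dist(p, q) for p, q in product(ci, cj))


def complete_linkage(ci, cj):
    """Maximum distance over all such pairs."""
    return max(math.dist(p, q) for p, q in product(ci, cj))


def average_linkage(ci, cj):
    """Mean of the n_i * n_j pairwise distances."""
    return sum(math.dist(p, q) for p, q in product(ci, cj)) / (len(ci) * len(cj))


def centroid_distance(ci, cj):
    """Distance between the two cluster centroids m_i and m_j."""
    mi = [sum(coord) / len(ci) for coord in zip(*ci)]
    mj = [sum(coord) / len(cj) for coord in zip(*cj)]
    return math.dist(mi, mj)


# Two hypothetical 2-D clusters.
C1 = [(0, 0), (0, 2)]
C2 = [(3, 0), (5, 0)]
for f in (single_linkage, complete_linkage, average_linkage, centroid_distance):
    print(f.__name__, round(f(C1, C2), 3))
```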

Question 41

A dataset has 1000 records and one variable, with 5% of the values missing, spread randomly throughout the records in that variable's column. An analyst decides to remove records that have missing values. About how many records would you expect to be removed?

Question 42

Which of the following is(are) used to measure the impurity of the data in the process of constructing a CART (classification and regression tree)?

a. Gini Index

b. Entropy

c. Both a. and b.

d. Neither a. nor b.
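Both impurity measures are functions of the class proportions p_k in a node. The sketch below computes them for the 9 Yes / 5 No class split of the weather table used elsewhere in this set. The helper names are illustrative.

```python
import math
from collections import Counter


def gini(labels) -> float:
    """Gini index: 1 - sum over classes of p_k^2."""
    n = len(labels)
    return 1 - sum((c / n) ** 2 for c in Counter(labels).values())


def entropy(labels) -> float:
    """Entropy in bits: -sum over classes of p_k * log2(p_k)."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


play = ["Yes"] * 9 + ["No"] * 5  # class distribution of the 14 weather records
print(round(gini(play), 3), round(entropy(play), 3))
```

A pure node, in which all records share one class, scores 0 on both measures.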

Question 43

Given a database table containing weather data as follows:

Outlook   Temperature  Humidity  Windy  Play (class)
Sunny     Hot          High      False  No
Sunny     Hot          High      True   No
Overcast  Hot          High      False  Yes
Rainy     Mild         High      False  Yes
Rainy     Cool         Normal    False  Yes
Rainy     Cool         Normal    True   No
Overcast  Cool         Normal    True   Yes
Sunny     Mild         High      False  No
Sunny     Cool         Normal    False  Yes
Rainy     Mild         Normal    False  Yes
Sunny     Mild         Normal    True   Yes
Overcast  Mild         High      True   Yes
Overcast  Hot          Normal    False  Yes
Rainy     Mild         High      True   No

Here, Outlook, Temperature, Humidity, and Windy are the input variables (predictors), and Play is the output variable (response or outcome).
For the given sample X = (Outlook = 'Sunny', Temperature = 'Mild', Humidity = 'High', Windy = 'False'),
please compute P(X | PLAY = 'No') * P(PLAY = 'No'), where * denotes multiplication.
Please keep 3 digits after the decimal point, for example, 0.521.

Question 44

A dataset has 1000 records and 50 variables, with 5% of the values missing, spread randomly throughout the records and variables. An analyst decides to remove records that have missing values. About how many records would you expect to be removed?

Question 45

Which of the following statement(s) is(are) correct in association rule mining or affinity analysis?

a. If an itemset is frequent, then its subsets are also frequent.

b. If an itemset is frequent, then its supersets are also frequent.

c. If an itemset is infrequent, then its supersets are also frequent.

d. If an itemset is frequent, then its subsets are also infrequent.
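The sketch below uses the four-transaction database from Questions 9 and 12 with a minimum support count of 2. It lists the frequent itemsets by brute force and checks the downward-closure (Apriori) property on each of them.

```python
from itertools import combinations

TRANSACTIONS = [{"A", "C", "D"}, {"B", "C", "E"}, {"A", "B", "C", "E"}, {"B", "E"}]
MIN_COUNT = 2


def count(itemset) -> int:
    """Support count: transactions containing every item of itemset."""
    return sum(set(itemset) <= t for t in TRANSACTIONS)


items = sorted(set().union(*TRANSACTIONS))
frequent = [c for k in range(1, len(items) + 1)
            for c in combinations(items, k) if count(c) >= MIN_COUNT]

# Downward closure: every non-empty proper subset of a frequent itemset is frequent.
for itemset in frequent:
    for k in range(1, len(itemset)):
        for sub in combinations(itemset, k):
            assert count(sub) >= MIN_COUNT, (itemset, sub)

print(["".join(c) for c in frequent])
```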

Question 46

The prediction error for record i is defined as the difference between its actual value and its predicted value: e_i = y_i - ŷ_i. For the given expression, select the appropriate acronym or the correct answer from the following:

a. RMSE

b. Total SSE

c. MAE or MAD

d. Average error

e. MAPE

Question 47

Given a database of customer electronics purchases, 10,000 transactions are analyzed, and the data show that:
6,000 of the customer transactions included computer games (game),
7,500 of them included videos (video), and
4,000 of them included both computer games and videos.
The generated association rule is: game -> video [support = ?%, confidence = ?%]
Which one of the following statements is correct?

a. It shows that game and video are not positively associated or correlated.

b. It shows that game and video are independent of each other.

c. The customers who buy computer games also buy videos.

d. It shows that game and video are not negatively associated or correlated.

Question 48

Given a database table containing weather data as follows:

Outlook   Temperature  Humidity  Windy  Play (class)
Sunny     Hot          High      False  No
Sunny     Hot          High      True   No
Overcast  Hot          High      False  Yes
Rainy     Mild         High      False  Yes
Rainy     Cool         Normal    False  Yes
Rainy     Cool         Normal    True   No
Overcast  Cool         Normal    True   Yes
Sunny     Mild         High      False  No
Sunny     Cool         Normal    False  Yes
Rainy     Mild         Normal    False  Yes
Sunny     Mild         Normal    True   Yes
Overcast  Mild         High      True   Yes
Overcast  Hot          Normal    False  Yes
Rainy     Mild         High      True   No

Here, Outlook, Temperature, Humidity, and Windy are the input variables (predictors), and Play is the output variable (response or outcome).
For the given sample X = (Outlook = 'Sunny', Temperature = 'Mild', Humidity = 'High', Windy = 'False'),
please compute P(X | PLAY = 'Yes') * P(PLAY = 'Yes'), where * denotes multiplication.
Please keep 3 digits after the decimal point, for example, 0.521.

Question 49

Given a database table containing weather data as follows:

Outlook   Temperature  Humidity  Windy  Play (class)
Sunny     Hot          High      False  No
Sunny     Hot          High      True   No
Overcast  Hot          High      False  Yes
Rainy     Mild         High      False  Yes
Rainy     Cool         Normal    False  Yes
Rainy     Cool         Normal    True   No
Overcast  Cool         Normal    True   Yes
Sunny     Mild         High      False  No
Sunny     Cool         Normal    False  Yes
Rainy     Mild         Normal    False  Yes
Sunny     Mild         Normal    True   Yes
Overcast  Mild         High      True   Yes
Overcast  Hot          Normal    False  Yes
Rainy     Mild         High      True   No

Here, Outlook, Temperature, Humidity, and Windy are the input variables (predictors), and Play is the output variable (response or outcome). Please compute the conditional probability P(Temperature = 'Mild' | PLAY = 'No').

Question 50

The prediction error for record i is defined as the difference between its actual value and its predicted value: e_i = y_i - ŷ_i. For the given expression, select the appropriate acronym or the correct answer from the following:

a. RMSE

b. Total SSE

c. MAE or MAD

d. Average error

e. MAPE

Question 51

Given a database table containing weather data as follows:

Outlook   Temperature  Humidity  Windy  Play (class)
Sunny     Hot          High      False  No
Sunny     Hot          High      True   No
Overcast  Hot          High      False  Yes
Rainy     Mild         High      False  Yes
Rainy     Cool         Normal    False  Yes
Rainy     Cool         Normal    True   No
Overcast  Cool         Normal    True   Yes
Sunny     Mild         High      False  No
Sunny     Cool         Normal    False  Yes
Rainy     Mild         Normal    False  Yes
Sunny     Mild         Normal    True   Yes
Overcast  Mild         High      True   Yes
Overcast  Hot          Normal    False  Yes
Rainy     Mild         High      True   No

Here, Outlook, Temperature, Humidity, and Windy are the input variables (predictors), and Play is the output variable (response or outcome).

For the given sample X = (Outlook = 'Sunny', Temperature = 'Mild', Humidity = 'High', Windy = 'False'),
please compute the conditional probability P(X | PLAY = 'No').
Please keep 3 digits after the decimal point, for example, 0.521.

Question 52

Given a database table containing weather data as follows:

Outlook   Temperature  Humidity  Windy  Play (class)
Sunny     Hot          High      False  No
Sunny     Hot          High      True   No
Overcast  Hot          High      False  Yes
Rainy     Mild         High      False  Yes
Rainy     Cool         Normal    False  Yes
Rainy     Cool         Normal    True   No
Overcast  Cool         Normal    True   Yes
Sunny     Mild         High      False  No
Sunny     Cool         Normal    False  Yes
Rainy     Mild         Normal    False  Yes
Sunny     Mild         Normal    True   Yes
Overcast  Mild         High      True   Yes
Overcast  Hot          Normal    False  Yes
Rainy     Mild         High      True   No

Here, Outlook, Temperature, Humidity, and Windy are the input variables (predictors), and Play is the output variable (response or outcome). Please compute the prior probability P(PLAY = 'No'). Please keep 3 digits after the decimal point.

Question 53

The overfitting problem can be detected:

a. When performance on both the training data set and the validation data set improves.

b. When performance on the training data set improves and performance on the validation data set deteriorates.

c. When performance on the training data set deteriorates and performance on the validation data set improves.

d. When performance on both the training data set and the validation data set deteriorates.

Question 54

Given the following transaction database for mining association rules:

TID  Items
100  A C D
200  B C E
300  A B C E
400  B E

For a minimum support count of 2, which one of the following statements is incorrect?

a. The rules B,C -> E and C -> B,E have the same confidence value.

b. The rules B,C -> E and C,E -> B have the same support value.

c. The rules B,C -> E and C,E -> B have the same confidence value.

d. The rules B,C -> E and B,E -> C have the same support value.

Question 55

In terms of the number of variables involved in training a supervised learning model, which of the following statements is correct?

a. The more variables we include in the trained model, the greater the risk of overfitting.

b. The number of variables included in the trained model has no impact on the risk of overfitting.

c. The fewer variables we include in the trained model, the greater the risk of overfitting.

d. The more variables we include in the trained model, the lower the risk of overfitting.

Question 56

To achieve a given degree of reliability with a given data set and a given data mining model, what are good rules of thumb for the ratio between the number of input variables (predictor variables) and the number of records?

a. to have 10 records for each input variable (predictor variable).

b. to have at least 6 × m × p records, m is the number of outcome classes, and p is the number of input variables.

c. Either a. or b.

d. Neither a. nor b.

Question 57

Which of the following statement(s) is(are) correct?

a. The classification tree is used to generate a class label for a categorical output.

b. The regression tree is used to generate the numerical output for prediction or estimation.

c. Both a. and b.

d. Neither a. nor b.

Question 58

Which of the following statement(s) is(are) correct?

a. The sensitivity of a classifier measures the true positive rate.

b. The specificity of a classifier measures the false positive rate.

c. Both a. and b.

d. Neither a. nor b.

Question 59

Given a database table containing weather data as follows:

Outlook   Temperature  Humidity  Windy  Play (class)
Sunny     Hot          High      False  No
Sunny     Hot          High      True   No
Overcast  Hot          High      False  Yes
Rainy     Mild         High      False  Yes
Rainy     Cool         Normal    False  Yes
Rainy     Cool         Normal    True   No
Overcast  Cool         Normal    True   Yes
Sunny     Mild         High      False  No
Sunny     Cool         Normal    False  Yes
Rainy     Mild         Normal    False  Yes
Sunny     Mild         Normal    True   Yes
Overcast  Mild         High      True   Yes
Overcast  Hot          Normal    False  Yes
Rainy     Mild         High      True   No

Here, Outlook, Temperature, Humidity, and Windy are the input variables (predictors), and Play is the output variable (response or outcome). Please compute the conditional probability P(Humidity = 'High' | PLAY = 'No').

Question 60

The purpose of normalizing data values to a unit scale is:

a. To reduce the impact or dominance of data with large-scale values.

b. To reduce the bias caused by data with large-scale values.

c. Both a. and b.

d. Neither a. nor b.
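Two common ways to put variables on a comparable scale are min-max scaling to [0, 1] and z-score standardization. Here is a sketch using hypothetical income and age columns:

```python
from statistics import mean, pstdev


def min_max(values):
    """Rescale to [0, 1]: (x - min) / (max - min)."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]


def z_score(values):
    """Standardize to mean 0 and (population) standard deviation 1."""
    m, s = mean(values), pstdev(values)
    return [(v - m) / s for v in values]


income = [30_000, 45_000, 60_000, 90_000]  # hypothetical large-scale variable
age = [25, 35, 45, 55]                     # hypothetical small-scale variable
print(min_max(income), min_max(age))
print([round(z, 3) for z in z_score(age)])
```

After either transform, a variable measured in tens of thousands no longer dominates distance computations over one measured in tens.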

Question 61

The difference(s) between the logistic regression model and the neural network model is(are):

a. The neural network model uses hidden layers.

b. The neural network model uses an activation function.

c. Both a. and b.

d. Neither a. nor b.

Question 62

The purpose(s) of dimension reduction is(are):

a. reducing the effects of the curse of dimensionality.

b. eliminating the input variables/predictors that are uncorrelated with the output variable/response.

c. reducing the possibility of overfitting.

d. All of a., b., and c.

Question 63

The number of item-sets generated from the set of items {A, B, C, D, E, F} that can be used to formulate association rules is:

a. 6

b. 63

c. 57

d. 602
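This count assumes an item-set must contain at least two items to yield a rule, with one item on each side. Under that assumption, the count excludes the empty set and the six singletons:

```python
from math import comb

n_items = 6  # {A, B, C, D, E, F}
# A rule X -> Y needs at least one item on each side, so only itemsets of size 2..6 qualify.
usable = sum(comb(n_items, k) for k in range(2, n_items + 1))
print(usable, 2 ** n_items - 1 - n_items)
```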

Question 64

Given the following transaction database for mining association rules:

TID  Items
100  A C D
200  B C E
300  A B C E
400  B E

For a minimum support count of 2, which one of the following item-sets is not a frequent item-set?

a. {B, E}

b. {A, C, D}

c. {B, C, E}

d. {C, E}

Question 65

Given the following transaction database for mining association rules:

TID  Items
100  A C D
200  B C E
300  A B C E
400  B E

For a minimum support count of 2, which one of the following statements is incorrect?

a. The rules B,E -> C and C -> B,E have the same confidence value.

b. The rules B,C -> E and E -> B,C have the same confidence value.

c. The rules E -> B,C and C -> B,E have the same confidence value.

d. The rules B -> C,E and E -> B,C have the same confidence value.

Question 66

Given a database table containing weather data as follows:

Outlook   Temperature  Humidity  Windy  Play (class)
Sunny     Hot          High      False  No
Sunny     Hot          High      True   No
Overcast  Hot          High      False  Yes
Rainy     Mild         High      False  Yes
Rainy     Cool         Normal    False  Yes
Rainy     Cool         Normal    True   No
Overcast  Cool         Normal    True   Yes
Sunny     Mild         High      False  No
Sunny     Cool         Normal    False  Yes
Rainy     Mild         Normal    False  Yes
Sunny     Mild         Normal    True   Yes
Overcast  Mild         High      True   Yes
Overcast  Hot          Normal    False  Yes
Rainy     Mild         High      True   No

Here, Outlook, Temperature, Humidity, and Windy are the input variables (predictors), and Play is the output variable (response or outcome).

Please compute the conditional probability P(Temperature = 'Mild' | PLAY = 'Yes'). Please keep 3 digits after the decimal point, for example, 0.123.

Question 67

Given a database table containing weather data as follows:

Outlook   Temperature  Humidity  Windy  Play (class)
Sunny     Hot          High      False  No
Sunny     Hot          High      True   No
Overcast  Hot          High      False  Yes
Rainy     Mild         High      False  Yes
Rainy     Cool         Normal    False  Yes
Rainy     Cool         Normal    True   No
Overcast  Cool         Normal    True   Yes
Sunny     Mild         High      False  No
Sunny     Cool         Normal    False  Yes
Rainy     Mild         Normal    False  Yes
Sunny     Mild         Normal    True   Yes
Overcast  Mild         High      True   Yes
Overcast  Hot          Normal    False  Yes
Rainy     Mild         High      True   No

Here, Outlook, Temperature, Humidity, and Windy are the input variables (predictors), and Play is the output variable (response or outcome). Please compute the conditional probability P(Humidity = 'High' | PLAY = 'Yes'). Please keep 3 digits after the decimal point, for example, 0.123.

Question 68

In the neural network model, what kind of parameter(s) is(are) used to avoid overfitting?

a. The learning rate

b. The momentum

c. Both a. and b.

d. Neither a. nor b.

Question 69

In the neural network model, what kind of parameter(s) is(are) used to avoid getting stuck in a local optimum?

a. The learning rate

b. The momentum

c. Both a. and b.

d. Neither a. nor b.
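For reference, this sketch shows how the learning rate and momentum usually enter a weight update. It uses a one-parameter toy objective rather than a neural network, and the function name and default values are illustrative assumptions.

```python
def minimize(grad, w0, learning_rate=0.1, momentum=0.9, steps=500):
    """Gradient descent where each update adds a fraction of the previous update."""
    w, velocity = w0, 0.0
    for _ in range(steps):
        # learning_rate scales the current gradient step;
        # momentum carries over part of the previous step.
        velocity = momentum * velocity - learning_rate * grad(w)
        w += velocity
    return w


# Toy objective f(w) = (w - 3)^2 with gradient 2(w - 3); its minimum is at w = 3.
print(minimize(lambda w: 2 * (w - 3), w0=0.0))
```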

Question 70

A good classifier will show a high lift in the lift chart:

a. when the classifier acts on a lot of cases.

b. when the classifier acts on only a few cases.

c. Both a. and b.

d. Neither a. nor b.
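Lift compares the rate of the class of interest among the top-ranked cases with the overall rate. The sketch below uses hypothetical scores and labels. The function name and the rounding of the cutoff are assumptions.

```python
def cumulative_lift(scores, labels, fraction):
    """Lift for the top `fraction` of cases ranked by predicted score:
    (rate of class 1 among selected cases) / (overall rate of class 1)."""
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    k = max(1, round(fraction * len(ranked)))
    top_rate = sum(label for _, label in ranked[:k]) / k
    base_rate = sum(labels) / len(labels)
    return top_rate / base_rate


# Hypothetical predicted probabilities of class 1 and the true labels.
scores = [0.95, 0.90, 0.85, 0.80, 0.60, 0.50, 0.40, 0.30, 0.20, 0.10]
labels = [1, 1, 1, 0, 1, 0, 0, 0, 0, 0]
for fraction in (0.1, 0.3, 0.5, 1.0):
    print(fraction, cumulative_lift(scores, labels, fraction))
```

Here lift is 2.5 for the top 10% and 30% of cases and falls to 1.0 when all cases are selected, because selecting everything reproduces the base rate.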
