Run the Naïve Bayes classifier in Weka on the data


Questions -

Q1. The following dataset is created based on the fraud detection data discussed in class. An extra record (the last one) is added to the dataset. Also added is another predictor, AccountAge, which has three categories (<10, 10~30, and >30), referring to the number of days since the account was created. Using the Naïve Bayes method, calculate by hand the probabilities of the last record being truthful or fraudulent. Does Naïve Bayes correctly classify this new record? Use all 11 records in your calculation. Show calculation steps similar to those in the Naïve Bayes lecture notes.

Transaction Time | Transaction Amount | Account Age | Class
---------------- | ------------------ | ----------- | ----------
night            | small              | >30         | truthful
day              | small              | 10~30       | truthful
day              | large              | <10         | truthful
day              | large              | >30         | truthful
day              | small              | <10         | truthful
day              | small              | >30         | truthful
night            | small              | <10         | fraudulent
night            | large              | 10~30       | fraudulent
day              | large              | >30         | fraudulent
night            | large              | 10~30       | fraudulent
day              | small              | 10~30       | fraudulent

Q2. Download the data file CongressVote.arff. Open it with Notepad or WordPad and read the information about the data. Our task is to classify each record (i.e., a House member) as either a Democrat or a Republican based on his/her voting record. Note that this dataset has many missing values, marked with '?'.

a. Run the Naïve Bayes classifier in Weka on the data, using the default parameters. What is the 10-fold cross-validation error rate? Show the output screen with the error rate and confusion matrix.

b. Run the k-nearest neighbor classifier in Weka on the data, using the default parameters. What is the 10-fold cross-validation error rate when k = 5? With all attributes categorical, how can the distances between records be measured? Explain using the following three records (records 27, 28 and 29 of the dataset). Which two of the three records are closest to each other? Why?

y,n,y,n,n,n,y,y,y,n,y,n,n,n,y,y,democrat

y,y,y,n,n,n,y,y,y,n,y,n,n,n,y,y,democrat

y,n,n,y,y,n,y,y,y,n,n,y,y,y,n,y,republican
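One common convention for categorical attributes (and essentially what Weka's IBk does for nominal attributes) is to count a distance of 0 when two values match and 1 when they differ, i.e., the Hamming distance. A short sketch applying this to the three records above:

```python
# Records 27-29, with the class label dropped; each position is one vote.
r27 = "y,n,y,n,n,n,y,y,y,n,y,n,n,n,y,y".split(",")
r28 = "y,y,y,n,n,n,y,y,y,n,y,n,n,n,y,y".split(",")
r29 = "y,n,n,y,y,n,y,y,y,n,n,y,y,y,n,y".split(",")

def hamming(a, b):
    """Count the attribute positions where two records differ."""
    return sum(x != y for x, y in zip(a, b))

print(hamming(r27, r28))  # -> 1: the two Democrats differ on a single vote
print(hamming(r27, r29))  # -> 8
print(hamming(r28, r29))  # -> 9
```

Records 27 and 28 disagree on only one vote, so under this measure they are by far the closest pair, which matches the intuition that two members of the same party tend to vote alike.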

Q3. Download the BostonHousing2.xls file and read the data description. The dataset in the FullData sheet is taken from the BostonHousing.xls file used in Assignment 1. The target attribute is CATMEDV, which is a binary attribute converted from MEDV (which was removed).

a. Consider the data in the SmallData sheet, which includes the first 10 records of the full data and a subset of the original predictors. Use Excel calculations to classify record 6 (row 7, highlighted), using 1-NN and 3-NN respectively, based on the other 9 records. Show your results in Excel in a format similar to the screenshot on page 2 of the Nearest Neighbors lecture notes. Do 1-NN and 3-NN classify the record correctly?

b. Now, work on the FullData sheet. Within Excel, save the FullData sheet as a CSV file. Run k-NN in Weka on the CSV data file using the default parameters (10-fold cross-validation, k = 1). Show the output screen that displays the 10-fold cross-validation error rate and the related confusion matrix.

c. Run the C4.5 (J48) decision tree algorithm in Weka on the CSV data file created for Part (b) above. Show the output screen that displays the 10-fold cross-validation error rate and the related confusion matrix.

d. Which technique do you believe is better, k-NN or decision trees? Why? Please consider factors other than the error rates, which are about the same for the two techniques. (This is an open-ended question. It is more important to justify your choice than the choice itself.)

e. Now return to the small dataset of 10 records. Save the data as a CSV file. Write and run R commands to classify record 6 (row 7), using 1-NN and 3-NN respectively, based on the other 9 records. Show the R commands and results (similar to those in the Nearest Neighbors lecture notes for the Admission example).
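The assignment asks for R (the class package's knn() function is the usual route, as in the lecture notes), but the procedure itself is simple enough to sketch in Python. The predictor values below are made up, since the SmallData sheet is not reproduced here; only the leave-record-6-out structure of the calculation is the point:

```python
import math
from collections import Counter

# Hypothetical stand-in for the SmallData sheet: two numeric predictors
# plus the binary CATMEDV label. Replace with the actual sheet values.
data = [
    (0.20, 6.00, 1),
    (0.30, 6.10, 1),
    (5.00, 5.00, 0),
    (5.20, 4.90, 0),
    (0.25, 6.05, 1),
    (0.28, 6.02, 1),   # record 6 (row 7): the record to classify
    (5.10, 5.10, 0),
    (4.90, 5.20, 0),
    (0.22, 5.95, 1),
    (5.30, 4.80, 0),
]

def knn_classify(query, train, k):
    """Majority vote among the k training records nearest to the query."""
    nearest = sorted(train, key=lambda r: math.dist(query, r[:-1]))[:k]
    votes = Counter(r[-1] for r in nearest)
    return votes.most_common(1)[0][0]

query = data[5][:-1]           # record 6's predictors
train = data[:5] + data[6:]    # the other 9 records
print(knn_classify(query, train, 1))
print(knn_classify(query, train, 3))
```

Note that on the real data the predictors are on very different scales, so they would normally be standardized before computing distances; the lecture notes' Admission example follows the same pattern.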

Attachment: Assignment Files.rar
