Use cost matrix to compute the cost for each model


Problem 1: Pete, owner of Pistol Pete's Diamond Emporium, is investing in a diamond classification system due to his deteriorating eyesight. Pete buys and sells diamonds of varying quality: Low ($1,000-$3,000), Medium ($4,000-$7,000), and High ($8,000-$10,000). It is very important to Pete that his classifier classifies his diamonds correctly, not only so that his business stays profitable but also so that his customers continue to trust him as a business owner.

Using the possible cost matrix values given below, fill out the cost matrix that most accurately reflects Pete's needs for his diamond classifier model. After completing the cost matrix, justify your proposed cost matrix.

Possible cost matrix values: -1, -1, 0, 20, 20, 20, 20, 100, 100

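Cost matrix (rows: actual class; columns: predicted class)

                 Predicted High   Predicted Medium   Predicted Low
Actual High      ____             ____               ____
Actual Medium    ____             ____               ____
Actual Low       ____             ____               ____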

Problem 2: Weka recently added the fictitious Super Happy Terrific Classifier (SHTC) algorithm to its suite of available classifiers and you would like to use it in your analysis. Upon reading the SHTC documentation, you realize that it only accepts discrete attributes as input. However, many of the attributes in your data set are continuous. Can you still use the SHTC algorithm in your analysis? If yes, explain how. If no, explain why not. 

Problem 3: You have decided to use J48 as a classifier in Weka for your data set. After your analysis, you have found that the accuracy of J48 for your data set is greater than that of ZeroR, but less than the accuracy of OneR. Should you continue to use J48 as a classifier for your data set? Why or why not?

Problem 4: You have performed an unsupervised k-means clustering on a data set with two attributes and the results indicate a k of 2. Later, you determine the class values for each data instance (there are four class values) and a supervised clustering results in a k of 4. Provide a possible explanation for why the two clustering methods disagree on the value of k, and draw a sketch of the two clusterings to go along with your explanation.

Problem 5: You are using a 3-nearest neighbor classifier with Euclidean distance as the metric. Determine the class value of the data point Q (7, 2, 6) using the known data points and their associated class values, listed below. Recall that the general form of the Euclidean distance is

$d(p, q) = \sqrt{\sum_i (p_i - q_i)^2}$

P1 (-4, 9, 3), class value 1

P2 (8, -2, 1), class value 1

P3 (6, 1, 5), class value 0

P4 (10, 8, 4), class value 0

P5 (-1, 0, -1), class value 1
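
The procedure described in Problem 5 can be sketched in Python as a way to check a hand calculation. This is only a minimal illustration: the helper names `euclidean` and `knn_classify` are hypothetical and not part of Weka or the assignment, and ties between equally distant points are not handled specially.

```python
import math
from collections import Counter

def euclidean(p, q):
    """Euclidean distance between two points of equal dimension."""
    return math.sqrt(sum((pi - qi) ** 2 for pi, qi in zip(p, q)))

def knn_classify(query, labeled_points, k=3):
    """Majority vote among the k labeled points closest to `query`."""
    nearest = sorted(labeled_points, key=lambda item: euclidean(item[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

# Known points from the problem statement: (coordinates, class value).
points = [
    ((-4, 9, 3), 1),
    ((8, -2, 1), 1),
    ((6, 1, 5), 0),
    ((10, 8, 4), 0),
    ((-1, 0, -1), 1),
]
Q = (7, 2, 6)

# Print each distance so the ranking can be compared with a manual calculation.
for coords, label in points:
    print(coords, "class", label, "distance", round(euclidean(coords, Q), 2))
print("predicted class:", knn_classify(Q, points, k=3))
```

Sorting all points by distance is fine for five instances; larger data sets would normally use a spatial index or a library implementation instead.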

Problem 6: Run the Nearest Neighbor classifier with a k-value of 7 and a Support Vector Machine with default values, each using 10-fold cross-validation, on the diabetes data set (diabetes.arff in Assignment 3 on myCourses) in Weka. Fill in the confusion matrices for the models in the tables below and use the cost matrix to compute the cost for each model. Based upon the cost, which model should be selected and why?

Nearest Neighbor (k=7) Confusion Matrix

                   Tested Negative   Tested Positive
Tested Negative    ____              ____
Tested Positive    ____              ____

Support Vector Machine Confusion Matrix

                   Tested Negative   Tested Positive
Tested Negative    ____              ____
Tested Positive    ____              ____

Cost Matrix

                   Tested Negative   Tested Positive
Tested Negative    0                 50
Tested Positive    100               -1
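
Computing the cost of a model from its confusion matrix amounts to multiplying each cell count by the cost in the matching cell and summing the results. The Python sketch below illustrates that arithmetic. It assumes the confusion matrix and the cost matrix use the same row and column orientation, the function name `total_cost` is hypothetical, and the confusion counts are made-up placeholders, not Weka output for the diabetes data set.

```python
def total_cost(confusion, cost):
    """Sum of count * cost over matching cells of two same-shaped matrices."""
    return sum(
        n * c
        for conf_row, cost_row in zip(confusion, cost)
        for n, c in zip(conf_row, cost_row)
    )

# Cost matrix from the assignment (order: Tested Negative, Tested Positive).
cost_matrix = [
    [0, 50],
    [100, -1],
]

# Placeholder counts for illustration only; replace with the values Weka reports.
example_confusion = [
    [40, 10],
    [5, 45],
]

# 40*0 + 10*50 + 5*100 + 45*(-1)
print(total_cost(example_confusion, cost_matrix))
```

Running the same function on each model's actual confusion matrix gives two totals that can be compared directly; the model with the lower total cost is the cheaper one under this cost matrix.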
