Display excel 3d surface chart of the error-surface function


Unsupervised Learning: kMeans and Fuzzy Clustering

Instructions:

You will need to write several distinct programs beyond k-Means to accomplish the tasks set out below. Provide pseudocode for each in the appropriate sections of your report.

Part I: Visualizing (a slice of) a WCSSE 5-D Error-Surface

Use the 17 data-instances from the demo kmeans Excel file seen in class (HW3_S20_kmeans.xls) for this first exercise (i.e. cells B3:C19 in kmeans tab). The first datapoint should be: (0.62, 0.98). If you have overwritten the original data in the course of your explorations, re-download the file to use the same dataset (and get the same results) as everyone else!

Generate a 4D array of size 11 x 11 x 11 x 11. Note: indices represent locations of the two cluster-heads!

Each of the four integer indices should range in value from 0 to 10, mapping to the 0 - 1 range (i.e. in steps of size 0.10) of the k-means graph-axes in the Excel demo. The following generic code-example illustrates saving a WCSSE value of 9.83 for a 2-cluster solution consisting of cluster-heads (0.1, 0.2) and (0.8, 0.5) to such an array:

surfArray[1][2][8][5] = 9.83

Write a function (possibly named: getWCSE(data[], cheads[])) to generate every possible Within-Cluster Sum of Squared Distance/Errors (WCSSE) value for each of the 11^4 possible 2-cluster solutions afforded by this array.

Question 1: Display an Excel 3D Surface chart of the error-surface function w.r.t. indices (x1, y1), keeping the other centroid constant at (0,0)

  • That is, for all values surfArray[x1][y1][0][0]

Question 2: Display an Excel 3D Surface chart of the error-surface function w.r.t. indices (x2, y2), keeping the other centroid constant at (10, 10)

  • That is, for all values surfArray[10][10][x2][y2]

Explain the meaning of these two surface-graphs. How are they different?

Are they symmetric with respect to each other? Why or why not?

Do either of them hint at a possible solution near (0.2, 0.6) or (0.8, 0.56)?

Question 3: Display an Excel 3D Surface chart of the error-surface function w.r.t. indices (x1, y1) keeping the other centroid constant at (8, 6)

  • That is, for all values surfArray[x1][y1][8][6]

Question 4: Display an Excel 3D Surface chart of the error-surface function w.r.t. indices (x2, y2) keeping the other centroid constant at (2, 6)

  • That is, for all values surfArray[2][6][x2][y2]

Explain the meaning of these surface-graphs.

You know what; let me show some screenshots of what 2 of these WCSSE surfaces should look like -- if that might be helpful to you...

(0,0, x2, y2):

(10,10, x2, y2):

Part II: Clustering an Anonymous Dataset

Find the HW3_data.csv file in the class INFO folder. It contains 120 4-D instances to be clustered by your k-Means implementation.

Question 1: Implement the k-Means Algorithm as demonstrated by the in-class Excel demo.

Possible interface: kmeans(data[], cheads[])

Don't forget the heuristic for dealing with empty cluster sets. Your algorithm should output a clustering solution for the input data provided, as well as the final coordinates of the K centroids, and the WCSSE value of the solution.

- provide a pseudocode for your k-means algorithm in your report

Question 2: Determine the most natural value of k (i.e. BestK) to be used for this dataset. Use the average WCSE of at least 10 trials for each value of K from 1 - 10.

- show a graph of your BestK trial to support your choice of BestK

- provide a solid argument for why your chosen BestK value seems appropriate for this dataset

Question 3: Generate clustering solutions for at least 10 randomly-initialized sets of centroids using your BestK value determined previously.

- display the WCSE values in a table, highlighting the best one

- display a 2-D scatterplot of your best k-means clustering solution (you might use attributes attC and attD for the axes, though you may choose whichever two look best to you), including final centroid positions. Color-code the datapoints according to their cluster.

Question 4: Find the HW3_train.csv file, which has the actual classifications of the datapoints (i.e. the Variable class column). If you did not find K=3 to be your BestK previously, generate your best K=3 clustering solution now. Using only your final set of three centroid coordinates, conduct a nearest-neighbor classification of the training set (ignoring the Variable column for this process).

- Assuming you cluster instances into clusters named Cluster_1, Cluster_2, etc. determine the value of my variables A, B etc. by what class the majority of your cluster-members fall into. For example, if Cluster_x has 50 members, 30 of which are actually class A, 18 are B and 2 is C -- then your Cluster_x has gravitated toward the concept of my class A. Show me the translations of your clusters to my classes (Note: this mapping will be different from one student's results to another's)

- Determine the accuracy-rate of your clustering solution to the actual classifications of the training set.

Find the HW3_test.csv file. Classify the 30 instances therein according to their nearest centroid in your K=3 solution.

- Determine the accuracy-rate of your clustering-solution to the actual classifications of the testing set

- what do your accuracy results suggest about the effectiveness of the k-Means algorithm for this particular classification task? Do you think it is possible for k-Means to correctly classify these instances at the 100% level? 90% level? 80%? Explain.

Question 5: Implement the Fuzzy Clustering (FC) algorithm as described in class and as illustrated within the kMeans Excel file example. Repeat section D above, but now using the FC algorithm. Use the BestK determined from kMeans above as the number of clusters for this solution (Note: you should use the "more-complicated" of the two Expectation formulas from the class Powerpoint slide)

- State the pseudocode of your implementation of this algorithm. What is your stopping condition? Why do you think this is reasonable?

- When run multiple times, is there more, or less variety to the solutions you generate (Show me in a table)? Why is this so?

- Are your training/testing accuracy-rates better? Why do you suppose this is (or is not) so?

Graph and color-code your best FC solution for the training set.

Taking online assistance from the Data Mining Assignment Help service is one of the best options for students to attain their desired grades.

Tags: Data Mining Assignment Help, Data Mining Homework Help, Data Mining Coursework, Data Mining Solved Assignments, Fuzzy Clustering Assignment Help, Fuzzy Clustering Homework Help

Attachment:- Data Mining.rar

Request for Solution File

Ask an Expert for Answer!!
Other Subject: Display excel 3d surface chart of the error-surface function
Reference No:- TGS03042472

Expected delivery within 24 Hours