Create a classification model for letter recognition using


Problem 1: Download the letter recognition data from: https://archive.ics.uci.edu/ml/datasets/Letter+Recognition

The objective is to identify each of a large number of black-and-white rectangular pixel displays as one of the 26 capital letters in the English alphabet. The character images were based on 20 different fonts and each letter within these 20 fonts was randomly distorted to produce a file of 20,000 unique stimuli. Each stimulus was converted into 16 primitive numerical attributes (statistical moments and edge counts) which were then scaled to fit into a range of integer values from 0 through 15. Below is the attribute information, but more information on the data and how it was used for data mining research can be found in the paper:

P. W. Frey and D. J. Slate. "Letter Recognition Using Holland-style Adaptive Classifiers". Machine Learning, Vol. 6, No. 2, March 1991.

Attribute Information:

1. lettr: capital letter (26 values from A to Z)

2. x-box: horizontal position of box (integer)

3. y-box: vertical position of box (integer)

4. width: width of box (integer)

5. high: height of box (integer)

6. onpix: total number of "on" pixels (integer)

7. x-bar: mean x of "on" pixels in box (integer)

8. y-bar: mean y of "on" pixels in box (integer)

9. x2bar: mean x variance (integer)

10. y2bar: mean y variance (integer)

11. xybar: mean x y correlation (integer)

12. x2ybr: mean of x * x * y (integer)

13. xy2br: mean of x * y * y (integer)

14. x-ege: mean edge count left to right (integer)

15. xegvy: correlation of x-ege with y (integer)

16. y-ege: mean edge count bottom to top (integer)

17. yegvx: correlation of y-ege with x (integer)
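The attribute list above can be turned into a small loader. The following is a minimal sketch, assuming the downloaded file is comma-separated with the letter first and the 16 integer attributes after it, in the order listed; the function name `parse_records` and the sample line in the usage note are hypothetical and not taken from the dataset.

```python
import csv
import io

# Attribute names 2-17 from the list above, in order.
ATTRIBUTES = ["x-box", "y-box", "width", "high", "onpix", "x-bar", "y-bar",
              "x2bar", "y2bar", "xybar", "x2ybr", "xy2br",
              "x-ege", "xegvy", "y-ege", "yegvx"]

def parse_records(text):
    """Parse comma-separated text into (letter, [16 ints]) pairs.

    Assumption: one record per line, letter first, then the 16 attributes.
    """
    records = []
    for row in csv.reader(io.StringIO(text)):
        if not row:
            continue  # skip blank lines
        letter = row[0].strip()
        values = [int(v) for v in row[1:]]
        if len(values) != len(ATTRIBUTES):
            raise ValueError("expected 16 attributes, got %d" % len(values))
        if any(v < 0 or v > 15 for v in values):
            raise ValueError("attribute outside the 0-15 range")
        records.append((letter, values))
    return records
```

For example, `parse_records("A,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,0\n")` (a made-up line, used only to show the shape) returns one pair whose first element is `"A"` and whose second element is a list of 16 integers.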

Create a classification model for letter recognition using decision trees as a classification method with a holdout partitioning technique for splitting the data into training versus testing.
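Holdout partitioning can be sketched as a single shuffled split. This is only an illustration: the 70/30 ratio, the fixed seed, and the name `holdout_split` are assumptions, since the assignment does not prescribe a split ratio.

```python
import random

def holdout_split(records, test_fraction=0.3, seed=0):
    """Shuffle a copy of the records and split it into (train, test).

    test_fraction and seed are illustrative defaults, not values
    required by the assignment.
    """
    rng = random.Random(seed)       # fixed seed makes the split repeatable
    shuffled = list(records)        # leave the caller's list untouched
    rng.shuffle(shuffled)
    n_test = int(round(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]
```

Every record lands in exactly one of the two partitions, so the testing accuracy is measured on data the model never saw during training.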

a. Changing the maximum depth, the minimum number of cases per parent node, and the minimum number of cases per leaf produces different tree configurations with different training and testing accuracies. Choose at least five different configurations and report the training and testing accuracy for each one. Which configuration would you choose as the best model? Explain your answer.
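To make the three settings concrete, here is a minimal pure-Python sketch of a Gini-based decision tree in which depth, cases per parent, and cases per leaf act as stopping rules. It is not the tool the assignment expects and would be slow on 20,000 records; a library implementation would normally be used. All function names and the parameter names `max_depth`, `min_parent`, and `min_leaf` are assumptions made for this illustration.

```python
from collections import Counter

def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def best_split(X, y, min_leaf):
    """Return the (feature, threshold) that lowers weighted Gini most, or None."""
    best, best_score, n = None, gini(y), len(y)
    for f in range(len(X[0])):
        for t in sorted({row[f] for row in X})[:-1]:
            left = [lab for row, lab in zip(X, y) if row[f] <= t]
            right = [lab for row, lab in zip(X, y) if row[f] > t]
            if len(left) < min_leaf or len(right) < min_leaf:
                continue  # split would create a leaf that is too small
            score = (len(left) * gini(left) + len(right) * gini(right)) / n
            if score < best_score:  # only accept strict improvements
                best, best_score = (f, t), score
    return best

def build_tree(X, y, max_depth, min_parent, min_leaf, depth=0):
    """Leaves are class labels; internal nodes are (feature, threshold, left, right)."""
    majority = Counter(y).most_common(1)[0][0]
    if depth >= max_depth or len(y) < min_parent or len(set(y)) == 1:
        return majority
    split = best_split(X, y, min_leaf)
    if split is None:
        return majority
    f, t = split
    left = [(r, lab) for r, lab in zip(X, y) if r[f] <= t]
    right = [(r, lab) for r, lab in zip(X, y) if r[f] > t]
    return (f, t,
            build_tree([r for r, _ in left], [lab for _, lab in left],
                       max_depth, min_parent, min_leaf, depth + 1),
            build_tree([r for r, _ in right], [lab for _, lab in right],
                       max_depth, min_parent, min_leaf, depth + 1))

def predict(tree, row):
    """Walk from the root to a leaf and return its label."""
    while isinstance(tree, tuple):
        f, t, left, right = tree
        tree = left if row[f] <= t else right
    return tree

def accuracy(tree, X, y):
    """Fraction of rows whose predicted label matches the true label."""
    return sum(predict(tree, r) == lab for r, lab in zip(X, y)) / len(y)

def compare_configs(train, test, configs):
    """Return (config, train_accuracy, test_accuracy) for each configuration."""
    (X_tr, y_tr), (X_te, y_te) = train, test
    results = []
    for max_depth, min_parent, min_leaf in configs:
        tree = build_tree(X_tr, y_tr, max_depth, min_parent, min_leaf)
        results.append(((max_depth, min_parent, min_leaf),
                        accuracy(tree, X_tr, y_tr),
                        accuracy(tree, X_te, y_te)))
    return results
```

Calling `compare_configs` with five or more `(max_depth, min_parent, min_leaf)` tuples yields one row per configuration. A large gap between training and testing accuracy in a row is the usual sign of overfitting, which is one reasonable basis for choosing among configurations.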

b. For the best tree configuration, report the misclassification matrix and interpret it. In your opinion, is accuracy a good way to assess the performance of the model? If not, suggest other measures.
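A misclassification matrix and per-letter measures can be computed directly from actual and predicted labels. The sketch below assumes precision, recall, and F1 as the "other measures"; the assignment does not name specific ones, and the function names are hypothetical.

```python
from collections import Counter

def confusion_matrix(actual, predicted):
    """Count how often each (actual, predicted) label pair occurs."""
    return Counter(zip(actual, predicted))

def per_class_metrics(actual, predicted):
    """Return {label: (precision, recall, f1)} for every label seen."""
    cm = confusion_matrix(actual, predicted)
    labels = sorted(set(actual) | set(predicted))
    metrics = {}
    for c in labels:
        tp = cm[(c, c)]                                   # correctly labeled c
        fp = sum(cm[(a, c)] for a in labels if a != c)    # other letters called c
        fn = sum(cm[(c, p)] for p in labels if p != c)    # c called something else
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = (2 * precision * recall / (precision + recall)
              if precision + recall else 0.0)
        metrics[c] = (precision, recall, f1)
    return metrics
```

The off-diagonal entries of the matrix, that is, pairs where the actual and predicted letters differ, show which letters are confused with each other, a detail that a single accuracy figure hides.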

c. Which three attributes are the most important for recognizing the letters?

Problem 2: On the same data from Problem 1, apply a K-nearest neighbor classifier to classify the data.  Report the following:

1. If you are doing any data transformation, explain the transformation and why it is needed.

2. Report the misclassification matrix and the appropriate performance metrics for each value of K (K = 1, 3, 5, and 7).

3. Interpret the results and compare them with those obtained using the decision trees.
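The K-nearest neighbor step in Problem 2 can be sketched with a plain majority vote. This example assumes Euclidean distance, which the assignment does not specify, and applies no transformation, since the problem description states that all attributes are already scaled to the same 0 through 15 range; whether a further transformation helps is left to the analysis. The function name `knn_predict` is hypothetical.

```python
import math
from collections import Counter

def knn_predict(train_X, train_y, query, k):
    """Label a query row by majority vote among its k nearest training rows.

    Assumption: Euclidean distance over the raw attribute values.
    Ties in the vote go to the label met first among the nearest rows.
    """
    nearest = sorted(zip(train_X, train_y),
                     key=lambda pair: math.dist(pair[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

def knn_predict_all(train_X, train_y, queries, k):
    """Predict a label for every query row with the same k."""
    return [knn_predict(train_X, train_y, q, k) for q in queries]
```

Running `knn_predict_all` once for each of K = 1, 3, 5, and 7 on the testing partition gives four sets of predictions, each of which can then be summarized in its own misclassification matrix.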
