Create a classification model for letter recognition using, Data Structure & Algorithms

Create a classification model for letter recognition using

Problem 1: Download the letter recognition data from: https://archive.ics.uci.edu/ml/datasets/Letter+Recognition

The objective is to identify each of a large number of black-and-white rectangular pixel displays as one of the 26 capital letters in the English alphabet. The character images were based on 20 different fonts and each letter within these 20 fonts was randomly distorted to produce a file of 20,000 unique stimuli. Each stimulus was converted into 16 primitive numerical attributes (statistical moments and edge counts) which were then scaled to fit into a range of integer values from 0 through 15. Below is the attribute information, but more information on the data and how it was used for data mining research can be found in the paper:

P. W. Frey and D. J. Slate. "Letter Recognition Using Holland-style Adaptive Classifiers". (Machine Learning Vol 6 #2 March 91)

Attribute Information:

1. lettr capital letter (26 values from A to Z)

2. x-box horizontal position of box (integer)

3. y-box vertical position of box (integer)

4. width width of box (integer)

5. high height of box (integer)

6. onpix total # on pixels (integer)

7. x-bar mean x of on pixels in box (integer)

8. y-bar mean y of on pixels in box (integer)

9. x2bar mean x variance (integer)

10. y2bar mean y variance (integer)

11. xybar mean x y correlation (integer)

12. x2ybr mean of x * x * y (integer)

13. xy2br mean of x * y * y (integer)

14. x-ege mean edge count left to right (integer)

15. xegvy correlation of x-ege with y (integer)

16. y-ege mean edge count bottom to top (integer)

17. yegvx correlation of y-ege with x (integer)

Create a classification model for letter recognition using decision trees as a classification method with a holdout partitioning technique for splitting the data into training versus testing.

a. Changing the values for the depth, number of cases per parent and number of cases per leaf produces different tree configurations with different accuracies for training and testing. Choose at least five different configurations and report the accuracy for training and testing for each one of them. Which configuration will you choose as the best model? Explain your answer.

b. For the best tree configuration, report the misclassification matrix and interpret it. In your opinion, is accuracy a good way to interpret the performance of the model? If not, suggest other measures.

c. What are the most important three attributes for recognizing the letters?

Problem 2: On the same data from Problem 1, apply a K-nearest neighbor classifier to classify the data. Report the following:

1. If you are doing any data transformation, explain the transformation and why it is needed.

2. Report the misclassification matrix and the appropriate performance metrics for different values of K (K=1, 3, 5, and 7).

3. Interpret the results and also compare them with the ones obtained by using the decision trees.

View Complete Question

Request for Solution File

Ask an Expert for Answer!!

Data Structure & Algorithms: Create a classification model for letter recognition using

Reference No:- TGS01409376

Expected delivery within 24 Hours

Have a Question? (oR Write a Review)

Write atleast 100 words!!

Request for Solution File

Ask an Expert for Answer!!

Data Structure & Algorithms: Create a classification model for letter recognition using

Reference No:- TGS01409376

Have a Question? (oR Write a Review)

Recent Questions Asked Data Structure & Algorithms

Q : Strategic analysis portfolionbspthe two companies that have

Q : 1 a worker has 24 hours per day to allocate between leisure

Q : St johns river shipyards welding machine is 15 years old

Q : In addition to common-size financial statements

Q : Create a classification model for letter recognition using

Q : If there are 13 million unemployed and a working age

Q : Which one of these is included in the yield of a bond with

Q : How do changes in the value of the us dollar impact apple

Q : Suppose we have a working age population is equal to 100

Example of an actual medical error on the internet

Strategies for effective communication vary across cultures

Internal processes focused on patient centered care

What development has taken place with my fetus

How can a researcher student nurse write data collection

Prenatal nurse is conducting a class on healthy pregnancy

Determine any unique cultural needs of the family

Request for Solution File

Ask an Expert for Answer!!

Data Structure & Algorithms: Create a classification model for letter recognition using

Reference No:- TGS01409376

Recent Questions Asked Data Structure & Algorithms

Q : Strategic analysis portfolionbspthe two companies that have

Q : 1 a worker has 24 hours per day to allocate between leisure

Q : St johns river shipyards welding machine is 15 years old

Q : In addition to common-size financial statements

Q : Create a classification model for letter recognition using

Q : If there are 13 million unemployed and a working age

Q : Which one of these is included in the yield of a bond with

Q : How do changes in the value of the us dollar impact apple

Q : Suppose we have a working age population is equal to 100

Asked Questions