Describe algorithms chosen and indicate why you chose them


Assignment

The research project brings together everything we study during the course and gives you hands-on experience applying the concepts.

I. Data Analysis Project.

1. Identify the problem(s) to be solved or opportunities to be realized by mining the selected data set.

2. Consider the following data preparation questions and explain your answers. Where appropriate, cite resources that support your answers. Explain how your answers, and the data preparation itself, differed when you chose a different data mining method.

a. Should instances with missing values be deleted?
b. Should missing values be specially coded and then retained in the data set?
c. Should numeric values be assigned predetermined ranges or left for the algorithm to split?
d. Should categorical variables be grouped or coded to reflect a hierarchy?
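
As an illustration of questions a through d, the following is a minimal sketch in plain Python. The records, field names, bin edges, special codes, and region grouping are all hypothetical and stand in for whatever data set is selected; the appropriate choices depend on the data and on the mining method used.

```python
# Hypothetical toy records; None marks a missing value.
rows = [
    {"age": 34, "income": 52000, "region": "north"},
    {"age": None, "income": 61000, "region": "south"},
    {"age": 58, "income": None, "region": None},
    {"age": 23, "income": 28000, "region": "east"},
]

def drop_incomplete(records):
    """Question a: delete every instance that has any missing value."""
    return [r for r in records if all(v is not None for v in r.values())]

def code_missing(records, numeric_code=-1, category_code="MISSING"):
    """Question b: keep every instance and replace missing values with a special code."""
    coded = []
    for r in records:
        new = {}
        for key, value in r.items():
            if value is None:
                # Assumption: "region" is the only categorical field in this toy data.
                new[key] = category_code if key == "region" else numeric_code
            else:
                new[key] = value
        coded.append(new)
    return coded

def bin_age(age, edges=(30, 50)):
    """Question c: assign a numeric value to a predetermined range."""
    if age is None:
        return "unknown"
    if age < edges[0]:
        return "under_30"
    if age < edges[1]:
        return "30_to_49"
    return "50_plus"

# Question d: group categories into a coarser, hypothetical hierarchy.
REGION_GROUP = {"north": "inland", "south": "coastal", "east": "coastal"}

complete = drop_incomplete(rows)
coded = code_missing(rows)
age_bins = [bin_age(r["age"]) for r in rows]
region_groups = [REGION_GROUP.get(r["region"], "unknown") for r in rows]
```

Deleting incomplete instances halves this toy data set, while special coding keeps every row but requires a method that will not mistake the code for a real value.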

3. To explore the problem or opportunity, use two or more of the following data mining methods covered by this course:

a. regression: linear regression, discriminant analysis or logistic regression,
b. decision trees,
c. neural networks,
d. hierarchical or k-means clustering,
e. association rules,
f. time series,
g. genetic algorithms.

4. Describe the algorithms chosen, and indicate why you chose them. Exploring a method of interest is a satisfactory reason for this course paper.

5. Explain how and why you used specific pruning parameters or other adjustments to create a sparser model.
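
To make the idea of a sparser model concrete, here is a minimal sketch of one kind of adjustment: a minimum leaf size used as a stopping rule (pre-pruning) in a tiny one-feature decision tree. The data, the function names, and the `min_leaf` and `max_depth` parameters are illustrative assumptions, not the parameters of any particular software package, and post-pruning methods work differently.

```python
def gini(labels):
    """Gini impurity of a list of 0/1 labels."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return 2 * p * (1 - p)

def build_tree(points, min_leaf=1, depth=0, max_depth=5):
    """Grow a tree on (x, label) pairs; reject splits that create a leaf smaller than min_leaf."""
    labels = [y for _, y in points]
    majority = 1 if sum(labels) * 2 >= len(labels) else 0
    if depth >= max_depth or gini(labels) == 0.0:
        return {"leaf": majority}
    best = None
    xs = sorted(set(x for x, _ in points))
    for lo, hi in zip(xs, xs[1:]):
        cut = (lo + hi) / 2
        left = [p for p in points if p[0] <= cut]
        right = [p for p in points if p[0] > cut]
        if len(left) < min_leaf or len(right) < min_leaf:
            continue  # the stopping rule: this split would make a leaf too small
        score = (len(left) * gini([y for _, y in left])
                 + len(right) * gini([y for _, y in right])) / len(points)
        if best is None or score < best[0]:
            best = (score, cut, left, right)
    if best is None:
        return {"leaf": majority}
    _, cut, left, right = best
    return {"cut": cut,
            "left": build_tree(left, min_leaf, depth + 1, max_depth),
            "right": build_tree(right, min_leaf, depth + 1, max_depth)}

def count_leaves(tree):
    """Number of leaves, used here as a simple measure of model size."""
    if "leaf" in tree:
        return 1
    return count_leaves(tree["left"]) + count_leaves(tree["right"])

# Hypothetical noisy data: the label is mostly 1 for x >= 5, with two noisy points.
points = [(0, 0), (1, 0), (2, 1), (3, 0), (4, 0),
          (5, 1), (6, 1), (7, 0), (8, 1), (9, 1)]

full = build_tree(points, min_leaf=1)    # fits every noisy point
sparse = build_tree(points, min_leaf=3)  # refuses very small leaves
```

On this toy data the unrestricted tree isolates each noisy point in its own leaf, while the restricted tree keeps only the single split near x = 4.5. Explaining that kind of tradeoff between fit and simplicity is what this item asks for.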

6. Compare the alternative solutions using methods found in comparative studies in the literature. For example, see Dan Zhu, G. Premkumar, Xiaoning Zhang, and Chao-Hsien Chu, "Data mining for network intrusion detection: A comparison of alternative methods," Decision Sciences, Atlanta, Fall 2001, Vol. 32. Report the results of the accuracy measures available in the software. If the software does not have built-in accuracy reporting, manually test the model's accuracy on a small hold-out test sample of the data. The hold-out method creates separate training and test sets. This is particularly useful when testing the model on data from a later time period.
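
The hold-out method described above can be sketched in a few lines of plain Python. The data, the 30% test fraction, the fixed seed, and the threshold "model" are all hypothetical placeholders for the selected data set and the chosen mining method.

```python
import random

def holdout_split(records, test_fraction=0.3, seed=0):
    """Shuffle the records and split them into separate training and test sets."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)
    n_test = max(1, int(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]

def accuracy(model, test_set):
    """Fraction of test cases whose predicted label matches the actual label."""
    correct = sum(1 for features, label in test_set if model(features) == label)
    return correct / len(test_set)

# Hypothetical labeled data: (feature, label) pairs, label 1 when the feature exceeds 50.
data = [(x, 1 if x > 50 else 0) for x in range(0, 100, 5)]
train, test = holdout_split(data)

# Stand-in "model": a threshold midway between the class means of the training set only.
pos = [x for x, y in train if y == 1]
neg = [x for x, y in train if y == 0]
threshold = (sum(pos) / len(pos) + sum(neg) / len(neg)) / 2

def model(x):
    return 1 if x > threshold else 0

test_accuracy = accuracy(model, test)
```

When the test data should come from a later time period, the random shuffle would be replaced by a split on the date field, so that every training record precedes every test record.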

7. Create a table showing the number of cases correctly identified and the numbers of Type I and Type II errors. In addition, a ROC curve is appropriate with discriminant analysis and logistic regression. For these methods, changing the parameters of the line separating the classes changes the percentages of Type I and Type II errors. Medical practitioners like ROC curves because they show the tradeoff between false positives and false negatives.
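
A minimal sketch of the error table and the ROC points follows. The actual classes and scores are made up, and the code assumes the common convention that a Type I error is a false positive and a Type II error is a false negative; the cutoff plays the role of the line separating the classes.

```python
def confusion_counts(actual, scores, threshold):
    """Count outcomes when cases scoring at or above the threshold are classified positive."""
    tp = fp = tn = fn = 0
    for y, s in zip(actual, scores):
        predicted = 1 if s >= threshold else 0
        if predicted == 1 and y == 1:
            tp += 1
        elif predicted == 1 and y == 0:
            fp += 1  # Type I error (false positive)
        elif predicted == 0 and y == 0:
            tn += 1
        else:
            fn += 1  # Type II error (false negative)
    return {"correct": tp + tn, "type_I": fp, "type_II": fn, "tp": tp, "tn": tn}

def roc_points(actual, scores):
    """Return (false positive rate, true positive rate) pairs, one per candidate threshold."""
    positives = sum(actual)
    negatives = len(actual) - positives
    points = [(0.0, 0.0)]
    for t in sorted(set(scores), reverse=True):
        c = confusion_counts(actual, scores, t)
        points.append((c["type_I"] / negatives, c["tp"] / positives))
    return points

# Hypothetical actual classes and model scores (for example, predicted probabilities).
actual = [1, 1, 0, 1, 0, 0, 1, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.3, 0.2, 0.1]

table = confusion_counts(actual, scores, threshold=0.5)
curve = roc_points(actual, scores)
```

Lowering the threshold moves along the curve toward the upper right: fewer Type II errors at the cost of more Type I errors.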

8. Which data mining method(s) seem superior for the chosen data set? Did the method that performed best in your study also dominate in similar comparative studies?

9. Compare the results or recommendations that would result from the use of the different methods.

10. Based on your analysis, justify a conclusion or recommendation.

Format your assignment according to the following formatting requirements:

1. The answer should be typed, double spaced, using Times New Roman font (size 12), with one-inch margins on all sides.

2. Include a cover page containing the title of the assignment, the student's name, the course title, and the date. The cover page is not included in the required page length.

3. Also include a reference page. Citations and references should follow APA format. The reference page is not included in the required page length.

