CISC 662: Data Mining and Knowledge Discovery in Databases


Project Option I

In this project option, you are expected to conduct a comprehensive literature search and survey, select and study a specific topic in one subject area of data mining and KDD, and write a technical paper on that topic independently. The paper can be either a detailed, comprehensive survey of a specific topic or a report on original research that you have carried out yourself.

Requirements and Instructions for the Technical Paper:

1. The paper should state its objective clearly, including its subject, scope, domain, and the goals to be achieved.

2. The paper should address important, advanced, and critical issues in a specific area of data mining and KDD. It should emphasize depth of coverage in that area as well as breadth.

3. The paper should present measurable conclusions and future research directions (this is your contribution).

4. It may be helpful to review or browse about 10 to 15 relevant technical articles before you decide on the topic of your research project.

5. The paper should reflect the quality expected of academic research.

6. The paper should be at least 25 to 30 pages (double-spaced) or 2,500 to 3,000 words in length.

7. The paper should include an adequate abstract or introduction and a reference list.

8. Please write the paper in your own words. If you use statements from other articles, cite them and give the full reference and source for each.

9. For a systematic study, you may want to read technical papers from relevant magazines, journals, conference proceedings, and theses in the area of your chosen topic.

10. For the format and style of your paper, please refer to the GSCIS Dissertation Guide or to IEEE or ACM journal articles.

Suggested Topics for KDD Research (not limited to the following)

Theory and Fundamental Issues in KDD:

Data and knowledge representation for KDD
Database models for knowledge discovery and data mining
Definitions, formalisms, and theoretical issues in KDD
Fundamental advances in search, retrieval, and discovery methods
Modeling of structured, unstructured, and multimedia data for KDD
Metrics for evaluation of KDD results
Probabilistic modeling and uncertainty management in KDD

Data Mining Methods and Algorithms:

Algorithms for learning classification rules, characteristic rules, and association rules
Algorithms for association rule mining
Algorithms for clustering, prediction, etc.
Algorithmic complexity, efficiency, and scalability issues in KDD
High-dimensional datasets and data preprocessing
Parallel and distributed data mining techniques
Probabilistic and statistical models and methods in KDD
Supervised and unsupervised discovery and predictive modeling
Use of prior domain knowledge and reuse of discovered knowledge
Measurement of rule interestingness and quality

KDD Process and Human Interaction:

Models of the KDD process
Methods for evaluating subjective relevance and utility
Data and knowledge visualization
Interactive data exploration and discovery
Privacy-preserving data mining and security

Applications:

Application of KDD in business, science, medicine, and engineering
Application of KDD methods for mining knowledge in text, image, audio, sensor, numeric, categorical, mixed-format, or semi-structured data
Big-data mining and data analytics
Mining multimedia, hypertext, spatial, and temporal databases
Mining bioinformatics data
Applications of KDD for semantic query optimization
Knowledge discovery and data mining tools
Resource and knowledge discovery using the Internet

Others:

Active databases
Application of genetic algorithms to KDD
Fuzzy and Prolog databases
Genetic algorithms
Neural networks
Regression methods
Rough set model for relational databases
Support vector machines

Suggested Check List for Written Report

Your written report should try to include the following items:

a. Introduction and objectives of the research.
b. Current state of the art and existing methodologies in the specific area.
c. Barriers, issues, and open problems in the area.
d. Existing, expected, and proposed solutions, methods, and algorithms, if any, at the time the project is due.
e. Detailed, step-by-step examples illustrating concepts, principles, theories, algorithms, methodologies, etc.
f. Research results, if any, at the time the report is due.
g. Analysis and comparison of research methods, algorithms, and expected results, if any, at the time the report is due.
h. Conclusions and future research directions, if any, at the time the report is due.
i. Reference list.

Project Option II

In this project option, you are expected to implement existing data mining algorithms or improvements based on existing data mining algorithms.

The algorithms can include those listed below, but are not limited to them:

Association rule mining algorithms (various algorithms)
Classification algorithms (for example, ID3, C4.5, C5.0, CART)
Clustering algorithms (various algorithms)
Fuzzy data mining and rough set data mining
K-Nearest Neighbor
Genetic algorithms
Neural networks
Support vector machines

Requirements for the project deliverable:

1. Description of the algorithm in pseudocode, with proper explanation and documentation.
2. Detailed illustrative examples of the algorithm.
3. Analysis of the algorithm's performance and time complexity.
4. Description of the benchmark and test data used to support the experiment design.
5. Analysis of the experimental data.
6. A README file with all the details of how to run the program.
7. A live demo of the implementation, with explanation.
8. Reference list.

Specific Topics for Research and/or Implementation

Supervised Learning Methods:

Classification Methods:

Regression Methods
Multiple Linear Regression
Logistic Regression
Ordered Logistic and Ordered Probit Regression Models
Multinomial Logistic Regression Model
Poisson and Negative Binomial Regression Models

Bayesian Classification
Naïve Bayes Method
k-Nearest Neighbors

Decision Trees
ID3 (Iterative Dichotomiser 3)
C4.5 and C5.0
CART (Classification and Regression Trees)
Scalable Decision Tree Techniques
AdaBoost Algorithm
Ensemble Methods

Neural Network-Based Methods
Back Propagation
Neural Network Supervised Learning
Deep Learning
Bayesian Belief Networks

Rule-Based Methods
Generating Rules from a Decision Tree
Generating Rules from a Neural Net
Generating Rules without a Decision Tree or Neural Net

Support Vector Machine

Fuzzy Set and Rough Set Methods

Unsupervised Learning Methods:

Clustering Methods:

Partition-Based Methods
Squared Error Clustering
K-Means Clustering (Centroid-Based Technique)
K-Medoids Method (Partitioning Around Medoids, Representative Object-Based Technique)
Bond Energy

Hierarchical Methods
Agglomerative vs. Divisive Hierarchical Clustering
BIRCH (Balanced Iterative Reducing and Clustering Using Hierarchies)
Chameleon (Hierarchical Clustering Using Dynamic Modeling)
CLARANS (Clustering Large Applications Based Upon Randomized Search)
CURE (Clustering Using REpresentatives)

Density Based Methods
DBSCAN (Density-Based Spatial Clustering of Applications with Noise; Density-Based Clustering Based on Connected Regions with High Density)
OPTICS (Ordering Points to Identify the Clustering Structure)
DENCLUE (DENsity-Based CLUstEring; Clustering Based on Density Distribution Functions)

Grid-Based Methods
STING (Statistical Information Grid)
CLIQUE (Clustering In QUEst, an Apriori-like Subspace Clustering Method)

Probabilistic Model-Based Clustering
Clustering Graph and Network Data (for example, Social Networks)
Self-Organizing Map Technique

Evaluation and Performance Measurement of Clustering Methods
Assessing Clustering Tendency
Determining the Number of Clusters
Measuring Clustering Quality

Association Rule Mining

Evolution-Based Methods: Genetic Algorithms

Applications:

Data Mining Applications for Business Intelligence and Analytics
Text Mining
Spatial Mining
Temporal Mining
Web Mining
Recommender Systems

Others:

Overfitting and Underfitting Issues
Outliers
Performance Evaluation and Measurement
Confusion Matrix
ROC (Receiver Operating Characteristic)
AUC (Area Under the Curve)

Data Mining Tools
XLMiner
RapidMiner
TensorFlow
Weka
NodeXL

Sample Format of Project Report

1. Title Page

In general, the report title should be kept to 10 words or fewer if possible. Below the title, the title page should give your name, email address, other contact information, and the date.

2. Abstract
The abstract should summarize the highlights of your project and tell readers what was done in the research project.

3. Table of Contents
The table of contents should list the titles of all sections and subsections, with their page numbers.

4. Introduction
The introduction gives readers the information they need to understand the subject of your research project.

5. Background and Literature Review

6. Statement of the Proposed Research or Study
Building on the Background and Literature Review, present the proposed research or study, possibly in the form of a Problem Statement that indicates what is to be studied, investigated, researched, and/or achieved in this project.

7. Methodology
Based on the Problem Statement and the objectives to be achieved, describe the underlying methodology you will use to carry out the research task and achieve the goal of the research or study. If possible, explain your rationale in both depth and breadth. Illustrative examples of the methodologies are recommended, because they demonstrate your understanding.

8. Experiment Design and Result Analysis
Describe in detail how the experiments were designed and conducted, and report your observations. Analyze the experimental results using appropriate performance analysis methods, drawing on your observations, understanding, and interpretation.

9. Conclusion
Summarize your research or study by stating the conclusions of the project. Where appropriate, suggest future research directions and discuss their potential.

10. Reference List

11. Appendix (if necessary)

For style, please refer to the APA Publication Manual or to ACM or IEEE publications.
