COMP9417 - Compare the performance of different algorithms


1 Aims

The aim of this assignment is to enable you to apply different machine learning algorithms on a variety of datasets and produce a written analysis of the empirical results, based on your knowledge of machine learning. After completing this assignment you will be able to:

- set up replicated k-fold cross-validation experiments to obtain average performance measures of algorithms on datasets
- compare the performance of different algorithms against a base-line and each other
- aggregate comparative performance scores for algorithms over a range of different datasets
- propose properties of algorithms and their parameters, or datasets, which may lead to performance differences being observed
- suggest reasons for actual observed performance differences in terms of properties of algorithms, parameter settings or datasets
- apply methods for data transformations and parameter search and evaluate their effects on the performance of algorithms

Overview of the assignment
The plan of the assignment is very straightforward. First, you will familiarise yourself with the open-source machine learning toolkit that you will use to complete the activity. Second, you will use the toolkit to configure experiments that will run machine learning algorithms on various real-world datasets. The toolkit will enable you to generate results from these experiments and save them to files, which will form the main part of your submission. You will also have to answer a number of questions to check your understanding of what has occurred in the experiments. Your written answers will be saved to a text file which will form the remainder of your submission. You will not need to write any code for this assignment. The remainder of this document contains, in Section 3, an introduction to the machine learning toolkit and, in Section 4, the specification of the experiments that you will run and the questions to be answered.

Questions

Question 1

For this question the idea is to run the Experimenter with two different algorithms and a range of different sample sizes taken from the same training set to assess the effect of training sample size on error. You will use 2 different algorithms (plus the standard baseline, ZeroR) to generate two different sets of "learning curves" (you won't actually have to plot curves, but they will be recorded as tables) on 8 real-world datasets:

anneal.arff

audiology.arff

autos.arff

credit-a.arff

hypothyroid.arff

letter.arff

microarray.arff

vote.arff

Running the Experimenter

1. Start a new experiment, as explained in Section 3:
(a) set a filename for results destination, like 'q1.arff'
(b) the experiment should contain 10 repetitions of 10-fold cross-validation (CV);
(c) ensure "Classification" and "Data sets first" are selected;
(d) add the 8 datasets in the order in which they appear above;
(e) add 5 versions of the FilteredClassifier, set up for each version as follows:
- for the classifier, select IBk with default parameters, i.e., 1-Nearest Neighbour (1NN);
- for the filter, select Resample with "noReplacement" set to True and "sampleSizePercent" set to, respectively, 10%, 25%, 50%, 75%, and 100%.
(f) now add 1 more version of the FilteredClassifier as follows:
- for the classifier, select ZeroR;
- for the filter, select Resample with "noReplacement" set to True and "sampleSizePercent" set to 100%.
(g) check that you now have 6 entries in the Algorithms pane (5 of IBk and 1 of ZeroR).

2. Run the experiment:
(a) if there are no errors from the run, continue
(b) some error messages are not very clear, so you may need to investigate; for example, a common error is including a dataset that does not have a legal class value for the classifier learning algorithm.

3. Analyze the results as explained earlier:
(a) ensure "Show std. deviations" is on in the "Configure test" panel
(b) in "Output Format" ensure "Plain Text" is selected
(c) compare algorithms by "Percent incorrect" using ZeroR as the baseline classifier (select this as "Test base")
(d) save these results to a new file called "q1.out"
(e) Hint: if you use spreadsheets, you might like to also save these results in CSV format to a new file to slightly speed up some later analysis

4. Now go back to the Setup page and repeat the entire experiment, replacing "IBk" with the decision tree learner "J48" - if you are careful, you should only need to change the selected classifier name in the experiment configuration, so only 5 changes will be required

5. Run and Analyze the experiment as before, remembering this time to append your results to "q1.out" (and to your CSV file, if you have one). If it helps to see the experiment spelt out in code, a conceptual sketch follows below.
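
A minimal sketch of the same idea outside Weka, for reference only (it is not part of the submission). It assumes scikit-learn, with 1-NN and a majority-class baseline standing in for IBk and ZeroR, and a subsample of each training fold standing in for the Resample filter; reading the ARFF file with scipy and the naive one-hot encoding of nominal attributes are also assumptions.

    # Replicated 10-fold CV "learning curves": 1-NN vs. a majority-class baseline.
    import numpy as np
    import pandas as pd
    from scipy.io import arff
    from sklearn.dummy import DummyClassifier
    from sklearn.model_selection import RepeatedStratifiedKFold
    from sklearn.neighbors import KNeighborsClassifier

    data, _ = arff.loadarff("vote.arff")              # any one of the 8 datasets
    df = pd.DataFrame(data)
    X = pd.get_dummies(df.iloc[:, :-1]).to_numpy(dtype=float)   # naive nominal encoding
    y = df.iloc[:, -1].astype(str).to_numpy()

    cv = RepeatedStratifiedKFold(n_splits=10, n_repeats=10, random_state=1)
    rng = np.random.default_rng(1)

    for frac in (0.10, 0.25, 0.50, 0.75, 1.00):
        errors = {"1-NN": [], "baseline": []}
        for train_idx, test_idx in cv.split(X, y):
            # Subsample the training fold without replacement, mimicking the
            # Resample filter inside the FilteredClassifier.
            sub = rng.choice(train_idx, size=max(1, int(frac * len(train_idx))),
                             replace=False)
            for name, model in (("1-NN", KNeighborsClassifier(n_neighbors=1)),
                                ("baseline", DummyClassifier(strategy="most_frequent"))):
                model.fit(X[sub], y[sub])
                err = 100.0 * np.mean(model.predict(X[test_idx]) != y[test_idx])
                errors[name].append(err)
        print(f"{int(frac * 100):3d}% of training fold: "
              f"1-NN error {np.mean(errors['1-NN']):5.2f}%, "
              f"baseline error {np.mean(errors['baseline']):5.2f}%")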

Results interpretation
Answer these questions in a new file called "answers.txt". Your answers must be backed up by referring to the results you saved in "q1.out". Please note: in this assignment simply writing down a description of the results is not sufficient, and will not get any marks. When asked, you must attempt to explain why you think the results are as they are.

1a) Looking at the results for both IBk and J48 with respect to the ZeroR baseline over all training set sizes, can you observe a "learning curve" effect, i.e., does the error change depending on the proportion of training data seen by the algorithm? Is there any variation in this effect, between datasets and/or algorithms? If you think there has been variation, give a brief description of what you observed.

1b) For each algorithm, IBk and J48, over all of the datasets, find the average change in error when moving from the default prediction (ZeroR) to learning from 10% of the training set, as follows.

Let the error on ZeroR be err0 and the error on 10% of the training set be err10.

For each algorithm, calculate the percentage reduction in error relative to the default on each dataset as: (err0 - err10) / err0 × 100. Sum these values over the 8 datasets and divide by 8 to get the mean reduction for the algorithm.

Now repeat exactly the same process by comparing IBk and J48, over all of the datasets, learning from 100% of the training set, compared to default. Organise your results by grouping them into a 2 × 2 table in your file "answers.txt", something like this:

Mean error reduction relative to default

Algorithm    after 10% training    after 100% training
IBk          Your result           Your result
J48          Your result           Your result

If you observe a positive reduction in error, is this larger than, smaller than, or about the same as what you would have expected after seeing 10% of the training data? After seeing 100%?

Is the effect more pronounced for IBk, or for J48? If you think it is more pronounced for one of the algorithms, suggest an explanation. If not, say why not. [Hint: if you also saved the results tables in CSV format you can save a little time by calculating the relative and percentage reductions in error for this question in a spreadsheet, or with a short script like the sketch below.]
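
If you would rather script the 1b) arithmetic than use a spreadsheet, the calculation is just the following. The err0 and err10 values here are made-up placeholders standing in for the numbers you read out of "q1.out" or your CSV file, not real results.

    # Mean error reduction relative to the default (ZeroR), as defined in 1b).
    err0  = [25.0, 74.8, 54.6, 44.5, 7.7, 95.9, 62.0, 38.6]   # placeholder ZeroR error, one per dataset
    err10 = [12.0, 55.0, 48.0, 20.0, 5.0, 40.0, 50.0, 10.0]   # placeholder IBk error at 10% training

    reductions = [100.0 * (e0 - e10) / e0 for e0, e10 in zip(err0, err10)]
    mean_reduction = sum(reductions) / len(reductions)         # divide by the 8 datasets
    print(f"Mean error reduction relative to default: {mean_reduction:.1f}%")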

Question 2

Dealing with noisy data is a key issue in machine learning. Unfortunately, even algorithms that have noise-handling mechanisms built in, like decision trees, can overfit noisy data unless their "overfitting avoidance" or regularization parameters are set properly.

Once again we will use the FilteredClassifier, where this time the filter will be Weka's AddNoise filter. As the name suggests this adds "class noise" by randomly changing the actual class value to a different one for a specified percentage of the training data. Here we will specify three arbitrarily chosen levels of noise: low (20%), medium (50%) and high (80%). The learning algorithm must try to "see through" this noise and learn the best model it can, which is then evaluated on test data without added noise.
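
The effect of such a filter is easy to picture in code. The sketch below mirrors the idea (flip the class label of a given percentage of training examples to a randomly chosen different class); it is an illustration of the concept, not Weka's implementation.

    # Sketch of "class noise": corrupt a given percentage of training labels.
    # Test labels are left untouched, as in the experiment described above.
    import numpy as np

    def add_class_noise(y, percent, seed=0):
        rng = np.random.default_rng(seed)
        y_noisy = np.array(y, copy=True)
        classes = np.unique(y_noisy)
        n_flip = int(round(percent / 100.0 * len(y_noisy)))
        flip_idx = rng.choice(len(y_noisy), size=n_flip, replace=False)
        for i in flip_idx:
            # Replace the label with a randomly chosen *different* class.
            y_noisy[i] = rng.choice(classes[classes != y_noisy[i]])
        return y_noisy

    # e.g. y_train_noisy = add_class_noise(y_train, percent=20)   # low noise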

We will also let the algorithm do a limited search, using cross-validation, for the best pruning parameters on each training set with Weka's built-in CVParameterSelection metalearner. This is based on the "wrapper" method: for each one of a set of parameter values, the learning algorithm is run and its performance is evaluated using cross-validation. The parameter values giving the best performance are selected.

To set this up using the CVParameterSelection metalearner requires entering a string defining a set of values for the parameter using Weka's alphabetic code for the parameter (similar to the flag for a Unix command), plus three numbers, namely the minimum, maximum and increment. For example, if we entered the string "M 1 5 1" this would define the set of values {1, 2, 3, 4, 5} for J48's -M parameter, which sets the minimum number of examples that can appear in a leaf node in a decision tree.
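
For anyone who finds code clearer than the dialog boxes, a rough equivalent of this wrapper-style search outside Weka is scikit-learn's GridSearchCV, with a decision tree's min_samples_leaf standing in for J48 and its -M parameter. This is an analogy under those assumptions, not the Weka mechanism itself.

    # Wrapper-style parameter selection: evaluate each candidate parameter value
    # by cross-validation on the training data only, and keep the best one.
    from sklearn.model_selection import GridSearchCV
    from sklearn.tree import DecisionTreeClassifier

    # The candidate set {1, 2, 3, 4, 5} from the "M 1 5 1" example above.
    param_grid = {"min_samples_leaf": [1, 2, 3, 4, 5]}

    search = GridSearchCV(DecisionTreeClassifier(random_state=0),
                          param_grid, cv=10, scoring="accuracy")
    # search.fit(X_train_noisy, y_train_noisy)   # internal CV happens here
    # print(search.best_params_)                 # selected parameter value
    # error = 100.0 * (1 - search.score(X_test, y_test))   # evaluated on clean test data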

Running the Experimenter

1. Start by setting up a new experiment and select the following datasets:

glass.arff

primary-tumor.arff

balance-scale.arff

heart-h.arff

(a) set a filename for results destination, like 'q2.arff'

(b) the experiment should contain 10 repetitions of 10-fold cross-validation (CV);

(c) ensure "Classification" and "Data sets first" are selected;

(d) add the 4 datasets in the order in which they appear above;

(e) add 4 versions of the FilteredClassifier, set up for each version as follows:
- for the classifier, select CVParameterSelection from "classifiers/meta"; in the dialog ensure you set the classifier to be "J48 -C 0.01 -M 2" and CVParameters to be "M 2 30 5"
- for the filter, select AddNoise with "percent" set to, respectively, 0%, 20%, 50%, and 80%.

(f) now add 1 more version of the FilteredClassifier as follows:
- for the classifier, select J48 with default parameters;
- for the filter, select AddNoise with "percent" set to 50%.

(g) check that you now have 5 entries in the Algorithms pane (4 of J48 with CVParameterSelection and 1 of J48 with default parameters).

2. Run the experiment:

3. Analyze the results as explained earlier:

(a) ensure "Show std. deviations" is on in the "Configure test" panel

(b) in "Output Format" ensure "Plain Text" is selected

(c) compare algorithms by "Percent incorrect" using the FilteredClassifier with J48 and AddNoise at 0% as the baseline classifier (select this as "Test base")

(d) save these results to a new file called "q2.out"

Results interpretation

Answer these questions in the file "answers.txt". Your answers must be backed up by referring to the results you saved in "q2.out".

2a) Looking at the error (Percent incorrect) results for tree learning on these datasets as noise is increased, has learning managed to avoid overfitting at low, medium and high levels of added noise?

2b) Is parameter selection helping with overfitting avoidance?
How can you assess this?

Question 3

This question involves mining a dataset of California house prices, derived from census data collected in the 1990s. We will be using regression since the output is numeric. Since this problem involves attribute or feature transformations, we will need to use the Weka Explorer interface.

In the Explorer, click on the Preprocess tab at the top of the Explorer window, then click on "Open file ..." and select "houses.arff", then click on Open. Now click on the Classify tab at the top of the Explorer window, choose LinearRegression, select 10-fold cross-validation, select "median house value" as the target (class) variable and click Start in the left side panel.

Linear Regression. In the output panel you should see a regression equation with some results. Save the results list into a new file called "q3.out".

Return to the Preprocess tab and choose the filter AddExpression under "filters/unsupervised/attribute". In the dialog box, click on More to see examples of the kind of attribute or feature transformation expressions you can add and apply here. Now set up a log transform of the class variable "median house value". Be sure to click "Apply" to actually transform the values in the training set (the transform just takes place on the data in main memory, so your original dataset is not changed, and you can save the transformed data to a new file if needed). Your new variable should appear in the attributes list. Now delete the old class variable, rerun the linear regression with the new variable as the target, and save (append) the results list to your "q3.out" file.
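
For reference, here is a minimal sketch of the same two runs outside Weka. It assumes scikit-learn, that houses.arff can be read with scipy, and that the target attribute is called "median_house_value" (a hypothetical name; use whatever the file actually contains).

    # Linear regression with 10-fold CV on the raw target, then on a
    # log-transformed target (the AddExpression step above).
    import numpy as np
    import pandas as pd
    from scipy.io import arff
    from sklearn.linear_model import LinearRegression
    from sklearn.model_selection import cross_val_score

    data, _ = arff.loadarff("houses.arff")
    df = pd.DataFrame(data)
    y = df["median_house_value"].to_numpy()           # hypothetical attribute name
    X = df.drop(columns=["median_house_value"]).to_numpy(dtype=float)

    mse = -cross_val_score(LinearRegression(), X, y, cv=10,
                           scoring="neg_mean_squared_error")
    print("RMSE, raw target:", np.sqrt(mse.mean()))

    # Log-transform the class variable and refit. Note the errors are now on
    # the log scale, so they are not directly comparable to the raw numbers.
    mse_log = -cross_val_score(LinearRegression(), X, np.log(y), cv=10,
                               scoring="neg_mean_squared_error")
    print("RMSE, log target:", np.sqrt(mse_log.mean()))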

In the original research all variables (apart from latitude and longitude) were removed and replaced with a set of transformed versions. Transformations included squares, cubes, and logs of ratios. Experiment with at most two more sets of transformations to the variables, run linear regression on the transformed data, and save the output to your "q3.out" file.
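
The kinds of derived attributes mentioned above are easy to sketch with pandas in place of AddExpression. All column names below are hypothetical placeholders for whatever houses.arff actually contains.

    # Ratio, log-of-ratio, square and cube features, in the spirit of the
    # transformations used in the original research. Column names are guesses.
    import numpy as np
    import pandas as pd

    def add_ratio_and_power_features(df: pd.DataFrame) -> pd.DataFrame:
        out = df.copy()
        out["rooms_per_household"] = out["total_rooms"] / out["households"]   # ratio
        out["log_rooms_per_household"] = np.log(out["rooms_per_household"])   # log of a ratio
        out["median_income_sq"] = out["median_income"] ** 2                   # square
        out["median_income_cu"] = out["median_income"] ** 3                   # cube
        return out

    # df_transformed = add_ratio_and_power_features(df)   # df loaded as in the sketch above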

Results interpretation

Answer this in the file "answers.txt". Your answers must be backed up by referring to the results you saved in "q3.out".

3a) Have any of your variable transformations had any effect on the cross-validation error of the linear regression? If so, state which transformations, and what the effect on error was.

3b) How do you account for such a change?

Question 4

This question involves mining text data, for which Weka has a nice filter called "StringToWordVector". Each text example is a string of words with a class label, and the filter converts this to a vector of word counts in various formats. To save time this has already been done for you, reducing the set of words to a vocabulary size of 1000. The dataset contains 10,060 "snippets", short sequences of words taken from Google searches, each of which has been labelled with one of 8 classes, such as business, sports, etc.

Using a vector representation for text data means that we can use many of the standard classifier learning methods. However, the dataset is highly "sparse", in the sense that for any example nearly all of its feature values are zero. To tackle this problem, we typically apply methods of feature selection or dimensionality reduction. In this question you will investigate the effect of using RandomProjection, a dimensionality reduction method. As in the previous question, it is slightly easier to do these tasks with the Weka Explorer interface rather than the Experimenter.
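
To get a feel for what random projection does, here is a tiny sketch using scikit-learn's GaussianRandomProjection on a synthetic sparse matrix with the same shape as the snippets data (10,060 examples, 1000 word counts). It illustrates the idea only; it is not a reproduction of Weka's RandomProjection filter, and the matrix contents and density are arbitrary placeholders.

    # Random projection maps a sparse 1000-dimensional bag-of-words matrix down
    # to a small number of dense attributes. The matrix here is random filler.
    from scipy.sparse import random as sparse_random
    from sklearn.random_projection import GaussianRandomProjection

    X = sparse_random(10060, 1000, density=0.01, format="csr", random_state=0)
    proj = GaussianRandomProjection(n_components=10, random_state=0)
    X_small = proj.fit_transform(X)
    print(X.shape, "->", X_small.shape)   # (10060, 1000) -> (10060, 10)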

Training text classifiers and dimensionality reduction

In the Explorer, load the file "web snippets.arff". Click on the Classify tab at the top of the Explorer window and choose NaiveBayesMultinomial (MNB) from "bayes". In "Test options" select "Cross-validation". Be sure that "snip class" is selected as the target variable. Since MNB has no parameters to set, click Start in the left side panel. When MNB has finished, right-click in the Result list and save this output into a new file called "q4.out".

Now return to the Preprocess tab and choose the unsupervised attribute filter "RandomProjection". The "numberOfAttributes" parameter should be set to 10 (the default) and all other parameters should be left at their default values. Now click on Apply and a dataset will be generated with the new attributes (denoted K1 to K10). Go to the Classify tab, select J48 and run it with the same settings as MNB above, saving (appending) the results to "q4.out". We cannot run Weka's MNB on this transformed dataset (the projected attributes can take negative values, which MNB's count-based model cannot handle), so instead just select "NaiveBayes" and run it with the same settings as MNB and J48, saving the results to "q4.out" as before. Repeat this process twice more, varying only the number of attributes found by "RandomProjection" to obtain new datasets of 100 and 500 attributes, respectively, and saving the results of cross-validated J48 and NaiveBayes to "q4.out". This should give a total of 7 results from the learning runs: 1 from MNB on the untransformed data, plus 2 (one from J48 and one from NaiveBayes) for each of the reduced datasets with 10, 100 and 500 attributes.
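
As a cross-check on your understanding (not a replacement for the Weka runs), the whole procedure looks roughly like this in scikit-learn, with MultinomialNB, DecisionTreeClassifier and GaussianNB standing in for Weka's MNB, J48 and NaiveBayes. Reading the ARFF file this way and treating the last attribute as the class are assumptions.

    # MNB on the raw word-count vectors, then a decision tree and naive Bayes
    # on randomly projected versions with 10, 100 and 500 attributes.
    import pandas as pd
    from scipy.io import arff
    from sklearn.model_selection import cross_val_score
    from sklearn.naive_bayes import MultinomialNB, GaussianNB
    from sklearn.random_projection import GaussianRandomProjection
    from sklearn.tree import DecisionTreeClassifier

    data, _ = arff.loadarff("web snippets.arff")
    df = pd.DataFrame(data)
    y = df.iloc[:, -1].astype(str).to_numpy()        # assumes the class is the last attribute
    X = df.iloc[:, :-1].to_numpy(dtype=float)        # 1000 word-count attributes

    acc = cross_val_score(MultinomialNB(), X, y, cv=10).mean()
    print(f"MNB, raw counts: {100 * (1 - acc):.2f}% error")

    for k in (10, 100, 500):
        Xk = GaussianRandomProjection(n_components=k, random_state=0).fit_transform(X)
        for name, clf in (("tree", DecisionTreeClassifier(random_state=0)),
                          ("naive Bayes", GaussianNB())):
            acc = cross_val_score(clf, Xk, y, cv=10).mean()
            print(f"{name}, {k} projected attributes: {100 * (1 - acc):.2f}% error")

Whatever numbers a sketch like this produces, the answers you submit must be based on the Weka results you saved in "q4.out".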
