Build a neural


Plagiarism is the submission of somebody else's work in a manner that gives the impression that the work is your own. The Department of Computer Science and Information Technology at La Trobe University treats plagiarism very seriously. When it is detected, penalties are strictly imposed.

1. In this question, we are going to build a neural network (NN) classifier to predict red wine quality (represented by an integer ranging from 0 to 10, higher means better) using a set of chemical properties. These properties are presented as attributes below:

fixed acidity, volatile acidity, citric acid, residual sugar, chlorides, free sulfur dioxide, total sulfur dioxide, density, pH, sulphates, alcohol

The last attribute quality is the class label.
The dataset needs to be split into training and testing datasets. Download the program "DataSplit2.exe" and execute it. Enter your student ID, specify the locations of the red wine dataset file, and the destination folder.

Click the "OK" button to split the dataset. Note that your training and testing datasets are unique to you, so make sure the student ID is entered correctly. You are required to submit both generated datasets (training and testing); otherwise, no marks will be given for this question.

a. Load both datasets into the MATLAB workspace. It is recommended to separate the class label (i.e. the attribute quality) from other attributes such that all the class labels of a dataset are stored in a matrix. As a result, there are four matrices after the import process, two for the attribute values from both datasets, and the other two for the class labels.

The class labels require encoding before they can be used for training and evaluating the NN classifier. Since there are 11 distinct class values (0 - 10), each class label is encoded into a column vector of 11 × 1. For a class value k, the k + 1 th row of the column vector is set to 1, while the others are zero. For example, if the class label is 4, then it is encoded into a column vector:
    [ 0 ]
    [ 0 ]
    [ 0 ]
    [ 0 ]
    [ 1 ]
    [ 0 ]
    [ 0 ]
    [ 0 ]
    [ 0 ]
    [ 0 ]
    [ 0 ]

Therefore, if the dataset has N samples, then the class labels are encoded into an 11 × N matrix.
Implement this encoding as a MATLAB function and submit its source code as a MATLAB function file (.m file). In your written answer, state clearly which input argument(s) the function expects and what it returns. (2 marks)
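The encoding rule described above can be sketched independently of MATLAB. The following Python snippet is only an illustration of the rule, not the required .m file; the function name encode_labels and the parameter num_classes are assumptions made for this sketch. Note that Python indexes rows from 0, so class value k lands in row index k, which corresponds to the (k + 1)-th row in MATLAB's 1-based indexing.

```python
def encode_labels(labels, num_classes=11):
    """Return a num_classes x N matrix (a list of rows) with exactly one 1 per column.

    labels      -- sequence of N integer class values in 0..num_classes-1
    num_classes -- number of distinct class values (11 for quality 0-10)
    """
    encoded = [[0] * len(labels) for _ in range(num_classes)]
    for col, k in enumerate(labels):
        if not 0 <= k < num_classes:
            raise ValueError(f"class value {k} is outside 0..{num_classes - 1}")
        # 0-based row k here is the (k + 1)-th row in 1-based indexing.
        encoded[k][col] = 1
    return encoded


# Three made-up labels, purely for illustration.
matrix = encode_labels([4, 0, 10])
```

Each column of the result holds one sample, which matches the 11 × N layout described above.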

b. The NN classifier is created using the following parameters:
Number of hidden layers: 1
Number of neurons: 10

Use default settings for other parameters. Train the classifier using the training dataset. Show the training performance by pasting the performance curve in your answer. Submit your MATLAB script file for this training.
Hint: Check carefully the dimension arrangement of the NN classifier, i.e. whether it considers a row or a column as a tuple.

c. Use the NN classifier to predict the qualities of the samples in the testing dataset. Obtain and show the confusion matrix. What is the accuracy of the classifier? Submit your MATLAB script file for this testing and evaluation.
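As a reminder of how a confusion matrix and accuracy relate, here is a minimal Python sketch. It is not the MATLAB script requested above, and the names decode_outputs, confusion_matrix and accuracy are hypothetical. It assumes the classifier returns an 11 × N matrix of scores (one column per sample) and that the predicted class is the row with the largest score. The labels in the usage lines are made up for illustration only.

```python
def decode_outputs(outputs):
    """Turn a C x N score matrix (list of rows) into N class values by column-wise argmax."""
    num_classes, num_samples = len(outputs), len(outputs[0])
    return [max(range(num_classes), key=lambda k: outputs[k][col])
            for col in range(num_samples)]


def confusion_matrix(true_labels, predicted_labels, num_classes=11):
    """counts[t][p] = number of samples with true class t that were predicted as class p."""
    counts = [[0] * num_classes for _ in range(num_classes)]
    for t, p in zip(true_labels, predicted_labels):
        counts[t][p] += 1
    return counts


def accuracy(counts):
    """Fraction of samples on the diagonal, i.e. predicted class equals true class."""
    total = sum(sum(row) for row in counts)
    correct = sum(counts[i][i] for i in range(len(counts)))
    return correct / total if total else 0.0


# Made-up labels: four of the five predictions agree with the true class.
true_q = [5, 6, 5, 7, 6]
pred_q = [5, 5, 5, 7, 6]
cm = confusion_matrix(true_q, pred_q)
acc = accuracy(cm)
```

In this sketch rows are true classes and columns are predicted classes; plotting tools may use the opposite orientation, so check the axis labels before reading off the numbers.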

Please submit your MATLAB source codes for parts (a) - (c) in separate MATLAB function/script files. No marks will be given to your answer unless the relevant source codes are submitted. Remember to submit the training and testing datasets as well.

2. We are going to mine some association rules from the supermarket transactions using WEKA.

Download the program "TransactionDataGenerator.exe" and execute it. Enter your student ID and specify the location of the destination folder, then click the "OK" button. A transaction file will be generated in CSV format. Each row represents a single transaction: the first item is the transaction ID and the remaining items are the goods bought by the customer. You are required to submit the generated transaction dataset; otherwise, no marks will be given for this question.

a. The transaction file generated must be converted to an attribute format (see appendix) that can be imported to WEKA. For example, a transaction file consists of five transactions as follows:

T001, jam
T002, bread, jam
T003, bread, butter
T004, jam
T005, bread

The converted format is shown below:

t_id    bread   butter  jam
T001                    t
T002    t               t
T003    t       t
T004                    t
T005    t

The converted transactions can be saved in CSV format. The content of the above converted format in CSV is like this:

tid,bread,butter,jam
T001,,,t
T002,t,,t
T003,t,t,
T004,,,t
T005,t,,

Write a MATLAB conversion program for this task. Submit your MATLAB script file for this conversion; otherwise, no marks will be given for this part. The list of all items is available in the Appendix.

Hints:
i. Since the transactions contain different numbers of items, it is recommended to read each whole transaction as a string, i.e. all N transactions are put in an N × 1 cell array. You may find functions such as textscan or importdata useful.

ii. Following (i), it is then necessary to separate the transaction ID and every item in a single transaction. The delimiter is a comma (","). You may find the regular expression function regexp useful.

iii. A transaction schema (i.e. all possible transaction items in the header line of the above converted format) is needed. Your transactions might not cover all the items, but this does not affect the final results.

iv. The transaction schema should be implemented as an array in your source code. The item order in the array should be identical to the item order in the header line. This helps determine which column receives a 't' label for a transaction. You may find the function ismember useful.
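The steps in hints (i) to (iv) can be sketched as follows. This is a Python illustration of the conversion logic only, not the MATLAB script that must be submitted. The function name convert_transactions is hypothetical, and the three-item SCHEMA is taken from the small example above; the real schema is the full item list in the Appendix.

```python
def convert_transactions(lines, schema):
    """Convert raw 'tid, item, item, ...' lines into attribute-format CSV rows.

    lines  -- one string per transaction, read whole as in hint (i)
    schema -- all possible items, in header order, as in hints (iii) and (iv)
    """
    rows = ["tid," + ",".join(schema)]
    for line in lines:
        # Hint (ii): split on commas and trim the surrounding spaces.
        fields = [f.strip() for f in line.split(",") if f.strip()]
        if not fields:
            continue  # skip empty lines
        tid, items = fields[0], set(fields[1:])
        # Hint (iv): the schema order decides which column receives the 't' label.
        flags = ["t" if item in items else "" for item in schema]
        rows.append(",".join([tid] + flags))
    return rows


# Assumption: schema and transactions copied from the five-transaction example.
SCHEMA = ["bread", "butter", "jam"]
raw = ["T001, jam", "T002, bread, jam", "T003, bread, butter", "T004, jam", "T005, bread"]
converted = convert_transactions(raw, SCHEMA)
```

Items missing from a transaction become empty fields, so every output row has the same number of columns as the header.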

b. Mine the association rules from the transactions using WEKA. Specify which algorithm you select and the related parameters such as minimum support and confidence. List the best 10 rules discovered with highest possible support and confidence.
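For reference, support and confidence can be checked by hand on a small dataset. The Python sketch below uses the five example transactions from part (a), not your generated dataset, and the helper names count, support and confidence are hypothetical. It uses the usual definitions: the support of a rule A => B is the fraction of transactions containing every item of A and B, and its confidence is that count divided by the number of transactions containing A.

```python
# The five example transactions from part (a), as item sets.
TRANSACTIONS = [
    {"jam"},
    {"bread", "jam"},
    {"bread", "butter"},
    {"jam"},
    {"bread"},
]


def count(transactions, itemset):
    """Number of transactions that contain every item in itemset."""
    itemset = set(itemset)
    return sum(1 for t in transactions if itemset <= t)


def support(transactions, antecedent, consequent):
    """Fraction of all transactions containing antecedent and consequent together."""
    both = set(antecedent) | set(consequent)
    return count(transactions, both) / len(transactions)


def confidence(transactions, antecedent, consequent):
    """Among transactions containing the antecedent, the fraction also containing the consequent."""
    n_antecedent = count(transactions, antecedent)
    if n_antecedent == 0:
        return 0.0
    both = set(antecedent) | set(consequent)
    return count(transactions, both) / n_antecedent


# Rule butter => bread: one transaction (T003) contains both, and it is the only one with butter.
s = support(TRANSACTIONS, {"butter"}, {"bread"})
c = confidence(TRANSACTIONS, {"butter"}, {"bread"})
```

A rule can reach full confidence on very few transactions, as butter => bread does here, which is why support and confidence are reported together.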

c. Suggest a potential problem you might have when inspecting the association rules mining results.

3. A training dataset is provided as follows:

Weather outlook   Temperature   Wind     Sports
Sunny             20            Strong   Outdoor
Cloudy            7             Weak     Indoor
Cloudy            15            Mild     Outdoor
Sunny             33            Mild     Outdoor
Rainy             10            Mild     Indoor
Cloudy            27            Weak     Outdoor
Rainy             15            Strong   Indoor
Sunny             9             Mild     Outdoor
Sunny             30            Strong   Indoor
Rainy             25            Weak     Outdoor

The class label is Sports. Using Naïve Bayesian classification, predict the class label (i.e. indoor or outdoor sports) for each of the following four tuples (a - d). Show your calculations.

 

     Weather outlook   Temperature   Wind
a    Sunny             32            Strong
b    Rainy             28            Mild
c    Cloudy            10            Weak
d    Sunny             16            Mild
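The calculation can be cross-checked with a short script. The Python sketch below is one possible setup, under assumptions the question does not state: Temperature is treated as a continuous attribute with a Gaussian likelihood per class (using the sample standard deviation), the categorical attributes use plain relative frequencies, and no Laplace smoothing is applied. If your lecture notes discretise Temperature or smooth the counts, the numbers will differ. All function names are hypothetical.

```python
import math
from collections import Counter, defaultdict

# Training data from the table above: (outlook, temperature, wind, sports).
TRAIN = [
    ("Sunny", 20, "Strong", "Outdoor"), ("Cloudy", 7, "Weak", "Indoor"),
    ("Cloudy", 15, "Mild", "Outdoor"), ("Sunny", 33, "Mild", "Outdoor"),
    ("Rainy", 10, "Mild", "Indoor"), ("Cloudy", 27, "Weak", "Outdoor"),
    ("Rainy", 15, "Strong", "Indoor"), ("Sunny", 9, "Mild", "Outdoor"),
    ("Sunny", 30, "Strong", "Indoor"), ("Rainy", 25, "Weak", "Outdoor"),
]


def gaussian_pdf(x, mean, std):
    """Density of a normal distribution with the given mean and standard deviation."""
    return math.exp(-((x - mean) ** 2) / (2 * std ** 2)) / (std * math.sqrt(2 * math.pi))


def fit(rows):
    """Estimate the prior, categorical counts and temperature mean/std for each class."""
    by_class = defaultdict(list)
    for row in rows:
        by_class[row[3]].append(row)
    model = {}
    for label, members in by_class.items():
        temps = [m[1] for m in members]
        mean = sum(temps) / len(temps)
        # Assumption: sample variance (divide by n - 1).
        var = sum((t - mean) ** 2 for t in temps) / (len(temps) - 1)
        model[label] = {
            "n": len(members),
            "prior": len(members) / len(rows),
            "outlook": Counter(m[0] for m in members),
            "wind": Counter(m[2] for m in members),
            "mean": mean,
            "std": math.sqrt(var),
        }
    return model


def class_scores(model, outlook, temperature, wind):
    """Unnormalised score P(class) * P(outlook|class) * P(temp|class) * P(wind|class)."""
    return {
        label: p["prior"]
        * p["outlook"][outlook] / p["n"]
        * gaussian_pdf(temperature, p["mean"], p["std"])
        * p["wind"][wind] / p["n"]
        for label, p in model.items()
    }


def predict(model, outlook, temperature, wind):
    """Return the class with the largest score."""
    scores = class_scores(model, outlook, temperature, wind)
    return max(scores, key=scores.get)


model = fit(TRAIN)
scores_d = class_scores(model, "Sunny", 16, "Mild")  # tuple d
```

Every outlook and wind value occurs at least once in both classes in this training set, so the unsmoothed frequencies never reduce a score to zero here.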

4. a. Describe the minimum spanning tree (MST) in hierarchical clustering and illustrate its construction using at least five unique 2D data points (e.g. (2, 1), (3, 3), etc.).

b. Suggest a way to generate an MST from a set of data points without using the MST-building algorithm in the lecture notes, and explain why it works.

(Hint: An alternative way has been covered in the lecture notes)
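To illustrate what constructing an MST over 2D points involves, here is a minimal Python sketch using Kruskal's algorithm with a union-find structure. Kruskal's algorithm is chosen here only as a well-known option; it is not necessarily the algorithm, or the alternative, that the lecture notes have in mind. The points (2, 1) and (3, 3) come from the question; the other three points are made up for this example.

```python
import math
from itertools import combinations


def minimum_spanning_tree(points):
    """Kruskal's algorithm on the complete graph with Euclidean edge weights.

    Returns a list of (i, j, weight) edges in the order they were added.
    """
    parent = list(range(len(points)))

    def find(i):
        # Follow parent links to the component root, halving paths on the way.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # All pairwise edges, shortest first.
    edges = sorted(
        (math.dist(points[i], points[j]), i, j)
        for i, j in combinations(range(len(points)), 2)
    )
    tree = []
    for weight, i, j in edges:
        root_i, root_j = find(i), find(j)
        if root_i != root_j:  # the edge joins two separate components, so no cycle
            parent[root_i] = root_j
            tree.append((i, j, weight))
    return tree


# (2, 1) and (3, 3) are from the question; the remaining points are hypothetical.
POINTS = [(2, 1), (3, 3), (6, 3), (6, 7), (2, 6)]
mst = minimum_spanning_tree(POINTS)
total_length = sum(weight for _, _, weight in mst)
```

Because edges are added from shortest to longest, each accepted edge joins the two closest components that are still separate, which is the same merge order that single-linkage agglomerative clustering produces.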

