Apply this threshold to our original/real test set and find


In Section 3.6.3 we used the test set that we had put aside both to select τ, the threshold for the log odds, and to evaluate the Type I and Type II errors incurred when we use this threshold. Ideally, we would choose τ from another set of messages that is independent of both our training data and our test data. Cross-validation is designed for this purpose: it uses the training set alone both to fit the model and to validate it. Implement 5-fold cross-validation to choose τ and assess the error rate with our training data. To do this, follow these steps:

(a) Use the sample() function to permute the indices of the training set, and organize these permuted indices into 5 equal-size sets, called folds.

(b) For each fold, take the corresponding subset from the training data to use as a 'test' set. Use the remaining messages in the training data as the training set. Apply the functions developed in Section 3.6 to estimate the probabilities that a word occurs in a message given it is spam or ham, and use these probabilities to compute the log likelihood ratio (LLR) for the messages in the held-out fold, i.e., the 'test' set.

(c) Pool all of the LLR values from the messages in all of the folds, i.e., from all of the training data, and use these values and the typeIErrorRate() function to select a threshold that achieves a 1% Type I error rate.

(d) Apply this threshold to our original/real test set and find its Type I and Type II errors.
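
The steps above can be sketched in R as follows. This is a minimal illustration under stated assumptions, not the solution from Section 3.6. The data are simulated toy messages. estimateWordProbs(), messageLLR(), and drawMsg() are hypothetical stand-ins for the functions developed in that section, and typeIErrorRate() is given an assumed signature. The sketch also assumes that each fold's messages are scored by a model fit to the other four folds, and that a Type I error means a ham message with an LLR above τ.

```r
set.seed(418)  # arbitrary seed so the permutation is reproducible

# Toy stand-in data (an assumption): each message is a vector of unique words.
vocab <- c("free", "winner", "cash", "meeting", "report", "lunch")
drawMsg <- function(spam) {
  p <- if (spam) c(.7, .6, .6, .1, .1, .2) else c(.1, .1, .2, .6, .6, .5)
  vocab[runif(length(vocab)) < p]
}
n <- 200
isSpam <- rep(c(TRUE, FALSE), each = n / 2)
msgs <- lapply(isSpam, drawMsg)

# Hypothetical stand-in for the Section 3.6 estimation step: smoothed
# log odds for a word being present in, or absent from, spam versus ham.
estimateWordProbs <- function(msgs, isSpam, vocab) {
  countIn <- function(ms)
    sapply(vocab, function(w) sum(vapply(ms, function(m) w %in% m, logical(1))))
  pSpam <- (countIn(msgs[isSpam]) + 0.5) / (sum(isSpam) + 1)
  pHam  <- (countIn(msgs[!isSpam]) + 0.5) / (sum(!isSpam) + 1)
  list(present = log(pSpam) - log(pHam),
       absent  = log(1 - pSpam) - log(1 - pHam))
}

# Hypothetical stand-in: log likelihood ratio of one message.
messageLLR <- function(words, probs, vocab) {
  here <- vocab %in% words
  sum(probs$present[here]) + sum(probs$absent[!here])
}

# Assumed signature: share of ham messages whose LLR exceeds tau.
typeIErrorRate <- function(tau, llrVals, isSpam) mean(llrVals[!isSpam] > tau)

# (a) Permute the training indices and deal them into 5 equal-size folds.
K <- 5
folds <- split(sample(n), rep(1:K, length.out = n))

# (b) Fit on four folds, then score the messages in the held-out fold.
llr <- numeric(n)
for (idx in folds) {
  probs <- estimateWordProbs(msgs[-idx], isSpam[-idx], vocab)
  llr[idx] <- sapply(msgs[idx], messageLLR, probs = probs, vocab = vocab)
}

# (c) Pool the LLR values; take the smallest threshold with Type I rate <= 1%.
candidates <- sort(unique(llr))
rates <- sapply(candidates, typeIErrorRate, llrVals = llr, isSpam = isSpam)
tau <- min(candidates[rates <= 0.01])

# (d) Refit on all training data and apply tau to a (simulated) test set.
testSpam <- rep(c(TRUE, FALSE), each = 50)
testMsgs <- lapply(testSpam, drawMsg)
probsAll <- estimateWordProbs(msgs, isSpam, vocab)
testLLR <- sapply(testMsgs, messageLLR, probs = probsAll, vocab = vocab)
typeI  <- mean(testLLR[!testSpam] > tau)   # ham flagged as spam
typeII <- mean(testLLR[testSpam] <= tau)   # spam that passes as ham
```

Searching over the observed LLR values rather than taking a quantile guarantees that the pooled Type I rate does not exceed 1%, even when many messages share the same LLR. With the real data, msgs, isSpam, and the test set would come from the corpus prepared earlier in the chapter, and the stand-in functions would be replaced by the ones from Section 3.6.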
