10-701: Machine Learning - Midterm Solutions

Q1. Probability and Density Estimation

1. If two binary random variables X and Y are independent, are X̄ (the complement of X) and Y also independent? Prove your claim.
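A sketch of one standard argument (assuming X̄ denotes the event X = 0, the complement of X = 1):

```latex
% Sketch: writing \bar{X} for the event X = 0, independence of (X, Y) gives
P(\bar{X}, Y) = P(Y) - P(X, Y) = P(Y) - P(X)\,P(Y) = \bigl(1 - P(X)\bigr)P(Y) = P(\bar{X})\,P(Y).
```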

2. There are two coins, C1 and C2. C1 has an equal prior on heads (H = 1) and tails (T = 0), and the outcome of C2 depends on C1: if C1 is a head, C2 will be a head with probability 0.7; if C1 is a tail, C2 will be a head with probability 0.5. C1 and C2 are tossed in sequence once, and the observed sum of the two coins is S = C1 + C2 = 1. What is the probability that C1 = T and C2 = H? (Hint: use Bayes' theorem.)
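As a numerical cross-check of the Bayes-rule computation, the posterior can be obtained by enumerating the joint distribution; this minimal sketch uses only the probabilities given in the problem statement.

```python
# Enumerate the joint distribution of (C1, C2) and condition on S = C1 + C2 = 1.
p_c1 = {1: 0.5, 0: 0.5}              # equal prior on head (1) / tail (0)
p_c2_given_c1 = {1: 0.7, 0: 0.5}     # P(C2 = head | C1)

joint = {
    (c1, c2): p_c1[c1] * (p_c2_given_c1[c1] if c2 == 1 else 1 - p_c2_given_c1[c1])
    for c1 in (0, 1) for c2 in (0, 1)
}

p_s1 = sum(p for (c1, c2), p in joint.items() if c1 + c2 == 1)   # P(S = 1)
posterior = joint[(0, 1)] / p_s1                                 # P(C1 = T, C2 = H | S = 1)
print(posterior)                                                 # 0.25 / (0.25 + 0.15) = 0.625
```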

3. We estimate the head probability θ of a coin from the results of N flips, using pseudo-counts to influence the "fairness" of the coin. This is equivalent to using which distribution as a prior for θ?
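For context (not part of the question): pseudo-counts act like the parameters of a Beta prior in the MAP estimate, as the small sketch below illustrates; the prior parameters a and b are illustrative choices, not values from the exam.

```python
# Sketch: MAP estimate of the head probability with a Beta(a, b) prior.
# Adding (a - 1) pseudo-heads and (b - 1) pseudo-tails to the observed counts
# gives the same estimate, which is why pseudo-counts correspond to a Beta prior.
def map_head_prob(heads: int, tails: int, a: float = 2.0, b: float = 2.0) -> float:
    return (heads + a - 1) / (heads + tails + a + b - 2)

print(map_head_prob(3, 7))   # 10 flips, 3 heads, Beta(2, 2) prior -> 4/12 ≈ 0.333
```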

4. Assume we computed the parameters for a Naive Bayes classifier. How can we use these parameters to compute the density P(X) of an observed vector X = {x1, x2, . . . ,xn}?
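For reference, a sketch of the marginalization this question points at, assuming the learned class priors P(Y) and per-feature conditionals P(x_i | Y):

```latex
% Sketch: marginalize over the class variable Y and apply the Naive Bayes factorization.
P(X) = \sum_{y} P(Y = y)\, P(X \mid Y = y) = \sum_{y} P(Y = y) \prod_{i=1}^{n} P(x_i \mid Y = y).
```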

Q2. K-Nearest Neighbors Classification

Consider K-NN using Euclidean distance on the following data set (each point belongs to one of two classes: + and o).

[Figure: training points belonging to the two classes, + and o]

1. What is the leave-one-out cross-validation error when using 1-NN?

2. Which of the following values of k leads to the minimum number of validation errors: 3, 5 or 9? What is the error for that k?
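A minimal sketch of leave-one-out cross-validation for k-NN with Euclidean distance; since the actual points live in the figure above, the coordinates and labels below are hypothetical stand-ins.

```python
import numpy as np

# Hypothetical 2-D points and binary labels standing in for the figure above.
X = np.array([[0., 0.], [1., 0.], [0., 1.], [5., 5.], [6., 5.], [5., 6.]])
y = np.array([0, 0, 0, 1, 1, 1])

def loocv_errors(X, y, k):
    """Number of leave-one-out errors for a k-NN classifier with Euclidean distance."""
    errors = 0
    for i in range(len(X)):
        d = np.linalg.norm(X - X[i], axis=1)   # distances to every point
        d[i] = np.inf                          # hold out the point itself
        neighbors = np.argsort(d)[:k]
        prediction = 1 if y[neighbors].mean() > 0.5 else 0   # majority vote
        errors += int(prediction != y[i])
    return errors

for k in (1, 3, 5, 9):
    print(k, loocv_errors(X, y, k))
```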

Q3. Linear Regression

1. We would like to use the following regression function:

y = w^2 x + wx        (1)

where x is a scalar variable. Given a set of training data points {(xi, yi)}, derive a solution for w. Simplify as much as you can.
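One possible route (a sketch, assuming squared-error loss and reading the model as y = (w^2 + w) x):

```latex
% Sketch: least squares for y = (w^2 + w) x, treating c = w^2 + w as an
% intermediate parameter and assuming 1 + 4c^* >= 0 so that w is real.
\begin{align*}
J(w) &= \sum_i \bigl(y_i - (w^2 + w)\,x_i\bigr)^2, \qquad c := w^2 + w,\\
\frac{\partial J}{\partial c} &= 0 \;\Longrightarrow\; c^* = \frac{\sum_i x_i y_i}{\sum_i x_i^2},\\
w^2 + w - c^* &= 0 \;\Longrightarrow\; w = \frac{-1 \pm \sqrt{1 + 4c^*}}{2}.
\end{align*}
```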

2. We would like to compare the regression model in (1) to the following regression model:

y = wx

(a) Given limited training data, which model would fit the training data better:

i. Model 1

ii. Model 2

iii. Both will fit the data equally well

iv. Impossible to tell

(b) Given limited training data, which model would fit the test data better:

i. Model 1

ii. Model 2

iii. Both will fit the data equally well

iv. Impossible to tell

Q4. Logistic Regression

1. Logistic regression is named after the log-odds of success (the logit of the probability) defined as

ln( P[Y = 1 | X = x] / P[Y = 0 | X = x] )

Show that the log-odds of success is a linear function of x.
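A sketch of the usual derivation, assuming the standard logistic model P[Y = 1 | X = x] = 1 / (1 + exp(-(w0 + w·x))):

```latex
% Sketch, assuming P[Y=1 | X=x] = 1 / (1 + e^{-(w_0 + w^\top x)}),
% so that P[Y=0 | X=x] = e^{-(w_0 + w^\top x)} / (1 + e^{-(w_0 + w^\top x)}).
\begin{align*}
\ln\frac{P[Y=1 \mid X=x]}{P[Y=0 \mid X=x]}
  &= \ln\frac{1/\bigl(1 + e^{-(w_0 + w^\top x)}\bigr)}{e^{-(w_0 + w^\top x)}/\bigl(1 + e^{-(w_0 + w^\top x)}\bigr)}
   = \ln e^{\,w_0 + w^\top x}
   = w_0 + w^\top x,
\end{align*}
% which is linear in x.
```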

2. How do the probabilities of the outcomes y change as a function of x in the logistic regression model? How does this limit the classification ability of logistic regression?

3. Compare the decision boundaries of Gaussian Naive Bayes and Logistic Regression on the following data sets. Draw the Logistic Regression decision boundary with a solid line and the Gaussian Naive Bayes boundary with a dashed line. If one (or both) of the methods cannot classify the data, indicate this fact and write one sentence explaining why not.

[Figure: data sets for comparing the two decision boundaries]

Q5. Learning theory

Assume two classes, denote the number of samples in each class by N, and denote the dimension of the input space by D. State or compute the VC dimension of each of the following classifiers. State your assumptions and briefly justify your answers; use drawings if necessary.

1. A K-nearest neighbor classifier with K = 1.

2. A single-layer perceptron classifier.

3. Assume D = 2. Consider a square classifier that assigns points inside the square to one class and points outside it to the other class. Draw a scenario in which this classifier shatters all the points, for the VC dimension you have proposed.

Q6. Decision Trees

1. We are trying to classify individuals as Males (M) or Females (F) based on weight and height information. The tables below summarize the information we collected for training a model:

Weight           > 130      <= 130
  M                45           5
  F                30          10

Weight           > 150      <= 150
  M                30          20
  F                10          30

Height           > 5 feet   <= 5 feet
  M                50           0
  F                30          10

Height           > 6 feet   <= 6 feet
  M                15          35
  F                 0          40

Useful entropy values: H(3/5) = 0.97, H(1/3) = 0.92, H(3/4) = 0.81, H(4/5) = 0.72, H(3/8) = 0.95, H(7/15) = 0.99, H(1) = 0.
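The H(·) values above are binary entropies of the indicated fractions. The following minimal sketch (not required for the exam) shows how a candidate split can be scored with information gain, using the Weight > 130 table as an example.

```python
import math

def H(p: float) -> float:
    """Binary entropy of a Bernoulli(p) variable, in bits."""
    if p in (0.0, 1.0):
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

def info_gain(m_left, f_left, m_right, f_right):
    """Information gain of a binary split, given male/female counts on each side."""
    n_left, n_right = m_left + f_left, m_right + f_right
    n = n_left + n_right
    parent = H((m_left + m_right) / n)
    children = (n_left / n) * H(m_left / n_left) + (n_right / n) * H(m_right / n_right)
    return parent - children

# Weight > 130 split from the first table: M 45/5, F 30/10.
print(info_gain(45, 30, 5, 10))
```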

(a) Using these tables, construct two decision trees (one using only weight and the other using only height). You may find some of the H values listed above useful for constructing these trees.

[Figure]

(b) What is the expected error of each of the trees you constructed?

(c) Is there a way to combine the two trees to classify a new input sample?

(d) Assume we have all the training data summarized in the tables above, and we use it to construct a single tree (so, for each individual, we have 5 binary values: the label (M/F), whether their height is > 5 feet, whether their height is > 6 feet, whether their weight is > 130, and whether their weight is > 150). Select the most appropriate answer from below and briefly explain. When comparing this tree to the method you proposed in (c) on a test set, the most likely outcome is:

i. The single tree would outperform the two trees approach.

ii. The two trees approach will outperform the single tree approach.

iii. Both methods would achieve similar results.

Q7. SVM

1. (True/False) When the data is not completely linearly separable, the linear SVM without slack variables returns w = 0.

2. Assume we are using the primal version of the SVM optimization objective for the non-linearly-separable case (with slack variables). What do we need to do to guarantee that the resulting model linearly separates the training data?

3. (True/False) After training an SVM, we can discard all examples that are not support vectors and still classify new examples.

4. Show that if k1(x, x') and k2(x, x') are kernels, then k(x, x') = k1(x, x')k2(x, x') is also a kernel.
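A quick numerical sanity check (a sketch, not a substitute for the proof): the element-wise product of two valid kernel Gram matrices should remain positive semi-definite.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(5, 3))                       # 5 points in R^3

K1 = X @ X.T                                      # linear-kernel Gram matrix
K2 = np.exp(-0.5 * np.sum((X[:, None] - X[None]) ** 2, axis=-1))  # RBF-kernel Gram matrix

K = K1 * K2                                       # element-wise (Hadamard) product
print(np.linalg.eigvalsh(K).min() >= -1e-10)      # True: K stays positive semi-definite
```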

5. Consider a two-class classification problem with a data set of inputs {x1 = (-1, -1), x2 = (-1, +1), x3 = (+1, -1), x4 = (+1, +1)} and a corresponding set of targets {t1, t2, t3, t4}, where ti ∈ {+1, -1}. Using this feature space directly (no kernel trick), can we build an SVM that perfectly classifies this data set regardless of the values of the ti's?
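For intuition, a sketch of why one particular labeling is problematic, assuming the XOR-style assignment t = (+1, -1, -1, +1):

```latex
% Sketch: the XOR-style labeling t = (+1, -1, -1, +1) admits no linear separator
% sign(w_1 z_1 + w_2 z_2 + b):
\begin{align*}
t_1 = t_4 = +1 &:\quad -w_1 - w_2 + b > 0 \ \text{and}\ \ w_1 + w_2 + b > 0 \;\Longrightarrow\; b > 0,\\
t_2 = t_3 = -1 &:\quad \ \ w_1 - w_2 - b > 0 \ \text{and}\ -w_1 + w_2 - b > 0 \;\Longrightarrow\; b < 0,
\end{align*}
% a contradiction, so this labeling cannot be realized by any linear boundary.
```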

Q8. Neural networks

1. (True/False) Increasing the number of layers always decreases the classification error on test data.

2. We are trying to classify samples that contain only binary features. Can the following three classification algorithms be implemented as feed-forward neural networks for such data, using units with hard-threshold or linear activation functions? For each, say yes/no. If yes, briefly explain how (no need to draw). If no, briefly explain why not.

(a) Naive Bayes with binary features

(b) Decision Trees with binary features

(c) 1-NN

Q9. Hierarchical Clustering

1. Assume we are trying to cluster the points 2^0, 2^1, 2^2, . . . , 2^n (a total of n points, where n = 2^N) using hierarchical clustering. If we are using Euclidean distance, draw a sketch of the hierarchical clustering tree we would obtain for each of the linkage methods (single, complete, and average).
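For checking the sketches, the merge order under each linkage can be reproduced numerically; this is a minimal SciPy sketch assuming n = 8 stand-in points.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage

n = 8
points = 2.0 ** np.arange(n)       # 2^0, 2^1, ..., 2^(n-1) as stand-ins for the exam's points
X = points.reshape(-1, 1)          # one-dimensional observations

for method in ("single", "complete", "average"):
    Z = linkage(X, method=method, metric="euclidean")
    print(method)
    print(Z[:, :2].astype(int))    # indices of the clusters merged at each step
```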

Single link tree:

[Figure: single-link tree]

Complete link tree:

Average link tree:

2. Now assume we are using the following distance function: d(A, B) = max(A, B)/ min(A, B). Which of the linkage methods above will result in a different tree from the one obtained in (1) when using this distance function? If you think that one or more of these methods will result in a different tree, sketch the new tree as well.

(a) Single link

(b) Complete link

(c) Average link

[Figure]
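Similarly, the effect of the ratio distance can be checked numerically; this sketch feeds a precomputed distance matrix to SciPy, using the same stand-in points as in part (1).

```python
import numpy as np
from scipy.spatial.distance import squareform
from scipy.cluster.hierarchy import linkage

points = 2.0 ** np.arange(8)                       # 2^0, ..., 2^7 as stand-ins
# Ratio distance d(A, B) = max(A, B) / min(A, B); zero the diagonal for squareform.
D = np.maximum.outer(points, points) / np.minimum.outer(points, points)
np.fill_diagonal(D, 0.0)

for method in ("single", "complete", "average"):
    Z = linkage(squareform(D), method=method)
    print(method)
    print(Z[:, :2].astype(int))                    # merge order under the ratio distance
```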

3. We would like to use a hierarchical clustering tree as a decision tree. We select a linkage method and build the tree from the training data. For a test sample, at each node in the tree we compute the distance between the test sample and the sub-cluster rooted at that node, using the linkage method we selected; in this way each test sample is propagated down the tree. Specify a linkage method that will lead to a hierarchical clustering decision tree that is exactly the same as a classifier we discussed in class. What is this classifier? Be specific.
