

10-701 Machine Learning - Spring 2012 - Problem Set 5

Q1. Hidden Markov Model

A Hidden Markov Model is an instance of the state space model in which the latent variables are discrete. Let K be the number of hidden states. We use the following notation: x are the observed variables and z are the hidden state variables (we use the 1-of-K representation: $z_k = 1$, $z_{j \neq k} = 0$ means the hidden state is k). The transition probabilities are given by a K × K matrix A, where $A_{jk} = p(z_{n,k} = 1 \mid z_{n-1,j} = 1)$, and the initial state variable $z_1$ is governed by a vector of probabilities π: $p(z_1 \mid \pi) = \prod_{k=1}^{K} \pi_k^{z_{1k}}$. Finally, the emission distribution for hidden state k is parametrized by $\phi_k$: $p(x_n \mid \phi_k)$. Let Θ = {A, π, φ}.
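To fix the notation, here is a tiny numerical instance together with a short ancestral-sampling loop; the parameter values and the two-symbol discrete emission model are purely illustrative, not part of the problem:

```python
import numpy as np

K = 2
pi = np.array([0.6, 0.4])                  # p(z_1)
A = np.array([[0.7, 0.3],                  # A[j, k] = p(z_n = k | z_{n-1} = j)
              [0.2, 0.8]])
phi = np.array([[0.9, 0.1],                # phi[k] parametrizes p(x_n | z_n = k)
                [0.3, 0.7]])               # (here: a categorical over 2 symbols)

rng = np.random.default_rng(0)
z = rng.choice(K, p=pi)                    # sample the initial hidden state
states, obs = [z], [rng.choice(2, p=phi[z])]
for _ in range(9):                         # sample a length-10 sequence
    z = rng.choice(K, p=A[z])              # transition step
    states.append(z)
    obs.append(rng.choice(2, p=phi[z]))    # emission step
```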

1.1 The full likelihood of a data set

If we have a data set X = {x1, . . . , xN}:

1. What is the full likelihood of observed and latent variables: p(X, Z|Θ)? Note Z = {z1, . . . , zN} are the hidden states of the corresponding observations.

2. What is the likelihood of the data set, i.e. p(X|Θ)?

1.2 Expectation-Maximization (EM) for Maximum Likelihood Learning

We'd like to derive formulas for estimating A and φ to maximize the likelihood of the data set p(X|Θ).

1. Assume we can compute p(X, Z|Θ) in O(1) time. What is the time complexity of computing p(X|Θ)?

We use the EM algorithm for this task:

- In the E step, we take the current parameter values and compute the posterior distribution of the latent variables p(Z|X, Θold).

- In the M step, we find the new parameter values by solving an optimization problem:

$\Theta^{\text{new}} = \arg\max_{\Theta} Q(\Theta, \Theta^{\text{old}})$                                                                             (1)

where

$Q(\Theta, \Theta^{\text{old}}) = \sum_{Z} p(Z \mid X, \Theta^{\text{old}}) \ln p(X, Z \mid \Theta)$                                                         (2)
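Schematically, EM alternates between the posterior computation behind (2) and the maximization in (1). The minimal Python sketch below only illustrates that alternation; `e_step` and `m_step` are caller-supplied placeholders, not functions specified by the problem:

```python
def em(X, theta_init, e_step, m_step, n_iters=100):
    """Schematic EM loop.  e_step(X, theta) returns p(Z | X, theta);
    m_step(X, posterior) returns argmax_theta Q(theta, theta_old)."""
    theta = theta_init
    for _ in range(n_iters):
        posterior = e_step(X, theta)     # E step: posterior over latent states
        theta = m_step(X, posterior)     # M step: maximize Q(theta, theta_old)
    return theta
```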

2. Show that

$Q(\Theta, \Theta^{\text{old}}) = \sum_{k=1}^{K} \gamma(z_{1k}) \ln \pi_k + \sum_{n=2}^{N} \sum_{j=1}^{K} \sum_{k=1}^{K} \xi(z_{n-1,j}, z_{nk}) \ln A_{jk}$             (3)

$\qquad\qquad + \sum_{n=1}^{N} \sum_{k=1}^{K} \gamma(z_{nk}) \ln p(x_n \mid \phi_k)$                                                                     (4)

where

$\gamma(z_{nk}) = \mathbb{E}_{p(z_n \mid X, \Theta^{\text{old}})}\left[ z_{nk} \right]$                                                                            (5)

$\xi(z_{n-1,j}, z_{nk}) = \mathbb{E}_{p(z_{n-1}, z_n \mid X, \Theta^{\text{old}})}\left[ z_{n-1,j}\, z_{nk} \right]$                                                  (6)

Show your derivations.

3. Show that

$p(X \mid z_{n-1}, z_n) = p(x_1, \ldots, x_{n-1} \mid z_{n-1})\, p(x_n \mid z_n)\, p(x_{n+1}, \ldots, x_N \mid z_n)$                     (7)

4. In class, we discussed how to compute the quantities below (a schematic sketch of these recursions is given after this question):

$\alpha(z_n) = p(x_1, \ldots, x_n, z_n)$                                                                               (8)

$\beta(z_n) = p(x_{n+1}, \ldots, x_N \mid z_n)$                                                                           (9)

Show that

$\xi(z_{n-1}, z_n) = p(z_{n-1}, z_n \mid X)$                                                                              (10)

$\qquad = \alpha(z_{n-1})\, p(x_n \mid z_n)\, p(z_n \mid z_{n-1})\, \beta(z_n) \,/\, p(X)$                                                             (11)

How would you compute p(X)?
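For reference, here is a minimal sketch of the α/β recursions in Python/NumPy. It assumes a discrete emission model, is unscaled (in practice one rescales or works in log space to avoid underflow), and uses illustrative variable names rather than anything prescribed by the problem:

```python
import numpy as np

def forward_backward(pi, A, B):
    """Unscaled alpha/beta recursions for a discrete-emission HMM.

    pi : (K,)   initial state probabilities
    A  : (K, K) transition matrix, A[j, k] = p(z_n = k | z_{n-1} = j)
    B  : (N, K) emission likelihoods, B[n, k] = p(x_n | phi_k)
    """
    N, K = B.shape
    alpha = np.zeros((N, K))
    beta = np.zeros((N, K))

    # Forward pass: alpha(z_n) = p(x_1, ..., x_n, z_n)
    alpha[0] = pi * B[0]
    for n in range(1, N):
        alpha[n] = B[n] * (alpha[n - 1] @ A)

    # Backward pass: beta(z_n) = p(x_{n+1}, ..., x_N | z_n)
    beta[-1] = 1.0
    for n in range(N - 2, -1, -1):
        beta[n] = A @ (B[n + 1] * beta[n + 1])

    likelihood = alpha[-1].sum()            # p(X) = sum over z_N of alpha(z_N)
    return alpha, beta, likelihood
```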

5. Show how to compute $\gamma(z_{nk})$ and $\xi(z_{n-1,j}, z_{nk})$ using $\alpha(z_n)$, $\beta(z_n)$ and $\xi(z_{n-1}, z_n)$.

6. Show that if any elements of the parameters π or A for a hidden Markov model are initially set to 0, then those elements will remain zero in all subsequent updates of the EM algorithm.

1.3 A coin game

Two students X and Y from Cranberry Lemon University play a stochastic game with a fair coin. X and Y take turns, with X going first. All the coin flips are recorded, and the game finishes when the sequence THT first appears; the player who made the last flip is the winner. The players flip the coin as follows. On his turn, each time X flips the original coin, he also flips an extra biased coin (p(H) = 0.3). He stops his turn only if the extra coin lands heads; otherwise he flips the original and extra coins again, and so on. (The flips of this extra coin are not recorded.) On his turn, Y flips the coin until T appears (all of his flips are recorded).
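To make the rules concrete, here is a small simulation sketch of the game in Python; it simply follows the description above, and the function and variable names are illustrative:

```python
import random

def simulate_game(seed=None):
    rng = random.Random(seed)
    flips, who = [], []        # recorded fair-coin flips and who made each one
    player = 'X'               # X goes first
    while True:
        if player == 'X':
            # X flips the fair coin (recorded) and an extra biased coin
            # (not recorded); his turn ends only when the extra coin is heads.
            while True:
                flips.append(rng.choice('HT'))
                who.append('X')
                if ''.join(flips).endswith('THT'):
                    return ''.join(flips), who, 'X'
                if rng.random() < 0.3:      # extra coin lands heads (p = 0.3)
                    break
            player = 'Y'
        else:
            # Y flips the fair coin until a T appears; every flip is recorded.
            while True:
                flips.append(rng.choice('HT'))
                who.append('Y')
                if ''.join(flips).endswith('THT'):
                    return ''.join(flips), who, 'Y'
                if flips[-1] == 'T':        # Y's turn ends on tails
                    break
            player = 'X'

# Example: recorded sequence, per-flip attribution, and the winner.
sequence, owners, winner = simulate_game(seed=0)
```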

Given a sequence of recorded coin flips, you would like to infer the winner as well as the flips made by each player.

1. Describe a HMM to model this game.

2. How would you use this HMM to infer the (most probable) winner and the (most probable) flips of each player?

Q2. Dimensionality Reduction

2.1 Singular value decomposition

In linear algebra, the singular value decomposition (SVD) is a factorization of a real matrix X as:

$X = U S V^T$                                                                                                       (12)

If the dimension of X is m × n, where without loss of generality m ≥ n, then U is an m × n matrix, S is an n × n diagonal matrix, and V^T is also an n × n matrix. Furthermore, U and V have orthonormal columns: $U^T U = I$ and $V^T V = I$.
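For example, this thin (economy-size) SVD can be checked numerically with NumPy; the matrix below is random and purely illustrative:

```python
import numpy as np

m, n = 6, 4                                     # m >= n, as in the text
X = np.random.randn(m, n)

# Thin SVD: U is m x n, s holds the n singular values, Vt is V^T (n x n).
U, s, Vt = np.linalg.svd(X, full_matrices=False)

assert np.allclose(U.T @ U, np.eye(n))          # U has orthonormal columns
assert np.allclose(Vt @ Vt.T, np.eye(n))        # V^T V = I
assert np.allclose(U @ np.diag(s) @ Vt, X)      # X = U S V^T
```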

2.2 PCA and SVD

Consider a dataset of observations {xn} where n = 1, . . . , N. We assume that the examples are zero-centered, i.e. $\bar{x} = \frac{1}{N} \sum_{n=1}^{N} x_n = 0$. The PCA algorithm computes the covariance matrix:

$S = \frac{1}{N} \sum_{n=1}^{N} x_n x_n^T$                                                                                       (13)

The principal components {ui} are eigenvectors of S.

Let X = [x1, . . . , xN] be a D × N matrix where each column is one example xn. If $U S' V^T$ is an SVD of X:

1. Show that the principal components {ui} are columns of U. This shows the relationship between PCA and SVD.

2. When the number of dimensions is much larger than the number of data points (D >> N), is it better to do PCA by using the covariance matrix or using SVD?

3. Consider the following data set:

[Figure: the data matrix for this question (whose entries depend on ε) is not reproduced in this copy.]

where ε is a tiny number. Each column is one example. First zero-center the data set and then do PCA using two techniques: 1) by using the covariance matrix and 2) by using SVD. What do you observe? Hints: What is the "dimension" of this dataset? You can use Matlab; try ε = 1e-10. Which technique returns a sensible result?
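Question 3 can be explored with a short script. Since the matrix from the figure is not reproduced here, the sketch below uses a random zero-centered stand-in with D ≫ N and simply shows the two techniques side by side (in Python/NumPy rather than Matlab):

```python
import numpy as np

# Illustrative stand-in data (the original matrix is not reproduced here):
# D dimensions, N examples, D >> N, columns are examples.
D, N = 1000, 5
X = np.random.randn(D, N)
X = X - X.mean(axis=1, keepdims=True)        # zero-center each dimension

# Technique 1: PCA via the D x D covariance matrix.
S = (X @ X.T) / N
eigvals, eigvecs = np.linalg.eigh(S)         # eigenvalues in ascending order
pcs_cov = eigvecs[:, ::-1]                   # principal components, largest first

# Technique 2: PCA via the SVD of the data matrix itself.
U, s, Vt = np.linalg.svd(X, full_matrices=False)
pcs_svd = U                                  # columns of U are the principal components
```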

Q3. Markov Decision Process

1. A standard MDP is described by a set of states S, a set of actions A, a transition function T, and a reward function R, where T(s, a, s') gives the probability of transitioning to s' after taking action a in state s, and R(s) gives the immediate reward of being in state s. A k-order MDP is described in the same way with one exception: the transition function T depends on the current state s and also the previous k − 1 states. That is, $T(s_{k-1}, \ldots, s_1, s, a, s') = p(s' \mid a, s, s_1, \ldots, s_{k-1})$ gives the probability of transitioning to state s' given that action a was taken in state s and the previous k − 1 states were $(s_{k-1}, \ldots, s_1)$. (A small illustrative example of such a transition table is given after this question.)

Given a k-order MDP M = (S, A, T, R), describe how to construct a standard (first-order) MDP M' = (S', A', T', R') that is equivalent to M. Here, equivalent means that a solution to M' can be easily converted into a solution to M. Be sure to describe S', A', T', and R'. Give a brief justification of your construction.
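As an illustration of the k-order transition function only (not of the required construction), here is a tiny, made-up k = 2 example in which T is stored as a table keyed by the preceding state, the current state, the action, and the successor:

```python
# k = 2: the key is (s_1, s, a, s'), i.e. one previous state, the current
# state, the action, and the successor.  States and numbers are made up.
T = {
    ("rain", "rain", "walk", "rain"): 0.8,
    ("rain", "rain", "walk", "sun"):  0.2,
    ("sun",  "rain", "walk", "rain"): 0.4,
    ("sun",  "rain", "walk", "sun"):  0.6,
}

def transition_prob(history, s, a, s_next):
    # history = (s_{k-1}, ..., s_1), the k-1 states that preceded s
    return T.get(history + (s, a, s_next), 0.0)

# Example: p(s' = "sun" | a = "walk", s = "rain", previous state "sun") -> 0.6
p = transition_prob(("sun",), "rain", "walk", "sun")
```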

2. The Q-learning update rule for deterministic MDPs is as follows:

$Q(s, a) \leftarrow R(s, a) + \gamma \max_{a'} Q(s', a')$                                                              (15)

where s' = f(s, a) is the next state that results from taking action a in state s. Prove that Q-learning converges in deterministic MDPs.
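For concreteness, here is a minimal tabular sketch of update rule (15); the names f, R, actions, and gamma are illustrative placeholders for the deterministic transition function, the reward function, the action set, and the discount factor:

```python
from collections import defaultdict

def q_update(Q, s, a, f, R, actions, gamma=0.9):
    """One application of the deterministic Q-learning update (15)."""
    s_next = f(s, a)                                      # deterministic transition
    best_next = max(Q[(s_next, a2)] for a2 in actions)    # max over a' of Q(s', a')
    Q[(s, a)] = R(s, a) + gamma * best_next               # update rule (15)
    return s_next

# Q is a table of action values; unseen (state, action) pairs default to 0.
Q = defaultdict(float)
```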
