Find the optimal policy that maximizes expected total discounted reward


A decision maker observes a discrete-time system that moves between the states {s1, s2, s3, s4} according to the following transition probability matrix:
$$
p = \begin{pmatrix}
0.3 & 0.4 & 0.2 & 0.1 \\
0.2 & 0.3 & 0.5 & 0 \\
0.1 & 0 & 0.8 & 0.1 \\
0.4 & 0 & 0 & 0.6
\end{pmatrix}
$$
At each decision epoch, the decision maker may either leave the system and receive a reward of R = 20 units, or remain in the system and receive a reward of r(s_i) units if the system occupies state s_i. If the decision maker remains, the state at the next decision epoch is determined by p. Assume a discount factor of 0.9 and that r(s_i) = i. Find the optimal policy that maximizes the expected total discounted reward. (If you solve it by computer, attach the code.)
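A minimal sketch of one way to solve this by computer, assuming the usual optimal-stopping reading of the problem: leaving pays R = 20 once and ends the process, while remaining pays r(s_i) = i and moves to the next state according to row i of p. Under that assumption the optimal values satisfy

$$
v^*(s_i) = \max\Big\{ R,\; r(s_i) + \lambda \sum_{j=1}^{4} p_{ij}\, v^*(s_j) \Big\}, \qquad \lambda = 0.9,
$$

and value iteration (one standard technique; policy iteration would work equally well) approximates v* by repeatedly applying the right-hand side. The names `value_iteration`, `continue_value`, `R_STOP`, `REWARD` and `LAM` are hypothetical, chosen only for this illustration.

```python
"""Value iteration for the optimal stopping exercise (a sketch, not an official solution).

Assumed model: "stop" pays R_STOP once and ends the process; "continue" pays
r(s_i) = i and moves to the next state according to row i of P.
"""

# Transition matrix p: row i is the next-state distribution from state s_{i+1}.
P = [
    [0.3, 0.4, 0.2, 0.1],
    [0.2, 0.3, 0.5, 0.0],
    [0.1, 0.0, 0.8, 0.1],
    [0.4, 0.0, 0.0, 0.6],
]
R_STOP = 20.0                   # one-time reward for leaving the system
REWARD = [1.0, 2.0, 3.0, 4.0]   # r(s_i) = i for remaining in s_i
LAM = 0.9                       # discount factor


def continue_value(v, s):
    """Value of remaining in state s for one step, then following estimate v."""
    return REWARD[s] + LAM * sum(p * vj for p, vj in zip(P[s], v))


def value_iteration(eps=1e-8, max_iter=10_000):
    """Apply v <- max(R, r + LAM * P v) until the eps-optimality test passes."""
    v = [0.0] * len(P)
    for _ in range(max_iter):
        new_v = [max(R_STOP, continue_value(v, s)) for s in range(len(P))]
        diff = max(abs(a - b) for a, b in zip(new_v, v))
        v = new_v
        # Standard stopping rule: the greedy policy is eps-optimal once
        # the sup-norm change drops below eps * (1 - LAM) / (2 * LAM).
        if diff < eps * (1 - LAM) / (2 * LAM):
            break
    # Greedy policy with respect to the final estimate; ties go to "stop".
    policy = [
        "continue" if continue_value(v, s) > R_STOP else "stop"
        for s in range(len(P))
    ]
    return v, policy


if __name__ == "__main__":
    values, policy = value_iteration()
    for i, (val, action) in enumerate(zip(values, policy), start=1):
        print(f"s{i}: v* = {val:.2f}, action = {action}")
```

Under these assumptions the script should report "continue" in every state, with v* ≈ 24.08, 25.51, 27.31 and 27.54 for s1 through s4. Since every value exceeds R = 20, leaving is never optimal in this model. As a hand check, these numbers solve v = r + 0.9 p v, the evaluation equations of the never-stop policy.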

Reference No:- TGS0118068

Expected delivery within 24 Hours