Formulate this problem as a Markov decision process


A chemical company produces two chemicals, denoted by 0 and 1, and only one can be produced at a time. Each month a decision is made as to which chemical to produce that month. Because the demand for each chemical is predictable, it is known that if 1 is produced this month, there is a 70 percent chance that it will also be produced again next month. Similarly, if 0 is produced this month, there is only a 20 percent chance that it will be produced again next month. To combat the emissions of pollutants, the chemical company has two processes, process A, which is efficient in combating the pollution from the production of 1 but not from 0, and process B, which is efficient in combating the pollution from the production of 0 but not from 1. Only one process can be used at a time. The amount of pollution from the production of each chemical under each process is

Unfortunately, there is a time delay in setting up the pollution control processes, so that a decision as to which process to use must be made in the month prior to the production decision. Management wants to determine a policy for when to use each pollution control process that will minimize the expected total discounted amount of all future pollution with a discount factor of α = 0.5.
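For reference, the production dynamics and the objective stated above can be written compactly. This is only a sketch of one reading of the problem: the symbols P, X_t and π are labels introduced here for illustration and do not appear in the original statement. Rows and columns of P are indexed by the chemical (0, 1) produced this month and next month.

```latex
P =
\begin{pmatrix}
p_{00} & p_{01} \\
p_{10} & p_{11}
\end{pmatrix}
=
\begin{pmatrix}
0.2 & 0.8 \\
0.3 & 0.7
\end{pmatrix},
\qquad
\min_{\pi} \; E\!\left[\, \sum_{t=0}^{\infty} \alpha^{t} X_t \right],
\quad \alpha = 0.5 ,
```

where X_t is assumed to denote the pollution emitted in month t under policy π.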

(a) Formulate this problem as a Markov decision process by identifying the states, the decisions, and the C_ik. Identify all the (stationary deterministic) policies.
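One possible way to set up part (a) is sketched below. It assumes the state is the chemical produced this month, the decision is the process set up for next month, and C_ik is the expected pollution next month. The pollution table is not reproduced in the text above, so it is passed in as a parameter rather than filled in; the names `expected_cost` and `all_policies` are hypothetical helpers, not part of the original problem.

```python
from itertools import product

STATES = (0, 1)          # assumed state: chemical produced this month
DECISIONS = ("A", "B")   # assumed decision: process set up for next month

# P[i][j]: probability that chemical j is produced next month, given i this month
P = {0: {0: 0.2, 1: 0.8}, 1: {0: 0.3, 1: 0.7}}


def expected_cost(i, k, pollution):
    """C_ik under the assumption above: expected pollution next month.

    `pollution[(j, k)]` is the pollution from producing chemical j under
    process k; the caller supplies it from the problem's table.
    """
    return sum(P[i][j] * pollution[(j, k)] for j in STATES)


def all_policies():
    """Enumerate every stationary deterministic policy as {state: decision}."""
    return [dict(zip(STATES, choice))
            for choice in product(DECISIONS, repeat=len(STATES))]
```

With two states and two decisions this enumeration yields four policies, each assigning one process to each state.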

(b) Use the policy improvement algorithm to find an optimal policy.
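The policy improvement algorithm in part (b) alternates between evaluating the current policy and improving it state by state. The following is a minimal sketch for this two-state problem, under the same assumed formulation (state = chemical produced this month, decision = process for next month). The function names are hypothetical, and `cost[(i, k)]` stands for C_ik, which must be computed from the pollution table that is not reproduced in the text above.

```python
ALPHA = 0.5                      # discount factor from the problem statement
STATES = (0, 1)
DECISIONS = ("A", "B")
P = ((0.2, 0.8), (0.3, 0.7))     # P[i][j], independent of the decision


def evaluate(policy, cost):
    """Value determination: solve V = c + ALPHA * P V by Cramer's rule."""
    c0, c1 = cost[(0, policy[0])], cost[(1, policy[1])]
    a, b = 1 - ALPHA * P[0][0], -ALPHA * P[0][1]
    c, d = -ALPHA * P[1][0], 1 - ALPHA * P[1][1]
    det = a * d - b * c
    return ((c0 * d - b * c1) / det, (a * c1 - c * c0) / det)


def improve(values, cost):
    """Policy improvement: pick the cost-minimizing decision in each state."""
    return {
        i: min(
            DECISIONS,
            key=lambda k: cost[(i, k)]
            + ALPHA * sum(P[i][j] * values[j] for j in STATES),
        )
        for i in STATES
    }


def policy_improvement(cost, policy=None):
    """Iterate evaluation and improvement until the policy stops changing."""
    policy = policy or {i: DECISIONS[0] for i in STATES}
    while True:
        values = evaluate(policy, cost)
        new_policy = improve(values, cost)
        if new_policy == policy:
            return policy, values
        policy = new_policy
```

Calling `policy_improvement(cost)` returns the final policy together with its discounted values. Because the transition probabilities in this formulation do not depend on the decision, the improvement step reduces to comparing the C_ik within each state.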
