What is the best expectation of success you can achieve in


Problem

Suppose you face a binary bandit task whose true action values change randomly from play to play. Specifically, suppose that on any play the true values of actions 1 and 2 are 0.1 and 0.2, respectively, with probability 0.5 (case A), and 0.9 and 0.8, respectively, with probability 0.5 (case B). If you cannot tell which case you face on any play, what is the best expectation of success you can achieve, and how should you behave to achieve it? Now suppose that on each play you are told whether you are facing case A or case B (although you still don't know the true action values). This is an associative search task. What is the best expectation of success you can achieve in this task, and how should you behave to achieve it?
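
The two expectations asked for can be checked with a few lines of arithmetic. The sketch below is only an illustration, not an official solution: the names VALUES, CASE_PROB, expected_success_blind and expected_success_informed are made up here, and it assumes "success" means the expected reward per play under the stated case probabilities.

```python
# True action values from the problem statement: VALUES[case][action].
VALUES = {"A": {1: 0.1, 2: 0.2}, "B": {1: 0.9, 2: 0.8}}
# Each case occurs with probability 0.5 on any play.
CASE_PROB = {"A": 0.5, "B": 0.5}


def expected_success_blind():
    """Case is hidden: average each action's value over both cases.

    Returns the best achievable expectation and the per-action expectations.
    """
    per_action = {
        action: sum(CASE_PROB[case] * VALUES[case][action] for case in VALUES)
        for action in (1, 2)
    }
    return max(per_action.values()), per_action


def expected_success_informed():
    """Case is revealed: pick the better action separately in each case.

    Returns the resulting expectation and the case-to-action policy.
    """
    policy = {case: max(VALUES[case], key=VALUES[case].get) for case in VALUES}
    value = sum(CASE_PROB[case] * VALUES[case][policy[case]] for case in VALUES)
    return value, policy


blind_value, blind_per_action = expected_success_blind()
informed_value, informed_policy = expected_success_informed()
```

Without the case label, both actions average to 0.5 (0.5 * 0.1 + 0.5 * 0.9 and 0.5 * 0.2 + 0.5 * 0.8), so no way of choosing, fixed or random, does better than 0.5. With the case label, choosing action 2 in case A and action 1 in case B gives 0.5 * 0.2 + 0.5 * 0.9 = 0.55. In practice the learner would still have to discover that policy by trial, since the true values are not given to it.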

Reference No:- TGS02639188