Can you see why TD updates are likely to be much better at


Problem

This is an exercise to help develop your intuition about why TD methods are often more efficient than MC methods. Consider the driving-home example and how it is addressed by TD and MC methods. Can you imagine a scenario in which a TD update would be better on average than an MC update? Give an example scenario (a description of past experience and a current situation) in which you would expect the TD update to be better. Here's a hint: Suppose you have lots of experience driving home from work. Then you move to a new building and a new parking lot (but you still enter the highway at the same place). Now you are starting to learn predictions for the new building. Can you see why TD updates are likely to be much better, at least initially, in this case? Might the same sort of thing happen in the original task?
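
To make the hint concrete, here is a minimal Python sketch of the two updates being compared, assuming a constant step size, no discounting, and rewards equal to elapsed minutes. Every number in it (travel times, noise levels, the already-learned highway estimate, the step size) is a hypothetical value chosen for illustration and does not come from the exercise. It is one way to explore the question, not a reference solution.

```python
import random


def mc_update(v, g, alpha):
    """Constant-alpha Monte Carlo: move v toward the full observed return g."""
    return v + alpha * (g - v)


def td0_update(v, reward, v_next, alpha, gamma=1.0):
    """TD(0): move v toward the bootstrapped target reward + gamma * v_next."""
    return v + alpha * (reward + gamma * v_next - v)


def compare_updates(runs=2000, episodes=20, alpha=0.1, seed=0):
    """Return (td_mse, mc_mse) for the new parking lot's value estimate.

    Hypothetical setup: the drive from the new lot to the highway entrance
    takes about 5 minutes with little noise, and the rest of the trip takes
    about 30 minutes with a lot of noise. The highway value is assumed to be
    already learned from past experience.
    """
    rng = random.Random(seed)
    v_highway = 30.0   # assumed: learned earlier from many drives home
    true_v_lot = 35.0  # 5 + 30 under the assumed means
    td_sq = 0.0
    mc_sq = 0.0
    for _ in range(runs):
        v_td = 0.0  # fresh estimate for the new lot
        v_mc = 0.0
        for _ in range(episodes):
            lot_to_highway = rng.gauss(5.0, 1.0)     # minutes, low noise
            highway_to_home = rng.gauss(30.0, 10.0)  # minutes, high noise
            # TD uses only the first leg plus the learned highway estimate.
            v_td = td0_update(v_td, lot_to_highway, v_highway, alpha)
            # MC waits for the whole trip and uses the noisy total.
            v_mc = mc_update(v_mc, lot_to_highway + highway_to_home, alpha)
        td_sq += (v_td - true_v_lot) ** 2
        mc_sq += (v_mc - true_v_lot) ** 2
    return td_sq / runs, mc_sq / runs


if __name__ == "__main__":
    td_mse, mc_mse = compare_updates()
    print(f"TD(0) mean squared error: {td_mse:.2f}")
    print(f"MC    mean squared error: {mc_mse:.2f}")
```

Under these assumptions both estimates start from the same initial guess, but the TD target reuses the existing highway estimate, while the MC target carries the full noise of the rest of the trip into the new state's estimate.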

 
