Homework

In this homework, you will gain experience working with OpenAI Gym, a toolkit of environments (problems) for developing and comparing reinforcement learning algorithms. The homework is designed to help you apply the Q-learning concepts you have been studying to the "cartpole" problem, a classic reinforcement learning benchmark.
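
For context, the sketch below (not part of the homework files) shows the basic Gym interaction loop for the cartpole environment, including the state and action spaces you will describe later in this assignment. It assumes a classic Gym release; newer Gym/Gymnasium versions changed the reset() and step() return values, so verify against the version installed in your environment.

    # Minimal sketch of the Gym cartpole loop (classic Gym API).
    import gym

    env = gym.make("CartPole-v1")
    print(env.observation_space)  # Box(4,): cart position, cart velocity, pole angle, pole angular velocity
    print(env.action_space)       # Discrete(2): push cart left (0) or right (1)

    state = env.reset()
    for _ in range(200):
        action = env.action_space.sample()            # random action
        state, reward, done, info = env.step(action)  # reward is +1 for each step the pole stays up
        if done:                                      # pole fell or cart left the track
            state = env.reset()
    env.close()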

Note: The original code referenced in this homework was written in Python 2.x. You have been given a zipped folder containing an updated Python 3 version of the code that will work in the Apporto environment. To make this code work, some lines have been commented out. Please leave these as comments.

Reference: Surma, G. (2018). Cartpole. GitHub repository.

Prompt

Access the Virtual Lab (Apporto) by using the link in the Virtual Lab Access module. It is recommended that you use the Chrome browser to access the Virtual Lab. If prompted to allow the Virtual Lab access to your clipboard, click "Yes", as this will allow you to copy text from your desktop into applications in the Virtual Lab environment.

1) Review the following reading: Cartpole: Introduction to Reinforcement Learning. To run the code, upload the Cartpole.zip file into the Virtual Lab (Apporto). Unzip it, then place the unzipped folder in your Documents folder in Apporto. Refer to the Jupyter Notebook in Apporto (Virtual Lab) Tutorial for help with these tasks.

Note: The Cartpole folder contains the Cartpole.ipynb file (Jupyter Notebook) and a scores folder containing score_logger.py (Python file). It is very important to keep the score_logger.py file in the scores folder (directory).

2) Open Jupyter Notebook and open both the Cartpole.ipynb and score_logger.py files. Be sure to review the code in both files. Then rename the Cartpole.ipynb file using the following naming convention:

LastName_FirstName_Homework5.ipynb

Thus, if your name is Jane Doe, please name the submission file "Doe_Jane_Homework5.ipynb".

3) Next, run the code in your renamed notebook. The code will take several minutes to run, and you should see a stream of output while it runs. The program is complete when you see output of the form:

Solved in _ runs, _ total runs.

Note: If you receive the error "NameError: name 'exit' is not defined" after the above line, you can ignore it.

4) Modify the values for the exploration factor, discount factor, and learning rate in the code to understand how those values affect the performance of the algorithm. Be sure to place each experiment in a separate code block so that your instructor can view all of your changes.

Note: Discount factor = GAMMA, learning rate = LEARNING_RATE, exploration factor = combination of EXPLORATION_MAX, EXPLORATION_MIN, and EXPLORATION_DECAY.
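
As a hedged illustration of how these constants typically interact (the exact values in your copy of the code may differ, so check the notebook), the exploration rate usually starts at EXPLORATION_MAX and decays multiplicatively toward EXPLORATION_MIN each time the agent learns:

    # Illustrative values only; check Cartpole.ipynb for the actual constants.
    GAMMA = 0.95              # discount factor for future rewards
    LEARNING_RATE = 0.001     # step size for the neural network optimizer
    EXPLORATION_MAX = 1.0     # start fully random
    EXPLORATION_MIN = 0.01    # never stop exploring entirely
    EXPLORATION_DECAY = 0.995 # multiplicative decay applied after each learning step

    exploration_rate = EXPLORATION_MAX
    for step in range(1000):
        exploration_rate = max(EXPLORATION_MIN, exploration_rate * EXPLORATION_DECAY)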

5) Create a Markdown cell in your Jupyter Notebook after the code and its outputs. In this cell, analyze the code and relate it to the concepts from your readings. You are expected to include resources to support your answers, and you must cite those resources.

Specifically, you must address the following rubric criteria:

1) Explain how reinforcement learning concepts apply to the cartpole problem.

a) What is the goal of the agent in this case?
b) What are the various state values?
c) What are the possible actions that can be performed?
d) What reinforcement algorithm is used for this problem?

2) Analyze how experience replay is applied to the cartpole problem (see the replay sketch after this list).

a) How does experience replay work in this algorithm?
b) What is the effect of introducing a discount factor for calculating the future rewards?

3) Analyze how neural networks are used in deep Q-learning (see the network sketch after this list).

a) Explain the neural network architecture that is used in the cartpole problem.
b) How does the neural network make the Q-learning algorithm more efficient?
c) What difference do you see in the algorithm performance when you increase or decrease the learning rate?
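
To help you get started on questions 2 and 3, the sketches below simplify the kind of code you will find in Cartpole.ipynb; names such as remember and experience_replay follow common DQN conventions and may differ from your copy of the code, so verify every detail against the notebook before citing it in your answers.

    # Hedged sketch of experience replay (questions 2a and 2b): transitions
    # are stored in a bounded memory and replayed in random minibatches,
    # with GAMMA discounting the predicted future reward.
    import random
    from collections import deque

    import numpy as np

    GAMMA = 0.95
    BATCH_SIZE = 20
    memory = deque(maxlen=1_000_000)

    def remember(state, action, reward, next_state, done):
        # Store each transition so it can be reused in many later updates.
        memory.append((state, action, reward, next_state, done))

    def experience_replay(model):
        if len(memory) < BATCH_SIZE:
            return
        # Random sampling breaks the correlation between consecutive steps.
        batch = random.sample(memory, BATCH_SIZE)
        for state, action, reward, next_state, done in batch:
            q_update = reward
            if not done:
                # Bellman update: discount the best predicted future reward.
                q_update = reward + GAMMA * np.amax(model.predict(next_state)[0])
            q_values = model.predict(state)  # state has shape (1, 4)
            q_values[0][action] = q_update
            model.fit(state, q_values, verbose=0)

For question 3a, a small feed-forward network of the kind typically used for cartpole DQN looks like the following (the layer sizes shown here are an assumption; verify them against the notebook):

    # Hedged sketch of the Q-network: 4 state inputs in, one Q-value per action out.
    from tensorflow.keras.models import Sequential
    from tensorflow.keras.layers import Dense
    from tensorflow.keras.optimizers import Adam

    model = Sequential([
        Dense(24, input_shape=(4,), activation="relu"),
        Dense(24, activation="relu"),
        Dense(2, activation="linear"),  # Q-values for actions 0 and 1
    ])
    model.compile(loss="mse", optimizer=Adam(learning_rate=0.001))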

Format your homework according to the following requirements:

1) The answer should be typed, double spaced, in Times New Roman font (size 12), with one-inch margins on all sides.

2) Include a cover page containing the title of the homework, the student's name, the course title, and the date. The cover page is not included in the required page length.

3) Include a reference page. Citations and references should follow APA format. The reference page is not included in the required page length.
