Solution to Project 1 of Udacity Deep Reinforcement Learning Nanodegree
This model was developed as a solution to Project 1 of the Udacity Deep Reinforcement Learning Nanodegree. (Image from the official repo.)
Install the package requirements for this repository:
pip install -r requirements.txt
The agent was developed specifically to solve a banana collection environment built in Unity, which can be downloaded from the following locations. The objective in the banana environment is for the agent to navigate the world and collect yellow bananas (+1 reward) while avoiding blue bananas (-1 reward). Download the environment for your platform and unpack it into the ./env_unity/ folder in this repo:
- Environment with discrete state space (37 dimensions)
- Environment with pixel state space
In both versions of the environment, the agent has a discrete action space with four actions:

- 0: move forward
- 1: move backward
- 2: turn left
- 3: turn right

The environment is considered solved when the agent collects an average score of +13 over 100 consecutive episodes.
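The solved criterion above is a rolling average over the last 100 episodes. A minimal sketch of how such a check can be implemented (hypothetical helper, not this repo's code):

```python
from collections import deque

def is_solved(score, window, target=13.0):
    """Append the latest episode score; solved once the window is full
    and the mean over the last 100 episodes reaches the target."""
    window.append(score)
    return len(window) == window.maxlen and sum(window) / len(window) >= target

# Usage: keep one deque across episodes.
scores = deque(maxlen=100)
```

Because the deque has `maxlen=100`, old scores fall out automatically, so the mean is always over the most recent 100 episodes.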
The main components of this repository:

- libs/agents.py: A DQN agent, which by default is configured to be a double dueling DQN.
- libs/models.py: PyTorch models used by the DQN agent.
- libs/memory.py: Prioritized experience replay, using a sum-tree as defined in libs/sumtree.py.
- libs/monitor.py: Functionality for training/testing the agent and interacting with the environment.
- main.py: Main command-line interface for training and testing the agent.

To train the agent on the discrete state space, run one of the following (only tested on Windows!):
python main.py --environment env_unity/DiscreteBanana/Banana.exe --model_name DQN
python main.py --environment env_unity/DiscreteBanana/Banana.exe --model_name DuelDQN
python main.py --environment env_unity/DiscreteBanana/Banana.exe --model_name DQN --double
python main.py --environment env_unity/DiscreteBanana/Banana.exe --model_name DuelDQN --double
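The --double flag enables Double DQN: the online network selects the best next action while the target network evaluates it, which reduces the overestimation bias of vanilla DQN. A minimal sketch of that target computation (a hypothetical helper, not the repo's implementation):

```python
def double_dqn_target(reward, done, q_online_next, q_target_next, gamma=0.99):
    """Compute the Double DQN bootstrap target for one transition.

    q_online_next / q_target_next: per-action Q-values at the next state
    from the online and target networks, respectively.
    """
    # Online network picks the greedy next action...
    best = max(range(len(q_online_next)), key=lambda a: q_online_next[a])
    # ...but the target network provides its value (decoupled evaluation).
    return reward + (0.0 if done else gamma * q_target_next[best])
```

In vanilla DQN, by contrast, the max is taken directly over `q_target_next`, so the same network both selects and evaluates the action.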
To train the agent on the pixel state space, run one of the following (only tested on Windows!):
python main.py --environment env_unity/VisualBanana/Banana.exe --model_name DQN
python main.py --environment env_unity/VisualBanana/Banana.exe --model_name DuelDQN
python main.py --environment env_unity/VisualBanana/Banana.exe --model_name DQN --double
python main.py --environment env_unity/VisualBanana/Banana.exe --model_name DuelDQN --double
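The DuelDQN model splits the network into a state-value stream V(s) and an advantage stream A(s, a), which are typically recombined as Q(s, a) = V(s) + A(s, a) - mean_a A(s, a). A minimal sketch of that aggregation step (an assumption about how libs/models.py works, shown in plain Python rather than PyTorch):

```python
def dueling_q(value, advantages):
    """Combine a scalar state value with per-action advantages.

    Subtracting the mean advantage makes the decomposition identifiable:
    otherwise any constant could shift between V and A.
    """
    mean_adv = sum(advantages) / len(advantages)
    return [value + a - mean_adv for a in advantages]
```

The benefit is that the network can learn how valuable a state is without having to estimate the effect of every action in it.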
Once the agent has been trained, it can be run as follows:
python main.py --environment env_unity/VisualBanana/Banana.exe --model_name DQN --test --checkpoint logs/weights_env_unity_VisualBanana_DQN_single.pth
When trying to optimize training speed, I used the following to profile the code:
python -m cProfile -o profile.txt -s tottime main.py --environment env_unity/VisualBanana/Banana.exe --model_name DQN --double
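The prioritized experience replay in libs/memory.py relies on a sum-tree (libs/sumtree.py) so that sampling a transition proportionally to its priority takes O(log n) instead of O(n). A minimal sketch of the data structure, under the assumption of a power-of-two capacity; this is illustrative, not the repo's actual implementation:

```python
class SumTree:
    """Binary tree whose leaves hold priorities and whose internal
    nodes hold the sum of their children. Capacity must be a power of 2."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.tree = [0.0] * (2 * capacity)  # tree[1] is the root
        self.next = 0

    def add(self, priority):
        # Overwrite the oldest leaf once the buffer is full (ring buffer).
        self.update(self.next % self.capacity, priority)
        self.next += 1

    def update(self, idx, priority):
        i = idx + self.capacity      # leaf position in the flat array
        self.tree[i] = priority
        i //= 2
        while i >= 1:                # propagate new sums up to the root
            self.tree[i] = self.tree[2 * i] + self.tree[2 * i + 1]
            i //= 2

    def total(self):
        return self.tree[1]

    def sample(self, s):
        """Return the leaf index for prefix sum s in [0, total()]."""
        i = 1
        while i < self.capacity:
            if s <= self.tree[2 * i]:   # target lies in the left subtree
                i = 2 * i
            else:                       # skip the left subtree's mass
                s -= self.tree[2 * i]
                i = 2 * i + 1
        return i - self.capacity
```

Sampling a uniform number in [0, total()) and walking the tree picks each leaf with probability proportional to its priority, which is exactly what prioritized replay needs.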