PyTorch based Reinforcement Learning for OpenSim Prosthetics and Learning to Run environments
MIT License
This is my code for experimenting with the CrowdAI Prosthetics Challenge (https://www.crowdai.org/challenges/nips-2018-ai-for-prosthetics-challenge)
The reinforcement learning codebase is based upon Ilya Kostrikov's awesome work (https://github.com/ikostrikov/pytorch-a2c-ppo-acktr)
As this is part of my learning process for continuous control with deep reinforcement learning, there are likely to be some issues.
All experiments were performed with PPO, or PPO with self-improvement learning, using 16 vectorized environments running in parallel. Keep in mind that the simulator is VERY slow, so expect to wait a long time (days) for decent results -- even if you happen to have a kick-ass machine.
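Since stepping 16 environments in lockstep is the main throughput lever here, a minimal sketch of a synchronous vectorized-environment wrapper may help. This is an illustration only, not the repo's actual code: `ToyEnv` is a hypothetical stand-in for the (slow) OpenSim environment, and the 5-step episode length is arbitrary.

```python
class ToyEnv:
    """Hypothetical stand-in for the slow OpenSim environment.
    Episodes last 5 steps and pay a reward of 1.0 per step."""
    def __init__(self):
        self.t = 0

    def reset(self):
        self.t = 0
        return [0.0]  # toy 1-dim observation

    def step(self, action):
        self.t += 1
        obs = [float(self.t)]
        reward = 1.0
        done = self.t >= 5
        return obs, reward, done


class VecEnv:
    """Steps N environments in lockstep and auto-resets any that finish,
    so the policy always sees a full batch of live observations."""
    def __init__(self, envs):
        self.envs = envs

    def reset(self):
        return [env.reset() for env in self.envs]

    def step(self, actions):
        obs_batch, rew_batch, done_batch = [], [], []
        for env, action in zip(self.envs, actions):
            obs, rew, done = env.step(action)
            if done:
                obs = env.reset()  # start the next episode immediately
            obs_batch.append(obs)
            rew_batch.append(rew)
            done_batch.append(done)
        return obs_batch, rew_batch, done_batch


envs = VecEnv([ToyEnv() for _ in range(4)])
obs = envs.reset()
obs, rewards, dones = envs.step([0.0] * 4)
```

A real implementation (like the SubprocVecEnv this codebase inherits from ikostrikov's repo) runs each environment in its own process, which matters a lot when a single OpenSim step takes a noticeable fraction of a second.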
Added:
Setup your environment as per https://github.com/stanfordnmbl/osim-rl#getting-started
Unclipped actions -- trains much faster, but it is not clear how OpenSim handles the out-of-range activations:
python main.py --algo ppo --env-name osim.Prosthetics --lr 7e-4 --num-steps 1000 --use-gae --ppo-epoch 10
With actions clipped to [0, 1] and shifted so the mean is at 0.5:
python main.py --algo ppo --env-name osim.Prosthetics --lr 1e-3 --num-steps 1000 --use-gae --ppo-epoch 10 --clip-action --shift-action
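The transform behind the clip/shift flags can be sketched as follows. This is a stand-alone illustration, not the repo's actual code -- the function name and defaults are hypothetical. The idea: a Gaussian policy naturally centers its output at 0, so shifting by +0.5 puts the action mean in the middle of the valid muscle-activation range before clipping to [0, 1].

```python
def to_muscle_activation(raw_action, shift=0.5, low=0.0, high=1.0):
    """Shift a zero-centered policy output so its mean lands at 0.5,
    then clip into the valid muscle-activation range [0, 1].
    (Illustrative helper; not part of this repo.)"""
    return max(low, min(high, raw_action + shift))


# A zero output maps to the middle of the activation range,
# and out-of-range outputs saturate at the boundaries.
mid = to_muscle_activation(0.0)    # 0.5
hi = to_muscle_activation(0.7)     # 1.0 (clipped)
lo = to_muscle_activation(-0.8)    # 0.0 (clipped)
```

One caveat with clipping: the gradient signal for actions outside [0, 1] is distorted, since the environment sees the saturated value while the policy's log-probability is computed on the raw one -- which is part of the motivation for the Beta-distribution variant below.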
With a Beta distribution over [0, 1]:
python main.py --algo ppo --env-name osim.Prosthetics --lr 1e-3 --num-steps 1000 --use-gae --ppo-epoch 10 --beta-dist
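The appeal of the Beta distribution is that its support is exactly [0, 1], so sampled muscle activations never need clipping or shifting. A minimal stdlib sketch of the sampling side (the function name and the alpha/beta values are illustrative; the actual policy would predict alpha and beta per muscle, e.g. via `torch.distributions.Beta`):

```python
import random

def sample_beta_action(alpha, beta):
    """Sample a muscle activation from Beta(alpha, beta).
    The Beta distribution's support is [0, 1], so the sample is
    always a valid activation -- no clipping required.
    (Illustrative helper; not part of this repo.)"""
    return random.betavariate(alpha, beta)


# With alpha = beta = 2 the distribution is symmetric around 0.5.
random.seed(0)
samples = [sample_beta_action(2.0, 2.0) for _ in range(1000)]
```

With alpha, beta > 1 the density is unimodal, so the policy can still concentrate around a preferred activation while the likelihood of every sample stays well-defined -- unlike the clipped-Gaussian case, where probability mass piles up at the boundaries.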