Robust policy search algorithms which train on model ensembles
This repo contains variants of policy search algorithms that are robust to model parameters. The current implementations primarily target episodic tasks, with an emphasis on batch policy optimization using various forms of policy gradients.
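To make the idea concrete, here is a minimal, self-contained sketch of ensemble-based policy search. Everything in it (the 1-D point-mass model, the linear-Gaussian policy, plain REINFORCE) is illustrative rather than this repo's actual implementation: each iteration samples model parameters from an ensemble, rolls out the current policy under each sampled model, and takes one batch policy-gradient step on the pooled trajectories.

```python
# Toy sketch of ensemble-based policy search (illustrative only; not the
# actual implementation in this repo). A linear-Gaussian policy is trained
# with batch REINFORCE on a 1-D point mass whose mass is resampled from an
# ensemble every iteration, so the policy must work for all of them.
import numpy as np

rng = np.random.default_rng(0)
SIGMA = 0.1  # fixed policy noise

def rollout(w, mass, horizon=50):
    """One episode under a sampled model; reward penalizes distance to origin."""
    x, v = 1.0, 0.0
    states, actions, rewards = [], [], []
    for _ in range(horizon):
        s = np.array([x, v])
        a = w @ s + SIGMA * rng.standard_normal()  # stochastic linear policy
        v += (a / mass) * 0.1                      # dynamics depend on mass
        x += v * 0.1
        states.append(s); actions.append(a); rewards.append(-x ** 2)
    return np.array(states), np.array(actions), np.array(rewards)

def reinforce_step(w, paths, lr=1e-4):
    """Batch policy-gradient step pooled over trajectories from all models."""
    grad = np.zeros_like(w)
    for states, actions, rewards in paths:
        ret = rewards[::-1].cumsum()[::-1]         # reward-to-go
        ret = ret - ret.mean()                     # simple baseline
        # grad log N(a | w.s, SIGMA^2) = (a - w.s) / SIGMA^2 * s
        glp = ((actions - states @ w) / SIGMA ** 2)[:, None] * states
        grad += (glp * ret[:, None]).sum(axis=0)
    return w + lr * grad / len(paths)

w = np.zeros(2)
for it in range(200):
    # Key step: resample model parameters from the ensemble each iteration.
    paths = [rollout(w, mass=rng.uniform(0.5, 2.0)) for _ in range(10)]
    w = reinforce_step(w, paths)
```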
This project builds on top of OpenAI gym and rllab. You need to set those up first before proceeding. The code structure is as follows:

- The robustRL folder contains the base code.
- Copy the provided environment files to /path/to/rllab/rllab/envs/ and replace the existing file.
- Add the repo to your Python path with PYTHONPATH="/path/to/robustRL:$PYTHONPATH", or put that line in your ~/.bashrc file.

A training job is specified through job_data.txt and MDP_funcs.py.
In MDP_funcs.py, you need to write a function that generates the environment of your choice. Make sure it is compatible with OpenAI gym and that you have registered the environment with the gym modules. Also remember to add a corresponding function call within the generate_environment function in MDP_funcs.py.
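For concreteness, here is a hypothetical sketch of those two steps. The environment id, entry point, and the exact generate_environment signature are all placeholders; mirror whatever the existing entries in MDP_funcs.py look like.

```python
# Hypothetical example: names and signatures below are placeholders,
# not this repo's actual API. Mirror the existing entries in MDP_funcs.py.
import gym
from gym.envs.registration import register

# 1) Register your environment with gym (once, e.g. at module import time).
register(
    id='MyPointEnv-v0',                    # placeholder id
    entry_point='my_envs.point:PointEnv',  # placeholder module:Class
)

# 2) Add a matching branch inside generate_environment in MDP_funcs.py.
def generate_environment(env_name):
    if env_name == 'MyPointEnv-v0':
        return gym.make('MyPointEnv-v0')
    raise ValueError('Unknown environment: %s' % env_name)
```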
Edit the .theanorc file in your home directory to remove the GPU device set by default. Also uncomment the theano.sandbox.cuda.unuse() command in algos.py if you get a CUDA error.
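For reference, a minimal ~/.theanorc that keeps Theano on the CPU (standard Theano configuration keys) looks like:

```ini
[global]
device = cpu
floatX = float32
```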
Have a look at the example code to get an idea of how the different pieces can be combined for training. If you want to use just the training function without the wrappers, you can do this easily with a plain for loop, as sketched below.
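A hypothetical driver of that form (the function names and arguments are stand-ins for whatever the training function in algos.py actually expects):

```python
# Hypothetical loop: drive the training function directly, skipping the
# job wrappers. Substitute the real names/signatures from this repo.
env = generate_environment('MyPointEnv-v0')       # from MDP_funcs.py
policy = make_policy(env)                         # placeholder constructor
for itr in range(100):
    stats = train_step(env, policy, num_traj=20)  # one batch PG update
    print(itr, stats)
```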
Theano doesn't behave well with the multiprocessing modules. If you run python job_script.py, you should see a bunch of worker processes start up and then finish. Running it in the background using nohup sometimes affects the writing to nohup.out, and you may not see all processes start up together; instead, starts and finishes will alternate. If this happens, check that multiple processes are actually being spawned using the htop command in a terminal.
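If htop isn't available, `pgrep -af job_script` (a standard procps tool) will also list any spawned worker processes.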