A deep Q learning demonstration using Google Tensorflow
MIT License
Check out the new simpler, better performing and more complete implementation that we released at OpenAI:
https://github.com/openai/baselines
(scroll for docs of the obsolete version)
Check out the Karpathy game in the notebooks folder.
The image above depicts a strategy learned by the DeepQ controller. Available actions are accelerating up, down, left or right. The reward signal is +1 for the green fellas, -1 for red and -5 for orange.
future==0.15.2
euclid==0.1
inkscape (for animation gif creation)
tf_rl has controllers and simulators which can be pieced together using the simulate function.
Want to have some fun controlling the simulation by yourself? You got it!
Use tf_rl.controller.HumanController in your simulation.
To issue commands, run in a terminal:
python3 tf_rl/controller/human_controller.py
For it to work you also need to have a redis server running locally.
To write your own controller, define a controller class with 3 functions:
action(self, observation)
given an observation (usually a tensor of numbers), returns the action to perform.
store(self, observation, action, reward, newobservation)
called each time a transition from observation to newobservation is observed. The transition is a consequence of action and has an associated reward.
training_step(self)
if your controller requires training, this is the place to do it. It should not take too long, because it is called roughly once per action execution.
To write your own simulation, define a simulation class with the following functions:
observe(self)
returns the current observation.
collect_reward(self)
returns the reward accumulated since the last time the function was called.
perform_action(self, action)
updates the internal state to reflect the fact that action was executed.
step(self, dt)
updates the internal state as if dt of simulation time has passed.
to_html(self, info=[])
generates an HTML visualization of the game. info can optionally be passed and is a list of strings that should be displayed along with the visualization.
The simulate function accepts a save_path argument, which is a folder where all the consecutive images will be stored.
To make them into a GIF, use scripts/make_gif.sh PATH, where PATH is the same path you passed as the save_path argument.
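The controller and simulation interfaces described above can be sketched together as follows. This is only an illustration under stated assumptions, not code from tf_rl: RandomController, LineSimulation and run_loop are hypothetical names, the 1-D world and its reward are made up, and run_loop is a stand-in for the real simulate function, whose exact signature is not shown here.

```python
import random


class RandomController:
    """Hypothetical controller: picks a random action and never learns."""

    def __init__(self, num_actions, seed=0):
        self.num_actions = num_actions
        self.rng = random.Random(seed)
        self.transitions = []  # (observation, action, reward, newobservation)

    def action(self, observation):
        # Ignore the observation and return a random action index.
        return self.rng.randrange(self.num_actions)

    def store(self, observation, action, reward, newobservation):
        # Remember the transition; a learning controller would train on these.
        self.transitions.append((observation, action, reward, newobservation))

    def training_step(self):
        # Nothing to train here; keep this cheap, it runs after every action.
        pass


class LineSimulation:
    """Hypothetical 1-D world: a point moves on a line, the goal is at x = 1."""

    def __init__(self):
        self.x = 0.0
        self.velocity = 0.0
        self.pending_reward = 0.0

    def observe(self):
        return [self.x, self.velocity]

    def collect_reward(self):
        # Return the reward accumulated since the last call, then reset it.
        reward, self.pending_reward = self.pending_reward, 0.0
        return reward

    def perform_action(self, action):
        # Action 0 pushes left, any other action pushes right.
        self.velocity = -1.0 if action == 0 else 1.0

    def step(self, dt):
        self.x += self.velocity * dt
        # Made-up reward: negative distance to the goal, scaled by dt.
        self.pending_reward -= abs(1.0 - self.x) * dt

    def to_html(self, info=()):
        lines = ["x = %.2f" % self.x] + list(info)
        return "<pre>" + "\n".join(lines) + "</pre>"


def run_loop(simulation, controller, num_steps=10, dt=0.1):
    """Hypothetical driver loop; NOT the tf_rl simulate function."""
    observation = simulation.observe()
    for _ in range(num_steps):
        action = controller.action(observation)
        simulation.perform_action(action)
        simulation.step(dt)
        reward = simulation.collect_reward()
        new_observation = simulation.observe()
        controller.store(observation, action, reward, new_observation)
        controller.training_step()
        observation = new_observation


controller = RandomController(num_actions=2)
simulation = LineSimulation()
run_loop(simulation, controller, num_steps=10)
print(len(controller.transitions))  # one stored transition per step: 10
print(simulation.to_html(info=["random policy"]))
```

The sketch keeps training_step empty because a random policy has nothing to learn; a learning controller would sample from the stored transitions there. The info default is an empty tuple rather than a list only to avoid a mutable default argument.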