Initial implementation combining HTM and RL for a software agent in NUPIC.
AGPL-3.0 License
This repository contains the client (agent) side of a simple framework for a reinforcement learning agent using Numenta's HTM algorithms. (DEVELOPMENT STATUS)
The agent architecture is based on Ali Kaan Sungur's master's thesis (2017). It is slightly modified and reimplemented in NUPIC, Numenta's Platform for Intelligent Computing. The implementation and experimentation were part of a bachelor's thesis project, where further information can be found.
It can be built locally from source, or run from the Docker images, which are adapted for easy integration into cloud infrastructure. However, optimization for parallel training of agents is not yet implemented.
The framework makes use of OpenAI's Universe World of Bits environment. Unfortunately, that repository is now deprecated, but it is still compatible to date.
The client and remote are both running in docker containers and the agent connects via VNC to the environment. The environment runs in real time and sends an observation and reward to the client, which in turn processes the data and sends an action. More information can be found in their original blog post.
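The observe/act cycle described above can be sketched as follows. This is a minimal illustration, not the repository's client code: `StubEnv`, `choose_action` and `run_episode` are hypothetical names, the stub runs locally instead of over VNC, and the action tuple only mirrors the general shape of a VNC pointer event (kind, x, y, button mask).

```python
import random


class StubEnv(object):
    """Hypothetical local stand-in for the remote environment (assumption)."""

    def __init__(self, width=160, height=210, seed=0):
        self.width = width
        self.height = height
        self.rng = random.Random(seed)
        self.target = (0, 0)

    def reset(self):
        # Place a new click target and return it as the observation.
        self.target = (self.rng.randrange(self.width),
                       self.rng.randrange(self.height))
        return {"target": self.target}

    def step(self, action):
        # The action is shaped like a pointer event: (kind, x, y, buttonmask).
        _, x, y, buttonmask = action
        hit = buttonmask == 1 and (x, y) == self.target
        reward = 1.0 if hit else 0.0
        observation = self.reset() if hit else {"target": self.target}
        return observation, reward, hit


def choose_action(observation):
    # Placeholder policy: click exactly where the stub reports the target.
    x, y = observation["target"]
    return ("PointerEvent", x, y, 1)


def run_episode(env, steps=5):
    # Observe, act, collect the reward, repeat.
    observation = env.reset()
    total_reward = 0.0
    for _ in range(steps):
        action = choose_action(observation)
        observation, reward, _done = env.step(action)
        total_reward += reward
    return total_reward
```

In the real setup the observation would be screen pixels received over VNC and the policy would be the HTM network; the stub only shows the direction of the data flow.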
The observation and reward are encoded into an SDR (sparse distributed representation), the base unit of data in NUPIC.
For this purpose a UniverseEncoder was implemented, which applies the correct filtering on pixels and is integrated into PluggableUniverseSensor, a slightly modified version of Numenta's PluggableEncoderSensor.
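As a rough illustration of turning pixels into an SDR, the sketch below thresholds a grayscale frame into a flat binary vector. It is an assumption made for illustration only: the actual filtering applied by UniverseEncoder is not described here, and `encode_frame`, `active_indices` and the threshold value are hypothetical.

```python
def encode_frame(frame, threshold=128):
    """Flatten a grayscale frame (rows of 0-255 ints) into a binary list.

    A pixel at or above the threshold becomes an active bit (1).
    """
    return [1 if pixel >= threshold else 0 for row in frame for pixel in row]


def active_indices(sdr):
    # SDRs are often passed around as the indices of their active bits.
    return [index for index, bit in enumerate(sdr) if bit]


def sparsity(sdr):
    # Fraction of active bits in the representation.
    return float(sum(sdr)) / len(sdr)


frame = [
    [0, 200, 0],
    [255, 10, 130],
]
sdr = encode_frame(frame)
```

A real encoder would typically also crop or downsample the screen so that the resulting representation stays sparse.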
The framework can be set up either from source or using the existing Docker images. This makes it easy to run the agent on a local machine or migrate it into the cloud to run remotely.
The agent's architecture is explained further in the mentioned papers; only a quick overview is given here. With NUPIC's Network API, however, all layers can easily be interchanged or modified for experimentation.
The layers are all defined in network.py, where the network is created. Each layer consists of a pooling layer and a (customized) temporal memory implementation.
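The idea of a layer as a pooling stage followed by a temporal memory stage, with stages that can be swapped out, can be sketched in plain Python. This does not use the NUPIC Network API and is not how network.py is written: `Layer`, `run_network`, `keep_first_k` and `passthrough_memory` are hypothetical stand-ins, and the feedback (apical/basal) connections of the real architecture are left out.

```python
def keep_first_k(k):
    """Build a stub pooler that keeps the k lowest active indices."""
    def pooler(active):
        return set(sorted(active)[:k])
    return pooler


def passthrough_memory(active):
    # Stub temporal memory: forwards the active set unchanged.
    return set(active)


class Layer(object):
    """A pooling stage followed by a temporal memory stage."""

    def __init__(self, name, pooler, temporal_memory):
        self.name = name
        self.pooler = pooler
        self.temporal_memory = temporal_memory

    def compute(self, active):
        return self.temporal_memory(self.pooler(active))


def run_network(layers, active):
    # Feed the output of each layer into the next one.
    for layer in layers:
        active = layer.compute(active)
    return active


layers = [
    Layer("layer_a", keep_first_k(3), passthrough_memory),
    Layer("layer_b", keep_first_k(2), passthrough_memory),
]
output = run_network(layers, {9, 1, 4, 7})

# Swapping a layer only requires replacing one entry in the list.
layers[1] = Layer("layer_b", keep_first_k(1), passthrough_memory)
swapped_output = run_network(layers, {9, 1, 4, 7})
```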
A short list of the layer implementations:
- MySPRegion: Based on NUPIC SpatialPooler.
- SensoryIntegrationRegion: Based on HTM-Research ExtendedTemporalMemory, with the possibility to weight apical and basal connections differently. The underlying algorithm is in regions/algorithms/apical_weighted_temporal_memory.
- MyTemporalPoolerRegion: Based on HTM-Research UnionTemporalPooler, with linear decay. The underlying algorithm is in regions/algorithms/union_temporal_pooler.
- MyTMRegion: Based on HTM-Research ExtendedTemporalMemory, with basal and proximal connections.
- MySPRegion: Based on NUPIC SpatialPooler.
- AgentStateRegion: Based on HTM-Research ExtendedTemporalMemory, with basal, apical and proximal connections.
- MySPRegion: Based on NUPIC SpatialPooler.
- ReinforcementRegion: Based on HTM-Research ExtendedTemporalMemory, with TD-Error computation and other customizations.
- MotorRegion: Based on HTM-Research Apical_Distal_Temporal_Memory, which was almost completely rewritten in regions/algorithms/apical_distal_motor_memory. It contains the logic to calculate layer 5 voluntary active cells, excite/inhibit corresponding motor cells and map motor cells to the state activation they produced.
Many regions are almost identical to the original regions they are based on, so their documentation is largely retained. The ReinforcementRegion and MotorRegion are the most customized. The Motor layer in particular contains a lot of crucial functionality, as it calculates (1) the voluntary active cells from layer 5 and (2) the actual motor cells that are excited/inhibited and mapped to the state they produced.
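For readers unfamiliar with the TD-Error mentioned for the ReinforcementRegion, here is a minimal sketch of the textbook TD(0) error over a table of state values. It is only an assumption-laden illustration: the region computes its error over cell activity and may differ in detail, and `td_error`, `td_update`, the discount and the learning rate are hypothetical choices.

```python
def td_error(reward, value_current, value_next, discount=0.9):
    """Textbook TD(0) error: reward + discount * V(next) - V(current)."""
    return reward + discount * value_next - value_current


def td_update(values, state, next_state, reward,
              learning_rate=0.5, discount=0.9):
    """Move the value estimate of `state` toward the TD target in place."""
    current = values.get(state, 0.0)
    upcoming = values.get(next_state, 0.0)
    delta = td_error(reward, current, upcoming, discount)
    values[state] = current + learning_rate * delta
    return delta


values = {}
first_delta = td_update(values, "s0", "s1", reward=1.0)
second_delta = td_update(values, "s0", "s1", reward=1.0)
```

The error shrinks on the second update because the value estimate for the state has already moved toward the observed reward.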
Make sure the required packages (universe & gym) are installed and can be found by Python.
See DOCKER_README.md for more help.
The environment is based on OpenAI's mini-world-of-bits, an open-domain platform for web agents as described in their paper. It enables the agent, or an experiment observer, to connect via remote desktop control (VNC) and control the environment.
Experiments are written in plain JavaScript/HTML/CSS and can thus easily be modified or created by any curious researcher who wants to test the architecture with a new task.
Example experiments can be found in environments/app/universe-envs/world-of-bits/static/miniwob of the environment repository. The environment repository contains more information on how to create a customized experiment.
(Figure: an example experiment task from the paper.)
Add the universe and gym imports to your files.
See DOCKER_README.md for more help.
Deploy the Docker images on a cloud instance (Ubuntu image tested) and run them as described in the DOCKER_README.md install instructions.
(Figure: observing the experiments remotely via VNC from a phone.)
SparseMatrixConnections from NUPIC Core missing.
The implementations might vary slightly from the current official NUPIC versions (they are based on NUPIC 1.0.5dev0) and use previous versions of the HTM-Research/Core repositories.