This is a project for the IASD course on Monte-Carlo Search of M. Cazenave (https://www.lamsade.dauphine.fr/~cazenave/MonteCarloSearch.html). It introduces a general framework for game playing and agent policies for these games. Several games and policies have been implemented.
API documentation is available here.
This project uses Rust (nightly channel) and Python with tensorflow.
Install rustup and launch rustup default nightly to enable the nightly compiler.
Install tensorflow to enable PUCT/AlphaZero/MuZero policies.
Run pip install -r requirements.txt to install python dependencies (tensorflow is excluded from the list as either tensorflow or tensorflow-gpu works).
Cargo is the Rust project manager.
Use cargo run --release --bin <binary> to execute binaries. Available binaries are:
evaluate: evaluate two policies on breakthrough
ui: interactive interface to inspect AlphaZero
generate: self-play game generator
gym_server: decoupled game executor for OpenAI Gym
perf: benchmarking tests
evaluate, generate and ui all use a configuration file located in the config/ path. It is selected by the --config option.
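As a rough illustration of the invocation pattern above, the following sketch assembles the cargo command line for one of the listed binaries. The helper name cargo_command and its parameters are hypothetical and are not part of this project. It only assumes what is stated above: the binary names and the --config option.

```python
from typing import List, Optional, Sequence

# Binary names as listed in this README.
BINARIES = ("evaluate", "ui", "generate", "gym_server", "perf")


def cargo_command(
    binary: str,
    config: Optional[str] = None,
    extra: Sequence[str] = (),
) -> List[str]:
    """Build the argument list for `cargo run --release --bin <binary>`.

    Hypothetical helper: `config` is forwarded through --config and
    `extra` is appended unchanged after it.
    """
    if binary not in BINARIES:
        raise ValueError(f"unknown binary: {binary!r}")
    cmd = ["cargo", "run", "--release", "--bin", binary]
    args: List[str] = []
    if config is not None:
        args += ["--config", config]
    args += list(extra)
    if args:
        # Everything after `--` is passed to the binary, not to cargo.
        cmd += ["--"] + args
    return cmd
```

For example, cargo_command("ui", "breakthrough", ["--method", "alpha"]) yields the argument list for the UI command shown later in this README, with --config spelled out in full.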
To perform training, you need to launch both python and rust binaries:
Run python training.py --config breakthrough --method <alpha|mu> to execute the training loop and generate the network model.
Run cargo run --release --bin generate -- -c breakthrough -m <alpha|mu> to launch the self-play generator.
If everything works, no errors are reported and the number of generated games increases.
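Since the two processes have to run side by side, one possible way to start them from a single script is sketched below. This is not part of the project: the function names are hypothetical, and the sketch assumes it is run from the repository root, where training.py and the cargo workspace live.

```python
import subprocess
from typing import List, Tuple

# Methods accepted by both commands, as written in this README.
METHODS = ("alpha", "mu")


def _check_method(method: str) -> None:
    if method not in METHODS:
        raise ValueError(f"method must be one of {METHODS}, got {method!r}")


def training_command(config: str, method: str) -> List[str]:
    """Command line for the Python training loop."""
    _check_method(method)
    return ["python", "training.py", "--config", config, "--method", method]


def generator_command(config: str, method: str) -> List[str]:
    """Command line for the Rust self-play generator."""
    _check_method(method)
    return ["cargo", "run", "--release", "--bin", "generate",
            "--", "-c", config, "-m", method]


def launch(config: str, method: str) -> Tuple[subprocess.Popen, subprocess.Popen]:
    """Start both processes without waiting for either to finish."""
    trainer = subprocess.Popen(training_command(config, method))
    generator = subprocess.Popen(generator_command(config, method))
    return trainer, generator
```

A caller would typically keep both handles returned by launch("breakthrough", "alpha"), wait on the trainer, and terminate the generator once training stops.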
Debugging information can be activated using export RUST_DEBUG=info. Models are saved in data/<name>/model. Tensorboard logs are saved in data/<name>/logs and training data is saved in data/<name>/training_data.
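The output layout described above can be captured in a small helper, for instance to point TensorBoard at the right directory. This is only a sketch: run_paths is a hypothetical name, and the default root of data is taken from the paths quoted in this README.

```python
from pathlib import Path
from typing import Dict


def run_paths(name: str, root: Path = Path("data")) -> Dict[str, Path]:
    """Return the output directories for a run called `name`.

    Mirrors the layout described in this README:
    data/<name>/model, data/<name>/logs, data/<name>/training_data.
    """
    base = root / name
    return {
        "model": base / "model",
        "logs": base / "logs",
        "training_data": base / "training_data",
    }
```

The function only builds paths. It does not create the directories or check that they exist.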
Two policies can be tested using evaluate:
cargo run --release --bin evaluate -- -c breakthrough --policy <policy> --against <policy>
To launch the UI and visualize Alpha/Mu tree search live, use ui:
cargo run --release --bin ui -- -c breakthrough --method <alpha|mu>
(to avoid tensorflow logs, use export TF_CPP_MIN_LOG_LEVEL=2)
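When the binaries are started from a script rather than a shell, the same environment variables can be set on the child process instead of exported. The sketch below assumes nothing beyond the two variables named in this README. The helper name run_env is hypothetical.

```python
import os
from typing import Dict, Mapping, Optional


def run_env(
    base: Optional[Mapping[str, str]] = None,
    quiet_tensorflow: bool = True,
    rust_debug: Optional[str] = None,
) -> Dict[str, str]:
    """Return a copy of the environment with the README's variables set.

    `base` defaults to the current process environment. The mapping
    passed in is never modified.
    """
    env = dict(os.environ if base is None else base)
    if quiet_tensorflow:
        # Same effect as `export TF_CPP_MIN_LOG_LEVEL=2`.
        env["TF_CPP_MIN_LOG_LEVEL"] = "2"
    if rust_debug is not None:
        # Same effect as `export RUST_DEBUG=<level>`.
        env["RUST_DEBUG"] = rust_debug
    return env
```

The resulting dictionary can be passed as the env argument of subprocess.run or subprocess.Popen.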
It's possible to reproduce DeepMind's AlphaZero results on toy games.
I haven't been able to obtain satisfactory results with MuZero, even on toy games, but the implementation is here.