Sable OthelloZero

This is the repo for Sable, a very strong Othello engine, and OthelloZero, the high-performance Reinforcement Learning (RL) framework used to train it. This project was developed for the CS 234 Final Project at Stanford University.

Overview

In this project, we treat AlphaZero replication as a systems architecture problem rather than a learning problem. The resulting system features a highly optimized search-and-evaluation pipeline capable of self-play at scale, along with a very strong engine trained over a week of self-play. We hope this code is helpful to other Othello engine developers.

Results

After a week of self-play (~1.2 billion positions, ~500 billion rollouts), Sable is quite strong and, at most settings, superhuman. We benchmark it against Edax: at ~400k nodes per move, Sable is roughly equal in strength to Edax at depth 21, and at 100 to 1000 rollouts per move it is comparable to Edax at depths 5-11.
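As a back-of-the-envelope check on those totals, the short calculation below derives the average search budget per self-play position and the fleet-wide generation rate. It assumes exactly seven days of wall-clock time and that every rollout belongs to a recorded position; neither is stated precisely above, so treat the outputs as rough orders of magnitude.

```python
# Approximate figures quoted in the Results text.
positions = 1.2e9                    # self-play positions generated
rollouts = 500e9                     # total rollouts performed
seconds_per_week = 7 * 24 * 3600     # assumption: exactly one week of wall-clock time

# Average number of rollouts spent on each recorded position.
rollouts_per_position = rollouts / positions

# Average rate at which the whole fleet produced positions.
positions_per_second = positions / seconds_per_week

print(f"avg rollouts per position: {rollouts_per_position:.0f}")   # 417
print(f"positions per second:      {positions_per_second:.0f}")    # 1984
```

An average of roughly 400 rollouts per position sits inside the 100 to 1000 rollout range used for the low-budget Edax comparison.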

Usage

Training

To facilitate large-scale reinforcement learning, we utilize a decoupled Hub-and-Spoke architecture leveraging Google Cloud Storage (GCS) as a high-throughput message bus between the training head and worker nodes.

The system coordinates state through two primary buckets:

Model Bucket (in our case, gs://othello-models): Acts as the source of models, executables, and scripts for the fleet.
  Hot-Swapping: Stores the C++ selfplay binaries and ONNX Runtime libraries, allowing for logic updates without rebuilding VM images.
  Weight Registry: Houses latest_model.onnx and a /history/ directory for "Boss Mode" (League Play).
  Dynamic Orchestration: Workers pull the worker.sh control script every loop, enabling real-time global adjustments to search parameters (rollouts, noise, temperature). Simply modify the script and reupload to the bucket.
Data Bucket (in our case, gs://othello-data): A write-heavy sink for experience collection.
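The two-bucket flow described above can be sketched as one worker iteration. This is a minimal illustration, not the repo's actual worker.sh or setup_gcp.sh logic: the function name worker_iteration, the games_file argument, and the object layout inside the data bucket are hypothetical, and the sketch assumes the gsutil CLI is installed and authenticated on the worker.

```python
import subprocess
from typing import Callable, Sequence

MODEL_BUCKET = "gs://othello-models"   # bucket name from the text above
DATA_BUCKET = "gs://othello-data"      # bucket name from the text above

Runner = Callable[[Sequence[str]], None]


def _run(cmd: Sequence[str]) -> None:
    """Default runner: execute the command and raise if it fails."""
    subprocess.run(list(cmd), check=True)


def worker_iteration(games_file: str, run: Runner = _run) -> None:
    """One pass of a hypothetical worker loop: sync, self-play, upload."""
    # Re-pull the control script and the newest weights on every pass, so
    # anything re-uploaded to the model bucket takes effect on the next loop.
    run(["gsutil", "cp", f"{MODEL_BUCKET}/worker.sh", "worker.sh"])
    run(["gsutil", "cp", f"{MODEL_BUCKET}/latest_model.onnx", "latest_model.onnx"])
    # Assumption: the control script runs self-play and writes games_file.
    run(["bash", "worker.sh"])
    # Push the generated experience to the data bucket (hypothetical layout).
    run(["gsutil", "cp", games_file, f"{DATA_BUCKET}/{games_file}"])
```

Passing a custom run callable makes it possible to log or dry-run the commands without touching GCS, for example worker_iteration("games_0001.bin", run=print).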

VM instances are created with setup_gcp.sh as their startup automation. For convenience, you can upload setup_gcp.sh to a bucket and just provide its URL. As soon as the instance group starts, it will automatically begin self-play.

We use the train_gcp.py script as the main interface to the self-play data. To begin training, run python train_gcp.py.

Dependencies

The engine requires the C++ ONNX Runtime to be installed in the external directory (see CMakeLists.txt); Python dependencies are listed in requirements.txt. For GPU support, you also need the CUDA and cuDNN versions appropriate for your system. To host training on Google Cloud, you also need to install the Google Cloud SDK and set up your GCP credentials.

Engine

The engine executable provides access to the Sable engine via an interface similar to (but lacking most of the features of) UCI, a well-documented interface used by most chess engines. It supports only one way of playing a position:

setposition startpos [moves] [f5 d6 ...]
go nodes 1000

The position is set as the start position, plus every move in the game up to the current point, including passes. The go command can control the number of rollouts performed. Threads and GPU batch size can be set with:

setoption name Threads value {threads}
setoption name BatchSize value {batch_size}
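The commands above can be assembled programmatically. The helper below is a small sketch that only builds the text lines; build_commands is a hypothetical name, not part of the repo. It assumes that moves is a literal keyword as in UCI and that options may be sent before the position. How a pass is encoded is not stated here, so the caller must supply whatever token the engine expects.

```python
from typing import List, Optional, Sequence


def build_commands(moves: Sequence[str], nodes: int,
                   threads: Optional[int] = None,
                   batch_size: Optional[int] = None) -> List[str]:
    """Return the protocol lines for searching the position reached by `moves`."""
    lines: List[str] = []
    if threads is not None:
        lines.append(f"setoption name Threads value {threads}")
    if batch_size is not None:
        lines.append(f"setoption name BatchSize value {batch_size}")
    position = "setposition startpos"
    if moves:
        # Assumption: `moves` is a literal keyword, mirroring UCI.
        position += " moves " + " ".join(moves)
    lines.append(position)
    lines.append(f"go nodes {nodes}")
    return lines


for line in build_commands(["f5", "d6"], nodes=1000, threads=4, batch_size=64):
    print(line)
```

The resulting lines can be written to the engine's standard input, for example through subprocess.Popen. The reply format is not documented here, so the sketch does not attempt to parse output.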

We also provide gui.py, a PyGame interface for playing against the engine on CPU; it does not support heavy search on the GPU.

The latest model checkpoint from our training runs is included in latest_model.onnx.

Notes

For completeness, we include vs_edax.py, which is called within the evaluation processes of train_gcp.py in order to run diagnostic checks against low-depth Edax. However, Edax (as far as we know) does not natively support depth limiting, so we needed to modify the source code to do so. Thus, by default vs_edax.py will probably not work as intended.

As a research project, we do not seriously optimize Sable for real game conditions, e.g. Sable uses no transposition tables or other tricks for pruning its tree. Although we support virtual visits for parallel MCTS, this system was not tuned. Sable cannot currently search particularly fast, maxing out at about 50k nodes per second in our tests. As such it is not ready out-of-the-box for competitive time-per-move conditions (and does not support timing logic in its interface either).
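For readers unfamiliar with virtual visits, the sketch below shows one common formulation: a rollout that is still in flight counts as a visit with value 0, which lowers the node's score so concurrent threads spread across different children. This is an illustration only, not taken from Sable's source. The Node fields, the PUCT constant, and the value-0 convention are all assumptions.

```python
import math
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class Node:
    prior: float
    visits: int = 0          # completed rollouts through this node
    value_sum: float = 0.0   # sum of backed-up values
    virtual: int = 0         # in-flight rollouts not yet backed up
    children: Dict[str, "Node"] = field(default_factory=dict)


def puct(parent_total: int, child: Node, c_puct: float = 1.5) -> float:
    """PUCT score in which in-flight rollouts count as value-0 visits."""
    n = child.visits + child.virtual
    q = child.value_sum / n if n else 0.0
    u = c_puct * child.prior * math.sqrt(parent_total + 1) / (1 + n)
    return q + u


def select(parent: Node) -> str:
    """Pick the best move and mark it in-flight so other threads diverge."""
    total = sum(c.visits + c.virtual for c in parent.children.values())
    move = max(parent.children, key=lambda m: puct(total, parent.children[m]))
    parent.children[move].virtual += 1
    return move


def backup(child: Node, value: float) -> None:
    """Replace the virtual visit with the real evaluation result."""
    child.virtual -= 1
    child.visits += 1
    child.value_sum += value


root = Node(prior=1.0, children={"d3": Node(prior=0.6), "c4": Node(prior=0.4)})
first = select(root)    # highest prior wins on an empty tree
second = select(root)   # the pending visit on the first move steers this one away
print(first, second)    # d3 c4
```

Without the virtual increment, both calls to select would return the same move, and two threads would evaluate the same leaf. In a multithreaded engine the counter updates would also need to be atomic or lock-protected, which this single-threaded sketch omits.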

Citation

If you use this engine or system in your research, please cite:

@misc{tseng2026othello,
  author = {Tseng, Jonathan},
  title = {Self-play for Training 8x8 Othello Agents: Reinforcement Learning as a Systems Engineering Problem at Scale},
  year = {2026},
  publisher = {Stanford University},
  note = {CS 234 Final Project}
}
