A Proximal Policy Optimization (PPO) agent trained to land a rocket using thrust vector controls. The gym environment is built in Rust using macroquad and rapier2d and is inspired by the LunarLander task. The landing task has been modified to have continuous control instead of a discrete action space and to be a bit simpler (fixed landing zone, flat land).
A live demo of the PPO agent is available here. The demo is a WASM build of the `controlled_sim` binary. The model is still a work in progress (the task is a lot harder than I initially expected).
- A physics simulator of the lander and thrust vector controls using `rapier2d` via Rust.
- FFI Python bindings from the Rust simulation using `pyo3` to create a gym module for training.
- PPO reinforcement learning agent trained using Python and PyTorch.
- Curriculum learning for training the PPO agent.
- Interactive simulation with random start positions and mouse click-and-drag to reposition the rocket.
- Model inference in Rust (for the controlled simulation) using the ONNX runtime.
- WASM build for running the simulation in the browser.
The Proximal Policy Optimization (PPO) model is trained to control the rocket based on the following observation and action spaces:
The observation space is a 6-dimensional vector (since the sim is only in 2D) containing the following in order:
- $x, y$: The x and y coordinates of the rocket in the world.
- $\theta$: The angle of the rocket from the vertical.
- $v_x, v_y$: The linear velocity of the rocket in the x and y directions.
- $\omega$: The angular velocity of the rocket.
The observations are roughly normalized to the range $[-1, 1]$.
The action space is a 2-dimensional vector representing standard thrust vector controls:
- $F_{\text{thrust}}$: The amount of thrust to apply, normalized to the range $[-1, 1]$.
- $\theta_{\text{gimbal}}$: The angle of the gimbal, normalized to the range $[-1, 1]$.
See `base/src/constants.rs` for the true min and max values of the action and observation spaces.
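For concreteness, a single observation and action might look like the arrays below. The values are made up for illustration; the exact ordering and scaling are defined in `base/src/constants.rs`.

```python
import numpy as np

# Illustrative values only; see base/src/constants.rs for the real ranges.
obs = np.array([
    0.10,   # x position
    0.80,   # y position
    0.05,   # theta: angle from the vertical
    0.00,   # v_x: horizontal velocity
   -0.30,   # v_y: vertical velocity
    0.01,   # omega: angular velocity
], dtype=np.float32)

action = np.array([
    0.50,   # F_thrust, normalized to [-1, 1]
   -0.20,   # theta_gimbal, normalized to [-1, 1]
], dtype=np.float32)
```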
The project is divided into three main parts:
- `base`: A Rust crate that contains the simulation (including the game engine, physics, and rendering).
- `gym`: A Rust crate that provides a Python binding to the simulation for a "gym" style reinforcement learning environment.
- `python`: Source code for training the PPO agent using PyTorch.
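To give a rough sense of how the `gym` crate's binding is consumed from Python during training, a hypothetical episode loop is sketched below. The module, class, and method names are assumptions made purely for illustration; the actual interface is defined in `gym/src/lib.rs`.

```python
import numpy as np

# Hypothetical sketch: module, class, and method names are assumed, not the
# real binding (see gym/src/lib.rs for the actual interface).
from tvc_gym import LanderEnv

env = LanderEnv()
obs = env.reset()                         # 6-dim observation vector
done = False
while not done:
    action = np.array([0.5, 0.0])         # [F_thrust, theta_gimbal], both in [-1, 1]
    obs, reward, done = env.step(action)  # assumed gym-style step signature
```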
- Clone the repository:

  ```bash
  git clone https://github.com/akkshay0107/tvc-lander.git
  cd tvc-lander
  ```

- Set up the Python venv with Poetry and install dependencies:

  ```bash
  cd python
  poetry install
  ```
The PPO agent is trained using curriculum learning. The training is divided into stages, where each stage increases the difficulty of the landing task. The curriculum is defined in `python/src/ppo.py`.
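As a rough illustration of how such a curriculum might be structured, the sketch below uses invented stage parameters; the real schedule lives in `python/src/ppo.py`, and the start-state sampling it drives is implemented by the `_sample` function in `gym/src/lib.rs`.

```python
# Hypothetical curriculum; the actual stages and parameters live in
# python/src/ppo.py (training) and gym/src/lib.rs (_sample).
CURRICULUM = [
    # Each stage widens the range of initial conditions the agent must handle.
    {"max_start_height": 0.3, "max_start_speed": 0.1, "max_start_angle": 0.05},
    {"max_start_height": 0.6, "max_start_speed": 0.3, "max_start_angle": 0.15},
    {"max_start_height": 1.0, "max_start_speed": 0.5, "max_start_angle": 0.30},
]

for stage_idx, stage in enumerate(CURRICULUM):
    # Placeholder for running PPO updates with the environment's start-state
    # sampling restricted to this stage's ranges.
    print(f"Training stage {stage_idx}: {stage}")
```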
To train the PPO agent, run the following commands from the `python` directory:

```bash
poetry run maturin develop  # builds and installs the gym crate as a wheel in the venv
poetry run python ./src/ppo.py
```

The trained model will be saved to `python/models/policy_net.pth`.
Additionally, to run test episodes using the trained model, run the following command:

```bash
poetry run python ./tests/ppo_test_episodes.py
```

To run the simulation with the PPO agent providing controls, you first need to export the trained model to ONNX format. From the `python` directory, run:
```bash
poetry run python ./utils/export_to_onnx.py
```
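For reference, the export is conceptually a standard `torch.onnx.export` call followed by an optional sanity check with `onnxruntime`. The sketch below is illustrative only: it assumes `PolicyNet` can be constructed without arguments and maps a 6-dimensional observation to a 2-dimensional action, and the output path is made up; the actual logic is in `python/utils/export_to_onnx.py`.

```python
import numpy as np
import onnxruntime as ort
import torch

from src.ppo import PolicyNet  # defined in python/src/ppo.py

# Illustrative export; constructor arguments, paths, and output shape are assumptions.
policy = PolicyNet()
policy.load_state_dict(torch.load("models/policy_net.pth", map_location="cpu"))
policy.eval()

dummy_obs = torch.zeros(1, 6)  # batch of one 6-dimensional observation
torch.onnx.export(
    policy,
    dummy_obs,
    "models/policy_net.onnx",
    input_names=["observation"],
    output_names=["action"],
)

# Quick sanity check that the exported graph produces a 2-dimensional action.
session = ort.InferenceSession("models/policy_net.onnx")
(action,) = session.run(None, {"observation": np.zeros((1, 6), dtype=np.float32)})
print(action.shape)  # expected: (1, 2)
```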
Then, run the simulation with the following command from the project root:

```bash
cargo run --bin controlled_sim --release
```

In the simulation, the rocket starts at a random position and the model attempts to land it safely. The rocket can then be clicked and dragged to different locations on the screen to see how the model reacts when the rocket is dropped from that location.
The simulation can also be run with keyboard inputs (instead of the model providing controls). For this, run the following from the project root:

```bash
cargo run --bin base --release --features="keyinput"
```

To build the WASM version of the simulation (needs the `wasm-bindgen` CLI), run the following command from the project root:
```bash
./build_wasm.sh
```

This will create a `dist` directory with the WASM build. You can then serve the `dist` directory using a local web server (for example, `python -m http.server`).
The WASM build is automatically deployed to GitHub Pages on every push to the `main` branch. The deployment workflow is defined in `.github/workflows/deploy.yml`.
The code is designed to be modular, but if you'd like to experiment with your own models, you'll need to modify the following files:
- `python/src/ppo.py`: Modify the model parameters and training process here.
- `python/utils/export_to_onnx.py`: This is currently hardcoded to accept the `PolicyNet` class, but can be adapted for other models.
- `base/src/bin/controlled_sim.rs`: This loads the model from a hardcoded path and applies post-processing (clamping); it will need to be updated to reflect a new model.
Unlike other gym interfaces, you also have more leeway over what goes into the model's input, since the code in the gym module can be modified directly:
- `gym/src/lib.rs`: You can modify the `calculate_reward` function to change the rewards for the landing task. Additionally, the `_sample` function can be modified to implement different curriculums for curriculum training.
- Achieve parity with the OpenAI Gym interface (standardize the API, add a `render` function).
- Improve the PPO agent's performance and deploy a more robust solution to the web demo.
