Release Blog  |  Hugging Face Model  |  Deployment (via UI-TARS)  |  Running on your own computer (via UI-TARS Desktop)
GLADOS-1 is the first computer-use agent (CUA) model post-trained on collective, crowd-sourced trajectories from the PANGO dataset.
Heavily inspired by the Qwen-2VL-Finetune repository, this project provides a framework for training vision-language models on GUI interaction data. While this repository provides sample code for post-training ByteDance Seed's UI-TARS-7B-SFT, it can be trivially adapted to any model based on the Qwen2-VL architecture.
The PANGO (Productivity Applications with Natural GUI Observations and trajectories) dataset contains real user interactions with web interfaces, converted into training conversations for multimodal models.
Each session in the PANGO dataset contains:
- Screenshots: GUI state images at different timestamps
- Actions: User interactions (clicks, drags, typing, etc.)
- Metadata: Session IDs, timestamps, and other inputs
The dataset supports various GUI interaction types:
Supported Actions:
- `click` - Single left mouse clicks
- `left_double` - Double left mouse clicks
- `right_single` - Right mouse clicks
- `drag` - Mouse drag operations (converted from `drag_start`/`drag_end` pairs)
- `key_press` - Keyboard key presses
- `input` - Text input actions
- `scroll` - Scroll wheel actions
Ignored Actions:
- `mouseover_start`/`mouseover_end` - Mouse hover events
- `drag_start`/`drag_end` - Individual drag events (converted to a single `drag`)
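Since `drag_start`/`drag_end` arrive as separate events but train as a single `drag`, a converter has to pair them up while dropping hover events. A minimal sketch of that merge (event field names are assumptions):

```python
# Sketch: collapse drag_start/drag_end event pairs into single drag actions
# and drop mouseover events. Field names are assumptions, not the PANGO schema.
def merge_drags(events: list[dict]) -> list[dict]:
    merged, pending_start = [], None
    for ev in events:
        if ev["type"] == "drag_start":
            pending_start = ev
        elif ev["type"] == "drag_end" and pending_start is not None:
            merged.append({
                "type": "drag",
                "start": (pending_start["x"], pending_start["y"]),
                "end": (ev["x"], ev["y"]),
            })
            pending_start = None
        elif ev["type"] not in ("mouseover_start", "mouseover_end"):
            merged.append(ev)  # pass through all other action types unchanged
    return merged

events = [
    {"type": "drag_start", "x": 10, "y": 20},
    {"type": "drag_end", "x": 110, "y": 220},
    {"type": "click", "x": 5, "y": 5},
]
print(merge_drags(events))
```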
Converters transform raw PANGO data into training conversations. Each converter implements a specific training purpose:

Grounding:
- Input: Single screenshot and instruction
- Output: Action prediction
- Use Case: Instruction-following GUI automation

State Transition:
- Input: Before and after screenshots
- Output: Action prediction
- Use Case: Reverse engineering user interactions

Multi-Turn Conversation:
- Input: Conversational history containing screenshots and actions
- Output: Action prediction
- Use Case: Multi-turn conversation training
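Whatever the converter, the output is a list of conversation frames with a `loss_mask` flag (described in the `generate_conversation` docstring later in this README). A hypothetical grounding-style sample might look like the following; the exact prompt and action string formats are assumptions, not the repo's actual templates:

```python
# Hypothetical grounding-style training sample. The prompt wording and the
# "click(x, y)" action format are illustrative assumptions.
conversation = [
    {
        "role": "user",
        "content": [
            {"type": "image", "image": "frames/000.png"},
            {"type": "text", "text": "Click the Submit button."},
        ],
        "loss_mask": 0,  # prompt tokens: excluded from the loss
    },
    {
        "role": "assistant",
        "content": [{"type": "text", "text": "click(512, 730)"}],
        "loss_mask": 1,  # target action: trained on
    },
]
```

Masking the user turn with `loss_mask = 0` means the model is only penalized for the action it predicts, not for reproducing the prompt.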
```bash
# Install uv package manager
brew install uv

# Install dependencies
make install

# Train with grounding dataset
make train

# Train with state transition dataset
make train_state_transition
```

During setup, the image_downloader script will download all images to the STORAGE_DIR directory. Estimated storage requirements are 15 GB for the pango-sample dataset and 265 GB for the full pango dataset. Note: the image downloader script has a hardcoded 50 GB storage buffer; adjust it and rebuild if this is an issue.
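To avoid a failed download partway through, it can help to check free space against the dataset size plus the buffer up front. A sketch (the constants mirror the estimates above; the environment variable lookup is an assumption):

```python
import os
import shutil

# Sketch: verify STORAGE_DIR has room before downloading images.
# REQUIRED_GB mirrors the estimates above; BUFFER_GB mirrors the
# downloader's hardcoded 50 GB buffer.
REQUIRED_GB = 15   # pango-sample; use 265 for the full pango dataset
BUFFER_GB = 50

def has_room(storage_dir: str) -> bool:
    free_gb = shutil.disk_usage(storage_dir).free / 1e9
    return free_gb >= REQUIRED_GB + BUFFER_GB

print(has_room(os.environ.get("STORAGE_DIR", ".")))
```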
To create a new converter:
- Inherit from `BasePangoConverter`:

```python
from code.converters.base_pango_converter import BasePangoConverter

class MyConverter(BasePangoConverter):
    def __init__(self, dataset_path: str, prompt: str, **kwargs):
        super().__init__(dataset_path, actions_to_ignore=[...], **kwargs)
        self.prompt = prompt
```

- Implement required methods:
```python
def generate_conversation(self, *args, **kwargs) -> list:
    """Convert actions to training conversation format."""
    # Return a list of conversation frames with:
    # - role: "user" or "assistant"
    # - content: text/image content
    # - loss_mask: 0 (ignore) or 1 (train on)
    pass

def generate_indices(self, n: int, pct_train: float) -> tuple[list, list]:
    """Generate train/test indices for the dataset."""
    # Return (train_indices, test_indices)
    pass
```

- Add action handling (as needed):
```python
def _handle_custom_action(self, action: dict, original_dims, scaled_dims):
    """Handle new action types."""
    # Convert a pango action into the model's action format
    return action_content
```

- Create corresponding dataset class:
```python
from torch.utils.data import Dataset

class MyDataset(Dataset):
    def __init__(self, indices: list[int], converter: MyConverter, processor):
        self.indices = indices
        self.converter = converter
        self.processor = processor

    def __getitem__(self, idx):
        # Convert an index into a training sample
        return {"input_ids": ..., "attention_mask": ..., "labels": ...}
```

Key implementation details:

- Coordinate Scaling: Actions use standardized coordinates (0-1000 range)
- Image Processing: Screenshots are resized and processed using `fetch_image` from `qwen-vl-utils`
- Error Handling: Use `_handle_error()` and `_handle_malformatted_action()` for graceful failures
- Lazy Loading: Images are loaded on demand during training by the `__getitem__` method on the dataset class
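The 0-1000 coordinate convention amounts to a simple rescale from screenshot pixels. A sketch of the round trip (the repo's actual scaling helpers may differ):

```python
# Sketch: map pixel coordinates into the standardized 0-1000 range and back.
# The repo's actual scaling helpers may differ in naming and rounding.
def to_model_coords(x: int, y: int, width: int, height: int) -> tuple[int, int]:
    return round(x / width * 1000), round(y / height * 1000)

def to_pixel_coords(mx: int, my: int, width: int, height: int) -> tuple[int, int]:
    return round(mx / 1000 * width), round(my / 1000 * height)

# A click at (960, 540) on a 1920x1080 screenshot lands at the center:
print(to_model_coords(960, 540, 1920, 1080))  # -> (500, 500)
```

Standardizing coordinates this way makes action predictions independent of the original screenshot resolution.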
```
code/
├── converters/     # Data conversion logic
├── datasets/       # PyTorch dataset implementations
├── training/       # Training scripts and utilities
├── train.py        # Main training entry point
├── utils.py        # Utility functions
├── consts.py       # Constants
├── exceptions.py   # Custom exceptions
└── tests/          # Test files
```
```bibtex
@misc{chakralabs2025glados-1,
  author = {Chakra Labs},
  title = {GLADOS-1},
  url = {https://github.com/Chakra-Network/GLADOS-1},
  year = {2025}
}
```