@hwilner hwilner commented Nov 2, 2025

This PR implements Implicit Q-Learning (IQL), an offline reinforcement learning algorithm, addressing issue #329.

Overview

IQL learns from a fixed dataset without any online interaction. Unlike many other offline RL methods, it never evaluates the Q-function on actions outside the dataset during training, which helps avoid the value overestimation caused by distributional shift.
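
To make the contrast concrete, the bootstrapping targets can be written side by side. The notation follows the paper cited under References and is an assumption here, not something taken from the code in this PR. A standard Q-learning target maximizes over next actions, which may lie outside the dataset, while the IQL target only needs a state-value estimate:

```math
y_{\text{Q-learning}} = r + \gamma \max_{a'} Q_{\hat{\theta}}(s', a'),
\qquad
y_{\text{IQL}} = r + \gamma V_{\psi}(s')
```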

Implementation

This implementation includes:

  • IQL Learner: Core training logic with expectile regression for value function, TD learning for Q-function, and advantage-weighted regression for policy extraction
  • IQL Networks: Policy, Q-function, and value function networks
  • IQL Builder: Constructs the IQL agent following Acme's builder pattern
  • IQL Config: Hyperparameter configuration
  • Example Script: run_iql_jax.py for training on D4RL datasets
  • Unit Tests: agent_test.py for component verification
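
As a rough illustration of what a hyperparameter configuration for IQL might hold, here is a minimal sketch. The class name `IQLConfigSketch` and every field name are hypothetical and are not taken from this PR. The defaults are values commonly reported for IQL on D4RL locomotion tasks and may differ from those used here.

```python
import dataclasses


@dataclasses.dataclass
class IQLConfigSketch:
    """Hypothetical hyperparameter container; field names are illustrative."""

    batch_size: int = 256
    learning_rate: float = 3e-4
    discount: float = 0.99
    expectile: float = 0.7  # tau in the value-function expectile loss
    temperature: float = 3.0  # beta in advantage-weighted regression
    target_update_rate: float = 0.005  # Polyak averaging for the target Q

    def __post_init__(self):
        # Reject settings that make the losses ill-defined.
        if not 0.0 < self.expectile < 1.0:
            raise ValueError("expectile must lie strictly between 0 and 1")
        if self.temperature < 0.0:
            raise ValueError("temperature must be non-negative")
```

Setting `expectile` to 0.5 would reduce the value loss to an ordinary mean-squared error, so values above 0.5 are what push V towards the better actions in the dataset.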

Algorithm Details

IQL uses three key components:

  1. Value Function (V): Trained with expectile regression to estimate state values as an upper expectile of Q-values
  2. Q-Function: Trained with a standard TD loss that bootstraps from the value function at the next state instead of a maximum over next actions
  3. Policy: Trained with advantage-weighted regression to maximize Q-values while staying close to the data distribution
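
The three components above can be sketched as plain JAX loss functions. This is a minimal sketch based on the cited paper, not the code in this PR: the function names, argument names, and the `max_weight` clipping constant are assumptions, and network forward passes and target-network updates are left out. The `discount` argument is assumed to be a per-transition value that is zero at terminal steps.

```python
import jax
import jax.numpy as jnp


def expectile_loss(diff, expectile=0.7):
    """Asymmetric squared loss: |expectile - 1(diff < 0)| * diff**2."""
    weight = jnp.where(diff > 0, expectile, 1.0 - expectile)
    return weight * diff ** 2


def value_loss(q, v, expectile=0.7):
    # V regresses towards an upper expectile of Q(s, a) over dataset actions.
    # q would typically come from a target Q-network, so no gradient flows to it.
    return jnp.mean(expectile_loss(jax.lax.stop_gradient(q) - v, expectile))


def q_loss(q, reward, discount, next_v):
    # TD target bootstraps from V(s'), so no next action is ever queried.
    target = reward + discount * jax.lax.stop_gradient(next_v)
    return jnp.mean((q - target) ** 2)


def policy_loss(log_prob, q, v, temperature=3.0, max_weight=100.0):
    # Advantage-weighted regression: dataset actions with a higher advantage
    # get a larger weight on their log-likelihood.
    advantage = jax.lax.stop_gradient(q - v)
    weight = jnp.minimum(jnp.exp(temperature * advantage), max_weight)
    return -jnp.mean(weight * log_prob)
```

All inputs are batched arrays of shape `[batch]`, and `log_prob` is the policy's log-probability of the dataset action. Clipping the exponentiated advantage is a common stabilization trick and is an assumption here.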

Code Quality

  • Follows Acme's established patterns (modeled after CQL agent)
  • Google-style docstrings throughout
  • Professional, academic writing style
  • 1,014 lines of well-documented code
  • Comprehensive README with usage examples

Testing

Unit tests verify:

  • Network creation
  • Config initialization
  • Builder construction
  • Learner creation and training steps
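
For a sense of what a training-step check involves, here is a self-contained smoke test in JAX. It is a sketch only and does not use the learner or networks from this PR: the linear value function, `loss_fn`, `sgd_step`, and `smoke_test_training_step` are hypothetical stand-ins. It checks that one gradient step on an expectile loss gives a finite loss and actually changes the parameters.

```python
import jax
import jax.numpy as jnp


def loss_fn(params, obs, q_values, expectile=0.7):
    # Hypothetical linear value function V(s) = s @ w + b.
    v = obs @ params["w"] + params["b"]
    diff = q_values - v
    weight = jnp.where(diff > 0, expectile, 1.0 - expectile)
    return jnp.mean(weight * diff ** 2)


@jax.jit
def sgd_step(params, obs, q_values, lr=0.1):
    # One plain SGD update; returns the loss evaluated before the update.
    loss, grads = jax.value_and_grad(loss_fn)(params, obs, q_values)
    new_params = jax.tree_util.tree_map(lambda p, g: p - lr * g, params, grads)
    return new_params, loss


def smoke_test_training_step():
    obs = jax.random.normal(jax.random.PRNGKey(0), (8, 3))
    q_values = jnp.ones((8,))
    params = {"w": jnp.zeros((3,)), "b": jnp.zeros(())}
    new_params, loss = sgd_step(params, obs, q_values)
    assert jnp.isfinite(loss)
    assert not jnp.allclose(new_params["b"], params["b"])
    return float(loss)
```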

References

Kostrikov, I., Nair, A., & Levine, S. (2021). Offline Reinforcement Learning with Implicit Q-Learning. arXiv preprint arXiv:2110.06169. https://arxiv.org/abs/2110.06169

Fixes #329

Implements IQL offline RL algorithm with:
- Expectile regression for value function
- TD learning for Q-function
- Advantage-weighted regression for policy
- Complete learner, builder, and networks
- Comprehensive documentation

Addresses issue google-deepmind#329
- Example script for running IQL on D4RL datasets
- Unit tests for IQL components
- Follows CQL example pattern
Development

Successfully merging this pull request may close these issues.

[Feature] Implicit Q Learning