
[AAAI 2026] MolSight: Optical Chemical Structure Recognition with SMILES Pretraining, Multi-Granularity Learning and Reinforcement Learning


hustvl/MolSight



AAAI 2026 Accepted Paper

Wenrui Zhang¹ · Xinggang Wang¹ · Bin Feng¹ · Wenyu Liu¹

¹School of Electronic Information and Communications, Huazhong University of Science and Technology


📖 Introduction

MolSight is a comprehensive learning framework for Optical Chemical Structure Recognition (OCSR), designed to bridge computer vision and chemical informatics as part of AI for Science (AI4S).

Accurately translating molecular images into machine-readable formats such as SMILES is critical for drug discovery and digital chemistry. MolSight addresses the limitations of previous methods, particularly in handling complex stereoisomers, through a novel three-stage training paradigm:

  1. SMILES Pretraining: Aligns visual representations with chemical strings.
  2. Multi-Granularity Fine-Tuning: Captures both global structure and local functional group details.
  3. RL Post-Training: Utilizes Reinforcement Learning to optimize for chemical semantic correctness rather than simple token matching.
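Step 3 contrasts chemical correctness with token matching. The Python sketch below is a hypothetical illustration of why that distinction matters, not MolSight's reward: it scores a prediction by position-wise agreement of SMILES tokens, using a simplified regex tokenizer assumed for this example. Two spellings of the same molecule score poorly, while a SMILES with an inverted stereocenter (the other enantiomer) scores well.

```python
import re

# Simplified SMILES tokenizer (an assumption for this sketch): bracket atoms
# such as [C@@H], the two-letter halogens Br/Cl, two-digit ring-bond labels
# (%10 and up), and otherwise one character per token.
SMILES_TOKEN = re.compile(r"\[[^\]]+\]|Br|Cl|%\d{2}|.")


def tokenize(smiles: str) -> list[str]:
    """Split a SMILES string into tokens with the simplified regex above."""
    return SMILES_TOKEN.findall(smiles)


def token_match_reward(pred: str, target: str) -> float:
    """Hypothetical reward: fraction of aligned positions whose tokens agree."""
    p, t = tokenize(pred), tokenize(target)
    hits = sum(a == b for a, b in zip(p, t))
    return hits / max(len(p), len(t), 1)


# Same molecule (ethanol) written two ways: token matching scores it low.
print(token_match_reward("OCC", "CCO"))  # 1/3

# Inverted stereocenter, i.e. the other enantiomer: token matching scores it high.
print(token_match_reward("C[C@H](C(=O)O)N", "C[C@@H](C(=O)O)N"))  # 10/11
```

A reward aimed at chemical correctness would instead compare the molecules themselves, for instance by checking whether the canonical SMILES produced by a cheminformatics toolkit such as RDKit are identical; that comparison is not shown here.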

✨ Key Features

  • First RL-based OCSR: MolSight is the first OCSR system to integrate Reinforcement Learning. We use Group Relative Policy Optimization (GRPO) to directly optimize chemical validity.
  • Stereo-200k Dataset: We introduce a new annotated dataset consisting of 200,000 challenging stereoisomeric molecules specifically curated to address confusion in 3D chiral structures.
  • SOTA Performance: Extensive experiments demonstrate that MolSight achieves state-of-the-art results in accuracy, similarity, and robustness, outperforming classical and learning-based baselines.
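GRPO, named in the first feature above, samples a group of outputs for the same input, scores each one, and normalizes every reward against the group's mean and standard deviation, so no learned value function is needed. The Python sketch below shows only that advantage step under stated assumptions: the binary exact-match reward is a hypothetical placeholder rather than MolSight's reward, and the sampled SMILES are made up for illustration.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards: list[float], eps: float = 1e-6) -> list[float]:
    """GRPO-style advantage: each reward normalized within its own group.

    Uses the population standard deviation; implementations may differ.
    eps avoids division by zero when every sample gets the same reward.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


def toy_reward(pred: str, target: str) -> float:
    # Hypothetical placeholder: 1.0 for an exact string match, else 0.0.
    # This is NOT MolSight's reward, which targets chemical correctness.
    return 1.0 if pred == target else 0.0


# Four sampled predictions for one image with a chiral ground-truth SMILES.
target = "C[C@@H](C(=O)O)N"
samples = ["C[C@@H](C(=O)O)N", "C[C@H](C(=O)O)N", "CC(C(=O)O)N", "C[C@@H](C(=O)O)N"]
rewards = [toy_reward(s, target) for s in samples]
advantages = group_relative_advantages(rewards)

for smiles, adv in zip(samples, advantages):
    print(f"{adv:+.3f}  {smiles}")
```

The two exact matches receive positive advantages and the inverted and stereo-free predictions receive negative ones. If every sample in a group earns the same reward, all advantages are zero and that group contributes no learning signal.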

🔥 News

  • [2025-11-26] 🎉 MolSight has been accepted to AAAI 2026!
  • [2025-11-26] 🚀 Code released.

Updates

  • Release code
  • Release Stereo-200k dataset
  • Release model weights

Getting Started

Installation

# Clone the repository
git clone https://github.com/hustvl/MolSight
cd MolSight

# Install dependencies
pip install -r requirements.txt

Data

Training Datasets

  1. Pretrain dataset: MolParser-7M
  2. SFT datasets: PubChem-1M, USPTO-680k
  3. RL dataset: Stereo-200k

Evaluation Datasets

Note: The Stereo dataset is introduced for the first time in this work and consists entirely of stereoisomeric molecules.

Weights

| Name | Predicted fields | Description | Acc. on USPTO (%) |
|---|---|---|---|
| MolSight-base | SMILES & edges | Trained on PubChem-1M and USPTO-680k for 10 epochs. | 91.2 |
| MolSight-coord | SMILES & edges & coordinates | Further trained on PubChem-1M for 2 epochs to add a coordinate head. | 91.1 |
| MolSight-stereo | SMILES | Further trained on Stereo-200k with LoRA for 2 epochs to improve performance on stereoisomeric molecules. | 90.3 |
| MolSight-extra | SMILES & edges | Same setup as MolSight-base but trained longer (30 epochs); usually achieves a higher evaluation score. | 92.0 |
| MolSight-Markush | SMILES | Fine-tuned on MarkushGrapher; predicts SMILES-M to handle Markush structures. | - |
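MolSight-stereo is obtained with LoRA. For reference, standard LoRA (Hu et al., 2021) freezes a pretrained weight matrix W_0 and trains only a low-rank update, with B initialized to zero so that training starts from the unchanged pretrained model. Which layers MolSight adapts, and the rank r and scaling α it uses, are not stated in this README.

```latex
h = W_0 x + \Delta W x = W_0 x + \frac{\alpha}{r}\, B A x,
\qquad W_0 \in \mathbb{R}^{d \times k},\;
B \in \mathbb{R}^{d \times r},\;
A \in \mathbb{R}^{r \times k},\;
r \ll \min(d, k)
```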

Training

Start MolSight training with:

# SFT
bash train.sh
# train the additional coord predictor
bash train_loc_predictor.sh
# post training with RL
bash post_train.sh

Citation

If you find MolSight or the Stereo-200k dataset useful for your research in AI4Science or Chemistry, please cite our paper:

@article{zhang2025molsight,
  title={MolSight: Optical Chemical Structure Recognition with SMILES Pretraining, Multi-Granularity Learning and Reinforcement Learning},
  author={Zhang, Wenrui and Wang, Xinggang and Feng, Bin and Liu, Wenyu},
  journal={arXiv preprint arXiv:2511.17300},
  year={2025}
}

Acknowledgement

This project builds on several excellent open-source repositories (MolScribe, trl, Whisper, MMPose). We thank their authors for their work and contributions to the community.
