In this project, I test whether a general-purpose large language model (LLM) can be adapted to handle chemistry tasks well.
I start with OLMo-7B, a general LLM pre-trained on the Dolma dataset, and then perform continued pre-training using QLoRA.
I evaluate the model on the MoleculeNet benchmark, skipping big datasets like QM9 and SIDER to save time and resources.
I train the model on 2.1 million raw SMILES strings from the smiles-molecules-chembl dataset.
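As a rough illustration of how raw SMILES strings can be prepared for causal-LM continued pre-training, here is a minimal sketch that packs newline-separated molecules into fixed-size text chunks. The chunk size and example molecules are illustrative assumptions, not the project's actual preprocessing.

```python
# Minimal sketch: pack raw SMILES strings into fixed-size text chunks
# for continued pre-training. Newlines separate molecules.
# NOTE: chunk size and example molecules are illustrative only.

def pack_smiles(smiles_list, chunk_chars=64):
    """Concatenate SMILES (newline-separated) and split into chunks."""
    corpus = "\n".join(smiles_list)
    return [corpus[i:i + chunk_chars] for i in range(0, len(corpus), chunk_chars)]

smiles = [
    "CCO",                    # ethanol
    "c1ccccc1",               # benzene
    "CC(=O)Oc1ccccc1C(=O)O",  # aspirin
]
chunks = pack_smiles(smiles, chunk_chars=16)
print(chunks[0])  # first 16-character chunk of the packed corpus
```

In practice each chunk would then be tokenized and fed to the model as an ordinary language-modeling sequence.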
I use QLoRA for pre-training with these settings:

- Target modules: all_linear
- Rank: 64
- Alpha: 128
- Learning rate: 5e-5
- Training script: RawSmiles.py (for ChemOlmo-7B)
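The settings above correspond roughly to the following PEFT/bitsandbytes configuration. This is a sketch, not the project's actual script: it assumes the Hugging Face `transformers` and `peft` libraries, and the model hub ID and NF4 quantization details are my assumptions.

```python
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model

# 4-bit quantization for QLoRA (NF4 compute setup is assumed here).
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

model = AutoModelForCausalLM.from_pretrained(
    "allenai/OLMo-7B",  # base model (hub ID assumed)
    quantization_config=bnb_config,
)

# LoRA adapter matching the settings listed above.
lora_config = LoraConfig(
    r=64,                         # rank
    lora_alpha=128,               # alpha
    target_modules="all-linear",  # apply LoRA to every linear layer
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
# Training then proceeds with a learning rate of 5e-5
# (e.g., via transformers.Trainer with TrainingArguments(learning_rate=5e-5)).
```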
All benchmark code is in the Notebooks folder.

