LACAN is a cheminformatics toolkit for scoring, mutating, and generating drug-like molecules using a statistical model of chemical bond environments learned from ChEMBL. It is designed as a library for generative chemistry pipelines and includes an adaptive genetic algorithm that can optimise molecules toward any user-defined scoring function.
"All sorts of things in this world behave like mirrors." — Jacques Lacan
📖 Full documentation: https://lacan.readthedocs.io/en/latest/
📝 Preprint: https://doi.org/10.26434/chemrxiv.15001196/v1
For every bond in a molecule, LACAN computes a pair of ECFP2-like atom environment identifiers, one per endpoint. Each identifier encodes atomic number, degree, hydrogen count, formal charge, and ring type (none / non-aromatic / aromatic), hashed to a 32-bit integer. The two hashes form a bond pair.
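As a rough illustration of the idea, the sketch below packs the five atom features into a tuple and hashes it to a 32-bit integer. It makes several assumptions that are not confirmed here: `AtomEnv`, `env_hash`, and `bond_pair` are hypothetical names, CRC32 is only a stand-in for whatever hash LACAN actually uses, and sorting the two hashes to make the pair direction-independent is a guess. The feature values are entered by hand so the example runs without RDKit.

```python
import zlib
from typing import NamedTuple


class AtomEnv(NamedTuple):
    """Hypothetical container for the five features listed above."""
    atomic_num: int
    degree: int
    h_count: int
    formal_charge: int
    ring_type: int  # 0 = none, 1 = non-aromatic ring, 2 = aromatic ring


def env_hash(env: AtomEnv) -> int:
    """Hash the features to an unsigned 32-bit integer (CRC32 as a stand-in)."""
    key = ",".join(str(v) for v in env).encode("ascii")
    return zlib.crc32(key) & 0xFFFFFFFF


def bond_pair(a: AtomEnv, b: AtomEnv) -> tuple:
    """Return both endpoint hashes, sorted so the pair ignores bond direction."""
    ha, hb = env_hash(a), env_hash(b)
    return (ha, hb) if ha <= hb else (hb, ha)


# The C-O bond of ethanol (CCO): a CH2 carbon bonded to a hydroxyl oxygen.
carbon = AtomEnv(atomic_num=6, degree=2, h_count=2, formal_charge=0, ring_type=0)
oxygen = AtomEnv(atomic_num=8, degree=1, h_count=1, formal_charge=0, ring_type=0)
pair = bond_pair(carbon, oxygen)
```

Because the identifier depends only on these local features, two chemically equivalent atoms in different molecules map to the same integer, which is what makes corpus-wide counting possible.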
A profile (e.g. chembl.pickle) stores, for a large training corpus:
- `idx`: how often each atom environment appears
- `pairs`: how often each bond pair co-occurs
- `setsize`: total number of bonds seen
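A minimal sketch of how such counts could be accumulated is shown below. The dictionary layout, the `build_profile` name, and the sorted-tuple pair key are assumptions for illustration, not LACAN's actual storage format. Each bond contributes one count to each of its two endpoint environments, which would be consistent with the division by 2 in the expected-frequency formula in this section.

```python
from collections import Counter


def build_profile(bond_pairs):
    """Count environments and bond pairs from an iterable of (env1, env2) tuples.

    Hypothetical helper: the real profile format may differ.
    """
    idx = Counter()    # occurrences of each atom environment, one per bond endpoint
    pairs = Counter()  # occurrences of each unordered bond pair
    setsize = 0        # total number of bonds seen
    for env1, env2 in bond_pairs:
        idx[env1] += 1
        idx[env2] += 1
        pairs[tuple(sorted((env1, env2)))] += 1
        setsize += 1
    return {"idx": idx, "pairs": pairs, "setsize": setsize}


# Toy corpus of four bonds; the integers stand in for 32-bit environment hashes.
toy_bonds = [(101, 202), (202, 101), (101, 101), (202, 303)]
profile = build_profile(toy_bonds)
```

With this counting scheme the environment counts always sum to twice `setsize`, since every bond has two endpoints.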
The pointwise mutual information (PMI) for a single bond:
```
observed = pairs[(env1, env2)] / setsize
expected = (idx[env1] / setsize / 2) × (idx[env2] / setsize / 2)
bond_PMI = observed / expected
```
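The formula can be checked by hand on a toy profile. The sketch below is a direct transcription of the three lines above; the `bond_pmi` name, the toy counts, and the choice to treat an unseen pair as a count of zero are assumptions for illustration.

```python
def bond_pmi(idx, pairs, setsize, env1, env2):
    """Ratio of observed to expected bond-pair frequency, as defined above."""
    # Assumption: a pair absent from the profile counts as zero occurrences.
    observed = pairs.get((env1, env2), 0) / setsize
    expected = (idx[env1] / setsize / 2) * (idx[env2] / setsize / 2)
    return observed / expected


# Toy profile: three environments, four bonds in total.
idx = {101: 4, 202: 3, 303: 1}
pairs = {(101, 202): 2, (101, 101): 1, (202, 303): 1}
setsize = 4

common = bond_pmi(idx, pairs, setsize, 101, 202)  # 0.5 / 0.1875
unseen = bond_pmi(idx, pairs, setsize, 101, 303)  # 0.0 / 0.0625
```

Here `common` is about 2.67, meaning the pair occurs more often than its two environments would suggest if they were independent, while `unseen` is 0 because that pair never appears in the toy corpus. Note that the value is a plain ratio, not a logarithm, so it is never negative.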
The molecule-level score uses the minimum per-bond PMI:
```
score = min_PMI / (1 + min_PMI)
```
A score near 0 means at least one bond is chemically unusual; near 1.0 means all bond environments are well-represented in the training data. Bonds below a threshold (default PMI < 0.05) are reported as bad_bonds.
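The sketch below applies this scoring rule to a hand-made set of per-bond PMI values. `molecule_score` is a hypothetical helper, and representing bad bonds as a list of bond indices is an assumption; the structure returned by the library may differ.

```python
def molecule_score(bond_pmis, threshold=0.05):
    """Map per-bond PMI values to (score, bad_bonds).

    bond_pmis: dict of bond index -> PMI (must contain at least one bond).
    """
    min_pmi = min(bond_pmis.values())
    score = min_pmi / (1 + min_pmi)
    # Bonds whose PMI falls below the threshold are flagged.
    bad_bonds = sorted(b for b, pmi in bond_pmis.items() if pmi < threshold)
    return score, bad_bonds


typical = molecule_score({0: 2.5, 1: 9.0, 2: 40.0})
unusual = molecule_score({0: 2.5, 1: 0.01, 2: 40.0})
```

In the first case the weakest bond has a PMI of 2.5, giving a score of about 0.71 and no flagged bonds. In the second, a single bond with a PMI of 0.01 pulls the score down to about 0.01 and is reported as bad, no matter how common the other bonds are. A minimum PMI of exactly 1 (chance level) maps to a score of 0.5.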
```python
from rdkit import Chem
from lacan import lacan, gen

profile = lacan.load_profile("chembl")

# Score a molecule
mol = Chem.MolFromSmiles("CCCc1nn(C)c2c(=O)[nH]c(-c3ccccc3)nc12")
score, info = lacan.score_mol(mol, profile)
print(f"Score: {score:.3f} bad bonds: {info['bad_bonds']}")

# Generate drug-like molecules
mols = gen.generate_filtered_molecules(profile, n_molecules=100, n_jobs=-1)

# Optimise toward any scoring function
def my_score(mols):
    return [lacan.score_mol(m, profile)[0] for m in mols]

winners = gen.generate_optimized_molecules(my_score, profile,
                                           startN=50, generations=20)
# Returns: list of (smiles, score) sorted best-first
```

To build a profile from your own molecules:

```python
from rdkit import Chem
from lacan.lacan import get_profile_for_mols

suppl = Chem.SmilesMolSupplier("my_molecules.smi", titleLine=False)
profile = get_profile_for_mols(suppl, profile_name="my_profile", n_jobs=-1)
# Saved to lacan/data/my_profile.pickle
# Reload with: lacan.load_profile("my_profile")
```

For the full module reference, GA parameters, corpus biasing, and the protection API, see the documentation.
Worked examples are in lacan/example_notebooks/:
| Notebook | Contents |
|---|---|
| generate_molecules.ipynb | Random generation, corpus biasing, running the GA |
| optimize_from_mol.ipynb | Lead optimisation from a seed molecule, pharmacophore protection, mol_cleaner |
| mutating_molecules.ipynb | Atom-level mutations and score filtering |
| evaluate_bonds.ipynb | Per-bond PMI scoring and visualisation |
| median_molecules.ipynb | Molecular crossover |
| shape_optimize_vortioxetine.ipynb | 3D shape-guided scaffold hopping with pharmacophore locking |
Install with pip. The package requires Python ≥ 3.9 and RDKit.

```bash
pip install lacan
```
To install from source:

```bash
git clone https://github.com/dehaenw/lacan.git
cd lacan
pip install .
```

To run the tests:

```bash
pip install pytest
pytest                           # full suite (~155 tests)
pytest tests/test_protect.py -v  # single module
```

If you use LACAN in your research, please cite the preprint:
Wim Dehaen. LACAN: Leveraging adjacent co-occurrence of atomic neighborhoods for molecular scoring and generation. ChemRxiv. 24 March 2026.
DOI: https://doi.org/10.26434/chemrxiv.15001196/v1