Skip to content

steto123/lacan

 
 

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

41 Commits
 
 
 
 
 
 
 
 
 
 

Repository files navigation

LACAN

LACAN filter: Leveraging adjacent co-ocurrence of atomic neighborhoods for molecular filtering

"All sorts of things in the world behave like mirrors" -Jacques Lacan

Some molecular fragments are common, but they have the tendency not to occur together. For example, alkyloxy radicals are frequent motifs in medicinal chemistry datasets, whereas the linkage of both radicals into a peroxide is rather uncommon. Likewise, halides and amines are some of the most commonly occurring atomic neighborhoods, and yet their pairing results in the unstable and toxic haloamine motif. We apply this concept using co-occurences of ECFP2 like atomic neighborhoods at the bond interface, and leverage co-occurence patterns to construct a molecular filter that highlights uncommon linkages.

Current version

This is version 0.0.2alpha. This version is still experimental, and breaking changes are still expected. Several changes have been added since 0.0.1alpha, including a change to manually hash the environments instead of relying on hacky usage of the rdkit morgan fingerprint generator. Also introduced in this version is functionality for molecule generation. This is currently unoptimized and subject to change.

Installation

clone this repo, activate your environment, navigate to root dir and run:

pip install .

Example notebooks

Some notebooks with typical use cases are provided in lacan/example_notebooks. Note that these notebook will need jupyter installed in the python environment. The molecule generation notebook additionally requires scikit-learn installed in the python environment.

Basic usage: Localizing problem bonds

import lacan and inspect a molecule by running the following commands:

from lacan import lacan
from rdkit import Chem
p = lacan.load_profile("chembl")
m = Chem.MolFromSmiles("c1ccccc1CCN(OCCc1occc1)")
score,info = lacan.score_mol(m,p)
print(info["bad_bonds"])

which will output a dictionary with an entry for every bond in the molecule. Currently the filter is binary, so the score is 1 if the molecule passes the filter and 0 if it doesn't. The problem bonds output follow rdkit bond numbering which means we can visualize problem bonds in our molecules easily as follows:

from rdkit.Chem import Draw
d = Draw.MolToImage(m,highlightBonds=info["bad_bonds"])
display(d)

giving the following result:

image

This correctly identified the N-O linkage as problematic.

Breeding molecules

This filter enables us to recombine fragments and filter out linkages that are rare in the reference set. Lacan has a "breeding" or crossover functionality where two molecules get fragmented and recombined. By subjecting the recombinations to LACAN filter we can retain only decent looking "median molecules".

example:

from lacan import breed
from rdkit import Chem
from rdkit.Chem import Draw

m1 = Chem.MolFromSmiles("c1cc(ccc1[C@@H]2CCNC[C@H]2COc3ccc4c(c3)OCO4)F")
m2 = Chem.MolFromSmiles("CNCCC(C1=CC=CC=C1)OC2=CC=C(C=C2)C(F)(F)F")
median_molecules = breed.breed(m1,m2,p,nmols=9)

this outputs the following molecules that are "in between" its parents fluoxetine and sertraline:

d = Draw.MolsToGridImage(median_molecules)
display(d)

image

Generating molecules

Random molecules can by generated simply using

ms = gen.generate_filtered_molecules(n_jobs=-1,
                                     n_molecules=9,
                                     profile=p,
                                     seed=456,
                                     min_atoms=20)

For generation towards a goal, see the example notebooks, which showcase this functionality.

Building a profile

If you want to build a custom profile using your own reference data set, this can be done through the LACAN cli as follows

python lacan.py -i your_dataset_here.smi -m profile -p my_new_profile

This will create a pickled profile in the data folder which you can then invoke using:

p = lacan.load_profile("my_new_profile")

About

LACAN filter: Leveraging adjacent co-ocurrence of atomic neighborhoods for molecular filtering

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

 
 
 

Contributors

Languages

  • Jupyter Notebook 99.5%
  • Python 0.5%