Bernd2Heinen
Level III

Search functionality for substructures of SMILES

For Spotfire there is an add-in (Signals Inventa, formerly Signals Lead Discovery) in which you can select a substructure of a SMILES graph and then search a database for all SMILES graphs that contain this substructure (isomorphic search). Does anyone know of software with this functionality that could (easily) be connected to JMP?

5 REPLIES
Victor_G
Super User

Re: Search functionality for substructures of SMILES

Hi @Bernd2Heinen,

There are several packages available for cheminformatics, but one of my favorites is RDKit.
You can use it via Python (and run it with JMP); see "Substructure Searching" at this link: https://www.rdkit.org/docs/GettingStartedInPython.html
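
As a rough idea of what such a search looks like, here is a minimal sketch, assuming RDKit is installed in the Python environment. The list `smiles_column` and the helper `has_substructure` are made-up names for illustration (the list stands in for a column of a JMP table); only the `Chem` calls come from RDKit.

```python
from rdkit import Chem

# Hypothetical example data standing in for a SMILES column of a data table.
smiles_column = [
    "c1ccccc1O",        # phenol, aromatic notation
    "C1=CC=CC=C1C",     # toluene, Kekule notation
    "CCO",              # ethanol, no ring
    "OC(=O)c1ccccc1",   # benzoic acid
]

# Query substructure: a benzene ring, written as SMARTS.
query = Chem.MolFromSmarts("c1ccccc1")


def has_substructure(smiles, pattern):
    """Return True if the molecule parsed from `smiles` contains `pattern`."""
    mol = Chem.MolFromSmiles(smiles)  # returns None if the string cannot be parsed
    return mol is not None and mol.HasSubstructMatch(pattern)


# Keep every row whose molecule contains the query ring.
hits = [s for s in smiles_column if has_substructure(s, query)]
print(hits)
```

Because the match is done on the molecular graph and not on the text, the Kekule-style toluene string is found as well, even though the query is written in aromatic notation.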

If, like me, you are not a coder (my goal this year is to get to know Python and start using it), you can also use RDKit through a low/no-code platform like KNIME.

 

Another option that could be done directly in JMP would be a Regex pattern-matching script that highlights the rows containing the same pattern as your input/query. In the example from your image, a Regex search could find all rows in which the SMILES string contains the pattern "C=1(C=CC=CC1)" for the phenyl group (the parentheses would need to be escaped, since they are special characters in Regex).
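
The same idea can be sketched in Python with the standard `re` module (in JMP the equivalent would be written in JSL). The SMILES strings below are invented example data, and `smiles_column` and `matching_rows` are names chosen only for this illustration.

```python
import re

# Hypothetical example data standing in for a SMILES column.
smiles_column = [
    "C=1(C=CC=CC1)CCN",   # contains the phenyl pattern
    "CCO",                # no ring
    "OCC=1(C=CC=CC1)C",   # contains the phenyl pattern
]

# re.escape turns the literal SMILES fragment into a safe regex,
# so "(" and ")" are matched as characters and not treated as groups.
pattern = re.compile(re.escape("C=1(C=CC=CC1)"))

# Row indices (0-based) whose SMILES text contains the fragment.
matching_rows = [i for i, s in enumerate(smiles_column) if pattern.search(s)]
print(matching_rows)
```

This is a purely textual match: it only finds rows where the phenyl group is written with exactly these characters.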

I hope this answer will help you,

Victor GUILLER
Scientific Expertise Engineer
L'Oréal - Data & Analytics
Bernd2Heinen
Level III

Re: Search functionality for substructures of SMILES

Thanks Victor,

I will take a look at RDKit, and going through KNIME might also help.

Regex search was my first thought as well, but different SMILES strings can describe the same graph, so a pattern search on the text would miss structures.
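
A small illustration of that problem, using four ways of writing toluene that I made up for this example (the variable names are arbitrary as well):

```python
# Four SMILES strings that all describe toluene (methylbenzene).
toluene_forms = [
    "Cc1ccccc1",        # aromatic notation
    "CC1=CC=CC=C1",     # Kekule notation
    "C=1(C=CC=CC1)C",   # Kekule notation, ring written as a branch
    "c1ccc(C)cc1",      # aromatic notation, different starting atom
]

# The textual phenyl pattern from the example above.
query = "C=1(C=CC=CC1)"

# A plain text search only finds the one spelling that matches literally.
found = [s for s in toluene_forms if query in s]
missed = [s for s in toluene_forms if query not in s]

print("found: ", found)
print("missed:", missed)
```

All four strings contain a phenyl ring, but the text search reports only one of them.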

thanks again

Bernd

 

Victor_G
Super User

Re: Search functionality for substructures of SMILES

Hi @Bernd2Heinen,

To get a unique SMILES string for each graph/structure, you can use canonical SMILES. Each library/package (RDKit, ChemAxon, ...) may canonicalize differently, so canonical SMILES are not "universal", but if all SMILES are generated with the same tool, each structure gets exactly one SMILES string, which avoids the problem you mentioned.
https://www.daylight.com/dayhtml/doc/theory/theory.smiles.html
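
As a minimal sketch of the idea, assuming RDKit is the tool used for canonicalization: the helper `canonical` and the phenol variants below are only illustrative, while `MolFromSmiles` and `MolToSmiles` are RDKit functions (`MolToSmiles` writes canonical SMILES by default).

```python
from rdkit import Chem


def canonical(smiles):
    """Return RDKit's canonical SMILES for `smiles`, or None if it cannot be parsed."""
    mol = Chem.MolFromSmiles(smiles)
    return Chem.MolToSmiles(mol) if mol is not None else None


# Four different ways of writing phenol.
variants = ["c1ccccc1O", "Oc1ccccc1", "C1=CC=CC=C1O", "OC1=CC=CC=C1"]

# After canonicalization all variants collapse to a single string.
canonical_forms = {canonical(s) for s in variants}
print(canonical_forms)
```

The set contains a single entry, so identical structures can then be matched by comparing their canonical strings.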

Hope this might help you,
Victor GUILLER
Scientific Expertise Engineer
L'Oréal - Data & Analytics
louv
Staff (Retired)

Re: Search functionality for substructures of SMILES

Hi Bernd,

Not sure if this is helpful but I found this from Ian Cox in the knowledge base.

https://community.jmp.com/t5/JMP-Add-Ins/JMP-Add-In-to-Visualise-Molecular-SMILES-Strings/ta-p/22532

 

Regards, Lou

Re: Search functionality for substructures of SMILES

Hello @Bernd2Heinen ,

I created a new add-in, a Python wrapper for RDKit (it requires JMP 18).

You can search using Add-ins > Toolkit for Materials Informatics > Substructure Searching.
If you have any feedback, please let me know.

https://community.jmp.com/t5/JMP-Add-Ins/Toolkit-for-Materials-Informatics/ta-p/750690