Substructure Searching – H-Count and Topology

By Jameed Hussain | Monday, October 1, 2018 - 23:27 UTC

Substructure searching is a powerful tool when it comes to searching for small molecule analogues that fit the SAR requirements for a given biological target. With this great power comes great… complexity. Formulating an appropriate substructure query to capture the SAR requirements of your target can be a daunting task. The Daylight SMARTS language can be used to create sophisticated substructure queries (see below) but it’s a tool only appropriate for the more masochistic chemists.

[Figure: SMARTS expression for a spiro ring center, taken from the Daylight SMARTS examples page.]

One way around the difficulty of writing a very specific substructure query is to use a more general one and remove the unwanted compounds by eye. This becomes an onerous task if the query generates many hits. Any feature that makes substructure queries easier to create and reduces the number of unwanted compounds returned by a search is therefore likely to be very useful for chemists. In this blog post, I’ll take you through a couple of these features available in Chemselector.

Given the substructure query below (where “A” is any non-hydrogen atom [1]), the search will retrieve both compound 1 and compound 2. However, if you only want to allow substitution at the position marked with an A, compound 2 is not what you want.


To make the query retrieve only compounds that are substituted at the A position and nowhere else, you need to add explicit hydrogens at every position where you do not want a substitution (see below). This, as you can imagine, is pretty tedious. To get the desired search behaviour without having to add hydrogens everywhere, you can use the “Preserve H-count” option in Chemselector.

Query where the hydrogens are added at positions where you do not want a substitution.
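To make the idea concrete, here is a minimal pure-Python sketch of what an H-count constraint does during matching. It is a toy model, not Chemselector's implementation: the `make_mol` and `matches_with_h_count` helpers, the brute-force matcher, and the three molecules are all hypothetical stand-ins (the real query and compounds are in the figures), and bond orders are ignored for brevity.

```python
from itertools import permutations

def make_mol(atoms, bonds):
    """atoms: list of (element, hydrogen_count); bonds: pairs of atom indices."""
    return {"atoms": atoms, "bonds": {frozenset(b) for b in bonds}}

def atom_ok(q_atom, t_atom, preserve_h_count):
    q_el, q_h = q_atom
    t_el, t_h = t_atom
    if q_el == "A":
        # "A" = any non-hydrogen atom; its hydrogen count is left free,
        # so this is the one position where substitution is allowed.
        return t_el != "H"
    if q_el != t_el:
        return False
    # With the option on, a matched atom must carry exactly the hydrogens
    # drawn (implicitly) in the query, so it cannot bear extra substituents.
    return (not preserve_h_count) or q_h == t_h

def matches_with_h_count(query, target, preserve_h_count=False):
    """Brute-force substructure test; only sensible for tiny molecules."""
    n_q = len(query["atoms"])
    for mapping in permutations(range(len(target["atoms"])), n_q):
        atoms_ok = all(
            atom_ok(query["atoms"][i], target["atoms"][mapping[i]], preserve_h_count)
            for i in range(n_q)
        )
        if not atoms_ok:
            continue
        if all(frozenset(mapping[i] for i in bond) in target["bonds"]
               for bond in query["bonds"]):
            return True
    return False

# Hypothetical query: CH3-CH2-A
query = make_mol([("C", 3), ("C", 2), ("A", None)], [(0, 1), (1, 2)])
# Hypothetical compound 1: ethanol, CH3-CH2-OH (substituted only at A)
compound_1 = make_mol([("C", 3), ("C", 2), ("O", 1)], [(0, 1), (1, 2)])
# Hypothetical compound 2: propan-2-ol, CH3-CH(OH)-CH3 (extra substituent on the CH2)
compound_2 = make_mol([("C", 3), ("C", 1), ("O", 1), ("C", 3)],
                      [(0, 1), (1, 2), (1, 3)])

print(matches_with_h_count(query, compound_1))                         # True
print(matches_with_h_count(query, compound_2))                         # True
print(matches_with_h_count(query, compound_1, preserve_h_count=True))  # True
print(matches_with_h_count(query, compound_2, preserve_h_count=True))  # False
```

The plain search returns both compounds, while the H-count-preserving search drops the second one because its central carbon has one hydrogen where the query implies two. This gives the same result as drawing the hydrogens explicitly, without the drawing.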


Another useful (though probably more esoteric) feature is the “Preserve topology” option. Unsurprisingly, this preserves the topology, that is, the ring or non-ring nature, of the query atoms. In the example below, compounds 1 and 2 are both valid hits for the substructure query shown. If you would like the search results not to contain hits where the carbonyl moiety is part of a ring system, you can use the “Preserve topology” option. With this option, compound 2 will not be retrieved, because the circled carbonyl atom is in a ring (unlike in the query). Note that it is possible to achieve the same results using the SMARTS language, but very few chemists use SMARTS for substructure searching.
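The ring / non-ring check can be sketched in a few lines of pure Python. Again, this is a toy model under stated assumptions rather than Chemselector's code: `ring_atoms` and `matches_with_topology` are hypothetical helpers, the molecules are stand-ins for the ones in the figure, and bond orders are ignored. (In SMARTS, the equivalent constraint is written with the ring primitive `R` and its negation `!R`.)

```python
from itertools import permutations

def ring_atoms(bonds):
    """Return the set of atom indices that lie on at least one ring."""
    in_ring = set()
    for a, b in bonds:
        # A bond is a ring bond if its two ends are still connected
        # after the bond itself has been removed.
        rest = [x for x in bonds if x != (a, b)]
        seen, stack = {a}, [a]
        while stack:
            cur = stack.pop()
            for x, y in rest:
                if cur in (x, y):
                    nxt = y if cur == x else x
                    if nxt not in seen:
                        seen.add(nxt)
                        stack.append(nxt)
        if b in seen:
            in_ring.update((a, b))
    return in_ring

def matches_with_topology(q_atoms, q_bonds, t_atoms, t_bonds,
                          preserve_topology=False):
    """Brute-force substructure test; only sensible for tiny molecules."""
    q_ring, t_ring = ring_atoms(q_bonds), ring_atoms(t_bonds)
    t_bond_set = {frozenset(b) for b in t_bonds}
    n_q = len(q_atoms)
    for m in permutations(range(len(t_atoms)), n_q):
        if any(q_atoms[i] != t_atoms[m[i]] for i in range(n_q)):
            continue
        # With the option on, every query atom must keep its
        # ring / non-ring character in the matched compound.
        if preserve_topology and any(
            (i in q_ring) != (m[i] in t_ring) for i in range(n_q)
        ):
            continue
        if all(frozenset((m[a], m[b])) in t_bond_set for a, b in q_bonds):
            return True
    return False

# Hypothetical query: acyclic ketone fragment C-C(=O)-C
q_atoms, q_bonds = ["C", "C", "O", "C"], [(0, 1), (1, 2), (1, 3)]
# Hypothetical compound 1: butanone (carbonyl not in a ring)
c1_atoms = ["C", "C", "O", "C", "C"]
c1_bonds = [(0, 1), (1, 2), (1, 3), (3, 4)]
# Hypothetical compound 2: cyclohexanone (carbonyl carbon in a ring)
c2_atoms = ["C"] * 6 + ["O"]
c2_bonds = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 5), (5, 0), (0, 6)]

print(matches_with_topology(q_atoms, q_bonds, c1_atoms, c1_bonds))        # True
print(matches_with_topology(q_atoms, q_bonds, c2_atoms, c2_bonds))        # True
print(matches_with_topology(q_atoms, q_bonds, c1_atoms, c1_bonds, True))  # True
print(matches_with_topology(q_atoms, q_bonds, c2_atoms, c2_bonds, True))  # False
```

Without the constraint both compounds match. With it, the cyclic ketone is rejected because the only carbon bonded to the oxygen sits in a ring, whereas the corresponding query atom does not.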


The two options (used alone or in combination) should reduce the number of unwanted molecules retrieved by a search and hopefully make life just a little bit easier when it comes to substructure searching.

1. Note: An asterisk (*) can also be used instead of the “A” to indicate a non-hydrogen atom.