Chem#

The synkit.Chem module provides utilities for reaction SMILES processing, covering atom-map canonicalization, atom-map equivalence validation, and configurable SMILES standardization. These tools are designed to make reactions comparable across datasets and pipelines by enforcing consistent labeling and normalized string forms.

Canonicalization#

The class CanonRSMI standardizes reaction SMILES and atom-map indices by computing a canonical relabeling of mapped atoms. By default it employs a Weisfeiler–Lehman (WL) colour-refinement backend (wl_iterations=3) to obtain a deterministic ordering that is consistent across isomorphic reactions [1].

Canonicalizing a mapped reaction SMILES with WL refinement#
from synkit.Chem.Reaction import CanonRSMI

canon = CanonRSMI(backend='wl', wl_iterations=3)
canon.canonicalise(
    '[CH3:1][CH:2]=[O:3].[CH:4]([H:7])([H:8])[CH:5]=[O:6]'
    '>>'
    '[CH3:1][CH:2]=[CH:4][CH:5]=[O:6].[O:3]([H:7])([H:8])'
)
print(canon.canonical_rsmi)

Example output

'[CH:3]([CH3:7])=[O:8].[H:1][CH:4]([H:2])[CH:6]=[O:5]>>[CH:3](=[CH:4][CH:6]=[O:5])[CH3:7].[H:1][O:8][H:2]'
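
To make the idea of colour refinement concrete, the following is a minimal pure-Python sketch of WL refinement on a tiny heavy-atom graph. It is not SynKit's implementation: the helper names `wl_colours` and `canonical_order`, the adjacency-dict input format, and the example graphs are assumptions made for illustration only. Plain WL colours cannot separate symmetric atoms, so a real canonicalizer needs an additional tie-breaking step.

```python
def wl_colours(adjacency, labels, iterations=3):
    """Return one colour per node after `iterations` rounds of WL refinement.

    adjacency: {node: [neighbour, ...]} (hypothetical input format)
    labels:    {node: initial label, e.g. element symbol}
    """
    colours = dict(labels)
    for _ in range(iterations):
        # Signature = own colour plus the sorted multiset of neighbour colours.
        signatures = {
            node: (colours[node],
                   tuple(sorted(colours[nbr] for nbr in adjacency[node])))
            for node in adjacency
        }
        # Compress signatures to small integers; the order depends only on
        # the signatures, never on the node names.
        palette = {sig: idx
                   for idx, sig in enumerate(sorted(set(signatures.values())))}
        colours = {node: palette[sig] for node, sig in signatures.items()}
    return colours


def canonical_order(adjacency, labels, iterations=3):
    """Order nodes by final colour (ties would need extra tie-breaking)."""
    colours = wl_colours(adjacency, labels, iterations)
    return sorted(adjacency, key=lambda node: colours[node])


# Acetaldehyde heavy atoms (C-C=O) under two different numberings.
adj_a = {1: [2], 2: [1, 3], 3: [2]}
lab_a = {1: "C", 2: "C", 3: "O"}
adj_b = {7: [3], 3: [7, 8], 8: [3]}
lab_b = {7: "C", 3: "C", 8: "O"}

print(canonical_order(adj_a, lab_a))  # [1, 2, 3]
print(canonical_order(adj_b, lab_b))  # [7, 3, 8]
```

Both numberings yield the same sequence of colours (methyl carbon, carbonyl carbon, oxygen), so renumbering the atoms along each ordering gives identical labels for the two inputs.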

AAM comparison#

The class AAMValidator verifies atom-map equivalence by constructing an Imaginary Transition State (ITS) graph for each reaction and testing the two graphs for isomorphism with NetworkX’s VF2 algorithm. A positive result means that the two mapped reactions induce the same ITS topology, i.e., they represent the same transformation under different atom-map assignments [2].

Checking whether two mapped reactions are atom-map equivalent#
from synkit.Chem.Reaction import AAMValidator

validator = AAMValidator()
rsmi_1 = (
    '[CH3:1][C:2](=[O:3])[OH:4].[CH3:5][OH:6]'
    '>>'
    '[CH3:1][C:2](=[O:3])[O:6][CH3:5].[OH2:4]'
)
rsmi_2 = (
    '[CH3:5][C:1](=[O:2])[OH:3].[CH3:6][OH:4]'
    '>>'
    '[CH3:5][C:1](=[O:2])[O:4][CH3:6].[OH2:3]'
)

is_eq = validator.smiles_check(rsmi_1, rsmi_2, check_method='ITS')
print(is_eq)

Example output

True
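
The ITS comparison can be illustrated without any dependencies. The sketch below is not how AAMValidator works internally: it hand-encodes the heavy-atom bonds of the esterification above, labels each atom pair with its (reactant, product) bond order, and replaces VF2 with a brute-force search over atom permutations, which is only feasible for very small graphs. The helpers `bonds`, `its_edges`, and `its_equivalent` are hypothetical names, and hydrogens are ignored.

```python
from itertools import permutations


def bonds(*triples):
    """Build {frozenset({a, b}): order} from (a, b, order) triples."""
    return {frozenset((a, b)): order for a, b, order in triples}


def its_edges(reactant_bonds, product_bonds):
    """Label every bonded atom pair with (order before, order after)."""
    pairs = set(reactant_bonds) | set(product_bonds)
    return {pair: (reactant_bonds.get(pair, 0), product_bonds.get(pair, 0))
            for pair in pairs}


def its_equivalent(atoms_a, its_a, atoms_b, its_b):
    """Brute-force isomorphism test preserving elements and edge labels."""
    if len(atoms_a) != len(atoms_b) or len(its_a) != len(its_b):
        return False
    nodes_a = sorted(atoms_a)
    for perm in permutations(sorted(atoms_b)):
        mapping = dict(zip(nodes_a, perm))
        if any(atoms_a[n] != atoms_b[mapping[n]] for n in nodes_a):
            continue  # elements must match under the relabeling
        relabelled = {frozenset(mapping[n] for n in pair): label
                      for pair, label in its_a.items()}
        if relabelled == its_b:
            return True
    return False


# rsmi_1: acid C2-O4 bond breaks, C2-O6 bond forms.
atoms_1 = {1: "C", 2: "C", 3: "O", 4: "O", 5: "C", 6: "O"}
its_1 = its_edges(bonds((1, 2, 1), (2, 3, 2), (2, 4, 1), (5, 6, 1)),
                  bonds((1, 2, 1), (2, 3, 2), (2, 6, 1), (5, 6, 1)))

# rsmi_2: same chemistry, different map numbers.
atoms_2 = {5: "C", 1: "C", 2: "O", 3: "O", 6: "C", 4: "O"}
its_2 = its_edges(bonds((5, 1, 1), (1, 2, 2), (1, 3, 1), (6, 4, 1)),
                  bonds((5, 1, 1), (1, 2, 2), (1, 4, 1), (6, 4, 1)))

# A different (hypothetical) mapping: the methanol C5-O6 bond breaks instead.
its_3 = its_edges(bonds((1, 2, 1), (2, 3, 2), (2, 4, 1), (5, 6, 1)),
                  bonds((1, 2, 1), (2, 3, 2), (2, 4, 1), (4, 5, 1)))

print(its_equivalent(atoms_1, its_1, atoms_2, its_2))  # True
print(its_equivalent(atoms_1, its_1, atoms_1, its_3))  # False
```

The third mapping yields the same product molecules as text but a different reaction centre, which is the kind of disagreement an ITS-based check is meant to expose.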

Standardization#

The class Standardize cleans and normalizes reaction SMILES by applying RDKit sanitization and optional post-processing steps such as:

  • removing atom-map annotations (remove_aam=True)

  • stripping stereochemical labels (ignore_stereo=True)

This produces a minimal, consistent representation suitable for indexing, deduplication, and downstream chemical reaction network (CRN) construction.

Standardizing a reaction SMILES (remove atom maps and ignore stereo)#
from synkit.Chem.Reaction.standardize import Standardize

std = Standardize()
rsmi = (
    '[CH3:1][CH:2]=[O:3].[CH:4]([H:7])([H:8])[CH:5]=[O:6]'
    '>>'
    '[CH3:1][CH:2]=[CH:4][CH:5]=[O:6].[O:3]([H:7])([H:8])'
)

std_rsmi = std.fit(rsmi, remove_aam=True, ignore_stereo=True)
print(std_rsmi)

Example output

'CC=O.CC=O>>CC=CC=O.O'
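
As a rough picture of what the two options remove, the sketch below strips atom-map numbers and stereo markers with regular expressions. This is a text-level approximation only, not what Standardize does: it performs no RDKit sanitization, does not collapse bracket atoms, and does not produce canonical SMILES. The function name `strip_annotations` is hypothetical, and its keyword arguments merely mirror the option names above for readability.

```python
import re

# ":<digits>" immediately before a closing bracket is an atom-map number.
_ATOM_MAP = re.compile(r":\d+(?=\])")
# "@" marks tetrahedral chirality; "/" and "\" mark double-bond geometry.
_STEREO = re.compile(r"[@/\\]")


def strip_annotations(rsmi, remove_aam=True, ignore_stereo=True):
    """Remove atom maps and/or stereo markers from a (reaction) SMILES string."""
    if remove_aam:
        rsmi = _ATOM_MAP.sub("", rsmi)
    if ignore_stereo:
        rsmi = _STEREO.sub("", rsmi)
    return rsmi


mapped = "[CH3:1][C@H:2]([OH:3])/[CH:4]=[CH:5]/[CH3:6]"
print(strip_annotations(mapped))
# [CH3][CH]([OH])[CH]=[CH][CH3]
print(strip_annotations(mapped, remove_aam=False))
# [CH3:1][CH:2]([OH:3])[CH:4]=[CH:5][CH3:6]
```

Because the output is not canonical, two strings produced this way can still differ for the same reaction; a toolkit-based standardizer is needed before using the result as a deduplication key.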

Tautomerization and functional-group support#

Tautomerize now uses SynKit’s native functional-group detector instead of an external FG utility. The detector works on the same molecular graph representation used elsewhere in SynKit, so tautomer targets and graph-indexed functional-group labels stay aligned.

Detecting tautomer-relevant functional groups#
from synkit.Graph.FG import smiles_to_graph_and_functional_groups

graph, groups = smiles_to_graph_and_functional_groups("C=C(O)C")
print(groups)

The tautomerization helper still keeps a small local compatibility rule for geminal diols. Those are treated as hydrated-carbonyl repair targets, not as a general public functional-group label.
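
A geminal-diol rule of this kind can be pictured as a simple graph query: find carbons that carry two or more hydroxyl groups. The sketch below is an illustration under assumed data structures, not SynKit's rule; the function `geminal_diol_carbons` and the `(element, attached_hydrogens)` atom encoding are hypothetical.

```python
def geminal_diol_carbons(atoms, bonds):
    """Return sorted indices of carbons bearing two or more hydroxyl groups.

    atoms: {index: (element, attached_hydrogens)} (hypothetical encoding)
    bonds: iterable of (i, j, order)
    """
    hydroxyl_count = {}
    for i, j, order in bonds:
        if order != 1:
            continue  # hydroxyl oxygens are singly bonded to carbon
        for carbon, oxygen in ((i, j), (j, i)):
            if atoms[carbon][0] == "C" and atoms[oxygen] == ("O", 1):
                hydroxyl_count[carbon] = hydroxyl_count.get(carbon, 0) + 1
    return sorted(c for c, n in hydroxyl_count.items() if n >= 2)


# Acetaldehyde hydrate, CC(O)O: carbon 1 carries two OH groups.
hydrate_atoms = {0: ("C", 3), 1: ("C", 1), 2: ("O", 1), 3: ("O", 1)}
hydrate_bonds = [(0, 1, 1), (1, 2, 1), (1, 3, 1)]

# Ethanol, CCO: a single OH, so no repair target.
ethanol_atoms = {0: ("C", 3), 1: ("C", 2), 2: ("O", 1)}
ethanol_bonds = [(0, 1, 1), (1, 2, 1)]

print(geminal_diol_carbons(hydrate_atoms, hydrate_bonds))  # [1]
print(geminal_diol_carbons(ethanol_atoms, ethanol_bonds))  # []
```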

See Also#

  • synkit.Graph — graph modeling and matching utilities