Chem#
The synkit.Chem module provides utilities for reaction SMILES processing,
covering atom-map canonicalization, atom-map equivalence validation, and configurable
SMILES standardization. These tools are designed to make reactions comparable across
datasets and pipelines by enforcing consistent labeling and normalized string forms.
Canonicalization#
The class CanonRSMI standardizes reaction
SMILES and atom-map indices by computing a canonical relabeling of mapped atoms.
By default it employs a Weisfeiler–Lehman (WL) colour-refinement backend (wl_iterations=3)
to obtain a deterministic ordering that is consistent across isomorphic reactions
[1].
from synkit.Chem.Reaction import CanonRSMI

# Canonicalise a mapped aldol-type reaction with the default WL backend
canon = CanonRSMI(backend='wl', wl_iterations=3)
canon.canonicalise(
    '[CH3:1][CH:2]=[O:3].[CH:4]([H:7])([H:8])[CH:5]=[O:6]'
    '>>'
    '[CH3:1][CH:2]=[CH:4][CH:5]=[O:6].[O:3]([H:7])([H:8])'
)
print(canon.canonical_rsmi)
Example output
'[CH:3]([CH3:7])=[O:8].[H:1][CH:4]([H:2])[CH:6]=[O:5]>>[CH:3](=[CH:4][CH:6]=[O:5])[CH3:7].[H:1][O:8][H:2]'
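To make the colour-refinement idea concrete, the following is a minimal, standard-library sketch of WL refinement on a toy graph. It is not SynKit's implementation: the helper names wl_colours and canonical_ranks, the adjacency-dict graph encoding, and the use of SHA-256 for colour compression are all assumptions made for illustration. Bond orders are ignored, and ties between atoms that keep the same colour are not resolved here.

```python
import hashlib


def wl_colours(labels, adjacency, iterations=3):
    """Refine node colours: each round hashes a node's colour with its neighbours' colours."""
    colours = dict(labels)
    for _ in range(iterations):
        colours = {
            node: hashlib.sha256(
                "|".join([colours[node], *sorted(colours[n] for n in adjacency[node])]).encode()
            ).hexdigest()
            for node in adjacency
        }
    return colours


def canonical_ranks(labels, adjacency, iterations=3):
    """Rank nodes by refined colour (hypothetical helper; symmetric atoms would tie)."""
    colours = wl_colours(labels, adjacency, iterations)
    ordered = sorted(adjacency, key=lambda node: colours[node])
    return {node: rank for rank, node in enumerate(ordered, start=1)}


# Acetaldehyde heavy atoms under two different atom-map numberings.
labels_a = {1: "C", 2: "C", 3: "O"}
adjacency_a = {1: [2], 2: [1, 3], 3: [2]}
labels_b = {7: "C", 3: "C", 8: "O"}
adjacency_b = {7: [3], 3: [7, 8], 8: [3]}

ranks_a = canonical_ranks(labels_a, adjacency_a)
ranks_b = canonical_ranks(labels_b, adjacency_b)
print(ranks_a)
print(ranks_b)
```

Because the refined colours depend only on element labels and connectivity, corresponding atoms in the two numberings receive the same rank, which is the property a canonical relabeling relies on.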
AAM comparison#
The class AAMValidator verifies atom-map
equivalence by constructing an Imaginary Transition State (ITS) graph for each reaction
and testing graph isomorphism via NetworkX’s VF2 algorithm. This ensures that two mapped
reactions induce the same ITS topology, i.e., they represent the same transformation under
different atom-map assignments [2].
from synkit.Chem.Reaction import AAMValidator

validator = AAMValidator()
# Same esterification, two different atom-map numberings
rsmi_1 = (
    '[CH3:1][C:2](=[O:3])[OH:4].[CH3:5][OH:6]'
    '>>'
    '[CH3:1][C:2](=[O:3])[O:6][CH3:5].[OH2:4]'
)
rsmi_2 = (
    '[CH3:5][C:1](=[O:2])[OH:3].[CH3:6][OH:4]'
    '>>'
    '[CH3:5][C:1](=[O:2])[O:4][CH3:6].[OH2:3]'
)

is_eq = validator.smiles_check(rsmi_1, rsmi_2, check_method='ITS')
print(is_eq)
Example output
True
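The ITS comparison can be illustrated with a small standard-library sketch. This is an assumption-laden toy, not AAMValidator's code: bonds are entered by hand instead of being parsed from SMILES, the helpers its_graph and its_isomorphic are hypothetical, and a brute-force permutation search stands in for the VF2 algorithm (workable only for a handful of atoms). Each ITS edge carries a pair (bond order in reactants, bond order in products), with 0 meaning the bond is absent on that side.

```python
from itertools import permutations


def its_graph(elements, reactant_bonds, product_bonds):
    """Merge both sides into one edge map: atom-map pair -> (order before, order after)."""
    edges = {}
    for side, bonds in enumerate((reactant_bonds, product_bonds)):
        for (a, b), order in bonds.items():
            key = frozenset((a, b))
            pair = list(edges.get(key, (0, 0)))
            pair[side] = order
            edges[key] = tuple(pair)
    return elements, edges


def its_isomorphic(g1, g2):
    """Brute-force check for an element- and edge-label-preserving bijection."""
    elems1, edges1 = g1
    elems2, edges2 = g2
    if sorted(elems1.values()) != sorted(elems2.values()) or len(edges1) != len(edges2):
        return False
    nodes1 = sorted(elems1)
    for perm in permutations(sorted(elems2)):
        mapping = dict(zip(nodes1, perm))
        if any(elems1[n] != elems2[mapping[n]] for n in nodes1):
            continue
        if all(
            edges2.get(frozenset(mapping[n] for n in key)) == label
            for key, label in edges1.items()
        ):
            return True
    return False


# Heavy-atom bonds of the esterification, under the two numberings from the example.
its_1 = its_graph(
    {1: "C", 2: "C", 3: "O", 4: "O", 5: "C", 6: "O"},
    {(1, 2): 1, (2, 3): 2, (2, 4): 1, (5, 6): 1},
    {(1, 2): 1, (2, 3): 2, (2, 6): 1, (5, 6): 1},
)
its_2 = its_graph(
    {5: "C", 1: "C", 2: "O", 3: "O", 6: "C", 4: "O"},
    {(1, 5): 1, (1, 2): 2, (1, 3): 1, (4, 6): 1},
    {(1, 5): 1, (1, 2): 2, (1, 4): 1, (4, 6): 1},
)
# A chemically different mapping: the methanol C-O bond breaks instead of the acyl C-O bond.
its_3 = its_graph(
    {1: "C", 2: "C", 3: "O", 4: "O", 5: "C", 6: "O"},
    {(1, 2): 1, (2, 3): 2, (2, 4): 1, (5, 6): 1},
    {(1, 2): 1, (2, 3): 2, (2, 4): 1, (4, 5): 1},
)

print(its_isomorphic(its_1, its_2))  # True
print(its_isomorphic(its_1, its_3))  # False
```

The third graph shows why comparing ITS topology is stricter than comparing unmapped SMILES: the reactants and products are identical, but the bond changes differ, so the mappings are not equivalent.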
Standardization#
The class Standardize cleans and normalizes
reaction SMILES by applying RDKit sanitization and optional post-processing steps such as:
- removing atom-map annotations (remove_aam=True)
- stripping stereochemical labels (ignore_stereo=True)
This produces a minimal, consistent representation suitable for indexing, deduplication, and downstream CRN construction.
from synkit.Chem.Reaction.standardize import Standardize

std = Standardize()
rsmi = (
    '[CH3:1][CH:2]=[O:3].[CH:4]([H:7])([H:8])[CH:5]=[O:6]'
    '>>'
    '[CH3:1][CH:2]=[CH:4][CH:5]=[O:6].[O:3]([H:7])([H:8])'
)

# Drop atom maps and stereo labels to get a compact, comparable string
std_rsmi = std.fit(rsmi, remove_aam=True, ignore_stereo=True)
print(std_rsmi)
Example output
'CC=O.CC=O>>CC=CC=O.O'
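As a rough illustration of what removing atom-map annotations means at the string level, the sketch below strips the ":n" atom-class suffix from bracket atoms with a regular expression. This is only an approximation under stated assumptions: strip_atom_maps is a hypothetical helper, not part of SynKit, and it performs no sanitization. Unlike the library output above, it leaves bracket atoms and explicit hydrogens in place, since collapsing them requires a cheminformatics toolkit.

```python
import re

# Matches the ":<digits>" atom-map suffix that directly precedes a closing bracket.
ATOM_MAP = re.compile(r":\d+(?=\])")


def strip_atom_maps(rsmi):
    """Remove atom-map numbers from a reaction SMILES string (string-level only)."""
    return ATOM_MAP.sub("", rsmi)


mapped = (
    "[CH3:1][CH:2]=[O:3].[CH:4]([H:7])([H:8])[CH:5]=[O:6]"
    ">>"
    "[CH3:1][CH:2]=[CH:4][CH:5]=[O:6].[O:3]([H:7])([H:8])"
)
unmapped = strip_atom_maps(mapped)
print(unmapped)
# [CH3][CH]=[O].[CH]([H])([H])[CH]=[O]>>[CH3][CH]=[CH][CH]=[O].[O]([H])([H])
```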
Tautomerization and functional-group support#
Tautomerize now uses SynKit’s native functional-group detector instead of
an external FG utility. The detector works on the same molecular graph
representation used elsewhere in SynKit, so tautomer targets and graph-indexed
functional-group labels stay aligned.
from synkit.Graph.FG import smiles_to_graph_and_functional_groups

graph, groups = smiles_to_graph_and_functional_groups("C=C(O)C")
print(groups)
The tautomerization helper still keeps a small local compatibility rule for geminal diols. Those are treated as hydrated-carbonyl repair targets, not as a general public functional-group label.
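To show what a geminal-diol rule might look for, here is a minimal sketch on a hand-built graph. It does not use SynKit's graph representation or its actual rule: the function find_geminal_diols and the dictionary encoding of atoms and bonds are assumptions for illustration. The check flags any carbon that carries two or more single-bonded oxygens that each bear a hydrogen.

```python
def find_geminal_diols(atoms, bonds):
    """Return sorted indices of carbons with two or more single-bonded O-H neighbours.

    atoms: {index: (element, attached_hydrogen_count)}
    bonds: {(i, j): bond_order}
    """
    hydroxyl_counts = {}
    for (i, j), order in bonds.items():
        if order != 1:
            continue
        # Check the bond in both directions, since either end may be the carbon.
        for carbon, oxygen in ((i, j), (j, i)):
            if (
                atoms[carbon][0] == "C"
                and atoms[oxygen][0] == "O"
                and atoms[oxygen][1] >= 1
            ):
                hydroxyl_counts[carbon] = hydroxyl_counts.get(carbon, 0) + 1
    return sorted(c for c, n in hydroxyl_counts.items() if n >= 2)


# Propane-2,2-diol, CC(O)(O)C: the hydrated form of acetone.
diol_atoms = {0: ("C", 3), 1: ("C", 0), 2: ("O", 1), 3: ("O", 1), 4: ("C", 3)}
diol_bonds = {(0, 1): 1, (1, 2): 1, (1, 3): 1, (1, 4): 1}

# Acetone, CC(=O)C: no hydroxyl groups at all.
ketone_atoms = {0: ("C", 3), 1: ("C", 0), 2: ("O", 0), 3: ("C", 3)}
ketone_bonds = {(0, 1): 1, (1, 2): 2, (1, 3): 1}

print(find_geminal_diols(diol_atoms, diol_bonds))      # [1]
print(find_geminal_diols(ketone_atoms, ketone_bonds))  # []
```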
See Also#
synkit.Graph: graph modeling and matching utilities