Chem#
Chemical utilities for reactions, molecules, fingerprints, clustering, and related helpers.
Reaction#
- class synkit.Chem.Reaction.canon_rsmi.CanonRSMI(backend: str = 'wl', wl_iterations: int = 3, morgan_radius: int = 3, node_attrs: List[str] = ('element', 'aromatic', 'charge', 'hcount'))[source]#
Bases:
object
A pure-Python / pure-NetworkX utility for canonicalizing reaction SMILES by expanding atom-maps and deterministically reindexing reaction graphs.
Workflow#
Expand atom-maps on reactants to ensure each atom has a unique map ID.
Convert reaction SMILES to reactant/product NetworkX graphs.
Canonicalize the reactant graph using GraphCanonicaliser (generic or WL backend).
Match atom-map IDs to compute pairwise indices between reactants and products.
Remap the product graph to align with the canonical reactant ordering.
Sync each node’s atom_map attribute to its new graph index.
Reassemble the reaction SMILES from the canonical graphs.
Classes#
CanonRSMI – Main interface for transforming any reactants>>products SMILES into a canonicalized form, preserving all node and edge attributes.
Example#
>>> from synkit.Chem.Reaction.canon_rsmi import CanonRSMI
>>> canon = CanonRSMI(backend='wl', wl_iterations=5)
>>> result = canon.canonicalise('[CH3:3][CH2:5][OH:10]>>[CH2:3]=[CH2:5].[OH2:10]')
>>> print(result.canonical_rsmi)
[OH:1][CH2:3][CH3:2]>>[CH2:2]=[CH2:3].[OH2:1]
- property canonical_hash: str | None#
Reaction-level hash combining reactant and product canonical hashes.
- canonicalise(rsmi: str) CanonRSMI[source]#
- Full pipeline returning self with properties populated:
raw_rsmi
raw_reactant_graph, raw_product_graph
mapping_pairs
canonical_reactant_graph, canonical_product_graph
canonical_rsmi
- expand_aam(rsmi: str) str[source]#
Assign new atom-map IDs to unmapped reactant atoms in ‘reactants>>products’ SMILES.
New IDs start at max(existing maps)+1.
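The numbering rule can be illustrated without RDKit. The sketch below is not synkit's implementation: `expand_bracket_maps` is a hypothetical helper that only handles bracket atoms such as `[OH]` (bare atoms like `C` are left untouched), and it assumes the maximum is taken over the whole reaction string.

```python
import re

# Matches a bracket atom, capturing its body and an optional atom-map number.
_BRACKET = re.compile(r"\[([^\[\]:]+)(?::(\d+))?\]")


def expand_bracket_maps(rsmi: str) -> str:
    """Give unmapped bracket atoms on the reactant side fresh atom-map IDs."""
    reactants, products = rsmi.split(">>")
    existing = [int(m) for m in re.findall(r":(\d+)\]", rsmi)]
    next_id = max(existing, default=0) + 1  # new IDs start at max(existing) + 1

    def assign(match: re.Match) -> str:
        nonlocal next_id
        if match.group(2) is not None:  # already mapped: keep unchanged
            return match.group(0)
        mapped = f"[{match.group(1)}:{next_id}]"
        next_id += 1
        return mapped

    return _BRACKET.sub(assign, reactants) + ">>" + products


print(expand_bracket_maps("[CH3:1][OH]>>[CH3:1][OH]"))
```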
- static get_aam_pairwise_indices(G: Graph, H: Graph, aam_key: str = 'atom_map') List[Tuple[int, int]][source]#
Return sorted list of (G_node, H_node) for shared atom-map IDs.
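A minimal sketch of the pairing idea, using plain dictionaries in place of NetworkX graphs so it runs without dependencies. The helper name `aam_pairs` and the convention that a missing or zero map number means "unmapped" are assumptions made for illustration.

```python
from typing import Any, Dict, Hashable, List, Tuple

NodeAttrs = Dict[Hashable, Dict[str, Any]]


def aam_pairs(g_nodes: NodeAttrs, h_nodes: NodeAttrs,
              aam_key: str = "atom_map") -> List[Tuple[Hashable, Hashable]]:
    """Pair nodes of G and H that carry the same non-zero atom-map ID."""
    # Index H by atom-map ID, skipping unmapped (missing or 0) nodes.
    h_by_map = {}
    for node, attrs in h_nodes.items():
        amap = attrs.get(aam_key)
        if amap:
            h_by_map[amap] = node

    pairs = []
    for node, attrs in g_nodes.items():
        amap = attrs.get(aam_key)
        if amap in h_by_map:
            pairs.append((node, h_by_map[amap]))
    return sorted(pairs)


g = {0: {"atom_map": 2}, 1: {"atom_map": 1}, 2: {"atom_map": 0}}
h = {0: {"atom_map": 1}, 1: {"atom_map": 2}, 2: {"atom_map": 7}}
print(aam_pairs(g, h))
```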
- property mapping_pairs: List[Tuple[int, int]] | None#
List of atom-map index pairs between reactants and products.
- class synkit.Chem.Reaction.standardize.Standardize[source]#
Bases:
object
Utilities to normalize and filter reaction and molecule SMILES.
This class provides methods to remove atom‑mapping, filter invalid molecules, canonicalize reaction SMILES, and a full pipeline via fit.
- Variables:
None – Stateless helper class.
- static categorize_reactions(reactions: List[str], target_reaction: str) Tuple[List[str], List[str]][source]#
Partition reactions into those matching a target and those not.
- static filter_valid_molecules(smiles_list: List[str]) List[Mol][source]#
Filter and sanitize a list of SMILES, returning only valid Mol objects.
- Parameters:
smiles_list (List[str]) – List of SMILES strings to validate.
- Returns:
List of sanitized RDKit Mol objects.
- Return type:
List[rdkit.Chem.Mol]
- fit(rsmi: str, remove_aam: bool = True, ignore_stereo: bool = True, remove_invalid: bool = True) str | None[source]#
Full standardization pipeline: strip atom‑mapping, normalize SMILES, fix hydrogen notation.
- Parameters:
rsmi (str) – Reaction SMILES to process.
remove_aam (bool) – If True, remove atom‑mapping annotations. Defaults to True.
ignore_stereo (bool) – If True, drop stereochemistry. Defaults to True.
remove_invalid (bool) – If True, drop invalid fragments and standardize remaining molecules. If False, return None when any invalid fragment exists. Defaults to True.
- Returns:
The standardized reaction SMILES, or None if standardization fails.
- Return type:
Optional[str]
- static remove_atom_mapping(reaction_smiles: str, symbol: str = '>>') str[source]#
Remove atom‑map numbers from a reaction SMILES string.
- Parameters:
- Returns:
Reaction SMILES without atom‑mapping annotations.
- Return type:
str
- Raises:
ValueError – If the input format is invalid or contains invalid SMILES.
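As a rough, text-level illustration of what removing atom-map numbers means, the sketch below strips the `:n` suffix from bracket atoms with a regular expression. It is an assumption-laden stand-in, not the library's method: `strip_atom_maps` is a hypothetical name, no SMILES validation is performed, and brackets such as `[CH3]` are left in place rather than being re-canonicalized.

```python
import re


def strip_atom_maps(rsmi: str, symbol: str = ">>") -> str:
    """Drop ':n' atom-map suffixes from every bracket atom in a reaction SMILES."""
    if rsmi.count(symbol) != 1:
        raise ValueError(f"expected exactly one {symbol!r} separator")
    # ':<digits>' directly before a closing bracket is an atom-map number.
    return re.sub(r":\d+(?=\])", "", rsmi)


print(strip_atom_maps("[CH3:1][OH:2]>>[CH2:1]=[O:2]"))
```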
- static standardize_rsmi(rsmi: str, stereo: bool = False, remove_invalid: bool = True) str | None[source]#
Normalize a reaction SMILES: validate molecules, sort fragments, optionally keep stereo.
- Parameters:
rsmi (str) – Reaction SMILES in ‘reactants>>products’ format.
stereo (bool) – If True, include stereochemistry in the output. Defaults to False.
remove_invalid (bool) – If True, drop invalid fragments and standardize remaining molecules. If False, return None when any invalid fragment exists. Defaults to True.
- Returns:
Standardized reaction SMILES or None if no valid molecules remain.
- Return type:
Optional[str]
- Raises:
ValueError – If the input format is invalid.
- class synkit.Chem.Reaction.aam_validator.AAMValidator[source]#
Bases:
object
A utility class for validating atom‐atom mappings (AAM) in reaction SMILES.
Provides methods to compare mapped SMILES against ground truth by using reaction‐center (RC) or ITS‐graph isomorphism checks, including tautomer enumeration support and batch validation over tabular data.
Quick start#
>>> from synkit.Chem.Reaction import AAMValidator
>>> validator = AAMValidator()
>>> rsmi_1 = (
...     '[CH3:1][C:2](=[O:3])[OH:4].[CH3:5][OH:6]'
...     '>>'
...     '[CH3:1][C:2](=[O:3])[O:6][CH3:5].[OH2:4]')
>>> rsmi_2 = (
...     '[CH3:5][C:1](=[O:2])[OH:3].[CH3:6][OH:4]'
...     '>>'
...     '[CH3:5][C:1](=[O:2])[O:4][CH3:6].[OH2:3]')
>>> is_eq = validator.smiles_check(rsmi_1, rsmi_2, check_method='ITS')
>>> print(is_eq)
True
- static check_equivariant_graph(its_graphs: List[Graph]) Tuple[List[Tuple[int, int]], int][source]#
Identify all pairs of isomorphic ITS graphs.
- static check_pair(mapping: Dict[str, str], mapped_col: str, ground_truth_col: str, check_method: str = 'RC', ignore_aromaticity: bool = False, ignore_tautomers: bool = True) bool[source]#
Validate a single record (dict) entry for equivalence.
- Parameters:
mapping (dict of str→str) – A record containing both mapped and ground‐truth SMILES.
mapped_col (str) – Key for the mapped SMILES in mapping.
ground_truth_col (str) – Key for the ground-truth SMILES in mapping.
check_method (str) – “RC” or “ITS”.
ignore_aromaticity (bool) – If True, ignore aromaticity in ITS construction.
ignore_tautomers (bool) – If True, skip tautomer enumeration.
- Returns:
Validation result for this single pair.
- Return type:
bool
- static smiles_check(mapped_smile: str, ground_truth: str, check_method: str = 'RC', ignore_aromaticity: bool = False) bool[source]#
Validate a single mapped SMILES string against ground truth.
- Parameters:
mapped_smile (str) – The mapped SMILES to validate.
ground_truth (str) – The reference SMILES string.
check_method (str) – Which method to use: “RC” for reaction‐center graph or “ITS” for full ITS‐graph isomorphism.
ignore_aromaticity (bool) – If True, ignore aromaticity differences in ITS construction.
- Returns:
True if exactly one isomorphic match is found; False otherwise.
- Return type:
bool
- static smiles_check_tautomer(mapped_smile: str, ground_truth: str, check_method: str = 'RC', ignore_aromaticity: bool = False) bool | None[source]#
Validate against all tautomers of a ground truth SMILES.
- Parameters:
- Returns:
True if any tautomer matches.
False if none match.
None if an error occurs.
- Return type:
bool or None
- static validate_smiles(data: DataFrame | List[Dict[str, str]], ground_truth_col: str = 'ground_truth', mapped_cols: List[str] = ['rxn_mapper', 'graphormer', 'local_mapper'], check_method: str = 'RC', ignore_aromaticity: bool = False, n_jobs: int = 1, verbose: int = 0, ignore_tautomers: bool = True) List[Dict[str, str | float | List[bool]]][source]#
Batch-validate mapped SMILES in tabular or list-of-dicts form.
- Parameters:
data (pandas.DataFrame or list of dict) – A pandas DataFrame or list of dicts, each row containing at least ground_truth_col and each entry in mapped_cols.
ground_truth_col (str) – Column/key name for the ground-truth SMILES.
mapped_cols (list of str) – List of column/key names for mapped SMILES to validate.
check_method (str) – “RC” or “ITS” validation method.
ignore_aromaticity (bool) – If True, ignore aromaticity in ITS construction.
n_jobs (int) – Number of parallel jobs to use (joblib).
verbose (int) – Verbosity level for parallel execution.
ignore_tautomers (bool) – If True, use simple pairwise check; otherwise enumerate tautomers.
- Returns:
A list of dicts, one per mapper, with keys:
“mapper”: the mapper name
“accuracy”: percentage correct (float)
“results”: list of individual bool results
“success_rate”: mapping success rate metric
- Return type:
List[Dict[str, str | float | List[bool]]]
- Raises:
ValueError – If data is not a DataFrame or list of dicts.
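The shape of the per-mapper summary can be sketched in a few lines. This is only an illustration of the "mapper", "accuracy" and "results" keys described above: `summarize` is a hypothetical helper, accuracy is assumed to be on a 0 to 100 scale, and the "success_rate" metric is omitted because its definition is not given here.

```python
from typing import Dict, List


def summarize(results_by_mapper: Dict[str, List[bool]]) -> List[dict]:
    """Build one summary dict per mapper from its list of pass/fail results."""
    summary = []
    for mapper, results in results_by_mapper.items():
        # Percentage of True entries; an empty result list yields 0.0.
        accuracy = 100.0 * sum(results) / len(results) if results else 0.0
        summary.append(
            {"mapper": mapper, "accuracy": accuracy, "results": list(results)}
        )
    return summary


report = summarize({"rxn_mapper": [True, True, False, True]})
print(report[0]["accuracy"])
```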
- class synkit.Chem.Reaction.balance_check.BalanceReactionCheck(n_jobs: int = 4, verbose: int = 0)[source]#
Bases:
object
Check elemental balance of chemical reactions in SMILES format.
Supports checking single reactions, reaction dictionaries, or lists in parallel.
- Variables:
n_jobs – Number of parallel jobs for batch checking.
verbose – Verbosity level for joblib.
- static dict_balance_check(reaction_dict: Dict[str, str], rsmi_column: str) Dict[str, Any][source]#
Check balance for a single reaction dict, preserving original keys.
- dicts_balance_check(input_data: str | List[str | Dict[str, str]], rsmi_column: str = 'reactions') Tuple[List[Dict[str, Any]], List[Dict[str, Any]]][source]#
Batch‐check balance for multiple reactions, in parallel.
- Parameters:
- Returns:
Tuple (balanced_list, unbalanced_list) of dicts each including “balanced”.
- Return type:
Tuple[List[Dict[str, Any]], List[Dict[str, Any]]]
- static get_combined_molecular_formula(smiles: str) str[source]#
Compute the molecular formula of a SMILES.
- static parse_input(input_data: str | List[str | Dict[str, str]], rsmi_column: str = 'reactions') List[Dict[str, str]][source]#
Normalize input into a list of reaction‐dicts.
- Parameters:
- Returns:
List of dicts with a single key rsmi_column mapping to each reaction.
- Return type:
List[Dict[str, str]]
- Raises:
ValueError – If input_data is neither str nor list.
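The normalization described for `parse_input` might look roughly like the sketch below. `normalize_reactions` is a hypothetical stand-in, and how the real method treats dict items (here: keep only the `rsmi_column` entry) is an assumption.

```python
from typing import Dict, List, Union


def normalize_reactions(
    input_data: Union[str, List[Union[str, Dict[str, str]]]],
    rsmi_column: str = "reactions",
) -> List[Dict[str, str]]:
    """Turn a string, or a list of strings/dicts, into a list of reaction dicts."""
    if isinstance(input_data, str):
        return [{rsmi_column: input_data}]
    if isinstance(input_data, list):
        return [
            {rsmi_column: item[rsmi_column]} if isinstance(item, dict)
            else {rsmi_column: item}
            for item in input_data
        ]
    raise ValueError("input_data must be a str or a list")


print(normalize_reactions(["CCO>>CC=O", {"reactions": "C>>C", "id": "r2"}]))
```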
- class synkit.Chem.Reaction.cleaning.Cleaning[source]#
Bases:
object
Utilities for cleaning and filtering reaction SMILES lists.
Methods#
- remove_duplicates(smiles_list)
Remove duplicate SMILES while preserving input order.
- clean_smiles(smiles_list)
Standardize, balance‑check, and deduplicate a list of reaction SMILES.
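Order-preserving deduplication, as described for `remove_duplicates`, is commonly done with an insertion-ordered dict. The helper below is a generic sketch of that idiom, not necessarily how `Cleaning` implements it.

```python
from typing import List


def dedupe_keep_order(smiles_list: List[str]) -> List[str]:
    """Remove repeated entries while keeping the first occurrence of each."""
    # dict preserves insertion order, so keys come back in first-seen order.
    return list(dict.fromkeys(smiles_list))


print(dedupe_keep_order(["CCO>>CC=O", "C>>C", "CCO>>CC=O"]))
```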
- class synkit.Chem.Reaction.deionize.Deionize[source]#
Bases:
object
Neutralize ionic species and mixtures of ions in reactions.
Provides methods to group ions into neutral combinations, uncharge individual anions/cations, and apply these corrections to SMILES strings or entire reaction dictionaries.
- static ammonia_hydroxide_standardize(reaction_smiles: str) str[source]#
Simplify ammonium hydroxide pairs in a reaction SMILES.
- classmethod apply_uncharge_smiles_to_reactions(reactions: List[Dict[str, Any]], uncharge_smiles_func: Callable[[str], str], n_jobs: int = 4) List[Dict[str, Any]][source]#
Apply a neutralization function to each reaction’s reactants/products in parallel.
Adds keys ‘new_reactants’, ‘new_products’, and ‘standardized_reactions’ based on uncharged SMILES and verifies formula balance.
- Parameters:
- Returns:
List of updated reaction dicts with:
‘success’: bool indicating formula match
‘new_reactants’ / ‘new_products’
‘standardized_reactions’
- Return type:
List[Dict[str, Any]]
- static random_pair_ions(charges: List[int], smiles: List[str]) Tuple[List[List[str]], List[List[int]]][source]#
Identify non‑overlapping groups of ions whose charges sum to zero.
- Parameters:
- Returns:
A tuple of two lists:
groups of SMILES strings forming neutral sets,
groups of their corresponding charges.
- Return type:
Tuple[List[List[str]], List[List[int]]]
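One way to find non-overlapping, charge-neutral groups is a greedy search over combinations of increasing size. The sketch below is deterministic and is offered only as an illustration; the method's name suggests the real implementation may pick groups differently, and `pair_ions` is a hypothetical helper.

```python
from itertools import combinations
from typing import List, Tuple


def pair_ions(charges: List[int],
              smiles: List[str]) -> Tuple[List[List[str]], List[List[int]]]:
    """Greedily collect disjoint groups of ions whose charges sum to zero."""
    remaining = list(range(len(charges)))
    smiles_groups, charge_groups = [], []
    size = 2
    while size <= len(remaining):
        for combo in combinations(remaining, size):
            if sum(charges[i] for i in combo) == 0:
                smiles_groups.append([smiles[i] for i in combo])
                charge_groups.append([charges[i] for i in combo])
                # Used ions are removed so groups never overlap.
                remaining = [i for i in remaining if i not in combo]
                break
        else:
            size += 1  # no neutral group of this size is left; try larger ones
    return smiles_groups, charge_groups


groups, group_charges = pair_ions(
    [1, -1, 2, -1, -1], ["[Na+]", "[Cl-]", "[Mg+2]", "[Br-]", "[I-]"]
)
print(groups)
```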
- static uncharge_anion(smiles: str, charges: int = -1) str[source]#
Neutralize an anionic SMILES string.
- static uncharge_cation(smiles: str, charges: int = 1) str[source]#
Neutralize a cationic SMILES string.
- class synkit.Chem.Reaction.fix_aam.FixAAM[source]#
Bases:
object
Utilities for incrementing and correcting atom‐atom mapping (AAM) numbers in molecules and reaction SMILES.
- Provides methods to:
Increment AAM on all atoms of an RDKit Mol.
Adjust AAM numbers in a standalone SMILES string.
Apply the same adjustment to both sides of a reaction SMILES (RSMI).
- static fix_aam_rsmi(rsmi: str) str[source]#
Apply atom‐map increment to both reactant and product sides of a reaction SMILES.
- static fix_aam_smiles(smiles: str) str[source]#
Parse a SMILES string, increment all atom map numbers, and return updated SMILES.
- Parameters:
smiles (str) – SMILES string containing atom‐map annotations.
- Returns:
SMILES string with every atom‐map number increased by one.
- Return type:
str
- Raises:
ValueError – If the input SMILES cannot be parsed into an RDKit Mol.
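The effect of the increment can be shown at the string level. The sketch below uses a regular expression rather than RDKit, so it does not parse or validate the SMILES; `increment_atom_maps` and its `step` parameter are hypothetical names used for illustration.

```python
import re


def increment_atom_maps(smiles: str, step: int = 1) -> str:
    """Add `step` to every atom-map number found inside bracket atoms."""
    return re.sub(
        r":(\d+)(?=\])",
        lambda m: f":{int(m.group(1)) + step}",
        smiles,
    )


# Works on a single SMILES or on both sides of a reaction SMILES.
print(increment_atom_maps("[CH3:0][OH:1]>>[CH3:0][OH:1]"))
```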
- class synkit.Chem.Reaction.neutralize.Neutralize[source]#
Bases:
object
Neutralize unbalanced charges in chemical reactions by adding counter‑ions.
Provides utilities to calculate formal charges, parse reaction SMILES, and adjust reactants/products with [Na+] or [Cl-] to restore neutrality.
- static calculate_charge_dict(reaction: Dict[str, Any], reaction_column: str) Dict[str, str | int][source]#
Compute and store the total formal charge of the products in a reaction dict.
- Parameters:
- Returns:
The same dictionary updated with:
‘reactants’: reactant SMILES or None
‘products’: product SMILES or None
‘total_charge_in_products’: integer sum of product charges or None
- Return type:
Dict[str, str | int]
- static fix_negative_charge(reaction_dict: Dict[str, Any], charges_column: str = 'total_charge_in_products', id_column: str = 'R-id', reaction_column: str = 'reactions') Dict[str, Any][source]#
Add [Na+] ions to neutralize negative product charge.
- Parameters:
reaction_dict (Dict[str, Any]) – Dictionary with ‘reactants’, ‘products’, and charge info.
charges_column (str) – Key for product total charge. Defaults to ‘total_charge_in_products’.
id_column (str) – Key for reaction identifier. Defaults to ‘R-id’.
reaction_column (str) – Key for reaction SMILES to update. Defaults to ‘reactions’.
- Returns:
New dictionary with:
updated reaction_column including added [Na+] ions
‘reactants’ and ‘products’ with ions appended
charge column set to 0
- Return type:
Dict[str, Any]
- static fix_positive_charge(reaction_dict: Dict[str, Any], charges_column: str = 'total_charge_in_products', id_column: str = 'R-id', reaction_column: str = 'reactions') Dict[str, Any][source]#
Add [Cl-] ions to neutralize positive product charge.
- Parameters:
reaction_dict (Dict[str, Any]) – Dictionary with ‘reactants’, ‘products’, and charge info.
charges_column (str) – Key for product total charge. Defaults to ‘total_charge_in_products’.
id_column (str) – Key for reaction identifier. Defaults to ‘R-id’.
reaction_column (str) – Key for reaction SMILES to update. Defaults to ‘reactions’.
- Returns:
New dictionary with:
updated reaction_column including added [Cl-] ions
‘reactants’ and ‘products’ with ions appended
charge column set to 0
- Return type:
Dict[str, Any]
- static fix_unbalanced_charged(reaction_dict: Dict[str, Any], reaction_column: str) Dict[str, Any][source]#
Detect and neutralize unbalanced product charge by adding counter‑ions.
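The counter-ion logic can be sketched on plain strings. This is an illustration under assumptions, not the library's code: `add_counter_ions` is a hypothetical helper, and appending the same ions to both sides follows the "‘reactants’ and ‘products’ with ions appended" wording of the fix methods above.

```python
from typing import Tuple


def add_counter_ions(reactants: str, products: str,
                     total_charge: int) -> Tuple[str, str]:
    """Append [Na+] (negative charge) or [Cl-] (positive charge) counter-ions."""
    if total_charge == 0:
        return reactants, products
    ion = "[Na+]" if total_charge < 0 else "[Cl-]"
    # One monovalent counter-ion per unit of charge to cancel.
    suffix = "." + ".".join([ion] * abs(total_charge))
    return reactants + suffix, products + suffix


print(add_counter_ions("CC(=O)O.[OH-]", "CC(=O)[O-].O", -1))
```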
- classmethod parallel_fix_unbalanced_charge(reaction_dicts: List[Dict[str, Any]], reaction_column: str, n_jobs: int = 4) List[Dict[str, Any]][source]#
Neutralize charges in multiple reaction dictionaries in parallel.
- Parameters:
- Returns:
List of dictionaries with balanced charges and updated SMILES.
- Return type:
List[Dict[str, Any]]
- class synkit.Chem.Reaction.radical_wildcard.RadicalWildcardAdder(start_map: int | None = None)[source]#
Bases:
object
A utility for adding wildcard dummy atoms ([*]) to radical centers in reaction SMILES, with unique incremental atom-map indices and correct propagation into products.
Each reactive radical atom in the reactant block is identified by its unpaired electron count, assigned one or more wildcard map indices, and recorded. The same wildcard(s) are then appended to the corresponding atom(s) in the product block, ensuring consistent mapping.
- Parameters:
start_map (Optional[int]) – If provided, this integer will be the first atom-map index used for wildcard dummy atoms; subsequent radicals get incremented indices. If None, the next unused index is auto-determined from the input SMILES.
Example#
>>> adder = RadicalWildcardAdder(start_map=8)
>>> rxn = "[C:2][OH:4].[O:6][H:7]>>[C:2][O:6].[OH:4][H:7]"
>>> print(adder.transform(rxn))
[C:2]([OH:4])[*:8].[O:6]([H:7])[*:9]>>[C:2]([O:6][*:9])[*:8].[OH:4][H:7]
- transform(rxn_smiles: str) str[source]#
Append wildcard dummy atoms to each radical center in the reactant block and propagate the same wildcards to the matching atoms in the product block.
- Parameters:
rxn_smiles (str) – Reaction SMILES string, two-component or three-component.
- Returns:
Modified reaction SMILES with consistent wildcard attachments.
- Return type:
str
- Raises:
ValueError – If the SMILES is not valid or fragments fail to parse.
- synkit.Chem.Reaction.radical_wildcard.clean_wc(rsmi: str, invert: bool = False, max_frag: bool = False, wild_card: bool = True) str[source]#
Clean wildcard-containing fragments from one side of a reaction SMILES, optionally selecting the largest remaining fragment.
- Parameters:
- Returns:
The processed reaction SMILES.
- Return type:
str
- Raises:
ValueError – If input does not split into reactant and product.
Example#
>>> clean_wc('A.B>>C.*', invert=False, wild_card=True)
'A.B>>C'
>>> clean_wc('A.B>>C.D', invert=False, max_frag=True)
'A.B>>C'
- class synkit.Chem.Reaction.tautomerize.Tautomerize[source]#
Bases:
object
Standardize molecules by converting enol and hemiketal tautomers into their more stable carbonyl forms, and apply these corrections to individual SMILES or collections of reaction data.
- static fix_dict(data: Dict[str, str], reaction_column: str) Dict[str, str][source]#
Standardize the reactant and product SMILES in a reaction dictionary.
- static fix_dicts(data: List[Dict[str, str]], reaction_column: str, n_jobs: int = 4, verbose: int = 0) List[Dict[str, str]][source]#
Standardize multiple reaction dictionaries in parallel.
- Parameters:
data (List[Dict[str, str]]) – List of dictionaries containing reaction SMILES under reaction_column.
reaction_column (str) – Key in each dictionary for the reaction SMILES.
n_jobs (int) – Number of parallel jobs to run. Defaults to 4.
verbose (int) – Verbosity level for the joblib Parallel call. Defaults to 0.
- Returns:
List of dictionaries with standardized SMILES.
- Return type:
List[Dict[str, str]]
- static fix_smiles(smiles: str) str[source]#
Iteratively apply enol and hemiketal standardizations until no further changes, then return the canonical SMILES.
- static standardize_enol(smiles: str, atom_indices: List[int] | None = None) str[source]#
Convert an enol tautomer into its corresponding carbonyl form.
- Parameters:
- Returns:
SMILES of the molecule after enol→carbonyl conversion, or an error message if the input is invalid or indices fail.
- Return type:
str
- class synkit.Chem.Reaction.Mapper.wl_mapper.EdgeMaskCandidate(side: 'str', removed_pairs: 'frozenset', prior_score: 'float' = 0.0, meta: 'Dict[str, Any]'=<factory>)[source]#
Bases:
object
- class synkit.Chem.Reaction.Mapper.wl_mapper.GraphCache(G: Any, cfg: WLMapperConfig)[source]#
Bases:
object
- class synkit.Chem.Reaction.Mapper.wl_mapper.MappingResult(mapping: 'Dict[Hashable, Hashable]', score: 'float', meta: 'Dict[str, Any]'=<factory>)[source]#
Bases:
object
- class synkit.Chem.Reaction.Mapper.wl_mapper.MaskView(cache: GraphCache, removed_pairs: frozenset | None)[source]#
Bases:
object
- class synkit.Chem.Reaction.Mapper.wl_mapper.Solution(result: 'MappingResult', mapped_rsmi: 'str')[source]#
Bases:
object
- result: MappingResult#
- class synkit.Chem.Reaction.Mapper.wl_mapper.WLMapper(config: WLMapperConfig = WLMapperConfig(iterations=4, digest_size=16, include_initial=True, edge_attr='order', node_label_keys=('element',), progressive_fallback=True, normalize_aromatic_bonds=True, enable_bond_cut=True, max_bond_cut_size=2, max_candidates=200, candidate_edge_pool=16, time_limit_s=2.0, enable_heuristic_scoring=True, heuristic_max_cut_size=4, heuristic_candidate_budget=120, pmcd_unmapped_weight=1, pmcd_bond_weight=1, pmcd_hcount_weight=1, heuristic_carbonyl_double_penalty=50.0, heuristic_aromatic_c_break_penalty=0.75, bc_cost_acyl_co=0.1, bc_cost_x_deg3_o=0.35, bc_cost_x_deg2_o=0.55, bc_cost_x_deg1_o=0.75, bc_cost_aromatic_co=1.5, bc_cost_other=1.0, bc_cost_order_mismatch_scale=0.6, bc_cost_peroxy_oo=0.05, bc_cost_acyl_co_peroxy=3.0, rc_only_bond_changes=True, rc_only_hcount_changes=True, rc_expand_hops=1, enable_rc_refine=True, rc_distance_weight=0.5, enable_symmetry_pruning=True, symmetry_depth=4, large_bucket_threshold=25, greedy_topk_per_u=10, hungarian_max_size=10, enable_dynamic_wl=True, enable_swap_refine=True, swap_refine_max_iter=10, swap_refine_class_depth=4, swap_refine_max_group_size=14, multi_solutions=True, max_solutions=6, solution_score_slack=0.0, start_atom_map=1, unmapped_value=0, assign_maps_to_unmapped=True, use_its_final=True, drop_non_aam=False, use_index_as_atom_map=False), logger: Logger | None = None)[source]#
Bases:
object
- PMCD-first mapping:
Enumerate candidate masks; for each candidate compute mapping and its PMCD key.
Keep the PMCD-minimal set (can be multiple solutions).
Apply the chemical heuristic only to choose the optimal solution among the PMCD-minimal set.
- class synkit.Chem.Reaction.Mapper.wl_mapper.WLMapperConfig(iterations: 'int' = 4, digest_size: 'int' = 16, include_initial: 'bool' = True, edge_attr: 'str' = 'order', node_label_keys: '_NodeLabelKeys' = ('element',), progressive_fallback: 'bool' = True, normalize_aromatic_bonds: 'bool' = True, enable_bond_cut: 'bool' = True, max_bond_cut_size: 'int' = 2, max_candidates: 'int' = 200, candidate_edge_pool: 'int' = 16, time_limit_s: 'Optional[float]' = 2.0, enable_heuristic_scoring: 'bool' = True, heuristic_max_cut_size: 'int' = 4, heuristic_candidate_budget: 'int' = 120, pmcd_unmapped_weight: 'int' = 1, pmcd_bond_weight: 'int' = 1, pmcd_hcount_weight: 'int' = 1, heuristic_carbonyl_double_penalty: 'float' = 50.0, heuristic_aromatic_c_break_penalty: 'float' = 0.75, bc_cost_acyl_co: 'float' = 0.1, bc_cost_x_deg3_o: 'float' = 0.35, bc_cost_x_deg2_o: 'float' = 0.55, bc_cost_x_deg1_o: 'float' = 0.75, bc_cost_aromatic_co: 'float' = 1.5, bc_cost_other: 'float' = 1.0, bc_cost_order_mismatch_scale: 'float' = 0.6, bc_cost_peroxy_oo: 'float' = 0.05, bc_cost_acyl_co_peroxy: 'float' = 3.0, rc_only_bond_changes: 'bool' = True, rc_only_hcount_changes: 'bool' = True, rc_expand_hops: 'int' = 1, enable_rc_refine: 'bool' = True, rc_distance_weight: 'float' = 0.5, enable_symmetry_pruning: 'bool' = True, symmetry_depth: 'int' = 4, large_bucket_threshold: 'int' = 25, greedy_topk_per_u: 'int' = 10, hungarian_max_size: 'int' = 10, enable_dynamic_wl: 'bool' = True, enable_swap_refine: 'bool' = True, swap_refine_max_iter: 'int' = 10, swap_refine_class_depth: 'int' = 4, swap_refine_max_group_size: 'int' = 14, multi_solutions: 'bool' = True, max_solutions: 'int' = 6, solution_score_slack: 'float' = 0.0, start_atom_map: 'int' = 1, unmapped_value: 'int' = 0, assign_maps_to_unmapped: 'bool' = True, use_its_final: 'bool' = True, drop_non_aam: 'bool' = False, use_index_as_atom_map: 'bool' = False)[source]#
Bases:
object
- validated() WLMapperConfig[source]#
Molecule#
- class synkit.Chem.Molecule.atom_features.AtomFeatureExtractor(mol: Mol, per: PerMolDescriptors | None = None, profile: str = 'minimal')[source]#
Bases:
object
Build per-atom feature dictionaries for an RDKit molecule.
- The extractor supports two profiles:
"minimal": a compact set of attributes (backwards compatible with the original _gather_atom_properties).
"full": includes valence, ring sizes, neighbor counts, shortest distances to functional groups, and optional descriptors from PerMolDescriptors.
The class exposes a fluent API: .build(atom) returns self and stores the result in .feature (dict). For batch processing use .build_all() and read .all_features afterwards. For one-off usage, the compatibility helper .build_dict(atom) returns the feature dict directly.
- Parameters:
mol – RDKit molecule to extract features from.
per – Optional precomputed per-atom descriptors (EState, Crippen, etc.).
profile – Feature profile to compute ("minimal" or "full").
- SUPPORTED_PROFILES = ('minimal', 'full')#
- property all_features: List[Dict[str, Any]]#
List of feature dicts for every atom (populated by build_all). If build_all was not called, this property will call it lazily.
- build(atom: Atom | _AtomLike) AtomFeatureExtractor[source]#
Compute features for one atom and store them internally.
Returns self to enable chaining. The result dictionary can be accessed via the feature property.
- Parameters:
atom – RDKit Atom instance (or Atom-like object).
- Returns:
self
- build_all() AtomFeatureExtractor[source]#
Compute features for all atoms in the molecule and store them in .all_features. Returns self for chaining.
- Returns:
self
- build_dict(atom: Atom | _AtomLike) Dict[str, Any][source]#
Backwards-compatible helper that returns the computed feature dict directly (does not alter .feature or .all_features).
- Parameters:
atom – RDKit Atom instance (or Atom-like object).
- Returns:
feature dictionary
- property feature: Dict[str, Any]#
The last computed feature dictionary (via build).
- Raises:
RuntimeError – if build has not been called yet.
- class synkit.Chem.Molecule.descriptors.PerMolDescriptors(gasteiger: List[float], estate: List[float], crippen_logp: List[float], crippen_mr: List[float])[source]#
Bases:
object
Immutable container for per-atom descriptor lists.
- Parameters:
gasteiger – per-atom Gasteiger charges
estate – per-atom EState indices
crippen_logp – per-atom Crippen logP contributions
crippen_mr – per-atom Crippen MR contributions
- classmethod compute(mol: Mol | _MolLike, sanitize: bool = True, normalize: str | None = None) PerMolDescriptors[source]#
Best-effort compute per-atom descriptors for the given molecule.
This function delegates work to small helpers for clarity and easier testing. Behavior is identical to the previous implementation.
- Parameters:
mol – RDKit molecule (or duck-typed equivalent).
sanitize – try to sanitize the copied molecule (default True).
normalize – normalization method: None (default), “zscore”, or “minmax”.
- Returns:
PerMolDescriptors instance.
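The two normalization options can be illustrated on a plain list of floats. The sketch below makes assumptions that are not confirmed by this reference: the z-score uses the population standard deviation, and constant vectors map to zeros. `normalize_vector` is a hypothetical helper.

```python
from math import sqrt
from typing import List, Optional


def normalize_vector(values: List[float],
                     method: Optional[str] = None) -> List[float]:
    """Return a z-score or min-max normalized copy of `values`."""
    if method is None or not values:
        return list(values)
    if method == "zscore":
        mean = sum(values) / len(values)
        std = sqrt(sum((v - mean) ** 2 for v in values) / len(values))
        return [(v - mean) / std if std else 0.0 for v in values]
    if method == "minmax":
        low, high = min(values), max(values)
        span = high - low
        return [(v - low) / span if span else 0.0 for v in values]
    raise ValueError(f"unsupported normalization: {method!r}")


print(normalize_vector([1.0, 2.0, 3.0], "minmax"))
```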
- classmethod from_smiles(smiles: str, sanitize: bool = True, normalize: str | None = None) PerMolDescriptors[source]#
Parse SMILES and compute descriptors.
- Parameters:
smiles – SMILES string to parse.
sanitize – try to sanitize the parsed molecule (default True).
normalize – optional normalization (“zscore” | “minmax” | None).
- Returns:
PerMolDescriptors instance.
- Raises:
ValueError – if SMILES fails to parse.
- class synkit.Chem.Molecule.descriptors.PerMolDescriptorsBuilder(mol: Mol | _MolLike, sanitize: bool = True)[source]#
Bases:
object
Fluent builder for PerMolDescriptors.
- Usage example:
desc = (
    PerMolDescriptorsBuilder(mol)
    .compute_gasteiger()
    .compute_estate()
    .compute_crippen()
    .normalize("zscore")
    .build()
    .descriptor
)
The builder’s chainable methods return self; call .build() then access the final PerMolDescriptors via the .descriptor property.
- build() PerMolDescriptorsBuilder[source]#
Finalize internal state and prepare the immutable PerMolDescriptors.
The method stores the result internally and returns self. Use the .descriptor property to access the final object.
- Returns:
self
- compute_crippen() PerMolDescriptorsBuilder[source]#
Compute Crippen per-atom contributions (best-effort).
- Returns:
self (chainable).
- compute_estate() PerMolDescriptorsBuilder[source]#
Compute EState indices (best-effort).
- Returns:
self (chainable).
- compute_gasteiger() PerMolDescriptorsBuilder[source]#
Compute Gasteiger charges (best-effort) and store internally.
- Returns:
self (chainable).
- property descriptor: PerMolDescriptors#
Retrieve the built PerMolDescriptors. If not built yet, build() is called implicitly.
- Returns:
PerMolDescriptors
- normalize(method: str | None) PerMolDescriptorsBuilder[source]#
Normalize any computed vectors using method (“zscore” | “minmax” | None).
- Parameters:
method – normalization method or None to skip.
- Returns:
self (chainable).
- synkit.Chem.Molecule.descriptors.compute_gasteiger_inplace(mol: Mol | Any) None[source]#
Compatibility helper: compute Gasteiger charges in-place (best-effort). Reintroduced for backward compatibility with code that imports this from synkit.Chem.Molecule.descriptors.
- Parameters:
mol – RDKit Mol (or mol-like object) to annotate. Mutates in place.
- Returns:
None
- class synkit.Chem.Molecule.formula.Formula(n_jobs: int = 1, verbose: int = 0)[source]#
Bases:
object
Decompose SMILES into element counts and generate Hill-order formulas / molecular weights using RDKit.
- Main APIs:
decompose(): element counts as a dict (e.g., {‘C’: 6, ‘H’: 6})
hill_formula(): Hill-order formula string
mol_weight(): RDKit molecular weight (sums ‘.’ fragments)
process_list(): batch over a list of SMILES
process_list_dict(): batch over a list of dicts or a pandas DataFrame
- Hill notation rules implemented:
If carbon (‘C’) present: list ‘C’, then ‘H’, then other elements alphabetically.
If no carbon: list all elements alphabetically.
Counts of 1 are omitted (CH3, not C1H3).
- Parameters:
n_jobs – Number of parallel jobs for batch processing via joblib. Use 1 to disable parallelism.
verbose – Verbosity level passed to joblib.Parallel.
- decompose(smiles: str) Dict[str, int][source]#
Decompose a SMILES string into element counts using RDKit’s CalcMolFormula. Disconnected fragments separated by ‘.’ are handled by summing counts.
- Parameters:
smiles – SMILES string (may contain ‘.’ for multiple fragments).
- Returns:
Dict of element counts (empty dict if invalid/empty).
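Summing element counts across fragments can be sketched by parsing molecular-formula strings of the kind RDKit's CalcMolFormula produces. The helper below is a simplified assumption: `formula_counts` is a hypothetical name, it takes formula strings rather than SMILES, and charge suffixes are ignored.

```python
import re
from collections import Counter
from typing import Dict, List

# An element symbol (one uppercase letter, optional lowercase) and its count.
_TOKEN = re.compile(r"([A-Z][a-z]?)(\d*)")


def formula_counts(formulas: List[str]) -> Dict[str, int]:
    """Sum element counts over a list of per-fragment formula strings."""
    total: Counter = Counter()
    for formula in formulas:
        for element, count in _TOKEN.findall(formula):
            total[element] += int(count) if count else 1
    return dict(total)


print(formula_counts(["C2H6O", "H2O"]))
```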
- hill_formula(smiles: str) str[source]#
Convert a SMILES to a Hill-order formula string.
- Rules:
If ‘C’ present: C then H then other elements alphabetical.
If no ‘C’: all elements alphabetical.
Counts of 1 are omitted.
- Parameters:
smiles – SMILES string.
- Returns:
Hill-order formula string; empty string for invalid/empty SMILES.
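The ordering rules are easy to check against a plain dictionary of element counts. The sketch below applies them directly; `hill_from_counts` is a hypothetical helper that takes counts instead of a SMILES string, so it needs no RDKit.

```python
from typing import Dict


def hill_from_counts(counts: Dict[str, int]) -> str:
    """Format element counts as a Hill-order formula string."""
    if not counts:
        return ""
    elements = sorted(counts)
    if "C" in counts:
        # Carbon first, then hydrogen, then everything else alphabetically.
        ordered = ["C"] + (["H"] if "H" in counts else [])
        ordered += [e for e in elements if e not in ("C", "H")]
    else:
        ordered = elements
    # Counts of 1 are omitted.
    return "".join(e + (str(counts[e]) if counts[e] != 1 else "") for e in ordered)


print(hill_from_counts({"O": 1, "H": 6, "C": 2}))
```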
- mol_weight(smiles: str) float | None[source]#
Compute molecular weight using RDKit (sum over ‘.’ fragments).
- Parameters:
smiles – SMILES string (may contain ‘.’ fragments).
- Returns:
Molecular weight as float, or None if invalid/empty.
- process_list(smiles_list: List[str], what: str = 'hill') List[str | Dict[str, int] | float | None][source]#
Batch process a list of SMILES.
- Parameters:
smiles_list – List of SMILES strings.
what – One of {‘hill’, ‘decompose’, ‘molwt’}.
- Returns:
List of results corresponding to ‘what’.
‘hill’ -> List[str]
‘decompose’ -> List[Dict[str,int]]
‘molwt’ -> List[Optional[float]]
- Raises:
ValueError – If ‘what’ is unsupported.
- process_list_dict(records: List[Dict[str, Any]] | DataFrame, smiles_key: str = 'smiles', out_key: str = 'hill', what: str = 'hill', copy: bool = True) List[Dict[str, Any]][source]#
Batch process a list of dictionaries (or a pandas DataFrame) containing SMILES, and return a list of dictionaries with the computed output appended.
If a pandas DataFrame is provided, it is converted to a list of dicts (records).
Input dicts are deep-copied by default to avoid in-place mutation.
- Parameters:
records – List[dict] or pandas DataFrame. Each record must contain smiles_key.
smiles_key – Key in each record holding the SMILES string.
out_key – Output key to store the computed value, e.g. ‘hill’, ‘decompose’, ‘molwt’.
what – One of {‘hill’, ‘decompose’, ‘molwt’} specifying the computation.
copy – If True, deep-copy each record before adding output.
- Returns:
List of dicts with an added out_key.
- Raises:
KeyError – If a record is missing smiles_key.
ValueError – If ‘what’ is unsupported.
- class synkit.Chem.Molecule.graph_annotator.GraphAnnotator(G: Graph, in_place: bool = True, max_distance: int = 99)[source]#
Bases:
object
Compute optional topology annotations for a NetworkX molecular graph.
The annotator mutates a graph (in-place by default) or a shallow copy if in_place=False. Methods are chainable and return self; use the .graph property to retrieve the annotated graph.
- Supported annotations:
node degree -> atom_degree
neighborhood element counts -> nbr_elements_counts_r1
shortest distances to motif sets (halogen/hetero/aromatic/carbonyl) -> dist_to_<motif>
conjugated/pi connected component size -> conj_component_size
ring sizes via cycle basis -> ring_sizes and updated in_ring
Notes#
Graph nodes are expected to carry an 'element' key (string) and may optionally have 'aromatic' or 'is_halogen' boolean flags.
Edge attributes used: 'order' (numeric-like) and 'conjugated'.
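One plausible way to compute a dist_to_<motif> annotation is a multi-source breadth-first search capped at max_distance. The sketch below uses a plain adjacency dict rather than a NetworkX graph, and both the function name and the choice to report unreachable nodes as max_distance are assumptions, not confirmed library behaviour.

```python
from collections import deque
from typing import Dict, Hashable, Iterable, List


def distance_to_motif(
    adjacency: Dict[Hashable, List[Hashable]],
    motif_nodes: Iterable[Hashable],
    max_distance: int = 99,
) -> Dict[Hashable, int]:
    """Shortest hop count from every node to the nearest motif node."""
    # Assumption: nodes that cannot reach a motif keep the cap value.
    dist = {node: max_distance for node in adjacency}
    queue = deque()
    for node in motif_nodes:
        if node in dist:
            dist[node] = 0
            queue.append(node)
    while queue:
        node = queue.popleft()
        if dist[node] >= max_distance:
            continue
        for nbr in adjacency[node]:
            if dist[node] + 1 < dist[nbr]:
                dist[nbr] = dist[node] + 1
                queue.append(nbr)
    return dist


# Chloroethane heavy atoms (C0-C1-Cl2) plus an isolated node 3.
adjacency = {0: [1], 1: [0, 2], 2: [1], 3: []}
dist_to_halogen = distance_to_motif(adjacency, motif_nodes=[2])
print(dist_to_halogen)
```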
- DEFAULT_MAX_DISTANCE = 99#
- annotate() GraphAnnotator[source]#
Molecule standardization helpers and a chainable MolStandardizer class.
- Provides:
lightweight helpers: sanitize_and_canonicalize_smiles, fix_radical_rsmi, remove_isotopes, …
MolStandardizer: fluent, chainable standardization API with convenience constructors.
- class synkit.Chem.Molecule.standardize.MolStandardizer(mol: Mol, sanitize: bool = True)[source]#
Bases:
object
Chainable molecule standardizer wrapper around RDKit utilities.
Use the fluent API to apply a sequence of standardizations and then retrieve the resulting molecule via the .mol property or .to_smiles().
Example#
>>> std = MolStandardizer.from_smiles("CC(=O)[O-]").remove_salts().uncharge().mol
- add_hs_and_clear_radicals(removeH: bool = True) MolStandardizer[source]#
Replace radical electrons with explicit hydrogens and optionally remove them.
- Parameters:
removeH (bool) – if True remove explicit hydrogens after addition.
- Returns:
self (chainable).
- Return type:
MolStandardizer
- canonicalize_tautomer() MolStandardizer[source]#
Canonicalize tautomer using rdMolStandardize.TautomerEnumerator.
- Returns:
self (chainable).
- Return type:
MolStandardizer
- clear_stereochemistry() MolStandardizer[source]#
Remove stereochemical annotations (Chem.RemoveStereochemistry).
- Returns:
self (chainable).
- Return type:
MolStandardizer
- classmethod from_smiles(smiles: str, sanitize: bool = True) MolStandardizer[source]#
Parse SMILES and return a configured standardizer.
- Parameters:
- Returns:
MolStandardizer.
- Return type:
MolStandardizer
- Raises:
ValueError – if SMILES fails to parse.
- classmethod help() str[source]#
Short machine-readable help describing capabilities.
- Returns:
help string.
- Return type:
str
- keep_largest_fragment() MolStandardizer[source]#
Keep only the largest fragment by atom count.
- Returns:
self (chainable).
- Return type:
MolStandardizer
- property mol: Mol | None#
Return the internal RDKit Mol (or None if absent).
- Returns:
the internal RDKit Mol or None.
- Return type:
Optional[Chem.Mol]
- normalize() MolStandardizer[source]#
Normalize the internal molecule using rdMolStandardize.Normalizer.
- Returns:
self (chainable).
- Return type:
MolStandardizer
- remove_explicit_hs() MolStandardizer[source]#
Remove explicit hydrogens (Chem.RemoveHs).
- Returns:
self (chainable).
- Return type:
MolStandardizer
- remove_isotopes() MolStandardizer[source]#
Clear isotope labels on all atoms.
- Returns:
self (chainable).
- Return type:
MolStandardizer
- remove_salts(salt_remover: SaltRemover | None = None) MolStandardizer[source]#
Remove salts using RDKit’s SaltRemover.
- Parameters:
salt_remover (Optional[SaltRemover]) – Optional SaltRemover instance to use; if None a new one is created.
- Returns:
self (chainable).
- Return type:
MolStandardizer
- classmethod standardize_smiles(smiles: str, *, keep_largest_fragment: bool = True) str | None[source]#
Quick convenience: parse SMILES, apply a sensible default standardization, and return canonical SMILES or None on failure.
- Default pipeline:
sanitize -> normalize (if available) -> keep largest fragment -> remove salts -> uncharge -> canonicalize tautomer (if available)
- summarize_last_error() str | None[source]#
Return a short string describing the last internal exception, if any.
- Returns:
descriptive string for last error or None.
- Return type:
Optional[str]
- to_smiles(canonical: bool = True) str | None[source]#
Return a SMILES string for the internal molecule.
- uncharge() MolStandardizer[source]#
Neutralize charges using rdMolStandardize.Uncharger.
- Returns:
self (chainable).
- Return type:
MolStandardizer
- synkit.Chem.Molecule.standardize.canonicalize_tautomer(mol: Mol) Mol[source]#
Canonicalize tautomeric form using rdMolStandardize.TautomerEnumerator if available.
- Parameters:
mol (Chem.Mol) – RDKit Mol object to canonicalize.
- Returns:
Canonicalized tautomer Mol (or original if unavailable).
- Return type:
Chem.Mol
- synkit.Chem.Molecule.standardize.clear_stereochemistry(mol: Mol) Mol[source]#
Remove stereochemical annotations from a molecule.
- Parameters:
mol (Chem.Mol) – RDKit Mol object to process.
- Returns:
Mol object with stereochemistry removed.
- Return type:
Chem.Mol
- synkit.Chem.Molecule.standardize.fix_radical_rsmi(rsmi: str, removeH: bool = True) str[source]#
Fix radicals in a reaction SMILES by converting them to hydrogens.
- synkit.Chem.Molecule.standardize.fragments_remover(mol: Mol) Mol | None[source]#
Keep only the largest fragment by atom count.
- Parameters:
mol (Chem.Mol) – RDKit Mol object to fragment.
- Returns:
Mol of the largest fragment, or None if input is empty.
- Return type:
Optional[Chem.Mol]
- synkit.Chem.Molecule.standardize.normalize_molecule(mol: Mol) Mol[source]#
Normalize a molecule using rdMolStandardize.Normalizer when available.
- Parameters:
mol (Chem.Mol) – RDKit Mol object to normalize.
- Returns:
Normalized RDKit Mol object (or original if normalizer missing).
- Return type:
Chem.Mol
- synkit.Chem.Molecule.standardize.remove_explicit_hydrogens(mol: Mol) Mol[source]#
Remove explicit hydrogens from the molecule (Chem.RemoveHs wrapper).
- Parameters:
mol (Chem.Mol) – RDKit Mol object to process.
- Returns:
Mol object without explicit hydrogens.
- Return type:
Chem.Mol
- synkit.Chem.Molecule.standardize.remove_isotopes(mol: Mol) Mol[source]#
Clear isotope labels on every atom in the molecule.
- Parameters:
mol (Chem.Mol) – RDKit Mol object to process.
- Returns:
The same RDKit Mol instance with isotopic labels cleared.
- Return type:
Chem.Mol
- synkit.Chem.Molecule.standardize.remove_radicals_and_add_hydrogens(mol: Mol, removeH: bool = True) Mol | None[source]#
Replace radical electrons by adding hydrogens and optionally remove explicit H.
- Parameters:
mol (Chem.Mol) – RDKit Mol with possible radical atoms.
removeH (bool) – If True, remove explicit hydrogens after addition.
- Returns:
Mol with radicals neutralized (or None on failure).
- Return type:
Optional[Chem.Mol]
- synkit.Chem.Molecule.standardize.salts_remover(mol: Mol, remover: SaltRemover | None = None) Mol[source]#
Remove salts from a molecule using RDKit’s SaltRemover.
- Parameters:
mol (Chem.Mol) – RDKit Mol object to process.
remover (Optional[SaltRemover]) – Optional SaltRemover instance to use.
- Returns:
Mol object with salts removed (best-effort).
- Return type:
Chem.Mol
- synkit.Chem.Molecule.standardize.sanitize_and_canonicalize_smiles(smiles: str) str | None[source]#
Sanitize and canonicalize a SMILES string.
- Parameters:
smiles (str) – Input SMILES string.
- Returns:
Canonical SMILES if valid, otherwise
None.- Return type:
Optional[str]
Notes#
The function attempts to parse and sanitize the SMILES with RDKit. On any parsing/sanitization failure it returns
None(best-effort policy).
- synkit.Chem.Molecule.standardize.uncharge_molecule(mol: Mol) Mol[source]#
Neutralize/uncharge a molecule using rdMolStandardize.Uncharger if available.
- Parameters:
mol (Chem.Mol) – RDKit Mol object to neutralize.
- Returns:
Neutralized Mol object (or original if uncharger missing).
- Return type:
Chem.Mol
- class synkit.Chem.Molecule.valence.ValenceResolver[source]#
Bases:
object
Warning-free valence utilities for RDKit atoms.
These helpers retrieve explicit, implicit, and total valences while silencing common deprecation or signature warnings across RDKit versions. They first try the modern keyword-argument API and gracefully fall back to older call signatures or legacy methods.
- Preferred (modern) RDKit API:
atom.GetValence(which=rdchem.ValenceType.EXPLICIT)
atom.GetValence(which=rdchem.ValenceType.IMPLICIT)
- Fallbacks maintain compatibility with older wrappers:
atom.GetValence(rdchem.ValenceType.EXPLICIT) (positional)
atom.GetExplicitValence()
atom.GetImplicitValence()
atom.GetNumImplicitHs() (as a last resort for implicit)
Notes#
Returned values are coerced to Python int and guaranteed non-negative, with 0 returned if all strategies fail.
Values reflect the current state of the atom. If you modify hydrogen counts, aromaticity, or bond orders, query again.
Chem.Atom is an alias of rdchem.Atom, but a structural duck-type _AtomLike protocol is provided for static typing tools.
Examples#
>>> from rdkit import Chem
>>> from synkit.Chem.Molecule.valence import ValenceResolver
>>> m = Chem.MolFromSmiles("CCO")
>>> a = m.GetAtomWithIdx(1)  # central carbon
>>> ValenceResolver.explicit(a) >= 0
True
>>> ValenceResolver.total(a) == ValenceResolver.explicit(a) + ValenceResolver.implicit(a)
True
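The fallback chain described above can be illustrated with duck-typed stand-ins, so the sketch runs without RDKit. Everything here is hypothetical: `first_valid_int`, `explicit_valence`, the string flag in place of rdchem.ValenceType.EXPLICIT, and the choice to treat a negative result as a failed strategy.

```python
from typing import Any, Callable, List


def first_valid_int(strategies: List[Callable[[], Any]]) -> int:
    """Return the first non-negative int a strategy yields, else 0."""
    for strategy in strategies:
        try:
            value = int(strategy())
        except Exception:
            continue  # missing method or unsupported signature: try the next one
        if value >= 0:
            return value
    return 0


def explicit_valence(atom: Any, explicit_flag: Any = "EXPLICIT") -> int:
    return first_valid_int([
        lambda: atom.GetValence(which=explicit_flag),  # modern keyword form
        lambda: atom.GetValence(explicit_flag),        # older positional form
        lambda: atom.GetExplicitValence(),             # legacy method
    ])


class PositionalAtom:
    """Stand-in for a wrapper that only accepts a positional argument."""
    def GetValence(self, valence_type, /):
        return 2


class LegacyAtom:
    """Stand-in for a wrapper that only has the legacy method."""
    def GetExplicitValence(self):
        return 3


print(explicit_valence(PositionalAtom()), explicit_valence(LegacyAtom()))
```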
- static explicit(atom: Atom | _AtomLike) int[source]#
Return the explicit valence of an atom.
Tries modern GetValence(which=EXPLICIT) first, then the older positional form, then GetExplicitValence(). Returns 0 on failure.
- Parameters:
atom (rdchem.Atom) – RDKit atom instance.
- Returns:
Explicit valence (non-negative integer).
- Return type:
int
- static implicit(atom: Atom | _AtomLike) int[source]#
Return the implicit valence of an atom.
Tries modern GetValence(which=IMPLICIT) first, then the older positional form, then GetImplicitValence(), and finally falls back to the number of implicit hydrogens if needed. Returns 0 on failure.
- Parameters:
atom (rdchem.Atom) – RDKit atom instance.
- Returns:
Implicit valence (non-negative integer).
- Return type:
int
Fingerprint#
- class synkit.Chem.Fingerprint.fp_calculator.FPCalculator(n_jobs: int = 1, verbose: int = 0)[source]#
Bases:
object
Calculate fingerprint vectors for chemical reactions represented by SMILES strings.
- Variables:
fps (TransformationFP) – Shared fingerprint engine instance.
VALID_FP_TYPES (List[str]) – Supported fingerprint type identifiers.
- Parameters:
- VALID_FP_TYPES: List[str] = ['drfp', 'avalon', 'maccs', 'torsion', 'pharm2D', 'ecfp2', 'ecfp4', 'ecfp6', 'fcfp2', 'fcfp4', 'fcfp6', 'rdk5', 'rdk6', 'rdk7', 'ap']#
- static dict_process(data_dict: Dict[str, Any], rsmi_key: str, symbol: str = '>>', fp_type: str = 'ecfp4', absolute: bool = True) Dict[str, Any][source]#
Compute a fingerprint for a single reaction SMILES entry and add it to the dict.
- Parameters:
data_dict (dict) – Dictionary containing reaction data.
rsmi_key (str) – Key in data_dict for the reaction SMILES string.
symbol (str) – Delimiter between reactant and product in the SMILES.
fp_type (str) – Fingerprint type to compute.
absolute (bool) – Whether to take absolute values of the fingerprint difference.
- Returns:
The input dictionary with a new key fp_{fp_type} holding the fingerprint vector.
- Return type:
Dict[str, Any]
- Raises:
ValueError – If rsmi_key is missing in data_dict.
- fps: TransformationFP = <TransformationFP>#
- help() None[source]#
Print details about supported fingerprint types and usage.
- Returns:
None
- Return type:
NoneType
- parallel_process(data_dicts: List[Dict[str, Any]], rsmi_key: str, symbol: str = '>>', fp_type: str = 'ecfp4', absolute: bool = True) List[Dict[str, Any]][source]#
Compute fingerprints for a batch of reaction dictionaries in parallel.
- Parameters:
data_dicts (list of dict) – List of dictionaries, each containing a reaction SMILES.
rsmi_key (str) – Key in each dict for the reaction SMILES string.
symbol (str) – Delimiter between reactant and product in the SMILES.
fp_type (str) – Fingerprint type to compute.
absolute (bool) – Whether to take absolute values of the fingerprint difference.
- Returns:
A list of dictionaries augmented with fp_{fp_type} entries.
- Return type:
List[Dict[str, Any]]
- Raises:
ValueError – If fp_type is unsupported or any dict is missing rsmi_key.
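The validate-then-map pattern behind a batch call like this could be sketched with the standard library. The documentation does not state which parallel backend the class uses, so `ThreadPoolExecutor`, the `add_fingerprints` name, and the `compute` stand-in are assumptions; the sketch also returns shallow copies rather than mutating its inputs.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Any, Callable, Dict, List

VALID_FP_TYPES = [
    "drfp", "avalon", "maccs", "torsion", "pharm2D", "ecfp2", "ecfp4", "ecfp6",
    "fcfp2", "fcfp4", "fcfp6", "rdk5", "rdk6", "rdk7", "ap",
]


def add_fingerprints(
    data_dicts: List[Dict[str, Any]],
    rsmi_key: str,
    fp_type: str,
    compute: Callable[[str], Any],
    n_jobs: int = 2,
) -> List[Dict[str, Any]]:
    """Validate inputs up front, then compute one fingerprint per record."""
    if fp_type not in VALID_FP_TYPES:
        raise ValueError(f"unsupported fingerprint type: {fp_type!r}")
    if any(rsmi_key not in d for d in data_dicts):
        raise ValueError(f"every record must contain {rsmi_key!r}")

    def work(record: Dict[str, Any]) -> Dict[str, Any]:
        out = dict(record)  # shallow copy; the input record is left as-is
        out[f"fp_{fp_type}"] = compute(record[rsmi_key])
        return out

    with ThreadPoolExecutor(max_workers=n_jobs) as pool:
        return list(pool.map(work, data_dicts))  # map preserves input order


rows = [{"rsmi": "CCO>>CC=O"}, {"rsmi": "CC>>C=C"}]
out = add_fingerprints(rows, "rsmi", "ecfp4", compute=len)  # len is a placeholder
print(out)
```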
smiles_featurizer.py#
Utility for converting SMILES strings into various cheminformatics fingerprints, with optional NumPy‐array conversion.
Key features#
Multi‐fingerprint support – MACCS, Avalon, ECFP/FCFP, RDKit, AtomPair, Torsion, Pharm2D
SMILES validation – raises on invalid input
Array conversion – output as NumPy arrays for ML pipelines
Extensible – add new methods or override via subclassing
Quick start#
>>> from synkit.Chem.Fingerprint.smiles_featurizer import SmilesFeaturizer
>>> arr = SmilesFeaturizer.featurize_smiles("CCO", "ecfp4", convert_to_array=True)
- class synkit.Chem.Fingerprint.smiles_featurizer.SmilesFeaturizer[source]#
Bases:
object
Convert SMILES strings into chemical fingerprint vectors.
- Variables:
None – This class only provides static/class methods and holds no state.
- Supported fingerprint methods:
MACCS keys
Avalon
ECFP/FCFP (Morgan)
RDKit topological
AtomPair
Torsion
2D Pharmacophore
Use featurize_smiles for one‑line access.
- classmethod featurize_smiles(smiles: str, fingerprint_type: str, convert_to_array: bool = True, **kwargs: Any) Any[source]#
Featurize a SMILES string into a chosen fingerprint, optionally converting to a NumPy array.
- Parameters:
smiles (str) – The SMILES string to featurize.
fingerprint_type (str) – One of ‘maccs’, ‘avalon’, ‘ecfp#’, ‘fcfp#’, ‘rdk#’, ‘ap’, ‘torsion’, ‘pharm2d’.
convert_to_array (bool) – If True, convert the result to a NumPy array.
kwargs (dict) – Additional parameters passed to the chosen method:
nBits for Avalon/ECFP/FCFP
radius for ECFP/FCFP
maxPath, fpSize, nBitsPerHash for RDKit FP
- Returns:
Fingerprint as a NumPy array (if convert_to_array) or RDKit bit vector.
- Return type:
np.ndarray or ExplicitBitVect
- Raises:
ValueError – If fingerprint_type is unsupported.
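Names such as ‘ecfp#’ and ‘rdk#’ carry a number that has to be turned into method parameters. The parser below is a sketch of one reasonable mapping: it relies on the standard convention that the ECFP/FCFP suffix is a diameter (so ecfp4 means Morgan radius 2) and assumes, without confirmation from the source, that the rdk suffix is passed as maxPath. `parse_fp_type` is a hypothetical helper.

```python
import re
from typing import Any, Dict

SIMPLE_TYPES = {"maccs", "avalon", "ap", "torsion", "pharm2d"}


def parse_fp_type(fp_type: str) -> Dict[str, Any]:
    """Translate a fingerprint name into a method name plus parameters."""
    name = fp_type.lower()
    match = re.fullmatch(r"(ecfp|fcfp|rdk)(\d+)", name)
    if match:
        family, number = match.group(1), int(match.group(2))
        if family == "rdk":
            return {"method": "rdk", "maxPath": number}  # assumed mapping
        if number % 2:
            raise ValueError(f"ECFP/FCFP diameter must be even: {fp_type!r}")
        # The suffix is a diameter, so the Morgan radius is half of it.
        return {"method": "morgan", "radius": number // 2,
                "useFeatures": family == "fcfp"}
    if name in SIMPLE_TYPES:
        return {"method": name}
    raise ValueError(f"unsupported fingerprint type: {fp_type!r}")


print(parse_fp_type("ecfp4"), parse_fp_type("rdk5"))
```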
- static get_avalon_fp(mol: Mol, nBits: int = 1024) Any[source]#
Generate the Avalon fingerprint for a molecule.
- Parameters:
mol (Chem.Mol) – RDKit Mol object.
nBits (int) – Length of the fingerprint vector.
- Returns:
Avalon fingerprint bit vector.
- Return type:
ExplicitBitVect
- static get_ecfp(mol: Mol, radius: int, nBits: int = 2048, useFeatures: bool = False) Any[source]#
Generate a Morgan fingerprint (ECFP or FCFP) for a molecule.
- static get_maccs_keys(mol: Mol) Any[source]#
Generate the MACCS keys fingerprint for a molecule.
- Parameters:
mol (Chem.Mol) – RDKit Mol object.
- Returns:
MACCS keys fingerprint bit vector.
- Return type:
ExplicitBitVect
- static get_rdk_fp(mol: Mol, maxPath: int, fpSize: int = 2048, nBitsPerHash: int = 2) Any[source]#
Generate an RDKit topological fingerprint for a molecule.
- help() None[source]#
Print supported fingerprint types and usage summary.
- Returns:
None
- Return type:
NoneType
- static mol_to_ap(mol: Mol) Any[source]#
Generate an Atom Pair fingerprint for a molecule.
- Parameters:
mol (Chem.Mol) – RDKit Mol object.
- Returns:
Atom Pair fingerprint as an integer vector.
- Return type:
ExplicitBitVect
- static mol_to_pharm2d(mol: Mol) Any[source]#
Generate a 2D Pharmacophore fingerprint for a molecule.
- Parameters:
mol (Chem.Mol) – RDKit Mol object.
- Returns:
2D pharmacophore fingerprint bit vector.
- Return type:
ExplicitBitVect
- static mol_to_torsion(mol: Mol) Any[source]#
Generate a Topological Torsion fingerprint for a molecule.
- Parameters:
mol (Chem.Mol) – RDKit Mol object.
- Returns:
Torsion fingerprint as an integer vector.
- Return type:
ExplicitBitVect
- static smiles_to_mol(smiles: str) Mol[source]#
Convert a SMILES string to an RDKit Mol object.
- Parameters:
smiles (str) – The SMILES string to convert.
- Returns:
RDKit Mol object corresponding to the SMILES.
- Return type:
Chem.Mol
- Raises:
ValueError – If the SMILES string is invalid.
transformation_fp.py#
Compute reaction‐level fingerprints by combining molecular fingerprints of reactants and products, with optional absolute mode and bit‐vector conversion.
Quick start#
>>> from synkit.Chem.Fingerprint.transformation_fp import TransformationFP
>>> arr = TransformationFP().fit('CCO>>CC=O', symbols='>>', fp_type='ecfp4', abs=True)
>>> bv = TransformationFP().fit('CCO>>CC=O', symbols='>>', fp_type='ecfp4', abs=True, return_array=False)
- class synkit.Chem.Fingerprint.transformation_fp.TransformationFP[source]#
Bases:
object
Calculate reaction fingerprints by featurizing individual molecules and combining them via vector subtraction.
- Variables:
None – Stateless utility class.
- static convert_arr2vec(arr: ndarray) ExplicitBitVect[source]#
Convert a NumPy array of bits into an RDKit ExplicitBitVect.
- Parameters:
arr (np.ndarray) – Array of 0/1 values representing a fingerprint.
- Returns:
RDKit bit vector constructed from the bit string.
- Return type:
cDataStructs.ExplicitBitVect
- fit(reaction_smiles: str, symbols: str, fp_type: str, abs: bool, return_array: bool = True, **kwargs: Any) ndarray | ExplicitBitVect[source]#
Generate a reaction fingerprint by subtracting reactant from product fingerprints.
- Parameters:
reaction_smiles (str) – Reaction SMILES, reactant and product separated by symbols.
symbols (str) – Delimiter between reactants and products in the SMILES string.
fp_type (str) – Fingerprint type to use for individual molecules (e.g., ‘ecfp4’).
abs (bool) – If True, take absolute value of the difference vector.
return_array (bool) – If True, return a NumPy array; otherwise convert to an RDKit bit vector.
kwargs (Any) – Additional keyword arguments passed to SmilesFeaturizer.featurize_smiles.
- Returns:
Reaction fingerprint as a NumPy array or RDKit bit vector.
- Return type:
Union[np.ndarray, cDataStructs.ExplicitBitVect]
- Raises:
ValueError – If reaction_smiles is not correctly formatted.
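The product-minus-reactant idea can be shown with a toy featurizer in place of a real molecular fingerprint. In this sketch `toy_fingerprint` and `reaction_difference_fp` are hypothetical, each side of the reaction is featurized as a whole string, and how the library handles multi-molecule sides is not covered.

```python
from typing import Callable, List


def toy_fingerprint(smiles: str, n_bits: int = 16) -> List[int]:
    """Hypothetical stand-in featurizer: one bit per distinct character."""
    bits = [0] * n_bits
    for ch in smiles:
        bits[ord(ch) % n_bits] = 1
    return bits


def reaction_difference_fp(
    reaction_smiles: str,
    symbols: str = ">>",
    featurize: Callable[[str], List[int]] = toy_fingerprint,
    absolute: bool = True,
) -> List[int]:
    """Subtract the reactant vector from the product vector, bit by bit."""
    parts = reaction_smiles.split(symbols)
    if len(parts) != 2 or not all(parts):
        raise ValueError(f"expected 'reactants{symbols}products', got {reaction_smiles!r}")
    reactant_fp = featurize(parts[0])
    product_fp = featurize(parts[1])
    diff = [p - r for p, r in zip(product_fp, reactant_fp)]
    return [abs(d) for d in diff] if absolute else diff


forward = reaction_difference_fp("CCO>>CC=O")
reverse_signed = reaction_difference_fp("CC=O>>CCO", absolute=False)
print(forward, reverse_signed)
```

With `absolute=False` the sign records whether a feature was gained (+1) or lost (-1); taking absolute values discards that direction.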
Cluster#
- class synkit.Chem.Cluster.butina.ButinaCluster[source]#
Bases:
object
Cluster chemical fingerprint vectors using the Butina algorithm from RDKit, with integrated t-SNE visualization of clusters.
Key features#
Butina clustering – fast hierarchical clustering with a similarity cutoff.
t-SNE visualization – 2D embedding of fingerprints, highlighting top‑k clusters.
NumPy support – accepts 2D arrays of 0/1 fingerprint data.
Configurable – user‑defined cutoff, perplexity, and top‑k highlight.
Quick start#
>>> from synkit.Chem.Cluster.butina import ButinaCluster
>>> clusters = ButinaCluster.cluster(arr, cutoff=0.3)
>>> ButinaCluster.visualize(arr, clusters, k=5)
- static cluster(arr: ndarray, cutoff: float = 0.2) List[List[int]][source]#
Perform Butina clustering on fingerprint bit-vectors.
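For readers unfamiliar with the algorithm, a pure-Python sketch of Taylor-Butina clustering follows. It is not the RDKit routine the class wraps; `butina_sketch` is a hypothetical name, and the cutoff is treated as a Tanimoto distance threshold, which is how RDKit's Butina module interprets it but is an assumption about this class.

```python
from typing import List, Sequence


def tanimoto_distance(a: Sequence[int], b: Sequence[int]) -> float:
    both = sum(1 for x, y in zip(a, b) if x and y)
    either = sum(1 for x, y in zip(a, b) if x or y)
    return 0.0 if either == 0 else 1.0 - both / either


def butina_sketch(fps: Sequence[Sequence[int]], cutoff: float = 0.2) -> List[List[int]]:
    """Greedy clustering: the densest unassigned point becomes a centroid."""
    n = len(fps)
    neighbors: List[List[int]] = [[] for _ in range(n)]
    for i in range(n):
        for j in range(i):
            if tanimoto_distance(fps[i], fps[j]) <= cutoff:
                neighbors[i].append(j)
                neighbors[j].append(i)
    # Visit points with the most neighbours first; ties break by index.
    order = sorted(range(n), key=lambda i: (-len(neighbors[i]), i))
    assigned = [False] * n
    clusters: List[List[int]] = []
    for centroid in order:
        if assigned[centroid]:
            continue
        members = [centroid] + [j for j in neighbors[centroid] if not assigned[j]]
        for m in members:
            assigned[m] = True
        clusters.append(members)  # centroid is always the first index
    return clusters


fps = [[1, 1, 0, 0], [1, 1, 0, 0], [0, 0, 1, 1], [1, 1, 1, 0]]
clusters = butina_sketch(fps, cutoff=0.4)
print(clusters)
```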
- help() None[source]#
Print usage summary for clustering and visualization.
- Returns:
None
- Return type:
NoneType
- static visualize(arr: ndarray, clusters: List[List[int]], k: int | None = None, perplexity: float = 30.0, random_state: int = 42) None[source]#
Visualize clusters in 2D via t-SNE embedding.
- Parameters:
arr (np.ndarray) – 2D array of shape (n_samples, n_features) with fingerprint data.
clusters (list of list of int) – Clusters as returned by cluster().
k (int or None) – If provided, highlight only the top‑k largest clusters; others shown as ‘Other’.
perplexity (float) – t-SNE perplexity parameter. Defaults to 30.0.
random_state (int) – Random seed for reproducibility. Defaults to 42.
- Returns:
None
- Return type:
NoneType
- Example:
>>> clusters = ButinaCluster.cluster(arr, cutoff=0.3)
>>> ButinaCluster.visualize(arr, clusters, k=5)
Utilities#
- synkit.Chem.utils.clean_radical_rsmi(rsmi: str) str[source]#
Load each side of a reaction SMILES (rSMI) into RDKit, split into disconnected fragments, remove any fragment that contains an atom with nonzero radical electrons, then reassemble back into a cleaned reaction SMILES.
- Parameters:
rsmi (str) – Reaction SMILES string, e.g. ‘A>>B.C’
- Returns:
Cleaned reaction SMILES with radical-containing fragments removed.
- Return type:
str
- Example:
>>> clean_radical_rsmi(
...     'COC(=O)C(CCCCNC(=O)OCc1ccccc1)NC(=O)Nc1cc(OC)cc(C(C)(C)C)c1O'
...     '>>COC(=O)C(CCCCNC(=O)OCc1ccccc1)NC(N)=O.COc1c[c]c(O)c(C(C)(C)C)c1'
... )
'COC(=O)C(CCCCNC(=O)OCc1ccccc1)NC(=O)Nc1cc(OC)cc(C(C)(C)C)c1O>>COC(=O)C(CCCCNC(=O)OCc1ccccc1)NC(N)=O'
- synkit.Chem.utils.count_carbons(smiles: str) int[source]#
Count the number of carbon atoms in a molecule.
- Parameters:
smiles (str) – SMILES string of the molecule.
- Returns:
Number of carbon atoms.
- Return type:
int
- Raises:
ValueError – If the SMILES string is invalid.
- synkit.Chem.utils.enumerate_tautomers(reaction_smiles: str) List[str] | None[source]#
Enumerate possible tautomers of reactants while canonicalizing products.
- Parameters:
reaction_smiles (str) – Reaction SMILES in ‘reactants>>products’ format.
- Returns:
List of reaction SMILES for each reactant tautomer (including the original), or None on error.
- Return type:
Optional[List[str]]
- Raises:
ValueError – If reactant or product SMILES are invalid.
- synkit.Chem.utils.filter_smiles(smiles_list: List[str], target_smiles: str) List[str][source]#
Filter SMILES list to those containing carbon and not equal to a target.
- synkit.Chem.utils.find_longest_fragment(input_list: List[str]) str | None[source]#
Find the longest string in a list.
- synkit.Chem.utils.get_max_fragment(smiles: str | List[str]) str[source]#
Return the largest fragment by atom count from SMILES.
- synkit.Chem.utils.get_sanitized_smiles(smiles_list: List[str]) List[str][source]#
Sanitize SMILES list by removing mappings and invalid entries.
- synkit.Chem.utils.mapping_success_rate(list_mapping_data: List[str]) float[source]#
Calculate percentage of entries containing atom‑mapping annotations.
- Parameters:
list_mapping_data (List[str]) – List of strings to search for mappings.
- Returns:
Percentage of entries containing :<digits> patterns, rounded to two decimals.
- Return type:
float
- Raises:
ValueError – If input list is empty.
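The described behaviour (share of entries matching a :<digits> pattern, as a percentage rounded to two decimals) can be reproduced in a few lines. This is a sketch based only on the docstring; `mapping_success_rate_sketch` is a hypothetical name and the library's exact regular expression is not shown in the source.

```python
import re
from typing import List


def mapping_success_rate_sketch(entries: List[str]) -> float:
    """Percentage of entries that contain at least one ':<digits>' token."""
    if not entries:
        raise ValueError("input list is empty")
    pattern = re.compile(r":\d+")
    mapped = sum(1 for entry in entries if pattern.search(entry))
    return round(100.0 * mapped / len(entries), 2)


entries = ["[CH3:1][OH:2]>>[CH3:1]Cl", "CCO>>CC=O", "[CH3:1]Br>>[CH3:1]O"]
rate = mapping_success_rate_sketch(entries)
print(rate)
```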
- synkit.Chem.utils.merge_reaction(rsmi_1: str, rsmi_2: str) str | None[source]#
Merge two reaction SMILES into a single combined reaction.
- synkit.Chem.utils.process_smiles_list(smiles_list: List[str]) List[str][source]#
Split dot‑connected SMILES into individual components.
- synkit.Chem.utils.remove_atom_mappings(mol: Mol) Mol[source]#
Strip atom‑mapping numbers from a molecule.
- Parameters:
mol (Chem.Mol) – RDKit Mol object.
- Returns:
The same Mol with all atom‑map numbers set to zero.
- Return type:
Chem.Mol
- synkit.Chem.utils.remove_common_reagents(reaction_smiles: str) Tuple[str | None, str | None][source]#
Remove reagents present on both sides of a reaction SMILES.
- synkit.Chem.utils.remove_duplicates(smiles_list: List[str]) List[str][source]#
Remove duplicate strings from a list, preserving first occurrence.
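Order-preserving de-duplication has a common one-line idiom in Python, shown here as a sketch; `remove_duplicates_sketch` is a hypothetical name and the library may implement it differently.

```python
from typing import List


def remove_duplicates_sketch(items: List[str]) -> List[str]:
    """Drop repeats while keeping the first occurrence of each string."""
    # dict keys are unique and keep insertion order (Python 3.7+).
    return list(dict.fromkeys(items))


unique = remove_duplicates_sketch(["CCO", "CC", "CCO", "O", "CC"])
print(unique)
```

Note that this compares raw strings, so two different SMILES spellings of the same molecule are kept as distinct entries.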