Rule#
Rule objects and utilities for composing, applying, and modifying reaction rules.
Core#
syn_rule.py#
Immutable description of a reaction template (SynRule) with canonical forms and optional implicit‐hydrogen stripping.
Key features#
Fragment decomposition – splits the ITS graph into rc, left, and right.
Implicit H‐handling – converts explicit H nodes into hcount + h_pairs.
Canonicalisation – wraps rc/left/right in SynGraph for stable signatures.
Value‑object semantics – __eq__/__hash__ use fragment signatures.
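The value-object behaviour listed above can be illustrated with a minimal sketch. The class `FragmentRule` and the helper `signature_of` are hypothetical stand-ins, not SynKit code; they only show how equality and hashing can be derived from SHA-256 fragment signatures.

```python
import hashlib
from dataclasses import dataclass


def signature_of(edges):
    """Hypothetical helper: SHA-256 hex digest of an order-independent edge list."""
    canonical = sorted(tuple(sorted(edge)) for edge in edges)
    return hashlib.sha256(repr(canonical).encode("utf-8")).hexdigest()


@dataclass(frozen=True)
class FragmentRule:
    """Hypothetical stand-in for a rule compared by its fragment signatures."""

    left_sig: str
    right_sig: str

    @classmethod
    def from_edges(cls, left_edges, right_edges):
        return cls(signature_of(left_edges), signature_of(right_edges))


a = FragmentRule.from_edges([("C1", "C2")], [("C1", "O3")])
b = FragmentRule.from_edges([("C2", "C1")], [("O3", "C1")])  # same edges, other order
print(a == b, len({a, b}))
```

A frozen dataclass generates both `__eq__` and `__hash__` from its fields, so two instances with equal signatures compare equal and collapse to one entry in a set or dict.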
Quick start#
>>> from synkit.Rule.syn_rule import SynRule
>>> rule = SynRule.from_smart("[CH3:1]C>>[CH2:1]C")
>>> rule.left.signature, rule.right.signature
('abc123...', 'def456...')
- class synkit.Rule.syn_rule.SynRule(rc: Graph, name: str = 'rule', canonicaliser: GraphCanonicaliser | None = None, *, canon: bool = True, implicit_h: bool = True, format: Literal['typesGH', 'tuple'] | None = None)[source]#
Bases: object
Immutable reaction template: rc, left, and right fragments as SynGraph objects.
Parameters#
- rc : nx.Graph
Raw reaction-centre (RC) graph.
- name : str, default "rule"
Identifier for the rule.
- canonicaliser : Optional[GraphCanonicaliser]
Custom canonicaliser; if None, a default is created.
- canon : bool, default True
If True, build canonical forms and SHA-256 signatures.
- implicit_h : bool, default True
Convert explicit hydrogens in the rc/left/right fragments to an integer hcount attribute and record cross-fragment hydrogen pairs in an h_pairs attribute.
Attributes#
- rc : SynGraph
Wrapped reaction‐centre graph.
- left : SynGraph
Wrapped left fragment.
- right : SynGraph
Wrapped right fragment.
- canonical_smiles : Optional[Tuple[str, str]]
Pair of left/right fragment SHA‐256 signatures (or None if canon=False).
- classmethod from_gml(gml: str, name: str = 'rule', canonicaliser: GraphCanonicaliser | None = None, *, canon: bool = True, implicit_h: bool = True) SynRule[source]#
Instantiate from a GML string.
Compose#
- class synkit.Rule.Compose.compose_rule.ComposeRule[source]#
Bases: object
- static filter_smallest_vertex(combo: List[object]) List[object][source]#
Filters and returns the elements from a list that have the smallest number of vertices in their context.
Parameters: - combo (List[object]): A list of objects, each with a ‘context’ attribute that has a ‘numVertices’ attribute.
Returns: - List[object]: A list of objects from the input list that have the minimum number of vertices in their context.
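The filtering described above can be sketched in a few lines. This is an illustrative reimplementation, not the library's code; `filter_smallest_vertex_sketch` and `make_rule` are hypothetical names, and `SimpleNamespace` stands in for rule objects that expose `context.numVertices`.

```python
from types import SimpleNamespace


def filter_smallest_vertex_sketch(combo):
    """Keep the objects whose context has the fewest vertices."""
    if not combo:
        return []
    smallest = min(obj.context.numVertices for obj in combo)
    return [obj for obj in combo if obj.context.numVertices == smallest]


def make_rule(name, n_vertices):
    # Hypothetical stand-in for a rule object exposing context.numVertices.
    return SimpleNamespace(name=name, context=SimpleNamespace(numVertices=n_vertices))


rules = [make_rule("a", 5), make_rule("b", 3), make_rule("c", 3)]
kept = [r.name for r in filter_smallest_vertex_sketch(rules)]
print(kept)
```

Ties are all kept, so the result can hold more than one object.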
- get_rule_comp(smart_1: str, smart_2: str) str | None[source]#
Compose two reaction SMARTS strings into a rule (GML format) that reproduces a reference reaction.
Parameters:
- smart_1 (str): The first reaction in SMARTS notation.
- smart_2 (str): The second reaction in SMARTS notation.
Returns: - Optional[str]: The composed rule (in GML) if a valid candidate is found; otherwise, None.
- static rule_cluster(graphs: List[Any]) List[Any][source]#
Cluster graphs based on their isomorphic relationships and return a representative from each cluster.
Parameters: - graphs (List[Any]): A list of graph objects.
Returns: - List[Any]: A list of graphs where each graph is a representative from a different cluster.
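Picking one representative per cluster can be sketched with a generic equivalence test. Everything below is an assumption made for illustration: `cluster_representatives` is a hypothetical helper, graphs are plain edge lists, and `same_shape` compares degree sequences, which is only a necessary condition for isomorphism and stands in for a real isomorphism check.

```python
from collections import Counter


def cluster_representatives(items, equivalent):
    """Return the first member of each equivalence class, in input order."""
    representatives = []
    for item in items:
        if not any(equivalent(item, rep) for rep in representatives):
            representatives.append(item)
    return representatives


def degree_sequence(edges):
    counts = Counter()
    for u, v in edges:
        counts[u] += 1
        counts[v] += 1
    return sorted(counts.values())


def same_shape(edges_a, edges_b):
    # Toy stand-in for an isomorphism test: equal degree sequences.
    return degree_sequence(edges_a) == degree_sequence(edges_b)


graphs = [
    [(1, 2), (2, 3)],          # path on three nodes
    [("a", "b"), ("b", "c")],  # the same path, relabelled
    [(1, 2), (2, 3), (3, 1)],  # triangle
]
reps = cluster_representatives(graphs, same_shape)
print(len(reps))
```

The loop compares each item only against the representatives found so far, so the number of equivalence checks grows with the number of clusters instead of the number of inputs.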
- class synkit.Rule.Compose.rule_compose.RuleCompose[source]#
Bases: object
- static filter_smallest_vertex(combo: List[object]) List[object][source]#
Filters and returns the elements from a list that have the smallest number of vertices in their context.
Parameters: - combo (List[object]): A list of objects, each with a ‘context’ attribute that has a ‘numVertices’ attribute.
Returns: - List[object]: A list of objects from the input list that have the minimum number of vertices in their context.
- static rule_cluster(graphs: List) List[source]#
Clusters graphs based on their isomorphic relationship and returns a list of graphs, each from a different cluster.
Parameters: - graphs: A list of graph objects.
Returns: - List: A list of graphs where each graph is a representative from a different cluster.
- static save_gml_from_text(gml_content: str, gml_file_path: str, rule_id: str, parent_ids: List[str]) bool[source]#
Save a text string to a GML file by modifying the ‘ruleID’ line to include parent rule names. This function parses the given GML content, identifies any lines starting with ‘ruleID’, and replaces these lines with a new ruleID that incorporates identifiers from parent rules.
Parameters:
- gml_content (str): The content to be saved to the GML file. This should be the entire textual content of a GML file.
- gml_file_path (str): The file path where the GML file should be saved. If the path does not exist or is inaccessible, the function will return False and print an error message.
- rule_id (str): The original rule ID from the content. This is the identifier that will be modified to include parent IDs in the new ruleID.
- parent_ids (List[str]): List of parent rule IDs to prepend to the original rule ID. These are combined into a new identifier to reflect the hierarchical relationship in rule IDs.
Returns: - bool: True if the file was successfully saved, False otherwise. The function attempts to write the modified GML content to the specified file path.
- class synkit.Rule.Compose.rule_mapping.RuleMapping[source]#
Bases: object
- static enumerate_all_unique_mappings(child: Graph, parent: Graph) List[Dict[Any, Any]][source]#
Generate all unique mappings (as dictionaries) from the child graph to the parent graph.
A mapping is valid if:
- Every node from the child graph is assigned exactly one parent node.
- The parent node has the same ‘element’ attribute as the child node.
- No parent node is repeated in a mapping.
Parameters:
- child (nx.Graph): The child graph whose nodes will be mapped.
- parent (nx.Graph): The parent graph in which to search for matching nodes.
Returns:
- List[dict]: A list of mapping dictionaries. Each dictionary maps a child node to a unique parent node with the same ‘element’. If no valid mapping exists, returns an empty list.
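The three validity conditions can be sketched by brute force. This is not the library's algorithm; `enumerate_mappings_sketch` is a hypothetical helper, and graphs are reduced to plain `{node_id: element}` dicts for illustration.

```python
from itertools import permutations


def enumerate_mappings_sketch(child_elements, parent_elements):
    """Enumerate injective child->parent maps that preserve the element label.

    Both arguments are plain dicts {node_id: element}, a simplification of
    node attributes stored on a graph.
    """
    child_nodes = list(child_elements)
    mappings = []
    # Each permutation assigns distinct parent nodes, so no parent is repeated.
    for candidate in permutations(parent_elements, len(child_nodes)):
        if all(child_elements[c] == parent_elements[p]
               for c, p in zip(child_nodes, candidate)):
            mappings.append(dict(zip(child_nodes, candidate)))
    return mappings


child = {1: "C", 2: "O"}
parent = {10: "C", 11: "C", 12: "O"}
found = enumerate_mappings_sketch(child, parent)
for mapping in found:
    print(mapping)
```

The enumeration is factorial in the number of parent nodes, so it is only practical for small graphs.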
- fit(rule_1: str, rule_2: str, comp_rule: str) Dict[Any, Any] | None[source]#
Demonstrate an alignment-based composition workflow using the class methods.
1. Convert each GML-based rule into an internal graph (via gml_to_its).
2. Enumerate all unique mappings from rule_2 to comp_rule.
3. For each mapping, subtract rule_2 from comp_rule using that mapping.
4. Check if rule_1 is isomorphic to the resulting new graph. If it is, build a child1→child2 mapping and return it.
Parameters:
- rule_1 (str): GML representation of the first rule.
- rule_2 (str): GML representation of the second rule.
- comp_rule (str): GML representation of a composite rule.
Returns:
- Optional[dict]: A dictionary mapping rule_1’s nodes to the new graph’s nodes if an alignment is found; None otherwise.
- static get_child1_to_child2_mapping(mapping_child1_to_parent: Dict[Any, Any], mapping_child2_to_parent: Dict[Any, Any]) Dict[Any, Any | None][source]#
Build a mapping from Child1 to Child2 using each child’s mapping to a common Parent.
If a Parent node in Child1’s mapping is not in Child2’s inverted mapping, that Child1 node will map to None.
Parameters:
- mapping_child1_to_parent (dict): Mapping from Child1 nodes → Parent nodes.
- mapping_child2_to_parent (dict): Mapping from Child2 nodes → Parent nodes.
Returns: - dict: A dictionary from Child1 node → Child2 node based on the shared Parent node.
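The composition through a shared parent can be sketched as follows. `child1_to_child2_sketch` is a hypothetical name used for illustration, and it assumes the Child2 mapping is one-to-one so that it can be inverted.

```python
def child1_to_child2_sketch(child1_to_parent, child2_to_parent):
    """Compose child1->parent with the inverse of child2->parent."""
    parent_to_child2 = {p: c for c, p in child2_to_parent.items()}
    # dict.get returns None when the parent node has no Child2 counterpart.
    return {c1: parent_to_child2.get(p) for c1, p in child1_to_parent.items()}


m1 = {"a": 1, "b": 2, "c": 3}
m2 = {"x": 1, "y": 2}
combined = child1_to_child2_sketch(m1, m2)
print(combined)
```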
- static graph_alignment(child: Graph, parent: Graph, node_label_names: List[str] = ['element'], node_label_default: List[str] = ['*'], edge_attribute: str = 'standard_order') Tuple[bool, Dict[Any, Any] | None][source]#
Check whether the child and parent graphs are isomorphic using specified node and edge match criteria. If they are isomorphic, return the mapping from child to parent.
Parameters:
- child (nx.Graph): The child graph to align.
- parent (nx.Graph): The parent graph to align with.
- node_label_names (List[str]): Node attribute names for matching (default: ["element"]).
- node_label_default (List[str]): Default values for those attributes if missing (default: ["*"]).
- edge_attribute (str): The edge attribute to match (default: "standard_order").
Returns:
- Tuple[bool, Optional[Dict[Any, Any]]]: A tuple (is_iso, mapping):
  - is_iso (bool): True if the graphs are isomorphic; otherwise, False.
  - mapping (dict or None): The child→parent node mapping if isomorphic, else None.
- static keep_largest_component(graph: Graph) Graph[source]#
Given an undirected graph, returns the subgraph corresponding to the largest connected component.
Parameters: - graph (nx.Graph): The input graph from which the largest component is extracted.
Returns: - nx.Graph: A subgraph induced by the largest connected component of the input graph.
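A dependency-free sketch of extracting the largest connected component is shown below. It is not the library's implementation; `largest_component_sketch` is a hypothetical helper, and the graph is a plain `{node: set_of_neighbours}` dict in place of a graph object.

```python
def largest_component_sketch(adjacency):
    """Return the sub-adjacency of the largest connected component.

    `adjacency` is a dict {node: set_of_neighbours} for an undirected graph.
    """
    seen, best = set(), set()
    for start in adjacency:
        if start in seen:
            continue
        # Depth-first traversal collecting every node reachable from `start`.
        component, stack = {start}, [start]
        while stack:
            node = stack.pop()
            for nbr in adjacency[node]:
                if nbr not in component:
                    component.add(nbr)
                    stack.append(nbr)
        seen |= component
        if len(component) > len(best):
            best = component
    return {node: adjacency[node] & best for node in best}


graph = {1: {2}, 2: {1, 3}, 3: {2}, 4: {5}, 5: {4}}
largest = largest_component_sketch(graph)
print(sorted(largest))
```

When two components have the same size, this sketch keeps the one encountered first; how the library breaks such ties is not stated here.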
- static standardize_order(order_tuple: Tuple[float, ...]) Tuple[float, ...] | None[source]#
Standardizes an order tuple by adding 1 to every element repeatedly until no element is negative. If the resulting tuple is all zeros, returns None, which indicates that the edge should be dropped.
For example:
- (-1.0, 0.0) → add 1 gives (0.0, 1.0)
- (-2.0, -1.0) → add 1 gives (-1.0, 0.0) → add 1 gives (0.0, 1.0)
- (0.0, 0.0) stays (0.0, 0.0), so None is returned.
Parameters: - order_tuple (Tuple[float, …]): The order attribute (tuple of floats).
Returns: - Optional[Tuple[float, …]]: The standardized tuple, or None if it becomes all zeros.
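The shifting rule can be written as a short sketch that follows the description above. `standardize_order_sketch` is a hypothetical name; the actual method may differ in details such as input validation.

```python
def standardize_order_sketch(order_tuple):
    """Shift a tuple upward by 1 until no element is negative.

    Returns None when the shifted tuple is all zeros (the edge is dropped).
    """
    values = tuple(order_tuple)
    while any(v < 0 for v in values):
        values = tuple(v + 1 for v in values)
    if all(v == 0 for v in values):
        return None
    return values


print(standardize_order_sketch((-1.0, 0.0)))
print(standardize_order_sketch((-2.0, -1.0)))
print(standardize_order_sketch((0.0, 0.0)))
```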
- static subtract_parent_from_child(child: Graph, parent: Graph, mapping: Dict[Any, Any]) Graph[source]#
Create a new graph by performing a (parent - child) subtraction of edge attributes using a given mapping from child nodes to parent nodes. The result is then reduced to its largest connected component.
Steps:
1. Make a deep copy of the parent graph and remove all its edges.
2. Build the union of the parent’s edges plus the child’s edges mapped into the parent’s node IDs.
3. For each edge in the union (using parent node IDs):
   - new_standard_order = parent’s standard_order - child’s standard_order.
   - If an ‘order’ tuple exists:
     - If one side is missing, assume zeros of appropriate length.
     - Compute (parent_order - child_order) element-wise.
     - Standardize the resulting tuple via standardize_order().
     - If None, omit the edge entirely.
4. Add each valid edge to the new graph.
5. Keep only the largest connected component.
Parameters:
- child (nx.Graph): The child graph (provides edge attributes to subtract).
- parent (nx.Graph): The parent graph (provides baseline edge/node attributes).
- mapping (Dict[Any, Any]): A one-to-one mapping from child nodes to parent nodes.
Returns:
- nx.Graph: A new graph (deep copy of parent, with edges recomputed), reduced to its largest connected component.
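The edge-wise part of the subtraction can be sketched on plain dictionaries. This is a simplified illustration under stated assumptions: edges are `{(u, v): order_tuple}` dicts, `subtract_edges_sketch` and `shift_non_negative` are hypothetical names, and the standard_order bookkeeping and the final largest-component step are left out.

```python
def shift_non_negative(values):
    """Add 1 to every element until none is negative; None if all zeros."""
    while any(v < 0 for v in values):
        values = tuple(v + 1 for v in values)
    return None if all(v == 0 for v in values) else values


def subtract_edges_sketch(parent_edges, child_edges, mapping):
    """Compute (parent - child) order tuples per undirected edge.

    parent_edges / child_edges: {(u, v): order_tuple}
    mapping: child node -> parent node
    """
    # Re-express child edges in parent node IDs; frozenset ignores direction.
    mapped_child = {
        frozenset((mapping[u], mapping[v])): order
        for (u, v), order in child_edges.items()
    }
    parent = {frozenset(edge): order for edge, order in parent_edges.items()}
    result = {}
    for edge in parent.keys() | mapped_child.keys():
        p, c = parent.get(edge), mapped_child.get(edge)
        length = len(p if p is not None else c)
        p = p if p is not None else (0.0,) * length  # missing side -> zeros
        c = c if c is not None else (0.0,) * length
        diff = shift_non_negative(tuple(a - b for a, b in zip(p, c)))
        if diff is not None:  # None means the edge is dropped
            result[edge] = diff
    return result


parent_edges = {(1, 2): (1.0, 2.0), (2, 3): (1.0, 1.0)}
child_edges = {("a", "b"): (1.0, 1.0)}
diffs = subtract_edges_sketch(parent_edges, child_edges, {"a": 1, "b": 2})
print(diffs)
```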
- class synkit.Rule.Compose.seq_comp.SeqComp[source]#
Bases: object
A class for generating pairwise mappings between sequential chemical reaction rules.
This class takes a list of reaction SMARTS strings, converts them to their corresponding GML representations, composes candidate reaction rules for each consecutive pair, and computes a mapping between the rules using a rule mapping algorithm.
- static sequence_map(smarts: List[str]) Dict[str, dict | None][source]#
Generate pairwise mapping dictionaries between consecutive reaction SMARTS strings.
This function processes a list of reaction SMARTS strings by:
1. Converting each SMARTS string to its GML representation.
2. For each consecutive pair, composing candidate rules using ComposeRule().get_rule_comp().
3. Using the first candidate (if available) and the original GMLs to compute a mapping using RuleMapping().fit().
4. Storing the resulting mapping in a dictionary with keys in the format “i:i+1”.
Parameters:
- smarts (List[str]): The list of reaction SMARTS strings.
Returns:
- Dict[str, Optional[dict]]: A dictionary where each key is a string “i:i+1” representing the consecutive pair indices, and the corresponding value is the mapping dictionary produced by RuleMapping().fit() for that pair, or None if no valid mapping could be computed.
- class synkit.Rule.Compose.valence_constrain.ValenceConstrain[source]#
Bases: object
- check_rule(rule, verbose: bool = False, log_error: bool = False) bool[source]#
Check if the rule is chemically valid according to valence rules.
Parameters:
- rule (Rule): The rule to check for chemical validity.
- verbose (bool): If true, logs additional information about the rule checking process.
- log_error (bool): If true, logs additional information about the valence checking issue.
Returns: - bool: True if the rule is chemically valid, False otherwise.
- split(rules: List) Tuple[List, List][source]#
Split rules into ‘good’ and ‘bad’ based on their chemical validity.
Parameters: - rules (List[Rule]): A list of rules to be checked and split.
Returns: - Tuple[List[Rule], List[Rule]]: A tuple containing two lists, one for ‘good’ rules and another for ‘bad’ rules.
Apply#
- class synkit.Rule.Apply.reactor_rule.ReactorRule[source]#
Bases: object
Handles the transformation of SMILES strings to reaction SMILES (RSMI) by applying chemical reaction rules defined in GML strings.
It can optionally reverse the reaction, exclude atom mappings, and include unchanged reagents in the output.
- class synkit.Rule.Apply.retro_reactor.RetroReactor[source]#
Bases: object
- backward_synthesis_search(product_smiles: str, known_precursor_smiles: str, rules: List[str], max_solutions: int = 1, fast_process: bool = True) List[Dict[str, List]][source]#
Perform a backward synthesis search from a product to a known precursor using A* search.
Constrains any intermediate X to satisfy:
n_C(known_precursor_smiles) <= n_C(X) <= n_C(product_smiles).
If fast_process=True, we prune expansions by storing the best cost at which we have visited each SMILES. If a new path to the same SMILES has a higher cost, we do not expand it again.
If fast_process=False, we disable cost-based pruning, which can yield more solutions (possibly duplicates) but also potentially more computational expense.
Parameters:
- product_smiles (str): SMILES string of the product molecule.
- known_precursor_smiles (str): SMILES string of the known precursor molecule.
- rules (List[str]): List of transformation rules to apply in backward mode.
- max_solutions (int): Maximum number of solution pathways to return. Defaults to 1.
- fast_process (bool): If True, enable pruning (classic A*). If False, do not prune (which can discover more solutions but is slower).
Returns:
- List[Dict[str, List]]: A list of solution pathways, each represented as a dictionary with:
{
‘rule_index’: List[int],  # The sequence of rule indices used
‘smiles’: List[str]  # The sequence of SMILES (excluding the final known precursor)
}
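The effect of the pruning switch can be sketched on a toy state graph. This is not the library's search: `search_sketch`, the `neighbours` table, and the zero heuristic are hypothetical, states are plain strings in place of SMILES, and the carbon-count constraint is omitted.

```python
import heapq


def search_sketch(start, goal, neighbours, heuristic, prune=True, max_solutions=1):
    """Best-first search returning up to max_solutions paths from start to goal.

    neighbours: dict state -> list of (next_state, step_cost), a toy input.
    """
    best_cost = {}
    queue = [(heuristic(start), 0, [start])]
    solutions = []
    while queue and len(solutions) < max_solutions:
        _, cost, path = heapq.heappop(queue)
        state = path[-1]
        if state == goal:
            solutions.append(path)
            continue
        if prune:
            # Skip a state that was already reached at a strictly lower cost.
            if cost > best_cost.get(state, float("inf")):
                continue
            best_cost[state] = cost
        for nxt, step in neighbours.get(state, []):
            if nxt in path:  # avoid cycles within a single path
                continue
            new_cost = cost + step
            heapq.heappush(queue, (new_cost + heuristic(nxt), new_cost, path + [nxt]))
    return solutions


toy = {"P": [("A", 1), ("B", 1)], "A": [("S", 1)], "B": [("A", 1), ("S", 3)]}
pruned = search_sketch("P", "S", toy, lambda s: 0, prune=True, max_solutions=5)
unpruned = search_sketch("P", "S", toy, lambda s: 0, prune=False, max_solutions=5)
print(len(pruned), len(unpruned))
```

In this toy graph the detour through B to A costs more than reaching A directly, so pruning discards it and one of the three possible routes is never reported.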
rule_matcher.py#
Immutable matcher for applying a reaction‑template rule to a reaction SMILES.
Key features#
Standardization – canonicalize the input RSMI.
Balanced vs. partial matching – uses stoichiometric balance checks.
SMARTS extraction – extracts SMARTS that reproduce the RSMI.
Introspective API – stores the match on init; exposes get_result(), help(), __str__(), and __repr__() for inspection.
Quick start#
>>> from synkit.Rule.Apply.rule_matcher import RuleMatcher
>>> matcher = RuleMatcher('CCO>>CC=O', some_rule_graph)
>>> smarts, rule = matcher.get_result()
- class synkit.Rule.Apply.rule_matcher.RuleMatcher(rsmi: str, rule: str | Graph, explicit_h: bool = True, electron_diagnostics: bool = False)[source]#
Bases: object
Match a reaction SMILES against a transformation‑rule graph and extract the SMARTS pattern that reproduces the reaction.
On initialization, the matcher standardizes the RSMI, builds reactant/product graphs, checks balance, and finds the matching SMARTS (stored in self.result).
- Parameters:
rsmi (str) – Reaction SMILES in ‘reactant>>product’ format.
rule (str | nx.Graph) – The reaction template, given as a string or as a NetworkX graph (both types are accepted by the signature).
- Variables:
std (Standardize) – SMILES standardizer instance.
rsmi (str) – Standardized reaction SMILES.
r_graph (nx.Graph) – Reactant graph extracted from rsmi.
p_graph (nx.Graph) – Product graph extracted from rsmi.
balanced (bool) – True if reaction passes stoichiometric balance check.
result (Tuple[str, nx.Graph]) – The matching SMARTS and rule graph tuple.
- static all_in(a: List[str], b: List[str]) bool[source]#
Check if every element of list a appears in list b.
- class synkit.Rule.Apply.rule_rbl.RuleRBL[source]#
Bases: object
- rbl(rsmi: str, gml_rule: str, remove_aam: bool = True) List[str][source]#
Applies transformation rules to a reaction SMILES string based on GML rules.
Parameters:
- rsmi (str): Reaction SMILES string to process.
- gml_rule (str): GML rule string to apply transformations.
Returns: - List[str]: List of new reaction SMILES strings after applying the rules.
Modify#
Note
synkit.Rule.Modify.implict_rule is currently omitted from this page because
it fails to import in the uploaded repository due to an internal import error.
Re-enable the directive below after that module is fixed.
- class synkit.Rule.Modify.longest_path.LongestPath(G: Graph)[source]#
Bases: object
- BFS(u: int) Tuple[int, int][source]#
Performs a Breadth-First Search (BFS) from a given node u to find the farthest node and its distance.
Parameters: - u (int): The starting node for the BFS.
Returns: - Tuple[int, int]: The farthest node from u and its distance.
- LongestPathInDisconnectedGraph() int[source]#
Finds the longest path in a potentially disconnected graph. The graph can consist of multiple components.
This method performs a BFS on every unvisited component to find the farthest node and computes the longest path across all components.
- Returns:
int: The length of the longest path in the graph across all components.
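A per-component BFS search for the longest path can be sketched as follows. The double-BFS strategy used here is an assumption (it is exact for trees and only a lower bound on graphs with cycles); the source states only that a BFS is run on every unvisited component. `bfs_farthest` and `longest_path_sketch` are hypothetical names, and the graph is a plain adjacency dict.

```python
from collections import deque


def bfs_farthest(adjacency, start):
    """Return (farthest_node, distance, visited_nodes) from start via BFS."""
    distance = {start: 0}
    queue = deque([start])
    farthest = start
    while queue:
        node = queue.popleft()
        if distance[node] > distance[farthest]:
            farthest = node
        for nbr in adjacency[node]:
            if nbr not in distance:
                distance[nbr] = distance[node] + 1
                queue.append(nbr)
    return farthest, distance[farthest], set(distance)


def longest_path_sketch(adjacency):
    """Longest double-BFS distance over all components (exact for trees)."""
    visited, best = set(), 0
    for node in adjacency:
        if node in visited:
            continue
        # First BFS finds one end of a longest path, the second measures it.
        end, _, component = bfs_farthest(adjacency, node)
        _, length, _ = bfs_farthest(adjacency, end)
        visited |= component
        best = max(best, length)
    return best


graph = {1: [2], 2: [1, 3], 3: [2, 4], 4: [3], 5: [6], 6: [5]}
print(longest_path_sketch(graph))
```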
- class synkit.Rule.Modify.molecule_rule.MoleculeRule[source]#
Bases: object
A class for generating molecule rules, atom-mapped SMILES, and GML representations from SMILES strings.
- static generate_atom_map(smiles: str) str | None[source]#
Generate atom-mapped SMILES by assigning unique map numbers to each atom in the molecule.
Parameters: - smiles (str): The SMILES string representing the molecule.
Returns: - Optional[str]: The atom-mapped SMILES string, or None if the SMILES string is invalid.
- generate_molecule_rule(smiles: str, name: str = 'molecule', explicit_hydrogen: bool = True, sanitize: bool = True) str | None[source]#
Generate a GML representation of the molecule rule from SMILES.
Parameters:
- smiles (str): The SMILES string representing the molecule.
- name (str, optional): The rule name used in GML generation. Defaults to ‘molecule’.
- explicit_hydrogen (bool, optional): Whether to include explicit hydrogen atoms in GML. Defaults to True.
- sanitize (bool, optional): Whether to sanitize the molecule before conversion. Defaults to True.
Returns: - Optional[str]: The GML representation of the molecule rule, or None if invalid.
- static generate_molecule_smart(smiles: str) str | None[source]#
Generate a SMARTS-like string from atom-mapped SMILES.
Parameters: - smiles (str): The SMILES string representing the molecule.
Returns: - Optional[str]: The SMARTS-like string derived from atom-mapped SMILES, or None if the SMILES is invalid.
- static remove_edges_from_left_right(input_str: str) str[source]#
Remove all contents from the ‘left’ and ‘right’ sections of a chemical rule description.
Parameters: - input_str (str): The string representation of the rule.
Returns: - str: The modified string with cleared ‘left’ and ‘right’ sections.
- class synkit.Rule.Modify.prune_templates.PruneTemplate(templates: List[List[Dict[str, Any]]], graph_key: str)[source]#
Bases: object
- fit() List[List[Dict[str, Any]]][source]#
Prune the templates by removing subgraphs where the longest path is shorter than the radius.
- Returns:
List[List[Dict[str, Any]]]: The pruned list of templates.
- static remove_edges_by_attribute(input_graph: Graph, attribute: str = 'standard_order', value: Any = 0) Graph[source]#
Remove edges from the input graph where a given attribute equals a specified value.
Parameters:
- input_graph (nx.Graph): The input graph from which edges will be removed.
- attribute (str, optional): The edge attribute based on which edges will be removed. Default is ‘standard_order’.
- value (Any, optional): The value of the attribute that determines which edges to remove. Default is 0.
- Returns:
nx.Graph: A new graph with the specified edges removed.
- synkit.Rule.Modify.rule_utils.filter_context(context_lines, relevant_nodes)[source]#
Given the context lines and a set of relevant nodes, remove hydrogen nodes not in relevant_nodes and all edges connected to them.
Returns filtered lines.
- synkit.Rule.Modify.rule_utils.find_block(lines, keyword)[source]#
Finds the start and end indices of a block (e.g., “left [”, “context [”, etc.) in the given lines of GML.
Returns (start_idx, end_idx) or (None, None) if not found.
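Locating a block by bracket depth can be sketched as follows. `find_block_sketch` is a hypothetical reimplementation, not the library function; it assumes the block opens on a line of the form `keyword [` and returns the index of the closing bracket line as an inclusive end index.

```python
def find_block_sketch(lines, keyword):
    """Return (start_idx, end_idx) of the `keyword [` ... `]` block, or (None, None)."""
    start = None
    depth = 0
    for idx, line in enumerate(lines):
        stripped = line.strip()
        if start is None:
            if stripped.startswith(keyword) and stripped.endswith("["):
                start, depth = idx, 1
            continue
        # Track nesting so inner `edge [ ... ]` entries do not end the block.
        depth += stripped.count("[") - stripped.count("]")
        if depth == 0:
            return start, idx
    return None, None


lines = [
    "rule [",
    '   ruleID "demo"',
    "   left [",
    '      edge [ source 1 target 2 label "-" ]',
    "   ]",
    "   context [",
    '      node [ id 1 label "C" ]',
    '      node [ id 2 label "O" ]',
    "   ]",
    "   right [",
    '      edge [ source 1 target 2 label "=" ]',
    "   ]",
    "]",
]
print(find_block_sketch(lines, "context"))
```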
- synkit.Rule.Modify.rule_utils.get_nodes_from_edges(block_lines)[source]#
Extract node IDs from edges in the given block lines.
Returns a set of node IDs found in the edges.
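Collecting node IDs from edge lines can be sketched with a regular expression. The edge line layout `edge [ source N target M label "..." ]` is an assumption about the GML text, `nodes_from_edges_sketch` is a hypothetical name, and IDs are kept as strings here.

```python
import re

# Assumed edge layout: edge [ source <id> target <id> label "<bond>" ]
EDGE_PATTERN = re.compile(r"edge\s*\[\s*source\s+(\d+)\s+target\s+(\d+)")


def nodes_from_edges_sketch(block_lines):
    """Return the set of node IDs that appear as an edge source or target."""
    nodes = set()
    for line in block_lines:
        match = EDGE_PATTERN.search(line)
        if match:
            nodes.update(match.groups())
    return nodes


block = [
    "left [",
    '   edge [ source 1 target 2 label "-" ]',
    '   edge [ source 2 target 3 label "=" ]',
    "]",
]
found_nodes = nodes_from_edges_sketch(block)
print(sorted(found_nodes))
```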
- synkit.Rule.Modify.rule_utils.parse_context(context_lines, node_regex=None, edge_regex=None)[source]#
Parse the context lines to identify nodes and edges.
Returns two structures:
- context_nodes: {node_id: label}
- context_edges: list of (source, target, label)
- synkit.Rule.Modify.rule_utils.strip_context(gml_text: str, remove_all: bool = True) str[source]#
Filters or clears the ‘context’ section of GML-like content based on the remove_all flag. If remove_all is True, all edges in the ‘context’ section are removed. If False, it removes hydrogen nodes that do not appear in both ‘left’ and ‘right’ sections, along with their edges, while preserving the original structure and formatting of the GML.
Parameters:
- gml_text (str): GML-like content describing a chemical reaction rule.
- remove_all (bool): Flag to determine if all edges should be removed from the ‘context’.
Returns: - str: The modified GML content with the filtered ‘context’ section.
- synkit.Rule.Modify.strip_rule.filter_context(context_lines, left_edges)[source]#
Given the context lines and a set of edges from the left graph, remove edges from the context that are also present in the left graph (ignoring labels).
Returns filtered lines.
- synkit.Rule.Modify.strip_rule.strip_context(gml_text: str, remove_all: bool = False) str[source]#
Filters or clears the ‘context’ section of GML-like content based on the remove_all flag.
If remove_all is True, all edges in the ‘context’ section are removed. If False, it removes edges in the ‘context’ that are also present in the ‘left’ section.