Graph#
Graph representations, canonicalization, ITS utilities, matching engines, MTG tools, and wildcard-aware graph workflows.
Core#
synkit.Graph.syn_graph#
Wrapper around networkx.Graph providing both original and canonical forms, plus a SHA‑256 signature for fast isomorphism checks.
Key features#
Value‑object semantics – __eq__ and __hash__ use the canonical signature, so graphs can be used in sets/dicts.
Lazy canonicalisation – the canonical graph and signature are computed once, on demand, and cached internally, so there is no upfront cost when they are not needed.
Transparent delegation – any unknown attribute/method is forwarded to the raw graph.
Example#
>>> import networkx as nx
>>> from synkit.Graph.syn_graph import SynGraph
>>> G = nx.Graph(); G.add_node(1, element='C')
>>> SG = SynGraph(G)
>>> SG.signature  # 32‑hex SHA‑256 digest
'8dc1f7b843e447ff4b67bf0ccc175f63'
>>> SG.canonical  # relabelled, sorted graph
<networkx.Graph with 1 nodes>
- class synkit.Graph.syn_graph.SynGraph(graph: Graph, canonicaliser: GraphCanonicaliser | None = None, canon: bool = True)[source]#
Bases: object
Wrapper around networkx.Graph providing both its original and (optionally) canonicalized form, plus a SHA-256 signature.
Parameters:
- graph (nx.Graph): The NetworkX graph to wrap.
- canonicaliser (Optional[GraphCanonicaliser]): If provided, used to produce the canonical form; otherwise a default is constructed.
- canon (bool): If True (default), computes and stores both .canonical and .signature. Otherwise they remain None.
Public Properties:
- raw (nx.Graph): The original graph.
- canonical (Optional[nx.Graph]): The canonicalized graph (or None).
- signature (Optional[str]): The SHA-256 hex digest (or None).
Methods:
- get_nodes(data: bool = True) -> Iterable[…]
- get_edges(data: bool = True) -> Iterable[…]
- help(): Print this API summary.
- get_edges(data: bool = True) Iterable[Tuple[Any, Any] | Tuple[Any, Any, Dict[str, Any]]][source]#
Yield edges from the original graph.
Parameters#
- data : bool, default True
If True, yield (u, v, data_dict), else just (u, v).
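The three key features listed above (value-object equality, a lazily cached signature, attribute delegation) can be sketched in plain Python. This is a minimal illustration, not SynGraph's source: the class name SigGraph is hypothetical, and networkx's weisfeiler_lehman_graph_hash over an assumed 'element' node attribute stands in for the GraphCanonicaliser signature.

```python
import functools

import networkx as nx


class SigGraph:
    """Hypothetical stand-in for SynGraph: hashable, lazily signed, delegating."""

    def __init__(self, graph: nx.Graph):
        self.raw = graph

    @functools.cached_property
    def signature(self) -> str:
        # Computed on first access, then cached on the instance.
        # digest_size=16 bytes gives a 32-character hex string.
        return nx.weisfeiler_lehman_graph_hash(
            self.raw, node_attr="element", digest_size=16
        )

    def __eq__(self, other: object) -> bool:
        if not isinstance(other, SigGraph):
            return NotImplemented
        return self.signature == other.signature

    def __hash__(self) -> int:
        return hash(self.signature)

    def __getattr__(self, name: str):
        # Called only when normal lookup fails: forward to the wrapped graph.
        if name == "raw":  # avoid infinite recursion before __init__ has run
            raise AttributeError(name)
        return getattr(self.raw, name)


a = nx.Graph()
a.add_node(1, element="C")
a.add_node(2, element="O")
a.add_edge(1, 2)
b = nx.relabel_nodes(a, {1: "y", 2: "x"})  # same molecule, different node ids
```

Because the signature is cached on first access, editing raw in place afterwards leaves it stale; the CanonicalGraph documentation below carries the same warning for original_graph.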
canonicalize_graph.py#
A pure‑Python / pure‑NetworkX library that assigns canonical, deterministic identifiers to graphs without any external toolchain. It can be dropped into cheminformatics, bio‑networks, knowledge graphs, or any workflow that needs stable de‑duplication of isomorphic graphs.
Why canonicalise?#
De‑duplication – hash the canonical form, then compare hashes instead of running an expensive isomorphism test for every pair.
Index keys – store the 32‑hex digest in a database and use it as a primary key for sub‑structure search or provenance tracking.
Version control‑friendly – serialise nodes/edges in a predictable, line‑ordered way so diffs stay minimal.
Back‑ends#
backend="generic" (default): Sort‑and‑hash strategy identical to earlier releases. Fastest on graphs where node/edge attributes already break most automorphisms.
backend="wl": Weisfeiler–Lehman colour refinement adds structure awareness without leaving pure Python. Slightly slower, but collapses many more isomorphic graphs to the same label.
The GraphCanonicaliser signature below additionally accepts backend="morgan" and backend="nauty".
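To make "structure awareness" concrete, the sketch below compares a deliberately naive sort-and-hash signature (sorted node labels plus sorted edge label strings; a hypothetical simplification, not synkit's generic back-end) with networkx's WL hash. A four-carbon chain and a four-carbon star have identical node and edge label multisets, so only the WL hash separates them.

```python
import hashlib

import networkx as nx


def naive_sort_hash(g: nx.Graph) -> str:
    """Hypothetical attribute-only signature: ignores how the edges connect."""
    nodes = sorted(str(d.get("element")) for _, d in g.nodes(data=True))
    edges = sorted(
        "".join(sorted((str(g.nodes[u].get("element")), str(g.nodes[v].get("element")))))
        + str(d.get("order", 1))
        for u, v, d in g.edges(data=True)
    )
    payload = "|".join(nodes) + "#" + "|".join(edges)
    return hashlib.sha256(payload.encode()).hexdigest()[:32]


def wl_hash(g: nx.Graph) -> str:
    """networkx's WL hash over element labels and bond orders (32 hex chars)."""
    return nx.weisfeiler_lehman_graph_hash(
        g, node_attr="element", edge_attr="order", iterations=3, digest_size=16
    )


def carbon_graph(edges) -> nx.Graph:
    g = nx.Graph(edges)
    nx.set_node_attributes(g, "C", "element")
    nx.set_edge_attributes(g, 1, "order")
    return g


chain = carbon_graph([(0, 1), (1, 2), (2, 3)])  # C-C-C-C
star = carbon_graph([(0, 1), (0, 2), (0, 3)])   # one C bonded to three others
```

The naive version remains a reasonable pre-filter when, as the text says, attributes already break most symmetries.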
Quick start#
>>> import networkx as nx
>>> from synkit.Graph.canon_graph import GraphCanonicaliser
>>>
>>> G = nx.Graph()
>>> G.add_node(1, element="C"); G.add_node(2, element="O")
>>> G.add_edge(1, 2, order=1)
>>>
>>> cg = GraphCanonicaliser(backend="wl").canonicalise_graph(G)
>>> cg.canonical_hash
'0df9e34a7c3cd9b35c0ba5f5cbe7598e'
>>> list(cg.canonical_graph.nodes(data=True))
[(1, {'element': 'C'}), (2, {'element': 'O'})]
- class synkit.Graph.canon_graph.CanonicalGraph(g: Graph, canon: GraphCanonicaliser)[source]#
Bases: object
Value object tying together:
- the original NetworkX graph (mutable, user‑supplied);
- its canonical twin (immutable copy, nodes relabelled 1…N);
- a 32‑char SHA‑256 digest.
Instances compare and hash by digest only, which makes them suitable for set/dict membership while still carrying the underlying graphs.
Do not mutate original_graph in place if you rely on canonical_hash; repeat the canonicalisation after any structural change instead.
- class synkit.Graph.canon_graph.CanonicalRule(rule: str, canon: GraphCanonicaliser = <GraphCanonicaliser backend='generic' node_key=_default_node_key edge_key=_default_edge_key>)[source]#
Bases: object
Value object that wraps a graph transformation rule in GML string form, providing a canonicalised GML output and a stable 32-character SHA-256 hash.
Internally, the GML rule is parsed into a NetworkX graph via gml_to_its, canonicalised using a GraphCanonicaliser, and re-serialized back to GML with its_to_gml.
Equality and hashing are based solely on the canonical hash, so isomorphic rules (under the chosen backend) compare equal.
Attributes#
- original_rule : str
The raw GML string supplied by the user.
- original_graph : nx.Graph
The NetworkX graph parsed from original_rule.
- canonical_graph : nx.Graph
The relabelled canonical graph (nodes renumbered 1…N).
- canonical_rule : str
The canonical graph re-serialized to a GML string.
- canonical_hash : Digest
32-hex-character SHA-256 digest of the canonical graph.
- class synkit.Graph.canon_graph.GraphCanonicaliser(*, backend: ~typing.Literal['generic', 'wl', 'morgan', 'nauty'] = 'generic', wl_iterations: int = 3, morgan_radius: int = 3, node_attrs: ~typing.List[str] = ['element', 'aromatic', 'charge', 'lone_pairs', 'radical', 'hcount'], node_sort_key: ~typing.Callable[[~typing.Hashable, ~typing.Dict[str, ~typing.Any]], ~typing.Tuple[~typing.Any, ...]] = <function _default_node_key>, edge_sort_key: ~typing.Callable[[~typing.Hashable, ~typing.Hashable, ~typing.Dict[str, ~typing.Any]], ~typing.Tuple[~typing.Any, ...]] = <function _default_edge_key>)[source]#
Bases: object
Factory that turns arbitrary networkx.Graph objects into their canonical twin plus a stable 32‑hex digest.
Parameters#
- backend:
"generic" or "wl" (structure‑aware Weisfeiler–Lehman); the signature also accepts "morgan" and "nauty".
- wl_iterations:
Depth of WL refinement (ignored for generic). Three iterations distinguish nearly all real‑world chemical graphs; increase for very large or highly regular topologies.
- node_sort_key, edge_sort_key:
Custom deterministic orderings. They must treat their arguments as read‑only and return plain tuples for total ordering.
Notes#
All returned graphs are of the same class as the input (nx.Graph, nx.DiGraph, …), so multigraphs and digraphs are preserved.
Examples#
>>> canon = GraphCanonicaliser(backend="generic")
>>> sig = canon.canonical_signature(G)
>>> cg = canon.canonicalise_graph(G)
>>> cg.canonical_graph  # a relabelled copy, nodes 1…N
- canonical_signature(graph: Graph) str[source]#
Return the hash of the canonical form of graph.
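Equal digests indicate that the graphs are isomorphic under the chosen back‑end and keys, but the guarantee is only as strong as the back‑end: colour‑refinement schemes such as WL cannot separate every pair of non‑isomorphic graphs, and a 32‑hex digest can in principle collide.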
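How far that guarantee reaches depends on the back-end. The minimal sketch below uses networkx's WL hash (standing in for a WL back-end; it is not synkit's code) on the textbook counterexample for colour refinement: a 6-cycle and two disjoint triangles.

```python
import networkx as nx

hexagon = nx.cycle_graph(6)
two_triangles = nx.disjoint_union(nx.cycle_graph(3), nx.cycle_graph(3))

# Both graphs are 2-regular on six nodes, so WL refinement never splits a colour class.
same_digest = (
    nx.weisfeiler_lehman_graph_hash(hexagon, digest_size=16)
    == nx.weisfeiler_lehman_graph_hash(two_triangles, digest_size=16)
)
isomorphic = nx.is_isomorphic(hexagon, two_triangles)  # one ring vs two rings
```

Where a false merge is costly, confirming each equal-digest pair with nx.is_isomorphic (with node_match and edge_match for attributes) restores exactness at the cost of one exact test per collision.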
- canonicalise_graph(graph: Graph) CanonicalGraph[source]#
Return a CanonicalGraph wrapper around graph. The wrapper exposes:
- canonical_graph: relabelled 1…N
- canonical_hash: 32‑char digest
- canonicalise_graphs(graphs: Iterable[Graph]) Tuple[CanonicalGraph, ...][source]#
Bulk helper that returns all wrappers sorted by hash.
Useful when you want fast set comparison but need the canonical graphs as well:
>>> wrappers = canon.canonicalise_graphs([G1, G2, G3])
>>> unique = {w.canonical_hash for w in wrappers}
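The bulk pattern can be sketched end to end with plain networkx. This assumes a WL hash over 'element' and 'order' as the digest (synkit's back-ends and key functions will produce different values), and the helper names are hypothetical: compute one digest per graph, keep the first graph seen for each digest, and sort by digest.

```python
import networkx as nx


def digest(g: nx.Graph) -> str:
    # Assumed digest: WL hash over 'element' and 'order' (32 hex characters).
    return nx.weisfeiler_lehman_graph_hash(
        g, node_attr="element", edge_attr="order", digest_size=16
    )


def dedupe(graphs):
    """Return (digest, graph) pairs sorted by digest, keeping the first graph per digest."""
    first_seen = {}
    for g in graphs:
        first_seen.setdefault(digest(g), g)
    return sorted(first_seen.items(), key=lambda item: item[0])


def co_bond(c, o, order=1) -> nx.Graph:
    g = nx.Graph()
    g.add_node(c, element="C")
    g.add_node(o, element="O")
    g.add_edge(c, o, order=order)
    return g


G1 = co_bond(1, 2)
G2 = co_bond("b", "a")       # isomorphic to G1, different node ids
G3 = co_bond(1, 2, order=2)  # C=O: different bond order
unique = dedupe([G1, G2, G3])
```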
- synkit.Graph.utils.add_wildcard_subgraph_for_unmapped(G: Graph, L: Graph, mapping: Dict[Any, Any], edge_keys: List[str] = ['order'], inplace: bool = False, tuple_mode: bool = False) Tuple[Graph, Dict[Any, Any]][source]#
Extend G with wildcard nodes/edges for every L-node not already mapped, preserving the original L->G mapping and returning the full mapping.
Parameters#
- G : nx.Graph
Target graph. If inplace=False (default), operates on a shallow copy.
- L : nx.Graph
Pattern/reference graph containing full nodes and edges.
- mapping : Dict[L_node, G_node]
Partial mapping from pattern L nodes to graph G nodes.
- edge_keys : List[str], optional
Edge attributes to copy (first element if list/tuple). Default [‘order’].
- inplace : bool, optional
If True, modify G in place; otherwise modify a copy.
- tuple_mode : bool, optional
If True, scalarize tuple ITS node attrs onto the left side before adding wildcard placeholders to the host graph.
Returns#
- G_ext : nx.Graph
Extended graph with added wildcard nodes and edges.
- full_map : Dict[L_node, G_node]
Combined L->G mapping: the original entries plus the newly added wildcard nodes.
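A hypothetical re-implementation helps pin down the behaviour, but several details are assumptions rather than documented facts: new host nodes take the next free integer id, placeholders carry element='*' (the convention remove_wildcard_nodes uses), an L-edge is copied whenever at least one endpoint was unmapped, and a list/tuple attribute value is reduced to its first element. tuple_mode is not modelled.

```python
from typing import Any, Dict, Sequence, Tuple

import networkx as nx


def add_wildcards_sketch(
    G: nx.Graph,
    L: nx.Graph,
    mapping: Dict[Any, Any],
    edge_keys: Sequence[str] = ("order",),
    inplace: bool = False,
) -> Tuple[nx.Graph, Dict[Any, Any]]:
    """Hypothetical sketch: add '*' placeholders for unmapped L nodes and their edges."""
    H = G if inplace else G.copy()
    full_map = dict(mapping)
    # Assumption: new host nodes take the next free integer id.
    next_id = max((n for n in H.nodes if isinstance(n, int)), default=0) + 1
    for l_node in L.nodes:
        if l_node not in full_map:
            H.add_node(next_id, element="*")
            full_map[l_node] = next_id
            next_id += 1
    for u, v, data in L.edges(data=True):
        if u in mapping and v in mapping:
            continue  # both ends were already mapped: leave the host edge alone
        attrs = {}
        for key in edge_keys:
            if key in data:
                value = data[key]
                # Assumption: tuple/list values (e.g. ITS (before, after)) keep element 0.
                attrs[key] = value[0] if isinstance(value, (list, tuple)) else value
        H.add_edge(full_map[u], full_map[v], **attrs)
    return H, full_map
```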
- synkit.Graph.utils.clean_graph_keep_largest_component(graph: Graph) Graph[source]#
Return a shallow copy of the input graph with all edges removed where the ‘standard_order’ attribute is exactly 0, then retain only the largest connected component.
Parameters#
- graph : nx.Graph
The input molecular graph.
Returns#
- nx.Graph
A modified copy of the original graph with specified edges removed and only the largest connected component preserved.
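The two steps map directly onto networkx calls. A minimal sketch for undirected graphs (hypothetical function name; how the library breaks ties between equally large components is not documented, so here the first component networkx yields wins):

```python
import networkx as nx


def clean_keep_largest_sketch(graph: nx.Graph) -> nx.Graph:
    """Hypothetical re-implementation for undirected graphs."""
    # G.copy() builds a new graph with new attribute dicts; attribute values are shared.
    H = graph.copy()
    H.remove_edges_from(
        [(u, v) for u, v, d in H.edges(data=True) if d.get("standard_order") == 0]
    )
    if H.number_of_nodes() == 0:
        return H
    # On ties, max() keeps the first component networkx yields.
    largest = max(nx.connected_components(H), key=len)
    return H.subgraph(largest).copy()
```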
- synkit.Graph.utils.has_wildcard_node(G: Graph, element_attr: str = 'element', wildcard: Any = '*') bool[source]#
Fast check: return True if any node has its element_attr equal to the wildcard, using the public API with minimal overhead.
- synkit.Graph.utils.print_graph_attributes(G: Graph) None[source]#
Print all node and edge attributes from a NetworkX graph.
- Parameters:
G (nx.Graph): A NetworkX graph (Graph, DiGraph, MultiGraph, etc.).
- synkit.Graph.utils.remove_wildcard_nodes(G: Graph, inplace: bool = True) Graph[source]#
Remove all wildcard nodes from the graph.
A wildcard node is identified by having its ‘element’ attribute equal to ‘*’.
Parameters#
- G : nx.Graph
The input graph from which wildcard nodes will be removed.
- inplace : bool, optional
If True (default), modify the input graph in place and return it. If False, a copy of the graph is created and the removal is applied to the copy.
Returns#
- nx.Graph
The graph after removing all wildcard nodes.
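Both wildcard helpers reduce to one pass over the node data. A minimal sketch with hypothetical names, keeping the documented defaults (element '*', inplace=True for removal):

```python
import networkx as nx


def has_wildcard(G: nx.Graph, element_attr: str = "element", wildcard="*") -> bool:
    # any() stops at the first wildcard node it meets.
    return any(d.get(element_attr) == wildcard for _, d in G.nodes(data=True))


def drop_wildcards(G: nx.Graph, inplace: bool = True) -> nx.Graph:
    H = G if inplace else G.copy()
    # Collect first: removing nodes while iterating over the node view raises an error.
    H.remove_nodes_from([n for n, d in H.nodes(data=True) if d.get("element") == "*"])
    return H
```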
Canonicalization#
networkx_canonical_algorithms.py#
NetworkX-based canonical-labelling utilities for molecular graphs. Each helper produces a deterministic ordering (or signature) for graph isomorphism tasks and returns:
- a relabelled NetworkX graph copy (where applicable);
- a 32-hex SHA-256 digest.
Dependencies:
- networkx
- numpy
- python-bliss (optional, for NAUTY/BLISS canonicalisation)
- synkit.Graph.Canon.canon_algs.canon_morgan(g: Graph, morgan_radius: int = 2, node_attributes: List[str] = None) Tuple[Graph, str][source]#
Prime-based neighbourhood refinement analogous to Morgan fingerprinting.
Each node is initially assigned a unique prime number; optionally, specified node attributes are incorporated into the seed label. For each iteration up to morgan_radius, node labels are updated by multiplying by the labels of neighboring nodes.
Parameters#
- g : nx.Graph
Input molecular graph.
- morgan_radius : int, optional
Number of refinement iterations, by default 2.
- node_attributes : List[str], optional
Node attribute keys to include in initial hashing; if None, only prime seeding is used.
Returns#
- Tuple[nx.Graph, Digest]
The relabelled graph with canonical node ordering, and the 32-hex digest of the sequence of final labels per node.
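The refinement loop can be sketched as follows. Two choices here are assumptions that depart from the wording above: the seed combines degree with the chosen attributes, and each distinct seed class (not each node) receives a prime, taken in sorted class order, so the result does not depend on node ids. After the first round, unique factorisation means two nodes share a label only if their own prime and neighbour primes form the same multiset. Function names are hypothetical.

```python
import hashlib
from typing import Dict, Hashable, List, Optional, Tuple

import networkx as nx


def first_primes(k: int) -> List[int]:
    """The first k primes, by trial division against the primes found so far."""
    primes: List[int] = []
    candidate = 2
    while len(primes) < k:
        if all(candidate % p for p in primes):
            primes.append(candidate)
        candidate += 1
    return primes


def morgan_sketch(
    g: nx.Graph, radius: int = 2, node_attributes: Optional[List[str]] = None
) -> Tuple[nx.Graph, str]:
    attrs = node_attributes or []
    # Assumption: one prime per distinct (degree, attributes) class, in sorted order.
    seed = {
        n: (g.degree[n],) + tuple(repr(d.get(a)) for a in attrs)
        for n, d in g.nodes(data=True)
    }
    classes = sorted(set(seed.values()))
    prime_of = dict(zip(classes, first_primes(len(classes))))
    labels: Dict[Hashable, int] = {n: prime_of[key] for n, key in seed.items()}
    for _ in range(radius):
        # New label = own label times the labels of all neighbours.
        new_labels = {}
        for n in g.nodes:
            value = labels[n]
            for nb in g.neighbors(n):
                value *= labels[nb]
            new_labels[n] = value
        labels = new_labels
    # Order by final label; ties keep networkx iteration order, so tied nodes are
    # not canonically placed, but the digest below only sees the sorted labels.
    order = sorted(g.nodes, key=lambda n: labels[n])
    relabelled = nx.relabel_nodes(g, {n: i for i, n in enumerate(order, start=1)})
    payload = ",".join(str(labels[n]) for n in order)
    return relabelled, hashlib.sha256(payload.encode()).hexdigest()[:32]
```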
- synkit.Graph.Canon.canon_algs.eigen_canonical_signature(g: Graph) str[source]#
Compute a graph signature from sorted eigenvalues of its weighted adjacency matrix.
Edge weights are taken from the ‘order’ attribute (default=1). The adjacency matrix is symmetric for undirected graphs.
Parameters#
- g : nx.Graph
Input molecular graph.
Returns#
- Digest
32-hex digest of sorted real parts of eigenvalues.
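A minimal NumPy sketch of the spectral signature (hypothetical name; rounding to six decimals before hashing is an assumption needed to make floating-point output stable). The spectrum is invariant under relabelling but not complete: the star K1,4 and a 4-cycle plus an isolated node are the smallest cospectral pair, and they receive the same digest.

```python
import hashlib

import networkx as nx
import numpy as np


def eigen_signature_sketch(g: nx.Graph, decimals: int = 6) -> str:
    nodes = list(g.nodes)
    index = {n: i for i, n in enumerate(nodes)}
    A = np.zeros((len(nodes), len(nodes)))
    for u, v, d in g.edges(data=True):
        w = float(d.get("order", 1))  # missing 'order' counts as weight 1
        A[index[u], index[v]] = w
        A[index[v], index[u]] = w  # symmetric for undirected graphs
    # eigvalsh: eigenvalues of a symmetric matrix, real and in ascending order.
    vals = np.round(np.linalg.eigvalsh(A), decimals) + 0.0  # "+ 0.0" turns -0.0 into 0.0
    payload = ",".join(f"{v:.{decimals}f}" for v in vals)
    return hashlib.sha256(payload.encode()).hexdigest()[:32]
```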
- synkit.Graph.Canon.canon_algs.pgraph_signature(g: Graph, p: int = 4) str[source]#
Generate a signature by hashing all simple paths up to length p.
Each path is represented as a hyphen-separated sequence of node ‘element’ attributes (or ‘?’ if missing), and the sorted list of these sequences is concatenated for hashing.
Parameters#
- g : nx.Graph
Input molecular graph.
- p : int, optional
Maximum path length (number of edges), by default 4.
Returns#
- Digest
32-hex digest of the concatenated sorted path strings.
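The path enumeration can be sketched with a short depth-first search. Assumptions of this sketch: paths have 1 to p edges (single atoms are not counted), each path is listed from both ends, and '|' separates the sorted strings before hashing. The number of paths grows quickly with p, so small values are the practical range.

```python
import hashlib
from typing import Hashable, List

import networkx as nx


def path_strings(g: nx.Graph, p: int = 4) -> List[str]:
    """Element strings of all simple paths with 1..p edges (each found from both ends)."""

    def label(n: Hashable) -> str:
        return str(g.nodes[n].get("element", "?"))

    found: List[str] = []

    def extend(path: List[Hashable]) -> None:
        if len(path) > 1:
            found.append("-".join(label(n) for n in path))
        if len(path) - 1 == p:
            return  # maximum number of edges reached
        for nb in g.neighbors(path[-1]):
            if nb not in path:  # simple paths never revisit a node
                extend(path + [nb])

    for start in g.nodes:
        extend([start])
    return found


def path_signature_sketch(g: nx.Graph, p: int = 4) -> str:
    # Assumption: "|" separates the sorted path strings before hashing.
    payload = "|".join(sorted(path_strings(g, p)))
    return hashlib.sha256(payload.encode()).hexdigest()[:32]
```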
- synkit.Graph.Canon.canon_algs.ring_canonical_graph(g: Graph) Tuple[Graph, str][source]#
Generate a relabelled graph based on SSSR membership hierarchy and compute its canonical signature.
Nodes are ordered by (in priority order):
- the number of smallest rings they belong to (SSSR count);
- node degree;
- original node identifier.
Parameters#
- g : nx.Graph
Input molecular graph (nodes may have attributes).
Returns#
- Tuple[nx.Graph, Digest]
The relabelled graph with nodes numbered 1..N according to canonical order, and a 32-hex digest based on node membership counts and ordering.
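A sketch of the ordering using networkx's minimum_cycle_basis, which for an unweighted graph gives a smallest set of smallest rings. Ascending sort direction and the payload format are assumptions, and the function name is hypothetical. Note that the final tie-break on the original identifier makes the relabelling depend on input numbering for tied nodes, while the digest, built only from (ring count, degree) pairs, does not.

```python
import hashlib
from collections import Counter
from typing import Tuple

import networkx as nx


def ring_order_sketch(g: nx.Graph) -> Tuple[nx.Graph, str]:
    # How many rings of the minimum cycle basis each node belongs to (0 if none).
    ring_count = Counter(n for cycle in nx.minimum_cycle_basis(g) for n in cycle)
    # Ascending (ring count, degree, node id); the direction is an assumption.
    order = sorted(g.nodes, key=lambda n: (ring_count[n], g.degree[n], n))
    relabelled = nx.relabel_nodes(g, {n: i for i, n in enumerate(order, start=1)})
    payload = ",".join(f"{ring_count[n]}:{g.degree[n]}" for n in order)
    return relabelled, hashlib.sha256(payload.encode()).hexdigest()[:32]
```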
- class synkit.Graph.Canon.canon_graph.CanonicalGraph(g: Graph, canon: GraphCanonicaliser)[source]#
Bases: object
Value object tying together:
- the original NetworkX graph (mutable, user‑supplied);
- its canonical twin (immutable copy, nodes relabelled 1…N);
- a 32‑char SHA‑256 digest.
Instances compare and hash by digest only, which makes them suitable for set/dict membership while still carrying the underlying graphs.
Do not mutate original_graph in place if you rely on canonical_hash; repeat the canonicalisation after any structural change instead.
- class synkit.Graph.Canon.canon_graph.CanonicalRule(rule: str, canon: GraphCanonicaliser = <GraphCanonicaliser backend='generic' node_key=_default_node_key edge_key=_default_edge_key>)[source]#
Bases: object
Value object that wraps a graph transformation rule in GML string form, providing a canonicalised GML output and a stable 32-character SHA-256 hash.
Internally, the GML rule is parsed into a NetworkX graph via gml_to_its, canonicalised using a GraphCanonicaliser, and re-serialized back to GML with its_to_gml.
Equality and hashing are based solely on the canonical hash, so isomorphic rules (under the chosen backend) compare equal.
Attributes#
- original_rule : str
The raw GML string supplied by the user.
- original_graph : nx.Graph
The NetworkX graph parsed from original_rule.
- canonical_graph : nx.Graph
The relabelled canonical graph (nodes renumbered 1…N).
- canonical_rule : str
The canonical graph re-serialized to a GML string.
- canonical_hash : Digest
32-hex-character SHA-256 digest of the canonical graph.
- class synkit.Graph.Canon.canon_graph.GraphCanonicaliser(*, backend: ~typing.Literal['generic', 'wl', 'morgan', 'nauty'] = 'generic', wl_iterations: int = 3, morgan_radius: int = 3, node_attrs: ~typing.List[str] = ['element', 'aromatic', 'charge', 'hcount'], node_sort_key: ~typing.Callable[[~typing.Hashable, ~typing.Dict[str, ~typing.Any]], ~typing.Tuple[~typing.Any, ...]] = <function _default_node_key>, edge_sort_key: ~typing.Callable[[~typing.Hashable, ~typing.Hashable, ~typing.Dict[str, ~typing.Any]], ~typing.Tuple[~typing.Any, ...]] = <function _default_edge_key>)[source]#
Bases: object
Factory that turns arbitrary networkx.Graph objects into their canonical twin plus a stable 32‑hex digest.
Parameters#
- backend:
"generic" or "wl" (structure‑aware Weisfeiler–Lehman); the signature also accepts "morgan" and "nauty".
- wl_iterations:
Depth of WL refinement (ignored for generic). Three iterations distinguish nearly all real‑world chemical graphs; increase for very large or highly regular topologies.
- node_sort_key, edge_sort_key:
Custom deterministic orderings. They must treat their arguments as read‑only and return plain tuples for total ordering.
Notes#
All returned graphs are of the same class as the input (nx.Graph, nx.DiGraph, …), so multigraphs and digraphs are preserved.
Examples#
>>> canon = GraphCanonicaliser(backend="generic")
>>> sig = canon.canonical_signature(G)
>>> cg = canon.canonicalise_graph(G)
>>> cg.canonical_graph  # a relabelled copy, nodes 1…N
- canonical_signature(graph: Graph) str[source]#
Return the hash of the canonical form of graph.
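Equal digests indicate that the graphs are isomorphic under the chosen back‑end and keys, but the guarantee is only as strong as the back‑end: colour‑refinement schemes such as WL cannot separate every pair of non‑isomorphic graphs, and a 32‑hex digest can in principle collide.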
- canonicalise_graph(graph: Graph) CanonicalGraph[source]#
Return a CanonicalGraph wrapper around graph. The wrapper exposes:
- canonical_graph: relabelled 1…N
- canonical_hash: 32‑char digest
- canonicalise_graphs(graphs: Iterable[Graph]) Tuple[CanonicalGraph, ...][source]#
Bulk helper that returns all wrappers sorted by hash.
Useful when you want fast set comparison but need the canonical graphs as well:
>>> wrappers = canon.canonicalise_graphs([G1, G2, G3])
>>> unique = {w.canonical_hash for w in wrappers}
- class synkit.Graph.Canon.nauty.NautyCanonicalizer(node_attrs: list[str] | None = None, edge_attrs: list[str] | None = None)[source]#
Bases: object
Perform Nauty-style canonicalization of a NetworkX graph, optionally refining and distinguishing nodes and edges by specified attributes, and extracting automorphisms, orbits, and canonical permutations.
- Parameters:
node_attrs (list[str] | None) – List of node attribute keys to include in the initial partition refinement. Nodes sharing the same tuple of values under these keys will start in the same cell.
edge_attrs (list[str] | None) – List of edge attribute keys to include when distinguishing edges in the canonical label. If an edge has none of these keys, its contribution will be empty.
- canonical_form(G: Graph, return_aut: bool = False, remap_aut: bool = False, return_orbits: bool = False, return_perm: bool = False, max_depth: int | None = None)[source]#
Compute canonical form of graph G with optional automorphisms, orbits, and early stopping.
- Parameters:
G – NetworkX graph to canonicalize.
return_aut – bool, whether to return the list of automorphism permutations. Default: False.
remap_aut – bool, whether to remap automorphisms to canonical labels (only valid if return_aut=True). Default: False.
return_orbits – bool, whether to return node orbits (symmetry groups). Default: False.
return_perm – bool, whether to return the canonical permutation (ordering of nodes). Default: False.
max_depth – int or None, maximum recursion depth for the backtracking search (early stopping). Default: None (unlimited).
- Returns:
A tuple containing the requested results and a boolean early_stop flag indicating whether the search terminated early. The order of outputs is (G_canon, perm?, automorphisms?, orbits?, early_stop).
- edge_attrs#
- node_attrs#
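For intuition about what canonical_form returns, the same quantities can be computed by exhaustive search on very small graphs. The sketch below is that brute-force reference, not Nauty's partition-refinement search and not this class's API (the function name is hypothetical, and edge attributes are ignored): it picks the node ordering with the smallest (labels, edges) encoding, collects attribute-preserving automorphisms, and derives orbits from them. It costs O(n!) and suits a handful of nodes at most.

```python
from itertools import permutations
from typing import Dict, Hashable, List, Sequence, Set, Tuple

import networkx as nx


def brute_force_canon(G: nx.Graph, node_attrs: Sequence[str] = ()):
    """Exhaustive canonical form, automorphisms and orbits for tiny undirected graphs."""
    nodes = list(G.nodes)
    edge_set = {frozenset(e) for e in G.edges}

    def node_label(n: Hashable) -> Tuple[str, ...]:
        return tuple(repr(G.nodes[n].get(a)) for a in node_attrs)

    def encode(order: Sequence[Hashable]):
        # Position-based encoding: node labels in order, then sorted index pairs.
        pos = {n: i for i, n in enumerate(order)}
        labels = tuple(node_label(n) for n in order)
        edges = tuple(sorted(tuple(sorted((pos[u], pos[v]))) for u, v in G.edges))
        return labels, edges

    # Canonical ordering = the ordering with the smallest encoding.
    perm = min(permutations(nodes), key=encode)
    G_canon = nx.relabel_nodes(G, {n: i for i, n in enumerate(perm)})

    def is_automorphism(sigma: Dict[Hashable, Hashable]) -> bool:
        same_labels = all(node_label(n) == node_label(sigma[n]) for n in nodes)
        mapped = {frozenset((sigma[u], sigma[v])) for u, v in G.edges}
        return same_labels and mapped == edge_set

    automorphisms: List[Dict[Hashable, Hashable]] = [
        sigma
        for sigma in (dict(zip(nodes, image)) for image in permutations(nodes))
        if is_automorphism(sigma)
    ]
    # Orbit of n = every node that some automorphism sends n to.
    orbits: List[Set[Hashable]] = []
    for n in nodes:
        if not any(n in orbit for orbit in orbits):
            orbits.append({sigma[n] for sigma in automorphisms})
    return G_canon, list(perm), automorphisms, orbits
```

The outputs follow the documented order (G_canon, perm, automorphisms, orbits), minus the early_stop flag, which an exhaustive search does not need.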
Context#
- class synkit.Graph.Context.hier_context.HierContext(node_label_names: List[str] = ['element', 'charge'], node_label_default: List[Any] = ['*', 0], edge_attribute: str = 'order', max_radius: int = 3)[source]#
Bases: RadiusExpand
Hierarchical clustering class for reaction context graphs.
Extends RadiusExpand to build multi-level graph representations and cluster them based on structural features such as Weisfeiler-Lehman hashing.
- fit(original_data: List[Dict[str, Any]], its_key: str = 'ITS', context_key: str = 'K') Tuple[List[Dict[str, Any]], List[List[Dict[str, Any]]]][source]#
Processes a list of graph data entries, classifying each based on hierarchical clustering. The method extracts context subgraphs, computes graph hashes, and clusters the data at multiple hierarchical levels. Finally, child node indices are updated based on parent–cluster relationships.
Parameters:
- original_data (List[Dict[str, Any]]): A list of dictionaries, each representing a graph data entry with an ITS graph.
- its_key (str): The key in each dictionary corresponding to the ITS graph (default is “ITS”).
- context_key (str): The key under which the extracted context subgraph is stored (default is “K”).
Returns:
- Tuple[List[Dict[str, Any]], List[List[Dict[str, Any]]]]: A tuple containing the updated list of graph data entries with hierarchical cluster indices, and a list (per hierarchical level) of template dictionaries that have been updated with child indices.
- class synkit.Graph.Context.radius_expand.RadiusExpand[source]#
Bases: object
A utility class for extracting and expanding reaction contexts from chemical reaction graphs.
This class provides methods to:
- Identify reaction center nodes based on unequal edge orders.
- Expand reaction centers by including n-level nearest neighbors.
- Extract subgraphs from larger graphs.
- Construct a reaction context subgraph (K graph) from an ITS graph.
- Retrieve the longest unique extension path from reaction centers using DFS.
- Perform parallel extraction of reaction contexts from multiple reaction dictionaries.
- Remove edges based on specified edge attribute values.
- static context_extraction(data: Dict[str, Any], its_key: str = 'ITS', context_key: str = 'K', n_knn: int = 0) Dict[str, Any][source]#
Extracts the reaction context for a single reaction dictionary by computing both the context subgraph and the longest extension path.
Parameters:
- data (Dict[str, Any]): Reaction data containing at least an ITS graph.
- its_key (str, optional): Key in the dictionary for retrieving the ITS graph. Default is ITS.
- context_key (str, optional): Key under which to store the extracted context subgraph. Default is K.
- n_knn (int, optional): Number of neighbor levels to include for context extraction. Default is 0.
Returns:
- Dict[str, Any]: The updated reaction data dictionary including the extracted context subgraph under the key specified by context_key.
- static extract_k(its: Graph, n_knn: int = 0) Tuple[Graph, Any][source]#
Constructs the context subgraph (K graph) from an ITS graph based on reaction centers, and computes the longest extension path from these centers constrained by ‘standard_order’ edges.
Parameters:
- its (nx.Graph): The ITS graph representing the reaction network.
- n_knn (int, optional): The number of neighbor levels to include in the context subgraph. Default is 0.
Returns:
- Tuple[nx.Graph, Any]: The first element is the extracted context subgraph (K graph). If n_knn is 0, this is the reaction center graph; if n_knn is -1, the maximum n_knn is used.
- static extract_subgraph(G: Graph, node_indices: List[int]) Graph[source]#
Extracts a subgraph from the original graph containing the specified node indices.
Parameters:
- G (nx.Graph): The original graph.
- node_indices (List[int]): A list of node indices to include in the subgraph.
Returns:
- nx.Graph: A new graph that is a copy of the subgraph containing only the specified nodes.
- static find_nearest_neighbors(G: Graph, center_nodes: List[int], n_knn: int = 1) Set[int][source]#
Finds the n-level nearest neighbors around the specified center nodes in a graph.
Parameters:
- G (nx.Graph): The graph in which to search for neighboring nodes.
- center_nodes (List[int]): Initial center node indices.
- n_knn (int, optional): The number of neighbor levels to include (default is 1).
Returns:
- Set[int]: A set of node indices including the original center nodes and their nearest neighbors.
- static find_unequal_order_edges(G: Graph) List[int][source]#
Identifies reaction center nodes in a graph based on the presence of unequal order edges.
Parameters: - G (nx.Graph): Graph to analyze for reaction centers.
Returns: - List[int]: A list of node indices identified as reaction centers based on unequal order edges.
- static longest_radius_extension(G: Graph, rc_nodes: List[int]) List[int][source]#
Computes the longest unique extension path in the graph starting from the given reaction center nodes, constrained by traversing only those edges where the ‘standard_order’ attribute equals 0.
This method uses a depth-first search (DFS) strategy to explore all possible unique paths and returns the longest one.
Parameters:
- G (nx.Graph): The graph to search for extension paths.
- rc_nodes (List[int]): A list of reaction center node indices to serve as starting points for the search.
Returns:
- List[int]: A list of node indices representing the longest unique extension path found.
- classmethod paralle_context_extraction(data: List[Dict[str, Any]], its_key: str = 'ITS', context_key: str = 'K', n_jobs: int = 1, verbose: int = 0, n_knn: int = 0) List[Dict[str, Any]][source]#
Performs parallel extraction of reaction contexts for multiple reaction dictionaries.
Parameters:
- data (List[Dict[str, Any]]): A list of reaction data dictionaries, each containing an ITS graph.
- its_key (str, optional): Key in the dictionary for retrieving the ITS graph. Default is ITS.
- context_key (str, optional): Key under which to store the extracted context subgraph. Default is K.
- n_jobs (int, optional): Number of parallel jobs to use. Default is 1.
- verbose (int, optional): Verbosity level for the parallel processing. Default is 0.
- n_knn (int, optional): Number of neighbor levels to include for context extraction. Default is 0.
Returns:
- List[Dict[str, Any]]: A list of updated reaction data dictionaries, each augmented with the extracted context subgraph and the longest extension path.
- static remove_normal_edges(graph: Graph, property_key: str) Graph[source]#
Removes edges from a graph where the specified edge attribute has a value of 0.
Parameters:
- graph (nx.Graph): The input graph to modify.
- property_key (str): The key of the edge attribute to check for removal; edges with a value of 0 will be removed.
Returns:
- nx.Graph: A copy of the input graph with the specified edges removed.
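The centre-detection, neighbour-expansion and subgraph-extraction steps can be sketched in a few functions. This assumes, as the 'standard_order' convention in this section suggests, that each ITS edge stores 'order' as a (reactant, product) tuple; the names are hypothetical, and the n_knn=-1 case of extract_k is not modelled.

```python
from typing import Hashable, Iterable, List, Set

import networkx as nx


def unequal_order_nodes(G: nx.Graph) -> List[Hashable]:
    """Nodes on at least one edge whose (before, after) bond orders differ."""
    centre: Set[Hashable] = set()
    for u, v, d in G.edges(data=True):
        before, after = d["order"]  # assumption: ITS order stored as a 2-tuple
        if before != after:
            centre.update((u, v))
    return sorted(centre)


def nearest_neighbors(
    G: nx.Graph, centre: Iterable[Hashable], n_knn: int = 1
) -> Set[Hashable]:
    """Centre nodes plus every node within n_knn hops (breadth-first, level by level)."""
    found = set(centre)
    frontier = set(found)
    for _ in range(n_knn):
        frontier = {nb for n in frontier for nb in G.neighbors(n)} - found
        found |= frontier
    return found


def context_subgraph(G: nx.Graph, n_knn: int = 0) -> nx.Graph:
    keep = nearest_neighbors(G, unequal_order_nodes(G), n_knn)
    return G.subgraph(keep).copy()  # independent copy, not a view
```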
Features#
- class synkit.Graph.Feature.graph_descriptors.GraphDescriptor[source]#
Bases: object
- static check_graph_type(G: Graph) str[source]#
Classifies the graph as acyclic, single cyclic, or complex cyclic.
Parameters: - G (nx.Graph): The graph to be checked.
Returns: - str: The classification result.
- static get_cycle_member_rings(G: Graph, type='minimal') List[int][source]#
Identifies all cycles in the given graph using cycle bases to ensure no overlap and returns a list of the sizes of these cycles (member rings), sorted in ascending order.
Parameters: - G (nx.Graph): The NetworkX graph to be analyzed.
Returns: - List[int]: A sorted list of cycle sizes (member rings) found in the graph.
- static get_descriptors(entry: Dict, reaction_centers: str = 'RC', its: str = 'ITS', condensed: bool = True) Dict[source]#
Enhance an entry dictionary with topology type and reaction type descriptors.
Parameters:
- entry (Dict): A dictionary with reaction data.
- reaction_centers (str): Key for accessing reaction center data.
- its (str): Key for accessing ITS (Imaginary Transition State) data.
Returns:
- Dict: The enhanced entry with additional descriptors.
- static get_element_count(graph: Graph) Dict[str, int][source]#
Counts occurrences of each element in the graph nodes.
Parameters: - graph (nx.Graph): A NetworkX graph with ‘element’ attribute in nodes.
Returns: - Dict[str, int]: An ordered dictionary with element counts.
- static is_acyclic_graph(G: Graph) bool[source]#
Determines if the given graph is acyclic.
Parameters: - G (nx.Graph): The graph to be checked.
Returns: - bool: True if the graph is acyclic, False otherwise.
- static is_complex_cyclic_graph(G: Graph) bool[source]#
Determines if the graph is complex cyclic with multiple cycles.
Parameters: - G (nx.Graph): The graph to be checked.
Returns: - bool: True if the graph is complex cyclic, False otherwise.
- static is_graph_empty(graph: Graph | dict | list | Any) bool[source]#
Determine if a graph representation is empty.
Parameters:
- graph (Union[nx.Graph, dict, list, Any]): A graph representation, which can be a NetworkX graph, a dictionary, a list, or an object with an ‘is_empty’ method.
Returns: - bool: True if the graph is empty, False otherwise.
Raises: - TypeError: If the graph representation is not supported.
- static is_single_cyclic_graph(G: Graph) bool[source]#
Determines if the given graph has exactly one cycle.
Parameters: - G (nx.Graph): The graph to be checked.
Returns: - bool: True if the graph is single cyclic, False otherwise.
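One way to realise the three-way classification is the cyclomatic number, edges minus nodes plus connected components, which counts independent cycles in a simple undirected graph. The sketch is illustrative only (hypothetical names and return strings); ring sizes come from networkx's minimum_cycle_basis.

```python
import networkx as nx


def independent_cycles(G: nx.Graph) -> int:
    # Cyclomatic number of a simple undirected graph: edges - nodes + components.
    return G.number_of_edges() - G.number_of_nodes() + nx.number_connected_components(G)


def graph_type(G: nx.Graph) -> str:
    k = independent_cycles(G)
    if k == 0:
        return "acyclic"
    return "single cyclic" if k == 1 else "complex cyclic"


def member_rings(G: nx.Graph) -> list:
    # Ring sizes from a minimum cycle basis, in ascending order.
    return sorted(len(c) for c in nx.minimum_cycle_basis(G))
```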
- static process_entries_in_parallel(entries: List[Dict], reaction_centers: str = 'RC', its: str = 'ITS', condensed: bool = True, n_jobs: int = 4, verbose: int = 0) List[Dict][source]#
Processes a list of entries in parallel to enhance each entry with descriptors.
Parameters:
- entries (List[Dict]): List of dictionaries containing reaction data to enhance.
- reaction_centers (str): Key to retrieve reaction center graph data from each entry dictionary.
- its (str): Key to retrieve ITS (Imaginary Transition State) graph data from each entry dictionary.
- condensed (bool): If True, condenses node signatures with counts.
- n_jobs (int): Number of jobs to run in parallel. -1 uses all processors.
- verbose (int): The verbosity level for joblib’s Parallel.
Returns:
- List[Dict]: A list of enhanced dictionaries with added descriptors.
- synkit.Graph.Feature.graph_descriptors.check_graph_connectivity(graph: Graph) str[source]#
Check the connectivity of a NetworkX graph.
This function assesses whether all nodes in the graph are connected by some path, applicable to undirected graphs.
Parameters: - graph (nx.Graph): A NetworkX graph object.
Returns: - str: Returns ‘Connected’ if the graph is connected, otherwise ‘Disconnected’.
Raises: - NetworkXNotImplemented: If graph is directed and does not support is_connected.
- class synkit.Graph.Feature.graph_fps.GraphFP(graph: Graph, nBits: int = 1024, hash_alg: str = 'sha256')[source]#
Bases: object
- fingerprint(method: str) str[source]#
Generate a binary string fingerprint of the graph using the specified method.
Parameters: - method (str): The method to use for fingerprinting (‘spectrum’, ‘adjacency’, ‘degree’, ‘motif’)
Returns: - str: A binary string of length nBits that represents the fingerprint of the graph.
- iterative_deepening(remaining_bits: int) str[source]#
Extend the hash length using iterative hashing until the desired bit length is achieved.
Parameters: - remaining_bits (int): Number of bits needed to complete the fingerprint to nBits.
Returns: - str: Additional binary data to achieve the desired hash length.
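The extension step can be read as chained hashing: while fewer than nBits bits are available, hash again and append. A minimal sketch under that reading; re-hashing the previous digest is an assumption, since the exact chaining input is not documented here, and the function name is hypothetical.

```python
import hashlib


def extend_to_bits(seed: bytes, n_bits: int, hash_alg: str = "sha256") -> str:
    """Binary string of exactly n_bits, built by re-hashing the previous digest."""
    digest = hashlib.new(hash_alg, seed).digest()
    bits = "".join(f"{byte:08b}" for byte in digest)
    while len(bits) < n_bits:
        digest = hashlib.new(hash_alg, digest).digest()  # hash of the previous block
        bits += "".join(f"{byte:08b}" for byte in digest)
    return bits[:n_bits]  # truncate the last block to the requested length
```

A useful property of this construction is that a shorter fingerprint is always a prefix of a longer one from the same seed.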
- class synkit.Graph.Feature.graph_signature.GraphSignature(graph: Graph)[source]#
Bases: object
Provides methods to generate canonical signatures for graph edges (with flexible ‘order’ and ‘state’ attributes, and node degrees/neighbor information), various spectral invariants, the adjacency matrix, and complete graphs.
Aims for high uniqueness without relying solely on isomorphism checks.
- create_edge_signature(include_neighbors: bool = False, max_hop: int = 2) str[source]#
Generates a canonical edge signature by formatting each edge with sorted node elements (including charge), node degrees, bond order, bond state, and optionally including neighbor information and topological context.
Parameters:
- include_neighbors (bool): Whether to include neighbors’ details in the edge signature.
- max_hop (int): Maximum number of hops to include for neighbor-level structural information.
Returns:
- str: A concatenated and sorted string of edge representations.
- create_graph_signature(include_wl_hash: bool = True, include_neighbors: bool = True, max_hop: int = 1) str[source]#
Combines the edge signature, spectral invariants, and (optionally) the Weisfeiler-Lehman hash into a single comprehensive graph signature.
Parameters: - include_wl_hash (bool): Whether to include the Weisfeiler-Lehman hash. - include_neighbors (bool): Whether to include neighbor information in edge signatures. - max_hop (int): Maximum number of hops to include for neighbor-level structural information.
Returns: - str: A concatenated string representing the complete graph signature.
- class synkit.Graph.Feature.hash_fps.HashFPs(graph: Graph, numBits: int = 256, hash_alg: str = 'sha256')[source]#
Bases: object
- extract_features(start_node: int | None, end_node: int | None, max_path_length: int | None) str[source]#
Extract features from the graph based on paths and cycles.
Parameters: - start_node (Optional[int]): The starting node for path detection. - end_node (Optional[int]): The ending node for path detection. - max_path_length (Optional[int]): Cutoff for path length during detection.
Returns: - str: A string of concatenated feature values.
- finalize_hash(hash_object: Any, features: str) str[source]#
Finalize the hash using the features extracted and return the hash as a binary string.
Parameters: - hash_object (Any): The hash object. - features (str): Concatenated string of graph features.
Returns: - str: The final binary string of the hash, truncated or extended to numBits.
- hash_fps(start_node: int | None = None, end_node: int | None = None, max_path_length: int | None = None) str[source]#
Generate a binary hash fingerprint of the graph based on its paths and cycles.
Parameters: - start_node (Optional[int]): The starting node index for path detection. - end_node (Optional[int]): The ending node index for path detection. - max_path_length (Optional[int]): The maximum length for paths to be considered.
Returns: - str: A binary string representing the truncated hash of the graph’s structural features.
- initialize_hash() Any[source]#
Initialize and return the hash object based on the specified algorithm.
- iterative_deepening(hash_object: Any, remaining_bits: int) str[source]#
Extend hash length using iterative hashing until the desired bit length is achieved.
Parameters: - hash_object (hashlib._Hash): The hash object for iterative deepening. - remaining_bits (int): Number of bits needed to reach numBits.
Returns: - str: Additional binary data to achieve the desired hash length.
- class synkit.Graph.Feature.morgan_fps.MorganFPs(graph: Graph, radius: int = 3, nBits: int = 1024, hash_alg: str = 'sha256')[source]#
Bases: object
- generate_fingerprint() str[source]#
Generate a binary string fingerprint of the graph based on the local environments of nodes. Ensures the output is exactly nBits in length using iterative deepening if necessary.
Returns: - str: A binary string of length nBits representing the fingerprint of the graph.
- iterative_deepening(hash_object: Any, remaining_bits: int) str[source]#
Extend the hash length using iterative hashing until the desired bit length is achieved.
Parameters: - hash_object (hashlib._Hash): The hash object used for iterative deepening. - remaining_bits (int): Number of bits needed to complete the fingerprint to nBits.
Returns: - str: Additional binary data to achieve the desired hash length.
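The local-environment idea behind MorganFPs resembles Morgan/ECFP fingerprints. The sketch below assumes a graph stored as an adjacency dict with string atom labels. morgan_like_fp is hypothetical: it derives 64-bit identifiers from SHA-256 and folds each one into a bit vector modulo n_bits. It is not MorganFPs' actual algorithm.

```python
import hashlib


def morgan_like_fp(adjacency, labels, radius=2, n_bits=64):
    """Morgan/ECFP-style bit fingerprint (hypothetical sketch, not MorganFPs).

    adjacency: {node: set_of_neighbors}; labels: {node: str} initial atom labels.
    """
    def h(text):
        # Stable 64-bit integer identifier derived from SHA-256.
        return int.from_bytes(hashlib.sha256(text.encode()).digest()[:8], "big")

    bits = [0] * n_bits
    ids = {n: h(labels[n]) for n in adjacency}  # radius-0 environments
    for step in range(radius + 1):
        for ident in ids.values():
            bits[ident % n_bits] = 1  # fold each environment into one bit
        if step < radius:
            # Grow every environment by one bond using sorted neighbor identifiers.
            ids = {n: h(f"{ids[n]}|{sorted(ids[m] for m in adjacency[n])}")
                   for n in adjacency}
    return "".join(map(str, bits))


propane_a = {0: {1}, 1: {0, 2}, 2: {1}}
propane_b = {"x": {"y"}, "y": {"x", "z"}, "z": {"y"}}  # same graph, new IDs
fp_a = morgan_like_fp(propane_a, {n: "C" for n in propane_a})
fp_b = morgan_like_fp(propane_b, {n: "C" for n in propane_b})
print(len(fp_a), fp_a == fp_b)  # 64 True
```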
- class synkit.Graph.Feature.path_fps.PathFPs(graph: Graph, max_length: int = 10, nBits: int = 1024, hash_alg: str = 'sha256')[source]#
Bases: object
- generate_fingerprint() str[source]#
Generate a binary string fingerprint of the graph by hashing paths up to a certain length and combining them.
Returns: - str: A binary string of length nBits that represents the fingerprint of the graph.
- iterative_deepening(hash_object: Any, remaining_bits: int) str[source]#
Extend the hash length using iterative hashing until the desired bit length is achieved.
Parameters: - hash_object (hashlib._Hash): The hash object used for iterative deepening. - remaining_bits (int): Number of bits needed to complete the fingerprint to nBits.
Returns: - str: Additional binary data to achieve the desired hash length.
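Hashing paths up to a fixed length can be sketched as a depth-first enumeration of simple paths. Each label sequence is hashed in a direction-independent way, and each path sets one bit. The sketch assumes an adjacency-dict graph. path_fp is hypothetical and does not reproduce PathFPs' exact bits. The number of paths grows quickly with max_length, so keep that value small.

```python
import hashlib


def path_fp(adjacency, labels, max_length=3, n_bits=64):
    """Path-based bit fingerprint (hypothetical sketch, not PathFPs).

    Enumerates simple paths with up to max_length bonds and sets one bit per
    distinct label sequence.
    """
    bits = [0] * n_bits

    def walk(path):
        seq = [labels[n] for n in path]
        key = "-".join(min(seq, seq[::-1]))  # A-B-C and C-B-A hash alike
        bits[int(hashlib.sha256(key.encode()).hexdigest(), 16) % n_bits] = 1
        if len(path) - 1 < max_length:
            for nxt in adjacency[path[-1]]:
                if nxt not in path:  # simple paths only: no revisits
                    walk(path + [nxt])

    for start in adjacency:
        walk([start])
    return "".join(map(str, bits))


ethanol = {1: {2}, 2: {1, 3}, 3: {2}}
labels = {1: "C", 2: "C", 3: "O"}
relabeled = {"a": {"b"}, "b": {"a", "c"}, "c": {"b"}}
relabeled_labels = {"a": "O", "b": "C", "c": "C"}
print(path_fp(ethanol, labels) == path_fp(relabeled, relabeled_labels))  # True
```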
- class synkit.Graph.Feature.wl_hash.WLHash(node: str | List[str] = ['element', 'charge'], edge: str | List[str] = 'order', iterations: int = 5, digest_size: int = 16)[source]#
Bases: object
A class that implements the Weisfeiler-Lehman graph hashing algorithm, supporting multiple node/edge attributes for hashing.
Attributes: - node: A single attribute name or a list of attribute names for nodes used in hashing. - edge: A single attribute name or a list of attribute names for edges used in hashing. - iterations: Number of iterations for the Weisfeiler-Lehman algorithm. - digest_size: Length of the hash to be generated.
- process_data(data: List[Dict[str, str | Graph]], graph_key: str = 'ITS', subgraph: bool = False) List[Dict[str, str | None]][source]#
Applies WL hashing (or subgraph hashing) to a list of data entries.
Each entry must contain a graph under ‘graph_key’.
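Weisfeiler-Lehman hashing over several node attributes can be sketched as follows. The sketch uses hashlib.blake2b, so digest_size here is a byte count and the hex string is twice as long. It assumes graphs are adjacency dicts with edge attributes keyed by frozenset({u, v}). wl_hash is hypothetical and is not guaranteed to reproduce WLHash values.

```python
import hashlib
from collections import Counter


def wl_hash(adjacency, node_attrs, edge_attrs, keys=("element", "charge"),
            edge_key="order", iterations=3, digest_size=16):
    """Weisfeiler-Lehman graph hash over several node attributes (hypothetical sketch).

    adjacency: {node: set(neighbors)}; node_attrs: {node: dict};
    edge_attrs: {frozenset({u, v}): dict}.
    """
    def h(text):
        return hashlib.blake2b(text.encode(), digest_size=digest_size).hexdigest()

    # Initial label: the selected node attributes joined together.
    labels = {n: h("|".join(str(node_attrs[n].get(k)) for k in keys)) for n in adjacency}
    history = Counter(labels.values())
    for _ in range(iterations):
        # Refine: own label plus sorted (bond attribute + neighbor label) strings.
        labels = {
            n: h(labels[n] + "".join(sorted(
                str(edge_attrs[frozenset((n, m))].get(edge_key)) + labels[m]
                for m in adjacency[n]
            )))
            for n in adjacency
        }
        history.update(labels.values())
    # Graph hash: hash of the sorted multiset of every label seen.
    return h(str(sorted(history.items())))


adj_a = {1: {2}, 2: {1, 3}, 3: {2}}
attrs_a = {1: {"element": "C", "charge": 0}, 2: {"element": "C", "charge": 0},
           3: {"element": "O", "charge": 0}}
bonds_a = {frozenset((1, 2)): {"order": 1.0}, frozenset((2, 3)): {"order": 1.0}}
# Same molecule with string node IDs.
adj_b = {"o": {"c2"}, "c2": {"o", "c1"}, "c1": {"c2"}}
attrs_b = {"o": {"element": "O", "charge": 0}, "c2": {"element": "C", "charge": 0},
           "c1": {"element": "C", "charge": 0}}
bonds_b = {frozenset(("o", "c2")): {"order": 1.0}, frozenset(("c2", "c1")): {"order": 1.0}}
print(wl_hash(adj_a, attrs_a, bonds_a) == wl_hash(adj_b, attrs_b, bonds_b))  # True
```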
Functional groups#
- synkit.Graph.FG.api.smiles_to_graph_and_functional_groups(smiles: str, *, sanitize: bool = True) tuple[Graph, list[tuple[str, tuple[int, ...]]]][source]#
Convert SMILES to a molecular graph and detect functional groups.
Atom-mapped SMILES keep their non-zero atom-map numbers as graph node IDs. Unmapped atoms use their 1-based atom order as node IDs, so both mapped and unmapped SMILES can be passed to the same API.
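- Parameters:
smiles (str) – SMILES string, with or without atom-map numbers.
sanitize (bool) – Keyword-only flag; defaults to True.
- Returns:
Molecular graph and detected (name, node_ids) FG labels.
- Return type:
tuple[Graph, list[tuple[str, tuple[int, ...]]]]
- Raises:
ValueError – If the SMILES cannot be converted to a molecular graph.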
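The (name, node_ids) output format can be illustrated with a toy detector that recognizes only carbonyl and hydroxyl groups. The sketch assumes a graph made of a node dict carrying element and hcount and an edge dict mapping (u, v) to bond order. detect_toy_groups and its group names are hypothetical, not synkit's registry. It has no hierarchy resolution (compare matches() and the parents/suppresses fields of FunctionalGroupPattern), so it reports overlapping generic labels.

```python
def detect_toy_groups(nodes, edges):
    """Toy functional-group detector returning (name, node_ids) tuples (hypothetical).

    nodes: {id: {"element": str, "hcount": int}}
    edges: {(u, v): bond_order}
    """
    labels = []
    for (u, v), order in edges.items():
        pair = {nodes[u]["element"]: u, nodes[v]["element"]: v}
        if set(pair) == {"C", "O"}:  # only C-O bonds are inspected
            c, o = pair["C"], pair["O"]
            if order == 2:
                labels.append(("carbonyl", (c, o)))
            elif order == 1 and nodes[o].get("hcount", 0) >= 1:
                labels.append(("hydroxyl", (o,)))
    return sorted(labels)


# Acetic acid: CH3(1)-C(2)(=O3)-O(4)H
acetic_nodes = {1: {"element": "C", "hcount": 3}, 2: {"element": "C", "hcount": 0},
                3: {"element": "O", "hcount": 0}, 4: {"element": "O", "hcount": 1}}
acetic_edges = {(1, 2): 1, (2, 3): 2, (2, 4): 1}
print(detect_toy_groups(acetic_nodes, acetic_edges))
# [('carbonyl', (2, 3)), ('hydroxyl', (4,))]
```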
- class synkit.Graph.FG.audit.FunctionalGroupAudit(reactions: int, molecules: int, parse_failures: int, elapsed_seconds: float, label_counts: Counter[str], heteroaromatic_systems: int, named_heteroaromatic_systems: int, unnamed_heteroaromatic_systems: Counter[tuple], uncovered_atom_signatures: Counter[tuple], uncovered_edge_signatures: Counter[tuple])[source]#
Bases: object
Aggregated detector coverage over a reaction-SMILES corpus.
- synkit.Graph.FG.audit.audit_reaction_smiles(reactions: Iterable[str], *, standardizer: Standardize | None = None) FunctionalGroupAudit[source]#
Audit FG coverage for an iterable of reaction SMILES strings.
- synkit.Graph.FG.catalog.default_registry() FunctionalGroupRegistry[source]#
Build the default graph-native functional-group registry.
- class synkit.Graph.FG.detector.FunctionalGroupDetector(registry: FunctionalGroupRegistry | None = None)[source]#
Bases: object
Detect functional groups from an input molecular nx.Graph.
- detect(graph: Graph) list[tuple[str, tuple[int, ...]]][source]#
Return simple (name, node_ids) functional-group labels.
- matches(graph: Graph) list[FunctionalGroupMatch][source]#
Return hierarchy-resolved matches.
- class synkit.Graph.FG.model.FunctionalGroupMatch(name: str, group_nodes: tuple[int, ...], mapping: dict[int, int], pattern: FunctionalGroupPattern)[source]#
Bases: object
One matched functional group in a host graph.
- pattern: FunctionalGroupPattern#
- class synkit.Graph.FG.model.FunctionalGroupPattern(name: str, graph: Graph, group_nodes: tuple[int, ...], parents: tuple[str, ...] = (), suppresses: tuple[str, ...] = (), requires: tuple[str, ...] = (), anchor_node: int | None = None, priority: int = 0, validator: Callable[[Graph, dict[int, int]], bool] | None = None, recognizer: Callable[[Graph, FunctionalGroupPattern], list[FunctionalGroupMatch]] | None = None, public: bool = True)[source]#
Bases: object
Graph-native functional-group definition.
- recognizer: Callable[[Graph, FunctionalGroupPattern], list[FunctionalGroupMatch]] | None = None#
- class synkit.Graph.FG.model.FunctionalGroupRegistry(patterns: list[FunctionalGroupPattern] = <factory>)[source]#
Bases: object
Container for functional-group patterns and hierarchy metadata.
- add(pattern: FunctionalGroupPattern) None[source]#
- by_name(name: str) FunctionalGroupPattern[source]#
- execution_order() list[FunctionalGroupPattern][source]#
Return patterns in prerequisite-respecting order.
- extend(patterns: Iterable[FunctionalGroupPattern]) None[source]#
- is_ancestor(ancestor: str, child: str) bool[source]#
Return whether ancestor is an ancestor of child.
- patterns: list[FunctionalGroupPattern]#
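A prerequisite-respecting order is a topological sort. Assuming each pattern's requires tuple lists the names that must run first, Python's standard graphlib can compute such an order. The function and the pattern names below are hypothetical stand-ins for FunctionalGroupRegistry.execution_order() and real catalog entries.

```python
from graphlib import TopologicalSorter


def execution_order(requires):
    """Prerequisite-respecting order of pattern names (hypothetical sketch).

    requires: {pattern_name: iterable of names that must run first},
    a stand-in for FunctionalGroupPattern.requires.
    Raises graphlib.CycleError if the requirements are circular.
    """
    sorter = TopologicalSorter({name: set(deps) for name, deps in requires.items()})
    return list(sorter.static_order())


order = execution_order({
    "carboxylic_acid": ("carbonyl", "hydroxyl"),
    "ester": ("carbonyl",),
    "carbonyl": (),
    "hydroxyl": (),
})
print(order)
```

Independent patterns may come out in any relative order. Only the prerequisite constraints are guaranteed.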
- class synkit.Graph.FG.ring_system.AromaticRingSystem(nodes: tuple[int, ...], edges: tuple[tuple[int, int], ...], hetero_nodes: tuple[int, ...], element_counts: dict[str, int], ring_sizes: tuple[int, ...], subrings: tuple[AromaticSubring, ...], is_fused: bool, hetero_sequence: tuple[str, ...] | None, hetero_pattern: str)[source]#
Bases: object
One connected aromatic ring system from a molecular graph.
- subrings: tuple[AromaticSubring, ...]#
- class synkit.Graph.FG.ring_system.AromaticRingSystemDetector[source]#
Bases: object
Extract aromatic connected components and characterize their rings.
- static detect(graph: Graph) list[AromaticRingSystem][source]#
Hydrogen utilities#
- synkit.Graph.Hyrogen._misc.check_equivariant_graph(its_graphs: List[Graph]) Tuple[List[Tuple[int, int]], int][source]#
Checks for isomorphism among a list of ITS graphs.
Parameters: - its_graphs (List[nx.Graph]): A list of ITS graphs.
Returns: - List[Tuple[int, int]]: A list of tuples representing pairs of indices of isomorphic graphs.
- synkit.Graph.Hyrogen._misc.check_explicit_hydrogen(graph: Graph) tuple[source]#
Counts the explicit hydrogen nodes in the given graph and collects their IDs.
Parameters: - graph (nx.Graph): The graph to inspect.
Returns: tuple: A tuple containing the number of hydrogen nodes and a list of their node IDs.
- synkit.Graph.Hyrogen._misc.check_hcount_change(react_graph: Graph, prod_graph: Graph) int[source]#
Computes the maximum change in hydrogen count (‘hcount’) between corresponding nodes in the reactant and product graphs. It considers both hydrogen formation and breakage.
Parameters: - react_graph (nx.Graph): The graph representing reactants. - prod_graph (nx.Graph): The graph representing products.
Returns: int: The maximum hydrogen change observed across all nodes.
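The quantity check_hcount_change reports can be sketched as a maximum absolute difference. The sketch assumes reactant and product hydrogen counts have already been read into dicts keyed by shared atom-map numbers. max_hcount_change is hypothetical and works on plain dicts rather than nx.Graph objects.

```python
def max_hcount_change(react_h, prod_h):
    """Largest absolute hydrogen-count change over nodes present on both sides.

    react_h / prod_h: {node_id: hcount} keyed by shared atom-map numbers
    (a simplification of reading 'hcount' from two nx.Graph objects).
    abs() covers both H gain (formation) and H loss (breakage).
    """
    shared = react_h.keys() & prod_h.keys()
    return max((abs(react_h[n] - prod_h[n]) for n in shared), default=0)


print(max_hcount_change({1: 3, 2: 1, 3: 0}, {1: 2, 2: 1, 3: 1}))  # 1
print(max_hcount_change({1: 0}, {1: 2}))  # 2
```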
- synkit.Graph.Hyrogen._misc.get_cycle_member_rings(G: Graph, type='minimal') List[int][source]#
Identifies the rings of the graph from a cycle basis, a set of independent cycles from which every other cycle can be built, and returns the sizes of these cycles (member rings) sorted in ascending order.
Parameters: - G (nx.Graph): The NetworkX graph to be analyzed. - type (str): Kind of cycle basis to compute; defaults to 'minimal'.
Returns: - List[int]: A sorted list of cycle sizes (member rings) found in the graph.
- synkit.Graph.Hyrogen._misc.get_priority(reaction_centers: List[Any]) List[int][source]#
Evaluate reaction centers for specific graph characteristics, selecting indices based on the shortest reaction paths and maximum ring sizes, and adjusting for certain graph types by modifying the ring information.
Parameters: - reaction_centers: List[Any], a list of reaction centers where each center should be capable of being analyzed for graph type and ring sizes.
Returns: - List[int]: A list of indices from the original list of reaction centers that meet the criteria of having the shortest reaction steps and/or the largest ring sizes. Returns indices with minimum reaction steps if no indices meet both criteria.
- synkit.Graph.Hyrogen._misc.h_to_explicit(G: Graph, nodes: List[int] = None, its: bool = False) Graph[source]#
Convert implicit hydrogen counts on heavy atoms into explicit hydrogen nodes.
For each node ID in nodes, this function reads the node’s ‘hcount’, adds that many new hydrogen nodes, connects them to the node with a single bond (order=1.0), and decrements the node’s ‘hcount’. Optionally updates the ‘typesGH’ field if present.
Parameters#
- G : nx.Graph
Input graph with heavy atoms containing ‘hcount’ indicating implicit hydrogens.
- nodes : List[int]
List of node IDs (typically heavy atoms) on which to expand implicit hydrogens.
Returns#
- nx.Graph
A copy of the graph with new explicit hydrogen nodes added and connected to the specified heavy atoms.
- synkit.Graph.Hyrogen._misc.h_to_implicit(G: Graph) Graph[source]#
Convert explicit hydrogen atoms to implicit counts on heavy atoms.
For each hydrogen atom (‘element’ == ‘H’), its neighbor (assumed to be a heavy atom) will have its ‘hcount’ attribute incremented. The hydrogen nodes are then removed.
Parameters#
- G : nx.Graph
Input graph with explicit hydrogen atoms as nodes (element=’H’). Heavy atoms must have ‘element’ and optionally ‘hcount’ attributes.
Returns#
- nx.Graph
A copy of the original graph with hydrogen atoms removed and their counts added to the corresponding heavy atoms’ ‘hcount’ attribute.
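The two conversions above are inverses. The sketch below checks this with a round trip on plain dicts (nodes carrying element and hcount, edges mapping (u, v) to bond order) instead of nx.Graph. Both helpers are hypothetical and assume integer node IDs. Like the documented functions, they work on copies. H-H bonds, as in H2, are simply dropped by the implicit conversion here.

```python
import copy


def h_to_implicit_sketch(nodes, edges):
    """Fold explicit H nodes into their heavy-atom neighbor's 'hcount' (hypothetical)."""
    nodes = copy.deepcopy(nodes)
    hydrogens = {n for n, d in nodes.items() if d["element"] == "H"}
    kept_edges = {}
    for (u, v), order in edges.items():
        if u in hydrogens and v not in hydrogens:
            nodes[v]["hcount"] = nodes[v].get("hcount", 0) + 1
        elif v in hydrogens and u not in hydrogens:
            nodes[u]["hcount"] = nodes[u].get("hcount", 0) + 1
        elif u not in hydrogens and v not in hydrogens:
            kept_edges[(u, v)] = order  # heavy-heavy bonds survive
    for h in hydrogens:
        del nodes[h]
    return nodes, kept_edges


def h_to_explicit_sketch(nodes, edges, targets):
    """Expand 'hcount' on target nodes into explicit H nodes with order 1.0 (hypothetical)."""
    nodes = copy.deepcopy(nodes)
    edges = dict(edges)
    next_id = max(nodes) + 1  # assumes integer node IDs
    for n in targets:
        for _ in range(nodes[n].get("hcount", 0)):
            nodes[next_id] = {"element": "H", "hcount": 0}
            edges[(n, next_id)] = 1.0
            next_id += 1
        nodes[n]["hcount"] = 0  # all implicit H are now explicit
    return nodes, edges


methanol = {1: {"element": "C", "hcount": 3}, 2: {"element": "O", "hcount": 1}}
bonds = {(1, 2): 1.0}
exp_nodes, exp_edges = h_to_explicit_sketch(methanol, bonds, targets=[1, 2])
print(sum(d["element"] == "H" for d in exp_nodes.values()))  # 4
imp_nodes, imp_edges = h_to_implicit_sketch(exp_nodes, exp_edges)
print(imp_nodes == methanol, imp_edges == bonds)  # True True
```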
- synkit.Graph.Hyrogen._misc.has_HH(G: Graph) bool[source]#
Check whether the graph contains any heavy atom–hydrogen bond.
A heavy atom is any atom whose ‘element’ attribute is not ‘H’. This function searches for any edge that connects a heavy atom to a hydrogen atom.
Parameters#
- G : nx.Graph
A graph where each node has an ‘element’ attribute indicating the atom type.
Returns#
- bool
True if at least one edge connects a hydrogen atom (‘H’) to a heavy atom (element ≠ ‘H’). False otherwise.
- synkit.Graph.Hyrogen._misc.has_XH(G: Graph) bool[source]#
Check whether the graph contains any heavy atom–hydrogen bond.
A heavy atom is any atom whose ‘element’ attribute is not ‘H’. This function searches for any edge that connects a heavy atom to a hydrogen atom.
Parameters#
- G : nx.Graph
A graph where each node has an ‘element’ attribute indicating the atom type.
Returns#
- bool
True if at least one edge connects a hydrogen atom (‘H’) to a heavy atom (element ≠ ‘H’). False otherwise.
- synkit.Graph.Hyrogen._misc.implicit_hydrogen(graph: Graph, preserve_atom_maps: Set[int], reindex: bool = False) Graph[source]#
Adds implicit hydrogens to a molecular graph and removes non-preserved hydrogens. This function operates on a deep copy of the input graph to avoid in-place modifications. It counts the hydrogen neighbors of each non-hydrogen node, adjusts that count for hydrogens that must be preserved, and removes all non-preserved hydrogen nodes.
Parameters: - graph (nx.Graph): A NetworkX graph representing the molecule, where each node has an ‘element’ attribute for the element type (e.g., ‘C’, ‘H’) and an ‘atom_map’ attribute for atom mapping. - preserve_atom_maps (Set[int]): Set of atom-map numbers of hydrogens that should be preserved. - reindex (bool): If True, reindexes node indices and atom maps sequentially after modifications.
Returns: - nx.Graph: A new NetworkX graph in which non-preserved hydrogens have been removed and hydrogen counts on non-hydrogen atoms adjusted accordingly.
- synkit.Graph.Hyrogen._misc.normalize_edge_orders(G: Graph) None[source]#
Normalize all edge attributes of G in place:
- If ‘order’ is a float or int, replace it with (order, order).
- If ‘standard_order’ is missing, set it to 0.0.
- synkit.Graph.Hyrogen._misc.normalize_h_pair_graph(rc_graph: Graph, inplace: bool = False) Graph[source]#
Normalize paired hydrogen counts for all ITS nodes.
New-style ITS nodes may store hcount directly as (reactant_hcount, product_hcount). Legacy ITS nodes may instead store hydrogen counts inside typesGH, as typesGH = (reactant_attr, product_attr), where each side tuple has the form (element, aromatic, hydrogen_count, charge, neighbors). Both representations are normalized when present.
- Parameters:
rc_graph (nx.Graph) – Reaction-center graph.
inplace (bool) – Whether to modify the input graph in place.
- Returns:
Graph with normalized typesGH hydrogen fields.
- Return type:
nx.Graph
- synkit.Graph.Hyrogen._misc.standardize_hydrogen(G: Graph, in_place: bool = False) Graph[source]#
For each node, shift the third element (index 2) of each tuple in ‘typesGH’ so that the minimum among those values becomes zero. Nonconforming entries are preserved.
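The per-node shift can be sketched on a plain {node: typesGH} dict. It assumes the legacy layout described for normalize_h_pair_graph: a (reactant_tuple, product_tuple) pair with the hydrogen count at index 2. standardize_hydrogen_sketch is hypothetical. Entries that are not long-enough tuples are left unchanged, matching the note about nonconforming entries.

```python
def standardize_hydrogen_sketch(types_gh_by_node):
    """Shift index 2 of every typesGH tuple so each node's minimum becomes zero (hypothetical).

    types_gh_by_node: {node: (reactant_tuple, product_tuple)}
    """
    result = {}
    for node, pair in types_gh_by_node.items():
        counts = [t[2] for t in pair if isinstance(t, tuple) and len(t) > 2]
        if not counts:
            result[node] = pair  # nonconforming entry: keep as-is
            continue
        shift = min(counts)
        result[node] = tuple(
            t[:2] + (t[2] - shift,) + t[3:] if isinstance(t, tuple) and len(t) > 2 else t
            for t in pair
        )
    return result


types = {
    1: (("C", False, 3, 0, ["O"]), ("C", False, 2, 0, ["O"])),
    2: ("legacy", "entry"),  # nonconforming: left untouched
}
print(standardize_hydrogen_sketch(types)[1])
# (('C', False, 1, 0, ['O']), ('C', False, 0, 0, ['O']))
```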
- class synkit.Graph.Hyrogen.hcomplete.HComplete[source]#
Bases: object
A class for inferring hydrogens to complete reaction-center (RC) or ITS graphs.
- static add_hydrogen_nodes_multiple(react_graph: Graph, prod_graph: Graph, ignore_aromaticity: bool, balance_its: bool, get_priority_graph: bool = False) List[Tuple[Graph, Graph]][source]#
Generates multiple permutations of reactant and product graphs by adjusting hydrogen counts, exploring all possible configurations of hydrogen node additions or removals.
Parameters: - react_graph (nx.Graph): The reactant graph. - prod_graph (nx.Graph): The product graph. - ignore_aromaticity (bool): If True, aromaticity is ignored. - balance_its (bool): If True, attempts to balance the ITS by adjusting hydrogen nodes. - get_priority_graph (bool): If True, additional priority-based processing is applied to select optimal graph configurations.
Returns: - List[Tuple[nx.Graph, nx.Graph]]: A list of graph tuples, each representing a possible configuration of reactant and product graphs with adjusted hydrogen nodes.
- static add_hydrogen_nodes_multiple_utils(graph: Graph, node_id_pairs: Iterable[Tuple[int, int]], atom_map_update: bool = True) Graph[source]#
Creates and returns a new graph with added hydrogen nodes based on the input graph and node ID pairs.
Parameters: - graph (nx.Graph): The base graph to which the nodes will be added. - node_id_pairs (Iterable[Tuple[int, int]]): Pairs of node IDs (original node, new hydrogen node) to link with hydrogen. - atom_map_update (bool): If True, update the ‘atom_map’ attribute with the new hydrogen node ID; otherwise, retain the original node’s ‘atom_map’.
Returns: - nx.Graph: A new graph instance with the added hydrogen nodes.
- process_graph_data_parallel(graph_data_list: List[Dict[str, Graph]], its_key: str = 'ITS', rc_key: str = 'RC', n_jobs: int = 1, verbose: int = 0, ignore_aromaticity: bool = False, balance_its: bool = True, get_priority_graph: bool = False, max_hydrogen: int = 7) List[Dict[str, Graph | None]][source]#
Processes a list of graph data dictionaries in parallel, performing hydrogen completion and related graph modifications on each entry.
Parameters: - graph_data_list (List[Dict[str, nx.Graph]]): List of dictionaries containing the graph data. - its_key (str): Key where the ITS graph is stored. - rc_key (str): Key where the RC graph is stored. - n_jobs (int): Number of parallel jobs to run. - verbose (int): Verbosity level for the parallel process. - ignore_aromaticity (bool): If True, aromaticity is ignored during processing. Default is False. - balance_its (bool): If True, the ITS is balanced. Default is True. - get_priority_graph (bool): If True, priority is given to graph data during processing. Default is False. - max_hydrogen (int): Maximum number of hydrogens that can be handled in the inference step.
Returns: - List[Dict[str, Optional[nx.Graph]]]: List of dictionaries with updated ITS and RC graph data, or None if processing fails.
- static process_multiple_hydrogens(graph_data: Dict[str, Graph], its_key: str, rc_key: str, react_graph: Graph, prod_graph: Graph, ignore_aromaticity: bool, balance_its: bool, get_priority_graph: bool = False) Dict[str, Graph | None][source]#
Handles significant hydrogen count changes between reactant and product graphs, adjusting hydrogen nodes accordingly and assessing graph equivalence.
Parameters: - graph_data (Dict[str, nx.Graph]): Dictionary containing the graph data. - its_key (str): Key for the ITS graph in the dictionary. - rc_key (str): Key for the RC graph in the dictionary. - react_graph (nx.Graph): Graph representing the reactants. - prod_graph (nx.Graph): Graph representing the products. - ignore_aromaticity (bool): If True, aromaticity will not be considered in processing. - balance_its (bool): If True, balances the ITS graph. - get_priority_graph (bool): If True, processes graphs with priority considerations.
Returns: - Dict[str, Optional[nx.Graph]]: Updated graph dictionary with potentially modified ITS and RC graphs.
- static process_single_graph_data(graph_data: Dict[str, Graph], its_key: str = 'ITS', rc_key: str = 'RC', ignore_aromaticity: bool = False, balance_its: bool = True, get_priority_graph: bool = False, max_hydrogen: int = 7) Dict[str, Graph | None][source]#
Processes a single graph data dictionary by modifying hydrogen counts and other features based on configuration settings.
Parameters: - graph_data (Dict[str, nx.Graph]): Dictionary containing the graph data. - its_key (str): Key where the ITS graph is stored. - rc_key (str): Key where the RC graph is stored. - ignore_aromaticity (bool): If True, aromaticity is ignored during processing. Default is False. - balance_its (bool): If True, the ITS is balanced. Default is True. - get_priority_graph (bool): If True, priority is given to graph data during processing. Default is False. - max_hydrogen (int): Maximum number of hydrogens that can be handled in the inference step.
Returns: - Dict[str, Optional[nx.Graph]]: Dictionary with updated ITS and RC graph data, or None if processing fails.
- class synkit.Graph.Hyrogen.hextend.HExtend[source]#
Bases: HComplete
- static fit(data, its_key: str, rc_key: str, ignore_aromaticity: bool = False, balance_its: bool = True, n_jobs: int = 1, verbose: int = 0) List[source]#
Fit the model to the data in parallel, processing each entry to generate new graph data based on the ITS and reaction graph keys.
Parameters: - data (iterable): Data to be processed. - its_key (str): Key for the ITS graphs in the data. - rc_key (str): Key for the reaction graphs in the data. - ignore_aromaticity (bool): Whether to ignore aromaticity during processing. Defaults to False. - balance_its (bool): Whether to balance the ITS during processing. Defaults to True. - n_jobs (int): Number of jobs to run in parallel. Defaults to 1. - verbose (int): Verbosity level for parallel processing. Defaults to 0.
Returns: - List: A list containing the results of the processed data.
- static get_unique_graphs_for_clusters(graphs: List[Graph], cluster_indices: List[set]) List[Graph][source]#
Retrieve a unique graph for each cluster from a list of graphs based on cluster indices.
This method selects one graph per cluster based on the first index found in each cluster set. Note: Clusters are expected to be represented as sets of indices, each corresponding to a graph in the graphs list.
Parameters: - graphs (List[nx.Graph]): List of networkx graphs. - cluster_indices (List[set]): List of sets, each containing indices representing graphs that belong to the same cluster.
Returns: - List[nx.Graph]: A list containing one unique graph from each cluster. The graph chosen is the one corresponding to the first index in each cluster set, which is arbitrary due to the unordered nature of sets.
Raises: - ValueError: If any index in cluster_indices is out of the range of graphs. - TypeError: If cluster_indices is not a list of sets.
ITS#
- class synkit.Graph.ITS.its_builder.ITSBuilder[source]#
Bases: object
Build and annotate an Imaginary Transition State (ITS) graph from a base graph and a reaction-center (RC) graph.
- Variables:
None – This class only provides static methods and does not maintain state.
- static ITSGraph(G: Graph, RC: Graph) Graph[source]#
Create an ITS graph by merging attributes from a reaction-center graph (RC) into a copy of the base graph G and initializing transition-state metadata.
- The returned ITS graph will have:
A deep copy of G’s nodes and edges.
A new node attribute ‘typesGH’ storing G‑side and H‑side element/aromaticity/etc.
Edge attributes: - ‘order’: tuple of the original order replicated for G and H. - ‘standard_order’: initialized to 0.0.
All node and edge attributes from RC grafted onto corresponding nodes/edges in the copy of G, matched by RC’s ‘atom_map’ values.
A final renumbering of ‘atom_map’ to each node’s index.
- Parameters:
G (nx.Graph) – The original molecular graph representing either reactants or products.
RC (nx.Graph) – The reaction-center graph containing updated atom and bond changes.
- Returns:
A new graph representing the ITS, with merged and initialized attributes.
- Return type:
nx.Graph
- Raises:
KeyError – If a required attribute is missing from G or RC during merging.
- Example:
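>>> import networkx as nx
>>> from synkit.Graph.ITS.its_builder import ITSBuilder
>>> from synkit.Graph.ITS.its_construction import ITSConstruction
>>> base = nx.Graph()
>>> # ... populate base with 'atom_map' and other attrs ...
>>> # some_other_graph: a second graph, e.g. the other side of the reaction
>>> rc = ITSConstruction().ITSGraph(base, some_other_graph)
>>> its = ITSBuilder.ITSGraph(base, rc)
>>> isinstance(its, nx.Graph)
True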
- static update_atom_map(graph: Graph) None[source]#
Reset and renumber the ‘atom_map’ attribute of every node to match its node index.
- Parameters:
graph (nx.Graph) – The graph whose nodes will be renumbered.
- Returns:
None
- Return type:
NoneType
- Example:
>>> import networkx as nx
>>> from synkit.Graph.ITS.its_builder import ITSBuilder
>>> G = nx.Graph()
>>> G.add_node(5)
>>> ITSBuilder.update_atom_map(G)
>>> G.nodes[5]['atom_map']
5
- class synkit.Graph.ITS.its_construction.ITSConstruction[source]#
Bases: object
Utility class for constructing an ITS graph from two input graphs.
Nodes store paired state information through the typesGH attribute. Edges store paired attributes directly, such as order=(g, h), without an edge-level typesGH.
The main public entry point is construct().
- CORE_EDGE_DEFAULTS: Dict[str, Any] = {'bond_type': '', 'conjugated': False, 'ez_isomer': '', 'in_ring': False, 'kekule_order': 0.0, 'order': 0.0, 'pi_order': 0.0, 'sigma_order': 0.0}#
- CORE_NODE_DEFAULTS: Dict[str, Any] = {'aromatic': False, 'atom_map': 0, 'charge': 0, 'element': '*', 'hcount': 0, 'hybridization': '', 'lone_pairs': 0, 'neighbors': <function ITSConstruction.<lambda>>, 'partial_charge': 0, 'radical': 0, 'valence_electrons': 0}#
- static ITSGraph(G: Graph, H: Graph, ignore_aromaticity: bool = False, attributes_defaults: Dict[str, Any] | None = None, balance_its: bool = False, store: bool = False) Graph[source]#
Backward-compatible wrapper around construct().
- Parameters:
G (nx.Graph) – First input graph.
H (nx.Graph) – Second input graph.
ignore_aromaticity (bool) – If True, small bond-order differences are ignored.
attributes_defaults (Optional[Dict[str, Any]]) – Optional node defaults for missing values.
balance_its (bool) – If True, prefer the smaller graph as base.
store (bool) – If True, node attributes are stored as paired tuples.
- Returns:
Constructed ITS graph using legacy node and edge attribute defaults.
- Return type:
nx.Graph
- static construct(G: Graph, H: Graph, *, ignore_aromaticity: bool = False, balance_its: bool = True, store: bool = True, node_attrs: List[str] | None = None, edge_attrs: List[str] | None = None, attributes_defaults: Dict[str, Any] | None = None) Graph[source]#
Construct an ITS graph from two input graphs.
Nodes store typesGH as paired tuples over node_attrs. Requested edge attributes are stored directly as paired values such as order=(g, h) and bond_type=(g, h). No edge-level typesGH is created.
- Parameters:
G (nx.Graph) – First input graph, typically the reactant-side graph.
H (nx.Graph) – Second input graph, typically the product-side graph.
ignore_aromaticity (bool) – If True, bond-order differences with absolute value smaller than 1 are treated as zero when computing standard_order.
balance_its (bool) – If True, initialize from the smaller graph; otherwise from the larger.
store (bool) – Controls node attribute storage only. If True, node attributes are stored as (G, H) tuples. If False, only the G-side value is stored. Edge attributes are always stored as paired tuples.
node_attrs (Optional[List[str]]) – Ordered list of node attributes included in node-level typesGH.
edge_attrs (Optional[List[str]]) – Ordered list of edge attributes stored directly as (G, H) tuples.
attributes_defaults (Optional[Dict[str, Any]]) – Optional overrides for node attribute defaults.
- Returns:
ITS graph with merged nodes, paired node/edge annotations, and derived standard_order.
- Return type:
nx.Graph
Example#
from synkit.Graph.ITS.its_construction import ITSConstruction

# r_graph, p_graph: reactant-side and product-side nx.Graph objects
node_attrs = [
    "element", "aromatic", "hcount", "charge",
    "neighbors", "hybridization", "atom_map", "lone_pairs",
]
edge_attrs = ["kekule_order", "order", "bond_type", "conjugated", "in_ring"]
its = ITSConstruction.construct(
    r_graph,
    p_graph,
    node_attrs=node_attrs,
    edge_attrs=edge_attrs,
    store=True,
)
print(its.edges[12, 30]["order"])
print(its.edges[12, 30]["bond_type"])
print(its.edges[12, 30]["standard_order"])
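The edge-pairing step of ITS construction can be sketched on plain dicts. Each bond appearing on either side gets order=(g, h), with 0.0 where the bond is absent, plus a standard_order derived from the difference. Two points are assumptions here, not taken from the text above: both sides share atom-map IDs as node keys, and standard_order is g - h. build_its is hypothetical and leaves out node typesGH handling entirely.

```python
def build_its(g_edges, h_edges, ignore_aromaticity=False):
    """Pair bond orders from two graphs into ITS edge annotations (hypothetical sketch).

    g_edges / h_edges: {frozenset({u, v}): bond_order} on shared atom-map IDs.
    """
    its = {}
    for edge in g_edges.keys() | h_edges.keys():
        g = g_edges.get(edge, 0.0)  # 0.0 means "no bond on this side"
        h = h_edges.get(edge, 0.0)
        std = g - h  # assumed sign convention: reactant minus product
        if ignore_aromaticity and abs(std) < 1:
            std = 0.0  # e.g. 1.5 (aromatic) vs 2.0 (Kekule) is not a real change
        its[edge] = {"order": (g, h), "standard_order": std}
    return its


g_bonds = {frozenset({1, 2}): 1.0, frozenset({2, 3}): 1.0}
h_bonds = {frozenset({1, 2}): 2.0, frozenset({3, 4}): 1.0}
its = build_its(g_bonds, h_bonds)
print(its[frozenset({1, 2})])  # {'order': (1.0, 2.0), 'standard_order': -1.0}
```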
- static get_node_attribute(graph: Graph, node: Hashable, attribute: str, default: Any) Any[source]#
Retrieve a node attribute or return a default if missing.
- Parameters:
graph (nx.Graph) – Input graph.
node (Hashable) – Node identifier.
attribute (str) – Attribute name.
default (Any) – Fallback value.
- Returns:
Stored node attribute or fallback default.
- Return type:
Any
- static get_node_attributes_with_defaults(graph: Graph, node: Hashable, attributes_defaults: Dict[str, Any] = None) Tuple[source]#
Retrieve multiple node attributes using provided defaults.
- Parameters:
graph (nx.Graph) – Input graph.
node (Hashable) – Node identifier.
attributes_defaults (Optional[Dict[str, Any]]) – Mapping from attribute names to fallback values.
- Returns:
Tuple of node attributes in mapping order.
- Return type:
Tuple
Example#
attrs = ITSConstruction.get_node_attributes_with_defaults(
    graph=G,
    node=1,
    attributes_defaults={
        "element": "*",
        "aromatic": False,
        "hcount": 0,
        "charge": 0,
        "neighbors": ["", ""],
    },
)
- static typesGH_info(node_attrs: List[str] | None = None, edge_attrs: List[str] | None = None) Dict[str, Dict[str, Tuple[type, Any]]][source]#
Provide expected types and defaults for node and edge attributes.
- Parameters:
- Returns:
Nested mapping describing
(type, default)for each selected attribute.- Return type:
- synkit.Graph.ITS.its_decompose.get_rc(ITS: Graph, element_key: List[str] = ['element', 'charge', 'typesGH', 'atom_map'], bond_key: str = 'order', standard_key: str = 'standard_order', disconnected: bool = False, keep_mtg: bool = False) Graph[source]#
Extract the reaction-center (RC) subgraph from an ITS graph.
- synkit.Graph.ITS.its_decompose.its_decompose(its_graph: Graph, nodes_share='typesGH', edges_share='order')[source]#
Decompose an ITS graph into two separate reactant (G) and product (H) graphs.
- Nodes and edges in its_graph carry composite attributes:
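Each node has its_graph.nodes[n][nodes_share] = (node_attrs_G, node_attrs_H).
Each edge has its_graph.edges[u, v][edges_share] = (order_G, order_H).
This function splits those tuples to reconstruct the original G and H graphs.
- Parameters:
its_graph (nx.Graph) – ITS graph carrying paired node and edge annotations.
nodes_share (str) – Node attribute key holding the (G, H) pair; defaults to 'typesGH'.
edges_share (str) – Edge attribute key holding the (G, H) pair; defaults to 'order'.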
- Returns:
A tuple of two graphs (G, H) reconstructed from the ITS.
- Return type:
Tuple[nx.Graph, nx.Graph]
- Example:
>>> import networkx as nx
>>> from synkit.Graph.ITS.its_decompose import its_decompose
>>> its = nx.Graph()
>>> # ... set its.nodes[n]['typesGH'] and its.edges[e]['order'] ...
>>> G, H = its_decompose(its)
>>> isinstance(G, nx.Graph) and isinstance(H, nx.Graph)
True
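Splitting the paired annotations can be sketched on plain dicts. Here each node value stands in for its (G, H) typesGH pair, simplified to element strings, and each edge value for its (order_G, order_H) tuple. It is assumed, not stated above, that a bond is kept on a side only when its order on that side is non-zero. decompose_its is hypothetical.

```python
def decompose_its(its_nodes, its_edges):
    """Split paired ITS annotations back into G and H (hypothetical sketch).

    its_nodes: {node: (attrs_G, attrs_H)}   (stand-in for the typesGH pair)
    its_edges: {(u, v): (order_G, order_H)} (stand-in for the 'order' pair)
    """
    g_nodes = {n: pair[0] for n, pair in its_nodes.items()}
    h_nodes = {n: pair[1] for n, pair in its_nodes.items()}
    # An order of 0 on one side means the bond does not exist on that side.
    g_edges = {e: o[0] for e, o in its_edges.items() if o[0] != 0}
    h_edges = {e: o[1] for e, o in its_edges.items() if o[1] != 0}
    return (g_nodes, g_edges), (h_nodes, h_edges)


its_nodes = {1: ("C", "C"), 2: ("O", "O"), 3: ("H", "H")}
its_edges = {(1, 2): (1.0, 2.0), (2, 3): (1.0, 0.0)}
(g_nodes, g_edges), (h_nodes, h_edges) = decompose_its(its_nodes, its_edges)
print(g_edges)  # {(1, 2): 1.0, (2, 3): 1.0}
print(h_edges)  # {(1, 2): 2.0}
```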
- class synkit.Graph.ITS.its_destruction.ITSDestruction(its_graph: Graph, node_attrs: List[str] | None = None, edge_share: str = 'order', edge_attrs: List[str] | None = None, clean_wildcard: bool = False)[source]#
Bases: object
Object-oriented helper to decompose an ITS graph back into its reactant (G) and product (H) graphs using the enhanced per-attribute tuple representation.
Node attributes such as ‘element’, ‘charge’, ‘hcount’, ‘aromatic’, and ‘atom_map’ are expected to be stored either directly on the node as (before, after) tuples (e.g., data[“element”] == (“C”, “C”)) or inside data[“typesGH”] as a dict mapping each attribute to such a tuple. Edges carry a (G, H) tuple under edge_share (default “order”), e.g. data[“order”] == (order_G, order_H).
- Example usage:
destr = ITSDestruction(its_graph, clean_wildcard=True)
G = destr.G
H = destr.H
- Parameters:
its_graph (nx.Graph) – ITS graph with merged node/edge annotations.
node_attrs (list[str] or None) – Names of node attributes to extract for decomposition. Defaults to [“element”, “charge”, “hcount”, “aromatic”, “atom_map”].
edge_share (str) – Edge attribute key storing the (G, H) tuple (typically “order”).
clean_wildcard (bool) – If True, automatically remove wildcard nodes (element == “*”) from G and H after decomposition.
- property G: Graph#
Reactant-like graph reconstructed from the ITS.
- Returns:
Graph corresponding to the ‘before’ side.
- Return type:
nx.Graph
- property H: Graph#
Product-like graph reconstructed from the ITS.
- Returns:
Graph corresponding to the ‘after’ side.
- Return type:
nx.Graph
- decompose() Tuple[Graph, Graph][source]#
Explicitly trigger decomposition and return (G, H).
- Returns:
Tuple of reconstructed graphs (G, H).
- Return type:
Tuple[nx.Graph, nx.Graph]
- help() str[source]#
Return a human-readable summary of this decomposer’s purpose and usage.
- Returns:
Description of how to use the decomposer.
- Return type:
str
- class synkit.Graph.ITS.its_expand.ITSExpand[source]#
Bases: object
Partially expand a reaction SMILES (RSMI) by reconstructing its imaginary transition state (ITS) graph and applying transformation rules based on the reaction center graph.
This class identifies the reaction center from an RSMI, builds and reconstructs the ITS graph, decomposes it back into reactants and products, and standardizes atom mappings to produce a fully mapped AAM RSMI.
The optional preserve_older_map mode keeps existing atom-map numbers from the input RSMI by reindexing the side graph before ITS reconstruction.
Notes#
preserve_older_map=True is intended for the ITS expansion path only. It should not be combined with relabel=True, because ITSRelabel globally renumbers atom maps.
- cvar std:
Standardize instance for reaction SMILES standardization.
- type std:
Standardize
- static expand_aam_with_its(rsmi: str, relabel: bool = False, use_G: bool = True, preserve_older_map: bool = False) str[source]#
Expand a partial reaction SMILES to a full AAM RSMI using ITS reconstruction.
- Parameters:
rsmi (str) – Reaction SMILES string in the format reactant>>product.
relabel (bool) – If True, directly apply ITSRelabel().fit(rsmi). This globally renumbers atom maps.
use_G (bool) – If True, expand using the reactant side. If False, expand using the product side.
preserve_older_map (bool) – If True, preserve existing nonzero atom-map numbers by reindexing the side graph before ITS reconstruction. This keeps old maps such as :20 attached to the same atom. This option is incompatible with relabel=True.
- Returns:
Fully atom-mapped reaction SMILES after ITS expansion and standardization.
- Return type:
str
- Raises:
ValueError – If input RSMI format is invalid, if incompatible options are used, or if side-graph reindexing is unsafe.
- Example:
>>> expander = ITSExpand()
>>> expander.expand_aam_with_its(
...     "CC[CH2:3][Cl:1].[N:2]>>CC[CH2:3][N:2].[Cl:1]",
...     preserve_older_map=True,
... )
'[CH3:1][CH2:2][CH2:3][Cl:4].[N:5]>>[CH3:1][CH2:2][CH2:3][N:5].[Cl:4]'
- static reindex_side_graph_by_atom_map(graph)[source]#
Reindex a side graph so mapped atoms use atom_map as node ID. The returned graph keeps node IDs contiguous from 1..N.
This is useful because the reaction-center graph produced by ITSConstruction().ITSGraph(...) uses atom-map numbers as node IDs, whereas the side graph produced by smiles_to_graph(...) may use RDKit-style atom indices as node IDs.
Example#
Before reindexing:
Node 20: atom_map = 0
Node 27: atom_map = 20
After reindexing:
Node 20: atom_map = 20
Node 27: atom_map = 0, or another unmapped atom may be moved into the freed node position.
- param graph:
Molecular side graph.
- type graph:
networkx.Graph
- returns:
Reindexed side graph with contiguous node IDs.
- rtype:
networkx.Graph
- raises ValueError:
If atom-map numbers cannot be safely used as node IDs while preserving 1..N indexing.
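The reindexing rule can be sketched with a side graph given as plain dicts. The sketch assumes that unmapped atoms fill the remaining IDs in the order of their old node IDs; the source only says another unmapped atom "may" take a freed position. reindex_by_atom_map is a hypothetical helper, not the SynKit method.

```python
from typing import Any, Dict, Tuple


def reindex_by_atom_map(nodes: Dict[int, Dict[str, Any]], edges: Dict[Tuple[int, int], Any]):
    """Hypothetical helper: mapped atoms get node ID == atom_map; others fill the rest of 1..N."""
    n = len(nodes)
    mapped = {old: d["atom_map"] for old, d in nodes.items() if d.get("atom_map", 0) > 0}
    targets = list(mapped.values())
    if len(set(targets)) != len(targets):
        raise ValueError("duplicate atom-map numbers cannot all become node IDs")
    if any(not 1 <= t <= n for t in targets):
        raise ValueError("an atom-map number falls outside 1..N")
    # Unmapped atoms take the remaining IDs, in order of their old node IDs (assumption).
    free_ids = iter(sorted(set(range(1, n + 1)) - set(targets)))
    relabel = {old: mapped[old] if old in mapped else next(free_ids) for old in sorted(nodes)}
    new_nodes = {relabel[old]: dict(d) for old, d in nodes.items()}
    new_edges = {(relabel[u], relabel[v]): d for (u, v), d in edges.items()}
    return new_nodes, new_edges


nodes = {
    1: {"element": "C", "atom_map": 0},
    2: {"element": "C", "atom_map": 3},
    3: {"element": "Cl", "atom_map": 0},
}
new_nodes, new_edges = reindex_by_atom_map(nodes, {(1, 2): 1.0, (2, 3): 1.0})
print(new_edges)  # {(1, 3): 1.0, (3, 2): 1.0}
```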
- class synkit.Graph.ITS.its_relabel.ITSRelabel[source]#
Bases: object
Extend reaction SMILES through atom-map alignment between reactant and product SynGraphs.
- Variables:
logger – Logger instance for debug and info messages.
graph_to_mol – Converter from SynGraph to RDKit Mol.
- fit(rsmi: str) str[source]#
Generate an extended reaction SMILES by aligning atom maps of reactant and product.
- Parameters:
rsmi (str) – Reaction SMILES string formatted as ‘reactant>>product’.
- Returns:
Extended reaction SMILES after remapping.
- Return type:
str
- Raises:
ValueError – If input format is invalid or graphs are not isomorphic.
- Example:
>>> its = ITSRelabel()
>>> its.fit('CCO:1>>CC=O:1')
'CCO>>CC=O'
- class synkit.Graph.ITS.normalize_aam.NormalizeAAM[source]#
Bases: object
Provides functionalities to normalize atom mappings in SMILES representations, extract and process reaction centers from ITS graphs, and convert between graph representations and molecular models.
- static extract_subgraph(graph: Graph, indices: List[int]) Graph[source]#
Extracts a subgraph from a given graph based on a list of node indices.
Parameters:
- graph (nx.Graph): The original graph from which to extract the subgraph.
- indices (List[int]): A list of node indices that define the subgraph.
Returns: nx.Graph: The extracted subgraph.
- fit(rsmi: str, fix_aam_indice: bool = True) str[source]#
Processes a reaction SMILES (RSMI) to adjust atom mappings, extract reaction centers, decompose into separate reactant and product graphs, and generate the corresponding SMILES.
Parameters:
- rsmi (str): The reaction SMILES string to be processed.
- fix_aam_indice (bool): Whether to fix the atom mapping numbers. Defaults to True.
Returns: str: The resulting reaction SMILES string with updated atom mappings.
- static fix_kekulize(smiles: str) str[source]#
Filters and returns valid SMILES strings from a string of SMILES, joined by ‘.’.
This function processes a string of SMILES separated by periods (e.g., “CCO.CC=O”), filters out invalid SMILES, and returns a string of valid SMILES joined by periods.
Parameters:
- smiles (str): A string containing SMILES strings separated by periods (‘.’).
Returns:
- str: A string of valid SMILES, joined by periods (‘.’).
- static fix_rsmi_kekulize(rsmi: str) str[source]#
Filters the reactants and products of a reaction SMILES string.
Parameters:
- rsmi (str): A string representing the reaction SMILES in the form of “reactants >> products”.
Returns:
- str: A filtered reaction SMILES string where invalid reactants/products are removed.
- reset_indices_and_atom_map(subgraph: Graph, aam_key: str = 'atom_map') Graph[source]#
Resets the node indices and the atom_map of the subgraph to be continuous from 1 onwards.
Parameters:
- subgraph (nx.Graph): The subgraph with possibly non-continuous indices.
- aam_key (str): The attribute key for atom mapping. Defaults to ‘atom_map’.
Returns: nx.Graph: A new subgraph with continuous indices and adjusted atom_map.
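Here is a minimal sketch of the reset using plain dicts. It assumes that the new indices follow the sorted order of the old node IDs, which the docstring does not state. reset_indices is a hypothetical stand-in, not the NormalizeAAM method.

```python
from typing import Any, Dict, Tuple


def reset_indices(nodes: Dict[int, Dict[str, Any]], edges: Dict[Tuple[int, int], Any],
                  aam_key: str = "atom_map"):
    """Hypothetical helper: relabel nodes 1..N (sorted old IDs) and set aam_key to the new ID."""
    relabel = {old: new for new, old in enumerate(sorted(nodes), start=1)}
    new_nodes = {}
    for old, data in nodes.items():
        attrs = dict(data)  # copy so the input graph is not modified
        attrs[aam_key] = relabel[old]
        new_nodes[relabel[old]] = attrs
    new_edges = {(relabel[u], relabel[v]): d for (u, v), d in edges.items()}
    return new_nodes, new_edges


nodes = {5: {"element": "C", "atom_map": 5}, 9: {"element": "O", "atom_map": 9}}
new_nodes, new_edges = reset_indices(nodes, {(5, 9): {"order": 2.0}})
print(new_nodes)  # {1: {'element': 'C', 'atom_map': 1}, 2: {'element': 'O', 'atom_map': 2}}
```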
- class synkit.Graph.ITS.partial_its.PartialITS[source]#
Bases: object
Utility class for building partial Imaginary‑Transition‑State (ITS) graphs from a pair of reactant/product networkx graphs.
The resulting ITS graph contains:
- a union of nodes from G (reactant) and H (product),
- a per‑node attribute typesGH, a 2‑tuple (attrs_from_G, attrs_from_H), where a missing side is filled by the present one,
- edges categorised as unchanged, broken or formed and stored as an order tuple (o_G, o_H), and
- a convenience edge attribute standard_order = o_G - o_H (optionally zeroed when |Δ| < 1 to ignore aromaticity changes).
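The edge annotation can be sketched with plain dicts keyed by (u, v) with u < v. This is an illustration under stated assumptions, not PartialITS.construct. The name its_edge_annotations is hypothetical, a missing edge counts as order 0, and any edge present on both sides is labelled "unchanged" here even if its order changes; the order change remains visible in standard_order.

```python
from typing import Dict, Tuple

Edge = Tuple[int, int]  # stored with u < v


def its_edge_annotations(g_edges: Dict[Edge, float], h_edges: Dict[Edge, float],
                         ignore_aromaticity: bool = False) -> Dict[Edge, dict]:
    """Hypothetical helper annotating the union of G and H edges."""
    annotated = {}
    for edge in sorted(set(g_edges) | set(h_edges)):
        o_g = g_edges.get(edge, 0.0)  # missing on a side == order 0
        o_h = h_edges.get(edge, 0.0)
        delta = o_g - o_h             # standard_order = o_G - o_H
        if ignore_aromaticity and abs(delta) < 1:
            delta = 0.0               # e.g. 1.5 <-> 2 or 1.5 <-> 1
        if o_g == 0:
            category = "formed"
        elif o_h == 0:
            category = "broken"
        else:
            category = "unchanged"    # connectivity kept; any order change shows in delta
        annotated[edge] = {"order": (o_g, o_h), "standard_order": delta, "category": category}
    return annotated


g = {(1, 2): 1.0, (2, 3): 1.5}
h = {(1, 3): 1.0, (2, 3): 2.0}
for edge, data in its_edge_annotations(g, h).items():
    print(edge, data)
```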
- static balance_valences(graph: Graph) Graph[source]#
Balances valences in a NetworkX graph by adding wildcard ‘*’ nodes for atoms that have missing bonds according to their broken bonds and hydrogen counts.
- Parameters:
graph (nx.Graph) – NetworkX Graph with node attributes:
- element: str, chemical symbol
- charge: int, formal charge
- typesGH: tuple of descriptors (element, aromatic, hcount, h_change, connections)
- atom_map: int, unique identifier (node key)
- Returns:
Modified graph with wildcard nodes added
- Return type:
nx.Graph
- static construct(G: Graph, H: Graph, *, ignore_aromaticity: bool = False, attributes_defaults: Dict[str, Any] | None = None, balance: bool = True) Graph[source]#
Return a partial ITS graph for G → H.
- Parameters:
G – reactant graph.
H – product graph.
ignore_aromaticity – if True, set standard_order to 0 when |Δ| < 1.
attributes_defaults – mapping of attribute → default value used for the typesGH tuples. If None, a small sensible default set is used.
- Returns:
an ITS graph with nodes, typesGH tuples and annotated edges.
Mechanistic and Lewis-state utilities#
- synkit.Graph.Mech.conversion.arrow_atom_maps(arrow_code: str) set[int][source]#
Return all atom maps used in an arrow code.
- synkit.Graph.Mech.conversion.atom_map_to_nodes(its) dict[int, list[Any]][source]#
Build atom-map-number -> list of ITS node ids.
This catches ambiguous duplicated atom maps after ITS construction.
- Parameters:
its (networkx.Graph) – ITS graph.
- Returns:
Mapping from atom-map number to ITS node IDs.
- Return type:
dict[int, list[Any]]
- synkit.Graph.Mech.conversion.bond_minus_type(reactant_order: float) str[source]#
Type consumed bond/electron-pair source.
Rules#
- reactant_order == 1.0 -> Sigma-
- reactant_order > 1.0 -> Pi- (includes double, triple, aromatic 1.5)
- unknown -> B-
- param reactant_order:
Bond order on the reactant side.
- type reactant_order:
float
- returns:
Typed consumed-bond label.
- rtype:
str
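The rules above translate into a short function. consumed_bond_type is a hypothetical name, and its tolerance mirrors the tol=1e-06 default of is_one listed further down in this module. Treating None or a zero order as "unknown" is an assumption.

```python
from typing import Optional


def is_one(x: float, tol: float = 1e-6) -> bool:
    """Approximate equality with 1.0."""
    return abs(x - 1.0) <= tol


def consumed_bond_type(reactant_order: Optional[float]) -> str:
    """Hypothetical helper: label the bond that supplies the electron pair."""
    if reactant_order is None:
        return "B-"      # unknown order
    if is_one(reactant_order):
        return "Sigma-"  # single bond
    if reactant_order > 1.0:
        return "Pi-"     # double, triple or aromatic 1.5
    return "B-"          # zero/unexpected order treated as unknown (assumption)


print([consumed_bond_type(o) for o in (1.0, 1.5, 2.0, 0.0)])  # ['Sigma-', 'Pi-', 'Pi-', 'B-']
```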
- synkit.Graph.Mech.conversion.bond_plus_type(reactant_order: float, product_order: float) str[source]#
Type formed/increased bond destination.
Rules#
- 0 -> 1 : Sigma+
- 0 -> 1.5 : Sigma+, because new connectivity starts as sigma
- 0 -> 2 : Sigma+, because new connectivity starts as sigma
- 1 -> 2 : Pi+
- 1.5 -> 2 : Pi+
- 2 -> 3 : Pi+
- param reactant_order:
Bond order on the reactant side.
- type reactant_order:
float
- param product_order:
Bond order on the product side.
- type product_order:
float
- returns:
Typed formed-bond label.
- rtype:
str
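Here is a sketch of the formation rules as a function. formed_bond_type is hypothetical. Grouping the listed cases as "0 -> positive order gives Sigma+" and "an order >= 1 that increases gives Pi+" is an interpretation of the table. The "B+" fallback for anything else (for example a decrease) is an assumption borrowed from the generic B-/B+ labels mentioned by check_typed_conversion_quality.

```python
def formed_bond_type(reactant_order: float, product_order: float, tol: float = 1e-6) -> str:
    """Hypothetical helper: label the bond that receives the electron pair."""
    if abs(reactant_order) <= tol and product_order > tol:
        return "Sigma+"  # new connectivity (0 -> 1, 1.5 or 2) starts as sigma
    if reactant_order >= 1.0 - tol and product_order > reactant_order + tol:
        return "Pi+"     # existing bond gains order: 1 -> 2, 1.5 -> 2, 2 -> 3
    return "B+"          # fallback for cases outside the table (assumption)


for pair in [(0, 1), (0, 1.5), (1, 2), (1.5, 2), (2, 3)]:
    print(pair, formed_bond_type(*pair))
```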
- synkit.Graph.Mech.conversion.build_its_from_rsmi(rsmi: str, arrow_code: str, expand_aam: bool = True, remove_non_arrow_maps: bool = True)[source]#
Build SynKit ITS graph from reaction SMILES.
Pipeline#
raw SMIRKS
-> validate arrow atom maps
-> remove non-arrow atom maps
-> CanonRSMI().expand_aam(…)
-> rsmi_to_its(…)
- param rsmi:
Reaction SMILES in reactants>>products format.
- type rsmi:
str
- param arrow_code:
Arrow code used to preserve relevant atom maps.
- type arrow_code:
str
- param expand_aam:
Whether to expand atom mapping before ITS construction.
- type expand_aam:
bool
- param remove_non_arrow_maps:
Whether to remove atom maps not used by the arrow code.
- type remove_non_arrow_maps:
bool
- returns:
ITS graph, expanded RSMI, cleaned RSMI, and validation diagnostics.
- rtype:
tuple
- raises ImportError:
If required SynKit conversion helpers are unavailable.
- synkit.Graph.Mech.conversion.check_arrow_code_coverage(arrow_codes: list[str]) dict[str, Any][source]#
Check which arrow-code shapes appear in a dataset.
- synkit.Graph.Mech.conversion.check_typed_conversion_quality(results: list[dict[str, Any]]) dict[str, Any][source]#
Check whether typed conversions still contain generic B-/B+ labels.
- synkit.Graph.Mech.conversion.classify_arrow_shape(step: str) str[source]#
Classify one arrow-code step.
- synkit.Graph.Mech.conversion.convert_arrow_code(arrow_code: str, its=None, strict_bond_lookup: bool = True) dict[str, Any][source]#
Convert arrow code into generic and typed formats.
If its is None, typed_converted is None.
- Parameters:
arrow_code (str) – Semicolon-separated arrow code.
its (Optional[networkx.Graph]) – Optional ITS graph for typed conversion.
strict_bond_lookup (bool) – Whether missing typed bond lookups should raise.
- Returns:
Arrow code with generic and optional typed conversions.
- Return type:
dict[str, Any]
- synkit.Graph.Mech.conversion.convert_reaction_arrow(reaction_smiles: str, arrow_code: str, orbital_class: str | None = None, expand_aam: bool = True, remove_non_arrow_maps: bool = True, strict_bond_lookup: bool = True) dict[str, Any][source]#
End-to-end wrapper for one reaction:
reaction SMILES + arrow code
-> clean non-arrow maps
-> expand AAM with SynKit
-> ITS graph
-> generic converted
-> typed converted
orbital_class is stored as metadata only. It is not used to force Sigma/Pi typing.
- Parameters:
reaction_smiles (str) – Reaction SMILES in reactants>>products format.
arrow_code (str) – Semicolon-separated arrow code.
orbital_class (Optional[str]) – Optional source-dataset orbital classification metadata.
expand_aam (bool) – Whether to expand atom mapping before ITS construction.
remove_non_arrow_maps (bool) – Whether to remove atom maps not used by the arrow code.
strict_bond_lookup (bool) – Whether missing typed bond lookups should raise.
- Returns:
Conversion result and ITS preparation metadata.
- Return type:
dict[str, Any]
- synkit.Graph.Mech.conversion.convert_record(record: dict[str, Any], reaction_key: str = 'SMIRKS', arrow_key: str = 'arrow_code', orbital_key: str = 'orbital pair classification', expand_aam: bool = True, remove_non_arrow_maps: bool = True, strict_bond_lookup: bool = True) dict[str, Any][source]#
Convert one dictionary record.
Expected input keys#
{
"SMIRKS": "…>>…",
"arrow_code": "…",
"orbital pair classification": "pi_empty"
}
- param record:
Source record to convert.
- type record:
dict[str, Any]
- param reaction_key:
Key containing reaction SMILES.
- type reaction_key:
str
- param arrow_key:
Key containing arrow code.
- type arrow_key:
str
- param orbital_key:
Key containing optional orbital classification metadata.
- type orbital_key:
str
- param expand_aam:
Whether to expand atom mapping before ITS construction.
- type expand_aam:
bool
- param remove_non_arrow_maps:
Whether to remove atom maps not used by the arrow code.
- type remove_non_arrow_maps:
bool
- param strict_bond_lookup:
Whether missing typed bond lookups should raise.
- type strict_bond_lookup:
bool
- returns:
Converted record with original metadata preserved.
- rtype:
dict[str, Any]
- synkit.Graph.Mech.conversion.convert_records(records: list[dict[str, Any]], reaction_key: str = 'SMIRKS', arrow_key: str = 'arrow_code', orbital_key: str = 'orbital pair classification', expand_aam: bool = True, remove_non_arrow_maps: bool = True, strict_bond_lookup: bool = True, keep_errors: bool = False) list[dict[str, Any]][source]#
Batch conversion.
- keep_errors=False:
raise immediately on first error.
- keep_errors=True:
collect errors into result dictionaries.
- Parameters:
reaction_key (str) – Key containing reaction SMILES.
arrow_key (str) – Key containing arrow code.
orbital_key (str) – Key containing optional orbital classification metadata.
expand_aam (bool) – Whether to expand atom mapping before ITS construction.
remove_non_arrow_maps (bool) – Whether to remove atom maps not used by the arrow code.
strict_bond_lookup (bool) – Whether missing typed bond lookups should raise.
keep_errors (bool) – Whether to collect conversion failures instead of raising.
- Returns:
Converted records, including failures when keep_errors is true.
- Return type:
list[dict[str, Any]]
- synkit.Graph.Mech.conversion.debug_arrow_bond_orders(reaction_smiles: str, arrow_code: str, expand_aam: bool = True, remove_non_arrow_maps: bool = True, strict_bond_lookup: bool = True) None[source]#
Print the ITS bond orders used by each arrow step.
- Parameters:
reaction_smiles (str) – Reaction SMILES in reactants>>products format.
arrow_code (str) – Semicolon-separated arrow code.
expand_aam (bool) – Whether to expand atom mapping before ITS construction.
remove_non_arrow_maps (bool) – Whether to remove atom maps not used by the arrow code.
strict_bond_lookup (bool) – Whether missing typed bond lookups should raise.
- Returns:
None.
- Return type:
None
- synkit.Graph.Mech.conversion.debug_record(record: dict[str, Any], reaction_key: str = 'SMIRKS', arrow_key: str = 'arrow_code', orbital_key: str = 'orbital pair classification') dict[str, Any][source]#
Print full debug output for one record.
- synkit.Graph.Mech.conversion.duplicate_atom_maps_in_side(smiles: str) dict[int, int][source]#
Find duplicated atom maps in one side of a reaction.
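Both this function and extract_atom_maps_from_smiles (next entry) can be sketched with a regular expression and a counter. The regex and the function names extract_maps and duplicated_maps are assumptions, not the SynKit code. The pattern only looks inside [...] bracket atoms and takes the digits between a colon and the closing bracket.

```python
import re
from collections import Counter
from typing import Dict, List

# A bracket atom ending in ":<digits>]", e.g. [CH:10] or [N+:61].
_ATOM_MAP = re.compile(r"\[[^\]]*?:(\d+)\]")


def extract_maps(smiles: str) -> List[int]:
    """Hypothetical helper: atom-map numbers in order of appearance."""
    return [int(num) for num in _ATOM_MAP.findall(smiles)]


def duplicated_maps(smiles: str) -> Dict[int, int]:
    """Hypothetical helper: atom-map number -> count, for numbers used more than once."""
    counts = Counter(extract_maps(smiles))
    return {amap: n for amap, n in counts.items() if n > 1}


print(extract_maps("[CH:10].[N+:61]"))      # [10, 61]
print(duplicated_maps("[N+:61]2=[CH:61]"))  # {61: 2}
```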
- synkit.Graph.Mech.conversion.extract_atom_maps_from_smiles(smiles: str) list[int][source]#
Extract atom-map numbers from bracket atoms.
For example, "[CH:10]" yields 10 and "[N+:61]" yields 61.
- synkit.Graph.Mech.conversion.extract_order_from_edge_data(edge_data: Any) tuple[float, float][source]#
Extract SynKit ITS edge order.
- Expected normal edge format:
{“order”: (reactant_order, product_order)}
- MultiGraph-like fallback:
{0: {“order”: (reactant_order, product_order)}}
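Handling both edge formats takes only a couple of type checks. extract_order is a hypothetical helper. Raising ValueError when no order tuple is found is this sketch's choice, not documented behaviour.

```python
from typing import Any, Tuple


def extract_order(edge_data: Any, key: str = "order") -> Tuple[float, float]:
    """Hypothetical helper: (reactant_order, product_order) from plain or MultiGraph-like data."""
    if isinstance(edge_data, dict) and key in edge_data:
        order = edge_data[key]  # {"order": (o_G, o_H)}
    elif isinstance(edge_data, dict) and isinstance(edge_data.get(0), dict) and key in edge_data[0]:
        order = edge_data[0][key]  # {0: {"order": (o_G, o_H)}}
    else:
        raise ValueError(f"no {key!r} tuple in edge data: {edge_data!r}")
    reactant_order, product_order = order
    return float(reactant_order), float(product_order)


print(extract_order({"order": (1, 2)}))        # (1.0, 2.0)
print(extract_order({0: {"order": (0, 1.5)}}))  # (0.0, 1.5)
```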
- synkit.Graph.Mech.conversion.generic_convert_arrow_code(arrow_code: str) list[list[Any]][source]#
Convert every step in an arrow code to generic LP/B form.
- synkit.Graph.Mech.conversion.generic_convert_step(step: str) list[Any][source]#
Generic graph-independent conversion.
Supported grammar#
- a=b
LP(a) forms bond a-b -> [“LP-/B+”, [a], [a, b]]
- a=b,c
LP(a) forms/increases bond b-c -> [“LP-/B+”, [a], [b, c]]
- a,b=c
bond a-b breaks; electrons end as LP on c -> [“B-/LP+”, [a, b], [c]]
- a,b=c,d
bond a-b becomes bond c-d -> [“B-/B+”, [a, b], [c, d]]
- param step:
One arrow-code step.
- type step:
str
- returns:
Generic LP/B conversion record.
- rtype:
list[Any]
- raises ValueError:
If the step shape is unsupported.
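The four-case grammar can be sketched with str.partition and list lengths. generic_step, generic_code and parse_atoms are hypothetical names. Splitting an arrow code on ";" follows the "semicolon-separated" description used elsewhere in this module.

```python
from typing import Any, List


def parse_atoms(text: str) -> List[int]:
    """Parse '11,12' -> [11, 12]; empty pieces are skipped."""
    return [int(tok) for tok in text.split(",") if tok.strip()]


def generic_step(step: str) -> List[Any]:
    """Hypothetical helper implementing the a=b / a=b,c / a,b=c / a,b=c,d grammar."""
    left, sep, right = step.partition("=")
    if not sep:
        raise ValueError(f"arrow-code step has no '=': {step!r}")
    src, dst = parse_atoms(left), parse_atoms(right)
    shape = (len(src), len(dst))
    if shape == (1, 1):  # a=b: lone pair on a forms bond a-b
        return ["LP-/B+", src, [src[0], dst[0]]]
    if shape == (1, 2):  # a=b,c: lone pair on a forms/increases bond b-c
        return ["LP-/B+", src, dst]
    if shape == (2, 1):  # a,b=c: bond a-b breaks, electrons end as a lone pair on c
        return ["B-/LP+", src, dst]
    if shape == (2, 2):  # a,b=c,d: bond a-b becomes bond c-d
        return ["B-/B+", src, dst]
    raise ValueError(f"unsupported arrow-code step: {step!r}")


def generic_code(arrow_code: str) -> List[List[Any]]:
    """Convert every non-empty ';'-separated step."""
    return [generic_step(s.strip()) for s in arrow_code.split(";") if s.strip()]


print(generic_code("10=20;12=11,12;1,2=3"))
```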
- synkit.Graph.Mech.conversion.get_its_bond_order(its, atom_a: int, atom_b: int, strict: bool = True, context: str = '', atom_map_nodes: dict[int, list[Any]] | None = None) tuple[float, float][source]#
Return ITS bond order for atom-map pair.
For example, an edge with order (0.0, 1.0) represents new bond formation from reactants to products.
- Parameters:
its (networkx.Graph) – ITS graph.
atom_a (int) – First atom-map number.
atom_b (int) – Second atom-map number.
strict (bool) – Whether missing nodes or edges should raise.
context (str) – Optional context appended to strict-mode edge errors.
atom_map_nodes (Optional[dict[int, list[Any]]]) – Optional precomputed atom-map to node index.
- Returns:
Reactant-side and product-side bond orders.
- Return type:
tuple[float, float]
- Raises:
ValueError – If strict lookup fails.
- synkit.Graph.Mech.conversion.get_unique_node_for_atom_map(its, atom_map: int, strict: bool = True, atom_map_nodes: dict[int, list[Any]] | None = None) Any | None[source]#
Get the unique ITS node corresponding to an atom map.
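- Parameters:
its (networkx.Graph) – ITS graph.
atom_map (int) – Atom-map number to look up.
strict (bool) – Whether a missing atom map should raise.
atom_map_nodes (Optional[dict[int, list[Any]]]) – Optional precomputed atom-map to node index.
- Returns:
Unique ITS node ID, or None when missing and strict is false.
- Return type: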
Optional[Any]
- Raises:
ValueError – If the atom map is missing in strict mode or is ambiguous.
- synkit.Graph.Mech.conversion.is_one(x: float, tol: float = 1e-06) bool[source]#
Return whether a value is approximately one.
- synkit.Graph.Mech.conversion.is_zero(x: float, tol: float = 1e-06) bool[source]#
Return whether a value is approximately zero.
- synkit.Graph.Mech.conversion.parse_arrow_step(step: str) tuple[list[int], list[int]][source]#
Parse one arrow-code step into the atom-map lists on either side of "=".
For example, "10=20" becomes ([10], [20]) and "12=11,12" becomes ([12], [11, 12]).
- synkit.Graph.Mech.conversion.parse_atom_list(text: str) list[int][source]#
Parse a comma-separated atom-map list.
- synkit.Graph.Mech.conversion.remove_non_arrow_atom_maps(rsmi: str, arrow_code: str) str[source]#
Keep only atom maps involved in arrow_code. Remove every other atom map.
This is important because some source SMIRKS have duplicated non-arrow atom maps, e.g. [N+:61]2=[CH:61].
If 61 is not used by arrow_code, we remove it and let SynKit CanonRSMI().expand_aam(…) generate clean full maps.
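Here is a regex-based sketch of the map stripping. strip_maps_except and arrow_maps are hypothetical. The sketch assumes that arrow-code atom lists are separated only by ";", "," and "=". It leaves the bracket atom itself in place, so [CH:61] becomes [CH].

```python
import re
from typing import Set

# Bracket atom with a map number: group 1 = atom body, group 2 = map number.
_MAPPED_ATOM = re.compile(r"\[([^\]:]+):(\d+)\]")


def arrow_maps(arrow_code: str) -> Set[int]:
    """Hypothetical helper: all atom-map numbers in an arrow code such as '10=20;1,2=3'."""
    return {int(tok) for tok in re.split(r"[;,=]", arrow_code) if tok.strip()}


def strip_maps_except(rsmi: str, keep: Set[int]) -> str:
    """Hypothetical helper: drop ':<n>' from bracket atoms whose map number is not in keep."""
    def _sub(match: re.Match) -> str:
        body, amap = match.group(1), int(match.group(2))
        return match.group(0) if amap in keep else f"[{body}]"
    return _MAPPED_ATOM.sub(_sub, rsmi)


rsmi = "[CH3:1][N+:61]=[CH:61]>>[CH3:1][NH:61]"
print(strip_maps_except(rsmi, {1}))  # [CH3:1][N+]=[CH]>>[CH3:1][NH]
```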
- synkit.Graph.Mech.conversion.split_arrow_code(arrow_code: str) list[str][source]#
Split an arrow code into non-empty steps.
- synkit.Graph.Mech.conversion.typed_convert_arrow_code(arrow_code: str, its, strict_bond_lookup: bool = True) list[list[Any]][source]#
Convert every step in an arrow code to typed LP/Sigma/Pi form.
- Parameters:
arrow_code (str) – Semicolon-separated arrow code.
its (networkx.Graph) – ITS graph used for local bond-order lookup.
strict_bond_lookup (bool) – Whether missing bond lookups should raise.
- Returns:
Typed conversion records for each step.
- Return type:
list[list[Any]]
- synkit.Graph.Mech.conversion.typed_convert_step(step: str, its, strict_bond_lookup: bool = True, atom_map_nodes: dict[int, list[Any]] | None = None) list[Any][source]#
Convert one arrow-code step into typed LP/Sigma/Pi format.
Important#
This function does NOT globally force Sigma/Pi from orbital_class. Each step is typed from local ITS bond-order changes.
- param step:
One arrow-code step.
- type step:
str
- param its:
ITS graph used for local bond-order lookup.
- type its:
networkx.Graph
- param strict_bond_lookup:
Whether missing bond lookups should raise.
- type strict_bond_lookup:
bool
- param atom_map_nodes:
Optional precomputed atom-map to node index.
- type atom_map_nodes:
Optional[dict[int, list[Any]]]
- returns:
Typed LP/Sigma/Pi conversion record.
- rtype:
list[Any]
- raises ValueError:
If the step shape is unsupported or strict lookup fails.
- synkit.Graph.Mech.conversion.validate_arrow_maps(rsmi: str, arrow_code: str, raise_on_arrow_duplicates: bool = True, raise_on_missing_arrow_maps: bool = True) dict[str, Any][source]#
Validate atom maps before SynKit.
Rules#
Duplicated atom maps used by arrow_code are fatal.
Missing atom maps used by arrow_code are fatal.
Duplicated non-arrow atom maps are warnings only, because they can be removed before SynKit expansion.
- param rsmi:
Reaction SMILES in reactants>>products format.
- type rsmi:
str
- param arrow_code:
Arrow code whose atom maps must be validated.
- type arrow_code:
str
- param raise_on_arrow_duplicates:
Whether duplicated arrow atom maps are fatal.
- type raise_on_arrow_duplicates:
bool
- param raise_on_missing_arrow_maps:
Whether missing arrow atom maps are fatal.
- type raise_on_missing_arrow_maps:
bool
- returns:
Validation diagnostics.
- rtype:
dict[str, Any]
- raises ValueError:
If the RSMI is malformed or enabled validation fails.
- synkit.Graph.Mech.electron_accounting.bond_order_sum(graph: Graph, node: Any) float[source]#
Return the sigma-plus-pi bond-order sum around one node.
- synkit.Graph.Mech.electron_accounting.graph_to_sanitized_kekule_mol(graph: Graph) Mol[source]#
Reconstruct a product from kekule_order and let RDKit sanitize it.
- synkit.Graph.Mech.electron_accounting.recompute_charge(graph: Graph, node: Any) int | float[source]#
Recompute formal charge from stored electron-state fields.
- synkit.Graph.Mech.electron_accounting.refresh_electron_fields(graph: Graph, *, in_place: bool = False) Graph[source]#
Refresh derived electron bookkeeping on a molecular graph.
The graph is expected to store scalar sigma_order and pi_order edge fields plus node-level electron state. Presentation-facing order is not rewritten here; RDKit reconstruction remains responsible for aromatic re-perception at the product boundary.
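For orientation, the standard formal-charge bookkeeping is FC = V - N - B/2. Here V is the number of valence electrons, N the number of nonbonding electrons, and B the number of bonding electrons. Since B/2 equals the bond-order sum around the atom, a bond_order_sum-style value plugs straight into the formula. The sketch assumes hypothetical node fields valence_electrons and nonbonding_electrons. The field names SynKit actually stores are not listed here, and recompute_charge may be implemented differently.

```python
from typing import Any, Dict, Hashable, Tuple

EdgeMap = Dict[Tuple[Hashable, Hashable], Dict[str, float]]


def bond_order_sum(edges: EdgeMap, node: Hashable) -> float:
    """Sum sigma_order + pi_order over every edge touching node."""
    return sum(
        d.get("sigma_order", 0.0) + d.get("pi_order", 0.0)
        for (u, v), d in edges.items()
        if node in (u, v)
    )


def formal_charge(node_data: Dict[str, Any], edges: EdgeMap, node: Hashable) -> float:
    """FC = V - N - B/2; valence_electrons / nonbonding_electrons are hypothetical field names."""
    return (node_data["valence_electrons"]
            - node_data["nonbonding_electrons"]
            - bond_order_sum(edges, node))


# Hydroxide oxygen: 6 valence electrons, 3 lone pairs, one O-H sigma bond -> -1.
hydroxide = {(1, 2): {"sigma_order": 1.0, "pi_order": 0.0}}
print(formal_charge({"valence_electrons": 6, "nonbonding_electrons": 6}, hydroxide, 1))  # -1.0
```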
Matcher#
- class synkit.Graph.Matcher.approx_mcs.ApproxMCSMatcher(node_attrs: List[str] | None = None, node_defaults: List[Any] | None = None, allow_shift: bool = True, *, edge_attrs: List[str] | None = None, prune_wc: bool = False, prune_automorphisms: bool = False, wildcard_element: Any = '*', element_key: str = 'element', use_wl: bool = False, wl_max_iter: int = 3)[source]#
Bases: object
Heuristic / approximate common-subgraph matcher.
This class provides a fast, approximate alternative to MCSMatcher. It does not enumerate all subgraph isomorphisms. Instead, it:
- Picks a set of high-scoring node pairs as seeds based on attribute and local-structure similarity.
- For each seed, greedily grows a subgraph isomorphism by extending along neighbouring nodes while maintaining local adjacency consistency.
- Collects the resulting partial mappings and reports them in the same orientation style as MCSMatcher.
Optionally, a 1-WL-style color refinement (use_wl=True) is used to produce coarse structural colors; matching nodes with identical WL colors receive a small similarity bonus.
The result is an approximate maximum-common-subgraph (MCS) mapping that is usually close to optimal for molecular graphs, but can be computed much faster and can be bounded by simple iteration parameters.
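The seed-and-grow idea can be sketched on small labelled graphs given as (labels, adjacency) dicts. This is a simplified illustration, not ApproxMCSMatcher. Seeds are ranked only by degree difference, with no WL bonus. Here max_steps bounds the number of growth rounds, which may not match the library's meaning, and all names are hypothetical.

```python
from typing import Dict, Hashable, Tuple

# (labels, adjacency): labels[n] -> node label, adjacency[n][m] -> edge label (e.g. bond order).
Graph = Tuple[Dict[Hashable, str], Dict[Hashable, Dict[Hashable, float]]]


def _consistent(p, h, mapping, pattern: Graph, host: Graph) -> bool:
    (p_lab, p_adj), (h_lab, h_adj) = pattern, host
    if p_lab[p] != h_lab[h] or h in mapping.values():
        return False
    # Adjacency (and edge label) must agree with every already-mapped pair.
    return all(p_adj[p].get(p2) == h_adj[h].get(h2) for p2, h2 in mapping.items())


def greedy_grow(seed, pattern: Graph, host: Graph, max_steps: int = 256) -> dict:
    mapping = {seed[0]: seed[1]}
    for _ in range(max_steps):
        extended = False
        for p_mapped, h_mapped in list(mapping.items()):
            for p in pattern[1][p_mapped]:
                if p in mapping:
                    continue
                for h in host[1][h_mapped]:
                    if _consistent(p, h, mapping, pattern, host):
                        mapping[p] = h
                        extended = True
                        break
        if not extended:
            break
    return mapping


def approx_mcs(pattern: Graph, host: Graph, max_seeds: int = 16, max_steps: int = 256) -> dict:
    p_lab, p_adj = pattern
    h_lab, h_adj = host
    pairs = [(p, h) for p in p_lab for h in h_lab if p_lab[p] == h_lab[h]]
    # Seed score: prefer pairs whose degrees agree (a crude local-structure similarity).
    pairs.sort(key=lambda ph: abs(len(p_adj[ph[0]]) - len(h_adj[ph[1]])))
    best: dict = {}
    for seed in pairs[:max_seeds]:
        mapping = greedy_grow(seed, pattern, host, max_steps)
        if len(mapping) > len(best):
            best = mapping
    return best


pattern = ({1: "C", 2: "C", 3: "O"}, {1: {2: 1.0}, 2: {1: 1.0, 3: 1.0}, 3: {2: 1.0}})
host = ({10: "C", 11: "C", 12: "C", 13: "O"},
        {10: {11: 1.0}, 11: {10: 1.0, 12: 1.0}, 12: {11: 1.0, 13: 1.0}, 13: {12: 1.0}})
print(approx_mcs(pattern, host))  # {2: 12, 1: 11, 3: 13}
```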
Orientation and storage#
Internally, mappings are stored in pattern → host orientation, where the pattern is the smaller of the two graphs after optional wildcard pruning. The get_mappings() helper converts these into G1 → G2 or G2 → G1 orientation based on which original graph served as the pattern.
API compatibility#
The constructor and public methods mirror MCSMatcher as closely as possible so that ApproxMCSMatcher can drop in as a faster, heuristic substitute in many workflows.
Parameters#
- node_attrs : list[str] or None, optional
Node attribute keys to compare. If None, defaults to ["element"].
- node_defaults : list[Any] or None, optional
Fallback values for each node attribute when missing. If None, defaults to a list of "*" with the same length as node_attrs.
- allow_shift : bool, optional
Placeholder for future asymmetric rules. Kept for API compatibility with MCSMatcher.
- edge_attrs : list[str] or None, optional
Edge attribute keys to use for scalar comparison (e.g. ["order"]). If None, defaults to ["order"].
- prune_wc : bool, optional
If True, strip wildcard nodes (see wildcard_element, element_key) from both graphs before searching.
- prune_automorphisms : bool, optional
If True, collapse mappings that have the same host node set (automorphism pruning).
- wildcard_element : Any, optional
Attribute value denoting wildcard nodes (typically "*", used together with element_key).
- element_key : str, optional
Node attribute key used to detect wildcard nodes when prune_wc is True.
- use_wl : bool, optional
If True, run a simple 1-WL-style color refinement on both graphs and include the resulting colors in the node similarity score.
- wl_max_iter : int, optional
Maximum number of WL refinement iterations.
- find_common_subgraph(G1: Graph, G2: Graph, *, mcs: bool = False, mcs_mol: bool = False, max_seeds: int = 16, max_steps: int = 256) ApproxMCSMatcher[source]#
Approximate analogue of MCSMatcher.find_common_subgraph(). The signature mirrors the exact matcher, but the implementation is greedy/heuristic:
- Optionally prunes wildcard nodes from both graphs.
- If mcs_mol is True, performs component-level (molecule-level) approximate matching with _find_mcs_mol_approx().
- Otherwise, orients the pair so that the smaller graph is the pattern and runs the heuristic search.
The mcs flag is accepted for API compatibility but has no distinct effect here; the heuristic always aims for large mappings.
- Parameters:
G1 (nx.Graph) – First input graph.
G2 (nx.Graph) – Second input graph.
mcs (bool) – Ignored (kept for API compatibility).
mcs_mol (bool) – If True, perform approximate connected-component (molecule-level) matching.
max_seeds (int) – Maximum number of seed pairs.
max_steps (int) – Maximum growth steps per seed.
- Returns:
The matcher instance (with internal cache updated).
- Return type:
ApproxMCSMatcher
- find_common_subgraph_approx(G1: Graph, G2: Graph, *, max_seeds: int = 16, max_steps: int = 256) ApproxMCSMatcher[source]#
Heuristically search for approximate common subgraphs.
This is a lightweight wrapper that ignores molecule-level options and simply runs the greedy approximate search on the whole (possibly wildcard-pruned) graphs.
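- Parameters:
G1 (nx.Graph) – First input graph.
G2 (nx.Graph) – Second input graph.
max_seeds (int) – Maximum number of seed pairs.
max_steps (int) – Maximum growth steps per seed.
- Returns:
The matcher instance (with cache updated).
- Return type:
ApproxMCSMatcher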
- find_rc_mapping(rc1: Any, rc2: Any, *, side: str = 'op', mcs: bool = True, mcs_mol: bool = False, component: bool = True, max_seeds: int = 16, max_steps: int = 256) ApproxMCSMatcher[source]#
Convenience wrapper for ITS reaction-centre or ITS-like graph objects, analogous to MCSMatcher.find_rc_mapping() but using the heuristic search internally.
Depending on side, this either uses synkit.Graph.ITS.its_decompose() to obtain left/right graphs or treats the inputs directly as graphs.
Side selection#
- 'r' → compare right sides: r1 vs r2.
- 'l' → compare left sides: l1 vs l2.
- 'op' → compare opposite: r1 vs l2.
- 'its' → treat rc1 and rc2 directly as graphs (no decomposition), useful when the inputs are already ITS (or ITS-like) networkx.Graph objects.
Component-wise mode#
If component is True, the selected graphs are decomposed into connected components, sorted by size (descending), and matched pairwise using _componentwise_approx(). The resulting mappings are combined into a single G1 → G2 mapping in terms of the original node ids. In this mode, mcs_mol is ignored.
- param rc1:
First reaction-centre or ITS-like graph object.
- type rc1:
Any
- param rc2:
Second reaction-centre or ITS-like graph object.
- type rc2:
Any
- param side:
Which ITS sides to compare ('r', 'l', 'op', or 'its').
- type side:
str
- param mcs:
Ignored (kept for compatibility with MCSMatcher).
- type mcs:
bool
- param mcs_mol:
If True and component is False, use approximate molecule-level matching via find_common_subgraph() with mcs_mol=True.
- type mcs_mol:
bool
- param component:
If True, perform size-sorted, component-wise approximate matching between the selected sides and combine the per-component mappings into a single mapping.
- type component:
bool
- param max_seeds:
Maximum number of seeds per call.
- type max_seeds:
int
- param max_steps:
Maximum growth steps per seed.
- type max_steps:
int
- returns:
The matcher instance (with internal cache updated).
- rtype:
ApproxMCSMatcher
- raises ImportError:
If synkit ITS utilities are not available for side in {'r', 'l', 'op'}.
- raises ValueError:
If side is not one of 'r', 'l', 'op', 'its'.
- get_mappings(direction: str = 'pattern_to_host') List[Dict[int, int]][source]#
Return a copy of the cached mapping list in the requested orientation.
Internal orientation is pattern → host. This method can convert to G1 → G2 or G2 → G1 based on the last call to find_common_subgraph(), find_common_subgraph_approx() or find_rc_mapping().
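Converting between orientations amounts to inverting each mapping when the requested direction disagrees with the stored one. orient is a hypothetical helper. Accepting "G1_to_G2" and "G2_to_G1" as direction values is an assumption based on the mapping_direction strings, not on the documented get_mappings signature.

```python
from typing import Dict, List


def orient(mappings: List[Dict[int, int]], pattern_is_g1: bool,
           direction: str = "pattern_to_host") -> List[Dict[int, int]]:
    """Hypothetical helper: copies of pattern -> host mappings in the requested orientation."""
    as_stored = [dict(m) for m in mappings]
    inverted = [{host: pat for pat, host in m.items()} for m in mappings]
    if direction == "pattern_to_host":
        return as_stored
    if direction == "G1_to_G2":
        # If G1 served as the pattern, pattern -> host already reads G1 -> G2.
        return as_stored if pattern_is_g1 else inverted
    if direction == "G2_to_G1":
        return inverted if pattern_is_g1 else as_stored
    raise ValueError(f"unknown direction: {direction!r}")


print(orient([{1: 10, 2: 11}], pattern_is_g1=False, direction="G1_to_G2"))  # [{10: 1, 11: 2}]
```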
- property help: str#
Return the module-level documentation string.
- Returns:
The full module docstring, if available.
- Return type:
str
- property last_size: int#
Size of the largest approximate mapping from the last search.
- Returns:
Size of the best mapping.
- Return type:
int
- property mapping_direction: str#
Human-readable description of internal mapping orientation.
- Returns:
"G1_to_G2", "G2_to_G1", or "unknown" if no search has been run.
- Return type:
str
auto_est.py#
Approximate node automorphism groups (orbits) via 1-WL color refinement, plus orbit- and component-aware deduplication utilities.
Design goals (SynKit style)#
- OOP with a scikit-like fit() -> self.
- Deterministic output ordering.
- Sphinx-style docstrings.
- Helper methods and useful properties.
- Optional “components style” grouping, analogous to exact Automorphism: you can obtain orbit-components induced by a subset of nodes (anchors).
Important note#
WL-1 provides an approximate orbit partition: distinct WL colors imply distinct orbits, but equal WL colors do not guarantee true symmetry.
This module offers:
- AutoEst.orbits: WL-equivalence classes on the given graph
- AutoEst.components(nodes): connected components on an induced subgraph
- AutoEst.orbit_components(nodes): components of the orbit-quotient graph
- AutoEst.deduplicate_host_orbits(mappings): host-orbit based pruning
- AutoEst.deduplicate_pattern_orbits(mappings, pattern_orbits, ...):
pattern-orbit based pruning (anchor-aware)
The anchor-aware deduplication enforces one constraint: anchor components must not be pruned by orbit independence.
- class synkit.Graph.Matcher.auto_est.AutoEst(graph: Graph, node_attrs: List[str] | None = None, edge_attrs: List[str] | None = None, max_iter: int = 10)[source]#
Bases: object
Approximate node automorphism groups (orbits) via 1-WL color refinement.
This class performs a Weisfeiler–Lehman (WL-1) style color refinement on the input graph to approximate a partition of nodes into automorphism-indistinguishability classes (often called “WL-orbits”). In many practical graphs (especially with chemically meaningful node/edge labels), the WL partition coincides with, or closely approximates, the true orbit partition and is much cheaper than enumerating automorphisms.
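A minimal 1-WL refinement, in the spirit of the paragraph above, fits in a few lines. wl_orbits and its (labels, adjacency) input are hypothetical, not the AutoEst code. The loop stops once the number of color classes no longer grows. At that point the partition is stable, because each new color includes the previous one. As the note further down says, equal final colors still do not prove that two nodes are truly symmetric.

```python
from typing import Dict, FrozenSet, Hashable, List


def _compress(signatures: Dict[Hashable, object]) -> Dict[Hashable, int]:
    """Replace each distinct signature by a small integer color (deterministic)."""
    palette = {sig: i for i, sig in enumerate(sorted(set(signatures.values())))}
    return {node: palette[sig] for node, sig in signatures.items()}


def wl_orbits(labels: Dict[int, object], adj: Dict[int, Dict[int, object]],
              max_iter: int = 10) -> List[FrozenSet[int]]:
    """Hypothetical helper: 1-WL color classes, sorted by smallest node."""
    colors = _compress({n: repr(labels[n]) for n in labels})
    for _ in range(max_iter):
        signatures = {
            n: (colors[n], tuple(sorted((repr(adj[n][m]), colors[m]) for m in adj[n])))
            for n in labels
        }
        refined = _compress(signatures)
        if len(set(refined.values())) == len(set(colors.values())):
            break  # no class was split, so the partition is stable
        colors = refined
    classes: Dict[int, set] = {}
    for node, color in colors.items():
        classes.setdefault(color, set()).add(node)
    return sorted((frozenset(c) for c in classes.values()), key=min)


path = {0: {1: 1}, 1: {0: 1, 2: 1}, 2: {1: 1}}
print(wl_orbits({n: "" for n in path}, path))  # [frozenset({0, 2}), frozenset({1})]
```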
Besides the basic orbit partition, AutoEst provides a “components-style” interface analogous to the exact automorphism helper:
- components() returns connected components of an induced subgraph (useful for “anchor components”).
- orbit_components() returns connected components in the orbit-quotient graph restricted to a node subset, capturing which orbits are coupled by edges inside the subset.
- deduplicate_host_orbits() prunes mappings by host-side WL-orbits.
- deduplicate_pattern_orbits() prunes mappings by pattern orbits, with optional anchor-aware behavior (no pruning inside anchored nodes).
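One reading of orbit_components() works in three steps. Map every node in the subset to its orbit id. Connect two orbit ids whenever an edge joins two subset nodes carrying them. Report the connected components of that quotient graph. The union-find sketch below follows that reading. orbit_components_sketch is hypothetical and may differ from AutoEst in details.

```python
from typing import Dict, FrozenSet, Hashable, Iterable, List


def orbit_components_sketch(orbit_of: Dict[Hashable, int], adj: Dict[Hashable, Iterable],
                            nodes: Iterable[Hashable]) -> List[FrozenSet[int]]:
    """Hypothetical helper: components of the orbit-quotient graph induced by nodes."""
    subset = set(nodes)
    parent = {orbit_of[n]: orbit_of[n] for n in subset}

    def find(x: int) -> int:
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    # Union the orbits of the two endpoints of every edge inside the subset.
    for u in subset:
        for v in adj[u]:
            if v in subset:
                parent[find(orbit_of[u])] = find(orbit_of[v])

    groups: Dict[int, set] = {}
    for orbit in list(parent):
        groups.setdefault(find(orbit), set()).add(orbit)
    return sorted((frozenset(g) for g in groups.values()), key=min)


# Path 0-1-2-3: end nodes share orbit 0, inner nodes share orbit 1.
orbit_of = {0: 0, 3: 0, 1: 1, 2: 1}
adj = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
print(orbit_components_sketch(orbit_of, adj, [0, 1, 2, 3]))  # [frozenset({0, 1})]
```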
- Parameters:
graph (nx.Graph) – Input NetworkX graph. It is not modified in-place.
node_attrs (list[str] or None) – Node attribute keys whose values will be included in the initial coloring. If None, defaults are used.
edge_attrs (list[str] or None) – Edge attribute keys whose values will be incorporated into the neighborhood signatures. If None, defaults are used.
max_iter (int) – Maximum number of WL refinement iterations.
Note
This is an approximate estimator of automorphism orbits: two nodes with different final WL colors cannot be in the same orbit, but nodes with the same WL color might still be distinguishable by higher-order invariants (e.g. higher-dimensional WL, spectral invariants, or full automorphism search). In many molecular graphs where node/edge labels are informative, this partition is typically very close to the true automorphism partition and is often sufficient for symmetry-aware pruning.
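The note above can be made concrete with a small 1-WL refinement. The following is a minimal pure-Python sketch, not AutoEst's actual implementation: the function name `wl_partition`, the adjacency-dict input, and the stop-when-no-class-splits rule are assumptions for illustration. Its usage lines reproduce the caveat: a 6-cycle plus two disjoint triangles is 2-regular, so every node ends with the same WL color even though hexagon nodes and triangle nodes lie in different automorphism orbits.

```python
from typing import Dict, FrozenSet, Hashable, List, Optional, Set


def wl_partition(
    adj: Dict[Hashable, Set[Hashable]],
    labels: Optional[Dict[Hashable, Hashable]] = None,
    max_iter: int = 10,
) -> List[FrozenSet[Hashable]]:
    """Hypothetical 1-WL refinement returning the final color classes."""
    labels = labels or {}
    # Initial coloring: node labels (or one shared color), compressed to ints.
    init = {v: labels.get(v) for v in adj}
    palette = {c: i for i, c in enumerate(sorted(set(init.values()), key=repr))}
    colors = {v: palette[init[v]] for v in adj}

    for _ in range(max_iter):
        # New signature = own color + sorted multiset of neighbor colors.
        sigs = {v: (colors[v], tuple(sorted(colors[u] for u in adj[v]))) for v in adj}
        palette = {s: i for i, s in enumerate(sorted(set(sigs.values())))}
        new_colors = {v: palette[sigs[v]] for v in adj}
        # Classes can only split, so an unchanged count means a stable partition.
        stable = len(set(new_colors.values())) == len(set(colors.values()))
        colors = new_colors
        if stable:
            break

    classes: Dict[int, Set[Hashable]] = {}
    for v, c in colors.items():
        classes.setdefault(c, set()).add(v)
    return sorted((frozenset(s) for s in classes.values()), key=lambda s: min(map(repr, s)))


# A 6-cycle plus two disjoint triangles: every node has degree 2.
hexagon = {i: {(i - 1) % 6, (i + 1) % 6} for i in range(6)}
triangles = {6: {7, 8}, 7: {6, 8}, 8: {6, 7}, 9: {10, 11}, 10: {9, 11}, 11: {9, 10}}
classes = wl_partition({**hexagon, **triangles})
print(len(classes))  # 1: WL cannot separate hexagon nodes from triangle nodes
```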
Note
“Anchor components” can invalidate orbit-wise independence: when a subset of nodes is treated as anchored, orbits that are connected through the anchored subgraph should be considered coupled. Use orbit_components() and anchor-aware deduplicate_pattern_orbits() to avoid incorrect pruning.
See also
For discussions relating Weisfeiler–Lehman refinement to automorphism indistinguishability and orbit structure, see:
A. Dawar and G. Vagnozzi, Generalizations of k-dimensional Weisfeiler–Leman stabilization, arXiv preprint (2019/2020).
Example#
import networkx as nx
from synkit.Graph.Matcher.auto_est import AutoEst

# Simple 4-cycle where all nodes are symmetric under rotation/reflection
G = nx.cycle_graph(4)
est = AutoEst(G, node_attrs=[], edge_attrs=[])
est = est.fit()
print(est.orbits)    # [frozenset({0, 1, 2, 3})]
print(est.n_orbits)  # 1

# "components style": connected components of an induced subgraph
comps = est.components(nodes=[0, 1])
print(comps)  # [frozenset({0, 1})]

# orbit-quotient components of an induced subgraph
oc = est.orbit_components(nodes=[0, 1, 2, 3])
print(oc)  # [frozenset({0})]  (one orbit-id component in this symmetric case)
- property anchor_component: FrozenSet[Hashable]#
Largest connected component of the fitted graph.
This is a convenience “components-style” accessor. It is commonly used as an anchor set for match pruning and symmetry breaking.
- Returns:
The node-set of the largest connected component. If multiple components share the maximum size, the one with the smallest (sorted) node is returned for determinism.
- Return type:
frozenset[hashable]
- Raises:
RuntimeError – If fit() has not been called.
- components(nodes: Iterable[Hashable] | None = None) List[FrozenSet[Hashable]][source]#
Compute connected components on an induced subgraph.
This mirrors the components utilities of the exact Automorphism helper.
- Parameters:
nodes (iterable[hashable] or None) – Subset of nodes to induce. If None, uses all nodes.
- Returns:
Connected components as frozensets (deterministic order).
- Return type:
list[frozenset[hashable]]
- Raises:
RuntimeError – If not fitted.
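A minimal sketch of the induced-subgraph components and the anchor tie-break described above, assuming a plain adjacency dict (`{node: set_of_neighbors}`) with mutually comparable node ids. The helper names are hypothetical, and the exact ordering AutoEst uses for its "deterministic order" is not stated here, so largest-first with ties broken by smallest node is an assumption.

```python
from collections import deque
from typing import Dict, FrozenSet, Hashable, Iterable, List, Optional, Set


def induced_components(
    adj: Dict[Hashable, Set[Hashable]],
    nodes: Optional[Iterable[Hashable]] = None,
) -> List[FrozenSet[Hashable]]:
    """Connected components of the subgraph induced by `nodes` (all nodes if None)."""
    keep = set(adj) if nodes is None else set(nodes)
    seen: Set[Hashable] = set()
    comps: List[FrozenSet[Hashable]] = []
    for start in sorted(keep):
        if start in seen:
            continue
        seen.add(start)
        comp, queue = {start}, deque([start])
        while queue:  # breadth-first search restricted to `keep`
            v = queue.popleft()
            for u in adj[v]:
                if u in keep and u not in seen:
                    seen.add(u)
                    comp.add(u)
                    queue.append(u)
        comps.append(frozenset(comp))
    # Deterministic order (assumed): larger first, ties by smallest node.
    return sorted(comps, key=lambda c: (-len(c), min(c)))


def largest_component(adj: Dict[Hashable, Set[Hashable]]) -> FrozenSet[Hashable]:
    """Anchor: largest component; on a size tie, the one containing the smallest node."""
    comps = induced_components(adj)
    return comps[0] if comps else frozenset()


adj = {0: {1}, 1: {0}, 2: {3}, 3: {2}, 4: set()}
print(induced_components(adj, nodes=[0, 2, 3]))  # [frozenset({2, 3}), frozenset({0})]
print(largest_component(adj))                    # frozenset({0, 1})
```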
- fit() AutoEst[source]#
Run WL-1 refinement and compute approximate orbits.
- Returns:
The fitted estimator (self).
- Return type:
AutoEst
- property graph: Graph#
Underlying graph.
- Returns:
Graph passed to the constructor.
- Return type:
nx.Graph
- property max_iter: int#
Maximum number of WL refinement iterations.
- Returns:
Maximum refinement iterations.
- Return type:
int
- orbit_components(nodes: Iterable[Hashable] | None = None) List[FrozenSet[int]][source]#
Components of the orbit-quotient graph restricted to an induced subgraph.
1. Restrict to nodes (or all nodes).
2. Collapse nodes to their orbit ids.
3. Build the orbit-quotient adjacency from edges between orbits.
4. Return the connected components of orbit ids.
This is useful when you have an “anchor component” defined as a set of pattern nodes and want to treat coupled orbits as a single unit.
- Parameters:
nodes (iterable[hashable] or None) – Subset of nodes. If None, uses all nodes.
- Returns:
List of connected components in orbit-id space.
- Return type:
list[frozenset[int]]
- Raises:
RuntimeError – If not fitted.
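The four steps of orbit_components can be sketched in pure Python as below. This is an illustration under assumptions (hypothetical function name, adjacency-dict input, orbit ids taken as indices into the `orbits` list, orbits covering every node), not the library code.

```python
from typing import Dict, FrozenSet, Hashable, Iterable, List, Optional, Set


def orbit_quotient_components(
    adj: Dict[Hashable, Set[Hashable]],
    orbits: List[FrozenSet[Hashable]],
    nodes: Optional[Iterable[Hashable]] = None,
) -> List[FrozenSet[int]]:
    # 1. Restrict to the node subset (or all nodes).
    keep = set(adj) if nodes is None else set(nodes)
    # 2. Collapse nodes to orbit ids (the orbit's index in `orbits`).
    orbit_id = {v: i for i, orb in enumerate(orbits) for v in orb}
    present = {orbit_id[v] for v in keep}
    # 3. Quotient adjacency: orbits i and j are linked if an induced edge joins them.
    qadj: Dict[int, Set[int]] = {i: set() for i in present}
    for v in keep:
        for u in adj[v]:
            if u in keep and orbit_id[u] != orbit_id[v]:
                qadj[orbit_id[v]].add(orbit_id[u])
    # 4. Connected components in orbit-id space (iterative DFS).
    seen: Set[int] = set()
    comps: List[FrozenSet[int]] = []
    for start in sorted(present):
        if start in seen:
            continue
        comp: Set[int] = set()
        stack = [start]
        while stack:
            i = stack.pop()
            if i not in comp:
                comp.add(i)
                stack.extend(qadj[i] - comp)
        seen |= comp
        comps.append(frozenset(comp))
    return comps


# Path 0-1-2-3 with orbits {0, 3} (ends) and {1, 2} (middle).
path = {0: {1}, 1: {0, 2}, 2: {1, 3}, 3: {2}}
orbits = [frozenset({0, 3}), frozenset({1, 2})]
print(orbit_quotient_components(path, orbits))          # [frozenset({0, 1})]
print(orbit_quotient_components(path, orbits, [0, 2]))  # [frozenset({0}), frozenset({1})]
```

Restricting to {0, 2} removes every edge, so the two orbits are no longer coupled and come back as separate components.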
- synkit.Graph.Matcher.auto_est.estimate_automorphism_groups(graph: Graph, node_attrs: Iterable[str] | None = None, edge_attrs: Iterable[str] | None = None, max_iter: int = 10) AutoEst[source]#
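Convenience function to fit AutoEst.
- Parameters:
graph, node_attrs, edge_attrs, max_iter – Passed through to AutoEst (see the AutoEst parameters above).
- Returns:
Fitted estimator.
- Return type:
AutoEst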
automorphism.py#
Utility for computing graph automorphisms and pruning redundant sub-graph mappings equivalent under those symmetries.
This module provides the Automorphism helper, which computes the node-orbits of a graph and uses them to deduplicate subgraph-match mappings.
Key idea#
Group host nodes into orbits under the automorphism group of the host graph: two nodes are in the same orbit if there exists an automorphism \(\sigma\) such that \(\sigma(u) = v\).
Mappings are considered equivalent if they hit the same multiset of orbits on the host side, and a single representative is kept.
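A minimal sketch of this host-orbit deduplication, assuming host orbits are given as a list of frozensets and orbit ids are their list indices (the function name `dedup_by_host_orbits` is hypothetical). It encodes the multiset of orbits hit as a sorted tuple and keeps the first mapping per signature. In the usage, all four embeddings of a single edge into a 3-node path collapse to one, since each hits one end orbit and the middle orbit.

```python
from typing import Dict, FrozenSet, Hashable, Iterable, List


def dedup_by_host_orbits(
    mappings: Iterable[Dict[Hashable, Hashable]],
    host_orbits: List[FrozenSet[Hashable]],
) -> List[Dict[Hashable, Hashable]]:
    orbit_id = {h: i for i, orb in enumerate(host_orbits) for h in orb}
    seen, kept = set(), []
    for m in mappings:
        try:
            # Multiset of orbits hit, encoded as a sorted tuple of orbit ids.
            key = tuple(sorted(orbit_id[h] for h in m.values()))
        except KeyError as exc:
            raise ValueError(f"host node {exc.args[0]!r} is not in any orbit") from None
        if key not in seen:  # first mapping with this signature is the representative
            seen.add(key)
            kept.append(m)
    return kept


# Host: path 0-1-2 with orbits {0, 2} and {1}; pattern: a single edge a-b.
host_orbits = [frozenset({0, 2}), frozenset({1})]
mappings = [{"a": 0, "b": 1}, {"a": 1, "b": 0}, {"a": 1, "b": 2}, {"a": 2, "b": 1}]
print(dedup_by_host_orbits(mappings, host_orbits))  # [{'a': 0, 'b': 1}]
```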
Disconnected graphs#
If the host graph is disconnected, we choose one connected component as an anchor (by default the largest component) to suppress automorphisms that swap isomorphic components. We then compute automorphisms within each component independently and combine component orbits. The total number of automorphisms is the product of component automorphism counts (excluding component-permutation symmetries by design due to anchoring).
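To see what anchoring drops, the sketch below counts automorphisms by brute force over all permutations. That is feasible only for tiny graphs and is not how Automorphism computes them; the helper name is hypothetical. For two disjoint edges the full group has 8 elements, while the product of per-component counts is 2 × 2 = 4; the missing factor of 2 is exactly the swap of the two isomorphic components.

```python
from itertools import permutations
from math import prod


def count_automorphisms(nodes, edges):
    """Brute-force |Aut(G)| for a tiny simple graph (O(n!), illustration only)."""
    nodes = list(nodes)
    edge_set = {frozenset(e) for e in edges}
    count = 0
    for perm in permutations(nodes):
        sigma = dict(zip(nodes, perm))
        # sigma is a bijection, so mapping every edge onto an edge suffices.
        if all(frozenset(sigma[u] for u in e) in edge_set for e in edge_set):
            count += 1
    return count


# Two disjoint edges: components {0, 1} and {2, 3}.
components = [([0, 1], [(0, 1)]), ([2, 3], [(2, 3)])]
full = count_automorphisms([0, 1, 2, 3], [(0, 1), (2, 3)])
anchored = prod(count_automorphisms(n, e) for n, e in components)
print(full, anchored)  # 8 4
```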
- class synkit.Graph.Matcher.automorphism.Automorphism(graph: Graph, node_attr_keys: Sequence[str] | None = None, edge_attr_keys: Sequence[str] | None = None, *, anchor_largest_component: bool = True)[source]#
Bases: object
Analyze the automorphism group of a graph and prune sub-graph mappings that are equivalent under those symmetries.
Two nodes are in the same orbit if there exists an automorphism \(\sigma\) such that \(\sigma(u) = v\).
Parameters#
- graph : nx.Graph
The host graph for which to compute automorphisms.
- node_attr_keys : Sequence[str] | None, optional
Sequence of node attribute keys to respect in the automorphism computation (i.e., nodes must match on these attributes). Defaults to ("element", "charge").
- edge_attr_keys : Sequence[str] | None, optional
Sequence of edge attribute keys to respect in the automorphism computation. Defaults to ("order",).
- anchor_largest_component : bool, optional
If True and the graph is disconnected, chooses the largest connected component as an “anchor” to suppress automorphisms that swap isomorphic components. Defaults to True.
- property anchor_component: frozenset[int | str | Tuple | object] | None#
Anchor component used for disconnected graphs.
Returns#
- frozenset[NodeId] | None
Anchor component node-set, or None if the graph is connected.
- property components: List[frozenset[int | str | Tuple | object]]#
Connected components (weakly for directed graphs).
Returns#
- list[frozenset[NodeId]]
Components as frozensets of node IDs.
- property is_connected: bool#
Whether the host graph is connected (weakly for directed graphs).
Returns#
- bool
True if connected or has 0/1 nodes, else False.
- class synkit.Graph.Matcher.batch_cluster.BatchCluster(node_label_names: List[str] = ['element', 'charge'], node_label_default: List[Any] = ['*', 0], edge_attribute: str = 'order', backend: str = 'nx')[source]#
Bases: object
- available_backends() List[str][source]#
Return available backends: always includes ‘nx’; adds ‘rule’ if the ‘mod’ package is installed.
- static batch_dicts(input_list, batch_size)[source]#
Splits a list of dictionaries into batches of a specified size.
Args: input_list (list of dict): The list of dictionaries to be batched. batch_size (int): The size of each batch.
Returns: list of list of dict: A list where each element is a batch (sublist) of dictionaries.
Raises: ValueError: If batch_size is less than 1.
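The batching contract above amounts to consecutive slicing. A minimal stand-alone sketch of that behavior (a re-statement of the documented contract, not the library source):

```python
from typing import Any, Dict, List


def batch_dicts(input_list: List[Dict[str, Any]], batch_size: int) -> List[List[Dict[str, Any]]]:
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    # Consecutive slices; the last batch holds the remainder.
    return [input_list[i : i + batch_size] for i in range(0, len(input_list), batch_size)]


print(batch_dicts([{"id": i} for i in range(5)], 2))
# [[{'id': 0}, {'id': 1}], [{'id': 2}, {'id': 3}], [{'id': 4}]]
```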
- cluster(data: List[Dict], templates: List[Dict], rule_key: str = 'gml', attribute_key: str = 'WLHash') Tuple[List[Dict], List[Dict]][source]#
Processes a list of graph data entries, classifying each based on existing templates.
Parameters: - data (List[Dict]): A list of dictionaries, each representing a graph or rule to be classified. - templates (List[Dict]): Dynamic templates used for categorization. - rule_key (str): Key to access the graph or rule data. - attribute_key (str): Key to access the attribute used for filtering.
Returns: - Tuple[List[Dict], List[Dict]]: A tuple containing the list of classified data and the updated templates.
- fit(data: List[Dict], templates: List[Dict], rule_key: str = 'gml', attribute_key: str = 'WLHash', batch_size: int | None = None) Tuple[List[Dict], List[Dict]][source]#
Processes and classifies data in batches. Uses GraphCluster for initial processing and a stratified sampling technique to update templates if there is only one batch and no initial templates are provided.
Parameters: - data (List[Dict]): Data to process. - templates (List[Dict]): Templates for categorization. - rule_key (str): Key to access rule or graph data. - attribute_key (str): Key to access attributes used for filtering. - batch_size (Optional[int]): Size of batches for processing, if not provided, processes all data at once.
Returns: - Tuple[List[Dict], List[Dict]]: The processed data and the potentially updated templates.
- lib_check(data: Dict, templates: List[Dict], rule_key: str = 'gml', attribute_key: str = 'signature', nodeMatch: Callable | None = None, edgeMatch: Callable | None = None) Dict[source]#
Checks and classifies a graph or rule based on existing templates using either graph or rule isomorphism.
Parameters: - data (Dict): A dictionary representing a graph or rule with its attributes and classification. - templates (List[Dict]): Dynamic templates used for categorization. If None, initializes to an empty list. - rule_key (str): Key to access the graph or rule data within the dictionary. - attribute_key (str): An attribute used to filter templates before isomorphism check. - nodeMatch (Optional[Callable]): A function to match nodes, defaults to a predefined generic_node_match. - edgeMatch (Optional[Callable]): A function to match edges, defaults to a predefined generic_edge_match.
Returns: - Dict: The updated dictionary with its classification.
- synkit.Graph.Matcher.dedup_matches.deduplicate_matches_with_anchor(matches: Iterable[Dict[int, int]], *, pattern_orbits: Iterable[FrozenSet[int]] | None = None, pattern_anchor: FrozenSet[int] | None = None, host_orbits: Iterable[FrozenSet[int]] | None = None, host_anchor: FrozenSet[int] | None = None) List[Dict[int, int]][source]#
Deduplicate pattern→host matches with optional anchor-aware symmetry breaking on both pattern and host sides.
This function supports partial mappings: a match may map only a subset of pattern nodes. Orbit-based signatures are computed using only orbit nodes present in each mapping.
Rules#
Matches are always interpreted as pattern → host mappings.
- If pattern_orbits is provided:
  - Pattern nodes inside pattern_anchor are fixed when present.
  - Pattern orbits disjoint from the anchor are deduplicated up to permutation, with host-side symmetry optionally collapsed by host_orbits.
- If pattern_orbits is None and host_orbits is provided: deduplicate by the multiset of host orbits hit (mapping values only).
- If both orbit arguments are None, return matches unchanged.
- param matches:
Iterable of pattern → host mapping dictionaries.
- param pattern_orbits:
Automorphism orbits of the pattern graph (optional).
- param pattern_anchor:
Anchor component of the pattern graph (optional).
- param host_orbits:
Automorphism orbits of the host graph (optional).
- param host_anchor:
Anchor component of the host graph (optional). Kept for API symmetry; host anchoring is handled indirectly by orbit collapsing.
- returns:
Deduplicated list of pattern → host mappings, preserving input order.
- raises ValueError:
If host_orbits is provided but a mapping contains a host node not covered by any host orbit.
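The rules above can be sketched as follows. This is a hypothetical re-implementation for illustration, not synkit's code; in particular, treating a pattern orbit that merely touches `pattern_anchor` as entirely fixed, and treating pattern nodes outside every orbit as fixed, are assumptions. `host_anchor` is omitted because the docs say host anchoring is handled through orbit collapsing.

```python
from typing import Dict, FrozenSet, Iterable, List, Optional


def dedup_with_anchor(
    matches: Iterable[Dict[int, int]],
    *,
    pattern_orbits: Optional[Iterable[FrozenSet[int]]] = None,
    pattern_anchor: Optional[FrozenSet[int]] = None,
    host_orbits: Optional[Iterable[FrozenSet[int]]] = None,
) -> List[Dict[int, int]]:
    matches = list(matches)
    if pattern_orbits is None and host_orbits is None:
        return matches  # nothing to collapse
    host_id = None
    if host_orbits is not None:
        host_id = {h: i for i, orb in enumerate(host_orbits) for h in orb}

    def hkey(h: int) -> int:
        # Collapse host-side symmetry to an orbit id when host orbits are given.
        if host_id is None:
            return h
        if h not in host_id:
            raise ValueError(f"host node {h!r} is not covered by any host orbit")
        return host_id[h]

    use_pattern = pattern_orbits is not None
    p_orbits = [frozenset(o) for o in pattern_orbits] if use_pattern else []
    anchor = frozenset(pattern_anchor or ())
    seen, kept = set(), []
    for m in matches:
        hk = {p: hkey(h) for p, h in m.items()}  # also validates host coverage
        if not use_pattern:
            sig = tuple(sorted(hk.values()))  # multiset of host orbits hit
        else:
            fixed, free, covered = [], [], set()
            for idx, orb in enumerate(p_orbits):
                present = sorted(p for p in orb if p in m)  # partial mappings allowed
                covered.update(present)
                if not present:
                    continue
                if orb & anchor:
                    fixed.extend((p, m[p]) for p in present)  # anchored: exact assignment
                else:
                    free.append((idx, tuple(sorted(hk[p] for p in present))))  # up to permutation
            fixed.extend((p, m[p]) for p in m if p not in covered)  # nodes outside every orbit
            sig = (tuple(sorted(fixed)), tuple(free))
        if sig not in seen:
            seen.add(sig)
            kept.append(m)
    return kept


orbits = [frozenset({0, 2}), frozenset({1})]  # pattern: path 0-1-2
m1, m2 = {0: 10, 1: 11, 2: 12}, {0: 12, 1: 11, 2: 10}
print(len(dedup_with_anchor([m1, m2], pattern_orbits=orbits)))  # 1
print(len(dedup_with_anchor([m1, m2], pattern_orbits=orbits, pattern_anchor=frozenset({0, 1, 2}))))  # 2
```

With the whole pattern anchored, the two mirror-image matches are both kept, which is the "no pruning inside anchored nodes" behavior.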
- class synkit.Graph.Matcher.graph_cluster.GraphCluster(node_label_names: List[str] = ['element', 'charge'], node_label_default: List[Any] = ['*', 0], edge_attribute: str = 'order', backend: str = 'nx')[source]#
Bases: object
- available_backends() List[str][source]#
Return available backends: always includes ‘nx’; adds ‘rule’ if the ‘mod’ package is installed.
- fit(data: List[Dict], rule_key: str = 'gml', attribute_key: str = 'WLHash', strip: bool = False) List[Dict][source]#
Automatically clusters the rules and assigns them cluster indices based on the similarity, potentially using provided templates for clustering, or generating new templates.
Parameters: - data (List[Dict]): A list containing dictionaries, each representing a rule along with metadata. - rule_key (str): The key in the dictionaries under data where the rule data is stored. - attribute_key (str): The key in the dictionaries under data where rule attributes are stored.
Returns: - List[Dict]: Updated list of dictionaries with an added ‘class’ key for cluster identification.
- iterative_cluster(rules: List[str], attributes: List[Any] | None = None, nodeMatch: Callable | None = None, edgeMatch: Callable | None = None) Tuple[List[Set[int]], Dict[int, int]][source]#
Clusters rules based on their similarities, which could include structural or attribute-based similarities depending on the given attributes.
Parameters: - rules (List[str]): List of rules, potentially serialized strings of rule representations. - attributes (Optional[List[Any]]): Attributes associated with each rule for preliminary comparison, e.g., labels or properties.
Returns: - Tuple[List[Set[int]], Dict[int, int]]: A tuple containing a list of sets (clusters), where each set contains indices of rules in the same cluster, and a dictionary mapping each rule index to its cluster index.
- class synkit.Graph.Matcher.graph_matcher.GraphMatcherEngine(*, backend: str = 'nx', node_attrs: List[str] | None = None, edge_attrs: List[str] | None = None, wl1_filter: bool = False, max_mappings: int | None = 1)[source]#
Bases: object
Reusable engine for (sub‑)graph isomorphism checks & embeddings.
Parameters#
- backend:
  - "nx" (default): pure‑Python implementation that relies on GraphMatcher.
  - "rule": optional, requires the third‑party mod package.
- node_attrs, edge_attrs:
Lists of attribute keys used for matching. hcount and lone_pairs are treated specially: the host value must be ≥ the pattern value. Other requested attributes, including radical, match exactly.
- wl1_filter:
If True, a fast WL‑based colour refinement pre‑filter discards host graphs that cannot possibly contain the pattern.
- max_mappings:
Upper bound on the number of mappings to enumerate in get_mappings(). None means “no limit”.
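The attribute rule for node_attrs can be expressed as a node-match predicate. The sketch below is an illustration under assumptions (hypothetical name `node_attrs_match`; missing counts treated as 0), not GraphMatcherEngine's internal comparator. Its argument order (host attributes first) mirrors how networkx's GraphMatcher calls `node_match(G1_attrs, G2_attrs)` when the host is G1.

```python
from typing import Any, Dict, Iterable

# Attributes where the host may carry more than the pattern (per the docs above).
AT_LEAST = {"hcount", "lone_pairs"}


def node_attrs_match(host: Dict[str, Any], pattern: Dict[str, Any], keys: Iterable[str]) -> bool:
    for key in keys:
        if key in AT_LEAST:
            # Assumption: a missing count is treated as 0.
            if host.get(key, 0) < pattern.get(key, 0):
                return False
        elif host.get(key) != pattern.get(key):  # everything else, e.g. radical, must match exactly
            return False
    return True


keys = ["element", "hcount", "radical"]
host = {"element": "C", "hcount": 3, "radical": 0}
pattern = {"element": "C", "hcount": 2, "radical": 0}
print(node_attrs_match(host, pattern, keys))  # True
print(node_attrs_match(pattern, host, keys))  # False: the "host" now has fewer H than required
```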
- synkit.Graph.Matcher.graph_morphism.find_graph_isomorphism(G1: Graph | DiGraph | MultiGraph | MultiDiGraph, G2: Graph | DiGraph | MultiGraph | MultiDiGraph, node_match: Callable[[Dict[str, Any], Dict[str, Any]], bool] | None = None, edge_match: Callable[[Dict[str, Any], Dict[str, Any]], bool] | None = None, use_defaults: bool = True, fast_invariant_check: bool = True, logger: Logger | None = None) Dict[Any, Any] | None[source]#
Check whether two graphs are isomorphic and return the node-mapping.
- Parameters:
G1 (nx.Graph | nx.DiGraph | nx.MultiGraph | nx.MultiDiGraph) – The first NetworkX graph to compare.
G2 (nx.Graph | nx.DiGraph | nx.MultiGraph | nx.MultiDiGraph) – The second NetworkX graph to compare.
node_match (callable or None) – Optional function taking two node attribute dicts and returning True if they match.
edge_match (callable or None) – Optional function taking two edge attribute dicts and returning True if they match.
use_defaults (bool) – Whether to use default matchers when None.
fast_invariant_check (bool) – Perform quick node/edge count and degree sequence checks prior to matcher.
logger (logging.Logger or None) – Logger for debug messages. Defaults to root logger.
- Returns:
A dict mapping nodes in G1 to nodes in G2 if isomorphic; otherwise None.
- Return type:
dict[Any, Any] or None
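The fast_invariant_check option describes cheap necessary conditions. A minimal sketch of those three checks on adjacency dicts (hypothetical helper; simple undirected graphs assumed). Passing them does not prove isomorphism: a 6-cycle and two disjoint triangles agree on all three.

```python
from collections import Counter
from typing import Dict, Hashable, Set


def passes_invariants(g1: Dict[Hashable, Set[Hashable]], g2: Dict[Hashable, Set[Hashable]]) -> bool:
    """Cheap necessary conditions for isomorphism of two simple undirected graphs."""
    if len(g1) != len(g2):  # node counts
        return False
    edges1 = sum(len(nbrs) for nbrs in g1.values()) // 2
    edges2 = sum(len(nbrs) for nbrs in g2.values()) // 2
    if edges1 != edges2:  # edge counts
        return False
    # Degree sequences compared as multisets.
    return Counter(len(n) for n in g1.values()) == Counter(len(n) for n in g2.values())


path4 = {0: {1}, 1: {0, 2}, 2: {1, 3}, 3: {2}}
star4 = {0: {1, 2, 3}, 1: {0}, 2: {0}, 3: {0}}
print(passes_invariants(path4, star4))  # False: same counts, different degrees
hexagon = {i: {(i - 1) % 6, (i + 1) % 6} for i in range(6)}
two_triangles = {0: {1, 2}, 1: {0, 2}, 2: {0, 1}, 3: {4, 5}, 4: {3, 5}, 5: {3, 4}}
print(passes_invariants(hexagon, two_triangles))  # True, yet not isomorphic
```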
- synkit.Graph.Matcher.graph_morphism.graph_isomorphism(graph_1: Graph, graph_2: Graph, node_match: Callable | None = None, edge_match: Callable | None = None, use_defaults: bool = False) bool[source]#
Determines if two graphs are isomorphic, considering provided node and edge matching functions. Uses default matching settings if none are provided.
Parameters: - graph_1 (nx.Graph): The first graph to compare. - graph_2 (nx.Graph): The second graph to compare. - node_match (Optional[Callable]): The function used to match nodes. Uses default if None. - edge_match (Optional[Callable]): The function used to match edges. Uses default if None.
Returns: - bool: True if the graphs are isomorphic, False otherwise.
- synkit.Graph.Matcher.graph_morphism.heuristics_MCCS(graphs: List[Graph], node_label_names: List[str] = ['element', 'charge'], node_label_default: List[Any] = ['*', 0], edge_attribute: str = 'standard_order') Graph[source]#
Computes the Maximum Connected Common Subgraph (MCCS) over a list of graphs using a heuristic approach.
This function computes the MCCS between the first two graphs using the maximum_connected_common_subgraph function based on customizable node and edge attributes. For more than two graphs, it iteratively updates the common subgraph by calculating the MCCS between the current common subgraph and each subsequent graph. An early exit occurs if the intermediate common subgraph becomes empty.
Parameters: - graphs (List[nx.Graph]): A list of networkx graphs for which the common subgraph is to be computed. - node_label_names (List[str]): List of node attribute names used for matching. - node_label_default (List[Any]): Default values for missing node attributes. - edge_attribute (str): The edge attribute to compare.
Returns: - nx.Graph: The maximum connected subgraph common to all provided graphs. If no common subgraph exists, an empty graph is returned.
Raises: - ValueError: If the input list of graphs is empty.
- synkit.Graph.Matcher.graph_morphism.maximum_connected_common_subgraph(graph_1: Graph, graph_2: Graph, node_label_names: List[str] = ['element', 'charge'], node_label_default: List[Any] = ['*', 0], edge_attribute: str = 'standard_order') Graph[source]#
Computes the largest connected common subgraph (MCS) between two graphs using subgraph isomorphism based on customizable node and edge attributes.
The function iterates over subsets of nodes from the smaller graph—starting from the largest possible subgraph size down to 1—and returns the first (largest) candidate that is connected and is isomorphic to a subgraph of the larger graph.
Parameters: - graph_1 (nx.Graph): The first graph for comparison. - graph_2 (nx.Graph): The second graph for comparison. - node_label_names (List[str]): List of node attribute names used for matching. - node_label_default (List[Any]): Default values for missing node attributes. - edge_attribute (str): The edge attribute to compare.
Returns: - nx.Graph: A graph representing the largest connected common subgraph found; if none exists, returns an empty graph.
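The descending-subset search described above can be sketched by brute force. This toy version is an illustration under heavy assumptions rather than the library routine: it uses string node labels only (no edge attributes), represents graphs as a label dict plus a set of frozenset edges, and replaces the networkx subgraph-isomorphism check with an exhaustive induced-embedding test, so it is only practical for a handful of nodes. All names are hypothetical.

```python
from itertools import combinations, permutations
from typing import Dict, FrozenSet, Hashable, Set, Tuple

# Hypothetical representation: (node -> label, set of undirected edges).
Graph = Tuple[Dict[Hashable, str], Set[FrozenSet[Hashable]]]


def _connected(nodes, edges) -> bool:
    nodes = set(nodes)
    start = next(iter(nodes))
    seen, stack = {start}, [start]
    while stack:
        v = stack.pop()
        for e in edges:
            if v in e:
                (u,) = e - {v}  # assumes no self-loops
                if u in nodes and u not in seen:
                    seen.add(u)
                    stack.append(u)
    return seen == nodes


def _embeds(sub, small: Graph, large: Graph) -> bool:
    """Is small[sub] (induced) isomorphic to some induced subgraph of `large`?"""
    (lab_s, e_s), (lab_l, e_l) = small, large
    for image in permutations(lab_l, len(sub)):
        phi = dict(zip(sub, image))
        if any(lab_s[v] != lab_l[phi[v]] for v in sub):
            continue  # node labels must agree
        if all((frozenset((u, v)) in e_s) == (frozenset((phi[u], phi[v])) in e_l)
               for u, v in combinations(sub, 2)):
            return True
    return False


def brute_force_mccs(g1: Graph, g2: Graph) -> FrozenSet[Hashable]:
    small, large = (g1, g2) if len(g1[0]) <= len(g2[0]) else (g2, g1)
    nodes = sorted(small[0])
    for k in range(len(nodes), 0, -1):  # largest candidate size first
        for sub in combinations(nodes, k):
            if _connected(sub, small[1]) and _embeds(sub, small, large):
                return frozenset(sub)  # node ids of the smaller graph
    return frozenset()


tri = ({0: "C", 1: "C", 2: "O"}, {frozenset({0, 1}), frozenset({1, 2}), frozenset({0, 2})})
chain = ({10: "C", 11: "C", 12: "O"}, {frozenset({10, 11}), frozenset({11, 12})})
print(brute_force_mccs(tri, chain))  # frozenset({0, 1})
```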
- synkit.Graph.Matcher.graph_morphism.subgraph_isomorphism(child_graph: Graph, parent_graph: Graph, node_label_names: List[str] = ['element', 'charge'], node_label_default: List[Any] = ['*', 0], edge_attribute: str = 'order', use_filter: bool = False, check_type: str = 'induced', node_comparator: Callable[[Any, Any], bool] | None = None, edge_comparator: Callable[[Any, Any], bool] | None = None) bool[source]#
Checks whether the child graph is subgraph-isomorphic to the parent graph, based on customizable node and edge attributes.
Parameters: - child_graph (nx.Graph): The child graph. - parent_graph (nx.Graph): The parent graph. - node_label_names (List[str]): Labels to compare. - node_label_default (List[Any]): Defaults for missing node labels. - edge_attribute (str): The edge attribute to compare. - use_filter (bool): Whether to use pre-filters based on node and edge count. - check_type (str): “induced” (default) or “monomorphism” for the type of subgraph matching. - node_comparator (Callable[[Any, Any], bool]): Custom comparator for node attributes. - edge_comparator (Callable[[Any, Any], bool]): Custom comparator for edge attributes.
Returns: - bool: True if subgraph isomorphism is found, False otherwise.
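The difference between the two check_type modes is easiest to see on unlabeled graphs. In this brute-force sketch (hypothetical function; labels and edge attributes ignored), a 3-node path embeds into a triangle as a monomorphism but not as an induced subgraph, because the path's non-edge would land on a triangle edge.

```python
from itertools import combinations, permutations
from typing import FrozenSet, Hashable, List, Set


def subgraph_match(child_nodes: List[Hashable], child_edges: Set[FrozenSet[Hashable]],
                   parent_nodes: List[Hashable], parent_edges: Set[FrozenSet[Hashable]],
                   check_type: str = "induced") -> bool:
    if check_type not in ("induced", "monomorphism"):
        raise ValueError(f"unknown check_type: {check_type!r}")
    for image in permutations(parent_nodes, len(child_nodes)):
        phi = dict(zip(child_nodes, image))
        ok = True
        for u, v in combinations(child_nodes, 2):
            in_child = frozenset((u, v)) in child_edges
            in_parent = frozenset((phi[u], phi[v])) in parent_edges
            # Monomorphism: child edges must exist in the parent.
            # Induced: additionally, child non-edges must stay non-edges.
            if (in_child and not in_parent) or (check_type == "induced" and in_parent and not in_child):
                ok = False
                break
        if ok:
            return True
    return False


path3 = ([0, 1, 2], {frozenset({0, 1}), frozenset({1, 2})})
triangle = (["a", "b", "c"], {frozenset({"a", "b"}), frozenset({"b", "c"}), frozenset({"a", "c"})})
print(subgraph_match(*path3, *triangle, check_type="monomorphism"))  # True
print(subgraph_match(*path3, *triangle, check_type="induced"))       # False
```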
mcs_matcher.py — Maximum/Common Subgraph Matcher#
A convenience wrapper around networkx.algorithms.isomorphism.GraphMatcher that finds all common-subgraph (or maximum-common-subgraph) node mappings between two molecular graphs.
Highlights#
- Flexible node matching via generic_node_match().
- Multi-attribute edge matching via a list of edge_attrs.
- Optional wildcard pruning: if prune_wc=True, nodes with attrs[element_key] == wildcard_element are removed (non-inplace) from both graphs before searching.
- Optional automorphism pruning: if prune_automorphisms=True, mappings that cover the same host-node set are collapsed, greatly reducing equivalent mappings from symmetric subgraphs.
- Results are cached: call get_mappings() (or use the mappings property) to retrieve them.
- Can report mappings as pattern→host, G1→G2 or G2→G1 via the get_mappings() helper.
- Helpful help property and __repr__() utilities, in the same OOP style as PartialMatcher.
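The two optional pruning steps can be sketched in a few lines. These hypothetical helpers operate on a node-attribute dict plus a set of frozenset edges rather than on networkx graphs: wildcard pruning returns new containers and keeps original node ids, and automorphism pruning keeps the first mapping for each host-node image.

```python
from typing import Any, Dict, FrozenSet, Hashable, List, Set, Tuple


def prune_wildcards(
    nodes: Dict[Hashable, Dict[str, Any]],
    edges: Set[FrozenSet[Hashable]],
    element_key: str = "element",
    wildcard_element: Any = "*",
) -> Tuple[Dict[Hashable, Dict[str, Any]], Set[FrozenSet[Hashable]]]:
    # Build new containers (non-inplace); surviving nodes keep their original ids.
    kept = {n: a for n, a in nodes.items() if a.get(element_key) != wildcard_element}
    return kept, {e for e in edges if all(n in kept for n in e)}


def collapse_by_host_set(mappings: List[Dict[Hashable, Hashable]]) -> List[Dict[Hashable, Hashable]]:
    seen, kept = set(), []
    for m in mappings:
        image = frozenset(m.values())  # the part of the host that is covered
        if image not in seen:
            seen.add(image)
            kept.append(m)
    return kept


nodes = {0: {"element": "C"}, 1: {"element": "*"}, 2: {"element": "O"}}
edges = {frozenset({0, 1}), frozenset({1, 2})}
print(prune_wildcards(nodes, edges))  # ({0: {'element': 'C'}, 2: {'element': 'O'}}, set())
maps = [{"a": 1, "b": 2}, {"a": 2, "b": 1}, {"a": 2, "b": 3}]
print(collapse_by_host_set(maps))  # [{'a': 1, 'b': 2}, {'a': 2, 'b': 3}]
```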
Public API#
- MCSMatcher(node_attrs=None, node_defaults=None, allow_shift=True, edge_attrs=None, prune_wc=False, prune_automorphisms=False, wildcard_element='*', element_key='element')
Construct a matcher instance.
- matcher.find_common_subgraph(G1, G2, mcs=False, mcs_mol=False)
Run the search (stores and returns self). If mcs_mol=True, match by entire connected components (molecule-level matching).
- matcher.get_mappings(direction='pattern_to_host')
Retrieve the stored mapping list. direction can be one of "pattern_to_host", "G1_to_G2", "G2_to_G1".
- matcher.mappings
Shorthand for get_mappings(direction='pattern_to_host').
- matcher.mapping_direction
String describing internal orientation: "G1_to_G2", "G2_to_G1", or "unknown" if no search has been run yet.
- matcher.find_rc_mapping(rc1, rc2, side='op', mcs=True, mcs_mol=False, component=True)
Convenience wrapper for ITS reaction-centre or ITS-like graph objects (via synkit.Graph.ITS.its_decompose() when applicable). side chooses which ITS sides to compare:
  - 'r': compare right sides (r1 vs r2)
  - 'l': compare left sides (l1 vs l2)
  - 'op': compare opposite sides (r1 vs l2)
  - 'its': treat rc1 and rc2 directly as graphs (no decomposition)
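Since the cache is always pattern → host, re-orienting only requires knowing which input graph was the pattern and inverting the dicts when needed. A minimal sketch, with a hypothetical `orient` function and a `pattern_is_g1` flag standing in for mapping_direction:

```python
from typing import Dict, List


def orient(mappings: List[Dict[int, int]], direction: str, pattern_is_g1: bool) -> List[Dict[int, int]]:
    if direction == "pattern_to_host":
        return [dict(m) for m in mappings]  # internal orientation, copied
    if direction not in ("G1_to_G2", "G2_to_G1"):
        raise ValueError(f"unsupported direction: {direction!r}")
    want_g1_as_source = direction == "G1_to_G2"
    # Stored dicts already have the right orientation when the pattern is the requested source.
    if want_g1_as_source == pattern_is_g1:
        return [dict(m) for m in mappings]
    return [{h: p for p, h in m.items()} for m in mappings]  # invert pattern -> host


stored = [{0: 10, 1: 11}]  # pattern -> host, where the pattern was G2
print(orient(stored, "G1_to_G2", pattern_is_g1=False))  # [{10: 0, 11: 1}]
```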
- class synkit.Graph.Matcher.mcs_matcher.MCSMatcher(node_attrs: List[str] | None = None, node_defaults: List[Any] | None = None, allow_shift: bool = True, *, edge_attrs: List[str] | None = None, prune_wc: bool = False, prune_automorphisms: bool = False, wildcard_element: Any = '*', element_key: str = 'element')[source]#
Bases: object
Common / maximum-common subgraph matcher.
This class wraps networkx.algorithms.isomorphism.GraphMatcher to provide higher-level utilities for computing sets of common subgraphs between two graphs, with a focus on molecular graphs (atoms/bonds).
Node matching is controlled via generic_node_match() using one or more attribute names and default values. Edge matching compares one or more scalar edge attributes (e.g. bond order) specified in edge_attrs.
Mappings and orientation#
Internally, mappings are always stored as pattern → host, where the pattern is the smaller of the two graphs (after optional wildcard pruning). The helper get_mappings() can convert these to G1→G2 or G2→G1 orientation as needed, and the mapping_direction property exposes which graph acted as the pattern.
Optional wildcard pruning#
If prune_wc is True, nodes with attrs[element_key] == wildcard_element are removed from both input graphs non-inplace before the MCS search. Node labels are preserved, so the resulting mappings still reference the original node ids.
Optional automorphism pruning#
If prune_automorphisms is True, mappings that induce the same host node set (i.e. the same image in the host graph) are collapsed. This is especially useful for highly symmetric hosts (rings, repeated subunits, etc.) where many mappings are equivalent at the level of “which part of the host is covered”.
- param node_attrs:
Node attribute keys to compare. If None, defaults to ["element"].
- type node_attrs:
list[str] | None
- param node_defaults:
Fallback values for each node attribute when missing. If None, defaults to a list of "*" of the same length as node_attrs.
- type node_defaults:
list[Any] | None
- param allow_shift:
Placeholder for future asymmetric rules. Currently unused but kept for API compatibility.
- type allow_shift:
bool
- param edge_attrs:
Edge attribute keys to use for scalar comparison (e.g. ["order"] or ["order", "standard_order"]). If None, defaults to ["order"].
- type edge_attrs:
list[str] | None
- param prune_wc:
If True, strip wildcard nodes (see wildcard_element, element_key) from both graphs before searching.
- type prune_wc:
bool
- param prune_automorphisms:
If True, collapse mappings that have the same host node set (automorphism pruning).
- type prune_automorphisms:
bool
- param wildcard_element:
Attribute value denoting wildcard nodes (typically "*", used together with element_key).
- type wildcard_element:
Any
- param element_key:
Node attribute key used to detect wildcard nodes when prune_wc is True.
- type element_key:
str
- find_common_subgraph(G1: Graph, G2: Graph, *, mcs: bool = False, mcs_mol: bool = False) MCSMatcher[source]#
Search for common subgraphs between two graphs.
The results are cached in the mappings and last_size properties. The method returns self to enable a fluent style.
If prune_wc is True, wildcard nodes are stripped (non-inplace) from both graphs before the search.
- Parameters:
G1 (nx.Graph) – First input graph.
G2 (nx.Graph) – Second input graph.
mcs (bool) – If True, restrict to maximum-common-subgraph mappings (largest possible node count).
mcs_mol (bool) – If True, perform connected-component (molecule-level) matching using _find_mcs_mol(). In this mode, mcs is ignored.
- Returns:
The matcher instance (with internal cache updated).
- Return type:
MCSMatcher
- find_rc_mapping(rc1: Any, rc2: Any, *, side: str = 'op', mcs: bool = True, mcs_mol: bool = False, component: bool = True) MCSMatcher[source]#
Convenience wrapper for ITS reaction-centre or ITS-like graph objects.
Depending on side, this either uses synkit.Graph.ITS.its_decompose() to obtain left/right graphs or treats the inputs directly as graphs.
Side selection#
- 'r' → compare right sides: r1 vs r2.
- 'l' → compare left sides: l1 vs l2.
- 'op' → compare opposite sides: r1 vs l2.
- 'its' → treat rc1 and rc2 directly as graphs (no decomposition), useful when the inputs are already ITS (or ITS-like) networkx.Graph objects.
Component-wise mode#
If component is True, the selected graphs are decomposed into connected components, sorted by size (descending), and matched pairwise (largest with largest, etc.) using a common-/maximum-common-subgraph search for each pair. The resulting mappings are combined into a single G1 → G2 mapping in terms of the original node ids. In this mode, mcs_mol is ignored.
- param rc1:
First reaction-centre or ITS-like graph object.
- type rc1:
Any
- param rc2:
Second reaction-centre or ITS-like graph object.
- type rc2:
Any
- param side:
Which ITS sides to compare ('r', 'l', 'op', or 'its').
- type side:
str
- param mcs:
If True, restrict to maximum-common-subgraph mappings (for the whole graph or per-component in component-wise mode).
- type mcs:
bool
- param mcs_mol:
If True, use connected-component matching via _find_mcs_mol(). Ignored if component is True.
- type mcs_mol:
bool
- param component:
If True, perform size-sorted, component-wise MCS between the selected sides and combine the per-component mappings into a single mapping.
- type component:
bool
- returns:
The matcher instance (with internal cache updated).
- rtype:
MCSMatcher
- raises ImportError:
If synkit ITS utilities are not available for side in {'r', 'l', 'op'}.
- raises ValueError:
If side is not one of 'r', 'l', 'op', 'its'.
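The component-wise mode of find_rc_mapping can be sketched independently of the matcher itself. Here `match_pair` is a hypothetical stand-in for the per-pair common-subgraph search, and the tie-break used when two components have equal size is an assumption. The combined dict keeps original node ids, as described above.

```python
from typing import Callable, Dict, FrozenSet, Hashable, List

PairMatcher = Callable[[FrozenSet[Hashable], FrozenSet[Hashable]], Dict[Hashable, Hashable]]


def _size_key(comp: FrozenSet[Hashable]):
    # Largest first; equal sizes ordered by the smallest node repr (assumption).
    return (-len(comp), min(map(repr, comp)))


def componentwise_mapping(
    comps1: List[FrozenSet[Hashable]],
    comps2: List[FrozenSet[Hashable]],
    match_pair: PairMatcher,
) -> Dict[Hashable, Hashable]:
    combined: Dict[Hashable, Hashable] = {}
    # Pair largest with largest, and so on; zip stops at the shorter list.
    for c1, c2 in zip(sorted(comps1, key=_size_key), sorted(comps2, key=_size_key)):
        combined.update(match_pair(c1, c2))  # per-pair G1 -> G2 mapping, original ids
    return combined


def toy_match(c1, c2):
    # Stand-in matcher: pair nodes positionally after sorting.
    return dict(zip(sorted(c1), sorted(c2)))


result = componentwise_mapping(
    [frozenset({0}), frozenset({1, 2, 3})],
    [frozenset({20, 21}), frozenset({10, 11, 12})],
    toy_match,
)
print(result)  # {1: 10, 2: 11, 3: 12, 0: 20}
```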
- get_mappings(direction: str = 'pattern_to_host') List[Dict[int, int]][source]#
Return a copy of the cached mapping list in the requested orientation.
Internal cache is pattern → host (where the pattern is the smaller graph after pruning). This method can convert to the original G1→G2 or G2→G1 orientation based on the last call to find_common_subgraph().
- Parameters:
direction (str) – Orientation of the returned mappings. One of:
  - "pattern_to_host" (default): internal orientation.
  - "G1_to_G2": mapping from first input graph to second.
  - "G2_to_G1": mapping from second input graph to first.
- Returns:
List of node-mapping dictionaries.
- Return type:
list[dict[int, int]]
- Raises:
ValueError – If direction is not supported.
- property help: str#
Return the module-level documentation string.
- Returns:
The full module docstring, if available.
- Return type:
str
- property last_size: int#
Number of nodes in the most recent maximum mapping set.
This is the size of the largest mapping found in the last call to find_common_subgraph() (or zero if no mappings exist).
- Returns:
Size of the largest mapping.
- Return type:
int
- property mapping_direction: str#
Human-readable description of internal mapping orientation.
- Returns:
"G1_to_G2", "G2_to_G1", or "unknown" if no search has been run yet.
- Return type:
str
- class synkit.Graph.Matcher.multi_turbo_iso.MultiTurboISO(hosts: List[Graph], node_label: str | List[str] = 'label', edge_label: str | List[str] | None = None, distance_threshold: int = 5000)[source]#
Bases: object
Accelerated sub-graph search across a batch of host graphs.
Builds a single global signature bucket over all hosts and reuses a lightweight TurboISO matcher per host. For each query graph, hosts are first pruned by a signature + degree filter, and then TurboISO’s backtracking is run only on the surviving hosts.
- Parameters:
hosts (List[nx.Graph]) – List of host graphs to index.
node_label (str or list[str]) – Node attribute(s) used for signature matching.
edge_label (str or list[str] or None) – Edge attribute(s) to match; pass None to ignore edges.
distance_threshold (int) – Skip distance filtering if the candidate pool is smaller than this threshold.
- Returns:
An instance of MultiTurboISO with global index built.
- Return type:
MultiTurboISO
- search_many(patterns: List[Graph], *, prune: bool = False) List[Dict[int, bool | List[Dict[Any, Any]]]][source]#
Match a list of pattern graphs.
Returns a list of per‑pattern dictionaries in the same order as the input list.
- search_one(Q: Graph, *, prune: bool = False) Dict[int, bool | List[Dict[Any, Any]]][source]#
Match a single pattern graph Q against every host.
Parameters#
- Q : nx.Graph
Query / pattern graph.
- prune : bool, default False
Forwarded to TurboISO. If True, return just a boolean per host (‘found?’), otherwise return the full list of mappings.
Returns#
- dict
{host_idx: result}, where result is a bool if prune is True, else a list of node‑mapping dicts.
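The signature + degree pre-filter can be illustrated with a simplified version. The exact signature MultiTurboISO builds is not documented in this section, so this sketch assumes per-label node counts plus a maximum-degree bound; the function name and the (labels, adjacency) tuple format are hypothetical.

```python
from collections import Counter
from typing import Dict, Hashable, List, Set, Tuple

# Hypothetical format: (node -> label, node -> set of neighbors).
Graph = Tuple[Dict[Hashable, str], Dict[Hashable, Set[Hashable]]]


def _max_degree(adj: Dict[Hashable, Set[Hashable]]) -> int:
    return max((len(nbrs) for nbrs in adj.values()), default=0)


def candidate_hosts(pattern: Graph, hosts: List[Graph]) -> List[int]:
    p_labels, p_adj = pattern
    need = Counter(p_labels.values())
    p_deg = _max_degree(p_adj)
    survivors = []
    for idx, (h_labels, h_adj) in enumerate(hosts):
        have = Counter(h_labels.values())
        # Label signature: the host needs at least as many nodes of every label.
        if any(have[label] < count for label, count in need.items()):
            continue
        # Degree bound: some host node must be at least as connected as the busiest pattern node.
        if _max_degree(h_adj) < p_deg:
            continue
        survivors.append(idx)  # only these would go on to backtracking
    return survivors


pattern = ({0: "C", 1: "O"}, {0: {1}, 1: {0}})
hosts = [
    ({0: "C", 1: "C"}, {0: {1}, 1: {0}}),      # no O: pruned by labels
    ({0: "C", 1: "O"}, {0: {1}, 1: {0}}),      # survives
    ({0: "C", 1: "O"}, {0: set(), 1: set()}),  # no edges: pruned by degree
]
print(candidate_hosts(pattern, hosts))  # [1]
```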
- class synkit.Graph.Matcher.orbit.OrbitAccuracy(approx_orbits: Iterable[FrozenSet[Hashable]], exact_orbits: Iterable[FrozenSet[Hashable]])[source]#
Bases: object
Compare two orbit partitions (approximate vs exact) and compute accuracy metrics.
The class is intentionally small and OOP-styled: most methods are chainable (return self) and computed results are exposed via properties.
Parameters#
- approx_orbits :
Iterable of frozenset-like objects (each containing node identifiers). Each element represents one orbit (set of node ids) from the approximate partition. The iterable is consumed and a copy (list of frozensets) is stored internally.
- exact_orbits :
Iterable of frozenset-like objects (each containing node identifiers). Each element represents one orbit from the exact partition.
Raises#
- ValueError
If the union of nodes covered by the two partitions differs (i.e. they do not refer to the same node set), a ValueError is raised with a short diagnostic listing nodes missing in either partition.
Attributes
- nodes : set
The set of all node identifiers (union of both partitions). Available after initialization.
- metrics : dict
Computed metrics (see compute()), exposed as a dictionary via the metrics property.
- confusion_map : dict
A mapping approx_orbit_index -> {exact_orbit_index: overlap_count}, exposed via the confusion_map property.
Notes
Node identifiers must be hashable (ints, str, …).
The input iterables are not modified; the class stores its own frozenset copies.
compute() is chainable and returns self; read the results via the metrics property.
Examples
from synkit.Graph.Matcher.orbit import OrbitAccuracy
approx = [frozenset({1}), frozenset({2, 3})]
exact = [frozenset({1}), frozenset({2, 3})]
oa = OrbitAccuracy(approx, exact).compute()
print(oa)                # -> <OrbitAccuracy nodes=3 approx_orbits=2 exact_orbits=2>
print(oa.metrics)        # -> {'node_exact_match_fraction': 1.0, ...}
print(oa.confusion_map)  # -> {0: {0: 1}, 1: {1: 2}}
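The confusion map and the node-coverage check described above can be sketched in a few lines. The helper confusion_map_sketch is hypothetical and only mirrors the documented shape (approx index -> {exact index: overlap count}) and the documented ValueError. It is not the class's internal code.

```python
from typing import Dict, FrozenSet, Hashable, List


def confusion_map_sketch(
    approx: List[FrozenSet[Hashable]], exact: List[FrozenSet[Hashable]]
) -> Dict[int, Dict[int, int]]:
    """Build approx_orbit_index -> {exact_orbit_index: overlap_count}."""
    approx_nodes = set().union(*approx)
    exact_nodes = set().union(*exact)
    if approx_nodes != exact_nodes:
        # Both partitions must cover exactly the same node set.
        raise ValueError(
            f"missing in approx: {sorted(exact_nodes - approx_nodes, key=repr)}, "
            f"missing in exact: {sorted(approx_nodes - exact_nodes, key=repr)}"
        )
    cmap: Dict[int, Dict[int, int]] = {}
    for i, a_orbit in enumerate(approx):
        # Record only the exact orbits that actually overlap this approximate orbit.
        cmap[i] = {j: len(a_orbit & e_orbit) for j, e_orbit in enumerate(exact) if a_orbit & e_orbit}
    return cmap


approx = [frozenset({1}), frozenset({2, 3})]
exact = [frozenset({1}), frozenset({2, 3})]
print(confusion_map_sketch(approx, exact))  # {0: {0: 1}, 1: {1: 2}}
```

An approximate orbit that merges several exact orbits shows up as a row with more than one entry. For example, [{1, 2, 3}] against [{1}, {2, 3}] gives {0: {0: 1, 1: 2}}.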
- property approx_orbits: List[FrozenSet[Hashable]]
Return the stored approximate orbits as a list of frozensets.
Returns
- list
Internal copy of the approximate orbit list.
- compute(brute_force_pairs: bool = True) OrbitAccuracy
Compute all metrics and build the confusion map.
This method is chainable and returns self; read the metrics or confusion_map property afterwards to access the results.
Parameters
- brute_force_pairs : bool, optional
If True (default), compute pairwise accuracy by checking all unordered node pairs (O(N^2)). For very large node sets a combinatorial method (based on orbit sizes) may be preferable; this implementation defaults to brute force because typical orbit counts are moderate.
Returns
- OrbitAccuracy
Returns self to enable chaining.
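The docs do not spell out the exact pairwise metric. A common choice, assumed here, is the Rand index: the fraction of unordered node pairs on which both partitions agree (both "same orbit" or both "different orbits"). The sketch below computes it two ways. The first enumerates all pairs in O(N^2). The second uses only orbit sizes and overlaps, via the identity agreements = C(N,2) + 2·Σ C(n_ij,2) − Σ C(a_i,2) − Σ C(b_j,2). Both function names are hypothetical.

```python
from itertools import combinations
from math import comb
from typing import FrozenSet, Hashable, List

Partition = List[FrozenSet[Hashable]]


def pair_agreement_brute(approx: Partition, exact: Partition) -> float:
    """O(N^2): fraction of unordered node pairs on which both partitions agree."""
    a_of = {v: i for i, orbit in enumerate(approx) for v in orbit}
    e_of = {v: j for j, orbit in enumerate(exact) for v in orbit}
    pairs = list(combinations(a_of, 2))
    if not pairs:
        return 1.0
    agree = sum((a_of[u] == a_of[v]) == (e_of[u] == e_of[v]) for u, v in pairs)
    return agree / len(pairs)


def pair_agreement_comb(approx: Partition, exact: Partition) -> float:
    """Same quantity from orbit sizes and overlaps only (no pair enumeration)."""
    n = sum(len(orbit) for orbit in approx)
    total = comb(n, 2)
    if total == 0:
        return 1.0
    same_both = sum(comb(len(a & e), 2) for a in approx for e in exact)
    same_approx = sum(comb(len(a), 2) for a in approx)
    same_exact = sum(comb(len(e), 2) for e in exact)
    return (total + 2 * same_both - same_approx - same_exact) / total


approx = [frozenset({1}), frozenset({2, 3})]
exact = [frozenset({1, 2}), frozenset({3})]
print(pair_agreement_brute(approx, exact), pair_agreement_comb(approx, exact))
# both print 0.3333333333333333
```

The combinatorial route costs O(|approx| · |exact|) set intersections instead of O(N^2) pair checks. This is why it can pay off for very large node sets.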
- property confusion_map: Dict[int, Dict[int, int]]
Return the confusion map: approx_orbit_index -> {exact_orbit_index: count}.
Returns
- dict
Copy of the internal confusion mapping. Call compute() first.
- property exact_orbits: List[FrozenSet[Hashable]]
Return the stored exact orbits as a list of frozensets.
Returns
- list
Internal copy of the exact orbit list.
- help() str
Return a short usage/help string.
Returns
- str
Brief one-line instructions on how to use the class.
- property metrics: Dict[str, float]
Return the computed metrics.
Returns
- dict
Copy of the metrics dictionary. Call compute() first to populate it.
- report(max_rows: int = 10) str
Produce a short human-readable report summarising the computed metrics and the top confusion rows.
Parameters
- max_rows : int, optional
Maximum number of confusion rows to include in the textual report. Default is 10.
Returns
- str
A multi-line string summarising the results. Call compute() before calling this method.
- class synkit.Graph.Matcher.partial_matcher.PartialMatcher(host: Graph | Sequence[Graph], pattern: Graph, node_attrs: List[str], edge_attrs: List[str], *, strategy: Strategy = Strategy.COMPONENT, max_results: int | None = None, partial: bool = True, threshold: int | None = None, pre_filter: bool = False, prune_auto: bool = False, wl_max_iter: int = 10)
Bases: object
Component-subset helper for pattern→host subgraph matching.
This matcher treats each connected component of the pattern as an independent “micro-pattern” and searches for consistent embeddings of subsets of these components into one or more host graphs. Depending on the partial flag, it behaves either like a classic “partial matcher” (searching all component counts) or like a strict full-pattern matcher.
Internally, all embeddings for each (host, pattern component) pair are pre-computed once and then re-used for all combinations. This significantly reduces redundant calls to SubgraphSearchEngine when exploring many subsets.
Optionally, approximate WL-1 automorphism orbits can be used to prune embeddings that are equivalent under host symmetries (enabled via prune_auto).
Parameters
- host : nx.Graph | Sequence[nx.Graph]
Single host graph or sequence of host graphs.
- pattern : nx.Graph
Pattern graph whose connected components act as building blocks.
- node_attrs : list[str]
Node attribute keys enforced equal during matching.
- edge_attrs : list[str]
Edge attribute keys enforced equal during matching.
- strategy : Strategy, optional
Matching strategy forwarded to SubgraphSearchEngine.
- max_results : int | None, optional
Global cap on the number of embeddings to store. If None, no explicit cap is applied.
- partial : bool, optional
If True, auto-mode (k=None) searches all component counts from the full pattern down to 1. If False, auto-mode only tries k = n_components (i.e. full-pattern matching only).
- threshold : int | None, optional
Optional cap on embeddings per (host, component) pairing. If exceeded, that pairing is treated as having no valid embeddings and is skipped. Defaults to SubgraphSearchEngine.DEFAULT_THRESHOLD.
- pre_filter : bool, optional
If True, enable the cheap Cartesian-product pre-filter in SubgraphSearchEngine for each (host, component) pair.
- prune_auto : bool, optional
If True, apply approximate automorphism-based pruning to the final list of embeddings, using WL-1 orbits computed by AutoEst. For safety, pruning is only applied when there is a single host graph. Defaults to False.
- wl_max_iter : int, optional
Maximum number of WL refinement iterations in AutoEst when prune_auto is enabled. Defaults to 10.
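The component-subset strategy described above can be sketched with plain NetworkX. The sketch pre-computes embeddings once per pattern component, combines them over every k-subset of components, and rejects combinations in which two components claim the same host node. The function name is hypothetical, and VF2 monomorphisms stand in for SubgraphSearchEngine. The sketch handles a single host and ignores edge attributes.

```python
from itertools import combinations, product
from typing import Dict, List, Sequence

import networkx as nx
from networkx.algorithms.isomorphism import GraphMatcher, categorical_node_match


def component_subset_mappings(
    host: nx.Graph, pattern: nx.Graph, node_attrs: Sequence[str], k: int
) -> List[Dict[int, int]]:
    """Hypothetical sketch: embed every k-subset of pattern components into one host."""
    node_match = categorical_node_match(list(node_attrs), [None] * len(node_attrs))
    components = [pattern.subgraph(c).copy() for c in nx.connected_components(pattern)]
    # Pre-compute embeddings once per component (pattern -> host) and reuse them below.
    per_component = [
        [{p: h for h, p in m.items()}
         for m in GraphMatcher(host, comp, node_match=node_match).subgraph_monomorphisms_iter()]
        for comp in components
    ]
    results: List[Dict[int, int]] = []
    for subset in combinations(range(len(components)), k):
        for combo in product(*(per_component[i] for i in subset)):
            merged: Dict[int, int] = {}
            used_hosts: set = set()
            for mapping in combo:
                images = set(mapping.values())
                if used_hosts & images:  # components may not share host nodes
                    break
                used_hosts |= images
                merged.update(mapping)
            else:  # no clash: keep the combined embedding
                results.append(merged)
    return results


host = nx.Graph()
host.add_node("a", element="C")
pattern = nx.Graph()
pattern.add_nodes_from([(0, {"element": "C"}), (1, {"element": "C"})])
print(component_subset_mappings(host, pattern, ["element"], k=2))       # []
print(len(component_subset_mappings(host, pattern, ["element"], k=1)))  # 2
```

The full two-component pattern cannot fit because both components need the single host carbon. Each component alone fits, which is exactly the case that partial=True is meant to surface.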
- property approx_embedding_count: int
WL-style approximate embedding count.
- Returns:
Last estimated embedding count.
- Return type:
int
- Raises:
RuntimeError – If estimate_embeddings_wl() has not been called.
- estimate_embeddings_wl(k: int | None = None) PartialMatcher
Estimate the number of embeddings using WL-style initial labels.
This is a cheap, approximate upper bound that:
Builds WL-style labels (degree, node_attrs...) on the host and pattern components.
For each (host, component) pair, estimates the number of label-consistent injective mappings, ignoring adjacency, via a product of falling factorials per label class.
Aggregates these per-pair estimates over subsets of pattern components, using the same semantics as _match_components().
No calls to SubgraphSearchEngine or backtracking are performed. The result is stored in approx_embedding_count.
- Parameters:
k (int | None) – Number of pattern components to use. If None, behaviour mirrors _match_components(): partial=False → use only the full pattern (k=n_cc); partial=True → aggregate over all k from n_cc down to 1.
- Returns:
The estimator itself (for chained use).
- Return type:
PartialMatcher
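Here is a minimal sketch of the per-pair falling-factorial product for a single (host, component) pair. For simplicity, it labels nodes by their node_attrs values only and leaves out the degree component of the documented WL-style label. The helper names are hypothetical.

```python
from collections import Counter
from math import perm, prod
from typing import Sequence

import networkx as nx


def label_counts(G: nx.Graph, node_attrs: Sequence[str]) -> Counter:
    """Count nodes per label class, where a label is the tuple of node_attrs values."""
    return Counter(tuple(d.get(k) for k in node_attrs) for _, d in G.nodes(data=True))


def injective_label_bound(host: nx.Graph, component: nx.Graph, node_attrs: Sequence[str]) -> int:
    """Label-consistent injective maps, ignoring adjacency: prod over labels of (h)_p."""
    host_counts = label_counts(host, node_attrs)
    # math.perm(h, p) is the falling factorial h*(h-1)*...*(h-p+1); it is 0 when p > h.
    return prod(perm(host_counts[label], p) for label, p in label_counts(component, node_attrs).items())


host = nx.Graph()
host.add_nodes_from([(0, {"element": "C"}), (1, {"element": "C"}),
                     (2, {"element": "C"}), (3, {"element": "O"})])
co = nx.Graph([("x", "y")])
nx.set_node_attributes(co, {"x": "C", "y": "O"}, "element")
print(injective_label_bound(host, co, ["element"]))  # 3 = (3)_1 * (1)_1
```

Because adjacency is ignored, the number only shrinks once real matching runs. That is what makes it useful as a quick "is this worth enumerating?" signal.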
- static find_partial_mappings(host: Graph | Sequence[Graph], pattern: Graph, *, node_attrs: List[str], edge_attrs: List[str], k: int | None = None, strategy: Strategy = Strategy.COMPONENT, max_results: int | None = None, partial: bool = True, threshold: int | None = None, pre_filter: bool = False, prune_auto: bool = False, wl_max_iter: int = 10) List[Dict[int, int]]
Stateless convenience wrapper: a one-liner for users in a hurry.
This mirrors the OO API but avoids explicitly instantiating the matcher in user code.
- Parameters:
host (nx.Graph | Sequence[nx.Graph]) – A single host graph or a sequence of host graphs.
pattern (nx.Graph) – Pattern graph whose connected components are used as building blocks.
node_attrs (list[str]) – Node attribute keys to enforce equality on during matching.
edge_attrs (list[str]) – Edge attribute keys to enforce equality on during matching.
k (int | None) – If an integer, restricts the search to subsets of exactly k pattern connected components. If None, behaviour follows the partial flag.
strategy (Strategy) – Matching strategy forwarded to SubgraphSearchEngine.
max_results (int | None) – Optional global cap on the number of embeddings to return.
partial (bool) – If True, all component counts are tried in auto-mode. If False, only the full pattern is used.
threshold (int | None) – Optional per-(host, component) embedding cap forwarded to SubgraphSearchEngine.
pre_filter (bool) – Whether to enable the cheap pre-filter in SubgraphSearchEngine.
prune_auto (bool) – If True, apply WL-1-based approximate automorphism pruning to the final mappings.
wl_max_iter (int) – Maximum number of WL iterations for the internal AutoEst if prune_auto is enabled.
- Returns:
Flat list of pattern→host node mappings.
- Return type:
list[MappingDict]
- get_mappings() List[Dict[int, int]]
Return the list of discovered embeddings (auto-computed).
- Returns:
List of pattern→host node mappings.
- Return type:
list[MappingDict]
- property help: str
Return the full module docstring.
- Returns:
Module-level documentation string.
- Return type:
str
- property num_mappings: int
Number of embeddings found.
- Returns:
Count of discovered embeddings.
- Return type:
int
- property num_pattern_components: int
Number of connected components in the pattern graph.
- Returns:
Number of pattern connected components.
- Return type:
int
- class synkit.Graph.Matcher.sing.SING(graph: Graph, max_path_length: int = 3, node_att: str | List[str] | None = None, edge_att: str | List[str] | None = 'order')
Bases: object
Subgraph search In Non-homogeneous Graphs (SING).
A lightweight Python implementation of the path-based filter-and-refine strategy introduced by Di Natale et al. (SING: Subgraph search In Non-homogeneous Graphs, BMC Bioinformatics, 2010) for subgraph search in large, possibly heterogeneous graphs.
The index is built once over a single data graph and can then be queried with multiple pattern graphs via search().
Notes
This implementation focuses on the path-feature variant of SING, where features are simple paths (with optional node/edge labels) up to a maximum length.
Multi-graphs are not supported.
If the underlying data graph is modified after construction, call reindex() to rebuild the feature index.
Example
A minimal example on an undirected, unlabeled graph:
import networkx as nx
from synkit.Graph.Matcher.sing import SING
# Data graph: 0-1-2-3
G = nx.path_graph(4)
# Query: 0-1-2
Q = nx.path_graph(3)
index = SING(G, max_path_length=2, node_att=[], edge_att=None)
matches = index.search(Q)
# matches contains four embeddings:
# {0: 0, 1: 1, 2: 2}
# {0: 1, 1: 2, 2: 3}
# {0: 2, 1: 1, 2: 0}
# {0: 3, 1: 2, 2: 1}
- reindex(graph: Graph | None = None) None
Rebuild the index, optionally replacing the underlying data graph.
- Parameters:
graph (nx.Graph | None, optional) – New data graph. If None, the existing graph is re-indexed.
- search(query_graph: Graph, prune: bool = False, dedup_autos: bool = False) List[Dict[Any, Any]] | bool
Find subgraph isomorphisms from query_graph into the data graph.
This method applies a path-feature-based filter to obtain candidate vertices, followed by a VF2-style refinement via backtracking with neighbourhood and label consistency checks.
- Parameters:
query_graph (nx.Graph) – Pattern graph to match against graph.
prune (bool, optional) – If True, stop after finding the first mapping and return a boolean indicating whether at least one embedding exists. If False (default), return the full list of mappings.
dedup_autos (bool, optional) – If True, collapse symmetric embeddings that differ only by automorphisms of the query graph, returning one representative per equivalence class. Has no effect when prune=True.
- Returns:
Either True/False (when prune=True) or a list of injective node mappings [{q_node: data_node, ...}, ...].
- Return type:
list[dict[Any, Any]] | bool
Example
import networkx as nx
from synkit.Graph.Matcher.sing import SING
G = nx.cycle_graph(4)
Q = nx.path_graph(3)
index = SING(G, max_path_length=2, node_att=[], edge_att=None)
all_mappings = index.search(Q)  # all embeddings
unique_mappings = index.search(Q, dedup_autos=True)  # collapse symmetries
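The filter step can be illustrated with a small path-feature sketch. For each node, it counts the node-label sequences of simple paths of up to max_len edges that start there. A data node v stays a candidate for query node q only if v's counts dominate q's. The filter is safe because an embedding maps every simple path from q onto a distinct, identically labelled simple path from v, so no true match is pruned. The feature encoding and helper names are assumptions, not SING's actual index layout, and the backtracking refine step is not shown.

```python
from collections import Counter
from typing import Any, Dict, Hashable, Set

import networkx as nx


def path_features(G: nx.Graph, start: Hashable, max_len: int, label: str = "label") -> Counter:
    """Count node-label sequences of simple paths (up to max_len edges) starting at start."""
    feats: Counter = Counter()

    def dfs(node: Hashable, visited: Set[Hashable], seq: list) -> None:
        feats[tuple(seq)] += 1
        if len(seq) - 1 == max_len:
            return
        for nbr in G.neighbors(node):
            if nbr not in visited:
                visited.add(nbr)
                seq.append(G.nodes[nbr].get(label))
                dfs(nbr, visited, seq)
                seq.pop()
                visited.remove(nbr)

    dfs(start, {start}, [G.nodes[start].get(label)])
    return feats


def candidate_sets(G: nx.Graph, Q: nx.Graph, max_len: int = 2, label: str = "label") -> Dict[Any, Set[Any]]:
    """Keep data node v for query node q only if v's path features dominate q's."""
    data_feats = {v: path_features(G, v, max_len, label) for v in G}
    cands = {}
    for q in Q:
        q_feats = path_features(Q, q, max_len, label)
        cands[q] = {v for v, f in data_feats.items() if all(f[key] >= c for key, c in q_feats.items())}
    return cands


print(candidate_sets(nx.path_graph(4), nx.path_graph(3)))
```

On the 0-1-2-3 example above, the middle query node keeps only data nodes 1 and 2, because it needs two outgoing one-edge paths. The end nodes keep all four candidates, which the refine step then narrows to the four embeddings listed earlier.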
- class synkit.Graph.Matcher.subgraph_matcher.SubgraphMatch
Bases: object
Boolean-only checks for graph isomorphism and subgraph (induced or monomorphic) matching.
Provides static methods for NetworkX-based checks and an optional GML “rule” backend.
- static is_subgraph(pattern: Graph | str, host: Graph | str, node_label_names: List[str] = ['element', 'charge'], node_label_default: List[Any] = ['*', 0], edge_attribute: str = 'order', use_filter: bool = False, check_type: str = 'induced', backend: str = 'nx') bool
Unified API for subgraph/isomorphism checks via either the NX or the GML backend.
- static rule_subgraph_morphism(rule_1: str, rule_2: str, use_filter: bool = False) bool
Evaluates whether two GML-formatted rule representations are isomorphic or one is a subgraph of the other (monomorphic).
- Parameters:
rule_1 (str) – GML string of the first rule.
rule_2 (str) – GML string of the second rule.
use_filter (bool, optional) – Whether to filter by node/edge labels and vertex counts.
- Returns:
bool – True if the monomorphism condition is met, False otherwise.
- static subgraph_isomorphism(child_graph: Graph, parent_graph: Graph, node_label_names: List[str] = ['element', 'charge'], node_label_default: List[Any] = ['*', 0], edge_attribute: str = 'order', use_filter: bool = False, check_type: str = 'induced', node_comparator: Callable[[Any, Any], bool] | None = None, edge_comparator: Callable[[Any, Any], bool] | None = None) bool
Check whether the child graph is subgraph-isomorphic to the parent graph, based on customizable node and edge attributes.
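The difference between induced and monomorphic subgraph checks is easiest to see on a 3-node path inside a triangle. The sketch below uses NetworkX's GraphMatcher with element/charge node labels and an "order" edge label, in the spirit of the defaults above. The helper check_subgraph and its check_type strings are hypothetical. The defaults here are only fallbacks for missing attributes, not SynKit's wildcard handling.

```python
import networkx as nx
from networkx.algorithms.isomorphism import (
    GraphMatcher,
    categorical_edge_match,
    categorical_node_match,
)


def check_subgraph(child: nx.Graph, parent: nx.Graph, check_type: str = "induced") -> bool:
    """Hypothetical helper: induced vs monomorphic subgraph test with element/charge labels."""
    node_match = categorical_node_match(["element", "charge"], ["*", 0])
    edge_match = categorical_edge_match("order", 1)
    gm = GraphMatcher(parent, child, node_match=node_match, edge_match=edge_match)
    if check_type == "induced":
        return gm.subgraph_is_isomorphic()  # non-edges of child must stay non-edges
    return gm.subgraph_is_monomorphic()     # extra edges in the parent are allowed


triangle = nx.cycle_graph(3)
path = nx.path_graph(3)
for G in (triangle, path):
    nx.set_node_attributes(G, "C", "element")
    nx.set_edge_attributes(G, 1, "order")
print(check_subgraph(path, triangle, "induced"))      # False
print(check_subgraph(path, triangle, "monomorphic"))  # True
```

The induced test fails because any three triangle nodes carry the extra edge that closes the path. The monomorphic test tolerates that edge.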
- class synkit.Graph.Matcher.subgraph_matcher.SubgraphSearchEngine
Bases: object
Static helper routines for subgraph monomorphism search.
- Variables:
DEFAULT_THRESHOLD – default cap on embedding enumeration (5000)
- static find_subgraph_mappings(host: Graph, pattern: Graph, *, node_attrs: List[str], edge_attrs: List[str], strategy: str | Strategy = Strategy.COMPONENT, max_results: int | None = None, strict_cc_count: bool = True, threshold: int | None = None, pre_filter: bool = False) List[Dict[int, int]]
Dispatch to a subgraph-matching strategy with optional guards.
Parameters
- host, pattern
NetworkX graphs (host ≥ pattern).
- node_attrs, edge_attrs
Keys of attributes to match; hcount and lone_pairs use host-greater-or-equal semantics, while the rest are matched exactly.
- strategy
Matching strategy code or enum (“all”, “comp”, “bt”).
- max_results
Stop after this many embeddings (None = no limit).
- strict_cc_count
If True, the host CC count must be ≤ the pattern CC count for COMPONENT/BACKTRACK.
- threshold
Override the default cap (DEFAULT_THRESHOLD) on embeddings.
- pre_filter
If True, run a cheap Cartesian-product pre-filter against the threshold.
Returns
List of dictionaries mapping pattern node→host node. Empty if there are no embeddings or if any guard (pre-filter or enumeration) exceeds the threshold.
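The sketch below covers two of the documented behaviours: the mixed node comparison (hcount and lone_pairs compared as host ≥ pattern, everything else exact) and the Cartesian-product pre-filter. The pre-filter multiplies, per pattern node, the number of compatible host nodes. This gives an upper bound on the number of embeddings that can be checked against the threshold before any search. Treating a missing count as 0 is an assumption, and the helper names are hypothetical.

```python
from typing import Any, Dict, Sequence

import networkx as nx

# Attributes compared with "host >= pattern" semantics, as described above.
GE_ATTRS = {"hcount", "lone_pairs"}


def node_ok(host_data: Dict[str, Any], pat_data: Dict[str, Any], node_attrs: Sequence[str]) -> bool:
    for key in node_attrs:
        if key in GE_ATTRS:
            # Assumption: a missing count is treated as 0.
            if host_data.get(key, 0) < pat_data.get(key, 0):
                return False
        elif host_data.get(key) != pat_data.get(key):
            return False
    return True


def cartesian_bound(host: nx.Graph, pattern: nx.Graph, node_attrs: Sequence[str]) -> int:
    """Product of per-pattern-node candidate counts: an upper bound on embeddings."""
    bound = 1
    for _, p_data in pattern.nodes(data=True):
        bound *= sum(1 for _, h_data in host.nodes(data=True) if node_ok(h_data, p_data, node_attrs))
        if bound == 0:
            break
    return bound


host = nx.Graph()
host.add_nodes_from([(0, {"element": "C", "hcount": 3}), (1, {"element": "C", "hcount": 2}),
                     (2, {"element": "O", "hcount": 1})])
pattern = nx.Graph()
pattern.add_nodes_from([("c", {"element": "C", "hcount": 2}), ("o", {"element": "O", "hcount": 0})])
bound = cartesian_bound(host, pattern, ["element", "hcount"])
threshold = 5000  # mirrors the documented DEFAULT_THRESHOLD
print(bound, bound > threshold)  # 2 False
```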
- class synkit.Graph.Matcher.turbo_iso.TurboISO(graph: Graph, node_label: str | List[str] = 'label', edge_label: str | List[str] | None = None, distance_threshold: int = 5000)
Bases: object
TurboISO with pragmatic speed-ups for many small queries.
Pre-indexes the host graph into node-signature → node buckets.
Uses lazy, radius-bounded BFS instead of a pre-computed all-pairs distance matrix (saving both startup time and memory).
Skips the distance-consistency check if the total candidate pool is already smaller than a configurable threshold (distance_threshold, default 5000).
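The three speed-ups listed above fit together roughly as follows. The sketch buckets host nodes by signature, computes BFS balls lazily with a cutoff and caches them, and skips distance pruning when the pool is already small. The pruning rests on the fact that an embedding cannot stretch distances: d_host(f(u), f(v)) ≤ d_query(u, v). The class BucketIndex and its single pruning pass are assumptions, not TurboISO's actual code.

```python
from collections import defaultdict
from typing import Any, Dict, Hashable, Sequence, Set, Tuple

import networkx as nx


class BucketIndex:
    """Hypothetical sketch: signature buckets plus cached, radius-bounded BFS."""

    def __init__(self, G: nx.Graph, keys: Sequence[str] = ("label",)) -> None:
        self.G, self.keys = G, tuple(keys)
        self.buckets: Dict[Tuple[Any, ...], Set[Hashable]] = defaultdict(set)
        for v, data in G.nodes(data=True):
            self.buckets[self._sig(data)].add(v)
        self._ball_cache: Dict[Tuple[Hashable, int], Set[Hashable]] = {}

    def _sig(self, data: Dict[str, Any]) -> Tuple[Any, ...]:
        return tuple(data.get(k) for k in self.keys)

    def ball(self, source: Hashable, radius: int) -> Set[Hashable]:
        """Nodes within `radius` hops of `source`, computed lazily and cached."""
        key = (source, radius)
        if key not in self._ball_cache:
            self._ball_cache[key] = set(nx.single_source_shortest_path_length(self.G, source, cutoff=radius))
        return self._ball_cache[key]

    def candidates(self, Q: nx.Graph, threshold: int = 5000) -> Dict[Hashable, Set[Hashable]]:
        cands = {q: set(self.buckets.get(self._sig(d), set())) for q, d in Q.nodes(data=True)}
        if sum(len(c) for c in cands.values()) < threshold:
            return cands  # pool already small: skip the distance-consistency pass
        q_dist = dict(nx.all_pairs_shortest_path_length(Q))  # cheap for small queries
        for u in Q:
            for v, d in q_dist[u].items():
                if u != v:
                    # Keep x only if some candidate for v lies within d hops of x.
                    cands[u] = {x for x in cands[u] if self.ball(x, d) & cands[v]}
        return cands


G = nx.path_graph(4)
nx.set_node_attributes(G, {0: "C", 1: "C", 2: "C", 3: "O"}, "label")
Q = nx.Graph([("a", "b")])
nx.set_node_attributes(Q, {"a": "C", "b": "O"}, "label")
index = BucketIndex(G)
print(index.candidates(Q))               # small pool: signature buckets only
print(index.candidates(Q, threshold=0))  # {'a': {2}, 'b': {3}}
```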
- class synkit.Graph.Matcher.wl_sel.WLSel(fw: Sequence[Graph], bw: Sequence[Graph], element_key: str | None = 'element', node_attrs: Sequence[str] | None = None, edge_attrs: Sequence[str] | None = None, wl_iters: int = 1, min_score: float = 0.8, node_weight: float = 0.85)
Bases: object
WL-based selector for pairing two lists of graphs.
Parameters
- fw : Sequence[nx.Graph]
Forward graphs (their indices form the first element of each pair).
- bw : Sequence[nx.Graph]
Backward graphs (their indices form the second element of each pair).
- element_key : str or None
Node attribute name used to detect wildcard nodes. Nodes with data[element_key] == "*" are removed from the core. If None, no wildcard filtering is applied.
- node_attrs : sequence of str or None
Node attributes used to build base labels. If provided, the base label for a node is str(tuple(data[k] for k in node_attrs)). If empty and element_key is provided, the element value is used. If both are empty/None, the node degree is used as the base label.
- edge_attrs : sequence of str or None
Edge attributes used inside WL neighbor signatures. If multiple keys are provided, temporary edge tuples are formed internally.
- wl_iters : int
WL refinement iterations (0 disables WL and uses the base labels).
- min_score : float
Minimum score (0..1) for pairs to be kept by default during scoring.
- node_weight : float
Weight of the node-overlap term in the final score (size similarity gets 1 - node_weight).
Notes
Call build_signatures(), then score_pairs(). Results are available via pair_scores and pair_indices.
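The docs give the blend (node_weight for node overlap, 1 - node_weight for size similarity) but not the exact overlap or size terms. The sketch below assumes multiset Jaccard over 1-WL labels for the overlap and min/max node count for size similarity. Wildcard ("*") nodes are dropped first, as described for element_key. Both formulas and the helper names are assumptions, not WLSel's confirmed definitions.

```python
from collections import Counter
from typing import Optional, Sequence

import networkx as nx


def wl_multiset(G: nx.Graph, node_attrs: Sequence[str] = ("element",), iters: int = 1,
                element_key: Optional[str] = "element") -> Counter:
    """Multiset of 1-WL labels after `iters` rounds, ignoring wildcard ("*") nodes."""
    if element_key is not None:
        G = G.subgraph([v for v, d in G.nodes(data=True) if d.get(element_key) != "*"])
    labels = {v: str(tuple(d.get(k) for k in node_attrs)) for v, d in G.nodes(data=True)}
    for _ in range(iters):
        # New label = own label + sorted multiset of neighbour labels.
        labels = {v: labels[v] + "|" + ",".join(sorted(labels[u] for u in G.neighbors(v))) for v in G}
    return Counter(labels.values())


def pair_score(g1: nx.Graph, g2: nx.Graph, node_weight: float = 0.85, iters: int = 1) -> float:
    """node_weight * (multiset Jaccard of WL labels) + (1 - node_weight) * size similarity."""
    c1, c2 = wl_multiset(g1, iters=iters), wl_multiset(g2, iters=iters)
    union = sum((c1 | c2).values())
    overlap = sum((c1 & c2).values()) / union if union else 1.0
    n1, n2 = sum(c1.values()), sum(c2.values())
    size_sim = min(n1, n2) / max(n1, n2) if max(n1, n2) else 1.0
    return node_weight * overlap + (1 - node_weight) * size_sim


co, cc = nx.Graph([(0, 1)]), nx.Graph([(0, 1)])
nx.set_node_attributes(co, {0: "C", 1: "O"}, "element")
nx.set_node_attributes(cc, {0: "C", 1: "C"}, "element")
print(pair_score(co, co))            # 1.0
print(round(pair_score(co, cc), 2))  # 0.15: same size, no shared WL labels
```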
- build_signatures() WLSel
Build WL-based node-label multisets, edge multisets and degree multisets. Returns self for fluent usage.
- candidate_pairs(max_pairs: int | None = None) Generator[Tuple[int, int], None, None]
Yield candidate index pairs (i, j). If scoring has not been run yet, it is invoked with default settings.
- filter_best_pairs(top_k: int = 1, min_score: float | None = None) WLSel
Keep only the best top_k pairs (by current ordering) and optionally enforce a minimum primary score. Returns self.
- property pair_scores: List[Tuple[int, int, float, Tuple[Any, ...]]]
Return scored pairs as (i, j, primary_score, tie_tuple).
- score_pairs(top_k: int | None = None, require_label_exact: bool = False) WLSel
Score all fw–bw pairs using WL overlap + size similarity.
Parameters
- top_k : int or None
If provided, keep only the top_k pairs after sorting.
- require_label_exact : bool
If True, keep only pairs whose WL label multisets are identical.
Returns
- WLSel
self (pairs are stored in .pair_scores and .pair_indices).
MTG
mcs_matcher.py: Maximum/Common Subgraph Matcher
A convenience wrapper around networkx.algorithms.isomorphism.GraphMatcher
that finds all common-subgraph (or maximum-common-subgraph) node mappings
between two molecular graphs.
Highlights
- Flexible node matching via generic_node_match.
- Scalar edge attribute comparison (e.g. order).
- Results are cached; call get_mappings() to retrieve them.
- Helpful help() and __repr__ utilities inspired by the MTG API style.
Public API
- MCSMatcher(node_label_names, node_label_defaults, edge_attribute='order', allow_shift=True): construct a matcher instance.
- matcher.find_common_subgraph(G1, G2, mcs=False, mcs_mol=False): run the search (stores but does not return mappings). If mcs_mol=True, find mappings by matching entire connected components (largest molecules).
- matcher.get_mappings(): retrieve the stored mapping list.
- matcher.find_rc_mapping(rc1, rc2, mcs=False): convenience wrapper for ITS reaction-centre objects (via its_decompose).
Dependencies
- Python 3.9+
- NetworkX ≥ 3.0
- synkit.Graph.ITS.its_decompose (optional helper)
- class synkit.Graph.MTG.mcs_matcher.MCSMatcher(node_label_names: List[str] | None = None, node_label_defaults: List[Any] | None = None, edge_attribute: str = 'order', allow_shift: bool = True)
Bases: object
Common / maximum-common subgraph matcher.
Parameters
- node_label_names : list[str], optional
Node attribute keys to compare (default ["element"]).
- node_label_defaults : list[Any], optional
Fallback values used when an attribute is missing (default ["*"]).
- edge_attribute : str, optional
Edge attribute storing the scalar order (default "order").
- allow_shift : bool, optional
Placeholder for future asymmetric rules (ignored for scalars).
- find_common_subgraph(G1: Graph, G2: Graph, *, mcs: bool = False, mcs_mol: bool = False) None
Search for subgraph isomorphisms and cache the mappings.
Parameters
- G1 : nx.Graph
Pattern graph (searched as a subgraph).
- G2 : nx.Graph
Host graph.
- mcs : bool, optional
If True, keep only mappings of maximum size.
- mcs_mol : bool, optional
If True, match entire connected components (largest molecules).
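The following brute-force sketch illustrates what "keep only mappings of maximum size" means on tiny graphs. It tries connected induced subgraphs of G1 from largest to smallest and returns all mappings at the first size that matches. Node matching uses generic_node_match with the documented default labels (["element"], fallback ["*"]), and edges are compared on "order". The enumeration is exponential and the function name is hypothetical. MCSMatcher's actual search strategy is not documented here.

```python
from itertools import combinations
from operator import eq
from typing import Dict, Hashable, List

import networkx as nx
from networkx.algorithms.isomorphism import GraphMatcher, categorical_edge_match, generic_node_match


def brute_force_mcs(G1: nx.Graph, G2: nx.Graph) -> List[Dict[Hashable, Hashable]]:
    """Return all maximum-size G1 -> G2 mappings of connected induced subgraphs of G1."""
    node_match = generic_node_match(["element"], ["*"], [eq])  # defaults as documented
    edge_match = categorical_edge_match("order", 1)
    for k in range(G1.number_of_nodes(), 0, -1):  # largest subgraphs first
        found = []
        for nodes in combinations(G1.nodes, k):
            sub = G1.subgraph(nodes)
            if not nx.is_connected(sub):
                continue
            gm = GraphMatcher(G2, sub, node_match=node_match, edge_match=edge_match)
            # GraphMatcher yields {G2_node: G1_node}; invert to G1 -> G2.
            found.extend({p: h for h, p in m.items()} for m in gm.subgraph_isomorphisms_iter())
        if found:
            return found  # the first non-empty size is the maximum
    return []


G1 = nx.path_graph(3)
nx.set_node_attributes(G1, {0: "C", 1: "C", 2: "O"}, "element")
G2 = nx.path_graph(3)
nx.set_node_attributes(G2, {0: "C", 1: "C", 2: "N"}, "element")
print(brute_force_mcs(G1, G2))  # the C-C bond, in both orientations
```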
- class synkit.Graph.MTG.mtg.MTG(sequences: List[Graph] | List[str], mappings: List[Dict[int, int]] | None = None, *, node_label_names: List[str] | None = None, canonicaliser: GraphCanonicaliser | None = None, mcs_mol: bool = False, mcs: bool = False, its_format: Literal['typesGH', 'tuple'] = 'tuple')
Bases: object
Fuse a chronological series of ITS graphs into a Mechanistic Transition Graph.
- Parameters:
sequences – A list of ITS-format NetworkX graphs or RSMI strings.
mappings – Optional list of precomputed mappings; computed via MCS if None.
node_label_names – Keys for node-label matching.
canonicaliser – Optional GraphCanonicaliser for snapshot canonicalisation.
its_format – ITS format used when sequences contains RSMI strings. Defaults to "tuple" for Lewis State Graph MTGs. Pass "typesGH" to build legacy MTGs from strings.
- Raises:
ValueError – On invalid sequence or mapping lengths.
RuntimeError – On mapping failures.
- get_its_steps(*, directed: bool = False) List[Graph]
Reconstruct the ordered list of per-step ITS graphs from the MTG.
- synkit.Graph.MTG.mtg_explore.find_mtg(g1: Graph, g2: Graph, ground_truth: str, node_label_names: List[str] | None = None) MTG | None
Attempt to construct a Mechanistic Transition Graph (MTG) for two input graphs by finding maximum common substructure mappings and validating them against a ground truth.
- Parameters:
g1 (networkx.Graph) – The first input graph to match.
g2 (networkx.Graph) – The second input graph to match.
ground_truth (str) – A string representation of the expected atom-atom mapping (AAM) used to validate candidate mappings.
node_label_names (list of str, optional) – List of node attribute names to use for MCS matching. Defaults to [“element”, “charge”, “hcount”].
- Returns:
An MTG instance if a valid mapping satisfying the ground truth is found; otherwise, None.
- Return type:
MTG or None
- Raises:
ValueError – If the input graphs are empty or ground_truth has an invalid format.
- Example:
>>> from networkx import Graph
>>> g1, g2 = Graph(), Graph()
>>> # populate g1 and g2 with nodes/edges
>>> mtg = find_mtg(
...     g1,
...     g2,
...     ground_truth="{0:1, 1:0}",
...     node_label_names=["element", "charge", "hcount"]
... )
>>> if mtg:
...     print(mtg)
- synkit.Graph.MTG.utils.compute_standard_order(G: Graph, inplace: bool = False) Graph
Compute and assign the ‘standard_order’ attribute for each edge in the graph. ‘standard_order’ is defined as the difference order[0] - order[1] for edges whose ‘order’ attribute is a 2-tuple of numeric values.
- Parameters:
G (nx.Graph, nx.DiGraph, nx.MultiGraph, or nx.MultiDiGraph) – Input NetworkX graph
inplace (bool) – If True, modify G in-place; otherwise operate on a copy
- Returns:
Graph with ‘standard_order’ attributes set
- Return type:
same type as G
- Raises:
TypeError – If G is not a NetworkX graph
- Example:
>>> import networkx as nx
>>> G = nx.Graph()
>>> G.add_edge(7, 3, order=(1.0, 0))
>>> H = compute_standard_order(G)
>>> H.edges[7,3]['standard_order']
1.0
- synkit.Graph.MTG.utils.extract_order_norm(order_sequence: Sequence[Tuple[float, float] | Tuple[Set[float], Set[float]]]) Tuple[float, float] | None
Given a sequence of order tuples and/or placeholders (MissingOrder), return the normalized bond order as a 2-tuple:
left: the first element a of the first tuple, scanning from the start, whose parts are not both sets;
right: the second element b of the first such tuple, scanning from the end.
The input sequence must have length >= 2.
- Parameters:
order_sequence (Sequence[tuple[float, float]] or Sequence[MissingOrder]) – A sequence of order tuples or placeholders
- Returns:
A 2-tuple (left, right) if found; otherwise None
- Return type:
tuple[float, float] | None
- Raises:
ValueError – If the sequence length is less than 2
- Example:
>>> seq = [({1}, {2}), (3.0, 4.0), ({5}, {6}), (7.0, 8.0)]
>>> extract_order_norm(seq)
(3.0, 8.0)
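The left/right rule above can be reproduced in a few lines. The sketch treats a step as a placeholder when both of its parts are sets. It reproduces the documented outputs of extract_order_norm and normalize_order. The function names are hypothetical, and this is not the library source.

```python
from typing import Any, Optional, Sequence, Tuple


def _is_placeholder(entry: Tuple[Any, Any]) -> bool:
    """A placeholder step is one whose two parts are both sets."""
    return isinstance(entry[0], (set, frozenset)) and isinstance(entry[1], (set, frozenset))


def extract_order_norm_sketch(seq: Sequence[Tuple[Any, Any]]) -> Optional[Tuple[Any, Any]]:
    if len(seq) < 2:
        raise ValueError("order sequence must have length >= 2")
    concrete = [entry for entry in seq if not _is_placeholder(entry)]
    if not concrete:
        return None
    # Left part from the first concrete step, right part from the last one.
    return (concrete[0][0], concrete[-1][1])


print(extract_order_norm_sketch([({1}, {2}), (3.0, 4.0), ({5}, {6}), (7.0, 8.0)]))  # (3.0, 8.0)
print(extract_order_norm_sketch([(1, 2), ({3}, {4}), (5, 6)]))                     # (1, 6)
```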
- synkit.Graph.MTG.utils.label_mtg_edges(G: Graph, inplace: bool = False) Graph
Label each edge in the MTG graph with a boolean ‘is_mtg’ attribute based on two criteria:
1. There are at least two steps where the standard order (order[0] - order[1]) is non-zero.
2. The sum of all non-None standard orders is zero.
- Parameters:
G (nx.Graph or nx.DiGraph) – Input MTG graph with ‘order’ history per edge
inplace (bool) – If True, modify G in place; otherwise work on a copy
- Returns:
Graph with ‘is_mtg’ boolean attribute on each edge
- Return type:
same type as G
- Raises:
TypeError – If G is not a NetworkX Graph
- Example:
>>> import networkx as nx
>>> G = nx.Graph()
>>> # Single change only -> less than 2 non-zero steps => False
>>> G.add_edge(7,3, order=((1.0,1.0),(1.0,0)))
>>> H = label_mtg_edges(G)
>>> H.edges[7,3]['is_mtg']
False
>>> # Two-step equal but opposite changes -> sum zero and count>=2 => True
>>> G = nx.Graph()
>>> G.add_edge(2,1, order=((1.0,2.0),(2.0,1.0)))
>>> H = label_mtg_edges(G)
>>> H.edges[2,1]['is_mtg']
True
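Applied to a single edge's order history, the two criteria look like the sketch below. Steps that are not numeric 2-tuples (for example, set placeholders) yield a None standard order and are skipped. The small tolerance on the sum is an assumption of this sketch, and the helper names are hypothetical.

```python
import math
from numbers import Real
from typing import Any, Optional, Sequence, Tuple


def standard_order(step: Tuple[Any, Any]) -> Optional[float]:
    """order[0] - order[1] for a numeric 2-tuple, otherwise None (e.g. set placeholders)."""
    a, b = step
    if isinstance(a, Real) and isinstance(b, Real):
        return a - b
    return None


def is_mtg_edge(history: Sequence[Tuple[Any, Any]]) -> bool:
    diffs = [d for d in (standard_order(step) for step in history) if d is not None]
    changes = sum(1 for d in diffs if d != 0)
    # Criterion 1: at least two changing steps; criterion 2: the changes cancel out.
    # The abs_tol on the sum is an assumption, not documented behaviour.
    return changes >= 2 and math.isclose(sum(diffs), 0.0, abs_tol=1e-9)


print(is_mtg_edge(((1.0, 1.0), (1.0, 0))))    # False: only one changing step
print(is_mtg_edge(((1.0, 2.0), (2.0, 1.0))))  # True: -1 then +1
```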
- synkit.Graph.MTG.utils.normalize_hcount_and_typesGH(G: Graph | DiGraph | MultiGraph | MultiDiGraph) Graph | DiGraph | MultiGraph | MultiDiGraph
- Return a fresh copy of G where:
each node’s hcount attribute is set to 0
- each node’s typesGH is processed as follows:
Flatten one level so that nested tuples-of-tuples are expanded.
Drop any tuple that contains a set anywhere.
From the remaining tuples, keep only the first and last (if more than two).
Zero indices 1 and 2 in each kept tuple (if they exist).
If nothing remains after dropping, result is an empty tuple.
- Parameters:
G (nx.Graph or nx.DiGraph or nx.MultiGraph or nx.MultiDiGraph) – input NetworkX graph
- Returns:
a new graph with normalized hcount and typesGH
- Return type:
same type as G
- Raises:
TypeError – If G is not a supported NetworkX graph or if typesGH is malformed.
- synkit.Graph.MTG.utils.normalize_order(G: Graph) Graph
Return a copy of the graph with each edge’s ‘order’ attribute normalized. If an edge has an ‘order’ attribute that is a sequence of length >= 2, it is replaced by the 2-tuple returned by extract_order_norm(), if that function returns a non-None result.
- Parameters:
G (nx.Graph, nx.DiGraph, nx.MultiGraph, or nx.MultiDiGraph) – Input NetworkX graph
- Returns:
A new graph of the same type with normalized edge ‘order’
- Return type:
same as G
- Raises:
TypeError – If G is not a NetworkX graph
- Example:
>>> import networkx as nx
>>> G = nx.Graph()
>>> G.add_edge(1, 2, order=[(1,2), ({3},{4}), (5,6)])
>>> H = normalize_order(G)
>>> H.edges[1,2]['order']
(1, 6)
Wildcard
- synkit.Graph.Wildcard.fuse_graph.find_wc_graph_isomorphism(G1: Graph | DiGraph | MultiGraph | MultiDiGraph, G2: Graph | DiGraph | MultiGraph | MultiDiGraph, node_match: Callable[[Dict[str, Any], Dict[str, Any]], bool] | None = None, edge_match: Callable[[Dict[str, Any], Dict[str, Any]], bool] | None = None, logger: Logger | None = None) Dict[Any, Any] | None
Wildcard-aware subgraph isomorphism. Returns a mapping from every node in the smaller graph to a node in the larger graph, allowing any node whose element == "*" to match any concrete node (or group of nodes) on the host side.
- Parameters:
G1 (nx.Graph | nx.DiGraph | nx.MultiGraph | nx.MultiDiGraph) – First input graph.
G2 (nx.Graph | nx.DiGraph | nx.MultiGraph | nx.MultiDiGraph) – Second input graph.
node_match (Callable[[dict, dict], bool] | None) – Optional node‑predicate; default treats “*” as a joker.
edge_match (Callable[[dict, dict], bool] | None) – Optional edge‑predicate; default ignores edge data.
logger (logging.Logger | None) – Optional logger for diagnostics.
- Returns:
Mapping pattern-node → host-node if a wildcard isomorphism exists; otherwise None.
- Return type:
dict[Any, Any] | None
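The default behaviour described above ("*" acts as a joker and edge data is ignored) can be sketched with a custom node_match on NetworkX's GraphMatcher. The sketch only handles one-to-one matches, not a wildcard standing for a group of host nodes. Both helper names are hypothetical.

```python
from typing import Any, Dict, Optional

import networkx as nx
from networkx.algorithms.isomorphism import GraphMatcher


def wc_node_match(a: Dict[str, Any], b: Dict[str, Any], wildcard: str = "*") -> bool:
    """Elements match if equal, or if either side is the wildcard."""
    ea, eb = a.get("element"), b.get("element")
    return ea == wildcard or eb == wildcard or ea == eb


def wc_isomorphism(G1: nx.Graph, G2: nx.Graph) -> Optional[Dict[Any, Any]]:
    """Map every node of the smaller graph into the larger one (one-to-one only)."""
    small, large = (G1, G2) if G1.number_of_nodes() <= G2.number_of_nodes() else (G2, G1)
    gm = GraphMatcher(large, small, node_match=wc_node_match)  # edge data ignored
    match = next(gm.subgraph_isomorphisms_iter(), None)        # {large_node: small_node}
    return None if match is None else {sm: lg for lg, sm in match.items()}


host = nx.path_graph(3)
nx.set_node_attributes(host, {0: "C", 1: "C", 2: "O"}, "element")
pattern = nx.Graph([("w", "o")])
nx.set_node_attributes(pattern, {"w": "*", "o": "O"}, "element")
print(wc_isomorphism(pattern, host))  # maps 'w' -> 1 and 'o' -> 2
```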
- synkit.Graph.Wildcard.fuse_graph.fuse_wc_graphs(G1: Graph | DiGraph | MultiGraph | MultiDiGraph, G2: Graph | DiGraph | MultiGraph | MultiDiGraph, mapping: Dict[Any, Any], wildcard: str = '*', logger: Logger | None = None) Graph | DiGraph | MultiGraph | MultiDiGraph
Fuse a wildcard-pattern graph G1 into the concrete host G2.
The result lives entirely in G2’s node-ID space and contains:
- every host node mapping[p] for a non-wildcard node p in G1 (its attributes are overridden with those from G1, so no “*” leaks back in);
- every host node mapping[w] for a wildcard node w in G1, plus all one-hop neighbours of that host node that were not already used by a non-wildcard node;
- all edges present in G2 among the nodes kept above.
Parameters
- G1, G2 : GraphType
G1 may contain nodes whose element is the wildcard marker (default "*", configurable via wildcard). G2 is the concrete graph we graft from.
- mapping : Dict[Any, Any]
The full node–node map returned by find_wc_graph_isomorphism (must include every node of G1).
- wildcard : str, default “*”
The value of the "element" attribute that marks a wildcard node.
- logger : logging.Logger or None
Optional logger for debug output.
Returns
- GraphType
A new graph of the same class as G2 containing the fused structure.
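The three rules above (keep mapped hosts, grow each wildcard's host by one hop, take G2's induced edges) can be sketched for undirected graphs as follows. On a DiGraph, G2.neighbors would return successors only. The function name fuse_sketch is hypothetical.

```python
from typing import Any, Dict

import networkx as nx


def fuse_sketch(G1: nx.Graph, G2: nx.Graph, mapping: Dict[Any, Any], wildcard: str = "*") -> nx.Graph:
    """Keep mapped host nodes, grow wildcard hosts by one hop, take G2's induced edges."""
    wc_nodes = {p for p, d in G1.nodes(data=True) if d.get("element") == wildcard}
    concrete_hosts = {mapping[p] for p in G1 if p not in wc_nodes}
    keep = set(concrete_hosts)
    for w in wc_nodes:
        h = mapping[w]
        keep.add(h)
        # One-hop neighbours of the wildcard's host, unless already claimed by a concrete node.
        keep.update(n for n in G2.neighbors(h) if n not in concrete_hosts)
    fused = G2.subgraph(keep).copy()  # all G2 edges among the kept nodes
    for p in G1:
        if p not in wc_nodes:
            fused.nodes[mapping[p]].update(G1.nodes[p])  # pattern attributes win
    return fused


G2 = nx.Graph([(0, 1), (1, 2), (1, 3), (0, 4)])
nx.set_node_attributes(G2, {0: "C", 1: "C", 2: "O", 3: "N", 4: "C"}, "element")
G1 = nx.Graph()
G1.add_node("w", element="*")
G1.add_node("o", element="O", charge=0)
G1.add_edge("w", "o")
fused = fuse_sketch(G1, G2, {"w": 1, "o": 2})
print(sorted(fused.nodes), fused.number_of_edges())  # [0, 1, 2, 3] 3
```

Node 4 is two hops away from the wildcard's host, so it is dropped. Host node 2 picks up the pattern's charge attribute while G2 itself is left untouched.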
- class synkit.Graph.Wildcard.graph_wc.GraphCollectionSelector(graphs: Iterable[Graph])
Bases: object
Chainable selector for a collection of NetworkX graphs.
The selector never mutates input graphs. Filtering methods are chainable (they return self), and the final selection is available via the filtered property or to_list().
Example
>>> selector = GraphCollectionSelector(graphs)
>>> selector.select_with_wc().select_by_node_attr("charge", 0).to_list()
- Parameters:
graphs – Iterable of NetworkX graphs (copied into an internal list; the graphs themselves are not copied).
- describe() str
Human-friendly one-line description of current selector state.
- Returns:
description string
- reset() GraphCollectionSelector
Reset the selection to the original input list.
- Returns:
self (chainable)
- select_by_node_attr(key: str = 'element', value: Any = '*', include: bool = True, match_any: bool = True) GraphCollectionSelector
Keep graphs based on node attribute equality.
By default this keeps graphs that contain at least one node such that node[key] == value (i.e., match_any=True).
- Parameters:
key – Node attribute key to inspect (default: “element”).
value – Value to compare equality against (default: “*”).
include – If True, keep graphs that match the criterion. If False, keep graphs that do NOT match the criterion.
match_any – If True, criterion is satisfied if any node matches (default). If False, criterion requires all nodes to match (rarely used).
- Returns:
self (chainable)
- select_by_node_attr_in(key: str = 'element', values: Iterable[Any] = ('*',), include: bool = True, match_any: bool = True) GraphCollectionSelector
Keep graphs that contain a node whose node[key] is in values (or, when match_any=False, require all nodes to be in values).
- Parameters:
key – Node attribute key (default: “element”).
values – Iterable of allowed values (default: (“*”,)).
include – If True, keep graphs that match; if False, keep graphs that do not match.
match_any – If True, the criterion is satisfied if any node belongs to values. If False, all nodes must belong to values.
- Returns:
self (chainable)
- select_by_node_pred(node_pred: Callable[[dict], bool], require_all_nodes: bool = False, include: bool = True) GraphCollectionSelector
Select graphs according to a predicate applied to node attribute dicts.
- Parameters:
node_pred – Callable receiving a node attribute dict and returning a boolean.
require_all_nodes – If True, require all nodes in a graph to satisfy node_pred. If False, require any node to satisfy it.
include – If True, keep graphs that match; if False, drop them.
- Returns:
self (chainable)
- select_by_pred(predicate: Callable[[Graph], bool]) GraphCollectionSelector
Keep graphs for which predicate(graph) is True.
- Parameters:
predicate – Callable that receives a graph and returns True to keep it.
- Returns:
self (chainable)
- select_wc(*, element_key: str = 'element', wildcard: str = '*', select_with_wc: bool = True) GraphCollectionSelector
Convenience wrapper to select graphs with or without wildcard nodes.
- Parameters:
element_key – Node attribute key storing elements (default: “element”).
wildcard – Wildcard value to search for (default: “*”).
select_with_wc – If True, keep graphs that contain at least one node with node[element_key] == wildcard. If False, keep graphs that do NOT contain any such node.
- Returns:
self (chainable)
- select_with_wc(element_key: str = 'element', wildcard: str = '*') GraphCollectionSelector
Shorthand for selecting graphs that contain at least one wildcard node.
- Returns:
self (chainable)
- select_without_wc(element_key: str = 'element', wildcard: str = '*') GraphCollectionSelector[source]#
Shorthand for selecting graphs that contain no wildcard nodes.
- Returns:
self (chainable)
- stats() dict[source]#
Compute and return a small summary of the current selection.
- The result includes:
original_count: number of input graphs
selected_count: number of graphs after filtering
unique_node_attr_values: a mapping of attribute keys seen across selected graphs to the set of observed values (limited to attributes present on at least one node).
- Returns:
dictionary summary
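A minimal sketch of how the three summary fields could be computed, assuming graphs are lists of node-attribute dicts and that attribute values are hashable (list-valued attributes would need extra handling). summarize is a hypothetical name; the real stats() works on the selector's own state.

```python
from typing import Any, Dict, List, Set


def summarize(original: List[List[Dict[str, Any]]],
              selected: List[List[Dict[str, Any]]]) -> Dict[str, Any]:
    """Hypothetical re-creation of the stats() summary fields."""
    seen: Dict[str, Set[Any]] = {}
    for graph in selected:
        for attrs in graph:  # each graph is a list of node-attribute dicts
            for key, value in attrs.items():
                # Only keys present on at least one node ever appear here.
                seen.setdefault(key, set()).add(value)  # assumes hashable values
    return {
        "original_count": len(original),
        "selected_count": len(selected),
        "unique_node_attr_values": seen,
    }


g1 = [{"element": "C", "charge": 0}, {"element": "*"}]
g2 = [{"element": "O", "charge": -1}]
print(summarize([g1, g2], [g1]))
```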
- to_list() List[Graph][source]#
Alias for filtered, i.e. the graphs remaining after the current selection.
- class synkit.Graph.Wildcard.its_merge.ITSMerge(G1: Graph | DiGraph | MultiGraph | MultiDiGraph, G2: Graph | DiGraph | MultiGraph | MultiDiGraph, mapping: Dict[Any, Any], *, types_key: str = 'typesGH', element_key: str = 'element', wildcard_element: str = '*', remove_wildcards: bool = True, logger: Logger | None = None)[source]#
Bases: object
Merge two ITS graphs given a node mapping between them.
This class encapsulates the logic of fusing two ITS graphs (e.g. from wildcard pattern matching) in an object-oriented way. The result is a fused graph that:
- starts as a copy of the host graph,
- merges ITS typing (typesGH) on mapped node pairs,
- adds leftover (non-mapped, non-wildcard) pattern nodes and edges, and
- optionally removes all wildcard nodes from the final fused graph.
Orientation#
The graph whose nodes appear as values of the mapping is treated as the host; the other graph is the pattern. If the mapping is given in the opposite direction (host → pattern), the class detects this and automatically inverts the mapping.
ITS merging semantics#
- Mapped node pairs (p → h): the typesGH attribute is merged; the hydrogen-count entries (index 2 in each inner tuple) are set to the maximum of the host and pattern values.
- Leftover pattern nodes: if element == wildcard_element, they are ignored. Otherwise they are added as new nodes with new IDs, and edges are created according to the pattern topology.
- Wildcard removal: if remove_wildcards is True (default), all wildcard nodes (element == wildcard_element) are removed from the fused graph together with their incident edges. If False, wildcard nodes are kept.
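The hydrogen-count rule for mapped pairs can be illustrated on bare typesGH tuples. This sketch assumes, as in the class example, that each typesGH value is a tuple of inner tuples with the hydrogen count at index 2, and that all other fields are taken from the host entry (an assumption; only the hydrogen-count rule is documented). merge_types_gh is a hypothetical helper.

```python
from typing import Any, Tuple

TypesGH = Tuple[Tuple[Any, ...], ...]


def merge_types_gh(host: TypesGH, pattern: TypesGH, h_index: int = 2) -> TypesGH:
    """Sketch: keep the host tuples, raising each hydrogen count to the maximum."""
    merged = []
    for host_entry, pat_entry in zip(host, pattern):
        entry = list(host_entry)  # assumption: non-H fields come from the host
        entry[h_index] = max(host_entry[h_index], pat_entry[h_index])
        merged.append(tuple(entry))
    return tuple(merged)


host_t = (("C", False, 1, 0, ["O"]), ("C", False, 1, 0, ["O"]))
pattern_t = (("C", False, 2, 0, ["O"]), ("C", False, 0, 0, ["O"]))
print(merge_types_gh(host_t, pattern_t))
# (('C', False, 2, 0, ['O']), ('C', False, 1, 0, ['O']))
```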
Examples#
Simple usage with integer-labeled graphs:
import networkx as nx
from synkit.Graph.Wildcard.its_merge import ITSMerge

G1 = nx.Graph()
G2 = nx.Graph()
# Example: two ITS graphs with 'typesGH' and 'element' attributes
G1.add_node(1, element="C", typesGH=(("C", False, 2, 0, ["O"]), ("C", False, 2, 0, ["O"])))
G2.add_node(10, element="C", typesGH=(("C", False, 1, 0, ["O"]), ("C", False, 1, 0, ["O"])))
mapping = {1: 10}  # pattern node → host node
merger = ITSMerge(G1, G2, mapping, remove_wildcards=True).merge()
F = merger.fused_graph
print("Fused nodes:", F.nodes(data=True))
- param G1:
First input ITS graph.
- type G1:
GraphType
- param G2:
Second input ITS graph.
- type G2:
GraphType
- param mapping:
Node mapping between the graphs. Must be a bijection either from pattern → host or host → pattern; the class detects orientation automatically.
- type mapping:
dict[Any, Any]
- param types_key:
Node attribute key holding the ITS typing tuple, e.g.
(('C', False, 3, 0, ['O']), ('C', False, 3, 0, ['O'])).
- type types_key:
str
- param element_key:
Node attribute key for element / atom type.
- type element_key:
str
- param wildcard_element:
Value of element_key that denotes wildcard nodes.
- type wildcard_element:
str
- param remove_wildcards:
If True, remove wildcard nodes from the final fused graph. If False, wildcard nodes are kept.
- type remove_wildcards:
bool
- param logger:
Optional logger for debug output.
- type logger:
logging.Logger | None
- property fused_graph: Graph | DiGraph | MultiGraph | MultiDiGraph#
Fused ITS graph.
The graph is in the host’s node ID space, plus any extra IDs for leftover pattern nodes. Wildcard nodes may have been removed, depending on remove_wildcards.
- Returns:
Fused ITS graph.
- Return type:
GraphType
- Raises:
RuntimeError – If merge() has not been called yet.
- property host_graph: Graph | DiGraph | MultiGraph | MultiDiGraph#
Graph that was treated as the host for merging.
- Returns:
Host graph.
- Return type:
GraphType
- merge() ITSMerge[source]#
Execute the ITS fusion process.
The method:
- Starts from a copy of the host graph.
- Merges typesGH attributes on mapped node pairs.
- Adds leftover non-wildcard pattern nodes.
- Adds pattern edges between mapped/added nodes.
- Optionally removes wildcard nodes from the fused graph.
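The node and edge bookkeeping behind these steps can be sketched with plain dicts. This is an illustrative toy (fuse is a hypothetical function, not ITSMerge internals): it assumes integer node IDs and a mapping oriented pattern → host, and it skips the typesGH merge so that only the leftover-node, edge and wildcard-removal steps are shown.

```python
from itertools import count
from typing import Any, Dict, FrozenSet, Iterable, Set, Tuple

Nodes = Dict[int, Dict[str, Any]]
Edge = Tuple[int, int]


def fuse(host_nodes: Nodes, host_edges: Iterable[Edge],
         pat_nodes: Nodes, pat_edges: Iterable[Edge],
         mapping: Dict[int, int], wildcard: str = "*",
         remove_wildcards: bool = True) -> Tuple[Nodes, Set[FrozenSet[int]]]:
    """Toy fusion; `mapping` goes pattern node -> host node (assumed orientation)."""
    # Start from a copy of the host graph (typesGH merging is omitted here).
    nodes = {n: dict(a) for n, a in host_nodes.items()}
    edges = {frozenset(e) for e in host_edges}
    # Leftover non-wildcard pattern nodes get fresh IDs after the largest host ID.
    fresh = count(max(nodes, default=-1) + 1)
    where = dict(mapping)
    for p, attrs in pat_nodes.items():
        if p not in where and attrs.get("element") != wildcard:
            where[p] = next(fresh)
            nodes[where[p]] = dict(attrs)
    # Pattern edges are copied when both endpoints are mapped or newly added.
    for u, v in pat_edges:
        if u in where and v in where:
            edges.add(frozenset((where[u], where[v])))
    # Optionally drop wildcard nodes together with their incident edges.
    if remove_wildcards:
        wc = {n for n, a in nodes.items() if a.get("element") == wildcard}
        nodes = {n: a for n, a in nodes.items() if n not in wc}
        edges = {e for e in edges if not (e & wc)}
    return nodes, edges


host_nodes = {0: {"element": "C"}, 1: {"element": "C"}, 2: {"element": "*"}}
host_edges = [(0, 1), (1, 2)]
pat_nodes = {10: {"element": "C"}, 11: {"element": "O"}, 12: {"element": "*"}}
pat_edges = [(10, 11), (10, 12)]
nodes, edges = fuse(host_nodes, host_edges, pat_nodes, pat_edges, {10: 0})
print(sorted(nodes), sorted(tuple(sorted(e)) for e in edges))
# [0, 1, 3] [(0, 1), (0, 3)]
```

Here pattern node 11 (O) becomes new node 3, the pattern wildcard 12 is ignored, and the host wildcard 2 is dropped along with edge (1, 2).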
- Returns:
Self, with fused_graph updated.
- Return type:
ITSMerge
- property pattern_graph: Graph | DiGraph | MultiGraph | MultiDiGraph#
Graph that was treated as the pattern for merging.
- Returns:
Pattern graph.
- Return type:
GraphType
- synkit.Graph.Wildcard.its_merge.fuse_its_graphs(G1: Graph | DiGraph | MultiGraph | MultiDiGraph, G2: Graph | DiGraph | MultiGraph | MultiDiGraph, mapping: Dict[Any, Any], *, types_key: str = 'typesGH', element_key: str = 'element', wildcard_element: str = '*', remove_wildcards: bool = True, logger: Logger | None = None) Graph | DiGraph | MultiGraph | MultiDiGraph[source]#
Functional wrapper around ITSMerge.
- Parameters:
G1 (GraphType) – First input ITS graph.
G2 (GraphType) – Second input ITS graph.
mapping (dict[Any, Any]) – Node mapping between the graphs.
types_key (str) – Node attribute key holding ITS typing information.
element_key (str) – Node attribute key for element / atom type.
wildcard_element (str) – Value of element_key that denotes wildcard nodes.
remove_wildcards (bool) – If True, remove wildcard nodes from the fused graph. If False, keep them.
logger (logging.Logger | None) – Optional logger for debug output.
- Returns:
Fused ITS graph.
- Return type:
GraphType
- class synkit.Graph.Wildcard.radwc.RadWC[source]#
Bases: object
Static utility for appending wildcard dummy atoms ([*]) with atom-map indices to all radical centers in the product block of a reaction SMILES.
Reactant and agent blocks are not modified.
Only atoms in the product with unpaired electrons are considered.
Each product radical gets a new [*:N] atom with a unique map number (chosen automatically, or starting from the user-supplied start_map).
Example#
>>> rxn = '[CH2:1][OH:2]>>[CH2:1][O:2]'
>>> RadWC.transform(rxn)
'[CH2:1][OH:2]>>[CH2:1][O:2]'
>>> rxn2 = '[CH2:1][OH:2]>>[CH:1].[OH:2]'
>>> RadWC.transform(rxn2)
'[CH2:1][OH:2]>>[CH:1]([*:3]).[OH:2]'
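The example output ([*:3] added after map indices 1 and 2) is consistent with numbering new wildcards after the largest existing atom-map index. The sketch below shows only that numbering rule, under this assumption; detecting which product atoms are radicals requires a cheminformatics toolkit and is not reproduced. next_map_numbers is a hypothetical helper.

```python
import re
from typing import List, Optional


def next_map_numbers(rxn_smiles: str, n_new: int,
                     start_map: Optional[int] = None) -> List[int]:
    """Hypothetical helper: choose map numbers for n_new wildcard atoms."""
    # Collect every atom-map index written as ':N]' inside a bracket atom.
    used = [int(m) for m in re.findall(r":(\d+)\]", rxn_smiles)]
    # Assumption: auto mode continues after the largest existing map index.
    start = start_map if start_map is not None else max(used, default=0) + 1
    return list(range(start, start + n_new))


print(next_map_numbers("[CH2:1][OH:2]>>[CH:1].[OH:2]", 1))      # [3]
print(next_map_numbers("[CH2:1][OH:2]>>[CH:1].[OH:2]", 2, 10))  # [10, 11]
```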
- static transform(rxn_smiles: str, start_map: int | None = None) str[source]#
Add [*] wildcards (with atom-map index) to every radical in the product block of the input reaction SMILES.
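- Parameters:
rxn_smiles (str) – Input reaction SMILES.
start_map (int | None) – First atom-map number to assign to new wildcards; if None, numbers are chosen automatically.
- Returns:
Modified reaction SMILES with product wildcards.
- Return type:
str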
- Raises:
ValueError – On parse error or invalid input.
Example#
>>> RadWC.transform('[CH2:1][OH:2]>>[CH:1].[OH:2]')
'[CH2:1][OH:2]>>[CH:1]([*:3]).[OH:2]'
- class synkit.Graph.Wildcard.wc_matcher.WCMatcher(G1: Graph | DiGraph | MultiGraph | MultiDiGraph, G2: Graph | DiGraph | MultiGraph | MultiDiGraph, *, wildcard_element: str = '*', element_key: str = 'element', node_attrs: Sequence[str] | None = None, edge_attrs: Sequence[str] | None = None, node_match: Callable[[Dict[str, Any], Dict[str, Any]], bool] | None = None, edge_match: Callable[[Dict[str, Any], Dict[str, Any]], bool] | None = None, logger: Logger | None = None)[source]#
Bases: object
Wildcard-aware pattern→host matcher with subgraph regions for wildcard nodes.
Semantics#
- The graph containing nodes with attrs[element_key] == wildcard_element is treated as the pattern.
- Wildcard nodes (element == "*" by default) are removed from the pattern when computing the core isomorphism.
- Only the core pattern nodes (non-wildcards) are matched against the host using networkx.algorithms.isomorphism.GraphMatcher.
- Wildcard nodes are treated as “don’t care” substituents attached to core atoms. Their exact number and structure are ignored; the matcher only ensures that the core scaffold is present in the host.
- After a core mapping is found, each wildcard node is associated with a host subgraph region, obtained from the neighbours of its mapped anchor nodes in the host.
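To make the "drop wildcards, match the core" step concrete, here is a brute-force sketch on dict-based graphs. It is not the library's implementation: it enumerates injective assignments with itertools.permutations (exponential, so only suitable for tiny graphs) and checks a non-induced embedding, whereas the real matcher delegates to GraphMatcher, whose exact mode (induced or monomorphic) is not stated here. core_match is hypothetical.

```python
from itertools import permutations
from typing import Any, Dict, List, Optional, Set, Tuple


def core_match(pat_nodes: Dict[Any, str], pat_edges: List[Tuple[Any, Any]],
               host_nodes: Dict[Any, str], host_edges: List[Tuple[Any, Any]],
               wildcard: str = "*") -> Optional[Dict[Any, Any]]:
    """Brute-force core mapping that ignores wildcard pattern nodes."""
    core = [n for n, el in pat_nodes.items() if el != wildcard]
    core_set = set(core)
    # Only edges between two core nodes constrain the match.
    core_edges = [(u, v) for u, v in pat_edges if u in core_set and v in core_set]
    host_adj: Set[frozenset] = {frozenset(e) for e in host_edges}
    for image in permutations(host_nodes, len(core)):
        m = dict(zip(core, image))
        # Element labels must agree exactly.
        if any(pat_nodes[p] != host_nodes[h] for p, h in m.items()):
            continue
        # Every core edge must exist in the host (extra host edges are allowed).
        if all(frozenset((m[u], m[v])) in host_adj for u, v in core_edges):
            return m
    return None


pattern = {0: "C", 1: "C", 2: "*"}  # C-C-* as in the usage example
host = {0: "C", 1: "C", 2: "C"}     # C-C-C chain
print(core_match(pattern, [(0, 1), (1, 2)], host, [(0, 1), (1, 2)]))  # {0: 0, 1: 1}
```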
Attribute matching#
Node attributes (for keys listed in node_attrs):
- str and bool: must match exactly.
- int and float: pattern value ≤ host value (lower-bound semantics).
- Special handling for a "neighbors" attribute, if present:
  - The pattern neighbour list may contain wildcard entries (e.g. "*" for any element).
  - Concrete neighbour labels in the pattern act as lower bounds on host counts (multiset inclusion).
  - Wildcard neighbour labels impose no constraint.
Edge attributes (for keys listed in edge_attrs):
- Same typed semantics as node attributes (str/bool exact, numeric ≤).
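The typed comparison rules, including the "neighbors" multiset check, can be sketched as two small predicates. value_ok and neighbors_ok are hypothetical names; the sketch treats bool and int as distinct types (necessary in Python because bool subclasses int), which is an assumption about how the library resolves that overlap.

```python
from collections import Counter
from typing import Any, Iterable


def value_ok(pattern_val: Any, host_val: Any) -> bool:
    """Typed comparison: str/bool exact, int/float as a lower bound."""
    # bool is a subclass of int, so it must be handled before the numeric branch.
    if isinstance(pattern_val, bool) or isinstance(host_val, bool):
        return type(pattern_val) is type(host_val) and pattern_val == host_val
    if isinstance(pattern_val, str):
        return pattern_val == host_val
    if isinstance(pattern_val, (int, float)) and isinstance(host_val, (int, float)):
        return pattern_val <= host_val
    return pattern_val == host_val  # assumption: other types compare exactly


def neighbors_ok(pattern_nbrs: Iterable[str], host_nbrs: Iterable[str],
                 wildcard: str = "*") -> bool:
    """Multiset inclusion of concrete pattern labels; wildcard labels are free."""
    need = Counter(label for label in pattern_nbrs if label != wildcard)
    have = Counter(host_nbrs)
    return all(have[label] >= k for label, k in need.items())


print(value_ok(2, 3), value_ok(3, 2), value_ok(True, 1))  # True False False
print(neighbors_ok(["O", "*"], ["C", "O"]), neighbors_ok(["O", "O"], ["O", "C"]))  # True False
```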
Typical usage#
import networkx as nx
from synkit.Graph.Wildcard.wc_matcher import WCMatcher

# Host: C-C-C chain
host = nx.path_graph(3)
nx.set_node_attributes(host, "C", "element")

# Pattern: C-C-* (wildcard substituent on the second carbon)
pattern = nx.path_graph(3)
nx.set_node_attributes(pattern, "C", "element")
pattern.nodes[2]["element"] = "*"  # wildcard node

matcher = WCMatcher(
    pattern,
    host,
    wildcard_element="*",
    element_key="element",
    node_attrs=["element"],
    edge_attrs=[],
).fit()

if matcher.is_match:
    print("Core mapping:", matcher.core_mapping_without_wildcard_regions)
    print("Wildcard regions:", matcher.wildcard_subgraph_mapping)
- param G1:
First input graph.
- type G1:
GraphType
- param G2:
Second input graph.
- type G2:
GraphType
- param wildcard_element:
Node attribute value that denotes a wildcard in the pattern. Defaults to "*".
- type wildcard_element:
str
- param element_key:
Name of the node attribute storing the element / atom type. Defaults to "element".
- type element_key:
str
- param node_attrs:
Node attribute keys to be checked in addition to element_key. If "neighbors" is included, neighbour lists are compared with lower-bound / wildcard semantics.
- type node_attrs:
Sequence[str] | None
- param edge_attrs:
Edge attribute keys to be checked with typed semantics.
- type edge_attrs:
Sequence[str] | None
- param node_match:
Optional additional node predicate, combined with the default matcher. Signature: (host_attr: dict, pattern_attr: dict) -> bool.
- type node_match:
Callable[[Dict[str, Any], Dict[str, Any]], bool] | None
- param edge_match:
Optional additional edge predicate, combined with the default matcher. Signature: (host_attr: dict, pattern_attr: dict) -> bool.
- type edge_match:
Callable[[Dict[str, Any], Dict[str, Any]], bool] | None
- param logger:
Optional logger for diagnostics.
- type logger:
logging.Logger | None
- property core_mapping_without_wildcard_regions: Dict[Any, Any]#
Mapping from non-wildcard pattern nodes → host nodes.
Wildcard nodes in the pattern are ignored in this mapping. This is the clean “core” mapping without any enlargement due to wildcard regions.
- Returns:
Mapping from pattern-core nodes to host nodes.
- Return type:
dict[Any, Any]
- fit() WCMatcher[source]#
Run the wildcard-aware core isomorphism search.
The method computes the core subgraph isomorphism (ignoring wildcard nodes) and stores the mapping internally. Use is_match and core_mapping_without_wildcard_regions to inspect the result.
- Returns:
Self, to allow fluent chaining.
- Return type:
WCMatcher
- property help: str#
Human-readable summary of the matcher behaviour.
- Returns:
Description string summarising semantics and attributes.
- Return type:
str
- property host_graph: Graph | DiGraph | MultiGraph | MultiDiGraph#
Graph that was treated as the host.
- Returns:
Host graph.
- Return type:
GraphType
- property is_match: bool#
Whether a wildcard-compatible core mapping was found.
- Returns:
True if the core pattern matches a subgraph of the host.
- Return type:
bool
- property pattern_graph: Graph | DiGraph | MultiGraph | MultiDiGraph#
Graph that was treated as the pattern (may contain wildcards).
- Returns:
Pattern graph.
- Return type:
GraphType
- property wildcard_subgraph_mapping: Dict[Any, Set[Any]]#
Mapping from each wildcard pattern node to a set of host nodes forming its wildcard subgraph region.
Construction heuristic#
For each wildcard node w in the pattern:
- Collect its non-wildcard neighbour(s) in the pattern.
- Map those neighbours to host anchors using the core mapping.
- For each anchor h in the host, add all neighbours of h that are not already used by any core mapping.
Notes#
- If there is no core mapping (is_match == False), the result is an empty dict.
- If a wildcard’s anchors cannot be mapped (e.g. missing in core), its region is an empty set.
- returns:
Mapping wildcard_pattern_node → set(host_nodes_in_region).
- rtype:
dict[Any, set[Any]]
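A minimal sketch of this heuristic on dict-based graphs, assuming hashable node IDs, a core mapping oriented pattern → host, and None as the signal for "no match". wildcard_regions and adjacency are hypothetical helpers, not the property's actual code.

```python
from typing import Any, Dict, List, Optional, Set, Tuple


def adjacency(edges: List[Tuple[Any, Any]]) -> Dict[Any, Set[Any]]:
    """Undirected adjacency sets built from an edge list."""
    adj: Dict[Any, Set[Any]] = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def wildcard_regions(pat_nodes: Dict[Any, str], pat_edges: List[Tuple[Any, Any]],
                     host_edges: List[Tuple[Any, Any]],
                     core_map: Optional[Dict[Any, Any]],
                     wildcard: str = "*") -> Dict[Any, Set[Any]]:
    """Neighbour-based region heuristic; core_map is None when there is no match."""
    if core_map is None:
        return {}
    p_adj, h_adj = adjacency(pat_edges), adjacency(host_edges)
    used = set(core_map.values())  # host nodes already claimed by the core mapping
    regions: Dict[Any, Set[Any]] = {}
    for w, element in pat_nodes.items():
        if element != wildcard:
            continue
        # Non-wildcard pattern neighbours of w, translated to host anchors.
        anchors = [core_map[n] for n in p_adj.get(w, set())
                   if pat_nodes[n] != wildcard and n in core_map]
        region: Set[Any] = set()
        for h in anchors:
            region |= h_adj.get(h, set()) - used
        regions[w] = region  # stays empty if no anchor could be mapped
    return regions


pattern = {0: "C", 1: "C", 2: "*"}
print(wildcard_regions(pattern, [(0, 1), (1, 2)], [(0, 1), (1, 2), (1, 3)], {0: 0, 1: 1}))
# {2: {2, 3}}
```

With an extra host branch at node 3, the region of wildcard 2 contains both unclaimed neighbours of anchor 1.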
- class synkit.Graph.Wildcard.wildcard.WildCard[source]#
Bases: object
Static utility class for generating reaction SMILES with wildcards by augmenting the product graph with subgraphs unique to the reactant and patching lost external connections with wildcard atoms (‘*’).
Optionally, the reactant side can also be rebalanced so that both sides have matching atom maps (by adding wildcard atoms if needed).
All methods are static and do not store any internal state.
Example#
>>> WildCard.rsmi_with_wildcards('CCO>>CC')
'CCO>>CC*'
>>> WildCard.rsmi_with_wildcards('CCO>>CC', rebalance=True)
'CCO*>>CC*'
- static add_unique_subgraph_with_wildcards(G: Graph, H: Graph, attributes_defaults: Dict[str, Any] | None = None, rebalance: bool = False) Tuple[Graph, Graph][source]#
Add the subgraph unique to G as a disconnected union to H, and patch lost external connections with plain wildcard bonds. Optionally, rebalance the reactant side to ensure both sides have matching atom maps by adding wildcards.
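- Parameters:
G (nx.Graph) – Source graph whose unique subgraph is added to H (the reactant graph in the example below).
H (nx.Graph) – Target graph to augment (the product graph).
attributes_defaults (Dict[str, Any] | None) – Optional attribute defaults; None by default.
rebalance (bool) – If True, also add wildcard atoms to the reactant side so that both sides have matching atom maps.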
- Returns:
Tuple (new_G, new_H) with both graphs possibly augmented by wildcards
- Return type:
Tuple[nx.Graph, nx.Graph]
- Raises:
ValueError – If G or H are not valid graphs.
Example#
>>> r, p = WildCard.from_rsmi('CCO>>CC')
>>> r2, p2 = WildCard.add_unique_subgraph_with_wildcards(r, p, rebalance=True)
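The method depends on two derived sets: the atoms present only in G, and the bonds that connect them to atoms shared with H (the "lost external connections"). The sketch below computes only that bookkeeping, assuming atoms are identified by atom-map number; where the wildcard atoms are attached and how SMILES are written back are not reproduced. unique_and_lost is a hypothetical helper.

```python
from typing import Set, Tuple

Bond = Tuple[int, int]


def unique_and_lost(reactant_bonds: Set[Bond], reactant_atoms: Set[int],
                    product_atoms: Set[int]) -> Tuple[Set[int], Set[Bond]]:
    """Hypothetical: atoms only in the reactant, plus bonds linking them to shared atoms."""
    unique = reactant_atoms - product_atoms
    # A bond leaves the unique subgraph when exactly one of its ends is unique.
    lost = {(u, v) for u, v in reactant_bonds if (u in unique) != (v in unique)}
    return unique, lost


# Assumed atom-map view of CCO >> CC: atoms 1-2-3 in the reactant, 1-2 in the product.
print(unique_and_lost({(1, 2), (2, 3)}, {1, 2, 3}, {1, 2}))  # ({3}, {(2, 3)})
```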
- static from_rsmi(rsmi: str) Tuple[Graph, Graph][source]#
Convert a reaction SMILES string into reactant and product graphs.
- Parameters:
rsmi (str) – Reaction SMILES string
- Returns:
Tuple (reactant_graph, product_graph)
- Return type:
Tuple[nx.Graph, nx.Graph]
- Raises:
ValueError – If input cannot be parsed.
- static rsmi_with_wildcards(rsmi: str, attributes_defaults: Dict[str, Any] | None = None, rebalance: bool = False) str[source]#
Given a reaction SMILES string, returns a new reaction SMILES where the product side contains any disconnected subgraphs unique to the reactant, with lost external bonds patched with wildcard atoms. Optionally, also adds wildcards to the reactant side to ensure matching atom maps (rebalance).
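- Parameters:
rsmi (str) – Input reaction SMILES.
attributes_defaults (Dict[str, Any] | None) – Optional attribute defaults; None by default.
rebalance (bool) – If True, also add wildcards to the reactant side so that both sides have matching atom maps.
- Returns:
Augmented reaction SMILES string
- Return type:
str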
- Raises:
ValueError – If parsing or output generation fails.
Example#
>>> WildCard.rsmi_with_wildcards('CCO>>CC')
'CCO>>CC*'
>>> WildCard.rsmi_with_wildcards('CCO>>CC', rebalance=True)
'CCO*>>CC*'