IO#
Input/output helpers and format conversion utilities.
Core conversion#
- synkit.IO.chem_converter.detect_its_format(graph: Graph) Literal['typesGH', 'tuple'][source]#
Detect the ITS storage representation used by a graph.
Legacy ITS graphs keep scalar node attributes and store side-specific values only in typesGH. Tuple ITS graphs store paired node and edge attributes directly, such as element=("C", "C") or sigma_order=(1.0, 1.0).
- Parameters:
graph (nx.Graph) – ITS-like graph to inspect.
- Returns:
Detected ITS format.
- Return type:
ITSFormat
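To make the two layouts concrete, here is a minimal sketch that uses plain dictionaries in place of NetworkX node data. The helper name detect_format_sketch, the contents of the typesGH tuples, and the detection heuristic are all assumptions for illustration; they are not taken from the synkit source.

```python
from typing import Any, Dict, Mapping

# Hypothetical node data: plain dicts stand in for nx.Graph node attributes.
# Legacy layout: scalar attributes, side-specific values only under "typesGH".
legacy_nodes: Dict[int, Dict[str, Any]] = {
    1: {"element": "C", "typesGH": (("C", 0), ("C", 0))},
    2: {"element": "O", "typesGH": (("O", 0), ("O", -1))},
}

# Tuple layout: every attribute is a (reactant side, product side) pair.
tuple_nodes: Dict[int, Dict[str, Any]] = {
    1: {"element": ("C", "C"), "charge": (0, 0)},
    2: {"element": ("O", "O"), "charge": (0, -1)},
}


def detect_format_sketch(nodes: Mapping[int, Dict[str, Any]]) -> str:
    """Guess the ITS layout from node attributes (illustrative heuristic)."""
    for attrs in nodes.values():
        if "typesGH" in attrs:
            return "typesGH"
        if isinstance(attrs.get("element"), tuple):
            return "tuple"
    raise ValueError("no ITS-specific attributes found")


print(detect_format_sketch(legacy_nodes))
print(detect_format_sketch(tuple_nodes))
```

With a real graph, detect_its_format(graph) is the call to use; the sketch only shows what distinguishes the two representations.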
- synkit.IO.chem_converter.dfs_to_smiles(dfs: str, keep_map: bool = True) str[source]#
Convert DFS-style annotated SMILES to normal SMILES form.
Rules:
- Replace [] with [*].
- Convert bracketed tokens followed by digits, such as [H]12, into atom-mapped tokens [H:12] when keep_map=True.
- If keep_map=False, remove the trailing digits instead.
- Tokens already containing : inside brackets are preserved.
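The rules above can be sketched with a single regular expression. This is an independent re-implementation for illustration only: the function name dfs_to_smiles_sketch is hypothetical, and the library may handle edge cases differently.

```python
import re

# A bracketed token, optionally followed by trailing digits, e.g. "[H]12" or "[]".
_TOKEN = re.compile(r"\[([^\[\]]*)\](\d*)")


def dfs_to_smiles_sketch(dfs: str, keep_map: bool = True) -> str:
    """Apply the documented DFS -> SMILES rules (illustrative sketch)."""

    def repl(match: "re.Match[str]") -> str:
        body, digits = match.group(1), match.group(2)
        if ":" in body:
            # Already atom-mapped inside the brackets: leave untouched.
            return match.group(0)
        if body == "":
            body = "*"  # "[]" becomes "[*]"
        if digits and keep_map:
            return f"[{body}:{digits}]"
        return f"[{body}]"  # trailing digits dropped or absent

    return _TOKEN.sub(repl, dfs)


print(dfs_to_smiles_sketch("[C]1[O]2"))
print(dfs_to_smiles_sketch("[H]12", keep_map=False))
```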
- synkit.IO.chem_converter.gml_to_its(gml: str) Graph[source]#
Convert a GML reaction rule back into an ITS graph.
- Parameters:
gml (str) – GML string.
- Returns:
ITS graph.
- Return type:
nx.Graph
- synkit.IO.chem_converter.gml_to_smart(gml: str, sanitize: bool = True, explicit_hydrogen: bool = False, useSmiles: bool = True) str[source]#
Convert a GML reaction rule back to reaction SMILES or SMARTS.
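- Parameters:
gml (str) – GML string.
sanitize (bool) – If True, sanitize molecules during conversion.
explicit_hydrogen (bool) – If True, include explicit hydrogens.
useSmiles (bool) – If True, return reaction SMILES. Otherwise, return reaction SMARTS.
- Returns:
Reaction SMILES or SMARTS string.
- Return type:
str
- Raises: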
ValueError – If conversion fails.
- synkit.IO.chem_converter.graph_to_rsmi(r: Graph, p: Graph, its: Graph | None = None, sanitize: bool = True, explicit_hydrogen: bool = False) str | None[source]#
Convert reactant and product graphs into a reaction SMILES string.
- Parameters:
r (networkx.Graph) – Graph representing the reactants.
p (networkx.Graph) – Graph representing the products.
its (networkx.Graph or None) – Imaginary transition state graph. If None, it will be constructed.
sanitize (bool) – Whether to sanitize molecules during conversion.
explicit_hydrogen (bool) – Whether to preserve explicit hydrogens in the SMILES.
- Returns:
Reaction SMILES string in ‘reactants>>products’ format or None on failure.
- Return type:
str or None
- synkit.IO.chem_converter.graph_to_smi(graph: Graph, sanitize: bool = True, preserve_atom_maps: Sequence[int] | None = None) str | None[source]#
Convert a molecular graph to a SMILES string.
- synkit.IO.chem_converter.its_to_gml(its: Graph, core: bool = True, rule_name: str = 'rule', reindex: bool = True, explicit_hydrogen: bool = False, format: Literal['typesGH', 'tuple'] = 'typesGH') str[source]#
Convert an ITS graph to GML format.
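- Parameters:
its (nx.Graph) – ITS graph to convert.
core (bool) – If True, export only the reaction core.
rule_name (str) – Rule name stored in the GML output.
reindex (bool) – If True, reindex graph nodes before export.
explicit_hydrogen (bool) – If True, include explicit hydrogens.
format (ITSFormat) – ITS format. Supported values are "typesGH" and "tuple".
- Returns:
GML representation of the reaction.
- Return type:
str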
- synkit.IO.chem_converter.its_to_rsmi(its: Graph, sanitize: bool = True, explicit_hydrogen: bool = False, clean_wildcards: bool = False, format: Literal['typesGH', 'tuple'] = 'typesGH') str[source]#
Convert an ITS graph into a reaction SMILES (rSMI) string.
This function decomposes or reverts the ITS graph into reactant and product graphs depending on the selected ITS format, then serializes them into a reaction SMILES string.
- Parameters:
its (nx.Graph) – ITS graph to convert back into reaction SMILES.
sanitize (bool) – If True, sanitize graphs before SMILES generation.
explicit_hydrogen (bool) – If True, include explicit hydrogens in the generated SMILES.
clean_wildcards (bool) – If True, clean wildcard radicals in the generated reaction SMILES.
format (ITSFormat) – ITS format. Supported values are "typesGH" and "tuple".
- Returns:
Reaction SMILES string.
- Return type:
str
- Raises:
ValueError – If the ITS format is unsupported.
- synkit.IO.chem_converter.normalize_dfs_for_compare(dfs: str) str[source]#
Normalize DFS-style strings for comparison.
- synkit.IO.chem_converter.rsmarts_to_rsmi(rsmarts: str) str[source]#
Convert reaction SMARTS to reaction SMILES.
- Parameters:
rsmarts (str) – Reaction SMARTS input.
- Returns:
Reaction SMILES string.
- Return type:
str
- Raises:
ValueError – If conversion fails.
- synkit.IO.chem_converter.rsmi_to_graph(rsmi: str, drop_non_aam: bool = True, sanitize: bool = True, use_index_as_atom_map: bool = True, node_attrs: Sequence[str] | None = None, edge_attrs: Sequence[str] | None = None) tuple[Graph | None, Graph | None][source]#
Convert a reaction SMILES into reactant and product graphs.
- Parameters:
rsmi (str) – Reaction SMILES string in reactants>>products format.
drop_non_aam (bool) – If True, drop atoms lacking atom maps.
sanitize (bool) – If True, sanitize molecules during conversion.
use_index_as_atom_map (bool) – If True, overwrite atom-map labels using atom indices.
node_attrs (Optional[Sequence[str]]) – Node attributes to export into the graphs.
edge_attrs (Optional[Sequence[str]]) – Edge attributes to export into the graphs.
- Returns:
Tuple of reactant and product graphs.
- Return type:
tuple[Optional[nx.Graph], Optional[nx.Graph]]
- synkit.IO.chem_converter.rsmi_to_its(rsmi: str, drop_non_aam: bool = True, sanitize: bool = True, use_index_as_atom_map: bool = True, core: bool = False, node_attrs: Sequence[str] | None = None, edge_attrs: Sequence[str] | None = None, explicit_hydrogen: bool = False, format: Literal['typesGH', 'tuple'] = 'typesGH') Graph[source]#
Convert a reaction SMILES into an ITS graph.
Supported formats:
- "typesGH": legacy ITS representation
- "tuple": paired-attribute ITS representation
- Parameters:
rsmi (str) – Reaction SMILES string.
drop_non_aam (bool) – If True, discard fragments lacking atom maps.
sanitize (bool) – If True, sanitize molecules during conversion.
use_index_as_atom_map (bool) – If True, overwrite atom maps using atom indices.
core (bool) – If True, return only the reaction-center graph.
node_attrs (Optional[Sequence[str]]) – Node attributes to include in graph construction.
edge_attrs (Optional[Sequence[str]]) – Edge attributes to include in graph construction.
explicit_hydrogen (bool) – If True, convert implicit hydrogens to explicit nodes for the selected ITS format.
format (ITSFormat) – ITS format.
- Returns:
ITS graph or RC graph.
- Return type:
nx.Graph
- Raises:
ValueError – If graph construction fails.
- synkit.IO.chem_converter.rsmi_to_rsmarts(rsmi: str) str[source]#
Convert mapped reaction SMILES to reaction SMARTS.
- Parameters:
rsmi (str) – Reaction SMILES input.
- Returns:
Reaction SMARTS string.
- Return type:
str
- Raises:
ValueError – If conversion fails.
- synkit.IO.chem_converter.smart_to_gml(smart: str, core: bool = True, sanitize: bool = True, rule_name: str = 'rule', reindex: bool = False, explicit_hydrogen: bool = False, useSmiles: bool = True) str[source]#
Convert a reaction SMARTS or SMILES string into GML.
This function uses the legacy ITS/GML pipeline.
- Parameters:
smart (str) – Reaction SMARTS or SMILES string.
core (bool) – If True, export only the reaction core.
sanitize (bool) – If True, sanitize molecules during conversion.
rule_name (str) – Rule name stored in the GML output.
reindex (bool) – If True, reindex graph nodes before export.
explicit_hydrogen (bool) – If True, include explicit hydrogens.
useSmiles (bool) – If True, treat input as reaction SMILES. Otherwise, treat it as reaction SMARTS.
- Returns:
GML representation of the reaction rule.
- Return type:
str
- Raises:
ValueError – If graph construction fails.
- synkit.IO.chem_converter.smiles_to_dfs(smiles: str) str[source]#
Convert SMILES with atom maps into DFS-style notation.
Rules:
- [X:123] becomes [X]123
- [*:3] becomes []3
- unmapped tokens remain unchanged
- remaining [*] is normalized back to []
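A minimal sketch of these rules, written as a standalone regular-expression substitution. The name smiles_to_dfs_sketch is hypothetical, and this is not the library's implementation; it only mirrors the four rules listed above.

```python
import re

# A bracketed atom with a map number, e.g. "[CH3:1]" or "[*:3]".
_MAPPED = re.compile(r"\[([^\[\]:]+):(\d+)\]")


def smiles_to_dfs_sketch(smiles: str) -> str:
    """Apply the documented SMILES -> DFS rules (illustrative sketch)."""

    def repl(match: "re.Match[str]") -> str:
        body, atom_map = match.group(1), match.group(2)
        if body == "*":
            body = ""  # "[*:3]" becomes "[]3"
        return f"[{body}]{atom_map}"

    converted = _MAPPED.sub(repl, smiles)
    # Any wildcard left without a map number is normalized to "[]".
    return converted.replace("[*]", "[]")


print(smiles_to_dfs_sketch("[CH3:1][OH:2]"))
print(smiles_to_dfs_sketch("[*:3]C[*]"))
```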
- synkit.IO.chem_converter.smiles_to_graph(smiles: str, drop_non_aam: bool = False, sanitize: bool = True, use_index_as_atom_map: bool = False, node_attrs: Sequence[str] | None = None, edge_attrs: Sequence[str] | None = None) Graph | None[source]#
Convert a SMILES string to a molecular graph.
- Parameters:
smiles (str) – SMILES representation of the molecule.
drop_non_aam (bool) – If True, drop atoms without atom-map labels.
sanitize (bool) – If True, sanitize the RDKit molecule.
use_index_as_atom_map (bool) – If True, overwrite atom-map labels using atom indices.
node_attrs (Optional[Sequence[str]]) – Node attributes to export into the graph.
edge_attrs (Optional[Sequence[str]]) – Edge attributes to export into the graph.
- Returns:
Molecular graph or None on failure.
- Return type:
Optional[nx.Graph]
- class synkit.IO.mol_to_graph.MolToGraph(node_attrs: List[str] | None = None, edge_attrs: List[str] | None = None, *, attr_profile: str = 'minimal', with_topology: bool = False)[source]#
Bases: object
Convert an RDKit molecule into a NetworkX molecular graph.
The converter preserves the public API while adding corrected lone-pair bookkeeping for aromatic heteroatoms, especially pyrrolic / [nH]-like aromatic nitrogen. RDKit aromatic bonds have order 1.5; for aromatic lone-pair donor heteroatoms, this class counts aromatic bonds as sigma bonds during lone-pair estimation.
Important node fields are estimated_lone_pairs, lone_pairs (backward-compatible alias), available_lone_pairs, available_lp, bond_order_sum, lp_bond_order_sum, valence_electrons, and oxidation_state.
- Parameters:
node_attrs (Optional[List[str]]) – Node attributes to export into the graph.
edge_attrs (Optional[List[str]]) – Edge attributes to export into the graph.
attr_profile (str) – Keyword-only. One of SUPPORTED_PROFILES ("minimal" or "full"). Defaults to "minimal".
with_topology (bool) – Keyword-only. Defaults to False.
- Raises:
ValueError – If attr_profile is unsupported.
    from rdkit import Chem
    from synkit.IO.mol_to_graph import MolToGraph

    mol = Chem.MolFromSmiles("c1cc[nH]c1")
    graph = MolToGraph(attr_profile="minimal").transform(mol)
    for node, data in graph.nodes(data=True):
        print(node, data["element"], data["lone_pairs"], data["available_lp"])

    mol = Chem.MolFromSmiles("[CH3:1][CH2:2][Br:3]")
    graph = MolToGraph(
        node_attrs=["element", "atom_map", "charge", "lone_pairs"],
        edge_attrs=["order", "kekule_order"],
    ).transform(mol, use_index_as_atom_map=True)
- PAULING_EN: Dict[str, float] = {'B': 2.04, 'Br': 2.96, 'C': 2.55, 'Cl': 3.16, 'F': 3.98, 'H': 2.2, 'I': 2.66, 'N': 3.04, 'O': 3.44, 'P': 2.19, 'S': 2.58, 'Se': 2.55}#
- SUPPORTED_PROFILES = ('minimal', 'full')#
- static add_partial_charges(mol: Mol) None[source]#
Compute Gasteiger partial charges in-place.
- Parameters:
mol (Chem.Mol) – RDKit molecule to modify.
- Returns:
None.
- Return type:
None
- classmethod estimate_available_lone_pairs(atom: Atom) int[source]#
Estimate lone pairs locally available for LP-/B+ donation.
- Parameters:
atom (Chem.Atom) – RDKit atom.
- Returns:
Locally available lone-pair count.
- Return type:
int
- classmethod estimate_lone_pairs(atom: Atom) int[source]#
Estimate total lone-pair count.
- Parameters:
atom (Chem.Atom) – RDKit atom.
- Returns:
Estimated total lone-pair count.
- Return type:
int
    mol = Chem.MolFromSmiles("c1cc[nH]c1")
    n_atom = next(a for a in mol.GetAtoms() if a.GetSymbol() == "N")
    print(MolToGraph.estimate_lone_pairs(n_atom))
- classmethod estimate_oxidation_states(mol: Mol, *, kek_mol: Mol | None = None, prefer_kekule: bool = True, en_tie_threshold: float = 0.05) Dict[int, float][source]#
Estimate atom oxidation states.
For each bond, bond electrons are assigned to the more electronegative atom. Formal charge is used as the starting value.
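- Parameters:
mol (Chem.Mol) – RDKit molecule.
kek_mol (Optional[Chem.Mol]) – Keyword-only. Defaults to None.
prefer_kekule (bool) – Keyword-only. Defaults to True.
en_tie_threshold (float) – Keyword-only. Defaults to 0.05.
- Returns:
Oxidation states keyed by RDKit atom index.
- Return type:
Dict[int, float]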
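The electron-assignment rule can be illustrated without RDKit. The sketch below is an assumption-laden simplification: the function name oxidation_states_sketch is hypothetical, hydrogens must be listed explicitly, bonds are given as (i, j, order) triples, and bonds whose electronegativity difference falls within en_tie_threshold are treated as transferring no electrons. How synkit itself resolves ties or aromatic bonds is not stated here.

```python
from typing import Dict, List, Tuple

# Subset of the PAULING_EN table listed in this reference.
PAULING_EN: Dict[str, float] = {"H": 2.2, "C": 2.55, "N": 3.04, "O": 3.44}


def oxidation_states_sketch(
    elements: List[str],
    charges: List[int],
    bonds: List[Tuple[int, int, float]],
    en_tie_threshold: float = 0.05,
) -> Dict[int, float]:
    """Assign bond electrons to the more electronegative atom (sketch)."""
    # Formal charge is the starting value for every atom.
    states = {idx: float(charge) for idx, charge in enumerate(charges)}
    for i, j, order in bonds:
        diff = PAULING_EN[elements[i]] - PAULING_EN[elements[j]]
        if abs(diff) <= en_tie_threshold:
            continue  # assumed: near-equal electronegativity, no transfer
        winner, loser = (i, j) if diff > 0 else (j, i)
        states[winner] -= order  # gains the bonding electrons
        states[loser] += order   # loses them
    return states


# Methanol CH3OH: atom 0 is C, atom 1 is O, atoms 2-5 are H.
methanol = oxidation_states_sketch(
    ["C", "O", "H", "H", "H", "H"],
    [0, 0, 0, 0, 0, 0],
    [(0, 1, 1), (0, 2, 1), (0, 3, 1), (0, 4, 1), (1, 5, 1)],
)
# Formaldehyde CH2=O: atom 0 is C, atom 1 is O, atoms 2-3 are H.
formaldehyde = oxidation_states_sketch(
    ["C", "O", "H", "H"],
    [0, 0, 0, 0],
    [(0, 1, 2), (0, 2, 1), (0, 3, 1)],
)
print(methanol[0], formaldehyde[0])  # -2.0 0.0
```

Under these assumptions the carbon goes from -2 in methanol to 0 in formaldehyde, which is the textbook result for this oxidation.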
- static get_bond_stereochemistry(bond: Bond) str[source]#
Return E, Z, or N for double-bond stereochemistry.
- Parameters:
bond (Chem.Bond) – RDKit bond.
- Returns:
Simple bond stereochemistry label.
- Return type:
str
- static get_stereochemistry(atom: Atom) str[source]#
Return S, R, or N from the RDKit chiral tag.
- Parameters:
atom (Chem.Atom) – RDKit atom.
- Returns:
Simple atom stereochemistry label.
- Return type:
str
- property graph: Graph#
Return the graph produced by transform_store().
- Returns:
Stored molecular graph.
- Return type:
nx.Graph
- Raises:
RuntimeError – If no graph has been stored yet.
- static has_atom_mapping(mol: Mol) bool[source]#
Return whether any atom has a non-zero atom-map number.
- Parameters:
mol (Chem.Mol) – RDKit molecule.
- Returns:
True if mapped.
- Return type:
bool
- classmethod mol_to_graph(mol: Mol, drop_non_aam: bool = False, light_weight: bool = False, use_index_as_atom_map: bool = False) Graph[source]#
Backward-compatible graph converter.
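New code should usually prefer transform().
- Parameters:
mol (Chem.Mol) – RDKit molecule.
drop_non_aam (bool) – If True, drop atoms without atom-map labels.
light_weight (bool) – Defaults to False.
use_index_as_atom_map (bool) – If True, overwrite atom-map labels using atom indices.
- Returns:
Molecular graph.
- Return type:
nx.Graph
- Raises:
ValueError – If drop_non_aam=True but use_index_as_atom_map=False.
    mol = Chem.MolFromSmiles("[CH3:1][CH2:2][Br:3]")
    graph = MolToGraph.mol_to_graph(
        mol,
        drop_non_aam=True,
        light_weight=True,
        use_index_as_atom_map=True,
    )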
- classmethod oxidation_states_by_atom_map(mol: Mol, *, kek_mol: Mol | None = None, prefer_kekule: bool = True, en_tie_threshold: float = 0.05) Dict[int, Dict[str, Any]][source]#
Return oxidation states keyed by non-zero atom-map number.
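- Parameters:
mol (Chem.Mol) – RDKit molecule.
kek_mol (Optional[Chem.Mol]) – Keyword-only. Defaults to None.
prefer_kekule (bool) – Keyword-only. Defaults to True.
en_tie_threshold (float) – Keyword-only. Defaults to 0.05.
- Returns:
Oxidation-state records keyed by atom-map number.
- Return type:
Dict[int, Dict[str, Any]]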
- static random_atom_mapping(mol: Mol) Mol[source]#
Assign random atom-map numbers from 1 to n in-place.
- Parameters:
mol (Chem.Mol) – RDKit molecule to mutate.
- Returns:
Same molecule with assigned atom-map numbers.
- Return type:
Chem.Mol
- classmethod reaction_oxidation_state_delta_from_rsmi(rsmi: str, *, threshold: float = 0.5, prefer_kekule: bool = True, en_tie_threshold: float = 0.05) Dict[int, Dict[str, Any]][source]#
Compute oxidation-state changes for mapped reaction SMILES.
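Positive delta means oxidation; negative delta means reduction.
- Parameters:
rsmi (str) – Mapped reaction SMILES containing ">>".
threshold (float) – Keyword-only. Defaults to 0.5.
prefer_kekule (bool) – Keyword-only. Defaults to True.
en_tie_threshold (float) – Keyword-only. Defaults to 0.05.
- Returns:
Significant oxidation-state changes keyed by atom map.
- Return type:
Dict[int, Dict[str, Any]]
- Raises:
ValueError – If rsmi lacks ">>".
    rsmi = "[CH3:1][OH:2]>>[CH2:1]=[O:2]"
    print(MolToGraph.reaction_oxidation_state_delta_from_rsmi(rsmi))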
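The sign convention and the threshold filter can be shown with plain dictionaries. This sketch is not the library's code: the function name oxidation_delta_sketch, the record keys ("reactant", "product", "delta"), and the inclusive comparison against the threshold are assumptions. The input values are the oxidation states one would expect for methanol and formaldehyde, keyed by atom-map number.

```python
from typing import Any, Dict


def oxidation_delta_sketch(
    reactant_states: Dict[int, float],
    product_states: Dict[int, float],
    threshold: float = 0.5,
) -> Dict[int, Dict[str, Any]]:
    """Keep atoms whose oxidation state changes by at least `threshold`."""
    changes: Dict[int, Dict[str, Any]] = {}
    # Only atoms mapped on both sides of the reaction can be compared.
    for atom_map in sorted(set(reactant_states) & set(product_states)):
        delta = product_states[atom_map] - reactant_states[atom_map]
        if abs(delta) >= threshold:
            changes[atom_map] = {
                "reactant": reactant_states[atom_map],
                "product": product_states[atom_map],
                "delta": delta,  # positive: oxidation, negative: reduction
            }
    return changes


# Atom map 1 is the carbon, atom map 2 is the oxygen.
changes = oxidation_delta_sketch({1: -2.0, 2: -2.0}, {1: 0.0, 2: -2.0})
print(changes)  # {1: {'reactant': -2.0, 'product': 0.0, 'delta': 2.0}}
```

Only the carbon is reported, with a positive delta, because the oxygen keeps the same oxidation state on both sides.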
- transform(mol: Mol, drop_non_aam: bool = False, use_index_as_atom_map: bool = False) Graph[source]#
Build a NetworkX graph from an RDKit molecule.
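- Parameters:
mol (Chem.Mol) – RDKit molecule.
drop_non_aam (bool) – If True, drop atoms without atom-map labels.
use_index_as_atom_map (bool) – If True, overwrite atom-map labels using atom indices.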
- Returns:
Molecular graph with atom and bond attributes.
- Return type:
nx.Graph
- Raises:
ValueError – If drop_non_aam=True but use_index_as_atom_map=False.
    mol = Chem.MolFromSmiles("[CH3:1][CH2:2][Br:3]")
    graph = MolToGraph().transform(
        mol,
        drop_non_aam=True,
        use_index_as_atom_map=True,
    )
- transform_store(mol: Mol, drop_non_aam: bool = False, use_index_as_atom_map: bool = False) MolToGraph[source]#
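Build the graph, store it, and return self.
- Parameters:
mol (Chem.Mol) – RDKit molecule.
drop_non_aam (bool) – If True, drop atoms without atom-map labels.
use_index_as_atom_map (bool) – If True, overwrite atom-map labels using atom indices.
- Returns:
Current converter instance.
- Return type:
MolToGraph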
- class synkit.IO.graph_to_mol.GraphToMol(node_attributes: Dict[str, str] = {'atom_map': 'atom_map', 'charge': 'charge', 'element': 'element'}, edge_attributes: Dict[str, str] = {'order': 'order'})[source]#
Bases: object
Converts a NetworkX graph representation of a molecule into an RDKit molecule object.
This class reconstructs RDKit molecules from node and edge attributes in a graph, correctly interpreting atom types, charges, mapping numbers, bond orders, and optionally explicit hydrogen counts.
- Parameters:
node_attributes (Dict[str, str]) – Mapping of expected attribute names to node keys in the graph. For example, {"element": "element", "charge": "charge", "atom_map": "atom_map"}.
edge_attributes (Dict[str, str]) – Mapping of expected attribute names to edge keys in the graph. For example, {"order": "order"}.
- static get_bond_type_from_order(order: float) BondType[source]#
Converts a numerical bond order into the corresponding RDKit BondType.
- Parameters:
order (float) – The numerical bond order (typically 1, 2, or 3, with 1.5 for aromatic bonds).
- Returns:
The corresponding RDKit bond type (single, double, triple, or aromatic).
- Return type:
Chem.BondType
- graph_to_mol(graph: Graph, ignore_bond_order: bool = False, sanitize: bool = True, use_h_count: bool = False) Mol[source]#
Converts a NetworkX graph into an RDKit molecule.
- Parameters:
graph (nx.Graph) – The NetworkX graph representing the molecule.
ignore_bond_order (bool) – If True, all bonds are created as single bonds regardless of edge attributes. Defaults to False.
sanitize (bool) – If True, the resulting RDKit molecule will be sanitized after construction. Defaults to True.
use_h_count (bool) – If True, the ‘hcount’ attribute (if present) will be used to set explicit hydrogen counts on atoms. Defaults to False.
- Returns:
An RDKit molecule constructed from the graph’s nodes and edges.
- Return type:
Chem.Mol
- class synkit.IO.gml_to_nx.GMLToNX(gml_text: str)[source]#
Bases: object
Parses GML-like text and transforms it into three NetworkX graphs representing the left, right, and context graphs of a chemical reaction step.
- class synkit.IO.nx_to_gml.NXToGML[source]#
Bases: object
Converts NetworkX graph representations of chemical reactions to GML (Graph Modelling Language) strings. Useful for exporting reaction rules in a standard graph format.
This class provides static methods for converting individual graphs, sets of reaction graphs, and managing charge/attribute changes in the export process.
- static transform(graph_rules: Tuple[Graph, Graph, Graph], rule_name: str = 'Test', reindex: bool = False, attributes: List[str] = ['charge'], explicit_hydrogen: bool = False) str[source]#
Processes a triple of reaction graphs to generate a GML string rule, with options for node reindexing and explicit hydrogen expansion.
- Parameters:
graph_rules (tuple[nx.Graph, nx.Graph, nx.Graph]) – Tuple containing (L, R, K) reaction graphs.
rule_name (str) – The rule name to use in the output.
reindex (bool) – Whether to reindex node IDs based on the L graph sequence.
attributes (list[str]) – List of attribute names to check for node changes.
explicit_hydrogen (bool) – Whether to explicitly include hydrogen atoms in the output.
- Returns:
The GML string representing the chemical rule.
- Return type:
str
Data and debug#
- synkit.IO.data_io.collect_data(num_batches: int, temp_dir: str, file_template: str) List[Any][source]#
Collects and aggregates data from multiple pickle files into a single list.
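- Parameters:
num_batches (int) – Number of batch files to collect.
temp_dir (str) – Directory containing the batch files.
file_template (str) – File name template for the batch files.
- Returns:
A list of aggregated data items from all batch files.
- Return type:
List[Any]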
- synkit.IO.data_io.load_compressed(filename: str) ndarray[source]#
Loads a NumPy array from a compressed .npz file.
- synkit.IO.data_io.load_database(pathname: str = './Data/database.json') List[Dict][source]#
Load a database (a list of dictionaries) from a JSON file.
- Parameters:
pathname (str) – The path from where the database will be loaded. Defaults to ‘./Data/database.json’.
- Returns:
The loaded database.
- Return type:
List[Dict]
- Raises:
ValueError – If there is an error reading the file.
- synkit.IO.data_io.load_dg(path: str, graph_db: list, rule_db: list)[source]#
Load a DG instance from a dumped file.
- Parameters:
path (str) – Path to the dumped file.
graph_db (list)
rule_db (list)
- Returns:
The loaded derivation graph instance.
- Return type:
DG
- Raises:
Exception – If loading fails.
- synkit.IO.data_io.load_dict_from_json(file_path: str) dict | None[source]#
Load a dictionary from a JSON file.
- synkit.IO.data_io.load_from_pickle_generator(file_path: str) Generator[Any, None, None][source]#
A generator that yields items from a pickle file where each pickle load returns a list of dictionaries.
- Parameters:
file_path (str) – The path to the pickle file to load.
- Yields:
A single item from the list of dictionaries stored in the pickle file.
- Return type:
Any
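The "each pickle load returns a list" behaviour suggests a file holding several pickled batches written one after another. The sketch below works under that assumption; iter_pickled_items and append_batch are hypothetical names, not synkit functions.

```python
import os
import pickle
import tempfile
from typing import Any, Dict, Generator, List


def append_batch(batch: List[Dict[str, Any]], file_path: str) -> None:
    """Append one pickled list of dictionaries to the file."""
    with open(file_path, "ab") as handle:
        pickle.dump(batch, handle)


def iter_pickled_items(file_path: str) -> Generator[Any, None, None]:
    """Yield items one at a time from every pickled batch in the file."""
    with open(file_path, "rb") as handle:
        while True:
            try:
                batch = pickle.load(handle)  # one list per load
            except EOFError:
                return  # no more batches in the file
            yield from batch


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "batches.pkl")
    append_batch([{"id": 1}, {"id": 2}], path)
    append_batch([{"id": 3}], path)
    items = list(iter_pickled_items(path))

print(items)  # [{'id': 1}, {'id': 2}, {'id': 3}]
```

Streaming items this way avoids holding every batch in memory at once, which is the usual reason to prefer a generator over a single load.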
- synkit.IO.data_io.load_gml_as_text(gml_file_path: str) str | None[source]#
Load the contents of a GML file as a text string.
- synkit.IO.data_io.load_list_from_file(file_path: str) list[source]#
Load a list from a JSON-formatted file.
- synkit.IO.data_io.load_model(filename: str) Any[source]#
Load a machine learning model from a file using joblib.
- synkit.IO.data_io.save_compressed(array: ndarray, filename: str) None[source]#
Saves a NumPy array in a compressed format using .npz extension.
- Parameters:
array (numpy.ndarray) – The NumPy array to be saved.
filename (str) – The file path or name to save the array to, with a ‘.npz’ extension.
- synkit.IO.data_io.save_database(database: List[Dict], pathname: str = './Data/database.json') None[source]#
Save a database (a list of dictionaries) to a JSON file.
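- Parameters:
database (List[Dict]) – The database to save.
pathname (str) – The path where the database will be saved. Defaults to './Data/database.json'.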
- Raises:
TypeError – If the database is not a list of dictionaries.
ValueError – If there is an error writing the file.
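The documented error contract of save_database and load_database (TypeError for a malformed database, ValueError for file errors) can be sketched with the standard library. The function names below are hypothetical stand-ins, and which underlying exceptions the library wraps is an assumption.

```python
import json
import os
import tempfile
from typing import Dict, List


def save_database_sketch(database: List[Dict], pathname: str) -> None:
    """Write a list of dictionaries to a JSON file (illustrative sketch)."""
    if not isinstance(database, list) or not all(
        isinstance(record, dict) for record in database
    ):
        raise TypeError("database must be a list of dictionaries")
    try:
        with open(pathname, "w", encoding="utf-8") as handle:
            json.dump(database, handle, indent=2)
    except OSError as exc:
        raise ValueError(f"error writing {pathname}: {exc}") from exc


def load_database_sketch(pathname: str) -> List[Dict]:
    """Read a list of dictionaries back from a JSON file."""
    try:
        with open(pathname, "r", encoding="utf-8") as handle:
            return json.load(handle)
    except (OSError, json.JSONDecodeError) as exc:
        raise ValueError(f"error reading {pathname}: {exc}") from exc


records = [{"id": 1, "rsmi": "[CH3:1][OH:2]>>[CH2:1]=[O:2]"}]
with tempfile.TemporaryDirectory() as tmp:
    db_path = os.path.join(tmp, "database.json")
    save_database_sketch(records, db_path)
    loaded = load_database_sketch(db_path)

print(loaded == records)  # True
```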
- synkit.IO.data_io.save_dg(dg, path: str) str[source]#
Save a DG instance to disk using MØD’s dump method.
- synkit.IO.data_io.save_dict_to_json(data: dict, file_path: str) None[source]#
Save a dictionary to a JSON file.
- synkit.IO.data_io.save_list_to_file(data_list: list, file_path: str) None[source]#
Save a list to a file in JSON format.
- synkit.IO.data_io.save_model(model: Any, filename: str) None[source]#
Save a machine learning model to a file using joblib.
- synkit.IO.data_io.save_text_as_gml(gml_text: str, file_path: str) bool[source]#
Save a GML text string to a file.
- synkit.IO.data_io.save_to_pickle(data: List[Dict[str, Any]], filename: str) None[source]#
Save a list of dictionaries to a pickle file.
- synkit.IO.debug.configure_warnings_and_logs(ignore_warnings: bool = False, disable_rdkit_logs: bool = False) None[source]#
Configures Python warnings and RDKit log behavior based on input flags.
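- Parameters:
ignore_warnings (bool) – If True, ignore Python warnings. Defaults to False.
disable_rdkit_logs (bool) – If True, disable RDKit logs. Defaults to False.
- Returns:
None
- Usage:
Use this function to control verbosity (e.g. in production or testing), but use it with caution during development to avoid missing critical issues.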
- synkit.IO.debug.setup_logging(log_level: str = 'INFO', log_filename: str = None, task_type: str = None) Logger[source]#
Configures logging to either the console or a file, based on provided parameters.
- Parameters:
log_level (str) – Logging level to set. Defaults to ‘INFO’. Options: ‘DEBUG’, ‘INFO’, ‘WARNING’, ‘ERROR’, ‘CRITICAL’.
log_filename (str or None) – If provided, logs are written to this file. Defaults to None (logs to console).
task_type (str or None) – Logger name/namespace. Useful for distinguishing loggers in multi-task settings. Defaults to None.
- Returns:
Configured logger instance.
- Return type:
logging.Logger
- Raises:
ValueError – If an invalid log level is provided.
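A minimal sketch of the console-or-file behaviour described for setup_logging, using only the standard logging module. The name setup_logging_sketch, the case-insensitive level handling, the log format string, and the removal of existing handlers are assumptions rather than documented library behaviour.

```python
import logging
from typing import Optional

_VALID_LEVELS = ("DEBUG", "INFO", "WARNING", "ERROR", "CRITICAL")


def setup_logging_sketch(
    log_level: str = "INFO",
    log_filename: Optional[str] = None,
    task_type: Optional[str] = None,
) -> logging.Logger:
    """Return a logger that writes to a file if given, else to the console."""
    level = log_level.upper()
    if level not in _VALID_LEVELS:
        raise ValueError(f"invalid log level: {log_level!r}")

    logger = logging.getLogger(task_type)  # None selects the root logger
    logger.setLevel(getattr(logging, level))

    # Assumed: drop old handlers so repeated calls do not duplicate output.
    for old_handler in list(logger.handlers):
        logger.removeHandler(old_handler)

    handler: logging.Handler
    if log_filename:
        handler = logging.FileHandler(log_filename)
    else:
        handler = logging.StreamHandler()
    handler.setFormatter(
        logging.Formatter("%(asctime)s - %(name)s - %(levelname)s - %(message)s")
    )
    logger.addHandler(handler)
    return logger


logger = setup_logging_sketch("INFO", task_type="synkit_demo")
logger.info("logging configured")
```

Passing a distinct task_type per task keeps loggers separate, which matches the multi-task use described for that parameter.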