Synthesis#

Synthesis engines, multi-step search, benchmarking, and ranking utilities.

Reactor#

class synkit.Synthesis.Reactor.batch_reactor.BatchReactor(data: List[str | Dict[str, Any]], host_key: str | None = None, *, react_engine: str = 'syn', pre_filter_engine: str | None = None, explicit_h: bool = True, implicit_temp: bool = False, strategy: str = 'bt', dedupe: bool = True, entry_n_jobs: int = 1, rule_n_jobs: int = 1, parallel_rules: bool = False, allow_nested: bool = False, cache_enabled: bool = True, cache_maxsize: int = 32768, logger: Logger | None = None, enable_logging: bool = True)[source]#

Bases: object

Parallel, cache-aware batch application of reaction rules to SMILES substrates.

Parameters:
  • data (list of str or dict) – List of SMILES strings or dicts containing SMILES under host_key.

  • host_key (str or None) – Key to extract SMILES from dict entries (optional).

  • react_engine (str) – Reactor engine: ‘syn’ or ‘mod’.

  • pre_filter_engine (str or None) – Pre-filtering engine for rules (None to skip).

  • explicit_h (bool) – Use explicit hydrogens in SynReactor.

  • implicit_temp (bool) – Use implicit templates in SynReactor.

  • strategy (str) – Matching strategy for SynReactor.

  • dedupe (bool) – Deduplicate results per-substrate.

  • entry_n_jobs (int) – Number of parallel jobs for substrates.

  • rule_n_jobs (int) – Number of parallel jobs for rules per substrate.

  • parallel_rules (bool) – Enable parallelism over rules.

  • allow_nested (bool) – Allow nested parallelism.

  • cache_enabled (bool) – Enable in-process per-rule caching.

  • cache_maxsize (int) – Max entries in per-process cache.

  • logger (logging.Logger or None) – Optional custom logger.

Raises:

ValueError – If react_engine is invalid or SMILES/rule conversion fails.

describe() str[source]#

Return a configuration summary.

Returns:

Human-readable settings overview.

Return type:

str

fit(rules: Iterable[Any], *, invert: bool = False) List[Dict[str, Any]][source]#

Apply reaction rules to each substrate in the batch.

Parameters:
  • rules (iterable) – Iterable of rule graphs or SMILES strings.

  • invert (bool) – Whether to apply rules in reverse direction.

Returns:

A list of dicts, each with:

  • “out”: list of product SMARTS/reaction SMILES

  • “count”: number of outputs

Return type:

list of dict
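
The shape of one result entry can be illustrated with a small stand-in that does not call SynKit. The helper name summarize_outputs and the placeholder reaction strings are hypothetical, and it is an assumption that “count” is taken after deduplication.

```python
from typing import Any, Dict, List


def summarize_outputs(outputs: List[str], dedupe: bool = True) -> Dict[str, Any]:
    """Hypothetical helper: build one per-substrate result entry."""
    if dedupe:
        # dict.fromkeys keeps the first occurrence of each string, in order.
        outputs = list(dict.fromkeys(outputs))
    return {"out": outputs, "count": len(outputs)}


# Placeholder outputs for a single substrate (not produced by SynKit here).
raw = ["CCO>>CC=O", "CCO>>CC=O", "CCO>>C=C.O"]
entry = summarize_outputs(raw)
print(entry)
```

With dedupe=False the duplicate string is kept, so the count would be 3 instead of 2.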

help() str[source]#

Return usage examples and API description.

Returns:

Multi-line help text.

Return type:

str

class synkit.Synthesis.Reactor.benchmark.Benchmark(data: List[Dict[str, Any]], reaction_key: str = 'reactions', *, react_engine: str = 'syn', pre_filter_engine: str | None = None, explicit_h: bool = True, implicit_temp: bool = False, strategy: str = 'bt', dedupe: bool = True, entry_n_jobs: int = 1, rule_n_jobs: int = 1, parallel_rules: bool = False, allow_nested: bool = False, cache_enabled: bool = True, cache_maxsize: int = 32768, logger: Logger | None = None, enable_logging: bool = True)[source]#

Bases: BatchReactor

Extension of BatchReactor to benchmark forward/backward application on reaction-SMILES entries.

Parameters:
  • data (list of dict) – List of dicts containing reaction SMILES under reaction_key.

  • reaction_key (str) – Key for reaction-SMILES strings (format ‘reactants>>products’).

  • react_engine (str) – Reactor engine: ‘syn’ or ‘mod’.

  • pre_filter_engine (str or None) – Pre-filtering engine for rules (None to skip).

  • explicit_h (bool) – Use explicit hydrogens in SynReactor.

  • implicit_temp (bool) – Use implicit templates in SynReactor.

  • strategy (str) – Matching strategy for SynReactor.

  • dedupe (bool) – Deduplicate results per substrate.

  • entry_n_jobs (int) – Parallel jobs for substrates.

  • rule_n_jobs (int) – Parallel jobs for rules per substrate.

  • parallel_rules (bool) – Enable rule-level parallelism.

  • allow_nested (bool) – Allow nested parallelism.

  • cache_enabled (bool) – Enable per-process caching.

  • cache_maxsize (int) – Max cache entries before eviction.

  • logger (logging.Logger or None) – Optional custom logger.

Raises:

ValueError – If a reaction_key entry is malformed or a SMILES string is invalid.

describe() str[source]#

Return detailed configuration for Benchmark, including reaction_key.

Returns:

Multi-line summary.

Return type:

str

fit(rules: Iterable[Any]) List[Dict[str, Any]][source]#

Run a forward pass (invert=False) on the reactant side ‘r’ and a backward pass (invert=True) on the product side ‘p’ of each reaction.

Parameters:
  • rules (iterable) – Iterable of rule graphs or SMILES.

  • reaction_key (str) – Key for reaction-SMILES. Not part of the fit() signature shown above; reaction_key is set in the constructor.

Returns:

List of dicts, each with keys ‘fw’, ‘bw’, ‘fw_count’, and ‘bw_count’.

Return type:

list of dict
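
As a rough illustration of the input and output shapes, the sketch below splits a ‘reactants>>products’ string and assembles an entry with the four documented keys. It does not call SynKit; split_reaction and benchmark_entry are hypothetical names, and the fw/bw lists are placeholders rather than real predictions.

```python
from typing import Any, Dict, List, Tuple


def split_reaction(rsmi: str) -> Tuple[str, str]:
    """Split 'reactants>>products' into its two sides."""
    if rsmi.count(">>") != 1:
        raise ValueError(f"Malformed reaction SMILES: {rsmi!r}")
    reactants, products = rsmi.split(">>")
    return reactants, products


def benchmark_entry(fw: List[str], bw: List[str]) -> Dict[str, Any]:
    """Hypothetical helper: assemble one result entry with the documented keys."""
    return {"fw": fw, "bw": bw, "fw_count": len(fw), "bw_count": len(bw)}


r, p = split_reaction("CC(=O)O.OCC>>CC(=O)OCC.O")
# Placeholder results: forward outputs come from r, backward outputs from p.
entry = benchmark_entry(fw=[f"{r}>>{p}"], bw=[])
print(r, p, entry["fw_count"], entry["bw_count"])
```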

class synkit.Synthesis.Reactor.imba_engine.ImbaEngine(substrate: str | Graph | SynGraph, template: str | Graph | SynRule, add_wildcard: bool = True, clean_fragments: bool = False, max_frag: bool = False, invert: bool = False, canonicaliser: GraphCanonicaliser | None = None, strategy: Strategy | str = Strategy.ALL, partial: bool = False, embed_threshold: float = None, embed_pre_filter: bool = False, electron_diagnostics: bool = False)[source]#

Bases: object

Reactor that applies a SynKit reaction template to a substrate, with options for inversion, canonicalisation, matching strategy, and partial ITS construction, plus radical-wildcard appending and fragment cleaning of the products.

Parameters:
  • substrate (Union[str, nx.Graph, SynGraph]) – Input substrate; SMILES string, networkx.Graph, or SynGraph.

  • template (Union[str, nx.Graph, SynRule]) – Reaction template; SMARTS (bracketed) string, networkx.Graph, or SynRule.

  • add_wildcard (bool) – If True, apply radical wildcard transform to each product SMARTS.

  • clean_fragments (bool) – If True, remove wildcard fragments and optionally keep max fragment.

  • max_frag (bool) – If True, force maximal fragment selection when cleaning.

  • invert (bool) – If True, apply the template in reverse (product → reactant).

  • canonicaliser (Optional[GraphCanonicaliser]) – Optional GraphCanonicaliser for preprocessing or postprocessing.

  • strategy (Union[Strategy, str]) – Enumeration strategy (Strategy enum or string).

  • partial (bool) – If True, perform partial ITS graph construction on results.

static describe() None[source]#

Print class documentation and usage examples.

property diagnostics: list[dict]#

Electron diagnostics from the last underlying reactor run.

fit() ImbaEngine[source]#

Apply the reaction template to the substrate, producing product SMARTS. Optionally clean wildcard fragments and add radical wildcards. Results are stored internally and self is returned.

Returns:

self

Return type:

ImbaEngine

Raises:

ValueError – If substrate cannot be parsed or reaction fails.

property smarts_list: List[str]#

Product SMARTS results from the last fit() invocation.

Returns:

List of SMARTS strings.

Return type:

List[str]

to_list() List[str][source]#

Return all product SMARTS as a list.

Returns:

List of SMARTS strings.

Return type:

List[str]

class synkit.Synthesis.Reactor.mod_aam.MODAAM(substrate: str | List[str], rule_file: str | Path, *, invert: bool = False, strategy: str | Strategy = Strategy.BACKTRACK, verbosity: int = 0, print_results: bool = False, check_isomorphic: bool = True)[source]#

Bases: object

Runs MØD (via MODReactor) then a full AAM/ITS post-processing pipeline.

Parameters#

substrate : Union[str, List[str]]

Dot-delimited SMILES or list of SMILES for reactants.

rule_file : Union[str, Path]

GML rule file path or raw GML/SMARTS string.

invert : bool, optional

If True, apply the rule in reverse (default False).

strategy : Union[str, Strategy], optional

Matching strategy: ALL, COMPONENT, or BACKTRACK (default BACKTRACK).

verbosity : int, optional

Verbosity for MODReactor (default 0).

print_results : bool, optional

If True, print the derivation graph (default False).

check_isomorphic : bool, optional

If True, deduplicate results by isomorphism (default True).

property dg: Any#

The MØD derivation graph (DG).

get_reaction_smiles() List[str][source]#

Alias for accessing the processed reaction SMILES.

get_smarts() List[str][source]#

Synonym for .get_reaction_smiles().

help() None[source]#

Print a summary of inputs and outputs.

property product_count: int#

Number of product SMILES generated.

property reaction_smiles: List[str]#

The post-processed reaction SMILES.

run() List[str][source]#

Re-run the entire pipeline (MØD + AAM) and return fresh results.

synkit.Synthesis.Reactor.mod_aam.expand_aam(rsmi: str, rule: str) List[str][source]#

Expand Atom–Atom Mapping (AAM) for a given reaction SMARTS/SMILES (rsmi) using a pre‐sanitized GML rule string.

Parameters#

rsmi : str

Reaction SMILES/SMARTS in ‘reactants>>products’ form.

rule : str

A GML rule string (already sanitized upstream).

Returns#

List[str]

All reaction SMILES from MODAAM whose standardized form matches rsmi.

class synkit.Synthesis.Reactor.mod_reactor.MODReactor(substrate: str | List[str], rule_file: str | Path, *, invert: bool = False, strategy: str | Strategy = Strategy.BACKTRACK, verbosity: int = 0, print_results: bool = False)[source]#

Bases: object

Lazy, ergonomic wrapper around the MØD toolkit’s derivation pipeline.

Workflow#

  1. Instantiate: give substrate SMILES and a rule GML (path or string).

  2. Call .run() to execute the reaction strategy.

  3. Inspect results via .get_reaction_smiles(), .product_sets, .get_dg(), etc.

Attributes#

initial_smiles : List[str]

List of SMILES strings for reactants (or products, if inverted).

rule_file : Path

Filesystem path, raw GML string, or raw SMARTS with AAM for the reaction rule.

invert : bool

If True, apply the rule in reverse (products → reactants).

strategy : Strategy

One of ALL, COMPONENT, or BACKTRACK.

verbosity : int

Verbosity level for the MØD DG.apply() call.

print_results : bool

If True, prints the derivation graph to stdout.

property dg: None#

DG or None – cached derivation graph.

See also#

get_dg

static generate_reaction_smiles(temp_results: List[List[str]], base_smiles: str, *, invert: bool = False, arrow: str = '>>', separator: str = '.') List[str][source]#

Build reaction SMILES of the form “A>>B”, where A and B swap roles if invert=True.

Parameters#

temp_results : List[List[str]]

Batches of product (or reactant) SMILES.

base_smiles : str

The “other side” of the reaction: the reactant side when invert=False, or the product side when invert=True.

invert : bool

If False, generates “base_smiles>>joined_batch”; if True, generates “joined_batch>>base_smiles”.

arrow : str

The reaction arrow to use (default “>>”).

separator : str

How to join multiple SMILES in a batch (default “.”).

Returns#

List[str]

Reaction SMILES strings, one per batch.
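
The joining rule described above can be sketched in plain Python. This is a stand-in written from the parameter descriptions, not the library source; the function name build_reaction_smiles is hypothetical.

```python
from typing import List


def build_reaction_smiles(
    temp_results: List[List[str]],
    base_smiles: str,
    *,
    invert: bool = False,
    arrow: str = ">>",
    separator: str = ".",
) -> List[str]:
    """Hypothetical stand-in: one reaction string per batch."""
    reactions = []
    for batch in temp_results:
        joined = separator.join(batch)
        if invert:
            # Inverted: the generated batch becomes the left-hand side.
            reactions.append(f"{joined}{arrow}{base_smiles}")
        else:
            reactions.append(f"{base_smiles}{arrow}{joined}")
    return reactions


batches = [["CC=O", "O"], ["C=C"]]
forward = build_reaction_smiles(batches, "CCO")
backward = build_reaction_smiles(batches, "CCO", invert=True)
print(forward)
print(backward)
```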

get_dg() None[source]#

Access the underlying derivation graph.

Returns#

DG

The MØD derivation graph constructed during .run().

Raises#

RuntimeError

If .run() has not yet been called.

get_reaction_smiles() List[str][source]#

Retrieve the reaction SMILES strings (lazy).

Returns#

List[str]

List of reaction SMILES, in “A>>B” format.

help() None[source]#

Print a one-page summary of reactor configuration and results.

property prediction_count: int#

Number of distinct prediction batches generated.

property product_sets: List[List[str]]#

Raw product sets (lists of SMILES) before joining into full reactions.

property product_smiles: List[str]#

Flattened list of all product SMILES (may contain duplicates).

property reaction_smiles: List[str]#

Lazy-loaded reaction SMILES strings of form “A>>B”.

Returns#

List[str]

run() MODReactor[source]#

Execute the chosen strategy once and return self so you can chain:

r = MODReactor(...).run()
smiles = r.get_reaction_smiles()

property temp_results: List[List[str]]#

Lazy-loaded raw product lists.

Returns#

List[List[str]]

class synkit.Synthesis.Reactor.partial_engine.PartialEngine(smi: str, template: str, electron_diagnostics: bool = False)[source]#

Bases: object

Partial Reaction Learning Engine that applies a single‐direction (forward or backward) template transformation, injects radical wildcards, and returns a list of intermediate ITS strings.

Parameters:
  • smi (str) – A reaction SMARTS (rsmi) string in the form “Reactants>>Products” or a simple SMILES string when used for one‐sided synthesis.

  • template (str) – A reaction template SMARTS string, which may include explicit H.

property diagnostics: list[dict]#

Electron diagnostics from the last reactor run.

fit(invert: bool = False) list[str][source]#

Apply the template in one direction to generate radical‐wildcarded reaction SMARTS (ITS).

  • Instantiates a SynReactor on the host graph and ITS.

  • Sets partial, implicit‐template, and explicit‐H flags.

  • If invert=True, runs the backward direction; otherwise forward.

  • Post‐processes each reaction SMARTS with RadicalWildcardAdder.

Parameters:

invert (bool) – If True, apply the template in the reverse direction (Products→Reactants). Default is False (forward direction).

Returns:

A list of ITS‐encoded reaction SMARTS strings, each augmented with radical wildcard notation.

Return type:

list[str]

class synkit.Synthesis.Reactor.post_syn.PostSyn(n_jobs: int = 1, verbose: int = 2, standardizer: Standardize | None = None, reaction_key: str = 'reactions', fw_key: str = 'fw', bw_key: str = 'bw')[source]#

Bases: object

Post-processing helper for reaction data: standardize reactions and clean AAM strings, with optional parallelism, progress reporting, and filtering of incomplete reaction SMILES inside fw/bw lists. Input keys for reaction, fw, and bw are configurable.

clean_aam(list_aam: Iterable[str], remove_radical: bool = True) List[str][source]#

Remove atom-atom mappings, optionally clean radicals, deduplicate while preserving order.
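
A simplified picture of the mapping removal and order-preserving deduplication is given below. It is only a regex-based sketch and an assumption about the general idea: the real method may rely on a cheminformatics toolkit, radical cleaning is not shown, and the name strip_maps_and_dedupe is hypothetical.

```python
import re
from typing import Iterable, List

# Matches an atom-map suffix such as ":12" directly before a closing bracket.
_MAP_NUMBER = re.compile(r":\d+(?=\])")


def strip_maps_and_dedupe(list_aam: Iterable[str]) -> List[str]:
    """Hypothetical helper: drop map numbers, then dedupe keeping first-seen order."""
    stripped = (_MAP_NUMBER.sub("", s) for s in list_aam)
    return list(dict.fromkeys(stripped))


mapped = [
    "[CH3:1][OH:2]>>[CH3:1][Cl:3]",
    "[CH3:2][OH:1]>>[CH3:2][Cl:3]",  # same reaction, different map numbers
    "[CH3:1][Br:2]>>[CH3:1][Cl:3]",
]
cleaned = strip_maps_and_dedupe(mapped)
print(cleaned)
```

The atoms stay bracketed in this sketch; producing plain canonical SMILES would need a proper parser.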

process(data: Iterable[Dict[str, Any]], *, progress: bool = False, prefilter: Callable[[Dict[str, Any]], bool] | None = None, filter_incomplete_rxn: bool = True) List[Dict[str, Any]][source]#

Process reaction entries.

Parameters:
  • data – iterable of dicts.

  • progress – show progress bar if True.

  • prefilter – predicate to pre-filter entries.

  • filter_incomplete_rxn – if True, drop incomplete SMILES inside fw/bw.

Returns:

processed list with standardized reaction and cleaned fw/bw under their original keys.

class synkit.Synthesis.Reactor.rbl_engine.RBLEngine(*, wildcard_element: str = '*', element_key: str = 'element', node_attrs: Sequence[str] | None = None, edge_attrs: Sequence[str] | None = None, prune_wc: bool = True, prune_automorphisms: bool = True, mcs_side: str = 'l', early_stop: bool = True, fast_paths_only: bool = False, max_mappings_per_pair: int = 1, implicit_temp: bool = True, explicit_h: bool = False, electron_diagnostics: bool = False, embed_threshold: int = 10000, reactor_cls: type = <class 'synkit.Synthesis.Reactor.syn_reactor.SynReactor'>, wildcard_adder_cls: type = <class 'synkit.Chem.Reaction.radical_wildcard.RadicalWildcardAdder'>, matcher_cls: type = <class 'synkit.Graph.Matcher.mcs_matcher.MCSMatcher'>, fuse_fn: Callable[[Any, Any, Dict[Any, Any]], Any] = <function fuse_its_graphs>, remove_explicit_H_fn: Callable[[str], str] = <function remove_explicit_H_from_rsmi>, rsmi_to_its_fn: Callable[[...], Any] = <function rsmi_to_its>, its_to_rsmi_fn: Callable[[Any], str] = <function its_to_rsmi>, h_to_implicit_fn: Callable[[Any], Any] = <function h_to_implicit>, standardize_h_fn: Callable[[Any], Any] = <function standardize_hydrogen>, standardize_fn: Callable[[str], str] | None = <bound method Standardize.fit of <synkit.Chem.Reaction.standardize.Standardize object>>, logger: Logger | None = None)[source]#

Bases: object

Radical-based linking (RBL) engine for bidirectional template application and ITS-graph fusion using wildcard-based subgraph matching.

Overview#

The RBL engine turns a reaction template (RSMI or ITS graph) into a set of fused reaction graphs that link forward and backward template applications through a wildcard-aware core. The workflow is:

  1. Template preparation: Convert a template (RSMI or ITS graph) into a standardized ITS representation with normalized hydrogen handling.

  2. Forward / backward application: Use SynReactor to apply the template to a substrate (reactants or products) in forward or inverted mode, convert to RSMI, decorate with radical wildcards, and convert back to ITS.

  3. Wildcard-based fusion: For each forward/backward ITS pair, run a matcher (MCSMatcher or ApproxMCSMatcher) to detect a core overlap (ignoring wildcard regions) and fuse the graphs via fuse_its_graphs(). The fused ITS graphs are then converted back to post-processed RSMI strings.

Matching back-ends: exact vs. approximate#

The engine delegates ITS matching to matcher_cls, which is assumed to be API-compatible with MCSMatcher:

  • MCSMatcher (default)
    • Exact MCS search.
  • ApproxMCSMatcher
    • Heuristic / greedy approximate MCS search.

    • Uses seed selection and local greedy growth instead of exhaustive enumeration.

    • Much faster on large graphs but only approximate – mappings are usually close to optimal in practice but not guaranteed to be globally maximal.

Any custom matcher can be plugged in as long as it implements the MCSMatcher public API:

  • __init__(node_attrs, node_defaults, edge_attrs, prune_wc, ...)

  • find_rc_mapping()

  • get_mappings()
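
One way to picture the plug-in contract is a structural type plus a toy matcher. The sketch below is an assumption-heavy illustration: the argument lists and return values of find_rc_mapping() and get_mappings() are not specified here, so they are left open or guessed, and MatcherLike and IdentityMatcher are hypothetical names.

```python
from typing import Any, Dict, Iterable, List, Protocol, runtime_checkable


@runtime_checkable
class MatcherLike(Protocol):
    """Hypothetical structural type; real argument lists may differ."""

    def find_rc_mapping(self, *args: Any, **kwargs: Any) -> Any: ...

    def get_mappings(self) -> List[Dict[Any, Any]]: ...


class IdentityMatcher:
    """Toy matcher: maps node ids shared by both inputs onto themselves."""

    def __init__(self, node_attrs=None, node_defaults=None,
                 edge_attrs=None, prune_wc=True, **kwargs):
        self.node_attrs = node_attrs
        self.edge_attrs = edge_attrs
        self.prune_wc = prune_wc
        self._mappings: List[Dict[Any, Any]] = []

    def find_rc_mapping(self, first: Iterable[Any], second: Iterable[Any], **kwargs):
        second_ids = set(second)
        shared = [n for n in first if n in second_ids]
        self._mappings = [{n: n for n in shared}] if shared else []
        return self  # returning self is an assumption made for chaining

    def get_mappings(self) -> List[Dict[Any, Any]]:
        return list(self._mappings)


matcher = IdentityMatcher(node_attrs=["element"], edge_attrs=["order"])
print(isinstance(matcher, MatcherLike))
print(matcher.find_rc_mapping([1, 2, 3], [2, 3, 4]).get_mappings())
```

The isinstance check only verifies that the two methods exist; it does not validate their signatures.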

Early-stop semantics#

The engine exposes two orthogonal control flags: early_stop and fast_paths_only.

  • If early_stop is True:

    • A cheap quick-check is attempted first via _quick_check().

    • If that fails, the engine looks for ITS graphs without any wildcard atoms in the forward and backward sets and post-processes them directly via _early_stop_on_nonwildcard(), without any MCS/fusion.

      For each such candidate, a canonical reactant/product check is performed to ensure consistency with the original reaction:

      • forward candidates must preserve the original main product component;

      • backward candidates must preserve the original main reactant component.

    • Only if both these cheap paths fail, fusion and post-processing are run in a streaming loop: mappings are fused and post-processed one by one, and the pipeline stops after the first successful fused RSMI.

  • If early_stop is False, the same loop runs without early exit, collecting all fused ITS and fused RSMIs.

Fast-path-only mode#

  • If fast_paths_only is True (or process() is called with fast_paths_only=True):

    • The engine never enters the expensive MCS/fusion stage (_fuse_and_postprocess() is skipped).

    • It only attempts:

      1. _quick_check()

      2. _early_stop_on_nonwildcard()

    • If neither path yields a solution, the engine returns with empty fused_its / fused_rsmis, result['mode'] == "fast_paths_only", and result['reason'] == "fast_paths_no_solution".

    • The flag early_stop is ignored for the fusion stage in this mode, but still controls behaviour when fast_paths_only=False.
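
The interaction of the two flags can be sketched as plain control flow. This is a model of the behaviour described above, not the engine's code: run_stages and its callable arguments are hypothetical stand-ins for the private stages, and the mode labels attached to the quick-check and non-wildcard paths are assumptions (only the fast-path mode and reason strings are documented).

```python
from typing import Callable, Dict, Iterable, List, Optional


def run_stages(
    quick_check: Callable[[], Optional[str]],
    nonwildcard: Callable[[], List[str]],
    fused_candidates: Iterable[Optional[str]],
    *,
    early_stop: bool = True,
    fast_paths_only: bool = False,
) -> Dict[str, object]:
    """Hypothetical model of the stage ordering, using plain callables."""
    if early_stop or fast_paths_only:
        hit = quick_check()
        if hit is not None:
            return {"mode": "quick_check", "fused_rsmis": [hit]}
        direct = nonwildcard()
        if direct:
            return {"mode": "early_stop", "fused_rsmis": direct}
    if fast_paths_only:
        # Documented outcome when neither cheap path succeeds.
        return {"mode": "fast_paths_only",
                "reason": "fast_paths_no_solution",
                "fused_rsmis": []}
    fused: List[str] = []
    for candidate in fused_candidates:
        if candidate is None:  # fusion or post-processing failed for this mapping
            continue
        fused.append(candidate)
        if early_stop:
            break  # streaming early exit after the first success
    return {"mode": "full_pipeline", "fused_rsmis": fused}


no_hit = lambda: None
stream = [None, "A.B>>C", "A.B>>D"]
first_only = run_stages(no_hit, list, stream, early_stop=True)
collect_all = run_stages(no_hit, list, stream, early_stop=False)
fast = run_stages(no_hit, list, stream, fast_paths_only=True)
print(first_only["fused_rsmis"], collect_all["fused_rsmis"], fast["reason"])
```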

Reactor / hydrogen control#

The underlying SynReactor is configured via three flags that are exposed on the engine:

  • implicit_temp – forwarded to SynReactor(..., implicit_temp=...).

  • explicit_h – forwarded to SynReactor(..., explicit_h=...).

  • embed_threshold – forwarded to SynReactor(..., embed_threshold=...).

This gives fine-grained external control over how templates are embedded and how hydrogens are handled during the reaction stage.

Parameters#

  • wildcard_element (str, optional) – Element symbol used to denote wildcard atoms (default "*", as in the wildcard framework).

  • element_key (str, optional) – Node attribute key that stores the element symbol (default "element").

  • node_attrs (Sequence[str] or None, optional) – Node attributes used by the matcher when comparing nodes. Defaults to ["element", "aromatic", "charge"].

  • edge_attrs (Sequence[str] or None, optional) – Edge attributes used by the matcher when comparing bonds. Defaults to ["order"].

  • prune_wc (bool, optional) – If True, ask the matcher to prune wildcard nodes from both graphs before matching (when supported by the matcher class).

  • prune_automorphisms (bool, optional) – If True, ask the matcher (for example MCSMatcher or ApproxMCSMatcher) to prune automorphism-equivalent mappings, typically collapsing mappings that cover the same host-node set.

  • mcs_side (str, optional) – Side of the reaction centres to match when using MCSMatcher.find_rc_mapping(). Typical values are "l", "r" or "op".

  • early_stop (bool, optional) – If True, activate the multi-stage pruning described above and enable streaming early-stop inside the fusion loop.

  • fast_paths_only (bool, optional) – If True, only fast paths (quick-check and non-wildcard ITS early-stop) are used. The expensive fusion stage is skipped entirely, even if early_stop is True. This can be overridden per-call in process().

  • max_mappings_per_pair (int, optional) – Hard cap on the number of mappings to consider for each (forward ITS, backward ITS) pair. Default is 1.

  • implicit_temp (bool, optional) – Flag forwarded to SynReactor (implicit_temp argument). Controls whether the template is treated as implicit.

  • explicit_h (bool, optional) – Flag forwarded to SynReactor (explicit_h argument). Controls whether explicit hydrogens are kept during reaction application.

  • embed_threshold (int, optional) – Hard cap forwarded to SynReactor (embed_threshold argument), typically controlling the maximum number of embeddings before the reactor aborts.

  • reactor_cls (type, optional) – Class used to instantiate the reactor. Must be compatible with SynReactor and expose an its attribute and (optionally) smarts.

  • wildcard_adder_cls (type, optional) – Class used to decorate reactions with radical wildcards. Defaults to RadicalWildcardAdder.

  • matcher_cls (type[MCSMatcher] or type[ApproxMCSMatcher], optional) – Class used for ITS matching. By default this is MCSMatcher (exact MCS). It can be replaced by ApproxMCSMatcher for a greedy, approximate search that is much faster but not guaranteed to be globally optimal.

  • fuse_fn (Callable[[ITSLike, ITSLike, Dict[Any, Any]], ITSLike], optional) – Function used to fuse ITS graphs based on a core mapping. Defaults to fuse_its_graphs().

  • remove_explicit_H_fn (Callable[[str], str], optional) – Function that removes explicit hydrogens from a reaction SMILES. Defaults to synkit.Chem.utils.remove_explicit_H_from_rsmi().

  • rsmi_to_its_fn (Callable[..., ITSLike], optional) – Function to convert RSMI to ITS; defaults to synkit.IO.rsmi_to_its().

  • its_to_rsmi_fn (Callable[[ITSLike], str], optional) – Function to convert ITS to RSMI; defaults to synkit.IO.its_to_rsmi().

  • h_to_implicit_fn (Callable[[ITSLike], ITSLike], optional) – Function to convert explicit hydrogens to implicit in an ITS or graph; defaults to synkit.Graph.Hyrogen._misc.h_to_implicit().

  • standardize_h_fn (Callable[[ITSLike], ITSLike], optional) – Function to perform final hydrogen standardization; defaults to synkit.Graph.Hyrogen._misc.standardize_hydrogen().

  • standardize_fn (Callable[[str], str] or None, optional) – Function used by the quick-check and verification for reaction canonicalization. It should take a reaction string and return a canonicalized reaction string. Typical usage is Standardize().fit. Defaults to a simple identity standardizer that strips whitespace.

  • logger (logging.Logger or None, optional) – Logger for debug information. If None, a module-level logger is created.

Examples#

Exact MCS back-end#

Use the default MCSMatcher for exact MCS fusion:

from synkit.Synthesis.Reactor.rbl_engine import RBLEngine

rxn = "CCO.CBr>>CCOBr"
template = "CBr>>C[*]"  # toy example

engine = RBLEngine(
    early_stop=True,
    fast_paths_only=False,
    implicit_temp=True,
    explicit_h=False,
    embed_threshold=5000,
)

engine = engine.process(rxn, template)
print(engine.result["mode"])
print(engine.fused_rsmis)

Approximate MCS back-end#

Swap in ApproxMCSMatcher to accelerate matching on large graphs while retaining the same RBL API:

from synkit.Graph.Matcher.approx_mcs import ApproxMCSMatcher
from synkit.Synthesis.Reactor.rbl_engine import RBLEngine

rxn = "CC1=CC=CC=C1.OBr>>CC1=CC=CC=C1OBr"
template = "OBr>>O[*]"

engine = RBLEngine(
    matcher_cls=ApproxMCSMatcher,   # use heuristic MCS
    early_stop=False,               # collect all fused hits
    fast_paths_only=False,
)

engine = engine.process(rxn, template)
for fused in engine.fused_rsmis:
    print(fused)

property backward_its: List[Any]#

ITS graphs obtained from the last backward (invert) application.

Returns:

List of backward ITS graphs.

Return type:

list[ITSLike]

property diagnostics: Dict[str, List[Dict[str, Any]]]#

Electron diagnostics grouped by reactor stage.

property forward_its: List[Any]#

ITS graphs obtained from the last forward application.

Returns:

List of forward ITS graphs.

Return type:

list[ITSLike]

property fused_its: List[Any]#

Fused ITS graphs obtained after wildcard-based core matching.

Returns:

List of fused ITS graphs.

Return type:

list[ITSLike]

property fused_rsmis: List[str]#

Post-processed reaction SMILES derived from the fused ITS graphs.

Returns:

List of fused reaction SMILES.

Return type:

list[str]

help() str[source]#

Return a short textual description of the current engine state.

Useful for quick inspection in interactive sessions.

Returns:

Multi-line human-readable summary string.

Return type:

str

property last_reaction: str | None#

Last processed reaction RSMI string.

Returns:

Reaction SMILES or None if process() was not run.

Return type:

Optional[str]

prepare_template(template: str | Graph | Any) RBLEngine[source]#

Prepare a reaction template into a standardized ITS representation.

Parameters:

template (str | nx.Graph | ITSLike) – Template as reaction SMILES, graph or ITS-like.

Returns:

The current engine instance (for chaining).

Return type:

RBLEngine

process(rsmi: str, template: str | Graph | Any, *, replace_wc: bool = True, fast_paths_only: bool | None = None) RBLEngine[source]#

Run the full RBL pipeline on a reaction RSMI and a template.

  1. Split the reaction into reactants/products via '>>'.

  2. Optionally attempt a quick-check (_quick_check()) if early-stop or fast-paths-only logic is active. On success, store the solution as the sole entry in fused_rsmis.

  3. Prepare the template via prepare_template().

  4. Run forward and backward template application via react().

  5. Optionally attempt _early_stop_on_nonwildcard() to exploit ITS graphs that contain no wildcard atoms at all, with canonical reactant/product verification.

  6. If fast-path-only logic is active and no solution was found in steps 2–5, return without running fusion.

  7. Otherwise, run _fuse_and_postprocess() with streaming early-stop behaviour controlled by early_stop.

When fast_paths_only (argument or attribute) is True, only steps 1–6 are executed and the expensive fusion stage is skipped entirely.

Parameters:
  • rsmi (str) – Input reaction SMILES.

  • template (str | nx.Graph | ITSLike) – Template as reaction SMILES, graph or ITS-like.

  • replace_wc (bool) – If True, replace wildcard atoms by hydrogen during final post-processing.

  • fast_paths_only (bool or None) – Optional per-call override of the engine-level fast_paths_only flag. If None, the attribute value is used.

Returns:

The current engine instance.

Return type:

RBLEngine

Raises:

ValueError – If the reaction string does not contain '>>' or if template preparation fails.

react(substrate: str | Any, pattern: Any | None = None, invert: bool = False) RBLEngine[source]#

Public wrapper around _run_reaction() that updates engine state.

If pattern is None, the last prepared template (template_its) is used.

Results are stored in forward_its (for invert=False) or backward_its (for invert=True).

Parameters:
  • substrate (str | ITSLike) – Substrate reaction string or ITS-like object.

  • pattern (ITSLike or None) – Optional template ITS; if None, use template_its.

  • invert (bool) – If True, store results as backward ITS.

Returns:

The current engine instance.

Return type:

RBLEngine

Raises:

ValueError – If no template pattern was provided or prepared.

replace_wildcard_with_H(G: Graph) Graph[source]#

Replace wildcard atoms in an ITS graph with hydrogen.

This updates node-level attributes:

  • node[element_key]

  • typesGH (if present, element field only)

  • neighbors lists (string-based)

Edge structure and other attributes are not touched.

Parameters:

G (nx.Graph) – ITS graph to modify in-place.

Returns:

The same graph instance, for convenience.

Return type:

nx.Graph
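
The in-place update can be pictured with a plain dictionary standing in for the graph's node store. This sketch is an illustration only: it does not use networkx, it skips the typesGH field because its layout is not described here, and the function name replace_wildcard_with_h is hypothetical.

```python
from typing import Any, Dict, Hashable

NodeStore = Dict[Hashable, Dict[str, Any]]


def replace_wildcard_with_h(
    nodes: NodeStore, wildcard: str = "*", element_key: str = "element"
) -> NodeStore:
    """Hypothetical stand-in: rewrite wildcard labels to hydrogen, in place."""
    for attrs in nodes.values():
        if attrs.get(element_key) == wildcard:
            attrs[element_key] = "H"
        if "neighbors" in attrs:
            # Neighbour lists hold element strings, so they are patched too.
            attrs["neighbors"] = [
                "H" if n == wildcard else n for n in attrs["neighbors"]
            ]
    return nodes


nodes = {
    1: {"element": "C", "neighbors": ["*", "O"]},
    2: {"element": "*", "neighbors": ["C"]},
    3: {"element": "O", "neighbors": ["C"]},
}
same = replace_wildcard_with_h(nodes)
print(same is nodes, nodes[2]["element"], nodes[1]["neighbors"])
```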

property result: Dict[str, Any]#

Summary of the result from the last process() call.

The dictionary contains:

  • "fused_rsmis": list of final fused reaction strings.

  • "mode": high-level termination mode (e.g. "quick_check", "early_stop", "full_pipeline", "fast_paths_only").

  • "reason": short explanation of how/why the pipeline finished.

  • "metadata": small auxiliary dictionary with extra details.

  • "n_forward_its": number of forward ITS graphs.

  • "n_backward_its": number of backward ITS graphs.

  • "n_fused_its": number of fused ITS graphs.

Returns:

Summary dictionary with fused SMILES and termination info.

Return type:

dict[str, Any]

property template_its: Any | None#

Standardized ITS representation of the last prepared template.

Returns:

Template ITS or None if not prepared.

Return type:

Optional[ITSLike]

class synkit.Synthesis.Reactor.rule_filter.RuleFilter(host_graph: Graph, rules_list: List[Any], invert: bool = False, engine: str = 'turbo', node_label: str | List[str] = ['element', 'charge'], edge_label: str | List[str] = 'order', distance_threshold: int = 5000, sing_max_path: int = 3)[source]#

Bases: object

Filter a list of transformation rules (patterns) against a host graph, keeping only those rules whose (decomposed) pattern appears as a subgraph in the host.

Parameters:
  • host_graph (nx.Graph) – The host graph to search within (will be converted to explicit H).

  • rules_list (list) – A list of rule objects to filter against.

  • invert (bool) – If True, use the “modifier” component of each decomposition; otherwise use the normal part.

  • engine (str) – Matching engine to use: “turbo”, “sing”, “nx”, or “mod”.

  • node_label (str or list) – Node attribute(s) for TurboISO to match on.

  • edge_label (str or list) – Edge attribute(s) for TurboISO to match on.

  • distance_threshold (int) – Threshold to skip distance filtering in TurboISO.

  • sing_max_path (int) – Maximum path length for SING engine.

Returns:

An instance with only the rules that matched.

Return type:

RuleFilter

property engine: str#

Matching engine in use.

Returns:

The name of the engine.

Return type:

str

property host: Graph#

The explicit host graph.

Returns:

The host graph used for matching.

Return type:

nx.Graph

property matches: List[bool]#

Boolean list indicating which patterns were found.

Returns:

List of booleans aligned with patterns.

Return type:

list of bool

property new_rules: List[Any]#

Subset of rules for which matches[i] is True.

Returns:

Filtered list of matching rules.

Return type:

list

property patterns: List[Graph]#

Decomposed subgraph queries used internally.

Returns:

List of ITS-decomposed query graphs.

Return type:

list of nx.Graph

property rules: List[Any]#

Original list of rules provided.

Returns:

The list of rules.

Return type:

list
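The index alignment between rules, matches, and new_rules can be sketched in plain Python. The rule names and boolean values below are hypothetical placeholders, not output from RuleFilter; the sketch only illustrates the relationship the properties describe.

```python
# Hypothetical stand-ins: real rules would be rule objects, and the
# booleans would come from subgraph matching against the host graph.
rules = ["rule_a", "rule_b", "rule_c"]
matches = [True, False, True]  # matches[i] refers to rules[i]

# Keep rules[i] only when matches[i] is True.
new_rules = [rule for rule, hit in zip(rules, matches) if hit]
print(new_rules)
```

Because the two lists are aligned by position, the original index of a surviving rule can be recovered from matches if needed.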

class synkit.Synthesis.Reactor.single_predictor.SinglePredictor[source]#

Bases: object

A class designed for one-step chemical reaction predictions using transformation rules.

This class utilizes transformation rules to predict the outcomes of chemical reactions based on provided SMILES strings.

class synkit.Synthesis.Reactor.strategy.Strategy(value)[source]#

Bases: str, Enum

Strategy for sub-graph matching/application:

  • ALL: classic VF2 on the whole graph

  • COMPONENT: component-aware only (no cross-CC backtracking)

  • BACKTRACK: component-aware with backtracking across CCs

  • PARTIAL: partial matching (maximum common subgraph, MCS)

ALL = 'all'#
BACKTRACK = 'bt'#
COMPONENT = 'comp'#
PARTIAL = 'partial'#
classmethod from_string(value: str | Strategy) Strategy[source]#

Convert a string or Strategy to a Strategy enum.

Parameters:
  • value (str or Strategy) – The strategy to parse.

Returns:

Parsed Strategy.

Return type:

Strategy

Raises:

ValueError – If the input is not a valid Strategy.
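The documented members and the from_string contract can be illustrated with a small standalone enum. This is a minimal re-creation under stated assumptions, not the library's source: the member values are taken from the listing above, but the lookup logic and the error message are guesses, and whether the real method normalises case or whitespace is not stated here.

```python
from enum import Enum
from typing import Union


class Strategy(str, Enum):
    """Illustrative re-creation of the documented members (not the library source)."""

    ALL = "all"
    COMPONENT = "comp"
    BACKTRACK = "bt"
    PARTIAL = "partial"

    @classmethod
    def from_string(cls, value: Union[str, "Strategy"]) -> "Strategy":
        # Existing members pass through unchanged.
        if isinstance(value, cls):
            return value
        try:
            # Enum lookup by value, e.g. "bt" -> Strategy.BACKTRACK.
            return cls(value)
        except ValueError:
            valid = ", ".join(repr(member.value) for member in cls)
            raise ValueError(
                f"Unknown strategy {value!r}; expected one of {valid}"
            ) from None


parsed = Strategy.from_string("bt")
same = Strategy.from_string(Strategy.ALL)
```

Because the enum also derives from str, a member compares equal to its string value, which is why string and enum arguments can be used interchangeably in the signatures on this page.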

class synkit.Synthesis.Reactor.syn_reactor.SynReactor(substrate: str | Graph | SynGraph, template: str | Graph | SynRule, invert: bool = False, canonicaliser: GraphCanonicaliser | None = None, explicit_h: bool = True, implicit_temp: bool = False, strategy: Strategy | str = Strategy.ALL, partial: bool = False, template_format: Literal['typesGH', 'tuple'] = 'typesGH', electron_diagnostics: bool = False, embed_threshold: int | None = None, embed_pre_filter: bool = False, automorphism: bool = True)[source]#

Bases: object

A hardened and typed re-write of the original SynReactor, preserving API compatibility while offering safer, faster, and cleaner behavior.

Parameters:
  • substrate (Union[str, nx.Graph, SynGraph]) – The input reaction substrate, as a SMILES string, a raw NetworkX graph, or a SynGraph.

  • template (Union[str, nx.Graph, SynRule]) – Reaction template, provided as SMILES/SMARTS, a raw NetworkX graph, or a SynRule.

  • invert (bool) – Whether to invert the reaction (predict precursors). Defaults to False.

  • canonicaliser (Optional[GraphCanonicaliser]) – Optional canonicaliser for intermediate graphs. If None, a default GraphCanonicaliser is used.

  • explicit_h (bool) – If True, render all hydrogens explicitly in the reaction-center SMARTS. Defaults to True.

  • implicit_temp (bool) – If True, treat the input template as implicit-H (forces explicit_h=False). Defaults to False.

  • strategy (Strategy or str) – Matching strategy, one of Strategy.ALL, ‘comp’, or ‘bt’. Defaults to Strategy.ALL.

  • partial (bool) – If True, use a partial matching fallback. Defaults to False.

  • template_format (ITSFormat) – ITS representation used when template is a reaction string. Defaults to "typesGH" for compatibility.

  • electron_diagnostics (bool) – If True, expose per-result electron-accounting diagnostics without changing generated products.

Variables:
  • _graph (Optional[SynGraph]) – Cached SynGraph for the substrate.

  • _rule (Optional[SynRule]) – Cached SynRule for the template.

  • _mappings (Optional[List[MappingDict]]) – Cached list of subgraph-mapping dicts.

  • _its (Optional[List[nx.Graph]]) – Cached list of ITS graphs.

  • _smarts (Optional[List[str]]) – Cached list of SMARTS strings.

  • _flag_pattern_has_explicit_H (bool) – Internal flag indicating explicit-H constraints.
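The cached variables above suggest a compute-once, reuse-afterwards pattern. The sketch below shows that pattern in isolation; LazyReactorSketch, build_count, and the dictionary standing in for a graph are all hypothetical and do not reflect how SynReactor is actually implemented.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class LazyReactorSketch:
    """Hypothetical stand-in showing lazy caching; not SynReactor itself."""

    substrate: str
    # Cache slot, filled on first access to `graph`.
    _graph: Optional[dict] = field(default=None, init=False, repr=False)
    # Counts how often the placeholder conversion runs.
    build_count: int = field(default=0, init=False)

    @property
    def graph(self) -> dict:
        if self._graph is None:
            # Placeholder for an expensive SMILES-to-graph conversion.
            self.build_count += 1
            self._graph = {"source": self.substrate}
        return self._graph


reactor = LazyReactorSketch("CCO")
first = reactor.graph   # triggers the conversion
second = reactor.graph  # served from the cache
```

With this pattern, constructing the object is cheap and the conversion cost is paid only if the property is read.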

automorphism: bool = True#
canonicaliser: GraphCanonicaliser | None = None#
property diagnostics: List[Dict[str, Any]]#

Return optional electron-accounting diagnostics for built ITS graphs.

electron_diagnostics: bool = False#
embed_pre_filter: bool = False#
embed_threshold: int | None = None#
explicit_h: bool = True#
classmethod from_smiles(smiles: str, template: str | Graph | SynRule, *, invert: bool = False, canonicaliser: GraphCanonicaliser | None = None, explicit_h: bool = True, implicit_temp: bool = False, automorphism: bool = False, strategy: Strategy | str = Strategy.ALL, template_format: Literal['typesGH', 'tuple'] = 'typesGH', electron_diagnostics: bool = False) SynReactor[source]#

Alternate constructor: build a SynReactor directly from SMILES.

Parameters:
  • smiles (str) – SMILES string for the substrate.

  • template (str or networkx.Graph or SynRule) – Reaction template (SMILES/SMARTS string, Graph, or SynRule).

  • invert (bool) – If True, perform backward prediction (target→precursors). Defaults to False (forward prediction).

  • canonicaliser (GraphCanonicaliser or None) – Optional GraphCanonicaliser to use for internal graphs.

  • explicit_h (bool) – If True, keep explicit hydrogens in the reaction center.

  • implicit_temp (bool) – If True, treat the template as implicit-H (forces explicit_h=False).

  • strategy (Strategy or str) – Matching strategy: ALL, ‘comp’, or ‘bt’. Defaults to ALL.

  • template_format (ITSFormat) – ITS representation used when template is a reaction string. Defaults to "typesGH".

  • electron_diagnostics (bool) – If True, expose per-result electron diagnostics without changing products.

Returns:

A new SynReactor instance.

Return type:

SynReactor

property graph: SynGraph#

Lazily wrap the substrate into a SynGraph.

Returns:

The reaction substrate as a SynGraph.

Return type:

SynGraph

help(print_results=False) None[source]#
implicit_temp: bool = False#
invert: bool = False#
property its#
property its_list: List[Graph]#

Build ITS graphs for each subgraph mapping.

Returns:

A list of ITS (Imaginary Transition State) graphs.

Return type:

list of networkx.Graph

property mapping_count#

Number of subgraph mappings.

property mappings: List[Dict[Any, Any]]#

Return unique subgraph mappings, optionally pruned via automorphisms.

partial: bool = False#
property rule: SynRule#

Lazily wrap the template into a SynRule.

Returns:

The reaction template as a SynRule.

Return type:

SynRule

property smarts#
property smarts_list: List[str]#

Serialise each ITS graph to a reaction-SMARTS string.

Returns:

A list of SMARTS strings (inverted if invert=True).

Return type:

list of str

property smiles_list#
strategy: Strategy | str = 'all'#
substrate: str | Graph | SynGraph#
property substrate_smiles#
template: str | Graph | SynRule#
template_format: Literal['typesGH', 'tuple'] = 'typesGH'#

Multi-step search#

class synkit.Synthesis.MSR.multi_steps.MultiSteps[source]#

Bases: object

multi_step(original_rsmi: str, list_rule: List[str], order: List[int], cat: str | List[str]) List[str][source]#

Orchestrate a multi-step chemical reaction process using a set of rules and a starting reactant.

Parameters:
  • original_rsmi (str) – Initial reactant SMILES string.

  • list_rule (List[str]) – List of GML rules for the reactions.

  • order (List[int]) – Order of application of the GML rules.

  • cat (Union[str, List[str]]) – Catalysts or additional reagents to add, given as a single string or a list of strings.

Returns:

List of reaction SMILES strings with atom-atom mapping applied after all steps.

Return type:

List[str]

class synkit.Synthesis.MSR.path_finder.PathFinder(reaction_rounds: List[Dict[str, List[str]]])[source]#

Bases: object

search_paths(input_smiles: str, target_smiles: str, method: str = 'bfs', max_solutions: int | None = None, cheapest: bool = True) List[List[str]][source]#

Search for reaction pathways from the input molecule to the target molecule using a specified method, optionally limiting the number of solutions.

The cheapest flag controls pruning:
  • If cheapest=True, BFS uses a visited set and A* prunes costlier routes (the typical approach).

  • If cheapest=False, BFS does not track visited states and A* does not prune costlier routes, so both return more solutions. This may produce duplicates or a very large number of solutions if cycles exist.

Parameters:
  • input_smiles (str) – SMILES of the starting molecule.

  • target_smiles (str) – SMILES of the target molecule.

  • method (str, optional) – ‘bfs’, ‘astar’, or ‘mc’.

  • iterations (int, optional) – Number of MC iterations (if method=‘mc’).

  • max_solutions (int, optional) – If set, stop after finding this many solutions.

  • cheapest (bool, optional) – Controls pruning. The default, True, gives standard BFS/A*; False gives “unrestricted” BFS/A*.

Returns:

Each solution path is a list of reaction SMILES from start to target.

Return type:

List[List[str]]
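The effect of a visited set on breadth-first search can be seen on a toy network. The sketch below is one possible reading of the cheapest flag, not PathFinder's code: NETWORK, bfs_paths, and max_depth are hypothetical, the adjacency format is not the reaction_rounds format, and the labels are placeholders rather than real reaction SMILES.

```python
from collections import deque
from typing import Dict, List, Optional, Tuple

# Hypothetical toy network: molecule -> [(reaction label, product molecule)].
NETWORK: Dict[str, List[Tuple[str, str]]] = {
    "A": [("A>>B", "B"), ("A>>C", "C")],
    "B": [("B>>D", "D")],
    "C": [("C>>B", "B"), ("C>>D", "D")],
}


def bfs_paths(
    start: str,
    target: str,
    cheapest: bool = True,
    max_solutions: Optional[int] = None,
    max_depth: int = 4,
) -> List[List[str]]:
    """Breadth-first search returning lists of reaction labels from start to target."""
    solutions: List[List[str]] = []
    visited = {start}
    queue = deque([(start, [])])
    while queue:
        molecule, path = queue.popleft()
        if molecule == target:
            solutions.append(path)
            if max_solutions is not None and len(solutions) >= max_solutions:
                break
            continue
        if len(path) >= max_depth:
            # Depth cap keeps the unrestricted mode finite when cycles exist.
            continue
        for reaction, product in NETWORK.get(molecule, []):
            if cheapest:
                # Pruned mode: each molecule is enqueued at most once.
                if product in visited:
                    continue
                visited.add(product)
            queue.append((product, path + [reaction]))
    return solutions


pruned = bfs_paths("A", "D", cheapest=True)
unrestricted = bfs_paths("A", "D", cheapest=False)
```

In this toy case the pruned search returns only the first shortest route, while the unrestricted search also returns the alternative two-step route and the longer route through C and B.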

Metrics#

synkit.Synthesis.Metrics._plot.plot_f2_scores_line(data, figsize=(8, 6), show_f2=True, show_legend=True)[source]#

Plots F2 scores across different radii as a line plot to show how the F2 score changes, optionally annotating each point with its F2 score.

Parameters:
  • data (dict) – Dictionary containing nested dictionaries with ‘F2_score’ and possibly other metrics.

  • figsize (tuple) – Figure size for the plot. Defaults to (8, 6).

  • show_f2 (bool) – Whether to show F2 scores on the curve. Defaults to True.

  • show_legend (bool) – Whether to show the legend on the plot. Defaults to True.

Example data format: {‘radii_0’: {‘Novelty’: 96.44, ‘Coverage’: 93.98, ‘Recognition’: 3.55, ‘F2_score’: 0.15}, …}
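This page does not say how F2_score is computed. As a plausibility check only, the sketch below applies the standard F-beta formula with beta = 2, under the assumption that Recognition plays the role of precision and Coverage the role of recall, both given as percentages. The helper f_beta is hypothetical and is not part of synkit.

```python
def f_beta(precision: float, recall: float, beta: float = 2.0) -> float:
    """Standard F-beta score for precision and recall given as fractions in [0, 1]."""
    if precision == 0.0 and recall == 0.0:
        return 0.0
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


# Values copied from the example data format above.
entry = {"Novelty": 96.44, "Coverage": 93.98, "Recognition": 3.55, "F2_score": 0.15}

# ASSUMPTION: Recognition ~ precision and Coverage ~ recall, in percent.
f2 = f_beta(entry["Recognition"] / 100, entry["Coverage"] / 100)
print(round(f2, 2))
```

Under that assumption the rounded result agrees with the F2_score in the example entry, but this agreement does not confirm the library's actual definition.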

synkit.Synthesis.Metrics._plot.plot_recognition_coverage_curve(data, coverage_col='Coverage', recognition_col='Recognition', f2_score_col='F2_score', figsize=(8, 6), show_f2=True, show_legend=True)[source]#

Plots a Recognition-Coverage curve from the provided data, optionally annotated with F2 scores. The plot is styled with Seaborn.

Parameters:
  • data (dict) – Nested dictionary containing the data for each radius, formatted as shown in the example.

  • coverage_col (str) – Key name for the coverage data in the dictionary.

  • recognition_col (str) – Key name for the recognition data in the dictionary.

  • f2_score_col (str) – Key name for the F2 score data in the dictionary.

  • figsize (tuple) – Figure size for the plot. Defaults to (8, 6).

  • show_f2 (bool) – Whether to show F2 scores on the curve. Defaults to True.

  • show_legend (bool) – Whether to show the legend on the plot. Defaults to True.

Example data format: {‘radii_0’: {‘Novelty’: 96.44, ‘Coverage’: 93.98, ‘Recognition’: 3.55, …}}
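The column-name parameters select which keys of each nested dictionary supply the curve coordinates. The sketch below extracts (coverage, recognition) pairs from such a dictionary without plotting anything. The helper curve_points is hypothetical, ordering by the numeric suffix of the key is an assumption about how radii should be sorted, and only the radii_0 entry comes from the example; the other entries are invented placeholders.

```python
from typing import Dict, List, Tuple

# "radii_0" is copied from the example data format; "radii_1" and "radii_2"
# are invented placeholders so that the sketch has more than one point.
data: Dict[str, Dict[str, float]] = {
    "radii_2": {"Coverage": 70.0, "Recognition": 20.0},
    "radii_0": {"Novelty": 96.44, "Coverage": 93.98, "Recognition": 3.55},
    "radii_1": {"Coverage": 85.0, "Recognition": 10.0},
}


def curve_points(
    data: Dict[str, Dict[str, float]],
    coverage_col: str = "Coverage",
    recognition_col: str = "Recognition",
) -> List[Tuple[float, float]]:
    """Return (coverage, recognition) pairs ordered by the radius in each key."""

    def radius(key: str) -> int:
        # "radii_3" -> 3
        return int(key.rsplit("_", 1)[1])

    ordered_keys = sorted(data, key=radius)
    return [(data[k][coverage_col], data[k][recognition_col]) for k in ordered_keys]


points = curve_points(data)
```

Passing different coverage_col or recognition_col values lets the same extraction work on dictionaries whose metrics are stored under other key names.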

Utilities#