API Documentation

mr_toolkit.coarse_graining.msm_coarse_graining

Main module with code for coarse-graining transition matrices and computing bin-weights.

mr_toolkit.reweighting.splicing

Code for splicing equilibrium trajectories into nonequilibrium steady-state trajectories.

mr_toolkit.reweighting.analysis

Code for reweighting MSMs and obtaining reweighted estimates of steady-state.

mr_toolkit.clustering.StratifiedClusters(...)

Class for performing stratified k-means clustering.

MSM Coarse-graining

Coarse-graining

Main module with code for coarse-graining transition matrices and computing bin-weights.

mr_toolkit.coarse_graining.msm_coarse_graining.coarse_grain(P, cg_map, w, lag=1, normalize=True)[source]

Coarse-grains a fine-grained transition matrix according to some mapping of microstates to macrostates and weights over the microstates.

This is done according to

\[\eqncg\]
Parameters
  • P (np.ndarray) – Fine-grained transition matrix.

  • cg_map (list of lists) – List of all microstates in each macrostate.

  • w (array-like) – Microbin weights \(\wi\).

  • lag (int) – Lag for Markov model \(\lag\).

  • normalize (bool) – Normalize the resulting matrix over the weights. This should be off when building an occupancy matrix over many lags, because there the normalization is over all \(\wi\).

Returns

p_matrix – Coarse-grained transition matrix \(\textbf{T}\).

Return type

np.ndarray
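
The omitted equation is not recoverable here, so the following is only a minimal sketch of weighted coarse-graining. It assumes the common form in which each macrostate-to-macrostate element is the weight-averaged sum of the lagged microstate transition probabilities. The name coarse_grain_sketch and the exact formula are assumptions for illustration, not the library's implementation.

```python
import numpy as np

def coarse_grain_sketch(P, cg_map, w, lag=1, normalize=True):
    """Weighted coarse-graining of P (hypothetical helper, not the library code)."""
    P_lag = np.linalg.matrix_power(np.asarray(P, dtype=float), lag)
    w = np.asarray(w, dtype=float)
    n_macro = len(cg_map)
    T = np.zeros((n_macro, n_macro))
    for I, micro_I in enumerate(cg_map):
        for J, micro_J in enumerate(cg_map):
            # Weighted probability flow from the microstates in I to those in J.
            T[I, J] = w[micro_I] @ P_lag[np.ix_(micro_I, micro_J)].sum(axis=1)
        if normalize:
            # Assumed normalization: divide by the total weight of macrostate I.
            T[I] /= w[micro_I].sum()
    return T

# Three microstates, with the last two merged into one macrostate.
P = np.array([[0.50, 0.50, 0.00],
              [0.25, 0.50, 0.25],
              [0.00, 0.50, 0.50]])
w = np.array([0.25, 0.50, 0.25])
T = coarse_grain_sketch(P, [[0], [1, 2]], w)
```

With normalize=True, each row of the result sums to 1, so it can be used directly as a transition matrix.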


Examples

To coarse-grain a 6x6 transition matrix P into a 4x4 by grouping the inner pairs of states (1+2 and 3+4) and leaving the edge states unchanged, one could do

>>> coarse_grain(P, [[0], [1,2], [3,4], [5]], w)

mr_toolkit.coarse_graining.msm_coarse_graining.compute_avg_bin_weights(initial_weights, transition_matrix, max_s, lag=1, min_s=0, leave=False)[source]

Obtain the time-averaged bin weights for a lag of 1, described by

\[\eqnwi\]
Parameters
  • initial_weights (array-like) – List or array of initial microbin-weights.

  • transition_matrix (np.ndarray) – (n_states x n_states) Transition matrix.

  • max_s (int) – Maximum trajectory length \(S\)

  • lag (int) – Lag used for Markov model \(\lag\).

  • min_s (int) – Earliest trajectory point to use in sliding window calculation. Defaults to 0.

  • leave (bool) – Leave TQDM progress bar on completion

Returns

wi_bar – List of time-averaged weights for each bin

Return type

np.ndarray (n_states)


Estimating observables


mr_toolkit.coarse_graining.msm_coarse_graining.get_comm(transition_matrix, statesA, statesB)[source]

Computes the committor for a given transition matrix, source states, and target states.

Parameters
  • transition_matrix (np.ndarray) – Transition matrix.

  • statesA (array-like) – Source/origin state(s).

  • statesB (array-like) – Target state(s).

Returns

committors – Array of committors to statesB for each bin.

Return type

np.ndarray

Raises

AssertionError – The solved committor distribution does not obey first-step stationarity.
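
As an illustration of what a committor calculation involves, here is a minimal sketch that fixes the committor to 0 on the source states and 1 on the target states, then solves the first-step equations for the remaining states. The name committor_sketch is hypothetical, and this is a standard linear-algebra formulation rather than necessarily what get_comm does internally.

```python
import numpy as np

def committor_sketch(T, states_a, states_b):
    """Committor to states_b (hypothetical helper, not the library code)."""
    T = np.asarray(T, dtype=float)
    n = T.shape[0]
    boundary = set(states_a) | set(states_b)
    interior = [i for i in range(n) if i not in boundary]
    q = np.zeros(n)
    q[list(states_b)] = 1.0  # committor is 1 in the target, 0 in the source
    # Interior states satisfy q_i = sum_j T_ij q_j, rearranged to a linear system.
    A = np.eye(len(interior)) - T[np.ix_(interior, interior)]
    b = T[np.ix_(interior, list(states_b))].sum(axis=1)
    q[interior] = np.linalg.solve(A, b)
    # Check the first-step relation on the interior states.
    assert np.allclose(q[interior], (T @ q)[interior])
    return q

T = np.array([[0.50, 0.50, 0.00],
              [0.25, 0.50, 0.25],
              [0.00, 0.50, 0.50]])
q = committor_sketch(T, [0], [2])
```

In this symmetric three-state chain, the middle state is equally likely to reach either end first.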


mr_toolkit.coarse_graining.msm_coarse_graining.get_equil(transition_matrix, normalize=True, _round=15)[source]

Computes the equilibrium distribution for an input transition matrix by taking the left-eigenvector of transition_matrix with an eigenvalue of 1.

Parameters
  • transition_matrix (np.ndarray) – The transition matrix.

  • normalize (bool) –

  • _round (int, optional, default 15) – Number of decimal places of precision to keep in the equilibrium distribution.

Returns

equil – Equilibrium distribution for transition_matrix.

Return type

np.ndarray
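
The left-eigenvector approach described above can be sketched with NumPy as follows. The name equil_sketch is hypothetical, and the handling of normalize and _round shown here (sum-to-one scaling and rounding of the result) is an assumption about their meaning.

```python
import numpy as np

def equil_sketch(T, normalize=True, _round=15):
    """Stationary distribution from the left eigenvector (hypothetical helper)."""
    T = np.asarray(T, dtype=float)
    # Left eigenvectors of T are right eigenvectors of its transpose.
    eigvals, eigvecs = np.linalg.eig(T.T)
    idx = np.argmin(np.abs(eigvals - 1.0))  # eigenvalue closest to 1
    equil = np.real(eigvecs[:, idx])
    if normalize:
        equil = equil / equil.sum()  # scale so the entries sum to 1
    return np.round(equil, _round)

T = np.array([[0.9, 0.1],
              [0.2, 0.8]])
equil = equil_sketch(T)
```

For this two-state matrix, detailed balance gives weights in a 2:1 ratio, which the eigenvector reproduces.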


mr_toolkit.coarse_graining.msm_coarse_graining.get_hill_mfpt(ss_dist, T, target_mesostates)[source]

Compute the MFPT via the Hill relation.

From the Hill relation, the MFPT is the inverse flux into the target state, or

\[\mathrm{MFPT} = \frac{1}{\mathrm{Flux}(\text{into target})}\]
Parameters
  • ss_dist (array-like) – Stationary distribution.

  • T (array-like) – Transition matrix. (What BCs?)

  • target_mesostates (array-like) – Target states for MFPT calculation.

Returns

MFPT – First-passage time estimate.

Return type

float
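
A minimal sketch of the inverse-flux calculation, assuming the flux into the target is the stationary probability outside the target multiplied by the one-step probability of entering it. The name hill_mfpt_sketch and the example matrix (a three-state chain whose target state recycles to state 0) are assumptions for illustration.

```python
import numpy as np

def hill_mfpt_sketch(ss_dist, T, target_states):
    """Inverse steady-state flux into the target (hypothetical helper)."""
    ss_dist = np.asarray(ss_dist, dtype=float)
    T = np.asarray(T, dtype=float)
    target = np.asarray(target_states, dtype=int)
    outside = np.setdiff1d(np.arange(len(ss_dist)), target)
    # Probability per step of jumping from outside the target into it.
    flux = ss_dist[outside] @ T[np.ix_(outside, target)].sum(axis=1)
    return 1.0 / flux

# State 2 is the target and recycles to state 0 with probability 1.
T = np.array([[0.9, 0.1, 0.0],
              [0.1, 0.8, 0.1],
              [1.0, 0.0, 0.0]])
ss_dist = np.array([20.0, 10.0, 1.0]) / 31.0  # stationary distribution of T
mfpt = hill_mfpt_sketch(ss_dist, T, [2])
```

The result is in units of the transition matrix's time step, and it depends on how the stationary distribution was normalized.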

mr_toolkit.coarse_graining.msm_coarse_graining.get_naive_hill_mfpt(T_ss, ss_dist, target_mesostates, all_other_states)[source]

Intended as an explicit, un-optimized version of get_hill_mfpt, used to check that the linear algebra in that function is correct.

Constructing matrices


mr_toolkit.coarse_graining.msm_coarse_graining.build_fine_transition_matrix(height_ratio, num_bins)[source]

Generate a Markov transition matrix where each bin is height_ratio more likely to transition to itself than to its neighbor.

Parameters
  • height_ratio (float) – Ratio of the transition probability to self vs to neighbor bin. This is a proxy for the inter-bin barrier height.

  • num_bins (int) – Number of bins in the transition matrix.

Returns

t_matrix – A (num_bins x num_bins) tri-diagonal, row-normalized transition matrix.

Return type

np.ndarray
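
One way to build such a matrix is sketched below: put height_ratio on the diagonal, 1 on the two neighboring off-diagonals, and row-normalize. The name build_tridiagonal_sketch is hypothetical, and the treatment of the edge bins (which have only one neighbor) is an assumption.

```python
import numpy as np

def build_tridiagonal_sketch(height_ratio, num_bins):
    """Tri-diagonal, row-normalized transition matrix (hypothetical helper)."""
    T = np.zeros((num_bins, num_bins))
    idx = np.arange(num_bins)
    T[idx, idx] = height_ratio          # self-transitions
    T[idx[:-1], idx[:-1] + 1] = 1.0     # transitions to the right neighbor
    T[idx[1:], idx[1:] - 1] = 1.0       # transitions to the left neighbor
    return T / T.sum(axis=1, keepdims=True)

t_matrix = build_tridiagonal_sketch(5.0, 4)
```

After row normalization, the ratio of each self-transition probability to a neighbor-transition probability in the same row is still height_ratio.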


mr_toolkit.coarse_graining.msm_coarse_graining.build_occupancy(fg_matrix, initial_weights, cg_map, s, time_horizon)[source]

Builds the occupancy matrix as

\[\eqnbuildocc\]
Parameters
  • fg_matrix (np.ndarray) – The fine-grained matrix \(\Tfg\).

  • initial_weights (np.ndarray or list) – Vector of initial weights \(\wi\).

  • cg_map (list of lists) – List of all microstates in each macrostate.

  • s (int) – Maximum trajectory length \(S\).

  • time_horizon (int) – Time horizon \(TH\).

Returns

occ – The occupancy matrix, computed as above.

Return type

np.ndarray


Trajectory Splicing

Code for splicing equilibrium trajectories into nonequilibrium steady-state trajectories.

mr_toolkit.reweighting.splicing.get_receiving_distribution(tmatrix, stationary, source_states)[source]

Estimates the “receiving distribution” for a given transition matrix.

The receiving distribution is the boundary distribution corresponding to where trajectories go one step after leaving the source states.

Recycling into the receiving distribution produces a nonequilibrium steady-state.

Parameters
  • tmatrix (array-like) – Transition matrix

  • stationary (array-like) – Stationary distribution of the transition matrix

  • source_states (array-like of int) – Set of source states

Return type

Receiving distribution
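
One reading of the description above is that the receiving distribution is the normalized one-step probability flow from the source states into every state outside the source. The sketch below implements that reading. The name receiving_distribution_sketch and the formula are assumptions, not the library's implementation.

```python
import numpy as np

def receiving_distribution_sketch(tmatrix, stationary, source_states):
    """Normalized one-step flux out of the source (hypothetical helper)."""
    T = np.asarray(tmatrix, dtype=float)
    pi = np.asarray(stationary, dtype=float)
    source = np.asarray(source_states, dtype=int)
    # Probability arriving in each state after one step from the source.
    flux = pi[source] @ T[source, :]
    flux[source] = 0.0  # only count trajectories that actually left the source
    return flux / flux.sum()

# Symmetric matrix, so the stationary distribution is uniform.
T = np.array([[0.80, 0.15, 0.05],
              [0.15, 0.70, 0.15],
              [0.05, 0.15, 0.80]])
pi = np.full(3, 1.0 / 3.0)
receiving = receiving_distribution_sketch(T, pi, [0])
```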

mr_toolkit.reweighting.splicing.iterative_trajectory_splicing(trajs, source_states, sink_states, n_clusters, splice_msm_lag=1, msm_reversible=False, target_steps_to_keep=1, convergence=1e-09, max_iterations=100)[source]

Performs trajectory splicing on a set of trajectories, like mr_toolkit.reweighting.splicing.splice_trajectories().

However, this function does it iteratively. The trajectories are spliced, and used to estimate the steady-state distribution. This is used to make a better estimate of the receiving distribution, which is then used for another round of splicing. This process repeats until convergence.

Parameters
  • trajs (2D array-like) – A set of discrete trajectories to splice into recycling boundary conditions

  • source_states (array-like) – Set of source states

  • sink_states (array-like) – Set of target/sink states

  • n_clusters (int) – Number of clusters present in the trajectory discretization

  • splice_msm_lag (int) – Lagtime for MSMs

  • msm_reversible (boolean) – Reversibility for MSM

  • target_steps_to_keep (int) – Number of steps after reaching the target to preserve. This should be left at 1 unless you know what you’re doing.

  • convergence (float) – Threshold for RMS change in reweighted stationary distribution estimates to consider iteration converged.

  • max_iterations (int) – Maximum number of iterations to perform.

Return type

Set of spliced trajectories

mr_toolkit.reweighting.splicing.splice_trajectories(trajs_to_splice, source_states, sink_states, n_clusters, msm_lag=1, msm_reversible=False, target_steps_to_keep=1, pbar_visible=True)[source]

Splices a set of trajectories to add recycling boundary conditions to all of them.

See mr_toolkit.reweighting.splicing.splice_trajectory() for more details.

Note that the splicing is done iteratively, in case the segment being spliced introduces another target entry.

Parameters
  • trajs_to_splice (2D array-like) – A set of discrete trajectories to splice into recycling boundary conditions

  • source_states (array-like) – Set of source states

  • sink_states (array-like) – Set of target/sink states

  • n_clusters (int) – Number of clusters present in the trajectory discretization

  • msm_lag (int) – Lagtime for MSMs

  • msm_reversible (boolean) – Reversibility for MSM

  • target_steps_to_keep (int) – Number of steps after reaching the target to preserve. This should be left at 1 unless you know what you’re doing.

  • pbar_visible (bool) – Show the progress bar during iteration

Return type

Set of spliced trajectories

mr_toolkit.reweighting.splicing.splice_trajectory(trajectory, splice_trajectories, target_states, recycling_states, recycling_probabilities, rng, target_steps_to_keep=1)[source]

“Splices”, or adds recycling boundary conditions to, a single discrete trajectory, using a set of discrete trajectories.

Splicing works by identifying the first point in the trajectory where it enters the target state. The M points remaining in the trajectory after this point are truncated.

Then, it chooses a new starting state in the source, according to the input probability distribution. A point in that state is chosen from the set of trajectories provided, and that point along with the following M-1 points are appended to (i.e., spliced on to) the truncated trajectory.

The final result is a trajectory of the same length as the input trajectory, but with recycling boundary conditions.

Parameters
  • trajectory (array-like of int) – A single discrete trajectory to add recycling boundary conditions to.

  • splice_trajectories (2D array-like) – A set of discrete trajectories, from which the splice segments are chosen.

  • target_states (array-like) – Set of target states.

  • recycling_states (array-like) – Set of source states, to recycle to.

  • recycling_probabilities (array-like) – Probability distribution of the source states.

  • rng (np.random.default_rng) – Random number generator to use.

  • target_steps_to_keep (int) – Number of steps after reaching the target to preserve. This should be left at 1 unless you know what you’re doing.

Return type

spliced trajectory, index of the point at which splicing was done
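
The splicing steps described above can be sketched as a simplified single pass. The name splice_once_sketch is hypothetical. Choosing uniformly among candidate points, and requiring that a candidate has enough following points to fill the truncated length, are assumptions about details the description leaves open.

```python
import numpy as np

def splice_once_sketch(trajectory, pool, target_states, recycling_states,
                       recycling_probabilities, rng, target_steps_to_keep=1):
    """Single splice of one discrete trajectory (hypothetical helper)."""
    trajectory = np.asarray(trajectory)
    in_target = np.isin(trajectory, target_states)
    if not in_target.any():
        return trajectory.copy(), None  # never reaches the target
    # Keep everything up to and including the first target entry.
    keep = int(np.argmax(in_target)) + target_steps_to_keep
    m = len(trajectory) - keep  # number of points that get truncated
    if m <= 0:
        return trajectory.copy(), None
    # Draw the state to recycle into from the supplied distribution.
    start_state = rng.choice(recycling_states, p=recycling_probabilities)
    # Candidate (trajectory, position) pairs with at least m points available.
    candidates = [(t, p) for t, traj in enumerate(pool)
                  for p in np.flatnonzero(np.asarray(traj) == start_state)
                  if p + m <= len(traj)]
    if not candidates:
        raise ValueError("no segment of sufficient length starts in that state")
    t, p = candidates[rng.integers(len(candidates))]
    segment = np.asarray(pool[t])[p:p + m]
    return np.concatenate([trajectory[:keep], segment]), keep

rng = np.random.default_rng(0)
spliced, splice_point = splice_once_sketch(
    [0, 1, 2, 3, 1, 0], [[0, 1, 2, 1, 0, 1]],
    target_states=[3], recycling_states=[0],
    recycling_probabilities=[1.0], rng=rng)
```

The spliced trajectory has the same length as the input. A spliced segment can itself enter the target, which is why the multi-trajectory routine repeats the procedure.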

Reweighted MSM estimation

Code for reweighting MSMs and obtaining reweighted estimates of steady-state.

mr_toolkit.reweighting.analysis.compute_reweighted_stationary(discrete_trajectories, N, lag, n_clusters, last_frac=1.0, min_weight=1e-12, n_reweighting_iters=100)[source]

Estimates a stationary distribution from a discrete trajectory using reweighted MSMs.

Parameters
  • discrete_trajectories (array-like) – 2-D array or list of lists with discretized trajectories

  • N (int) – Fragment length for reweighting

  • lag (int) – Lagtime used in reweighting MSMs

  • n_clusters (int) – Number of total states in the reweighted models (or in the discretization)

  • last_frac (float) – Fraction of the trajectories to use. For example, last_frac=0.25 uses only the last quarter of each trajectory.

  • min_weight (float) – Minimum bound on weights during reweighting iteration

  • n_reweighting_iters (int) – Maximum number of reweighting iterations

Returns

(Set of state indices, stationary distributions at each reweighting iteration, total number of iterations before convergence, estimated transition matrices at each reweighting iteration)

mr_toolkit.reweighting.analysis.get_kl(test_dist, ref_dist, return_nan=False)[source]

Obtain the KL divergence between two distributions.

Parameters
  • test_dist – The distribution to test

  • ref_dist – The reference distribution

  • return_nan – If the KL divergence is invalid for some reason, return NaN if true or -1 otherwise.

Returns

The KL divergence of the two distributions. If invalid, -1 or NaN depending on the value of return_nan.
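
A minimal sketch of such a calculation follows. It assumes the divergence is taken from test_dist to ref_dist, that is, the sum of p log(p/q) with p the test distribution, and that "invalid" means mismatched shapes or a zero in the reference where the test distribution is nonzero. Both the direction and the validity rule are assumptions, and kl_sketch is a hypothetical name.

```python
import numpy as np

def kl_sketch(test_dist, ref_dist, return_nan=False):
    """KL divergence D(test || ref) in nats (hypothetical helper)."""
    p = np.asarray(test_dist, dtype=float)
    q = np.asarray(ref_dist, dtype=float)
    support = p > 0  # terms with p == 0 contribute nothing
    if p.shape != q.shape or np.any(q[support] <= 0):
        return np.nan if return_nan else -1
    return float(np.sum(p[support] * np.log(p[support] / q[support])))

same = kl_sketch([0.5, 0.5], [0.5, 0.5])
different = kl_sketch([0.5, 0.5], [0.9, 0.1])
invalid = kl_sketch([0.5, 0.5], [1.0, 0.0])
invalid_nan = kl_sketch([0.5, 0.5], [1.0, 0.0], return_nan=True)
```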

mr_toolkit.reweighting.analysis.get_set_kls(distributions)[source]

Get KL divergences between multiple sets of distributions.

For example, a (4x10) input corresponds to four 10-element distributions. This would return an upper triangular 4x4 matrix with the unique pairwise KL-divergences.

Parameters

distributions
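
The pairwise layout described above can be sketched as follows. Because the KL divergence is not symmetric, the sketch assumes the row index is the test distribution and the column index is the reference. It also assumes all entries are strictly positive. The name pairwise_kl_sketch is hypothetical.

```python
import numpy as np

def pairwise_kl_sketch(distributions):
    """Upper-triangular matrix of pairwise KL divergences (hypothetical helper)."""
    dists = np.asarray(distributions, dtype=float)
    n = len(dists)
    kls = np.zeros((n, n))
    for i in range(n):
        for j in range(i + 1, n):
            # D(dists[i] || dists[j]); assumes strictly positive entries.
            kls[i, j] = np.sum(dists[i] * np.log(dists[i] / dists[j]))
    return kls

# Four random 10-element distributions.
rng = np.random.default_rng(0)
dists = rng.dirichlet(np.ones(10), size=4)
kls = pairwise_kl_sketch(dists)
```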

Stratified Clustering

Tools for performing stratified k-means clustering.

class mr_toolkit.clustering.StratifiedClusters(n_clusters, bin_bounds)[source]

Class for performing stratified k-means clustering.

Parameters
  • n_clusters (int) –

  • bin_bounds (array-like) –

fit(data, coord_to_stratify=0)[source]

Fits the stratified clusterer model.

Parameters
  • data (array-like) – Input points. Should be 2-dimensional, (frame, coordinates).

  • coord_to_stratify (int) – Coordinate to stratify on (i.e. traject…)


predict(data)[source]

Assigns stratified clusters to a set of input data.

Parameters

data (array-like) – The set of samples to assign to clusters.

Return type

Integer cluster assignments


remove_state(state_to_remove)[source]

Removes a cluster by index, and re-indexes the remaining clusters to be consecutive.

Parameters

state_to_remove (int) – The index of the state to remove.

Return type

The index of the removed state, in the space of the ORIGINAL clustering the model was built with.
