API Documentation

`mr_toolkit.coarse_graining.msm_coarse_graining`	Main module with code for coarse-graining transition matrices and computing bin-weights.
`mr_toolkit.reweighting.splicing`	Code for splicing equilibrium trajectories into nonequilibrium steady-state trajectories.
`mr_toolkit.reweighting.analysis`	Code for reweighting MSMs and obtaining reweighted estimates of steady-state.
`mr_toolkit.clustering.StratifiedClusters`(...)	Class for performing stratified k-means clustering.

MSM Coarse-graining

Coarse-graining

Main module with code for coarse-graining transition matrices and computing bin-weights.

mr_toolkit.coarse_graining.msm_coarse_graining.coarse_grain(P, cg_map, w, lag=1, normalize=True)[source]

Coarse-grains a fine-grained transition matrix according to some mapping of microstates to macrostates and weights over the microstates.

This is done according to

\[\eqncg\]

Parameters

P (np.ndarray) – Fine-grained transition matrix.
cg_map (list of lists) – List of all microstates in each macrostate.
w (array-like) – Microbin weights \(\wi\).
lag (int) – Lag for Markov model \(\lag\).
normalize (bool) – Normalize the resulting matrix over the weights. Set this to False when building an occupancy matrix over many lags, because in that case the normalization is over all \(\wi\).

Returns

p_matrix – Coarse-grained transition matrix \(\textbf{T}\).

Return type

np.ndarray

Examples

To coarse-grain a 6x6 transition matrix P into a 4x4 by grouping the inner pairs of states (1+2 and 3+4) and leaving the edge states unchanged, one could do

>>> coarse_grain(P, [[0], [1,2], [3,4], [5]], w)
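
To make concrete what such a call computes, here is a minimal pure-Python sketch of weighted coarse-graining at a lag of 1. It assumes the common form in which each macrostate entry is the weight-averaged probability of moving between two groups of microstates. The formula, the helper name `coarse_grain_sketch`, and the example matrix are assumptions for illustration, not the library's implementation.

```python
def coarse_grain_sketch(P, cg_map, w):
    """Weight-averaged coarse-graining of a row-stochastic matrix (lag 1).

    Assumed form:
        T[I][J] = sum_{i in I, j in J} w[i] * P[i][j] / sum_{i in I} w[i]
    """
    n_macro = len(cg_map)
    T = [[0.0] * n_macro for _ in range(n_macro)]
    for I, micro_I in enumerate(cg_map):
        norm = sum(w[i] for i in micro_I)
        for J, micro_J in enumerate(cg_map):
            flux = sum(w[i] * P[i][j] for i in micro_I for j in micro_J)
            T[I][J] = flux / norm
    return T


# Hypothetical 6-state nearest-neighbour chain (each row sums to 1).
P = [
    [0.50, 0.50, 0.00, 0.00, 0.00, 0.00],
    [0.25, 0.50, 0.25, 0.00, 0.00, 0.00],
    [0.00, 0.25, 0.50, 0.25, 0.00, 0.00],
    [0.00, 0.00, 0.25, 0.50, 0.25, 0.00],
    [0.00, 0.00, 0.00, 0.25, 0.50, 0.25],
    [0.00, 0.00, 0.00, 0.00, 0.50, 0.50],
]
w = [1 / 6] * 6  # uniform microbin weights

# Group microstates 1+2 and 3+4, leave the edge states 0 and 5 alone.
T = coarse_grain_sketch(P, [[0], [1, 2], [3, 4], [5]], w)
```

With normalization by the weight in each source macrostate, every row of `T` still sums to 1, which is a quick sanity check on any coarse-graining map.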

mr_toolkit.coarse_graining.msm_coarse_graining.compute_avg_bin_weights(initial_weights, transition_matrix, max_s, lag=1, min_s=0, leave=False)[source]

Obtain the time-averaged bin weights for a lag of 1, described by

\[\eqnwi\]

Parameters

initial_weights (array-like) – List or array of initial microbin-weights.
transition_matrix (np.ndarray) – (n_states x n_states) Transition matrix.
max_s (int) – Maximum trajectory length \(S\)
lag (int) – Lag used for Markov model \(\lag\).
min_s (int) – Earliest trajectory point to use in sliding window calculation. Defaults to 0.
leave (bool) – Leave TQDM progress bar on completion

Returns

wi_bar – List of time-averaged weights for each bin

Return type

np.ndarray (n_states)
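
The formula referenced above is not reproduced in this page, so the following is only a sketch of one plausible reading: propagate the initial weights with the transition matrix and average them over the steps from `min_s` up to (but not including) `max_s`. The averaging window, the lag of 1, and the helper names `propagate` and `avg_bin_weights_sketch` are assumptions, not the library's definition.

```python
def propagate(weights, P):
    """One step of the row-vector update w(s+1) = w(s) P."""
    n = len(weights)
    return [sum(weights[i] * P[i][j] for i in range(n)) for j in range(n)]


def avg_bin_weights_sketch(initial_weights, P, max_s, min_s=0):
    """Average the propagated weights w(s) over steps min_s <= s < max_s."""
    w = list(initial_weights)
    total = [0.0] * len(w)
    for s in range(max_s):
        if s >= min_s:
            total = [t + x for t, x in zip(total, w)]
        w = propagate(w, P)
    return [t / (max_s - min_s) for t in total]


# Hypothetical 2-state model, all weight initially in state 0.
P = [[0.9, 0.1],
     [0.2, 0.8]]
w_bar = avg_bin_weights_sketch([1.0, 0.0], P, max_s=3)
```

Here the three weight vectors being averaged are `[1, 0]`, `[0.9, 0.1]`, and `[0.83, 0.17]`.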

Estimating observables

mr_toolkit.coarse_graining.msm_coarse_graining.get_comm(transition_matrix, statesA, statesB)[source]

Computes the committor for a given transition matrix, source states, and target states.

Parameters

transition_matrix (np.ndarray) – Transition matrix.
statesA (array-like) – Source/origin state(s).
statesB (array-like) – Target state(s).

Returns

committors – Array of committors to statesB for each bin.

Return type

np.ndarray

Raises

AssertionError – The solved committor distribution does not obey first-step stationarity.
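
As a rough illustration of the quantity being computed, the committor to B can be sketched as the solution of q = P q with q fixed to 0 on the source states and 1 on the target states. The sketch below uses plain fixed-point iteration rather than a linear solve. The helper name `committor_sketch`, the iteration count, and the example matrix are assumptions, and this is not necessarily how `get_comm` solves the problem.

```python
def committor_sketch(P, states_a, states_b, n_sweeps=2000):
    """Iterate q <- P q while holding q = 0 on A and q = 1 on B."""
    a, b = set(states_a), set(states_b)
    n = len(P)
    q = [1.0 if i in b else 0.0 for i in range(n)]
    for _ in range(n_sweeps):
        q = [
            0.0 if i in a
            else 1.0 if i in b
            else sum(P[i][j] * q[j] for j in range(n))
            for i in range(n)
        ]
    return q


# Hypothetical lazy random walk on 5 states.
P = [
    [0.50, 0.50, 0.00, 0.00, 0.00],
    [0.25, 0.50, 0.25, 0.00, 0.00],
    [0.00, 0.25, 0.50, 0.25, 0.00],
    [0.00, 0.00, 0.25, 0.50, 0.25],
    [0.00, 0.00, 0.00, 0.50, 0.50],
]
q = committor_sketch(P, states_a=[0], states_b=[4])

# First-step check on the intermediate states: q[i] == sum_j P[i][j] * q[j]
residual = max(
    abs(q[i] - sum(P[i][j] * q[j] for j in range(5))) for i in (1, 2, 3)
)
```

For this symmetric walk the committor rises linearly from 0 at the source to 1 at the target, and the residual of the first-step condition is close to zero.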

mr_toolkit.coarse_graining.msm_coarse_graining.get_equil(transition_matrix, normalize=True, _round=15)[source]

Computes the equilibrium distribution for an input transition matrix by taking the left-eigenvector of transition_matrix with an eigenvalue of 1.

Parameters

transition_matrix (np.ndarray) – The transition matrix.
_round (int, optional (15)) – Number of decimal places of precision to keep in the equilibrium distribution.

Returns

equil – Equilibrium distribution for the input transition matrix.

Return type

np.ndarray

Parameters

transition_matrix (ndarray) –
normalize (bool) –
_round (int) –
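
The left eigenvector with eigenvalue 1 satisfies pi = pi P. The sketch below reaches the same vector by power iteration instead of an eigendecomposition, which keeps it dependency-free. The helper name `stationary_sketch`, the step count, and the example matrix are assumptions for illustration only.

```python
def stationary_sketch(P, n_steps=5000):
    """Approximate the stationary distribution by repeating pi <- pi P."""
    n = len(P)
    pi = [1.0 / n] * n
    for _ in range(n_steps):
        pi = [sum(pi[i] * P[i][j] for i in range(n)) for j in range(n)]
    total = sum(pi)
    return [x / total for x in pi]  # normalize so the entries sum to 1


# Hypothetical 3-state chain; its stationary distribution is [0.25, 0.5, 0.25].
P = [[0.90, 0.10, 0.00],
     [0.05, 0.90, 0.05],
     [0.00, 0.10, 0.90]]
pi = stationary_sketch(P)
```

Power iteration converges only when the chain is irreducible and aperiodic, so it is a teaching device here rather than a replacement for the eigenvector approach described above.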

mr_toolkit.coarse_graining.msm_coarse_graining.get_hill_mfpt(ss_dist, T, target_mesostates)[source]

Compute the MFPT via the Hill relation.

From the Hill relation, the MFPT is the inverse flux into the target state, or

\[\text{MFPT} = \frac{1}{\text{Flux into the target state}}\]

Parameters

ss_dist (array-like) – Stationary distribution.
T (array-like) – Transition matrix. The boundary conditions it should obey are not specified.
target_mesostates (array-like) – Target states for MFPT calculation.

Returns

MFPT – First-passage time estimate.

Return type

float
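
A minimal sketch of the inverse-flux calculation follows. It assumes the flux is the steady-state probability flowing in one step from non-target states into the target states, and that the supplied distribution is the steady state of a matrix whose target state recycles to the source. The helper name `hill_mfpt_sketch` and the example numbers are illustrative assumptions.

```python
def hill_mfpt_sketch(ss_dist, T, target_states):
    """Inverse of the one-step steady-state flux into the target states."""
    targets = set(target_states)
    flux = sum(
        ss_dist[i] * T[i][j]
        for i in range(len(T)) if i not in targets
        for j in targets
    )
    return 1.0 / flux


# Hypothetical 3-state chain: state 2 is the target and recycles to state 0.
T = [[0.50, 0.50, 0.00],
     [0.25, 0.50, 0.25],
     [1.00, 0.00, 0.00]]
ss_dist = [4 / 9, 4 / 9, 1 / 9]  # satisfies ss_dist = ss_dist T for this T

mfpt = hill_mfpt_sketch(ss_dist, T, target_states=[2])
```

The flux here is (4/9) x 0.25 = 1/9 per step, so the estimate is 9 steps, expressed in units of the matrix's time step.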

mr_toolkit.coarse_graining.msm_coarse_graining.get_naive_hill_mfpt(T_ss, ss_dist, target_mesostates, all_other_states)[source]

Intended to be an explicit, un-optimized version of get_hill_mfpt, used to check that function's linear algebra.

Constructing matrices

mr_toolkit.coarse_graining.msm_coarse_graining.build_fine_transition_matrix(height_ratio, num_bins)[source]

Generate a Markov transition matrix where each bin is height_ratio more likely to transition to itself than to its neighbor.

Parameters

height_ratio (float) – Ratio of the transition probability to self vs to neighbor bin. This is a proxy for the inter-bin barrier height.
num_bins (int) – Number of bins in the transition matrix.

Returns

t_matrix – A (num_bins x num_bins) tri-diagonal, row-normalized transition matrix.

Return type

np.ndarray
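
The construction described above can be sketched as follows: give each bin an unnormalized weight of `height_ratio` for staying put and 1 for each existing neighbour, then normalize every row. How the edge bins are treated is an assumption here, as is the helper name `tridiagonal_sketch`.

```python
def tridiagonal_sketch(height_ratio, num_bins):
    """Row-normalized tri-diagonal matrix with self/neighbour ratio height_ratio."""
    T = [[0.0] * num_bins for _ in range(num_bins)]
    for i in range(num_bins):
        T[i][i] = height_ratio
        if i > 0:
            T[i][i - 1] = 1.0
        if i < num_bins - 1:
            T[i][i + 1] = 1.0
        row_sum = sum(T[i])
        T[i] = [x / row_sum for x in T[i]]
    return T


T = tridiagonal_sketch(height_ratio=2.0, num_bins=4)
```

With a ratio of 2, an interior row becomes `[0.25, 0.5, 0.25]` around the diagonal, while an edge row has only one neighbour and becomes `[2/3, 1/3]`.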

mr_toolkit.coarse_graining.msm_coarse_graining.build_occupancy(fg_matrix, initial_weights, cg_map, s, time_horizon)[source]

Builds the occupancy matrix as

\[\eqnbuildocc\]

Parameters

fg_matrix (np.ndarray) – The fine-grained matrix \(\Tfg\).
initial_weights (np.ndarray or list) – Vector of initial weights \(\wi\).
cg_map (list of lists) – List of all microstates in each macrostate.
s (int) – Maximum trajectory length \(S\).
time_horizon (int) – Time horizon \(TH\).

Returns

occ – The occupancy matrix, computed as above.

Return type

np.ndarray

Trajectory Splicing

Code for splicing equilibrium trajectories into nonequilibrium steady-state trajectories.

mr_toolkit.reweighting.splicing.get_receiving_distribution(tmatrix, stationary, source_states)[source]

Estimates the “receiving distribution” for a given transition matrix.

The receiving distribution is the boundary distribution corresponding to where trajectories go one step after leaving the source states.

Recycling into the receiving distribution produces a nonequilibrium steady-state.

Parameters

tmatrix (array-like) – Transition matrix
stationary (array-like) – Stationary distribution of the transition matrix
source_states (array-like of int) – Set of source states

Return type

Receiving distribution
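
One way to read the description above is as the normalized one-step probability flux from the source states into every other state, weighted by the stationary distribution. The sketch below follows that reading. The formula, the choice to return an entry for every state, and the helper name `receiving_distribution_sketch` are assumptions rather than the library's implementation.

```python
def receiving_distribution_sketch(tmatrix, stationary, source_states):
    """Normalized flux from the source states into each non-source state."""
    sources = set(source_states)
    n = len(tmatrix)
    out_flux = [
        0.0 if j in sources
        else sum(stationary[i] * tmatrix[i][j] for i in sources)
        for j in range(n)
    ]
    total = sum(out_flux)
    return [f / total for f in out_flux]


# Hypothetical symmetric (hence doubly stochastic) matrix, so the
# stationary distribution is uniform.
tmatrix = [[0.5, 0.3, 0.2, 0.0],
           [0.3, 0.4, 0.2, 0.1],
           [0.2, 0.2, 0.5, 0.1],
           [0.0, 0.1, 0.1, 0.8]]
stationary = [0.25, 0.25, 0.25, 0.25]

receiving = receiving_distribution_sketch(tmatrix, stationary, source_states=[0, 1])
```

In this example 80% of the trajectories leaving the source land in state 2 and 20% in state 3.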

mr_toolkit.reweighting.splicing.iterative_trajectory_splicing(trajs, source_states, sink_states, n_clusters, splice_msm_lag=1, msm_reversible=False, target_steps_to_keep=1, convergence=1e-09, max_iterations=100)[source]

Performs trajectory splicing on a set of trajectories, like mr_toolkit.reweighting.splicing.splice_trajectories().

However, this function does it iteratively. The trajectories are spliced, and used to estimate the steady-state distribution. This is used to make a better estimate of the receiving distribution, which is then used for another round of splicing. This process repeats until convergence.

Parameters

trajs (2D array-like) – A set of discrete trajectories to splice into recycling boundary conditions
source_states (array-like) – Set of source states
sink_states (array-like) – Set of target/sink states
n_clusters (int) – Number of clusters present in the trajectory discretization
splice_msm_lag (int) – Lagtime for MSMs
msm_reversible (boolean) – Reversibility for MSM
target_steps_to_keep (int) – Number of steps after reaching the target to preserve. This should be left at 1, unless you know what you’re doing.
convergence (float) – Threshold for RMS change in reweighted stationary distribution estimates to consider iteration converged.
max_iterations (int) – Maximum number of iterations to perform.

Return type

Set of spliced trajectories

mr_toolkit.reweighting.splicing.splice_trajectories(trajs_to_splice, source_states, sink_states, n_clusters, msm_lag=1, msm_reversible=False, target_steps_to_keep=1, pbar_visible=True)[source]

Splices a set of trajectories to add recycling boundary conditions to all of them.

See mr_toolkit.reweighting.splicing.splice_trajectory() for more details.

Note that the splicing is done iteratively, in case the segment being spliced introduces another target entry.

Parameters

trajs_to_splice (2D array-like) – A set of discrete trajectories to splice into recycling boundary conditions
source_states (array-like) – Set of source states
sink_states (array-like) – Set of target/sink states
n_clusters (int) – Number of clusters present in the trajectory discretization
msm_lag (int) – Lagtime for MSMs
msm_reversible (boolean) – Reversibility for MSM
target_steps_to_keep (int) – Number of steps after reaching the target to preserve. This should be left at 1, unless you know what you’re doing.
pbar_visible (bool) – Show the progress bar during iteration

Return type

Set of spliced trajectories

mr_toolkit.reweighting.splicing.splice_trajectory(trajectory, splice_trajectories, target_states, recycling_states, recycling_probabilities, rng, target_steps_to_keep=1)[source]

“Splices”, or adds recycling boundary conditions to, a single discrete trajectory, using a set of discrete trajectories.

Splicing works by identifying the first point in the trajectory where it enters the target state. The M points remaining in the trajectory after this point are truncated.

Then, it chooses a new starting state in the source, according to the input probability distribution. A point in that state is chosen from the set of trajectories provided, and that point along with the following M-1 points are appended to (i.e., spliced on to) the truncated trajectory.

The final result is a trajectory of the same length as the input trajectory, but with recycling boundary conditions.

Parameters

trajectory (array-like of int) – A single discrete trajectory to add recycling boundary conditions to.
splice_trajectories (2D array-like) – A set of discrete trajectories, from which the splice segments are chosen.
target_states (array-like) – Set of target states.
recycling_states (array-like) – Set of source states, to recycle to.
recycling_probabilities (array-like) – Probability distribution of the source states.
rng (np.random.default_rng) – Random number generator to use.
target_steps_to_keep (int) – Number of steps after reaching the target to preserve. This should be left at 1, unless you know what you’re doing.

Return type

spliced trajectory, index of the point at which splicing was done
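
The splicing procedure described above can be sketched in pure Python as below. This is an illustration under stated assumptions, not the library's code: the helper name `splice_once`, the use of `random.Random` in place of a NumPy generator, the uniform choice among candidate splice points, and the behaviour when no target entry is found are all hypothetical.

```python
import random


def splice_once(trajectory, splice_pool, target_states, recycling_states,
                recycling_probabilities, rng, target_steps_to_keep=1):
    """Truncate after the first target entry and append a segment from the pool."""
    targets = set(target_states)
    first_hit = next((t for t, s in enumerate(trajectory) if s in targets), None)
    if first_hit is None:
        return list(trajectory), None  # never reaches the target: nothing to do

    cut = first_hit + target_steps_to_keep
    n_missing = len(trajectory) - cut  # the M points that get replaced
    if n_missing <= 0:
        return list(trajectory), None

    # Choose the restart state according to the recycling distribution.
    start_state = rng.choices(recycling_states, weights=recycling_probabilities)[0]

    # Every pool point in that state with at least n_missing points from there on.
    candidates = [
        (k, t)
        for k, traj in enumerate(splice_pool)
        for t in range(len(traj) - n_missing + 1)
        if traj[t] == start_state
    ]
    if not candidates:
        raise ValueError("no splice segment of the required length is available")

    k, t = rng.choice(candidates)
    spliced = list(trajectory[:cut]) + list(splice_pool[k][t:t + n_missing])
    return spliced, cut


rng = random.Random(0)
trajectory = [0, 1, 2, 3, 3, 3]   # enters the target state 3 at index 3
pool = [[0, 1, 0, 1, 2, 1]]       # segments are drawn from here
spliced, cut = splice_once(trajectory, pool, target_states=[3],
                           recycling_states=[0], recycling_probabilities=[1.0],
                           rng=rng)
```

The result has the same length as the input: the two points after the first target entry are replaced by a two-point segment that starts in the recycling state.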

Reweighted MSM estimation

Code for reweighting MSMs and obtaining reweighted estimates of steady-state.

mr_toolkit.reweighting.analysis.compute_reweighted_stationary(discrete_trajectories, N, lag, n_clusters, last_frac=1.0, min_weight=1e-12, n_reweighting_iters=100)[source]

Estimates a stationary distribution from a discrete trajectory using reweighted MSMs.

Parameters

discrete_trajectories (array-like) – 2-D array or list of lists with discretized trajectories
N (int) – Fragment length for reweighting
lag (int) – Lagtime used in reweighting MSMs
n_clusters (int) – Number of total states in the reweighted models (or in the discretization)
last_frac (float) – Fraction of the trajectories to use. I.e., last_frac=0.25 only uses the last 1/4 of the trajectories.
min_weight (float) – Minimum bound on weights during reweighting iteration
n_reweighting_iters (int) – Maximum number of reweighting iterations

Return type

(Set of state indices, stationary distributions at each reweighting iteration, total number of iterations before convergence, estimated transition matrices at each reweighting iteration)

mr_toolkit.reweighting.analysis.get_kl(test_dist, ref_dist, return_nan=False)[source]

Obtain the KL divergence between two distributions.

Parameters

test_dist – The distribution to test
ref_dist – The reference distribution
return_nan – If the KL divergence is invalid for some reason, return NaN if true or -1 otherwise.

Returns

The KL divergence of the two distributions. If invalid, -1 or NaN depending on the value of return_nan.
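
For reference, a minimal sketch of a KL divergence with the same invalid-value convention is shown below. It assumes the direction D(test || ref) and treats mismatched lengths or a zero reference probability under nonzero test probability as invalid. Both choices, and the helper name `kl_sketch`, are assumptions, since the page does not say what `get_kl` considers invalid.

```python
import math


def kl_sketch(test_dist, ref_dist, return_nan=False):
    """D(test || ref) = sum_i p_i * log(p_i / q_i), with a sentinel when invalid."""
    invalid = math.nan if return_nan else -1
    if len(test_dist) != len(ref_dist):
        return invalid
    total = 0.0
    for p, q in zip(test_dist, ref_dist):
        if p == 0:
            continue  # the term 0 * log(0 / q) is taken as 0
        if p < 0 or q <= 0:
            return invalid
        total += p * math.log(p / q)
    return total


same = kl_sketch([0.5, 0.5], [0.5, 0.5])
different = kl_sketch([0.5, 0.5], [0.25, 0.75])
bad = kl_sketch([1.0, 0.0], [0.0, 1.0])
bad_nan = kl_sketch([1.0, 0.0], [0.0, 1.0], return_nan=True)
```

Identical distributions give 0, and the divergence grows as the test distribution puts weight where the reference has little.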

mr_toolkit.reweighting.analysis.get_set_kls(distributions)[source]

Get KL divergences between multiple sets of distributions.

For example, a (4 x 10) input corresponds to four 10-element distributions. This would return an upper triangular 4x4 matrix with the unique pairwise KL-divergences.

Parameters: distributions
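
The upper-triangular layout can be sketched as follows. Because the KL divergence is not symmetric, the sketch assumes entry (i, j) with i < j holds D(distribution i || distribution j); that ordering and the helper names `kl` and `pairwise_kl_sketch` are assumptions.

```python
import math


def kl(p, q):
    """D(p || q) for strictly positive q."""
    return sum(a * math.log(a / b) for a, b in zip(p, q) if a > 0)


def pairwise_kl_sketch(distributions):
    """Fill only the entries above the diagonal with pairwise divergences."""
    n = len(distributions)
    out = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            out[i][j] = kl(distributions[i], distributions[j])
    return out


# Three hypothetical 2-element distributions give a 3x3 result.
kls = pairwise_kl_sketch([[0.5, 0.5], [0.25, 0.75], [0.9, 0.1]])
```

The diagonal and everything below it stay at zero, so an input of n distributions yields n(n-1)/2 nonzero entries at most.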

Stratified Clustering

Tools for performing stratified k-means clustering.

class mr_toolkit.clustering.StratifiedClusters(n_clusters, bin_bounds)[source]

Class for performing stratified k-means clustering.

Parameters

n_clusters (int) –
bin_bounds (array-like) –

fit(data, coord_to_stratify=0)[source]

Fits the stratified clusterer model.

Parameters

data (array-like) – Input points. Should be 2 dimensions, (frame, coordinates).
coord_to_stratify (int) – Coordinate to stratify on (i.e. traject

predict(data)[source]

Assigns stratified clusters to a set of input data.

Parameters: data (array-like) – The set of samples to assign to clusters.
Return type: Integer cluster assignments
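
To illustrate what a stratified assignment could look like, the sketch below first places each sample in a stratum using the stratification coordinate, then picks the nearest cluster center within that stratum only. It assumes `bin_bounds` holds the interior stratum edges and that cluster indices are numbered consecutively across strata. The helper name `stratified_predict` and the pre-computed centers are hypothetical, and the class's actual behaviour may differ.

```python
import bisect


def stratified_predict(points, bin_bounds, centers_per_stratum, coord_to_stratify=0):
    """Assign each point to the nearest center inside its own stratum."""
    # Offsets make cluster indices unique across strata.
    offsets = [0]
    for centers in centers_per_stratum[:-1]:
        offsets.append(offsets[-1] + len(centers))

    labels = []
    for p in points:
        stratum = bisect.bisect_right(bin_bounds, p[coord_to_stratify])
        centers = centers_per_stratum[stratum]
        local = min(
            range(len(centers)),
            key=lambda k: sum((a - b) ** 2 for a, b in zip(p, centers[k])),
        )
        labels.append(offsets[stratum] + local)
    return labels


bin_bounds = [0.0]  # two strata: x < 0 and x >= 0
centers_per_stratum = [
    [(-2.0, 0.0), (-1.0, 0.0)],  # clusters 0 and 1
    [(1.0, 0.0), (2.0, 0.0)],    # clusters 2 and 3
]
points = [(-1.9, 0.1), (-0.2, 0.0), (0.3, 0.0), (5.0, 1.0)]
labels = stratified_predict(points, bin_bounds, centers_per_stratum)
```

Restricting the search to one stratum guarantees that no cluster straddles a stratum boundary.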

remove_state(state_to_remove)[source]

Removes a cluster by index, and re-indexes the remaining clusters to be consecutive.

Parameters: state_to_remove (int) – The index of the state to remove.
Return type: The index of the removed state, in the space of the ORIGINAL clustering the model was built with.
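
The distinction between current and original indices can be sketched with a small bookkeeping class: keep a list that maps each current cluster index to its original index, and pop from it on removal. The class name `IndexTracker` is hypothetical and this is not how `StratifiedClusters` is necessarily implemented.

```python
class IndexTracker:
    """Tracks which original cluster each current cluster index refers to."""

    def __init__(self, n_clusters):
        self.original_index = list(range(n_clusters))

    def remove_state(self, state_to_remove):
        # Popping shifts later clusters down by one, keeping indices consecutive.
        return self.original_index.pop(state_to_remove)


tracker = IndexTracker(4)
first = tracker.remove_state(1)   # removes original cluster 1
second = tracker.remove_state(1)  # current index 1 now refers to original cluster 2
```

After both removals the two remaining clusters have current indices 0 and 1 and original indices 0 and 3.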