API Documentation
- mr_toolkit.coarse_graining.msm_coarse_graining – Main module with code for coarse-graining transition matrices and computing bin-weights.
- mr_toolkit.reweighting.splicing – Code for splicing equilibrium trajectories into nonequilibrium steady-state trajectories.
- mr_toolkit.reweighting.analysis – Code for reweighting MSMs and obtaining reweighted estimates of the steady state.
- mr_toolkit.clustering.StratifiedClusters – Class for performing stratified k-means clustering.
MSM Coarse-graining
Coarse-graining
Main module with code for coarse-graining transition matrices and computing bin-weights.
- mr_toolkit.coarse_graining.msm_coarse_graining.coarse_grain(P, cg_map, w, lag=1, normalize=True)
Coarse-grains a fine-grained transition matrix according to some mapping of microstates to macrostates and weights over the microstates.
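This is done according to
\[ \textbf{T}_{IJ} = \frac{\sum_{i \in I} \wi \sum_{j \in J} \left[ \mathbf{P}^{\lag} \right]_{ij}}{\sum_{i \in I} \wi}, \]
where \(I\) and \(J\) are macrostates given by cg_map; the denominator is omitted when normalize=False.
- Parameters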
P (np.ndarray) – Fine-grained transition matrix.
cg_map (list of lists) – List of all microstates in each macrostate.
w (array-like) – Microbin weights \(\wi\).
lag (int) – Lag for Markov model \(\lag\).
normalize (bool) – Normalize the resulting matrix over the weights. This should be off when building an occupancy matrix over many lags, because there the normalization is over all \(\wi\).
- Returns
p_matrix – Coarse-grained transition matrix \(\textbf{T}\).
- Return type
np.ndarray
Examples
To coarse-grain a 6x6 transition matrix P into a 4x4 one by grouping the inner pairs of states (1+2 and 3+4) and leaving the edge states (0 and 5) unchanged, one could do
>>> coarse_grain(P, [[0], [1, 2], [3, 4], [5]], w)
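A minimal NumPy sketch of weighted coarse-graining is shown below. It is an illustration, not the mr_toolkit implementation: the name coarse_grain_sketch is hypothetical, and it assumes that the lag is applied as the matrix power P^lag and that each macrostate row is the w-weighted average of the rows of its microstates.

```python
import numpy as np


def coarse_grain_sketch(P, cg_map, w, lag=1, normalize=True):
    """Hypothetical illustration of weighted coarse-graining (not mr_toolkit code)."""
    # Assumption: the lag is applied as a matrix power of the fine-grained matrix
    P_lag = np.linalg.matrix_power(np.asarray(P, dtype=float), lag)
    w = np.asarray(w, dtype=float)
    n_macro = len(cg_map)
    T = np.zeros((n_macro, n_macro))
    for I, micro_I in enumerate(cg_map):
        w_I = w[micro_I][:, None]  # weights of the microstates in macrostate I, as a column
        for J, micro_J in enumerate(cg_map):
            # Weighted probability of moving from any microstate in I to any microstate in J
            T[I, J] = np.sum(w_I * P_lag[np.ix_(micro_I, micro_J)])
        if normalize:
            T[I] /= w_I.sum()  # divide by the total weight of macrostate I
    return T


# 6 fine states -> 4 macrostates: edges kept, inner pairs (1+2) and (3+4) merged
P = np.full((6, 6), 1 / 6)
T = coarse_grain_sketch(P, [[0], [1, 2], [3, 4], [5]], w=np.ones(6))
print(T.sum(axis=1))  # each row sums to 1 because cg_map covers every state
```

Because the mapping partitions all six fine states, every row of the normalized 4x4 result sums to 1.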
- mr_toolkit.coarse_graining.msm_coarse_graining.compute_avg_bin_weights(initial_weights, transition_matrix, max_s, lag=1, min_s=0, leave=False)
Obtain the time-averaged bin weights for a lag of 1, described by
\[\eqnwi\]
- Parameters
initial_weights (array-like) – List or array of initial microbin-weights.
transition_matrix (np.ndarray) – (n_states x n_states) Transition matrix.
max_s (int) – Maximum trajectory length \(S\).
lag (int) – Lag used for Markov model \(\lag\).
min_s (int) – Earliest trajectory point to use in the sliding-window calculation. Defaults to 0.
leave (bool) – Whether to leave the tqdm progress bar displayed on completion.
- Returns
wi_bar – List of time-averaged weights for each bin
- Return type
np.ndarray (n_states)
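As a rough illustration, a time average of bin weights can be computed by repeatedly propagating the weight vector through the transition matrix. This sketch assumes a lag of 1 and a plain arithmetic mean of w(0)T^s over s = min_s, ..., max_s - 1; the function name avg_bin_weights_sketch and the exact averaging window are assumptions, not the library's definition.

```python
import numpy as np


def avg_bin_weights_sketch(initial_weights, transition_matrix, max_s, min_s=0):
    """Hypothetical lag-1 time average of propagated bin weights (not mr_toolkit code).

    Assumption: the average is the arithmetic mean of w(0) T^s for s = min_s, ..., max_s - 1.
    """
    T = np.asarray(transition_matrix, dtype=float)
    # Weights after min_s steps of propagation from the initial weights
    w_s = np.asarray(initial_weights, dtype=float) @ np.linalg.matrix_power(T, min_s)
    total = np.zeros_like(w_s)
    for _ in range(min_s, max_s):
        total += w_s
        w_s = w_s @ T  # advance the row vector of weights by one step
    return total / (max_s - min_s)


# Two states that swap every step: the time-averaged weights are 50/50
T = np.array([[0.0, 1.0], [1.0, 0.0]])
print(avg_bin_weights_sketch([1.0, 0.0], T, max_s=2))
```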
Estimating observables
- mr_toolkit.coarse_graining.msm_coarse_graining.get_comm(transition_matrix, statesA, statesB)
Computes the committor for a given transition matrix, source states, and target states.
- Parameters
transition_matrix (np.ndarray) – Transition matrix.
statesA (array-like) – Source/origin state(s).
statesB (array-like) – Target state(s).
- Returns
committors – Array of committors to statesB for each bin.
- Return type
np.ndarray
- Raises
AssertionError – The solved committor distribution does not obey first-step stationarity.
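One standard way to obtain a committor is to solve a linear system: q_i = 0 on statesA, q_i = 1 on statesB, and q_i = Σ_j T_ij q_j (first-step stationarity) everywhere else. The sketch below, with the hypothetical name committor_sketch, solves this directly with np.linalg.solve and repeats the first-step check described under Raises; get_comm may solve the problem differently.

```python
import numpy as np


def committor_sketch(T, statesA, statesB):
    """Hypothetical forward committor to statesB via a linear solve (not mr_toolkit code)."""
    T = np.asarray(T, dtype=float)
    n = T.shape[0]
    # Interior equations: q_i - sum_j T_ij q_j = 0
    M = np.eye(n) - T
    rhs = np.zeros(n)
    # Boundary conditions: q = 0 on the source states, q = 1 on the target states
    for states, value in ((statesA, 0.0), (statesB, 1.0)):
        for s in states:
            M[s] = 0.0
            M[s, s] = 1.0
            rhs[s] = value
    q = np.linalg.solve(M, rhs)
    # First-step check: away from A and B, q equals its one-step average
    interior = np.setdiff1d(np.arange(n), np.concatenate([statesA, statesB]))
    assert np.allclose(q[interior], (T @ q)[interior]), "committor is not first-step stationary"
    return q


# Unbiased random walk on 5 states, absorbing at both ends
T = np.zeros((5, 5))
T[0, 0] = T[4, 4] = 1.0
for i in range(1, 4):
    T[i, i - 1] = T[i, i + 1] = 0.5
print(committor_sketch(T, [0], [4]))  # linear profile from 0 to 1
```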
- mr_toolkit.coarse_graining.msm_coarse_graining.get_equil(transition_matrix, normalize=True, _round=15)
Computes the equilibrium distribution of an input transition matrix by taking the left eigenvector of transition_matrix with eigenvalue 1.
- Parameters
transition_matrix (np.ndarray) – The transition matrix.
normalize (bool, optional) – Whether to normalize the distribution. Defaults to True.
_round (int, optional) – Number of decimal places of precision to keep in the equilibrium distribution. Defaults to 15.
- Returns
equil – Equilibrium distribution for transition_matrix.
- Return type
np.ndarray
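A minimal sketch of the left-eigenvector approach uses np.linalg.eig on the transpose, since the left eigenvectors of T are the right eigenvectors of T^T. equil_sketch is a hypothetical name, and picking the eigenvalue closest to 1 is an assumption about how numerical noise is handled.

```python
import numpy as np


def equil_sketch(T, normalize=True, _round=15):
    """Hypothetical stationary distribution from the left eigenvector of T (not mr_toolkit code)."""
    T = np.asarray(T, dtype=float)
    # Left eigenvectors of T are the right eigenvectors of T transposed
    eigvals, eigvecs = np.linalg.eig(T.T)
    idx = np.argmin(np.abs(eigvals - 1.0))  # eigenvalue closest to 1
    pi = np.real(eigvecs[:, idx])
    if normalize:
        pi = pi / pi.sum()  # also fixes the arbitrary sign returned by eig
    return np.round(pi, _round)


T = np.array([[0.9, 0.1], [0.2, 0.8]])
print(equil_sketch(T))  # approximately [2/3, 1/3]
```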
- mr_toolkit.coarse_graining.msm_coarse_graining.get_hill_mfpt(ss_dist, T, target_mesostates)
Compute the MFPT via the Hill relation.
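From the Hill relation, the MFPT is the inverse of the steady-state probability flux into the target states \(B\), or
\[ \mathrm{MFPT} = \left( \sum_{i \notin B} \sum_{j \in B} \pi_i T_{ij} \right)^{-1}, \]
where \(\pi\) is the stationary distribution.
- Parameters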
ss_dist (array-like) – Stationary distribution.
T (array-like) – Transition matrix. (What BCs?)
target_mesostates (array-like) – Target states for MFPT calculation.
- Returns
MFPT – Mean first-passage time estimate.
- Return type
float
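The Hill relation reduces to a flux sum followed by an inversion. The sketch below uses the hypothetical name hill_mfpt_sketch and, as an assumption, counts only flux from non-target states into target states; the toy matrix recycles all probability in the target back to state 0.

```python
import numpy as np


def hill_mfpt_sketch(ss_dist, T, target_mesostates):
    """Hypothetical Hill-relation MFPT: inverse steady-state flux into the target (not mr_toolkit code)."""
    pi = np.asarray(ss_dist, dtype=float)
    T = np.asarray(T, dtype=float)
    target = np.asarray(target_mesostates)
    outside = np.setdiff1d(np.arange(len(pi)), target)
    # Probability flux per step from non-target states into target states
    flux = np.sum(pi[outside][:, None] * T[np.ix_(outside, target)])
    return 1.0 / flux


# Toy recycling model: state 0 leaves for target 1 with probability 0.1 per step,
# and the target row sends all probability back to state 0
T = np.array([[0.9, 0.1], [1.0, 0.0]])
pi = np.array([10 / 11, 1 / 11])  # stationary distribution of T
print(hill_mfpt_sketch(pi, T, [1]))  # about 11
```

Here the estimate is 11 steps rather than the 10-step mean waiting time in state 0, because the one step spent in the target before recycling is part of every cycle; how recycling is encoded in T therefore affects the result.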
Constructing matrices
- mr_toolkit.coarse_graining.msm_coarse_graining.build_fine_transition_matrix(height_ratio, num_bins)
Generate a Markov transition matrix in which each bin is height_ratio times more likely to transition to itself than to a neighboring bin.
- Parameters
height_ratio (float) – Ratio of the self-transition probability to the probability of transitioning to a neighboring bin. This is a proxy for the inter-bin barrier height.
num_bins (int) – Number of bins in the transition matrix.
- Returns
t_matrix – A (num_bins x num_bins) tri-diagonal, row-normalized transition matrix.
- Return type
np.ndarray
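One construction consistent with this description gives the diagonal an unnormalized weight of height_ratio and each existing neighbor a weight of 1, then row-normalizes. build_fine_transition_matrix_sketch is a hypothetical name, and the handling of the edge bins (which have only one neighbor) is an assumption.

```python
import numpy as np


def build_fine_transition_matrix_sketch(height_ratio, num_bins):
    """Hypothetical tridiagonal, row-normalized transition matrix (not mr_toolkit code)."""
    T = np.zeros((num_bins, num_bins))
    for i in range(num_bins):
        T[i, i] = height_ratio  # unnormalized self-transition weight
        if i > 0:
            T[i, i - 1] = 1.0  # unnormalized weight to the left neighbor
        if i < num_bins - 1:
            T[i, i + 1] = 1.0  # unnormalized weight to the right neighbor
    return T / T.sum(axis=1, keepdims=True)  # make each row sum to 1


T = build_fine_transition_matrix_sketch(height_ratio=4.0, num_bins=5)
print(T[2])  # interior row: [0, 1/6, 4/6, 1/6, 0]
```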
- mr_toolkit.coarse_graining.msm_coarse_graining.build_occupancy(fg_matrix, initial_weights, cg_map, s, time_horizon)
Builds the occupancy matrix as
\[\eqnbuildocc\]
- Parameters
fg_matrix (np.ndarray) – The fine-grained matrix \(\Tfg\).
initial_weights (np.ndarray or list) – Vector of initial weights \(\wi\).
cg_map (list of lists) – List of all microstates in each macrostate.
s (int) – Maximum trajectory length \(S\).
time_horizon (int) – Time horizon \(TH\).
- Returns
occ – The occupancy matrix, computed as above.
- Return type
np.ndarray
Trajectory Splicing
Code for splicing equilibrium trajectories into nonequilibrium steady-state trajectories.
- mr_toolkit.reweighting.splicing.get_receiving_distribution(tmatrix, stationary, source_states)
Estimates the “receiving distribution” for a given transition matrix.
The receiving distribution is the boundary distribution corresponding to where trajectories go one step after leaving the source states.
Recycling into the receiving distribution produces a nonequilibrium steady-state.
- Parameters
tmatrix (array-like) – Transition matrix
stationary (array-like) – Stationary distribution of the transition matrix
source_states (array-like of int) – Set of source states
- Returns
Receiving distribution
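A minimal sketch under one reading of this definition: weight each transition out of the source states by the stationary probability of its starting state, discard transitions that stay inside the source set, and normalize. receiving_distribution_sketch is a hypothetical name, and whether get_receiving_distribution excludes source-to-source transitions in exactly this way is an assumption.

```python
import numpy as np


def receiving_distribution_sketch(tmatrix, stationary, source_states):
    """Hypothetical receiving distribution (not mr_toolkit code)."""
    T = np.asarray(tmatrix, dtype=float)
    pi = np.asarray(stationary, dtype=float)
    source = np.asarray(source_states)
    # Stationary-weighted probability of landing in each state one step after a source state
    flux = pi[source] @ T[source]
    # Assumption: only transitions that actually leave the source set count
    flux[source] = 0.0
    return flux / flux.sum()


T = np.array([[0.5, 0.3, 0.2], [0.1, 0.8, 0.1], [0.2, 0.2, 0.6]])
pi = np.full(3, 1 / 3)
print(receiving_distribution_sketch(T, pi, [0]))  # approximately [0, 0.6, 0.4]
```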
- mr_toolkit.reweighting.splicing.iterative_trajectory_splicing(trajs, source_states, sink_states, n_clusters, splice_msm_lag=1, msm_reversible=False, target_steps_to_keep=1, convergence=1e-09, max_iterations=100)
Performs trajectory splicing on a set of trajectories, like mr_toolkit.reweighting.splicing.splice_trajectories(), but iteratively. The trajectories are spliced and used to estimate the steady-state distribution. That estimate yields a better estimate of the receiving distribution, which is then used for another round of splicing. This process repeats until convergence.
- Parameters
trajs (2D array-like) – A set of discrete trajectories to which recycling boundary conditions are added by splicing
source_states (array-like) – Set of source states
sink_states (array-like) – Set of target/sink states
n_clusters (int) – Number of clusters present in the trajectory discretization
splice_msm_lag (int) – Lag time for the MSMs
msm_reversible (boolean) – Whether the MSMs are estimated as reversible
target_steps_to_keep (int) – Number of steps after reaching the target to preserve. This should be left to 1, unless you know what you’re doing.
convergence (float) – Threshold for RMS change in reweighted stationary distribution estimates to consider iteration converged.
max_iterations (int) – Maximum number of iterations to perform.
- Returns
Set of spliced trajectories
- mr_toolkit.reweighting.splicing.splice_trajectories(trajs_to_splice, source_states, sink_states, n_clusters, msm_lag=1, msm_reversible=False, target_steps_to_keep=1, pbar_visible=True)
Splices a set of trajectories to add recycling boundary conditions to all of them.
See mr_toolkit.reweighting.splicing.splice_trajectory() for more details. Note that the splicing is done iteratively, in case the segment being spliced introduces another target entry.
- Parameters
trajs_to_splice (2D array-like) – A set of discrete trajectories to which recycling boundary conditions are added by splicing
source_states (array-like) – Set of source states
sink_states (array-like) – Set of target/sink states
n_clusters (int) – Number of clusters present in the trajectory discretization
msm_lag (int) – Lag time for the MSMs
msm_reversible (boolean) – Whether the MSMs are estimated as reversible
target_steps_to_keep (int) – Number of steps after reaching the target to preserve. This should be left to 1, unless you know what you’re doing.
pbar_visible (bool) – Show the progress bar during iteration
- Returns
Set of spliced trajectories
- mr_toolkit.reweighting.splicing.splice_trajectory(trajectory, splice_trajectories, target_states, recycling_states, recycling_probabilities, rng, target_steps_to_keep=1)
“Splices”, or adds recycling boundary conditions to, a single discrete trajectory, using a set of discrete trajectories.
Splicing works by identifying the first point at which the trajectory enters the target state. The M points remaining in the trajectory after this point are truncated.
Then a new starting state in the source is chosen according to the input probability distribution. A point in that state is chosen from the provided set of trajectories, and that point and the following M-1 points are appended to (i.e., spliced onto) the truncated trajectory.
The final result is a trajectory of the same length as the input trajectory, but with recycling boundary conditions.
- Parameters
trajectory (array-like of int) – A single discrete trajectory to add recycling boundary conditions to.
splice_trajectories (2D array-like) – A set of discrete trajectories from which the splice segments are chosen.
target_states (array-like) – Set of target states.
recycling_states (array-like) – Set of source states, to recycle to.
recycling_probabilities (array-like) – Probability distribution of the source states.
rng (np.random.Generator) – Random number generator to use, e.g. one created with np.random.default_rng().
target_steps_to_keep (int) – Number of steps after reaching the target to preserve. This should be left to 1, unless you know what you’re doing.
- Returns
Tuple of (spliced trajectory, index of the point at which splicing was done)
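The single-trajectory splicing step can be sketched as follows. splice_trajectory_sketch is hypothetical and makes several assumptions: target_steps_to_keep frames are kept starting at the first target entry, candidate splice points are drawn uniformly among frames in the chosen source state that have enough frames after them, the returned index is where the appended segment starts, and no re-splicing is done if the appended segment enters the target again.

```python
import numpy as np


def splice_trajectory_sketch(trajectory, splice_trajectories, target_states,
                             recycling_states, recycling_probabilities, rng,
                             target_steps_to_keep=1):
    """Hypothetical single-trajectory splice with recycling (not mr_toolkit code)."""
    trajectory = np.asarray(trajectory)
    in_target = np.isin(trajectory, target_states)
    if not in_target.any():
        return trajectory.copy(), None  # never reaches the target: nothing to splice

    first_entry = int(np.argmax(in_target))  # first frame inside the target
    kept = trajectory[: first_entry + target_steps_to_keep]
    m = len(trajectory) - len(kept)  # number of frames to replace
    if m == 0:
        return trajectory.copy(), None

    # Choose the recycling (source) state according to the given probabilities
    start_state = rng.choice(recycling_states, p=recycling_probabilities)

    # Candidate splice points: frames in start_state with m frames available from there
    candidates = [
        (t, f)
        for t, traj in enumerate(splice_trajectories)
        for f in range(len(traj) - m + 1)
        if traj[f] == start_state
    ]
    if not candidates:
        raise ValueError("no splice segment of the required length starts in the chosen state")
    t, f = candidates[rng.integers(len(candidates))]
    segment = np.asarray(splice_trajectories[t][f : f + m])
    return np.concatenate([kept, segment]), len(kept)


rng = np.random.default_rng(0)
spliced, where = splice_trajectory_sketch(
    [0, 1, 2, 2, 1, 0], [[0, 1, 0, 1, 0, 1]],
    target_states=[2], recycling_states=[0], recycling_probabilities=[1.0], rng=rng,
)
print(spliced, where)
```

The output keeps the input length, matching the description above: the three frames after the first target entry are replaced by a three-frame segment that starts in the recycling state.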
Reweighted MSM estimation
Code for reweighting MSMs and obtaining reweighted estimates of the steady state.
- mr_toolkit.reweighting.analysis.compute_reweighted_stationary(discrete_trajectories, N, lag, n_clusters, last_frac=1.0, min_weight=1e-12, n_reweighting_iters=100)
Estimates a stationary distribution from a discrete trajectory using reweighted MSMs.
- Parameters
discrete_trajectories (array-like) – 2-D array or list of lists with discretized trajectories
N (int) – Fragment length for reweighting
lag (int) – Lagtime used in reweighting MSMs
n_clusters (int) – Total number of states in the reweighted models (or in the discretization)
last_frac (float) – Fraction of the trajectories to use. For example, last_frac=0.25 uses only the last quarter of the trajectories.
min_weight (float) – Lower bound on weights during the reweighting iteration
n_reweighting_iters (int) – Maximum number of reweighting iterations
- Returns
(Set of state indices, Stationary distributions at each reweighting iteration, Total number of iterations before convergence, Estimated transition matrices at each reweighting iteration)
- Return type
tuple
- mr_toolkit.reweighting.analysis.get_kl(test_dist, ref_dist, return_nan=False)
Obtain the KL divergence between two distributions.
- Parameters
test_dist (array-like) – The distribution to test
ref_dist (array-like) – The reference distribution
return_nan (bool) – If the KL divergence is invalid for some reason, return NaN if True, or -1 otherwise.
- Returns
The KL divergence of the two distributions. If invalid, -1 or NaN depending on the value of return_nan.
- mr_toolkit.reweighting.analysis.get_set_kls(distributions)
Get KL divergences between each pair of distributions in a set.
For example, a (4 x 10) input contains four 10-element distributions; the result is an upper-triangular 4 x 4 matrix with the unique pairwise KL divergences.
- Parameters
distributions (array-like) – 2-D array with one distribution per row.
Stratified Clustering
Tools for performing stratified k-means clustering.
- class mr_toolkit.clustering.StratifiedClusters(n_clusters, bin_bounds)
Class for performing stratified k-means clustering.
- Parameters
n_clusters (int) – Number of clusters.
bin_bounds (array-like) – Bin boundaries along the coordinate used for stratification.
- fit(data, coord_to_stratify=0)
Fits the stratified clusterer model.
- Parameters
data (array-like) – Input points. Should be 2-dimensional, with shape (frame, coordinates).
coord_to_stratify (int) – Coordinate to stratify on (i.e. traject…). Defaults to 0.
- predict(data)
Assigns stratified clusters to a set of input data.
- Parameters
data (array-like) – The set of samples to assign to clusters.
- Returns
Integer cluster assignments
- remove_state(state_to_remove)
Removes a cluster by index, and re-indexes the remaining clusters to be consecutive.
- Parameters
state_to_remove (int) – The index of the state to remove.
- Returns
The index of the removed state, in the space of the ORIGINAL clustering the model was built with.
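The idea behind stratified k-means can be sketched compactly: bin the samples along one coordinate using the bin boundaries, run an independent k-means within each bin, and offset the cluster indices so they are unique across bins. ToyStratifiedKMeans is a hypothetical class written with plain NumPy; treating n_clusters as a per-bin count and using simple Lloyd iterations are assumptions, not a description of StratifiedClusters.

```python
import numpy as np


class ToyStratifiedKMeans:
    """Hypothetical stratified k-means: one small k-means per stratum (not mr_toolkit code)."""

    def __init__(self, n_clusters, bin_bounds, n_iter=50, seed=0):
        self.n_clusters = n_clusters  # assumption: clusters per stratum
        self.bin_bounds = np.asarray(bin_bounds, dtype=float)
        self.n_iter = n_iter
        self.rng = np.random.default_rng(seed)
        self.centers = {}  # stratum index -> (k, n_coords) array of centers

    def _strata(self, data):
        # np.digitize returns the index of the bin each value falls into
        return np.digitize(data[:, self.coord], self.bin_bounds)

    def fit(self, data, coord_to_stratify=0):
        data = np.asarray(data, dtype=float)
        self.coord = coord_to_stratify
        strata = self._strata(data)
        for b in np.unique(strata):
            pts = data[strata == b]
            k = min(self.n_clusters, len(pts))
            centers = pts[self.rng.choice(len(pts), size=k, replace=False)]
            for _ in range(self.n_iter):  # Lloyd iterations within this stratum
                dists = ((pts[:, None, :] - centers[None]) ** 2).sum(-1)
                labels = np.argmin(dists, axis=1)
                centers = np.array([
                    pts[labels == c].mean(0) if np.any(labels == c) else centers[c]
                    for c in range(k)
                ])
            self.centers[b] = centers
        # Offset local cluster ids so global ids are consecutive across strata
        self.offsets, total = {}, 0
        for b in sorted(self.centers):
            self.offsets[b] = total
            total += len(self.centers[b])
        return self

    def predict(self, data):
        data = np.asarray(data, dtype=float)
        labels = np.empty(len(data), dtype=int)
        for i, (x, b) in enumerate(zip(data, self._strata(data))):
            centers = self.centers[b]  # KeyError if the stratum was empty during fit
            labels[i] = self.offsets[b] + int(np.argmin(((centers - x) ** 2).sum(-1)))
        return labels


data = np.array([[0.1, 0.0], [0.2, 0.1], [0.9, 0.0], [1.0, 0.1]])
model = ToyStratifiedKMeans(n_clusters=1, bin_bounds=[0.5]).fit(data)
print(model.predict(data))  # one cluster per stratum
```

Because each stratum is clustered separately, a point's cluster can never straddle a bin boundary along the stratified coordinate, which is the point of stratifying.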