API Reference
This page provides an auto-generated summary of HYEENNA's API. For more details and examples, refer to the rest of the documentation.
Estimators
hyeenna.estimators.conditional_entropy(X: numpy.array, Y: numpy.array, k: int = 5) → float
Computes the conditional Shannon entropy of a sample of a random variable X given another sample of a random variable Y, using an adaptation of the KL and KSG estimators.
Parameters: - X (np.array) – Sample from a random variable
- Y (np.array) – Sample from a random variable
- k (int, optional) – Number of neighbors to use in estimation
Returns: cent – estimated conditional entropy
Return type: float
References
[0] Goria, M. N., Leonenko, N. N., Mergel, V. V., & Inverardi, P. L. N. (2005). A new class of random vector entropy estimators and its applications in testing statistical hypotheses. Journal of Nonparametric Statistics, 17(3), 277–297.
[1] Kraskov, A., Stögbauer, H., & Grassberger, P. (2004). Estimating mutual information. Physical Review E, 69(6), 066138. https://doi.org/10.1103/PhysRevE.69.066138
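Example: a minimal sketch based on the signature above. The synthetic data are illustrative assumptions; since Y is informative about X here, the conditional entropy should come out below the unconditional entropy.

    import numpy as np
    from hyeenna.estimators import conditional_entropy, entropy

    # Construct X so that Y carries information about it; then
    # H(X | Y) should come out below the unconditional H(X).
    rng = np.random.default_rng(42)
    y = rng.normal(size=5000)
    x = y + 0.5 * rng.normal(size=5000)

    print(entropy(x, k=5))                 # unconditional entropy
    print(conditional_entropy(x, y, k=5))  # should be noticeably lower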
hyeenna.estimators.conditional_mutual_info(X: numpy.array, Y: numpy.array, Z: numpy.array, k: int = 5) → float
Compute the conditional mutual information of X and Y given Z.
Parameters: - X (np.array) – Sample from random variable X
- Y (np.array) – Sample from random variable Y
- Z (np.array) – Sample from random variable Z
- k (int, optional) – Number of neighbors to use in estimation
Returns: estimated conditional mutual information
Return type: float
References
[0] Vlachos, I., & Kugiumtzis, D. (2010). Non-uniform state space reconstruction and coupling detection. Physical Review E, 82(1), 016207. https://doi.org/10.1103/PhysRevE.82.016207
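Example: a minimal sketch in which X and Y are related only through a shared driver Z, so the conditional mutual information should be near zero; the construction is an illustrative assumption, not part of the API.

    import numpy as np
    from hyeenna.estimators import conditional_mutual_info, mutual_info

    # X and Y are related only through the shared driver Z, so
    # I(X; Y) is large but I(X; Y | Z) should be near zero.
    rng = np.random.default_rng(0)
    z = rng.normal(size=5000)
    x = z + 0.3 * rng.normal(size=5000)
    y = z + 0.3 * rng.normal(size=5000)

    print(mutual_info(x, y, k=5))
    print(conditional_mutual_info(x, y, z, k=5))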
hyeenna.estimators.conditional_transfer_entropy(X: numpy.array, Y: numpy.array, Z: numpy.array, tau: int = 1, omega: int = 1, nu: int = 1, k: int = 1, l: int = 1, m: int = 1, neighbors: int = 5, **kwargs) → float
Compute the transfer entropy from a source variable, X, to a target variable, Y, conditioned on other variables contained in Z.
Parameters: - X (np.array) – Source sample from a random variable X
- Y (np.array) – Target sample from a random variable Y
- Z (np.array) – Conditioning variable(s).
- tau (int (default: 1)) – Number of timestep lags for the source variable
- omega (int (default: 1)) – Number of timestep lags for the target variable conditioning
- nu (int (default: 1)) – Number of timestep lags for the source variable conditioning
- k (int (default: 1)) – Width of window for the source variable.
- l (int (default: 1)) – Width of window for the target variable conditioning.
- m (int (default: 1)) – Width of window for the source variable conditioning.
- neighbors (int (default: 5)) – Number of neighbors to use in estimation.
- **kwargs – Other arguments (undocumented, for internal usage)
Returns: conditional_transfer_entropy – Computed via conditional_mutual_info
Return type: float
References
[0] Schreiber, T. (2000). Measuring information transfer. Physical Review Letters, 85(2), 461–464. https://doi.org/10.1103/PhysRevLett.85.461
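Example: an illustrative sketch in which Z drives both X and Y at different delays; conditioning on Z should suppress the spurious X → Y transfer that unconditioned transfer entropy would report. The data and coefficients are assumptions for demonstration.

    import numpy as np
    from hyeenna.estimators import conditional_transfer_entropy

    # Z drives both X and Y at different delays; conditioning on Z
    # should suppress the spurious X -> Y transfer.
    rng = np.random.default_rng(0)
    n = 5000
    z = rng.normal(size=n)
    x = np.roll(z, 1) + 0.3 * rng.normal(size=n)
    y = np.roll(z, 2) + 0.3 * rng.normal(size=n)

    cte = conditional_transfer_entropy(x, y, z, tau=1, omega=1, nu=1,
                                       k=1, l=1, m=1, neighbors=5)
    print(cte)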
hyeenna.estimators.entropy(X: numpy.array, k: int = 5) → float
Computes the Shannon entropy of a random variable X using the KL nearest neighbor estimator.
The formula is given by:
$$ \hat{H}(X) = \psi(N) - \psi(k) + \log(C_d) + d \langle \log(\epsilon) \rangle $$
where
- $N$ is the number of samples
- $k$ is the number of neighbors
- $\psi$ is the digamma function
- $C_d$ is the volume of the $d$-dimensional unit ball
- $\langle \cdot \rangle$ is the mean
- $\epsilon_i$ is two times the distance to the $k^{th}$ nearest neighbor.
Parameters: - X (np.array) – Sample from a random variable
- k (int, optional) – Number of neighbors to use in estimation
Returns: ent – estimated entropy
Return type: float
References
[0] Goria, M. N., Leonenko, N. N., Mergel, V. V., & Inverardi, P. L. N. (2005). A new class of random vector entropy estimators and its applications in testing statistical hypotheses. Journal of Nonparametric Statistics, 17(3), 277–297. https://doi.org/10.1080/104852504200026815
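Example: a minimal sketch assuming a one-dimensional sample can be passed as a flat array; the analytic Gaussian entropy is included only as a sanity check.

    import numpy as np
    from hyeenna.estimators import entropy

    # Standard normal sample; its true differential entropy is
    # 0.5 * log(2 * pi * e), roughly 1.419 nats.
    rng = np.random.default_rng(42)
    x = rng.normal(size=5000)

    h_est = entropy(x, k=5)
    h_true = 0.5 * np.log(2 * np.pi * np.e)
    print(f"estimated: {h_est:.3f}, analytic: {h_true:.3f}")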
hyeenna.estimators.kl_divergence(P: numpy.array, Q: numpy.array, k: int = 5)
Compute the Kullback–Leibler (KL) divergence D(P|Q) between samples from P and Q.
Parameters: - P (np.array) – Sample from random variable P
- Q (np.array) – Sample from random variable Q
- k (int, optional) – Number of neighbors to use in estimation
Returns: estimated KL divergence D(P|Q)
References
[0] Wang, Q., Kulkarni, S. R., & Verdú, S. (2006). A nearest-neighbor approach to estimating divergence between continuous random vectors. In 2006 IEEE International Symposium on Information Theory. https://doi.org/10.1109/ISIT.2006.261842
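Example: an illustrative sketch comparing two unit-variance Gaussian samples, for which the analytic divergence is 0.5·(μ_P − μ_Q)² = 0.5 nats.

    import numpy as np
    from hyeenna.estimators import kl_divergence

    # Two unit-variance Gaussians with means 0 and 1; the analytic
    # divergence D(P || Q) is 0.5 nats.
    rng = np.random.default_rng(0)
    p = rng.normal(loc=0.0, size=5000)
    q = rng.normal(loc=1.0, size=5000)

    print(kl_divergence(p, q, k=5))  # should be roughly 0.5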
hyeenna.estimators.marginal_neighbors(X: numpy.array, R: numpy.array, metric='chebyshev') → list
Count the number of neighbors within a given radius R.
hyeenna.estimators.mi_local_nonuniformity_correction(X, *args, k: int = 5, alpha=1.05, **kwargs)
Compute the local nonuniformity correction factor for strongly dependent variables. This correction is calculated based on the structure of the space of k-nearest neighbors. The volume of the hyper-rectangle of the maximum-norm bounding box for the k-nearest neighbor estimation is compared to that of the hyper-rectangle bounding the principal components of the covariance matrix of the k-nearest neighbor locations.
Parameters: - X (np.array) – A sample from a random variable
- *args (List[np.array]) – Samples from random variables
- k (int, optional) – Number of neighbors to use in estimation.
- alpha (float, optional) – Sensitivity parameter for filtering non-dependent volumes
- **kwargs (np.array) – Samples from random variables
Returns: lnc – The correction factor to be subtracted from the mutual information
Return type: float
References
[0] Gao, S., Ver Steeg, G., & Galstyan, A. (2014). Efficient estimation of mutual information for strongly dependent variables. https://arxiv.org/abs/1411.2003v3
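Example: a hedged sketch of combining the correction with a raw KSG estimate, following the return description above (the factor is subtracted from the mutual information); the near-deterministic data is an illustrative assumption.

    import numpy as np
    from hyeenna.estimators import mutual_info, mi_local_nonuniformity_correction

    # Nearly deterministic dependence, the regime where the plain KSG
    # estimator underestimates MI and the LNC correction matters.
    rng = np.random.default_rng(0)
    x = rng.normal(size=5000)
    y = x + 1e-3 * rng.normal(size=5000)

    mi = mutual_info(x, y, k=5)
    lnc = mi_local_nonuniformity_correction(x, y, k=5, alpha=1.05)
    print(mi - lnc)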
hyeenna.estimators.multi_mutual_info(X: numpy.array, *args, k: int = 5, **kwargs) → float
Computes the multivariate mutual information of several random variables using the KSG nearest neighbor estimator.
The formula is given by:
$$ \hat{I}(X_1, \ldots, X_m) = (m-1)\cdot\psi(N) + \psi(k) - \frac{m-1}{k} - \langle \psi(n_{X_1}+1) + \ldots + \psi(n_{X_m}+1) \rangle $$
where
- $N$ is the number of samples
- $m$ is the number of variables
- $k$ is the number of neighbors
- $\psi$ is the digamma function
- $\langle \cdot \rangle$ is the mean
- $n_i$ is the number of points within the distance of the $k^{th}$ nearest neighbor when projected into the subspace spanned by variable $i$.
Parameters: - X (np.array) – A sample from a random variable
- *args (List[np.array]) – Samples from random variables
- k (int, optional) – Number of neighbors to use in estimation.
- **kwargs (np.array) – Samples from random variables
Returns: mi – The mutual information
Return type: float
References
[0] Kraskov, A., Stögbauer, H., & Grassberger, P. (2004). Estimating mutual information. Physical Review E, 69(6), 066138. https://doi.org/10.1103/PhysRevE.69.066138
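Example: a minimal sketch passing three samples that share a common signal as positional arguments, per the (X, *args) signature; the data is an illustrative assumption.

    import numpy as np
    from hyeenna.estimators import multi_mutual_info

    # Three variables sharing a common signal z, passed positionally.
    rng = np.random.default_rng(0)
    z = rng.normal(size=5000)
    x1 = z + 0.5 * rng.normal(size=5000)
    x2 = z + 0.5 * rng.normal(size=5000)
    x3 = z + 0.5 * rng.normal(size=5000)

    print(multi_mutual_info(x1, x2, x3, k=5))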
hyeenna.estimators.mutual_info(X: numpy.array, Y: numpy.array, k: int = 5) → float
Computes the mutual information of two random variables, X and Y, using the KSG nearest neighbor estimator.
The formula is given by:
$$ \hat{I}(X,Y) = \psi(N) + \psi(k) - \frac{1}{k} - \langle \psi(n_X + 1) + \psi(n_Y + 1) \rangle $$
where
- $N$ is the number of samples
- $k$ is the number of neighbors
- $\psi$ is the digamma function
- $\langle \cdot \rangle$ is the mean
- $n_i$ is the number of points within the distance of the $k^{th}$ nearest neighbor when projected into the subspace spanned by $i$.
Parameters: - X (np.array) – A sample from a random variable
- Y (np.array) – A sample from a random variable
- k (int, optional) – Number of neighbors to use in estimation.
Returns: mi – The mutual information
Return type: float
References
[0] Kraskov, A., Stögbauer, H., & Grassberger, P. (2004). Estimating mutual information. Physical Review E, 69(6), 066138. https://doi.org/10.1103/PhysRevE.69.066138
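Example: a minimal sketch with a correlated bivariate Gaussian, for which the analytic mutual information is −0.5·log(1 − ρ²), roughly 0.511 nats at ρ = 0.8; the data is an illustrative assumption.

    import numpy as np
    from hyeenna.estimators import mutual_info

    # Correlated bivariate Gaussian with correlation rho = 0.8.
    rng = np.random.default_rng(0)
    rho = 0.8
    x = rng.normal(size=5000)
    y = rho * x + np.sqrt(1 - rho**2) * rng.normal(size=5000)

    print(mutual_info(x, y, k=5))  # roughly -0.5 * log(1 - rho**2)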
hyeenna.estimators.nearest_distances(X: numpy.array, Y: numpy.array = None, k: int = 5, metric='chebyshev') → list
Compute the distance from each point to its kth nearest neighbor.
hyeenna.estimators.nearest_distances_vec(X: numpy.array, Y: numpy.array = None, k: int = 5, metric='chebyshev') → numpy.array
Find the vector of distances to all k nearest neighbors.
hyeenna.estimators.transfer_entropy(X: numpy.array, Y: numpy.array, tau: int = 1, omega: int = 1, k: int = 1, l: int = 1, neighbors: int = 5, **kwargs) → float
Compute the transfer entropy from a source variable, X, to a target variable, Y.
Parameters: - X (np.array) – Source sample from a random variable X
- Y (np.array) – Target sample from a random variable Y
- tau (int (default: 1)) – Number of timestep lags for the source variable
- omega (int (default: 1)) – Number of timestep lags for the target variable conditioning
- k (int (default: 1)) – Width of window for the source variable.
- l (int (default: 1)) – Width of window for the target variable conditioning.
- neighbors (int (default: 5)) – Number of neighbors to use in estimation.
- **kwargs – Other arguments (undocumented, for internal usage)
Returns: transfer_entropy – Computed via conditional_mutual_info
Return type: float
References
[0] Schreiber, T. (2000). Measuring information transfer. Physical Review Letters, 85(2), 461–464. https://doi.org/10.1103/PhysRevLett.85.461
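Example: an illustrative sketch in which Y is driven by X with a one-step delay, so TE(X → Y) should be clearly positive while TE(Y → X) sits near zero; the data and coefficients are assumptions for demonstration.

    import numpy as np
    from hyeenna.estimators import transfer_entropy

    # Y is driven by X with a one-step delay.
    rng = np.random.default_rng(0)
    n = 5000
    x = rng.normal(size=n)
    y = np.zeros(n)
    for t in range(1, n):
        y[t] = 0.8 * x[t - 1] + 0.2 * rng.normal()

    print(transfer_entropy(x, y, tau=1, omega=1, k=1, l=1, neighbors=5))
    print(transfer_entropy(y, x, tau=1, omega=1, k=1, l=1, neighbors=5))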
Analysis
hyeenna.analysis.estimate_info_transfer_network(varlist: list, names: list, tau: int = 1, omega: int = 1, nu: int = 1, k: int = 1, l: int = 1, m: int = 1, condition: bool = True, nruns: int = 10, sample_size: int = 3000) → pandas.core.frame.DataFrame
Compute the pairwise transfer entropy for a list of given variables, resulting in an information transfer network.
Parameters: - varlist (list) – List of given variable data
- names (list) – List of names corresponding to the data given in varlist
- tau (int (default=1)) – Lag value for source variables
- omega (int (default=1)) – Lag value for conditioning target variable history
- nu (int (default=1)) – Lag value for conditioning source variable histories
- k (int (default=1)) – Window length for source variables (applied to the same variable as the tau parameter)
- l (int (default=1)) – Window length for target variable histories (applied to the same variable as the omega parameter)
- m (int (default=1)) – Window length for source conditioning variables (applied to the same variable as the nu parameter)
- condition (bool (default=True)) – Whether to condition on all variables or just the target variable history.
- nruns (int (default=10)) – Number of samples to compute for each connection. The median value is reported.
- sample_size (int (default=3000)) – Size of samples to take during estimation of transfer entropy.
Returns: df – Dataframe representing the information transfer network. Both rows and columns are populated with the given names.
Return type: pd.DataFrame
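Example: a minimal sketch on three toy series; b lags a by one step and c is independent, so the a → b cell of the returned network should dominate. The series are illustrative assumptions.

    import numpy as np
    from hyeenna.analysis import estimate_info_transfer_network

    rng = np.random.default_rng(0)
    n = 5000
    a = rng.normal(size=n)
    b = np.roll(a, 1) + 0.3 * rng.normal(size=n)  # b lags a by one step
    c = rng.normal(size=n)                        # independent series

    df = estimate_info_transfer_network(
        varlist=[a, b, c], names=["a", "b", "c"],
        tau=1, omega=1, nu=1, k=1, l=1, m=1,
        condition=True, nruns=10, sample_size=3000)
    print(df)  # rows and columns labeled with the given names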
hyeenna.analysis.estimate_timescales(X: numpy.ndarray, Y: numpy.ndarray, lag_list: list, window_list: list, sample_size: int = 5000) → pandas.core.frame.DataFrame
Compute the transfer entropy (TE) over a range of lag counts and window sizes.
Parameters: - X (np.array) – Source data
- Y (np.array) – Target data
- lag_list (list) – A list enumerating the lag counts to compute TE with
- window_list (list) – A list enumerating the window widths to compute TE with
- sample_size (int) – Number of samples to use when computing TE
Returns: out – A dataframe containing the computed transfer entropies for every combination of lag and window given in the input parameters
Return type: pd.DataFrame
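Example: an illustrative sketch where the target lags the source by three steps, so the computed TE should peak near lag 3; the data is an assumption for demonstration.

    import numpy as np
    from hyeenna.analysis import estimate_timescales

    rng = np.random.default_rng(0)
    n = 20000
    x = rng.normal(size=n)
    y = np.roll(x, 3) + 0.5 * rng.normal(size=n)  # true lag of 3 steps

    out = estimate_timescales(x, y, lag_list=[1, 2, 3, 4, 5],
                              window_list=[1, 2], sample_size=5000)
    print(out)  # TE should peak near lag 3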
hyeenna.analysis.estimator_stats(estimator: callable, data: dict, params: dict, nruns: int = 10, sample_size: int = 3000) → dict
Compute summary statistics for a given estimator over repeated runs on random subsamples of the data.
Parameters: - estimator (callable) – The estimator to compute statistics on. Suggested to be from the HYEENNA library.
- data (dict) – Input data to feed into the estimator
- params (dict) – Parameters to feed into the estimator
- nruns (int (default: 10)) – Number of times to run the estimator.
- sample_size (int (default 3000)) – Size of sample to draw from data to feed into the estimator
Returns: stats – A dictionary containing sample statistics along with the actual results from each run of the estimator.
Return type: dict
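Example: a hedged sketch. That the data dict keys match the estimator's argument names ("X", "Y") and that remaining keyword arguments go in params follows the parameter descriptions above but is an assumption, not confirmed by them.

    import numpy as np
    from hyeenna.estimators import mutual_info
    from hyeenna.analysis import estimator_stats

    rng = np.random.default_rng(0)
    x = rng.normal(size=10000)
    y = x + 0.5 * rng.normal(size=10000)

    # data holds the samples, params the remaining estimator arguments.
    stats = estimator_stats(estimator=mutual_info,
                            data={"X": x, "Y": y},
                            params={"k": 5},
                            nruns=10, sample_size=3000)
    print(stats)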
hyeenna.analysis.shuffle_test(estimator: callable, data: dict, params: dict, confidence: float = 0.99, nruns: int = 10, sample_size: int = 3000) → dict
Compute a one-tailed Z-test against a sample of shuffled surrogates.
Parameters: - estimator (callable) – The estimator to compute statistics on. Suggested to be from the HYEENNA library.
- data (dict) – Input data to feed into the estimator
- params (dict) – Parameters to feed into the estimator
- confidence (float (default: 0.99)) – Confidence level to conduct the test at.
- nruns (int (default: 10)) – Number of times to run the estimator.
- sample_size (int (default: 3000)) – Size of sample to draw from data to feed into the estimator
Returns: stats – A dictionary with statistics from the standard estimator_stats function along with statistics computed on the shuffled surrogates. Most important are the ‘test_value’ and ‘significant’ keys: the value the test is performed on, and whether the result was statistically significant at the given confidence level.
Return type: dict
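Example: a hedged sketch with the same data-dict assumption as the estimator_stats example above; a strongly dependent pair should come out significant against shuffled surrogates.

    import numpy as np
    from hyeenna.estimators import mutual_info
    from hyeenna.analysis import shuffle_test

    rng = np.random.default_rng(0)
    x = rng.normal(size=10000)
    y = x + 0.5 * rng.normal(size=10000)

    result = shuffle_test(estimator=mutual_info,
                          data={"X": x, "Y": y},
                          params={"k": 5},
                          confidence=0.99, nruns=10, sample_size=3000)
    print(result["test_value"], result["significant"])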
Plotting
class hyeenna.plot.NumpyEncoder(*, skipkeys=False, ensure_ascii=True, check_circular=True, allow_nan=True, sort_keys=False, indent=None, separators=None, default=None)
A json.JSONEncoder subclass whose default method handles NumPy types. Credit: https://stackoverflow.com/questions/26646362/numpy-array-is-not-json-serializable
default(obj)
Implement this method in a subclass such that it returns a serializable object for obj, or calls the base implementation (to raise a TypeError). For example, to support arbitrary iterators, you could implement default like this:
    def default(self, o):
        try:
            iterable = iter(o)
        except TypeError:
            pass
        else:
            return list(iterable)
        # Let the base class default method raise the TypeError
        return JSONEncoder.default(self, o)
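Example: a hedged usage sketch, assuming NumpyEncoder subclasses json.JSONEncoder (its constructor signature matches, and its default method falls back to JSONEncoder.default); the payload is illustrative.

    import json
    import numpy as np
    from hyeenna.plot import NumpyEncoder

    # NumPy arrays and scalars are not JSON-serializable by default;
    # passing the encoder through cls handles the conversion.
    payload = {"values": np.arange(3), "score": np.float64(0.5)}
    print(json.dumps(payload, cls=NumpyEncoder))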