Massive univariate linear model.

Linear models for massive univariate statistics.

class mulm.models.MUOLS(Y, X)

Mass-univariate linear modeling based on Ordinary Least Squares (OLS). Given two arrays X (n_samples, p) and Y (n_samples, q), fit q independent linear models, i.e., for each column y of Y fit: lm(y ~ X).
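As a rough illustration of what fitting q independent models means, the NumPy sketch below computes all coefficient vectors with a single pseudo-inverse and checks them against a column-by-column fit. It is not the code of MUOLS; the variable names and the random data are assumptions made for the example.

```python
import numpy as np

rng = np.random.default_rng(0)
n_samples, p, q = 50, 3, 4
X = rng.standard_normal((n_samples, p))
X[:, -1] = 1.0  # intercept column
Y = rng.standard_normal((n_samples, q))

# One matrix product fits all q models at once: beta has shape (p, q)
beta = np.linalg.pinv(X) @ Y

# Reference: fit each column of Y separately with least squares
beta_loop = np.column_stack(
    [np.linalg.lstsq(X, Y[:, j], rcond=None)[0] for j in range(q)]
)

print(beta.shape)                    # (3, 4)
print(np.allclose(beta, beta_loop))  # True
```

Column j of beta holds the coefficients of the model lm(Y[:, j] ~ X).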

f_test(contrast, pval=False)

Compute F-statistics (F-scores and p-values associated with the contrast). The code has been cloned from the SPM MATLAB implementation.

Parameters

contrasts: array (q, ) or list of arrays or array 2D.

Single contrast (array) or list of contrasts or array of contrasts. The k contrasts to be tested.

pval: boolean

Whether to compute p-values (default False).

two_tailed: boolean

one-tailed test or a two-tailed test (default True)

Returns

tstats (k, p) array, pvals (k, p) array, df (k,) array
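The F-statistic of a contrast can be sketched with plain NumPy as follows. This is a textbook general-linear-model formulation, assuming the contrast is a (k, p) matrix over the columns of X; the helper name f_stat is hypothetical and this is not the code of f_test.

```python
import numpy as np

def f_stat(X, Y, C):
    """F-statistic of the contrast matrix C (k, p) for each column of Y."""
    C = np.atleast_2d(C)
    n = X.shape[0]
    k = int(np.linalg.matrix_rank(C))
    XtX_inv = np.linalg.pinv(X.T @ X)
    beta = XtX_inv @ X.T @ Y                       # (p, q)
    df_resid = int(n - np.linalg.matrix_rank(X))
    sigma2 = ((Y - X @ beta) ** 2).sum(axis=0) / df_resid  # (q,)
    Cb = C @ beta                                  # (k, q)
    M_inv = np.linalg.pinv(C @ XtX_inv @ C.T)      # (k, k)
    # Quadratic form (C beta)' M^-1 (C beta), computed for every column of Y
    num = np.einsum("iq,ij,jq->q", Cb, M_inv, Cb) / k
    return num / sigma2, (k, df_resid)

rng = np.random.default_rng(0)
X = rng.standard_normal((40, 3))
X[:, -1] = 1.0                     # intercept
Y = rng.standard_normal((40, 5))
C = np.eye(3)[:2]                  # jointly test the two non-intercept regressors
F, (df_num, df_den) = f_stat(X, Y, C)
print(F.shape, df_num, df_den)     # (5,) 2 37
```

With this contrast the result coincides with the usual comparison of the full model against an intercept-only model.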

fit(block=False, max_elements=134217728)

Fit q independent linear models, i.e., for each column y of Y fit: lm(y ~ X).

Parameters

block : boolean

Use block=True for huge matrices Y. Operations are then performed block by block to optimize time and memory.

max_elements : int

Block dimension, in number of elements (the default 2**27 corresponds to 1 GB of 8-byte floats).

Returns

self
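One way to picture block-wise fitting is to split the columns of Y into chunks that each hold at most max_elements values. The sketch below does this with NumPy; how MUOLS actually partitions the data is not stated here, so the chunking rule and the helper name fit_by_blocks are assumptions.

```python
import numpy as np

def fit_by_blocks(X, Y, max_elements=2 ** 27):
    """Least-squares coefficients computed over column blocks of Y."""
    n_samples, q = Y.shape
    # Number of Y columns per block so that a block holds <= max_elements values
    block_cols = max(1, max_elements // n_samples)
    pinv = np.linalg.pinv(X)                 # (p, n_samples), computed once
    beta = np.empty((X.shape[1], q))
    for start in range(0, q, block_cols):
        stop = min(start + block_cols, q)
        beta[:, start:stop] = pinv @ Y[:, start:stop]
    return beta

rng = np.random.default_rng(0)
X = rng.standard_normal((20, 3))
Y = rng.standard_normal((20, 7))
# max_elements=40 gives blocks of 2 columns here (40 // 20)
beta = fit_by_blocks(X, Y, max_elements=40)
print(np.allclose(beta, np.linalg.pinv(X) @ Y))  # True
```

Because the q models are independent, processing Y in blocks gives the same coefficients as a single pass.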

predict(X)

Predict Y given a new design matrix X.

Parameters

X : numpy array (n_samples, p)

Design matrix of new predictors.

Returns

(n_samples, q) array of predicted values (X beta)
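The prediction is the product X beta. A minimal NumPy sketch, assuming the coefficients are stored as a (p, q) array (the names beta and X_new are illustrative, not attributes of MUOLS):

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.standard_normal((30, 2))
X[:, -1] = 1.0                        # intercept
Y = rng.standard_normal((30, 3))
beta = np.linalg.pinv(X) @ Y          # (p, q) fitted coefficients

# Two new samples described by the same p columns as the training design
X_new = np.array([[0.5, 1.0], [-1.0, 1.0]])
Y_pred = X_new @ beta
print(Y_pred.shape)                   # (2, 3)
```

In this sketch each column of Y_pred corresponds to one column of Y.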

stats_f_coefficients(X, Y, contrast, pval=False)

t_test(contrasts, pval=False, two_tailed=True)

Compute T-statistics (t-scores and p-values associated with the contrasts). The code has been cloned from the SPM MATLAB implementation.

Parameters

contrasts: array (p, ) or list of arrays or 2D array (k, p).

Single contrast (array), list of contrasts, or 2D array of contrasts. The k contrasts to be tested.

pval: boolean

Whether to compute p-values (default False).

two_tailed: boolean

If True, perform a two-tailed test; otherwise perform a one-tailed test (default True).

Returns

tstats (k, q) array, pvals (k, q) array, df (k,) array
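For orientation, the t-statistic of a contrast c is c·beta divided by its standard error. The NumPy sketch below applies this to k contrasts and all columns of Y at once. It is a textbook formulation, not the code of t_test, and the helper name t_stats is hypothetical.

```python
import numpy as np

def t_stats(X, Y, contrasts):
    """t-statistic of each contrast (k, p) for each column of Y: shape (k, q)."""
    C = np.atleast_2d(contrasts)
    n = X.shape[0]
    XtX_inv = np.linalg.pinv(X.T @ X)
    beta = XtX_inv @ X.T @ Y                           # (p, q)
    df = int(n - np.linalg.matrix_rank(X))
    sigma2 = ((Y - X @ beta) ** 2).sum(axis=0) / df    # (q,)
    # Variance of c @ beta is sigma2 * c (X'X)^-1 c'
    c_var = np.einsum("kp,pr,kr->k", C, XtX_inv, C)    # (k,)
    t = (C @ beta) / np.sqrt(np.outer(c_var, sigma2))
    return t, df

rng = np.random.default_rng(42)
X = rng.standard_normal((100, 5))
X[:, -1] = 1.0                                         # intercept
Y = rng.standard_normal((100, 102))
Y[:, :2] += X @ np.array([1, 0, 0.5, 0, 2])[:, None]   # two informative columns
t, df = t_stats(X, Y, np.eye(5)[:4])                   # skip the intercept
print(t.shape, df)                                     # (4, 102) 95
```

Two-tailed p-values could then be obtained from the Student t survival function, for instance 2 * scipy.stats.t.sf(np.abs(t), df) if SciPy is available.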

t_test_maxT(contrasts, nperms=1000, two_tailed=True, **kwargs)

Correct for multiple comparisons using the Westfall and Young (1993) procedure, also known as maxT.

It is based on a permutation testing procedure. This is the procedure used by FSL (https://fsl.fmrib.ox.ac.uk/).

It should be used when the test statistics, and hence the unadjusted p-values, are dependent. This is the case when groups of dependent variables (in Y) tend to have highly correlated measures. Westfall and Young (1993) proposed adjusted p-values for less conservative multiple testing procedures which take into account the dependence structure among test statistics.

References:

- Anderson M. Winkler “Statistical analysis of areal quantities in the brain through permutation tests” Ph.D 2017.
- Dudoit et al. “Multiple Hypothesis Testing in Microarray Experiments”, Statist. Sci. 2003

Parameters

contrasts: array (p, ) or list of arrays or 2D array (k, p).

Single contrast (array), list of contrasts, or 2D array of contrasts. The k contrasts to be tested.

nperms: int

Number of permutations (default 1000).

two_tailed: boolean

If True, perform a two-tailed test; otherwise perform a one-tailed test (default True).

Returns

tstats (k, q) array, pvals (k, q) array corrected for multiple comparisons, df (k,) array.

Examples

>>> import numpy as np
>>> import mulm
>>> np.random.seed(42)
>>> # n_samples, number of features that depend on X, number of pure-noise features
>>> n_samples, n_info, n_noise = 100, 2, 100
>>> beta = np.array([1, 0, 0.5, 0, 2])[:, np.newaxis]
>>> X = np.random.randn(n_samples, 5) # Design matrix
>>> X[:, -1] = 1 # Intercept
>>> Y = np.random.randn(n_samples, n_info + n_noise)
>>> Y[:, :n_info] += np.dot(X, beta) # the n_info features depend on X
>>> contrasts = np.identity(X.shape[1])[:4] # don't test the intercept
>>> mod = mulm.MUOLS(Y, X).fit()
>>> tvals, pvals, df = mod.t_test(contrasts, two_tailed=True)
>>> print(pvals.shape)
(4, 102)
>>> print("Nb of uncorrected p-values <5%:", np.sum(pvals < 0.05))
Nb of uncorrected p-values <5%: 18
>>> tvals, pvals_corrmaxT, df = mod.t_test_maxT(contrasts, two_tailed=True)
>>> print("Nb of corrected pvalues <5%:", np.sum(pvals_corrmaxT < 0.05))
Nb of corrected pvalues <5%: 4
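The idea behind maxT can be sketched in a few lines of NumPy: for each permutation, keep only the largest |t| over all columns of Y, then compare every observed |t| with that null distribution. The sketch handles a single contrast, permutes the rows of Y, and ignores nuisance covariates, which is a simplification; the helper names abs_t and maxT_pvalues are hypothetical and this is not the code of t_test_maxT.

```python
import numpy as np

def abs_t(X, Y, c):
    """|t| of a single contrast c (p,) for every column of Y."""
    XtX_inv = np.linalg.pinv(X.T @ X)
    beta = XtX_inv @ X.T @ Y
    df = X.shape[0] - np.linalg.matrix_rank(X)
    sigma2 = ((Y - X @ beta) ** 2).sum(axis=0) / df
    return np.abs(c @ beta) / np.sqrt(sigma2 * (c @ XtX_inv @ c))

def maxT_pvalues(X, Y, c, nperms=1000, seed=0):
    """Single-step maxT adjusted p-values (two-tailed), permuting rows of Y."""
    rng = np.random.default_rng(seed)
    t_obs = abs_t(X, Y, c)
    # Null distribution of the maximum |t| over all columns of Y
    max_null = np.array([
        abs_t(X, Y[rng.permutation(len(Y))], c).max() for _ in range(nperms)
    ])
    # Adjusted p-value: fraction of permutations whose max |t| reaches t_obs
    return (max_null[:, None] >= t_obs[None, :]).mean(axis=0)

rng = np.random.default_rng(42)
X = rng.standard_normal((100, 5))
X[:, -1] = 1.0                                         # intercept
Y = rng.standard_normal((100, 102))
Y[:, :2] += X @ np.array([1, 0, 0.5, 0, 2])[:, None]   # two informative columns
p_adj = maxT_pvalues(X, Y, np.array([1.0, 0, 0, 0, 0]), nperms=200)
print(p_adj.shape)                                     # (102,)
```

Because every column is compared with the same distribution of maxima, correlated columns are not penalized as heavily as with a Bonferroni-style correction.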

t_test_minP(contrasts, nperms=10000, two_tailed=True, **kwargs)

Correct for multiple comparisons using the minP procedure.

References:

- Dudoit et al. “Multiple Hypothesis Testing in Microarray Experiments”, Statist. Sci. 2003

Parameters

contrasts: array (p, ) or list of arrays or 2D array (k, p).

Single contrast (array), list of contrasts, or 2D array of contrasts. The k contrasts to be tested.

nperms: int

Number of permutations (default 10000).

two_tailed: boolean

If True, perform a two-tailed test; otherwise perform a one-tailed test (default True).

Returns

tstats (k, q) array, pvals (k, q) array corrected for multiple comparisons, df (k,) array.

Examples

>>> import numpy as np
>>> import mulm
>>> np.random.seed(42)
>>> # n_samples, number of features that depend on X, number of pure-noise features
>>> n_samples, n_info, n_noise = 100, 2, 100
>>> beta = np.array([1, 0, 0.5, 0, 2])[:, np.newaxis]
>>> X = np.random.randn(n_samples, 5) # Design matrix
>>> X[:, -1] = 1 # Intercept
>>> Y = np.random.randn(n_samples, n_info + n_noise)
>>> Y[:, :n_info] += np.dot(X, beta) # the n_info features depend on X
>>> contrasts = np.identity(X.shape[1])[:4] # don't test the intercept
>>> mod = mulm.MUOLS(Y, X).fit()
>>> tvals, pvals, df = mod.t_test(contrasts, two_tailed=True)
>>> print(pvals.shape)
(4, 102)
>>> print("Nb of uncorrected p-values <5%:", np.sum(pvals < 0.05))
Nb of uncorrected p-values <5%: 18
>>> tvals, pval_corrminp, df = mod.t_test_minP(contrasts, two_tailed=True)
>>> print("Nb of corrected pvalues <5%:", np.sum(pval_corrminp < 0.05))
Nb of corrected pvalues <5%: 4
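A single-step minP correction can be sketched along the same lines as maxT, replacing the maximum statistic by the minimum unadjusted p-value. The version below derives the unadjusted p-values from the permutations themselves and handles a single contrast; whether t_test_minP uses permutation-based or parametric unadjusted p-values is not stated here, so this choice and the helper names are assumptions.

```python
import numpy as np

def abs_t(X, Y, c):
    """|t| of a single contrast c (p,) for every column of Y."""
    XtX_inv = np.linalg.pinv(X.T @ X)
    beta = XtX_inv @ X.T @ Y
    df = X.shape[0] - np.linalg.matrix_rank(X)
    sigma2 = ((Y - X @ beta) ** 2).sum(axis=0) / df
    return np.abs(c @ beta) / np.sqrt(sigma2 * (c @ XtX_inv @ c))

def minP_pvalues(X, Y, c, nperms=1000, seed=0):
    """Single-step minP adjusted p-values (two-tailed), permuting rows of Y."""
    rng = np.random.default_rng(seed)
    # Row 0 holds the observed statistics, rows 1..nperms the permuted ones
    T = np.vstack([abs_t(X, Y, c)] + [
        abs_t(X, Y[rng.permutation(len(Y))], c) for _ in range(nperms)
    ])
    # Unadjusted permutation p-value of every entry within its own column
    T_sorted = np.sort(T, axis=0)
    P = np.empty_like(T)
    for j in range(T.shape[1]):
        n_ge = T.shape[0] - np.searchsorted(T_sorted[:, j], T[:, j], side="left")
        P[:, j] = n_ge / T.shape[0]
    # Adjusted p-value: how often the smallest p-value over columns is <= observed
    min_p = P.min(axis=1)
    return (min_p[:, None] <= P[0][None, :]).mean(axis=0)

rng = np.random.default_rng(42)
X = rng.standard_normal((60, 3))
X[:, -1] = 1.0                                   # intercept
Y = rng.standard_normal((60, 10))
Y[:, :2] += X @ np.array([2, 0, 1])[:, None]     # two informative columns
p_adj = minP_pvalues(X, Y, np.array([1.0, 0, 0]), nperms=500)
print(p_adj.shape)                               # (10,)
```

In this variant the unadjusted p-values cannot go below 1 / (nperms + 1), so resolving small adjusted p-values tends to require many more permutations than there are columns in Y.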

class mulm.models.MUPairwiseCorr(X, Y)

Mass-univariate pairwise correlations. Given two arrays X (n_samples x p) and Y (n_samples x q), fit p x q independent linear models. Prediction and stats return (p x q) arrays.

Examples

>>> import numpy as np
>>> from mulm import MUPairwiseCorr
>>> X = np.random.randn(10, 5)
>>> Y = np.random.randn(10, 3)
>>> corr = MUPairwiseCorr(X, Y)
>>> corr.fit()
<mulm.models.MUPairwiseCorr instance at 0x30da878>
>>> f, p = corr.stats_f()
>>> print(f.shape)
(5, 3)
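The p x q statistics can be pictured as one matrix product between the standardized columns of X and Y. The NumPy sketch below computes Pearson correlations and the matching simple-regression F-statistics; it is not the code of MUPairwiseCorr, and the helper name pairwise_corr_f is hypothetical.

```python
import numpy as np

def pairwise_corr_f(X, Y):
    """Pearson r and F-statistic for every (column of X, column of Y) pair."""
    n = X.shape[0]
    Xs = (X - X.mean(axis=0)) / X.std(axis=0)
    Ys = (Y - Y.mean(axis=0)) / Y.std(axis=0)
    r = Xs.T @ Ys / n                       # (p, q) correlation matrix
    # F-test of the simple regression y ~ x with intercept: df = (1, n - 2)
    f = r ** 2 / (1 - r ** 2) * (n - 2)
    return r, f

rng = np.random.default_rng(0)
X = rng.standard_normal((10, 5))
Y = rng.standard_normal((10, 3))
r, f = pairwise_corr_f(X, Y)
print(f.shape)                              # (5, 3)
```

Entry (i, j) relates column i of X to column j of Y.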

fit()

predict(X)

stats_f(pval=True)

Parameters

pval: boolean

Whether to compute p-values (default True).

Returns

fstats (p, q) array, pvals (p, q) array, df (k,) array
