Massive univariate linear model.

Residualization of Y data on a set of variables, possibly adjusted for other variables.

class mulm.residualizer.residualizer.Residualizer(data=None, formula_res=None, formula_full=None, contrast_res=None)

Residualization of Y data on a set of variables, possibly adjusted for other variables.

Example: Y is an (n, p) array of p dependent variables, and we want to residualize on "site" adjusted for "age + sex".

1) Use of a DataFrame and formulas:

1.1) Residualizer(data=df, formula_res="site", formula_full="site + age + sex")

1.2) Z = get_design_mat(data) returns the design matrix as a numpy (n, k) array. Rows can then be selected on both Y and the design matrix (cross-validation, etc.).

2) Use of raw arrays, if you choose to write your design matrix manually. In this case provide the residualization mask within your full model (the contrast_res argument in the constructor signature). For example, Residualizer(contrast_res=[False, True, False, False]) will fit the whole model and residualize on the second regressor, i.e., site.

3) fit(Y, X) fits the model Y = b0 + b1 site + b2 age + b3 sex + eps, then learns and stores b1, b2, b3.

transform(Y, X) residualizes Y on X, i.e., returns Y - b1 site.
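The fit and transform steps described above can be sketched with plain NumPy. This is an illustration of the idea under stated assumptions, not the library's implementation: the helper names fit_full_model and residualize_on_mask, the simulated covariates, and the use of np.linalg.lstsq are all hypothetical choices made for the example.

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 200, 3

# Hypothetical covariates: binary site, continuous age, binary sex.
site = rng.integers(0, 2, size=n).astype(float)
age = rng.normal(40.0, 10.0, size=n)
sex = rng.integers(0, 2, size=n).astype(float)

# Design matrix of the full model: [intercept, site, age, sex].
X = np.column_stack([np.ones(n), site, age, sex])

# Simulated Y with site, age and sex effects plus a little noise.
true_coef = np.array([
    [1.0, 2.0, 3.0],    # intercept
    [5.0, -4.0, 0.0],   # site
    [0.1, 0.2, 0.3],    # age
    [0.5, 0.0, -0.5],   # sex
])
Y = X @ true_coef + rng.normal(0.0, 0.01, size=(n, p))

# Residualize on the second regressor only (site).
mask = np.array([False, True, False, False])


def fit_full_model(Y, X):
    """Least-squares fit of the full model for every column of Y."""
    coef, *_ = np.linalg.lstsq(X, Y, rcond=None)
    return coef  # shape (k, p)


def residualize_on_mask(Y, X, coef, mask):
    """Subtract only the contribution of the masked regressors."""
    return Y - X[:, mask] @ coef[mask, :]


coef = fit_full_model(Y, X)
Yres = residualize_on_mask(Y, X, coef, mask)

# Refit on the residualized data: the site coefficients vanish,
# while the intercept, age and sex coefficients are unchanged.
coef_after = fit_full_model(Yres, X)
```

Fitting the full model before subtracting matters: the site coefficient is estimated while age and sex are in the model, so only the site contribution that is not explained by the other covariates is removed.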

fit(Y, X)

Fit parameters of p linear models where each Y is regressed on X.

Parameters

Y: array (n, p)

Dependent variables

X: array (n, k)

Design matrix of independent variables

fit_transform(Y, X)

Fit parameters of p linear models where each Y is regressed on X. Residualize Y on X.

Parameters

Y: array (n, p)

Dependent variables

X: array (n, k)

Design matrix of independent variables

Returns

Yres: array (n, p)

Residualized Y data.

get_design_mat(data)

transform(Y, X)

Residualize Y on X.

Parameters

Y: array (n, p)

Dependent variables

X: array (n, k)

Design matrix of independent variables

Returns

Yres: array (n, p)

Residualized Y data.
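Because the coefficients are learned in fit and only applied in transform, the two steps can use different rows, which is what cross-validation requires. The sketch below reproduces that split with plain NumPy rather than with the class itself; the simulated data, the train/test split, and the variable names are assumptions made for illustration.

```python
import numpy as np

rng = np.random.default_rng(42)
n, p = 120, 2

# Hypothetical design matrix: [intercept, site, age].
site = rng.integers(0, 2, size=n).astype(float)
age = rng.normal(50.0, 8.0, size=n)
X = np.column_stack([np.ones(n), site, age])
mask = np.array([False, True, False])  # residualize on site only

# Simulated data with a strong site effect and a small age effect.
Y = (3.0 * site[:, None] + 0.05 * age[:, None]
     + rng.normal(0.0, 0.1, size=(n, p)))

# The same row selection is applied to Y and to the design matrix.
train = np.arange(n) < 80
test = ~train

# "fit": estimate the coefficients on the training rows only.
coef, *_ = np.linalg.lstsq(X[train], Y[train], rcond=None)

# "transform": remove the site contribution from the held-out rows
# using the coefficients learned on the training rows.
Yres_test = Y[test] - X[test][:, mask] @ coef[mask, :]

# Mean difference between the two sites, before and after.
site_test = site[test] == 1
gap_before = Y[test][site_test].mean(axis=0) - Y[test][~site_test].mean(axis=0)
gap_after = (Yres_test[site_test].mean(axis=0)
             - Yres_test[~site_test].mean(axis=0))
```

Estimating the coefficients on the training rows only keeps information from the held-out rows out of the residualization step.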

class mulm.residualizer.residualizer.ResidualizerEstimator(residualizer)

Wrap Residualizer into an estimator compatible with sklearn API.

Note that, to be consistent with the sklearn API, here X contains the input variables and Z is the design matrix used for residualization.

fit(X, y=None)

fit_transform(X, y=None)

pack(Z, X)

Pack (concat) Z (design matrix) and X to match scikit-learn pipelines.

Parameters

Z: array (n, k)

The design matrix

X: array (n, p)

the input data for scikit-learn: fit(X, y) or transform(X)

Returns

(n, (k+p)) array: [design_matrix, X]

transform(X)

upack(ZX)

Unpack Z (design matrix) and X from ZX.

Parameters

ZX: array (n, (k+p))

The packed array [Z, X]

Returns

Z (design matrix), X
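The packing convention can be illustrated with a short NumPy round trip. The functions pack_sketch and unpack_sketch are hypothetical stand-ins written for this example, not the class methods; in particular, passing k explicitly is an assumption, since the documented upack(ZX) takes only the packed array.

```python
import numpy as np

rng = np.random.default_rng(1)
n, k, p = 6, 3, 4

Z = rng.normal(size=(n, k))  # design matrix used for residualization
X = rng.normal(size=(n, p))  # input data for the scikit-learn estimator


def pack_sketch(Z, X):
    """Concatenate [Z, X] column-wise into a single (n, k + p) array."""
    return np.hstack([Z, X])


def unpack_sketch(ZX, k):
    """Split a packed array into Z (first k columns) and X (the rest)."""
    return ZX[:, :k], ZX[:, k:]


ZX = pack_sketch(Z, X)
Z_back, X_back = unpack_sketch(ZX, k)
```

Carrying Z inside the same array as X means that any row selection made by a scikit-learn pipeline or cross-validation splitter is applied to both at once.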

mulm.residualizer.residualizer.residualize(Y, data, formula_res, formula_full=None)

Helper function. See Residualizer.
