ML_tools package

ML_tools.classifiers module

ML_tools.classifiers.RFPipeline_PCA(df1, df2, n_iter, cv)

Creates a pipeline that performs Random Forest classification on the data with Principal Component Analysis (PCA). The input data is split into training and test sets, then a randomized search with cross-validation is performed to find the best hyperparameters for the model.

Parameters:

df1 (pandas.DataFrame) – Dataframe containing the features.
df2 (pandas.DataFrame) – Dataframe containing the labels.
n_iter (int) – Number of parameter settings that are sampled.
cv (int) – Number of cross-validation folds to use.

Returns:

pipeline_PCA – A fitted pipeline (includes PCA, hyperparameter optimization using RandomizedSearchCV and a Random Forest Classifier model).

Return type:

sklearn.pipeline.Pipeline

See also

PCA: https://scikit-learn.org/stable/modules/generated/sklearn.decomposition.PCA.html
RandomizedSearchCV: https://scikit-learn.org/stable/modules/generated/sklearn.model_selection.RandomizedSearchCV.html
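Example

The following is a minimal sketch of the kind of pipeline described above, not the actual implementation of RFPipeline_PCA. The synthetic data, the number of PCA components, the parameter distributions and the step names ("pca", "search") are all assumptions made for illustration.

```python
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import RandomizedSearchCV, train_test_split
from sklearn.pipeline import Pipeline

# Synthetic stand-in for the feature and label dataframes (assumption).
X, y = make_classification(
    n_samples=120, n_features=10, n_informative=5, random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# Hypothetical search space; the real function may sample different parameters.
param_distributions = {
    "n_estimators": [10, 20, 50],
    "max_depth": [2, 4, None],
}

# n_iter and cv play the same role as the n_iter and cv arguments documented above.
search = RandomizedSearchCV(
    RandomForestClassifier(random_state=0),
    param_distributions=param_distributions,
    n_iter=3,
    cv=3,
    random_state=0,
)

# PCA first, then the randomized search wrapping the Random Forest.
pipeline_pca = Pipeline([
    ("pca", PCA(n_components=5)),
    ("search", search),
])
pipeline_pca.fit(X_train, y_train)

best_params = pipeline_pca.named_steps["search"].best_params_
test_accuracy = pipeline_pca.score(X_test, y_test)
print(best_params, test_accuracy)
```

Placing the search as the last pipeline step means PCA is fitted once on the training split, and the cross-validation folds only refit the Random Forest.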

ML_tools.classifiers.RFPipeline_noPCA(df1, df2, n_iter, cv)

Creates a pipeline that performs Random Forest classification on the data without Principal Component Analysis (PCA). The input data is split into training and test sets, then a randomized search with cross-validation is performed to find the best hyperparameters for the model.

Parameters:

df1 (pandas.DataFrame) – Dataframe containing the features.
df2 (pandas.DataFrame) – Dataframe containing the labels.
n_iter (int) – Number of parameter settings that are sampled.
cv (int) – Number of cross-validation folds to use.

Returns:

pipeline_simple – A fitted pipeline (includes hyperparameter optimization using RandomizedSearchCV and a Random Forest Classifier model).

Return type:

sklearn.pipeline.Pipeline

See also

RandomizedSearchCV: https://scikit-learn.org/stable/modules/generated/sklearn.model_selection.RandomizedSearchCV.html

ML_tools.classifiers.SVM_feature_reduction(df1, df2)

Performs SVM classification on the data with feature reduction (see RFECV under See also). The input data is split into training and test sets, then a grid search with cross-validation is performed to find the best hyperparameters for the model.

Parameters:

df1 (pandas.DataFrame) – Dataframe containing the features.
df2 (pandas.DataFrame) – Dataframe containing the labels.

Returns:

grid – A fitted grid search object with the best parameters for the SVM model using the selected features.

Return type:

sklearn.model_selection.GridSearchCV

See also

RFECV: https://scikit-learn.org/stable/modules/generated/sklearn.feature_selection.RFECV.html
GridSearchCV: https://scikit-learn.org/stable/modules/generated/sklearn.model_selection.GridSearchCV.html
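Example

As a rough illustration of combining RFECV with GridSearchCV, the sketch below first selects features with a linear SVM and then tunes an SVM on the reduced feature set. It is not the code of SVM_feature_reduction: the synthetic data, the linear kernel and the grid over C are assumptions.

```python
from sklearn.datasets import make_classification
from sklearn.feature_selection import RFECV
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.svm import SVC

# Synthetic stand-in for the feature and label dataframes (assumption).
X, y = make_classification(
    n_samples=120, n_features=8, n_informative=3, n_redundant=0, random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# Recursive feature elimination with cross-validation.
# A linear kernel is used because RFECV needs coefficients to rank features.
selector = RFECV(SVC(kernel="linear"), step=1, cv=3)
selector.fit(X_train, y_train)
X_train_sel = selector.transform(X_train)
X_test_sel = selector.transform(X_test)

# Grid search over a hypothetical range of the regularization parameter C.
grid = GridSearchCV(SVC(kernel="linear"), param_grid={"C": [0.1, 1, 10]}, cv=3)
grid.fit(X_train_sel, y_train)

n_selected = selector.n_features_
test_accuracy = grid.score(X_test_sel, y_test)
print(n_selected, grid.best_params_, test_accuracy)
```

The selector is fitted on the training split only, so the test split does not influence which features are kept.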

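ML_tools.classifiers.SVM_simple(df1, df2, ker: str)

Performs SVM classification on the data with the specified kernel. A grid search is performed to find the best hyperparameters for the model.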

Parameters:

df1 (pandas.DataFrame) – Dataframe containing the features.
df2 (pandas.DataFrame) – Dataframe containing the labels.
ker (str) – Kernel type.

Returns:

grid – A fitted grid search object with the best parameters for the SVM model.

Return type:

sklearn.model_selection.GridSearchCV

See also

GridSearchCV: https://scikit-learn.org/stable/modules/generated/sklearn.model_selection.GridSearchCV.html

ML_tools.feature_extractor module

ML_tools.feature_extractor.feature_extractor(image_filepaths, masks_filepaths)

Uses the MATLAB Engine API to run the feature_extractor.m function. From the outputs of that function, it defines 2 dataframes containing the extracted features and a series containing the labels of the respective subjects.

Parameters:

image_filepaths (list) – Paths to the diffusion parameter maps.
masks_filepaths (list) – Paths to the segmentations in diffusion space.

Returns:

df_mean (pandas.DataFrame) – Mean of pixel values for each region (columns) and each subject (rows).
df_std (pandas.DataFrame) – Standard deviation of pixel values for each region (columns) and each subject (rows).
group (pandas.Series) – Subject labels.

ML_tools.feature_extractor.feature_extractor_par(image_filepaths, masks_filepaths)

Uses the MATLAB Engine API to run the feature_extractor_par.m function (a parallelized version of feature_extractor.m). From the outputs of that function, it defines 2 dataframes containing the extracted features and a series containing the labels of the respective subjects.

Parameters:

image_filepaths (list) – Paths to the diffusion parameter maps.
masks_filepaths (list) – Paths to the segmentations in diffusion space.

Returns:

df_mean (pandas.DataFrame) – Mean of pixel values for each region (columns) and each subject (rows).
df_std (pandas.DataFrame) – Standard deviation of pixel values for each region (columns) and each subject (rows).
group (pandas.Series) – Subject labels.

ML_tools.reading module

ML_tools.reading.data_path(dir, subdir)

Creates a list collecting absolute paths to the files contained in a sub-folder of a parent folder.

Parameters:

dir (str) – Name of the parent folder.
subdir (str) – Name of the sub-folder.

Returns:

filepaths – Paths to the files contained in the specified sub-folder.

Return type:

list
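Example

The behaviour described above can be sketched with the standard library alone. The helper name list_filepaths, the sorting of the result and the file names used in the demo are hypothetical; the real data_path may order or filter files differently.

```python
import os
import tempfile


def list_filepaths(parent_dir, sub_dir):
    """Return sorted absolute paths of the regular files in parent_dir/sub_dir."""
    folder = os.path.abspath(os.path.join(parent_dir, sub_dir))
    return sorted(
        os.path.join(folder, name)
        for name in os.listdir(folder)
        if os.path.isfile(os.path.join(folder, name))
    )


# Small demo on a temporary folder so the sketch runs anywhere.
with tempfile.TemporaryDirectory() as root:
    os.makedirs(os.path.join(root, "maps"))
    for name in ("b.nii", "a.nii"):
        open(os.path.join(root, "maps", name), "w").close()

    demo_paths = list_filepaths(root, "maps")
    demo_names = [os.path.basename(path) for path in demo_paths]
    demo_all_absolute = all(os.path.isabs(path) for path in demo_paths)

print(demo_names)
```

Sorting the paths keeps the order of images and masks aligned when two folders use matching file names, which matters if the lists are later passed together to a feature extractor.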

ML_tools.score_and_error module

ML_tools.score_and_error.performance_scores(y_test, y_predicted, y_probability, confidence_int=0.683)

Computes and displays various performance scores (including accuracy, precision, recall and AUC) with related errors for binary classification models.

Parameters:

y_test (numpy.ndarray) – True labels of test set.
y_predicted (numpy.ndarray) – Predicted labels of test set.
y_probability (numpy.ndarray) – Predicted label probabilities of test set.
confidence_int (float, optional) – Confidence level used for error estimation. Default value is 0.683 (approximately 1 sigma).

Returns:

scores – Dictionary containing various performance scores (with their associated errors), including Accuracy, Precision, Recall and AUC.

Return type:

dict

See also

accuracy_score: https://scikit-learn.org/stable/modules/generated/sklearn.metrics.accuracy_score.html
precision_score: https://scikit-learn.org/stable/modules/generated/sklearn.metrics.precision_score.html
recall_score: https://scikit-learn.org/stable/modules/generated/sklearn.metrics.recall_score.html
roc_curve: https://scikit-learn.org/stable/modules/generated/sklearn.metrics.roc_curve.html
auc: https://scikit-learn.org/stable/modules/generated/sklearn.metrics.auc.html
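Example

This page does not state how the errors on the scores are computed. One common choice, shown here purely as an assumption, is the normal approximation to the binomial error on a proportion, scaled by the z-value of the requested confidence level. The helper names proportion_with_error and binary_scores are hypothetical, and AUC is left out of this sketch.

```python
import math
from statistics import NormalDist


def proportion_with_error(successes, total, confidence_level=0.683):
    """Return (estimate, error) for a proportion using the normal approximation."""
    if total <= 0:
        raise ValueError("total must be positive")
    p = successes / total
    # Two-sided z-value: about 1.0 for a confidence level of 0.683.
    z = NormalDist().inv_cdf(0.5 + confidence_level / 2)
    return p, z * math.sqrt(p * (1 - p) / total)


def binary_scores(y_true, y_pred, confidence_level=0.683):
    """Accuracy, precision and recall with errors for labels coded as 0 and 1."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return {
        "Accuracy": proportion_with_error(tp + tn, len(pairs), confidence_level),
        "Precision": proportion_with_error(tp, tp + fp, confidence_level),
        "Recall": proportion_with_error(tp, tp + fn, confidence_level),
    }


# Toy labels: 4 true positives, 4 true negatives, 1 false positive, 1 false negative.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 1, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 1, 1, 0]
scores = binary_scores(y_true, y_pred)
for name, (value, error) in scores.items():
    print(f"{name}: {value:.3f} +/- {error:.3f}")
```

With the default level of 0.683 the error is close to one standard deviation of the estimated proportion, which matches the "approximately 1 sigma" note on confidence_int.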
