ML_tools package
ML_tools.classifiers module
- ML_tools.classifiers.RFPipeline_PCA(df1, df2, n_iter, cv)
Creates a pipeline that performs Random Forest classification on the data with Principal Component Analysis. The input data is split into training and test sets, then a Randomized Search (with cross-validation) is performed to find the best hyperparameters for the model.
- Parameters:
df1 (pandas.DataFrame) – Dataframe containing the features.
df2 (pandas.DataFrame) – Dataframe containing the labels.
n_iter (int) – Number of parameter settings that are sampled.
cv (int) – Number of cross-validation folds to use.
- Returns:
pipeline_PCA – A fitted pipeline (includes PCA, hyperparameter optimization using RandomizedSearchCV and a Random Forest Classifier model).
- Return type:
sklearn.pipeline.Pipeline
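As a rough illustration of the structure described above (train/test split, PCA, then a randomized search over Random Forest hyperparameters inside a single pipeline), a minimal sketch follows. It is not the package's source: the name `rf_pipeline_pca_sketch`, the `n_components` and `random_state` arguments, the 80/20 split and the parameter ranges are all assumptions made for the example.

```python
import numpy as np
import pandas as pd
from sklearn.decomposition import PCA
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import RandomizedSearchCV, train_test_split
from sklearn.pipeline import Pipeline


def rf_pipeline_pca_sketch(df1, df2, n_iter, cv, n_components=2, random_state=0):
    """Hypothetical stand-in for RFPipeline_PCA (assumed structure, not the real code)."""
    # Flatten the label dataframe to a 1-D array, as scikit-learn expects.
    y = df2.to_numpy().ravel()
    # Assumed 80/20 stratified split; only the training part is used for fitting.
    X_train, _, y_train, _ = train_test_split(
        df1, y, test_size=0.2, stratify=y, random_state=random_state
    )
    # Assumed search space; the ranges used by the package are not documented here.
    param_distributions = {
        "n_estimators": [10, 25, 50],
        "max_depth": [None, 3, 5],
        "min_samples_leaf": [1, 2, 4],
    }
    search = RandomizedSearchCV(
        RandomForestClassifier(random_state=random_state),
        param_distributions=param_distributions,
        n_iter=n_iter,
        cv=cv,
        random_state=random_state,
    )
    # PCA runs first; the randomized search is the final step of the pipeline.
    pipeline = Pipeline([("pca", PCA(n_components=n_components)), ("search", search)])
    pipeline.fit(X_train, y_train)
    return pipeline
```

With this layout the selected hyperparameters can be read from `pipeline.named_steps["search"].best_params_`. Dropping the `"pca"` step gives the analogous sketch for the variant without PCA.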
- ML_tools.classifiers.RFPipeline_noPCA(df1, df2, n_iter, cv)
Creates a pipeline that performs Random Forest classification on the data without Principal Component Analysis. The input data is split into training and test sets, then a Randomized Search (with cross-validation) is performed to find the best hyperparameters for the model.
- Parameters:
df1 (pandas.DataFrame) – Dataframe containing the features.
df2 (pandas.DataFrame) – Dataframe containing the labels.
n_iter (int) – Number of parameter settings that are sampled.
cv (int) – Number of cross-validation folds to use.
- Returns:
pipeline_simple – A fitted pipeline (includes hyperparameter optimization using RandomizedSearchCV and a Random Forest Classifier model).
- Return type:
sklearn.pipeline.Pipeline
- ML_tools.classifiers.SVM_feature_reduction(df1, df2)
Performs SVM classification on the data. The input data is split into training and test sets, then a Grid Search (with cross-validation) is performed to find the best hyperparameters for the model. Feature reduction is implemented in this function.
- Parameters:
df1 (pandas.DataFrame) – Dataframe containing the features.
df2 (pandas.DataFrame) – Dataframe containing the labels.
- Returns:
grid – A fitted grid search object with the best parameters for the SVM model using the selected features.
- Return type:
sklearn.model_selection.GridSearchCV
- ML_tools.classifiers.SVM_simple(df1, df2, ker: str)
Performs SVM classification on the data. The input data is split into training and test sets, then a Grid Search (with cross-validation) is performed to find the best hyperparameters for the model. Feature reduction is not implemented in this function.
- Parameters:
df1 (pandas.DataFrame) – Dataframe containing the features.
df2 (pandas.DataFrame) – Dataframe containing the labels.
ker (str) – Kernel type.
- Returns:
grid – A fitted grid search object with the best parameters for the SVM model.
- Return type:
sklearn.model_selection.GridSearchCV
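The grid-search workflow described for `SVM_simple` can be sketched as follows. This is an assumption-laden illustration, not the package's code: the name `svm_simple_sketch`, the `cv` and `random_state` arguments, the 80/20 split and the grid over `C` are hypothetical.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.svm import SVC


def svm_simple_sketch(df1, df2, ker, cv=3, random_state=0):
    """Hypothetical stand-in for SVM_simple (assumed structure, not the real code)."""
    # Flatten the label dataframe to a 1-D array.
    y = df2.to_numpy().ravel()
    # Assumed 80/20 stratified split; only the training part is used for fitting.
    X_train, _, y_train, _ = train_test_split(
        df1, y, test_size=0.2, stratify=y, random_state=random_state
    )
    # Assumed grid; the values searched by the package are not documented here.
    param_grid = {"C": [0.1, 1.0, 10.0]}
    grid = GridSearchCV(SVC(kernel=ker), param_grid=param_grid, cv=cv)
    grid.fit(X_train, y_train)
    return grid
```

The returned object exposes the selected settings through `grid.best_params_` and the refitted model through `grid.best_estimator_`.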
ML_tools.feature_extractor module
- ML_tools.feature_extractor.feature_extractor(image_filepaths, masks_filepaths)
Uses the MATLAB Engine API to run the feature_extractor.m function. From the outputs of that function, it defines 2 dataframes containing the extracted features and a series containing the labels of the respective subjects.
- Parameters:
image_filepaths (list) – Paths to the diffusion parameter maps.
masks_filepaths (list) – Paths to the diffusion space segmentations.
- Returns:
df_mean (pandas.DataFrame) – Mean of pixel values for each region (columns) and each subject (rows).
df_std (pandas.DataFrame) – Standard deviation of pixel values for each region (columns) and each subject (rows).
group (pandas.Series) – Subject labels.
- ML_tools.feature_extractor.feature_extractor_par(image_filepaths, masks_filepaths)
Uses the MATLAB Engine API to run the feature_extractor_par.m function (parallelized version of feature_extractor.m). From the outputs of that function, it defines 2 dataframes containing the extracted features and a series containing the labels of the respective subjects.
- Parameters:
image_filepaths (list) – Paths to the diffusion parameter maps.
masks_filepaths (list) – Paths to the diffusion space segmentations.
- Returns:
df_mean (pandas.DataFrame) – Mean of pixel values for each region (columns) and each subject (rows).
df_std (pandas.DataFrame) – Standard deviation of pixel values for each region (columns) and each subject (rows).
group (pandas.Series) – Subject labels.
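The MATLAB functions themselves are not reproduced here. To make the shape of `df_mean` and `df_std` concrete, the following NumPy/pandas sketch computes a per-region mean and standard deviation on in-memory arrays. Everything in it is an assumption for illustration: the name `region_stats_sketch`, the integer region labels, the `region_<label>` column names and the use of the population standard deviation may all differ from what feature_extractor.m does.

```python
import numpy as np
import pandas as pd


def region_stats_sketch(images, masks, region_labels):
    """Per-region mean/std tables: one row per subject, one column per region."""
    mean_rows, std_rows = [], []
    for image, mask in zip(images, masks):
        # Select the pixels of each labelled region and summarise them.
        mean_rows.append([image[mask == label].mean() for label in region_labels])
        std_rows.append([image[mask == label].std() for label in region_labels])
    columns = [f"region_{label}" for label in region_labels]
    df_mean = pd.DataFrame(mean_rows, columns=columns)
    df_std = pd.DataFrame(std_rows, columns=columns)
    return df_mean, df_std
```

Each row of the two dataframes corresponds to one subject, matching the layout described in the Returns entries above.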
ML_tools.reading module
- ML_tools.reading.data_path(dir, subdir)
Creates a list collecting absolute paths to the files contained in a sub-folder of a parent folder.
- Parameters:
dir (str) – Name of the parent folder.
subdir (str) – Name of the sub-folder.
- Returns:
filepaths – Paths to the files contained in the specified sub-folder.
- Return type:
list
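A minimal sketch of the behaviour described for `data_path`, using only the standard library. The name `data_path_sketch` is hypothetical, the first parameter is called `parent_dir` to avoid shadowing the built-in `dir`, and the sorting and the files-only filter are assumptions rather than documented behaviour.

```python
import os


def data_path_sketch(parent_dir, subdir):
    """Return absolute paths of the files inside parent_dir/subdir."""
    folder = os.path.abspath(os.path.join(parent_dir, subdir))
    # Keep regular files only and sort them for a reproducible order.
    return sorted(
        os.path.join(folder, name)
        for name in os.listdir(folder)
        if os.path.isfile(os.path.join(folder, name))
    )
```

A deterministic order matters when two such lists (for example images and masks) are later paired element by element.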
ML_tools.score_and_error module
- ML_tools.score_and_error.performance_scores(y_test, y_predicted, y_probability, confidence_int=0.683)
Computes and displays various performance scores (including accuracy, precision, recall and AUC) with related errors for binary classification models.
- Parameters:
y_test (numpy.ndarray) – True labels of test set.
y_predicted (numpy.ndarray) – Predicted labels of test set.
y_probability (numpy.ndarray) – Predicted label probabilities of test set.
confidence_int (float, optional) – Confidence level used for error estimation. Default value is 0.683 (approximately 1 sigma).
- Returns:
scores – Dictionary containing the performance scores and their associated errors, including Accuracy, Precision, Recall and AUC.
- Return type:
dict
See also
accuracy_score: https://scikit-learn.org/stable/modules/generated/sklearn.metrics.accuracy_score.html
precision_score: https://scikit-learn.org/stable/modules/generated/sklearn.metrics.precision_score.html
recall_score: https://scikit-learn.org/stable/modules/generated/sklearn.metrics.recall_score.html
roc_curve: https://scikit-learn.org/stable/modules/generated/sklearn.metrics.roc_curve.html
auc: https://scikit-learn.org/stable/modules/generated/sklearn.metrics.auc.html
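One possible way to combine the scikit-learn metrics listed above into a dictionary of scores with errors is sketched below. How `performance_scores` estimates its errors is not stated in this page, so the error model here is an assumption: a normal-approximation binomial error for accuracy, precision and recall, and the Hanley-McNeil approximation for the AUC, both scaled to the requested confidence level. The names `performance_scores_sketch`, `binomial_error` and `auc_error` are hypothetical, labels are assumed to be 0/1, and `y_probability` is assumed to hold the predicted probability of the positive class.

```python
from statistics import NormalDist

import numpy as np
from sklearn.metrics import accuracy_score, auc, precision_score, recall_score, roc_curve


def binomial_error(p, n, confidence_level):
    """Normal-approximation error of a proportion p estimated from n samples."""
    z = NormalDist().inv_cdf(0.5 + confidence_level / 2.0)
    return z * np.sqrt(p * (1.0 - p) / n)


def auc_error(a, n_pos, n_neg, confidence_level):
    """Hanley-McNeil approximation of the AUC standard error, scaled by z."""
    z = NormalDist().inv_cdf(0.5 + confidence_level / 2.0)
    q1 = a / (2.0 - a)
    q2 = 2.0 * a**2 / (1.0 + a)
    var = (a * (1 - a) + (n_pos - 1) * (q1 - a**2) + (n_neg - 1) * (q2 - a**2)) / (n_pos * n_neg)
    return z * np.sqrt(var)


def performance_scores_sketch(y_test, y_predicted, y_probability, confidence_int=0.683):
    """Return {score name: (value, error)} for a binary classifier (assumed layout)."""
    y_test = np.asarray(y_test)
    y_predicted = np.asarray(y_predicted)
    n_pos = int(np.sum(y_test == 1))            # actual positives (recall denominator)
    n_neg = int(np.sum(y_test == 0))
    n_pred_pos = int(np.sum(y_predicted == 1))  # predicted positives (precision denominator)

    acc = accuracy_score(y_test, y_predicted)
    prec = precision_score(y_test, y_predicted)
    rec = recall_score(y_test, y_predicted)
    fpr, tpr, _ = roc_curve(y_test, y_probability)
    roc_auc = auc(fpr, tpr)

    return {
        "Accuracy": (acc, binomial_error(acc, len(y_test), confidence_int)),
        "Precision": (prec, binomial_error(prec, n_pred_pos, confidence_int)),
        "Recall": (rec, binomial_error(rec, n_pos, confidence_int)),
        "AUC": (roc_auc, auc_error(roc_auc, n_pos, n_neg, confidence_int)),
    }
```

The sketch does not guard against empty classes or a model that predicts no positives; a production version would need to handle those cases before dividing.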