computeRocCurve

Description

The computeRocCurve algorithm calculates and plots the ROC (Receiver Operating Characteristic) curve based on a set of test predictions and true classes. The ROC curve is a graphical representation that illustrates the diagnostic ability of a binary classifier as its decision threshold is varied. It plots the true positive rate (sensitivity) against the false positive rate (1 - specificity), allowing for the evaluation of the model's performance across different thresholds. This implementation uses scikit-learn's built-in functions to efficiently compute and plot the ROC curve, providing insights into the trade-off between sensitivity and specificity for the given predictions. Note that this algorithm is not applicable to SVM models.
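
For illustration, here is a minimal sketch of how such a curve can be computed and plotted with scikit-learn's roc_curve and auc functions. This is not the tool's actual source code; the arrays and plotting choices are hypothetical.

import numpy as np
import matplotlib.pyplot as plt
from sklearn.metrics import roc_curve, auc

# Hypothetical binary ground truth and positive-class scores.
true_classes = np.array([0, 0, 1, 1, 1, 0])
pred_scores = np.array([0.1, 0.4, 0.35, 0.8, 0.7, 0.2])

# fpr/tpr are computed at every decision threshold found in pred_scores.
fpr, tpr, thresholds = roc_curve(true_classes, pred_scores)
roc_auc = auc(fpr, tpr)

plt.plot(fpr, tpr, label=f"ROC curve (AUC = {roc_auc:.2f})")
plt.plot([0, 1], [0, 1], linestyle="--", label="chance level")
plt.xlabel("False Positive Rate")
plt.ylabel("True Positive Rate")
plt.legend()
plt.savefig("roc_curve.png")  # mirrors the default output_roc value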

Arguments list

The computeRocCurve algorithm works with both required and optional arguments. Each argument has specific properties:

  • Is required indicates whether the argument must be specified when calling the program.
  • Type specifies the argument datatype.
  • CLI argument syntax is the exact name to use if you are writing the argument along with the program call.
  • JSON identifier is the exact name to use if you are writing the argument inside a JSON configuration file.
  • Default value is the value used by the program if the argument is not specified. A default of None means either that the argument is not used during execution or that you must specify it yourself.

Show help

Display the parameters and other helpful information concerning program usage, then terminate the program.

Property Value
Is required No
Type None
CLI argument syntax -h, --help
JSON identifier N/A
Default value None

Warning

Every other specified argument will be ignored.


JSON configuration file

File containing the configuration for the algorithm in JSON format (see more about JSON configuration files). An illustrative example is sketched after the warning below.

Property Value
Is required No
Type String
CLI argument syntax --json-configuration-file
JSON identifier N/A
Default value None

Warning

If you use this argument, it must be the only one specified. No other argument can be specified with it.
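
For illustration, a configuration file for this algorithm could look like the following. The file name config.json is hypothetical; the keys are the JSON identifiers documented in the sections below, and the values mirror the usage example at the end of this page.

{
    "root_folder": "dimlp/datafiles",
    "test_class_file": "test_class.txt",
    "test_pred_file": "predTest.out",
    "positive_class_index": 1,
    "nb_classes": 2,
    "stats_file": "stats.txt",
    "output_roc": "roc_curve.png"
}

It would then be passed alone on the command line: ./computeRocCurve --json-configuration-file config.json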


Root folder path

Base path from which all other file-path arguments are resolved. Using it lets you write paths relative to this location instead of absolute or lengthy relative paths.

Property Value
Is required No
Type String
CLI argument syntax --root_folder
JSON identifier root_folder
Default value .

Test true classes file

File containing the "true classes" (expected predictions) from the test portion of the dataset used to train the model.

Property Value
Is required Yes
Type String
CLI argument syntax --test_class_file
JSON identifier test_class_file
Default value None

Test prediction file

File containing predictions on the test portion of the dataset.

Property Value
Is required Yes
Type String
CLI argument syntax --test_pred_file
JSON identifier test_pred_file
Default value None

Positive class index

Index of the positive class (indices start at 0). Takes values in the range [0, nb_classes-1]. A sketch of how this index might be applied follows the property table below.

Property Value
Is required Yes
Type Integer
CLI argument syntax --positive_class_index
JSON identifier positive_class_index
Default value None
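
When the prediction file holds one score per class, this index presumably selects which column is treated as the positive class for a one-vs-rest ROC. A minimal sketch of that selection, assuming hypothetical array names and scikit-learn:

import numpy as np
from sklearn.metrics import roc_curve, auc

positive_class_index = 1
# Hypothetical predictions of shape (n_samples, nb_classes) and true labels.
pred = np.array([[0.7, 0.2, 0.1],
                 [0.1, 0.6, 0.3],
                 [0.2, 0.5, 0.3],
                 [0.8, 0.1, 0.1]])
true_labels = np.array([0, 1, 2, 0])

# Binarize: the chosen class is positive, every other class is negative.
y_true = (true_labels == positive_class_index).astype(int)
y_score = pred[:, positive_class_index]

fpr, tpr, _ = roc_curve(y_true, y_score)
print(f"one-vs-rest AUC for class {positive_class_index}: {auc(fpr, tpr):.2f}")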

Number of classes

Number of classes in the dataset (should be equal to the number of outputs of the model). Takes values in the range [2,∞[.

Property Value
Is required Yes
Type Integer
CLI argument syntax --nb_classes
JSON identifier nb_classes
Default value None

Statistics output file

Name of the output file that will contain the AUC score; it can be the training statistics file.

Property Value
Is required No
Type String
CLI argument syntax --stats_file
JSON identifier stats_file
Default value None

Display parameter inputs

Whether the program should display the parameters it is using while running.

Property Value
Is required No
Type Boolean
CLI argument syntax --show_params
JSON identifier show_params
Default value True

ROC output file

Path to the file where the output ROC curve will be saved.

Property Value
Is required No
Type String
CLI argument syntax --output_roc
JSON identifier output_roc
Default value roc_curve.png

Estimator

Name of the estimator used.

Property Value
Is required No
Type String
CLI argument syntax --estimator
JSON identifier estimator
Default value None

Usage example

Example (Python)

from trainings import computeRocCurve

computeRocCurve(
"""--test_class_file test_class.txt
--test_pred_file predTest.out
--positive_class_index 1
--output_roc roc_curve.png
--stats_file stats.txt
--root_folder dimlp/datafiles
--nb_classes 2"""
)

Example (command line)

./computeRocCurve --test_class_file test_class.txt --test_pred_file predTest.out --positive_class_index 1 --output_roc roc_curve.png --stats_file stats.txt --root_folder ../dimlp/datafiles --nb_classes 2

Output interpretation


ROC curve

This file contains a ROC (Receiver Operating Characteristic) curve, which is used to evaluate the performance of the model during training. The ROC curve is a plot with the following components:

X-Axis (False Positive Rate)
This represents the proportion of negative samples that are incorrectly classified as positive. It measures the rate of false positives at various classification thresholds.
Y-Axis (True Positive Rate)
This represents the proportion of positive samples that are correctly classified as positive.
Curve
The curve itself shows the trade-off between the true positive rate and the false positive rate across different decision thresholds for the classifier. The curve starts at (0, 0) and moves towards (1, 1).
AUC (Area Under the Curve)
This value quantifies the overall performance of the model. It ranges from 0 to 1, with a value of 1 indicating perfect classification and a value of 0.5 indicating a model with no discriminative power. The higher the AUC, the better the model’s ability to distinguish between the positive and negative classes.

This ROC curve visually illustrates how well the model is performing by showing the balance between the true positive rate and the false positive rate, with the AUC providing a summary measure of the model's classification performance.


Statistics file

This file contains statistics of the model that were already computed beforehand. The AUC score computed on the test set is appended at the end of the file.
