FidexGloStats¶
Description¶
The FidexGloStats program computes a variety of statistics based on the global ruleset generated by FidexGloRules. These metrics provide insights into the fidelity, accuracy, and explainability of the global ruleset in comparison to the model's predictions, as well as additional performance statistics related to classification outcomes. The results are output as text and/or saved to a file for further analysis.
Arguments list¶
The FidexGloStats algorithm works with both required and optional arguments. Each argument has specific properties:
- Is required indicates whether the argument must be specified when calling the program.
- Type specifies the argument datatype.
- CLI argument syntax is the exact name to use if you are writing the argument along with the program call.
- JSON identifier is the exact name to use if you are writing the argument inside a JSON configuration file.
- Default value is the value that will be used by the program if the argument is not specified. If None, it can mean either that the argument is not used at all during the algorithm execution, or that you have to specify it yourself.
Show help¶
Displays parameters and other helpful information concerning the program usage, then terminates the program when done.
Property | Value |
---|---|
Is required | No |
Type | None |
CLI argument syntax | -h, --help or None |
JSON identifier | N/A |
Default value | None |
Warning
If you use this argument, it must be the only one specified. No other argument can be specified with it.
JSON configuration file¶
File containing the configuration for the algorithm in JSON format (see more about JSON configuration files).
Property | Value |
---|---|
Is required | No |
Type | String |
CLI argument syntax | --json-configuration-file |
JSON identifier | N/A |
Default value | None |
Warning
If you use this argument, it must be the only one specified. No other argument can be specified with it.
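As an illustration, a minimal configuration file could look like the sketch below; the file names, attribute count and class count are hypothetical, while the keys are the JSON identifiers documented on this page:

```json
{
    "root_folder": "data",
    "global_rules_file": "globalRules.rls",
    "test_data_file": "testData.txt",
    "test_pred_file": "testPred.out",
    "test_class_file": "testClass.txt",
    "nb_attributes": 16,
    "nb_classes": 2,
    "stats_file": "stats.txt"
}
```

Such a file is then passed on its own, using the JSON configuration file argument described above.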
Root folder path¶
Base path from which all other file path arguments are resolved. Using it allows you to work with paths relative to this location and avoid writing absolute paths or lengthy relative paths.
Property | Value |
---|---|
Is required | No |
Type | String |
CLI argument syntax | --root_folder |
JSON identifier | root_folder |
Default value | . |
Global Rules file¶
Path to the file containing the global rules obtained with the fidexGloRules algorithm.
Property | Value |
---|---|
Is required | Yes |
Type | String |
CLI argument syntax | --global_rules_file |
JSON identifier | global_rules_file |
Default value | None |
Global Rules output file¶
Name of the output file generated by the algorithm, containing the global rules along with their statistics computed on the test set.
Property | Value |
---|---|
Is required | No |
Type | String |
CLI argument syntax | --global_rules_outfile |
JSON identifier | global_rules_outfile |
Default value | None |
Info
The filename's extension can be set to .json, allowing the program to generate a JSON-structured rule output file.
Number of attributes¶
Number of attributes in the dataset (should be equal to the number of inputs of the model). Takes values in the range [1,∞[.
Property | Value |
---|---|
Is required | Yes |
Type | Integer |
CLI argument syntax | --nb_attributes |
JSON identifier | nb_attributes |
Default value | None |
Number of classes¶
Number of classes in the dataset (should be equal to the number of outputs of the model). Takes values in the range [2,∞[.
Property | Value |
---|---|
Is required | Yes |
Type | Integer |
CLI argument syntax | --nb_classes |
JSON identifier | nb_classes |
Default value | None |
Test data file¶
File containing the testing portion of the dataset used to evaluate the model. It can also contain the test "true classes" (see Test true classes file).
Property | Value |
---|---|
Is required | Yes |
Type | String |
CLI argument syntax | --test_data_file |
JSON identifier | test_data_file |
Default value | None |
Test predictions file¶
File containing the predictions from the testing portion of the dataset.
Property | Value |
---|---|
Is required | Yes |
Type | String |
CLI argument syntax | --test_pred_file |
JSON identifier | test_pred_file |
Default value | None |
Test true classes file¶
File containing "true classes" (expected predictions), from the testing portion of the dataset.
Property | Value |
---|---|
Is required | No (see warning below) |
Type | String |
CLI argument syntax | --test_class_file |
JSON identifier | test_class_file |
Default value | None |
Warning
This argument is not required if, and only if, the true classes are already specified inside the test data file.
Attributes file¶
File containing the names of the attributes (inputs) and classes (outputs).
Property | Value |
---|---|
Is required | No |
Type | String |
CLI argument syntax | --attributes_file |
JSON identifier | attributes_file |
Default value | None |
Logs output file¶
Name of the file containing all the feedback produced by the algorithm during its execution. If not specified, the feedback is displayed in the terminal.
Property | Value |
---|---|
Is required | No |
Type | String |
CLI argument syntax | --console_file |
JSON identifier | console_file |
Default value | None |
Positive class index¶
Index of the positive class used to compute true/false positive/negative rates; the index starts at 0. If it is specified in the rules file, it has to be the same value. Takes values in the range [0, nb_classes-1].
Property | Value |
---|---|
Is required | No |
Type | Integer |
CLI argument syntax | --positive_class_index |
JSON identifier | positive_class_index |
Default value | None |
Statistics output file¶
Name of the output file that will contain all computed statistics.
Property | Value |
---|---|
Is required | No |
Type | String |
CLI argument syntax | --stats_file |
JSON identifier | stats_file |
Default value | None |
Usage example¶
Example
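A possible call is sketched below; the executable name, folder layout and dataset dimensions (16 attributes, 2 classes) are hypothetical, and only arguments documented on this page are used:

```
fidexGloStats --root_folder data --global_rules_file globalRules.rls --nb_attributes 16 --nb_classes 2 --test_data_file testData.txt --test_pred_file testPred.out --test_class_file testClass.txt --global_rules_outfile globalRulesWithStats.rls --stats_file stats.txt
```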
Output interpretation¶
Statistics file¶
This file contains the statistics computed on the global ruleset generated by fidexGloRules. At the top of the file are the total number of rules, the mean number of samples covered by each rule, and the mean number of antecedents per rule.
The statistics computed for the ruleset are listed below (a simplified sketch contrasting fidelity and accuracy follows this list):
Global rule fidelity rate
- Measures how accurately the rules mimic the behavior of the model. This rate reflects the proportion of samples where the rules' predictions match the model's predictions.
Global rule accuracy
- The accuracy of the ruleset in predicting the correct class of test samples, regardless of the model's predictions. A sample counts as correctly classified when a correct fidel activated rule exists, when no rule is activated but the model is correct, or when the activated rules are not fidel but all agree on the correct class.
Explainability rate
- The rate at which the ruleset provides an explanation for a sample by activating one or more rules that either predict the correct class or all agree on the same class.
Default rule rate
- The proportion of samples for which no activated rule is found. In such cases, we choose the model's decision to generate a local rule.
Mean number of correct (fidel) activated rules per sample
- The average number of activated rules per sample that are in agreement with the model's predictions.
Mean number of wrong (not fidel) activated rules per sample
- The average number of activated rules per sample that do not match the model's predictions.
Model test accuracy
- The overall accuracy of the model on the test dataset, calculated independently from the ruleset.
Model test accuracy when rules and model agree
- The accuracy of the model when the model's predictions match the predictions made by the activated rules.
Model test accuracy when activated rules and model agree
- The accuracy of the model for samples where at least one rule is activated and agrees with the model's prediction.
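To make the distinction between fidelity and accuracy concrete, here is a deliberately simplified Python sketch; the per-sample predictions are made up, and it ignores the program's actual aggregation logic (rule activation, agreement between rules, default rule):

```python
import numpy as np

# Hypothetical per-sample predictions on a tiny test set.
rule_preds  = np.array([0, 1, 1, 0, 1])  # class chosen by the activated rules
model_preds = np.array([0, 1, 0, 0, 1])  # class predicted by the model
true_class  = np.array([0, 1, 1, 1, 1])  # expected ("true") classes

# Fidelity: how often the rules mimic the model, regardless of correctness.
fidelity = np.mean(rule_preds == model_preds)

# Rule accuracy: how often the rules predict the true class, regardless of the model.
rule_accuracy = np.mean(rule_preds == true_class)

# Model accuracy: how often the model itself predicts the true class.
model_accuracy = np.mean(model_preds == true_class)

print(f"fidelity={fidelity:.2f}, rule accuracy={rule_accuracy:.2f}, model accuracy={model_accuracy:.2f}")
```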
If the positive class index parameter is used, the following additional statistics are computed with both the model's decisions and the rules' decisions (a short sketch deriving the rates from these counts follows this list):
True positive test samples
- The number of test samples where the model (or rules) correctly identify the positive class.
False positive test samples
- The number of test samples where the model (or rules) incorrectly classify a sample as positive when it is not.
True negative test samples
- The number of test samples where the model (or rules) correctly identify the negative class.
False negative test samples
- The number of test samples where the model (or rules) fail to classify a positive sample correctly.
False positive rate
- The proportion of negative samples that were incorrectly classified as positive by the model or rules.
False negative rate
- The proportion of positive samples that were incorrectly classified as negative by the model or rules.
Precision
- The proportion of correctly identified positive samples out of all samples classified as positive (\( \frac{\text{true positives}}{\text{true positives} + \text{false positives}} \)).
Recall
- The proportion of correctly identified positive samples out of all actual positive samples (\( \frac{\text{true positives}}{\text{true positives} + \text{false negatives}} \)).
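For reference, the last four statistics above (the two rates, precision and recall) follow directly from the four confusion counts; the sketch below uses made-up counts:

```python
# Hypothetical confusion counts for the chosen positive class.
tp, fp, tn, fn = 40, 5, 50, 5

false_positive_rate = fp / (fp + tn)  # share of negatives wrongly classified as positive
false_negative_rate = fn / (fn + tp)  # share of positives wrongly classified as negative
precision = tp / (tp + fp)            # correct positives among predicted positives
recall = tp / (tp + fn)               # correct positives among actual positives

print(false_positive_rate, false_negative_rate, precision, recall)
```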
Global rules output file¶
This file contains all the global rules computed by FidexGloRules. It begins with global statistics about the ruleset, followed by individual rules, ordered by their covering size, and their associated performance metrics.
Global Statistics:
Number of rules
- Indicates the total number of rules in the ruleset.
Mean sample covering number per rule
- The average number of training samples covered by each rule.
Mean number of antecedents per rule
- Represents the average number of conditions (antecedents) in each rule.
Decision threshold
- Indicates whether a decision threshold was used for prediction and specifies the threshold if applicable.
Explanation of Each Rule:
Each rule consists of conditions on various attributes, followed by the predicted class, and is accompanied by several performance metrics. Let's break down this rule as an example:
Rule 1: X0>=0.69326 X1>=0.036824 -> class 0
Train Covering size : 112 --- Test Covering size : 59
Train Fidelity : 1 --- Test Fidelity : 1
Train Accuracy : 0.964286 --- Test Accuracy : 0.966102
Train Confidence : 0.971227 --- Test Confidence : 0.965577
X0, X1
- These represent the variables from the dataset.
>=0.69326, >=0.036824
- The thresholds that the variable values must meet for the rule to be activated.
-> class 0
- The class predicted by the rule when the conditions are met. Here, the rule predicts class 0.
Performance Metrics Associated with the Rule:
Each statistic is reported for both the training and the testing set.
Train Covering size
- Indicates the number of training samples that are covered by the rule. For Rule 1, it covers 112 samples.
Train Fidelity
- Measures how well the rule aligns with the model’s predictions. A fidelity of 1 means that the rule exactly matches the model’s predictions for all the samples it covers.
Train Accuracy
- The accuracy of the rule in correctly classifying the samples it covers. In the case of Rule 1, 96.43% of the covered samples are correctly classified.
Train Confidence
- Represents the average confidence score of the model’s predictions for the samples covered by the rule. It is computed based on the prediction scores of the covered samples, indicating the model’s confidence in its classifications. For Rule 1, the confidence is 97.12%.
Each subsequent rule follows the same structure.
If the .json extension was specified for the global rules output file, the file contains the same global rules computed by FidexGloRules in JSON format. It begins with an indication of whether a decision threshold was used for prediction and specifies the threshold if applicable. Each individual rule then follows with its associated performance metrics, ordered by covering size. Let's break down this rule as an example:
{
    "test": {
        "accuracy": 0.5,
        "antecedents": [
            {
                "attribute": 24,
                "inequality": true,
                "value": 0.51123815959005
            },
            {
                "attribute": 6,
                "inequality": true,
                "value": 1.9756892136185598
            }
        ],
        "confidence": 0.07883515,
        "coveredSamples": [
            8,
            24,
            26,
            74,
            127,
            176
        ],
        "coveringSize": 6,
        "fidelity": 0.0,
        "outputClass": 5
    },
    "train": {
        "accuracy": 0.3333333333333333,
        "antecedents": [
            {
                "attribute": 24,
                "inequality": true,
                "value": 0.51123815959005
            },
            {
                "attribute": 6,
                "inequality": true,
                "value": 1.9756892136185598
            }
        ],
        "confidence": 0.5935276666666667,
        "coveredSamples": [
            151,
            376,
            936
        ],
        "coveringSize": 3,
        "fidelity": 1.0,
        "outputClass": 5
    }
}
Each statistic is reported for both the training and the testing set.
accuracy
- The accuracy of the rule on the samples it covers. For this rule, half of the covered test samples and a third of the covered train samples are correctly classified.
antecedents
- Each antecedent of the rule is composed of an attribute (a variable from the dataset), an inequality, and a value. A true inequality represents >=, while a false inequality represents <. The value is the threshold that the attribute's value must meet for the rule to be activated. In this rule, the first antecedent specifies that X24 >= 0.51123815959005.
confidence
- Represents the average confidence score of the model’s predictions for the samples covered by the rule. For this rule, the confidence is about 0.08 on the test set and about 0.59 on the train set.
coveredSamples
- Indicates the samples covered by the rule. This rule covers train samples 151, 376 and 936, and test samples 8, 24, 26, 74, 127 and 176.
coveringSize
- Indicates the number of samples that are covered by the rule. This rule covers 3 train samples and 6 test samples.
fidelity
- Measures how well the rule aligns with the model’s predictions. A fidelity of 1 means that the rule exactly matches the model’s predictions for all the samples it covers.
outputClass
- Indicates the class predicted by the rule. For this rule, the predicted class is 5.
Each subsequent rule follows the same structure.
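Because the JSON output is meant to be machine-readable, the following hedged Python sketch shows how a single rule object with the structure above could be applied to a sample; the file path, sample values and attribute count are hypothetical, and only the fields documented here are used:

```python
import json

def rule_matches(rule_side: dict, sample: list) -> bool:
    """Return True if the sample satisfies every antecedent of the given rule side.

    rule_side is the "train" or "test" object of a rule as shown above;
    both carry the same antecedents.
    """
    for antecedent in rule_side["antecedents"]:
        attribute = antecedent["attribute"]  # index of the variable in the sample
        threshold = antecedent["value"]      # threshold the variable is compared against
        if antecedent["inequality"]:
            # A true inequality means the attribute value must be >= the threshold.
            if sample[attribute] < threshold:
                return False
        else:
            # A false inequality means the attribute value must be < the threshold.
            if sample[attribute] >= threshold:
                return False
    return True

# Hypothetical usage: rule.json holds one rule object shaped like the example above.
with open("rule.json") as f:
    rule = json.load(f)

sample = [0.5] * 30  # hypothetical sample with 30 attribute values
if rule_matches(rule["train"], sample):
    print("rule is activated, predicted class:", rule["train"]["outputClass"])
else:
    print("rule is not activated for this sample")
```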