
FidexGloStats

Description

The FidexGloStats program computes a variety of statistics based on the global ruleset generated by FidexGloRules. These metrics provide insights into the fidelity, accuracy, and explainability of the global ruleset in comparison to the model's predictions, as well as additional performance statistics related to classification outcomes. The results are output as text and/or saved to a file for further analysis.

Arguments list

The FidexGloStats algorithm works with both required and optional arguments. Each argument has specific properties:

  • Is required indicates whether the argument must be specified when calling the program.
  • Type specifies the argument datatype.
  • CLI argument syntax is the exact name to use when passing the argument on the command line along with the program call.
  • JSON identifier is the exact name to use when writing the argument inside a JSON configuration file.
  • Default value is the value used by the program if the argument is not specified. If None, either the argument is not used at all during execution, or you have to specify it yourself.

Show help

Displays the parameters and other helpful information about the program usage, then terminates.

Property Value
Is required No
Type None
CLI argument syntax -h, --help or None
JSON identifier N/A
Default value None

Warning

If you use this argument, it must be the only one specified. No other argument can be specified with it.


JSON configuration file

File containing the configuration for the algorithm in JSON format (see more about JSON configuration files).

Property Value
Is required No
Type String
CLI argument syntax --json-configuration-file
JSON identifier N/A
Default value None

Warning

If you use this argument, it must be the only one specified. No other argument can be specified with it.
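As an illustration, such a configuration file could be produced from Python. This is a minimal sketch: the keys are the JSON identifiers listed on this page and the values mirror the usage example further down, while the helper `write_config` and the file name `config_fidexGloStats.json` are hypothetical names chosen for the example.

```python
import json
from pathlib import Path

# Keys are the JSON identifiers documented on this page.
# Values are illustrative and mirror the usage example.
config = {
    "root_folder": "dimlp/datafiles",
    "global_rules_file": "globalRules.rls",
    "test_data_file": "test_data.txt",
    "test_pred_file": "predTest.out",
    "test_class_file": "test_class.txt",
    "nb_attributes": 16,  # Integer arguments are written as JSON numbers
    "nb_classes": 2,
    "stats_file": "stats.txt",
}


def write_config(path, cfg):
    """Serialize the configuration dictionary to a JSON file and return its path."""
    path = Path(path)
    path.write_text(json.dumps(cfg, indent=4), encoding="utf-8")
    return path


if __name__ == "__main__":
    # Hypothetical file name, to be passed with the CLI argument described above.
    write_config("config_fidexGloStats.json", config)
```

The resulting file is then the only argument given to the program, as stated in the warning above.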


Root folder path

Default path from where all the other arguments related to file paths are going to be based. Using this allows you to work with paths relative to this location and avoid writing absolute paths or lengthy relative paths.

Property Value
Is required No
Type String
CLI argument syntax --root_folder
JSON identifier root_folder
Default value .

Global Rules file

Path to the file containing the global rules obtained with the fidexGloRules algorithm.

Property Value
Is required Yes
Type String
CLI argument syntax --global_rules_file
JSON identifier global_rules_file
Default value None

Global Rules output file

Name of the file, generated by the algorithm, that contains the global rules together with their statistics computed on the test set.

Property Value
Is required No
Type String
CLI argument syntax --global_rules_outfile
JSON identifier global_rules_outfile
Default value None

Info

The filename's extension can be specified as .json, in which case the program generates a JSON-structured rule output file.


Number of attributes

Number of attributes in the dataset (should be equal to the number of inputs of the model). Takes values in the range [1,∞[.

Property Value
Is required Yes
Type Integer
CLI argument syntax --nb_attributes
JSON identifier nb_attributes
Default value None

Number of classes

Number of classes in the dataset (should be equal to the number of outputs of the model). Takes values in the range [2,∞[.

Property Value
Is required Yes
Type Integer
CLI argument syntax --nb_classes
JSON identifier nb_classes
Default value None

Test data file

File containing the testing portion of the dataset. It can also contain the test "true classes" (see Test true classes file).

Property Value
Is required Yes
Type String
CLI argument syntax --test_data_file
JSON identifier test_data_file
Default value None

Test predictions file

File containing the predictions from the testing portion of the dataset.

Property Value
Is required Yes
Type String
CLI argument syntax --test_pred_file
JSON identifier test_pred_file
Default value None

Test true classes file

File containing "true classes" (expected predictions), from the testing portion of the dataset.

Property Value
Is required No (see the warning below)
Type String
CLI argument syntax --test_class_file
JSON identifier test_class_file
Default value None

Warning

This argument is not required if, and only if, the true classes are already specified inside the test data file.


Attributes file

File containing attributes (inputs) and classes (outputs) names.

Property Value
Is required No
Type String
CLI argument syntax --attributes_file
JSON identifier attributes_file
Default value None

Logs output file

Name of the file in which all feedback produced by the algorithm during its execution is written. If not specified, the feedback is displayed in the terminal.

Property Value
Is required No
Type String
CLI argument syntax --console_file
JSON identifier console_file
Default value None

Positive class index

Index of the positive class, used to compute true/false positive/negative rates. Indexing starts at 0. If a positive class index is specified in the rules file, this argument must have the same value. Takes values in the range [0,nb_classes-1].

Property Value
Is required No
Type Integer
CLI argument syntax --positive_class_index
JSON identifier positive_class_index
Default value None

Statistics output file

Name of the output file that will contain all computed statistics.

Property Value
Is required No
Type String
CLI argument syntax --stats_file
JSON identifier stats_file
Default value None

Usage example

Example

from dimlpfidex.fidex import fidexGloStats

fidexGloStats(
""" --test_data_file test_data.txt 
    --test_pred_file predTest.out 
    --test_class_file test_class.txt 
    --nb_attributes 16 
    --nb_classes 2 
    --stats_file stats.txt
    --global_rules_file globalRules.rls
    --root_folder dimlp/datafiles
    """
)

From the command line:

./fidexGloStats --test_data_file test_data.txt --test_pred_file predTest.out --test_class_file test_class.txt --nb_attributes 16 --nb_classes 2 --stats_file stats.txt --global_rules_file globalRules.rls --root_folder ../dimlp/datafiles

Output interpretation


Statistics file

This file contains the statistics computed on the global ruleset generated by fidexGloRules. The top of the file gives the total number of rules, the mean number of samples covered by each rule, and the mean number of antecedents per rule.

The statistics computed for the ruleset are:

Global rule fidelity rate
Measures how accurately the rules mimic the behavior of the model. This rate reflects the proportion of samples where the rules' predictions match the model's predictions.
Global rule accuracy
The accuracy of the ruleset in predicting the correct class for test samples, regardless of the model's predictions. A sample counts as correctly classified when a correct fidel rule is activated, when no rule is activated and the model is correct, or when the activated rules are not fidel but all agree on the correct class.
Explainability rate
The rate at which the ruleset provides an explanation for a sample by activating one or more rules that either predict the correct class or all agree on the same class.
Default rule rate
The proportion of samples for which no activated rule is found. In such cases, the model's decision is used to generate a local rule.
Mean number of correct (fidel) activated rules per sample
The average number of activated rules per sample that are in agreement with the model's predictions.
Mean number of wrong (not fidel) activated rules per sample
The average number of activated rules per sample that do not match the model's predictions.
Model test accuracy
The overall accuracy of the model on the test dataset, calculated independently from the ruleset.
Model test accuracy when rules and models agree
The accuracy of the model when the model's predictions match the predictions made by the activated rules.
Model test accuracy when activated rules and model agree
The accuracy of the model for samples where at least one rule is activated and agrees with the model's prediction.
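As a rough illustration of the fidelity rate and default rule rate described above, the sketch below makes two simplifying assumptions that are not taken from the program itself: each sample has already been reduced to a single ruleset decision (`None` when no rule is activated), and samples without an activated rule fall back to the model's decision and are therefore counted as fidel. The function name is hypothetical.

```python
def fidelity_and_default_rate(model_preds, rule_preds):
    """Return (fidelity rate, default rule rate) for aligned prediction lists.

    model_preds: class index predicted by the model for each sample.
    rule_preds:  class index decided by the ruleset, or None when no rule
                 is activated for the sample (assumed representation).
    """
    n = len(model_preds)
    if n == 0 or n != len(rule_preds):
        raise ValueError("prediction lists must be non-empty and of equal length")

    # Samples for which no rule is activated.
    nb_default = sum(1 for r in rule_preds if r is None)

    # Assumption: with no activated rule the model's decision is used,
    # so the sample is counted as fidel.
    nb_fidel = sum(
        1 for m, r in zip(model_preds, rule_preds) if r is None or r == m
    )
    return nb_fidel / n, nb_default / n


if __name__ == "__main__":
    fidelity, default_rate = fidelity_and_default_rate([0, 1, 1, 0], [0, None, 0, 0])
    print(fidelity, default_rate)
```

The statistics reported by the program also handle samples where several rules are activated and disagree, which this sketch leaves out.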

Additional Statistics (if the positive class index parameter is used) computed with both the model decision and the rules decision:

True positive test samples
The number of test samples where the model (or rules) correctly identifies the positive class.
False positive test samples
The number of test samples where the model (or rules) incorrectly classifies a sample as positive when it is not.
True negative test samples
The number of test samples where the model (or rules) correctly identifies the negative class.
False negative test samples
The number of test samples where the model (or rules) fails to classify a positive sample correctly.
False positive rate
The proportion of negative samples that were incorrectly classified as positive by the model or rules.
False negative rate
The proportion of positive samples that were incorrectly classified as negative by the model or rules.
Precision
The proportion of correctly identified positive samples out of all samples classified as positive (\( \frac{\text{true positives}}{\text{true positives} + \text{false positives}} \)).
Recall
The proportion of correctly identified positive samples out of all actual positive samples (\( \frac{\text{true positives}}{\text{true positives} + \text{false negatives}} \)).
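The four rates above follow directly from the four sample counts. The sketch below shows the arithmetic; the function name and the choice to return NaN when a denominator is zero are assumptions made for this example, not documented behavior of the program.

```python
def binary_rates(tp, fp, tn, fn):
    """Derive rates from true/false positive/negative sample counts."""

    def ratio(numerator, denominator):
        # Assumption: an undefined rate is reported as NaN.
        return numerator / denominator if denominator else float("nan")

    return {
        # Negative samples wrongly classified as positive.
        "false_positive_rate": ratio(fp, fp + tn),
        # Positive samples wrongly classified as negative.
        "false_negative_rate": ratio(fn, fn + tp),
        # Correct positives among all samples classified as positive.
        "precision": ratio(tp, tp + fp),
        # Correct positives among all actual positive samples.
        "recall": ratio(tp, tp + fn),
    }


if __name__ == "__main__":
    print(binary_rates(tp=8, fp=2, tn=85, fn=5))
```

The same function applies to the counts obtained with the model decision and to those obtained with the rules decision.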

Global rules output file

This file contains all the global rules computed by FidexGloRules. It begins with global statistics about the ruleset, followed by individual rules, ordered by their covering size, and their associated performance metrics.

Global Statistics:

Number of rules
Indicates the total number of rules in the ruleset.
Mean sample covering number per rule
The average number of training samples covered by each rule.
Mean number of antecedents per rule
Represents the average number of conditions (antecedents) in each rule.
Decision threshold
Indicates whether a decision threshold was used for prediction and specifies the threshold if applicable.

Explanation of Each Rule:

Each rule consists of conditions on various attributes, followed by the predicted class, and is accompanied by several performance metrics. Let's break down this rule as an example:

Rule 1: X0>=0.69326 X1>=0.036824 -> class 0
    Train Covering size : 112 --- Test Covering size : 59
    Train Fidelity : 1 --- Test Fidelity : 1
    Train Accuracy : 0.964286 --- Test Accuracy : 0.966102
    Train Confidence : 0.971227 --- Test Confidence : 0.965577
X0, X1
These represent the variables from the dataset.
>=0.69326, >=0.036824
The thresholds that the variable values must meet for the rule to be activated.
-> class 0
The class predicted by the rule when the conditions are met. Here, the rule predicts class 0.
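To make the rule structure concrete, here is a minimal sketch that parses a rule line in the text format shown above and checks whether a sample activates it. It assumes default attribute names of the form `X<index>`, a class written as `class <index>`, and that the text format uses `<` for the opposite inequality; rules written with custom names from an attributes file are not handled. Both function names are hypothetical.

```python
import re

# One antecedent: attribute index, operator, threshold (e.g. "X0>=0.69326").
ANTECEDENT = re.compile(r"X(\d+)(>=|<)(-?\d+(?:\.\d+)?(?:[eE][-+]?\d+)?)")


def parse_rule(text):
    """Split 'X0>=0.69 X1<0.03 -> class 0' into antecedents and a class index."""
    conditions, _, decision = text.partition("->")
    antecedents = [
        (int(index), operator, float(value))
        for index, operator, value in ANTECEDENT.findall(conditions)
    ]
    predicted_class = int(decision.split()[-1])
    return antecedents, predicted_class


def is_activated(antecedents, sample):
    """A rule is activated when the sample satisfies every antecedent."""
    return all(
        sample[index] >= value if operator == ">=" else sample[index] < value
        for index, operator, value in antecedents
    )


if __name__ == "__main__":
    rule, cls = parse_rule("Rule 1: X0>=0.69326 X1>=0.036824 -> class 0")
    print(rule, cls, is_activated(rule, [0.7, 0.05]))
```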

Performance Metrics Associated with the Rule:

Each statistic is reported for both the training set and the testing set.

Train Covering size
Indicates the number of training samples that are covered by the rule. For Rule 1, it covers 112 samples.
Train Fidelity
Measures how well the rule aligns with the model’s predictions. A fidelity of 1 means that the rule exactly matches the model’s predictions for all the samples it covers.
Train Accuracy
The accuracy of the rule in correctly classifying the samples it covers. In the case of Rule 1, 96.43% of the covered samples are correctly classified.
Train Confidence
Represents the average confidence score of the model’s predictions for the samples covered by the rule. It is computed from the prediction scores of the covered samples and indicates the model’s confidence in its classifications. For Rule 1, the confidence is 97.12%.

Each subsequent rule follows the same structure.

When the output file name ends with .json, the file contains all the global rules computed by FidexGloRules in JSON format. It begins by indicating whether a decision threshold was used for prediction, and specifies the threshold if applicable. Each individual rule then follows with its associated performance metrics, ordered by covering size. Let's break down this rule as an example:

{
    "test": {
        "accuracy": 0.5,
        "antecedents": [
            {
                "attribute": 24,
                "inequality": true,
                "value": 0.51123815959005
            },
            {
                "attribute": 6,
                "inequality": true,
                "value": 1.9756892136185598
            }
        ],
        "confidence": 0.07883515,
        "coveredSamples": [
            8,
            24,
            26,
            74,
            127,
            176
        ],
        "coveringSize": 6,
        "fidelity": 0.0,
        "outputClass": 5
    },
    "train": {
        "accuracy": 0.3333333333333333,
        "antecedents": [
            {
                "attribute": 24,
                "inequality": true,
                "value": 0.51123815959005
            },
            {
                "attribute": 6,
                "inequality": true,
                "value": 1.9756892136185598
            }
        ],
        "confidence": 0.5935276666666667,
        "coveredSamples": [
            151,
            376,
            936
        ],
        "coveringSize": 3,
        "fidelity": 1.0,
        "outputClass": 5
    }
}

Each statistic is reported for both the training set and the testing set.

accuracy
The accuracy of the rule on the samples it covers. For this rule, 33.33% of the covered training samples and 50% of the covered test samples are correctly classified.
antecedents
The antecedents of the rule, each composed of an attribute (a variable from the dataset), an inequality, and a value. A true inequality represents >=, while a false inequality represents <. The value is the threshold that the attribute's value must meet for the rule to be activated. In this rule, the first antecedent specifies that X24 >= 0.51123815959005.
confidence
Represents the average confidence score of the model’s predictions for the samples covered by the rule. For this rule, the confidence is 59.35% on the training set and 7.88% on the test set.
coveredSamples
Indicates the samples covered by the rule. This rule covers training samples 151, 376 and 936, and test samples 8, 24, 26, 74, 127 and 176.
coveringSize
Indicates the number of samples that are covered by the rule. This rule covers 3 training samples and 6 test samples.
fidelity
Measures how well the rule aligns with the model’s predictions. A fidelity of 1 means that the rule exactly matches the model’s predictions for all the samples it covers. For this rule, the fidelity is 1 on the training set and 0 on the test set.
outputClass
Indicates the class predicted by the rule. Here, the predicted class is 5.

Each subsequent rule follows the same structure.
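The JSON representation can be turned back into the readable form with a few lines of Python. The sketch below works on one "train" or "test" entry of a single rule as shown above; it does not assume anything about the top-level layout of the file, and the function name is hypothetical.

```python
def describe_rule(rule_part):
    """Render one 'train' or 'test' entry of a JSON rule as readable text."""
    conditions = []
    for antecedent in rule_part["antecedents"]:
        # A true inequality stands for >=, a false one for <.
        operator = ">=" if antecedent["inequality"] else "<"
        conditions.append(f"X{antecedent['attribute']}{operator}{antecedent['value']}")
    return " ".join(conditions) + f" -> class {rule_part['outputClass']}"


if __name__ == "__main__":
    # Antecedents and class copied from the "train" entry of the example above.
    train_part = {
        "antecedents": [
            {"attribute": 24, "inequality": True, "value": 0.51123815959005},
            {"attribute": 6, "inequality": True, "value": 1.9756892136185598},
        ],
        "outputClass": 5,
    }
    print(describe_rule(train_part))
```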