Function fidexGloStats(const std::string&)

Defined in File fidexGloStatsFct.cpp

Function Documentation

int fidexGloStats(const std::string &command = "")

Computes the statistics of the global ruleset obtained from fidexGloRules on a test dataset.

The statistics computed for the ruleset are:

The global rule fidelity rate.
The global rule accuracy.
The explainability rate (the proportion of samples for which one or more rules are found, either correct rules or activated rules that all agree on the same class).
The default rule rate (the proportion of samples for which no rule is activated).
The mean number of correct (fidel) activated rules per sample.
The mean number of wrong (not fidel) activated rules per sample.
The model test accuracy.
The model test accuracy when rules and model agree.
The model test accuracy when activated rules and model agree.
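
To make a few of the statistics above concrete, here is a minimal sketch and not the library's implementation. It assumes that a rule counts as "fidel" when its class equals the model's prediction for the sample, and that the per-sample lists of activated rule classes are already known. The function name `activation_stats` and its inputs are hypothetical.

```python
def activation_stats(activated_rule_classes, model_preds, true_classes):
    """Compute a few per-sample averages from rule activations.

    activated_rule_classes[i] holds the class predicted by each rule
    activated by sample i (an empty list means no rule is activated).
    Assumption: a rule is "fidel" when its class equals the model prediction.
    """
    n = len(model_preds)
    # Samples with no activated rule fall back to the default rule.
    no_rule = sum(1 for rules in activated_rule_classes if not rules)
    # Count activated rules that agree with the model, summed over samples.
    fidel = sum(
        sum(1 for rule_class in rules if rule_class == pred)
        for rules, pred in zip(activated_rule_classes, model_preds)
    )
    total = sum(len(rules) for rules in activated_rule_classes)
    correct = sum(1 for p, t in zip(model_preds, true_classes) if p == t)
    return {
        "default_rule_rate": no_rule / n,
        "mean_fidel_rules": fidel / n,
        "mean_non_fidel_rules": (total - fidel) / n,
        "model_accuracy": correct / n,
    }


# Toy data: four test samples, two classes.
stats = activation_stats(
    activated_rule_classes=[[0, 0], [], [1, 0], [1]],
    model_preds=[0, 1, 1, 1],
    true_classes=[0, 1, 0, 1],
)
print(stats)
```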

If there is a positive class, additional statistics are computed with both the model decision and the rules decision:

The number of true positive test samples.
The number of false positive test samples.
The number of true negative test samples.
The number of false negative test samples.
The false positive rate.
The false negative rate.
The precision.
The recall.
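
The positive-class statistics listed above follow the usual confusion-matrix definitions. The sketch below uses those textbook formulas and may differ in detail from what the library computes (for example, how empty denominators are handled). `binary_stats` is a hypothetical helper; `decisions` can be either the model decisions or the rule decisions.

```python
def binary_stats(decisions, true_classes, positive_class):
    """Confusion counts and derived rates for one positive class."""
    tp = fp = tn = fn = 0
    for decision, truth in zip(decisions, true_classes):
        if decision == positive_class:
            if truth == positive_class:
                tp += 1
            else:
                fp += 1
        else:
            if truth == positive_class:
                fn += 1
            else:
                tn += 1

    def ratio(numerator, denominator):
        # Return NaN instead of raising when a rate is undefined.
        return numerator / denominator if denominator else float("nan")

    return {
        "true_positives": tp,
        "false_positives": fp,
        "true_negatives": tn,
        "false_negatives": fn,
        "false_positive_rate": ratio(fp, fp + tn),
        "false_negative_rate": ratio(fn, fn + tp),
        "precision": ratio(tp, tp + fp),
        "recall": ratio(tp, tp + fn),
    }


result = binary_stats(
    decisions=[1, 1, 0, 0, 1],
    true_classes=[1, 0, 0, 1, 1],
    positive_class=1,
)
print(result)
```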

Notes:

Each file path is resolved relative to the dimlpfidex root folder, or relative to the content of the ‘root_folder’ parameter if it is specified.
It’s mandatory to specify the number of attributes and classes in the data, as well as the test dataset.
True test class labels must be provided, either within the data file or separately through a class file.
Test predictions are also mandatory.
The path of the file containing the global ruleset must be provided.
To compute the statistics of each rule on the test set, the path of the global rules output file must be provided.
If the positive class index is specified, the true/false positive/negative counts and the derived rates are computed.
Parameters can be defined directly via the command line or through a JSON configuration file.
Providing no command-line arguments or using -h/--help displays usage instructions that detail both required and optional parameters.
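
As an illustration of the two ways of passing parameters, the sketch below keeps the parameters in a dictionary and derives a command string from it. The helper `to_command` is hypothetical. The JSON text is produced under the assumption that the configuration file uses the parameter names as keys, which this page does not confirm.

```python
import json

# Parameter names and values taken from the call example on this page.
params = {
    "test_data_file": "datanormTest.txt",
    "test_pred_file": "predTest.out",
    "test_class_file": "dataclass2Test.txt",
    "global_rules_file": "globalRules.rls",
    "nb_attributes": 16,
    "nb_classes": 2,
    "stats_file": "stats.txt",
    "root_folder": "dimlp/datafiles",
}


def to_command(parameters):
    """Format a dict as '--name value' pairs, in insertion order."""
    return " ".join(f"--{name} {value}" for name, value in parameters.items())


command = to_command(params)
print(command)

# Assumption: a JSON configuration file mirrors the parameter names as keys.
config_text = json.dumps(params, indent=4)
print(config_text)
```

The resulting `command` string can then be passed to `fidexGloStats` as its single argument.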

Outputs:

stats_file : If specified, contains the statistics of the global ruleset seen above.
global_rules_outfile : If specified, the global ruleset file is edited to add the statistics of the global rules on the test set.
console_file : If specified, contains the console output.

File formats:

Data files: These files should contain one sample per line, with numbers separated either by spaces, tabs, semicolons, or commas. Supported formats:
1. Only attributes (floats).
2. Attributes (floats) followed by an integer class ID.
3. Attributes (floats) followed by one-hot encoded class.
Class files: These files should contain one class sample per line, with integers separated either by spaces, tabs, semicolons, or commas. Supported formats:
1. Integer class ID.
2. One-hot encoded class.
Prediction files: These files should contain one line per data sample. Each line consists of a series of numerical values, separated by a space, a comma (CSV), a semicolon (;), or a tab, that represent the prediction scores for each class.
Global rule file: This file is generated by fidexGloRules. The first line contains general statistics in the form:

‘Number of rules : 1171, mean sample covering number per rule : 236.923997, mean number of antecedents per rule : 13.020495’

The second line indicates whether a decision threshold has been used. If not, it reads ‘No decision threshold is used.’; otherwise it reads something like ‘Using a decision threshold of 0.3 for class 0’. An empty line follows, then the rules, numbered starting from 1 and separated from one another by an empty line. A rule has the form:

Rule 1: X2531>=175.95 X2200>=181.05 X1828>=175.95 X2590>=178.5 X1257>=183.6 X2277>=170.85 X1816>=173.4 X3040>=183.6 -> class 0
   Train Covering size : 127
   Train Fidelity : 1
   Train Accuracy : 1
   Train Confidence : 0.999919
Attributes file: Each line corresponds to one attribute, and every attribute must be specified. Class names can be listed after the attributes but are optional. Each attribute or class name must be a single word without spaces (use _ in place of a space). The order matters: the first attribute/class name represents the first attribute/class in the dataset.
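
The rule layout described above can be read back with a small parser. This is a sketch based only on the example rule shown on this page, not the library's own reader. It assumes antecedents are whitespace-separated tokens of the form attribute, comparison operator, threshold, and that each statistic line has the form "name : value". The helper `parse_rule` and the set of accepted operators are assumptions.

```python
import re

# "Rule <n>: <antecedents> -> class <c>"
RULE_RE = re.compile(r"^Rule\s+(\d+):\s*(.*?)\s*->\s*class\s+(\S+)$")
# "<attribute><op><threshold>"; only ">=" appears in the example above,
# the other operators are accepted as an assumption.
ANTECEDENT_RE = re.compile(r"^(\S+?)(>=|<=|>|<)(\S+)$")


def parse_rule(lines):
    """Parse one rule block (header line plus statistic lines)."""
    header = RULE_RE.match(lines[0].strip())
    if header is None:
        raise ValueError(f"Not a rule header: {lines[0]!r}")
    number, body, rule_class = header.groups()

    antecedents = []
    for token in body.split():
        match = ANTECEDENT_RE.match(token)
        if match is None:
            raise ValueError(f"Unrecognized antecedent: {token!r}")
        attribute, operator, threshold = match.groups()
        antecedents.append((attribute, operator, float(threshold)))

    metrics = {}
    for line in lines[1:]:
        if ":" in line:  # skips blank lines
            name, _, value = line.partition(":")
            metrics[name.strip()] = float(value)

    return {
        "number": int(number),
        "antecedents": antecedents,
        "class": rule_class,
        "metrics": metrics,
    }


block = [
    "Rule 1: X2531>=175.95 X2200>=181.05 X1828>=175.95 X2590>=178.5 "
    "X1257>=183.6 X2277>=170.85 X1816>=173.4 X3040>=183.6 -> class 0",
    "   Train Covering size : 127",
    "   Train Fidelity : 1",
    "   Train Accuracy : 1",
    "   Train Confidence : 0.999919",
]
rule = parse_rule(block)
print(rule["number"], rule["class"], len(rule["antecedents"]), rule["metrics"])
```

The class is kept as a string because class names may be supplied through an attributes file.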

Example of how to call the function:

from dimlpfidex import fidex
fidex.fidexGloStats('--test_data_file datanormTest.txt --test_pred_file predTest.out --test_class_file dataclass2Test.txt --global_rules_file globalRules.rls --nb_attributes 16 --nb_classes 2 --stats_file stats.txt --root_folder dimlp/datafiles')

Parameters: command – A single string containing either the path to a JSON configuration file with all specified arguments, or all arguments for the function formatted like command-line input. This includes file paths and options for output.
Returns: 0 for successful execution, -1 if an error is encountered during the process.
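
Because errors are reported through the return value, callers may want to turn a non-zero code into an exception. The wrapper below is a hypothetical convenience, not part of dimlpfidex. It is demonstrated with a stand-in function so that it runs without the library installed.

```python
def run_or_raise(func, command):
    """Call func(command) and raise if the returned status is not 0."""
    code = func(command)
    if code != 0:
        raise RuntimeError(f"{func.__name__} failed with return code {code}")
    return code


# Stand-in with the same calling convention, used only for illustration:
# it reports an error (-1) when called with an empty command.
def fake_stats(command=""):
    return 0 if command else -1


ok_code = run_or_raise(fake_stats, "--nb_attributes 16 --nb_classes 2")

try:
    run_or_raise(fake_stats, "")
    failed = False
except RuntimeError:
    failed = True

print(ok_code, failed)

# With the library installed, the real call would look like:
# from dimlpfidex import fidex
# run_or_raise(fidex.fidexGloStats, "<arguments as in the example above>")
```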