Function fidexGlo(const std::string&)

Function Documentation

int fidexGlo(const std::string &command = "")

Executes the FidexGlo algorithm with specified parameters to extract explanation rules for each test sample.

For each test sample, FidexGlo extracts explanation rules from the global ruleset created by the fidexGloRules algorithm. If no rule is found in the ruleset and the ‘with_fidex’ parameter is true, Fidex is called to obtain a local rule.

Notes:

  • File paths are resolved relative to the dimlpfidex root folder, or relative to the value of the ‘root_folder’ parameter if it is specified.

  • It’s mandatory to specify the number of attributes and classes in the data, as well as the test dataset.

  • If using Fidex, train data must be provided; without Fidex it is not used. The notes below assume the train dataset is provided.

  • True train class labels must be provided, either within the data file or separately through a class file. Test classes are given the same way if present.

  • Train and test predictions are mandatory. Test predictions can be given either within the test data file or in a separate prediction file; train predictions are given in a separate prediction file.

  • The path of the file containing the global ruleset must be provided.

  • If using Fidex, the weights file or rules_file (when training with decision trees) obtained from the model training must be provided.

  • If using Fidex, normalization parameters can be specified to denormalize the rules if data were normalized beforehand.

  • The parameter ‘explanation_file’ must be provided to write the explanations to a file.

  • Parameters can be defined directly via the command line or through a JSON configuration file.

  • Providing no command-line arguments or using -h/--help displays usage instructions, detailing both required and optional parameters for user guidance.
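
As a rough illustration of the JSON route mentioned in the notes, the sketch below writes the parameters of a run to a JSON file. It assumes that the JSON keys mirror the command-line parameter names without the leading dashes and that numeric parameters are stored as numbers. The file name config_fidexGlo.json is hypothetical, and how the file path is then passed in the command string is not shown here.

```python
import json
import os
import tempfile

# Assumption: JSON keys mirror the command-line parameter names (without "--").
config = {
    "root_folder": "dimlp/datafiles",
    "test_data_file": "datanormTest.txt",
    "test_pred_file": "predTest.out",
    "global_rules_file": "globalRules.rls",
    "nb_attributes": 16,
    "nb_classes": 2,
    "explanation_file": "explanation.txt",
}

# Hypothetical file name, written to a temporary directory for this example.
config_path = os.path.join(tempfile.mkdtemp(), "config_fidexGlo.json")
with open(config_path, "w", encoding="utf-8") as f:
    json.dump(config, f, indent=4)

# Read the file back to check that it is valid JSON with the expected content.
with open(config_path, encoding="utf-8") as f:
    reloaded = json.load(f)
```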

Outputs:

  • explanation_file : File containing the explanations for every test sample. An explanation is composed of one or many explanation rules.

  • console_file : If specified, contains the console output.

File formats:

  • Data files: These files should contain one sample per line, with numbers separated either by spaces, tabs, semicolons, or commas. Supported formats:

    1. Only attributes (floats).

    2. Attributes (floats) followed by an integer class ID.

    3. Attributes (floats) followed by one-hot encoded class.

  • Test data files: These files can also include predictions. The format of each sample in the file will be as follows:

    • First Line: Contains data attributes. It may be followed by class information (either as an ID or in one-hot format).

    • Second Line: Contains prediction values.

    • Third Line (optional): Contains class information, if present and not already included in the first line.
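
The test data layout described above can be read with a small parser such as the following sketch. It is not the dimlpfidex reader. It assumes that every sample in a file uses the same number of lines (2 or 3), that blank lines can be ignored, and that the caller passes the number of attributes and classes. The function names are hypothetical.

```python
import re

# Values may be separated by spaces, tabs, semicolons, or commas.
_SEPARATORS = re.compile(r"[ \t;,]+")


def split_values(line):
    """Split one line into floats on any of the supported separators."""
    return [float(tok) for tok in _SEPARATORS.split(line.strip()) if tok]


def parse_class(values, nb_classes):
    """Return a class ID from either a single ID or a one-hot vector."""
    if len(values) == 1:
        return int(values[0])
    one_hot = [0.0] * (nb_classes - 1) + [1.0]
    if len(values) == nb_classes and sorted(values) == one_hot:
        return values.index(1.0)
    raise ValueError(f"cannot interpret class values: {values}")


def parse_test_samples(text, nb_attributes, nb_classes, lines_per_sample):
    """Group non-empty lines into samples of 2 or 3 lines each (assumption)."""
    rows = [split_values(line) for line in text.splitlines() if line.strip()]
    if lines_per_sample not in (2, 3) or len(rows) % lines_per_sample != 0:
        raise ValueError("unexpected number of lines for this layout")
    samples = []
    for i in range(0, len(rows), lines_per_sample):
        first, predictions = rows[i], rows[i + 1]
        attributes, rest = first[:nb_attributes], first[nb_attributes:]
        if len(attributes) != nb_attributes or len(predictions) != nb_classes:
            raise ValueError(f"malformed sample starting at row {i}")
        if lines_per_sample == 3:
            rest = rows[i + 2]  # class given on its own third line
        true_class = parse_class(rest, nb_classes) if rest else None
        samples.append({
            "attributes": attributes,
            "predictions": predictions,
            "true_class": true_class,
        })
    return samples
```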

  • Class files: These files should contain one class sample per line, with integers separated either by spaces, tabs, semicolons, or commas. Supported formats:

    1. Integer class ID.

    2. One-hot encoded class.

  • Prediction files: These files should contain one line per data sample, each line consisting of the prediction scores for each class, given as numerical values separated by a space, a comma (CSV), a semicolon (;), or a tab.

  • Global rule file: This file is generated by fidexGloRules. The first line contains general statistics in the form:

    ’Number of rules : 1171, mean sample covering number per rule : 236.923997, mean number of antecedents per rule : 13.020495’

    The second line indicates whether a decision threshold was used: either ‘No decision threshold is used.’ or something like ‘Using a decision threshold of 0.3 for class 0’. An empty line follows, then the rules, numbered from 1 and separated from one another by an empty line. A rule has the form:

    Rule 1: X2531>=175.95 X2200>=181.05 X1828>=175.95 X2590>=178.5 X1257>=183.6 X2277>=170.85 X1816>=173.4 X3040>=183.6 -> class 0

    Train Covering size : 127

    Train Fidelity : 1

    Train Accuracy : 1

    Train Confidence : 0.999919
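
To make the rule layout concrete, here is a minimal parsing sketch for a single rule block in the form shown above. It is not part of dimlpfidex. It assumes that antecedents use only the ‘>=’ and ‘<’ operators and that each metric line starts with ‘Train’. The function and pattern names are hypothetical.

```python
import re

# "Rule <id>: <antecedents> -> class <label>"
RULE_RE = re.compile(r"^Rule\s+(\d+):\s*(.*?)\s*->\s*class\s+(\S+)$")
# "<attribute><op><threshold>", with op assumed to be ">=" or "<".
ANTECEDENT_RE = re.compile(
    r"^(\S+?)(>=|<)(-?\d+(?:\.\d+)?(?:[eE][-+]?\d+)?)$"
)
# "Train <metric name> : <value>"
METRIC_RE = re.compile(r"^Train\s+(.+?)\s*:\s*(\S+)$")


def parse_rule(block):
    """Parse one rule block (rule line followed by its metric lines)."""
    lines = [line.strip() for line in block.splitlines() if line.strip()]
    header = RULE_RE.match(lines[0]) if lines else None
    if header is None:
        raise ValueError("block does not start with a rule line")
    antecedents = []
    for token in header.group(2).split():
        match = ANTECEDENT_RE.match(token)
        if match is None:
            raise ValueError(f"unsupported antecedent: {token}")
        antecedents.append(
            (match.group(1), match.group(2), float(match.group(3)))
        )
    metrics = {}
    for line in lines[1:]:
        match = METRIC_RE.match(line)
        if match is not None:
            metrics[match.group(1)] = float(match.group(2))
    return {
        "id": int(header.group(1)),
        "antecedents": antecedents,
        "class": header.group(3),
        "metrics": metrics,
    }
```

Splitting a whole rule file into such blocks would additionally require skipping the two header lines and splitting on empty lines, as described above.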

  • Weights file: This file should be obtained by training with Dimlp, SVM, MLP, or a CNN from dimlpfidex because an additional special Dimlp layer is needed. If the training was made with more than one network, each network is separated by a “Network <id>” marker. The first row contains the bias values of the Dimlp layer and the second row contains the values of the weight matrix between the previous layer and the Dimlp layer. Each value is separated by a space. As an example, if the layers are of size 4, the biases are: b1 b2 b3 b4 and the weights are w1 w2 w3 w4.

  • Rule file: This file should be obtained directly by training with Random Forests or Gradient Boosting from dimlpfidex because rules need to be extracted from the trees.

  • Attributes file: Each line corresponds to one attribute, and every attribute must be specified. Classes can be specified after the attributes but are not mandatory. Each attribute or class name must be a single word without spaces (you can use _ to replace a space). The order matters: the first attribute/class name represents the first attribute/class in the dataset.

  • Normalization file: Each line contains the mean/median and standard deviation for an attribute.

    Format: ‘2 : original mean: 0.8307, original std: 0.0425’

    Attribute indices (index 2 here) can be replaced with attribute names, in which case an attributes file is required.
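
The normalization line format can be parsed along the following lines. This is a sketch only. It assumes that the wording ‘original median’ replaces ‘original mean’ when a median is stored, and that the data were normalized as (x - center) / std, so that denormalizing is x * std + center. Neither the helper names nor that formula are taken from dimlpfidex.

```python
import re

# "<index or name> : original mean: <value>, original std: <value>"
NORM_RE = re.compile(
    r"^\s*(\S+)\s*:\s*original\s+(mean|median)\s*:\s*([-+0-9.eE]+)\s*,"
    r"\s*original\s+std\s*:\s*([-+0-9.eE]+)\s*$"
)


def parse_normalization_line(line):
    """Return (attribute, kind, center, std) for one normalization line."""
    match = NORM_RE.match(line)
    if match is None:
        raise ValueError(f"unrecognized normalization line: {line!r}")
    attribute, kind = match.group(1), match.group(2)
    return attribute, kind, float(match.group(3)), float(match.group(4))


def denormalize(value, center, std):
    """Invert an assumed (x - center) / std normalization."""
    return value * std + center


attribute, kind, center, std = parse_normalization_line(
    "2 : original mean: 0.8307, original std: 0.0425"
)
```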

Example of how to call the function:

from dimlpfidex import fidex

fidex.fidexGlo('--test_data_file datanormTest.txt --test_pred_file predTest.out --global_rules_file globalRules.rls --nb_attributes 16 --nb_classes 2 --explanation_file explanation.txt --root_folder dimlp/datafiles --with_fidex true --train_data_file datanormTrain.txt --train_pred_file predTrain.out --train_class_file dataclass2Train.txt --test_class_file dataclass2Test.txt --weights_file weights.wts')
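
Long command strings like the one above are easier to maintain when assembled from a dictionary. The helper below is a hypothetical convenience, not part of dimlpfidex. It assumes that every parameter is written as ‘--name value’, that booleans are spelled true/false as in the call above, and that no value contains spaces.

```python
def build_command(params):
    """Join a dict of parameters into a '--name value' command string."""
    parts = []
    for name, value in params.items():
        if isinstance(value, bool):
            # Booleans appear as lowercase true/false in the example call.
            value = "true" if value else "false"
        parts.append(f"--{name} {value}")
    return " ".join(parts)


command = build_command({
    "test_data_file": "datanormTest.txt",
    "test_pred_file": "predTest.out",
    "global_rules_file": "globalRules.rls",
    "nb_attributes": 16,
    "nb_classes": 2,
    "explanation_file": "explanation.txt",
    "root_folder": "dimlp/datafiles",
    "with_fidex": False,
})

# With dimlpfidex installed, the string would be passed as in the example:
# from dimlpfidex import fidex
# fidex.fidexGlo(command)
```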

Parameters:

command – A single string containing either the path to a JSON configuration file with all specified arguments, or all arguments for the function formatted like command-line input. This includes file paths, Fidex parameters, and options for output.

Returns:

0 on successful execution, -1 if an error is encountered during the process.
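
Because errors are reported through the return value rather than through an exception, a caller may want to check it explicitly. The wrapper below is a minimal sketch with hypothetical names. The two stand-in functions only mimic the documented return values so that the example runs without dimlpfidex installed.

```python
def run_checked(func, command):
    """Call a dimlpfidex-style entry point and raise if it reports an error."""
    status = func(command)
    if status != 0:
        name = getattr(func, "__name__", "call")
        raise RuntimeError(f"{name} failed with status {status}")
    return status


# Stand-ins for fidex.fidexGlo, used for illustration only.
def always_ok(command=""):
    return 0


def always_fail(command=""):
    return -1


# With dimlpfidex installed, the call would look like:
# from dimlpfidex import fidex
# run_checked(fidex.fidexGlo, "--test_data_file datanormTest.txt ...")
```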