rules_extraction package
Submodules
rules_extraction.plot module
- rules_extraction.plot.plot_accuracy(rules, df_test, class_name=None, n=5, save_path=None)
Plots accuracy against the number of rules and optionally saves the figure.
- Parameters:
rules – the rules to evaluate
df_test – test data (features and labels)
class_name – string, name of the class
n – int, maximum number of rules to consider (default is 5)
save_path – str, if provided, the path where the plot will be saved
- rules_extraction.plot.plot_frontier(df, rule, target_class, model=None, alpha=0.65, save_path=None, device=None)
Plots the frontier of a single rule together with embedded images and optionally saves the figure.
- Parameters:
df – data that stores image label and path
rule – rule you want to plot, should be a Rule object
target_class – string, name of the class
model – torch model you used
alpha – float between 0 and 1, transparency level
save_path – str, if provided, the path where the plot will be saved
- rules_extraction.plot.transform()
rules_extraction.rules module
- class rules_extraction.rules.EnsembleRule(rules)
Bases: BaseEstimator, ClassifierMixin
A simple ensemble of rule-based classifiers.
- Parameters:
rules (list of Rule) – List of individual rule-based classifiers.
- fit(X, y)
Fit the ensemble to the training data.
- Parameters:
X (array-like or pd.DataFrame) – The training input samples.
y (array-like) – The target values.
- Returns:
self – Returns an instance of the ensemble.
- Return type:
object
- predict(X)
Predict labels for the given data.
- Parameters:
X (array-like or pd.DataFrame) – The input samples.
- Returns:
predictions – Array of predicted labels.
- Return type:
array-like
- score(X, y)
Calculate the accuracy score for the given data and true labels.
- Parameters:
X (array-like or pd.DataFrame) – The input samples.
y (array-like) – The true labels.
- Returns:
accuracy – The accuracy score.
- Return type:
float
- class rules_extraction.rules.Rule(conditions, label)
Bases: BaseEstimator
A simple rule-based classifier.
- Parameters:
conditions (list of str) – List of conditions defining the rule.
label (int) – The label assigned when the conditions are met.
- ops
Dictionary mapping comparison operators to corresponding functions.
- Type:
dict
- fit(X, y=None)
Fit the rule to the training data (not used in this basic implementation).
- Parameters:
X (array-like or pd.DataFrame) – The training input samples.
y (array-like, default=None) – Ignored.
- Returns:
self – Returns an instance of the rule.
- Return type:
object
- predict(X)
Predict labels for the given data.
- Parameters:
X (array-like or pd.DataFrame) – The input samples.
- Returns:
predictions – Array of predicted labels.
- Return type:
array-like
- score(X, y)
Calculate the accuracy score for the given data and true labels.
- Parameters:
X (array-like or pd.DataFrame) – The input samples.
y (array-like) – The true labels.
- Returns:
accuracy – The accuracy score.
- Return type:
float
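A minimal sketch of how a rule of this shape might evaluate its conditions and compute accuracy. The condition string format ("feature operator threshold"), the class name SketchRule, the dict-based rows, and the choice to return 0 when a sample fails the rule are all assumptions made for illustration; the library's own behaviour may differ.

```python
import operator

# Comparison operators keyed by their string form, mirroring the documented `ops` attribute.
OPS = {
    "!=": operator.ne, "<": operator.lt, "<=": operator.le,
    "==": operator.eq, ">": operator.gt, ">=": operator.ge,
}


class SketchRule:
    """Hypothetical stand-in for Rule; not the library's implementation."""

    def __init__(self, conditions, label):
        self.conditions = conditions  # e.g. ["f0 <= 0.5", "f2 > 1.0"] (assumed format)
        self.label = label

    def _holds(self, row):
        # A row is a dict mapping feature name to value; every condition must hold.
        for cond in self.conditions:
            feature, op, threshold = cond.split()
            if not OPS[op](row[feature], float(threshold)):
                return False
        return True

    def predict(self, rows):
        # Assumption: samples that fail the rule receive label 0.
        return [self.label if self._holds(r) else 0 for r in rows]

    def score(self, rows, y):
        # Accuracy: fraction of predictions that match the true labels.
        preds = self.predict(rows)
        return sum(p == t for p, t in zip(preds, y)) / len(y)


rule = SketchRule(["f0 <= 0.5", "f2 > 1.0"], label=1)
rows = [{"f0": 0.2, "f2": 1.5}, {"f0": 0.9, "f2": 1.5}, {"f0": 0.1, "f2": 0.3}]
predictions = rule.predict(rows)       # [1, 0, 0]
accuracy = rule.score(rows, [1, 0, 1])  # 2 of 3 correct
```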
- class rules_extraction.rules.RuleRanker(rules, X, y)
Bases: object
Handler for managing, applying, and evaluating rules extracted from a Random Forest model.
- Parameters:
rules (list) – The list of rules. Each rule should be a list or a string.
- data_to_rules(X_arr)
Transform a dataset based on the set of rules, creating binary features.
- Parameters:
X_arr (numpy.ndarray) – The input data array.
- Returns:
The transformed data array.
- Return type:
numpy.ndarray
- fit_perceptron(X_train, y_train, penalty='l1', alpha=0.01, **kwargs)
Fit a Perceptron model to the training data.
- Parameters:
X_train (numpy.ndarray) – The input training data.
y_train (numpy.ndarray) – The target values for training data.
penalty (str) – The penalty to be used by the Perceptron model (default is ‘l1’).
alpha (float) – Constant that multiplies the regularization term (default is 0.01).
- static is_rule(data_point, rule)
Check whether a data point satisfies a particular rule.
- Parameters:
data_point (numpy.ndarray) – The data point to be checked.
rule (tuple) – The rule against which to check the data point. Expected to be a tuple of (list, int).
- Returns:
True if the data point satisfies the rule, False otherwise.
- Return type:
bool
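The relationship between data_to_rules and is_rule can be pictured with a small, dependency-free sketch: each rule becomes one binary column that is 1 when a data point satisfies the rule. The function names satisfies and to_rule_features are hypothetical, and the condition layout (feature index, operator string, threshold) is an assumption; the documentation only states that a rule is a (list, int) tuple.

```python
import operator

OPS = {"!=": operator.ne, "<": operator.lt, "<=": operator.le,
       "==": operator.eq, ">": operator.gt, ">=": operator.ge}


def satisfies(data_point, rule):
    """Return True if data_point meets every condition of rule.

    rule is (conditions, label); each condition is assumed to be a
    (feature_index, operator_string, threshold) triple.
    """
    conditions, _label = rule
    return all(OPS[op](data_point[idx], thr) for idx, op, thr in conditions)


def to_rule_features(X, rules):
    """Build one binary column per rule: 1 if the row satisfies it, else 0."""
    return [[int(satisfies(row, rule)) for rule in rules] for row in X]


rules = [
    ([(0, "<=", 0.5)], 1),
    ([(0, ">", 0.5), (1, ">=", 2.0)], 0),
]
X = [[0.3, 1.0], [0.8, 2.5], [0.8, 1.0]]
binary = to_rule_features(X, rules)  # [[1, 0], [0, 1], [0, 0]]
```

A matrix like this is what a linear model such as a Perceptron could be trained on, with one weight per rule.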
- ops = {'!=': operator.ne, '<': operator.lt, '<=': operator.le, '==': operator.eq, '>': operator.gt, '>=': operator.ge}
- rank_rules(N=None, penalty='l1', alpha=0.01, **kwargs)
Rank the rules based on the absolute values of Perceptron coefficients.
- Parameters:
N (int or None) – If given, return only the top N rules.
- Returns:
A list of (rule, absolute importance) tuples.
- Return type:
list
- Raises:
ValueError – If the perceptron has not been trained.
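The ranking step itself can be sketched without any model: given one coefficient per rule, sort the rules by absolute value and keep the top N. The function name rank_by_abs_weight and the hand-written weights are hypothetical; in the library the coefficients would come from the fitted Perceptron.

```python
def rank_by_abs_weight(rules, weights, n=None):
    """Pair each rule with |weight| and sort from most to least important."""
    if len(rules) != len(weights):
        raise ValueError("need exactly one weight per rule")
    ranked = sorted(
        ((rule, abs(w)) for rule, w in zip(rules, weights)),
        key=lambda pair: pair[1],
        reverse=True,
    )
    return ranked if n is None else ranked[:n]


rules = ["f0 <= 0.5", "f1 > 2.0", "f2 == 1"]
weights = [0.25, -1.5, 0.0]  # hypothetical linear-model coefficients
top_two = rank_by_abs_weight(rules, weights, n=2)
# [("f1 > 2.0", 1.5), ("f0 <= 0.5", 0.25)]
```

An L1 penalty tends to push some coefficients to exactly zero, so the corresponding rules fall to the bottom of such a ranking.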
rules_extraction.utils module
- rules_extraction.utils.compute_avg_features(model=None, loader=None, class_dict=None, device=None, use_existing=False, save_csv=None, csv_path='./features_map.csv')
Compute average features for images using a pre-trained PyTorch model or load from existing CSV.
- Parameters:
model (torch.nn.Module, optional) – Pre-trained PyTorch neural network model. Required if use_existing=False.
loader (torch.utils.data.DataLoader, optional) – Data loader containing images and labels, and optionally file paths. Required if use_existing=False.
class_dict (dict or None, optional) – A dictionary mapping class indices to class labels. If None, class indices are used as labels.
device (torch.device, optional) – Device (CPU or GPU) on which the computation will be performed. Required if use_existing=False.
use_existing (bool, optional) – If True, use existing CSV if available. If False, always compute new features. Default is False.
csv_path (str, optional) – Path to save or load the CSV file. Default is “./features_map.csv”.
save_csv (bool or None, optional) – If True, save the resulting DataFrame to a CSV file. If None (default), save only when computing new features.
- Returns:
DataFrame containing computed average features, labels, and file paths (if available) for each image.
- Return type:
pd.DataFrame
- Raises:
TypeError – If use_existing=False and the provided model is not a PyTorch module or the loader is not a PyTorch DataLoader.
ValueError – If use_existing=False and required parameters (model, loader, device) are not provided.
Notes
If use_existing=True and the CSV file exists, it will be loaded without using other parameters. If use_existing=False, new features will be computed using the provided model and loader.
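The interplay of use_existing, save_csv, and csv_path can be illustrated with a standard-library sketch. The helper load_or_compute and the compute callable are hypothetical, and falling back to computation when the CSV is missing is an assumption; the real function works with a PyTorch model and returns a DataFrame.

```python
import csv
import os
import tempfile


def load_or_compute(compute, csv_path, use_existing=False, save_csv=None):
    """Return rows from csv_path when allowed, otherwise compute them.

    compute is a hypothetical zero-argument callable returning a list of dicts.
    """
    if use_existing and os.path.exists(csv_path):
        with open(csv_path, newline="") as fh:
            return list(csv.DictReader(fh))

    rows = compute()
    # save_csv=None means "save, because fresh features were just computed".
    if save_csv or save_csv is None:
        with open(csv_path, "w", newline="") as fh:
            writer = csv.DictWriter(fh, fieldnames=list(rows[0].keys()))
            writer.writeheader()
            writer.writerows(rows)
    return rows


calls = []


def fake_compute():
    # Stand-in for the expensive feature extraction; records each invocation.
    calls.append(1)
    return [{"label": "cat", "f0": "0.1"}, {"label": "dog", "f0": "0.7"}]


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "features_map.csv")
    first = load_or_compute(fake_compute, path, use_existing=True)   # no file yet: computes
    second = load_or_compute(fake_compute, path, use_existing=True)  # file exists: loads
```

In this sketch the second call never touches fake_compute, which is the point of use_existing=True.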
- rules_extraction.utils.extract_all_rules(X, y, **kwargs)
Extract rules from all the trees in the random forest.
- Parameters:
X (array-like or pd.DataFrame) – The input samples.
y (array-like) – The target values.
**kwargs – Additional parameters used to configure the RandomForestClassifier:
- n_estimators: the number of trees in the forest (default=100).
- Any other parameter accepted by RandomForestClassifier.
- Returns:
List of all extracted rules.
- Return type:
list
- rules_extraction.utils.extract_features_resnet(x)
Predefined feature extraction for ResNet-like models. [NOT IMPLEMENTED]
- Parameters:
x (torch.Tensor) – input data tensor
- rules_extraction.utils.extract_features_vgg(model, x)
Predefined feature extraction for VGG-like models.
- Parameters:
x (torch.Tensor) – input data tensor
- Returns:
extracted features
- Return type:
torch.Tensor
- rules_extraction.utils.extract_rules(tree, feature_columns)
Extract rules from a single decision tree.
- rules_extraction.utils.filter_dataset(model, loader, device)
Use a PyTorch DataLoader and a PyTorch model to identify and return the indices of correctly predicted datapoints.
The returned indices can then be used to build a filtered loader.
- Parameters:
model (torch.nn.Module) – A pre-trained PyTorch model.
loader (torch.utils.data.DataLoader) – DataLoader containing images, labels, and image paths.
device (torch.device) – Device (CPU or GPU) on which the computation will be performed.
- Returns:
List of indices corresponding to correctly predicted datapoints in the loader.
- Return type:
list
- Raises:
TypeError – If the provided model is not a PyTorch module or the loader is not a PyTorch DataLoader.
Notes
This function iterates over the provided DataLoader, evaluates the model on each batch, and identifies the indices of correct predictions. The resulting list of indices can be used to create a filtered loader for further analysis or evaluation.
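The index bookkeeping described in the note can be sketched without PyTorch. The function correct_indices, the predict callable, and the toy batches are hypothetical; the sketch also assumes the loader yields batches in dataset order (no shuffling), otherwise the running offset would not map back to dataset positions.

```python
def correct_indices(predict, batches):
    """Collect dataset-level indices of samples the model classifies correctly.

    predict: hypothetical callable mapping a list of inputs to predicted labels.
    batches: iterable of (inputs, labels) pairs, in dataset order.
    """
    kept = []
    offset = 0  # dataset index of the first sample in the current batch
    for inputs, labels in batches:
        preds = predict(inputs)
        for i, (p, t) in enumerate(zip(preds, labels)):
            if p == t:
                kept.append(offset + i)
        offset += len(labels)
    return kept


def toy_predict(xs):
    # Toy "model": predicts 1 for positive inputs, 0 otherwise.
    return [int(x > 0) for x in xs]


batches = [([1.0, -2.0], [1, 1]), ([3.0, -4.0], [0, 0])]
indices = correct_indices(toy_predict, batches)  # [0, 3]
```

With PyTorch, a list like this could be passed to torch.utils.data.Subset to wrap the original dataset before building a new DataLoader.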
- rules_extraction.utils.is_torch_loader(obj)
Check if the given object is a PyTorch DataLoader.
- Parameters:
obj (any) – The object to be checked.
- Returns:
True if the object is a PyTorch DataLoader, False otherwise.
- Return type:
bool
- rules_extraction.utils.is_torch_model(obj)
Check if the given object is a PyTorch model.
- Parameters:
obj (any) – The object to be checked.
- Returns:
True if the object is a PyTorch model, False otherwise.
- Return type:
bool
- rules_extraction.utils.make_target_df(df_features, target_class)
Produces a DataFrame with binary labels: 1 for target_class and 0 for other classes.
- Parameters:
df_features (pd.DataFrame) – input DataFrame
target_class (int or str) – class label to be considered as target (1)
- Returns:
new DataFrame with binary labels
- Return type:
pd.DataFrame
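The one-vs-rest relabelling can be shown with plain Python rows in place of a DataFrame. The helper binarize_labels and the column name "label" are assumptions for illustration; the documentation does not name the label column.

```python
def binarize_labels(rows, target_class, label_key="label"):
    """Return copies of rows whose label is 1 for target_class, 0 otherwise."""
    return [
        {**row, label_key: int(row[label_key] == target_class)}
        for row in rows
    ]


rows = [
    {"f0": 0.1, "label": "cat"},
    {"f0": 0.7, "label": "dog"},
    {"f0": 0.4, "label": "cat"},
]
binary_rows = binarize_labels(rows, target_class="cat")
labels = [r["label"] for r in binary_rows]  # [1, 0, 1]
```

The sketch returns new rows and leaves the input untouched, matching the documented behaviour of producing a new DataFrame.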
- rules_extraction.utils.recurse(tree_, feature_name, node, current_rule, rules_list)
Recursively traverse the tree to extract rules.
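A rough sketch of what such a recursive traversal might look like, using a hand-built object that mimics the array layout of a fitted scikit-learn tree (feature, threshold, children_left, children_right, value). The rule format produced here, a (conditions, class index) pair per leaf, and the flattened per-node class counts in value are assumptions; the library's actual output may differ.

```python
from types import SimpleNamespace

LEAF = -2  # scikit-learn marks leaf nodes with feature index -2 (TREE_UNDEFINED)


def recurse(tree_, feature_names, node, current_rule, rules_list):
    """Depth-first walk that appends one (conditions, class_index) pair per leaf."""
    if tree_.feature[node] == LEAF:
        counts = tree_.value[node]  # simplified: flat list of class counts
        rules_list.append((list(current_rule), counts.index(max(counts))))
        return
    name = feature_names[tree_.feature[node]]
    thr = tree_.threshold[node]
    # Left branch: feature <= threshold; right branch: feature > threshold.
    recurse(tree_, feature_names, tree_.children_left[node],
            current_rule + [f"{name} <= {thr}"], rules_list)
    recurse(tree_, feature_names, tree_.children_right[node],
            current_rule + [f"{name} > {thr}"], rules_list)


# Hand-built stand-in for a fitted tree: node 0 splits on f0, node 2 on f1.
toy = SimpleNamespace(
    feature=[0, LEAF, 1, LEAF, LEAF],
    threshold=[0.5, None, 2.0, None, None],
    children_left=[1, -1, 3, -1, -1],
    children_right=[2, -1, 4, -1, -1],
    value=[[7, 5], [4, 0], [3, 5], [3, 1], [0, 4]],
)
rules = []
recurse(toy, ["f0", "f1"], 0, [], rules)
```

Each root-to-leaf path yields one rule, so a tree with three leaves produces three rules; repeating this over every tree of a forest gives the full rule list.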