Class DataSet

Class Documentation

class DataSet

Represents a dataset for the Dimlp algorithm.

This class handles reading datasets from files, storing them, and providing methods to manipulate and access the data.

Public Functions

DataSet() = default

Default constructor.

explicit DataSet(int nbEx)

Construct a DataSet with a specified number of examples.

Parameters:

nbEx – Number of examples.

DataSet(const std::string &nameFile, int nbAttr)

Construct a DataSet from a file with a specified number of attributes.

Parameters:
  • nameFile – Path to the dataset file.

  • nbAttr – Number of attributes.

DataSet(const std::string &nameFile, int nbIn, int nbOut)

Construct a DataSet from a file with a specified number of attributes and classes.

Parameters:
  • nameFile – Path to the dataset file.

  • nbIn – Number of input attributes.

  • nbOut – Number of output attributes (class labels).

Throws:

FileContentError – If there is a problem with the file format or content.
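Below is a minimal sketch of the per-line check this constructor performs. It assumes a plain-text file with one example per line and exactly nbIn + nbOut whitespace-separated numbers on each line; the real Dimlp file format and parser may differ. `readDataSet`, `DataSketch`, and `FileContentErrorSketch` are hypothetical names. The sketch reads from a `std::istream`, so you can try it without a file.

```cpp
#include <cassert>
#include <istream>
#include <sstream>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical stand-in for Dimlp's FileContentError.
struct FileContentErrorSketch : std::runtime_error {
  using std::runtime_error::runtime_error;
};

// Hypothetical container: nbEx rows of nbAttr floats, stored row-major.
struct DataSketch {
  int nbEx = 0;
  int nbAttr = 0;
  std::vector<float> values;
};

// Assumed format: one example per line, nbIn + nbOut numbers per line.
DataSketch readDataSet(std::istream &in, int nbIn, int nbOut) {
  assert(nbIn >= 0 && nbOut >= 0);
  DataSketch data;
  data.nbAttr = nbIn + nbOut;
  std::string line;
  int lineNo = 0;
  while (std::getline(in, line)) {
    ++lineNo;
    std::istringstream fields(line);
    std::vector<float> row;
    float v;
    while (fields >> v)
      row.push_back(v);
    if (!fields.eof()) // extraction stopped before the end of the line
      throw FileContentErrorSketch("line " + std::to_string(lineNo) + ": non-numeric value");
    if (row.empty())
      continue; // tolerate blank lines
    if (static_cast<int>(row.size()) != data.nbAttr)
      throw FileContentErrorSketch("line " + std::to_string(lineNo) + ": expected " +
                                   std::to_string(data.nbAttr) + " values, found " +
                                   std::to_string(row.size()));
    data.values.insert(data.values.end(), row.begin(), row.end());
    ++data.nbEx;
  }
  return data;
}
```

To read from disk, pass a `std::ifstream` opened on nameFile.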

DataSet(DataSet &bigData, StringInt *listPat)

Construct a DataSet from the examples of a larger dataset whose indices appear in a list of pattern indices.

Parameters:
  • bigData – Larger dataset.

  • listPat – List of pattern indices.

DataSet(DataSet &master, const int *indPat, int nbEx)

Construct a DataSet from the nbEx examples of a master dataset whose indices are given in an array of pattern indices.

Parameters:
  • master – Master dataset.

  • indPat – Array of pattern indices.

  • nbEx – Number of examples.
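The index-based constructor can be pictured as the row copy below. This sketch assumes the examples are stored row-major in one flat array. The actual class may keep pointers into the master's rows instead of copying them. `DataSketch` and `selectRows` are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <vector>

// Hypothetical container: nbEx rows of nbAttr floats, stored row-major.
struct DataSketch {
  int nbEx = 0;
  int nbAttr = 0;
  std::vector<float> values;
};

// Build a dataset from the rows of `master` listed in indPat[0..nbEx-1],
// in the given order (a repeated index yields a repeated row).
DataSketch selectRows(const DataSketch &master, const int *indPat, int nbEx) {
  assert(nbEx >= 0 && (nbEx == 0 || indPat != nullptr));
  DataSketch sub;
  sub.nbEx = nbEx;
  sub.nbAttr = master.nbAttr;
  sub.values.reserve(static_cast<std::size_t>(nbEx) * master.nbAttr);
  for (int p = 0; p < nbEx; ++p) {
    const int idx = indPat[p];
    if (idx < 0 || idx >= master.nbEx)
      throw std::out_of_range("pattern index out of range");
    const float *row = master.values.data() + static_cast<std::size_t>(idx) * master.nbAttr;
    sub.values.insert(sub.values.end(), row, row + master.nbAttr);
  }
  return sub;
}
```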

DataSet(DataSet &data1, DataSet &data2)

Construct a DataSet by merging two datasets.

Parameters:
  • data1 – First dataset.

  • data2 – Second dataset.
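The sketch below assumes that "merging" means stacking the examples of data2 after those of data1, with both datasets having the same number of attributes. Under that assumption, the operation is a simple concatenation. `DataSketch` and `mergeDataSets` are hypothetical names, and the attribute-count check is also an assumption.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <vector>

// Hypothetical container: nbEx rows of nbAttr floats, stored row-major.
struct DataSketch {
  int nbEx = 0;
  int nbAttr = 0;
  std::vector<float> values;
};

// Assumed semantics: the examples of data2 are appended after those of data1.
DataSketch mergeDataSets(const DataSketch &data1, const DataSketch &data2) {
  if (data1.nbAttr != data2.nbAttr)
    throw std::invalid_argument("datasets have different numbers of attributes");
  DataSketch merged;
  merged.nbAttr = data1.nbAttr;
  merged.nbEx = data1.nbEx + data2.nbEx;
  merged.values.reserve(data1.values.size() + data2.values.size());
  merged.values.insert(merged.values.end(), data1.values.begin(), data1.values.end());
  merged.values.insert(merged.values.end(), data2.values.begin(), data2.values.end());
  assert(merged.values.size() == static_cast<std::size_t>(merged.nbEx) * merged.nbAttr);
  return merged;
}
```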

void Del()

Delete the dataset and free its memory.

inline float *GetExample(int index)

Get an example from the dataset.

Parameters:

index – Index of the example.

Returns:

Pointer to the example.

inline int GetNbEx() const

Get the number of examples.

Returns:

Number of examples.

inline int GetNbAttr() const

Get the number of attributes.

Returns:

Number of attributes.
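The three accessors are typically used together to iterate over the data. The sketch below assumes contiguous row-major storage, so `GetExample(i)` points at the first of `GetNbAttr()` consecutive floats. The class `DataSketch` and the helper `attributeMean` are hypothetical, and the real DataSet may lay out its rows differently.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical class mirroring the documented accessors.
class DataSketch {
 public:
  DataSketch(int nbEx, int nbAttr)
      : nbEx_(nbEx), nbAttr_(nbAttr),
        values_(static_cast<std::size_t>(nbEx) * nbAttr, 0.0f) {}

  // Pointer to the first attribute of example `index`; valid while *this lives.
  float *GetExample(int index) {
    assert(index >= 0 && index < nbEx_);
    return values_.data() + static_cast<std::size_t>(index) * nbAttr_;
  }
  int GetNbEx() const { return nbEx_; }
  int GetNbAttr() const { return nbAttr_; }

 private:
  int nbEx_;
  int nbAttr_;
  std::vector<float> values_;
};

// Typical loop: mean of attribute `attr` over all examples.
float attributeMean(DataSketch &data, int attr) {
  assert(attr >= 0 && attr < data.GetNbAttr());
  float sum = 0.0f;
  for (int i = 0; i < data.GetNbEx(); ++i)
    sum += data.GetExample(i)[attr];
  return data.GetNbEx() > 0 ? sum / data.GetNbEx() : 0.0f;
}
```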

std::shared_ptr<StringInt> Select(std::shared_ptr<DimlpRule> r)

Select examples from the dataset based on a given rule.

Parameters:

r – Pointer to the rule.

Returns:

Pointer to a StringInt object containing the selected example indices.

std::shared_ptr<StringInt> Select(std::shared_ptr<DimlpRule> r, std::shared_ptr<StringInt> subSet)

Select examples from a subset of the dataset based on a given rule.

Parameters:
  • r – Pointer to the rule.

  • subSet – Pointer to a StringInt object containing the subset of example indices.

Returns:

Pointer to a StringInt object containing the selected example indices.
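Both Select overloads can be read as filtering candidate example indices through a rule. The sketch below makes several stand-in assumptions:

- A rule is a conjunction of antecedents of the form x[attr] < threshold or x[attr] >= threshold. The real DimlpRule structure is not documented here.
- A plain `std::vector<int>` stands in for StringInt.
- All names (`AntecedentSketch`, `RuleSketch`, `DataSketch`, `covers`, `selectCovered`) are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical antecedent: "x[attr] < threshold" if lessThan, else "x[attr] >= threshold".
struct AntecedentSketch {
  int attr;
  float threshold;
  bool lessThan;
};
// Hypothetical rule: a conjunction of antecedents.
using RuleSketch = std::vector<AntecedentSketch>;

// Hypothetical container: nbEx rows of nbAttr floats, stored row-major.
struct DataSketch {
  int nbEx = 0;
  int nbAttr = 0;
  std::vector<float> values;
  const float *GetExample(int i) const {
    return values.data() + static_cast<std::size_t>(i) * nbAttr;
  }
};

// True if example x satisfies every antecedent of the rule.
bool covers(const RuleSketch &rule, const float *x) {
  for (const AntecedentSketch &a : rule)
    if ((x[a.attr] < a.threshold) != a.lessThan)
      return false;
  return true;
}

// Like Select(r, subSet): keep the indices in subSet whose examples the rule covers.
std::vector<int> selectCovered(const DataSketch &data, const RuleSketch &rule,
                               const std::vector<int> &subSet) {
  std::vector<int> kept;
  for (int idx : subSet) {
    assert(idx >= 0 && idx < data.nbEx);
    if (covers(rule, data.GetExample(idx)))
      kept.push_back(idx);
  }
  return kept;
}

// Like Select(r): every example is a candidate.
std::vector<int> selectCovered(const DataSketch &data, const RuleSketch &rule) {
  std::vector<int> all(static_cast<std::size_t>(data.nbEx));
  for (int i = 0; i < data.nbEx; ++i)
    all[i] = i;
  return selectCovered(data, rule, all);
}
```

The full-dataset overload is written in terms of the subset overload, which keeps the rule-matching logic in one place.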

void ExtractDataAndTarget(DataSet &data1, int nbAttr1, DataSet &data2, int nbAttr2) const

Extract the data attributes and the target (class) attributes of the current dataset into two separate datasets.

Parameters:
  • data1 – Output dataset that receives the attributes.

  • nbAttr1 – Number of attributes.

  • data2 – Output dataset that receives the targets (classes).

  • nbAttr2 – Number of targets (classes).

Throws:

FileContentError – If the class data is missing or invalid.
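Below is a sketch of the split. It assumes that each example of the current dataset holds its nbAttr1 attribute values first, followed by its nbAttr2 target values. It also assumes that a width mismatch is what signals missing class data. `extractDataAndTarget`, `DataSketch`, and `FileContentErrorSketch` are hypothetical stand-ins, not the Dimlp implementation.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical stand-in for Dimlp's FileContentError.
struct FileContentErrorSketch : std::runtime_error {
  using std::runtime_error::runtime_error;
};

// Hypothetical container: nbEx rows of nbAttr floats, stored row-major.
struct DataSketch {
  int nbEx = 0;
  int nbAttr = 0;
  std::vector<float> values;
};

// Split each example of `full` into its first nbAttr1 values (attributes, into data1)
// and the following nbAttr2 values (targets, into data2).
void extractDataAndTarget(const DataSketch &full, DataSketch &data1, int nbAttr1,
                          DataSketch &data2, int nbAttr2) {
  assert(nbAttr1 >= 0 && nbAttr2 >= 0);
  if (full.nbAttr != nbAttr1 + nbAttr2)
    throw FileContentErrorSketch("expected " + std::to_string(nbAttr1 + nbAttr2) +
                                 " values per example, found " + std::to_string(full.nbAttr));
  data1 = DataSketch{full.nbEx, nbAttr1, {}};
  data2 = DataSketch{full.nbEx, nbAttr2, {}};
  for (int i = 0; i < full.nbEx; ++i) {
    const float *row = full.values.data() + static_cast<std::size_t>(i) * full.nbAttr;
    data1.values.insert(data1.values.end(), row, row + nbAttr1);
    data2.values.insert(data2.values.end(), row + nbAttr1, row + full.nbAttr);
  }
}
```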