ADLStream.data.ClassificationStreamGenerator

Classification stream generator.

This class is used for generating streams for classification problems.

Parameters:

Name	Type	Description	Default
`stream`	`inherits ADLStream.data.stream.BaseStream`	Stream source to be fed to the ADLStream framework.	required
`label_index`	`int or list`	The column index/indices of the target label. Defaults to -1.	`[-1]`
`one_hot_labels`	`list or None`	Possible label values if one-hot encoding must be done. If None, the target value is not one-hot encoded. Defaults to None.	`None`
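As a rough illustration of what `one_hot_labels` controls, the sketch below reproduces one-hot encoding of a single label in plain Python. The class itself delegates to scikit-learn's `OneHotEncoder` (see the source listing); the helper names `fit_one_hot` and `one_hot` are hypothetical and exist only for this example. The sketch assumes the encoder's default behaviour: categories are sorted, and an unknown value raises an error.

```python
def fit_one_hot(labels):
    """Return the sorted category list that fixes the one-hot column order."""
    return sorted(set(labels))


def one_hot(value, categories):
    """Encode a single label value as a list of 0.0/1.0 floats."""
    if value not in categories:
        raise ValueError(f"unknown label: {value!r}")
    return [1.0 if value == c else 0.0 for c in categories]


# Possible label values, as they would be passed in `one_hot_labels`.
categories = fit_one_hot(["virginica", "setosa", "versicolor"])
encoded = one_hot("versicolor", categories)
```

Because the categories are sorted, the position of the 1 in the encoded vector depends on the sort order of the label values, not on the order in which they were listed.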

Source code in ADLStream/data/classification_generator.py

class ClassificationStreamGenerator(BaseStreamGenerator):
    """Classification stream generator.

    This class is used for generating streams for classification problems.

    Arguments:
        stream (inherits ADLStream.data.stream.BaseStream):
            Stream source to be fed to the ADLStream framework.
        label_index (int or list, optional): The column index/indices of the target
            label.
            Defaults to -1.
        one_hot_labels (list or None, optional): Possible label values if one-hot
            encoding must be done. If None, the target value is not one-hot encoded.
            Defaults to None.
    """

    def __init__(self, stream, label_index=[-1], one_hot_labels=None, **kwargs):
        super().__init__(stream, **kwargs)
        self.label_index = label_index if type(label_index) is list else [label_index]
        self.labels = one_hot_labels
        self.one_hot_encoder = None
        if self.labels:
            self.one_hot_encoder = OneHotEncoder()
            self.one_hot_encoder.fit(np.asarray(self.labels).reshape(-1, 1))

    def preprocess(self, message):
        x = message
        y = [message.pop(i) for i in self.label_index]

        if self.labels:
            y = self.one_hot_encoder.transform([y]).toarray()
            y = list(y[0])

        return x, y

`preprocess(self, message)`

Contains the logic that transforms a stream message into model input and target data (x, y).

Either output, x or y, can be None, which means it should not be added to the context.

The target data y can be delayed: returning x and y together does not imply that y is the target value of that particular x. Input data and target data must still be sent in order, so y_i is the target value of x_i. The first target sent (y_0) therefore corresponds to the first input sent (x_0).
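The ordering rule can be sketched with a small stand-alone class. `DelayedLabelSplitter` is a hypothetical helper, not part of ADLStream; it assumes the label is the last element of each message and that a label becomes available only after `delay` newer messages have arrived.

```python
from collections import deque


class DelayedLabelSplitter:
    """Hypothetical helper: emits x immediately and y `delay` messages later."""

    def __init__(self, delay):
        self.delay = delay
        self.pending = deque()  # labels waiting to be released, oldest first

    def preprocess(self, message):
        *features, label = message
        self.pending.append([label])
        # Release the oldest label only once `delay` newer messages have arrived;
        # until then return None so that no target is added.
        y = self.pending.popleft() if len(self.pending) > self.delay else None
        return features, y


splitter = DelayedLabelSplitter(delay=2)
outputs = [splitter.preprocess([i, f"label_{i}"]) for i in range(4)]

# Pair inputs and targets by the order in which each was sent.
xs = [x for x, _ in outputs]
ys = [y for _, y in outputs if y is not None]
pairs = list(zip(xs, ys))
```

In this sketch the third call returns `([2], ["label_0"])`: the x and y returned together do not belong to each other, but matching the i-th x sent with the i-th y sent recovers the correct pairs.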

Parameters:

Name	Type	Description	Default
`message`	`list`	message received from the stream	required

Exceptions:

Type	Description
`NotImplementedError`	Documented for the abstract base method, which subclasses must implement; the implementation in this class does not raise it.

Returns:

Type	Description
`list`	x: instance of the model's input data.
`list`	y: instance of the model's target data.

Source code in ADLStream/data/classification_generator.py

def preprocess(self, message):
    x = message
    y = [message.pop(i) for i in self.label_index]

    if self.labels:
        y = self.one_hot_encoder.transform([y]).toarray()
        y = list(y[0])

    return x, y
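To make the splitting behaviour concrete, the following sketch copies the pop-based logic from the listing above into a stand-alone function (the name `split_message` is hypothetical, and the one-hot step is left out). It highlights two consequences of using `list.pop`: the incoming message is modified in place, and with several indices each pop shifts the positions seen by the next one.

```python
def split_message(message, label_index):
    """Mirror of the pop-based split in preprocess, without one-hot encoding."""
    x = message  # same list object as the input; it is mutated below
    y = [message.pop(i) for i in label_index]
    return x, y


# Default case: the label is the last column.
original = [5.1, 3.5, 1.4, "setosa"]
x, y = split_message(original, [-1])
mutated_in_place = x is original  # the caller's list has lost its label

# Several indices: after pop(0) the list is ["b", "c", "d"], so pop(1) removes "c".
x2, y2 = split_message(["a", "b", "c", "d"], [0, 1])

# Listing non-negative indices from highest to lowest keeps them aligned
# with the original column positions.
x3, y3 = split_message(["a", "b", "c", "d"], [1, 0])
```

If the original message is needed afterwards, passing a copy (for example `list(message)`) avoids the in-place change.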