Transformer

ADLStream.models.Transformer

Parameters:

| Name | Type | Description | Default |
|------|------|-------------|---------|
| `input_shape` | `tuple` | Shape of the input data. | required |
| `output_size` | `int` | Number of neurons of the last layer. | required |
| `loss` | `tf.keras.Loss` | Loss to be used for training. | required |
| `optimizer` | `tf.keras.Optimizer` | Optimizer that implements the training algorithm. Use `"custom"` to get a customized optimizer designed for the transformer model. | required |
| `output_shape` | `tuple` | Shape of the output data. | required |
| `attribute` | `list` | Ordered list of the indexes of the attributes to predict, used when the number of input attributes differs from the number of output attributes. | `None` |
| `num_heads` | `int` | Number of heads of the attention layer. | `4` |
| `num_layers` | `int` | Number of encoder and decoder layers. | `2` |
| `d_model` | `int` | Number of neurons of the dense layer at the beginning of the encoder and decoder. | `16` |
| `dff` | `int` | Number of neurons of the remaining dense layers in the model. | `64` |
| `pe_input` | `int` | Maximum position encoding for the input. | `1000` |
| `pe_target` | `int` | Maximum position encoding for the target (output). | `1000` |
| `dropout_rate` | `float` (between 0 and 1) | Fraction of the dense units to drop. | `0.1` |
| `activation` | `str` or `tf.keras` activation | Activation function for the point-wise feed-forward network. | `'relu'` |

Returns:

| Type | Description |
|------|-------------|
| `tf.keras.Model` | Transformer model. |
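
A minimal usage sketch (the window sizes, loss, and `attribute` choice below are illustrative assumptions, not values prescribed by ADLStream):

```python
import tensorflow as tf

from ADLStream.models import Transformer

# Hypothetical setup: input windows of 10 time steps with 3 attributes,
# forecasting the next 5 values of the first attribute only.
model = Transformer(
    input_shape=(None, 10, 3),  # (batch, time steps, attributes)
    output_size=5,              # total number of predicted values
    loss=tf.keras.losses.MeanAbsoluteError(),
    optimizer="custom",         # Adam with the custom transformer schedule
    output_shape=(5, 1),        # (time steps, attributes) of the target
    attribute=[0],              # predict only the first input attribute
)
```

The returned model is already compiled and its weights are initialized by a dummy forward pass (see the source below), so it is ready for training.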

Source code in ADLStream/models/transformer.py
```python
import numpy as np
import tensorflow as tf

# `TransformerModel` and `CustomSchedule` are defined elsewhere in this module.


def Transformer(
    input_shape,
    output_size,
    loss,
    optimizer,
    output_shape,
    attribute=None,
    num_heads=4,
    num_layers=2,
    d_model=16,
    dff=64,
    pe_input=1000,
    pe_target=1000,
    dropout_rate=0.1,
    activation="relu",
):

    """Transformer

    Args:
        input_shape (tuple): Shape of the input data.
        output_size (int): Number of neurons of the last layer.
        loss (tf.keras.Loss): Loss to be used for training.
        optimizer (tf.keras.Optimizer): Optimizer that implements the training algorithm.
            Use "custom" to get a customized optimizer designed for the transformer model.
        output_shape (tuple): Shape of the output data.
        attribute (list): Ordered list of the indexes of the attributes that we want to predict, if the number of
            attributes of the input is different from the ones of the output.
            Defaults to None.
        num_heads (int): Number of heads of the attention layer.
            Defaults to 4.
        num_layers (int): Number of decoder and encoder layers. Defaults to 2.
        d_model (int): Number of neurons of the dense layer at the beginning
            of the encoder and decoder. Defaults to 16.
        dff (int): Number of neurons of the rest of dense layers in the model.
            Defaults to 64.
        pe_input (int): Maximum position encoding for the input.
            Defaults to 1000.
        pe_target (int): Maximum position encoding for the target (output).
            Defaults to 1000.
        dropout_rate (float between 0 and 1): Fraction of the dense units to drop.
            Defaults to 0.1.
        activation (str or tf.keras activation): Activation function for the
            point-wise feed-forward network. Defaults to "relu".

    Returns:
        tf.keras.Model: Transformer model
    """

    model = TransformerModel(
        attribute=attribute,
        input_size=input_shape[1],
        target_size=output_size,
        target_shape=output_shape,
        num_heads=num_heads,
        num_layers=num_layers,
        d_model=d_model,
        dff=dff,
        pe_input=pe_input,
        pe_target=pe_target,
        dropout_rate=dropout_rate,
        activation=activation,
    )
    if optimizer == "custom":
        # Adam with the custom transformer learning-rate schedule. Note that
        # the schedule is built with the default `d_model` value of 16, not
        # with the `d_model` argument.
        learning_rate = CustomSchedule(16)
        optimizer = tf.keras.optimizers.Adam(
            learning_rate, beta_1=0.9, beta_2=0.98, epsilon=1e-9
        )
    # Build arbitrary dummy inputs matching the expected input and target
    # shapes, so that a first forward pass can create the model weights.
    att_inp = input_shape[-1]  # number of input attributes
    att_out = output_shape[-1]  # number of output attributes

    inp_len = input_shape[1]  # input sequence length
    inp = np.arange(inp_len * att_inp).reshape((1, inp_len, att_inp))
    tar_inp = np.arange(output_size).reshape((1, output_shape[0], att_out))

    model.compile(optimizer=optimizer, loss=loss)
    # First call to the model, with the arbitrary dummy data, in order to
    # initialize the model weights.
    model.call((inp, None, tar_inp), False)

    return model
```
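
When `optimizer="custom"` is passed, training uses Adam driven by `CustomSchedule`. Its definition is not shown on this page; the sketch below assumes it follows the warm-up schedule from "Attention Is All You Need", which is consistent with the Adam hyperparameters used above (the class name and formula are assumptions, not ADLStream's confirmed implementation):

```python
import tensorflow as tf


class WarmupSchedule(tf.keras.optimizers.schedules.LearningRateSchedule):
    """Assumed transformer learning-rate schedule:

    lr = d_model**-0.5 * min(step**-0.5, step * warmup_steps**-1.5)
    """

    def __init__(self, d_model, warmup_steps=4000):
        super().__init__()
        self.d_model = tf.cast(d_model, tf.float32)
        self.warmup_steps = warmup_steps

    def __call__(self, step):
        step = tf.cast(step, tf.float32)
        # Linear warm-up for `warmup_steps` steps, then
        # inverse-square-root decay.
        return tf.math.rsqrt(self.d_model) * tf.minimum(
            tf.math.rsqrt(step), step * self.warmup_steps**-1.5
        )
```

Under this assumption, with `d_model` fixed at 16 as in the source above and the usual 4000 warm-up steps, the learning rate peaks at roughly `(16 * 4000) ** -0.5 ≈ 0.004` at step 4000.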