Transformer

ADLStream.models.Transformer

Parameters:

| Name | Type | Description | Default |
|------|------|-------------|---------|
| `input_shape` | `tuple` | Shape of the input data. | required |
| `output_size` | `int` | Number of neurons of the last layer. | required |
| `loss` | `tf.keras.Loss` | Loss to be used for training. | required |
| `optimizer` | `tf.keras.Optimizer` | Optimizer that implements the training algorithm. Use `"custom"` to get a customized optimizer designed for the transformer model. | required |
| `output_shape` | `tuple` | Shape of the output data. | required |
| `attribute` | `list` | Ordered list of the indexes of the attributes to predict, used when the number of input attributes differs from the number of output attributes. | `None` |
| `num_heads` | `int` | Number of heads of the attention layer. | `4` |
| `num_layers` | `int` | Number of encoder and decoder layers. | `2` |
| `d_model` | `int` | Number of neurons of the dense layer at the beginning of the encoder and decoder. | `16` |
| `dff` | `int` | Number of neurons of the remaining dense layers in the model. | `64` |
| `pe_input` | `int` | Maximum position encoding for the input. | `1000` |
| `pe_target` | `int` | Maximum position encoding for the target (output). | `1000` |
| `dropout_rate` | `float` (between 0 and 1) | Fraction of the dense units to drop. | `0.1` |
| `activation` | `str` or `tf.keras` activation | Activation function for the point-wise feed-forward network. | `'relu'` |

Returns:

| Type | Description |
|------|-------------|
| `tf.keras.Model` | Transformer model. |
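
A minimal usage sketch (the window sizes, loss, and `attribute` choice below are illustrative assumptions, not values prescribed by ADLStream):

```python
import tensorflow as tf

from ADLStream.models import Transformer

# Hypothetical setup: input windows of 10 time steps with 3 attributes,
# forecasting the next 5 values of the first attribute only.
model = Transformer(
    input_shape=(None, 10, 3),  # (batch, time steps, attributes)
    output_size=5,              # total number of predicted values
    loss=tf.keras.losses.MeanAbsoluteError(),
    optimizer="custom",         # Adam with the custom transformer schedule
    output_shape=(5, 1),        # (time steps, attributes) of the target
    attribute=[0],              # predict only the first input attribute
)
```

The returned model is already compiled and its weights are initialized by a dummy forward pass (see the source below), so it is ready for training.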

Source code in ADLStream/models/transformer.py
```python
import numpy as np
import tensorflow as tf

# `TransformerModel` and `CustomSchedule` are defined elsewhere in this module.


def Transformer(
    input_shape,
    output_size,
    loss,
    optimizer,
    output_shape,
    attribute=None,
    num_heads=4,
    num_layers=2,
    d_model=16,
    dff=64,
    pe_input=1000,
    pe_target=1000,
    dropout_rate=0.1,
    activation="relu",
):

    """Transformer

    Args:
        input_shape (tuple): Shape of the input data.
        output_size (int): Number of neurons of the last layer.
        loss (tf.keras.Loss): Loss to be used for training.
        optimizer (tf.keras.Optimizer): Optimizer that implements the training algorithm.
            Use "custom" to get a customized optimizer designed for the transformer model.
        output_shape (tuple): Shape of the output data.
        attribute (list): Ordered list of the indexes of the attributes that we want to predict, if the number of
            attributes of the input is different from the ones of the output.
            Defaults to None.
        num_heads (int): Number of heads of the attention layer.
            Defaults to 4.
        num_layers (int): Number of decoder and encoder layers. Defaults to 2.
        d_model (int): Number of neurons of the dense layer at the beginning
            of the encoder and decoder. Defaults to 16.
        dff (int): Number of neurons of the rest of dense layers in the model.
            Defaults to 64.
        pe_input (int): Maximum position encoding for the input.
            Defaults to 1000.
        pe_target (int): Maximum position encoding for the target (output).
            Defaults to 1000.
        dropout_rate (float between 0 and 1): Fraction of the dense units to drop.
            Defaults to 0.1.
        activation (str or tf.keras activation): Activation function for the
            point-wise feed-forward network. Defaults to "relu".

    Returns:
        tf.keras.Model: Transformer model
    """

    model = TransformerModel(
        attribute=attribute,
        input_size=input_shape[1],
        target_size=output_size,
        target_shape=output_shape,
        num_heads=num_heads,
        num_layers=num_layers,
        d_model=d_model,
        dff=dff,
        pe_input=pe_input,
        pe_target=pe_target,
        dropout_rate=dropout_rate,
        activation=activation,
    )
    if optimizer == "custom":
        # Adam with the custom transformer learning-rate schedule. Note that
        # the schedule is built with the default `d_model` value of 16, not
        # with the `d_model` argument.
        learning_rate = CustomSchedule(16)
        optimizer = tf.keras.optimizers.Adam(
            learning_rate, beta_1=0.9, beta_2=0.98, epsilon=1e-9
        )
    # Build arbitrary dummy inputs matching the expected input and target
    # shapes, so that a first forward pass can create the model weights.
    att_inp = input_shape[-1]  # number of input attributes
    att_out = output_shape[-1]  # number of output attributes

    inp_len = input_shape[1]  # input sequence length
    inp = np.arange(inp_len * att_inp).reshape((1, inp_len, att_inp))
    tar_inp = np.arange(output_size).reshape((1, output_shape[0], att_out))

    model.compile(optimizer=optimizer, loss=loss)
    # First call to the model, with the arbitrary dummy data, in order to
    # initialize the model weights.
    model.call((inp, None, tar_inp), False)

    return model
```
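
When `optimizer="custom"` is passed, training uses Adam driven by `CustomSchedule`. Its definition is not shown on this page; the sketch below assumes it follows the warm-up schedule from "Attention Is All You Need", which is consistent with the Adam hyperparameters used above (the class name and formula are assumptions, not ADLStream's confirmed implementation):

```python
import tensorflow as tf


class WarmupSchedule(tf.keras.optimizers.schedules.LearningRateSchedule):
    """Assumed transformer learning-rate schedule:

    lr = d_model**-0.5 * min(step**-0.5, step * warmup_steps**-1.5)
    """

    def __init__(self, d_model, warmup_steps=4000):
        super().__init__()
        self.d_model = tf.cast(d_model, tf.float32)
        self.warmup_steps = warmup_steps

    def __call__(self, step):
        step = tf.cast(step, tf.float32)
        # Linear warm-up for `warmup_steps` steps, then
        # inverse-square-root decay.
        return tf.math.rsqrt(self.d_model) * tf.minimum(
            tf.math.rsqrt(step), step * self.warmup_steps**-1.5
        )
```

Under this assumption, with `d_model` fixed at 16 as in the source above and the usual 4000 warm-up steps, the learning rate peaks at roughly `(16 * 4000) ** -0.5 ≈ 0.004` at step 4000.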