libmultilabel.nn.networks

Module libmultilabel.nn.networks provides the following neural networks:

class libmultilabel.nn.networks.AttentionXML_0(embed_vecs, num_classes: int, rnn_dim: int, linear_size: list, freeze_embed_training: bool = False, rnn_layers: int = 1, embed_dropout: float = 0.2, encoder_dropout: float = 0, post_encoder_dropout: float = 0.5)[source]
class libmultilabel.nn.networks.AttentionXML_1(embed_vecs, num_classes: int, rnn_dim: int, linear_size: list, freeze_embed_training: bool = False, rnn_layers: int = 1, embed_dropout: float = 0.2, encoder_dropout: float = 0, post_encoder_dropout: float = 0.5)[source]
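
AttentionXML_0 and AttentionXML_1 share the signature above. The sketch below is a minimal, illustrative instantiation of AttentionXML_0: embed_vecs is assumed to hold pre-trained word vectors of shape (vocab_size, embed_dim), as in the other networks in this module, and linear_size is taken here to list the widths of the linear layers after the encoder, which is an assumption rather than something documented above.

    import torch
    from libmultilabel.nn.networks import AttentionXML_0

    # Placeholder pre-trained word vectors: 5,000-word vocabulary, 300-dim embeddings (assumption).
    embed_vecs = torch.randn(5000, 300)

    # Illustrative hyperparameter values only; linear_size is assumed to give the
    # widths of the linear layers that follow the RNN encoder.
    model = AttentionXML_0(
        embed_vecs,
        num_classes=100,
        rnn_dim=512,
        linear_size=[256],
        rnn_layers=1,
        embed_dropout=0.2,
        post_encoder_dropout=0.5,
    )
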
class libmultilabel.nn.networks.BERT(num_classes, encoder_hidden_dropout=0.1, encoder_attention_dropout=0.1, post_encoder_dropout=0, lm_weight='bert-base-cased', lm_window=512, **kwargs)[source]
Parameters
  • num_classes (int) – Total number of classes.

  • encoder_hidden_dropout (float) – The dropout rate of the feed forward sublayer in each BERT layer. Defaults to 0.1.

  • encoder_attention_dropout (float) – The dropout rate of the attention sublayer in each BERT layer. Defaults to 0.1.

  • post_encoder_dropout (float) – The dropout rate of the dropout layer after the BERT model. Defaults to 0.

  • lm_weight (str) – Pretrained model name or path. Defaults to ‘bert-base-cased’.

  • lm_window (int) – Length of the subsequences to be split before feeding them to the language model. Defaults to 512.
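
As a quick illustration of the arguments above, the sketch below builds a BERT network for a 20-class problem. The values are placeholders rather than recommended settings, and lm_weight is assumed to be resolved through Hugging Face Transformers (downloading the pretrained weights on first use).

    from libmultilabel.nn.networks import BERT

    # Illustrative instantiation; 20 classes and a 256-token window are example values.
    model = BERT(
        num_classes=20,
        encoder_hidden_dropout=0.1,
        encoder_attention_dropout=0.1,
        post_encoder_dropout=0.1,
        lm_weight='bert-base-cased',
        lm_window=256,
    )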

class libmultilabel.nn.networks.BERTAttention(num_classes, encoder_hidden_dropout=0.1, encoder_attention_dropout=0.1, post_encoder_dropout=0, lm_weight='bert-base-cased', lm_window=512, num_heads=8, attention_type='multihead', labelwise_attention_dropout=0, **kwargs)[source]

BERT + Label-wise Document Attention or Multi-Head Attention

Parameters
  • num_classes (int) – Total number of classes.

  • encoder_hidden_dropout (float) – The dropout rate of the feed forward sublayer in each BERT layer. Defaults to 0.1.

  • encoder_attention_dropout (float) – The dropout rate of the attention sublayer in each BERT layer. Defaults to 0.1.

  • post_encoder_dropout (float) – The dropout rate of the dropout layer after the BERT model. Defaults to 0.

  • lm_weight (str) – Pretrained model name or path. Defaults to ‘bert-base-cased’.

  • lm_window (int) – Length of the subsequences to be split before feeding them to the language model. Defaults to 512.

  • num_heads (int) – The number of parallel attention heads. Defaults to 8.

  • attention_type (str) – The type of attention to use, either ‘caml’ or ‘multihead’. Defaults to ‘multihead’.

  • labelwise_attention_dropout (float) – The dropout rate for labelwise multi-head attention. Defaults to 0.

lm_feature(input_ids)[source]

BERT takes an input of a sequence of no more than 512 tokens. Therefore, long sequences are split into subsequences of size lm_window, a number no greater than 512. If a split subsequence is shorter than lm_window, it is padded with the pad token.

Parameters
  • input_ids (torch.Tensor) – Input ids of the sequence with shape (batch_size, sequence_length).

Returns
  The representation of the sequence.

Return type
  torch.Tensor
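
The splitting step can be pictured with the standalone sketch below. It is not the library's implementation of lm_feature, only an illustration of how a (batch_size, sequence_length) id tensor can be cut into lm_window-sized windows and padded; pad_id is a hypothetical pad-token id.

    import torch

    def split_into_windows(input_ids: torch.Tensor, lm_window: int, pad_id: int) -> torch.Tensor:
        """Illustration only: reshape (batch_size, sequence_length) token ids into
        windows of at most lm_window tokens, padding the last window with pad_id."""
        batch_size, seq_len = input_ids.shape
        num_windows = -(-seq_len // lm_window)            # ceiling division
        pad_len = num_windows * lm_window - seq_len
        padding = input_ids.new_full((batch_size, pad_len), pad_id)
        padded = torch.cat([input_ids, padding], dim=1)
        # Each window can then be fed to the language model independently.
        return padded.view(batch_size * num_windows, lm_window)

    # Example: a batch of 2 sequences of 700 tokens with lm_window=512 becomes
    # a (4, 512) tensor, i.e. two windows per sequence.
    windows = split_into_windows(torch.ones(2, 700, dtype=torch.long), lm_window=512, pad_id=0)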

class libmultilabel.nn.networks.BiGRULWAN(embed_vecs, num_classes, rnn_dim=512, rnn_layers=1, embed_dropout=0.2, encoder_dropout=0, post_encoder_dropout=0)[source]

BiGRU Labelwise Attention Network

Parameters
  • embed_vecs (torch.Tensor) – The pre-trained word vectors of shape (vocab_size, embed_dim).

  • num_classes (int) – Total number of classes.

  • rnn_dim (int) – The size of bidirectional hidden layers. The hidden size of the GRU network is set to rnn_dim//2. Defaults to 512.

  • rnn_layers (int) – The number of recurrent layers. Defaults to 1.

  • embed_dropout (float) – The dropout rate of the word embedding. Defaults to 0.2.

  • encoder_dropout (float) – The dropout rate of the encoder. Defaults to 0.

  • post_encoder_dropout (float) – The dropout rate of the dropout layer after the encoder. Defaults to 0.
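
A minimal instantiation sketch with placeholder values is shown below; embed_vecs follows the documented (vocab_size, embed_dim) shape. BiLSTMLWAN, documented next, accepts the same arguments.

    import torch
    from libmultilabel.nn.networks import BiGRULWAN

    # Placeholder pre-trained word vectors: 5,000-word vocabulary, 300-dim embeddings.
    embed_vecs = torch.randn(5000, 300)

    model = BiGRULWAN(
        embed_vecs,
        num_classes=50,
        rnn_dim=512,        # each GRU direction uses a hidden size of 512 // 2 = 256
        rnn_layers=1,
        embed_dropout=0.2,
    )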

class libmultilabel.nn.networks.BiLSTMLWAN(embed_vecs, num_classes, rnn_dim=512, rnn_layers=1, embed_dropout=0.2, encoder_dropout=0, post_encoder_dropout=0)[source]

BiLSTM Labelwise Attention Network

Parameters
  • embed_vecs (torch.Tensor) – The pre-trained word vectors of shape (vocab_size, embed_dim).

  • num_classes (int) – Total number of classes.

  • rnn_dim (int) – The size of bidirectional hidden layers. The hidden size of the LSTM network is set to rnn_dim//2. Defaults to 512.

  • rnn_layers (int) – The number of recurrent layers. Defaults to 1.

  • embed_dropout (float) – The dropout rate of the word embedding. Defaults to 0.2.

  • encoder_dropout (float) – The dropout rate of the encoder. Defaults to 0.

  • post_encoder_dropout (float) – The dropout rate of the dropout layer after the encoder. Defaults to 0.

class libmultilabel.nn.networks.BiLSTMLWMHAN(embed_vecs, num_classes, rnn_dim=512, rnn_layers=1, embed_dropout=0.2, encoder_dropout=0, post_encoder_dropout=0, num_heads=8, labelwise_attention_dropout=0)[source]

BiLSTM Labelwise Multihead Attention Network

Parameters
  • embed_vecs (torch.Tensor) – The pre-trained word vectors of shape (vocab_size, embed_dim).

  • num_classes (int) – Total number of classes.

  • rnn_dim (int) – The size of bidirectional hidden layers. The hidden size of the LSTM network is set to rnn_dim//2. Defaults to 512.

  • rnn_layers (int) – The number of recurrent layers. Defaults to 1.

  • embed_dropout (float) – The dropout rate of the word embedding. Defaults to 0.2.

  • encoder_dropout (float) – The dropout rate of the encoder. Defaults to 0.

  • post_encoder_dropout (float) – The dropout rate of the dropout layer after the encoder. Defaults to 0.

  • num_heads (int) – The number of parallel attention heads. Defaults to 8.

  • labelwise_attention_dropout (float) – The dropout rate for the attention. Defaults to 0.
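
An illustrative instantiation of the multi-head variant follows. Standard multi-head attention requires the attended dimension to be divisible by num_heads, so rnn_dim=512 with num_heads=8 is used here; that divisibility constraint is an assumption based on typical multi-head attention, not something stated above.

    import torch
    from libmultilabel.nn.networks import BiLSTMLWMHAN

    embed_vecs = torch.randn(5000, 300)   # placeholder pre-trained word vectors

    model = BiLSTMLWMHAN(
        embed_vecs,
        num_classes=50,
        rnn_dim=512,
        num_heads=8,                      # 512 is divisible by 8
        labelwise_attention_dropout=0.1,
    )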

class libmultilabel.nn.networks.CAML(embed_vecs, num_classes, filter_sizes=None, num_filter_per_size=50, embed_dropout=0.2)[source]

CAML (Convolutional Attention for Multi-Label classification)

Follows the work of Mullenbach et al. (https://aclanthology.org/N18-1100.pdf). This class is for reproducing the results in the paper. Use CNNLWAN instead for better modularization.

Parameters
  • embed_vecs (torch.Tensor) – The pre-trained word vectors of shape (vocab_size, embed_dim).

  • num_classes (int) – Total number of classes.

  • filter_sizes (list) – Size of convolutional filters.

  • num_filter_per_size (int) – The number of convolutional filters for each filter size. Defaults to 50.

  • embed_dropout (float) – The dropout rate of the word embedding. Defaults to 0.2.

class libmultilabel.nn.networks.CNNLWAN(embed_vecs, num_classes, filter_sizes=None, num_filter_per_size=50, embed_dropout=0.2, post_encoder_dropout=0, activation='tanh')[source]

CNN Labelwise Attention Network

Parameters
  • embed_vecs (torch.Tensor) – The pre-trained word vectors of shape (vocab_size, embed_dim).

  • num_classes (int) – Total number of classes.

  • filter_sizes (list) – Size of convolutional filters.

  • num_filter_per_size (int) – The number of convolutional filters for each filter size. Defaults to 50.

  • embed_dropout (float) – The dropout rate of the word embedding. Defaults to 0.2.

  • post_encoder_dropout (float) – The dropout rate of the encoder output. Defaults to 0.

  • activation (str) – Activation function to be used. Defaults to ‘tanh’.
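
An illustrative instantiation is shown below; filter_sizes has no documented default, so a small list of convolution window widths is passed explicitly (the values are placeholders). KimCNN, documented next, is constructed the same way.

    import torch
    from libmultilabel.nn.networks import CNNLWAN

    embed_vecs = torch.randn(5000, 300)   # placeholder pre-trained word vectors

    model = CNNLWAN(
        embed_vecs,
        num_classes=50,
        filter_sizes=[2, 4, 8],           # example convolution window widths
        num_filter_per_size=50,
        activation='tanh',
    )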

class libmultilabel.nn.networks.KimCNN(embed_vecs, num_classes, filter_sizes=None, num_filter_per_size=128, embed_dropout=0.2, post_encoder_dropout=0, activation='relu')[source]
Parameters
  • embed_vecs (torch.Tensor) – The pre-trained word vectors of shape (vocab_size, embed_dim).

  • num_classes (int) – Total number of classes.

  • filter_sizes (list) – The size of convolutional filters.

  • num_filter_per_size (int) – The number of convolutional filters for each filter size. Defaults to 128.

  • embed_dropout (float) – The dropout rate of the word embedding. Defaults to 0.2.

  • post_encoder_dropout (float) – The dropout rate of the encoder output. Defaults to 0.

  • activation (str) – Activation function to be used. Defaults to ‘relu’.

class libmultilabel.nn.networks.XMLCNN(embed_vecs, num_classes, embed_dropout=0.2, post_encoder_dropout=0, filter_sizes=None, hidden_dim=512, num_filter_per_size=256, num_pool=2, activation='relu')[source]

XML-CNN

Parameters
  • embed_vecs (torch.Tensor) – The pre-trained word vectors of shape (vocab_size, embed_dim).

  • num_classes (int) – Total number of classes.

  • embed_dropout (float) – The dropout rate of the word embedding. Defaults to 0.2.

  • post_encoder_dropout (float) – The dropout rate of the hidden layer output. Defaults to 0.

  • filter_sizes (list) – Size of convolutional filters.

  • hidden_dim (int) – Dimension of the hidden layer. Defaults to 512.

  • num_filter_per_size (int) – The number of convolutional filters for each filter size. Defaults to 256.

  • num_pool (int) – The number of pooling chunks for dynamic max-pooling. Defaults to 2.

  • activation (str) – Activation function to be used. Defaults to ‘relu’.
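
Putting the arguments together, an XML-CNN instance might look like the sketch below; as with the other CNN networks, filter_sizes is supplied explicitly and all values are placeholders rather than tuned settings.

    import torch
    from libmultilabel.nn.networks import XMLCNN

    embed_vecs = torch.randn(5000, 300)   # placeholder pre-trained word vectors

    model = XMLCNN(
        embed_vecs,
        num_classes=500,
        filter_sizes=[2, 4, 8],           # example convolution window widths
        hidden_dim=512,
        num_filter_per_size=256,
        num_pool=2,                       # dynamic max-pooling with 2 chunks per feature map
        activation='relu',
    )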