libmultilabel.nn.networks

Module libmultilabel.nn.networks provides the following neural networks:

class libmultilabel.nn.networks.AttentionXML_0(embed_vecs, num_classes: int, rnn_dim: int, linear_size: list, freeze_embed_training: bool = False, rnn_layers: int = 1, embed_dropout: float = 0.2, encoder_dropout: float = 0, post_encoder_dropout: float = 0.5)[source]
class libmultilabel.nn.networks.AttentionXML_1(embed_vecs, num_classes: int, rnn_dim: int, linear_size: list, freeze_embed_training: bool = False, rnn_layers: int = 1, embed_dropout: float = 0.2, encoder_dropout: float = 0, post_encoder_dropout: float = 0.5)[source]
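
AttentionXML_0 and AttentionXML_1 share the signature above. The sketch below is a minimal, illustrative instantiation of AttentionXML_0: embed_vecs is assumed to hold pre-trained word vectors of shape (vocab_size, embed_dim), as in the other networks in this module, and linear_size is taken here to list the widths of the linear layers after the encoder, which is an assumption rather than something documented above.

    import torch
    from libmultilabel.nn.networks import AttentionXML_0

    # Placeholder pre-trained word vectors: 5,000-word vocabulary, 300-dim embeddings (assumption).
    embed_vecs = torch.randn(5000, 300)

    # Illustrative hyperparameter values only; linear_size is assumed to give the
    # widths of the linear layers that follow the RNN encoder.
    model = AttentionXML_0(
        embed_vecs,
        num_classes=100,
        rnn_dim=512,
        linear_size=[256],
        rnn_layers=1,
        embed_dropout=0.2,
        post_encoder_dropout=0.5,
    )
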
class libmultilabel.nn.networks.BERT(num_classes, encoder_hidden_dropout=0.1, encoder_attention_dropout=0.1, post_encoder_dropout=0, lm_weight='bert-base-cased', lm_window=512, **kwargs)[source]
Parameters
  • num_classes (int) – Total number of classes.

  • encoder_hidden_dropout (float) – The dropout rate of the feed forward sublayer in each BERT layer. Defaults to 0.1.

  • encoder_attention_dropout (float) – The dropout rate of the attention sublayer in each BERT layer. Defaults to 0.1.

  • post_encoder_dropout (float) – The dropout rate of the dropout layer after the BERT model. Defaults to 0.

  • lm_weight (str) – Pretrained model name or path. Defaults to ‘bert-base-cased’.

  • lm_window (int) – Length of the subsequences to be split before feeding them to the language model. Defaults to 512.
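
As a quick illustration of the arguments above, the sketch below builds a BERT network for a 20-class problem. The values are placeholders rather than recommended settings, and lm_weight is assumed to be resolved through Hugging Face Transformers (downloading the pretrained weights on first use).

    from libmultilabel.nn.networks import BERT

    # Illustrative instantiation; 20 classes and a 256-token window are example values.
    model = BERT(
        num_classes=20,
        encoder_hidden_dropout=0.1,
        encoder_attention_dropout=0.1,
        post_encoder_dropout=0.1,
        lm_weight='bert-base-cased',
        lm_window=256,
    )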

class libmultilabel.nn.networks.BERTAttention(num_classes, encoder_hidden_dropout=0.1, encoder_attention_dropout=0.1, post_encoder_dropout=0, lm_weight='bert-base-cased', lm_window=512, num_heads=8, attention_type='multihead', labelwise_attention_dropout=0, **kwargs)[source]

BERT + Label-wise Document Attention or Multi-Head Attention

Parameters
  • num_classes (int) – Total number of classes.

  • encoder_hidden_dropout (float) – The dropout rate of the feed forward sublayer in each BERT layer. Defaults to 0.1.

  • encoder_attention_dropout (float) – The dropout rate of the attention sublayer in each BERT layer. Defaults to 0.1.

  • post_encoder_dropout (float) – The dropout rate of the dropout layer after the BERT model. Defaults to 0.

  • lm_weight (str) – Pretrained model name or path. Defaults to ‘bert-base-cased’.

  • lm_window (int) – Length of the subsequences to be split before feeding them to the language model. Defaults to 512.

  • num_heads (int) – The number of parallel attention heads. Defaults to 8.

  • attention_type (str) – The type of attention to use, either ‘caml’ or ‘multihead’. Defaults to ‘multihead’.

  • labelwise_attention_dropout (float) – The dropout rate for labelwise multi-head attention. Defaults to 0.

lm_feature(input_ids)[source]

BERT takes an input of a sequence of no more than 512 tokens. Therefore, long sequences are split into subsequences of size lm_window, a number no greater than 512. If a split subsequence is shorter than lm_window, it is padded with the pad token.

Parameters
  • input_ids (torch.Tensor) – Input ids of the sequence with shape (batch_size, sequence_length).

Returns
  The representation of the sequence.

Return type
  torch.Tensor
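
The splitting step can be pictured with the standalone sketch below. It is not the library's implementation of lm_feature, only an illustration of how a (batch_size, sequence_length) id tensor can be cut into lm_window-sized windows and padded; pad_id is a hypothetical pad-token id.

    import torch

    def split_into_windows(input_ids: torch.Tensor, lm_window: int, pad_id: int) -> torch.Tensor:
        """Illustration only: reshape (batch_size, sequence_length) token ids into
        windows of at most lm_window tokens, padding the last window with pad_id."""
        batch_size, seq_len = input_ids.shape
        num_windows = -(-seq_len // lm_window)            # ceiling division
        pad_len = num_windows * lm_window - seq_len
        padding = input_ids.new_full((batch_size, pad_len), pad_id)
        padded = torch.cat([input_ids, padding], dim=1)
        # Each window can then be fed to the language model independently.
        return padded.view(batch_size * num_windows, lm_window)

    # Example: a batch of 2 sequences of 700 tokens with lm_window=512 becomes
    # a (4, 512) tensor, i.e. two windows per sequence.
    windows = split_into_windows(torch.ones(2, 700, dtype=torch.long), lm_window=512, pad_id=0)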

class libmultilabel.nn.networks.BiGRULWAN(embed_vecs, num_classes, rnn_dim=512, rnn_layers=1, embed_dropout=0.2, encoder_dropout=0, post_encoder_dropout=0)[source]

BiGRU Labelwise Attention Network

Parameters
  • embed_vecs (torch.Tensor) – The pre-trained word vectors of shape (vocab_size, embed_dim).

  • num_classes (int) – Total number of classes.

  • rnn_dim (int) – The size of bidirectional hidden layers. The hidden size of the GRU network is set to rnn_dim//2. Defaults to 512.

  • rnn_layers (int) – The number of recurrent layers. Defaults to 1.

  • embed_dropout (float) – The dropout rate of the word embedding. Defaults to 0.2.

  • encoder_dropout (float) – The dropout rate of the encoder. Defaults to 0.

  • post_encoder_dropout (float) – The dropout rate of the dropout layer after the encoder. Defaults to 0.
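
A minimal instantiation sketch with placeholder values is shown below; embed_vecs follows the documented (vocab_size, embed_dim) shape. BiLSTMLWAN, documented next, accepts the same arguments.

    import torch
    from libmultilabel.nn.networks import BiGRULWAN

    # Placeholder pre-trained word vectors: 5,000-word vocabulary, 300-dim embeddings.
    embed_vecs = torch.randn(5000, 300)

    model = BiGRULWAN(
        embed_vecs,
        num_classes=50,
        rnn_dim=512,        # each GRU direction uses a hidden size of 512 // 2 = 256
        rnn_layers=1,
        embed_dropout=0.2,
    )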

class libmultilabel.nn.networks.BiLSTMLWAN(embed_vecs, num_classes, rnn_dim=512, rnn_layers=1, embed_dropout=0.2, encoder_dropout=0, post_encoder_dropout=0)[source]

BiLSTM Labelwise Attention Network

Parameters
  • embed_vecs (torch.Tensor) – The pre-trained word vectors of shape (vocab_size, embed_dim).

  • num_classes (int) – Total number of classes.

  • rnn_dim (int) – The size of bidirectional hidden layers. The hidden size of the LSTM network is set to rnn_dim//2. Defaults to 512.

  • rnn_layers (int) – The number of recurrent layers. Defaults to 1.

  • embed_dropout (float) – The dropout rate of the word embedding. Defaults to 0.2.

  • encoder_dropout (float) – The dropout rate of the encoder. Defaults to 0.

  • post_encoder_dropout (float) – The dropout rate of the dropout layer after the encoder. Defaults to 0.

class libmultilabel.nn.networks.BiLSTMLWMHAN(embed_vecs, num_classes, rnn_dim=512, rnn_layers=1, embed_dropout=0.2, encoder_dropout=0, post_encoder_dropout=0, num_heads=8, labelwise_attention_dropout=0)[source]

BiLSTM Labelwise Multihead Attention Network

Parameters
  • embed_vecs (torch.Tensor) – The pre-trained word vectors of shape (vocab_size, embed_dim).

  • num_classes (int) – Total number of classes.

  • rnn_dim (int) – The size of bidirectional hidden layers. The hidden size of the LSTM network is set to rnn_dim//2. Defaults to 512.

  • rnn_layers (int) – The number of recurrent layers. Defaults to 1.

  • embed_dropout (float) – The dropout rate of the word embedding. Defaults to 0.2.

  • encoder_dropout (float) – The dropout rate of the encoder. Defaults to 0.

  • post_encoder_dropout (float) – The dropout rate of the dropout layer after the encoder. Defaults to 0.

  • num_heads (int) – The number of parallel attention heads. Defaults to 8.

  • labelwise_attention_dropout (float) – The dropout rate for the attention. Defaults to 0.
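
An illustrative instantiation of the multi-head variant follows. Standard multi-head attention requires the attended dimension to be divisible by num_heads, so rnn_dim=512 with num_heads=8 is used here; that divisibility constraint is an assumption based on typical multi-head attention, not something stated above.

    import torch
    from libmultilabel.nn.networks import BiLSTMLWMHAN

    embed_vecs = torch.randn(5000, 300)   # placeholder pre-trained word vectors

    model = BiLSTMLWMHAN(
        embed_vecs,
        num_classes=50,
        rnn_dim=512,
        num_heads=8,                      # 512 is divisible by 8
        labelwise_attention_dropout=0.1,
    )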

class libmultilabel.nn.networks.CAML(embed_vecs, num_classes, filter_sizes=None, num_filter_per_size=50, embed_dropout=0.2)[source]

CAML (Convolutional Attention for Multi-Label classification)

Follows the work of Mullenbach et al. (https://aclanthology.org/N18-1100.pdf). This class is for reproducing the results in the paper. Use CNNLWAN instead for better modularization.

Parameters
  • embed_vecs (torch.Tensor) – The pre-trained word vectors of shape (vocab_size, embed_dim).

  • num_classes (int) – Total number of classes.

  • filter_sizes (list) – Size of convolutional filters.

  • num_filter_per_size (int) – The number of convolutional filters for each filter size. Defaults to 50.

  • embed_dropout (float) – The dropout rate of the word embedding. Defaults to 0.2.

class libmultilabel.nn.networks.CNNLWAN(embed_vecs, num_classes, filter_sizes=None, num_filter_per_size=50, embed_dropout=0.2, post_encoder_dropout=0, activation='tanh')[source]

CNN Labelwise Attention Network

Parameters
  • embed_vecs (torch.Tensor) – The pre-trained word vectors of shape (vocab_size, embed_dim).

  • num_classes (int) – Total number of classes.

  • filter_sizes (list) – Size of convolutional filters.

  • num_filter_per_size (int) – The number of convolutional filters for each filter size. Defaults to 50.

  • embed_dropout (float) – The dropout rate of the word embedding. Defaults to 0.2.

  • post_encoder_dropout (float) – The dropout rate of the encoder output. Defaults to 0.

  • activation (str) – Activation function to be used. Defaults to ‘tanh’.
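
An illustrative instantiation is shown below; filter_sizes has no documented default, so a small list of convolution window widths is passed explicitly (the values are placeholders). KimCNN, documented next, is constructed the same way.

    import torch
    from libmultilabel.nn.networks import CNNLWAN

    embed_vecs = torch.randn(5000, 300)   # placeholder pre-trained word vectors

    model = CNNLWAN(
        embed_vecs,
        num_classes=50,
        filter_sizes=[2, 4, 8],           # example convolution window widths
        num_filter_per_size=50,
        activation='tanh',
    )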

class libmultilabel.nn.networks.KimCNN(embed_vecs, num_classes, filter_sizes=None, num_filter_per_size=128, embed_dropout=0.2, post_encoder_dropout=0, activation='relu')[source]
Parameters
  • embed_vecs (torch.Tensor) – The pre-trained word vectors of shape (vocab_size, embed_dim).

  • num_classes (int) – Total number of classes.

  • filter_sizes (list) – The size of convolutional filters.

  • num_filter_per_size (int) – The number of convolutional filters for each filter size. Defaults to 128.

  • embed_dropout (float) – The dropout rate of the word embedding. Defaults to 0.2.

  • post_encoder_dropout (float) – The dropout rate of the encoder output. Defaults to 0.

  • activation (str) – Activation function to be used. Defaults to ‘relu’.

class libmultilabel.nn.networks.XMLCNN(embed_vecs, num_classes, embed_dropout=0.2, post_encoder_dropout=0, filter_sizes=None, hidden_dim=512, num_filter_per_size=256, num_pool=2, activation='relu')[source]

XML-CNN

Parameters
  • embed_vecs (torch.Tensor) – The pre-trained word vectors of shape (vocab_size, embed_dim).

  • num_classes (int) – Total number of classes.

  • embed_dropout (float) – The dropout rate of the word embedding. Defaults to 0.2.

  • post_encoder_dropout (float) – The dropout rate of the hidden layer output. Defaults to 0.

  • filter_sizes (list) – Size of convolutional filters.

  • hidden_dim (int) – Dimension of the hidden layer. Defaults to 512.

  • num_filter_per_size (int) – The number of convolutional filters for each filter size. Defaults to 256.

  • num_pool (int) – The number of pooling chunks for dynamic max-pooling. Defaults to 2.

  • activation (str) – Activation function to be used. Defaults to ‘relu’.
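
Putting the arguments together, an XML-CNN instance might look like the sketch below; as with the other CNN networks, filter_sizes is supplied explicitly and all values are placeholders rather than tuned settings.

    import torch
    from libmultilabel.nn.networks import XMLCNN

    embed_vecs = torch.randn(5000, 300)   # placeholder pre-trained word vectors

    model = XMLCNN(
        embed_vecs,
        num_classes=500,
        filter_sizes=[2, 4, 8],           # example convolution window widths
        hidden_dim=512,
        num_filter_per_size=256,
        num_pool=2,                       # dynamic max-pooling with 2 chunks per feature map
        activation='relu',
    )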