torchnlp.nn package

The torchnlp.nn package provides a set of torch.nn.Module subclasses commonly used in NLP.

class torchnlp.nn.LockedDropout(p=0.5)

LockedDropout applies the same dropout mask to every time step.

Thank you to Salesforce for their initial implementation of LockedDropout. Here is their License.

Parameters: p (float) – Probability that an element of the dropout mask is zeroed.
forward(x)
Parameters: x (torch.FloatTensor [sequence length, batch size, rnn hidden size]) – Input to apply dropout to.
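
The idea of sharing one mask across time steps can be sketched in plain PyTorch. This is a minimal illustration, not the library's implementation: the function name locked_dropout and the rescaling by 1 / (1 - p) are assumptions, and p is assumed to satisfy 0 <= p < 1.

```python
import torch


def locked_dropout(x: torch.Tensor, p: float = 0.5, training: bool = True) -> torch.Tensor:
    """Hypothetical helper: apply one dropout mask to every time step of x.

    x has shape [sequence length, batch size, rnn hidden size].
    """
    if not training or p == 0.0:
        return x
    keep_prob = 1.0 - p
    # Sample a mask for a single time step: shape [1, batch size, hidden size].
    probs = torch.full((1, x.size(1), x.size(2)), keep_prob, dtype=x.dtype, device=x.device)
    mask = torch.bernoulli(probs)
    # Rescale kept elements so the expected value of each element is unchanged
    # (an assumption of this sketch).
    mask = mask / keep_prob
    # Broadcasting reuses the same mask along the sequence dimension.
    return x * mask


torch.manual_seed(0)
x = torch.ones(4, 2, 3)
y = locked_dropout(x, p=0.5)
```

Because the mask has a leading dimension of 1, every slice y[t] is zeroed in the same positions, which is what distinguishes this from ordinary dropout applied independently at each time step.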
class torchnlp.nn.Attention(dimensions, attention_type='general')

Applies attention mechanism on the context using the query.

Thank you to IBM for their initial implementation of Attention. Here is their License.

Parameters:
  • dimensions (int) – Dimensionality of the query and context.
  • attention_type (str, optional) –

    How to compute the attention score:

    • dot: \(score(H_j,q) = H_j^T q\)
    • general: \(score(H_j, q) = H_j^T W_a q\)

Example

>>> attention = Attention(256)
>>> query = torch.randn(5, 1, 256)
>>> context = torch.randn(5, 5, 256)
>>> output, weights = attention(query, context)
>>> output.size()
torch.Size([5, 1, 256])
>>> weights.size()
torch.Size([5, 1, 5])
forward(query, context)
Parameters:
  • query (torch.FloatTensor [batch size, output length, dimensions]) – Sequence of queries to query the context.
  • context (torch.FloatTensor [batch size, query length, dimensions]) – Data over which to apply the attention mechanism.
Returns:

  • output (torch.FloatTensor [batch size, output length, dimensions]): Tensor containing the attended features.
  • weights (torch.FloatTensor [batch size, output length, query length]): Tensor containing attention weights.

Return type:

tuple with output and weights
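
The two score functions and the resulting tensor shapes can be sketched as follows. This is a simplified illustration under stated assumptions, not the library's code: the name attention_sketch and the matrix W_a passed as an argument are hypothetical, the scores are assumed to be normalized with a softmax over the context positions, and the library's output may apply further transformations to the attended features.

```python
import torch


def attention_sketch(query, context, W_a=None):
    """Hypothetical sketch of dot / general attention.

    query:   [batch size, output length, dimensions]
    context: [batch size, query length, dimensions]
    W_a:     optional [dimensions, dimensions] matrix for the "general" score
    """
    if W_a is not None:
        # general: score(H_j, q) = H_j^T W_a q, so transform each query by W_a first.
        query = query @ W_a.T
    # scores[b, i, j] = H_j^T q_i -> [batch size, output length, query length]
    scores = torch.bmm(query, context.transpose(1, 2))
    # Assumption: attention weights are a softmax over the context positions.
    weights = torch.softmax(scores, dim=-1)
    # Weighted sum of context vectors -> [batch size, output length, dimensions]
    attended = torch.bmm(weights, context)
    return attended, weights


torch.manual_seed(0)
query = torch.randn(5, 1, 256)
context = torch.randn(5, 5, 256)
attended, weights = attention_sketch(query, context, W_a=torch.randn(256, 256))
```

Omitting W_a gives the dot score; in both cases the shapes match those documented for forward.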

class torchnlp.nn.CNNEncoder(embedding_dim, num_filters, ngram_filter_sizes=(2, 3, 4, 5), conv_layer_activation=ReLU(), output_dim=None)

A combination of multiple convolution layers and max pooling layers.

The CNN has one convolution layer for each ngram filter size. Each convolution operation produces a vector of size num_filters, and each convolution layer is applied num_tokens - ngram_size + 1 times. The corresponding max-pooling layer aggregates these outputs and returns their element-wise maximum.

This operation is repeated for every ngram size passed, so the dimensionality of the output after max pooling is len(ngram_filter_sizes) * num_filters. If output_dim is specified, a fully connected layer then projects this vector to output_dim.

For more details, refer to “A Sensitivity Analysis of (and Practitioners’ Guide to) Convolutional Neural Networks for Sentence Classification”, Zhang and Wallace 2016, particularly Figure 1.
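
The shape arithmetic described above can be checked with a small worked example. The helper cnn_encoder_shapes is hypothetical and only reproduces the formulas from the description; it does not build any layers.

```python
def cnn_encoder_shapes(num_tokens, num_filters, ngram_filter_sizes=(2, 3, 4, 5), output_dim=None):
    """Hypothetical helper: compute the sizes implied by the description above."""
    # Number of positions at which each convolution is applied, per ngram size.
    positions = {n: num_tokens - n + 1 for n in ngram_filter_sizes}
    # Max pooling reduces each layer's outputs to one vector of size num_filters,
    # and the pooled vectors of all ngram sizes are concatenated.
    pooled_dim = len(ngram_filter_sizes) * num_filters
    # Without output_dim, the pooled vector is returned as is.
    final_dim = pooled_dim if output_dim is None else output_dim
    return positions, pooled_dim, final_dim


positions, pooled_dim, final_dim = cnn_encoder_shapes(num_tokens=10, num_filters=100)
print(positions)   # {2: 9, 3: 8, 4: 7, 5: 6}
print(pooled_dim)  # 400
print(final_dim)   # 400
```

For a 10-token sequence with the default filter sizes and 100 filters, the pooled output has 4 * 100 = 400 features; passing output_dim=64 would leave pooled_dim at 400 and set final_dim to 64.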

Thank you to AI2 for their initial implementation of CNNEncoder. Here is their License.

Parameters:
  • embedding_dim (int) – This is the input dimension to the encoder. We need this because we can’t do shape inference in PyTorch, and we need to know what size filters to construct in the CNN.
  • num_filters (int) – This is the output dim for each convolutional layer, which is the number of “filters” learned by that layer.
  • ngram_filter_sizes (tuple of int, optional) – This specifies both the number of convolutional layers we will create and their sizes. The default of (2, 3, 4, 5) will have four convolutional layers, corresponding to encoding ngrams of size 2 to 5 with some number of filters.
  • conv_layer_activation (torch.nn.Module, optional) – Activation to use after the convolution layers.
  • output_dim (int or None, optional) – After doing convolutions and pooling, we’ll project the collected features into a vector of this size. If this value is None, we will just return the result of the max pooling, giving an output of shape len(ngram_filter_sizes) * num_filters.
forward(tokens, mask=None)
Parameters:
  • tokens (torch.FloatTensor [batch_size, num_tokens, input_dim]) – Sequence matrix to encode.
  • mask (torch.FloatTensor) – Mask to apply to tokens; must be broadcastable to the shape of tokens.
Returns:

Encoding of sequence.

Return type:

(torch.FloatTensor [batch_size, output_dim])

get_input_dim()
get_output_dim()
class torchnlp.nn.WeightDrop(module, weights, dropout=0.0)

The weight-dropped module applies recurrent regularization through a DropConnect mask on the hidden-to-hidden recurrent weights.

Thank you to Salesforce for their initial implementation of WeightDrop. Here is their License.

Parameters:
  • module (torch.nn.Module) – Containing module.
  • weights (list of str) – Names of the module weight parameters to apply dropout to.
  • dropout (float) – The probability a weight will be dropped.

Example

>>> from torchnlp.nn import WeightDrop
>>> import torch
>>>
>>> torch.manual_seed(123)
<torch._C.Generator object ...
>>>
>>> gru = torch.nn.GRUCell(2, 2)
>>> weights = ['weight_hh']
>>> weight_drop_gru = WeightDrop(gru, weights, dropout=0.9)
>>>
>>> input_ = torch.randn(3, 2)
>>> hidden_state = torch.randn(3, 2)
>>> weight_drop_gru(input_, hidden_state)
tensor(... grad_fn=<AddBackward0>)
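
To make the DropConnect idea concrete, the following sketch applies dropout to the hidden-to-hidden weight matrix of a hand-written tanh RNN step. It is an illustration under stated assumptions rather than the library's implementation: the function rnn_step_with_weight_drop and the weight names w_ih and w_hh are hypothetical, and a plain tanh cell stands in for the GRU cell used in the example above.

```python
import torch
import torch.nn.functional as F


def rnn_step_with_weight_drop(x, h, w_ih, w_hh, p, training=True):
    """Hypothetical sketch: one tanh RNN step with DropConnect on w_hh."""
    # DropConnect zeroes individual weights rather than activations.
    # F.dropout also rescales the kept weights by 1 / (1 - p) during training.
    w_hh_dropped = F.dropout(w_hh, p=p, training=training)
    return torch.tanh(x @ w_ih.T + h @ w_hh_dropped.T)


torch.manual_seed(0)
x = torch.randn(3, 2)      # [batch size, input size]
h = torch.randn(3, 2)      # [batch size, hidden size]
w_ih = torch.randn(2, 2)   # input-to-hidden weights
w_hh = torch.randn(2, 2)   # hidden-to-hidden weights

h_train = rnn_step_with_weight_drop(x, h, w_ih, w_hh, p=0.9)
h_eval = rnn_step_with_weight_drop(x, h, w_ih, w_hh, p=0.9, training=False)
```

In evaluation mode the mask is disabled, so h_eval equals the result of the unregularized step.
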
class torchnlp.nn.WeightDropGRU(*args, weight_dropout=0.0, **kwargs)

Wrapper around torch.nn.GRU that adds a weight_dropout keyword argument.

Parameters: weight_dropout (float) – The probability a weight will be dropped.
class torchnlp.nn.WeightDropLSTM(*args, weight_dropout=0.0, **kwargs)

Wrapper around torch.nn.LSTM that adds a weight_dropout keyword argument.

Parameters: weight_dropout (float) – The probability a weight will be dropped.
class torchnlp.nn.WeightDropLinear(*args, weight_dropout=0.0, **kwargs)

Wrapper around torch.nn.Linear that adds a weight_dropout keyword argument.

Parameters: weight_dropout (float) – The probability a weight will be dropped.
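
The wrapper pattern, in which all positional and keyword arguments are forwarded to the wrapped layer and only weight_dropout is consumed, might look like the sketch below. The class DropConnectLinear is hypothetical and written against plain torch.nn.Linear; it is not the library's WeightDropLinear and may differ from it in detail.

```python
import torch
import torch.nn.functional as F


class DropConnectLinear(torch.nn.Linear):
    """Hypothetical sketch of a Linear layer with a weight_dropout keyword argument."""

    def __init__(self, *args, weight_dropout=0.0, **kwargs):
        # All other arguments go to torch.nn.Linear unchanged.
        super().__init__(*args, **kwargs)
        self.weight_dropout = weight_dropout

    def forward(self, x):
        # Drop individual weights only while the module is in training mode.
        weight = F.dropout(self.weight, p=self.weight_dropout, training=self.training)
        return F.linear(x, weight, self.bias)


torch.manual_seed(0)
layer = DropConnectLinear(4, 3, weight_dropout=0.5)
x = torch.randn(2, 4)

layer.train()
out_train = layer(x)  # a random subset of weights is zeroed

layer.eval()
out_eval = layer(x)   # identical to an ordinary Linear layer
```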