{
"cells": [
{
"cell_type": "markdown",
"id": "81d18467",
"metadata": {
"origin_pos": 0
},
"source": [
"# Sentiment Analysis: Using Convolutional Neural Networks\n",
":label:`sec_sentiment_cnn` \n",
"\n",
"\n",
"In :numref:`chap_cnn`,\n",
"we investigated mechanisms\n",
"for processing\n",
"two-dimensional image data\n",
"with two-dimensional CNNs,\n",
"which were applied to\n",
"local features such as adjacent pixels.\n",
"Though originally\n",
"designed for computer vision,\n",
"CNNs are also widely used\n",
"for natural language processing.\n",
"Simply put,\n",
"just think of any text sequence\n",
"as a one-dimensional image.\n",
"In this way,\n",
"one-dimensional CNNs\n",
"can process local features\n",
"such as $n$-grams in text.\n",
"\n",
"In this section,\n",
"we will use the *textCNN* model\n",
"to demonstrate\n",
"how to design a CNN architecture\n",
"for representing single text :cite:`Kim.2014`.\n",
"Compared with\n",
":numref:`fig_nlp-map-sa-rnn`\n",
"that uses an RNN architecture with GloVe pretraining\n",
"for sentiment analysis,\n",
"the only difference in :numref:`fig_nlp-map-sa-cnn`\n",
"lies in\n",
"the choice of the architecture.\n",
"\n",
"\n",
"\n",
":label:`fig_nlp-map-sa-cnn`\n"
]
},
{
"cell_type": "code",
"execution_count": 1,
"id": "4b47a220",
"metadata": {
"execution": {
"iopub.execute_input": "2023-08-18T19:32:10.301477Z",
"iopub.status.busy": "2023-08-18T19:32:10.300459Z",
"iopub.status.idle": "2023-08-18T19:32:51.806728Z",
"shell.execute_reply": "2023-08-18T19:32:51.805738Z"
},
"origin_pos": 2,
"tab": [
"pytorch"
]
},
"outputs": [],
"source": [
"import torch\n",
"from torch import nn\n",
"from d2l import torch as d2l\n",
"\n",
"batch_size = 64\n",
"train_iter, test_iter, vocab = d2l.load_data_imdb(batch_size)"
]
},
{
"cell_type": "markdown",
"id": "5030f206",
"metadata": {
"origin_pos": 3
},
"source": [
"## One-Dimensional Convolutions\n",
"\n",
"Before introducing the model,\n",
"let's see how a one-dimensional convolution works.\n",
"Bear in mind that it is just a special case\n",
"of a two-dimensional convolution\n",
"based on the cross-correlation operation.\n",
"\n",
"\n",
":label:`fig_conv1d`\n",
"\n",
"As shown in :numref:`fig_conv1d`,\n",
"in the one-dimensional case,\n",
"the convolution window\n",
"slides from left to right\n",
"across the input tensor.\n",
"During sliding,\n",
"the input subtensor (e.g., $0$ and $1$ in :numref:`fig_conv1d`) contained in the convolution window\n",
"at a certain position\n",
"and the kernel tensor (e.g., $1$ and $2$ in :numref:`fig_conv1d`) are multiplied elementwise.\n",
"The sum of these multiplications\n",
"gives the single scalar value (e.g., $0\\times1+1\\times2=2$ in :numref:`fig_conv1d`)\n",
"at the corresponding position of the output tensor.\n",
"\n",
"We implement one-dimensional cross-correlation in the following `corr1d` function.\n",
"Given an input tensor `X`\n",
"and a kernel tensor `K`,\n",
"it returns the output tensor `Y`.\n"
]
},
{
"cell_type": "code",
"execution_count": 2,
"id": "6ae6e72e",
"metadata": {
"execution": {
"iopub.execute_input": "2023-08-18T19:32:51.811182Z",
"iopub.status.busy": "2023-08-18T19:32:51.810342Z",
"iopub.status.idle": "2023-08-18T19:32:51.815496Z",
"shell.execute_reply": "2023-08-18T19:32:51.814589Z"
},
"origin_pos": 4,
"tab": [
"pytorch"
]
},
"outputs": [],
"source": [
"def corr1d(X, K):\n",
" w = K.shape[0]\n",
" Y = torch.zeros((X.shape[0] - w + 1))\n",
" for i in range(Y.shape[0]):\n",
" Y[i] = (X[i: i + w] * K).sum()\n",
" return Y"
]
},
{
"cell_type": "markdown",
"id": "367d6f01",
"metadata": {
"origin_pos": 5
},
"source": [
"We can construct the input tensor `X` and the kernel tensor `K` from :numref:`fig_conv1d` to validate the output of the above one-dimensional cross-correlation implementation.\n"
]
},
{
"cell_type": "code",
"execution_count": 3,
"id": "558e391c",
"metadata": {
"execution": {
"iopub.execute_input": "2023-08-18T19:32:51.818896Z",
"iopub.status.busy": "2023-08-18T19:32:51.818354Z",
"iopub.status.idle": "2023-08-18T19:32:51.827278Z",
"shell.execute_reply": "2023-08-18T19:32:51.826469Z"
},
"origin_pos": 6,
"tab": [
"pytorch"
]
},
"outputs": [
{
"data": {
"text/plain": [
"tensor([ 2., 5., 8., 11., 14., 17.])"
]
},
"execution_count": 3,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"X, K = torch.tensor([0, 1, 2, 3, 4, 5, 6]), torch.tensor([1, 2])\n",
"corr1d(X, K)"
]
},
{
"cell_type": "markdown",
"id": "a61afb5e",
"metadata": {
"origin_pos": 7
},
"source": [
"For any\n",
"one-dimensional input with multiple channels,\n",
"the convolution kernel\n",
"needs to have the same number of input channels.\n",
"Then for each channel,\n",
"perform a cross-correlation operation on the one-dimensional tensor of the input and the one-dimensional tensor of the convolution kernel,\n",
"summing the results over all the channels\n",
"to produce the one-dimensional output tensor.\n",
":numref:`fig_conv1d_channel` shows a one-dimensional cross-correlation operation with 3 input channels.\n",
"\n",
"\n",
":label:`fig_conv1d_channel`\n",
"\n",
"\n",
"We can implement the one-dimensional cross-correlation operation for multiple input channels\n",
"and validate the results in :numref:`fig_conv1d_channel`.\n"
]
},
{
"cell_type": "code",
"execution_count": 4,
"id": "4aeae6e3",
"metadata": {
"execution": {
"iopub.execute_input": "2023-08-18T19:32:51.830851Z",
"iopub.status.busy": "2023-08-18T19:32:51.830143Z",
"iopub.status.idle": "2023-08-18T19:32:51.838780Z",
"shell.execute_reply": "2023-08-18T19:32:51.837994Z"
},
"origin_pos": 8,
"tab": [
"pytorch"
]
},
"outputs": [
{
"data": {
"text/plain": [
"tensor([ 2., 8., 14., 20., 26., 32.])"
]
},
"execution_count": 4,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"def corr1d_multi_in(X, K):\n",
" # First, iterate through the 0th dimension (channel dimension) of `X` and\n",
" # `K`. Then, add them together\n",
" return sum(corr1d(x, k) for x, k in zip(X, K))\n",
"\n",
"X = torch.tensor([[0, 1, 2, 3, 4, 5, 6],\n",
" [1, 2, 3, 4, 5, 6, 7],\n",
" [2, 3, 4, 5, 6, 7, 8]])\n",
"K = torch.tensor([[1, 2], [3, 4], [-1, -3]])\n",
"corr1d_multi_in(X, K)"
]
},
{
"cell_type": "markdown",
"id": "a229abd3",
"metadata": {
"origin_pos": 9
},
"source": [
"Note that\n",
"multi-input-channel one-dimensional cross-correlations\n",
"are equivalent\n",
"to\n",
"single-input-channel\n",
"two-dimensional cross-correlations.\n",
"To illustrate,\n",
"an equivalent form of\n",
"the multi-input-channel one-dimensional cross-correlation\n",
"in :numref:`fig_conv1d_channel`\n",
"is\n",
"the\n",
"single-input-channel\n",
"two-dimensional cross-correlation\n",
"in :numref:`fig_conv1d_2d`,\n",
"where the height of the convolution kernel\n",
"has to be the same as that of the input tensor.\n",
"\n",
"\n",
"\n",
":label:`fig_conv1d_2d`\n",
"\n",
"Both the outputs in :numref:`fig_conv1d` and :numref:`fig_conv1d_channel` have only one channel.\n",
"Same as two-dimensional convolutions with multiple output channels described in :numref:`subsec_multi-output-channels`,\n",
"we can also specify multiple output channels\n",
"for one-dimensional convolutions.\n",
"\n",
"## Max-Over-Time Pooling\n",
"\n",
"Similarly, we can use pooling\n",
"to extract the highest value\n",
"from sequence representations\n",
"as the most important feature\n",
"across time steps.\n",
"The *max-over-time pooling* used in textCNN\n",
"works like\n",
"the one-dimensional global max-pooling\n",
":cite:`Collobert.Weston.Bottou.ea.2011`.\n",
"For a multi-channel input\n",
"where each channel stores values\n",
"at different time steps,\n",
"the output at each channel\n",
"is the maximum value\n",
"for that channel.\n",
"Note that\n",
"the max-over-time pooling\n",
"allows different numbers of time steps\n",
"at different channels.\n",
"\n",
"## The textCNN Model\n",
"\n",
"Using the one-dimensional convolution\n",
"and max-over-time pooling,\n",
"the textCNN model\n",
"takes individual pretrained token representations\n",
"as input,\n",
"then obtains and transforms sequence representations\n",
"for the downstream application.\n",
"\n",
"For a single text sequence\n",
"with $n$ tokens represented by\n",
"$d$-dimensional vectors,\n",
"the width, height, and number of channels\n",
"of the input tensor\n",
"are $n$, $1$, and $d$, respectively.\n",
"The textCNN model transforms the input\n",
"into the output as follows:\n",
"\n",
"1. Define multiple one-dimensional convolution kernels and perform convolution operations separately on the inputs. Convolution kernels with different widths may capture local features among different numbers of adjacent tokens.\n",
"1. Perform max-over-time pooling on all the output channels, and then concatenate all the scalar pooling outputs as a vector.\n",
"1. Transform the concatenated vector into the output categories using the fully connected layer. Dropout can be used for reducing overfitting.\n",
"\n",
"\n",
":label:`fig_conv1d_textcnn`\n",
"\n",
":numref:`fig_conv1d_textcnn`\n",
"illustrates the model architecture of textCNN\n",
"with a concrete example.\n",
"The input is a sentence with 11 tokens,\n",
"where\n",
"each token is represented by a 6-dimensional vectors.\n",
"So we have a 6-channel input with width 11.\n",
"Define\n",
"two one-dimensional convolution kernels\n",
"of widths 2 and 4,\n",
"with 4 and 5 output channels, respectively.\n",
"They produce\n",
"4 output channels with width $11-2+1=10$\n",
"and 5 output channels with width $11-4+1=8$.\n",
"Despite different widths of these 9 channels,\n",
"the max-over-time pooling\n",
"gives a concatenated 9-dimensional vector,\n",
"which is finally transformed\n",
"into a 2-dimensional output vector\n",
"for binary sentiment predictions.\n",
"\n",
"\n",
"\n",
"### Defining the Model\n",
"\n",
"We implement the textCNN model in the following class.\n",
"Compared with the bidirectional RNN model in\n",
":numref:`sec_sentiment_rnn`,\n",
"besides\n",
"replacing recurrent layers with convolutional layers,\n",
"we also use two embedding layers:\n",
"one with trainable weights and the other\n",
"with fixed weights.\n"
]
},
{
"cell_type": "code",
"execution_count": 5,
"id": "5ebd40b7",
"metadata": {
"execution": {
"iopub.execute_input": "2023-08-18T19:32:51.842650Z",
"iopub.status.busy": "2023-08-18T19:32:51.841971Z",
"iopub.status.idle": "2023-08-18T19:32:51.850302Z",
"shell.execute_reply": "2023-08-18T19:32:51.849474Z"
},
"origin_pos": 11,
"tab": [
"pytorch"
]
},
"outputs": [],
"source": [
"class TextCNN(nn.Module):\n",
" def __init__(self, vocab_size, embed_size, kernel_sizes, num_channels,\n",
" **kwargs):\n",
" super(TextCNN, self).__init__(**kwargs)\n",
" self.embedding = nn.Embedding(vocab_size, embed_size)\n",
" # The embedding layer not to be trained\n",
" self.constant_embedding = nn.Embedding(vocab_size, embed_size)\n",
" self.dropout = nn.Dropout(0.5)\n",
" self.decoder = nn.Linear(sum(num_channels), 2)\n",
" # The max-over-time pooling layer has no parameters, so this instance\n",
" # can be shared\n",
" self.pool = nn.AdaptiveAvgPool1d(1)\n",
" self.relu = nn.ReLU()\n",
" # Create multiple one-dimensional convolutional layers\n",
" self.convs = nn.ModuleList()\n",
" for c, k in zip(num_channels, kernel_sizes):\n",
" self.convs.append(nn.Conv1d(2 * embed_size, c, k))\n",
"\n",
" def forward(self, inputs):\n",
" # Concatenate two embedding layer outputs with shape (batch size, no.\n",
" # of tokens, token vector dimension) along vectors\n",
" embeddings = torch.cat((\n",
" self.embedding(inputs), self.constant_embedding(inputs)), dim=2)\n",
" # Per the input format of one-dimensional convolutional layers,\n",
" # rearrange the tensor so that the second dimension stores channels\n",
" embeddings = embeddings.permute(0, 2, 1)\n",
" # For each one-dimensional convolutional layer, after max-over-time\n",
" # pooling, a tensor of shape (batch size, no. of channels, 1) is\n",
" # obtained. Remove the last dimension and concatenate along channels\n",
" encoding = torch.cat([\n",
" torch.squeeze(self.relu(self.pool(conv(embeddings))), dim=-1)\n",
" for conv in self.convs], dim=1)\n",
" outputs = self.decoder(self.dropout(encoding))\n",
" return outputs"
]
},
{
"cell_type": "markdown",
"id": "a71ee814",
"metadata": {
"origin_pos": 12
},
"source": [
"Let's create a textCNN instance.\n",
"It has 3 convolutional layers with kernel widths of 3, 4, and 5, all with 100 output channels.\n"
]
},
{
"cell_type": "code",
"execution_count": 6,
"id": "f1e6dd5b",
"metadata": {
"execution": {
"iopub.execute_input": "2023-08-18T19:32:51.853735Z",
"iopub.status.busy": "2023-08-18T19:32:51.853163Z",
"iopub.status.idle": "2023-08-18T19:32:51.962565Z",
"shell.execute_reply": "2023-08-18T19:32:51.961652Z"
},
"origin_pos": 14,
"tab": [
"pytorch"
]
},
"outputs": [],
"source": [
"embed_size, kernel_sizes, nums_channels = 100, [3, 4, 5], [100, 100, 100]\n",
"devices = d2l.try_all_gpus()\n",
"net = TextCNN(len(vocab), embed_size, kernel_sizes, nums_channels)\n",
"\n",
"def init_weights(module):\n",
" if type(module) in (nn.Linear, nn.Conv1d):\n",
" nn.init.xavier_uniform_(module.weight)\n",
"\n",
"net.apply(init_weights);"
]
},
{
"cell_type": "markdown",
"id": "582b40a8",
"metadata": {
"origin_pos": 15
},
"source": [
"### Loading Pretrained Word Vectors\n",
"\n",
"Same as :numref:`sec_sentiment_rnn`,\n",
"we load pretrained 100-dimensional GloVe embeddings\n",
"as the initialized token representations.\n",
"These token representations (embedding weights)\n",
"will be trained in `embedding`\n",
"and fixed in `constant_embedding`.\n"
]
},
{
"cell_type": "code",
"execution_count": 7,
"id": "265fb564",
"metadata": {
"execution": {
"iopub.execute_input": "2023-08-18T19:32:51.966482Z",
"iopub.status.busy": "2023-08-18T19:32:51.965761Z",
"iopub.status.idle": "2023-08-18T19:33:16.119259Z",
"shell.execute_reply": "2023-08-18T19:33:16.116011Z"
},
"origin_pos": 17,
"tab": [
"pytorch"
]
},
"outputs": [],
"source": [
"glove_embedding = d2l.TokenEmbedding('glove.6b.100d')\n",
"embeds = glove_embedding[vocab.idx_to_token]\n",
"net.embedding.weight.data.copy_(embeds)\n",
"net.constant_embedding.weight.data.copy_(embeds)\n",
"net.constant_embedding.weight.requires_grad = False"
]
},
{
"cell_type": "markdown",
"id": "464057d8",
"metadata": {
"origin_pos": 18
},
"source": [
"### Training and Evaluating the Model\n",
"\n",
"Now we can train the textCNN model for sentiment analysis.\n"
]
},
{
"cell_type": "code",
"execution_count": 8,
"id": "162fb536",
"metadata": {
"execution": {
"iopub.execute_input": "2023-08-18T19:33:16.127427Z",
"iopub.status.busy": "2023-08-18T19:33:16.125743Z",
"iopub.status.idle": "2023-08-18T19:34:10.261815Z",
"shell.execute_reply": "2023-08-18T19:34:10.260553Z"
},
"origin_pos": 20,
"tab": [
"pytorch"
]
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"loss 0.066, train acc 0.979, test acc 0.868\n",
"4354.2 examples/sec on [device(type='cuda', index=0), device(type='cuda', index=1)]\n"
]
},
{
"data": {
"image/svg+xml": [
"\n",
"\n",
"\n"
],
"text/plain": [
""
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"lr, num_epochs = 0.001, 5\n",
"trainer = torch.optim.Adam(net.parameters(), lr=lr)\n",
"loss = nn.CrossEntropyLoss(reduction=\"none\")\n",
"d2l.train_ch13(net, train_iter, test_iter, loss, trainer, num_epochs, devices)"
]
},
{
"cell_type": "markdown",
"id": "68d5e6b4",
"metadata": {
"origin_pos": 21
},
"source": [
"Below we use the trained model to predict the sentiment for two simple sentences.\n"
]
},
{
"cell_type": "code",
"execution_count": 9,
"id": "2476e31b",
"metadata": {
"execution": {
"iopub.execute_input": "2023-08-18T19:34:10.265904Z",
"iopub.status.busy": "2023-08-18T19:34:10.265258Z",
"iopub.status.idle": "2023-08-18T19:34:10.281545Z",
"shell.execute_reply": "2023-08-18T19:34:10.280480Z"
},
"origin_pos": 22,
"tab": [
"pytorch"
]
},
"outputs": [
{
"data": {
"text/plain": [
"'positive'"
]
},
"execution_count": 9,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"d2l.predict_sentiment(net, vocab, 'this movie is so great')"
]
},
{
"cell_type": "code",
"execution_count": 10,
"id": "2c259095",
"metadata": {
"execution": {
"iopub.execute_input": "2023-08-18T19:34:10.285703Z",
"iopub.status.busy": "2023-08-18T19:34:10.285417Z",
"iopub.status.idle": "2023-08-18T19:34:10.297499Z",
"shell.execute_reply": "2023-08-18T19:34:10.295186Z"
},
"origin_pos": 23,
"tab": [
"pytorch"
]
},
"outputs": [
{
"data": {
"text/plain": [
"'negative'"
]
},
"execution_count": 10,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"d2l.predict_sentiment(net, vocab, 'this movie is so bad')"
]
},
{
"cell_type": "markdown",
"id": "6a8ac677",
"metadata": {
"origin_pos": 24
},
"source": [
"## Summary\n",
"\n",
"* One-dimensional CNNs can process local features such as $n$-grams in text.\n",
"* Multi-input-channel one-dimensional cross-correlations are equivalent to single-input-channel two-dimensional cross-correlations.\n",
"* The max-over-time pooling allows different numbers of time steps at different channels.\n",
"* The textCNN model transforms individual token representations into downstream application outputs using one-dimensional convolutional layers and max-over-time pooling layers.\n",
"\n",
"\n",
"## Exercises\n",
"\n",
"1. Tune hyperparameters and compare the two architectures for sentiment analysis in :numref:`sec_sentiment_rnn` and in this section, such as in classification accuracy and computational efficiency.\n",
"1. Can you further improve the classification accuracy of the model by using the methods introduced in the exercises of :numref:`sec_sentiment_rnn`?\n",
"1. Add positional encoding in the input representations. Does it improve the classification accuracy?\n"
]
},
{
"cell_type": "markdown",
"id": "a431a47b",
"metadata": {
"origin_pos": 26,
"tab": [
"pytorch"
]
},
"source": [
"[Discussions](https://discuss.d2l.ai/t/1425)\n"
]
}
],
"metadata": {
"language_info": {
"name": "python"
},
"required_libs": []
},
"nbformat": 4,
"nbformat_minor": 5
}