{ "cells": [ { "cell_type": "markdown", "id": "f4709271", "metadata": { "origin_pos": 1 }, "source": [ "# Dropout\n", ":label:`sec_dropout`\n", "\n", "\n", "Let's think briefly about what we\n", "expect from a good predictive model.\n", "We want it to peform well on unseen data.\n", "Classical generalization theory\n", "suggests that to close the gap between\n", "train and test performance,\n", "we should aim for a simple model.\n", "Simplicity can come in the form\n", "of a small number of dimensions.\n", "We explored this when discussing the\n", "monomial basis functions of linear models\n", "in :numref:`sec_generalization_basics`.\n", "Additionally, as we saw when discussing weight decay\n", "($\\ell_2$ regularization) in :numref:`sec_weight_decay`,\n", "the (inverse) norm of the parameters also\n", "represents a useful measure of simplicity.\n", "Another useful notion of simplicity is smoothness,\n", "i.e., that the function should not be sensitive\n", "to small changes to its inputs.\n", "For instance, when we classify images,\n", "we would expect that adding some random noise\n", "to the pixels should be mostly harmless.\n", "\n", ":citet:`Bishop.1995` formalized\n", "this idea when he proved that training with input noise\n", "is equivalent to Tikhonov regularization.\n", "This work drew a clear mathematical connection\n", "between the requirement that a function be smooth (and thus simple),\n", "and the requirement that it be resilient\n", "to perturbations in the input.\n", "\n", "Then, :citet:`Srivastava.Hinton.Krizhevsky.ea.2014`\n", "developed a clever idea for how to apply Bishop's idea\n", "to the internal layers of a network, too.\n", "Their idea, called *dropout*, involves\n", "injecting noise while computing\n", "each internal layer during forward propagation,\n", "and it has become a standard technique\n", "for training neural networks.\n", "The method is called *dropout* because we literally\n", "*drop out* some neurons during training.\n", "Throughout training, on each iteration,\n", "standard dropout consists of zeroing out\n", "some fraction of the nodes in each layer\n", "before calculating the subsequent layer.\n", "\n", "To be clear, we are imposing\n", "our own narrative with the link to Bishop.\n", "The original paper on dropout\n", "offers intuition through a surprising\n", "analogy to sexual reproduction.\n", "The authors argue that neural network overfitting\n", "is characterized by a state in which\n", "each layer relies on a specific\n", "pattern of activations in the previous layer,\n", "calling this condition *co-adaptation*.\n", "Dropout, they claim, breaks up co-adaptation\n", "just as sexual reproduction is argued to\n", "break up co-adapted genes.\n", "While such an justification of this theory is certainly up for debate,\n", "the dropout technique itself has proved enduring,\n", "and various forms of dropout are implemented\n", "in most deep learning libraries. 
\n", "\n", "\n", "The key challenge is how to inject this noise.\n", "One idea is to inject it in an *unbiased* manner\n", "so that the expected value of each layer---while fixing\n", "the others---equals the value it would have taken absent noise.\n", "In Bishop's work, he added Gaussian noise\n", "to the inputs to a linear model.\n", "At each training iteration, he added noise\n", "sampled from a distribution with mean zero\n", "$\\epsilon \\sim \\mathcal{N}(0,\\sigma^2)$ to the input $\\mathbf{x}$,\n", "yielding a perturbed point $\\mathbf{x}' = \\mathbf{x} + \\epsilon$.\n", "In expectation, $E[\\mathbf{x}'] = \\mathbf{x}$.\n", "\n", "In standard dropout regularization,\n", "one zeros out some fraction of the nodes in each layer\n", "and then *debiases* each layer by normalizing\n", "by the fraction of nodes that were retained (not dropped out).\n", "In other words,\n", "with *dropout probability* $p$,\n", "each intermediate activation $h$ is replaced by\n", "a random variable $h'$ as follows:\n", "\n", "$$\n", "\\begin{aligned}\n", "h' =\n", "\\begin{cases}\n", " 0 & \\textrm{ with probability } p \\\\\n", " \\frac{h}{1-p} & \\textrm{ otherwise}\n", "\\end{cases}\n", "\\end{aligned}\n", "$$\n", "\n", "By design, the expectation remains unchanged, i.e., $E[h'] = h$.\n" ] }, { "cell_type": "code", "execution_count": 1, "id": "54feb90c", "metadata": { "execution": { "iopub.execute_input": "2023-08-18T19:33:10.343768Z", "iopub.status.busy": "2023-08-18T19:33:10.343431Z", "iopub.status.idle": "2023-08-18T19:33:13.572271Z", "shell.execute_reply": "2023-08-18T19:33:13.571155Z" }, "origin_pos": 3, "tab": [ "pytorch" ] }, "outputs": [], "source": [ "import torch\n", "from torch import nn\n", "from d2l import torch as d2l" ] }, { "cell_type": "markdown", "id": "4a048453", "metadata": { "origin_pos": 6 }, "source": [ "## Dropout in Practice\n", "\n", "Recall the MLP with a hidden layer and five hidden units\n", "from :numref:`fig_mlp`.\n", "When we apply dropout to a hidden layer,\n", "zeroing out each hidden unit with probability $p$,\n", "the result can be viewed as a network\n", "containing only a subset of the original neurons.\n", "In :numref:`fig_dropout2`, $h_2$ and $h_5$ are removed.\n", "Consequently, the calculation of the outputs\n", "no longer depends on $h_2$ or $h_5$\n", "and their respective gradient also vanishes\n", "when performing backpropagation.\n", "In this way, the calculation of the output layer\n", "cannot be overly dependent on any\n", "one element of $h_1, \\ldots, h_5$.\n", "\n", "![MLP before and after dropout.](../img/dropout2.svg)\n", ":label:`fig_dropout2`\n", "\n", "Typically, we disable dropout at test time.\n", "Given a trained model and a new example,\n", "we do not drop out any nodes\n", "and thus do not need to normalize.\n", "However, there are some exceptions:\n", "some researchers use dropout at test time as a heuristic\n", "for estimating the *uncertainty* of neural network predictions:\n", "if the predictions agree across many different dropout outputs,\n", "then we might say that the network is more confident.\n", "\n", "## Implementation from Scratch\n", "\n", "To implement the dropout function for a single layer,\n", "we must draw as many samples\n", "from a Bernoulli (binary) random variable\n", "as our layer has dimensions,\n", "where the random variable takes value $1$ (keep)\n", "with probability $1-p$ and $0$ (drop) with probability $p$.\n", "One easy way to implement this is to first draw samples\n", "from the uniform distribution $U[0, 1]$.\n", 
"Then we can keep those nodes for which the corresponding\n", "sample is greater than $p$, dropping the rest.\n", "\n", "In the following code, we (**implement a `dropout_layer` function\n", "that drops out the elements in the tensor input `X`\n", "with probability `dropout`**),\n", "rescaling the remainder as described above:\n", "dividing the survivors by `1.0-dropout`.\n" ] }, { "cell_type": "code", "execution_count": 2, "id": "dcda4b93", "metadata": { "execution": { "iopub.execute_input": "2023-08-18T19:33:13.577007Z", "iopub.status.busy": "2023-08-18T19:33:13.576588Z", "iopub.status.idle": "2023-08-18T19:33:13.582474Z", "shell.execute_reply": "2023-08-18T19:33:13.581632Z" }, "origin_pos": 8, "tab": [ "pytorch" ] }, "outputs": [], "source": [ "def dropout_layer(X, dropout):\n", " assert 0 <= dropout <= 1\n", " if dropout == 1: return torch.zeros_like(X)\n", " mask = (torch.rand(X.shape) > dropout).float()\n", " return mask * X / (1.0 - dropout)" ] }, { "cell_type": "markdown", "id": "7a0d5f19", "metadata": { "origin_pos": 11 }, "source": [ "We can [**test out the `dropout_layer` function on a few examples**].\n", "In the following lines of code,\n", "we pass our input `X` through the dropout operation,\n", "with probabilities 0, 0.5, and 1, respectively.\n" ] }, { "cell_type": "code", "execution_count": 3, "id": "1effb931", "metadata": { "execution": { "iopub.execute_input": "2023-08-18T19:33:13.587061Z", "iopub.status.busy": "2023-08-18T19:33:13.586226Z", "iopub.status.idle": "2023-08-18T19:33:13.614970Z", "shell.execute_reply": "2023-08-18T19:33:13.614053Z" }, "origin_pos": 12, "tab": [ "pytorch" ] }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "dropout_p = 0: tensor([[ 0., 1., 2., 3., 4., 5., 6., 7.],\n", " [ 8., 9., 10., 11., 12., 13., 14., 15.]])\n", "dropout_p = 0.5: tensor([[ 0., 2., 0., 6., 8., 0., 0., 0.],\n", " [16., 18., 20., 22., 24., 26., 28., 30.]])\n", "dropout_p = 1: tensor([[0., 0., 0., 0., 0., 0., 0., 0.],\n", " [0., 0., 0., 0., 0., 0., 0., 0.]])\n" ] } ], "source": [ "X = torch.arange(16, dtype = torch.float32).reshape((2, 8))\n", "print('dropout_p = 0:', dropout_layer(X, 0))\n", "print('dropout_p = 0.5:', dropout_layer(X, 0.5))\n", "print('dropout_p = 1:', dropout_layer(X, 1))" ] }, { "cell_type": "markdown", "id": "087cdc90", "metadata": { "origin_pos": 13 }, "source": [ "### Defining the Model\n", "\n", "The model below applies dropout to the output\n", "of each hidden layer (following the activation function).\n", "We can set dropout probabilities for each layer separately.\n", "A common choice is to set\n", "a lower dropout probability closer to the input layer.\n", "We ensure that dropout is only active during training.\n" ] }, { "cell_type": "code", "execution_count": 4, "id": "a98d0264", "metadata": { "execution": { "iopub.execute_input": "2023-08-18T19:33:13.618877Z", "iopub.status.busy": "2023-08-18T19:33:13.618261Z", "iopub.status.idle": "2023-08-18T19:33:13.626219Z", "shell.execute_reply": "2023-08-18T19:33:13.625088Z" }, "origin_pos": 15, "tab": [ "pytorch" ] }, "outputs": [], "source": [ "class DropoutMLPScratch(d2l.Classifier):\n", " def __init__(self, num_outputs, num_hiddens_1, num_hiddens_2,\n", " dropout_1, dropout_2, lr):\n", " super().__init__()\n", " self.save_hyperparameters()\n", " self.lin1 = nn.LazyLinear(num_hiddens_1)\n", " self.lin2 = nn.LazyLinear(num_hiddens_2)\n", " self.lin3 = nn.LazyLinear(num_outputs)\n", " self.relu = nn.ReLU()\n", "\n", " def forward(self, X):\n", " H1 = 
{ "cell_type": "markdown", "id": "087cdc90", "metadata": { "origin_pos": 13 }, "source": [ "### Defining the Model\n", "\n", "The model below applies dropout to the output\n", "of each hidden layer (following the activation function).\n", "We can set dropout probabilities for each layer separately.\n", "A common choice is to set\n", "a lower dropout probability closer to the input layer.\n", "We ensure that dropout is only active during training.\n" ] },
{ "cell_type": "code", "execution_count": 4, "id": "a98d0264", "metadata": { "execution": { "iopub.execute_input": "2023-08-18T19:33:13.618877Z", "iopub.status.busy": "2023-08-18T19:33:13.618261Z", "iopub.status.idle": "2023-08-18T19:33:13.626219Z", "shell.execute_reply": "2023-08-18T19:33:13.625088Z" }, "origin_pos": 15, "tab": [ "pytorch" ] }, "outputs": [], "source": [ "class DropoutMLPScratch(d2l.Classifier):\n", "    def __init__(self, num_outputs, num_hiddens_1, num_hiddens_2,\n", "                 dropout_1, dropout_2, lr):\n", "        super().__init__()\n", "        self.save_hyperparameters()\n", "        self.lin1 = nn.LazyLinear(num_hiddens_1)\n", "        self.lin2 = nn.LazyLinear(num_hiddens_2)\n", "        self.lin3 = nn.LazyLinear(num_outputs)\n", "        self.relu = nn.ReLU()\n", "\n", "    def forward(self, X):\n", "        H1 = self.relu(self.lin1(X.reshape((X.shape[0], -1))))\n", "        if self.training:\n", "            H1 = dropout_layer(H1, self.dropout_1)\n", "        H2 = self.relu(self.lin2(H1))\n", "        if self.training:\n", "            H2 = dropout_layer(H2, self.dropout_2)\n", "        return self.lin3(H2)" ] },
{ "cell_type": "markdown", "id": "2793a6d2", "metadata": { "origin_pos": 18 }, "source": [ "### [**Training**]\n", "\n", "The following is similar to the training of MLPs described previously.\n" ] },
{ "cell_type": "code", "execution_count": 5, "id": "12f6e01f", "metadata": { "execution": { "iopub.execute_input": "2023-08-18T19:33:13.630286Z", "iopub.status.busy": "2023-08-18T19:33:13.629522Z", "iopub.status.idle": "2023-08-18T19:34:13.198615Z", "shell.execute_reply": "2023-08-18T19:34:13.197238Z" }, "origin_pos": 19, "tab": [ "pytorch" ] }, "outputs": [ { "data": { "text/plain": [ "<training-progress figure produced by trainer.fit (SVG data omitted; Matplotlib v3.7.2)>" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "hparams = {'num_outputs':10, 'num_hiddens_1':256, 'num_hiddens_2':256,\n", "           'dropout_1':0.5, 'dropout_2':0.5, 'lr':0.1}\n", "model = DropoutMLPScratch(**hparams)\n", "data = d2l.FashionMNIST(batch_size=256)\n", "trainer = d2l.Trainer(max_epochs=10)\n", "trainer.fit(model, data)" ] },
" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "hparams = {'num_outputs':10, 'num_hiddens_1':256, 'num_hiddens_2':256,\n", " 'dropout_1':0.5, 'dropout_2':0.5, 'lr':0.1}\n", "model = DropoutMLPScratch(**hparams)\n", "data = d2l.FashionMNIST(batch_size=256)\n", "trainer = d2l.Trainer(max_epochs=10)\n", "trainer.fit(model, data)" ] }, { "cell_type": "markdown", "id": "101c89c1", "metadata": { "origin_pos": 20 }, "source": [ "## [**Concise Implementation**]\n", "\n", "With high-level APIs, all we need to do is add a `Dropout` layer\n", "after each fully connected layer,\n", "passing in the dropout probability\n", "as the only argument to its constructor.\n", "During training, the `Dropout` layer will randomly\n", "drop out outputs of the previous layer\n", "(or equivalently, the inputs to the subsequent layer)\n", "according to the specified dropout probability.\n", "When not in training mode,\n", "the `Dropout` layer simply passes the data through during testing.\n" ] }, { "cell_type": "code", "execution_count": 6, "id": "224bafde", "metadata": { "execution": { "iopub.execute_input": "2023-08-18T19:34:13.202812Z", "iopub.status.busy": "2023-08-18T19:34:13.202080Z", "iopub.status.idle": "2023-08-18T19:34:13.208377Z", "shell.execute_reply": "2023-08-18T19:34:13.207307Z" }, "origin_pos": 22, "tab": [ "pytorch" ] }, "outputs": [], "source": [ "class DropoutMLP(d2l.Classifier):\n", " def __init__(self, num_outputs, num_hiddens_1, num_hiddens_2,\n", " dropout_1, dropout_2, lr):\n", " super().__init__()\n", " self.save_hyperparameters()\n", " self.net = nn.Sequential(\n", " nn.Flatten(), nn.LazyLinear(num_hiddens_1), nn.ReLU(),\n", " nn.Dropout(dropout_1), nn.LazyLinear(num_hiddens_2), nn.ReLU(),\n", " nn.Dropout(dropout_2), nn.LazyLinear(num_outputs))" ] }, { "cell_type": "markdown", "id": "877d8ec2", "metadata": { "origin_pos": 27 }, "source": [ "Next, we [**train the model**].\n" ] }, { "cell_type": "code", "execution_count": 7, "id": "d9e0ea94", "metadata": { "execution": { "iopub.execute_input": "2023-08-18T19:34:13.212381Z", "iopub.status.busy": "2023-08-18T19:34:13.211782Z", "iopub.status.idle": "2023-08-18T19:35:25.030389Z", "shell.execute_reply": "2023-08-18T19:35:25.029011Z" }, "origin_pos": 28, "tab": [ "pytorch" ] }, "outputs": [ { "data": { "image/svg+xml": [ "\n", "\n", "\n", " \n", " \n", " \n", " \n", " 2023-08-18T19:35:24.928771\n", " image/svg+xml\n", " \n", " \n", " Matplotlib v3.7.2, https://matplotlib.org/\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " 
\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", 
" \n", " \n", " \n", " \n", " \n", "\n" ], "text/plain": [ "
" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "model = DropoutMLP(**hparams)\n", "trainer.fit(model, data)" ] }, { "cell_type": "markdown", "id": "54985a45", "metadata": { "origin_pos": 29 }, "source": [ "## Summary\n", "\n", "Beyond controlling the number of dimensions and the size of the weight vector, dropout is yet another tool for avoiding overfitting. Often tools are used jointly.\n", "Note that dropout is\n", "used only during training:\n", "it replaces an activation $h$ with a random variable with expected value $h$.\n", "\n", "\n", "## Exercises\n", "\n", "1. What happens if you change the dropout probabilities for the first and second layers? In particular, what happens if you switch the ones for both layers? Design an experiment to answer these questions, describe your results quantitatively, and summarize the qualitative takeaways.\n", "1. Increase the number of epochs and compare the results obtained when using dropout with those when not using it.\n", "1. What is the variance of the activations in each hidden layer when dropout is and is not applied? Draw a plot to show how this quantity evolves over time for both models.\n", "1. Why is dropout not typically used at test time?\n", "1. Using the model in this section as an example, compare the effects of using dropout and weight decay. What happens when dropout and weight decay are used at the same time? Are the results additive? Are there diminished returns (or worse)? Do they cancel each other out?\n", "1. What happens if we apply dropout to the individual weights of the weight matrix rather than the activations?\n", "1. Invent another technique for injecting random noise at each layer that is different from the standard dropout technique. Can you develop a method that outperforms dropout on the Fashion-MNIST dataset (for a fixed architecture)?\n" ] }, { "cell_type": "markdown", "id": "29eae3ea", "metadata": { "origin_pos": 31, "tab": [ "pytorch" ] }, "source": [ "[Discussions](https://discuss.d2l.ai/t/101)\n" ] } ], "metadata": { "language_info": { "name": "python" }, "required_libs": [] }, "nbformat": 4, "nbformat_minor": 5 }