{ "cells": [ { "cell_type": "markdown", "id": "3aa97d87", "metadata": { "origin_pos": 1 }, "source": [ "# Linear Regression Implementation from Scratch\n", ":label:`sec_linear_scratch`\n", "\n", "We are now ready to work through \n", "a fully functioning implementation \n", "of linear regression. \n", "In this section, \n", "(**we will implement the entire method from scratch,\n", "including (i) the model; (ii) the loss function;\n", "(iii) a minibatch stochastic gradient descent optimizer;\n", "and (iv) the training function \n", "that stitches all of these pieces together.**)\n", "Finally, we will run our synthetic data generator\n", "from :numref:`sec_synthetic-regression-data`\n", "and apply our model\n", "on the resulting dataset. \n", "While modern deep learning frameworks \n", "can automate nearly all of this work,\n", "implementing things from scratch is the only way\n", "to make sure that you really know what you are doing.\n", "Moreover, when it is time to customize models,\n", "defining our own layers or loss functions,\n", "understanding how things work under the hood will prove handy.\n", "In this section, we will rely only \n", "on tensors and automatic differentiation.\n", "Later, we will introduce a more concise implementation,\n", "taking advantage of the bells and whistles of deep learning frameworks \n", "while retaining the structure of what follows below.\n" ] }, { "cell_type": "code", "execution_count": 1, "id": "e05d8aba", "metadata": { "attributes": { "classes": [], "id": "", "n": "3" }, "execution": { "iopub.execute_input": "2023-08-18T19:42:51.015606Z", "iopub.status.busy": "2023-08-18T19:42:51.014819Z", "iopub.status.idle": "2023-08-18T19:42:54.292743Z", "shell.execute_reply": "2023-08-18T19:42:54.291780Z" }, "origin_pos": 3, "tab": [ "pytorch" ] }, "outputs": [], "source": [ "%matplotlib inline\n", "import torch\n", "from d2l import torch as d2l" ] }, { "cell_type": "markdown", "id": "caf49f7e", "metadata": { "origin_pos": 6 }, "source": [ "## Defining the Model\n", "\n", "[**Before we can begin optimizing our model's parameters**] by minibatch SGD,\n", "(**we need to have some parameters in the first place.**)\n", "In the following we initialize weights by drawing\n", "random numbers from a normal distribution with mean 0\n", "and a standard deviation of 0.01. 
\n", "The magic number 0.01 often works well in practice, \n", "but you can specify a different value \n", "through the argument `sigma`.\n", "Moreover we set the bias to 0.\n", "Note that for object-oriented design\n", "we add the code to the `__init__` method of a subclass of `d2l.Module` (introduced in :numref:`subsec_oo-design-models`).\n" ] }, { "cell_type": "code", "execution_count": 2, "id": "d007e745", "metadata": { "attributes": { "classes": [], "id": "", "n": "6" }, "execution": { "iopub.execute_input": "2023-08-18T19:42:54.297196Z", "iopub.status.busy": "2023-08-18T19:42:54.296343Z", "iopub.status.idle": "2023-08-18T19:42:54.302370Z", "shell.execute_reply": "2023-08-18T19:42:54.301603Z" }, "origin_pos": 7, "tab": [ "pytorch" ] }, "outputs": [], "source": [ "class LinearRegressionScratch(d2l.Module): #@save\n", " \"\"\"The linear regression model implemented from scratch.\"\"\"\n", " def __init__(self, num_inputs, lr, sigma=0.01):\n", " super().__init__()\n", " self.save_hyperparameters()\n", " self.w = torch.normal(0, sigma, (num_inputs, 1), requires_grad=True)\n", " self.b = torch.zeros(1, requires_grad=True)" ] }, { "cell_type": "markdown", "id": "71097a3f", "metadata": { "origin_pos": 9 }, "source": [ "Next we must [**define our model,\n", "relating its input and parameters to its output.**]\n", "Using the same notation as :eqref:`eq_linreg-y-vec`\n", "for our linear model we simply take the matrix--vector product\n", "of the input features $\\mathbf{X}$ \n", "and the model weights $\\mathbf{w}$,\n", "and add the offset $b$ to each example.\n", "The product $\\mathbf{Xw}$ is a vector and $b$ is a scalar.\n", "Because of the broadcasting mechanism \n", "(see :numref:`subsec_broadcasting`),\n", "when we add a vector and a scalar,\n", "the scalar is added to each component of the vector.\n", "The resulting `forward` method \n", "is registered in the `LinearRegressionScratch` class\n", "via `add_to_class` (introduced in :numref:`oo-design-utilities`).\n" ] }, { "cell_type": "code", "execution_count": 3, "id": "1306d051", "metadata": { "attributes": { "classes": [], "id": "", "n": "8" }, "execution": { "iopub.execute_input": "2023-08-18T19:42:54.305721Z", "iopub.status.busy": "2023-08-18T19:42:54.305204Z", "iopub.status.idle": "2023-08-18T19:42:54.309765Z", "shell.execute_reply": "2023-08-18T19:42:54.308692Z" }, "origin_pos": 10, "tab": [ "pytorch" ] }, "outputs": [], "source": [ "@d2l.add_to_class(LinearRegressionScratch) #@save\n", "def forward(self, X):\n", " return torch.matmul(X, self.w) + self.b" ] }, { "cell_type": "markdown", "id": "e939c258", "metadata": { "origin_pos": 11 }, "source": [ "## Defining the Loss Function\n", "\n", "Since [**updating our model requires taking\n", "the gradient of our loss function,**]\n", "we ought to (**define the loss function first.**)\n", "Here we use the squared loss function\n", "in :eqref:`eq_mse`.\n", "In the implementation, we need to transform the true value `y`\n", "into the predicted value's shape `y_hat`.\n", "The result returned by the following method\n", "will also have the same shape as `y_hat`. 
\n", "We also return the averaged loss value\n", "among all examples in the minibatch.\n" ] }, { "cell_type": "code", "execution_count": 4, "id": "6509851d", "metadata": { "attributes": { "classes": [], "id": "", "n": "9" }, "execution": { "iopub.execute_input": "2023-08-18T19:42:54.313880Z", "iopub.status.busy": "2023-08-18T19:42:54.313169Z", "iopub.status.idle": "2023-08-18T19:42:54.318867Z", "shell.execute_reply": "2023-08-18T19:42:54.317836Z" }, "origin_pos": 12, "tab": [ "pytorch" ] }, "outputs": [], "source": [ "@d2l.add_to_class(LinearRegressionScratch) #@save\n", "def loss(self, y_hat, y):\n", " l = (y_hat - y) ** 2 / 2\n", " return l.mean()" ] }, { "cell_type": "markdown", "id": "4285b751", "metadata": { "origin_pos": 14 }, "source": [ "## Defining the Optimization Algorithm\n", "\n", "As discussed in :numref:`sec_linear_regression`,\n", "linear regression has a closed-form solution.\n", "However, our goal here is to illustrate \n", "how to train more general neural networks,\n", "and that requires that we teach you \n", "how to use minibatch SGD.\n", "Hence we will take this opportunity\n", "to introduce your first working example of SGD.\n", "At each step, using a minibatch \n", "randomly drawn from our dataset,\n", "we estimate the gradient of the loss\n", "with respect to the parameters.\n", "Next, we update the parameters\n", "in the direction that may reduce the loss.\n", "\n", "The following code applies the update, \n", "given a set of parameters, a learning rate `lr`.\n", "Since our loss is computed as an average over the minibatch, \n", "we do not need to adjust the learning rate against the batch size. \n", "In later chapters we will investigate \n", "how learning rates should be adjusted\n", "for very large minibatches as they arise \n", "in distributed large-scale learning.\n", "For now, we can ignore this dependency.\n" ] }, { "cell_type": "markdown", "id": "fb3bc263", "metadata": { "origin_pos": 16, "tab": [ "pytorch" ] }, "source": [ "We define our `SGD` class,\n", "a subclass of `d2l.HyperParameters` (introduced in :numref:`oo-design-utilities`),\n", "to have a similar API \n", "as the built-in SGD optimizer.\n", "We update the parameters in the `step` method.\n", "The `zero_grad` method sets all gradients to 0,\n", "which must be run before a backpropagation step.\n" ] }, { "cell_type": "code", "execution_count": 5, "id": "ee40ef54", "metadata": { "attributes": { "classes": [], "id": "", "n": "11" }, "execution": { "iopub.execute_input": "2023-08-18T19:42:54.322951Z", "iopub.status.busy": "2023-08-18T19:42:54.322264Z", "iopub.status.idle": "2023-08-18T19:42:54.329600Z", "shell.execute_reply": "2023-08-18T19:42:54.328587Z" }, "origin_pos": 18, "tab": [ "pytorch" ] }, "outputs": [], "source": [ "class SGD(d2l.HyperParameters): #@save\n", " \"\"\"Minibatch stochastic gradient descent.\"\"\"\n", " def __init__(self, params, lr):\n", " self.save_hyperparameters()\n", "\n", " def step(self):\n", " for param in self.params:\n", " param -= self.lr * param.grad\n", "\n", " def zero_grad(self):\n", " for param in self.params:\n", " if param.grad is not None:\n", " param.grad.zero_()" ] }, { "cell_type": "markdown", "id": "00c390bf", "metadata": { "origin_pos": 21 }, "source": [ "We next define the `configure_optimizers` method, which returns an instance of the `SGD` class.\n" ] }, { "cell_type": "code", "execution_count": 6, "id": "8a5a3a40", "metadata": { "attributes": { "classes": [], "id": "", "n": "14" }, "execution": { "iopub.execute_input": 
"2023-08-18T19:42:54.333602Z", "iopub.status.busy": "2023-08-18T19:42:54.332931Z", "iopub.status.idle": "2023-08-18T19:42:54.338188Z", "shell.execute_reply": "2023-08-18T19:42:54.336975Z" }, "origin_pos": 22, "tab": [ "pytorch" ] }, "outputs": [], "source": [ "@d2l.add_to_class(LinearRegressionScratch) #@save\n", "def configure_optimizers(self):\n", " return SGD([self.w, self.b], self.lr)" ] }, { "cell_type": "markdown", "id": "a965a93a", "metadata": { "origin_pos": 23 }, "source": [ "## Training\n", "\n", "Now that we have all of the parts in place\n", "(parameters, loss function, model, and optimizer),\n", "we are ready to [**implement the main training loop.**]\n", "It is crucial that you understand this code fully\n", "since you will employ similar training loops\n", "for every other deep learning model\n", "covered in this book.\n", "In each *epoch*, we iterate through \n", "the entire training dataset, \n", "passing once through every example\n", "(assuming that the number of examples \n", "is divisible by the batch size). \n", "In each *iteration*, we grab a minibatch of training examples,\n", "and compute its loss through the model's `training_step` method. \n", "Then we compute the gradients with respect to each parameter. \n", "Finally, we will call the optimization algorithm\n", "to update the model parameters. \n", "In summary, we will execute the following loop:\n", "\n", "* Initialize parameters $(\\mathbf{w}, b)$\n", "* Repeat until done\n", " * Compute gradient $\\mathbf{g} \\leftarrow \\partial_{(\\mathbf{w},b)} \\frac{1}{|\\mathcal{B}|} \\sum_{i \\in \\mathcal{B}} l(\\mathbf{x}^{(i)}, y^{(i)}, \\mathbf{w}, b)$\n", " * Update parameters $(\\mathbf{w}, b) \\leftarrow (\\mathbf{w}, b) - \\eta \\mathbf{g}$\n", " \n", "Recall that the synthetic regression dataset \n", "that we generated in :numref:``sec_synthetic-regression-data`` \n", "does not provide a validation dataset. \n", "In most cases, however, \n", "we will want a validation dataset \n", "to measure our model quality. 
\n", "Here we pass the validation dataloader \n", "once in each epoch to measure the model performance.\n", "Following our object-oriented design,\n", "the `prepare_batch` and `fit_epoch` methods\n", "are registered in the `d2l.Trainer` class\n", "(introduced in :numref:`oo-design-training`).\n" ] }, { "cell_type": "code", "execution_count": 7, "id": "0c422c5b", "metadata": { "attributes": { "classes": [], "id": "", "n": "15" }, "execution": { "iopub.execute_input": "2023-08-18T19:42:54.342007Z", "iopub.status.busy": "2023-08-18T19:42:54.341174Z", "iopub.status.idle": "2023-08-18T19:42:54.345995Z", "shell.execute_reply": "2023-08-18T19:42:54.344948Z" }, "origin_pos": 24, "tab": [ "pytorch" ] }, "outputs": [], "source": [ "@d2l.add_to_class(d2l.Trainer) #@save\n", "def prepare_batch(self, batch):\n", " return batch" ] }, { "cell_type": "code", "execution_count": 8, "id": "9f43b679", "metadata": { "attributes": { "classes": [], "id": "", "n": "16" }, "execution": { "iopub.execute_input": "2023-08-18T19:42:54.349687Z", "iopub.status.busy": "2023-08-18T19:42:54.348979Z", "iopub.status.idle": "2023-08-18T19:42:54.355255Z", "shell.execute_reply": "2023-08-18T19:42:54.354485Z" }, "origin_pos": 25, "tab": [ "pytorch" ] }, "outputs": [], "source": [ "@d2l.add_to_class(d2l.Trainer) #@save\n", "def fit_epoch(self):\n", " self.model.train()\n", " for batch in self.train_dataloader:\n", " loss = self.model.training_step(self.prepare_batch(batch))\n", " self.optim.zero_grad()\n", " with torch.no_grad():\n", " loss.backward()\n", " if self.gradient_clip_val > 0: # To be discussed later\n", " self.clip_gradients(self.gradient_clip_val, self.model)\n", " self.optim.step()\n", " self.train_batch_idx += 1\n", " if self.val_dataloader is None:\n", " return\n", " self.model.eval()\n", " for batch in self.val_dataloader:\n", " with torch.no_grad():\n", " self.model.validation_step(self.prepare_batch(batch))\n", " self.val_batch_idx += 1" ] }, { "cell_type": "markdown", "id": "b9fc0166", "metadata": { "origin_pos": 29 }, "source": [ "We are almost ready to train the model,\n", "but first we need some training data.\n", "Here we use the `SyntheticRegressionData` class \n", "and pass in some ground truth parameters.\n", "Then we train our model with \n", "the learning rate `lr=0.03` \n", "and set `max_epochs=3`. 
\n", "Note that in general, both the number of epochs \n", "and the learning rate are hyperparameters.\n", "In general, setting hyperparameters is tricky\n", "and we will usually want to use a three-way split,\n", "one set for training, \n", "a second for hyperparameter selection,\n", "and the third reserved for the final evaluation.\n", "We elide these details for now but will revise them\n", "later.\n" ] }, { "cell_type": "code", "execution_count": 9, "id": "d45852e3", "metadata": { "attributes": { "classes": [], "id": "", "n": "20" }, "execution": { "iopub.execute_input": "2023-08-18T19:42:54.359835Z", "iopub.status.busy": "2023-08-18T19:42:54.359070Z", "iopub.status.idle": "2023-08-18T19:42:56.328769Z", "shell.execute_reply": "2023-08-18T19:42:56.327907Z" }, "origin_pos": 30, "tab": [ "pytorch" ] }, "outputs": [ { "data": { "image/svg+xml": [ "\n", "\n", "\n", " \n", " \n", " \n", " \n", " 2023-08-18T19:42:56.275496\n", " image/svg+xml\n", " \n", " \n", " Matplotlib v3.7.2, https://matplotlib.org/\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", 
" \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "\n" ], "text/plain": [ "
" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "model = LinearRegressionScratch(2, lr=0.03)\n", "data = d2l.SyntheticRegressionData(w=torch.tensor([2, -3.4]), b=4.2)\n", "trainer = d2l.Trainer(max_epochs=3)\n", "trainer.fit(model, data)" ] }, { "cell_type": "markdown", "id": "07628bb6", "metadata": { "origin_pos": 31 }, "source": [ "Because we synthesized the dataset ourselves,\n", "we know precisely what the true parameters are.\n", "Thus, we can [**evaluate our success in training\n", "by comparing the true parameters\n", "with those that we learned**] through our training loop.\n", "Indeed they turn out to be very close to each other.\n" ] }, { "cell_type": "code", "execution_count": 10, "id": "5a72b404", "metadata": { "attributes": { "classes": [], "id": "", "n": "21" }, "execution": { "iopub.execute_input": "2023-08-18T19:42:56.334422Z", "iopub.status.busy": "2023-08-18T19:42:56.333858Z", "iopub.status.idle": "2023-08-18T19:42:56.340281Z", "shell.execute_reply": "2023-08-18T19:42:56.339444Z" }, "origin_pos": 32, "tab": [ "pytorch" ] }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "error in estimating w: tensor([ 0.1408, -0.1493])\n", "error in estimating b: tensor([0.2130])\n" ] } ], "source": [ "with torch.no_grad():\n", " print(f'error in estimating w: {data.w - model.w.reshape(data.w.shape)}')\n", " print(f'error in estimating b: {data.b - model.b}')" ] }, { "cell_type": "markdown", "id": "39644e44", "metadata": { "origin_pos": 35 }, "source": [ "We should not take the ability to exactly recover \n", "the ground truth parameters for granted.\n", "In general, for deep models unique solutions\n", "for the parameters do not exist,\n", "and even for linear models,\n", "exactly recovering the parameters\n", "is only possible when no feature \n", "is linearly dependent on the others.\n", "However, in machine learning, \n", "we are often less concerned\n", "with recovering true underlying parameters,\n", "but rather with parameters \n", "that lead to highly accurate prediction :cite:`Vapnik.1992`.\n", "Fortunately, even on difficult optimization problems,\n", "stochastic gradient descent can often find remarkably good solutions,\n", "owing partly to the fact that, for deep networks,\n", "there exist many configurations of the parameters\n", "that lead to highly accurate prediction.\n", "\n", "\n", "## Summary\n", "\n", "In this section, we took a significant step \n", "towards designing deep learning systems \n", "by implementing a fully functional \n", "neural network model and training loop.\n", "In this process, we built a data loader, \n", "a model, a loss function, an optimization procedure,\n", "and a visualization and monitoring tool. \n", "We did this by composing a Python object \n", "that contains all relevant components for training a model. \n", "While this is not yet a professional-grade implementation\n", "it is perfectly functional and code like this \n", "could already help you to solve small problems quickly.\n", "In the coming sections, we will see how to do this\n", "both *more concisely* (avoiding boilerplate code)\n", "and *more efficiently* (using our GPUs to their full potential).\n", "\n", "\n", "\n", "## Exercises\n", "\n", "1. What would happen if we were to initialize the weights to zero. Would the algorithm still work? What if we\n", " initialized the parameters with variance $1000$ rather than $0.01$?\n", "1. 
{ "cell_type": "markdown", "id": "39644e44", "metadata": { "origin_pos": 35 }, "source": [ "We should not take the ability to exactly recover\n", "the ground truth parameters for granted.\n", "In general, for deep models unique solutions\n", "for the parameters do not exist,\n", "and even for linear models,\n", "exactly recovering the parameters\n", "is only possible when no feature\n", "is linearly dependent on the others.\n", "However, in machine learning,\n", "we are often less concerned\n", "with recovering the true underlying parameters\n", "than with finding parameters\n", "that lead to highly accurate prediction :cite:`Vapnik.1992`.\n", "Fortunately, even on difficult optimization problems,\n", "stochastic gradient descent can often find remarkably good solutions,\n", "owing partly to the fact that, for deep networks,\n", "there exist many configurations of the parameters\n", "that lead to highly accurate prediction.\n", "\n", "\n", "## Summary\n", "\n", "In this section, we took a significant step\n", "towards designing deep learning systems\n", "by implementing a fully functional\n", "neural network model and training loop.\n", "In this process, we built a data loader,\n", "a model, a loss function, an optimization procedure,\n", "and a visualization and monitoring tool.\n", "We did this by composing a Python object\n", "that contains all relevant components for training a model.\n", "While this is not yet a professional-grade implementation,\n", "it is perfectly functional and code like this\n", "could already help you to solve small problems quickly.\n", "In the coming sections, we will see how to do this\n", "both *more concisely* (avoiding boilerplate code)\n", "and *more efficiently* (using our GPUs to their full potential).\n", "\n", "\n", "\n", "## Exercises\n", "\n", "1. What would happen if we were to initialize the weights to zero? Would the algorithm still work? What if we\n", "   initialized the parameters with standard deviation $1000$ rather than $0.01$?\n", "1. Assume that you are [Georg Simon Ohm](https://en.wikipedia.org/wiki/Georg_Ohm) trying to come up\n", "   with a model for resistance that relates voltage and current. Can you use automatic\n", "   differentiation to learn the parameters of your model?\n", "1. Can you use [Planck's Law](https://en.wikipedia.org/wiki/Planck%27s_law) to determine the temperature of an object\n", "   using spectral energy density? For reference, the spectral density $B$ of radiation emanating from a black body is\n", "   $B(\\lambda, T) = \\frac{2 hc^2}{\\lambda^5} \\cdot \\left(\\exp \\frac{h c}{\\lambda k T} - 1\\right)^{-1}$. Here\n", "   $\\lambda$ is the wavelength, $T$ is the temperature, $c$ is the speed of light, $h$ is Planck's constant, and $k$ is the\n", "   Boltzmann constant. You measure the energy for different wavelengths $\\lambda$ and you now need to fit the spectral\n", "   density curve to Planck's law.\n", "1. What are the problems you might encounter if you wanted to compute the second derivatives of the loss? How would\n", "   you fix them?\n", "1. Why is the `reshape` method needed in the `loss` function?\n", "1. Experiment using different learning rates to find out how quickly the loss function value drops. Can you reduce the\n", "   error by increasing the number of epochs of training?\n", "1. If the number of examples cannot be divided by the batch size, what happens to `data_iter` at the end of an epoch?\n", "1. Try implementing a different loss function, such as the absolute value loss `(y_hat - y.reshape(y_hat.shape)).abs().sum()`.\n", "    1. Check what happens for regular data.\n", "    1. Check whether there is a difference in behavior if you actively perturb some entries, such as $y_5 = 10000$, of $\\mathbf{y}$.\n", "    1. Can you think of a cheap solution for combining the best aspects of squared loss and absolute value loss?\n", "       Hint: how can you avoid really large gradient values?\n", "1. Why do we need to reshuffle the dataset? Can you design a case where a maliciously constructed dataset would break the optimization algorithm otherwise?\n" ] },
{ "cell_type": "markdown", "id": "d533576d", "metadata": { "origin_pos": 37, "tab": [ "pytorch" ] }, "source": [ "[Discussions](https://discuss.d2l.ai/t/43)\n" ] } ], "metadata": { "language_info": { "name": "python" }, "required_libs": [] }, "nbformat": 4, "nbformat_minor": 5 }