{ "cells": [ { "cell_type": "markdown", "id": "d6fb47f3", "metadata": { "origin_pos": 1 }, "source": [ "# Concise Implementation of Linear Regression\n", ":label:`sec_linear_concise`\n", "\n", "Deep learning has witnessed a sort of Cambrian explosion\n", "over the past decade.\n", "The sheer number of techniques, applications and algorithms by far surpasses the\n", "progress of previous decades. \n", "This is due to a fortuitous combination of multiple factors,\n", "one of which is the powerful free tools\n", "offered by a number of open-source deep learning frameworks.\n", "Theano :cite:`Bergstra.Breuleux.Bastien.ea.2010`,\n", "DistBelief :cite:`Dean.Corrado.Monga.ea.2012`,\n", "and Caffe :cite:`Jia.Shelhamer.Donahue.ea.2014`\n", "arguably represent the\n", "first generation of such models \n", "that found widespread adoption.\n", "In contrast to earlier (seminal) works like\n", "SN2 (Simulateur Neuristique) :cite:`Bottou.Le-Cun.1988`,\n", "which provided a Lisp-like programming experience,\n", "modern frameworks offer automatic differentiation\n", "and the convenience of Python.\n", "These frameworks allow us to automate and modularize\n", "the repetitive work of implementing gradient-based learning algorithms.\n", "\n", "In :numref:`sec_linear_scratch`, we relied only on\n", "(i) tensors for data storage and linear algebra;\n", "and (ii) automatic differentiation for calculating gradients.\n", "In practice, because data iterators, loss functions, optimizers,\n", "and neural network layers\n", "are so common, modern libraries implement these components for us as well.\n", "In this section, (**we will show you how to implement\n", "the linear regression model**) from :numref:`sec_linear_scratch`\n", "(**concisely by using high-level APIs**) of deep learning frameworks.\n" ] }, { "cell_type": "code", "execution_count": 1, "id": "af5e3ef6", "metadata": { "execution": { "iopub.execute_input": "2023-08-18T19:26:10.338416Z", "iopub.status.busy": "2023-08-18T19:26:10.337902Z", "iopub.status.idle": "2023-08-18T19:26:13.242568Z", "shell.execute_reply": "2023-08-18T19:26:13.241256Z" }, "origin_pos": 3, "tab": [ "pytorch" ] }, "outputs": [], "source": [ "import numpy as np\n", "import torch\n", "from torch import nn\n", "from d2l import torch as d2l" ] }, { "cell_type": "markdown", "id": "cf76423a", "metadata": { "origin_pos": 6 }, "source": [ "## Defining the Model\n", "\n", "When we implemented linear regression from scratch\n", "in :numref:`sec_linear_scratch`,\n", "we defined our model parameters explicitly\n", "and coded up the calculations to produce output\n", "using basic linear algebra operations.\n", "You *should* know how to do this.\n", "But once your models get more complex,\n", "and once you have to do this nearly every day,\n", "you will be glad of the assistance.\n", "The situation is similar to coding up your own blog from scratch.\n", "Doing it once or twice is rewarding and instructive,\n", "but you would be a lousy web developer\n", "if you spent a month reinventing the wheel.\n", "\n", "For standard operations,\n", "we can [**use a framework's predefined layers,**]\n", "which allow us to focus\n", "on the layers used to construct the model\n", "rather than worrying about their implementation.\n", "Recall the architecture of a single-layer network\n", "as described in :numref:`fig_single_neuron`.\n", "The layer is called *fully connected*,\n", "since each of its inputs is connected\n", "to each of its outputs\n", "by means of a matrix--vector multiplication.\n" ] }, { "cell_type": "markdown", "id": "39abf439", "metadata": { "origin_pos": 8, "tab": [ "pytorch" ] }, "source": [ "In PyTorch, the fully connected layer is defined in `Linear` and `LazyLinear` classes (available since version 1.8.0). \n", "The latter\n", "allows users to specify *merely*\n", "the output dimension,\n", "while the former\n", "additionally asks for\n", "how many inputs go into this layer.\n", "Specifying input shapes is inconvenient and may require nontrivial calculations\n", "(such as in convolutional layers).\n", "Thus, for simplicity, we will use such \"lazy\" layers\n", "whenever we can.\n" ] }, { "cell_type": "code", "execution_count": 2, "id": "5a75de1f", "metadata": { "execution": { "iopub.execute_input": "2023-08-18T19:26:13.246965Z", "iopub.status.busy": "2023-08-18T19:26:13.246279Z", "iopub.status.idle": "2023-08-18T19:26:13.251883Z", "shell.execute_reply": "2023-08-18T19:26:13.251011Z" }, "origin_pos": 10, "tab": [ "pytorch" ] }, "outputs": [], "source": [ "class LinearRegression(d2l.Module): #@save\n", " \"\"\"The linear regression model implemented with high-level APIs.\"\"\"\n", " def __init__(self, lr):\n", " super().__init__()\n", " self.save_hyperparameters()\n", " self.net = nn.LazyLinear(1)\n", " self.net.weight.data.normal_(0, 0.01)\n", " self.net.bias.data.fill_(0)" ] }, { "cell_type": "markdown", "id": "33660a46", "metadata": { "origin_pos": 12 }, "source": [ "In the `forward` method we just invoke the built-in `__call__` method of the predefined layers to compute the outputs.\n" ] }, { "cell_type": "code", "execution_count": 3, "id": "ea3983b1", "metadata": { "execution": { "iopub.execute_input": "2023-08-18T19:26:13.255283Z", "iopub.status.busy": "2023-08-18T19:26:13.254649Z", "iopub.status.idle": "2023-08-18T19:26:13.258693Z", "shell.execute_reply": "2023-08-18T19:26:13.257894Z" }, "origin_pos": 13, "tab": [ "pytorch" ] }, "outputs": [], "source": [ "@d2l.add_to_class(LinearRegression) #@save\n", "def forward(self, X):\n", " return self.net(X)" ] }, { "cell_type": "markdown", "id": "66748a30", "metadata": { "origin_pos": 15 }, "source": [ "## Defining the Loss Function\n" ] }, { "cell_type": "markdown", "id": "1cdf7456", "metadata": { "origin_pos": 17, "tab": [ "pytorch" ] }, "source": [ "[**The `MSELoss` class computes the mean squared error (without the $1/2$ factor in :eqref:`eq_mse`).**]\n", "By default, `MSELoss` returns the average loss over examples.\n", "It is faster (and easier to use) than implementing our own.\n" ] }, { "cell_type": "code", "execution_count": 4, "id": "08279ee2", "metadata": { "execution": { "iopub.execute_input": "2023-08-18T19:26:13.262204Z", "iopub.status.busy": "2023-08-18T19:26:13.261534Z", "iopub.status.idle": "2023-08-18T19:26:13.265876Z", "shell.execute_reply": "2023-08-18T19:26:13.265042Z" }, "origin_pos": 19, "tab": [ "pytorch" ] }, "outputs": [], "source": [ "@d2l.add_to_class(LinearRegression) #@save\n", "def loss(self, y_hat, y):\n", " fn = nn.MSELoss()\n", " return fn(y_hat, y)" ] }, { "cell_type": "markdown", "id": "9e523527", "metadata": { "origin_pos": 21 }, "source": [ "## Defining the Optimization Algorithm\n" ] }, { "cell_type": "markdown", "id": "9ea96bb2", "metadata": { "origin_pos": 23, "tab": [ "pytorch" ] }, "source": [ "Minibatch SGD is a standard tool\n", "for optimizing neural networks\n", "and thus PyTorch supports it alongside a number of\n", "variations on this algorithm in the `optim` module.\n", "When we (**instantiate an `SGD` instance,**)\n", "we specify the parameters to optimize over,\n", "obtainable from our model via `self.parameters()`,\n", "and the learning rate (`self.lr`)\n", "required by our optimization algorithm.\n" ] }, { "cell_type": "code", "execution_count": 5, "id": "a6f8dcb8", "metadata": { "execution": { "iopub.execute_input": "2023-08-18T19:26:13.269377Z", "iopub.status.busy": "2023-08-18T19:26:13.268734Z", "iopub.status.idle": "2023-08-18T19:26:13.272985Z", "shell.execute_reply": "2023-08-18T19:26:13.272152Z" }, "origin_pos": 25, "tab": [ "pytorch" ] }, "outputs": [], "source": [ "@d2l.add_to_class(LinearRegression) #@save\n", "def configure_optimizers(self):\n", " return torch.optim.SGD(self.parameters(), self.lr)" ] }, { "cell_type": "markdown", "id": "cf9ff29e", "metadata": { "origin_pos": 26 }, "source": [ "## Training\n", "\n", "You might have noticed that expressing our model through\n", "high-level APIs of a deep learning framework\n", "requires fewer lines of code.\n", "We did not have to allocate parameters individually,\n", "define our loss function, or implement minibatch SGD.\n", "Once we start working with much more complex models,\n", "the advantages of the high-level API will grow considerably.\n", "\n", "Now that we have all the basic pieces in place,\n", "[**the training loop itself is the same\n", "as the one we implemented from scratch.**]\n", "So we just call the `fit` method (introduced in :numref:`oo-design-training`),\n", "which relies on the implementation of the `fit_epoch` method\n", "in :numref:`sec_linear_scratch`,\n", "to train our model.\n" ] }, { "cell_type": "code", "execution_count": 6, "id": "084d36cb", "metadata": { "execution": { "iopub.execute_input": "2023-08-18T19:26:13.276169Z", "iopub.status.busy": "2023-08-18T19:26:13.275889Z", "iopub.status.idle": "2023-08-18T19:26:15.234839Z", "shell.execute_reply": "2023-08-18T19:26:15.233918Z" }, "origin_pos": 27, "tab": [ "pytorch" ] }, "outputs": [ { "data": { "image/svg+xml": [ "\n", "\n", "\n", " \n", " \n", " \n", " \n", " 2023-08-18T19:26:15.181789\n", " image/svg+xml\n", " \n", " \n", " Matplotlib v3.7.2, https://matplotlib.org/\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "\n" ], "text/plain": [ "
" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "model = LinearRegression(lr=0.03)\n", "data = d2l.SyntheticRegressionData(w=torch.tensor([2, -3.4]), b=4.2)\n", "trainer = d2l.Trainer(max_epochs=3)\n", "trainer.fit(model, data)" ] }, { "cell_type": "markdown", "id": "bf3e3b53", "metadata": { "origin_pos": 28 }, "source": [ "Below, we\n", "[**compare the model parameters learned\n", "by training on finite data\n", "and the actual parameters**]\n", "that generated our dataset.\n", "To access parameters,\n", "we access the weights and bias\n", "of the layer that we need.\n", "As in our implementation from scratch,\n", "note that our estimated parameters\n", "are close to their true counterparts.\n" ] }, { "cell_type": "code", "execution_count": 7, "id": "9ff8ee0f", "metadata": { "execution": { "iopub.execute_input": "2023-08-18T19:26:15.238470Z", "iopub.status.busy": "2023-08-18T19:26:15.237858Z", "iopub.status.idle": "2023-08-18T19:26:15.242681Z", "shell.execute_reply": "2023-08-18T19:26:15.241815Z" }, "origin_pos": 29, "tab": [ "pytorch" ] }, "outputs": [], "source": [ "@d2l.add_to_class(LinearRegression) #@save\n", "def get_w_b(self):\n", " return (self.net.weight.data, self.net.bias.data)\n", "w, b = model.get_w_b()" ] }, { "cell_type": "code", "execution_count": 8, "id": "2e5bd7b1", "metadata": { "execution": { "iopub.execute_input": "2023-08-18T19:26:15.246141Z", "iopub.status.busy": "2023-08-18T19:26:15.245569Z", "iopub.status.idle": "2023-08-18T19:26:15.252296Z", "shell.execute_reply": "2023-08-18T19:26:15.251312Z" }, "origin_pos": 31, "tab": [ "pytorch" ] }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "error in estimating w: tensor([ 0.0094, -0.0030])\n", "error in estimating b: tensor([0.0137])\n" ] } ], "source": [ "print(f'error in estimating w: {data.w - w.reshape(data.w.shape)}')\n", "print(f'error in estimating b: {data.b - b}')" ] }, { "cell_type": "markdown", "id": "bc2e436c", "metadata": { "origin_pos": 32 }, "source": [ "## Summary\n", "\n", "This section contains the first\n", "implementation of a deep network (in this book)\n", "to tap into the conveniences afforded\n", "by modern deep learning frameworks,\n", "such as MXNet :cite:`Chen.Li.Li.ea.2015`, \n", "JAX :cite:`Frostig.Johnson.Leary.2018`, \n", "PyTorch :cite:`Paszke.Gross.Massa.ea.2019`, \n", "and Tensorflow :cite:`Abadi.Barham.Chen.ea.2016`.\n", "We used framework defaults for loading data, defining a layer,\n", "a loss function, an optimizer and a training loop.\n", "Whenever the framework provides all necessary features,\n", "it is generally a good idea to use them,\n", "since the library implementations of these components\n", "tend to be heavily optimized for performance\n", "and properly tested for reliability.\n", "At the same time, try not to forget\n", "that these modules *can* be implemented directly.\n", "This is especially important for aspiring researchers\n", "who wish to live on the leading edge of model development,\n", "where you will be inventing new components\n", "that cannot possibly exist in any current library.\n" ] }, { "cell_type": "markdown", "id": "102e9024", "metadata": { "origin_pos": 34, "tab": [ "pytorch" ] }, "source": [ "In PyTorch, the `data` module provides tools for data processing,\n", "the `nn` module defines a large number of neural network layers and common loss functions.\n", "We can initialize the parameters by replacing their values\n", "with methods ending with `_`.\n", "Note that we need to specify the input dimensions of the network.\n", "While this is trivial for now, it can have significant knock-on effects\n", "when we want to design complex networks with many layers.\n", "Careful considerations of how to parametrize these networks\n", "is needed to allow portability.\n" ] }, { "cell_type": "markdown", "id": "5a5895fe", "metadata": { "origin_pos": 36 }, "source": [ "## Exercises\n", "\n", "1. How would you need to change the learning rate if you replace the aggregate loss over the minibatch\n", " with an average over the loss on the minibatch?\n", "1. Review the framework documentation to see which loss functions are provided. In particular,\n", " replace the squared loss with Huber's robust loss function. That is, use the loss function\n", " $$l(y,y') = \\begin{cases}|y-y'| -\\frac{\\sigma}{2} & \\textrm{ if } |y-y'| > \\sigma \\\\ \\frac{1}{2 \\sigma} (y-y')^2 & \\textrm{ otherwise}\\end{cases}$$\n", "1. How do you access the gradient of the weights of the model?\n", "1. What is the effect on the solution if you change the learning rate and the number of epochs? Does it keep on improving?\n", "1. How does the solution change as you vary the amount of data generated?\n", " 1. Plot the estimation error for $\\hat{\\mathbf{w}} - \\mathbf{w}$ and $\\hat{b} - b$ as a function of the amount of data. Hint: increase the amount of data logarithmically rather than linearly, i.e., 5, 10, 20, 50, ..., 10,000 rather than 1000, 2000, ..., 10,000.\n", " 2. Why is the suggestion in the hint appropriate?\n" ] }, { "cell_type": "markdown", "id": "9467a79a", "metadata": { "origin_pos": 38, "tab": [ "pytorch" ] }, "source": [ "[Discussions](https://discuss.d2l.ai/t/45)\n" ] } ], "metadata": { "language_info": { "name": "python" }, "required_libs": [] }, "nbformat": 4, "nbformat_minor": 5 }