{
"cells": [
{
"cell_type": "markdown",
"id": "89356571",
"metadata": {
"origin_pos": 0
},
"source": [
"# Dog Breed Identification (ImageNet Dogs) on Kaggle\n",
"\n",
"In this section, we will practice\n",
"the dog breed identification problem on\n",
"Kaggle. (**The web address of this competition is https://www.kaggle.com/c/dog-breed-identification**)\n",
"\n",
"In this competition,\n",
"120 different breeds of dogs will be recognized.\n",
"In fact,\n",
"the dataset for this competition is\n",
"a subset of the ImageNet dataset.\n",
"Unlike the images in the CIFAR-10 dataset in :numref:`sec_kaggle_cifar10`,\n",
"the images in the ImageNet dataset are both higher and wider in varying dimensions.\n",
":numref:`fig_kaggle_dog` shows the information on the competition's webpage. You need a Kaggle account\n",
"to submit your results.\n",
"\n",
"\n",
"\n",
":width:`400px`\n",
":label:`fig_kaggle_dog`\n"
]
},
{
"cell_type": "code",
"execution_count": 1,
"id": "522dbb60",
"metadata": {
"execution": {
"iopub.execute_input": "2023-08-18T19:37:07.445133Z",
"iopub.status.busy": "2023-08-18T19:37:07.444735Z",
"iopub.status.idle": "2023-08-18T19:37:10.912341Z",
"shell.execute_reply": "2023-08-18T19:37:10.911410Z"
},
"origin_pos": 2,
"tab": [
"pytorch"
]
},
"outputs": [],
"source": [
"import os\n",
"import torch\n",
"import torchvision\n",
"from torch import nn\n",
"from d2l import torch as d2l"
]
},
{
"cell_type": "markdown",
"id": "3236bc52",
"metadata": {
"origin_pos": 3
},
"source": [
"## Obtaining and Organizing the Dataset\n",
"\n",
"The competition dataset is divided into a training set and a test set, which contain 10222 and 10357 JPEG images\n",
"of three RGB (color) channels, respectively.\n",
"Among the training dataset,\n",
"there are 120 breeds of dogs\n",
"such as Labradors, Poodles, Dachshunds, Samoyeds, Huskies, Chihuahuas, and Yorkshire Terriers.\n",
"\n",
"\n",
"### Downloading the Dataset\n",
"\n",
"After logging into Kaggle,\n",
"you can click on the \"Data\" tab on the\n",
"competition webpage shown in :numref:`fig_kaggle_dog` and download the dataset by clicking the \"Download All\" button.\n",
"After unzipping the downloaded file in `../data`, you will find the entire dataset in the following paths:\n",
"\n",
"* ../data/dog-breed-identification/labels.csv\n",
"* ../data/dog-breed-identification/sample_submission.csv\n",
"* ../data/dog-breed-identification/train\n",
"* ../data/dog-breed-identification/test\n",
"\n",
"You may have noticed that the above structure is\n",
"similar to that of the CIFAR-10 competition in :numref:`sec_kaggle_cifar10`, where folders `train/` and `test/` contain training and testing dog images, respectively, and `labels.csv` contains\n",
"the labels for the training images.\n",
"Similarly, to make it easier to get started, [**we provide a small sample of the dataset**] mentioned above: `train_valid_test_tiny.zip`.\n",
"If you are going to use the full dataset for the Kaggle competition, you need to change the `demo` variable below to `False`.\n"
]
},
{
"cell_type": "code",
"execution_count": 2,
"id": "3410e2ed",
"metadata": {
"execution": {
"iopub.execute_input": "2023-08-18T19:37:10.916623Z",
"iopub.status.busy": "2023-08-18T19:37:10.915891Z",
"iopub.status.idle": "2023-08-18T19:37:12.042530Z",
"shell.execute_reply": "2023-08-18T19:37:12.041410Z"
},
"origin_pos": 4,
"tab": [
"pytorch"
]
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Downloading ../data/kaggle_dog_tiny.zip from http://d2l-data.s3-accelerate.amazonaws.com/kaggle_dog_tiny.zip...\n"
]
}
],
"source": [
"#@save\n",
"d2l.DATA_HUB['dog_tiny'] = (d2l.DATA_URL + 'kaggle_dog_tiny.zip',\n",
" '0cb91d09b814ecdc07b50f31f8dcad3e81d6a86d')\n",
"\n",
"# If you use the full dataset downloaded for the Kaggle competition, change\n",
"# the variable below to `False`\n",
"demo = True\n",
"if demo:\n",
" data_dir = d2l.download_extract('dog_tiny')\n",
"else:\n",
" data_dir = os.path.join('..', 'data', 'dog-breed-identification')"
]
},
{
"cell_type": "markdown",
"id": "dc030ecd",
"metadata": {
"origin_pos": 5
},
"source": [
"### [**Organizing the Dataset**]\n",
"\n",
"We can organize the dataset similarly to what we did in :numref:`sec_kaggle_cifar10`, namely splitting out\n",
"a validation set from the original training set, and moving images into subfolders grouped by labels.\n",
"\n",
"The `reorg_dog_data` function below reads\n",
"the training data labels, splits out the validation set, and organizes the training set.\n"
]
},
{
"cell_type": "code",
"execution_count": 3,
"id": "12a7827b",
"metadata": {
"execution": {
"iopub.execute_input": "2023-08-18T19:37:12.048428Z",
"iopub.status.busy": "2023-08-18T19:37:12.046704Z",
"iopub.status.idle": "2023-08-18T19:37:12.439141Z",
"shell.execute_reply": "2023-08-18T19:37:12.438146Z"
},
"origin_pos": 6,
"tab": [
"pytorch"
]
},
"outputs": [],
"source": [
"def reorg_dog_data(data_dir, valid_ratio):\n",
" labels = d2l.read_csv_labels(os.path.join(data_dir, 'labels.csv'))\n",
" d2l.reorg_train_valid(data_dir, labels, valid_ratio)\n",
" d2l.reorg_test(data_dir)\n",
"\n",
"\n",
"batch_size = 32 if demo else 128\n",
"valid_ratio = 0.1\n",
"reorg_dog_data(data_dir, valid_ratio)"
]
},
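{
"cell_type": "markdown",
"id": "9f1c0a11",
"metadata": {},
"source": [
"As a quick sanity check (not part of the original pipeline), we can count the images that ended up in each of the reorganized folders. This only assumes that `reorg_dog_data` has produced the `train_valid_test` directory layout used throughout this section.\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "9f1c0a12",
"metadata": {
"tab": [
"pytorch"
]
},
"outputs": [],
"source": [
"# Illustrative sanity check: count the images in each reorganized folder\n",
"for folder in ['train', 'valid', 'train_valid', 'test']:\n",
"    path = os.path.join(data_dir, 'train_valid_test', folder)\n",
"    num_images = sum(len(files) for _, _, files in os.walk(path))\n",
"    print(f'{folder}: {num_images} images')"
]
},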
{
"cell_type": "markdown",
"id": "5749b6c7",
"metadata": {
"origin_pos": 7
},
"source": [
"## [**Image Augmentation**]\n",
"\n",
"Recall that this dog breed dataset\n",
"is a subset of the ImageNet dataset,\n",
"whose images\n",
"are larger than those of the CIFAR-10 dataset\n",
"in :numref:`sec_kaggle_cifar10`.\n",
"The following\n",
"lists a few image augmentation operations\n",
"that might be useful for relatively larger images.\n"
]
},
{
"cell_type": "code",
"execution_count": 4,
"id": "da028e84",
"metadata": {
"execution": {
"iopub.execute_input": "2023-08-18T19:37:12.444078Z",
"iopub.status.busy": "2023-08-18T19:37:12.443719Z",
"iopub.status.idle": "2023-08-18T19:37:12.450590Z",
"shell.execute_reply": "2023-08-18T19:37:12.449525Z"
},
"origin_pos": 9,
"tab": [
"pytorch"
]
},
"outputs": [],
"source": [
"transform_train = torchvision.transforms.Compose([\n",
" # Randomly crop the image to obtain an image with an area of 0.08 to 1 of\n",
" # the original area and height-to-width ratio between 3/4 and 4/3. Then,\n",
" # scale the image to create a new 224 x 224 image\n",
" torchvision.transforms.RandomResizedCrop(224, scale=(0.08, 1.0),\n",
" ratio=(3.0/4.0, 4.0/3.0)),\n",
" torchvision.transforms.RandomHorizontalFlip(),\n",
" # Randomly change the brightness, contrast, and saturation\n",
" torchvision.transforms.ColorJitter(brightness=0.4,\n",
" contrast=0.4,\n",
" saturation=0.4),\n",
" # Add random noise\n",
" torchvision.transforms.ToTensor(),\n",
" # Standardize each channel of the image\n",
" torchvision.transforms.Normalize([0.485, 0.456, 0.406],\n",
" [0.229, 0.224, 0.225])])"
]
},
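{
"cell_type": "markdown",
"id": "9f1c0a21",
"metadata": {},
"source": [
"To see what these operations produce, we can apply `transform_train` to a single raw training image. The sketch below is illustrative and only assumes the dataset has been organized as above; it checks that the output is a 3 x 224 x 224 tensor.\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "9f1c0a22",
"metadata": {
"tab": [
"pytorch"
]
},
"outputs": [],
"source": [
"from PIL import Image\n",
"\n",
"# Illustrative: apply the training transform to one raw training image\n",
"train_dir = os.path.join(data_dir, 'train_valid_test', 'train')\n",
"breed = sorted(os.listdir(train_dir))[0]\n",
"fname = sorted(os.listdir(os.path.join(train_dir, breed)))[0]\n",
"img = Image.open(os.path.join(train_dir, breed, fname))\n",
"print('original size:', img.size)\n",
"print('augmented shape:', transform_train(img).shape)"
]
},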
{
"cell_type": "markdown",
"id": "ef89c6cb",
"metadata": {
"origin_pos": 10
},
"source": [
"During prediction,\n",
"we only use image preprocessing operations\n",
"without randomness.\n"
]
},
{
"cell_type": "code",
"execution_count": 5,
"id": "014f4992",
"metadata": {
"execution": {
"iopub.execute_input": "2023-08-18T19:37:12.454305Z",
"iopub.status.busy": "2023-08-18T19:37:12.454005Z",
"iopub.status.idle": "2023-08-18T19:37:12.459587Z",
"shell.execute_reply": "2023-08-18T19:37:12.458771Z"
},
"origin_pos": 12,
"tab": [
"pytorch"
]
},
"outputs": [],
"source": [
"transform_test = torchvision.transforms.Compose([\n",
" torchvision.transforms.Resize(256),\n",
" # Crop a 224 x 224 square area from the center of the image\n",
" torchvision.transforms.CenterCrop(224),\n",
" torchvision.transforms.ToTensor(),\n",
" torchvision.transforms.Normalize([0.485, 0.456, 0.406],\n",
" [0.229, 0.224, 0.225])])"
]
},
{
"cell_type": "markdown",
"id": "9dcc3c05",
"metadata": {
"origin_pos": 13
},
"source": [
"## [**Reading the Dataset**]\n",
"\n",
"As in :numref:`sec_kaggle_cifar10`,\n",
"we can read the organized dataset\n",
"consisting of raw image files.\n"
]
},
{
"cell_type": "code",
"execution_count": 6,
"id": "3f91960b",
"metadata": {
"execution": {
"iopub.execute_input": "2023-08-18T19:37:12.464061Z",
"iopub.status.busy": "2023-08-18T19:37:12.463218Z",
"iopub.status.idle": "2023-08-18T19:37:12.492441Z",
"shell.execute_reply": "2023-08-18T19:37:12.491530Z"
},
"origin_pos": 15,
"tab": [
"pytorch"
]
},
"outputs": [],
"source": [
"train_ds, train_valid_ds = [torchvision.datasets.ImageFolder(\n",
" os.path.join(data_dir, 'train_valid_test', folder),\n",
" transform=transform_train) for folder in ['train', 'train_valid']]\n",
"\n",
"valid_ds, test_ds = [torchvision.datasets.ImageFolder(\n",
" os.path.join(data_dir, 'train_valid_test', folder),\n",
" transform=transform_test) for folder in ['valid', 'test']]"
]
},
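{
"cell_type": "markdown",
"id": "9f1c0a31",
"metadata": {},
"source": [
"As an optional check, we can inspect what `ImageFolder` inferred from the folder structure: the number of examples and the breed names used as class labels.\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "9f1c0a32",
"metadata": {
"tab": [
"pytorch"
]
},
"outputs": [],
"source": [
"# Illustrative: inspect the datasets built by ImageFolder\n",
"print('training examples:', len(train_ds))\n",
"print('validation examples:', len(valid_ds))\n",
"print('breeds (classes):', len(train_ds.classes))\n",
"print('first few breeds:', train_ds.classes[:3])"
]
},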
{
"cell_type": "markdown",
"id": "d3a71b4e",
"metadata": {
"origin_pos": 16
},
"source": [
"Below we create data iterator instances\n",
"the same way\n",
"as in :numref:`sec_kaggle_cifar10`.\n"
]
},
{
"cell_type": "code",
"execution_count": 7,
"id": "dc700919",
"metadata": {
"execution": {
"iopub.execute_input": "2023-08-18T19:37:12.495969Z",
"iopub.status.busy": "2023-08-18T19:37:12.495419Z",
"iopub.status.idle": "2023-08-18T19:37:12.505734Z",
"shell.execute_reply": "2023-08-18T19:37:12.501006Z"
},
"origin_pos": 18,
"tab": [
"pytorch"
]
},
"outputs": [],
"source": [
"train_iter, train_valid_iter = [torch.utils.data.DataLoader(\n",
" dataset, batch_size, shuffle=True, drop_last=True)\n",
" for dataset in (train_ds, train_valid_ds)]\n",
"\n",
"valid_iter = torch.utils.data.DataLoader(valid_ds, batch_size, shuffle=False,\n",
" drop_last=True)\n",
"\n",
"test_iter = torch.utils.data.DataLoader(test_ds, batch_size, shuffle=False,\n",
" drop_last=False)"
]
},
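{
"cell_type": "markdown",
"id": "9f1c0a41",
"metadata": {},
"source": [
"Fetching a single mini-batch verifies that each iterator yields image tensors of shape (batch size, 3, 224, 224) together with integer breed labels.\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "9f1c0a42",
"metadata": {
"tab": [
"pytorch"
]
},
"outputs": [],
"source": [
"# Illustrative: fetch one mini-batch and verify the tensor shapes\n",
"X, y = next(iter(train_iter))\n",
"print(X.shape, y.shape, y.dtype)"
]
},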
{
"cell_type": "markdown",
"id": "0701b62c",
"metadata": {
"origin_pos": 19
},
"source": [
"## [**Fine-Tuning a Pretrained Model**]\n",
"\n",
"Again,\n",
"the dataset for this competition is a subset of the ImageNet dataset.\n",
"Therefore, we can use the approach discussed in\n",
":numref:`sec_fine_tuning`\n",
"to select a model pretrained on the\n",
"full ImageNet dataset and use it to extract image features to be fed into a\n",
"custom small-scale output network.\n",
"High-level APIs of deep learning frameworks\n",
"provide a wide range of models\n",
"pretrained on the ImageNet dataset.\n",
"Here, we choose\n",
"a pretrained ResNet-34 model,\n",
"where we simply reuse\n",
"the input of this model's output layer\n",
"(i.e., the extracted\n",
"features).\n",
"Then we can replace the original output layer with a small custom\n",
"output network that can be trained,\n",
"such as stacking two\n",
"fully connected layers.\n",
"Different from the experiment in\n",
":numref:`sec_fine_tuning`,\n",
"the following does\n",
"not retrain the pretrained model used for feature\n",
"extraction. This reduces training time and\n",
"memory for storing gradients.\n",
"\n",
"Recall that we\n",
"standardized images using\n",
"the means and standard deviations of the three RGB channels for the full ImageNet dataset.\n",
"In fact,\n",
"this is also consistent with the standardization operation\n",
"by the pretrained model on ImageNet.\n"
]
},
{
"cell_type": "code",
"execution_count": 8,
"id": "61785088",
"metadata": {
"execution": {
"iopub.execute_input": "2023-08-18T19:37:12.511001Z",
"iopub.status.busy": "2023-08-18T19:37:12.510321Z",
"iopub.status.idle": "2023-08-18T19:37:12.519013Z",
"shell.execute_reply": "2023-08-18T19:37:12.517929Z"
},
"origin_pos": 21,
"tab": [
"pytorch"
]
},
"outputs": [],
"source": [
"def get_net(devices):\n",
" finetune_net = nn.Sequential()\n",
" finetune_net.features = torchvision.models.resnet34(pretrained=True)\n",
" # Define a new output network (there are 120 output categories)\n",
" finetune_net.output_new = nn.Sequential(nn.Linear(1000, 256),\n",
" nn.ReLU(),\n",
" nn.Linear(256, 120))\n",
" # Move the model to devices\n",
" finetune_net = finetune_net.to(devices[0])\n",
" # Freeze parameters of feature layers\n",
" for param in finetune_net.features.parameters():\n",
" param.requires_grad = False\n",
" return finetune_net"
]
},
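{
"cell_type": "markdown",
"id": "9f1c0a51",
"metadata": {},
"source": [
"Since freezing parameters is easy to get wrong, the optional check below counts trainable versus frozen parameters; only the two fully connected layers in `output_new` should require gradients. (The name `demo_net` is ours, introduced just for this check.)\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "9f1c0a52",
"metadata": {
"tab": [
"pytorch"
]
},
"outputs": [],
"source": [
"# Illustrative: only the custom output network should be trainable\n",
"demo_net = get_net(d2l.try_all_gpus())\n",
"num_trainable = sum(p.numel() for p in demo_net.parameters()\n",
"                    if p.requires_grad)\n",
"num_frozen = sum(p.numel() for p in demo_net.parameters()\n",
"                 if not p.requires_grad)\n",
"print(f'trainable parameters: {num_trainable}, frozen: {num_frozen}')"
]
},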
{
"cell_type": "markdown",
"id": "f854ebef",
"metadata": {
"origin_pos": 22
},
"source": [
"Before [**calculating the loss**],\n",
"we first obtain the input of the pretrained model's output layer, i.e., the extracted feature.\n",
"Then we use this feature as input for our small custom output network to calculate the loss.\n"
]
},
{
"cell_type": "code",
"execution_count": 9,
"id": "88afcf21",
"metadata": {
"execution": {
"iopub.execute_input": "2023-08-18T19:37:12.525163Z",
"iopub.status.busy": "2023-08-18T19:37:12.523035Z",
"iopub.status.idle": "2023-08-18T19:37:12.531000Z",
"shell.execute_reply": "2023-08-18T19:37:12.529963Z"
},
"origin_pos": 24,
"tab": [
"pytorch"
]
},
"outputs": [],
"source": [
"loss = nn.CrossEntropyLoss(reduction='none')\n",
"\n",
"def evaluate_loss(data_iter, net, devices):\n",
" l_sum, n = 0.0, 0\n",
" for features, labels in data_iter:\n",
" features, labels = features.to(devices[0]), labels.to(devices[0])\n",
" outputs = net(features)\n",
" l = loss(outputs, labels)\n",
" l_sum += l.sum()\n",
" n += labels.numel()\n",
" return l_sum / n"
]
},
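{
"cell_type": "markdown",
"id": "9f1c0a61",
"metadata": {},
"source": [
"Note that `reduction='none'` keeps one loss value per example instead of averaging, which is what lets `evaluate_loss` sum the losses and divide by the total number of labels. A tiny self-contained illustration with random logits:\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "9f1c0a62",
"metadata": {
"tab": [
"pytorch"
]
},
"outputs": [],
"source": [
"# Illustrative: `reduction='none'` returns one loss value per example\n",
"logits = torch.randn(4, 120)\n",
"targets = torch.tensor([0, 1, 2, 3])\n",
"print(loss(logits, targets).shape)  # torch.Size([4])"
]
},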
{
"cell_type": "markdown",
"id": "0b3d346f",
"metadata": {
"origin_pos": 25
},
"source": [
"## Defining [**the Training Function**]\n",
"\n",
"We will select the model and tune hyperparameters according to the model's performance on the validation set. The model training function `train` only\n",
"iterates parameters of the small custom output network.\n"
]
},
{
"cell_type": "code",
"execution_count": 10,
"id": "a2fec7ed",
"metadata": {
"execution": {
"iopub.execute_input": "2023-08-18T19:37:12.535506Z",
"iopub.status.busy": "2023-08-18T19:37:12.535169Z",
"iopub.status.idle": "2023-08-18T19:37:12.548096Z",
"shell.execute_reply": "2023-08-18T19:37:12.547177Z"
},
"origin_pos": 27,
"tab": [
"pytorch"
]
},
"outputs": [],
"source": [
"def train(net, train_iter, valid_iter, num_epochs, lr, wd, devices, lr_period,\n",
" lr_decay):\n",
" # Only train the small custom output network\n",
" net = nn.DataParallel(net, device_ids=devices).to(devices[0])\n",
" trainer = torch.optim.SGD((param for param in net.parameters()\n",
" if param.requires_grad), lr=lr,\n",
" momentum=0.9, weight_decay=wd)\n",
" scheduler = torch.optim.lr_scheduler.StepLR(trainer, lr_period, lr_decay)\n",
" num_batches, timer = len(train_iter), d2l.Timer()\n",
" legend = ['train loss']\n",
" if valid_iter is not None:\n",
" legend.append('valid loss')\n",
" animator = d2l.Animator(xlabel='epoch', xlim=[1, num_epochs],\n",
" legend=legend)\n",
" for epoch in range(num_epochs):\n",
" metric = d2l.Accumulator(2)\n",
" for i, (features, labels) in enumerate(train_iter):\n",
" timer.start()\n",
" features, labels = features.to(devices[0]), labels.to(devices[0])\n",
" trainer.zero_grad()\n",
" output = net(features)\n",
" l = loss(output, labels).sum()\n",
" l.backward()\n",
" trainer.step()\n",
" metric.add(l, labels.shape[0])\n",
" timer.stop()\n",
" if (i + 1) % (num_batches // 5) == 0 or i == num_batches - 1:\n",
" animator.add(epoch + (i + 1) / num_batches,\n",
" (metric[0] / metric[1], None))\n",
" measures = f'train loss {metric[0] / metric[1]:.3f}'\n",
" if valid_iter is not None:\n",
" valid_loss = evaluate_loss(valid_iter, net, devices)\n",
" animator.add(epoch + 1, (None, valid_loss.detach().cpu()))\n",
" scheduler.step()\n",
" if valid_iter is not None:\n",
" measures += f', valid loss {valid_loss:.3f}'\n",
" print(measures + f'\\n{metric[1] * num_epochs / timer.sum():.1f}'\n",
" f' examples/sec on {str(devices)}')"
]
},
{
"cell_type": "markdown",
"id": "d77e6522",
"metadata": {
"origin_pos": 28
},
"source": [
"## [**Training and Validating the Model**]\n",
"\n",
"Now we can train and validate the model.\n",
"The following hyperparameters are all tunable.\n",
"For example, the number of epochs can be increased. Because `lr_period` and `lr_decay` are set to 2 and 0.9, respectively, the learning rate of the optimization algorithm will be multiplied by 0.9 after every 2 epochs.\n"
]
},
{
"cell_type": "code",
"execution_count": 11,
"id": "c06c3c5b",
"metadata": {
"execution": {
"iopub.execute_input": "2023-08-18T19:37:12.552339Z",
"iopub.status.busy": "2023-08-18T19:37:12.552051Z",
"iopub.status.idle": "2023-08-18T19:39:35.191318Z",
"shell.execute_reply": "2023-08-18T19:39:35.189837Z"
},
"origin_pos": 30,
"tab": [
"pytorch"
]
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"train loss 1.240, valid loss 1.545\n",
"577.5 examples/sec on [device(type='cuda', index=0), device(type='cuda', index=1)]\n"
]
}
],
"source": [
"devices, num_epochs, lr, wd = d2l.try_all_gpus(), 10, 1e-4, 1e-4\n",
"lr_period, lr_decay, net = 2, 0.9, get_net(devices)\n",
"train(net, train_iter, valid_iter, num_epochs, lr, wd, devices, lr_period,\n",
" lr_decay)"
]
},
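{
"cell_type": "markdown",
"id": "9f1c0a71",
"metadata": {},
"source": [
"The schedule used above can be verified by hand: `StepLR(trainer, 2, 0.9)` multiplies the learning rate by 0.9 every 2 epochs, so the rate in epoch `t` (counting from 0) is `lr * 0.9 ** (t // 2)`. A short sketch, independent of the training run:\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "9f1c0a72",
"metadata": {
"tab": [
"pytorch"
]
},
"outputs": [],
"source": [
"# Illustrative: the learning rate produced by StepLR(trainer, 2, 0.9)\n",
"lr, lr_period, lr_decay = 1e-4, 2, 0.9\n",
"for epoch in range(10):\n",
"    print(f'epoch {epoch + 1}: lr = {lr * lr_decay ** (epoch // lr_period):.2e}')"
]
},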
{
"cell_type": "markdown",
"id": "24685aa0",
"metadata": {
"origin_pos": 31
},
"source": [
"## [**Classifying the Testing Set**] and Submitting Results on Kaggle\n",
"\n",
"\n",
"Similar to the final step in :numref:`sec_kaggle_cifar10`,\n",
"in the end all the labeled data (including the validation set) are used for training the model and classifying the testing set.\n",
"We will use the trained custom output network\n",
"for classification.\n"
]
},
{
"cell_type": "code",
"execution_count": 12,
"id": "44a716b0",
"metadata": {
"execution": {
"iopub.execute_input": "2023-08-18T19:39:35.196555Z",
"iopub.status.busy": "2023-08-18T19:39:35.195796Z",
"iopub.status.idle": "2023-08-18T19:41:31.649186Z",
"shell.execute_reply": "2023-08-18T19:41:31.648347Z"
},
"origin_pos": 33,
"tab": [
"pytorch"
]
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"train loss 1.217\n",
"742.7 examples/sec on [device(type='cuda', index=0), device(type='cuda', index=1)]\n"
]
}
],
"source": [
"net = get_net(devices)\n",
"train(net, train_valid_iter, None, num_epochs, lr, wd, devices, lr_period,\n",
" lr_decay)\n",
"\n",
"preds = []\n",
"for data, label in test_iter:\n",
" output = torch.nn.functional.softmax(net(data.to(devices[0])), dim=1)\n",
" preds.extend(output.cpu().detach().numpy())\n",
"ids = sorted(os.listdir(\n",
" os.path.join(data_dir, 'train_valid_test', 'test', 'unknown')))\n",
"with open('submission.csv', 'w') as f:\n",
" f.write('id,' + ','.join(train_valid_ds.classes) + '\\n')\n",
" for i, output in zip(ids, preds):\n",
" f.write(i.split('.')[0] + ',' + ','.join(\n",
" [str(num) for num in output]) + '\\n')"
]
},
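{
"cell_type": "markdown",
"id": "9f1c0a81",
"metadata": {},
"source": [
"The code above writes a `submission.csv` file. As a quick, optional check of its structure: Kaggle expects a header of `id` followed by the 120 breed names, and the 120 probabilities in each row should sum to approximately one.\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "9f1c0a82",
"metadata": {
"tab": [
"pytorch"
]
},
"outputs": [],
"source": [
"# Illustrative: verify the structure of the generated submission file\n",
"with open('submission.csv') as f:\n",
"    header = f.readline().rstrip('\\n').split(',')\n",
"    first_row = f.readline().rstrip('\\n').split(',')\n",
"print('number of columns:', len(header))  # 121: id plus 120 breeds\n",
"print('probabilities sum to:', sum(float(p) for p in first_row[1:]))"
]
},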
{
"cell_type": "markdown",
"id": "2707c202",
"metadata": {
"origin_pos": 34
},
"source": [
"The above code\n",
"will generate a `submission.csv` file\n",
"to be submitted\n",
"to Kaggle in the same way described in :numref:`sec_kaggle_house`.\n",
"\n",
"\n",
"## Summary\n",
"\n",
"\n",
"* Images in the ImageNet dataset are larger (with varying dimensions) than CIFAR-10 images. We may modify image augmentation operations for tasks on a different dataset.\n",
"* To classify a subset of the ImageNet dataset, we can leverage pre-trained models on the full ImageNet dataset to extract features and only train a custom small-scale output network. This will lead to less computational time and memory cost.\n",
"\n",
"\n",
"## Exercises\n",
"\n",
"1. When using the full Kaggle competition dataset, what results can you achieve when you increase `batch_size` (batch size) and `num_epochs` (number of epochs) while setting some other hyperparameters as `lr = 0.01`, `lr_period = 10`, and `lr_decay = 0.1`?\n",
"1. Do you get better results if you use a deeper pretrained model? How do you tune hyperparameters? Can you further improve the results?\n"
]
},
{
"cell_type": "markdown",
"id": "20bcfde3",
"metadata": {
"origin_pos": 36,
"tab": [
"pytorch"
]
},
"source": [
"[Discussions](https://discuss.d2l.ai/t/1481)\n"
]
}
],
"metadata": {
"language_info": {
"name": "python"
},
"required_libs": []
},
"nbformat": 4,
"nbformat_minor": 5
}