{
"cells": [
{
"cell_type": "markdown",
"id": "2c52633a",
"metadata": {
"origin_pos": 1
},
"source": [
"# Hyperparameter Optimization API\n",
":label:`sec_api_hpo`\n",
"\n",
"Before we dive into the methodology, we will first discuss a basic code\n",
"structure that allows us to efficiently implement various HPO algorithms. In\n",
"general, all HPO algorithms considered here need to implement two decision\n",
"making primitives, *searching* and *scheduling*. First, they need to sample new\n",
"hyperparameter configurations, which often involves some kind of search over the\n",
"configuration space. Second, for each configuration, an HPO algorithm needs to\n",
"schedule its evaluation and decide how many resources to allocate for it. Once\n",
"we start to evaluate a configuration, we will refer to it as a *trial*. We map\n",
"these decisions to two classes, `HPOSearcher` and `HPOScheduler`. On top of that,\n",
"we also provide a `HPOTuner` class that executes the optimization process.\n",
"\n",
"This concept of scheduler and searcher is also implemented in popular HPO\n",
"libraries, such as Syne Tune :cite:`salinas-automl22`, Ray Tune\n",
":cite:`liaw-arxiv18` or Optuna :cite:`akiba-sigkdd19`.\n"
]
},
{
"cell_type": "code",
"execution_count": 1,
"id": "4ebfe837",
"metadata": {
"attributes": {
"classes": [],
"id": "",
"n": "2"
},
"execution": {
"iopub.execute_input": "2023-08-18T19:56:11.114949Z",
"iopub.status.busy": "2023-08-18T19:56:11.114106Z",
"iopub.status.idle": "2023-08-18T19:56:14.580979Z",
"shell.execute_reply": "2023-08-18T19:56:14.579878Z"
},
"origin_pos": 2,
"tab": [
"pytorch"
]
},
"outputs": [],
"source": [
"import time\n",
"from scipy import stats\n",
"from d2l import torch as d2l"
]
},
{
"cell_type": "markdown",
"id": "03e9202e",
"metadata": {
"origin_pos": 3
},
"source": [
"## Searcher\n",
"\n",
"Below we define a base class for searchers, which provides a new candidate\n",
"configuration through the `sample_configuration` function. A simple way to\n",
"implement this function would be to sample configurations uniformly at random,\n",
"as we did for random search in :numref:`sec_what_is_hpo`. More sophisticated\n",
"algorithms, such as Bayesian optimization, will make these\n",
"decisions based on the performance of previous trials. As a result, these\n",
"algorithms are able to sample more promising candidates over time. We add the\n",
"`update` function in order to update the history of previous trials, which can\n",
"then be exploited to improve our sampling distribution.\n"
]
},
{
"cell_type": "code",
"execution_count": 2,
"id": "457ad716",
"metadata": {
"attributes": {
"classes": [],
"id": "",
"n": "3"
},
"execution": {
"iopub.execute_input": "2023-08-18T19:56:14.585532Z",
"iopub.status.busy": "2023-08-18T19:56:14.584604Z",
"iopub.status.idle": "2023-08-18T19:56:14.589870Z",
"shell.execute_reply": "2023-08-18T19:56:14.588866Z"
},
"origin_pos": 4,
"tab": [
"pytorch"
]
},
"outputs": [],
"source": [
"class HPOSearcher(d2l.HyperParameters): #@save\n",
" def sample_configuration() -> dict:\n",
" raise NotImplementedError\n",
"\n",
" def update(self, config: dict, error: float, additional_info=None):\n",
" pass"
]
},
{
"cell_type": "markdown",
"id": "b7a1256a",
"metadata": {
"origin_pos": 5
},
"source": [
"The following code shows how to implement our random search optimizer from the\n",
"previous section in this API. As a slight extension, we allow the user to\n",
"prescribe the first configuration to be evaluated via `initial_config`, while\n",
"subsequent ones are drawn at random.\n"
]
},
{
"cell_type": "code",
"execution_count": 3,
"id": "c27ce1c4",
"metadata": {
"attributes": {
"classes": [],
"id": "",
"n": "4"
},
"execution": {
"iopub.execute_input": "2023-08-18T19:56:14.593455Z",
"iopub.status.busy": "2023-08-18T19:56:14.592742Z",
"iopub.status.idle": "2023-08-18T19:56:14.598613Z",
"shell.execute_reply": "2023-08-18T19:56:14.597627Z"
},
"origin_pos": 6,
"tab": [
"pytorch"
]
},
"outputs": [],
"source": [
"class RandomSearcher(HPOSearcher): #@save\n",
" def __init__(self, config_space: dict, initial_config=None):\n",
" self.save_hyperparameters()\n",
"\n",
" def sample_configuration(self) -> dict:\n",
" if self.initial_config is not None:\n",
" result = self.initial_config\n",
" self.initial_config = None\n",
" else:\n",
" result = {\n",
" name: domain.rvs()\n",
" for name, domain in self.config_space.items()\n",
" }\n",
" return result"
]
},
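{
"cell_type": "markdown",
"id": "hpo-sketch-sampling",
"metadata": {},
"source": [
"To make the sampling step concrete, here is a minimal standalone sketch that\n",
"mimics what `sample_configuration` does in the random branch, without depending\n",
"on `d2l`. The helper `sample_from`, the toy `config_space`, and the seeded\n",
"generator `rng` are assumptions made for illustration; the only library call\n",
"relied upon is the `rvs` method of frozen `scipy.stats` distributions.\n",
"\n",
"```python\n",
"import numpy as np\n",
"from scipy import stats\n",
"\n",
"# Toy configuration space; names and ranges are illustrative assumptions\n",
"config_space = {\n",
"    'learning_rate': stats.loguniform(1e-2, 1),\n",
"    'batch_size': stats.randint(32, 256),  # upper bound is exclusive\n",
"}\n",
"\n",
"def sample_from(config_space, rng):\n",
"    # Draw each hyperparameter independently from its own distribution\n",
"    return {name: domain.rvs(random_state=rng)\n",
"            for name, domain in config_space.items()}\n",
"\n",
"rng = np.random.default_rng(0)  # seeded so that repeated runs agree\n",
"config = sample_from(config_space, rng)\n",
"print(config)\n",
"```\n",
"\n",
"Passing an explicit generator is a design choice of this sketch: it keeps the\n",
"draws reproducible, which matters once several HPO runs are compared.\n"
]
},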
{
"cell_type": "markdown",
"id": "8f5b09f0",
"metadata": {
"origin_pos": 7
},
"source": [
"## Scheduler\n",
"\n",
"Beyond sampling configurations for new trials, we also need to decide when and\n",
"for how long to run a trial. In practice, all these decisions are done by the\n",
"`HPOScheduler`, which delegates the choice of new configurations to a\n",
"`HPOSearcher`. The `suggest` method is called whenever some resource for training\n",
"becomes available. Apart from invoking `sample_configuration` of a searcher, it\n",
"may also decide upon parameters like `max_epochs` (i.e., how long to train the\n",
"model for). The `update` method is called whenever a trial returns a new\n",
"observation.\n"
]
},
{
"cell_type": "code",
"execution_count": 4,
"id": "17f5a320",
"metadata": {
"attributes": {
"classes": [],
"id": "",
"n": "5"
},
"execution": {
"iopub.execute_input": "2023-08-18T19:56:14.601969Z",
"iopub.status.busy": "2023-08-18T19:56:14.601388Z",
"iopub.status.idle": "2023-08-18T19:56:14.606232Z",
"shell.execute_reply": "2023-08-18T19:56:14.605285Z"
},
"origin_pos": 8,
"tab": [
"pytorch"
]
},
"outputs": [],
"source": [
"class HPOScheduler(d2l.HyperParameters): #@save\n",
" def suggest(self) -> dict:\n",
" raise NotImplementedError\n",
"\n",
" def update(self, config: dict, error: float, info=None):\n",
" raise NotImplementedError"
]
},
{
"cell_type": "markdown",
"id": "86ff79f2",
"metadata": {
"origin_pos": 9
},
"source": [
"To implement random search, but also other HPO algorithms, we only need a basic\n",
"scheduler that schedules a new configuration every time new resources become\n",
"available.\n"
]
},
{
"cell_type": "code",
"execution_count": 5,
"id": "bd546ac3",
"metadata": {
"attributes": {
"classes": [],
"id": "",
"n": "6"
},
"execution": {
"iopub.execute_input": "2023-08-18T19:56:14.609680Z",
"iopub.status.busy": "2023-08-18T19:56:14.608988Z",
"iopub.status.idle": "2023-08-18T19:56:14.614519Z",
"shell.execute_reply": "2023-08-18T19:56:14.613489Z"
},
"origin_pos": 10,
"tab": [
"pytorch"
]
},
"outputs": [],
"source": [
"class BasicScheduler(HPOScheduler): #@save\n",
" def __init__(self, searcher: HPOSearcher):\n",
" self.save_hyperparameters()\n",
"\n",
" def suggest(self) -> dict:\n",
" return self.searcher.sample_configuration()\n",
"\n",
" def update(self, config: dict, error: float, info=None):\n",
" self.searcher.update(config, error, additional_info=info)"
]
},
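{
"cell_type": "markdown",
"id": "hpo-sketch-scheduler",
"metadata": {},
"source": [
"The scheduler may do more than forward the searcher's proposal. As a hedged\n",
"illustration of a resource decision, the sketch below attaches a fixed\n",
"`max_epochs` to every suggestion. Both `ConstantSearcher` and\n",
"`FixedEpochScheduler` are hypothetical stand-ins written as plain Python classes\n",
"so that the example runs on its own; they are not part of `d2l`.\n",
"\n",
"```python\n",
"class ConstantSearcher:\n",
"    # Hypothetical searcher that always proposes the same configuration\n",
"    def __init__(self, config):\n",
"        self.config = config\n",
"        self.history = []\n",
"\n",
"    def sample_configuration(self):\n",
"        return dict(self.config)  # copy, so callers cannot mutate our state\n",
"\n",
"    def update(self, config, error, additional_info=None):\n",
"        self.history.append((config, error))\n",
"\n",
"\n",
"class FixedEpochScheduler:\n",
"    # Hypothetical scheduler that also fixes the training budget\n",
"    def __init__(self, searcher, max_epochs):\n",
"        self.searcher = searcher\n",
"        self.max_epochs = max_epochs\n",
"\n",
"    def suggest(self):\n",
"        config = self.searcher.sample_configuration()\n",
"        config['max_epochs'] = self.max_epochs  # the resource decision\n",
"        return config\n",
"\n",
"    def update(self, config, error, info=None):\n",
"        # Pass the observation on so the searcher can record it\n",
"        self.searcher.update(config, error, additional_info=info)\n",
"\n",
"\n",
"searcher = ConstantSearcher({'learning_rate': 0.1, 'batch_size': 128})\n",
"scheduler = FixedEpochScheduler(searcher, max_epochs=5)\n",
"trial_config = scheduler.suggest()\n",
"scheduler.update(trial_config, error=0.25)  # made-up error value\n",
"print(trial_config)\n",
"```\n",
"\n",
"Because the returned dictionary is later unpacked into the objective, any key\n",
"the scheduler adds must match an argument name of that objective.\n"
]
},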
{
"cell_type": "markdown",
"id": "5da235c0",
"metadata": {
"origin_pos": 11
},
"source": [
"## Tuner\n",
"\n",
"Finally, we need a component that runs the scheduler/searcher and does some\n",
"book-keeping of the results. The following code implements a sequential\n",
"execution of the HPO trials that evaluates one training job after the next and\n",
"will serve as a basic example. We will later use *Syne Tune* for more scalable\n",
"distributed HPO cases.\n"
]
},
{
"cell_type": "code",
"execution_count": 6,
"id": "3c69cb0e",
"metadata": {
"attributes": {
"classes": [],
"id": "",
"n": "7"
},
"execution": {
"iopub.execute_input": "2023-08-18T19:56:14.618074Z",
"iopub.status.busy": "2023-08-18T19:56:14.617315Z",
"iopub.status.idle": "2023-08-18T19:56:14.624591Z",
"shell.execute_reply": "2023-08-18T19:56:14.623471Z"
},
"origin_pos": 12,
"tab": [
"pytorch"
]
},
"outputs": [],
"source": [
"class HPOTuner(d2l.HyperParameters): #@save\n",
" def __init__(self, scheduler: HPOScheduler, objective: callable):\n",
" self.save_hyperparameters()\n",
" # Bookeeping results for plotting\n",
" self.incumbent = None\n",
" self.incumbent_error = None\n",
" self.incumbent_trajectory = []\n",
" self.cumulative_runtime = []\n",
" self.current_runtime = 0\n",
" self.records = []\n",
"\n",
" def run(self, number_of_trials):\n",
" for i in range(number_of_trials):\n",
" start_time = time.time()\n",
" config = self.scheduler.suggest()\n",
" print(f\"Trial {i}: config = {config}\")\n",
" error = self.objective(**config)\n",
" error = float(error.cpu().detach().numpy())\n",
" self.scheduler.update(config, error)\n",
" runtime = time.time() - start_time\n",
" self.bookkeeping(config, error, runtime)\n",
" print(f\" error = {error}, runtime = {runtime}\")"
]
},
{
"cell_type": "markdown",
"id": "12c6bd50",
"metadata": {
"origin_pos": 13
},
"source": [
"## Bookkeeping the Performance of HPO Algorithms\n",
"\n",
"With any HPO algorithm, we are mostly interested in the best performing\n",
"configuration (called *incumbent*) and its validation error after a given \n",
"wall-clock time. This is why we track `runtime` per iteration, which includes\n",
"both the time to run an evaluation (call of `objective`) and the time to\n",
"make a decision (call of `scheduler.suggest`). In the sequel, we will plot\n",
"`cumulative_runtime` against `incumbent_trajectory` in order to visualize the\n",
"*any-time performance* of the HPO algorithm defined in terms of `scheduler`\n",
"(and `searcher`). This allows us to quantify not only how well the configuration\n",
"found by an optimizer works, but also how quickly an optimizer is able to find it.\n"
]
},
{
"cell_type": "code",
"execution_count": 7,
"id": "44da4883",
"metadata": {
"attributes": {
"classes": [],
"id": "",
"n": "8"
},
"execution": {
"iopub.execute_input": "2023-08-18T19:56:14.628515Z",
"iopub.status.busy": "2023-08-18T19:56:14.627622Z",
"iopub.status.idle": "2023-08-18T19:56:14.634427Z",
"shell.execute_reply": "2023-08-18T19:56:14.633277Z"
},
"origin_pos": 14,
"tab": [
"pytorch"
]
},
"outputs": [],
"source": [
"@d2l.add_to_class(HPOTuner) #@save\n",
"def bookkeeping(self, config: dict, error: float, runtime: float):\n",
" self.records.append({\"config\": config, \"error\": error, \"runtime\": runtime})\n",
" # Check if the last hyperparameter configuration performs better\n",
" # than the incumbent\n",
" if self.incumbent is None or self.incumbent_error > error:\n",
" self.incumbent = config\n",
" self.incumbent_error = error\n",
" # Add current best observed performance to the optimization trajectory\n",
" self.incumbent_trajectory.append(self.incumbent_error)\n",
" # Update runtime\n",
" self.current_runtime += runtime\n",
" self.cumulative_runtime.append(self.current_runtime)"
]
},
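{
"cell_type": "markdown",
"id": "hpo-sketch-incumbent",
"metadata": {},
"source": [
"The two lists maintained by `bookkeeping` are simply a running minimum of the\n",
"errors and a running sum of the runtimes. The following standalone sketch shows\n",
"this on made-up numbers; the values in `errors` and `runtimes` are invented for\n",
"illustration and are not results from any experiment.\n",
"\n",
"```python\n",
"from itertools import accumulate\n",
"\n",
"# Invented per-trial observations: validation error and runtime in seconds\n",
"errors = [0.30, 0.45, 0.22, 0.28, 0.19]\n",
"runtimes = [60.0, 58.0, 65.0, 61.0, 63.0]\n",
"\n",
"# Best error seen up to and including each trial\n",
"incumbent_trajectory = list(accumulate(errors, min))\n",
"# Wall-clock time elapsed at the end of each trial\n",
"cumulative_runtime = list(accumulate(runtimes))\n",
"\n",
"for t, e in zip(cumulative_runtime, incumbent_trajectory):\n",
"    print(f'{t:7.1f}s  best error so far: {e:.2f}')\n",
"```\n",
"\n",
"By construction the incumbent trajectory never increases, which is why it is a\n",
"convenient quantity to plot against time.\n"
]
},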
{
"cell_type": "markdown",
"id": "4a366cc7",
"metadata": {
"origin_pos": 15
},
"source": [
"## Example: Optimizing the Hyperparameters of a Convolutional Neural Network\n",
"\n",
"We now use our new implementation of random search to optimize the\n",
"*batch size* and *learning rate* of the `LeNet` convolutional neural network\n",
"from :numref:`sec_lenet`. We being by defining the objective function, which\n",
"will once more be validation error.\n"
]
},
{
"cell_type": "code",
"execution_count": 8,
"id": "349178aa",
"metadata": {
"attributes": {
"classes": [],
"id": "",
"n": "9"
},
"execution": {
"iopub.execute_input": "2023-08-18T19:56:14.638240Z",
"iopub.status.busy": "2023-08-18T19:56:14.637441Z",
"iopub.status.idle": "2023-08-18T19:56:14.644341Z",
"shell.execute_reply": "2023-08-18T19:56:14.643285Z"
},
"origin_pos": 16,
"tab": [
"pytorch"
]
},
"outputs": [],
"source": [
"def hpo_objective_lenet(learning_rate, batch_size, max_epochs=10): #@save\n",
" model = d2l.LeNet(lr=learning_rate, num_classes=10)\n",
" trainer = d2l.HPOTrainer(max_epochs=max_epochs, num_gpus=1)\n",
" data = d2l.FashionMNIST(batch_size=batch_size)\n",
" model.apply_init([next(iter(data.get_dataloader(True)))[0]], d2l.init_cnn)\n",
" trainer.fit(model=model, data=data)\n",
" validation_error = trainer.validation_error()\n",
" return validation_error"
]
},
{
"cell_type": "markdown",
"id": "aac837ec",
"metadata": {
"origin_pos": 17
},
"source": [
"We also need to define the configuration space. Moreover, the first configuration\n",
"to be evaluated is the default setting used in :numref:`sec_lenet`.\n"
]
},
{
"cell_type": "code",
"execution_count": 9,
"id": "a523ef68",
"metadata": {
"attributes": {
"classes": [],
"id": "",
"n": "10"
},
"execution": {
"iopub.execute_input": "2023-08-18T19:56:14.648231Z",
"iopub.status.busy": "2023-08-18T19:56:14.647311Z",
"iopub.status.idle": "2023-08-18T19:56:14.654426Z",
"shell.execute_reply": "2023-08-18T19:56:14.653298Z"
},
"origin_pos": 18,
"tab": [
"pytorch"
]
},
"outputs": [],
"source": [
"config_space = {\n",
" \"learning_rate\": stats.loguniform(1e-2, 1),\n",
" \"batch_size\": stats.randint(32, 256),\n",
"}\n",
"initial_config = {\n",
" \"learning_rate\": 0.1,\n",
" \"batch_size\": 128,\n",
"}"
]
},
{
"cell_type": "markdown",
"id": "0dfe77c8",
"metadata": {
"origin_pos": 19
},
"source": [
"Now we can start our random search:\n"
]
},
{
"cell_type": "code",
"execution_count": 10,
"id": "a336b154",
"metadata": {
"execution": {
"iopub.execute_input": "2023-08-18T19:56:14.658292Z",
"iopub.status.busy": "2023-08-18T19:56:14.657496Z",
"iopub.status.idle": "2023-08-18T20:01:31.684332Z",
"shell.execute_reply": "2023-08-18T20:01:31.683255Z"
},
"origin_pos": 20,
"tab": [
"pytorch"
]
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
" error = 0.9000097513198853, runtime = 62.85189199447632\n"
]
},
{
"data": {
"image/svg+xml": [
"\n",
"\n",
"\n"
],
"text/plain": [
""
]
},
"metadata": {},
"output_type": "display_data"
},
{
"data": {
"image/svg+xml": [
"\n",
"\n",
"\n"
],
"text/plain": [
""
]
},
"metadata": {},
"output_type": "display_data"
},
{
"data": {
"image/svg+xml": [
"\n",
"\n",
"\n"
],
"text/plain": [
""
]
},
"metadata": {},
"output_type": "display_data"
},
{
"data": {
"image/svg+xml": [
"\n",
"\n",
"\n"
],
"text/plain": [
""
]
},
"metadata": {},
"output_type": "display_data"
},
{
"data": {
"image/svg+xml": [
"\n",
"\n",
"\n"
],
"text/plain": [
""
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"searcher = RandomSearcher(config_space, initial_config=initial_config)\n",
"scheduler = BasicScheduler(searcher=searcher)\n",
"tuner = HPOTuner(scheduler=scheduler, objective=hpo_objective_lenet)\n",
"tuner.run(number_of_trials=5)"
]
},
{
"cell_type": "markdown",
"id": "92465bdc",
"metadata": {
"origin_pos": 21
},
"source": [
"Below we plot the optimization trajectory of the incumbent to get the any-time\n",
"performance of random search:\n"
]
},
{
"cell_type": "code",
"execution_count": 11,
"id": "d1f81393",
"metadata": {
"attributes": {
"classes": [],
"id": "",
"n": "11"
},
"execution": {
"iopub.execute_input": "2023-08-18T20:01:31.689220Z",
"iopub.status.busy": "2023-08-18T20:01:31.688906Z",
"iopub.status.idle": "2023-08-18T20:01:32.528839Z",
"shell.execute_reply": "2023-08-18T20:01:32.527643Z"
},
"origin_pos": 22,
"tab": [
"pytorch"
]
},
"outputs": [
{
"data": {
"image/svg+xml": [
"\n",
"\n",
"\n"
],
"text/plain": [
""
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"board = d2l.ProgressBoard(xlabel=\"time\", ylabel=\"error\")\n",
"for time_stamp, error in zip(\n",
" tuner.cumulative_runtime, tuner.incumbent_trajectory\n",
"):\n",
" board.draw(time_stamp, error, \"random search\", every_n=1)"
]
},
{
"cell_type": "markdown",
"id": "14102def",
"metadata": {
"origin_pos": 23
},
"source": [
"## Comparing HPO Algorithms\n",
"\n",
"Just as with training algorithms or model architectures, it is important to\n",
"understand how to best compare different HPO algorithms. Each HPO run depends\n",
"on two major sources of randomness: the random effects of the training process,\n",
"such as random weight initialization or mini-batch ordering, and the intrinsic\n",
"randomness of the HPO algorithm itself, such as the random sampling of random\n",
"search. Hence, when comparing different algorithms, it is crucial to run each\n",
"experiment several times and report statistics, such as mean or median, across\n",
"a population of multiple repetitions of an algorithm based on different seeds\n",
"of the random number generator.\n",
"\n",
"To illustrate this, we compare random search (see :numref:`sec_rs`) and Bayesian\n",
"optimization :cite:`snoek-nips12` on tuning the hyperparameters of a feed-forward\n",
"neural network. Each algorithm was evaluated\n",
"$50$ times with a different random seed. The solid line indicates the average\n",
"performance of the incumbent across these $50$ repetitions and the dashed line\n",
"the standard deviation. We can see that random search and Bayesian optimization\n",
"perform roughly the same up to ~1000 seconds, but Bayesian optimization can\n",
"make use of the past observation to identify better configurations and thus\n",
"quickly outperforms random search afterwards.\n",
"\n",
"\n",
"\n",
":label:`example_anytime_performance`\n",
"\n",
"## Summary\n",
"\n",
"This section laid out a simple, yet flexible interface to implement various HPO\n",
"algorithms that we will look at in this chapter. Similar interfaces can be found\n",
"in popular open-source HPO frameworks. We also looked at how we can compare HPO\n",
"algorithms, and potential pitfall one needs to be aware. \n",
"\n",
"## Exercises\n",
"\n",
"1. The goal of this exercise is to implement the objective function for a slightly more challenging HPO problem, and to run more realistic experiments. We will use the two hidden layer MLP `DropoutMLP` implemented in :numref:`sec_dropout`.\n",
" 1. Code up the objective function, which should depend on all hyperparameters of the model and `batch_size`. Use `max_epochs=50`. GPUs do not help here, so `num_gpus=0`. Hint: Modify `hpo_objective_lenet`.\n",
" 2. Choose a sensible search space, where `num_hiddens_1`, `num_hiddens_2` are integers in $[8, 1024]$, and dropout values lie in $[0, 0.95]$, while `batch_size` lies in $[16, 384]$. Provide code for `config_space`, using sensible distributions from `scipy.stats`.\n",
" 3. Run random search on this example with `number_of_trials=20` and plot the results. Make sure to first evaluate the default configuration of :numref:`sec_dropout`, which is `initial_config = {'num_hiddens_1': 256, 'num_hiddens_2': 256, 'dropout_1': 0.5, 'dropout_2': 0.5, 'lr': 0.1, 'batch_size': 256}`.\n",
"2. In this exercise, you will implement a new searcher (subclass of `HPOSearcher`) which makes decisions based on past data. It depends on parameters `probab_local`, `num_init_random`. Its `sample_configuration` method works as follows. For the first `num_init_random` calls, do the same as `RandomSearcher.sample_configuration`. Otherwise, with probability `1 - probab_local`, do the same as `RandomSearcher.sample_configuration`. Otherwise, pick the configuration which attained the smallest validation error so far, select one of its hyperparameters at random, and sample its value randomly like in `RandomSearcher.sample_configuration`, but leave all other values the same. Return this configuration, which is identical to the best configuration so far, except in this one hyperparameter.\n",
" 1. Code up this new `LocalSearcher`. Hint: Your searcher requires `config_space` as argument at construction. Feel free to use a member of type `RandomSearcher`. You will also have to implement the `update` method.\n",
" 2. Re-run the experiment from the previous exercise, but using your new searcher instead of `RandomSearcher`. Experiment with different values for `probab_local`, `num_init_random`. However, note that a proper comparison between different HPO methods requires repeating experiments several times, and ideally considering a number of benchmark tasks.\n"
]
},
{
"cell_type": "markdown",
"id": "26e88854",
"metadata": {
"origin_pos": 24,
"tab": [
"pytorch"
]
},
"source": [
"[Discussions](https://discuss.d2l.ai/t/12092)\n"
]
}
],
"metadata": {
"language_info": {
"name": "python"
},
"required_libs": []
},
"nbformat": 4,
"nbformat_minor": 5
}