{
"cells": [
{
"cell_type": "markdown",
"id": "ba85ae7f",
"metadata": {
"origin_pos": 1
},
"source": [
"# Asynchronous Random Search\n",
":label:`sec_rs_async`\n",
"\n",
"As we have seen in the previous :numref:`sec_api_hpo`, we might have to wait\n",
"hours or even days before random search returns a good hyperparameter\n",
"configuration, because of the expensive evaluation of hyperparameter\n",
"configurations. In practice, we have often access to a pool of resources such as\n",
"multiple GPUs on the same machine or multiple machines with a single GPU. This\n",
"begs the question: *How do we efficiently distribute random search?*\n",
"\n",
"In general, we distinguish between synchronous and asynchronous parallel\n",
"hyperparameter optimization (see :numref:`distributed_scheduling`). In the\n",
"synchronous setting, we wait for all concurrently running trials to finish,\n",
"before we start the next batch. Consider configuration spaces that contain\n",
"hyperparameters such as the number of filters or number of layers of a deep\n",
"neural network. Hyperparameter configurations that contain a larger number of \n",
"layers of filters will naturally take more time to finish, and all other trials\n",
"in the same batch will have to wait at synchronisation points (grey area in\n",
":numref:`distributed_scheduling`) before we can continue the optimization\n",
"process.\n",
"\n",
"In the asynchronous setting we immediately schedule a new trial as soon as resources\n",
"become available. This will optimally exploit our resources, since we can avoid any\n",
"synchronisation overhead. For random search, each new hyperparameter configuration\n",
"is chosen independently of all others, and in particular without exploiting\n",
"observations from any prior evaluation. This means we can trivially parallelize random\n",
"search asynchronously. This is not straight-forward with more sophisticated methods\n",
"that make decision based on previous observations (see :numref:`sec_sh_async`).\n",
"While we need access to more resources than in the sequential setting, asynchronous\n",
"random search exhibits a linear speed-up, in that a certain performance is reached\n",
"$K$ times faster if $K$ trials can be run in parallel. \n",
"\n",
"\n",
"\n",
":label:`distributed_scheduling`\n",
"\n",
"In this notebook, we will look at asynchronous random search that, where trials are\n",
"executed in multiple python processes on the same machine. Distributed job scheduling\n",
"and execution is difficult to implement from scratch. We will use *Syne Tune*\n",
":cite:`salinas-automl22`, which provides us with a simple interface for asynchronous\n",
"HPO. Syne Tune is designed to be run with different execution back-ends, and the\n",
"interested reader is invited to study its simple APIs in order to learn more about\n",
"distributed HPO.\n"
]
},
{
"cell_type": "code",
"execution_count": 1,
"id": "3bc1ea0b",
"metadata": {
"execution": {
"iopub.execute_input": "2023-08-18T19:45:36.479120Z",
"iopub.status.busy": "2023-08-18T19:45:36.478396Z",
"iopub.status.idle": "2023-08-18T19:45:39.891609Z",
"shell.execute_reply": "2023-08-18T19:45:39.890405Z"
},
"origin_pos": 2,
"tab": [
"pytorch"
]
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"INFO:root:SageMakerBackend is not imported since dependencies are missing. You can install them with\n",
" pip install 'syne-tune[extra]'\n"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"AWS dependencies are not imported since dependencies are missing. You can install them with\n",
" pip install 'syne-tune[aws]'\n",
"or (for everything)\n",
" pip install 'syne-tune[extra]'\n",
"AWS dependencies are not imported since dependencies are missing. You can install them with\n",
" pip install 'syne-tune[aws]'\n",
"or (for everything)\n",
" pip install 'syne-tune[extra]'\n"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"INFO:root:Ray Tune schedulers and searchers are not imported since dependencies are missing. You can install them with\n",
" pip install 'syne-tune[raytune]'\n",
"or (for everything)\n",
" pip install 'syne-tune[extra]'\n"
]
}
],
"source": [
"import logging\n",
"from d2l import torch as d2l\n",
"\n",
"logging.basicConfig(level=logging.INFO)\n",
"from syne_tune import StoppingCriterion, Tuner\n",
"from syne_tune.backend.python_backend import PythonBackend\n",
"from syne_tune.config_space import loguniform, randint\n",
"from syne_tune.experiments import load_experiment\n",
"from syne_tune.optimizer.baselines import RandomSearch"
]
},
{
"cell_type": "markdown",
"id": "4dad0d63",
"metadata": {
"origin_pos": 3
},
"source": [
"## Objective Function\n",
"\n",
"First, we have to define a new objective function such that it now returns the\n",
"performance back to Syne Tune via the `report` callback.\n"
]
},
{
"cell_type": "code",
"execution_count": 2,
"id": "836d36b0",
"metadata": {
"attributes": {
"classes": [],
"id": "",
"n": "34"
},
"execution": {
"iopub.execute_input": "2023-08-18T19:45:39.896715Z",
"iopub.status.busy": "2023-08-18T19:45:39.896297Z",
"iopub.status.idle": "2023-08-18T19:45:39.903868Z",
"shell.execute_reply": "2023-08-18T19:45:39.902721Z"
},
"origin_pos": 4,
"tab": [
"pytorch"
]
},
"outputs": [],
"source": [
"def hpo_objective_lenet_synetune(learning_rate, batch_size, max_epochs):\n",
" from syne_tune import Reporter\n",
" from d2l import torch as d2l\n",
"\n",
" model = d2l.LeNet(lr=learning_rate, num_classes=10)\n",
" trainer = d2l.HPOTrainer(max_epochs=1, num_gpus=1)\n",
" data = d2l.FashionMNIST(batch_size=batch_size)\n",
" model.apply_init([next(iter(data.get_dataloader(True)))[0]], d2l.init_cnn)\n",
" report = Reporter()\n",
" for epoch in range(1, max_epochs + 1):\n",
" if epoch == 1:\n",
" # Initialize the state of Trainer\n",
" trainer.fit(model=model, data=data)\n",
" else:\n",
" trainer.fit_epoch()\n",
" validation_error = trainer.validation_error().cpu().detach().numpy()\n",
" report(epoch=epoch, validation_error=float(validation_error))"
]
},
{
"cell_type": "markdown",
"id": "be1414d2",
"metadata": {
"origin_pos": 5
},
"source": [
"Note that the `PythonBackend` of Syne Tune requires dependencies to be imported\n",
"inside the function definition.\n",
"\n",
"## Asynchronous Scheduler\n",
"\n",
"First, we define the number of workers that evaluate trials concurrently. We\n",
"also need to specify how long we want to run random search, by defining an\n",
"upper limit on the total wall-clock time.\n"
]
},
{
"cell_type": "code",
"execution_count": 3,
"id": "8444f99c",
"metadata": {
"attributes": {
"classes": [],
"id": "",
"n": "37"
},
"execution": {
"iopub.execute_input": "2023-08-18T19:45:39.908295Z",
"iopub.status.busy": "2023-08-18T19:45:39.907985Z",
"iopub.status.idle": "2023-08-18T19:45:39.912826Z",
"shell.execute_reply": "2023-08-18T19:45:39.911751Z"
},
"origin_pos": 6,
"tab": [
"pytorch"
]
},
"outputs": [],
"source": [
"n_workers = 2 # Needs to be <= the number of available GPUs\n",
"\n",
"max_wallclock_time = 12 * 60 # 12 minutes"
]
},
{
"cell_type": "markdown",
"id": "f29aad51",
"metadata": {
"origin_pos": 7
},
"source": [
"Next, we state which metric we want to optimize and whether we want to minimize or\n",
"maximize this metric. Namely, `metric` needs to correspond to the argument name\n",
"passed to the `report` callback.\n"
]
},
{
"cell_type": "code",
"execution_count": 4,
"id": "1fe5e4fc",
"metadata": {
"attributes": {
"classes": [],
"id": "",
"n": "38"
},
"execution": {
"iopub.execute_input": "2023-08-18T19:45:39.917113Z",
"iopub.status.busy": "2023-08-18T19:45:39.916831Z",
"iopub.status.idle": "2023-08-18T19:45:39.921556Z",
"shell.execute_reply": "2023-08-18T19:45:39.920481Z"
},
"origin_pos": 8,
"tab": [
"pytorch"
]
},
"outputs": [],
"source": [
"mode = \"min\"\n",
"metric = \"validation_error\""
]
},
{
"cell_type": "markdown",
"id": "3ce2c04b",
"metadata": {
"origin_pos": 9
},
"source": [
"We use the configuration space from our previous example. In Syne Tune, this\n",
"dictionary can also be used to pass constant attributes to the training script.\n",
"We make use of this feature in order to pass `max_epochs`. Moreover, we specify\n",
"the first configuration to be evaluated in `initial_config`.\n"
]
},
{
"cell_type": "code",
"execution_count": 5,
"id": "51b63ea4",
"metadata": {
"attributes": {
"classes": [],
"id": "",
"n": "39"
},
"execution": {
"iopub.execute_input": "2023-08-18T19:45:39.925689Z",
"iopub.status.busy": "2023-08-18T19:45:39.925408Z",
"iopub.status.idle": "2023-08-18T19:45:39.930817Z",
"shell.execute_reply": "2023-08-18T19:45:39.929661Z"
},
"origin_pos": 10,
"tab": [
"pytorch"
]
},
"outputs": [],
"source": [
"config_space = {\n",
" \"learning_rate\": loguniform(1e-2, 1),\n",
" \"batch_size\": randint(32, 256),\n",
" \"max_epochs\": 10,\n",
"}\n",
"initial_config = {\n",
" \"learning_rate\": 0.1,\n",
" \"batch_size\": 128,\n",
"}"
]
},
{
"cell_type": "markdown",
"id": "863e6b0a",
"metadata": {
"origin_pos": 11
},
"source": [
"Next, we need to specify the back-end for job executions. Here we just consider\n",
"the distribution on a local machine where parallel jobs are executed as\n",
"sub-processes. However, for large scale HPO, we could run this also on a cluster\n",
"or cloud environment, where each trial consumes a full instance.\n"
]
},
{
"cell_type": "code",
"execution_count": 6,
"id": "5afe2680",
"metadata": {
"attributes": {
"classes": [],
"id": "",
"n": "40"
},
"execution": {
"iopub.execute_input": "2023-08-18T19:45:39.935131Z",
"iopub.status.busy": "2023-08-18T19:45:39.934847Z",
"iopub.status.idle": "2023-08-18T19:45:39.940103Z",
"shell.execute_reply": "2023-08-18T19:45:39.938952Z"
},
"origin_pos": 12,
"tab": [
"pytorch"
]
},
"outputs": [],
"source": [
"trial_backend = PythonBackend(\n",
" tune_function=hpo_objective_lenet_synetune,\n",
" config_space=config_space,\n",
")"
]
},
{
"cell_type": "markdown",
"id": "d335a513",
"metadata": {
"origin_pos": 13
},
"source": [
"We can now create the scheduler for asynchronous random search, which is similar\n",
"in behaviour to our `BasicScheduler` from :numref:`sec_api_hpo`.\n"
]
},
{
"cell_type": "code",
"execution_count": 7,
"id": "3c626a35",
"metadata": {
"attributes": {
"classes": [],
"id": "",
"n": "41"
},
"execution": {
"iopub.execute_input": "2023-08-18T19:45:39.944525Z",
"iopub.status.busy": "2023-08-18T19:45:39.944248Z",
"iopub.status.idle": "2023-08-18T19:45:39.952915Z",
"shell.execute_reply": "2023-08-18T19:45:39.951830Z"
},
"origin_pos": 14,
"tab": [
"pytorch"
]
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"INFO:syne_tune.optimizer.schedulers.fifo:max_resource_level = 10, as inferred from config_space\n"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"INFO:syne_tune.optimizer.schedulers.fifo:Master random_seed = 2737092907\n"
]
}
],
"source": [
"scheduler = RandomSearch(\n",
" config_space,\n",
" metric=metric,\n",
" mode=mode,\n",
" points_to_evaluate=[initial_config],\n",
")"
]
},
{
"cell_type": "markdown",
"id": "0eea9b5f",
"metadata": {
"origin_pos": 15
},
"source": [
"Syne Tune also features a `Tuner`, where the main experiment loop and\n",
"bookkeeping is centralized, and interactions between scheduler and back-end are\n",
"mediated.\n"
]
},
{
"cell_type": "code",
"execution_count": 8,
"id": "f7261c66",
"metadata": {
"attributes": {
"classes": [],
"id": "",
"n": "42"
},
"execution": {
"iopub.execute_input": "2023-08-18T19:45:39.957089Z",
"iopub.status.busy": "2023-08-18T19:45:39.956804Z",
"iopub.status.idle": "2023-08-18T19:45:39.961864Z",
"shell.execute_reply": "2023-08-18T19:45:39.961044Z"
},
"origin_pos": 16,
"tab": [
"pytorch"
]
},
"outputs": [],
"source": [
"stop_criterion = StoppingCriterion(max_wallclock_time=max_wallclock_time)\n",
"\n",
"tuner = Tuner(\n",
" trial_backend=trial_backend,\n",
" scheduler=scheduler,\n",
" stop_criterion=stop_criterion,\n",
" n_workers=n_workers,\n",
" print_update_interval=int(max_wallclock_time * 0.6),\n",
")"
]
},
{
"cell_type": "markdown",
"id": "3878fcff",
"metadata": {
"origin_pos": 17
},
"source": [
"Let us run our distributed HPO experiment. According to our stopping criterion,\n",
"it will run for about 12 minutes.\n"
]
},
{
"cell_type": "code",
"execution_count": 9,
"id": "57befb31",
"metadata": {
"attributes": {
"classes": [],
"id": "",
"n": "43"
},
"execution": {
"iopub.execute_input": "2023-08-18T19:45:39.966538Z",
"iopub.status.busy": "2023-08-18T19:45:39.966233Z",
"iopub.status.idle": "2023-08-18T19:57:42.878521Z",
"shell.execute_reply": "2023-08-18T19:57:42.877375Z"
},
"origin_pos": 18,
"tab": [
"pytorch"
]
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"INFO:syne_tune.tuner:results of trials will be saved on /home/ci/syne-tune/python-entrypoint-2023-08-18-19-45-39-958\n"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"INFO:root:Detected 4 GPUs\n"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"INFO:root:running subprocess with command: /usr/bin/python /home/ci/.local/lib/python3.8/site-packages/syne_tune/backend/python_backend/python_entrypoint.py --learning_rate 0.1 --batch_size 128 --max_epochs 10 --tune_function_root /home/ci/syne-tune/python-entrypoint-2023-08-18-19-45-39-958/tune_function --tune_function_hash 4d7d5b85e4537ad0c5d0a202623dcec5 --st_checkpoint_dir /home/ci/syne-tune/python-entrypoint-2023-08-18-19-45-39-958/0/checkpoints\n"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"INFO:syne_tune.tuner:(trial 0) - scheduled config {'learning_rate': 0.1, 'batch_size': 128, 'max_epochs': 10}\n"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"INFO:root:running subprocess with command: /usr/bin/python /home/ci/.local/lib/python3.8/site-packages/syne_tune/backend/python_backend/python_entrypoint.py --learning_rate 0.1702844732454753 --batch_size 114 --max_epochs 10 --tune_function_root /home/ci/syne-tune/python-entrypoint-2023-08-18-19-45-39-958/tune_function --tune_function_hash 4d7d5b85e4537ad0c5d0a202623dcec5 --st_checkpoint_dir /home/ci/syne-tune/python-entrypoint-2023-08-18-19-45-39-958/1/checkpoints\n"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"INFO:syne_tune.tuner:(trial 1) - scheduled config {'learning_rate': 0.1702844732454753, 'batch_size': 114, 'max_epochs': 10}\n"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"INFO:syne_tune.tuner:Trial trial_id 0 completed.\n"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"INFO:syne_tune.tuner:Trial trial_id 1 completed.\n"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"INFO:root:running subprocess with command: /usr/bin/python /home/ci/.local/lib/python3.8/site-packages/syne_tune/backend/python_backend/python_entrypoint.py --learning_rate 0.34019846567238493 --batch_size 221 --max_epochs 10 --tune_function_root /home/ci/syne-tune/python-entrypoint-2023-08-18-19-45-39-958/tune_function --tune_function_hash 4d7d5b85e4537ad0c5d0a202623dcec5 --st_checkpoint_dir /home/ci/syne-tune/python-entrypoint-2023-08-18-19-45-39-958/2/checkpoints\n"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"INFO:syne_tune.tuner:(trial 2) - scheduled config {'learning_rate': 0.34019846567238493, 'batch_size': 221, 'max_epochs': 10}\n"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"INFO:root:running subprocess with command: /usr/bin/python /home/ci/.local/lib/python3.8/site-packages/syne_tune/backend/python_backend/python_entrypoint.py --learning_rate 0.014628124155727769 --batch_size 88 --max_epochs 10 --tune_function_root /home/ci/syne-tune/python-entrypoint-2023-08-18-19-45-39-958/tune_function --tune_function_hash 4d7d5b85e4537ad0c5d0a202623dcec5 --st_checkpoint_dir /home/ci/syne-tune/python-entrypoint-2023-08-18-19-45-39-958/3/checkpoints\n"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"INFO:syne_tune.tuner:(trial 3) - scheduled config {'learning_rate': 0.014628124155727769, 'batch_size': 88, 'max_epochs': 10}\n"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"INFO:syne_tune.tuner:Trial trial_id 2 completed.\n"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"INFO:root:running subprocess with command: /usr/bin/python /home/ci/.local/lib/python3.8/site-packages/syne_tune/backend/python_backend/python_entrypoint.py --learning_rate 0.1114831485450576 --batch_size 142 --max_epochs 10 --tune_function_root /home/ci/syne-tune/python-entrypoint-2023-08-18-19-45-39-958/tune_function --tune_function_hash 4d7d5b85e4537ad0c5d0a202623dcec5 --st_checkpoint_dir /home/ci/syne-tune/python-entrypoint-2023-08-18-19-45-39-958/4/checkpoints\n"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"INFO:syne_tune.tuner:(trial 4) - scheduled config {'learning_rate': 0.1114831485450576, 'batch_size': 142, 'max_epochs': 10}\n"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"INFO:syne_tune.tuner:Trial trial_id 3 completed.\n"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"INFO:root:running subprocess with command: /usr/bin/python /home/ci/.local/lib/python3.8/site-packages/syne_tune/backend/python_backend/python_entrypoint.py --learning_rate 0.014076038679980779 --batch_size 223 --max_epochs 10 --tune_function_root /home/ci/syne-tune/python-entrypoint-2023-08-18-19-45-39-958/tune_function --tune_function_hash 4d7d5b85e4537ad0c5d0a202623dcec5 --st_checkpoint_dir /home/ci/syne-tune/python-entrypoint-2023-08-18-19-45-39-958/5/checkpoints\n"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"INFO:syne_tune.tuner:(trial 5) - scheduled config {'learning_rate': 0.014076038679980779, 'batch_size': 223, 'max_epochs': 10}\n"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"INFO:syne_tune.tuner:Trial trial_id 4 completed.\n"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"INFO:root:running subprocess with command: /usr/bin/python /home/ci/.local/lib/python3.8/site-packages/syne_tune/backend/python_backend/python_entrypoint.py --learning_rate 0.02558173674804846 --batch_size 62 --max_epochs 10 --tune_function_root /home/ci/syne-tune/python-entrypoint-2023-08-18-19-45-39-958/tune_function --tune_function_hash 4d7d5b85e4537ad0c5d0a202623dcec5 --st_checkpoint_dir /home/ci/syne-tune/python-entrypoint-2023-08-18-19-45-39-958/6/checkpoints\n"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"INFO:syne_tune.tuner:(trial 6) - scheduled config {'learning_rate': 0.02558173674804846, 'batch_size': 62, 'max_epochs': 10}\n"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"INFO:syne_tune.tuner:Trial trial_id 5 completed.\n"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"INFO:root:running subprocess with command: /usr/bin/python /home/ci/.local/lib/python3.8/site-packages/syne_tune/backend/python_backend/python_entrypoint.py --learning_rate 0.026035979388614055 --batch_size 139 --max_epochs 10 --tune_function_root /home/ci/syne-tune/python-entrypoint-2023-08-18-19-45-39-958/tune_function --tune_function_hash 4d7d5b85e4537ad0c5d0a202623dcec5 --st_checkpoint_dir /home/ci/syne-tune/python-entrypoint-2023-08-18-19-45-39-958/7/checkpoints\n"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"INFO:syne_tune.tuner:(trial 7) - scheduled config {'learning_rate': 0.026035979388614055, 'batch_size': 139, 'max_epochs': 10}\n"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"INFO:syne_tune.tuner:Trial trial_id 6 completed.\n"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"INFO:root:running subprocess with command: /usr/bin/python /home/ci/.local/lib/python3.8/site-packages/syne_tune/backend/python_backend/python_entrypoint.py --learning_rate 0.24202494130424274 --batch_size 231 --max_epochs 10 --tune_function_root /home/ci/syne-tune/python-entrypoint-2023-08-18-19-45-39-958/tune_function --tune_function_hash 4d7d5b85e4537ad0c5d0a202623dcec5 --st_checkpoint_dir /home/ci/syne-tune/python-entrypoint-2023-08-18-19-45-39-958/8/checkpoints\n"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"INFO:syne_tune.tuner:(trial 8) - scheduled config {'learning_rate': 0.24202494130424274, 'batch_size': 231, 'max_epochs': 10}\n"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"INFO:syne_tune.tuner:Trial trial_id 7 completed.\n"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"INFO:root:running subprocess with command: /usr/bin/python /home/ci/.local/lib/python3.8/site-packages/syne_tune/backend/python_backend/python_entrypoint.py --learning_rate 0.10483132064775551 --batch_size 145 --max_epochs 10 --tune_function_root /home/ci/syne-tune/python-entrypoint-2023-08-18-19-45-39-958/tune_function --tune_function_hash 4d7d5b85e4537ad0c5d0a202623dcec5 --st_checkpoint_dir /home/ci/syne-tune/python-entrypoint-2023-08-18-19-45-39-958/9/checkpoints\n"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"INFO:syne_tune.tuner:(trial 9) - scheduled config {'learning_rate': 0.10483132064775551, 'batch_size': 145, 'max_epochs': 10}\n"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"INFO:syne_tune.tuner:Trial trial_id 8 completed.\n"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"INFO:root:running subprocess with command: /usr/bin/python /home/ci/.local/lib/python3.8/site-packages/syne_tune/backend/python_backend/python_entrypoint.py --learning_rate 0.017898854850751864 --batch_size 51 --max_epochs 10 --tune_function_root /home/ci/syne-tune/python-entrypoint-2023-08-18-19-45-39-958/tune_function --tune_function_hash 4d7d5b85e4537ad0c5d0a202623dcec5 --st_checkpoint_dir /home/ci/syne-tune/python-entrypoint-2023-08-18-19-45-39-958/10/checkpoints\n"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"INFO:syne_tune.tuner:(trial 10) - scheduled config {'learning_rate': 0.017898854850751864, 'batch_size': 51, 'max_epochs': 10}\n"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"INFO:syne_tune.tuner:Trial trial_id 9 completed.\n"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"INFO:root:running subprocess with command: /usr/bin/python /home/ci/.local/lib/python3.8/site-packages/syne_tune/backend/python_backend/python_entrypoint.py --learning_rate 0.9645419978270817 --batch_size 200 --max_epochs 10 --tune_function_root /home/ci/syne-tune/python-entrypoint-2023-08-18-19-45-39-958/tune_function --tune_function_hash 4d7d5b85e4537ad0c5d0a202623dcec5 --st_checkpoint_dir /home/ci/syne-tune/python-entrypoint-2023-08-18-19-45-39-958/11/checkpoints\n"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"INFO:syne_tune.tuner:(trial 11) - scheduled config {'learning_rate': 0.9645419978270817, 'batch_size': 200, 'max_epochs': 10}\n"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"INFO:syne_tune.tuner:Trial trial_id 11 completed.\n"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"INFO:root:running subprocess with command: /usr/bin/python /home/ci/.local/lib/python3.8/site-packages/syne_tune/backend/python_backend/python_entrypoint.py --learning_rate 0.10559888854748693 --batch_size 40 --max_epochs 10 --tune_function_root /home/ci/syne-tune/python-entrypoint-2023-08-18-19-45-39-958/tune_function --tune_function_hash 4d7d5b85e4537ad0c5d0a202623dcec5 --st_checkpoint_dir /home/ci/syne-tune/python-entrypoint-2023-08-18-19-45-39-958/12/checkpoints\n"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"INFO:syne_tune.tuner:(trial 12) - scheduled config {'learning_rate': 0.10559888854748693, 'batch_size': 40, 'max_epochs': 10}\n"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"INFO:syne_tune.tuner:tuning status (last metric is reported)\n",
" trial_id status iter learning_rate batch_size max_epochs epoch validation_error worker-time\n",
" 0 Completed 10 0.100000 128 10 10.0 0.277195 64.928907\n",
" 1 Completed 10 0.170284 114 10 10.0 0.286225 65.434195\n",
" 2 Completed 10 0.340198 221 10 10.0 0.218990 59.729758\n",
" 3 Completed 10 0.014628 88 10 10.0 0.899920 81.001636\n",
" 4 Completed 10 0.111483 142 10 10.0 0.268684 64.427400\n",
" 5 Completed 10 0.014076 223 10 10.0 0.899922 61.264475\n",
" 6 Completed 10 0.025582 62 10 10.0 0.399520 75.966186\n",
" 7 Completed 10 0.026036 139 10 10.0 0.899988 62.261541\n",
" 8 Completed 10 0.242025 231 10 10.0 0.257636 58.186485\n",
" 9 Completed 10 0.104831 145 10 10.0 0.273898 59.771699\n",
" 10 InProgress 8 0.017899 51 10 8.0 0.496118 66.999746\n",
" 11 Completed 10 0.964542 200 10 10.0 0.181600 59.159662\n",
" 12 InProgress 0 0.105599 40 10 - - -\n",
"2 trials running, 11 finished (11 until the end), 436.60s wallclock-time\n",
"\n"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"INFO:syne_tune.tuner:Trial trial_id 10 completed.\n"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"INFO:root:running subprocess with command: /usr/bin/python /home/ci/.local/lib/python3.8/site-packages/syne_tune/backend/python_backend/python_entrypoint.py --learning_rate 0.5846051207380589 --batch_size 35 --max_epochs 10 --tune_function_root /home/ci/syne-tune/python-entrypoint-2023-08-18-19-45-39-958/tune_function --tune_function_hash 4d7d5b85e4537ad0c5d0a202623dcec5 --st_checkpoint_dir /home/ci/syne-tune/python-entrypoint-2023-08-18-19-45-39-958/13/checkpoints\n"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"INFO:syne_tune.tuner:(trial 13) - scheduled config {'learning_rate': 0.5846051207380589, 'batch_size': 35, 'max_epochs': 10}\n"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"INFO:syne_tune.tuner:Trial trial_id 12 completed.\n"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"INFO:root:running subprocess with command: /usr/bin/python /home/ci/.local/lib/python3.8/site-packages/syne_tune/backend/python_backend/python_entrypoint.py --learning_rate 0.2468891379769198 --batch_size 146 --max_epochs 10 --tune_function_root /home/ci/syne-tune/python-entrypoint-2023-08-18-19-45-39-958/tune_function --tune_function_hash 4d7d5b85e4537ad0c5d0a202623dcec5 --st_checkpoint_dir /home/ci/syne-tune/python-entrypoint-2023-08-18-19-45-39-958/14/checkpoints\n"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"INFO:syne_tune.tuner:(trial 14) - scheduled config {'learning_rate': 0.2468891379769198, 'batch_size': 146, 'max_epochs': 10}\n"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"INFO:syne_tune.tuner:Trial trial_id 13 completed.\n"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"INFO:root:running subprocess with command: /usr/bin/python /home/ci/.local/lib/python3.8/site-packages/syne_tune/backend/python_backend/python_entrypoint.py --learning_rate 0.12956867470224812 --batch_size 218 --max_epochs 10 --tune_function_root /home/ci/syne-tune/python-entrypoint-2023-08-18-19-45-39-958/tune_function --tune_function_hash 4d7d5b85e4537ad0c5d0a202623dcec5 --st_checkpoint_dir /home/ci/syne-tune/python-entrypoint-2023-08-18-19-45-39-958/15/checkpoints\n"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"INFO:syne_tune.tuner:(trial 15) - scheduled config {'learning_rate': 0.12956867470224812, 'batch_size': 218, 'max_epochs': 10}\n"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"INFO:syne_tune.tuner:Trial trial_id 14 completed.\n"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"INFO:root:running subprocess with command: /usr/bin/python /home/ci/.local/lib/python3.8/site-packages/syne_tune/backend/python_backend/python_entrypoint.py --learning_rate 0.24900745354561854 --batch_size 103 --max_epochs 10 --tune_function_root /home/ci/syne-tune/python-entrypoint-2023-08-18-19-45-39-958/tune_function --tune_function_hash 4d7d5b85e4537ad0c5d0a202623dcec5 --st_checkpoint_dir /home/ci/syne-tune/python-entrypoint-2023-08-18-19-45-39-958/16/checkpoints\n"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"INFO:syne_tune.tuner:(trial 16) - scheduled config {'learning_rate': 0.24900745354561854, 'batch_size': 103, 'max_epochs': 10}\n"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"INFO:syne_tune.tuner:Trial trial_id 15 completed.\n"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"INFO:root:running subprocess with command: /usr/bin/python /home/ci/.local/lib/python3.8/site-packages/syne_tune/backend/python_backend/python_entrypoint.py --learning_rate 0.03903577426988046 --batch_size 80 --max_epochs 10 --tune_function_root /home/ci/syne-tune/python-entrypoint-2023-08-18-19-45-39-958/tune_function --tune_function_hash 4d7d5b85e4537ad0c5d0a202623dcec5 --st_checkpoint_dir /home/ci/syne-tune/python-entrypoint-2023-08-18-19-45-39-958/17/checkpoints\n"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"INFO:syne_tune.tuner:(trial 17) - scheduled config {'learning_rate': 0.03903577426988046, 'batch_size': 80, 'max_epochs': 10}\n"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"INFO:syne_tune.tuner:Trial trial_id 16 completed.\n"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"INFO:root:running subprocess with command: /usr/bin/python /home/ci/.local/lib/python3.8/site-packages/syne_tune/backend/python_backend/python_entrypoint.py --learning_rate 0.01846559300690354 --batch_size 183 --max_epochs 10 --tune_function_root /home/ci/syne-tune/python-entrypoint-2023-08-18-19-45-39-958/tune_function --tune_function_hash 4d7d5b85e4537ad0c5d0a202623dcec5 --st_checkpoint_dir /home/ci/syne-tune/python-entrypoint-2023-08-18-19-45-39-958/18/checkpoints\n"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"INFO:syne_tune.tuner:(trial 18) - scheduled config {'learning_rate': 0.01846559300690354, 'batch_size': 183, 'max_epochs': 10}\n"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"INFO:syne_tune.stopping_criterion:reaching max wallclock time (720), stopping there.\n"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"INFO:syne_tune.tuner:Stopping trials that may still be running.\n"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"INFO:syne_tune.tuner:Tuning finished, results of trials can be found on /home/ci/syne-tune/python-entrypoint-2023-08-18-19-45-39-958\n"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"--------------------\n",
"Resource summary (last result is reported):\n",
" trial_id status iter learning_rate batch_size max_epochs epoch validation_error worker-time\n",
" 0 Completed 10 0.100000 128 10 10 0.277195 64.928907\n",
" 1 Completed 10 0.170284 114 10 10 0.286225 65.434195\n",
" 2 Completed 10 0.340198 221 10 10 0.218990 59.729758\n",
" 3 Completed 10 0.014628 88 10 10 0.899920 81.001636\n",
" 4 Completed 10 0.111483 142 10 10 0.268684 64.427400\n",
" 5 Completed 10 0.014076 223 10 10 0.899922 61.264475\n",
" 6 Completed 10 0.025582 62 10 10 0.399520 75.966186\n",
" 7 Completed 10 0.026036 139 10 10 0.899988 62.261541\n",
" 8 Completed 10 0.242025 231 10 10 0.257636 58.186485\n",
" 9 Completed 10 0.104831 145 10 10 0.273898 59.771699\n",
" 10 Completed 10 0.017899 51 10 10 0.405545 83.778503\n",
" 11 Completed 10 0.964542 200 10 10 0.181600 59.159662\n",
" 12 Completed 10 0.105599 40 10 10 0.182500 94.734384\n",
" 13 Completed 10 0.584605 35 10 10 0.153846 110.965637\n",
" 14 Completed 10 0.246889 146 10 10 0.215050 65.142847\n",
" 15 Completed 10 0.129569 218 10 10 0.313873 61.310455\n",
" 16 Completed 10 0.249007 103 10 10 0.196101 72.519127\n",
" 17 InProgress 9 0.039036 80 10 9 0.369000 73.403000\n",
" 18 InProgress 5 0.018466 183 10 5 0.900263 34.714568\n",
"2 trials running, 17 finished (17 until the end), 722.84s wallclock-time\n",
"\n",
"validation_error: best 0.14451533555984497 for trial-id 13\n",
"--------------------\n"
]
}
],
"source": [
"tuner.run()"
]
},
{
"cell_type": "markdown",
"id": "2dd14e00",
"metadata": {
"origin_pos": 19
},
"source": [
"The logs of all evaluated hyperparameter configurations are stored for further\n",
"analysis. At any time during the tuning job, we can easily get the results\n",
"obtained so far and plot the incumbent trajectory.\n"
]
},
{
"cell_type": "code",
"execution_count": 10,
"id": "c6323d54",
"metadata": {
"attributes": {
"classes": [],
"id": "",
"n": "46"
},
"execution": {
"iopub.execute_input": "2023-08-18T19:57:42.883230Z",
"iopub.status.busy": "2023-08-18T19:57:42.882930Z",
"iopub.status.idle": "2023-08-18T19:57:43.133156Z",
"shell.execute_reply": "2023-08-18T19:57:43.132313Z"
},
"origin_pos": 20,
"tab": [
"pytorch"
]
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"WARNING:matplotlib.legend:No artists with labels found to put in legend. Note that artists whose label start with an underscore are ignored when legend() is called with no argument.\n"
]
},
{
"data": {
"image/svg+xml": [
"\n",
"\n",
"\n"
],
"text/plain": [
""
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"d2l.set_figsize()\n",
"tuning_experiment = load_experiment(tuner.name)\n",
"tuning_experiment.plot()"
]
},
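{
"cell_type": "markdown",
"id": "b7c9e2a1",
"metadata": {},
"source": [
"Since `tuning_experiment.results` is a pandas DataFrame (its columns match the\n",
"tuning-status tables printed in the log above), we can also query it directly.\n",
"Here is a minimal sketch, assuming the `validation_error` and `trial_id` columns\n",
"shown above, that looks up the best observation so far:\n",
"\n",
"```python\n",
"results = tuning_experiment.results\n",
"best_row = results.loc[results[\"validation_error\"].idxmin()]\n",
"print(best_row[\"trial_id\"], float(best_row[\"validation_error\"]))\n",
"```\n"
]
},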
{
"cell_type": "markdown",
"id": "8cf3f5f6",
"metadata": {
"origin_pos": 21
},
"source": [
"## Visualize the Asynchronous Optimization Process\n",
"\n",
"Below we visualize how the learning curves of every trial (each color in the plot represents a trial) evolve during the\n",
"asynchronous optimization process. At any point in time, there are as many trials\n",
"running concurrently as we have workers. Once a trial finishes, we immediately\n",
"start the next trial, without waiting for the other trials to finish. Idle time\n",
"of workers is reduced to a minimum with asynchronous scheduling.\n"
]
},
{
"cell_type": "code",
"execution_count": 11,
"id": "8c1d8876",
"metadata": {
"attributes": {
"classes": [],
"id": "",
"n": "45"
},
"execution": {
"iopub.execute_input": "2023-08-18T19:57:43.138953Z",
"iopub.status.busy": "2023-08-18T19:57:43.138373Z",
"iopub.status.idle": "2023-08-18T19:57:43.445360Z",
"shell.execute_reply": "2023-08-18T19:57:43.444209Z"
},
"origin_pos": 22,
"tab": [
"pytorch"
]
},
"outputs": [
{
"data": {
"text/plain": [
"Text(0, 0.5, 'objective function')"
]
},
"execution_count": 11,
"metadata": {},
"output_type": "execute_result"
},
{
"data": {
"image/svg+xml": [
"\n",
"\n",
"\n"
],
"text/plain": [
""
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"d2l.set_figsize([6, 2.5])\n",
"results = tuning_experiment.results\n",
"\n",
"for trial_id in results.trial_id.unique():\n",
" df = results[results[\"trial_id\"] == trial_id]\n",
" d2l.plt.plot(\n",
" df[\"st_tuner_time\"],\n",
" df[\"validation_error\"],\n",
" marker=\"o\"\n",
" )\n",
"\n",
"d2l.plt.xlabel(\"wall-clock time\")\n",
"d2l.plt.ylabel(\"objective function\")"
]
},
{
"cell_type": "markdown",
"id": "e8e40d99",
"metadata": {
"origin_pos": 23
},
"source": [
"## Summary\n",
"\n",
"We can reduce the waiting time for random search substantially by distribution\n",
"trials across parallel resources. In general, we distinguish between synchronous\n",
"scheduling and asynchronous scheduling. Synchronous scheduling means that we\n",
"sample a new batch of hyperparameter configurations once the previous batch\n",
"finished. If we have a stragglers - trials that takes more time to finish than\n",
"other trials - our workers need to wait at synchronization points. Asynchronous\n",
"scheduling evaluates a new hyperparameter configurations as soon as resources\n",
"become available, and, hence, ensures that all workers are busy at any point in\n",
"time. While random search is easy to distribute asynchronously and does not\n",
"require any change of the actual algorithm, other methods require some additional\n",
"modifications.\n",
"\n",
"## Exercises\n",
"\n",
"1. Consider the `DropoutMLP` model implemented in :numref:`sec_dropout`, and used in Exercise 1 of :numref:`sec_api_hpo`.\n",
" 1. Implement an objective function `hpo_objective_dropoutmlp_synetune` to be used with Syne Tune. Make sure that your function reports the validation error after every epoch.\n",
" 2. Using the setup of Exercise 1 in :numref:`sec_api_hpo`, compare random search to Bayesian optimization. If you use SageMaker, feel free to use Syne Tune's benchmarking facilities in order to run experiments in parallel. Hint: Bayesian optimization is provided as `syne_tune.optimizer.baselines.BayesianOptimization`.\n",
" 3. For this exercise, you need to run on an instance with at least 4 CPU cores. For one of the methods used above (random search, Bayesian optimization), run experiments with `n_workers=1`, `n_workers=2`, `n_workers=4`, and compare results (incumbent trajectories). At least for random search, you should observe linear scaling with respect to the number of workers. Hint: For robust results, you may have to average over several repetitions each.\n",
"2. *Advanced*. The goal of this exercise is to implement a new scheduler in Syne Tune.\n",
" 1. Create a virtual environment containing both the [d2lbook](https://github.com/d2l-ai/d2l-en/blob/master/INFO.md#installation-for-developers) and [syne-tune](https://syne-tune.readthedocs.io/en/latest/getting_started.html) sources.\n",
" 2. Implement the `LocalSearcher` from Exercise 2 in :numref:`sec_api_hpo` as a new searcher in Syne Tune. Hint: Read [this tutorial](https://syne-tune.readthedocs.io/en/latest/tutorials/developer/README.html). Alternatively, you may follow this [example](https://syne-tune.readthedocs.io/en/latest/examples.html#launch-hpo-experiment-with-home-made-scheduler).\n",
" 3. Compare your new `LocalSearcher` with `RandomSearch` on the `DropoutMLP` benchmark.\n"
]
},
{
"cell_type": "markdown",
"id": "a15754f4",
"metadata": {
"origin_pos": 24,
"tab": [
"pytorch"
]
},
"source": [
"[Discussions](https://discuss.d2l.ai/t/12093)\n"
]
}
],
"metadata": {
"language_info": {
"name": "python"
},
"required_libs": [
"\"syne-tune[gpsearchers]==0.3.2\""
]
},
"nbformat": 4,
"nbformat_minor": 5
}