Auptimizer -- an Extensible, Open-Source Framework for Hyperparameter Tuning

  • 2019-11-06 18:00:31
  • Jiayi Liu, Samarth Tripathi, Unmesh Kurup, Mohak Shah
  • 3


Tuning machine learning models at scale, especially finding the righthyperparameter values, can be difficult and time-consuming. In addition to thecomputational effort required, this process also requires some ancillaryefforts including engineering tasks (e.g., job scheduling) as well as moremundane tasks (e.g., keeping track of the various parameters and associatedresults). We present Auptimizer, a general Hyperparameter Optimization (HPO)framework to help data scientists speed up model tuning and bookkeeping. WithAuptimizer, users can use all available computing resources in distributedsettings for model training. The user-friendly system design simplifiescreating, controlling, and tracking of a typical machine learning project. Thedesign also allows researchers to integrate new HPO algorithms. To demonstrateits flexibility, we show how Auptimizer integrates a few major HPO techniques(from random search to neural architecture search). The code is available at


Quick Read (beta)

Auptimizer - an Extensible, Open-Source Framework for Hyperparameter Tuning

Jiayi Liu Advanced AI
LG Electronics
Santa Clara, CA, USA
[email protected]
   Samarth Tripathi Advanced AI
LG Electronics
Santa Clara, CA, USA
[email protected]
   Unmesh Kurup Advanced AI
LG Electronics
Santa Clara, CA, USA
[email protected]
   Mohak Shah Advanced AI
LG Electronics
Santa Clara, CA, USA
[email protected]

Tuning machine learning models at scale, especially finding the right hyperparameter values, can be difficult and time-consuming. In addition to the computational effort required, this process also requires some ancillary efforts including engineering tasks (e.g., job scheduling) as well as more mundane tasks (e.g., keeping track of the various parameters and associated results). We present Auptimizer, a general Hyperparameter Optimization (HPO) framework to help data scientists speed up model tuning and bookkeeping. With Auptimizer, users can use all available computing resources in distributed settings for model training. The user-friendly system design simplifies creating, controlling, and tracking of a typical machine learning project. The design also allows researchers to integrate new HPO algorithms. To demonstrate its flexibility, we show how Auptimizer integrates a few major HPO techniques (from random search to neural architecture search). The code is available at

Machine Learning, Data Mining, Hyperparameter Optimization, Software
publicationid: pubid: 978-1-7281-0858-2/19/$31.00 ©2019 IEEE \[email protected]

numbersnone\[email protected]#1none:

Data Science
Machine Learning
Deep Learning
High Performance Computing
Deep Neural Network
Hyperparameter Optimization
Resource Manager
Application Programming Interface
Neural Architecture Search

I Introduction

Designing a Machine Learning (ML) framework for production faces challenges similar to those faced with Big Data. There is a large volume of models with a variety of configurations and training them efficiently at scale with reproducibility is critical to realizing their business value. In this paper, we address one design aspect of the ML framework, namely the HPO process, via a framework called Auptimizer.

I-A Hyperparameter Optimization

ML models are typically sensitive to the values of hyperparameters [31]. Different from model parameters, these hyperparameters are values that control the model configuration or the training setup and thus need to be set before training the model. Due to the lack of gradient information for these hyperparameters, tuning them is often treated as a black-box optimization [11]. As an alternative to manual selection (which is usually based on modeler’s expertise), researchers have proposed different methods to accelerate the tuning process including Bayesian approaches [26], evolutionary algorithms [10], multi-armed bandits [8], and architecture search by learning [33].

Tuning hyperparameters is often time-consuming especially when model training is computationally intensive [1]. Therefore, in practice, an automated HPO solution is critically important for machine learning. Both open-source solutions and commercial offerings are available. However, as a rapidly developing field, there are challenges when applying them under industry settings.

Specifically, no HPO approach is objectively the best for all problems. Most state-of-the-art open-source solutions are backed by certain heuristics driven by research, e.g. Spearmint [26], HyperOpt [2, 3], and HyperBand [20]. Sometimes users need to examine various options before settling on the most suitable approach. But, transferring code and results from one solution to another is difficult, given that no common Application Programming Interface (API) is shared among them. This problem has become more serious recently as the number of hyperparameters has increased dramatically with Deep Neural Network (DNN) algorithms, which are prone to human editing errors.

With the increased availability of large computing infrastructure, parallel and distributed training continue to become affordable and commonplace. HPO implementations typically do not allow users to fully benefit from such high-performance computing environments. Furthermore, the complexity of the hyperparameter space and the long training time make model tracking laborious and error-prone.

There have been efforts to automate this hyperparameter tuning process. Google Vizier [11] discussed the design and algorithms used for the Google Cloud Machine Learning HyperTune subsystem. Google AutoML, Amazon SageMaker, and SigOpt productize HPO algorithms as commercial offerings. But no customization is allowed on the algorithm or the infrastructure. Open-source projects like Optunity[5], Tune[21] integrate different algorithms and have built user-friendly APIs for users. But, in all cases, adopting new algorithms or accommodating new computing resources is still challenging. Also, hyperparameter tuning is often highly coupled with the respective framework and it is difficult to use for other scenarios like fine-tuning an already trained model.

I-B Beyond HPO

Beyond the scope of HPO, autoML tries to solve the ML problem with minimal human intervention. Frameworks such as ATM [30], auto-sklearn [9], auto-WEKA [16], or TuPAQ [27] based on the underlying ML packages (sklearn, WEKA, MLbase [22, 12, 17]) provide additional features such as model selection. However, at the cost of simplifying the model engineering, practitioners also lose the ability to fully customize their model or to use their existing model architectures. Therefore, we focus on the scalability and automation of the HPO process and leave the model selection and feature engineering decisions for future work.

Our contributions are two-fold. We explicitly review the challenges of using HPO in practice, and introduce an open-source automated optimizer framework, Auptimizer, to address these challenges by:

  • reducing the efforts to use and switch HPO algorithms;

  • providing scalability for cloud / on-premise resources;

  • simplifying the process to integrate new HPO algorithms and new resource schedulers;

  • tracking results for reproducibility.

The paper is organized as follows. We summarize the state-of-the-art HPO practices and outline the challenges in Section II. Next, we present the system design of Auptimizer in Section III and demonstrate its usability in Section IV. We further include the new development in the Neural Architecture Search (NAS) research in Section V. We conclude our work with a discussion about the future roadmap of Auptimizer.

II Background

II-A Hyperparameter Optimization Research

HPO has become more relevant alongside the proliferation of ML and data science applications. GridSearch and manual search were favored in early studies due to their simplicity and interpretability [18]. However, these approaches were quickly outpaced by others due to the curse of dimensionality [1].

Different hyperparameters are not independent. Instead, their values are intertwined. Advanced algorithms take advantage of this internal constraints to balance exploration and exploitation of the parameter spaces to offer a better solution. Modeled by a Gaussian process, Bayesian Optimization tries to maximize the expected model improvement and works well for low-dimensional, numerical problems (e.g., Spearmint [26]). Sequential Model-based Global Optimization with tree-based method [2] has been shown to have better performance in high-dimensional, structured model space [7]. Recently, optimizing DNN models using reinforcement learning has become a mainstream topic under the rubric of NAS [33].

Besides learning the structure of hyperparameters, optimizing the training budget using multi-armed bandit strategy also shows promising results for DNN models (e.g., HyperBand [20]). Further combining with Bayesian optimization, BOHB [8] improves the tuning process leveraging on the benefits of both approaches. Despite its simplicity, RandomSearch [1] is still efficient and is commonly used as a benchmark against other more advanced algorithms [6, 20]. Given the variety of ML problems, there is no conclusive preference of the best HPO to use.

Regardless of their variety, all HPO algorithms share the same workflow, which breaks down into four steps:

  1. 1.

    initialize search space and configuration;

  2. 2.

    propose values for hyperparameters;

  3. 3.

    train model and update result;

  4. 4.

    repeat step 2 and 3.

Most of above research focus on the Step 2, and the engineering-oriented projects discussed next are focused more on streamlining the Step 4 while supporting a limited number of algorithms. Our proposed framework, Auptimizer, helps automate the entire process.

II-B Hyperparameter Optimization Practice

Most of the above-mentioned algorithms and many others have released their source codes to the research community, e.g., Spearmint11 1, HyperOpt22 2, and HyperBand33 3 These solutions are originally designed for research. Therefore they are hard to extend or integrate with other algorithms. Moreover, neither common code structure nor common APIs exist for interoperability. Thus, it is challenging for users to adopt them without changing their existing code and to switch amongst these alternatives without significant changes to their code base.

The efforts to consolidate the interfaces at a system level are also available. Google’s Vizier44 4 Unofficial source at [11] is a design used within Google for their Cloud Machine Learning HyperTune subsystem. And Google AutoML, Amazon SageMaker, and SigOpt productize HPO algorithms as commercial offerings. But these approaches typically fail to provide the extensibility for users to integrate their specific HPO algorithms or the scalability to utilize a large pool of computing resources on-premise.

Open-sourcing helps the extensibility, however, the existing packages often fall short on challenges to practical use. Projects like Optunity55 5 or Chocolate66 6 integrated a few different HPO algorithms for users, and new ones can be easily integrated under their consistent APIs. However, the process is sequential, therefore they do not support training models parallelly at scale. On the other side, Dask-ML [24] provides an easy-to-switch backend for computing resources, but lacks supports for customizing and extending HPO algorithms.

Tune [21], centering on scalable hyperparameter search, is undergoing rapid development. It supports scalability on different architectures, and also supports two categories of HPO strategies: trial schedulers (e.g. HyperBand) and search algorithms (e.g. HyperOpt). However, it lacks usability in practice. First, the users’ training script needs modification to align with Tune’s API. This approach occasionally results in excessive re-engineering on source code and it hinders users from debugging their training code. Second, different search algorithms require different configurations, which makes it harder for users to switch among different HPO strategies. Third, Tune relies on the autoscaling function provided by the Ray project for computing resource allocation, which currently cannot support a team environment where more advanced job scheduler is already in place.

There is no universal HPO algorithm having the best performance over all problems. Thus, trying different ones is necessary to reveal the best results and business value. However, a high adoption cost commonly prevents user from trying different algorithms. We summarize the common factors that limit the current HPO toolboxes as flexibility, usability, scalability, and extensibility:

  • Flexibility. It is challenging to switch between HPO algorithms, as the interfaces are dramatically different.

  • Usability. It is time-consuming to integrate an existing ML project into an HPO package. Often, users need to rewrite their code for a specific HPO toolbox, and resulting script cannot be used anywhere else.

  • Scalability. The integration with large-scale computational resources is missing and it is typically hard to scale the toolbox to a multi-node environment.

  • Extensibility. It is challenging to introduce a new algorithm into the existing libraries as these libraries are tightly coupled with the implemented algorithms.

We summarize the comparison of representative HPO solutions based on the above criteria in Table I. Based on our experience in developing an in-house solution, we release an HPO framework, Auptimizer, to mitigate the above-mentioned challenges.

TABLE I: Comparison of HPO toolboxes.
Criteria HyperOpt SageMaker Optunity Dask-ML Tune Auptimizer
Open source Yes No Yes Yes Yes Yes
Flexibility (No. of HPO algorithms) 2 Bayesian 7 2 4, 8 9
Usability (Format of training code) Function Rewrite Function Rewrite Function Script
Scalability Manual Cloud No Yes Yes Yes
Extensibility (Manual to add new HPO algorithms) N.A. N.A. Yes Hard Yes Yes

II-C Definitions

In this paper, we use the following terminology to describe the system design and Auptimizer use cases.

For data science applications, data scientists (users) solve given data mining problems with specified ML models. A script (code) is written and some hyperparameters are commonly identified to be explored during the model training. Typically, the user carries out an experiment to examine a range of hyperparameter combinations and measures the performance (e.g., accuracy) of the model on a hold-out dataset, for example, the number of neighbors in a K-Nearest-Neighbor model, or the learning rate in a deep learning model. Each individual training process for a given hyperparameter set is called a job. After all jobs are finished, the user retrieves the best model from the training history for further analysis or application.

For ML researchers in the HPO field, the use case is different. Researchers focus on developing the algorithm to find the best hyperparameters. Thus, an easy framework to facilitate their algorithm implementation and to benchmark their results against the state-of-the-art algorithms is important.

III Design

Auptimizer is designed primarily as a tool for user. It removes the burden of drastically changing users’ existing code, which is a key hurdle in the HPO adoption process. It only requires the user to add a few lines in the code, and guides users to setup all other experiment-related configurations. Therefore, the user can easily switch among different HPO algorithms and computing resources without rewriting their training script.

Auptimizer is also designed to support researchers and developers to easily extend the framework to other HPO algorithms and computing resources. We highlight the abstraction of the Auptimizer design in Figure 1. Both resources and proposers communicate with Auptimizer via the designated interfaces. We implemented a few open-source HPO solutions to demonstrate the consistency of the API definitions. Meanwhile, the user’s training script is executed as a job, in which the scores are automatically updated for proposer without user’s intervention.

Fig. 1: System Design

The Auptimizer framework abstracts the HPO workflow of an experiment as shown in Section III. Once an experiment is defined and initialized, Auptimizer continuously checks for available resources (get_available()) and new hyperparameter proposals (get_param()) and then runs new jobs to search for the best model. Once a job is finished, Auptimizer automatically starts update(), a function that records the results asynchronously using a callback mechanism.


Auptimizer Internal Workflow \[email protected]@algorithmic \REQUIREexperiment.json; env.ini; code_path \STATEaup.Experiment(experiment.json, env.ini, code_path) \WHILEnot proposer.finished() \STATEresource resource_manager.get_available() \IFnot resource \STATEsleep \COMMENTwait for available resource \ENDIF\STATEhyperparameters proposer.get_param() \STATEJob, resource) \IFJob.callback() \STATEproposer.update() \ENDIF\ENDWHILE\STATEaup.finish() \COMMENTwait for unfinished jobs

In the following sections, we discuss the two key components - Resource Manager and Proposer- along with auxiliary components - Tracking and Visualization - in detail. Researchers and developers will find that these abstractions can help them to easily extend Auptimizer with new HPO algorithms and adapt it to their own computing environments.

III-A Proposer

Proposer controls how Auptimizer interacts with HPO algorithms for recommending new hyperparameter values. The Proposer interface reduces the effort to implement an HPO algorithm by defining two functions: get_param() to return the new hyperparameter values, and update() to update the history. In the open source release, we integrate a few well-known solutions, such as Spearmint [26], HyperOpt [2, 3], HyperBand [20], BOHB [8] along with simple random search and grid search. Moreover, we also demonstrate its usability to a state-of-the-art NAS approaches such as EAS [4] and AutoKeras [13].

Despite the inherently different nature of these algorithms, Auptimizer interacts with them only through the two interfaces described above and keeps other irrelevant components away from users and researchers. When implementing other open-source solutions, we found that at most one source file needs to be changed or added, and the remaining source code can be reused for the integration. As an example, to integrate BOHB, we wrote only 138 lines of code and reused the existing 4305 lines of codes77 7 with commit 841db4b., which demonstrates the power of Auptimizer’s extensibility.

III-A1 get_param()

function is a wrapper for the underlying HPO implementations. It queries new values of hyperparameters and package them into a BasicConfig object to be used for code execution.

The newly created BasicConfig contains all hyperparameter values in a dictionary for a job to run with. Additional information can be added for HPO algorithms to use without interfering with job execution. For instance, the value of the job ID is used in the HyperBand implementation to track previous results and to resume training when necessary. This BasicConfig is then passed to the resource manager for job execution (see discussion in Section III-B). Users only need to change their code to read the BasicConfig as an input file. To further reduce the burden on the user end, we provide load() and save() methods in BasicConfig to simplify the adoption of Auptimizer (see example in Code 3).

An example of the BasicConfig file generated by Auptimizer at runtime is illustrated in Code 1. It contains two variables (x,y) along with additional variables when necessary (e.g., job_id). This generated JSON file will be passed to the code automatically by Resource Manager during model training.

Code 1: Job Configuration File
  {"x": -5.0, "y": 5.0, "job_id": 0}

All configurations used for model training are saved, and user can easily reuse them together with their code without any modification. This enables users to verify or finetune their model after HPO.

III-A2 update()

function collects results back from jobs, updates the tuning history, and also registers the results for record tracking (see Section III-C).

For simple algorithms (e.g., RandomSearch), no history is needed. However, advanced algorithms (e.g., Bayesian Optimization) need to match the resulting scores with the specific input hyperparameters. Auptimizer takes care of this matching by automatically mapping the result back to its BasicConfig and thus, HPO algorithms can directly restore the hyperparameter values used in a specific job. Auxiliary values (e.g. job_id), are tracked and can be customized for other usage, such as to save and retrieve models for further finetuning88 8 Users need to write their own function to restore model based on the input ID..

Fig. 2: Database Schema

III-B Resource Manager

Resource Manager (RM) is another cornerstone in the Auptimizer framework. It connects computing resources to model training automatically thus allowing codes to run on resources based on their availability. It also sets a callback mechanism to trigger the update() function when a job is finished.

A key challenge of usability in HPO implementations is the communication between jobs and the heterogeneous computer resources that jobs run on. The existing open-sourced projects, e.g., HyperOpt, Tune, require to call the code directly to get return values. And commercial services such as SageMaker requires its customer’s code to be encapsulated in a docker image. SigOpt provides API calls for communication but it leaves it to the users to do resource allocation and code execution. All these solutions are challenging for users to use the HPO at scale. In comparison, the Auptimizer framework puts the user-friendliness as its priority and removes this burden.

General resource management and job scheduling tools, e.g., Slurm [32] or TORQUE [29] are not designed for HPO applications. Using those tools, jobs are submitted in advance and wait for the available resources to be executed on. In the HPO setting, the configurations of hyperparameters are typically determined based on the history of model scores and it results in a difficulty to start jobs spontaneously. Without Auptimizer, users need to either allocate all resources at once or to write their corresponding interface to start new jobs on the fly. However, in our workflow, we rely on the flexibility of cloud services (i.e. AWS) to scale out. For Slurm and other tools, we are open to community support.

The RM interface makes it simple to extend Auptimizer to scalable computing environments. Developers only need to interact with get_available() and run(), which queries available resources and allocate correspondingly for job execution. As users, they only need to specify the resources to be used in the experiment configurations. Also integrating with advanced scheduling tools, Auptimizer can further help users to schedule jobs efficiently in a multi-tenant environment with better resource allocation.

III-B1 get_available()

function serves as the interface between Auptimizer and typical resource management and job scheduling tools. In the current implementation, it queries a persistent database for available resources that the user specified. If the requested resource is available, then it will be taken by Auptimizer for job execution. Otherwise, the system will wait until resources are free (see Section III).

The interface get_available() is also compatible with existing resource management tools. For instance, we used boto 399 9 to spawn new EC2 instances on the AWS.

III-B2 run()

The Auptimizer RM component relies on the callback design to solve the scheduling wrapped in the run() function. Specifically, run() interface executes the user-provided code in a Job object. The Job object first sets up the running environment based on the available resources. For instance, it assigns CUDA_VISIBLE_DEVICES for GPU allocation. Then it executes the user-provided code with the newly proposed hyperparameter values (see Section III-A1). Once a job is finished, it triggers a callback() function to update() the result in Auptimizer. It also allows additional information to be passed to Proposer as an arbitrary string when returned from users’ code.

In the current version of Auptimizer, we demonstrate its usability across different computing resources, such as CPUs, GPUs, multiple nodes, and AWS EC2 instances. We require that the users’ code executes successfully on the targeted resources to avoid potential environment issues. In this initial release, we use a SQLite database to keep track of available resources and all jobs are running locally. Both the hyperparameter configuration and the results are communicated by the standard IO protocol.

III-C Experiment Tracking and Visualization

Experiment tracking provides a foundation of reproducibility in a data science project. In Auptimizer, all the experiment history is tracked in the user-specified database. The data schema is illustrated in Figure 2.

Experiment table plays the central role to track the overall progress. It contains experiment ID, user ID, and start and end time of an experiment. Beside them, the exp_config specifies the scope of the experiment (see Section IV-B for detailed discussion) The tables of User and Resource are for user control and resource management. The Job table tracks the runtime status and result of each job. Since Auptimizer automatically checks in its training process in experiments, users are alleviated from the worry of losing reproducibility.

The Auptimizer framework also provides a basic tool to visualize the results from history (see Section IV-D). In addition, users are able to directly access the results stored in the database for further analysis.

IV Using Auptimizer

In this section, we demonstrate the key features of using Auptimizer in practice, by the simple and commonly used DNN model for the MNIST dataset1010 10 with commit 971c96b.[19]. The DNN model contains two convolution layers and two fully connected layers. Adam optimizer is used for training [15] with a global dropout ratio for regularization [28]. Also, for demonstration purpose, we only search for the best accuracy on the test dataset without distinguishing it from the validation dataset.

IV-A Auptimizer Workflow

In this section, we illustrate the basic workflow to adopt Auptimizer.

First, we need to set up the Auptimizer by filling in the basic information for the computing environment. Auptimizer has a user-friendly interactive guide that can be invoked by python -m aup.setup. It will setup the Auptimizer for the first time with information about the computing environment and the database.

Next, we need to identify the key hyperparameters. In this experiment, we will explore five hyperparameters, numbers of filters in the first two convolution layers (conv1, conv2), the dropout ratio, the number of neurons of the first fully-connected layer, and the learning rate. Also, we use n_iterations to adjust number of epochs in training, which is useful for HyperBand and BOHB.

After that, we need to write down the experiment configuration correspondingly, which we explain in Section IV-B. And the training script is modified accordingly in Section IV-C.

IV-B Experiment Configuration

Auptimizer also provides a command-line tool to initiate the file as python -m aup.init. Experiment Configuration controls the search space and the choice of HPO algorithm of an experiment. In Code 2, we illustrate the configuration for random search for the Rosenbrock function [25].

Code 2 shows that the configuration is simple and straightforward. The n_samples specifies how many jobs a user wants to run for the HPO process and n_parallel jobs can be executed at the same time on the CPU resource. The hyperparameter space is defined in parameter_config, each hyperparameter is a float ranging from -5 to 10. Those hyperparameters will be assigned by Auptimizer into a BasicConfig (e.g., Code 1) and will be accessed from the user’s code directly (see Code 3).

Occasionally additional information can still be required for different HPO algorithms, e.g., using "engine":"tpe" to instruct HyperOpt to use TPE as the backend engine for HPO. But overall, the change in experiment configuration is significantly reduced in contrast to using different open-source implementations of HPO algorithms. And most importantly, there is no need to change the user’s code for different algorithms.

Auptimizer can also guide through the process of selecting HPO algorithm and defining hyperparameter specifications interactively. One generated example configuration file for random search is shown in Code 2. Users enjoy flexibility and scalability by simply changing the proposer name or the parallel number.

Code 2: Experiment Configuration File
    "proposer": "random",
    "script": "",
    "resource": "gpu",
    "n_parallel": 2,
    "target": "min",
      {"name": "conv1", "range": [20, 50],  "type": "int"},
      {"name": "dropout", "range": [0.5, 0.9], "type": "float"},
    "n_samples": 100

IV-C Code Update

To run jobs automatically in Auptimizer, we need to modify the source code correspondingly. Comparing to other HPO tools, the modifications are significantly reduced, and the resulting code can still be run independently without Auptimizer. Generally, there are four items:

  • change the code to self-executable,

  • parse input hyperparameters,

  • use them for model training,

  • report back the result.

We highlight the changes in Code 3 and the steps are explained here:

  1. 1.

    Line 1: add the shebang line to make the code self-executable.

  2. 2.

    Line 2-3: import sys, aup to parse hyperparameters and return values to Auptimizer.

  3. 3.

    Line 4-5: modify original training function by replacing variables with hyperparameters in config.

  4. 4.

    Line 6-7: main function. It parses BasicConfig by the input file.

  5. 5.

    Line 8-9: original training script. It trains for config[’n_iterations’] epochs and computes the test accuracy.

  6. 6.

    Line 10: return score to Auptimizer.

Code 3: Example Code
1   #!/usr/bin/env python
2   import sys
3   from aup import BasicConfig, print_result
4   config = dict(conv1=32, conv2=64, fc1=1024, learning_rate=0.001, dropout=0.1, data_dir="input_data", n_iterations=10)
5   # Training code with variables replaced as hyperparameters
6   if __name__ == "__main__":
7     config = BasicConfig(**config).load(sys.argv[1]) if len(sys.argv) > 2
8     accuracy = ... # training code returns test accuracy
9     print_result(accuracy)

As demonstrated above, minimal changes are required to fully adopt the Auptimizer framework into practice. More importantly, the usability of the code remains, and users can reuse the exact same script for other purposes (training from scratch, finetuning) by providing the hyperparameters as input. Moreover, Auptimizer does not restrict users on any specific language or framework. For instance, a MatLab user can also use Auptimizer to tune their hyperparameters once they parse and return result in their code correspondingly.

IV-D Experiments

After defining the experiment and modifying code, users are ready to run the experiment by simply entering python -m aup experiment.json. More importantly, users can easily switch among HPO algorithms by updating the configuration, or retrieve and store results in the database.

We allocate roughly the same number of total training epochs for each HPO algorithm. For random, Spearmint, HyperOpt, each hyperparameter configuration is trained for 10 epochs with 100 different configurations. Whereas for grid search, we assign the grid with 3 values for all hyperparameters, except the learning rate which is chosen from 0.001,0.01, resulting in 162 configurations. For HyperBand and BOHB, we allocate a total budget of 1000 epochs approximately along with 100 configurations to be explored. We also enforce the minimum number of epochs to be 1 with no upper limit.

Fig. 3: Auptimizer scalability on AWS

In Figure 3, we examine the scalability of Auptimizer by comparing the overall experiment time with the total time used by all jobs divided by the number of computing resources. The experiment searched for 128 configurations with up to 64 AWS EC2 instances. Because training time varies due to the changing model complexity (i.e. number of filters and neurons), we fixed the random seed, such that all experiments explored the same configurations. On average, each job ran 5 minutes on a t2.medium instance with 4 vCPUs. Clearly, the training time dominates the runtime, whereas the communication and the HPO algorithm (random) take marginal time in total. The break from linearity is caused by two issues. First, the total time of an experiment is driven by the last job. Because different jobs have different training times, the gap between experiment time and the total time used by jobs becomes larger when using more parallel machines. Second, the performance fluctuation of the EC2 machines is the main reason for the nonlinearity in the scaling relation and it is not controllable by Auptimizer. More importantly, typical model training time is much longer than 5 minutes, which makes the additional cost by Auptimizer negligible.

We illustrate all hyperparameter combinations from different HPO algorithms in Figure 4. It shows that different HPO algorithms have searched for different paths in the hyperparameter space. Choosing the optimal HPO algorithm is a challenge and exploring them easily is important in practice. Among different approaches, we only need to change the name of algorithms, which significantly reduce the engineering work at the code level. Users can also easily scale the experiment to run in parallel by specifying the n_parallel value. Researchers can use it as benchmark suite when they have a new algorithm to test against.

In Figure 5, we show the performance of different HPO algorithm with n_parallel=8. We want to emphasize that the purpose is to demonstrate the usability of Auptimizer rather than to benchmark different HPO algorithms. By changing the algorithm names, we can easily run different strategies to tune a given model. Albeit the error rate reported here is not representative as no validation set is used. We can still confirm a few characteristics of the HPO algorithms. For example, Spearmint generally find good models at the cost that most models are complex models and result in longer training times. And as expected, BOHB and HyperBand are more resource efficient in finding good models. Grid search explored the complicated model at the early stage, which lead to an overall good performance, but in practice, when the reasonable range is not available or the dimensionality is high, it often does not work well.

Fig. 4: Hyperparameter Distribution from Different HPO Algorithms
Fig. 5: Performance of Different HPO Algorithms

V Neural Architecture Search

Though neural networks have become ubiquitous for various AI tasks there is still a lot of expert knowledge needed for designing architectures. As a result, recently gradient-based architecture search has become very popular. A seminal paper describing the research is [33], which uses a recurrent-network-based “controller” to generate strings of “child nets”. These child nets are also neural networks, whose architectures are specified by a string variable and are each trained to convergence. The controller then uses the accuracy of the child nets as a reward signal to compute the policy gradient. Progressively the controller will give higher probabilities to architectures with higher accuracy and improves its search over time, learning architectures which would progressively improve accuracy. The same paradigm can be easily adapted and extended using Auptimizer where the controller can be abstracted into Proposer, allowing users to both improve and develop NAS algorithms in a scalable and automated environment.

Since [33], further improvements have been suggested. However, the essential technique and approach remains the same. In [23], the authors show how to improve the efficiency of NAS by forcing all child models to share weights, which allows child networks to train efficiently to convergence without starting from scratch every time.

The technique works by incorporating transfer learning between child models which substantially reduces the running time. Following this approach, [4] also offsets designing and training each child network from scratch during the exploration of the highly inefficient architecture search space, by exploring the architecture space based on the current network and reusing its weights with a bidirectional tree-structured reinforcement learning meta-controller. This allows for highly expressive tree-structured architecture space which can be traversed in a multi-branch crawl yielding child architectures in an ordered fashion. Since architecture search involves efficient distribution of hardware resources and managing close synchronization between the controller and child networks processes, Auptimizer is effectively used to automate the process. In the remaining section we describe how we extend and incorporate this algorithm with Auptimizer using the publicly available code 1111 11 From with commit 070d2d7..

The structure of our implementation includes two main parts, and The file is a minor modification of the original file that trains child neural architectures as jobs. Illustrated in Code 4, the original code takes a folder name as input, which contains the architecture of the current branch, saved weights, new architecture and a static configuration including new epochs to run and learning rate. It runs the child net architecture and returns the net validation accuracy and running time. The modified version (shown in Code 5) modifies merely five lines to make it compatible with the Auptimizer framework. Then Auptimizer handles the execution of these client processes automatically and compiles their results asynchronously to aid the Proposer.

Code 4: Origin
1   from expdir_monitor.expdir_monitor import ExpdirMonitor
2   def run(expdir):
3     expdir_monitor = ExpdirMonitor(expdir)
4     valid_performance =, restore=False)
6   def main():
7     expdir = input().strip(’\n’)
8     run(expdir)
10   if __name__ == "__main__":
11     main()
Code 5: Updated
1   #!/usr/bin/env python
2   from expdir_monitor.expdir_monitor import ExpdirMonitor
3   from aup import BasicConfig, print_result
4   def run(expdir):
5     expdir_monitor = ExpdirMonitor(expdir)
6     valid_performance =, restore=False)
7     print_result(valid_performance)
9   def main():
10     config = BasicConfig().load(sys.argv[1])
11     run(config["expdir"])
13   if __name__ == "__main__":
14     main()

The Proposer wraps the main controller process (, which contains the RNN-controller based reinforcement critic. Its controlling variables include number of batches to run per episode, number of episodes to run, maximum epochs per child episode, range of filter dimensions, range of strides, potential kernel sizes, among many others. The Proposer is initialized with a basic initial configuration to generate a set of potential child processes to run. The Auptimizer executes jobs using the modified with these configurations of client networks, and reports back to the original controller once all the generated child nets for the episode have finished running. The Proposer then computes gradients from the string of child architectures and the reported accuracies, and generates new child nets for the next episode using its actors for wider and deeper configuration generation to build upon the current branch, allowing Auptimizer to take over and execute batches for the episode iteratively. Once finished, users can easily check the Auptimizer logs and database for the client runs, their architectures and accuracies, and gain more insights into the experiment. To conclude, the flexibility of Auptimizer design allows us to easily and quickly integrate an open-source NAS code into the framework.

AutoKeras [14] is an open-source library for automated machine learning, and has recently become increasingly popular for NAS applications and research. The library includes a framework and different functions to search architecture space and hyperparameters for deep learning models. As the early NAS techniques gained popularity, their major shortcoming of exorbitant computational cost remained unaddressed. In contrast to those techniques, AutoKeras performs Network Morphism based architecture generation guided by Bayesian Optimization. Network Morphism keeps the functionality of the neural network while changing its neural architecture, using an edit-distance neural network kernel which measures how many operations are needed to change one neural network to another. This allows AutoKeras to minimize the prohibitive computation costs while also allowing for control over the architecture search space. The framework also provides support for other standard search algorithms like Random, Grid, and Greedy along with Bayesian Optimization for network morphism. AutoKeras can also be used by NAS researchers who seek to implement their own NAS algorithms by reimplementing the ‘generate’ and ‘update’ functions to generate the next neural architecture and update the controller with evaluation result of a neural architecture respectively.

We provide a high level integration for AutoKeras with Auptimizer, which allows AutoKeras code to be executed on available resources. Auptimizer takes the ‘time limit’ and ‘search’ as arguments for how long to perform NAS and which search algorithm to run; and then performs the final Hyperparameter tuning. Our integration allows users to abstract away not only the NAS search process with AutoKeras but also utilize resource adaptability and result tracking with Auptimizer. Our AutoKeras integration is designed for both easy scaling on resources for NAS applications and hassle-free comparisons for switching between different search techniques or developing new search algorithms. To this end, we treat each complete AutoKeras search and final tuning as a unique job, unlike our EAS implementation of a granular approach where each candidate child model would be a job.

VI Discussion and Conclusion

Auptimizer design goals are focused on a user-friendly interface. Auptimizer benefits both practitioners and researchers and its design simplifies the integration and development of HPO algorithms. Specifically, the framework design helps both users to easily use Auptimizer in their workflows and researchers to quickly implement novel HPO algorithms. To reach these goals, the Auptimizer design has fulfilled the following requirements:

  • Flexibility. All implemented HPO algorithms share the same interface. This enables users to switch between different algorithms without changes in the code. A pool of HPO algorithms is integrated into the Auptimizer for users to explore and for researchers to benchmark against.

  • Usability. Changes to existing user’s code are limited to a minimal level. It reduces the friction for users to switch to the Auptimizer framework.

  • Scalability. Auptimizer can deploy to a pool of computing resources to automatically scale out the experiment, and users only need to specify the resource.

  • Extensibility. New HPO algorithms can be easily integrated into the Auptimizer framework if they followed the specified interface (see Section III-A).

Auptimizer addresses a critical missing piece in the application aspect of HPO research. It provides a universal platform to develop new algorithms efficiently. More importantly, Auptimizer lowers the barriers for data scientists in adopting HPO into their practice. Its scalability helps users to train their models efficiently with all computing resources available. Switching between different HPO algorithms is simple and only needs changing the proposer name (dedicated controlling parameters will be default and specified). This allows practitioners to quickly explore their ideas with advanced algorithm less laboriously.

The Auptimizer framework requires only minimal changes to existing scripts and these scripts, once modified, can be reused for other occasions directly. This non-intrusiveness frees users from repeated refactoring of their code. Users are also free to use any languages in addition to Python (although a little extra work is needed to setup the interfaces with Auptimizer). Altogether, Auptimizer gives practitioners and researchers great flexibility in building models using different frameworks (e.g. TensorFlow or PyTorch) and multiple languages (e.g. MatLab or R). We plan to introduce other functionalities (such as model compression) in Auptimizer in future releases.

To conclude, we have presented the design of Auptimizer that addresses the challenges in current HPO solutions. We have shown that it is user-friendly for both model tuning and new HPO algorithms development. Auptimizer supports a few major HPO approaches out of the box1212 12 BOHB and AutoKeras are not included in the version 1 release. and is ready to help users to automate and accelerate their model training process. We encourage community contributions to further improve the framework with state-of-the-art algorithms and infrastructure support to solve the challenges in the big data era.


  • [1] J. Bergstra and Y. Bengio (2012) Random Search for Hyper-Parameter Optimization. Journal of Machine Learning Research 13 (Feb), pp. 281–305. Cited by: §I-A, §II-A, §II-A.
  • [2] J. S. Bergstra, R. Bardenet, Y. Bengio, and B. Kégl (2011) Algorithms for hyper-parameter optimization. In Advances in neural information processing systems, pp. 2546–2554. Cited by: §I-A, §II-A, §III-A.
  • [3] J. Bergstra, D. Yamins, and D. Cox (2013-17–19 Jun) Making a science of model search: hyperparameter optimization in hundreds of dimensions for vision architectures. In Proceedings of the 30th International Conference on Machine Learning, S. Dasgupta and D. McAllester (Eds.), Proceedings of Machine Learning Research, Vol. 28, Atlanta, Georgia, USA, pp. 115–123. Cited by: §I-A, §III-A.
  • [4] H. Cai, T. Chen, W. Zhang, Y. Yu, and J. Wang (2018) Efficient architecture search by network transformation. In The Thirty-Second AAAI Conference on Artificial Intelligence (AAAI-18), pp. 2787–2794. Cited by: §III-A, §V.
  • [5] M. Claesen, J. Simm, D. Popovic, and B. De Moor (2014) Hyperparameter tuning in python using optunity. (eng). Cited by: §I-A.
  • [6] J. K. Dutta, J. Liu, U. Kurup, and M. Shah (2018-01) Effective Building Block Design for Deep Convolutional Neural Networks using Search. External Links: 1801.08577, Link Cited by: §II-A.
  • [7] K. Eggensperger, M. Feurer, F. Hutter, J. Bergstra, J. Snoek, H. Hoos, and K. Leyton-Brown (2013) Towards an empirical foundation for assessing bayesian optimization of hyperparameters. In NIPS workshop on Bayesian Optimization in Theory and Practice, Vol. 10, pp. 3. Cited by: §II-A.
  • [8] S. Falkner, A. Klein, and F. Hutter (2018-10–15 Jul) BOHB: robust and efficient hyperparameter optimization at scale. In Proceedings of the 35th International Conference on Machine Learning, J. Dy and A. Krause (Eds.), Proceedings of Machine Learning Research, Vol. 80, Stockholmsm ssan, Stockholm Sweden, pp. 1437–1446. Cited by: §I-A, §II-A, §III-A.
  • [9] M. Feurer, A. Klein, K. Eggensperger, J. Springenberg, M. Blum, and F. Hutter (2015) Efficient and robust automated machine learning. In Advances in Neural Information Processing Systems 28, C. Cortes, N. D. Lawrence, D. D. Lee, M. Sugiyama, and R. Garnett (Eds.), pp. 2962–2970. External Links: Link Cited by: §I-B.
  • [10] F. Friedrichs and C. Igel (2005) Evolutionary tuning of multiple svm parameters. Neurocomputing 64, pp. 107–117. Cited by: §I-A.
  • [11] D. Golovin, B. Solnik, S. Moitra, G. Kochanski, J. Karro, and D. Sculley (2017) Google vizier: a service for black-box optimization. In Proceedings of the 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD ’17, New York, NY, USA, pp. 1487–1495. External Links: ISBN 978-1-4503-4887-4 Cited by: §I-A, §I-A, §II-B.
  • [12] M. Hall, E. Frank, G. Holmes, B. Pfahringer, P. Reutemann, and I. H. Witten (2009) The weka data mining software: an update. ACM SIGKDD explorations newsletter 11 (1), pp. 10–18. Cited by: §I-B.
  • [13] H. Jin, Q. Song, and X. Hu (2018-06-27)(Website) External Links: cs.LG/1806.10282 Cited by: §III-A.
  • [14] H. Jin, Q. Song, and X. Hu (2019) Auto-keras: an efficient neural architecture search system. In Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, pp. 1946–1956. Cited by: §V.
  • [15] D. P. Kingma and J. Ba (2014) Adam: a method for stochastic optimization. arXiv preprint arXiv:1412.6980. Cited by: §IV.
  • [16] L. Kotthoff, C. Thornton, H. H. Hoos, F. Hutter, and K. Leyton-Brown (2016) Auto-WEKA 2.0: Automatic model selection and hyperparameter optimization in WEKA. Journal of Machine Learning Research 17, pp. 1–5. Cited by: §I-B.
  • [17] T. Kraska, A. Talwalkar, J. C. Duchi, R. Griffith, M. J. Franklin, and M. I. Jordan (2013) MLbase: a distributed machine-learning system.. In CIDR, Cited by: §I-B.
  • [18] H. Larochelle, D. Erhan, A. Courville, J. Bergstra, and Y. Bengio (2007) An empirical evaluation of deep architectures on problems with many factors of variation. In Proceedings of the 24th International Conference on Machine Learning, ICML ’07, New York, NY, USA, pp. 473–480. External Links: ISBN 978-1-59593-793-3 Cited by: §II-A.
  • [19] Y. LeCun, L. Bottou, Y. Bengio, P. Haffner, et al. (1998) Gradient-based learning applied to document recognition. Proceedings of the IEEE 86 (11), pp. 2278–2324. Cited by: §IV.
  • [20] L. Li, K. Jamieson, G. DeSalvo, A. Rostamizadeh, and A. Talwalkar (2018) Hyperband: a novel bandit-based approach to hyperparameter optimization. Journal of Machine Learning Research 18 (185), pp. 1–52. Cited by: §I-A, §II-A, §III-A.
  • [21] R. Liaw, E. Liang, R. Nishihara, P. Moritz, J. E. Gonzalez, and I. Stoica (2018) Tune: a research platform for distributed model selection and training. arXiv preprint arXiv:1807.05118. Cited by: §I-A, §II-B.
  • [22] F. Pedregosa, G. Varoquaux, A. Gramfort, V. Michel, B. Thirion, O. Grisel, M. Blondel, P. Prettenhofer, R. Weiss, V. Dubourg, et al. (2011) Scikit-learn: machine learning in python. Journal of machine learning research 12 (Oct), pp. 2825–2830. Cited by: §I-B.
  • [23] H. Pham, M. Guan, B. Zoph, Q. Le, and J. Dean (2018-10–15 Jul) Efficient neural architecture search via parameters sharing. In Proceedings of the 35th International Conference on Machine Learning, J. Dy and A. Krause (Eds.), Proceedings of Machine Learning Research, Vol. 80, Stockholmsm ssan, Stockholm Sweden, pp. 4095–4104. Cited by: §V.
  • [24] M. Rocklin (2015) Dask: parallel computation with blocked algorithms and task scheduling. In Proceedings of the 14th Python in Science Conference, K. Huff and J. Bergstra (Eds.), pp. 130 – 136. Cited by: §II-B.
  • [25] H. H. Rosenbrock (1960) An automatic method for finding the greatest or least value of a function. The Computer Journal 3 (3), pp. 175–184. Cited by: §IV-B.
  • [26] B. J. Snoek, H. Larochelle, and R. P. Adams (2012) Practical Bayesian optimization of machine learning algorithms. In Advances in neural information processing systems, pp. 2951–2959. Cited by: §I-A, §I-A, §II-A, §III-A.
  • [27] E. R. Sparks, A. Talwalkar, D. Haas, M. J. Franklin, M. I. Jordan, and T. Kraska (2015) Automating model search for large scale machine learning. In Proceedings of the Sixth ACM Symposium on Cloud Computing, pp. 368–380. Cited by: §I-B.
  • [28] N. Srivastava, G. Hinton, A. Krizhevsky, I. Sutskever, and R. Salakhutdinov (2014) Dropout: a simple way to prevent neural networks from overfitting. Journal of Machine Learning Research 15, pp. 1929–1958. Cited by: §IV.
  • [29] G. Staples and Garrick (2006) TORQUE—TORQUE resource manager. In Proceedings of the 2006 ACM/IEEE conference on Supercomputing - SC ’06, New York, New York, USA, pp. 8. Cited by: §III-B.
  • [30] T. Swearingen, W. Drevo, B. Cyphers, A. Cuesta-Infante, A. Ross, and K. Veeramachaneni (2017) ATM: A distributed, collaborative, scalable system for automated machine learning. In 2017 IEEE International Conference on Big Data, BigData 2017, Boston, MA, USA, December 11-14, 2017, pp. 151–162. Cited by: §I-B.
  • [31] J. N. van Rijn and F. Hutter (2018) Hyperparameter Importance Across Datasets. In Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining - KDD ’18, New York, New York, USA, pp. 2367–2376. Cited by: §I-A.
  • [32] A. B. Yoo, M. A. Jette, and M. Grondona (2003) Slurm: simple linux utility for resource management. In Workshop on Job Scheduling Strategies for Parallel Processing, pp. 44–60. Cited by: §III-B.
  • [33] B. Zoph and Q. V. Le (2017) Neural architecture search with reinforcement learning. In ICLR, External Links: Link Cited by: §I-A, §II-A, §V, §V.