Tuning machine learning models at scale, especially finding the righthyperparameter values, can be difficult and time-consuming. In addition to thecomputational effort required, this process also requires some ancillaryefforts including engineering tasks (e.g., job scheduling) as well as moremundane tasks (e.g., keeping track of the various parameters and associatedresults). We present Auptimizer, a general Hyperparameter Optimization (HPO)framework to help data scientists speed up model tuning and bookkeeping. WithAuptimizer, users can use all available computing resources in distributedsettings for model training. The user-friendly system design simplifiescreating, controlling, and tracking of a typical machine learning project. Thedesign also allows researchers to integrate new HPO algorithms. To demonstrateits flexibility, we show how Auptimizer integrates a few major HPO techniques(from random search to neural architecture search). The code is available athttps://github.com/LGE-ARC-AdvancedAI/auptimizer.
Quick Read (beta)
Auptimizer - an Extensible, Open-Source Framework for Hyperparameter Tuning
Tuning machine learning models at scale, especially finding the right hyperparameter values, can be difficult and time-consuming. In addition to the computational effort required, this process also requires some ancillary efforts including engineering tasks (e.g., job scheduling) as well as more mundane tasks (e.g., keeping track of the various parameters and associated results). We present Auptimizer, a general Hyperparameter Optimization (HPO) framework to help data scientists speed up model tuning and bookkeeping. With Auptimizer, users can use all available computing resources in distributed settings for model training. The user-friendly system design simplifies creating, controlling, and tracking of a typical machine learning project. The design also allows researchers to integrate new HPO algorithms. To demonstrate its flexibility, we show how Auptimizer integrates a few major HPO techniques (from random search to neural architecture search). The code is available at https://github.com/LGE-ARC-AdvancedAI/auptimizer.
- Data Science
- Machine Learning
- Deep Learning
- High Performance Computing
- Deep Neural Network
- Hyperparameter Optimization
- Resource Manager
- Application Programming Interface
- Neural Architecture Search
Designing a Machine Learning (ML) framework for production faces challenges similar to those faced with Big Data. There is a large volume of models with a variety of configurations and training them efficiently at scale with reproducibility is critical to realizing their business value. In this paper, we address one design aspect of the ML framework, namely the HPO process, via a framework called Auptimizer.
ML models are typically sensitive to the values of hyperparameters . Different from model parameters, these hyperparameters are values that control the model configuration or the training setup and thus need to be set before training the model. Due to the lack of gradient information for these hyperparameters, tuning them is often treated as a black-box optimization . As an alternative to manual selection (which is usually based on modeler’s expertise), researchers have proposed different methods to accelerate the tuning process including Bayesian approaches , evolutionary algorithms , multi-armed bandits , and architecture search by learning .
Tuning hyperparameters is often time-consuming especially when model training is computationally intensive . Therefore, in practice, an automated HPO solution is critically important for machine learning. Both open-source solutions and commercial offerings are available. However, as a rapidly developing field, there are challenges when applying them under industry settings.
Specifically, no HPO approach is objectively the best for all problems. Most state-of-the-art open-source solutions are backed by certain heuristics driven by research, e.g. Spearmint , HyperOpt [2, 3], and HyperBand . Sometimes users need to examine various options before settling on the most suitable approach. But, transferring code and results from one solution to another is difficult, given that no common Application Programming Interface (API) is shared among them. This problem has become more serious recently as the number of hyperparameters has increased dramatically with Deep Neural Network (DNN) algorithms, which are prone to human editing errors.
With the increased availability of large computing infrastructure, parallel and distributed training continue to become affordable and commonplace. HPO implementations typically do not allow users to fully benefit from such high-performance computing environments. Furthermore, the complexity of the hyperparameter space and the long training time make model tracking laborious and error-prone.
There have been efforts to automate this hyperparameter tuning process. Google Vizier  discussed the design and algorithms used for the Google Cloud Machine Learning HyperTune subsystem. Google AutoML, Amazon SageMaker, and SigOpt productize HPO algorithms as commercial offerings. But no customization is allowed on the algorithm or the infrastructure. Open-source projects like Optunity, Tune integrate different algorithms and have built user-friendly APIs for users. But, in all cases, adopting new algorithms or accommodating new computing resources is still challenging. Also, hyperparameter tuning is often highly coupled with the respective framework and it is difficult to use for other scenarios like fine-tuning an already trained model.
I-B Beyond HPO
Beyond the scope of HPO, autoML tries to solve the ML problem with minimal human intervention. Frameworks such as ATM , auto-sklearn , auto-WEKA , or TuPAQ  based on the underlying ML packages (sklearn, WEKA, MLbase [22, 12, 17]) provide additional features such as model selection. However, at the cost of simplifying the model engineering, practitioners also lose the ability to fully customize their model or to use their existing model architectures. Therefore, we focus on the scalability and automation of the HPO process and leave the model selection and feature engineering decisions for future work.
Our contributions are two-fold. We explicitly review the challenges of using HPO in practice, and introduce an open-source automated optimizer framework, Auptimizer, to address these challenges by:
The paper is organized as follows. We summarize the state-of-the-art HPO practices and outline the challenges in Section II. Next, we present the system design of Auptimizer in Section III and demonstrate its usability in Section IV. We further include the new development in the Neural Architecture Search (NAS) research in Section V. We conclude our work with a discussion about the future roadmap of Auptimizer.
II-A Hyperparameter Optimization Research
HPO has become more relevant alongside the proliferation of ML and data science applications. GridSearch and manual search were favored in early studies due to their simplicity and interpretability . However, these approaches were quickly outpaced by others due to the curse of dimensionality .
Different hyperparameters are not independent. Instead, their values are intertwined. Advanced algorithms take advantage of this internal constraints to balance exploration and exploitation of the parameter spaces to offer a better solution. Modeled by a Gaussian process, Bayesian Optimization tries to maximize the expected model improvement and works well for low-dimensional, numerical problems (e.g., Spearmint ). Sequential Model-based Global Optimization with tree-based method  has been shown to have better performance in high-dimensional, structured model space . Recently, optimizing DNN models using reinforcement learning has become a mainstream topic under the rubric of NAS .
Besides learning the structure of hyperparameters, optimizing the training budget using multi-armed bandit strategy also shows promising results for DNN models (e.g., HyperBand ). Further combining with Bayesian optimization, BOHB  improves the tuning process leveraging on the benefits of both approaches. Despite its simplicity, RandomSearch  is still efficient and is commonly used as a benchmark against other more advanced algorithms [6, 20]. Given the variety of ML problems, there is no conclusive preference of the best HPO to use.
Regardless of their variety, all HPO algorithms share the same workflow, which breaks down into four steps:
initialize search space and configuration;
propose values for hyperparameters;
train model and update result;
Most of above research focus on the Step 2, and the engineering-oriented projects discussed next are focused more on streamlining the Step 4 while supporting a limited number of algorithms. Our proposed framework, Auptimizer, helps automate the entire process.
II-B Hyperparameter Optimization Practice
Most of the above-mentioned algorithms and many others have released their source codes to the research community, e.g., Spearmint11 1 https://github.com/JasperSnoek/spearmint, HyperOpt22 2 https://github.com/hyperopt/hyperopt, and HyperBand33 3 https://github.com/zygmuntz/hyperband. These solutions are originally designed for research. Therefore they are hard to extend or integrate with other algorithms. Moreover, neither common code structure nor common APIs exist for interoperability. Thus, it is challenging for users to adopt them without changing their existing code and to switch amongst these alternatives without significant changes to their code base.
The efforts to consolidate the interfaces at a system level are also available. Google’s Vizier44 4 Unofficial source at https://github.com/tobegit3hub/advisor  is a design used within Google for their Cloud Machine Learning HyperTune subsystem. And Google AutoML, Amazon SageMaker, and SigOpt productize HPO algorithms as commercial offerings. But these approaches typically fail to provide the extensibility for users to integrate their specific HPO algorithms or the scalability to utilize a large pool of computing resources on-premise.
Open-sourcing helps the extensibility, however, the existing packages often fall short on challenges to practical use. Projects like Optunity55 5 https://github.com/claesenm/optunity or Chocolate66 6 https://github.com/AIworx-Labs/chocolate integrated a few different HPO algorithms for users, and new ones can be easily integrated under their consistent APIs. However, the process is sequential, therefore they do not support training models parallelly at scale. On the other side, Dask-ML  provides an easy-to-switch backend for computing resources, but lacks supports for customizing and extending HPO algorithms.
Tune , centering on scalable hyperparameter search, is undergoing rapid development. It supports scalability on different architectures, and also supports two categories of HPO strategies: trial schedulers (e.g. HyperBand) and search algorithms (e.g. HyperOpt). However, it lacks usability in practice. First, the users’ training script needs modification to align with Tune’s API. This approach occasionally results in excessive re-engineering on source code and it hinders users from debugging their training code. Second, different search algorithms require different configurations, which makes it harder for users to switch among different HPO strategies. Third, Tune relies on the autoscaling function provided by the Ray project for computing resource allocation, which currently cannot support a team environment where more advanced job scheduler is already in place.
There is no universal HPO algorithm having the best performance over all problems. Thus, trying different ones is necessary to reveal the best results and business value. However, a high adoption cost commonly prevents user from trying different algorithms. We summarize the common factors that limit the current HPO toolboxes as flexibility, usability, scalability, and extensibility:
Flexibility. It is challenging to switch between HPO algorithms, as the interfaces are dramatically different.
Scalability. The integration with large-scale computational resources is missing and it is typically hard to scale the toolbox to a multi-node environment.
Extensibility. It is challenging to introduce a new algorithm into the existing libraries as these libraries are tightly coupled with the implemented algorithms.
We summarize the comparison of representative HPO solutions based on the above criteria in Table I. Based on our experience in developing an in-house solution, we release an HPO framework, Auptimizer, to mitigate the above-mentioned challenges.
|Flexibility (No. of HPO algorithms)||2||Bayesian||7||2||4, 8||9|
|Usability (Format of training code)||Function||Rewrite||Function||Rewrite||Function||Script|
|Extensibility (Manual to add new HPO algorithms)||N.A.||N.A.||Yes||Hard||Yes||Yes|
In this paper, we use the following terminology to describe the system design and Auptimizer use cases.
For data science applications, data scientists (users) solve given data mining problems with specified ML models. A script (code) is written and some hyperparameters are commonly identified to be explored during the model training. Typically, the user carries out an experiment to examine a range of hyperparameter combinations and measures the performance (e.g., accuracy) of the model on a hold-out dataset, for example, the number of neighbors in a K-Nearest-Neighbor model, or the learning rate in a deep learning model. Each individual training process for a given hyperparameter set is called a job. After all jobs are finished, the user retrieves the best model from the training history for further analysis or application.
Auptimizer is designed primarily as a tool for user. It removes the burden of drastically changing users’ existing code, which is a key hurdle in the HPO adoption process. It only requires the user to add a few lines in the code, and guides users to setup all other experiment-related configurations. Therefore, the user can easily switch among different HPO algorithms and computing resources without rewriting their training script.
Auptimizer is also designed to support researchers and developers to easily extend the framework to other HPO algorithms and computing resources. We highlight the abstraction of the Auptimizer design in Figure 1. Both resources and proposers communicate with Auptimizer via the designated interfaces. We implemented a few open-source HPO solutions to demonstrate the consistency of the API definitions. Meanwhile, the user’s training script is executed as a job, in which the scores are automatically updated for proposer without user’s intervention.
The Auptimizer framework abstracts the HPO workflow of an experiment as shown in Section III. Once an experiment is defined and initialized, Auptimizer continuously checks for available resources (get_available()) and new hyperparameter proposals (get_param()) and then runs new jobs to search for the best model. Once a job is finished, Auptimizer automatically starts update(), a function that records the results asynchronously using a callback mechanism.
\[email protected]@algorithmic \REQUIREexperiment.json; env.ini; code_path \STATEaup.Experiment(experiment.json, env.ini, code_path) \WHILEnot proposer.finished() \STATEresource resource_manager.get_available() \IFnot resource \STATEsleep \COMMENTwait for available resource \ENDIF\STATEhyperparameters proposer.get_param() \STATEJob aup.run(hyperparameters, resource) \IFJob.callback() \STATEproposer.update() \ENDIF\ENDWHILE\STATEaup.finish() \COMMENTwait for unfinished jobs
In the following sections, we discuss the two key components - Resource Manager and Proposer- along with auxiliary components - Tracking and Visualization - in detail. Researchers and developers will find that these abstractions can help them to easily extend Auptimizer with new HPO algorithms and adapt it to their own computing environments.
Proposer controls how Auptimizer interacts with HPO algorithms for recommending new hyperparameter values. The Proposer interface reduces the effort to implement an HPO algorithm by defining two functions: get_param() to return the new hyperparameter values, and update() to update the history. In the open source release, we integrate a few well-known solutions, such as Spearmint , HyperOpt [2, 3], HyperBand , BOHB  along with simple random search and grid search. Moreover, we also demonstrate its usability to a state-of-the-art NAS approaches such as EAS  and AutoKeras .
Despite the inherently different nature of these algorithms, Auptimizer interacts with them only through the two interfaces described above and keeps other irrelevant components away from users and researchers. When implementing other open-source solutions, we found that at most one source file needs to be changed or added, and the remaining source code can be reused for the integration. As an example, to integrate BOHB, we wrote only 138 lines of code and reused the existing 4305 lines of codes77 7 https://github.com/automl/HpBandSter with commit 841db4b., which demonstrates the power of Auptimizer’s extensibility.
function is a wrapper for the underlying HPO implementations. It queries new values of hyperparameters and package them into a BasicConfig object to be used for code execution.
The newly created BasicConfig contains all hyperparameter values in a dictionary for a job to run with. Additional information can be added for HPO algorithms to use without interfering with job execution. For instance, the value of the job ID is used in the HyperBand implementation to track previous results and to resume training when necessary. This BasicConfig is then passed to the resource manager for job execution (see discussion in Section III-B). Users only need to change their code to read the BasicConfig as an input file. To further reduce the burden on the user end, we provide load() and save() methods in BasicConfig to simplify the adoption of Auptimizer (see example in Code 3).
An example of the BasicConfig file generated by Auptimizer at runtime is illustrated in Code 1. It contains two variables (x,y) along with additional variables when necessary (e.g., job_id). This generated JSON file will be passed to the code automatically by Resource Manager during model training.
All configurations used for model training are saved, and user can easily reuse them together with their code without any modification. This enables users to verify or finetune their model after HPO.
function collects results back from jobs, updates the tuning history, and also registers the results for record tracking (see Section III-C).
For simple algorithms (e.g., RandomSearch), no history is needed. However, advanced algorithms (e.g., Bayesian Optimization) need to match the resulting scores with the specific input hyperparameters. Auptimizer takes care of this matching by automatically mapping the result back to its BasicConfig and thus, HPO algorithms can directly restore the hyperparameter values used in a specific job. Auxiliary values (e.g. job_id), are tracked and can be customized for other usage, such as to save and retrieve models for further finetuning88 8 Users need to write their own function to restore model based on the input ID..
III-B Resource Manager
Resource Manager (RM) is another cornerstone in the Auptimizer framework. It connects computing resources to model training automatically thus allowing codes to run on resources based on their availability. It also sets a callback mechanism to trigger the update() function when a job is finished.
A key challenge of usability in HPO implementations is the communication between jobs and the heterogeneous computer resources that jobs run on. The existing open-sourced projects, e.g., HyperOpt, Tune, require to call the code directly to get return values. And commercial services such as SageMaker requires its customer’s code to be encapsulated in a docker image. SigOpt provides API calls for communication but it leaves it to the users to do resource allocation and code execution. All these solutions are challenging for users to use the HPO at scale. In comparison, the Auptimizer framework puts the user-friendliness as its priority and removes this burden.
General resource management and job scheduling tools, e.g., Slurm  or TORQUE  are not designed for HPO applications. Using those tools, jobs are submitted in advance and wait for the available resources to be executed on. In the HPO setting, the configurations of hyperparameters are typically determined based on the history of model scores and it results in a difficulty to start jobs spontaneously. Without Auptimizer, users need to either allocate all resources at once or to write their corresponding interface to start new jobs on the fly. However, in our workflow, we rely on the flexibility of cloud services (i.e. AWS) to scale out. For Slurm and other tools, we are open to community support.
The RM interface makes it simple to extend Auptimizer to scalable computing environments. Developers only need to interact with get_available() and run(), which queries available resources and allocate correspondingly for job execution. As users, they only need to specify the resources to be used in the experiment configurations. Also integrating with advanced scheduling tools, Auptimizer can further help users to schedule jobs efficiently in a multi-tenant environment with better resource allocation.
function serves as the interface between Auptimizer and typical resource management and job scheduling tools. In the current implementation, it queries a persistent database for available resources that the user specified. If the requested resource is available, then it will be taken by Auptimizer for job execution. Otherwise, the system will wait until resources are free (see Section III).
The interface get_available() is also compatible with existing resource management tools. For instance, we used boto 399 9 https://github.com/boto/boto3. to spawn new EC2 instances on the AWS.
The Auptimizer RM component relies on the callback design to solve the scheduling wrapped in the run() function. Specifically, run() interface executes the user-provided code in a Job object. The Job object first sets up the running environment based on the available resources. For instance, it assigns CUDA_VISIBLE_DEVICES for GPU allocation. Then it executes the user-provided code with the newly proposed hyperparameter values (see Section III-A1). Once a job is finished, it triggers a callback() function to update() the result in Auptimizer. It also allows additional information to be passed to Proposer as an arbitrary string when returned from users’ code.
In the current version of Auptimizer, we demonstrate its usability across different computing resources, such as CPUs, GPUs, multiple nodes, and AWS EC2 instances. We require that the users’ code executes successfully on the targeted resources to avoid potential environment issues. In this initial release, we use a SQLite database to keep track of available resources and all jobs are running locally. Both the hyperparameter configuration and the results are communicated by the standard IO protocol.
III-C Experiment Tracking and Visualization
Experiment tracking provides a foundation of reproducibility in a data science project. In Auptimizer, all the experiment history is tracked in the user-specified database. The data schema is illustrated in Figure 2.
Experiment table plays the central role to track the overall progress. It contains experiment ID, user ID, and start and end time of an experiment. Beside them, the exp_config specifies the scope of the experiment (see Section IV-B for detailed discussion) The tables of User and Resource are for user control and resource management. The Job table tracks the runtime status and result of each job. Since Auptimizer automatically checks in its training process in experiments, users are alleviated from the worry of losing reproducibility.
The Auptimizer framework also provides a basic tool to visualize the results from history (see Section IV-D). In addition, users are able to directly access the results stored in the database for further analysis.
IV Using Auptimizer
In this section, we demonstrate the key features of using Auptimizer in practice, by the simple and commonly used DNN model for the MNIST dataset1010 10 https://github.com/aymericdamien/TensorFlow-Examples/blob/master/examples/3_NeuralNetworks/convolutional_network.py with commit 971c96b.. The DNN model contains two convolution layers and two fully connected layers. Adam optimizer is used for training  with a global dropout ratio for regularization . Also, for demonstration purpose, we only search for the best accuracy on the test dataset without distinguishing it from the validation dataset.
IV-A Auptimizer Workflow
In this section, we illustrate the basic workflow to adopt Auptimizer.
First, we need to set up the Auptimizer by filling in the basic information for the computing environment. Auptimizer has a user-friendly interactive guide that can be invoked by python -m aup.setup. It will setup the Auptimizer for the first time with information about the computing environment and the database.
Next, we need to identify the key hyperparameters. In this experiment, we will explore five hyperparameters, numbers of filters in the first two convolution layers (conv1, conv2), the dropout ratio, the number of neurons of the first fully-connected layer, and the learning rate. Also, we use n_iterations to adjust number of epochs in training, which is useful for HyperBand and BOHB.
IV-B Experiment Configuration
Auptimizer also provides a command-line tool to initiate the file as python -m aup.init. Experiment Configuration controls the search space and the choice of HPO algorithm of an experiment. In Code 2, we illustrate the configuration for random search for the Rosenbrock function .
Code 2 shows that the configuration is simple and straightforward. The n_samples specifies how many jobs a user wants to run for the HPO process and n_parallel jobs can be executed at the same time on the CPU resource. The hyperparameter space is defined in parameter_config, each hyperparameter is a float ranging from to . Those hyperparameters will be assigned by Auptimizer into a BasicConfig (e.g., Code 1) and will be accessed from the user’s code directly (see Code 3).
Occasionally additional information can still be required for different HPO algorithms, e.g., using "engine":"tpe" to instruct HyperOpt to use TPE as the backend engine for HPO. But overall, the change in experiment configuration is significantly reduced in contrast to using different open-source implementations of HPO algorithms. And most importantly, there is no need to change the user’s code for different algorithms.
Auptimizer can also guide through the process of selecting HPO algorithm and defining hyperparameter specifications interactively. One generated example configuration file for random search is shown in Code 2. Users enjoy flexibility and scalability by simply changing the proposer name or the parallel number.
IV-C Code Update
To run jobs automatically in Auptimizer, we need to modify the source code correspondingly. Comparing to other HPO tools, the modifications are significantly reduced, and the resulting code can still be run independently without Auptimizer. Generally, there are four items:
change the code to self-executable,
parse input hyperparameters,
use them for model training,
report back the result.
We highlight the changes in Code 3 and the steps are explained here:
Line 1: add the shebang line to make the code self-executable.
Line 2-3: import sys, aup to parse hyperparameters and return values to Auptimizer.
Line 4-5: modify original training function by replacing variables with hyperparameters in config.
Line 6-7: main function. It parses BasicConfig by the input file.
Line 8-9: original training script. It trains for config[’n_iterations’] epochs and computes the test accuracy.
Line 10: return score to Auptimizer.
As demonstrated above, minimal changes are required to fully adopt the Auptimizer framework into practice. More importantly, the usability of the code remains, and users can reuse the exact same script for other purposes (training from scratch, finetuning) by providing the hyperparameters as input. Moreover, Auptimizer does not restrict users on any specific language or framework. For instance, a MatLab user can also use Auptimizer to tune their hyperparameters once they parse and return result in their code correspondingly.
After defining the experiment and modifying code, users are ready to run the experiment by simply entering python -m aup experiment.json. More importantly, users can easily switch among HPO algorithms by updating the configuration, or retrieve and store results in the database.
We allocate roughly the same number of total training epochs for each HPO algorithm. For random, Spearmint, HyperOpt, each hyperparameter configuration is trained for 10 epochs with 100 different configurations. Whereas for grid search, we assign the grid with 3 values for all hyperparameters, except the learning rate which is chosen from , resulting in 162 configurations. For HyperBand and BOHB, we allocate a total budget of 1000 epochs approximately along with 100 configurations to be explored. We also enforce the minimum number of epochs to be 1 with no upper limit.
In Figure 3, we examine the scalability of Auptimizer by comparing the overall experiment time with the total time used by all jobs divided by the number of computing resources. The experiment searched for 128 configurations with up to 64 AWS EC2 instances. Because training time varies due to the changing model complexity (i.e. number of filters and neurons), we fixed the random seed, such that all experiments explored the same configurations. On average, each job ran 5 minutes on a t2.medium instance with 4 vCPUs. Clearly, the training time dominates the runtime, whereas the communication and the HPO algorithm (random) take marginal time in total. The break from linearity is caused by two issues. First, the total time of an experiment is driven by the last job. Because different jobs have different training times, the gap between experiment time and the total time used by jobs becomes larger when using more parallel machines. Second, the performance fluctuation of the EC2 machines is the main reason for the nonlinearity in the scaling relation and it is not controllable by Auptimizer. More importantly, typical model training time is much longer than 5 minutes, which makes the additional cost by Auptimizer negligible.
We illustrate all hyperparameter combinations from different HPO algorithms in Figure 4. It shows that different HPO algorithms have searched for different paths in the hyperparameter space. Choosing the optimal HPO algorithm is a challenge and exploring them easily is important in practice. Among different approaches, we only need to change the name of algorithms, which significantly reduce the engineering work at the code level. Users can also easily scale the experiment to run in parallel by specifying the n_parallel value. Researchers can use it as benchmark suite when they have a new algorithm to test against.
In Figure 5, we show the performance of different HPO algorithm with n_parallel=8. We want to emphasize that the purpose is to demonstrate the usability of Auptimizer rather than to benchmark different HPO algorithms. By changing the algorithm names, we can easily run different strategies to tune a given model. Albeit the error rate reported here is not representative as no validation set is used. We can still confirm a few characteristics of the HPO algorithms. For example, Spearmint generally find good models at the cost that most models are complex models and result in longer training times. And as expected, BOHB and HyperBand are more resource efficient in finding good models. Grid search explored the complicated model at the early stage, which lead to an overall good performance, but in practice, when the reasonable range is not available or the dimensionality is high, it often does not work well.
V Neural Architecture Search
Though neural networks have become ubiquitous for various AI tasks there is still a lot of expert knowledge needed for designing architectures. As a result, recently gradient-based architecture search has become very popular. A seminal paper describing the research is , which uses a recurrent-network-based “controller” to generate strings of “child nets”. These child nets are also neural networks, whose architectures are specified by a string variable and are each trained to convergence. The controller then uses the accuracy of the child nets as a reward signal to compute the policy gradient. Progressively the controller will give higher probabilities to architectures with higher accuracy and improves its search over time, learning architectures which would progressively improve accuracy. The same paradigm can be easily adapted and extended using Auptimizer where the controller can be abstracted into Proposer, allowing users to both improve and develop NAS algorithms in a scalable and automated environment.
Since , further improvements have been suggested. However, the essential technique and approach remains the same. In , the authors show how to improve the efficiency of NAS by forcing all child models to share weights, which allows child networks to train efficiently to convergence without starting from scratch every time.
The technique works by incorporating transfer learning between child models which substantially reduces the running time. Following this approach,  also offsets designing and training each child network from scratch during the exploration of the highly inefficient architecture search space, by exploring the architecture space based on the current network and reusing its weights with a bidirectional tree-structured reinforcement learning meta-controller. This allows for highly expressive tree-structured architecture space which can be traversed in a multi-branch crawl yielding child architectures in an ordered fashion. Since architecture search involves efficient distribution of hardware resources and managing close synchronization between the controller and child networks processes, Auptimizer is effectively used to automate the process. In the remaining section we describe how we extend and incorporate this algorithm with Auptimizer using the publicly available code 1111 11 From https://github.com/han-cai/EAS with commit 070d2d7..
The structure of our implementation includes two main parts, client.py and EASProposer.py. The client.py file is a minor modification of the original file that trains child neural architectures as jobs. Illustrated in Code 4, the original code takes a folder name as input, which contains the architecture of the current branch, saved weights, new architecture and a static configuration including new epochs to run and learning rate. It runs the child net architecture and returns the net validation accuracy and running time. The modified version (shown in Code 5) modifies merely five lines to make it compatible with the Auptimizer framework. Then Auptimizer handles the execution of these client processes automatically and compiles their results asynchronously to aid the Proposer.
The Proposer wraps the main controller process (arch_search_convnet_net2net.py), which contains the RNN-controller based reinforcement critic. Its controlling variables include number of batches to run per episode, number of episodes to run, maximum epochs per child episode, range of filter dimensions, range of strides, potential kernel sizes, among many others. The Proposer is initialized with a basic initial configuration to generate a set of potential child processes to run. The Auptimizer executes jobs using the modified client.py with these configurations of client networks, and reports back to the original controller once all the generated child nets for the episode have finished running. The Proposer then computes gradients from the string of child architectures and the reported accuracies, and generates new child nets for the next episode using its actors for wider and deeper configuration generation to build upon the current branch, allowing Auptimizer to take over and execute batches for the episode iteratively. Once finished, users can easily check the Auptimizer logs and database for the client runs, their architectures and accuracies, and gain more insights into the experiment. To conclude, the flexibility of Auptimizer design allows us to easily and quickly integrate an open-source NAS code into the framework.
AutoKeras  is an open-source library for automated machine learning, and has recently become increasingly popular for NAS applications and research. The library includes a framework and different functions to search architecture space and hyperparameters for deep learning models. As the early NAS techniques gained popularity, their major shortcoming of exorbitant computational cost remained unaddressed. In contrast to those techniques, AutoKeras performs Network Morphism based architecture generation guided by Bayesian Optimization. Network Morphism keeps the functionality of the neural network while changing its neural architecture, using an edit-distance neural network kernel which measures how many operations are needed to change one neural network to another. This allows AutoKeras to minimize the prohibitive computation costs while also allowing for control over the architecture search space. The framework also provides support for other standard search algorithms like Random, Grid, and Greedy along with Bayesian Optimization for network morphism. AutoKeras can also be used by NAS researchers who seek to implement their own NAS algorithms by reimplementing the ‘generate’ and ‘update’ functions to generate the next neural architecture and update the controller with evaluation result of a neural architecture respectively.
We provide a high level integration for AutoKeras with Auptimizer, which allows AutoKeras code to be executed on available resources. Auptimizer takes the ‘time limit’ and ‘search’ as arguments for how long to perform NAS and which search algorithm to run; and then performs the final Hyperparameter tuning. Our integration allows users to abstract away not only the NAS search process with AutoKeras but also utilize resource adaptability and result tracking with Auptimizer. Our AutoKeras integration is designed for both easy scaling on resources for NAS applications and hassle-free comparisons for switching between different search techniques or developing new search algorithms. To this end, we treat each complete AutoKeras search and final tuning as a unique job, unlike our EAS implementation of a granular approach where each candidate child model would be a job.
VI Discussion and Conclusion
Auptimizer design goals are focused on a user-friendly interface. Auptimizer benefits both practitioners and researchers and its design simplifies the integration and development of HPO algorithms. Specifically, the framework design helps both users to easily use Auptimizer in their workflows and researchers to quickly implement novel HPO algorithms. To reach these goals, the Auptimizer design has fulfilled the following requirements:
Usability. Changes to existing user’s code are limited to a minimal level. It reduces the friction for users to switch to the Auptimizer framework.
Scalability. Auptimizer can deploy to a pool of computing resources to automatically scale out the experiment, and users only need to specify the resource.
Auptimizer addresses a critical missing piece in the application aspect of HPO research. It provides a universal platform to develop new algorithms efficiently. More importantly, Auptimizer lowers the barriers for data scientists in adopting HPO into their practice. Its scalability helps users to train their models efficiently with all computing resources available. Switching between different HPO algorithms is simple and only needs changing the proposer name (dedicated controlling parameters will be default and specified). This allows practitioners to quickly explore their ideas with advanced algorithm less laboriously.
The Auptimizer framework requires only minimal changes to existing scripts and these scripts, once modified, can be reused for other occasions directly. This non-intrusiveness frees users from repeated refactoring of their code. Users are also free to use any languages in addition to Python (although a little extra work is needed to setup the interfaces with Auptimizer). Altogether, Auptimizer gives practitioners and researchers great flexibility in building models using different frameworks (e.g. TensorFlow or PyTorch) and multiple languages (e.g. MatLab or R). We plan to introduce other functionalities (such as model compression) in Auptimizer in future releases.
To conclude, we have presented the design of Auptimizer that addresses the challenges in current HPO solutions. We have shown that it is user-friendly for both model tuning and new HPO algorithms development. Auptimizer supports a few major HPO approaches out of the box1212 12 BOHB and AutoKeras are not included in the version 1 release. and is ready to help users to automate and accelerate their model training process. We encourage community contributions to further improve the framework with state-of-the-art algorithms and infrastructure support to solve the challenges in the big data era.
-  (2012) Random Search for Hyper-Parameter Optimization. Journal of Machine Learning Research 13 (Feb), pp. 281–305. Cited by: §I-A, §II-A, §II-A.
-  (2011) Algorithms for hyper-parameter optimization. In Advances in neural information processing systems, pp. 2546–2554. Cited by: §I-A, §II-A, §III-A.
-  (2013-17–19 Jun) Making a science of model search: hyperparameter optimization in hundreds of dimensions for vision architectures. In Proceedings of the 30th International Conference on Machine Learning, S. Dasgupta and D. McAllester (Eds.), Proceedings of Machine Learning Research, Vol. 28, Atlanta, Georgia, USA, pp. 115–123. Cited by: §I-A, §III-A.
-  (2018) Efficient architecture search by network transformation. In The Thirty-Second AAAI Conference on Artificial Intelligence (AAAI-18), pp. 2787–2794. Cited by: §III-A, §V.
-  (2014) Hyperparameter tuning in python using optunity. (eng). Cited by: §I-A.
-  (2018-01) Effective Building Block Design for Deep Convolutional Neural Networks using Search. External Links: Cited by: §II-A.
-  (2013) Towards an empirical foundation for assessing bayesian optimization of hyperparameters. In NIPS workshop on Bayesian Optimization in Theory and Practice, Vol. 10, pp. 3. Cited by: §II-A.
-  (2018-10–15 Jul) BOHB: robust and efficient hyperparameter optimization at scale. In Proceedings of the 35th International Conference on Machine Learning, J. Dy and A. Krause (Eds.), Proceedings of Machine Learning Research, Vol. 80, Stockholmsm ssan, Stockholm Sweden, pp. 1437–1446. Cited by: §I-A, §II-A, §III-A.
-  (2015) Efficient and robust automated machine learning. In Advances in Neural Information Processing Systems 28, C. Cortes, N. D. Lawrence, D. D. Lee, M. Sugiyama, and R. Garnett (Eds.), pp. 2962–2970. External Links: Cited by: §I-B.
-  (2005) Evolutionary tuning of multiple svm parameters. Neurocomputing 64, pp. 107–117. Cited by: §I-A.
-  (2017) Google vizier: a service for black-box optimization. In Proceedings of the 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD ’17, New York, NY, USA, pp. 1487–1495. External Links: Cited by: §I-A, §I-A, §II-B.
-  (2009) The weka data mining software: an update. ACM SIGKDD explorations newsletter 11 (1), pp. 10–18. Cited by: §I-B.
-  (2018-06-27)(Website) External Links: Cited by: §III-A.
-  (2019) Auto-keras: an efficient neural architecture search system. In Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, pp. 1946–1956. Cited by: §V.
-  (2014) Adam: a method for stochastic optimization. arXiv preprint arXiv:1412.6980. Cited by: §IV.
-  (2016) Auto-WEKA 2.0: Automatic model selection and hyperparameter optimization in WEKA. Journal of Machine Learning Research 17, pp. 1–5. Cited by: §I-B.
-  (2013) MLbase: a distributed machine-learning system.. In CIDR, Cited by: §I-B.
-  (2007) An empirical evaluation of deep architectures on problems with many factors of variation. In Proceedings of the 24th International Conference on Machine Learning, ICML ’07, New York, NY, USA, pp. 473–480. External Links: Cited by: §II-A.
-  (1998) Gradient-based learning applied to document recognition. Proceedings of the IEEE 86 (11), pp. 2278–2324. Cited by: §IV.
-  (2018) Hyperband: a novel bandit-based approach to hyperparameter optimization. Journal of Machine Learning Research 18 (185), pp. 1–52. Cited by: §I-A, §II-A, §III-A.
-  (2018) Tune: a research platform for distributed model selection and training. arXiv preprint arXiv:1807.05118. Cited by: §I-A, §II-B.
-  (2011) Scikit-learn: machine learning in python. Journal of machine learning research 12 (Oct), pp. 2825–2830. Cited by: §I-B.
-  (2018-10–15 Jul) Efficient neural architecture search via parameters sharing. In Proceedings of the 35th International Conference on Machine Learning, J. Dy and A. Krause (Eds.), Proceedings of Machine Learning Research, Vol. 80, Stockholmsm ssan, Stockholm Sweden, pp. 4095–4104. Cited by: §V.
-  (2015) Dask: parallel computation with blocked algorithms and task scheduling. In Proceedings of the 14th Python in Science Conference, K. Huff and J. Bergstra (Eds.), pp. 130 – 136. Cited by: §II-B.
-  (1960) An automatic method for finding the greatest or least value of a function. The Computer Journal 3 (3), pp. 175–184. Cited by: §IV-B.
-  (2012) Practical Bayesian optimization of machine learning algorithms. In Advances in neural information processing systems, pp. 2951–2959. Cited by: §I-A, §I-A, §II-A, §III-A.
-  (2015) Automating model search for large scale machine learning. In Proceedings of the Sixth ACM Symposium on Cloud Computing, pp. 368–380. Cited by: §I-B.
-  (2014) Dropout: a simple way to prevent neural networks from overfitting. Journal of Machine Learning Research 15, pp. 1929–1958. Cited by: §IV.
-  (2006) TORQUE—TORQUE resource manager. In Proceedings of the 2006 ACM/IEEE conference on Supercomputing - SC ’06, New York, New York, USA, pp. 8. Cited by: §III-B.
-  (2017) ATM: A distributed, collaborative, scalable system for automated machine learning. In 2017 IEEE International Conference on Big Data, BigData 2017, Boston, MA, USA, December 11-14, 2017, pp. 151–162. Cited by: §I-B.
-  (2018) Hyperparameter Importance Across Datasets. In Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining - KDD ’18, New York, New York, USA, pp. 2367–2376. Cited by: §I-A.
-  (2003) Slurm: simple linux utility for resource management. In Workshop on Job Scheduling Strategies for Parallel Processing, pp. 44–60. Cited by: §III-B.
-  (2017) Neural architecture search with reinforcement learning. In ICLR, External Links: Cited by: §I-A, §II-A, §V, §V.