Machine learning pipeline potentially consists of several stages ofoperations like data preprocessing, feature engineering and machine learningmodel training. Each operation has a set of hyper-parameters, which can becomeirrelevant for the pipeline when the operation is not selected. This gives riseto a hierarchical conditional hyper-parameter space. To optimize this mixedcontinuous and discrete conditional hierarchical hyper-parameter space, wepropose an efficient pipeline search and configuration algorithm which combinesthe power of Reinforcement Learning and Bayesian Optimization. Empiricalresults show that our method performs favorably compared to state of the artmethods like Auto-sklearn , TPOT, Tree Parzen Window, and Random Search.
Quick Read (beta)
ReinBo: Machine Learning pipeline search and configuration with Bayesian Optimization embedded Reinforcement Learning
Machine learning pipeline potentially consists of several stages of operations like data preprocessing, feature engineering and machine learning model training. Each operation has a set of hyper-parameters, which can become irrelevant for the pipeline when the operation is not selected. This gives rise to a hierarchical conditional hyper-parameter space. To optimize this mixed continuous and discrete conditional hierarchical hyper-parameter space, we propose an efficient pipeline search and configuration algorithm which combines the power of Reinforcement Learning and Bayesian Optimization. Empirical results show that our method performs favorably compared to state of the art methods like Auto-sklearn , TPOT, Tree Parzen Window, and Random Search.
Keywords:Bayesian Optimization Reinforcement Learning AutoML
shapes,backgrounds \usetikzlibraryarrows,positioning \usetikzlibrarytrees \usetikzlibrarychains \usetikzlibraryarrows,petri,topaths\usetikzlibraryshadows
Over the past years, Machine Learning (ML) has achieved remarkable success in a wide range of application areas, which has greatly increased the demand for machine learning systems. However, an efficient machine learning algorithm crucially depends on a human expert, who has to carefully design the pipeline of the machine learning system, potentially consisting of data pre-process, feature filtering, machine learning algorithm selection, as well as identifying a good set of hyper-parameters. As there are a large number of possible alternatives of models as well as hyper-parameters, the need for automated machine learning (AutoML) has been growing, which is supposed to automatically generate a data analysis pipeline with machine learning methods and parameter settings that are optimized for a given data set, in order to make machine learning methods available for non-expert users.
Since hyper-parameters of a machine learning model have a large influence on the performance of the model, hyper-parameter optimization becomes a critical part of an AutoML system. Popular hyper-parameter optimization methods include Sequential Bayesian Optimization, which iterates between fitting surrogate models that predict model performance, and using them to make choices about which configurations to investigate.
However, the composition of the machine learning pipelines also plays a vital role in the performance of AutoML systems. Choosing different data preprocessing or feature engineering techniques as well as choosing different machine learning models for a specific dataset could potentially result in considerable performance differences. The joint optimization of the pipeline search and its associated hyper-parameters configuration could essentially reside under the umbrella of Combined Algorithm Selection and Hyperparameter optimization (CASH) problem , where Algorithm corresponds to the pipeline and Configuration corresponds to the hyper-parameters associated with the pipeline. The pipelines and hyper-parameters reside in a conditional hierarchical space, where some hyper-parameters only become valid when the corresponding pipeline is present. For example, Figure 1 illustrates such a situation when the data preprocessing and feature engineering operations are selected, which correspond to an incomplete pipeline, one out of three machine learning algorithms need to be chosen (indicated by dashed edges) to complete the pipeline, the corresponding hyper-parameters (indicated by solid edges) of an algorithm only become valid when the algorithm is selected.
To optimize the conditional hyper-parameters space jointly with the pipeline it is attached to, we embed Bayesian Optimization in the Reinforcement Learning process, and dub the method ReinBo, which means Machine Learning Pipeline search and configuration with Reinforcement Learning and Bayesian Optimization. Note that ReinBo can solve not only CASH problems, but also any mixed discrete and continuous conditional hierarchical space optimization, which is left for future work.
Our major contributions are:
Inspired by Hierarchical Reinforcement Learning , we transform the conditional hierarchical hyper-parameter optimization problem into subtasks of pipeline selection and hyper-parameter optimization, which circumvents the conditional constraint and reduces the search dimension.
To our best knowledge, we are the first to embed Bayesian Optimization (BO) into Reinforcement learning, specifically Q Learning  for collaborative joint search of pipelines and hyper-parameters, which is different from using BO for policy optimization , and also different from using BO for hyper-parameter fine tuning after an optimal pipeline is selected by a reinforcement learning based AutoML framework .
We provide an open source light weight R language implementation reinbo11 1 https://github.com/smilesun/reinbo for the R Machine Learning community which could run efficiently on a personal computer, and takes much less resources compared to other AutoML softwares.
2 Related Work
In this section, we try to classify the current popular AutoML solutions into a taxonomy and discuss the differences of each individual work with ours.
Sequential Model Based Optimization family Auto-sklearn  and Auto-Weka  both use Sequential Model-based Algorithm Configuration (SMAC)  to solve the Combined Algorithm Selection and Hyperparameter optimization (CASH) problem. SMAC transforms the conditional hierarchical hyper-parameter space into a flat structure by instantiating inactive conditional parameters to default values, which allows the random forest to focus on active hyper-parameters . A potential drawback for this method is that the surrogate model needs to learn in a high dimensional hyper-parameter space, which might need a large sample of observations to be sufficiently trained, while in practice, running machine learning algorithm is usually very expensive. Tree Parzen Window (TPE) , however, tackles the conditional hierarchical hyper-parameter space using a tree like Parzen Window to construct two density estimators on top of a tree like hyper-parameter set. Expected improvement induced from lower and upper quantile density estimators is used to select new candidate proposals from points generated by Ancestral Sampling.
Tree-based Genetic Programming TPOT  automatically designs and optimizes machine learning pipelines with a genetic programming  algorithm. The machine learning operators are used as genetic programming primitives, which will be combined by tree-based pipelines and the Genetic Programming algorithm is used to evolve tree-based pipelines until the “best” pipeline is found. Similar methods also include Recipe . However, this family of methods does not scale well . In this paper, we aim for an AutoML system that could give a valuable configured pipeline within limited time.
Monte Carlo Tree Search Alike ML-Plan  is an AutoML system, built upon a Hierarchical Task Network, which uses a Monte Carlo Tree Search like algorithm to search for pipelines and also configure the pipeline with hyper-parameters. Task is expanded based on best-first search, where the score is estimated by a randomized depth first search by randomly trying different subtree possibilities on a Hierarchical Task Network. To ensure exploration, ML-Plan gives equal possibility to the starting node in a Hierarchical Task Network and then uses a best-first strategy for searching at the lower part of the network. Potential drawback for this method is that the hyper-parameter space is discretized, which might essentially lose good candidates in continuous spaces since large continuous hyper-parameter spaces would be essentially hard to discretize.
Reinforcement Learning based Neural Network Architecture Search This family of methods are usually not termed as AutoML systems but rather Neural Architecture Search. For instance, MetaQNN  uses Q-learning to search convolutional neural network architectures. The learning agent is trained to sequentially choose CNN layers using Q-learning with an -greedy exploration strategy and the goal is to maximize the cross-validation accuracy. In , instead of using Q-learning, the authors use Recurrent Neural Networks as the reinforcement learning policy approximator to generate variable strings to represent various neural architecture forms. The reward-function is designed to be the validation performance of the constructed network. The reinforcement learning policy is trained with gradient descent algorithm, specifically REINFORCE. The architecture elements being searched are very similar to MetaQNN. Inspired from , we also assume the machine learning pipeline to be optimized could be represented by a variable length string, but our work is different from  in that we use both Deep Q-learning and Tabular Q-learning. More importantly, compared with both  and , which optimize the neural architecture, the elements of the architecture are mostly factor variables like layer type or discretized elements like filter size, while in this paper, we deal heavily with continuous hyper-parameters (The and for a rbf kernel Support Vector Machine). To jointly optimize the discrete pipeline choice and associated continuous hyper-parameters, we embed Bayesian Optimization inside our reinforcement learning agent.
Other Reinforcement Learning based methods In , the authors also combine pipeline search and hyper-parameter optimization in a reinforcement learning process based on the PEORL  framework, however, the hyper-parameter is randomly sampled during the reinforcement learning process, an extra stage is needed to sweep the hyper-parameters using hyper-parameter optimization techniques, while in our work, hyper-parameter optimization is embedded in the reinforcement learning process. Alpha3M  combined MCTS and recurrent neural network in a self play  fashion, however, it seems that Alpha3M does not perform better than the state of art AutoML systems. For example, out of all the 6 OpenML datasets they have used to compare with state of art AutoML systems, Alpha3M only shows a clear improvement on 1 dataset (spectf) against Auto-sklearn  and TPOT , according to Figure 4 in . Furthermore, it is not clear to us how the hyper-parameters are set and if Bayesian Optimization is used. The implementation of Alpha3M takes advantage of the GPUs  for the fast performance while our method has a light weight implementation which efficiently runs with CPU and does not necessarily need Neural Networks.
3.1 Towards Reinbo
As shown in Figure 2, we assume that a machine learning pipeline potentially consists of 3 stages (s1 through s3 in the figure), which include data preprocessing (imputations, NA and more), feature engineering (Principal Component Analysis for feature transform, Anova for feature filtering and more), and machine learning model (learner like SVM, Random Forest). Specifically, we use operation “NA” to indicate that no operation would be done in the stage in question. Figure 2 just serves as a toy but working example for ReinBo, in practice, there are a lot more operations available. A particular operation has associated hyper-parameters (for instance the percentage of selected features in Anova feature filtering). In Figure 2, dark color filled cells (NA, Anova, SVM) represent selected operations and their associated active hyper-parameters (percentage, sigma, C), while hyper-parameters for inactive operations are not drawn in the figure.
Observing from Figure 2, along with Figure 1, we could think of the pipeline selection and configuration problem as a two-phase process. During the first phase, a planning algorithm guides the agent to choose a path which corresponds to an unconfigured pipeline. This is similar to a multi-armed bandit problem, where each path corresponds to one arm, while difference lies in that the agent can not directly pull a discrete arm but have to pull across several consecutive discrete arm groups (each arm group corresponds to a stage in Figure 2) and the agent only gets reward after choosing one of arms from the last group. The second phase is similar to contextual bandit with continuous action space (corresponding to hyper-parameters), where the context is which path from the first phase has been selected.
We model the first phase as a reinforcement learning episode, where a particular operation in stage is treated as action , taken upon corresponding state . State could be represented by actions taken up to the current stage for example. The pipeline search problem is then to find an optimal policy to decide which operation (action) to take at a particular state. The action value function at each state tells us how favorable a particular operation is. We use to denote the space of legal actions at state . Suppose a roll-out of states trajectory for one composition (episode) is , the corresponding space of pipeline could be denoted by , where is the total number of stages and we use to denote the Cartesian Product. For a more general notation, we use to denote the space of actions when the state is at stage .
We search for potentially better hyperparameters in the second phase with Bayesian Optimization. Aside from the pipeline itself, each concrete operation (action ) at stage is configurable by a set of hyper-parameters . can be hyper-parameters set for a preprocessor like the ratio of variance to keep in PCA or hyper-parameters set for a machine learning model like the and hyper-parameter for SVM. Thus a configured pipeline search space would be where we use to denote the conditional hyper-parameter space at stage .
The connection point between reinforcement learning and Bayesian Optimization lies in the reward function design in the reinforcement learning part. During the composition process, there is no signal available to judge how good a current uncompleted pipeline is until the final learner (classifier) is configured with hyper-parameters and trained on the data. At the starting point, different pipelines are tried out randomly, which corresponds to an untrained exploration policy . A completed pipeline with a joint non-conditional hyper-parameter search space is optimized with Bayesian Optimization for a few steps. The best negative loss is then used as a reward at the end of an episode to guide the reinforcement learning agent towards a better policy. The environment uncertainty only comes with the stochastic reward, while the transition from current state to next state through action is deterministic. We choose to use Q-learning  to optimize the policy where we have tried the Tabular Q-learning and Deep Q-learning . We find out that the Tabular Q-learning works better than Deep Q-learning. For space constraint, the latter is not discussed in detail in this work.
We need Bayesian Optimization to optimize the hyper-parameters in a fine grained level with limited budget, but also want to give budget preference to those promising pipelines. To circumvent the complexity of conditional and hiercharical relationship between hyper-parameters and pipeline, we use reinforcement learning to choose a pipeline and let Bayesian Optimization tune the hyper-parameters. We model the variation of the same pipeline with different hyper-parameters as the environment uncertainty. By using separate surrogate model for each pipeline, we circumvent the risk of mistakenly modeling improper dependent structure between different hyper-parameters, at a minor cost of maintaining those searched pipelines surrogate model as dictionary in memory.
3.2 Connections to Hyperband
The idea of only using a few steps Bayesian Optimization is inspired by Hyperband , where the trade-off between aggressively exploring more configurations and giving each configuration more resources to be validated is solved by grid searching. Instead, in this paper, we do not need the grid search, promising pipelines will get a higher probability to be selected by our reinforcement learning agent which means these pipelines get more chances to be evaluated by the Bayesian Optimization process. The trade-off between exploitation and exploration is naturally resolved by an -greedy policy, and by annealing from a large value to a small value, we encourage more exploration at the beginning. Compared to Hyperband, our method selects the budgets allocated for a particular pipeline automatically, the effectiveness of our strategy could then rely on the recent success of reinforcement learning in different areas.
3.3 Connection and Extension to Hierarchical Reinforcement Learning
Hierarchical Reinforcement Learning (hrl)  is proposed to tackle the curse of dimensionality in Reinforcement Learning . Although the Option approach  is more popular, our method has a close connection to the MAXQ subtask approach , which divide a task recursively into subtasks and decompose the value function accordingly. The current version of ReinBo can be treated as a special case of the MAXQ task decomposition, where we have two tasks of pipeline selection and hyper-parameter configuration. However, in the current version, most states are not shared between these two tasks, so there is no need to use MAXQ hrl algorithm to solve the problem. But our method can be naturally extended to a hrl version when our design space of pipeline allow shared state between the two subtasks. We leave it as future work to optimize such complicated pipelines using Hierarchical Reinforcement Learning.
3.4 Procedures of ReinBo
As shown in Algorithm 3.4 , we first initialize a policy for the agent which can be represented by a randomly initialized neural network or a Q-table initialized with default values, coupled with an exploration mechanism like the -greedy strategy. During the roll-out, initial populations of pipelines get sampled, with the corresponding hyper-parameter space to be optimized by Bayesian Optimization for several steps. The corresponding surrogate model is stored in the dictionary for future episode if the same pipeline gets rolled out again. The performance of the pipeline on validation data will be used to serve as feedback signal or reward to the reinforcement learning agent to conduct policy iteration.
\REQUIREdataset , Operators and Hyper-parameters
Initialize MBO Surrogate Dictionary
\WHILEBudget not reached \STATERoll-out an unconfigured pipeline according to policy
\STATEGet hyper-parameters set for the ground pipeline
\STATEReward \STATEUpdate Policy with reinforcement learning algorithm \ENDWHILE
Once an unconfigured pipeline is constructed at the end of the episode, running Bayesian Optimization could be beneficial in searching for a more favorable hyper-parameter setting. However, Bayesian Hyperparameter Optimization with large budgets could be rather expensive. Instead, we optimize hyper-parameters for an un-configured pipeline only for several iterations. For example, we take the number of iterations to be 2 or 3 times the dimension of hyper-parameter space, which means that hyper-parameter spaces with higher dimension will get more sampling budgets. After each episode, the current best configuration’s performance for this pipeline in question is used as reward. The next time the same pipeline is sampled, the surrogate model could be retrieved from the dictionary to facilitate further search using MBO. We dub the hyper-parameter search process as MBO_PROBE, with details shown in Algorithm 3.4.22 2 To save budgets, when an un-configured pipeline does not improve after a number of trials of MBO_PROBE, it can also be suspended for future evaluation. If an un-configured pipeline is not sampled yet, an initial design is generated to facilitate an initial surrogate model.
\IF \STATEgenerate initial design of size for surrogate model;
\FOR in \STATEevaluate each design by initiating the pipeline with corresponding hyper-parameters .
\ENDFOR\STATEinitialize \ENDIF\FOR in \STATEpropose new point according to surrogate model \STATEevaluate new point \ENDFOR
best accuracy until now
4.1 Implementation, Comparision Methods and Setups
Our initial implementation for ReinBo is based on R machine learning packages mlr , mlrCPO  for pipeline construction and mlrMBO  for Bayesian Optimization. However, ReinBo could be extended to combine with any machine learning library as backends. The R package parabox33 3 https://github.com/smilesun/parabox is implemented for this project to specify conditional hierarchical hyper-parameter space and provides the conditional ancestral sampling (random search in conditional hyper-parameter space). The R package rlR44 4 https://github.com/smilesun/rlR is implemented for reinforcement learning where the user could implement a custom environment as input. All python packages are invoked with the R-Python interface reticulate .
To evaluate the performance of our proposed method, we compare the performance of ReinBo with several state of art AutoML systems, as well as several conditional hyper-parameter space tuning methods. We compare against Auto-sklearn  and TPOT , both based on scikit-learn . At the time of writing, there is no documentation and examples online for ML-Plan , thus, ML-Plan is not included in the experiment. Additionally, we compare against hyperparameter optimization techniques which preserve the hierarchical conditional structure, including Tree-structured Parzen Estimator (TPE)  used in Hyperopt , and Random Search with conditional Ancestral Sampling (self implemented in R package parabox). Random Search remains a very strong baseline in a lot of machine learning hyper-parameter optimization scenarios .
Evaluation Criteria As warned in , many state of art AutoML systems seem to have missed to deal with the risk of overfitting. Therefore, in the experiment part, we focus on evaluating the generalization capability of the selected pipeline empirically. To avoid any potential confusion from synonyms, we use to represent the part of a dataset fed into a given AutoML system and use to represent the locked out part  of the same dataset used to test the generalization capacity. The split of and is done by Cross Validation, which means for a dataset , and . To create the and split, we use 5-fold cross-validation (CV5), which corresponds to the outer loop of Nested Cross Validation (NCV). We take the aggregated mmce (mean miss-calssification error) across the 5 iterations of outer loop of NCV as performance measure.
Instead of using running time as budget, we use the number of pipeline configurations as the unit of budget, to account for performances variations due to hardware and network load etc. For , we assigned a budget of 1000 times of CV5 equivalent to each AutoML algorithm, which corresponds to the inner loop of NCV.
Other Setups To account for different AutoML systems data input format incompatibility problem, we conduct dummy encoding to categorical features beforehand. Aiming for a light weight implementation, in the experiment, we limit our choice of pipeline components for ReinBo. We compose a pipeline in 3 stages, with potential operations/actions at each stage listed in Table 1. Associated hyper-parameters with an un-configured pipeline are listed in Table 2. We call the components and associated hyper-parameters the pipeline pool. The same pipeline pool is used for ReinBo, TPE and Random Search.
For Auto-sklearn, Meta-learning and ensemble are disabled, the resampling strategy is set to be CV5, stop criteria is changed to budget instead of time and all other configurations are kept default. We have contacted the author of Autosklearn through github for the right use of the API to ensure the above configuration is satisfied. For TPOT (version 0.9), the default configuration space contains a lot of operators while the light version provides only fast models and pre-processors. The light TPOT is therefore less time-consuming but it could probably lead to lower accuracy in consequence. For this reason, we compare ReinBo with both TPOT with the default configuration and TPOT with light configuration, and we call them TPOT and TPOT-light respectively. TPOT is configured to allow equal amount of budgets with all methods being compared and other configurations are left to be default.
Datasets We experimented on a set of standard benchmarking datasets of high quality collected from OpenML-CC1855 5 https://www.openml.org/s/99  and Auto-Weka , which are rather well-curated from many thousands and have diverse numbers of classes, features, observations, as well as various ratios of the minority and majority class size. Summary of these datasets is listed in Table 3.
4.2 Experiment results
In Figure 3, we compare the mmce (1-Accuracy) of each method with boxplot over the datasets listed in Table 3 across 10 statistical replications. Additionally, we list numerical results in Table 4 with statistical test, where each numerical value represents the aggregated mean mmce over the statistical replications. Underline in each row indicates the smallest mean value over the corresponding dataset. The bold-faced values indicate that the corresponding algorithm does not perform significantly worse than the underlined algorithm on the corresponding dataset based on Mann-Whitney U test. As shown in Table 4, ML-ReinBo has boldfaces for 8 of 10 datasets followed by much less boldfaces from other methods.
In Table 5, we compare win (significantly better), lose and tie (neither significantly better nor worse) relationships according to the test. As shown in Table 5, ReinBo has won TPOT on 5 datasets and performed worse than TPOT for only one dataset. And not surprisingly, TPOT has performed considerably better than TPOT-light in the empirical experiments since TPOT-light has smaller search space with only fast models and preprocessors. ReinBo also performs much better than Random Search and TPE, where ReinBo has significantly won them on 5 and 6 tasks respectively and lost only on 1 task. Compared to ReinBo, Auto-sklearn has won only once and behaved worse than ReinBo on 3 of 10 datasets.
Meanwhile, ReinBo has comparatively short box ranges in most cases as shown in Figure 3. Hence, we would conclude that ReinBo performed better and more stably than other algorithms in our empirical experiments. Besides comparing the final performance, it is also interesting to look into the suggested machine learning pipelines by an AutoML system. The frequencies of the operators in the pipelines suggested by ReinBo are listed in Table 6.
|Preprocess||Freq.||Relative freq.||Feature engineering||Freq.||Relative freq.||Classifier||Freq.||Relative freq.|
Running time Figure 4 shows the average running time each algorithm takes to complete one experiment, which corresponds to a complete Nested Cross Validation (NCV) process.
From the figure, it can be seen that Auto-sklearn is the most time-consuming algorithm in our empirical experiments. Although TPOT-light is the fastest algorithm, it resulted in worse performance because it contains only fast operators. Our proposed ReinBo algorithm spent less time than Random Search and state of art AutoML systems TPOT and Auto-sklearn in average.
5 Summary and Future Work
We present a new AutoML algorithm ReinBo by embedding Bayesian Optimization into Reinforcement Learning. The Reinforcement Learning takes care of pipeline composition, and Bayesian Optimization takes care of configuring the hyper-parameters associated with the composed pipeline. ReinBo is inspired by Hyperband and previous efforts in AutoML by considering the trade-off of assigning resources to a particular configuration and exploring more configurations as a reinforcement learning problem, where the learned policy solves the trade-off automatically. Experiments show our method has a considerable improvement compared to other state of art systems and methods. For future work, it would be interesting to include meta learning into our system, which does not only learn how to construct a pipeline and configure it for a dataset in question, but also how to generalize the learned policy to a wide range of dataset by learning jointly on the meta features. Additionally, it would be nice to see how ReinBo performs on jointly optimizing neural architecture and continuous hyper-parameters like learning rate and momentum, as well as applications like Computer Vision  and semantic web based Recommendation Systems  where pipeline might play a role. Multi-Objective Bayesian Optimization  for the hyper-parameter tuning would be another future direction.
Janek Thomas has given us a lot of helpful suggestions, Martin Binder has helped us a lot with the mlrCPO setting, Florian Pfisterer has helped us with the Auto-sklearn setup. We thank all the above and Tieu-binh Ly for giving us a lot of suggestions for the language.
-  Allaire, J., Ushey, K., Tang, Y.: reticulate: Interface to ’Python’ (2019), https://CRAN.R-project.org/package=reticulate, r package version 1.11.1
-  Baker, B., Gupta, O., Naik, N., Raskar, R.: Designing neural network architectures using reinforcement learning. International Conference on Learning Representations (2017)
-  Banzhaf, W., Nordin, P., Keller, R.E., Francone, F.D.: Genetic programming: an introduction, vol. 1. Morgan Kaufmann San Francisco (1998)
-  Barto, A.G., Mahadevan, S.: Recent advances in hierarchical reinforcement learning. Discrete event dynamic systems 13(1-2), 41–77 (2003)
-  Bergstra, J., Bengio, Y.: Random search for hyper-parameter optimization. Journal of Machine Learning Research 13(Feb), 281–305 (2012)
-  Bergstra, J., Yamins, D., Cox, D.D.: Hyperopt: A python library for optimizing the hyperparameters of machine learning algorithms. In: Proceedings of the 12th Python in science conference. pp. 13–20. Citeseer (2013)
-  Bergstra, J.S., Bardenet, R., Bengio, Y., Kégl, B.: Algorithms for hyper-parameter optimization. In: Advances in neural information processing systems. pp. 2546–2554 (2011)
-  Binder, M.: mlrCPO: Composable Preprocessing Operators and Pipelines for Machine Learning (2019), https://CRAN.R-project.org/package=mlrCPO, r package version 0.3.4-2
-  Bischl, B., Casalicchio, G., Feurer, M., Hutter, F., Lang, M., Mantovani, R.G., van Rijn, J.N., Vanschoren, J.: Openml benchmarking suites and the openml100. arXiv preprint arXiv:1708.03731 (2017)
-  Bischl, B., Lang, M., Kotthoff, L., Schiffner, J., Richter, J., Studerus, E., Casalicchio, G., Jones, Z.M.: mlr: Machine learning in r. Journal of Machine Learning Research 17(170), 1–5 (2016), http://jmlr.org/papers/v17/15-066.html
-  Bischl, B., Richter, J., Bossek, J., Horn, D., Thomas, J., Lang, M.: mlrmbo: A modular framework for model-based optimization of expensive black-box functions. arXiv preprint arXiv:1703.03373 (2017)
-  Brochu, E., Cora, V.M., De Freitas, N.: A tutorial on bayesian optimization of expensive cost functions, with application to active user modeling and hierarchical reinforcement learning. arXiv preprint arXiv:1012.2599 (2010)
-  Dietterich, T.G.: Hierarchical reinforcement learning with the maxq value function decomposition. Journal of Artificial Intelligence Research 13, 227–303 (2000)
-  Drori, I., Krishnamurthy, Y., Rampin, R., de Paula Lourenco, R., Ono, J.P., Cho, K., Silva, C., Freire, J.: Alphad3m: Machine learning pipeline synthesis. In: AutoML Workshop at ICML (2018)
-  Feurer, M., Klein, A., Eggensperger, K., Springenberg, J., Blum, M., Hutter, F.: Efficient and robust automated machine learning. In: Advances in Neural Information Processing Systems. pp. 2962–2970 (2015)
-  Horn, D., Dagge, M., Sun, X., Bischl, B.: First investigations on noisy model-based multi-objective optimization. In: International Conference on Evolutionary Multi-Criterion Optimization. pp. 298–313. Springer (2017)
-  Hutter, F., Hoos, H.H., Leyton-Brown, K.: Sequential model-based optimization for general algorithm configuration. In: International Conference on Learning and Intelligent Optimization. pp. 507–523. Springer (2011)
-  Jiulong, Z., Luming, G., Su, Y., Xudong, S., Xiaoshan, L.: Detecting chinese calligraphy style consistency by deep learning and one-class svm. In: 2017 2nd International Conference on Image, Vision and Computing (ICIVC). pp. 83–86. IEEE (2017)
-  Kulkarni, T.D., Narasimhan, K., Saeedi, A., Tenenbaum, J.: Hierarchical deep reinforcement learning: Integrating temporal abstraction and intrinsic motivation. In: Advances in neural information processing systems. pp. 3675–3683 (2016)
-  Kushwaha, N., Sun, X., Singh, B., Vyas, O.: A lesson learned from pmf based approach for semantic recommender system. Journal of Intelligent Information Systems 50(3), 441–453 (2018)
-  Li, L., Jamieson, K., DeSalvo, G., Rostamizadeh, A., Talwalkar, A.: Hyperband: A novel bandit-based approach to hyperparameter optimization. The Journal of Machine Learning Research 18(1), 6765–6816 (2017)
-  Mnih, V., Kavukcuoglu, K., Silver, D., Rusu, A.A., Veness, J., Bellemare, M.G., Graves, A., Riedmiller, M., Fidjeland, A.K., Ostrovski, G., et al.: Human-level control through deep reinforcement learning. Nature 518(7540), 529 (2015)
-  Mohr, F., Wever, M., Hüllermeier, E.: Ml-plan: Automated machine learning via hierarchical planning. Machine Learning 107(8-10), 1495–1515 (2018)
-  Olson, R.S., Moore, J.H.: Tpot: A tree-based pipeline optimization tool for automating machine learning. In: Workshop on Automatic Machine Learning. pp. 66–74 (2016)
-  Pedregosa, F., Varoquaux, G., Gramfort, A., Michel, V., Thirion, B., Grisel, O., Blondel, M., Prettenhofer, P., Weiss, R., Dubourg, V., Vanderplas, J., Passos, A., Cournapeau, D., Brucher, M., Perrot, M., Duchesnay, E.: Scikit-learn: Machine learning in Python. Journal of Machine Learning Research 12, 2825–2830 (2011)
-  de Sá, A.G., Pinto, W.J.G., Oliveira, L.O.V., Pappa, G.L.: Recipe: a grammar-based framework for automatically evolving classification pipelines. In: European Conference on Genetic Programming. pp. 246–261. Springer (2017)
-  Silver, D., Schrittwieser, J., Simonyan, K., Antonoglou, I., Huang, A., Guez, A., Hubert, T., Baker, L., Lai, M., Bolton, A., et al.: Mastering the game of go without human knowledge. Nature 550(7676), 354 (2017)
-  Sun, X., Bommert, A., Pfisterer, F., Rahnenführer, J., Lang, M., Bischl, B.: High dimensional restrictive federated model selection with multi-objective bayesian optimization over shifted distributions (2019)
-  Thornton, C., Hutter, F., Hoos, H.H., Leyton-Brown, K.: Auto-weka: Automated selection and hyper-parameter optimization of classification algorithms. CoRR abs/1208.3719 (2012), http://arxiv.org/abs/1208.3719
-  Thornton, C., Leyton-Brown, K.: Auto-weka: Automated selection and hyper-parameter optimization of classification algorithms (2012)
-  Watkins, C.J., Dayan, P.: Q-learning. Machine learning 8(3-4), 279–292 (1992)
-  Yang, F., Gustafson, S., Elkholy, A., Lyu, D., Liu, B.: Program search for machine learning pipelines leveraging symbolic planning and reinforcement learning. In: Genetic Programming Theory and Practice XVI, pp. 209–231. Springer (2019)
-  Yang, F., Lyu, D., Liu, B., Gustafson, S.: Peorl: Integrating symbolic planning and hierarchical reinforcement learning for robust decision-making. arXiv preprint arXiv:1804.07779 (2018)
-  Zoph, B., Le, Q.V.: Neural architecture search with reinforcement learning. arXiv preprint arXiv:1611.01578 (2016)