We present a multi-objective evolutionary optimization algorithm that usesGaussian process (GP) regression-based models to select trial solutions in amulti-generation iterative procedure. In each generation, a surrogate model isconstructed for each objective function with the sample data. The models areused to evaluate solutions and to select the ones with a high potential beforethey are evaluated on the actual system. Since the trial solutions selected bythe GP models tend to have better performance than other methods that only relyon random operations, the new algorithm has much higher efficiency in exploringthe parameter space. Simulations with multiple test cases show that the newalgorithm has a substantially higher convergence speed and stability thanNSGA-II, MOPSO, and some other more recent algorithms.
Quick Read (beta)
Multi-objective multi-generation Gaussian process optimizer for design optimization
We present a multi-objective evolutionary optimization algorithm that uses Gaussian process (GP) regression-based models to select trial solutions in a multi-generation iterative procedure. In each generation, a surrogate model is constructed for each objective function with the sample data. The models are used to evaluate solutions and to select the ones with a high potential before they are evaluated on the actual system. Since the trial solutions selected by the GP models tend to have better performance than other methods that only rely on random operations, the new algorithm has much higher efficiency in exploring the parameter space. Simulations with multiple test cases show that the new algorithm has a substantially higher convergence speed and stability than NSGA-II, MOPSO, and some other more recent algorithms.
The design of a complex system often requires the search of the ideal solution among a multi-variable parameter space. The ideal solution may involve a trade-off of competing performance requirements. In recent years, multi-objective evolutionary algorithms (MOEAs) have been widely adopted to discover the set of solutions with the best performances, i.e., the Pareto front. These include multi-objective genetic algorithms (MOGA) [1, 2, 3] and multi-objective particle swarm optimization (MOPSO) [4, 5, 6, 7].
In the particle accelerator field, there are many challenging design optimization problems, such as lattice design for synchrotron light sources [8, 9, 10], beamline design for photoinjectors , and cavity design for superconducting Radio Frequency (SRF) components. Numeric optimization is often used in the design studies. In a design optimization study, many trial solutions will be evaluated, and typically an evaluation involves the numeric simulation of the physics processes that affect the system performance. Such a simulation could be computationally expensive (e.g., hours), especially as the current trend is to build in as many details into the physics model as possible. On the other hand, accelerator projects often have tight schedules, with limited time available for design studies. Therefore, high efficiency of the optimization algorithm is crucial. The fast convergence requirement means that the optimizer not only needs to be able to converge to the true Pareto front with a small number of evaluations, but also has a small computation overhead. In addition, the performance of the optimizer needs be stable in order to avoid the need to rerun the same optimization multiple times.
Both MOGA [11, 8, 9] and MOPSO [12, 10] algorithms have found use in accelerator design studies in recent years. In MOGA and MOPSO algorithms, an iterative process is executed to update a population of solutions. During each iteration, which may be referred to as a generation, new trial solutions are generated and evaluated. Both methods employ stochastic operations to produce new solutions with existing good solutions, although the details differ. These operations are heuristically effective, but are intrinsically inefficient as the new solutions are not based on any valid prediction. There is a strong incentive to develop more powerful methods as the design of future accelerators is becoming more challenging.
Several novel techniques are adopted in some of the more recently developed MOEAs, such as objective decomposition (MOEA/D ), multiple search strategies (MMOPSO ) and problem transformation scheme (WOF-SMPSO ), to tackle complex or large-scale multi-objective optimization problems (MOPs). These new algorithms can be considerably faster than the conventional MOGA or MOPSO algorithms. However, like the MOGA and MOPSO, the new algorithms do not make full use of the information in the evaluated solutions to assist the search for the Pareto front.
Surrogate assisted MOEAs [13, 14, 15] have been developed to improve the efficiency of the algorithms. These algorithms are often based on posterior Gaussian process (GP) [16, 17, 18, 19, 20, 21] models. A posterior Gaussian process is a non-parametric, analytic model derived from a prior Gaussian process and the sample data, based on the Bayesian inference. It serves as a surrogate model of the actual physics model that represents the system and is used to produce the sample data. The GP model can be used to predict the performance of solutions not yet evaluated, along with an uncertainty estimate.
MOEA/D-EGO  and ParEGO  are two of the GP assisted algorithms that have excellent performance for many optimization problems. However, the overhead of these surrogate assisted MOEAs is usually significant for problems with a high-dimensional decision space, due to the high time complexity for the common techniques involved in these algorithms, such as population clustering, acquisition function optimization, and drop-out strategy. For example, in a test case with the ZDT test functions  of 30 decision variables, the data preprocessing before the GP model building in MOEA/D-EGO can cost around 1 hour at the beginning, and the time complexity grew cubically with each iteration. ParEGO performed better with regard to the computation overhead. However, it suffers from a low sample number per iteration, with only 1 sample data point per iteration, as compared to 5 or more in MOEA/D-EGO. Therefore, it is not time efficient and cannot take advantage of parallel computing capability, which is common in today’s computing environments. In addition, the performance of the algorithms that employ the aforementioned techniques suffer from the curse of dimensionality . Therefore, they are rarely used to solve problems with a relatively high dimensional () decision space . The difficulty to extend to high-dimension problems and the high computation overhead for these algorithms limit their usefulness in many design studies.
Neural network (NN)-based surrogate models have also been proposed to assist optimization algorithms [24, 25, 26]. The NN models are used to calculate or substitute the function evaluations, or help select an offspring solution for evaluation. A recent approach proposed in the accelerator field is to train a neural network as the surrogate model, from which the Pareto front can be found with optimization . The challenge for this approach is that a large amount of data points may be needed to construct a global NN-based surrogate model that is sufficiently accurate to determine the Pareto front with it. Transfer learning, i.e., application of a learned model from one design problem to another, is usually not applicable. This is because design problems, even for those with the same parameter setup, are very different from each other due to the nonlinear nature of the problems. Therefore, simulation data for one problem are typically of little use for another problem.
In this study, we propose a multi-objective multi-generation Gaussian process optimizer (MG-GPO) for design optimization. Similar to MOGA and MOPSO, it generates and manipulates a population of solutions with stochastic operations in an iterative manner. The difference is that posterior GP models are constructed and updated in each iteration and are used to select the trial solutions for the actual evaluation. The model-based selection substantially boosts the efficiency of the algorithm. The MG-GPO algorithm differs from other surrogate-assisted evolutionary algorithms, such as MOEA/D-EGO and ParEGO, in that the GP models are only used for filtering, instead of for the generation of new solutions (e.g., through optimization). This reduces the requirement for high accuracy in the model and consequently lowers the computation overhead, and give the algorithm high robustness and reliability. The implementation of the algorithm is also simple and straightforward. Because of such features, it is easy to apply to high-dimensional problems. In addition, as non-dominated sorting is used both to select trial solutions with the GP models and to select the fittest solutions for the next generation, constraints can be easily integrated by including them in the sorting criteria.
The paper is organized as follows. In Section II we give a brief introduction to the Gaussian process regression and optimization. The multi-generation GP optimizer is described in Section III. A test of the new optimizer with analytic functions is presented in Section IV. The conclusion is given in Section V.
II Gaussian process regression and optimization
The Gaussian process regression is a type of Bayesian inference, in which one combines a prior statistical model and the observed evidences to deduce knowledge of the actual statistical model, based on Bayes’ theorem of the conditional probabilities.
A Gaussian process is a statistical model of the distribution of a random function over space or time (distribution of a parameter space is assumed in the present context). The GP not only gives the probability distribution of the function at one location, but also its joint distribution with the function value at any other location. The joint distribution is a normal distribution. For an unknown function over a parameter space, a prior Gaussian process can be specified with the prior mean function and the kernel function , where and are vectors that represent points in the parameter space. Without any knowledge about the function, the prior mean is often assumed . The kernel function represents the covariance of the function values at two locations. It is often assumed to take the squared exponential form [20, 21],
where is the estimated variance of the function, is a diagonal matrix and the parameters specify the correlation of the function values at two points separated in space in the direction of coordinate.
After a number of sample data points, given as , , , , , are taken from the parameter space, we would like to know the function value at a new point . From the prior GP, the joint distribution of the sample data and the new point is given by a multi-variate normal distribution,
where is the kernel matrix, whose elements are , and the kernel vector is given by . The prior joint distribution function, Eq. 2, and the evidence by the sample data set allow us to calculate the conditional distribution of the function value at point , which is a normal distribution given by its mean and standard deviation ,
The expected mean, , is an estimate of the function value and the standard deviation gives the uncertainty.
Eqs. 3-4 are the posterior model of the actual function. It is worth noting that this is a non-parametric model. The sample data enter the model directly. The posterior distribution not only can be used to predict the function values in the parameter space, but also can be used to optimize the function.
In a GP optimizer, the posterior model is used to choose the next trial solution. With the posterior GP, an optimization algorithm is used to look for a point that the model predicts to yield the largest gain, which is then evaluated on the real system. After that, the new data point enters the sample data set and the GP model is updated accordingly. The measure of the gain is represented by the acquisition function, a popular choice of which is the upper confidence bound (UCB) for a maximization problem . For a minimization problem, it is the lower confidence bound (LCB), given by
where is a constant. A suitable value of is used to balance the exploitation and the exploration strategies - a small favors exploitation and a large favors exploration. Taking a large is to take some risk by going into the less certain area in the parameter space in exchange for the opportunity to yield a big gain.
After every new data point is added, the GP model is updated, which requires the inversion of the kernel matrix. During the search for the trial solution, many matrix multiplications are performed. These calculation can be time consuming if the dimension of the matrix is large. Therefore, the size of the data set is often limited to the order of hundreds.
III Multi-generation Gaussian process optimizer
The ability of the posterior GP model to approximate the actual model and to predict the performance of a new solution can be very useful in design optimization, where it is common to evaluate thousands or tens of thousands solutions in the search for the optimal design. A design study often has multiple objectives. In the following we propose a multi-objective, multi-generation Gaussian process optimization algorithm that would be ideal for design optimization.
Presently MOGA and MOPSO algorithms are widely used in the design optimization of accelerators. A popular MOGA algorithm is the NSGA-II . It takes an iterative scheme to update a population of solutions. At each iteration, it generates new trial solutions based on the existing ones, using the simulated binary crossover (SBX)  and polynomial mutation operations . In a crossover two solutions are combined to generate a pair of new solutions randomly distributed in between, while a mutation operation modifies a solution with random changes to the parameters. The new trial solutions are evaluated and compared to the existing solutions with a non-dominated sorting. Some solutions replace the existing ones and enter the next generation if they outperform the latter.
The MOPSO [4, 5] algorithm also manipulates a population of solutions iteratively. In this case, each solution is considered a particle in the parameter space. New solutions are generated by shifting the existing solutions in the parameter space by an offset called the velocity. The velocity consists of contributions from three terms: the previous velocity, a shift toward the best solution of the history of the particle (the personal best), and a shift toward a solution in the global best solutions. The velocity and the personal and global best solutions are updated at every iteration.
The MOGA and MOPSO algorithms work because the operations used to generate new solutions tend to produce solutions toward the direction with better performances, which are then selected and used for the next generation. However, there is no guarantee that the crossover and mutation operations or the shift by the velocity will yield better solutions. No information is extracted from the previous function evaluations other than the selection of the best solutions.
When we apply GP regression to model the existing solutions, we would be able to determine which new solutions have a high probability of yielding good performances. We can optimize with the posterior GP model to produce promising trial solutions. Or we can simply generate a large quantity of potential new solutions, evaluate them with the GP model, and use the outcome to select the solutions with a potential to yield a significant improvement. By selecting only these solutions for the computationally expensive function evaluation, we could substantially improve the efficiency of the algorithm.
The new algorithm, which may be referred to as the multi-objective, multi-generation Gaussian process optimizer (MG-GPO), also works iteratively. The initial population of solutions may be randomly generated, throughout the parameter space, or within a small region in the parameter space. The population of solutions, , is fixed.
At each iteration, new solutions will be generated and evaluated. The set of solutions evaluated on iteration may be labeled . The set is combined with the best solutions from the last iteration, which form a set labeled , and the combined set is sorted with the non-dominated sorting , from which the population of best solutions is updated.
A GP model is constructed for each objective, which has its own set of model parameters, and . We also give the prior GP model a non-zero mean, . The value of and are given by the mean and standard deviation of the function values of the previous data set, respectively. With the non-zero mean, Eq. 3 is replaced with
The use of a non-zero mean helps avoid an abrupt change in the function value when searching in the transition region between the sampled area and the un-sampled areas. A wrong mean value could produce a bias that either pull the search into the unexplored territory or prevent the search into it.
While it is possible to use a multi-objective optimization algorithm to optimize the surrogate models, produce the Pareto front, and use the solutions in the Pareto front for the actual evaluation, we adopt a simple approach to sample the area around the existing best solutions. New solutions are generated through the mutation and crossover operations. For each solution in the previous population of best solutions, , new solutions are by mutation and another solutions are by crossover. Mutation is done by the polynomial mutation (PLM) technique . Crossover is done with the SBX technique, as is done in NSGA-II. Obviously, there could be better ways to generate new solutions, for example, by using the gradient afforded by the posterior GP model. Nonetheless, the present simple approach is adequate to demonstrate the advantage of the GP method. Besides, we can always increase the number of new solutions to improve the sampling of the GP models as the cost of evaluating the GP models is usually negligible compared to the actual physics simulation.
The solutions are then evaluated with the GP models, which give the expected mean and standard deviation for each objective function. We choose the GP-LCB acquisition functions as the figure of merit for the solutions. A non-dominated sorting is then performed over the solutions, from which solutions are selected for the actual design simulation. These solutions form the set , which is then combined with and another non-dominated sorting is used to updated the best solutions, yielding .
The parameter in GP-LCB can have a significant impact to the behavior of the algorithm. A large value encourages exploration of the parameter space but in the same time may not take full advantage of the learned model. Conversely, a small value better exploits the model but may not sufficiently explore the parameter space. It could be argued that at the beginning of an optimization a large is preferred as more exploration is needed in order to discover the area in the parameter space with good solutions. A small would be preferred in the later stage as the algorithm converges to a relatively small area where a refined search is needed. Figure 1 and 2 compares the convergence of the MG-GPO algorithm with several schemes for the ZDT3 test case (see next section). It was found that decreasing exponentially generation by generation from toward zero, with , gives the best performance for the problem. The optimal decreasing factor may depend on the optimization problem and the optimization setup. Ideally, an adaptive scheme of decreasing over generations, using certain performance metrics (e.g., hyper-volume) as the guide would be preferred.
The GP models are updated at the end of the iteration. The sample data used for the GP models are the combined set of and . There will be some redundant data points, as some solutions in has just entered . The duplicate points can be eliminated. It can also be left in, as it does not pose a difficulty, when singular value decomposition (SVD) is used in the matrix inversion in Eq. 6..
The MG-GPO algorithm is summarized below (with being the maximum number of generations, the decreasing factor for ) \[email protected]@algorithmic \STATE, Initialize the population, . Initialize . \STATEEvaluate all solutions in \STATEConstruct Gaussian process models, , with \WHILE \STATE \STATEUpdate with . \STATEFor each solution in , generate solutions with mutation and solutions with crossover. \STATEEvaluate the solutions with \STATEUse non-dominated sorting to select best solutions, which forms the set . \STATEEvaluate the solutions in in the actual system. \STATEUse non-dominated sorting to select best solutions from the combined set of and , the results of which form . \STATEConstruct Gaussian process models, , with solutions in and . \ENDWHILE
In the above algorithm, new trial solutions are generated with mutation and crossover operations, as is done in NSGA-II. The new trial solutions can also been generated with the moving particle approach as done in MOPSO. The attraction terms by the global and personal best solutions can be varied and selected by the GP models. We tested the approach and found that similar improvement in convergence efficiency can be made.
IV Performance comparisons with other algorithms
Simulations were conducted to demonstrate the fast convergence of the MG-GPO algorithm in comparison with a few commonly used multi-objective optimization algorithms. The algorithms selected for comparison include two classic algorithms, NSGA-II and MOPSO, and two more recent ones, MMOPSO  and WOF-SMPSO [31, 32, 7]. The NSGA-II, MMOPSO, and WOF-SMPSO codes used in the tests are from the PlatEMO platform. The MOPSO algorithm is the same as used in Ref.  and was based on the framework described in .
IV-A Test Instances
Four test cases have been used to test the performance of the MG-GPO algorithm in comparison to the NSGA-II, MOPSO, MMOPSO and WOF-SMPSO algorithms. These test cases are commonly used for algorithm performance comparison, for example, in Ref. . All test cases have two objective functions, and assumed to be minimization problems.
IV-A1 ZDT1 
The ZDT1 test case is defined by
The parameter ranges are [0, 1] for all variables. The optimal solutions are with for , 3, , and .
IV-A2 ZDT2 
ZDT2 is defined similarly as ZDT1, except that the function is redefined as
The ranges of the parameters are the same as ZDT1.
IV-A3 ZDT3 
The definition is similar to ZDT1, except that the function is redefined as
The ranges of the parameters are the same as ZDT1. The Pareto front of this case consists of disconnected stripes.
IV-A4 ZDT6 
ZDT6 test case is defined as
The ranges of the parameters are the same as ZDT1. Its Pareto front is nonconvex, and the distribution of the Pareto solutions is highly nonuniform.
The dimension of the test cases was chosen to be . This may represent the mid-dimensional decision spaces in design studies. Tests were also done with for ZDT1 and ZDT2. The cases represent high-dimensional design problems.
IV-B Experimental Settings
IV-B1 General Settings
For the MG-GPO implementation, the parameter range is normalized to . The mutation (PLM) and crossover (SBX) control parameters are set to and , respectively. To balance the need for initial exploration in the early stage and refined search in the later stage, the parameter in the GP-LCB acquisition function is exponentially decreased, initially with and with each generation it is scaled down by the factor . The multiplication factors are set to . In all test cases, the correlation length parameters of MG-GPO are optimized in each generation with respect to the data samples used for GP model construction. The GPy package is used for hyper-parameter optimization .
For NSGA-II, the crossover probability is set to . The distribution indices for the simulated binary crossover (SBX) and mutation operations are and , respectively .
For MOPSO, the weight factors in the velocity composition are and . The MOPSO algorithm also includes a mutation operation, with a probability rate of , where is the number of variables.
For MMOPSO, there are no algorithm-specific hyper parameters to modify.
For WOF-SMPSO, all hyper parameters are set to the default values, as suggested in Ref. .
The initial solutions are randomly distributed, with parameters drawn from a uniform distribution in the parameter range.
IV-B2 Mid-dimensional Settings
The mid-dimensional test problems are ZDT1, ZDT2, ZDT3 and ZDT6. The dimensions of the test problems are set to . The population size is set to for all algorithms. The algorithms are run for 4080 evaluations. Each test instance was repeated 10 times.
IV-B3 High-dimensional Settings
The high-dimensional test problems are ZDT1 and ZDT2. The dimensions of the test problems are set to . The population size is set to for all algorithms. The algorithms are run for 8080 evaluations. Each test instance was repeated 10 times.
IV-C Experimental Results
Figure 3 shows four snapshots of the distribution of the non-dominated front in the population of solutions during the course of optimization for the five algorithms being compared for all four test problems with , with the number of evaluations from 1000 to 4000, at an 1000 interval. The case with the median behavior among the 10 tests for each algorithm is shown.
For all four test problems, MG-GPO converges the fastest among the five algorithms. For ZDT1, MG-GPO nearly converges to the Pareto front within 1000 evaluations, followed by MMOPSO and WOF-SMPSO. For ZDT2, MG-GPO also converges within approximately 1000 evaluations, followed by WOF-SMPSO, which converges with 4000 evaluations. For ZDT3 and ZDT6, MG-GPO takes 2000 and 3000 evaluations to converge, respectively.
To better describe the convergence history of the algorithms, Figure 4 shows the evolution of HV and IGD metrics for the algorithms for the cases. For each algorithm, the median and the mean values for the 10 tests are shown. The spread of the metrics among the 10 tests for each algorithm is shown with shaded areas. Clearly, MG-GPO converges faster than the other algorithms. In addition, the performance of MG-GPO is very stable. The spread of the metrics for MG-GPO is considerably smaller than the other algorithms.
Simulations for test cases ZDT1 and ZDT2 with were done to demonstrate the performance of the MG-GPO algorithm for high-dimensional optimization problems. Figure 5 shows the non-dominated front of solutions for the algorithms at four generations. For both problems, MG-GPO converges to the Pareto front within 4000 evaluations, leading the other algorithms by a substantial margin. The evolution of HV and IGD metrics is shown in Figure 6, which clearly indicates that MG-GPO does not only converge fastest, but also has a stable performance.
The algorithm performance comparison results are also summarized in tables. Table I and II show the IGD and HV metrics, respectively, for the cases. The best value, the means, and the standard deviations (for the 10 runs) are listed for each algorithm at the four instants during the runs. Similarly, Table III and IV show the results for the cases. The tables show that MG-GPO converges significantly faster than the other algorithms.
The results of the Wilcoxon tests at a 0.05 significance level of MG-GPO vs. the other four algorithms are listed in Table VI and V for the HV and IGD metrics, respectively, for the cases. Table VIII and VII show the results for the cases. In the tables, the value 1 denotes that MG-GPO wins, 0 for no winner, -1 for that the other algorithm wins, and N/As in the HV tables denote that neither algorithm has reached a positive HV among the 10 runs. For the IGD tests, MG-GPO wins all tests against NSGA-II and MOPSO, wins 11 times and draws 5 times against MMOPSO, and wins 15 times and draws 1 time against WOF-SMPSO (for a total of 16 tests between the two algorithms). Similar result were found for the HV tests. Note that MG-GPO loses once to MMOPSO on ZDT1 at evaluation number 4000, but cross examination with Table II shows that the mean HV difference is quite small (). For the cases, MG-GPO wins almost all the tests, except with 1 draw with WOF-SMPSO on ZDT1 at evaluation number 2000 in IGD and HV. The Wilcoxon test results clearly indicate that MG-GPO has a significant better performance over the other algorithms being compared. This performance advantage is even bigger for the higher dimensional problems.
The reference point in the HV calculation for all the test instances is set to .
The zero HV value indicates that for all 10 runs no solution has reached the reference point.
The reference point in the HV calculation for all the test instances is set to .
The zero HV value has the same meaning as in Table II.
1 denotes that MG-GPO wins, 0 for no winner, -1 for that the other algorithm wins.
1, 0, and -1 have the same meaning as in Table V. N/A denotes that neither algorithm has reached a positive HV among the 10 runs.
1, 0, and -1 have the same meaning as in Table V.
1, 0, -1 and N/A have the same meaning as in Table VI.
We proposed a new multi-objective stochastic optimization algorithm that is based on Gaussian process regression. The new algorithm update a population of solutions iteratively. At each iteration, it constructs a posterior Gaussian process and uses it as a surrogate model of the actual system to be optimized. A large number of candidate solutions are generated and evaluated with the surrogate model and the results are used to select a small number of promising solutions to be evaluated on the real system (e.g., by physics simulation).
The new algorithm, referred to as multi-generation Gaussian process optimizer (MG-GPO), has been tested with analytic functions. In all test cases, a substantially faster convergence speed is found than the classic evolutionary algrithms, NSGA-II and MOPSO, as well as two more recent algorithms, MMOPOSO and WOF-SMPSO. The new algorithm would be very useful for design optimization of large systems where a search for optimal solutions in a multi-dimensional parameter space is needed.
Future improvement could be made to the MG-GPO algorithm by adopting an adaptive scheme to update the hyper-parameter for the GP-LCB acquisition function.
This work was supported by the U.S. Department of Energy, Office of Science, Office of Basic Energy Sciences, under Contract No. DE-AC02-76SF00515 and FWP 2018-SLAC-100469 and by Computing Science, Office of Advanced Scientific Computing Research under FWP 2018-SLAC-100469ASCR.
-  K. Deb and D. Kalyanmoy, Multi-Objective Optimization Using Evolutionary Algorithms. New York, NY, USA: John Wiley & Sons, Inc., 2001.
-  K. Deb, A. Pratap, S. Agarwal, and T. Meyarivan, “A fast and elitist multiobjective genetic algorithm: Nsga-ii,” IEEE Transactions on Evolutionary Computation, vol. 6, no. 2, pp. 182–197, April 2002.
-  Q. Zhang and H. Li, “Moea/d: A multiobjective evolutionary algorithm based on decomposition,” IEEE Transactions on Evolutionary Computation, vol. 11, no. 6, pp. 712–731, 2007.
-  J. Kennedy and R. Eberhart, “Particle swarm optimization,” in Proceedings of ICNN’95 - International Conference on Neural Networks, vol. 4, Nov 1995, pp. 1942–1948 vol.4.
-  M. Clerc and J. Kennedy, “The particle swarm - explosion, stability, and convergence in a multidimensional complex space,” IEEE Transactions on Evolutionary Computation, vol. 6, no. 1, pp. 58–73, Feb 2002.
-  Q. Lin, J. Li, Z. Du, J. Chen, and Z. Ming, “A novel multi-objective particle swarm optimization with multiple search strategies,” European Journal of Operational Research, vol. 247, no. 3, pp. 732–744, 2015.
-  H. Zille, H. Ishibuchi, S. Mostaghim, and Y. Nojima, “A framework for large-scale multiobjective optimization based on problem transformation,” IEEE Transactions on Evolutionary Computation, vol. 22, no. 2, pp. 260–275, 2018.
-  L. Yang, D. Robin, F. Sannibale, C. Steier, and W. Wan, “Global optimization of the magnetic lattice using genetic algorithms,” in Proceedings of EPAC’08, 2008, pp. 3050–3052.
-  M. Borland, V. Sajaev, L. Emery, and A. Xiao, “Direct methods of optimization of storage ring dynamic and momentum aperture,” in Proceedings of PAC’09, 2009, pp. 3850–3852.
-  X. Huang and J. Safranek, “Nonlinear dynamics optimization with particle swarm and genetic algorithms for spear3 emittance upgrade,” Nuclear Instruments and Methods in Physics Research Section A: Accelerators, Spectrometers, Detectors and Associated Equipment, vol. 757, pp. 48 – 53, 2014. [Online]. Available: http://www.sciencedirect.com/science/article/pii/S0168900214004914
-  I. V. Bazarov and C. K. Sinclair, “Multivariate optimization of a high brightness dc gun photoinjector,” Phys. Rev. ST Accel. Beams, vol. 8, p. 034202, Mar 2005. [Online]. Available: https://link.aps.org/doi/10.1103/PhysRevSTAB.8.034202
-  X. Pang and L. Rybarcyk, “Multi-objective particle swarm and genetic algorithm for the optimization of the lansce linac operation,” Nuclear Instruments and Methods in Physics Research Section A: Accelerators, Spectrometers, Detectors and Associated Equipment, vol. 741, pp. 124 – 129, 2014. [Online]. Available: http://www.sciencedirect.com/science/article/pii/S0168900213017464
-  Q. Zhang, W. Liu, E. Tsang, and B. Virginas, “Expensive multiobjective optimization by moea/d with gaussian process model,” IEEE Transactions on Evolutionary Computation, vol. 14, no. 3, pp. 456–474, 2010.
-  T. Chugh, Y. Jin, K. Miettinen, J. Hakanen, and K. Sindhya, “A surrogate-assisted reference vector guided evolutionary algorithm for computationally expensive many-objective optimization,” IEEE Transactions on Evolutionary Computation, vol. 22, no. 1, pp. 129–142, 2018.
-  J. Knowles, “ParEGO: a hybrid algorithm with on-line landscape approximation for expensive multiobjective optimization problems,” IEEE Transactions on Evolutionary Computation, vol. 10, no. 1, pp. 50–66, 2006.
-  H. J. Kushner, “A new method of locating the maximum point of an arbitrary multipeak curve in the presence of noise,” Journal of Basic Engineering, vol. 86, no. 1, p. 97, 1964. [Online]. Available: https://doi.org/10.11152F1.3653121
-  A. G. Zhilinskas, “Single-step bayesian search method for an extremum of functions of a single variable,” Cybernetics and Systems Analysis - CYBERN SYST ANAL-ENGL TR, vol. 11, pp. 160–166, 01 1975.
-  J. Mockus, “On bayesian methods for seeking the extremum and their application.” in IFIP Congress, 1977, pp. 195–200. [Online]. Available: http://dblp.uni-trier.de/db/conf/ifip/ifip1977.html#Mockus77
-  D. R. Jones, M. Schonlau, and W. J. Welch, “Efficient global optimization of expensive black-box functions,” Journal of Global Optimization, vol. 13, no. 4, pp. 455–492, Dec 1998. [Online]. Available: https://doi.org/10.1023/A:1008306431147
-  C. E. Rasmussen and C. K. I. Williams, Gaussian Processes for Machine Learning. the MIT Press, 2006.
-  E. Brochu, V. M. Cora, and N. de Freitas, “A tutorial on bayesian optimization of expensive cost functions, with application to active user modeling and hierarchical reinforcement learning,” CoRR, vol. abs/1012.2599, 2010. [Online]. Available: http://arxiv.org/abs/1012.2599
-  E. Zitzler, K. Deb, and L. Thiele, “Comparison of multiobjective evolutionary algorithms: Empirical results,” Evol. Comput., vol. 8, no. 2, pp. 173–195, Jun. 2000. [Online]. Available: http://dx.doi.org/10.1162/106365600568202
-  R. Bellman, R. Corporation, and K. M. R. Collection, Dynamic Programming, ser. Rand Corporation research study. Princeton University Press, 1957. [Online]. Available: https://books.google.it/books?id=wdtoPwAACAAJ
-  A. Syberfeldt, H. Grimm, A. Ng, and R. I. John, “A parallel surrogate-assisted multi-objective evolutionary algorithm for computationally expensive optimization problems,” in 2008 IEEE Congress on Evolutionary Computation (IEEE World Congress on Computational Intelligence), 2008, pp. 3177–3184.
-  P.-G. Liu, X. Han, and C. Jiang, “A novel multi-objective optimization method based on an approximation model management technique,” Computer Methods in Applied Mechanics and Engineering, vol. 197, pp. 2719–2731, 06 2008.
-  G. Kourakos and A. Mantoglou, “Development of a multi-objective optimization algorithm using surrogate models for coastal aquifer management,” Journal of Hydrology, vol. 479, p. 13–23, 02 2013.
-  A. Edelen, A. Adelmann, N. Neveu, Y. Huber, and M. Frey, “Machine learning to enable orders of magnitude speedup in multi-objective optimization of particle accelerator systems,” https://arxiv.org/pdf/1903.07759.pdf, March 2019.
-  P. Auer, “Using confidence bounds forexploitation-exploration trade-offs,” ournal of Machine Learning Research, vol. 3, pp. 397–422, 2002.
-  K. Deb and R. B. Agrawal, “Simulated binary crossover for continuous search space,” Complex Systems, vol. 9, pp. 115–148, 1995.
-  K. Liagkouras and K. Metaxiotis, “An elitist polynomial mutation operator for improved performance of moeas in computer networks,” in 2013 22nd International Conference on Computer Communication and Networks (ICCCN). IEEE, 2013, pp. 1–5.
-  H. Zille, H. Ishibuchi, S. Mostaghim, and Y. Nojima, “Weighted optimization framework for large-scale multi-objective optimization,” in Proceedings of the 2016 on Genetic and Evolutionary Computation Conference Companion, ser. GECCO ’16 Companion. New York, NY, USA: Association for Computing Machinery, 2016, p. 83–84. [Online]. Available: https://doi.org/10.1145/2908961.2908979
-  H. Zille and S. Mostaghim, “Comparison study of large-scale optimisation techniques on the lsmop benchmark functions,” in 2017 IEEE Symposium Series on Computational Intelligence (SSCI), 2017, pp. 1–8.
-  Y. Tian, R. Cheng, X. Zhang, and Y. Jin, “PlatEMO: A MATLAB platform for evolutionary multi-objective optimization,” IEEE Computational Intelligence Magazine, vol. 12, no. 4, pp. 73–87, 2017.
-  GPy, “GPy: A gaussian process framework in python,” http://github.com/SheffieldML/GPy, since 2012.
-  E. Zitzler, L. Thiele, M. Laumanns, C. M. Fonseca, and V. G. Da Fonseca, “Performance assessment of multiobjective optimizers: An analysis and review,” IEEE Transactions on evolutionary computation, vol. 7, no. 2, pp. 117–132, 2003.
-  C. A. C. Coello and M. R. Sierra, “A study of the parallelization of a coevolutionary multi-objective evolutionary algorithm,” in Mexican International Conference on Artificial Intelligence. Springer, 2004, pp. 688–697.
-  M. R. Sierra and C. A. C. Coello, “A new multi-objective particle swarm optimizer with improved selection and diversity mechanisms,” Technical Report of CINVESTAV-IPN, 2004.
Xiaobiao Huang Xiaobiao Huang graduated from Tsinghua University with a BS in physics and a BE in computer science. He obtained a PhD degree in Physics from Indiana University, Bloomington in 2005. He has been a staff scientist at SLAC National Accelerator Laboratory since 2006.
Minghao Song Minghao Song graduated from Chongqing University with a BE in Nuclear Engineering and Technology. He obtained a ME degree in Nuclear Energy and Nuclear Technology Engineering from University of Chinese Academy of Sciences. He is currently a Ph. D student in physics from Illinois Institute of Technology, Chicago and conducting Ph. D thesis work at SLAC National Accelerator Laboratory.
Zhe Zhang Zhe Zhang graduated from Tsinghua University with a BE in engineering physics. He obtained a PhD degree in nuclear science and technology from Tsinghua University in 2017. He is currently a research associate at SLAC National Accelerator Laboratory.