Evolutionary algorithms for multi-center solutions

Large classes of multi-center supergravity solutions have been constructed in the study of supersymmetric black holes and their microstates. Many smooth multi-center solutions have the same charges as supersymmetric black holes, with all centers deep inside a long black-hole-like throat. These configurations are constrained by regularity, absence of closed timelike curves, and charge quantization. Due to these constraints, constructing explicit solutions with several centers in generic arrangements, and with all parameters in physically relevant ranges, is a hard task. In this work we present an optimization algorithm, based on evolutionary algorithms and Bayesian optimization, that systematically constructs numerical solutions satisfying all constraints. We exhibit explicit examples of novel five-center and seven-center machine-precision solutions.


Introduction
Black hole solutions in classical gravitational theories typically involve a small handful of parameters, such as mass, angular momentum, and charge.However black holes have an entropy proportional to their horizon area, which suggests that they have a vast number of internal degrees of freedom.Black holes also contain curvature singularities, and the semiclassical description of black hole evaporation leads to the information paradox [1,2].These facts present three corresponding major challenges for a fundamental theory of quantum gravity: to identify the black hole internal degrees of freedom, to resolve the singularities inside black holes, and to provide a consistent description of black hole evaporation.
In String Theory, black hole entropy arises from an exponential number of internal quantum microstates [3].It is therefore of significant interest to study the gravitational description of heavy pure states, in order to investigate string-theoretic singularity resolution and black hole evaporation.Large families of such pure states are well-described by smooth, horizonless supergravity solutions which, in the best-understood examples, provide a valuable description of black hole microstates [4][5][6][7].
Supersymmetric multi-center solutions involve non-trivial topology supported by flux.These solutions are specified by a set of harmonic functions on a three-dimensional Euclidean space.This formalism is typically used to construct supergravity fields in four, five or six macroscopic dimensions.Depending on the details, large families of multi-center solutions in five and/or six macroscopic dimensions can have the features of being horizonless and smooth, up to possible orbifold singularities, see e.g.[19].We shall moreover focus on the "scaling" regime in which the centers lie deep inside a long black-hole-like throat [18][19][20].
Constructing multi-center solutions with several centers is a hard problem.This is because asymptotic flatness, charge quantization, smoothness and absence of closed timelike curves (CTCs) comprise a set of non-trivial algebraic constraints.These constraints make the positions of the centers and the coefficients of the poles of the harmonic functions highly interdependent.The most important set of these constraints is known as the bubble equations.The distances between the centers are generically irrational real quantities.The bubble equations constrain these distances in terms of quantized parameters; this is a strong set of constraints.For further discussion, see e.g.[21].
Despite this difficulty, several solutions with three or four centers have been constructed analytically, see e.g.[18,20,[22][23][24][25].However, fewer solutions with five or more centers have been constructed, and until relatively recently these typically involved taking all centers to lie on a line, so that the configuration is axisymmetric, see e.g.[18].An important step forward was recently made, by considering the dependent variables in the bubble equations to be a subset of the coefficients of the poles of the harmonic functions (which we call "flux parameters"), rather than the distances between the centers [21].In this form, the bubble equations are a linear system, involving a symmetric matrix M. It was moreover conjectured that a configuration does not contain CTCs if and only if M is positive-definite.While this perspective simplifies the task of finding physically relevant solutions, the strong nature of the constraints between generically irrational distances and quantized parameters remains.
One can construct exact solutions with this method by arranging non-generic locations of centers, however this is not easy to implement in practice, and is currently limited to quite special arrangements of centers [21].An alternative approach is to construct approximate solutions to the bubble equations, as discussed in [21] and done in [22,23].One can do so in an iterative approach by first choosing a set of locations of the centers and solving for the flux parameters, obtaining generically irrational values.One then rounds any irrational flux parameters to nearby rational values that enable all quantization constraints to be satisfied.One then takes the rounded fluxes and attempts to re-solve the bubble equations in the traditional approach, to find the distances between the centers.This method has been used to numerically construct four-center solutions in axisymmetric or near-axisymmetric configurations [23].However, it is unclear whether this method is generically tractable for more than four centers [21].
In this paper we present a novel algorithmic method to construct numerical solutions with any number of centers, and with no symmetry imposed on the locations of the centers.The basic idea is to proceed in two steps.First, we generate a suitably good starting configuration which has certain desired physical properties, but is not yet a solution to the bubble equations.Second, we systematically vary the positions of the centers to construct a sequence of approximate solutions with increasing precision.
These two steps require two separate algorithms, since they optimize over different variables.In the first step, we generate a good starting configuration by optimizing over flux parameters.In the second step, we optimize over the locations of the centers.
The starting configuration is required to have two key physical properties.The first requirement is that the configuration be in the scaling regime mentioned above, in which the centers lie deep inside a long black-hole-like throat.We implement this by first finding an exact solution to the homogeneous form of the bubble equations, which will be reviewed in Section 2. We then round the flux parameters as described above.
The second requirement is that the supergravity charge radii are large, such that the solutions are weakly curved.We shall primarily have in mind solutions corresponding to bound states of D1 branes, D5 branes, and momentum P in a compact direction, in five or six macroscopic dimensions.In six dimensions, the (dimensionful) D1 and D5 charges, Q 1 , Q 5 control the main curvature scale of the solution (and contribute to the ADM mass), so we require them to be appropriately large.
The charges Q 1 , Q 5 , considered as functions of the flux parameters, are computationally expensive to evaluate.Bayesian optimization is well-suited to the task of optimizing computationally expensive functions (see e.g.[61]); we therefore employ it for the first step.The reason that the charges are computationally expensive to evaluate is that, given some flux parameters, one must first solve the bubble equations for the remaining flux parameters, and then evaluate the expressions for the charges Q 1 , Q 5 , which will be given in Section 2.
In a nutshell, Bayesian optimization is a strategy to choose points (in our case, values of flux parameters) on which to evaluate, or sample, the function to be optimized, known as the "objective" function (in our case, min(Q 1 , Q 5 )).After a point is sampled, the accrued knowledge of the objective function is updated, and then used to decide the next point to sample.The next point is selected according to a specified strategy that balances exploitation of more favourable regions (where Q 1 , Q 5 are known to be larger) versus exploration of unknown regions (where Q 1 , Q 5 have not yet been computed).We will describe this in detail in Section 3.2.
After a successful run of the Bayesian optimization algorithm, we have a configuration in the scaling regime, with appropriately large charges Q 1 , Q 5 , which is not yet a solution.It can be regarded as an approximate solution, but with low precision.In the second step, we construct numerical solutions by varying the positions of the centers.
Our two-step approach means that after a successful first step, it is reasonable to expect that if there is a genuine solution nearby, it is likely to require incremental adjustments to the positions of the centers, rather than a wide search.Evolutionary algorithms are well-suited to problems where incremental changes result in incremental improvements (see e.g.[12]).We therefore employ an evolutionary algorithm in the second step.
Evolutionary algorithms work by generating a population of individuals, comprising certain data known as genes, and quantifying their fitness via a function known as the fitness function.The algorithm then generates subsequent generations of individuals following the principles of the Darwinian theory: selection, reproduction and mutation.By iterating this process over several generations, the algorithm aims to construct new individuals with higher fitness.
In our algorithm, an individual is a multi-center supergravity configuration that approximately solves the bubble equations (in their full, inhomogeneous form).Its genes are (an appropriate subset of) the positions of the centers.The coefficients of the poles of the harmonic functions are determined by the previous step.The fitness function quantifies the precision to which the bubble equations are approximately satisfied, with higher fitness being an approximate solution with a lower error.Once a multi-center configuration with the desired fitness is generated, we investigate the absence of CTCs by computing the eigenvalues of the matrix M.
Our algorithm is designed to generate solutions with any number of centers, in a generic configuration.We have run the algorithm on several configurations of three, five and seven centers, and we shall exhibit explicit examples of novel five-and seven-center configurations.Generating solutions with a higher number of centers is feasible, although naturally this is computationally more expensive.The algorithm is implemented in Python, and the code is publicly available. 1his paper is structured as follows.In Section 2 we review multi-center scaling solutions, and their construction in the formalism of [21].In Section 3 we first describe our overall method, and then describe the Bayesian optimization algorithm and the evolutionary algorithm we have developed.In Section 4 we describe explicit examples of five-center and seven-center scaling configurations obtained with our method, and comment on the performance of the algorithm.We discuss our results in Section 5.
2 Multi-center scaling supergravity solutions

Multi-center solutions
For concreteness, we primarily consider 5D N = 1 Super-Einstein-Maxwell-Yang-Mills supergravity, whose bosonic field content is the metric, three Abelian vector multiplets, and an SU(2) triplet of non-Abelian vector multiplets.If one turns off the non-Abelian multiplets, one recovers the STU model.
Multi-center solutions are specified by a set of harmonic functions on a three-dimensional Euclidean "base" space, which have poles at the location of the centers.The index a = 0, 1, ..., n − 1 labels the centers, and r a = |⃗ r − ⃗ r a | is the distance from the a-th center in the three-dimensional base.In the Abelian sector the harmonic functions are (i = 0, 1, 2): where q a ∈ Z.In the non-Abelian sector [62], denoting the gauge coupling by g, we have 2) The harmonic function H defines a four-dimensional Gibbons-Hawking metric via where ds2 3 is the flat metric on R 3 , and A is a one-form related to H via ⋆ 3 dA = dH.For the full five-dimensional fields, we refer the reader to [21].
Only certain subsets of possible coefficients of the poles in Eqs.(2.1) and (2.2) lead to physically sensible solutions: one needs to impose further constraints.First, asymptotic flatness requires a q a = 1.Second, upon uplifting to Type IIB supergravity compactified on S 1 × T 4 , the coefficients k i a are quantized in terms of integer flux parameters n i a as follows [47], where the coordinate volume of T 4 is (2π) 4 V 4 and that of the S 1 is 2πR y .
In this paper we focus on smooth horizonless supersymmetric solutions. 2The following relations are imposed by absence of event horizons and singularities (the first three relations) and asymptotic flatness (the last two relations), see e.g.[21,App. A.3], (2.5) The absence of Dirac-Misner singularities imposes the so-called "bubble equations" [68,19,62], which constrain the relation between the positions of the centers and the local charges: where Here r ab is the R 3 Euclidean distance between centers a and b, Π i ab are the magnetic fluxes, and we will refer to the coefficients k i a as flux parameters.The bubble equations are a set of n equations among which only (n − 1) are independent: summation over a leads to a trivial identity, due to the antisymmetry of the Π i ab .
The asymptotic charges of the multi-center solutions are [19]: a,b,a̸ =b (2.8)

Scaling solutions and their construction
Of particular interest are solutions to the bubble equations (2.6) in which the distances between the centers can be made uniformly parametrically small by scaling r ab → λr ab with λ ≪ 1, while keeping the asymptotic charges approximately constant.These solutions are known as "scaling" solutions [18][19][20].Note that the rescaling r ab → λr ab is equivalent to multiplying the RHS of (2.6) by λ, with λ ≪ 1.It will be useful for us to note that in the limit λ → 0, one obtains the homogeneous bubble equations [20] (see also for instance [23]), b̸ =a Therefore, in the scaling regime of small λ, solutions to the full inhomogeneous bubble equations (2.6) are also approximate solutions to the homogeneous bubble equations (2.9), up to terms of order λ.We will exploit this to construct new scaling solutions.
The full inhomogeneous bubble equations (2.6) have typically been considered as equations in which the variables to be solved for are the distances r ab , see e.g.[19].This perspective has two disadvantages [21].First, it is generically difficult to find solutions for r ab .Second, after solving the equations, one often finds that the resulting r ab do not represent possible distances between points in 3D Euclidean space; for instance, the triangle inequality might not be respected.
A recently developed alternative approach is to exploit the feature that the bubble equations (2.6) are linear in the flux parameters k 2 a .Thus, instead of solving for the distances, one can first specify the positions of the centers, and then solve for the flux parameters k 2 a with a = 2, 3, ...n [21] 4 .While this procedure is general and not restricted to scaling solutions, let us now review it in the context of scaling solutions.We introduce a scaling parameter λ that rescales the positions of the centers while keeping the shape of the distribution fixed: i.e. we write the distance between the centers as r ab = λd ab , where d ab remain constant in the scaling process.We define5 where we have introduced the constant s which takes values 0 or 1.These values correspond respectively to the homogeneous and inhomogeneous bubble equations, as we shall see momentarily.We then introduce (α , β = 1, ..., n − 1)6 in terms of which, we write the following linear system of equations in the fluxes Π 2 ab : (2.12) For s = 1 this linear system is equivalent to the inhomogeneous bubble equations (2.6), while for s = 0 the system is equivalent to the homogeneous bubble equations (2.9).
Although this perspective has simplified the task of solving the bubble equations, it remains a fact that generic solutions obtained in this way will not respect the quantization conditions in Eq. (2.4).This can be seen as follows.If we choose generic locations of the centers, generic relative distances will be irrational numbers.Then generic solutions will give irrational values of the flux parameters k 2 α , which is in conflict with the quantization conditions in Eq. (2.4).As described in the Introduction, using this method one can construct exact solutions with quantized fluxes by arranging a set of non-generic locations of centers, such that all relative distances are rational.For instance one can take all centers to lie on a line, or on a circle, as discussed in [21]. 7While these constructions provide interesting and valuable exact solutions, the requirement to work with non-generic locations of centers is a significant limitation.
To proceed further, an alternative approach is to construct approximate solutions to the bubble equations.One can do so with an iterative approach, as follows.One first chooses a set of center locations, then solves for k 2 α , generically obtaining irrational values.One then rounds the k 2 α to nearby rational numbers to a desired precision, obtaining an approximate solution, as discussed in [21] and done in [22,23].
This can be further improved by taking the rounded flux parameters k 2 α , re-solving the bubble equations (in the traditional way) to obtain a new set of distances r ab , and then arranging center positions to have the resulting relative distances.If this could be done analytically, one can obtain exact solutions, however typically this is hard, for the reasons discussed below Eq. (2.9).More realistically, one can employ this method to improve the precision of the approximate solution, as done in [23].However, for more than four centers, doubt has been expressed as to the feasibility of this method [21].
3 Constructing numerical scaling solutions

Overview of the method
In this section we describe our method to construct numerical solutions.The method involves two main steps.In the first step, we generate an approximate scaling solution with rounded flux parameters k 2 α and large charges Q 1 , Q 5 .The condition of large charges is imposed using a Bayesian optimization algorithm, optimizing over flux parameters.In the second step, we fix the flux parameters and optimize instead over the locations of the centers, using an evolutionary algorithm.
In this subsection we first describe the overall method, focusing primarily on the first step of generating an appropriately good starting configuration.The following subsections will describe the respective algorithms in detail.
A key ingredient in the first step is the construction of configurations in the scaling regime.
To do this, we follow the discussion below (2.9).We solve the homogeneous bubble equations (2.9) in the form (2.12) by taking the position of the centers ⃗ r a and the coefficients q a , l i 0 , k 0,1 a and k 2 0 to be independent variables, and solving for k 2 α (recall α = 1, ..., n − 1).The initialization/optimization of the independent variables will be described in the next subsection.
This will enable us, at a later step, to rescale r ab → λr ab with λ ≪ 1 to obtain a configuration in the scaling regime, as done in [23]. 8However, before doing this rescaling we round the flux parameters, and impose Q 1 , Q 5 > Q, as follows.
We round the flux parameters, k 2 α → k2 α , to a certain precision, which will be a hyperparameter of the algorithm, and which we call k rounding.From here onwards, tildes denote rounded quantities.After rounding, Eq. (2.9) will no longer be exactly satisfied for the same configuration of centers.
Having constructed an approximate solution to the homogeneous bubble equations, we then impose Q 1 , Q 5 > Q using a Bayesian optimization algorithm, optimizing over a subset of the independent flux parameters, and iterating over the steps so far, as described in the next subsection.
After a successful run of the Bayesian optimization algorithm, we have an approximate solution to the homogeneous bubble equations, with charges Q 1 , Q 5 in the desired range.We next generate another approximate solution in the scaling regime by rescaling the positions of the centers obtained in Eq. (3.5): where for concreteness we take λ = 10 −5 .
At this stage, we have a starting configuration with the desired physical properties.Before using it as an input to the evolutionary algorithm, we next impose two conditions that further indicate whether the configurations can be considered sufficiently good starting configurations.
To describe the first condition, let us consider the homogeneous bubble equations (2.9) for a = n − 1 : 8 We thank Pierre Heidmann for a discussion on this point.
Let us further examine the generic case in which all the terms in this sum are non-zero.Then a necessary condition to have a solution is that not all of the terms in the sum have the same sign.Since the distances are positive, the expressions ) should not all have the same sign.
After rounding the flux parameters, k 2 α → k2 α , we have rounded fluxes Π2 .In the evolutionary algorithm, we will keep these fixed and change the location of the centers.So, before running the evolutionary algorithm, we examine whether or not all of the following expressions have the same sign (note the presence of the rounded fluxes Π2 ): We have found that if these quantities have the same sign, it is a reliable indicator that there is unlikely to be a nearby scaling solution.Therefore, only if these quanties do not all have the same sign, we proceed.
Next, we perform a preliminary investigation of the absence of CTCs.Let us first review the case in which only Abelian fields are turned on.To rule out CTCs, two algebraic combinations of the harmonic functions (2.1) must be globally positive.Generically, the stronger of these conditions is that the quartic E 7(7) invariant, as a function of the harmonic functions (2.1), is globally positive [69].Investigating this condition is non-trivial, and typically done numerically.The generalization to configurations with both Abelian and non-Abelian fields was discussed in [62,21].The authors of [21] conjectured that the condition for absence of CTCs is equivalent to requiring that the matrix M 2 defined in Eq. (2.12) is positive-definite.We thus investigate absence of CTCs in the solutions found by the algorithm by examining this condition on M 2 .
Although M 2 depends on the positions of the centers, and thus will be modified by the evolutionary algorithm, we have observed that small modifications of the distances do not tend to change the eigenvalues much.Of course, after the evolutionary algorithm, one must recheck the condition on M 2 .However, checking the condition at this stage provides a good indication of whether the condition will be respected in the final solution.If M 2 is positive definite, we proceed to use this configuration as a seed for the evolutionary algorithm.
Note that the condition (2.9) for scaling solutions, and the scaling limit r ab → λr ab , correspond to "zooming in" to the core of the solutions, such that the asymptotics become AdS 2 fibered over S 3 ; for a general discussion, see [70].We wish to "undo" this limit and construct solutions with R 4,1 asymptotics.This requires restoring the inhomogeneous terms in the bubble equations (2.6).
In the second step of the method, the evolutionary algorithm modifies the positions of the centers to construct numerical solutions to the full inhomogeneous bubble equations (2.6), as we will describe in Section 3.3.

The Bayesian optimization algorithm
We now describe the Bayesian optimization algorithm that we use to implement the first part of our method.As described in the previous section, we begin by specifying the positions of the centers ⃗ r a and the coefficients q a , l i 0 , k 0,1 a and k 2 0 , considering them to be independent variables, and solving the homogeneous bubble equations (2.9) for k 2 α (recall α = 1, ..., n − 1).We wish to impose large charges, Q 1 , Q 5 > Q, for some appropriate Q.To do so, in principle one could attempt to maximise the first two expressions in Eq. (2.8) as functions of both k 2 0 and the full set of k 0,1 a .However in practice, it is neither computationally efficient, nor necessary, to maximize Q 1 , Q 5 with respect to a substantial fraction of the flux parameters k 0,1 a -it suffices to focus on the flux parameters of a small number of the centers.We thus introduce a hyperparameter n b and maximize Q 1 , Q 5 only with respect to the flux parameters of n b of the centers.In our examples, it will suffice to take n b to be equal to 1 or 2.
The centers whose flux parameters are dependent variables of the optimization process will be labelled with the index ā = 0, • • • , n b − 1, and the remainder will be labelled with the index ȧ = n b , • • • , n − 1.In other words, we will fix the value of the flux parameters k 0,1 ȧ when initializing the algorithm, and we then maximize the value of the global charges with respect to k 2 0 and k 0,1 ā .In order to maximize the value of the global charges we use a Bayesian optimization algorithm.The reason for this choice is that the global charges Q 1 , Q 5 in (2.8) are computationally expensive to evaluate: for any given trial configuration, one must first solve the homogeneous bubble equations for k 2 α , and then use the result to compute Q 1 , Q 5 .

Bayesian Optimization
Let us now provide an intuitive description of how the Bayesian optimization algorithm works.Bayesian optimization (BO) is an approach to find the global maximum (or minimum) of a "black-box" function, called the objective function.By black-box function, we mean either a function over which we have no analytic control (for example a stochastic function), or a function that is computationally expensive to evaluate, as in the case at hand.This means that we do not have a global knowledge of the function, i.e. we do not know its value on every point of the domain, however we have the freedom to evaluate it on a finite set points.
In this context, Bayesian optimization algorithm is a strategy to obtain the maximum of such functions that works better than a random search.It works as follows (for a review, see e.g.[61]).First, a Gaussian process prior is placed on the objective function.Then, the objective function is evaluated on a set of points [x 1 , • • • , x n 0 ] of the domain.At this stage, the data ]} represent all our knowledge on the objective function.Of course, there are infinitely many functions whose value is , but, by assuming that the objective function follows a Gaussian process model, we estimate that not all such functions are equally probable.
In the next step, we construct a "surrogate" function, which, among the infinite functions that have the same value of the objective on the points [x 1 , • • • , x n 0 ], has the highest probability of representing the objective9 : as such, it is our best estimate of the objective based on the knowledge we have so far, and has the advantage of being much quicker to evaluate.
The last ingredient of the algorithm is the acquisition function, also known as acquisition strategy.By evaluating the surrogate function on a finite set of points of the domain, it chooses the next sampling point of the objective (i.e. the point of the domain that is more likely to pay off when the objective is evaluated on it), according to some specified strategy.There are many different possible acquisition strategies; see e.g.[61].
By evaluating the objective function on this new point, we increase the knowledge we have on the objective.Thus, after each additional sampling point the surrogate function is updated, and the acquisition function is again used to choose the next sampling point, iterating until an acceptably good approximate maximum (in our case, Q 1 , Q 5 > Q) of the objective is found, or a previously set computational limit is reached (in our case, a maximum number of iterations N ).

Implementation
We now explain this first part of our algorithm in more detail.We initialize the independent variables as follows.The q a must all sum to 1.If n is odd, we use alternating values ±1, starting and ending with 1.If n is even, the first n − 1 use alternating values ±1, starting and ending with −1, while q n = 2: n even ; Furthermore, we choose the position of the n centers so that they lie inside a cube with edge length equal to two.We set up the coordinate system in such a way that the first center is at the origin, the second is at y = 0, z = 0 and the third center is at z = 0, so that we have a total of 3n − 6 free coordinates.We sample the remaining non-zero r i a from the following uniform distribution: In practice, this sampling is obtained from a discrete distribution whose step-size is controlled by the hyperparameter prec pos.Similarly, the coefficients k 0,1 ȧ are sampled with the discrete uniform distribution U − 10 prec k , 10 prec k , with step-size equal to 1.In the following we will set the parameter prec k = 2.
Having initialized all the parameters we are not optimizing over, we now maximize the objective function as a function of the variables {k 0,1 ā , k 2 0 }, which (having set prec k = 2) we also allow to take integer values in [−100, 100] ⊗(2nb+1) .The evaluation of f obj works as follows.We first solve the homogeneous bubble equations (2.9).Next, we round the flux parameters k 2 α → k2 α .Last, we use (2.8) to compute Q 1 , Q 5 and select the minimum of the two.
We evaluate the objective function on n 0 points in the (2n b + 1)-dimensional space of {k 0,1 ā , k 2 0 } sampled from the discrete uniform distribution U − 10 2 , 10 2 ⊗(2nb+1) .Assuming a Gaussian process prior as described above, we use knowledge of f obj evaluated at this set of points to generate the surrogate function.We use n 0 = 200.
We then evaluate the surrogate function on a much higher number (of order 10 4 ) of randomly sampled points, with discrete uniform distribution U −10 2 , 10 2 ⊗(2nb+1) and step-size 1.We use an acquisition function based on the Probability of Improvement method (see e.g.[71]) to choose the next point of the domain that is most worth evaluating with the objective function.The knowledge of f obj at this new point is then used to update the surrogate function.This process is iterated until a point {k 0,1 ā , k 2 0 } such that f obj {k 0,1 ā , k 2 0 } > Q is found, or a previously set computational limit (the maximum number of iterations N ) is reached.This procedure is summarized in Algorithm 1.
While this algorithm is not guaranteed to find a solution, we found that in practice this approach is much more successful than a random search.
After a successful run of the Bayesian optimization algorithm, we impose the two conditions described at the end of Sec.3.1, to be confident that a nearby scaling solution exists and that Algorithm 1 BO algorithm Assume Gaussian process prior Evaluate f obj on n 0 points {k 0,1 ā , k 2 0 } sampled with U − 10 2 , 10 2 ⊗(2nb+1) Q max ← The maximum output of f obj found so far Generate the surrogate function using the available data while n < N or Q max < Q do Let {k 0,1 ā , k 2 0 } n be the point returned by the acquisition function Evaluate

The evolutionary algorithm
A successful run of the Bayesian optimization algorithm outputs an approximate solution to the homogeneous bubble equations (2.9) with the desired characteristics.The approximate nature of this solution is due to the rounding of the flux parameters k 2 α .Our task is now to obtain a numerical solution of the inhomogeneous bubble equations (2.6) by moving the positions of the centers in the R 3 base space.
We shall do so by using an evolutionary algorithm (EA), which is an optimization algorithm inspired by Darwin's theory of evolution.The starting point is a population, i.e. a set of individuals that are approximate solutions to the problem we wish to solve.The properties of each individual are called genes.We measure how good an approximate solution is via the fitness function, which is the function we want to maximize.The fittest individuals are selected to reproduce, by passing some of their genes to their offspring and in the reproduction process some mutations are implemented, i.e. random modifications of the genes of the offspring.Then the offspring take the place of the less fit individuals, which die, such that the population size is constant.This process is iterated until a sufficiently good solution is found, or a previously set computational limit is reached.
For the case at hand, each individual is an approximate solution to the bubble equations (2.6).An individual's genes are (a subset of) the positions of the centers together with a set of strategy parameters, to be described momentarily.
After implementing the translational and rotational symmetry, the number of free coordinates is 3n − 6, while the number of independent bubble equations is n − 1. Implementing a genetic algorithm on all the 3n − 6 coordinates would be computationally expensive.Thus, we select a subset of the coordinates to be fixed to the values of the BO algorithm output, ri a of Eq. (3.1).We observed that fixing approximately one coordinate on each of the last n − 3 centers provides a good balance between effectiveness and computational cost, and we shall give explicit examples of this in Section 4. We shall denote by d the number of degrees of freedom, i.e. the number of unfixed coordinates over which we run the EA.
To describe the algorithm further, we introduce the multi-index A = (a, i) combining the center label a and the three Euclidean coordinates i.For the p th individual in the population, we denote the set of coordinates to be varied by r (p) A .As noted above, we work directly with the positions of the centers as genes, so the distances between the centers are always well-defined.
In the reproduction process, random mutations occur.The magnitude of the mutations is controlled by the set of strategy parameters.Each direction r (p) A has an independent strategy parameter σ (p) A , which is a gene of the individual, and which also undergoes variation and selection itself.As we will describe in the following, we will initialize the positions of the individuals in the population by sampling from a Gaussian distribution with fixed standard deviation, which will be taken as the initial value of the strategy parameter for every individual and every direction.The genes of the p-th individual in the population are then written as All genes {r A } undergo variation and selection.An individual whose genes are likely to survive in the evolutionary process will have a good set of positions r (p) A , that will be quantified by having high fitness, and a set of strategy parameters σ (p) A that are likely to give rise to fit offspring, as will become clearer when we discuss reproduction and mutation.
The evolutionary mechanism is iterated over a certain number of generations controlled by the hyperparameter generations; in each new generation a number of offspring is generated, specified by the hyperparameter offspring per generation.The optimal values of these hyperparameters depend on the number of degrees of freedom of the problem, i.e. the number of centers of the configuration, as we shall see in examples.

Fitness function
We now define the fitness function, i.e. the function the EA algorithm seeks to maximize.We seek solutions to the inhomogeneous bubble equations (2.6), so we use these to construct the fitness function.By rearranging the bubble equations, we write: where if ϵ a = 0 ∀a , then the solution is exact.We seek configurations {r A } that minimize the errors associated to all the bubble equations.We do so by instructing the algorithm to minimize a |ϵ a |.That is, we define the algorithm's fitness function to be It will also be useful to define the following inverse fitness function: Note that since the sum of the bubble equations is zero by construction, the inverse fitness function f inv typically over-states the error of a configuration, particularly as the number of centers grows larger.A more accurate measure is the largest absolute value of the errors of the individual equations.So, for later use when discussing our results, we also define the following maximum-error fitness and inverse fitness functions,

Population initialization
We initialize a population of pop size individuals by starting with the configuration of centers obtained in (3.1), and adding to it a random variable δr (p) A sampled from the Gaussian distribution N (0, var pos) with mean 0 and standard deviation var pos, where var pos is a hyperparameter of the algorithm: The optimal value of var pos depends on the number of centers.If var pos is too small, we generate a population that is too similar to the seed solution; if var pos is too large, we obtain a large number of unfit individuals in the initial population.It is not practical to compute optimal value of var pos directly from the bubble equations Eq. (2.6), so we optimize it via a simple grid search.Concretely, we examine the populations generated by a set of candidate values of var pos, and use the population fitness to select the optimal var pos.
Once an individual is generated, we implement the opposition-based technique [72,73], i.e. we compare this individual with the one obtained through reflection symmetry with respect to the initial configuration (3.1), and keep the one which has the highest fitness.In other words, given the individual with position r

Selection
The selection mechanism of the evolutionary algorithm dictates which individuals pass their genes to the offspring, and which individuals are replaced in the new generation.Recall that the number of new offspring per generation is a hyperparameter of the algorithm.To produce an offspring, the algorithm selects two parents that reproduce and one individual that will be replaced.This selection process occurs probabilistically, favouring potential parents with highest fitness, and where those with highest inverse fitness are most likely to die off.
Our algorithm implements two different selection mechanisms, amongst which the user can choose.The two methods are (see e.g.[74] for more details): • Fitness proportional selection.The probability of an individual to be chosen as a parent is: where f is the fitness function (3.9).The probability of an individual to die off is governed by the same equation, with the fitness function replaced by the inverse fitness function, defined in Eq. (3.10).
• Sigma scaling.The probability of an individual to be chosen as parent is given by a modified version of Eq. (3.15), with the fitness function being replaced by the following auxiliary fitness function f ′ , which involves a shift controlled by the mean f and standard deviation σ f of the fitnesses of the population: where c is a constant which is usually set to 2. Similarly, we also apply this method to the selection of the individual that dies by replacing the fitness function with the inverse fitness function.
The first method is less computationally expensive, however it can be less effective due to the following disadvantage.It is sensitive to adding a constant shift to all f , and depending on this shift there can be too much or too little selection pressure.For instance, if there is too much selection pressure, the best individuals tend to dominate the population very quickly, which can lead to premature convergence.
By contrast, the second method is more computationally expensive, but tends to produce an appropriate amount of selection pressure.

Reproduction and mutation
Once two parents {r A } and {r A } are selected, their offspring {r A , σ A } is generated as follows.First, separately for each gene, i.e. for each element of the multi-index A, we assign equal probability to one of the following three reproduction methods to occur.Before possible mutation, this gene will be set equal to the gene of a parent, or the average of the two parental genes: 1) A , σ A , σ A } ; 3) {r A , σ A } = r (1) (3.17) Next, we implement a mutation mechanism, which can enable the population to escape from local minima of the fitness function.Recall that each offspring has a total of d genes (A = 1, . . ., d).For each gene of an offspring, we assign a probability of order 2/d for the gene to mutate away from the value assigned in (3.17).So on average, approximately two genes of each offspring will mutate.
It is desirable to build in the flexibility for mutations in different genes to have different strengths.We therefore implement uncorrelated mutation, with different step sizes for different genes (see e.g.[74]).If a gene is selected for mutation, first σ A mutates, then the mutated σ ′ A sets the scale for the mutation of the position r A .The mutation of σ A is controlled by two Gaussian random variables: δσ is sampled only once for each offspring, while δσ A is sampled separately for each gene.Both are sampled from the Gaussian distribution N (0, 1).The mutation strength of σ A is controlled by the parameters τ ′ = 1/ √ d and τ = 1/ 2 √ d via: where following [74], we set τ ′ = 1/ √ d and τ = 1/ 2 √ d.The way this treats different directions differently is as follows.The mutation exp(τ ′ δσ) is common to all directions and allows for an overall change of the mutation step-size.By contrast, the term exp(τ δσ A ) introduces different mutation strengths in different directions.
Once σ A has mutated to σ ′ A , the position r A mutates with a Gaussian random variable δr A drawn from N (0, 1), with mutation strength set by the new σ ′ A : As the fitness of the population improves, and the algorithm explores narrower regions of the phase space, we improve upon the mutation mechanism (3.18) to achieve the following set of goals.The mutation should enable an appropriately fine exploration of the narrower region, while not becoming too small in magnitude.Separately, the mutation should be able to jump outside local minima.To enable this, we introduce a hyperparameter generation update.After generation update generations, we update the strategy parameter of the individuals according to the following prescription.After this update, the algorithm returns to the mutation (3.18) for the next generation update generations.The algorithm contains two different methods to update the strategy parameter, among which the user can choose.These are: • Random update.We introduce the hyperparameters percentage random update and factor random update, randomly select (percentage random update)% of the individuals and rescale σ A → σ A /factor random update, while rescaling the strategy parameter of the remaining individuals as σ A → (factor random update) σ A .
• Variance update.We update the strategy parameter according to the variance of the positions, as follows.For each A, we define the average position rA = 1 pop size q r (q) A , and reset the strategy parameters in direction A of all individuals to have the same value, The variance update method works as follows.If, for instance, the whole population is concentrated in a particular region of parameter space, it tends to enable finer exploration of the local region.By contrast, if for instance the population is concentrated in two or more separate regions, the variance update tends to enable the population to explore a larger region of the parameter space.
We performed several runs with both Random and Variance updates.The Variance update has the advantage that it does not depend on the number of centers, while in the Random update we are introducing two new hyperparameters that could in principle be optimized over, depending for instance on the number of centers.Of the two methods, typically the Variance update is more computationally expensive, and typically the performance of the algorithm is better than or comparable to the Random update, depending on the other parameters of the configuration.
For the sake of clarity, let us illustrate an example of how the variance update method works.Suppose we set the evolutionary algorithm to run for 1000 generations, and that we specify generation update = 100.The evolution of the strategy parameters σ A will work as follows.Up to generation 99, the σ A of each individual will evolve with random mutations via Eq.(3.18).Then, at generation 100, the strategy parameters σ A of all individuals in the population will be reset according to Eq. (3.21).Next, for generations 101 to 199 we return to the random mutations described in Eq. (3.18).Then, at generation 200 we again reset the σ A according to Eq. (3.21), and the pattern continues until the end.

Results
In this section we present two explicit examples of numerical scaling solutions obtained with the algorithm described above, which have five and seven centers respectively.In both of these examples we use the fitness proportional selection method and the variance update mechanism described in Section 3. 3.In the examples that we present, the non-Abelian coupling constant g will be set equal to 1.We also comment on the performance of the algorithm.

A five-center scaling configuration
As a first application of our method, we provide an example of a five-center scaling configuration.We first optimize the hyperparameters of the evolutionary algorithm for a five-center configuration.We do so with a grid search and obtain the values reported in Table 1.We then follow the procedure described in Section 3.2.We use the Bayesian optimization algorithm to obtain initial positions and flux parameters, recorded in  2: Input parameters of the configuration.For ease of notation, the rescaling in Eq. (3.1) with λ = 10 −5 is understood: in the first three rows of this table the coordinates of the centers are in units of 10 −5 , i.e. r 1 2 = 0.3314 × 10 −5 .The underlined coordinates are those that are genes of the EA, i.e. the coordinates over which the evolution process occurs.
By solving the homogeneous bubble equations (2.9), we obtain the (n − 1) remaining k 2 α parameters.After rounding to a precision of k rounding = 10 −4 , we report their values in Table 3.
The parameters in Tables 2 and 3    M 2 is positive-definite.We therefore proceed to use the configuration as a seed configuration in the evolutionary algorithm.The genes of the evolutionary algorithm are the subset of that are coordinates underlined in Table 2.
We plot in Figure 1 how the fitness of the fittest individual and the average fitness of the population change over the generations in a representative run of the algorithm.By starting with a seed solution with fitness of order 1 we obtain, after around 14000 generations, a numerical solution with fitness f ≃ 2 × 10 10 .This means that the EA generated a numerical solution that solves each bubble equation with a precision of at least 5 × 10 −11 .
Note that these fitness/error figures are dimensionful, and depend on our choice of units.A useful dimensionless relative error can be defined by comparing the error in the bubble equations to the largest summand on the left-hand side of the bubble equations.For this fivecenter configuration, the largest summand has absolute value approximately 5×10 4 .The ratio of the largest error in the bubble equations, finv (r A ) = max a |ϵ a |, to this largest summand, is therefore approximately 1 × 10 −15 .This is a typical best-case value at which the dimensionless relative error of the algorithm saturates, reflecting the fact that we work to machine precision.
As a side point, we observe that the fitness of the fittest individual in the population does not increase monotonically.The main reason is that occasionally the fittest individual is selected to die; this is part of the random nature of the algorithm, and is one way that the algorithm can escape local minima.A more minor reason is that the algorithm maximizes the sum-error fitness f defined in (3.9), while we have plotted the max-error fitness f defined in (3.11), being a more accurate measure of the fitness.The associated matrix M 2 is positive-definite, thus we can have confidence that the geometry does not contain any CTCs.We report in Table 4 the coordinates of the centers of this numerical solution.
We compute the global charges of this solution using Eq.(2.8), obtaining: where J R ≡ | ⃗ J R |.We note that J R ≪ 1 as expected.

A seven-center scaling configuration
We now present an example of a seven-center scaling configuration.We report in Table 5 the hyperparameters of the evolutionary algorithm, optimized for seven-center configurations.
pop size offspring per generation generations var pos generation update 3000 100 18500 10 −4 666 Table 5: Hyperparameters of the algorithm, optimized for seven-center configurations.
After running the Bayesian optimization algorithm, we obtain an initial configuration with global charges Q 1 ≃ 736 and Q 5 ≃ 410, which is described in Table 6 2, the coordinates in the first three rows are in units of 10 −5 , and underlined coordinates are genes of the evolutionary algorithm.
The homogeneous bubble equations (2.9) give the n − 1 remaining flux parameters k 2 α ; we round them to a precision of 10 −5 , and report the result in Table 7  The coefficients in Tables 6 and 7 provide an approximate solution to the inhomogeneous bubble equations (2.6) with fitness f ri a ≈ 3.21.As in the five-center example, only the underlined coordinates in Table 6 are taken to be genes of the EA.
As depicted in Figure 2, we obtain, after around 17000 generations, a numerical solution with fitness f ≃ 1.7 × 10 10 .For this configuration, the absolute value of the largest term on the left-hand side of the bubble equations is 1.1 × 10 4 .So the dimensionless relative error defined in the previous subsection is approximately 5.7 × 10 −15 , again reflecting the fact that we work to machine precision.
The coordinates of the centers of the fittest configuration obtained by the evolutionary algorithm are reported in Table 8 The matrix M 2 is positive-definite, and thus we can have confidence that the configuration is free of CTCs.Finally, we record the global charges of the solution: We again observe that J R ≪ 1 as expected.

Performance of the algorithm
We now make some general comments regarding the algorithm's performance.We have run the algorithm in detail for configurations of three, five, and seven centers.As the number of centers increases, naturally the runtime increases.This is primarily due to two factors.First, the number of bubble equations that need to be evaluated increases linearly with n.Second, a higher number of centers means a higher number of degrees of freedom of the evolutionary algorithm, thus the population size and the number of offspring per generation should be increased accordingly, to optimize the algorithm's performance.
When run on a three-center configuration, the algorithm typically found of order 10 approximate solutions with fitness ≳ 10 6 in around 10 hours (all run-times refer to a high-specification mainstream desktop machine).For a five-center configuration, typically around 13 hours runtime produced two solutions with fitness ≳ 10 6 .For a seven-center configuration, in 40 hours runtime the algorithm produced two solutions with fitness ≳ 10 5 , including the one reported above.Each of these runs was performed initially with a cutoff of 10000 generations.Subsequently, longer runs were performed on the two examples presented earlier in this section to obtain the machine-precision solutions.
In light of the No Free Lunch Theorem, we have compared our algorithm with a random search on several examples, and observed it is always far superior, as follows.The random search is obtained by generating offspring per generation new individuals via Eq.(3.13) for generations generations, and evaluating their fitness.We did so for different values of the standard deviation var pos.For large values of var pos, the relevant sampling space is too big, and the probability of finding good solutions is low.As we decrease var pos, the performance of the random search increases until an optimized value.Decreasing var pos further results in a loss of performance, as the individuals are too close to the seed solution rA , and thus their fitness is of the same order of f rA .  2 and 3. We repeat the analysis for four different values of var pos, which are denoted with σ in the plot's legend.
In all the examples we analysed, the random search provided an approximate solution with fitness no higher than around 10 2 .For completeness, in Figure 3 we present an example of such a random search for the five-center configuration discussed in 4.1, with the values of generations and offspring per generation given in Table 1.This contrasts with the far superior performance of the evolutionary algorithm.

Discussion
In this work we have presented an optimization algorithm, combining Bayesian optimization and an evolutionary algorithm, using which we have constructed numerical multi-center solutions with several centers arranged in generic configurations, satisfying all flux quantization constraints.
The Bayesian optimization algorithm generates a configuration in the scaling regime with appropriately large D1 and D5 charges, by optimizing over a subset of the flux parameters.The output of this first step is a starting configuration that is an approximate solution to the bubble equations (2.6).The starting configuration contains two approximations: first, it was derived by solving the homogeneous form of the bubble equations; second, the flux parameters were rounded in order to respect the conditions of flux quantization.
The evolutionary algorithm uses the configuration generated in the first step as a starting point to generate approximate solutions to the full inhomogeneous bubble equations, with increasing precision.It does so by optimizing over the positions of the centers.By working with the positions of the centers, the distances between centers are always well-defined.The fact that we work with quantized fluxes and well-defined distances is a distinguishing advantage of our method over previous approaches.
In Section 4 we exhibited two examples of novel machine-precision numerical solutions, one with five centers and one with seven centers.Although we have used the algorithm primarily to construct configurations with three, five and seven centers, it can in principle be used to construct configurations with any number of centers in generic configurations, upon tuning the hyperparameters appropriately.
While the evolutionary algorithm significantly improves upon previous methods, naturally it does not always find machine-precision solutions, depending on the starting configuration.This can be for one of two reasons.First, it is a general limitation of evolutionary algorithms that they do not always find optimal solutions, i.e. global minima of the fitness function: in the evolution process, the population might get stuck in a local minimum.It is well known that this problem is more serious when the number of degrees of freedom of the problem is low, as the probability of having local minima decreases as the number of dimensions increase.Second, given a set of poles in the harmonic functions describing the multi-center solution, it is not guaranteed that there exists a nearby exact solution to the bubble equations.For instance, by rounding the flux parameters, one could access a region of parameter space that does not admit solutions.It is thus possible that the evolutionary algorithm fails to find a sufficiently good solution simply because such a solution does not exist.Indeed, we found three-center configurations in which the evolutionary algorithm could not improve the fitness above a certain value, and a detailed analysis of one such example showed that a nearby exact solution did not exist.Despite these inherent limitations, most of the time the algorithm is successful at finding solutions.
As the number of centers grows, naturally more computational resources are required.It is worth noting that, in particular, the task of finding a good starting configuration requires significantly more computational resources.This is because it can take several iterations for the Bayesian optimization algorithm to find scaling configurations with appropriately large supergravity charges Q 1 , Q 5 , together with a positive-definite matrix M for absence of CTCs.So as the number of centers increases, a smaller fraction of potential starting configurations get passed on to the evolutionary algorithm.In particular, apart from choosing the flux parameters that we initialize to have the same sign, as suggested in [21,Footnote 17], finding a good M is otherwise done by a random search.The probability of randomly selecting initial parameters that lead to an M with all positive eigenvalues decreases as the number of centers (and thus the dimension of M) increases.Therefore, it would be interesting to explore more efficient ways of selecting parameters that lead to a positive-definite M; we leave this task to future work.
To conclude, we have implemented a novel application of evolutionary algorithms and Bayesian optimization to the study of multi-centered solutions to supergravity, and have presented two state-of-the-art machine-precision numerical solutions.Our algorithm could be used to create larger families of numerical multi-center solutions, which could in turn be used in phenomenological models [58][59][60].The prospects for harnessing the power of computer science algorithms to solve physically interesting problems in String Theory and related fields appear bright, with an exciting future ahead.

A
, we consider the individual with position r′(p)A given by: r ′ (p) A = 2r A − r the population (only) the fitter of the two individuals.We initialize the strategy parameter as σ (p) A = var pos for all p, A.

Figure 1 :
Figure 1: Fitness of the fittest five-center configuration, and average fitness of the population over the generations.The final plateau occurs at machine precision, as discussed in the text.The maximum fitness obtained is f ≃ 2 × 10 10 , corresponding to a dimensionless relative error of 1 × 10 −15 .

Figure 2 :
Figure 2: Fitness of the fittest seven-center configuration, and average fitness of the population over the generations.The final plateau again occurs at machine precision: the maximum fitness obtained is f ≃ 1.7 × 10 10 , corresponding to a dimensionless relative error of 5.7 × 10 −15 .

Figure 3 :
Figure 3: Random search over the initial configuration described in Tables2 and 3. We repeat the analysis for four different values of var pos, which are denoted with σ in the plot's legend.
update Q max Update the surrogate function with the new data n++ there are no CTCs in the configuration at this point.If and only if these two tests are passed, we proceed to the evolutionary algorithm.

Table 1 :
Hyperparameters of the algorithm, optimized for five-center configurations.

Table 2 ,
that give a configuration with global charges Q 1 ≃ 2938 and Q 5 ≃ 2016.

Table
define a numerical solution to the inhomogeneous bubble equations (2.6) with fitness f ri a ≈ 4.26.The configuration respects the condition regarding the possibility of a nearby scaling solution discussed around Eq. (3.3).Moreover, the matrix

Table 3 :
Solution of the homogeneous bubble equations, with the input parameters given in Table2, after rounding.

Table 4 :
Output of the evolutionary algorithm with highest fitness, for the five-center configuration.Similarly to those in Table2, these coordinates are in units of 10 −5 . .

Table 6 :
Input parameters of the solution.Similarly to those in Table

Table 7 :
. Solution of the homogeneous bubble equations, with input parameters given in Table6, after rounding.

Table 8 :
. Output of the evolutionary algorithm with highest fitness, for the seven-center configuration.Again, coordinates are in units of 10 −5 .