Artificial Neural Networks as Trial Wave Functions for Quantum Monte Carlo

Inspired by the universal approximation theorem and widespread adoption of artificial neural network techniques in a diversity of fields, feed‐forward neural networks are proposed as a general purpose trial wave function for quantum Monte Carlo simulations of continuous many‐body systems. Whereas for simple model systems the whole many‐body wave function can be represented by a neural network, the antisymmetry condition of non‐trivial fermionic systems is incorporated by means of a Slater determinant. To demonstrate the accuracy of the trial wave functions, an exactly solvable model system of two trapped interacting particles, as well as the hydrogen dimer, is studied.


I. Introduction
Although the stationary Schrödinger equation describes the most basic form of quantum mechanics, finding accurate eigenstates and eigenvalues of the equation remains a hard problem when realistic physical systems are concerned. Especially the important contribution of the so-called correlation energy requires expensive computations when high accuracy is desired, which typically exhibit high-order polynomial or even exponential scaling of effort with number of particles, rendering application to larger systems of particles infeasible.
One class of methods for solving the many-body Schrödinger equation efficiently are the so-called variational Monte Carlo (VMC) methods [1], which rely on optimizing parametrized trial wave functions according to the Rayleigh-Ritz variational principle and evaluating the apparent high-dimensional integrals by the means of Monte Carlo (MC) integration. Typically, the approach involves the postulation of problem-specific trial wave function formulations, in an attempt to reduce the number of required variational parameters by making use of physical intuition. The apparent downside of such an ansatz are the need for physical intuition and the inflexibility of a specialized trial wave function, which is particularly problematic when, for example, phase transitions are meant to be investigated.
In this work we want to consider an overall similar, but conceptionally different approach. Instead of crafting specific trial wave functions, we want to investigate a novel class of very general trial wave functions built from Artificial Neural Networks (ANN) and in particular Feedforward Neural Networks (FFNN). This idea is fueled by the Universal Approximation Theorem, which states that FFNNs can approximate any continuous func- * Jan Kessler: ithanil@mail.uni-paderborn.de tion on a finite measure to arbitrary accuracy [2]. Furthermore, ANNs already prove to be a successful "blackbox" approach for a variety of problems in different fields and they are seen as well-suited for dealing with highdimensional input [3]. In turn however, the number of variational parameters within FFNNs grows large quickly and some need for intuition is moved to the question of how particle coordinates are presented to the FFNN and what activation functions are to be used in the network.
The (to our knowledge) first successful application of ANNs as wave functions for simple one-, two-and three-dimensional systems was published already about 20 years ago [4], but the bulk of work in this direction is very recent (2017-2018) [5][6][7][8][9][10][11][12]. However, none of these works consider the use of FFNNs within a VMC framework in continuous space, so the method that we present can be considered novel.
In summary, we aim to present a new method for approximating the ground states of Schrödinger equations in continuous space, by making use of FFNNs in our trial wave functions. To train the employed neural networks, we devise a gradient-based VMC optimization scheme. Both components of the method promise beneficial scaling of computational effort with dimensionality and allow for massively parallel execution, providing efficient use of modern supercomputers. Finally, no input of any existing data or knowledge is required in our approach, rendering it a true ab-initio method.

II. Methods
In this section we want to present our novel method for approximating ground state wave functions of continuous quantum systems by employing basic FFNNs as trial wave functions.
The first cornerstone of our method is the Rayleigh-Ritz variational principle, justifying a systematic gradient-based strategy to train the employed neural net-arXiv:1904.10251v2 [physics.comp-ph] 10 May 2019 works. We compute the apparent integrals by the means of importance sampled MC integration, resulting in a VMC optimization scheme. This shall be explained in more detail in the first following subsection.
Subsequently, we describe how the specific neural network wave functions (NNWF) used in this work are constructed. For that purpose, we present multiple possible formulations of such wave functions, going from a straightforward and agnostic ansatz to more sophisticated and/or problem-specific versions.
Finally, we explain the gradient-based stochastic optimization scheme used for minimizing the energy of our NNWFs.

II.1. Variational Monte Carlo
From the non-relativistic time-independent spatial Schrödinger equation we know that the exact spectrum Ψ α (R) of stationary wave functions is given by the eigenstates of the Hamilton operatorĤ, which describes the physical system of concern. The corresponding eigenvalues E α are the energies of the respective states. The state with the lowest energy E 0 is called ground state and labeled as Ψ 0 (R). However, for most realistic many-body Hamiltonians it is impossible to exactly solve Eq. 1 and determine the exact spectrum of eigenstates, so the problem is to find accurate approximative representations of Ψ α (R). To approximate the ground state Ψ 0 (R) in a systematic way, we make use of the Rayleigh-Ritz variational principle where Ψ T = Ψ T (R; β) is a parametrized trial wave function with parameters β and E T being the corresponding energy. Since E T is an upper bound of the true ground state energy E 0 , the variational principle immediately suggests that we can optimize Ψ T by varying β so as to minimize E T . The trial wave functions Ψ T (R) are functions of D * N particle coordinates, where D is the dimensionality of the space and N the number of particles. Computing arbitrary observables O = Ψ T |Ô|Ψ T (e.g. E T ) that arise from such many-body states requires integrating over R D * N space, which results in solving rather highdimensional integrals. Because of its favorable scaling with dimensionality, we employ MC integration for the numerical evaluation of these integrals.
The rightmost term of Eq. 2 immediately suggests how to compute the trial wave function's energy E T by importance sampled MC integration, in the fashion of a Metropolis-Hastings algorithm [13]. In fact, given the trial wave function Ψ T (R), it is sufficient to compute the  local energy using coordinates R i that are randomly sampled from the probability distribution |Ψ T (R)| 2 and to average over them, i.e.
In the equation above ≈ is used to indicate that the two values are equal within the statistical uncertainty due to the stochastic sampling. Notice that because Eq. 4 can be understood as a sum of independent summands, it can be evaluated in a massively parallel fashion by using multiple statistically independent random walks R i . This scheme is typically referred to as VMC method. To optimize the parameters of the trial wave function, we use a gradient-based stochastic optimization algorithm, described in more detail in subsection II.3.

II.2. Feedforward Neural Network Wave Functions
In the following, we describe how we construct trial wave functions for VMC simulations from basic FFNNs. The structure of such a network is illustrated in Fig. 1. The emplyoed neural networks can be divided into input, hidden and output layers, where each layer itself consists of multiple units, typically called artificial neurons or just units. Every such unit in the network signals a value to at least one other unit in the "next" layer via unidirectional connections, leading to a flow of information from the input (particle coordinates R) to the output layer (wave function value Ψ T (R)).
In the input layer, D * N input units are utilized to signal the coordinates R to all units of the next layer, the first hidden layer. Additionally, there is one offset unit, signaling a static 1 in our case. Such an offset unit is present in every hidden layer as well. In the following hidden and output layers, the non-offset units are characterized by a propagation function p(x; β) and an activation function a(p). For these units the employed propagation function is a weighted sum of outputs x i of all units from the previous layer, i.e.
where β i are the weights associated with each incoming connection. These weights are the variational parameters of our trial wave function. The result of the propagation function is then used as an input for the unit's activation function a(p), to compute the final output value of the neuron, which is again transmitted to the next layer (if present). While in principle most non-linear functions are usable as activation functions, after some exploration (see appendix) we decided to use the following two hidden unit activation functions in all of our simulations: Notably, these two activation functions exhibit opposite symmetry and different non-linearity. Also note that they are mathematically smooth, unlike some other frequently used neural network activation functions (e.g. rectified linear functions). However, our choice is certainly just one possible choice out of many. For the output unit we employed the exponential activation function a E (p) = exp(p), (6) which in general appears to be rarely used in ANNs. But, inspired by the omnipresence of exponentials in wave function formulations, we considered it for our use case. Using this activation function on the output unit means that the FFNN has strictly positive output values, which will be relevant in some of the following discussions. For informations about results with different activation function choices, we refer to the appendix.

II.2.1. Simple NNWF
For a first straightforward NNWF approach, we can feed the raw particle coordinates R directly into the described network via the input layer and employ the network's output as the wave function Ψ(R). For simplicity, using just a single hidden layer with n h units (including an offset), we can write such a wave function as where a o is the chosen output activation function (a L or a E ) and a h,i the selected hidden layer activation functions (a T and/or a G ). The corresponding connection weights within the output and hidden units, which will be determined by the training process, are denoted as β o and β h,i , respectively. A NNWF constructed in this way is (given smooth activation functions) notably a smooth function, which would be expected of a physical wave function for finite potential energy. However, it exhibits no guaranteed symmetry under particle exchange. Yet, if the physical system of concern consists of distinguishable particles, the latter property is not directly an issue. But, the implications for the application to indistinguishable particles requiring bosonic or fermionic symmetry needs to be discussed. For example, a spatial wave function for (spinless) bosons is required to be strictly symmetric under arbitrary permutations of particle coordinates. However, it is known that for systems with local and exchange-symmetric Hamiltonians without degeneracy and real-valued eigenstates, there is no other state with a lower energy than that of the symmetric ground state [14]. In this case, by minimizing the energy, we would expect an asymmetric NNWF to approximate the exchange-symmetric ground state, thus becoming an approximately symmetric wave function.
Although this ansatz may seem a bit crude, we still consider it interesting for a test, as it is the maximally agnostic or "naive" approach and especially because it does not entail any additional computational cost for the explicit symmetrization. In the following, we will refer to it as "simple NNWF".

II.2.2. (Anti-)Symmetrized NNWF
Even though, it appears to be a potential option to use simple coordinate-fed FFNNs directly as wave functionswithout guaranteed exchange symmetry -to approximate certain symmetric ground states, success is unlikely if an approximation to an antisymmetric ground state is needed. Typically, the antisymmetric ground state is of higher energy than the corresponding symmetric ground state, so minimizing the energy of an asymmetric NWNF is expected to yield an approximately symmetric state. Moreover, in the special case of strictly positive valued output activation functions, the simple NNWF cannot possibly learn an antisymmetric state, by definition.
The direct way of creating guaranteed exchange symmetric or antisymmetric wave functions from asymmetric base functions would be to apply the symmetrization or antisymmetrization operators with S N being the symmetric group of permutationsP and π denoting the parity of the individual permutation. For the two-particle systems concerned in this work, we can directly write the resulting NNWF as Constructing the NNWF in this way guarantees the desired exchange symmetry. However, it must be noted that the symmetrization operators implies a factorial computational scaling with the number of particles, which is prohibitive for more than a few particles. Nevertheless, for the present two-particle applications, we habe included the results obtained with this ansatz, as a reference for comparison.

II.2.3. Product NNWF
So far, we have described a general method to obtain (in principle) arbitrarily exact approximations of realvalued spatial ground state wave functions from firstprinciples by taking only exchange symmetry into account. No further considerations about the physical system of concern were implemented into the construction of our trial wave functions.
While such a general and agnostic method has its merits, we would not want to stay away from more specialized variants, at least when they prove to be easy to realize and more efficient. One possible idea in this direction is to utilize a NNWF Ψ N N (R) merely as a modulating factor for an imposed problem-specific base wave function Φ(R), i.e.
The base wave function Φ(R) could in principle be freely chosen, but we expect better results when it is already similar to the ground state that is to be approximated. Furthermore, Φ(R) can be used to directly define the sign of the full product Ψ P (R), given that a strictly positive valued Ψ N N (R) is used. Then Ψ P (R) has the same sign as Φ(R) for all R and is equal to zero only at the same nodal surface {R : Φ(R) = 0}. This property effectively restricts the set of wave functions that such a NNWF can accurately approximate, at least if Φ(R) actually has nodes.
To approximate bosonic ground states, we consider a symmetric product of identical single-particle orbitals φ(r) as a base wave function, leading to where r i depict the coordinates of all individual particles. We will refer to Ψ BP (R) as "bosonic product NNWF". For fermionic ground states, however, we use an antisymmetric Slater-determinant Φ SD of given singleparticle orbitals φ i (r) as the base wave function: We denote Ψ F P (R) as "fermionic product NNWF". As already alluded to above, given a strictly positive Ψ N N , the nodes of Ψ F P (R) are defined by Φ SD (R). This is to say that as long as Φ SD (R) does not change during the optimization, the nodes are fixed. If the nodal surface should be variable instead, the considered orbitals need to contain variational parameters to be optimized together with the FFNN weights. One known possibility in this direction is the use of backflow orbitals [12]. However, for the applications in this work, we are able to rely on a fixed node ansatz. ... We have to mention though that at this time we have no mathematical proof at hand which would guarantee that the variational principle holds for Ψ F P (R) when using a non-symmetric Ψ N N , i.e. that there is no asymmetric state reachable with a truly lower energy than the exactly antisymmetric state. Nevertheless, we decided to test this NNWF formulation in our applications, because it promises to solve the "antisymmetry problem" at negligible computational cost compared to the previously discussed full antisymmetrization and still retains the simple use of raw coordinates R.

II.2.4. Symmetric-Featured NNWF
Another possible optimization approach concerns the way information is fed to the actual FFNN. Until now we simply used all D * N raw particle coordinates R as input, but this choice might not be the ideal one in all cases.
For example considering exchange-symmetry, it is easy to see that the raw coordinates contain some redundant information, because for the wave function it doesn't matter if particle 1 is at x 1 and particle 2 at x 2 or if it is the other way around. Using a symmetric representation of particle coordinates instead would guarantee exchange symmetry of the NNWF and potentially increase accuracy. And in general, an inspection of the specific Hamiltonian might lead to more efficient representations of particle coordinates.
We implement the possibility of such representations by replacing the D * N particle coordinates R j in Eq. 7 with n f functions f j (R) ("features"), resulting in for a network with only a single hidden layer. In this work, we want to restrict us to exchange-symmetric sets of features and therefore refer to such NNWFs as symmetric-featured NNWFs. Notably, employing a symmetric-featured NNWF as left side of a bosonic or fermionic product NNWF yields a strictly symmetric or antisymmetric wave function (symmetric-featured bosonic/fermionic NNWF). At this point we want to remind that the set of f j (R) must be chosen with some care. If relevant information is lost here, the NNWF might not be able to approximate the true ground state anymore, no matter how large the employed FFNN is. Also notable, the continuity and differentiability NNWF will depend on the respective properties of the functions f j (R). For example, employing certain input sorting methods as f j will yield exchangesymmetric NNWFs straightforwardly, but for more than one space dimension questions arise about continuity and differentiability of the resulting NNWFs. We want to avoid such discussions in this paper and postpone the use of input sorting to future work.
We present explicit, problem-specific realizations of features f j (R) in section III.

II.3. Adam-VMC Optimization
A NNWF may contain hundreds or thousands of variational parameters and we aim to adjust them in an effort to approximate the true ground state wave function as closely as possible, via an algorithm. A notable complication are the unavoidable statistical errors present in both our cost function and respective gradients. To deal with this high-dimensional stochastic optimization problem, we employ the algorithm Adam [15]. It is a gradient-based optimization algorithm suitable for noisy gradients, that internally builds up first and second order weight momenta, providing an adaptive learning rate for each of our weights β i . We used the algorithm as proposed by its authors, including use of the suggested exponential moving average to obtain the optimized weights (section 7.2 in [15]), for reduced influence of noise on the final optimization result.
As cost function for optimization we compute the energy E T as in Eq. 4 and add a (small) L 2 -regularization term for the variational parameters: The gradient of the energy E T can be derived analytically as This means that in our MC integration scheme we can compute the gradients G i of our cost function C as where the summands of sums N M C are evaluated along the importance sampled random walk, as discussed in section II.1.
In conclusion, we have devised a gradient-based optimization scheme to minimize the energy of FFNN-based trial wave functions for a given Hamiltonian. The variational principle guarantees that this energy-minimization scheme is a valid strategy to approximate the ground state wave function of the system.

III.1. 2-particle harmonic trap with soft-core interaction
For a first application of our NNWFs on a toy model, we selected a system of two particles in an onedimensional harmonic potential, interacting via a potential of finite-range and constant-value. Such a system can be described by the Hamiltonian where we are using atomic units, harmonic oscillator frequency of 1 a.u. and the soft-core potential This model is a strongly correlated system, but still it was shown to be exactly solvable for both bosonic and fermionic symmetry [16]. The recipe for exact solution allowed us to obtain the exact energy eigenvalues and eigenstates for validation of our simulation results. The soft-core interaction between particles can be either attractive or repulsive and both potential range and strength can be chosen freely, providing us a rich testing environment for our NNWFs. While the Hamiltonian 18 is all that is required to begin VMC optimization of the simple Ψ N S (x) or (anti-)symmetrized Ψ S/AS (x), for the product formulations Ψ BP (x) and Ψ F P (x) we need to propose single-particle orbitals. To that extent, we consider the non-interacting system (i.e. V = 0), namely a harmonic oscillator, which has well-known single-particle eigenstates given by the Hermite functions ψ n . Relevant for us are the first two orbitals From the first or both of these orbitals we build the right sides of bosonic or fermionic product NNWF as in Eq. 12 or 13, respectively. In the case of symmetric-featured NNWFs we employ as input to the FFNN, instead of raw coordinates x. For the given Hamiltonian, all relevant information are retained in this representation, it guarantees bosonic exchange symmetry, spatial symmetry corresponding to the Hamiltonian and it is mathematically smooth. We use this representation both for symmetric-featured simple NNWFs and for the left side of symmetric-featured bosonic/fermionic product NNWFs.

III.2. H2 Molecule
As a more realistic application we study the hydrogen molecule H 2 in Born-Oppenheimer approximation, i.e. we consider the electronic Hamiltonian (in atomic units) where r denote the variable electronic coordinates and R the static protonic coordinates.
Although above system consists of electrons, which are fermions, their spatial wave function must be exchangesymmetric whenever their spin wave function is antisymmetric (singlet state). In fact, the state with lowest energy is a singlet and therefore requires an exchangesymmetric spatial wave function. Hence, we may directly employ the simple NNWF or use a bosonic product NNWF. For the latter we use as single-particle orbital the bonding molecular orbital where r are the single electron coordinates and R i the two protonic coordinates. Unlike for the previously discussed trap, the exact ground state of the hydrogen molecule is not known. Hence, as a reference we used energies computed by fullconfiguration-interaction [17], which can be considered exact for our purposes. Also unlike for the trap system, we don't devise symmetric feature coordinates, because although it is straightforward to do for the simple hydrogen molecule, we think it is more interesting to revisit the matter in future research on larger systems.

IV. Computational Details
We used the following basic scheme for all reported simulations: 1. Least-square fit the FFNN to a first-guess solution 2. Repeatedly use Adam-VMC scheme to optimize NNWF iteratively

Identify the result with lowest upper energy confidence bound
The first step initializes the NNWF to a well-behaved state before the actual VMC optimization is performed, using relatively fast least-squares fitting. This procedure is not strictly necessary, but increases efficiency and stability of the subsequent VMC algorithm, especially when the chosen initial state is already close to the true ground state solution. Except for product-type NNWFs, we used a product of Gaussian single-particle orbitals to perform this step. Such an initial state appears to be a straightforward first guess for systems with localized ground states. For product-type NNWFs we skipped the first step and started with weights initialized to small random numbers. Note that for technical reasons we didn't use any symmetrizing operators during fitting, even when they were used in the following VMC-optimization. Starting with the pre-fitted FFNN from step 1 employed as NNWF, in step 2 we executed the previously described Adam optimization, with parameters α = 0.002, β 1 = 0.9, β 2 = 0.9, = 10 −8 , λ r = 0.005. The variational parameters of the NNWF were updated according to Adam rule until the last 100 optimization steps yielded constant energy, with respect to integration error. With the resulting averaged variational parameters, a final MC evaluation of the respective energy was carried out. Afterwards, the Adam-VMC procedure was restarted, beginning with the final result from the previous optimization run. This scheme was repeated until reaching a consistently fixed runtime limit of 8 hours for the trap system (16h for Ψ S and Ψ AS ) and 24h for H 2 (48h for the NNWF with 25x25 hidden units).
Finally in step 3, we chose out of the individual energy results of all chained optimization runs the one with the lowest upper energy confidence bound with σ M C being the integration error estimation. We do this selection, because the last optimization run is not necessarily the one with the best energy, due to the stochastic nature of the optimization. The best NNWFs selected in step 3 were then employed in extensive MCsampling runs to evaluate the energies and other observables reported in this paper. The calculations of energy and gradient at each optimization step were carried out via 4 * 10 6 MC iterations, the final energy evaluation of each chained optimization run via 8 * 10 7 iterations and the concluding observable sampling runs via 10 9 iterations, distributed among 16 CPU cores in all cases.
For the MC sampling we used all-particle moves proposed from a uniform distribution, with a maximal step size calibrated for ≈ 50% acceptance probability during 2500 equilibration steps per thread, following a random walker initialization before every MC integration.
For better stability when using NNWFs, we confined the walkers to a periodic interval or box with edge length −10 < x < 10 Bohr, large enough to still allow proper simulation of the non-periodic systems of interest.
All FFNNs used to obtain our application results contained two hidden layers, each with 12 hidden units and 1 offset unit. A comparison of results for different network sizes can be found in Fig. 11 (appendix).
To realize our NNWF optimization in practice, we wrote own C++ libraries for all involved tasks. All relevant libraries are linked in the README of the DCM-UPB/NNVMC repository on GitHub (https://github.com/DCM-UPB/NNVMC, not public yet!). Access to our actual application programs and/or data can be provided on request.
V. Results

V.1. 2-particle harmonic trap with soft-core interaction
Using the previously described optimization scheme, we applied all of our NNWF formulations to the softcore harmonic trap system, for various choices of Hamiltonian parameters a (potential range) and V (potential strength). Exemplary, Fig. 2 shows the resulting energy values for simple bosonic and fermionic product NNWFs, in comparison with the exact bosonic and fermionic ground state energies. At least on the scale of these plots, the energies obtained from the NNWFs perfectly match the exact energies, for all considered Hamiltonian parameters. The same holds true for all other employed NNWFs, so we refrain from displaying them in the same way.
For a more in-depth analysis of the results, we split our NNWF formulations into two groups: 1. Asymmetric NNWFs without strict exchangesymmetry, i.e. simple NNWF and simple bosonic and fermionic product NNWFs.
For both groups we will display all resulting energy residuals E T − E 0 , i.e. the difference between VMC energy result E T and true ground state energy E 0 . However, because the first group is technically prone to exhibit significant asymmetry, for these NNWFs we will also check positional observables and a measure of how well the desired exchange-symmetry is realized.

V.1.1. Energy residuals
Beginning with the group of asymmetric NNWFs, we display their energy residuals in Fig. 3a. Note that the residual axis is scaled logarithmically to compensate for the large range of values. Furthermore, for any producttype NNWF the interaction-free case V = 0 is not shown, because the imposed right side of our product NNWFs is already exactly the ground state of that system. In Fig. 3a we are mainly looking for effects of using the simple bosonic-product NNWF (NS-BP) over the simple NNWF (NS). Overall, it appears that there is no clearly superior ansatz among the two. However, the product ansatz yields consistently similar or lower energies whenever the soft-core potential is attractive or low by absolute value. In turn it yields similar or higher energies for strong repulsive interaction. This behavior can be understood as a verification of our assumption that the product-type NNWF perform better when their imposed right side is more similar to the true ground state. For our bosonic product NNWF we are using the bosonic ground state of a two-particle harmonic oscillator, which with some hand-waving can be said to be a first guess that tends to fit better when the inter-particle interaction is attractive or small and worse for strong repulsive interaction (also see Fig. 3d).
Looking at the simple fermionic product NNWF (NS-FP) which uses the fermionic ground state of a twoparticle harmonic oscillator, we get the opposite picture instead (as expected): It tends to perform better for repulsive interaction than for attractive interaction. Especially noteworthy is that the simple fermionic product NNWF predicted accurate (residual < 10 −3 Ha) fermionic energies without applying an antisymmetry operator or using special coordinates to guarantee antisymmetry of the product. Now turning to the second group of wave functions, we display the energy residuals of all strictly exchangesymmetric NNWFs in Fig. 3b and those of all strictly exchange-antisymmetric NNWFs in Fig. 3c. One main observation is that our symmetric-featured NNWF (SF) performs only marginally different to either simple (NS) or symmetrized (Sym) NNWFs. However, if we use a network with symmetric feature input for the left side of the bosonic product ansatz (SF-BP), we get energy results that are consistently similar or better than any other bosonic NNWF ansatz that we used. Similarly, the symmetric-featured fermionic product NNWF (SF-FP) performs better than the other fermionic NNWFs (NS-FP and FSym).
Concluding, we found the product-type NNWFs to of-fer superior accuracy over the simple NNWF, at least when the imposed base wave function is already similar to the desired ground state. Furthermore, at least in combination with a product ansatz, we found the use of symmetrized features as neural network input beneficial for resulting energies. In any case, symmetrized feature input is beneficial by guaranteeing the desired exchangesymmetry or antisymmetry (when used in fermionic product), without the factorial scaling of computational cost as with symmetrized or antisymmetrized NNWFs. And finally, although there do exist differences in energy accuracy for the various NNWF types, we find it noteworthy that the same FFNN structure was employed in all these cases and yielded satisfactory results without exception, even when all we did was just feeding raw coordinates to a FFNN and using the output as wave function (simple NNWF).

V.1.2. Asymmetry analysis
Although we have now verified that our NNWFs approximate the desired ground state energies, we don't know yet whether the approximation holds for other observables. In particular, we want to validate the correct exchange-symmetry and positional observables of the non-symmetric group of NNWFs. For this purpose, we computed the positional observables x , x 2 (Fig. 4) and the exchange symmetry measure (Fig. 5) which yields 1 for a symmetric Ψ, −1 for an antisymmetric Ψ and 0 for a completely asymmetric Ψ. To complete the picture, we also computed the particle density n(x) = i δ(x i − x) (Fig. 6). Looking at the results, it becomes immediately obvious that for certain Hamiltonian parameters a 0, V 0 the observed exchange-symmetry and position expectation values do not match the expectations. Because we are considering indistinguishable particles and an external potential that is symmetric around the origin, we would expect both position expectation values x to be the same and zero, regardless of Hamiltonian parameters. In contrast though, for said choices of a, V we find the non-symmetric NNWFs to predict the two particles being located on opposite sides of the trap, i.e. x 1 = − x 2 = 0, and the symmetry measure is vanishing simultaneously. At the same time the energy E T and other expectation values like x 2 and n(x) remain close to expectation. These properties indicate that the NNWF actually learned a quasi-degenerate state of crystallized distinguishable particles, instead of the targeted bosonic or fermionic ground states. Notably, we observe this phenomenon only in a regime where bosonic and fermionc ground states itself become increasingly quasidegenerate, which is also nicely illustrated by the density profiles in Fig. 6. The behavior of our non-symmetric NNWFs for large V is fortunately not surprising, considering the analysis of the exact Hamiltonian eigenstates in the hard-core limit V → ∞ [16], where the states can be exactly constructed from single-particle orbitals located on the left and right sides of the trap.
Concluding, we learned that if we chose to employ NNWFs without strict exchange-symmetry to describe indistinguishable particles, the VMC optimization of said NNWF might prefer an asymmetric state of distinguishable particles whenever the corresponding energy is sufficiently close to the ground state energy. Hence, when for some reason using symmetric features is not an option and a non-symmetric NNWF must be used, one may check a diagnostic like Eq. 25 to verify the desired symmetry, if it is of practical relevance. However, if exchange-symmetry of the NNWF can be guaranteed, e.g. by providing the used FFNN with exchangesymmetric input, the phenomenon discussed in this subsection is directly avoided.

V.2. H2 Molecule
For the hydrogen molecule, we considered both simple NNWF (in two different sizes) and simple bosonic product NNWF (as described in III.2), and obtained optimized energies for a range of protonic distances R HH = |R 1 − R 2 |. The resulting energy curve E T (R HH ) represents an approximation to the Born-Oppenheimer potential of hydrogen and is displayed in Fig. 7a, with remaining differences to reference energies E 0 (R HH ) [17] displayed in Fig. 7b. Apparently, all employed NNWFs yield energy curves that qualitatively resemble the reference Born-Oppenheimer potential, with energy residuals ranging between 10 −1 and 10 −4 Ha, but differing by mostly 1-2 orders of magnitude between the two different NNWFs of same size. Despite the differences in energy accuracy, all NNWFs correctly predict the lowest energy to be at data point R HH = 1.4011 a 0 , with neighboring data points set at 1.36 and 1.44 a 0 . This suggests that NNWFs might be especially useful in molecular geometry optimization. However, in contrast to the bosonic product NNWF, the simple NNWF of same size also predicts an energetical maximum at around R HH = 4.8 a 0 , an unphysical feature which is shared with other trial wave function formulations (ref?). However, as we can see from the results of the larger simple NNWF, that feature can be effectively suppressed by increasing the size of the FFNN. In contrast to the results for the soft-core harmonic trap system, here we observe simple and simple bosonic product NNWF formulations achieving consistently and remarkably different energetical accuracy, when compared for the same FFNN size. We assume that this observation is not only a manifestation of singularities in the present Hamiltonian (TODO: Cusp), but also of the fact that the right side of our bosonic product NNWF formulations is varying with Hamiltonian parameter R HH in the case of H 2 , while it is static in the case of the soft-core harmonic trap.
To complement the energy results, we display the ESM (Eq. 25) diagnostic for all employed NNWFs in Fig. 8. While all NNWFs learned approximately exchange-symmetric states for R HH < 3 a 0 , all NNWFs began to learn asymmetric states starting from some distance 3 a 0 < R HH < 6 a 0 , with the simple NNWFs becoming asymmetric earlier than the bosonic product NNWF. Just as in the case of the soft-core harmonic trap system, the asymmetry stems from a localization of both particles on opposite sides of the system (here along the H-H axis), i.e. the NNWFs adopted a state of distinguishable particles.

VI. Conclusion
We have presented a VMC method using FFNN-based trial wave functions (NNWF) in continuous space to approximate the ground states of N-particle Hamiltonians to (in principle) arbitrary accuracy. We have formulated several versions of such FFNN-based trial wave functions, with different properties regarding their exchangesymmetry and practicality. All wave function formulations were tested on a correlated, but exactly solvable, 2-particle system, the soft-core harmonic trap. In all cases the exact ground state energies were predicted with reasonable accuracy, considering the use of a relatively small FFNN. Nevertheless, we could show that including problem-specific knowledge within the NNWF construction offers potentially superior accuracy over a more simple, but general approach. This distinction in accuracy between the different NNWFs was more pronounced when we considered the Hydrogen molecule in Born-Oppenheimer approximation, but still even the simple NNWF would yield a qualitatively adequate prediction of the Born-Oppenheimer potential.
We observed that the NNWFs without guaranteed exchange-symmetry were prone to learn states of distinguishable particles whenever such a state was energetically close to the ground state of indistinguishable particles, however the practical relevance of this behavior remains questionable. We want to look at it again within future studies of many-electron systems and especially investigate methods for providing exchange-symmetric input to the FFNN, preventing the asymmetry inherently.
In conclusion, we have demonstrated that a NNWF represents a flexible trial wave function that can be straightforwardly applied to a variety of Hamiltonian settings, without necessarily requiring problem-specific adjustments. While these trial wave functions certainly do not compete in accuracy and efficiency with other modern methods when applied to few-electron systems, we suppose there might be a reasonable use of NNWFs for larger systems, complex Hamiltonians or whenever a very flexible trial wave function is required.
Despite the satisfying success of our method, there are clearly countless improvements possible and probably necessary for application to realistic systems. On the technical side, we expect to achieve a significantly speed-up optimization process by employing more suitable differentiation techniques to compute the gradient, e.g. by a reverse-mode differentiation scheme. However, our currently used forward-accumulation scheme will most likely remain an efficient choice for the second order input derivatives, which are required to compute the exact kinetic energy of the NNWF. It should be mentioned that there are countless different ANN types, NNWF formulations or combinations with other methods thinkable, and we considered only a small subset of what is possible. Our energy-gradient-based approach to optimizing the NNWF is not the only choice thinkable. Overall, this leaves a lot of interesting opportunities for future research.
Finally, it is worth noticing that the (anti-)symmetry requirement of the trial wave function can be considered as a serious limitation of our method, because the straightforward and general way of constructing (anti-)symmetric NNWFs (Eq. 8) is computationally not feasible for many-body applications. Fortunately, the fact that the lowest energy state is always bosonic makes it possible to simulate bosons by relying on that the optimization will make the NNWF approximately symmetric. This happens in other forms in all other QMC imaginarytime projection methods, like Diffusion Monte Carlo, and can be systematically improved by a more accurate tuning of the parameters (ref?).
When one is interested in the simulation of fermions however, the problem becomes more serious. In fact, the approach we used in this work introduces a fixed-node approximation. This means that the quality of our solution will depend on the nodal surface resulting from the choice of the functions that form the Slater determinant. In other words, we are hitting the fermion sign problem from a new perspective, and the hope to find a breakthrough that allows for unbiased and computationally affordable simulations is to be considered almost null. However, due to the intrinsic flexible nature of the NN, there is hope to find a novel systematic approach to reduce the fixed-node bias.

?
A. Settling for a network structure

Activation Functions
Before we started production of data for this paper, we first investigated the influence of activation function choices on the achieved energy accuracy. Besides the a T and a G hidden layer activation functions mentioned in section II.2, we also tried to use the identity function for some of the hidden units: Besides the exponential activation function, for the output unit we also tried the commonly employed logistic activation function a L (p) = 1 1 + exp(−p) , which is bounded between 0 and 1. Note that this range restriction is not a problem in our VMC method, as the scale or normalization of the wave function is not relevant.
We dismissed the common choice of linear output activation spanning full R, because in our testing we found it yielding unfavorable oscillations around value 0 for input ranges where the wave function has a small amplitude. Furthermore, in the bosonic case we would want to avoid a wave function with variable sign anyway.
Energy results for the harmonic soft-core trap and certain combinations of hidden layer and output activation functions are displayed in Fig. 9. Results for the combination of a T and a G hidden layer activation function with either a L or a E on the output are displayed in Fig. 10.

Hidden Layer Structure
Similarly, we investigated how the accuracy changes when changing the size and number of hidden layers. An overview of our results is shown in Fig. 11.    7  13  19  7x7  7x13  13x7  13x13  7x7x7 FIG. 11. Like Fig. 9, but comparing different numbers of hidden layers and hidden units for the case of T/G-E activation function configuration.