M‐ENIAC: A Physics‐Informed Machine Learning Recreation of the First Successful Numerical Weather Forecasts

In 1950 the first successful numerical weather forecast was obtained by solving the barotropic vorticity equation using the Electronic Numerical Integrator and Computer (ENIAC), which marked the beginning of the age of numerical weather prediction. Here, we ask how these numerical forecasts would have turned out if machine-learning-based solvers had been used instead of standard numerical discretizations. Specifically, we recreate these numerical forecasts using physics-informed neural networks. We show that physics-informed neural networks provide an easier and more accurate methodology for solving meteorological equations on the sphere, as compared to the ENIAC solver.


Introduction
Numerical weather prediction has been the backbone of modern meteorology for more than 100 years (Bauer et al., 2015; Bjerknes, 2009; Richardson, 1922). It was first shown feasible in the seminal work of Charney et al. (1950), although the dream of numerical weather prediction is much older, dating back to Richardson (1922) more than 100 years ago. The first successful numerical weather forecast was carried out on the Electronic Numerical Integrator and Computer (ENIAC), using a highly simplified version of the governing equations of the atmosphere. This simplified model, the barotropic vorticity equation, was solved using a straightforward finite-difference method, and the resulting accuracy of the forecasts was indeed quite underwhelming (Lynch, 2008). Still, in comparison to the numerical forecast attempted by Richardson 30 years earlier, the forecasts by Charney, Fjørtoft and von Neumann were a resounding success that eventually led to the quiet revolution of numerical weather prediction (Bauer et al., 2015) (Figure 1).
The ENIAC forecasts were recreated in the paper by Lynch (2008), and later carried out on a cellphone (Lynch & Lynch, 2008) to showcase the dramatic improvements in processing power, and the potential that this holds for new avenues for numerical weather forecasting. With the recent breakthrough in training deep neural networks (Krizhevsky et al., 2012), and the enormous interest in both using neural networks for solving differential equations (Raissi et al., 2019) and meteorology in general, we argue that it is natural to revisit the ENIAC integrations once again, this time using machine learning. More specifically, in this paper we ask the following speculative question: How would the first successful numerical weather forecasts have turned out, had machine-learning-based numerical solvers been used instead of classical numerical integration? We refer to this task as M-ENIAC, a physics-informed machine learning recreation of the ENIAC forecasts.
While the recreation of the ENIAC forecasts using neural networks may seem contrived, we do believe this is a timely problem to consider for several reasons. As in most other fields of the mathematical sciences, meteorology has seen a surge of interest in deep learning. While neural networks have been considered in this field as early as the 1990s, see (Gardner & Dorling, 1998) for a historical review, only recently was it shown that a transformer-based data-driven model can outperform the state-of-the-art ECMWF Integrated Forecasting System (Bi et al., 2023).
In parallel to the breakthroughs of neural networks in computer vision, they have also been proposed as an alternative to traditional numerical schemes for solving differential equations (Lagaris et al., 1998; Raissi et al., 2019). While this method has enjoyed increasing popularity in recent years, there are almost no applications of it in meteorology; it has so far only been applied to the shallow-water equations on the sphere (Bihlo & Popovych, 2022). Importantly, to the best of our knowledge, this method has not yet been used for weather forecasting using real weather data. As such, we believe it is fitting to return to the problem of weather forecasting with the barotropic vorticity equation, this time using physics-informed machine learning. We will highlight some parallels between the earliest successful weather forecast using finite differencing and the state-of-the-art of using neural networks for solving the same equation, which include both performance and accuracy issues, but also hint at some potential benefits of neural network based numerical discretization methods over more traditional approaches.
The remainder of this paper is organized as follows. In the following Section 2 we review the relevant literature on meteorological applications of machine learning and how neural networks can be used for solving differential equations. In Section 3 we introduce the barotropic vorticity equation and discuss its underlying assumptions along with its limitations for weather forecasting. This section also contains the description of how neural networks can be used to solve this equation. Section 4 presents the main numerical results of integrating the vorticity equation on the sphere using physics-informed neural networks. We wrap up the paper with a summary of our results in Section 5.

Previous Work
Numerical weather prediction was first proposed by Richardson (1922). While impractical at the time, Richardson's ideas on numerically solving the governing equations of atmosphere-ocean dynamics would eventually be put into practice several decades later. The first successful (but highly simplified) numerical weather forecast was carried out by Charney et al. (1950). Here, the authors numerically solved the barotropic vorticity equation on a polar stereographic plane over North America and the North Atlantic. While being worse than the persistence prediction for three out of four 24 hr forecasts carried out (Lynch, 2008), this work sparked considerable investments into the field that over the decades to follow became a resounding success story (Bauer et al., 2015).
Today numerical weather prediction is carried out largely using traditional numerical methods, in particular finite difference, finite element or pseudospectral discretization methods (Durran, 2010). With the emergence of modern deep learning, there has been considerable interest in applying machine learning to meteorology and climate science in recent years. This includes all aspects of these fields, such as data assimilation (Geer, 2021), weather forecasting (Bi et al., 2023; Bihlo, 2021; Pathak et al., 2022; Weyn et al., 2019), ensemble forecasting (Bihlo, 2021; Brecht & Bihlo, 2023; Scher & Messori, 2018), weather nowcasting (Shi et al., 2015), downscaling (Mouatadid et al., 2017), and subgrid-scale parameterization (Gentine et al., 2018). While all of these references focus on data-driven approaches, there have also been first examples for solving the governing equations directly using neural networks (Bihlo & Popovych, 2022). This latter approach, dubbed physics-informed neural networks in the paper by Raissi et al. (2019), aims at approximating the solution of a differential equation with a neural network. For computing derivatives, the method relies on automatic differentiation (Baydin et al., 2018) rather than numerical differentiation, which is readily available in modern deep learning frameworks such as JAX, PyTorch or TensorFlow. This allows computing derivatives in single points, without having to rely on defining a computational mesh first. The method is thus entirely meshless, which makes it potentially attractive for meteorology, which had to devise a plethora of grids (Staniforth & Thuburn, 2012) for solving the governing equations of hydro-thermodynamics on a sphere using mesh-based discretization methods.
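The single-point, mesh-free derivative evaluation that automatic differentiation enables can be illustrated with a minimal forward-mode sketch based on dual numbers. This is an illustration of the principle only, not how JAX, PyTorch or TensorFlow implement it internally:

```python
import math

class Dual:
    """Dual number a + b*eps with eps**2 = 0: carries a value and an exact derivative."""
    def __init__(self, val, der=0.0):
        self.val, self.der = val, der
    def __add__(self, o):
        o = o if isinstance(o, Dual) else Dual(o)
        return Dual(self.val + o.val, self.der + o.der)
    __radd__ = __add__
    def __mul__(self, o):
        o = o if isinstance(o, Dual) else Dual(o)
        return Dual(self.val * o.val, self.der * o.val + self.val * o.der)
    __rmul__ = __mul__

def sin(x):
    # chain rule for the sine function
    return Dual(math.sin(x.val), math.cos(x.val) * x.der)

# d/dphi [phi * sin(phi)] at the single point phi = 0.3 -- no grid required
phi = Dual(0.3, 1.0)                 # seed the derivative of the input
f = phi * sin(phi)
exact = math.sin(0.3) + 0.3 * math.cos(0.3)
print(abs(f.der - exact) < 1e-12)    # True: derivative is exact, not approximated
```

In contrast to a finite-difference stencil, no neighboring points and hence no mesh are needed, which is the property exploited by physics-informed neural networks.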
While the main idea behind physics-informed neural networks is quite elegant, in practice there are several issues that will have to be overcome to make them applicable for the atmospheric sciences. Physics-informed neural networks tend to converge to the wrong solution for longer training times (Wang et al., 2022; Wang & Perdikaris, 2023), and they typically cannot reach the same error levels as traditional numerical methods can achieve; see, for example, Penwarden et al. (2023) for a study on various physics-informed neural networks and their associated error levels. This is likely not a theoretical shortcoming but a practical one, as the optimization methods used for minimizing the physics-informed loss function may simply not yield an accurate enough minimum, resulting in rather large errors of the associated numerical solution. Using tailored optimization methods can vastly improve this situation (Bihlo, 2024). Physics-informed neural networks also take much longer to train in practice than solving the same equations using a traditional numerical method. For example, the physics-informed neural networks being trained here took about four hours each on a single NVIDIA RTX8000 GPU, while the re-created ENIAC solver from Lynch (2008) solves the same problem in a fraction of a minute. For meteorological applications it should also be stressed that we are not aware of any physics-informed neural networks that can exactly preserve conservation laws, although progress is being made in this direction as well (Cardoso-Bihlo & Bihlo, 2023; Jagtap et al., 2020).
All these issues bear some remarkable resemblance to the early days of numerical weather forecasting. The numerical scheme used by Charney et al. (1950) was neither particularly accurate (Lynch, 2008), nor was it conservative, with the first conservative numerical formulation for the vorticity equation only being proposed about 15 years later (Arakawa, 1966). The actual integrations by Charney et al. (1950) also took some 24 hr to carry out for a 24 hr prediction, meaning this forecast was able to just keep pace with the evolution of the atmosphere itself. The same would likely hold true for physics-informed neural networks for more complicated models from atmosphere-ocean dynamics, although parallelization of the training of these networks on multi-GPU machines is possible.

The Barotropic Vorticity Equation on the Sphere
Here we describe the form of the vorticity equation on the sphere which we will solve using physics-informed neural networks.

Model Formulation
The barotropic vorticity equation is a single prognostic equation for the geopotential height of the atmosphere. It is derived from the two-dimensional Euler equations under the assumption of non-divergence of the velocity field, which allows recasting these equations using the stream function. On the rotating sphere, the barotropic vorticity equation is given by

ζ_t + (ψ_λ ζ_φ − ψ_φ ζ_λ)/(a² cos φ) + (2Ω/a²) ψ_λ = 0,   ζ = ψ_λλ/(a² cos² φ) + (ψ_φ cos φ)_φ/(a² cos φ),   (1)

where λ ∈ [−π, π] and φ ∈ [−π/2, π/2] are the longitude and latitude, respectively, and subscripts denote derivatives with respect to the associated variable. Here, ψ = ψ(t, λ, φ) is the stream function generating the nondivergent flow, ζ is the associated vorticity, related to the stream function by the Laplacian in spherical coordinates, Ω is the angular velocity and a the radius of Earth. The stream function is related to the geopotential height z via ψ = gz/f_0, with g being the gravitational acceleration and f_0 = 2Ω sin φ_0 being the Coriolis parameter at a constant reference latitude. For more details on this model, consult the textbooks (Holton, 2004; Satoh, 2004).
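To make the relation between ζ and ψ concrete, the following sketch evaluates the zonally symmetric part of the spherical Laplacian with finite differences (our method below uses automatic differentiation instead) and checks it against the eigenfunction ψ = sin φ, for which the Laplacian gives ζ = −2ψ/a²; we set a = 1 for simplicity:

```python
import numpy as np

# Check zeta = Laplacian(psi) on the unit sphere (a = 1) for the zonally
# symmetric eigenfunction psi = sin(phi), for which Laplacian(psi) = -2 psi.
phi = np.linspace(-1.2, 1.2, 4001)             # latitudes away from the poles
psi = np.sin(phi)

psi_phi = np.gradient(psi, phi, edge_order=2)  # d psi / d phi
zeta = np.gradient(np.cos(phi) * psi_phi, phi, edge_order=2) / np.cos(phi)

err = np.max(np.abs(zeta[2:-2] + 2.0 * psi[2:-2]))  # compare on interior points
print(err < 1e-3)   # True: small second-order discretization error
```

The λ-derivative term of the Laplacian vanishes here because ψ has no longitude dependence; for a general field both terms of the second equation would be needed.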
The barotropic vorticity equation is an exceptional model in dynamic meteorology as it condenses the evolution of the atmosphere into a single prognostic equation. Derived by Charney (1948) as an approximation of the primitive equations, the barotropic model eliminates several complications of these equations, such as the presence of both fast and slow moving waves, by filtering out the fast gravity waves and only retaining the slow Rossby waves. The barotropic vorticity equation is also easier to integrate as only a single variable (the geopotential height or pressure field) is needed, rather than both pressure and wind fields. This was the model chosen for the ENIAC integrations (Charney et al., 1950); see (Lynch, 2008) for an excellent historical review and recreation of these forecasts. What was forecast was the height z of the 500 hPa pressure surface, the so-called geopotential height field, instead of the stream function, and the goal was to compare the forecast of z after 24 hr to the observed geopotential height. The 500 hPa geopotential serves as a prevalent and valuable meteorological parameter, offering a comprehensive depiction of synoptic-scale mid-latitude weather conditions.
We note here that the original ENIAC integrations (Charney et al., 1950) were done in Cartesian coordinates, using a polar stereographic projection of the spherical geopotential height field onto a two-dimensional plane. We speculate that this was done largely out of necessity, since solving Equation 1 requires time-stepping of the vorticity and a subsequent numerical solution of the Poisson equation for the stream function, which is a non-trivial task on the sphere. On the plane this Poisson equation can be solved using Fourier series; on the sphere its solution would require the use of spherical harmonics. Since the original input data is of course given on the sphere as the geopotential height z, and since, as we shall review below, no issues arise with solving Equation 1 directly using neural networks, we will solve this equation as given, and will only project the results onto the two-dimensional plane used for comparison purposes against the original ENIAC integrations.

Data
The initial conditions and verifying analysis for the geopotential height were obtained from the NCEP-NCAR reanalysis data (Kalnay et al., 1996). This reanalysis data is available from January 1948 onward and thus covers the four ENIAC test cases, which were (a) 5 January 1949, (b) 30 January 1949, (c) 31 January 1949, and (d) 13 February 1949. The spatial resolution of the data is 2.5° × 2.5°, equating to 144 × 73 grid points. We obtained the geopotential height z of the 500 hPa pressure surface. As indicated above, the stream function itself is related to the given geopotential height by ψ = gz/f_0. We set the reference latitude in the Coriolis parameter f_0 to φ_0 = π/4, or 45° North (Holton, 2004).
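The conversion from geopotential height to stream function amounts to the following sketch, using standard values for g and Ω; the 5,500 m height is an illustrative round number for the 500 hPa surface, not a value taken from the reanalysis:

```python
import numpy as np

g = 9.81                  # gravitational acceleration, m s^-2
Omega = 7.292e-5          # angular velocity of Earth, s^-1
phi0 = np.pi / 4          # reference latitude, 45 degrees North
f0 = 2.0 * Omega * np.sin(phi0)   # Coriolis parameter at 45N, ~1.03e-4 s^-1

z500 = 5500.0             # illustrative 500 hPa geopotential height, m
psi = g * z500 / f0       # stream function psi = g z / f0, m^2 s^-1
print(f"f0 = {f0:.3e} s^-1, psi = {psi:.3e} m^2 s^-1")
```

Note that ψ comes out of order 10^8 m² s⁻¹, which motivates the output scaling of the network discussed below.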

Solving the Vorticity Equation Using Neural Networks
In the following, we denote by Δ(t, λ, φ, ψ^(3)(t, λ, φ)) = 0 the equation obtained by substituting the expression for the vorticity ζ in Equation 1 into the first equation, converting it into a third-order partial differential equation for the stream function ψ. Here, we denote with ψ^(3) all partial derivatives of the stream function with respect to the independent variables of order not greater than three. The ENIAC integrations solved the vorticity equation for 24 hr, or t_f = 86,400 s.
For solving the barotropic vorticity equation Δ = 0 using physics-informed neural networks we parameterize the solution as a neural network of the form ψ_θ = N_θ(t, λ, φ), where θ are the weights of the neural network. (Since physics-informed neural networks typically employ fully connected architectures, these weights are the weight matrices and bias vectors of a multi-layer perceptron (Raissi et al., 2019); see also the appendix for further details.) That is, the input to the neural network is a point (t, λ, φ) in three-dimensional space-time, and the output is the scalar ψ_θ, which should be an approximation to the true stream function at the given point.
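A schematic NumPy forward pass of such a parameterization, with the 10 hidden layers of 80 tanh units used in this paper, is sketched below; the random initialization is purely illustrative and stands in for the trained weights:

```python
import numpy as np

rng = np.random.default_rng(0)

def init_mlp(sizes):
    """Weight matrices and bias vectors of a fully connected network."""
    return [(rng.normal(0.0, np.sqrt(1.0 / m), (m, n)), np.zeros(n))
            for m, n in zip(sizes[:-1], sizes[1:])]

def mlp(theta, x):
    """psi_theta = N_theta(t, lambda, phi): tanh hidden layers, linear output."""
    for W, b in theta[:-1]:
        x = np.tanh(x @ W + b)
    W, b = theta[-1]
    return x @ W + b

# 3 inputs (t, lambda, phi) -> 10 hidden layers of 80 units -> scalar psi
theta = init_mlp([3] + [80] * 10 + [1])
points = rng.uniform(-1.0, 1.0, (5, 3))   # five normalized space-time points
psi = mlp(theta, points)
print(psi.shape)   # (5, 1): one scalar stream-function value per input point
```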
The weights of this neural network are then found by minimizing the physics-informed loss function

L(θ) = γ_Δ L_Δ(θ) + γ_i L_i(θ),   (2a)

with the equation loss and initial value loss being defined as the mean-squared errors

L_Δ(θ) = (1/N_Δ) Σ_{j=1}^{N_Δ} |Δ(t_j, λ_j, φ_j, ψ_θ^(3)(t_j, λ_j, φ_j))|²,   L_i(θ) = (1/N_i) Σ_{j=1}^{N_i} |ψ_θ(0, λ_j, φ_j) − ψ_0(λ_j, φ_j)|²,   (2b)

and with γ_Δ, γ_i > 0 being positive scaling constants. These losses are minimized over the set of collocation points {(t_j, λ_j, φ_j)}_{j=1}^{N_Δ} covering the spatio-temporal domain over which the vorticity equation should be solved, and {(λ_j, φ_j)}_{j=1}^{N_i} covering the spatial domain at the initial time t = 0. Here, ψ_0 = ψ(0, λ, φ) is the initial condition for which the vorticity equation is being solved.
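The structure of this composite loss can be sketched as follows; a user-supplied placeholder stands in for the full third-order residual operator Δ, which in our actual implementation is assembled with automatic differentiation:

```python
import numpy as np

def physics_informed_loss(psi_net, residual, colloc_pts, init_pts, psi0,
                          gamma_eq=1.0, gamma_i=1.0):
    """Weighted sum of mean-squared equation residual and initial-condition error.

    residual(psi_net, pts) stands in for the operator Delta; here it is a
    user-supplied placeholder, not the full vorticity-equation residual.
    """
    loss_eq = np.mean(residual(psi_net, colloc_pts) ** 2)
    # evaluate the network at t = 0 on the initial-value collocation points
    t0_pts = np.concatenate([np.zeros((len(init_pts), 1)), init_pts], axis=1)
    loss_i = np.mean((psi_net(t0_pts) - psi0) ** 2)
    return gamma_eq * loss_eq + gamma_i * loss_i

# Toy check: a "network" that satisfies residual = 0 and the initial
# condition exactly yields zero loss.
psi_net = lambda pts: np.zeros(len(pts))
residual = lambda net, pts: net(pts)        # placeholder residual
colloc = np.random.rand(100, 3)
init = np.random.rand(10, 2)
print(physics_informed_loss(psi_net, residual, colloc, init, np.zeros(10)))  # 0.0
```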

Geophysical Research Letters 10.1029/2023GL107718

The physics-informed loss function adjusts the weights θ of the neural network in such a manner that both the differential equation and the associated initial condition hold, upon which point ψ_θ = N_θ(t, λ, φ) will provide a numerical approximation to the true solution ψ = ψ(t, λ, φ). Here we use a simple feedforward neural network for N_θ. Specifically, all neural networks trained in the following have a total of 10 hidden layers with 80 units per layer, using the hyperbolic tangent activation function. The collocation points {(t_j, λ_j, φ_j)}_{j=1}^{N_Δ} for the vorticity equation were sampled from the temporal-spatial domain Ω = [0, 86,400] × [−π, π] × [−π/2, π/2] such that the spatial points covering the sphere were uniformly distributed, that is, with fewer points going either north or south from the equator toward the poles. This was realized by sampling an (N_Δ, 3)-array of points from the interval [0, 1] using Latin hypercube sampling (Stein, 1987). Each resulting sampled point (t*, λ*, φ*) was then mapped to the domain Ω using the point transformation (t, λ, φ) = (t_f t*, 2πλ* − π, arcsin(2φ* − 1)); see (Simon, 2015) for further details. For generating the initial value collocation points we proceed in the same manner, except for only sampling (N_i, 2)-tuples of points, with each sampled point being (λ*, φ*) and t* = 0 for all such pairs of spatial points. We use N_Δ = 100,000 and N_i = 10,000 collocation points for the differential equation and initial value loss, respectively. These values were chosen to roughly match the spatial resolution of the gridded reanalysis data, 144 × 73 = 10,512 points, factoring in the time step of 2-3 hr chosen in (Charney et al., 1950), which would yield a total of 84,096 to 126,144 computational points over the global spatio-temporal domain. While a direct comparison between the classical numerical method used in (Charney et al., 1950) and neural network based methods is of course challenging, the chosen values at least make the computational grids used comparable. We use bilinear interpolation from the given initial reanalysis geopotential height on the latitude-longitude grid to sample the initial condition at the required collocation points.
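The sampling procedure can be sketched as follows; the simple Latin hypercube implementation below is an illustrative stand-in for the routine of Stein (1987), and the arcsin map is the standard transformation yielding latitudes uniform with respect to surface area on the sphere:

```python
import numpy as np

rng = np.random.default_rng(1)
t_f = 86400.0

def latin_hypercube(n, d):
    """One sample in each of n strata per dimension, independently permuted."""
    u = np.empty((n, d))
    for k in range(d):
        u[:, k] = (rng.permutation(n) + rng.random(n)) / n
    return u

# Map unit-cube samples to the space-time domain; the arcsin map yields points
# uniformly distributed with respect to surface area on the sphere.
u = latin_hypercube(1000, 3)
t = t_f * u[:, 0]
lam = 2.0 * np.pi * u[:, 1] - np.pi
phi = np.arcsin(2.0 * u[:, 2] - 1.0)
print(t.min() >= 0.0 and t.max() <= t_f and np.abs(phi).max() <= np.pi / 2)  # True
```

Counting points per latitude band of fixed width confirms the intended behavior: bands near the poles receive fewer points than bands near the equator.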
Neural networks are known to be sensitive to input ranges that are not normalized. Thus, we normalize the time values from [0, 86,400] to [−1, 1] in a custom normalization layer added to the network. Since the hidden layers of our network produce values between −1 and 1 and the values of the unscaled stream function are large, of the order of 10^8, we pass the output of the network through a custom scaling layer that multiplies the predicted value with Ψ = a²/t_f.
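The two custom layers amount to the following scalar maps, with a and t_f as in the text (a being the standard mean Earth radius):

```python
import numpy as np

a, t_f = 6.371e6, 86400.0      # Earth radius (m) and forecast horizon (s)
Psi = a ** 2 / t_f             # output scale, roughly 4.7e8 m^2 s^-1

def normalize_time(t):
    """Map t in [0, t_f] to [-1, 1] before it enters the network."""
    return 2.0 * t / t_f - 1.0

def scale_output(raw):
    """Rescale the O(1) network output to physical stream-function values."""
    return Psi * raw

print(normalize_time(0.0), normalize_time(t_f))   # -1.0 1.0
```

The scale Ψ matches the order of magnitude of the stream function obtained from typical 500 hPa geopotential heights, so the raw network output can stay within the O(1) range of the tanh layers.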
To non-dimensionalize the loss function, we set γ_Δ = (a² t_f/Ψ)² and γ_i = 10/Ψ². This allows us to work with the equation in the form of Equation 1, without having to non-dimensionalize the equation itself.
We use the same approach as in (Bihlo & Popovych, 2022) to enforce the boundary conditions on the sphere, namely passing the input (λ, φ)-coordinates into a custom coordinate-transform layer that computes (cos φ cos λ, cos φ sin λ, sin φ), meaning it embeds the sphere into R³ via an implicit use of Cartesian coordinates r = (x, y, z), so that the neural network solution is represented as ψ_θ = ψ_θ(t, r(λ, φ)). This is done to simplify the loss function (2a), which otherwise would also require an explicit boundary value loss.
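A sketch of this coordinate-transform layer; periodicity in longitude and the absence of a boundary hold by construction, since the embedded points lie on the unit sphere:

```python
import numpy as np

def embed(lam, phi):
    """Embed (lambda, phi) on the sphere into R^3; periodic in lambda by construction."""
    return np.array([np.cos(phi) * np.cos(lam),
                     np.cos(phi) * np.sin(lam),
                     np.sin(phi)])

lam, phi = 1.2, 0.4
p, q = embed(lam, phi), embed(lam + 2.0 * np.pi, phi)
print(np.allclose(p, q), np.isclose(np.linalg.norm(p), 1.0))   # True True
```

Because points differing by 2π in longitude map to the same input, the network output is automatically periodic and no boundary loss term is needed.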

Implementation
The method described in the previous subsection was implemented using TensorFlow 2.12, with training being done on a single NVIDIA RTX8000 GPU. Training for each model took about 4 hr. The loss function was minimized using a standard Adam optimizer (Kingma & Ba, 2014) with a learning rate of 10⁻³, using a mini-batch size of 1,000 collocation points for the differential equation loss and 100 collocation points for the initial value loss per batch. Training was done using single precision arithmetic to reduce the overall computational cost. The codes will be made available on GitHub upon publication of this paper (Bihlo & Brecht, 2024).
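For reference, a plain-NumPy schematic of a single Adam update (Kingma & Ba, 2014) with the stated learning rate of 10⁻³, applied to a toy quadratic; this illustrates the update rule only, not the TensorFlow training loop used here:

```python
import numpy as np

def adam_step(theta, grad, m, v, t, lr=1e-3, b1=0.9, b2=0.999, eps=1e-8):
    """One Adam update: moment estimates, bias correction, parameter step."""
    m = b1 * m + (1.0 - b1) * grad
    v = b2 * v + (1.0 - b2) * grad ** 2
    m_hat = m / (1.0 - b1 ** t)       # bias-corrected first moment
    v_hat = v / (1.0 - b2 ** t)       # bias-corrected second moment
    theta = theta - lr * m_hat / (np.sqrt(v_hat) + eps)
    return theta, m, v

# Toy check: minimize f(theta) = ||theta||^2, whose gradient is 2 * theta.
theta = np.array([1.0, -2.0])
m = v = np.zeros_like(theta)
for t in range(1, 20001):
    theta, m, v = adam_step(theta, 2.0 * theta, m, v, t)
print(np.linalg.norm(theta) < 0.1)   # True: the iterate approaches the minimum
```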

Numerical Results
We present here the numerical results of our machine learning recreation of the ENIAC forecasts. The evaluation is done on the original ENIAC domain, which encompasses North America and the North Atlantic.

Error Measures
As error measures we choose the same measures as reported in (Lynch, 2008), namely the mean error, the root-mean-square (RMS) error and the S1 score,

E_mean = mean(ψ_fcst − ψ_gt),   E_RMS = √(mean((ψ_fcst − ψ_gt)²)),   S1 = 100 · Σ|Δψ_fcst − Δψ_gt| / Σ max(|Δψ_fcst|, |Δψ_gt|),

where ψ_fcst is the forecast stream function and ψ_gt is the true stream function after 24 hr, and Δψ_fcst and Δψ_gt denote the spatial differences between adjacent grid points of the forecast and ground truth stream functions, respectively.
The S1 score is a measure of the accuracy of the gradients of forecasts, which are important for meteorological parameters such as geopotential or pressure, as their gradients are directly related to the wind fields (Holton, 2004). Perfect forecasts will have an S1 score of zero. For more details on these verification measures, see (Wilks, 2011).
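The three measures can be sketched as follows; the S1 implementation uses adjacent-grid-point differences following the standard definition (Wilks, 2011) and may differ in detail from the exact stencils used in Lynch (2008):

```python
import numpy as np

def mean_error(fcst, gt):
    return np.mean(fcst - gt)

def rms_error(fcst, gt):
    return np.sqrt(np.mean((fcst - gt) ** 2))

def s1_score(fcst, gt):
    """S1: gradient accuracy from adjacent-grid-point differences; 0 is perfect."""
    dfx, dfy = np.diff(fcst, axis=1), np.diff(fcst, axis=0)
    dgx, dgy = np.diff(gt, axis=1), np.diff(gt, axis=0)
    num = np.abs(dfx - dgx).sum() + np.abs(dfy - dgy).sum()
    den = np.maximum(np.abs(dfx), np.abs(dgx)).sum() + \
          np.maximum(np.abs(dfy), np.abs(dgy)).sum()
    return 100.0 * num / den

gt = np.add.outer(np.arange(5.0), np.arange(7.0))   # a varying toy field
print(mean_error(gt, gt), rms_error(gt, gt), s1_score(gt, gt))  # 0.0 0.0 0.0
```

Note that a constant offset leaves S1 unchanged (gradients are identical) while it shifts the mean and RMS errors, which is why the three measures complement each other.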

Evaluation
We choose two baseline forecast models here: the original ENIAC integrations (Charney et al., 1950), obtained from the recreation of these forecasts in (Lynch, 2008), and the persistence prediction. The persistence prediction predicts that the final stream function field is exactly the same as the initial field, that is, that there is no temporal evolution over the forecast horizon. Note that the values for the ENIAC and persistence predictions are the same as in Table 1 of Lynch (2008).
Table 1 shows that the recreated ENIAC integrations using physics-informed neural networks outperform the original ENIAC integrations in all four cases using the RMS error measure, and importantly also outperform persistence in all four cases. Note that this was not the case for the original ENIAC integrations, for which only one forecast was better than persistence in terms of the RMS error. The physics-informed neural network predictions also outperform persistence in one case using the mean error measure, while the original ENIAC integrations did not outperform persistence in a single case in this error measure. We also note that physics-informed neural networks outperform the ENIAC integrations in all cases in the mean error measure. The S1 score indicates that both ENIAC and M-ENIAC outperform persistence, with M-ENIAC outperforming ENIAC in two of the four cases, and coming close to matching the S1 scores for the remaining two cases. The forecast differences are shown in Figure 1.

Conclusions
This paper was devoted to the recreation of the first successful numerical weather forecast using machine learning. We have shown that the method based on physics-informed neural networks is capable of outperforming the original ENIAC integrations in terms of the mean error and RMSE, and matching it in terms of the S1 score.
The method also outperforms the persistence prediction in terms of the RMSE and S1 score, while it could not consistently outperform the persistence prediction in terms of the mean error.
We would like to stress that physics-informed neural networks have several advantages over classical numerical methods, which are particularly relevant for the atmospheric sciences. First, the method avoids having to compute derivatives using numerical methods, since all derivatives are computed using automatic differentiation. This renders the method meshless, which greatly simplifies solving differential equations on manifolds, and in particular on the sphere. This was a major complication for the original ENIAC integrations by Charney et al. (1950), who first formulated the equations on the sphere but then solved them on a polar stereographic projection so that Cartesian coordinates could be used. In turn, no such complications arose in the present work. Moreover, meteorological equations such as the barotropic vorticity equation, but also more complicated models such as the shallow-water equations in the vorticity-divergence formulation, require the solution of a Poisson equation on the sphere, which is a rather costly endeavor if spherical harmonics have to be used. No Poisson equation has to be solved in our physics-informed neural network formulation, which greatly simplifies the numerical procedure involved.
While the solution of the barotropic vorticity equation for numerical weather forecasting is of mostly historical interest today, there are some interesting parallels between the early days of numerical weather forecasting and the state-of-the-art of deep learning-based solvers for meteorology. In particular, these issues concern the accuracy of the numerical method, the lack of conservation, and the time required to complete a numerical integration. All of these issues were overcome in less than 50 years (Bauer et al., 2015) for traditional numerical solvers for the atmospheric governing equations. Given the immense interest in the field of scientific machine learning today and the trajectory of research in artificial intelligence in general, we are optimistic that it will not take 50 years for these issues to also be overcome for machine-learning based numerical solvers. With this in mind we would like to encourage the development of advanced physics-informed neural networks for numerical weather forecasts.

The first expression defines the equation loss, which is sought to be minimized over the set of collocation points {t_j}_{j=1}^{N_Δ}, t_j ∈ [0, t_f], the second expression defines the initial value loss, and the norm being used is usually the l²-norm. The respective loss functions are thus

L_Δ(θ) = (1/N_Δ) Σ_{j=1}^{N_Δ} ‖dN_θ/dt(t_j) − f(N_θ(t_j))‖²,   L_i(θ) = ‖N_θ(0) − x_0‖²,

where f denotes the right-hand side of the Lorenz-1960 model and x_0 its initial condition, and the overall loss being minimized is the composite loss L(θ) = L_Δ(θ) + L_i(θ). Since N_θ(t) of Equation A2 is a differentiable function of t, one can analytically compute dN_θ/dt. In practice this is done using automatic differentiation (Baydin et al., 2018).
The results of training a PINN for the Lorenz-1960 model for t_f = 5 are shown in Figure A2. Here we used a neural network with 4 hidden layers and 60 units per layer with a hyperbolic tangent activation function. A total of N_Δ = 200 collocation points were used and the network was trained to minimize the loss L(θ) over 15,000 steps (epochs) using the Adam optimizer. At the end of training, the neural network indeed approximates the solution to the Lorenz-1960 model reasonably well.

Figure 1 .
Figure 1. Absolute error of the 500 hPa geopotential for ENIAC (top row) and M-ENIAC (bottom row) for each case after 24 hr forecast time.

Figure A2 .
Figure A2. Result of training a PINN for the Lorenz-1960 model. Left: Neural network solution and numerical reference solution obtained from a Runge-Kutta integration. Middle: Difference of neural network solution and Runge-Kutta reference solution. Right: Time series of the loss over training the network.

Table 1
Verification Measures for the Three Forecast Models: Physics-Informed Neural Networks (M-ENIAC), the Original ENIAC Forecasts (ENIAC) and the Persistence Forecast (Pers). Reported are the mean error, root-mean-square error and S1 score. Bold indicates the best values.