Could artificial intelligence win the next Weather Photographer of the Year competition?



Introduction
Since the 1950s, artificial intelligence (AI), in various guises and levels of abstraction, has been used to create visual art. Early efforts typically comprised abstract computer graphics – such as a 1967 untitled piece by Frieder Nake¹ that now hangs in the Tate Modern – or required artistic styles to be hardcoded into the system – such as the AARON software written by Harold Cohen in the 1970s. However, in each case, the creativity came from the human rather than the computer.
This requirement for a high level of human interference changed with the development of generative adversarial networks (GANs; Goodfellow et al., 2014). GANs are a type of artificial neural network (ANN)² used to generate new data samples from a given input dataset. GANs are composed of two sub-networks: a generator network that creates new data samples, and a discriminator network that tries to classify the generated samples as either real or fake. The two networks are trained together in a competitive manner, such that the generator network learns to create realistic data samples that fool the discriminator network, while the discriminator network becomes better at identifying fake data samples. GANs have seen considerable success in art and image generation, as well as special cases of photorealism (e.g. human faces³), but typically perform poorly in general cases of photorealism (e.g. Wang et al., 2022).
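The competitive training described above can be sketched with two loss functions. This is a minimal illustration, not the training code of any model used in this article: it assumes the discriminator outputs a probability strictly between 0 and 1 that a sample is real, the function names are hypothetical, and the generator loss is the commonly used 'non-saturating' variant rather than the only possible choice.

```python
import math


def discriminator_loss(d_real, d_fake):
    """Binary cross-entropy for the discriminator.

    d_real: discriminator's probability that a real sample is real.
    d_fake: discriminator's probability that a generated sample is real.
    The loss is small when d_real is near 1 and d_fake is near 0.
    """
    return -(math.log(d_real) + math.log(1.0 - d_fake))


def generator_loss(d_fake):
    """Non-saturating generator loss.

    The loss is small when the discriminator is fooled,
    i.e. when it assigns a generated sample a probability near 1.
    """
    return -math.log(d_fake)
```

A confident, correct discriminator (for example `d_real=0.9`, `d_fake=0.1`) has a lower loss than an uncertain one, while the generator's loss falls as `d_fake` rises; minimising both at once is what makes the training adversarial.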
GANs have recently been surpassed in this regard by denoising diffusion probabilistic models (hereafter, diffusion models; Ho et al., 2020). Diffusion models are essentially autoencoders⁴ trained to remove Gaussian noise from images; once trained, the denoising networks are sufficiently powerful that an image can be generated from pure noise. Both GANs and diffusion models can be trained to accept text prompts that 'guide' the generation or denoising towards a desired destination in some language embedding space.⁵ In each case, during model training, a caption describing the training image is compressed into this embedding using an additional pre-trained language model and fed in as a secondary input along with the image. Figure 1 compares attempted photorealistic image generation between a leading GAN and a leading diffusion model, given the same text prompt. Although the GAN produces an image that accurately fits the description of 'a foggy day in London', it is highly unlikely to fool any human into thinking it is a real photograph, even less so that it is 'award-winning'. In contrast, the diffusion model produces an image that might reasonably convince a human that it is a real photograph, even accurately rendering Big Ben and the Palace of Westminster.
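The Gaussian noising that a diffusion model learns to undo can be sketched as follows. This is an illustrative sketch only: it uses the linear noise schedule reported by Ho et al. (2020), a flat list of numbers stands in for an image, the function names are hypothetical, and the denoising network itself (the part that is actually trained) is omitted.

```python
import math
import random


def linear_beta_schedule(num_steps, beta_start=1e-4, beta_end=0.02):
    """Per-step noise variances, rising linearly (requires num_steps >= 2)."""
    step = (beta_end - beta_start) / (num_steps - 1)
    return [beta_start + i * step for i in range(num_steps)]


def cumulative_alphas(betas):
    """Running product of (1 - beta): the fraction of signal left at each step."""
    alpha_bars, product = [], 1.0
    for beta in betas:
        product *= 1.0 - beta
        alpha_bars.append(product)
    return alpha_bars


def add_noise(pixels, t, alpha_bars, rng):
    """Jump straight to noise level t: sqrt(a) * x + sqrt(1 - a) * noise.

    Returns the noisy pixels and the noise that was added; a denoising
    network would be trained to predict that noise from the noisy pixels.
    """
    a = alpha_bars[t]
    noise = [rng.gauss(0.0, 1.0) for _ in pixels]
    noisy = [math.sqrt(a) * x + math.sqrt(1.0 - a) * e
             for x, e in zip(pixels, noise)]
    return noisy, noise
```

With 1000 steps the final entry of `cumulative_alphas(linear_beta_schedule(1000))` is close to zero, so almost none of the original image survives; this is why generation can begin from pure noise and run the learned denoising in reverse.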
In this article, we will explore the ability of three publicly available diffusion models to produce photograph-like images of weather themes capable of winning the Weather Photographer of the Year competition. We will compare the models against previous winners and discuss the relative abilities of each model across different themes before finishing with a Turing test – where the reader is invited to try to tell apart the real photographs from those created by the diffusion model 'AI artists'.

The diffusion models
This section provides a brief description of the three diffusion models used in this article. These three models were chosen because they were publicly available – ruling out, for example, Google's Imagen – and consistently produced good quality output – beating, for example, craiyon (also known as DALLE-mini) and LMU Munich's Latent Diffusion.
⁵ Language embedding spaces are mathematical spaces where words or phrases from a language are mapped according to some similarity metric.
DALLE-2 was developed by OpenAI, who also produced the open access language-embedding model CLIP (used by both Midjourney and Stable Diffusion, as well as VQGAN in Figure 1). Despite this, DALLE-2 uses the more recently developed GPT-3 as its language-embedding model. Although both DALLE-2 and GPT-3 are publicly available,⁶ neither is open access. DALLE-2 uses 3.5 billion parameters, a significant reduction from the 12 billion used in its predecessor; however, GPT-3 uses 175 billion parameters.
Midjourney was founded by David Holz. It is not open access but is publicly accessible through a dedicated Discord server.⁷ It uses CLIP for language embedding, with the underlying diffusion model constantly being refined. It is at a slight disadvantage to the other two diffusion models used, as Holz says: 'we wanted it to kind of look beautiful, and beautiful doesn't necessarily mean realistic …. If anything, actually we do bias it a little bit away from photos'.⁸
Stable Diffusion was developed by Stability.Ai and made fully open access on 22 August 2022,⁹ whereafter users could download and run it on their own computers. Prior to this, like Midjourney, it was publicly accessible through a Discord server, where the user could supply prompts to a bot. The most recent version, v1.4, uses 890 million parameters.
There are three final caveats before we start. First, for fair comparison throughout, all images are rendered square – either directly, or through cropping. When cropping actual photographs, I have tried to do so in as fair a manner as possible, preserving as much of the original content and artistic direction as I could. Second, I have indulged in minor 'prompt engineering' throughout; in other words, adding suggestions like 'award-winning' or the names of stock photo companies to encourage the AI artists to produce more professional-looking images. In most cases, without these additions, the AI artists would still produce a photorealistic depiction of the weather in the prompt, but the results tended to be far less visually appealing. Third, in most cases I have had each artist produce several images for each prompt (never exceeding four), using a different random seed to generate the initial noise. This then requires one of several to be selected for use in the paper – typically I have selected those that I think best fit the competitive photography aesthetic, while rejecting the handful that contain obvious giveaway artefacts or errors.
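Two of these caveats are mechanical enough to sketch in code. The example below is illustrative only and rests on stated assumptions: a centred crop is just one simple way to make an image square (the photographs in this article were cropped by hand to preserve their composition), an image is represented as a list of pixel rows, and both function names are hypothetical.

```python
import random


def centre_square_crop(image):
    """Crop a rectangular image (a list of equal-length rows) to a centred square."""
    height = len(image)
    width = len(image[0])
    side = min(height, width)
    top = (height - side) // 2
    left = (width - side) // 2
    return [row[left:left + side] for row in image[top:top + side]]


def initial_noise(seed, num_values):
    """Gaussian starting noise that is fully determined by the seed."""
    rng = random.Random(seed)
    return [rng.gauss(0.0, 1.0) for _ in range(num_values)]
```

Reusing a seed reproduces the same starting noise, whereas changing the seed gives a different starting point, which is how one prompt can yield several distinct candidate images.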

Comparison with previous winners
For the first task (Figure 2), we compare our three AI artists to three winners of the 2021 Weather Photographer of the Year competition – the overall winner and the winners of the young photographer and public choice categories. In each case, the AI is fed only the description of the winning photo given on the RMetS website,¹⁰ modified slightly in the young photographer case to include 'photo of' at the beginning.
The overall winner was a photo of a foggy autumn morning in northern Italy taken from a hilltop church by Giulio Montini. The prompt mentions only that a church is involved rather than explicitly defining it as the viewpoint, and that ambiguity leads Midjourney and Stable Diffusion to produce photographs of an Italian-style rural church surrounded by morning mist. DALLE-2 produces an image much closer to the winner, capturing a foggy valley (albeit containing a church tower) lit by a low elevation morning sun.
The young photographer winner was a photo of a supercell building over a farm in Kansas taken by Phoenix Blue. The prompt does not mention a farm, but does include Kansas, and the two AI artists that include some landscape (Midjourney and Stable Diffusion) produce scenery identifiable as the Great Plains. All three AI artists capture the greenish hue associated with developing severe convection, but Stable Diffusion is probably closest to the winner in terms of producing an authentic supercell cloud structure. Note that the prompt does not explicitly mention a supercell, so the AI artists have inferred this structure based on it being a severe thunderstorm over Kansas.
The public choice winner (and overall runner-up) was a photo of a lightning strike off the southeast coast of France by Serge Zaka. The prompt is clear and detailed, and all three AI artists capture the scene correctly, with DALLE-2 and Stable Diffusion producing realistic-looking clouds, water and lightning. Midjourney captures realistic-looking clouds and sea, but the lightning looks more like a firestorm from a fantasy film – though it is at least accurately reflected in the ocean.
These results highlight the importance of using the right prompt and right theme with the right AI artist, and show that this type of direct comparison is probably not a fair competition.

Exploring prompts and themes
So, if we are going to get these AI artists to produce competition-worthy photographs, we need to get a feel for the themes and prompts that suit their particular style. After all, one would not ask Annie Leibovitz to take landscape photos, nor Ansel Adams to take portraits. In Figure 3, we give each of the three AI artists a more 'artist-friendly' prompt that describes the composition and style in more detail. They are given one prompt for each of four themes (extreme weather, landscape, human interest and macro/close-up) that broadly capture all the Weather Photographer of the Year competition finalists over the past few years.
We can see that each artist has a different strength. DALLE-2 is excellent at capturing physics and structural features, highlighted in its render of the breaking wave hitting a lighthouse. This is a challenge because not only are breaking waves turbulent and extremely complex, but the artist must also correctly illuminate the structure – containing thousands of translucent or diffractive particles – for it to be believable. Not only that, but DALLE-2 simulates an interaction between the two, showing the wave coming around each side of the lighthouse.
Midjourney is the most creative with the prompts and has perhaps the best lighting of the three AI artists. For prompt 2, the bridge is in the distance with the frosty field in the foreground beautifully (and correctly, given the roughness of the surface) illuminated by the rising sun's yellow light. The wave is also lit in a visually striking manner, and, unlike the other two artists, the Indian girl is rendered as a silhouette with the monsoon background as a focus – an inventive way to respond to the request for 'candid' framing.
Stable Diffusion excels at composition. Regardless of the ultimate render quality, each of the four prompts was framed in the manner most consistent with a human photographer. This is particularly highlighted by the clever choice of depth-of-field in the plant bud shot; the still, reflective stream passing under the frosty bridge; and the portrait-like framing of the Indian girl, including a realistic render of her wet clothing.
All three artists can produce photorealistic renders of the given prompts, but it is clear that to fool the human eye – or even win a competition – we must choose the right combination of artist, theme and prompt.

The final test
In this final section, you – the reader – are invited to judge. Sixteen images are given in Figure 4, five of which are real photographs that were shortlisted finalists in the 2021 Weather Photographer of the Year competition. The other 11 have been generated by the three AI artists (four by DALLE-2, two by Midjourney, five by Stable Diffusion). When you have made your decision, you can check your answer by looking at the final word in the second paragraph of the conclusion, which comprises the five letters labelling the real photographs. The text prompts and models used for each photograph are given in the Supporting Information Notes S1.

Concluding remarks
In this short article, we explored the ability of three publicly available diffusion models (DALLE-2, Midjourney and Stable Diffusion) to create realistic weather-themed photographs that could compete in a photography competition. We discussed the relative strengths and weaknesses of each of the three AI artists before finishing with a Turing test, where the reader was asked to choose which 5 of 16 photographs were shortlisted finalists in the 2021 Weather Photographer of the Year competition.
It is, of course, not that simple. There is still a reasonable degree of human creativity required – in dreaming up the text prompts, in refining them to make the generated images more suitable, and in selecting the best images from those produced to take forward. Nor are these artists perfect – many generated images still contain artefacts that immediately give them away as fakes – and they are still far better at creating artistic stylisations than photorealistic images. But these gaps are reducing with every new hardware and software development. Eventually, when it comes to telling the difference between human and AI-generated photographs, we will be blind.
GANs and diffusion models are not limited to producing images and have many potential applications in other fields, including meteorology and climate science. Such applications include downscaling climate model output (Cheng et al., 2021), ensemble weather prediction (Bihlo, 2021), nowcasting (Rüttgers et al., 2022) and even storm surge models (Lütjens et al., 2020). Generative models can also now produce convincing text output. To demonstrate this, we used GPT-3¹¹ to produce the first paragraph of the introduction as well as the definitions used in footnotes 2, 4 and 5.

Figure 1. Comparison of two artificial intelligence artists, one based on a generative adversarial network (VQGAN + CLIP, top), and one based on a diffusion model (Stable Diffusion, bottom). Each was given the prompt: 'Award winning photograph of a foggy day in London'. Output is shown after 10, 25, 50 and 150 iterations of each model.

Figure 2. Comparison of 2021 winning photographs with those produced by the three artificial intelligence artists. In each case, the text used to describe the photograph and its merits on the competition website was used as a prompt, with occasional modification. Overall winner prompt: 'This photo can only be taken from one point. There is a small church on top of a hill in the town of Airuno, in the province of Lecco in Italy. Under the mist passes the River Adda. In the autumn months, on some days, it is possible to see this show with the first lights of sunrise'. Young photographer winner prompt: 'Photo of beautiful clouds coming in right before a Kansas storm. Anyone who has experienced a severe thunderstorm knows about the eerie deep green/blue colour sometimes present as the storms approach'. Public choice winner prompt: 'Lightning from an isolated storm over Cannes Bay. The judges commented that few storms are as beautiful as those isolated over water. The photographer was a perfect distance away from this storm to capture three things crucial for a winning photo composition: the sky, the storm, and the water'.

Figure 3. Comparison of the three artificial intelligence artists across four different weather photography themes (extreme, landscape, human interest, and macro). Prompt 1: 'Award-winning photograph of large waves crashing into a lighthouse in Devon during the peak winds of Storm Eunice'. Prompt 2: 'Award-winning photograph of a small stone bridge in the English countryside, covered in frost, lit by an early sunrise. The photographer got up early and waited for the clouds to clear before capturing this moment'. Prompt 3: 'Award-winning candid black and white photo of a young Indian woman in Varanasi smiling and dancing in the first monsoon rains'. Prompt 4: 'Close up photograph of a single bud emerging from dried, cracked ground with some dead grass after a long drought, bokeh trees in the distance, hot summer day, Photofest, trending, 4k'.