Exploring the Evolution of Metal Halide Perovskites via Latent Representations of the Photoluminescent Spectra

In the last several years, laboratory automation and high‐throughput synthesis and characterization have come to the forefront of the research community. The resulting large datasets require suitable machine learning techniques to analyze the data effectively and extract the properties of the system. Herein, the binary library of metal halide perovskite (MHP) microcrystals, MAxFA1−xPbI3−xBrx, is explored via low‐dimensional latent representations of composition‐ and time‐dependent photoluminescence (PL) spectra. The variational autoencoder (VAE) approach is used to discover the latent factors of variability in the system. The variability of the PL is predominantly controlled by the compositional dependence of the bandgap, while a secondary factor of variability is the phase separation associated with the formation of double peaks. To overcome the interpretability limitations of standard VAEs, a workflow based on translationally invariant variational autoencoders (tVAEs) and conditional variational autoencoders (cVAEs) is introduced. The tVAE discovers known factors of variation within the data, for example, the (unknown) shift of the peak due to the bandgap variation. Conversely, the cVAE imposes a known factor of variation, in this case the anticipated bandgap. Jointly, the tVAE and cVAE make it possible to disentangle the underlying mechanisms present within the data, bringing a deeper understanding of MHP systems.


Introduction
The compositional space of metal halide perovskites (MHPs) is vast due to the extensive number of materials that can form the perovskite structure. Judicious combinations of these components result in a distinctive MHP matrix with specific physical, structural, and optoelectronic properties. [1] These properties can be widely tuned through each component, making the MHP combinatorial space a fascinating platform for material discovery and design. Despite this vast combinatorial space, only a handful of these compounds have been explored, which opens opportunities to synthesize compositions with compelling and enhanced capabilities.
MHPs are intrinsically unstable and tend to undergo degradation due to various factors such as light, [2] temperature, [3] and ambient humidity, [4] which can lead to phase segregation [5] and ion migration. [6,7] Cation and anion alloying [8] has been explored as a pathway to ameliorate this problem. Notably, creating binary and ternary solid solutions allows not only for structural, chemical, and thermal stabilities but also for application tunability and even new discoveries.
Optimization of these materials is complex and traditionally relies on trial-and-error experiments that tune a single composition toward the desired properties. Correspondingly, manual synthesis of these compositions is extremely time-consuming and inefficient. High-throughput synthesis using automated systems such as microfluidic systems, [9] flow reactors, [10] and pipetting robots, [11] which can run as fully automated systems or as combined human-automated workflows, has increased in popularity. Combining this increased rate of synthesis with high-throughput optical characterization such as photoluminescence (PL) yields an expansive dataset that captures composition-optical property relationships and the stability of the compositions. Here, we use a high-throughput experimental workflow based on an automated liquid handler, as described previously, [12][13][14][15] to create a binary system.
DOI: 10.1002/aisy.202200340
There are two main known sources of variability in this system that are important to understand for composition tuning: the first is bandgap variability in a mixed solid solution system, and the second is phase stability. Composition and time dynamics of the PL spectra are determined by the interplay of multiple factors. One is the composition dependence of the bandgap in the initial system, affected by the specific phase formation mechanisms. Depending on the system, this can include the formation of ideal and nonideal solid solutions, phase separation, and the formation of disordered systems. The PL peak position is then determined by the bandgap of the host material and the presence of defects and Urbach tails. The time dynamics of the PL are determined by the size evolution of the particles, the evolution of surface and bulk defect density with time, and oxidation. These mechanisms are, in turn, reflected in the evolution of the PL peak intensity and position.
The composition and time dynamics of the PL thus reflect an interplay of the variabilities mentioned above. Because phase segregation occurs frequently in MHPs, analyzing the evolution of the PL over time after synthesis can provide information about the kinetics and characteristics of the system, providing feedback to automated synthesis workflows. Phase segregation or degradation in MHPs appears in the PL over time as a decrease in the intensity of the initial peak and the emergence of a new peak that tends to be shifted. In a large time-dependent PL dataset, the simultaneous evolution of peak intensity and peak position is difficult for a human to interpret. Therefore, the analysis of such data requires a more capable processing tool to gain insight into the underlying mechanisms of these compositions.
Machine learning (ML) methods not only allow for the analysis of the kinetics and characteristics of the compositions but also create discovery pathways that allow scientists to modulate and define parameters based on the results of the high-throughput synthesis. ML methods recently applied to study materials properties across different synthesis and characterization techniques include the support vector machine (SVM) algorithm, [16,17] Bayesian optimization, [18,19] nonnegative matrix factorization (NMF), [12,13,20] and the Shapley approach. [21,22] While many of these ML techniques are beneficial in analyzing these systems, increasing complexity and the variety of kinetic processes make a nonlinear dataset harder to interpret, particularly for ML techniques that rely on linear analysis. Many linear analyses work by reducing the dimensionality of the dataset, which, in turn, can discard information. Traditionally, these datasets are analyzed using linear multivariate analysis methods including principal component analysis (PCA) and NMF. [13,23-25] These methods seek to represent the dataset as a linear combination of components and weights, where the components represent specific behaviors and the weights represent the distribution of these behaviors across the chosen parameter space. For the time-dependent PL spectra, the typical representation is I(c, t, λ) = Σ_i a_i(c) w_i(t, λ), where w_i(t, λ) is the 2D representation of the evolution of the PL spectra and a_i(c) is the weight of this behavior within the compositional space. Note that this representation is not unique; for example, decompositions I(c, t, λ) = Σ_i a_i(c, t) w_i(λ) are also possible. In this case, the evolution of characteristic spectral responses over the joint composition-time space is explored.
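The linear representation described above can be sketched in a few lines of NumPy. This is a minimal illustration on synthetic data, not the analysis used in this work: a truncated singular value decomposition stands in for PCA (without mean-centering) and does not enforce the nonnegativity that NMF would. The array shapes, the random stand-in data, and the function name linear_decomposition are all assumptions made for illustration.

```python
import numpy as np

def linear_decomposition(I, n_components):
    """Truncated SVD of I(c, t, lambda) unfolded as a (C, T*L) matrix.

    Returns weights a_i(c) with shape (C, k) and components w_i(t, lambda)
    with shape (k, T, L), so that I ~ sum_i a_i(c) * w_i(t, lambda).
    """
    C, T, L = I.shape
    U, s, Vt = np.linalg.svd(I.reshape(C, T * L), full_matrices=False)
    a = U[:, :n_components] * s[:n_components]
    w = Vt[:n_components].reshape(n_components, T, L)
    return a, w

# Synthetic stand-in data: two fixed behaviors mixed across composition.
rng = np.random.default_rng(0)
C, T, L = 12, 5, 40
w_true = rng.random((2, T, L))
a_true = rng.random((C, 2))
I = np.einsum("ci,itl->ctl", a_true, w_true)

a, w = linear_decomposition(I, n_components=2)
I_rec = np.einsum("ci,itl->ctl", a, w)  # reconstruction from 2 components
```

Unfolding the same array as (C*T, L) instead would give the alternative decomposition with weights a_i(c, t) and purely spectral components w_i(λ).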
The fundamental limitation of linear analysis methods is that they tend to perform very poorly when the data comprise peaks whose positions vary significantly. [24] In this case, PCA/NMF analyses give rise to multiple components, with the number required to capture the system variability depending on the ratio of the peak width to the range of peak position variability and on the noise level. Similarly, these components cannot be related to the physical factors of variability beyond peak shift, i.e., peak shape. As such, these methods can be used predominantly for exploratory data analysis and denoising.
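The dependence of the component count on peak shift can be checked on synthetic spectra. The sketch below is only an illustration under assumed values (a 10 nm peak width, a 200 nm range of peak positions, and a 99% variance threshold, none of which come from this work): it compares how many singular components are needed when only the amplitude varies with how many are needed when only the position varies.

```python
import numpy as np

def n_components_needed(X, variance=0.99):
    """Number of singular components needed to capture a variance fraction."""
    s = np.linalg.svd(X, compute_uv=False)
    cumulative = np.cumsum(s**2) / np.sum(s**2)
    return int(np.searchsorted(cumulative, variance) + 1)

wl = np.linspace(500.0, 800.0, 301)  # wavelength axis in nm (assumed)
width = 10.0                         # peak width in nm (assumed)

def gaussian_peak(center, amplitude=1.0):
    return amplitude * np.exp(-0.5 * ((wl - center) / width) ** 2)

# Case 1: fixed position, varying amplitude -> a single component suffices.
amp_only = np.stack([gaussian_peak(650.0, a) for a in np.linspace(0.2, 1.0, 50)])
# Case 2: fixed amplitude, position sweeps a range much wider than the peak.
shift_only = np.stack([gaussian_peak(c) for c in np.linspace(550.0, 750.0, 50)])

k_amp = n_components_needed(amp_only)
k_shift = n_components_needed(shift_only)
print(k_amp, k_shift)
```

Narrowing the peak or widening the range of positions increases k_shift further, which is the width-to-range dependence noted above.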
In this study, we have chosen the binary system MAxFA1−xPbI3−xBrx as a model system in which the mixed cation/halide composition allows fine-tuning of the bandgap, optical properties, and stability. The synthesis workflow of this binary system was extensively explained in our previous study [12] and can also be found in the supporting information (Table S1, Supporting Information). MAPbX3 (X: I, Br) MHPs have shown excellent performance, although MA is inherently unstable when exposed to heat [26] and moisture, [26][27][28] which is problematic in terms of long-term stability. [29,30] Compared to MAPbX3, FAPbX3 has been shown to be more stable in air because FA-based perovskites have a Goldschmidt tolerance factor closer to one and higher thermal stability. [31,32] Generally, substituting iodide with bromide leads to a bandgap increase. Overall, a systematic incorporation of these two compositions within one another can be beneficial in determining what concentration of cation and halide is optimal for stability. Studying the PL over time is beneficial for understanding the stability and kinetics of these compositions. The time- and composition-dependent PL spectra I(c, t, λ), where c is the composition vector, t is time, and λ is the wavelength, contain the information on the evolution of bandgap and surface defects in our system.
Here, the time evolution and concentration dependence of the latent variables are explored to disentangle the representations found in the PL spectra across the composition space of a binary MHP system. We further compare the conventional VAE with the translationally invariant variational autoencoder (tVAE) and the conditional variational autoencoder (cVAE), which allow us to learn information about these complex systems through manifold learning. [33] Furthermore, the structure of the latent space of the VAE, tVAE, and cVAE can be explored as a function of behavior. The curves reconstructed from the latent manifold can be studied by Gaussian fitting to extract the parameters peak intensity (σ), peak position (μ), and peak width (δ). The resulting maps reveal trends of variation within the material, which can indicate real parameters associated with the system.

Results and Discussion
www.advancedsciencenews.com www.advintellsyst.com
Here, we explore the use of the VAE for the analysis of the PL data. The VAE principle is based on compression of the initial dataset to a small number of latent variables (encoding), with subsequent decoding of the latent vector into reconstructed data (decoding). The training of the VAE balances the reconstruction quality (reconstruction loss) and the Kullback-Leibler divergence between the latent variable distribution and a chosen prior, typically Gaussian (KL loss). Detailed descriptions of VAEs and their applications to a broad set of data analysis problems are available elsewhere. [34,35] For the forthcoming discussion, we note that an important aspect of VAEs is their capability to disentangle the representations of the dataset. While a rigorous definition of disentanglement is still lacking, it practically manifests as the separation of the traits in the data along different latent directions, often in a form matching human intuition and perception. For example, for the classical handwritten MNIST dataset, the two latent variables often correspond to the font width and the tilt of the handwriting. In more complex cases and for labeled datasets, (local) relationships between latent variables and externally provided characteristics can be established. The workflow for our VAE analysis is shown in Figure 1.
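The balance between the two loss terms can be written out explicitly. The following is a minimal NumPy sketch, assuming a diagonal Gaussian encoder, a standard normal prior, and a squared-error reconstruction term; the text does not specify the loss used in this work, and the weight beta and the function names are hypothetical.

```python
import numpy as np

def vae_loss(x, x_rec, mu, log_var, beta=1.0):
    """Reconstruction loss plus KL divergence to a standard normal prior.

    x, x_rec    : (batch, n_wavelengths) input and reconstructed spectra
    mu, log_var : (batch, n_latent) encoder outputs of a diagonal Gaussian
    beta        : hypothetical weight balancing the two terms
    """
    recon = np.mean(np.sum((x - x_rec) ** 2, axis=1))
    kl = np.mean(-0.5 * np.sum(1.0 + log_var - mu**2 - np.exp(log_var), axis=1))
    return recon + beta * kl, recon, kl

def reparameterize(mu, log_var, rng):
    """Sample z = mu + sigma * eps, with all randomness carried by eps."""
    eps = rng.standard_normal(mu.shape)
    return mu + np.exp(0.5 * log_var) * eps

rng = np.random.default_rng(0)
x = rng.random((4, 100))                       # four synthetic spectra
mu, log_var = np.zeros((4, 2)), np.zeros((4, 2))
z = reparameterize(mu, log_var, rng)           # two latent variables each
total, recon, kl = vae_loss(x, x, mu, log_var)
```

With a perfect reconstruction and an encoder output equal to the prior, both terms vanish; moving mu away from zero raises the KL term, while compressing toward the prior typically costs reconstruction quality.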
For the PL data, neither human intuition nor labels are generally available, necessitating the development of unsupervised learning strategies. Here, we develop a framework for the analysis of the MHP PL spectra using the conventional VAE, and explore the opportunities opened by the translationally invariant tVAEs and conditional cVAEs. The latter two approaches allow us, respectively, to discover known factors of variation and to impose a known factor of variation.
As a first step, we explore the conventional VAE approach. Here, we normalize each individual dataset to the [0,1] range. It is important to note that while in many ML problems the absolute scale of the data is irrelevant, and analysis very often starts with normalization of each individual dataset to the [0,1] range, this is not the case for physical data. For the PL datasets, the intensity can be normalized to the maximal value throughout the I(c, t, λ) dataset, maintaining the relative intensities across the full parameter space. In this case, the VAE analysis incorporates amplitude as one of the factors of variability. Alternatively, the data can be normalized for each I(λ), in which case each spectrum individually spans [0,1]. This normalization separates A_max(c, t) as a separate variable, and the subsequent analysis addresses variations of the peak shape and position only. Intermediate normalization schemes are also possible, if justified from a physical perspective. Figure 2 shows the simple VAE analysis of the PL spectra for the case of two latent variables, where each individual dataset is normalized to the [0,1] range. Here, Figure 2a depicts the latent distribution of the experimental data. For this, all PL spectra I(λ) within the parameter space (c, t) are encoded into a pair of latent variables (z_1, z_2), and the resulting distributions are plotted in the (z_1, z_2) space. Examination of the latent space distribution suggests that the experimental dataset clearly forms several 1D manifolds that jointly span the latent space.
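The two normalization schemes discussed above differ only in the axis over which the scaling is computed. The sketch below illustrates this on a synthetic array with assumed shape (composition, time, wavelength); the function names and the small constant eps that guards against division by zero are illustrative choices, not part of the original workflow.

```python
import numpy as np

def normalize_global(I):
    """Scale the whole I(c, t, lambda) array by its single maximum.

    Relative intensities between spectra are preserved
    (nonnegative intensities are assumed).
    """
    return I / I.max()

def normalize_per_spectrum(I, eps=1e-12):
    """Min-max scale each spectrum I(lambda) to [0, 1] independently.

    Returns the scaled spectra and A_max(c, t), the per-spectrum maximum
    that this scaling removes from the data.
    """
    lo = I.min(axis=-1, keepdims=True)
    hi = I.max(axis=-1, keepdims=True)
    return (I - lo) / (hi - lo + eps), hi[..., 0]

# Synthetic stand-in: 3 compositions, 4 time steps, 50 wavelengths,
# with a different overall amplitude for every (c, t) pair.
rng = np.random.default_rng(0)
I = rng.random((3, 4, 50)) * rng.random((3, 4, 1)) * 100.0

I_global = normalize_global(I)
I_each, a_max = normalize_per_spectrum(I)
```

After per-spectrum scaling, the amplitude information lives only in a_max, so it can be mapped over (c, t) separately from the latent variables.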
The second key component of the VAE analysis is the latent space representation. For this, the latent space is sampled on a rectangular grid of points (z_1k, z_2l), where k, l = 1, …, N. These pairs of latent variables are decoded to yield the PL spectra I(λ), and the resulting shapes are shown in Figure 2b. Note that one of the key characteristics of VAEs is that they are generative models, meaning that a spectrum can be generated from a point of the latent space for which no prior data are available. In this fashion, VAEs allow one to "interpolate" between representative data points.
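The grid sampling of the latent space can be sketched as follows. The decoder here is a hypothetical toy function standing in for a trained VAE decoder (it simply maps z_1 to peak position and z_2 to amplitude), and the grid limits and grid size are assumed values.

```python
import numpy as np

def decode_latent_grid(decoder, z1_range, z2_range, n):
    """Decode an n x n rectangular grid of latent points into spectra."""
    z1 = np.linspace(*z1_range, n)
    z2 = np.linspace(*z2_range, n)
    # grid[k, l] holds the latent pair (z1[k], z2[l])
    grid = np.stack(np.meshgrid(z1, z2, indexing="ij"), axis=-1)
    spectra = np.array([[decoder(z) for z in row] for row in grid])
    return grid, spectra

wl = np.linspace(500.0, 800.0, 151)  # wavelength axis in nm (assumed)

def toy_decoder(z):
    """Hypothetical stand-in for a trained decoder, for illustration only."""
    center = 650.0 + 40.0 * z[0]
    amplitude = 1.0 / (1.0 + np.exp(-z[1]))
    return amplitude * np.exp(-0.5 * ((wl - center) / 12.0) ** 2)

grid, spectra = decode_latent_grid(toy_decoder, (-2.0, 2.0), (-2.0, 2.0), n=8)
```

Because every grid point is decoded regardless of whether any measured spectrum was encoded nearby, the resulting array includes generated spectra for latent points with no prior data.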
Figure 1. The workflow for VAE. The PL spectral data I(λ) within the parameter space (c, t) are encoded into a pair of latent variables (z_1, z_2). The distributions are then plotted in the (z_1, z_2) space, where they are decoded and plotted as PL spectra. The changes in the latent variables will show changes in the decoded spectra. The found PL variability can then be presented through the latent variables and is plotted to illustrate the found PL trend from within the input data.
The latent space representation allows us to illustrate the trends within the latent space. Note that in this particular case the latent representations are clearly locally smooth, meaning that small changes in the latent variables result in small changes in the decoded spectra. Similarly, within each local region, the trends in the data can be clearly identified. In Figure 2b, the decoded spectra are shown in the manifold.
Looking from left to right along the upper section of the manifold, the peaks shift. Going vertically across the manifold, the peak position redshifts and the peak intensity decreases. The bottom left section of the manifold shows an abrupt development of double peaks from right to left; this can be seen without the normalization of intensity in Figure S3, Supporting Information. Due to the normalization in Figure 2b, the peak splitting in the decoded spectra is difficult to discern. Finally, the latent variables for each point in the (c, t) space define the latent dynamics of the system. The amplitude (from normalization), z_1, and z_2 are shown in Figure 2c-e. These show clear composition and time dependencies. The latent variable z_1, shown in Figure 2d, describes the intensity change reconstructed within the manifold. For example, for pure FAPbBr3 the intensity is high, and as intermixing between the two solutions occurs, the intensity decreases, due to the instability of certain compositions that may lead to peak splitting (see Figure S3b, Supporting Information, for the formation of peak splitting). Peak splitting is most likely to occur when a large amount of I− is mixed with Br−, unless an optimal tolerable amount is used. [13,36-38] For pure MAPbI3, the intensity is still lower than that of FAPbBr3 because MAPbI3 is more sensitive to environmental factors such as humidity and illumination. [39,40] Figure 2e represents the peak shift found within the system; it indicates that along the manifold the shift occurs mostly during mixing of the two solutions. The VAE is not given a fixed reference point for the peak position, so it captures only general trends in the peak shift present in the system.
Further insight into these behaviors can be derived via Gaussian fitting, establishing the equivalence between the latent variables and physics-based descriptors such as peak position, amplitude, and width. Here, the spectrum is decoded from the selected latent point (reconstructed manifold) and fitted by the Gaussian function to yield peak intensity (σ), peak position (μ), and peak width (δ). The learned manifold is reconstructed on a 64 × 64 grid and then fitted by the Gaussian function. In Figure 3a, σ changes gradually from left to right across the map, except in the left midsection, where a large increase in peak intensity can be attributed to the peak fit error shown in Figure 3d. Otherwise, the peak intensity in Figure 3a increases toward the right of the figure. The highest peak intensity correlates with areas where FAPbBr3 is in the majority, as also seen in Figure 2b,d. FAPbX3 has been shown to be more stable in air due to the Goldschmidt tolerance factor being approximately 1. [41] The VAE manifold was able to capture the intensity variation in the system, showing regions of both increasing and decreasing intensity. Figure 3b only partially discovers μ as a factor of variability in the system. The largest peak shift is found in the upper left side of the manifold and slowly decreases toward the bottom right. Because the VAE does not treat shift as a known factor of variation, this analysis is general and may not fully correlate with the trend found in our system, as there is no fixed peak position. The peak width δ, Figure 3c, is uniform in the middle of the map, with values between approximately zero and one half. The latent space is uniform except in its outer sections, where the peak width is slightly increased and halide segregation is more pronounced. The sections with halide segregation show the formation of a double peak, and these sections of the δ map show a higher width value. The Gaussian fit error is shown in Figure 3d and may largely be attributed to the presence of double peaks. Of the traits considered, the VAE analysis captures peak intensity best. We do see that the area with some peak splitting has a lower peak intensity, which is in line with halide segregation due to a reduction of stability.
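The Gaussian fitting step, including the way double peaks inflate the fit error, can be illustrated with a simple estimator. The fitting routine used in this work is not specified here, so the sketch below swaps in a log-parabola fit (the logarithm of a Gaussian is quadratic in wavelength) in place of a nonlinear least-squares fit. It follows the notation of the text, with σ for intensity, μ for position, and δ for width; the threshold and the synthetic spectra are assumptions.

```python
import numpy as np

def gaussian(wl, intensity, position, width):
    return intensity * np.exp(-0.5 * ((wl - position) / width) ** 2)

def fit_gaussian(wl, spectrum, rel_threshold=0.2):
    """Estimate intensity, position, and width of a single peak.

    Fits a parabola to log(spectrum) using only points above a fraction
    of the maximum. Returns (intensity, position, width, rms_error).
    """
    mask = spectrum > rel_threshold * spectrum.max()
    x0 = wl[mask].mean()  # center the axis for numerical stability
    c2, c1, c0 = np.polyfit(wl[mask] - x0, np.log(spectrum[mask]), 2)
    if c2 >= 0:           # not peak-shaped: report a failed fit
        return np.nan, np.nan, np.nan, np.inf
    width = np.sqrt(-1.0 / (2.0 * c2))
    position = x0 - c1 / (2.0 * c2)
    intensity = np.exp(c0 - c1**2 / (4.0 * c2))
    model = gaussian(wl, intensity, position, width)
    rms = np.sqrt(np.mean((spectrum - model) ** 2))
    return intensity, position, width, rms

wl = np.linspace(500.0, 800.0, 301)
single = gaussian(wl, 0.8, 655.0, 15.0)
double = gaussian(wl, 0.8, 620.0, 15.0) + gaussian(wl, 0.8, 700.0, 15.0)

sigma, mu, delta, err_single = fit_gaussian(wl, single)
*_, err_double = fit_gaussian(wl, double)
```

Applying fit_gaussian to every decoded spectrum on the latent grid would give maps of σ, μ, δ, and fit error analogous to those discussed above, with split peaks standing out as regions of elevated error.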
However, the examination of the data in Figure 2 suggests that while the VAE analysis has allowed for the elucidation of local traits within the data, the overall changes across the latent space are not obvious. This is not surprising because the latent manifold in Figure 2a is 1D but strongly curved within the latent space. Correspondingly, the primary factor of variability along the manifold is not well represented by the latent variables, hindering the straightforward interpretation of their physical meaning and, equivalently, of the time and composition dynamics of (z_1, z_2)(c, t).
To extend this analysis, we further explore two strategies for the latent analysis based on invariant and conditional autoencoders, which make it possible to discover known factors of variation and to impose a known factor of variation, respectively. First, we consider the application of invariant autoencoders, specifically tVAEs. The general principle of the tVAE is described in previous work. [42] In essence, the tVAE separates a known factor of variation as a coordinate transform, namely, a shift along the x-axis. Note that the key assumption of this analysis is that the shift of the PL peak is a factor of variation; however, no assumptions are made about the magnitude of this peak shift or its behavior as a function of composition or time.
The tVAE analysis of the PL spectra is shown in Figure 4a. In this case, the PL spectra I(c, t, λ) are encoded as triples (Δ, z_1, z_2), where Δ represents the shift variable and z_1, z_2 are the latent variables. It is immediately clear that in this case the data form a more localized distribution in the latent space, with a clearly visible group and an extended wing. In the classical VAE literature, this behavior is often referred to as dimensionality collapse, meaning that the latent manifold does not span the full latent space. However, for physical systems this behavior can be interpreted as an indication that the major factors of variability within the dataset have been discovered.
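The role of the shift variable Δ can be illustrated on the decoder side. The sketch below is a simplified picture under stated assumptions, not the tVAE implementation of the cited work: a hypothetical shape decoder produces a peak at a fixed reference position from the latent vector, and the shift is then applied as a coordinate transform along the wavelength axis by linear interpolation.

```python
import numpy as np

def shift_spectrum(wl, spectrum, delta):
    """Translate a spectrum by delta along the wavelength axis.

    Values shifted in from outside the measured window are set to zero.
    """
    return np.interp(wl - delta, wl, spectrum, left=0.0, right=0.0)

def tvae_decode(wl, decode_shape, z, delta):
    """Decode the peak shape from z at a fixed position, then shift it."""
    return shift_spectrum(wl, decode_shape(z), delta)

wl = np.linspace(500.0, 800.0, 301)  # 1 nm grid (assumed)

def toy_shape_decoder(z):
    """Hypothetical decoder: the latent vector changes width, not position."""
    width = 10.0 + 5.0 * abs(z[0])
    return np.exp(-0.5 * ((wl - 650.0) / width) ** 2)

spectrum = tvae_decode(wl, toy_shape_decoder, z=np.array([0.5, 0.0]), delta=30.0)
aligned = shift_spectrum(wl, spectrum, -30.0)  # undo the shift
```

Since position is carried entirely by delta, the latent variables are free to describe shape changes such as broadening or splitting, consistent with a latent representation in which the peak sits at a fixed position.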
The latent representation in Figure 4b further supports this assertion. Here, the relevant part of the latent space (i.e., the one that contains data) represents a single peak at a fixed position. Increasing the latent variable z_2 corresponds to peak splitting. The variation of z_2 can be seen more clearly in Figure S4b, Supporting Information, where the peaks are not normalized and the peak splitting variability is more prominent. Finally, the variable z_1 effectively represents the variation of the intensity of the primary peak. Note that the VAE analysis of the experimental data introduces a subtle aspect: while the input data were normalized, the decoded spectra are not necessarily normalized. Here, in Figure 4b the reconstructed peaks were normalized so that the intensity variability is better understood. The behaviors in Figure 4b emerge due to the behavior of noise-induced peaks. With this, the time and composition dependence of the latent variables can be represented as a latent variable evolution (Δ, z_1, z_2)(c, t), as shown in Figure 4c-e.
The tVAE analysis of the spectra is shown in Figure 4b, where the latent space is represented as a single peak at a fixed position. The peaks in the middle of the manifold widen toward the bottom of the manifold. The bottom right side shows the development of double peaks, which is an indication of halide segregation. The left of the manifold shows the development of noise in the dataset. The top of the manifold shows narrowing of the peaks. We also note that the z_1 dynamics are almost irrelevant without normalization of the reconstructed PL peaks (Figure S4, Supporting Information). Figure 4c shows the shift in the system. As our experiment mixes the two compositions incrementally, the shift slowly moves from one end of the wavelength range to the other. The system starts from pure FAPbBr3, and as MAPbI3 is mixed in by 1-2% at a time, the PL peak redshifts. Some compositions do not remain stable in terms of peak position, such as 30% MAPbI3, where the composition moves back toward a Br−-rich region over time. Figure 4d shows the variation of the intensity of the primary peak. This variation carries little information, since it appears mostly around areas of noise and peak splitting. Figure 4e represents the peak splitting found in the system. The top of the figure represents peaks that have no splitting and show a sharp spectrum. Moving along the map, the tVAE captures the development of noise and peak splitting found in the system.
The variability in the tVAE-decoded spectra can be better understood through Gaussian fitting of the latent space. Figure 5 illustrates the fitted peak intensity, position, and width of the tVAE-decoded PL spectra (denoted as σ_t, μ_t, δ_t). In Figure 5a, the σ_t map is generally uniform, with most values at 0. When the peaks are not normalized in the latent space, the peaks are almost nonexistent in the middle of the map. The intensity change is seen more on the right side of the manifold and the map, with an increase starting at the upper right corner. The lower right corner is also slightly increased, but not as much. The z_1 dynamics therefore do not appear to provide much information on intensity variability in the system. This variability has so far been better captured by the VAE. While the map does show that there is intensity variability, its correlation to the actual system is not fully established. In Figure 5b, μ_t shows less uniformity in the latent space, attributed to the known peak shifting (i.e., bandgap tuning) that depends on the FA:MA and/or Br:I composition in our model system. The peak splitting and noise in the manifold also contribute. The top of the map has values at zero, and comparison with the manifold of the decoded spectra shows that the peak position stays the same throughout, as it should in the tVAE. What the peak shift in the tVAE does reveal is the development of peak splitting in this system. The values increase along the bottom of the map, where the peak position slowly increases or shifts. The highest values on the left of the map correspond to the noise in our system, which no longer has a specific peak position. Moving toward the bottom right of the manifold and map, the noise slowly develops into one peak and finally into two peaks. Our system is also known to show halide segregation, which is represented in the left lower midsection of the map. The peak width δ_t (Figure 5c) is uniform in the top and middle of the map, with values close to zero. Again, in the manifold and in the map, higher peak width values appear along the bottom from left to right. The bottom left corresponds to noise in the spectra, which again no longer shows a defined peak. Toward the bottom right, a single peak is formed and eventually splits into two peaks. The sections with halide segregation show the formation of a double peak, and these sections of the δ_t map show a higher width value. An increase in width is an indication of noise or some form of peak splitting in the system. Generally, the width change is not a significant factor expressed in the tVAE. In the fit error shown in Figure 5d, there is a slight error in the bottom right corner, which can be attributed to peak splitting.
The tVAE was able to reconstruct PL spectra showing how the system can develop peak splitting. Halide segregation is a known factor of variation that not only affects the stability of the system but is also an effect of instability. Over time, exposure of MHPs to ambient conditions such as light [43][44][45] and oxygen [46,47] can induce halide segregation. Illumination induces halide segregation by creating a Br-rich area in the illuminated region by repelling iodide ions. Oxygen acts as an additional degradation mechanism when combined with illumination of MHPs, as oxygen ions react with the organic cations in the hybrid perovskites, resulting in degradation. Incorporation of intolerable amounts of one halide into another can also induce halide segregation due to changes in crystal structure [48,49] and perovskite stoichiometry. [44]
Finally, we illustrate the applicability of the conditional variational autoencoder (cVAE) to the PL data. In the cVAE, the encoding and decoding are conditioned on a known variable, allowing one to impose known physical information. Here, as an obvious conditional variable we choose the expected bandgap of the material, assumed to follow a linear trend between the endmembers. The cVAE tracks the peak shift in the system as the ratio of endmembers is methodically varied. In Figure 6a, the latent space stores the specific input of the c-vector (in this case, shift). The change in bandgap is clearly grouped according to the shift in the system seen in the manifold. Figure 6b shows how the shift is tracked according to the input data. Sampling different parts of the latent space can provide information about the overall system. The trend seen in the region can be understood through the two latent variables z_1 and z_2 shown in Figure 6c-e. Figure 6c displays the input c-vector data, which clearly shift with concentration. 
The known bandgaps are 0 and as mixing occurs between the two solutions, we can see shift occurring in the bandgap. Here, tracking z 1 is shown Figure 6a. With consideration of the shift represented in www.advancedsciencenews.com www.advintellsyst.com the latent space and combining this knowledge with the latent representation in Figure 5d, it can be understood how the bandgap of the material changes the most within the system. Each composition is grouped into an area of the manifold that entails the details of the system's shift in wavelength. In our system, it can be understood that mixing one endmember into the other at a certain point causes increasing amount of bandgap change until reaching the known bandgap of the endmembers. Figure 6e represents the latent variable z 2 which looking at the manifold can give information of the kinetics of the input. Each composition is represented in different areas of the manifold which corresponds to the c-vector which is the expected bandgap of the material found in the shift of the PL. We can understand which compositions are represented in which areas of the manifold and the change in the specific composition can be looked in accordance with the composition being in the negative or positive area of the manifold. In the manifold looking at z 2 the manifold starts at no shift and as moving along the manifold on the z 2 axis it shows the kinetics of the composition as change in bandgap. One of the major differences so far seen in VAE, tVAE, and cVAE (in terms of concentration) is that VAE was able to give us a general change in the latent space. VAE was able to show the variability of intensity found in the system well. tVAE, although starting with the assumption that shift of the PL peak is a factor of variation, was able to recognize peak splitting as a factor of variation in our system. In this system, peak splitting is an indication of halide segregation due to halide demixing that can occur over time. 
The cVAE introduces a known factor of variability that imposes our known physical information. With concentration as the input, the endmembers were easily separated and denoted as 0 in terms of shift. We know that for a pure system, the position of the PL is at a known, fixed wavelength. Yet as we systematically mix both endmembers, we extract peak shift as a factor of variability. We can then compare the expected bandgap with the position of the peak maximum found in the dataset to see whether more local traits of the system can be extracted.
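The position of the peak maximum used as the second condition can be obtained directly from the measured spectra. The following is a minimal sketch under assumed conventions: spectra are stored as nested lists indexed by well and then time, and the wavelength grid and intensities are toy values, not data from this study. The function names are hypothetical.

```python
def peak_position(wavelengths, intensities):
    """Return the wavelength at which one spectrum reaches its maximum intensity."""
    if len(wavelengths) != len(intensities):
        raise ValueError("wavelengths and intensities must have equal length")
    idx = max(range(len(intensities)), key=intensities.__getitem__)
    return wavelengths[idx]


def peak_positions(wavelengths, spectra):
    """Map spectra[well][time] to the peak-maximum wavelength, keeping the nesting."""
    return [[peak_position(wavelengths, s) for s in well] for well in spectra]


wavelengths = [500, 550, 600, 650, 700, 750, 800]  # nm, toy grid

spectra = [
    [[0, 1, 5, 2, 0, 0, 0], [0, 1, 4, 3, 0, 0, 0]],  # well 0 at two times
    [[0, 0, 0, 1, 2, 6, 1], [0, 0, 0, 1, 5, 4, 1]],  # well 1 at two times
]

# One condition value per well and time step.
c_vector = peak_positions(wavelengths, spectra)
print(c_vector)
```

Unlike the expected bandgap, this condition is taken from each individual spectrum, so it can differ between time steps of the same well.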
Figure 7a shows the latent space for the specific c-vector input (in this case, position of the peak maximum). The change in bandgap is clearly grouped according to the position of the peak maximum, as seen in the manifold. The cVAE uses the known input for training but also draws on the overall system to sample data points when producing a prediction. The cVAE with concentration as the input, shown in Figure 7a, groups both endmembers together as showing no variation, whereas here, with the position of the peak maximum as the input, the endmembers are separated and the factor of variation relies solely on the maximum peak position of each PL reading. Figure 7b shows how the position of the peak maximum is tracked in accordance with the input data. The position of the peak maximum of the input data over the wells and time is shown in Figure 7c. Here, the change in peak position is clearly grouped in the manifold, starting from one endmember and intermixing toward the other. The mixing of the endmembers is very clear and systematic, with a few outliers at high peak positions above 780 nm. This figure shows that the mixing of compositions is well behaved in our system and that, over time, the wells with pure FAPbBr3 to about 90% FAPbBr3 tend to stay at, or shift toward, the lower end of the spectrum. The trend seen in the manifold can be understood through the two latent variables z1 and z2 shown in Figure 7d,e. Here, the tracking of z1 is shown in Figure 7a. Considering the positions of the peak maxima represented in the latent space together with the latent representation in Figure 7d, we can see the defined groups of the endmembers and mixed solutions. Not all the groups of peak positions formed in the manifold are interpreted here. Figure 7e represents the progression of peak position over time. The cVAE is better able to reconstruct, and give insight into, the known input.
Each position corresponds to the peak maximum; the intensity is low in the middle of the manifold and increases toward its outer sides. The right side of the manifold shows peak positions that are similar to one another, with the least shift, whereas toward the left side more variability in shift is reconstructed.

Conclusion
In summary, we explore the applicability of variational autoencoders to the analysis of the time evolution of PL spectra in MHP microcrystals. The VAE provided the general traits of our system, capturing the peak intensity best. We further develop a workflow based on translationally invariant and conditional autoencoders. The former discovers known factors of variation, for example, the shift of the peak due to the bandgap variation. Notably, while the factor of variation is shift, the actual value of the shift is discovered during the analysis, and the remaining factors of variation are encoded in the latent variables. The tVAE makes no assumption about the magnitude of the peak shift or its effect on the system. Shift is treated as a known factor and is factored out during tVAE analysis. With the peak position fixed in this way, other types of variability in the system can be discovered, in this case peak splitting/halide segregation. Conversely, conditional VAEs impose a known factor of variation, in this case the anticipated bandgap. Two different conditions were given: concentration and position of the peak maximum. The first condition showed how shift in the system depends on the mixing of the endmembers. The majority of the shift was found in wells with the highest degree of solution mixing, where mixing was a major factor of variability. Using the maximum peak position over time as the condition provides information broadly similar to that obtained with concentration as the input. Together, VAE, tVAE, and cVAE allow one to understand the underlying factors of variation in this system and can be beneficial in discovering unknown factors of variation in other complex materials systems.

Supporting Information
Supporting Information is available from the Wiley Online Library or from the author.