A Deep Learning Approach to Powder X-Ray Diffraction Pattern Analysis: Addressing Generalizability and Perturbation Issues Simultaneously

for the conventional XRD analysis, such as indexing, space group determination, LeBail and Rietveld refinement. We removed all of those low-quality XRD patterns by applying a reasonable cutoff criterion. The standard deviation of the XRD signals in the 2θ range between 5° and 8° was calculated for all the RRUFF entries, and the entries were ranked according to the calculated standard deviation. If a peak was located in this range, the range was moved slightly toward higher angles. The cutoff criterion was that the 40% of entries with the highest standard deviations were all removed.
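The cutoff described above can be sketched as follows. This is a minimal illustration, assuming each pattern is stored as a dictionary with hypothetical keys `two_theta` and `intensity`; the function names are ours, and the step that moves the window when a peak falls inside it is omitted.

```python
from statistics import pstdev


def low_angle_std(two_theta, intensity, lo=5.0, hi=8.0):
    """Standard deviation of the signal inside the [lo, hi] degree window."""
    window = [y for x, y in zip(two_theta, intensity) if lo <= x <= hi]
    return pstdev(window) if len(window) > 1 else 0.0


def apply_cutoff(patterns, drop_fraction=0.40):
    """Rank patterns by low-angle noise and drop the noisiest drop_fraction."""
    ranked = sorted(
        patterns,
        key=lambda p: low_angle_std(p["two_theta"], p["intensity"]),
    )
    n_drop = int(round(drop_fraction * len(ranked)))
    # ranked is ascending, so the entries to drop sit at the end of the list
    return ranked[: len(ranked) - n_drop]
```

A flat, quiet low-angle region gives a small standard deviation, so entries with a noisy background rank last and are the ones removed.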

[16][17][18][19][20][21][22][23][24][25][26][27][28][29] The perturbation that originates from several well-known artifacts, such as strain-induced peak jittering (or shift), particle size-induced peak broadening, preferred orientation-induced peak intensity variation (texture), sample- or apparatus-induced background and noise, etc., matters when constructing synthetic XRD patterns for use in DL model training. There have been several methodologies to take this sort of perturbation into consideration for XRD analysis. Lee et al. [24] simply incorporated peak shift and broadening by randomly varying the peak parameters, such as Caglioti and mixing parameters, in a narrow range. Oviedo et al. [21] considered typical perturbations that can usually be observed in thin film samples, such as texture and epitaxial strain. Maffettone et al. [18] introduced an ensemble concept using 50 convolutional neural network (CNN)-based classifiers and thereby efficiently treated the perturbations without combinatorial explosion. Wang et al. [19] extracted the perturbation from experimental data and thereafter merged it with the theoretical spectra to synthesize new spectra to use for training. Szymanski et al. [23] introduced physics-informed data augmentation, wherein all the perturbations were incorporated into synthetic XRD patterns by considering the domain size and lattice parameters controlled by reasonable solid solutions and the random preferred orientation. Elimination of the identified phase from the blended XRD also played a helpful role in determining all the constituent phases in Szymanski et al.'s [16] approach.
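As a rough illustration of how such perturbations can be injected into a synthetic pattern, the sketch below builds a profile from pseudo-Voigt peaks whose widths follow the Caglioti relation, then applies a random shift, intensity variation, a flat background, and Gaussian noise. All function names and numeric ranges are illustrative assumptions and are not the values or the code used in any of the cited works.

```python
import math
import random


def caglioti_fwhm(two_theta_deg, u, v, w):
    """Caglioti relation: FWHM^2 = U*tan^2(theta) + V*tan(theta) + W."""
    t = math.tan(math.radians(two_theta_deg / 2.0))
    return math.sqrt(max(u * t * t + v * t + w, 1e-8))


def pseudo_voigt(x, center, fwhm, eta):
    """Unit-height pseudo-Voigt: eta * Lorentzian + (1 - eta) * Gaussian."""
    d = (x - center) / fwhm
    lorentzian = 1.0 / (1.0 + 4.0 * d * d)
    gaussian = math.exp(-4.0 * math.log(2.0) * d * d)
    return eta * lorentzian + (1.0 - eta) * gaussian


def perturbed_pattern(peaks, grid, rng):
    """peaks: list of (two_theta_deg, relative_intensity); grid: 2-theta values."""
    # Every range below is an assumed, illustrative value.
    u = rng.uniform(0.0, 0.05)
    v = rng.uniform(-0.005, 0.0)
    w = rng.uniform(0.01, 0.03)
    eta = rng.uniform(0.1, 0.9)          # Lorentzian/Gaussian mixing parameter
    shift = rng.uniform(-0.1, 0.1)       # peak jitter in degrees
    background = rng.uniform(0.0, 0.05)  # flat background level
    signal = [background] * len(grid)
    for position, intensity in peaks:
        center = position + shift
        fwhm = caglioti_fwhm(center, u, v, w)
        scale = intensity * rng.uniform(0.8, 1.2)  # intensity variation
        for i, x in enumerate(grid):
            signal[i] += scale * pseudo_voigt(x, center, fwhm, eta)
    noisy = [y + rng.gauss(0.0, 0.01) for y in signal]
    top = max(noisy)
    return [y / top for y in noisy]  # normalise so the strongest point is 1
```

Calling `perturbed_pattern` repeatedly with the same peak list and a shared `random.Random` instance yields differently perturbed versions of one compound, which is the basic unit of the augmentation discussed here.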
Despite such brilliant treatments of the perturbation issue, all of these approaches were limited to a narrow material system and thus had no extensibility. For instance, Lee et al.'s [17] approach was confined to the Li-Al-Si-O-N composition system, Oviedo et al.'s [21] approach to thin-film metal halides with only seven space-group categories, Szymanski et al.'s [23] approach to the phase identification in the Li-Mn-Ti-O-F composition space, and Maffettone et al.'s [18] approach to three different narrow-ranged materials systems, such as the phase transition identification of BaTiO3, crystal structure prediction for ADTA, and phase mapping of the ternary Ni-Co-Al alloy system. Instead of exquisitely working for a certain narrow material system, a DL model that can work for all inorganic compounds should be prioritized. Only Vecsei et al., [22] Suzuki et al., [23] and Park et al. [24] dealt with a whole range of inorganic materials (from the ICSD) [44] when training and testing their DL models for symmetry classification, but the perturbation was not rigorously considered in these generalizable approaches. Thus, the previous approaches involving all ICSD entries secured generalizability but did not address the perturbation issue. If we were to satisfactorily consider the perturbation for all ICSD entries, a combinatorial explosion issue would arise, and a typical lab-scale computation facility would not be able to keep up with the required computation scale. To overcome this, we leveraged two cloud infrastructures, the Oracle cloud infrastructure (OCI) [45] and the KISTI intelligent cloud platform (KICP), [46] for the DL-driven XRD analysis, which allowed us to introduce every possible perturbation into all ICSD entries.
Fully convolutional networks (FCNs) [47] with shallow (small) and deep (large) architectures are adopted for symmetry classification and trained on the augmented (perturbed) dataset using the OCI and KICP. [50] As an alternative approach to clarifying the perturbation problem, we also use a generative adversarial network (GAN) [51] that can convert perturbed XRD patterns to clean, standard (unaugmented) XRD patterns to be used for testing a small FCN model that is only trained on the standard XRD data. Specifically, the GAN was operated independently and exclusively used for generating clean data, which was then provided as input for the FCNs. An image-to-image translation model with conditional GAN, the Pix2Pix model, [52] a versatile style converter that has been utilized in many other disciplines, is employed as a GAN-based XRD pattern style converter.
Despite the DL models' promising performance, we also examine the limits of DL-driven XRD analyses by implementing a more systematic failure analysis that goes beyond the typical confusion matrix analysis. We conduct a systematic analysis of one-to-one comparisons between misclassified and ground-truth structures. Some of the misclassified and ground-truth pairs are indistinguishable and have corresponding indistinguishable XRD profiles. As a result, we raise the issue that some misclassifications do not come from DL but from the intrinsic 1D XRD indistinguishability that originates from the dogmatic XRD pattern refinement process.
We aim to set up a reliable DL-driven symmetry classification model by simultaneously addressing both the generalizability and perturbation issues in a systematic manner. Our approach is not confined to a narrow-ranged material system; it is generalized for nearly all the ICSD entries. We augment the training data by incorporating all possible real-world perturbations into the XRD pattern for ICSD entries. Our ultimate target is to make the augmented-dataset-trained DL model work properly for any type of XRD pattern, including the experimental XRD pattern. In contrast to the typical DL-based XRD analysis routine for phase identification and symmetry classification for a specific material system with a narrow composition range, which exhibits high test accuracies, we introduce a DL model that works for perturbation-involved, real-world XRD patterns spanning almost all ICSD entries.

FCN Training and Testing Results
[16][17][18][19][20][21][22][23][24][25][26][27][28][29] The FCL in the CNN has a disadvantage in that image (or peak) location information is extinguished, that is, the receptive field concept disappears after the FCL. [47] The FCN is constituted by eliminating the FCLs from the conventional CNN. However, Vecsei et al. [22] and Tatlier et al. [53] argued that a simple multilayer perceptron (MLP) could work better than CNNs. Regardless of performance, the FCN has a notable advantage over other CNN and MLP models. If the same level of performance can be achieved, the FCN is more efficient because it has fewer parameters than the CNN and MLP models. For instance, the number of parameters for Park et al.'s [24] CNN models is ≈7, 100, and 200 M for crystal system (CS), extinction group (EG), and space group (SG) prediction, respectively. However, our FCN model has only 1.8 M parameters and achieved similar or higher prediction accuracies for symmetry identification. Smaller models are more efficient in terms of computational cost savings and should be preferred when a similar level of performance can be guaranteed with the same training data size. Therefore, our FCN models are noteworthy because of their performance and simplicity. We also tried other DL algorithms, such as transformers, for the classification task. Although Lee et al. [26] claimed that the transformer encoder can be used for XRD classification, it performed worse than the FCN. We conducted self-supervised learning for transformers using both the blanked XRD and bisected XRD patterns. After the self-supervised learning phase, the subsequent training for the classification tasks was somewhat faster than direct transformer training without any self-supervised pretraining, but we observed no improvement in test accuracies. This disappointing outcome is due to the fact that the training dataset was identical for both the self-supervised learning and the ensuing classification. As corroborated by the literature, [48][49][50] self-supervised learning typically necessitates a substantially larger volume of training data. Consequently, the ≈190 k XRD patterns at our disposal are likely insufficient for this purpose. Nonetheless, as mentioned above, we were able to moderately accelerate the subsequent training processes. It is worth noting that the self-supervised learning typically required for transformer training was not possible with either Lee et al.'s [26] or our datasets, so our attempts with the transformer also failed. Therefore, we omitted the results here for brevity.
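The parameter saving obtained by dropping fully connected layers can be illustrated with simple counting. The sketch below compares a flattened-feature dense head with a 1×1 convolution followed by global pooling; the layer sizes (a 128-point feature map with 64 channels, a 1024-unit hidden layer, 230 output classes) and the pooling-based head are hypothetical choices for illustration and are not the architectures of the models discussed here.

```python
def conv1d_params(in_channels, out_channels, kernel_size):
    """Weights plus biases of a 1D convolution layer."""
    return in_channels * out_channels * kernel_size + out_channels


def dense_params(n_inputs, n_outputs):
    """Weights plus biases of a fully connected layer."""
    return n_inputs * n_outputs + n_outputs


# Hypothetical final feature map: 128 positions x 64 channels, 230 classes.
length, channels, n_classes = 128, 64, 230

# CNN-style head: flatten -> dense(1024) -> dense(230)
cnn_head = dense_params(length * channels, 1024) + dense_params(1024, n_classes)

# FCN-style head: 1x1 convolution to 230 channels, then global pooling
# (pooling itself has no trainable parameters)
fcn_head = conv1d_params(channels, n_classes, 1)
```

Under these assumed sizes the dense head holds several million parameters while the convolutional head holds only a few thousand, because the dense layer's weight count scales with the full flattened length and the convolution's does not.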
Figure 1 shows the small and large FCN models used for symmetry identification and schematically describes various sets of training and testing datasets for use in the small and large FCN models, leading to the various test accuracies shown in Tables 1 and 2. We used mesh enumeration for hyperparameter optimization, the details of which can be found in Table S1, Supporting Information. When trained on 187 131 standard XRD patterns and tested with 10 000 hold-out patterns, the FCN achieved test accuracies of 93.06%, 87.64%, and 84.52% for CS, EG, and SG, respectively. These accuracies, limited to standard XRD patterns with synchrotron light source quality, represent the upper limits. Subsequent training with perturbed (augmented) datasets closely reached these figures and never exceeded them. In addition to the standard XRD dataset, we also generated two additional perturbed datasets: one dataset was texture-free and the other was texture-included. As mentioned in the data preparation subsection, texture is important for thin-film materials, but well-ground powder form materials rarely exhibit texture. However, we included texture-induced peak intensity variation in the data augmentation process, even though it does not match real-world powder conditions. This doubled the size of the perturbed dataset. The symmetry classification performance remained unchanged regardless of whether texture was included. The test accuracy for the 1.9 M hold-out texture-free XRD patterns was 92.25%, 87.34%, and 84.39% for CS, EG, and SG, respectively, while it was 92.09%, 87.52%, and 84.42% for the 3.8 M hold-out texture-included XRD patterns. It is worth mentioning that the FCN trained on the perturbed dataset performed nearly as well as the one trained on the standard dataset, indicating the effectiveness of our perturbed XRD data generation process.
It is important to note that the above test accuracies for the perturbed dataset are different from those reported in other studies, which are based on the conventional random training and testing dataset splitting scheme. In this scheme, perturbed XRD patterns for the same compound can be split into training and test datasets. Using this scheme, the accuracy was substantially improved to 98.95%, 97.18%, and 96.03%. These promising accuracy values reflect the FCN model's capability to recognize a compound of concern despite a certain perturbation by observing many other XRD patterns of the same compound with different perturbations. However, we used a more reasonable splitting scheme called individual compound-based splitting, which involves testing the FCN model on XRD patterns for compounds that have never been used for training. This reflects the FCN model's capability to identify an unseen compound regardless of the types of perturbations involved.
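The individual compound-based splitting can be sketched as a group-wise split in which the compound identifier, not the individual pattern, is the unit that is shuffled. The function name and the `(compound_id, pattern)` record layout below are assumptions made for illustration.

```python
import random


def compound_based_split(records, test_fraction, seed=0):
    """Split (compound_id, pattern) records so no compound spans both sets."""
    compound_ids = sorted({cid for cid, _ in records})
    rng = random.Random(seed)
    rng.shuffle(compound_ids)
    n_test = max(1, int(round(test_fraction * len(compound_ids))))
    test_ids = set(compound_ids[:n_test])
    # Every perturbed pattern follows its parent compound into one side only
    train = [r for r in records if r[0] not in test_ids]
    test = [r for r in records if r[0] in test_ids]
    return train, test
```

A conventional random split would instead shuffle `records` directly, which lets differently perturbed patterns of one compound appear on both sides of the split.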
In line with the larger texture-included dataset, we also increased the size of the FCN model, which we refer to as the "large FCN model." This model achieved slightly improved accuracies of 92.10%, 88.25%, and 84.85%. In particular, a conspicuous improvement in EG and SG classifications is noteworthy. This is in accordance with the general principle in the AI community that larger and more complex problems, with a larger number of output features (101 for EG and 230 for SG), require larger models with more parameters and larger datasets. The large FCN model has 26 M parameters and is schematically described in detail in Figure S1, Supporting Information.
In the AI community, it is common to use multitop accuracy as a test performance metric for many-class classification problems, rather than the one-top accuracy that we adopted in this case. The EG and SG classifications dealing with 101 and 230 classes can be evaluated by the multitop accuracy, whereas the one-top accuracy is appropriate for the CS classification dealing with only 7 classes. Even state-of-the-art computational indexing tools, such as ITO, [54] TREOR, [55] DICVOL, [56] McMaille, [57] EXPO, [58] and X-CELL, [59] often suggest multiple equally applicable candidates, and the final determination is left to the crystallographer. We tested all the FCN models using three-top and five-top accuracies and obtained nearly 100% test accuracy, but we did not include these results because we only considered the one-top accuracy for the sake of a fair analysis.
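For reference, the multitop (top-k) accuracy mentioned above can be computed as in the following minimal sketch, where `scores` is assumed to hold one list of per-class scores per test pattern and `labels` the corresponding integer class indices.

```python
def top_k_accuracy(scores, labels, k):
    """Fraction of samples whose true label is among the k highest scores."""
    hits = 0
    for row, label in zip(scores, labels):
        ranked = sorted(range(len(row)), key=lambda i: row[i], reverse=True)
        if label in ranked[:k]:
            hits += 1
    return hits / len(labels)
```

With `k=1` this reduces to the one-top accuracy reported in this work; larger `k` can only keep the value the same or raise it.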
We also performed experimental data testing on both small and large FCNs trained with standard, texture-free, and texture-involved perturbed XRD datasets. We prepared two experimental XRD datasets for testing: one from the direct XRD measurement of commercially available raw powders (Ex_dataset_1) and the other from the RRUFF database (Ex_dataset_2). [60] The test accuracy on experimental data was slightly lower than that on synthetic data but still acceptable for FCNs trained with perturbed XRD datasets, as shown in Table 2. The test accuracy on Ex_dataset_1 was 90.38% (for CS) for the small FCN trained with the texture-free perturbed XRD dataset. This is a substantial achievement, as it is the first time such promising performance has been obtained on a general experimental dataset, rather than a limited achievement on a specific, narrowly defined material system. Interestingly, the Ex_dataset_1 test failed for all texture-involved FCNs. This is because Ex_dataset_1 is composed of well-crystallized commercially available powders without texture, so training with texture-involved data led to disappointing results on Ex_dataset_1. Of all the test datasets, Ex_dataset_1 is closest to the typical situation where powder samples are of primary concern. It is worth noting that our DL-driven approach is targeted at inorganic powder materials. In fact, the many Caglioti and mixing parameter values used in the perturbation of the training dataset were intentionally collected from the vicinity of those for well-crystallized inorganic powders. This approach was taken to avoid incorporating low-quality thin-film XRD patterns. Figure S2, Supporting Information, shows the XRD pattern for all Ex_dataset_1 entries. The promising test accuracy may be due to the high quality of the XRD patterns in Ex_dataset_1, which exclude low-quality samples, such as thin films, nanopowders, glassy powders, and powders with extremely small volumes of crystallites.
We tested the large FCN model using Ex_dataset_2, and the test accuracies were evaluated to be 74% for CS classification and 58% for SG classification. These results are higher than the previous results (70% and 54%) [22] obtained from a smaller RRUFF dataset consisting of only 800 XRD patterns. The Ex_dataset_2 was also used for testing other small FCN models trained with texture-free and texture-involved training datasets, and the results are given in Table 2. Although noteworthy experimental data-driven test accuracies for synthetic-data-trained DL models have been previously reported, [17][18][19][20][21] it should be noted that all of them are obtained from a narrow material system in a restricted composition range, so it is not appropriate to directly compare them with our Ex_dataset_2-based test accuracy. The Ex_dataset_2 dataset includes a larger number of XRD patterns (for ≈1,600 materials) with greater generalizability than any previous experimental dataset since the RRUFF database includes a variety of inorganic materials in a wide range of compositions. The Ex_dataset_2 contains many low-quality XRD patterns due to the loose supervision and acceptance criteria of the RRUFF database. Even after applying our reasonable cutoff policy, Ex_dataset_2 still contains many poor-quality entries compared to Ex_dataset_1 quality standards. In particular, the background noise stands out in these poor-quality patterns, as shown in Figure S3, Supporting Information. The test accuracies of 74% and 58% for CS and SG classification are reasonable. The entire Ex_dataset_2 data are available on our GitHub site [61] so future competitions can improve the test accuracy by either improving the DL model or the data augmentation process.

XRD Data Conversion for the FCN Test
The test accuracy was very low when the standard XRD data-trained FCN model was tested on the perturbed XRD dataset, as shown in Table 1. To make the model more versatile and able to handle different types of XRD data, we transformed the perturbed XRD data into standard XRD data using a meticulous process. This conversion allowed us to test the standard XRD data-trained FCN model on the transformed data. Our data conversion strategy can be seen as an alternative approach to directly training the model with huge augmented datasets. We used a pristine U-net [62][63][64] as the first algorithm for data transformation, which is a commonly used DL model for semantic segmentation that consists of a contracting path (encoder) and an expansive path (decoder), as shown in Figure 2. The overall architecture of the encoder is similar to the FCN used for symmetry classification, and the decoder consists of upsampling and concatenation followed by regular convolution operations. We trained the U-net using pairs of perturbed and standard XRD patterns and used the fully trained U-net to transform the perturbed version of the hold-out test dataset into the standard style. However, the U-net-converted XRD pattern did not resemble the standard XRD pattern and still retained the original characteristics of the perturbed XRD pattern, as shown in Figure 3. As a result, as shown in Table 1, both the bare perturbed and U-net-converted datasets demonstrated very low accuracies when testing the standard XRD data-trained FCN model. Our initial attempts to use a pristine U-net for data conversion failed. It became clear that the U-net was not capable of generating XRD data that were sufficiently close to the standard data. To achieve applicable data conversion, we used a GAN with the same U-net as a generator. We used a Pix2Pix model, [51] which is an image-to-image (spectrum-to-spectrum) translation model with a conditional GAN, to realize more promising data conversion. The Pix2Pix model consists of an FCN-based model as a discriminator and U-net as a generator. The small and large Pix2Pix models, which consist of small and large FCNs, as well as small and large U-nets, are schematically illustrated in Figure 2. The small and large models are named according to their architecture and the size of the datasets they were trained on, with the small model trained on texture-free data and the large model trained on texture-involved data.
Table 1. The test accuracy for CS, EG, and SG classifications for various models trained and tested with various synthetic datasets. All the accuracies come from the hold-out test dataset prepared by the compound-based splitting scheme. The small and large Pix2Pix models represent Pix2Pix models involving the small and large FCNs inside as a generator, respectively. Some extremely low accuracies appearing in the first column indicate malfunctioning under that condition. The accuracy data in parentheses represent those obtained from conventional random training and testing data splitting.
The Pix2Pix-converted XRD patterns matched the quality of standard XRD data well, resulting in improved symmetry classification test results for the FCN model trained on standard XRD data. The hold-out test dataset, prepared using an individual-compound-based splitting scheme, was input to the fully trained Pix2Pix model, which converted it into a test dataset for the standard XRD data-trained FCN model. This means that none of the compounds in the Pix2Pix-converted test dataset had been used in the training of the Pix2Pix and FCN models. The test accuracy of the Pix2Pix-converted XRD data for the FCN model trained on standard XRD data was 87.5%, 82.61%, and 79.88% for CS, EG, and SG classifications, respectively. These results are slightly lower than those obtained by directly training and testing the FCN model on perturbed XRD data, but they are still acceptable and much improved over the pristine U-net-converted case. This GAN-based data conversion approach can expand the use of small DL models trained on a limited amount of single-style data to testing on other types of data.
The focus of this work is on whether the Pix2Pix-driven conversion can be applied to real-world experimental XRD data. As shown in the bottom of the first column in Table 2, the test accuracy was 61.3%, 51.52%, and 46.87% for CS, EG, and SG classifications, respectively, when testing on the Pix2Pix-converted dataset from Ex_dataset_2. These accuracies are lower than the results (74.34%, 60.74%, and 58.82%) obtained from the large FCN model when it was directly trained on the texture-involved perturbed dataset and then directly tested on Ex_dataset_2. As shown in Table 2, even more deteriorated test accuracies were obtained when testing on the Pix2Pix-converted dataset from Ex_dataset_1, in contrast to the acceptable results of 90.38%, 71.15%, and 59.62% obtained from the small FCN model when it was directly trained on the texture-free perturbed dataset and then directly tested on Ex_dataset_1. The deterioration in the performance of the Pix2Pix-converted data is consistent across both synthetic and experimental data conversion cases. This suggests that the XRD data style conversion strategy still requires considerable improvement.

Discussion
Using the synthetic XRD pattern data, which can be generated in large quantities, to train DL models is more cost-effective than using rare and expensive real-world XRD data. Careful consideration of perturbations, such as peak broadening, jittering, intensity variation, and noisy background, allows for the creation of synthetic XRD patterns that are indistinguishable from experimental patterns. There have been several pioneering efforts [16][17][18][19][20][21][22] to incorporate synthetic, perturbed (augmented) XRD data into DL-driven approaches. In addition, alternative attempts have also been made to directly incorporate perturbations into a DL model rather than into data augmentation. For example, Chen et al. [15] introduced a deep reasoning network (DRN) that combines DL with constraint reasoning to incorporate prior scientific knowledge; therefore, only a small amount of unlabeled data was sufficient to address the perturbation. As a result, DRN training does not require a large number of synthetic XRD patterns with perturbations, as the perturbations are incorporated in the latent embedding space of the DRN. However, all these previous attempts to incorporate perturbations into DL-driven XRD approaches have only been shown to be practical for specific, narrow material systems with small dataset sizes, such as certain ternary or quaternary inorganic material systems. These approaches [15][16][17][18][19][20][21][22] cannot be easily generalized to other material systems and are not extendable to all ICSD entries.
The size of the dataset needed to solve a problem is closely related to the complexity of the problem and the size of the model being used. The problem complexity is measured by the dimension of the input and output features, and the model size is measured by the number of parameters in the DL model. In general, higher complexity requires more data, which in turn leads to larger DL models. [65] There is a common conception that the use of large amounts of data is a disadvantage according to the parsimony principle, [66] and this conception is dogmatically pursued, particularly in the materials research community. However, we believe that the availability of a large (deeper) model along with large datasets is not something to be feared but rather something to embrace to improve the performance of the DL models. The real problem is the lack of useful real-world data in the materials research community. As a result, most of the recent DL-based XRD studies have focused on synthetic XRD data. If synthetic data can be generated, there is no reason to avoid using large datasets, heavy models, and complex problem settings. With the recent advances in computational hardware, including cloud computing, the computational scale required for the present XRD analysis is no longer a concern. We had sufficient computational capacity for the present XRD analysis, as we were able to fully utilize both OCI and KICP. Nonetheless, we do not underestimate the value of expert-intervened, downsized (parsimonious) models that work with limited data based on the lab-scale computation infrastructure, and we believe that both approaches should be pursued simultaneously.
We achieved a large-scale augmentation of XRD data for all ICSD entries using various perturbed artifacts, such as peak broadening, jittering, intensity variation, and noisy background addition. A fully convolutional network (FCN) trained on these perturbed data for symmetry identification achieved state-of-the-art (SOTA) test accuracies. The way in which the training and testing datasets are split is important when working with the complete perturbed XRD dataset. Using a completely random split resulted in abnormally high test accuracies for all symmetry classifications (98.95%, 97.18%, and 96.03% for CS, EG, and SG, respectively). We particularly focused on the 96.03% accuracy for the identification of 230 space groups. It is unusual to achieve such high accuracy for a classification task with so many classes, which suggests the possibility of information leakage. [67] To avoid information leakage, the dataset must be properly split into training and testing datasets. Instead of using a completely random split, we adopted an individual compound-based splitting scheme for the perturbed (augmented) dataset. This means that XRD patterns generated from the same ICSD entry were never divided into training and test datasets, even if distinct perturbations were applied. This ensured that none of the ICSD entries in the test dataset was seen by the DL models during training in any perturbed form. In contrast, most previous DL approaches using perturbed XRD patterns employed the conventional random data splitting scheme, where the same compound with different perturbations can belong to both the training and testing datasets. While this type of approach has resulted in excellent test accuracy, it also leads to a serious issue of information leakage. In our study, we avoided this issue using the individual-compound-based split. Thus, the more rigorous splitting scheme resulted in more reasonable test accuracies of 92.09%, 87.52%, and 84.42% for CS, EG, and SG classifications, respectively, which are still SOTA records.
In addition to training an FCN on the entire perturbed synthetic XRD dataset, we also trained a Pix2Pix model on the perturbed-standard XRD pair dataset. The standard (unperturbed) XRD pattern in this case refers to an XRD pattern with synchrotron light source quality rather than an ideal stick-type pattern. Using the fully trained Pix2Pix model, a hold-out test dataset that was split from the perturbed dataset was converted to the standard data style. This Pix2Pix-converted test dataset resembles the standard dataset and was used to test the FCN trained on the standard XRD dataset only. This approach resulted in a considerable improvement in test accuracy compared to testing the FCN on the pristine U-net-converted dataset. The conversion using the pristine U-net (without a GAN) had poor accuracy compared to the Pix2Pix conversion. The use of a discriminator along with the U-net (generator) in the GAN algorithm likely played a key role in improving the quality of the style-converted XRD patterns, suggesting that the Pix2Pix model was able to produce a much-improved transformation from perturbed to standard XRD patterns. Although we used the same U-net as a generator in the Pix2Pix model to transform the perturbed XRD patterns to standard patterns, the test performance for FCN-based classifiers was greatly improved, as evidenced in Table 1. The conversion was imperfect without the use of a discriminator, and the discriminator played a crucial role in the conversion process. In addition, the promising Pix2Pix performance suggested that a cycle GAN could also potentially be effective for XRD pattern conversion. [68] The cycle GAN has the advantage of not requiring a paired training dataset. [68] However, preparing a perturbed-standard XRD pair dataset was not an important issue in the synthetic XRD pattern preparation process. Therefore, we did not introduce the cycle GAN for the perturbed-to-standard XRD conversion.
There are two key challenges we face when using deep learning (DL) for XRD analysis: expanding the generalizability of DL models to all inorganic materials and allowing for perturbations that make the synthetic XRD patterns indistinguishable from real ones. The generalizability of a DL model refers to its ability to be applied to a wide range of materials. Only a few previous attempts, such as those by Vecsei et al., [22] Suzuki et al., [23] and Park et al., [24] developed DL approaches that work for all inorganic materials. However, these approaches have not adequately addressed the perturbation issue, ignoring peak shift, broadening, intensity variation, and noisy background. While previous simulations have successfully created perturbed XRD patterns that are indistinguishable from real-world patterns, these achievements are limited to specific material systems with narrow compositions, such as ternary or quaternary inorganic compound systems. These systems allow for sufficient data augmentation, as the number of base materials is typically only dozens or at most approximately a hundred. Perturbations on these limited starting materials result in a dataset of a reasonable size. For example, while Lee et al. [17] achieved almost 100% accuracy for their phase identification tasks using perturbed XRD data, this was limited to the Li-Al-Si-O composition system, which is composed of only 38 base materials. Szymanski et al. [16] also achieved a high test accuracy for phase identification in the Li-Mn-Ti-O-F space, but this was based on 140 base materials and 150 perturbed XRD patterns generated for each base material, resulting in only 21 000 generated XRD patterns. Applying this augmentation strategy to our material systems (197 131 ICSD entries) would not be feasible. If 150 perturbations were made for every single ICSD entry, the total number of XRD patterns would reach 29 569 650. This kind of combinatorial explosion hinders the DL approach. Therefore, it is difficult to address both the generalizability and perturbation issues simultaneously, as they are somewhat conflicting.
We developed a data augmentation strategy that applied various perturbations, including peak shift, broadening, intensity variation, and noisy background, to almost all entries in the ICSD. Twenty perturbed XRD patterns were made for every single ICSD entry rather than the inordinate 150 patterns because generating more than 20 patterns per entry was impractical. To our knowledge, no previous attempts have been made to perform this sort of large-scale augmentation on the entire ICSD. As an alternative, Chen et al. [15] introduced a perturbation-related layer (variance/shifting embeddings) in the DL model and thereby systematically addressed perturbation issues without increasing the size of the training dataset, eventually avoiding the combinatorial explosion problem. In contrast, by referring to the recent advances in transformer-based massive models for natural language processing and in high-performance hardware, such as GPUs and TPUs, we were able to implement DL-driven XRD analysis without the need to minimize either the size of the training data or the model. The two aforementioned approaches to DL-based XRD analysis each have their merits: one approach incorporates prior knowledge within the DL model, utilizing a limited amount of training data, while the other approach operates entirely without human intervention and relies on an extensive volume of training data. In the current study, our focus is on the latter approach. However, we believe that it is important to pursue both approaches simultaneously, rather than emphasizing the strengths of one and indicating the weaknesses of the other.
We utilized 40 GPUs (A100 and A10) from the OCI and KICP and believe that we had secured sufficient computational capacity, making it unnecessary to minimize either the size of the training data or that of the model. The potential for combinatorial explosion should not be a concern at this stage of development. Of course, it is generally desirable to create parsimonious models by reducing the size of the model for a fixed data size. However, the size of the data should be determined based on the size of the problem (the number of input/output features) and the size of the model (the number of parameters) rather than being subjectively adjusted.
To investigate the misclassified entries in the CS classification for the standard XRD dataset-trained FCN, we constructed a confusion matrix. As shown in Figure 4, most of the misclassified entries were concentrated in the low-symmetry classes, such as triclinic, monoclinic, and orthorhombic, which we now refer to as the "Seattle zone." Two findings that have already been reported by Szymanski et al. [23] can be noticed: first, as mentioned earlier, most of the misclassified entries are in the Seattle zone, and second, the DL model's misprediction is very close in symmetry to the ground truth, indicating that most of the mispredictions also remain in the Seattle zone. Based on these findings, we examined the misclassified entries in more detail and provided a more convincing explanation for the failure, highlighting the capabilities and limitations of DL-based XRD analysis.
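The tabulation behind such a confusion matrix can be illustrated with a minimal sketch. The labels and predictions below are made-up toy values, and `confusion_matrix` and `zone_error_share` are hypothetical helper names introduced for illustration only, not the code used in this study.

```python
from collections import Counter

# The seven crystal systems, ordered from low to high symmetry.
CRYSTAL_SYSTEMS = [
    "triclinic", "monoclinic", "orthorhombic", "tetragonal",
    "trigonal", "hexagonal", "cubic",
]

def confusion_matrix(true_labels, predicted_labels, classes=CRYSTAL_SYSTEMS):
    """Return matrix[i][j] = number of entries of true class i predicted as class j."""
    counts = Counter(zip(true_labels, predicted_labels))
    return [[counts[(t, p)] for p in classes] for t in classes]

def zone_error_share(matrix, zone):
    """Fraction of all misclassifications whose true and predicted classes both lie in `zone`."""
    n = len(matrix)
    total = sum(matrix[i][j] for i in range(n) for j in range(n) if i != j)
    inside = sum(matrix[i][j] for i in zone for j in zone if i != j)
    return inside / total if total else 0.0

# Toy data (illustrative only, not results from the paper).
y_true = ["monoclinic", "monoclinic", "orthorhombic", "cubic", "triclinic", "cubic"]
y_pred = ["orthorhombic", "monoclinic", "monoclinic", "cubic", "monoclinic", "tetragonal"]

cm = confusion_matrix(y_true, y_pred)
low_symmetry = [0, 1, 2]  # indices of triclinic, monoclinic, orthorhombic
share = zone_error_share(cm, low_symmetry)
```

In this toy case, three of the four errors stay within the low-symmetry block, which is the kind of concentration the confusion matrix makes visible.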
Figure 5 presents eight pairs of XRD patterns; in each pair, one pattern belongs to the structure mispredicted by the FCN and the other to the ground truth. The FCN-misclassified structures are hypothetical and do not actually exist. Each pair consists of two structures that are very similar, but the strict symmetry principle categorizes them as low-symmetry and high-symmetry counterparts. It is likely that a small degree of lattice distortion in the high-symmetry (FCN-misclassified) structure resulted in the low-symmetry (ground-truth) structure. This distortion can alter the crystal system and space group, although the difference between these two similar structures is difficult to observe with the naked eye. It is common for the periodicity to change and for the Bravais lattice to be reset if a unit cell is subjected to a certain degree of distortion. However, we assumed that the periodicity would not be affected by such a small distortion and ignored the reset of the lattice periodicity. As shown in Figure 5, the overall appearance of the structures is indistinguishable between the pristine and slightly deformed structures, and the same is true for their corresponding XRD patterns. It can be assumed that all the ground-truth structures have been slightly deformed from their hypothetical high-symmetry counterparts. This small distortion is represented by a negligible change in the lattice angle in Figure 5; for example, some lattice angles for the ground-truth structures are 90.01°, 90.03°, and 89.93°, which are assumed to be slightly distorted from the right angle (90°) in the hypothetical original structure. These unusual angles that appear in some ICSD entries may be the result of the rigid regression fitting procedures used during Rietveld refinement, which do not consider the structure from a practical perspective.
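How little such an angular distortion moves a diffraction peak can be checked with a short worked example. The sketch below compares one reflection of an orthorhombic cell (β = 90°) with the same cell distorted to β = 90.03°, using the standard monoclinic d-spacing formula and Bragg's law. The cell edges are hypothetical, and the Cu Kα1 wavelength is an assumption, since the wavelength is not stated in this section.

```python
import math

WAVELENGTH = 1.5406  # Å, Cu K-alpha1 (assumed; not given in this section)

def d_spacing_monoclinic(h, k, l, a, b, c, beta_deg):
    """Interplanar spacing in Å for a monoclinic cell (unique axis b).

    Setting beta_deg = 90 recovers the orthorhombic expression.
    """
    beta = math.radians(beta_deg)
    sin_b, cos_b = math.sin(beta), math.cos(beta)
    inv_d2 = (h**2 / a**2 + l**2 / c**2 - 2 * h * l * cos_b / (a * c)) / sin_b**2
    inv_d2 += k**2 / b**2
    return 1.0 / math.sqrt(inv_d2)

def two_theta(d, wavelength=WAVELENGTH):
    """Diffraction angle 2θ in degrees from Bragg's law, λ = 2 d sin θ."""
    return 2.0 * math.degrees(math.asin(wavelength / (2.0 * d)))

a, b, c = 5.0, 6.0, 7.0   # hypothetical cell edges in Å
hkl = (1, 0, 1)

tt_ortho = two_theta(d_spacing_monoclinic(*hkl, a, b, c, 90.00))
tt_mono = two_theta(d_spacing_monoclinic(*hkl, a, b, c, 90.03))
shift = tt_mono - tt_ortho  # peak displacement in degrees 2θ
```

For this hypothetical cell, the displacement of the (1 0 1) reflection is smaller than the 0.01° sampling interval of the synthetic patterns, which is consistent with the two patterns in each pair looking identical.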
It is difficult for even trained crystallographers to properly analyze and classify structures with small distortions from their XRD patterns, as these distortions are difficult to detect with the naked eye. The FCN model, like a human expert, may also make mistakes in this situation. The use of state-of-the-art computational indexing tools, such as ITO, [54] TREOR, [55] DICVOL, [56] McMaille, [57] EXPO, [58] and X-CELL, [59] can be time-consuming when attempting to properly index these confusing structures. Even a negligible change in the lattice parameter can considerably alter the symmetry, resulting in a novel structure that is distinct from the original in the symmetry space but appears quite similar. The FCN model often struggled to distinguish between such a slightly deformed structure and its higher-symmetry counterpart, eventually resulting in misclassification. Approximately 20% of the total misclassifications in our FCN testing were of this type. It may not be possible for improved DL models to effectively address this issue, as it is not a problem with the DL model itself but rather a problem with the intrinsic ambiguity in the data.
Instead of strictly adhering to the symmetry principle, the FCN model operates on the XRD data as a visual pattern. It does not seem to learn the symmetry principle from the pattern but rather just learns the appearance of the peak distribution. As a result, the crystal structure abstracted from the XRD pattern may not necessarily be classified based on the theoretical crystallography principle. The concept of "apparent structure groups," [69] which classify structures based on their apparent shape rather than strict adherence to the symmetry principle, suggests that this alternative method of classification may be more practical for describing certain material properties, such as ionic transport in cathode, anode, and solid electrolyte materials for batteries, where dynamic perturbations in association with ionic diffusion can disrupt consistent symmetry. Furthermore, when considering ab initio calculation-based structure relaxation, the apparent structure is also important, as the symmetry may not be strictly preserved during the relaxation process. If an alternative method of XRD pattern classification based on the apparent structure concept were adopted, a substantial portion of the FCN's misclassifications could be correctly classified into the appropriate apparent structure group, although the classification of all ICSD entries based on apparent structure groups is not yet complete. The apparent structure group clustering is now ongoing using DL-based clustering techniques, and the result will soon be reported by the authors.

Conclusion
Two different strategies were introduced to implement DL-driven symmetry classification. The first approach involved training and testing FCN models using large augmented XRD datasets and achieved state-of-the-art test accuracies of 98.95%, 97.18%, and 96.03% for CS, EG, and SG classification, respectively, using the conventional random training and testing data splitting method. However, more reasonable results of 93.07%, 87.64%, and 84.52% were obtained using an individual compound-based splitting scheme. The second approach involved using Pix2Pix to transform perturbed XRD patterns into a standard style; then, we used the Pix2Pix-transformed data when testing the FCN model trained on standard data. This resulted in slightly lower test accuracies of 87.50%, 82.61%, and 79.88% because the use of Pix2Pix for this purpose is still in the early stages of development. The challenges of generalizability and perturbation in DL-driven XRD analysis were systematically addressed. Five synthetic and experimental datasets were created for training and testing DL models, including small and large FCNs for symmetry classification and small and large Pix2Pix models for XRD pattern style conversion. These datasets were designed to ensure generalizability by including a wide range of inorganic compounds from the ICSD and various perturbations, such as peak shifts, broadening, texture, and noisy backgrounds, to more closely simulate real-world XRD patterns. Either OCI or KICP was used for large-scale parallel computations, thereby addressing the combinatorial explosion problem. The DL models trained and tested on various dataset combinations worked successfully for any inorganic compound in the ICSD, improving upon previous approaches that only worked for specific, narrowly defined material systems. In addition, the FCN model trained on the synthetic dataset demonstrated strong performance in classifying the CS of real-world experimental XRD data with an accuracy of 90.38% for 52 commercially available powder samples and 74.24% accuracy for the RRUFF experimental dataset.
Upon analyzing failures, it was found that a substantial portion of FCN misclassifications occurred in low-symmetry systems such as triclinic, monoclinic, and orthorhombic, and these misclassified structures were often close in symmetry to the ground-truth structure. Additionally, approximately 20% of the misclassifications were found to be high-symmetry neighbors of slightly distorted ground-truth structures. Such a small distortion in the structure confused the DL model as well as the human crystallographer.

Experimental Section
Synthetic XRD Pattern Preparation: To generate our dataset of XRD patterns, we used the crystal structure solution data from the ICSD. However, the data in the ICSD were expressed in various space group settings, so we converted the entries into a standard setting used in the FullProf software. [70] The XRD pattern was produced within a 2θ range of 5-86.91°, with an interval of 0.01°, resulting in a total of 8192 data points. The peak height varied from 0 to 1. We also removed entries with unit cell volumes larger than 10 000 Å³ and those with too many peaks (more than 20 000). As a result, our dataset included 197 131 inorganic compounds, which was the largest dataset ever used for DL-based XRD analysis. Our XRD generation code, including the space group setting conversion, is available on our GitHub site. [71]
Creating synthetic XRD patterns that are indistinguishable from real-world patterns is a complex task, as we must consider various perturbations, such as peak shifts (jittering), peak broadening, peak intensity variations, background, and noise. Peak jittering is often caused by internal or external strain, peak broadening is caused by inhomogeneous domain size and the apparatus, the peak intensity may deviate from the standard intensity for isotropic materials due to preferred orientation (texture), and the background can come from the apparatus, sample preparation, or amorphous impurity phases. For our primary dataset of powder XRD patterns, we excluded the texture effect. However, we also prepared a separate dataset with texture-involved XRD patterns for final DL training and testing. In this case, we randomly selected a preferred peak index from 16 major low-index peaks and used a modified March function with a random choice of G1 in the range of 0.5-0.9.
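The sampling grid and the texture factor described above can be sketched as follows. The grid follows directly from the stated range and step. The preferred-orientation factor is written in the modified March form G2 + (1 − G2)[(G1 cos α)² + sin² α / G1]^(−3/2); treating this as the exact form used here is an assumption, as are the choice G2 = 0 and the function name `march_factor`.

```python
import math
import random

N_POINTS = 8192
TWO_THETA_START = 5.0  # degrees
STEP = 0.01            # degrees

# 2θ grid: 5.00, 5.01, ..., 86.91 degrees.
two_theta_grid = [TWO_THETA_START + i * STEP for i in range(N_POINTS)]

def march_factor(alpha_deg, g1, g2=0.0):
    """Intensity multiplier for a reflection under preferred orientation.

    alpha_deg: angle between the reflection's scattering vector and the
               preferred-orientation direction.
    g1:        texture parameter; g1 = 1 means no texture.
    g2:        untextured fraction of the sample (assumed 0 in this sketch).
    """
    alpha = math.radians(alpha_deg)
    core = (g1 * math.cos(alpha)) ** 2 + math.sin(alpha) ** 2 / g1
    return g2 + (1.0 - g2) * core ** -1.5

rng = random.Random(0)
g1 = rng.uniform(0.5, 0.9)              # range quoted in the text
parallel = march_factor(0.0, g1)        # reflection along the preferred direction
perpendicular = march_factor(90.0, g1)  # reflection normal to it
```

With G1 below 1, reflections along the preferred direction are enhanced and those perpendicular to it are suppressed, while G1 = 1 leaves every intensity unchanged.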
The background was modeled by sixth-order polynomial functions with random noise. However, the background is customarily easy to eliminate, and background elimination is well automated in many XRD analysis packages, such as FullProf, [70] VESTA, [72] TOPAS, [73] X'Pert, [71] etc. A severe background issue arose only when dealing with the low-quality experimental data because the low-quality experimental dataset collected from the RRUFF database [60] includes many high-background XRD data that would never be accepted even in conventional XRD analysis. The high background originates primarily from a poor signal-to-noise ratio, which is typical of poorly crystalline samples such as low-quality thin films, nanosized domains, and amorphous phases, and also from poorly prepared samples and improper instrument settings. Including these high-background entries in the test dataset severely undermined the test result. Figure S4, Supporting Information, provides some examples of them, from which one can notice their inappropriateness and justify their exclusion from the present investigation.
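A background of this kind could be generated along the following lines. This is a minimal sketch rather than the generation code referred to above: the coefficient range, the background height, and the noise level are illustrative guesses expressed relative to a maximum peak height of 1, and `synthetic_background` is a hypothetical helper.

```python
import random

def horner(coeffs, x):
    """Evaluate a polynomial at x; coeffs are ordered from highest to lowest power."""
    value = 0.0
    for c in coeffs:
        value = value * x + c
    return value

def synthetic_background(grid, rng, order=6, height=0.05, noise_sigma=0.005):
    """Random polynomial background plus Gaussian noise, clipped at zero.

    height and noise_sigma are assumed values, not taken from the paper.
    """
    x0, x1 = grid[0], grid[-1]
    coeffs = [rng.uniform(-1.0, 1.0) for _ in range(order + 1)]
    # Map 2θ onto [-1, 1] so that the high powers stay well scaled.
    smooth = [horner(coeffs, 2.0 * (x - x0) / (x1 - x0) - 1.0) for x in grid]
    lo, hi = min(smooth), max(smooth)
    span = (hi - lo) or 1.0
    return [
        max(0.0, height * (s - lo) / span + rng.gauss(0.0, noise_sigma))
        for s in smooth
    ]

grid = [5.0 + 0.01 * i for i in range(8192)]
background = synthetic_background(grid, random.Random(42))
```

The resulting list can be added point by point to a normalized pattern on the same 2θ grid.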
The peak shift was incorporated by adjusting the lattice parameters within the range of ±1% deviation from the ICSD solution data. A huge number of solid solutions are present in the ICSD, so the peak shift originating from solid-solution formation was already embedded in the ICSD solution data. Thus, the peak shift was doubly taken into consideration in the course of the XRD pattern generation process. The peak broadening was randomly controlled by the Caglioti parameters and mixing parameters. The parameters u, v, w, x, y, and η must not be chosen in a completely random manner since there is a certain correlation among them. To avoid an awkward set of parameters, we conducted an online search and collected 250 sets of these parameters, which were derived from actual experimental XRD data for inorganic materials. We then randomly varied the u, v, w, x, y, and η parameters around each point within the range depicted in Figure S5, Supporting Information. Although Szymanski et al.'s [16] physics-informed perturbation would be of great help, it would make no difference from the random choice of the peak formation parameters as long as the selection range was reasonable. On this basis, we simply adopted a random choice of peak formation parameters in the above-described reasonable range for our data augmentation. The peak formation physics, such as the Vegard rule (composition of solid solution), domain size, internal residual strain, etc., were believed to be implicitly accounted for in the random parameter choice process.
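The effect of a lattice parameter change of up to ±1% on peak positions can be illustrated for the simplest case of a cubic cell. The sketch below is not the authors' generation code: the cell edge is hypothetical, the Cu Kα1 wavelength is an assumption, and `perturb_lattice` is a name introduced here for illustration.

```python
import math
import random

WAVELENGTH = 1.5406  # Å, Cu K-alpha1 (assumed)

def two_theta_cubic(a, hkl, wavelength=WAVELENGTH):
    """Peak position 2θ in degrees for reflection hkl of a cubic cell with edge a (Å)."""
    h, k, l = hkl
    d = a / math.sqrt(h * h + k * k + l * l)
    return 2.0 * math.degrees(math.asin(wavelength / (2.0 * d)))

def perturb_lattice(a, rng, max_fraction=0.01):
    """Scale the cell edge by a random factor within ±max_fraction."""
    return a * (1.0 + rng.uniform(-max_fraction, max_fraction))

rng = random.Random(1)
a0 = 4.0                       # hypothetical cell edge in Å
a1 = perturb_lattice(a0, rng)  # perturbed cell edge

reflections = [(1, 0, 0), (1, 1, 0), (1, 1, 1), (2, 0, 0)]
shifts = {
    hkl: two_theta_cubic(a1, hkl) - two_theta_cubic(a0, hkl)
    for hkl in reflections
}
```

An expanded cell moves every peak to lower angles and a contracted cell to higher angles, with the displacement growing toward higher 2θ.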
Ten texture-free perturbed and another 20 texture-involved XRD patterns were produced for each ICSD entry so that we produced 1 971 310 texture-free perturbed XRD patterns and 3 942 620 texture-involved perturbed XRD patterns starting from the original 197 131 standard XRD patterns. While Szymanski et al. [16] generated 150 perturbed XRD patterns for each of the 140 inorganic compounds in the Li-Mn-Ti-O-F composition space, we generated only 10 or 20 perturbed patterns for each of the 197 131 ICSD entries. Had we adopted 150 perturbed patterns for each ICSD entry (197 131 × 150 patterns), we would expect a considerable improvement in the test accuracy of both the Pix2Pix and FCN models. Although the combinatorial explosion was argued to be a negative issue in DL-driven XRD analysis, our opinion differs from those who are concerned about the combinatorial explosion issue. Modern large DL models that consist of billions of parameters routinely handle much larger training datasets than the conventional XRD dataset with no restriction, owing to the recent revolutionary development of GPUs and TPUs, as well as cloud computing. However, the information leakage that might originate from the inordinately augmented dataset should be a concern and avoided in a systematic manner.
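The dataset sizes quoted above follow from simple multiplication, reproduced here as a quick check. The storage estimate at the end is an added illustration that assumes each pattern is stored as 8192 single-precision (4-byte) values; that storage format is an assumption and is not stated in the text.

```python
N_ICSD = 197_131          # ICSD entries retained in this study
POINTS_PER_PATTERN = 8192
BYTES_PER_VALUE = 4       # float32 storage, assumed for illustration

pattern_counts = {
    "texture_free (10 per entry)": N_ICSD * 10,
    "texture_involved (20 per entry)": N_ICSD * 20,
    "hypothetical (150 per entry)": N_ICSD * 150,
    "Li-Mn-Ti-O-F study (140 x 150)": 140 * 150,
}

# Raw size in bytes of the hypothetical 150-per-entry dataset.
hypothetical_bytes = (
    pattern_counts["hypothetical (150 per entry)"]
    * POINTS_PER_PATTERN
    * BYTES_PER_VALUE
)
```

Under this storage assumption, the 150-per-entry dataset would occupy just under one terabyte of raw intensity data, which is large but not out of reach for current hardware.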
Figure 3 shows different types of synthetic XRD patterns generated for use in DL model training. First, a representative standard pattern with no perturbation included is shown in Figure 3A, exhibiting very sharp peaks that simulate synchrotron light source quality (SLSQ). It should be noted that the standard XRD differs from the ideal stick pattern but represents a typical SLSQ XRD pattern, with fixed peak formation parameters (u = 0.004133, v = −0.007618, w = 0.006255, x = 0.018961, y = 0, and η = 0). In addition, Figure 3B,C shows texture-free and texture-involved perturbed XRD patterns that simulate conventional lab-scale XRD quality. The texture-free perturbed pattern is converted to standard patterns via two different DL models: a pristine U-net and a Pix2Pix. The pristine U-net-converted pattern (Figure 3D) falls short of the standard pattern quality. However, the Pix2Pix-converted pattern looks closer to the standard pattern, as shown in Figure 3E. It was clearly validated that the GAN approach (Pix2Pix), based on the competition between a counterfeiter and a policeman, helped improve the conversion quality immensely.
The high-quality experimental XRD dataset, referred to as Ex_dataset_1, was also prepared for 52 commercially available inorganic compounds in the powder form. Figure 3F shows the experimental XRD extracted from Ex_dataset_1. Figure S2, Supporting Information, lists all the Ex_dataset_1 entries. Although the manufacturers' reports state that each compound is a single phase, many are not and contain a considerable amount of impurities. Thus, we eliminated those involving impurity phases, and consequently, only 52 relatively pure compounds survived for use in DL model testing. The selection criterion was that if the maximum peak intensity of an impurity phase exceeded 10% of the maximum peak intensity of the main phase, then the entry was discarded. In addition, when measuring the XRD pattern, we took special care to compensate for the lack of diversity arising from measurement on a single machine and thereby introduced appropriate perturbations, such as poorly flattened sample surfaces, less-densified powders, and strongly ground powders without annealing. Despite the perturbation, these XRD patterns were referred to as high-quality experimental data as the data quality was relatively higher than that of the RRUFF data (Ex_dataset_2). In fact, Ex_dataset_2 includes a certain amount of unwanted impurities, as clearly shown in Figure 3G.
The validation of the DL model was also implemented using a general experimental dataset obtained from the RRUFF database. [60] Instead of employing their own experimental data, Vecsei et al. [22] introduced RRUFF experimental data for the first time to test the DL model that was trained on their synthetic perturbation-involved XRD dataset and thereby secured a certain extent of generalizability in testing. Their RRUFF dataset consisted of 800 XRD patterns, for which no selection criterion was presented, and the test accuracy was not promising. A total of 1 600 XRD patterns were extracted from the RRUFF database in the present investigation. In fact, many of these XRD patterns were of such poor quality that they would not be acceptable for conventional XRD analysis, such as indexing, space group determination, and Le Bail and Rietveld refinement. We removed all of those low-quality XRD patterns by applying a reasonable cutoff criterion. The standard deviation of the XRD signals in the 2θ range between 5° and 8° was calculated for all the RRUFF entries, and the entries were ranked according to the calculated standard deviation. If a peak was located in this range, the window was moved toward slightly higher angles. The cutoff criterion was that the top 40% of entries with high standard deviations were all removed. Figure S4, Supporting Information, shows typical XRD features for the removed RRUFF entries. Inspection of the removed entries justifies the cutoff strategy. Incorporating such low-quality XRD patterns into the DL approach would be counterproductive, as it is common practice to exclude them from conventional XRD analysis. However, some XRD patterns of subpar quality were still present in Ex_dataset_2 and adversely affected the testing of the DL model.
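The cutoff procedure can be sketched as follows. This is an illustrative reading of the description above rather than the authors' code: the function names are hypothetical, the toy patterns are synthetic, and the step of moving the window to higher angles when a peak falls inside it is not implemented.

```python
import statistics

def low_angle_std(two_theta, intensity, lo=5.0, hi=8.0):
    """Standard deviation of the signal inside the 2θ window [lo, hi] degrees."""
    window = [y for x, y in zip(two_theta, intensity) if lo <= x <= hi]
    return statistics.pstdev(window)

def apply_cutoff(patterns, drop_fraction=0.40):
    """Rank patterns by low-angle standard deviation and drop the noisiest fraction.

    patterns: dict mapping an entry name to a (two_theta, intensity) pair.
    Returns the names of the entries that are kept.
    """
    ranked = sorted(
        patterns,
        key=lambda name: low_angle_std(*patterns[name]),
        reverse=True,  # noisiest first
    )
    n_drop = int(len(ranked) * drop_fraction)
    return ranked[n_drop:]

# Toy patterns: a flat signal with an alternating ripple of increasing amplitude.
grid = [5.0 + 0.01 * i for i in range(8192)]
toy = {
    f"p{n}": (grid, [0.5 + 0.01 * n * (1 if i % 2 else -1) for i in range(len(grid))])
    for n in range(1, 6)
}
kept = apply_cutoff(toy)
```

With five toy entries and a 40% cutoff, the two noisiest entries are removed and the three quietest are kept.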

Figure 1. Schematic illustration of the FCNs for use in symmetry classification and XRD datasets. A) The small FCN that has a small architecture with the total number of parameters (2 786 615) trained on the standard XRD dataset and the texture-free and texture-involved perturbed datasets. B) The large FCN with the total number of parameters (26 602 853) trained on the texture-involved perturbed dataset. C) The full schematic illustration describing all the training and testing datasets for use in the FCNs. More details about the FCN architecture are described in Figure S1, Supporting Information.

Figure 2. U-net architectures exhibiting an encoder and a decoder. A) Small and B) large U-nets consisting of a total of 13 891 265 and 221 944 577 parameters, respectively. Note that the U-net architecture is also used for the Pix2Pix model as a backbone structure. More details for the U-net architecture are described in Figure S1, Supporting Information. C) The small and large Pix2Pix models with the total number of parameters of 15 076 546 and 226 671 874, respectively. The basic architecture of U-net is used for the generator, and the FCN-based encoder is also used for the discriminator, but the discriminator architecture is slightly reduced in comparison to that of the FCN used for symmetry classification. The small Pix2Pix is trained on the texture-free perturbed dataset, and the large Pix2Pix is trained on the texture-involved perturbed dataset.

Figure 3. Different types of synthetic XRD patterns for Sb2O3 and CaCO3 (ICSD col. codes 240206 and 18164) generated for use in the DL model training: A) the standard pattern with no artifact included; B) the texture-free; and C) the texture-involved perturbed XRD patterns with a preferred orientation of <1 0 0> and <0 0 1>; D) the pristine U-net-converted pattern; E) the Pix2Pix-converted pattern; F) the high-quality experimental XRD extracted from the Ex_dataset_1; G) the general experimental XRD extracted from the Ex_dataset_2, exhibiting some impurity peaks.

Figure 4. Confusion matrices for the classification of crystal systems, extinction groups, and space groups. The color scale represents the frequency. The Seattle zone in the confusion matrices for the crystal system, extinction group, and space group classification is shown in an enlarged view within an orange square.

Figure 5. FCN failure analysis for 8 misclassified entries, including Sr2ZnWO6, (Nb0.88Ca0.12)FeO3, Y2AlCrO6, CsLi(BeF4), Bi2(GeO5), (VO)2(P2O7), Be(Al6O10), and La2Sn2O7 (ICSD col. codes given). The leftmost column shows the ground-truth structure provided by the ICSD, which exhibits lower symmetry due to a slight imaginary distortion from the virtual higher-symmetry counterpart, shown in the second column. The synthetic standard XRD patterns corresponding to the structures on the left are also shown in the right columns. The distortion was limited to only the lattice angle, and the angle change is highlighted in red font for the ground-truth structures.

Table 2. The test accuracy for CS, EG, and SG classifications for various models tested with different experimental test datasets. Some extremely low accuracies appearing in the first column indicate malfunctioning under that condition.
the U-net-converted dataset test results for the standard XRD data-trained FCN model were just as poor as those for the bare perturbed dataset test.As shown in the first column of Table