Machine learning in energy storage materials

With its powerful data-analysis capability, machine learning has shown versatile potential for revolutionizing the research paradigm of materials science. Here, taking dielectric capacitors and lithium-ion batteries as two representative examples, we review major advances of machine learning in the research and development of energy storage materials. First, a thorough discussion of the machine learning framework in materials science is presented. Then, we summarize the applications of machine learning from three aspects: discovering and designing novel materials, enriching theoretical simulations, and assisting experimentation and characterization. Finally, a brief outlook is given to inspire further ideas on the innovative implementation of machine learning in materials science.


| INTRODUCTION
The foreseeable exhaustion of fossil fuels and the consequent environmental deterioration have triggered burgeoning worldwide demand for sustainable energy alternatives. [1][2][3] Renewable or clean sources, such as wind, hydro, and solar energy, have been regarded as promising solutions for generating electrical energy. [4,5] However, whatever the source of energy, a key challenge is how to efficiently store fluctuating energy for applications ranging from large power grids to electric vehicles and various portable devices. [6,7] Energy storage is therefore a crucial step in determining the efficiency, stability, and reliability of an electricity supply system. [8] Up to now, dielectric capacitors (DCs) and lithium-ion batteries (LIBs) have been the two leading electrical energy storage technologies, as shown in Figure 1A. [9][10][11] Benefiting from the physical storage mechanism via electric dipoles illustrated in Figure 1B, DCs offer ultrahigh power density (on the order of megawatts), fast response (~μs), high operating electric fields (~MV m⁻¹), and long cycle life (>10⁵ cycles), and thus serve as important basic components in a wide range of power systems. [12][13][14][15][16][17] However, the low energy density of DCs (<0.1 Wh kg⁻¹ in most cases) limits their more widespread application. [18] State-of-the-art LIBs, which rely on the electrochemical storage mechanism presented in Figure 1C, have a gravimetric energy density approaching 300 Wh kg⁻¹, which has enabled the rapid expansion of hybrid and all-electric vehicles, portable electronic devices, and stationary applications. [19][20][21] However, the explosive growth of the market places much higher requirements on energy storage density. [13,22,23] Moreover, their charge-discharge rate, lifetime, recyclability, and safety have also become obstacles to more advanced applications. [24][25][26][27] Therefore, although both DCs and LIBs have improved significantly in recent decades, they cannot keep pace with the mushrooming demands of tomorrow's energy storage and power supply systems in terms of performance, durability, safety, cost, recyclability, and so on. Addressing these major challenges requires research and development (R&D) of energy storage materials at an unprecedented pace and scale.
The revolution in the research paradigm of materials science brought about by machine learning (ML) holds great promise for speeding up the R&D of energy storage materials. [28][29][30][31][32] On the one hand, the rapid development of computer technology has been the major driver of the explosion of ML and other computational simulations. Ever since the first general-purpose electronic computer, ENIAC (Electronic Numerical Integrator and Computer), was invented, the capabilities of computer systems have improved substantially in speed, accuracy, reliability, storage, adaptability, and so forth. For example, compared with the 5000 additions per second of ENIAC, the calculation speed of modern supercomputers has reached the order of PFLOPS (10¹⁵ floating-point operations per second). Furthermore, advances in data storage capability have enabled us to efficiently handle the large numbers of matrix multiplications involved in complex ML models. On the other hand, ML, as a radically new and potent method, has been transforming the discovery and design of energy storage materials in recent years. [33,34] It can not only be used to understand the composition-structure-property-processing-performance linkages by encoding domain knowledge into ML models but also realize property prediction, new materials discovery, multiobjective performance optimization, inverse design, and so forth. [35][36][37][38][39][40] More importantly, ML is playing an essential role in addressing challenges that are too difficult or time-consuming for traditional physical modeling. [41][42][43] It is in this context that the applications of ML to the paradigm shift have generated substantial excitement. [44][45][46][47] While significant progress has been made, much remains to be done. [48] In particular, to promote even greater advances of ML in materials science, it is imperative to strengthen the cross-fusion of materials science with computer science, physics, and mathematics. [49,50]
This review aims to provide a critical overview of ML-driven R&D of energy storage materials, showing how advanced ML technologies have been used to address various issues. First, we present a fundamental ML workflow, organized along the six basic steps shown in Figure 2; key concepts, approaches, examples, and challenges in each step are discussed. Then, taking DCs and LIBs as two representative examples, we highlight recent advances of ML in the R&D of energy storage materials from three aspects: discovering and designing novel materials, enriching theoretical simulations, and assisting experimentation and characterization. Finally, we outline some perspectives on future challenges and opportunities of ML for energy storage materials.
FIGURE 1 (A) Ragone plot of electrical energy-storage technologies, showing power density versus energy density. The discharge time (diagonal dotted lines) is estimated by dividing the energy density by the power density. (B) Schematic of a dielectric capacitor in the charge process. (C) Schematic of a lithium-ion battery in the discharge process

| ML WORKFLOW
ML, as an offshoot of artificial intelligence, is ubiquitous in our modern world. [51,52] It rests on the idea that systems can, once trained, identify patterns, learn from data, and make decisions with or without supervision. In materials science, it has been widely used in every part of the R&D cycle, from materials composition design and process optimization to property prediction and performance evaluation under service conditions. In this section, we summarize the ML workflow in six basic steps, as shown in Figure 2. Since many systematic reviews of the ML workflow are already available, [29,[53][54][55] we do not discuss everything involved in each step in detail. Instead, we present some core issues and considerations in each step that may help promote better applications of ML in energy storage materials.

| Goal
Before starting an ML project, the first step is to clarify the goal of the study, which fundamentally guides how the model is built. The goal strongly determines the technical choices in each step, from data and algorithm to the reliability and stability of the model, as illustrated in Figure 2. Based on the type of task, ML models can be classified into categories such as classification, regression, clustering, and dimensionality reduction. For example, a problem in which the output variable takes continuous values is usually treated as a regression task, whereas a classification task aims to assign data to discrete classes.
In materials science, the choice of ML model depends strongly on the nature of the research problem. For instance, identifying whether a dielectric material is ferroelectric or paraelectric is a classification task, whereas establishing the analytical relationship between the effective electrical conductivity of a composite electrolyte and the intrinsic parameters of each phase is a regression problem. In addition, for feature selection and extraction from complex and diverse data, a dimensionality reduction model is an appropriate choice for analyzing the inputs and reducing them to those relevant to the learning goal. [56] In turn, defining the goal of the ML model also requires taking into account the volume and quality of the data and the available features, such as crystal structure, chemical composition, and microstructure. In essence, the goal is the commander of the whole ML workflow, and all steps in ML interact with each other. Data resources are discussed further in Section 2.2, and the many algorithms available for these different goals in Section 2.4.

| Data
Data, as one of the most cherished resources of the information age, are an essential prerequisite for ML. The quality and quantity of data strongly determine the reliability and accuracy of ML results. [57] However, data in materials science are scarce and scattered. [58,59] Thus, obtaining high-quality materials data in sufficient quantity has been recognized as the core challenge in applying ML to materials science. Generally, materials data can be acquired from open-source databases and the literature, or generated by high-throughput calculations and experiments. [35,44,53,57] Over the past few decades, a large number of materials databases, such as Materials Project, AFLOW, and OQMD, have been established, expanded, and made publicly available. [33,45] For a more comprehensive overview of the available materials databases, please refer to several excellent review articles. [53,60,61] These databases have become important data sources for ML. However, owing to the limited diversity and volume of existing databases (for example, missing properties, poor coverage, or low consistency), it is still difficult to meet diverse research needs. Herein lies the dilemma: which problems ML can solve depends heavily on the available databases rather than on our actual needs or interests, which severely stifles the enormous potential of ML.
To expand materials data, especially experimental data, mining data from the open literature has attracted extensive interest. [62][63][64] However, the complexity and diversity of the literature pose huge challenges for automatic data mining. For example, automatically extracting the numerical values of polarization and electric field from diagrams of ferroelectric hysteresis loops is extremely hard because image format, location, style, size, and coordinate axes vary across publications. Furthermore, performing high-throughput calculations or experiments to generate the desired data is becoming an efficient and popular method. On the one hand, the fast proliferation of data from high-throughput calculations and experiments is an important source for generating and expanding many databases. [65][66][67][68][69][70][71] For example, Materials Project, one of the most popular materials databases, contains ~10⁵ crystal structures computed by density functional theory. On the other hand, depending on the research goal, we can design dedicated high-throughput calculations or experiments to generate the data we need instead of being restricted to existing data. Fortunately, many open-source software platforms facilitate high-throughput calculations. [72][73][74][75] For high-throughput experiments, customized development of experimental instruments is generally required for both sample preparation and performance testing. No matter how the data are obtained, it is usually necessary to assess and clean them before use by fixing or removing incorrect, incomplete, duplicate, or corrupted entries. [45,76] Abnormal values disturb the pattern mining of algorithms and thus lead to unreliable predictions, so data assessment and cleaning are an important prerequisite for a reliable ML model. [44,77,78] However, owing to the diversity of materials data, there is still no standardized way to handle this process. Beyond obtaining a large amount of data, data sampling and partitioning are also closely related to the reliability of the ML model, because flawed sampling and partitioning are likely to introduce bias. Some bias-reduction methods are discussed in Section 2.5. In short, to a considerable extent, the challenge of ML in materials science has become a challenge of materials data.
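The cleaning step described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not a standard procedure from the cited works: entries are plain dictionaries with hypothetical field names (`formula`, `eps`), duplicates are identified by a chosen key, and outliers are flagged with a median-absolute-deviation (MAD) robust z-score, which is one common choice among many.

```python
import math
import statistics


def clean_records(records, key_fields, target, z_max=3.5):
    """Return records with incomplete, duplicate, and outlying entries removed.

    records    : list of dicts, one per material entry (hypothetical schema)
    key_fields : fields that identify an entry, used for de-duplication
    target     : name of the numeric property screened for outliers
    z_max      : robust z-score threshold (3.5 is a common rule of thumb)
    """
    # 1) Drop incomplete or corrupted entries: missing keys or a non-finite target.
    complete = [
        r for r in records
        if all(r.get(f) is not None for f in key_fields)
        and isinstance(r.get(target), (int, float))
        and math.isfinite(r[target])
    ]

    # 2) Drop duplicates, keeping the first occurrence of each key.
    seen, unique = set(), []
    for r in complete:
        key = tuple(r[f] for f in key_fields)
        if key not in seen:
            seen.add(key)
            unique.append(r)

    # 3) Drop outliers with the MAD-based robust z-score, which is far less
    #    sensitive to the outliers themselves than the mean/std z-score.
    values = [r[target] for r in unique]
    if not values:
        return unique
    med = statistics.median(values)
    mad = statistics.median(abs(v - med) for v in values)
    if mad == 0:
        return unique  # no spread to judge outliers against
    return [r for r in unique if 0.6745 * abs(r[target] - med) / mad <= z_max]


# Toy data (hypothetical values, not from any database).
raw = [
    {"formula": "A", "eps": 10.0},
    {"formula": "B", "eps": 12.0},
    {"formula": "B", "eps": 12.0},   # duplicate
    {"formula": "C", "eps": None},   # incomplete
    {"formula": "D", "eps": 11.0},
    {"formula": "E", "eps": 13.0},
    {"formula": "F", "eps": 9.0},
    {"formula": "G", "eps": 500.0},  # e.g., a unit or transcription error
]
cleaned = clean_records(raw, key_fields=["formula"], target="eps")
print([r["formula"] for r in cleaned])
```

De-duplicating on the formula alone would also discard legitimately distinct measurements (for example, different polymorphs or measurement frequencies), so in practice the key should include every field that distinguishes two entries.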

| Featurization
Featurization, or feature engineering, is the step in which features (also called descriptors or fingerprints) are constructed from the original data to describe different characteristics of materials, including crystal/molecular composition and structure. Features strongly affect model performance. [79][80][81] For example, improved physics-based descriptors of perovskites enable more accurate prediction of the bandgap (Eg). [82] Thus, a critical challenge in applying ML to materials problems is the selection of appropriate features. [83,84] In general, the construction of features depends strongly on the goal of ML and requires considerable knowledge and experience in materials science. [85] For example, atomic radii and electronegativity are commonly used features in many ML works because well-established theories, such as the Goldschmidt tolerance factor and Pauling's rules, identify them as key factors governing crystal structure and, in turn, dielectric/electrical properties. Embedding domain knowledge in this way makes the ML model more interpretable.
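As an illustration of such knowledge-based features, the Goldschmidt tolerance factor of an ABX₃ perovskite, t = (r_A + r_X) / [√2 (r_B + r_X)], can be computed directly from ionic radii, together with the related octahedral factor μ = r_B / r_X. The sketch below assumes Shannon ionic radii (A site 12-coordinate, B site 6-coordinate, O²⁻ taken as 1.40 Å); the function name `perovskite_features` is hypothetical.

```python
import math


def perovskite_features(r_a, r_b, r_x):
    """Return (tolerance factor t, octahedral factor mu) for an ABX3 perovskite.

    Radii are ionic radii in angstroms (assumed here: Shannon radii,
    A site 12-fold and B site 6-fold coordinated).
    """
    t = (r_a + r_x) / (math.sqrt(2.0) * (r_b + r_x))  # Goldschmidt tolerance factor
    mu = r_b / r_x                                      # octahedral factor
    return t, mu


# Shannon radii (angstrom): Sr2+ (XII) 1.44, Ba2+ (XII) 1.61, Ti4+ (VI) 0.605, O2- 1.40
for name, r_a in [("SrTiO3", 1.44), ("BaTiO3", 1.61)]:
    t, mu = perovskite_features(r_a, 0.605, 1.40)
    print(f"{name}: t = {t:.3f}, mu = {mu:.3f}")
```

This gives t ≈ 1.00 for SrTiO₃ and t ≈ 1.06 for BaTiO₃. Features like these can be stacked, together with quantities such as electronegativity differences, into a descriptor vector for each candidate composition.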
Meanwhile, existing human theories and knowledge inevitably introduce bias, which may lead to suboptimal results and even obscure new findings. In addition to constructing features manually, automatically extracting features from the original data without human intervention may raise the possibility of discovering underlying features beyond human cognition. This data-driven extraction can be realized by algorithms such as the least absolute shrinkage and selection operator (LASSO) and genetic algorithms. [53,86] However, a drawback of the data-driven approach is that the extracted features may vary with the hyperparameters of the extraction algorithm and may not explain causality well. [87,88] To increase the efficiency and reliability of ML, feature selection and optimization are usually performed, in which inessential or low-quality features are removed or transformed into new ones. For example, principal component analysis (PCA) is widely used to compress correlated features into a smaller set of uncorrelated components. [89] In short, ideal features should retain physical/chemical interpretability while leaving room for exploiting new effects.
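To make the data-driven route concrete, the sketch below implements LASSO with cyclic coordinate descent in plain Python. It is a minimal teaching version under stated assumptions (standardized features, a centered target, a fixed number of sweeps instead of a convergence test, and toy data in which only two of four hypothetical descriptors influence the target); it is not the implementation used in the cited works. The L1 penalty drives the coefficients of uninformative features to exactly zero, which is what makes LASSO useful for feature selection.

```python
import math
import random


def soft_threshold(rho, alpha):
    """Proximal operator of the L1 penalty: shrink rho toward zero by alpha."""
    if rho > alpha:
        return rho - alpha
    if rho < -alpha:
        return rho + alpha
    return 0.0


def lasso_cd(X, y, alpha, n_iter=200):
    """LASSO by cyclic coordinate descent.

    Minimizes (1/2n)||y - Zw||^2 + alpha*||w||_1, where Z is X with each column
    standardized and y is centered, so the returned coefficients are in
    standardized units and directly comparable across features.
    """
    n, p = len(X), len(X[0])
    y_mean = sum(y) / n
    means = [sum(row[j] for row in X) / n for j in range(p)]
    stds = [math.sqrt(sum((row[j] - means[j]) ** 2 for row in X) / n) for j in range(p)]
    Z = [[(row[j] - means[j]) / stds[j] if stds[j] > 0 else 0.0 for j in range(p)]
         for row in X]

    w = [0.0] * p
    resid = [v - y_mean for v in y]  # residual y - Zw for w = 0
    for _ in range(n_iter):
        for j in range(p):
            z_j = sum(Z[i][j] ** 2 for i in range(n)) / n
            if z_j == 0.0:
                continue  # a constant feature carries no information
            # Correlation of feature j with the residual that excludes feature j.
            rho = sum(Z[i][j] * (resid[i] + Z[i][j] * w[j]) for i in range(n)) / n
            new_w = soft_threshold(rho, alpha) / z_j
            delta = new_w - w[j]
            if delta != 0.0:
                for i in range(n):
                    resid[i] -= Z[i][j] * delta  # keep the residual in sync with w
                w[j] = new_w
    return w


# Toy data: only the first two of four hypothetical descriptors matter.
rng = random.Random(0)
X = [[rng.gauss(0.0, 1.0) for _ in range(4)] for _ in range(200)]
y = [3.0 * row[0] - 2.0 * row[1] for row in X]
w = lasso_cd(X, y, alpha=0.1)
print([round(c, 3) for c in w])
```

Increasing `alpha` removes more features. In practice `alpha` is itself a hyperparameter, usually chosen by cross-validation, which is one reason why data-driven feature sets can change with the settings of the extraction algorithm.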

| Algorithm
The well-known "no free lunch" theorem states that no algorithm is best for all possible learning situations and datasets. Algorithms therefore need to be selected based on the goal of ML and the available data and features. According to the learning style, ML algorithms can be categorized into three types: supervised learning, unsupervised learning, and reinforcement learning. In materials science, supervised learning (mainly regression and classification) is the most common type and is used to learn the mapping between input features (e.g., composition and structure) and output properties (e.g., bandgap, dielectric constant, and electrical conductivity). According to their core ideas and functionality, algorithms can be further divided into more specific methods, such as linear regression, logistic regression, decision trees, k-nearest neighbors, neural networks, support vector machines, and random forests. For detailed descriptions of each algorithm, readers may refer to several excellent textbooks and reviews. [87,90] In short, the algorithm, as the executor of the ML model, should be chosen after carefully considering multiple factors, such as the goal, data, and features.
However, selecting a suitable algorithm does not guarantee that the model will perform well. Hyperparameters define the configuration and architecture of a model and ultimately affect the training results. For example, in neural networks, the number of hidden layers and the number of nodes in each layer are typical hyperparameters that control the training process. To achieve the best performance, it is therefore important to know how to optimize them. Common strategies for tuning hyperparameters include grid search, random search, particle swarm optimization, Bayesian optimization, and gradient-based optimization. The selection and rational use of algorithms are thus key steps in ML.
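Grid search, the simplest of these strategies, can be sketched as follows: each candidate value of a hyperparameter is used to build a model, which is scored on a held-out validation set, and the best-scoring value is kept. The sketch assumes k-nearest-neighbor regression (one of the algorithms listed above), with the number of neighbors k as the hyperparameter and a toy one-dimensional data set; all names and data are illustrative.

```python
import math


def knn_predict(train_X, train_y, x, k):
    """Predict by averaging the targets of the k nearest training points."""
    neighbors = sorted(zip((math.dist(xi, x) for xi in train_X), train_y))
    return sum(yi for _, yi in neighbors[:k]) / k


def rmse(y_true, y_pred):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(y_true, y_pred)) / len(y_true))


def grid_search_k(train_X, train_y, val_X, val_y, k_grid):
    """Score each candidate k on the validation set; return the best k and all scores."""
    scores = {}
    for k in k_grid:
        preds = [knn_predict(train_X, train_y, x, k) for x in val_X]
        scores[k] = rmse(val_y, preds)
    best_k = min(scores, key=scores.get)
    return best_k, scores


# Toy 1-D data (hypothetical): the property simply equals the descriptor value.
train_X = [[i * 0.1] for i in range(0, 60, 2)]
val_X = [[i * 0.1] for i in range(1, 60, 2)]
train_y = [x[0] for x in train_X]
val_y = [x[0] for x in val_X]

best_k, scores = grid_search_k(train_X, train_y, val_X, val_y, k_grid=range(1, 9))
print(best_k, {k: round(s, 3) for k, s in scores.items()})
```

Random search replaces the exhaustive loop over `k_grid` with a fixed budget of randomly drawn settings, and Bayesian optimization chooses the next setting using a surrogate model of the score; the "train, then score on validation data" structure stays the same.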

| Evaluation
In practice, even with high-quality data and a suitable algorithm, prediction errors, including bias, variance, and noise, are inevitable and can result in underfitting or overfitting. Underfitting refers to a model that cannot capture the underlying trend of the data, leading to high bias and low variance. Overfitting, by contrast, means that the model corresponds too closely to the training data set and therefore fails to fit additional data accurately. An ML model that is too complex, with many empirical parameters, is especially prone to overfitting. These issues can be addressed from several aspects, including feature engineering, model complexity, data set, and evaluation methods. During training, the model should therefore be evaluated iteratively to improve its performance. First, a reasonable partition of the available data is the basis of algorithm reliability. The data are usually split into three subsets: training, validation, and test sets. Several evaluation methods, such as cross-validation, the hold-out method, and bootstrapping, have been developed to reduce bias by partitioning the data samples appropriately. [91] Then, the accuracy of the predictions is quantified by comparing them with reference values using various evaluation indices. For regression models, the mean absolute error, mean squared error, root mean squared error, and coefficient of determination (R²) are common indices. For classification models, the precision, recall, confusion matrix, and accuracy are typically applied. If the model performs poorly, the algorithms, features, and even the data set can be adjusted repeatedly until satisfactory results are obtained.
Evaluation can thus be regarded as the supervisor of ML, as it quantifies model performance and guides the improvement of the model.
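The evaluation indices and the cross-validation scheme above can be written out explicitly. The sketch below (plain Python, illustrative function names) computes the regression indices (MAE, MSE, RMSE, R²) and binary classification indices (confusion matrix, accuracy, precision, recall), and generates k-fold splits so that every sample is used for testing exactly once. A straight-line fit on toy data stands in for a real ML model.

```python
import math
import random


def regression_metrics(y_true, y_pred):
    """Return MAE, MSE, RMSE, and the coefficient of determination R^2."""
    n = len(y_true)
    errors = [t - p for t, p in zip(y_true, y_pred)]
    mae = sum(abs(e) for e in errors) / n
    mse = sum(e * e for e in errors) / n
    mean_t = sum(y_true) / n
    ss_tot = sum((t - mean_t) ** 2 for t in y_true)
    r2 = 1.0 - (mse * n) / ss_tot if ss_tot > 0 else float("nan")
    return {"MAE": mae, "MSE": mse, "RMSE": math.sqrt(mse), "R2": r2}


def classification_metrics(y_true, y_pred, positive=1):
    """Binary confusion matrix plus accuracy, precision, and recall."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    return {
        "confusion": [[tp, fn], [fp, tn]],  # rows: actual positive / actual negative
        "accuracy": (tp + tn) / len(pairs),
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        "recall": tp / (tp + fn) if tp + fn else 0.0,
    }


def k_fold_splits(n, k, seed=0):
    """Yield (train_idx, test_idx) pairs; each sample is tested exactly once."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    for i in range(k):
        train = [j for f in folds[:i] + folds[i + 1:] for j in f]
        yield train, folds[i]


def fit_line(xs, ys):
    """Ordinary least-squares line; a stand-in for any trainable model."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
    return slope, my - slope * mx


# Toy data: y = 2x + 1 with a small alternating perturbation.
xs = [float(i) for i in range(10)]
ys = [2.0 * x + 1.0 + (0.1 if i % 2 else -0.1) for i, x in enumerate(xs)]
rmses = []
for train, test in k_fold_splits(len(xs), k=5):
    a, b = fit_line([xs[i] for i in train], [ys[i] for i in train])
    m = regression_metrics([ys[i] for i in test], [a * xs[i] + b for i in test])
    rmses.append(m["RMSE"])
print("5-fold CV RMSE:", sum(rmses) / len(rmses))
```

Averaging the score over folds uses every sample for both training and testing, which gives a less partition-dependent estimate than a single hold-out split.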

| Application
Through continuous model evaluation and feedback optimization, trained ML models can be applied to materials research according to the research goal. For example, ML can be used to predict properties, discover new materials, and optimize experimental design. However, good model performance is not equivalent to successfully solving a scientific problem. How to interpret and verify the ML results is the last critical step and an important link between theory and practice. In materials science, the crux of materials discovery and design is to identify the interrelations among four elements: composition/structure, property, synthesis/processing, and performance. Uncovering underlying relations and establishing new theories are therefore vital problems for ML to address. Good interpretability of an ML model helps reveal fundamental insights into the problem and rationalize design principles. On the other hand, even with good model performance, experimentally verifying the reliability of ML predictions is difficult. Owing to the complexity and variability of experimental preparation processes, many theoretically predicted materials cannot be synthesized as expected, leaving a huge gap between theory and experiment. Thus, increasing attention is being paid to integrating ML into the experimental process, for example, in screening materials compositions, optimizing preparation processes, and evaluating device performance. [30,[92][93][94][95] In summary, ML has brought great convenience to the R&D of energy storage materials but also faces diverse challenges. More specific applications of ML are discussed in the following section.

| ML-BASED MATERIALS RESEARCH AND DEVELOPMENT
In this section, we review recent advances in the application of ML to the R&D of energy storage materials, taking DCs and LIBs as examples. As summarized in many reviews, ML in materials science enables broad applications ranging from the prediction of new materials, properties, and theories to the optimization of composition, structure, and processing. [28,44,76,96,97] Here, we mainly focus on three attractive aspects: discovering and designing novel materials, enriching theoretical simulations, and assisting experimentation and characterization.

| Discovering and designing novel materials
By combining different elements, compositions, and crystal/molecular structures, we have a tremendous search space in which to discover new materials with desired properties. However, the labor-intensive trial-and-error experimental process is time-consuming, costly, and inefficient, which slows the R&D of energy storage materials. Benefiting from improvements in computing techniques and algorithms, ML has shown great potential for accelerating the discovery of novel energy storage materials, [28,[98][99][100][101] such as dielectrics with high dielectric constant or high breakdown strength and solid electrolytes with high ionic conductivity. Here, we present several representative studies to demonstrate how ML can predict properties and screen novel materials.
The dielectric constant ε is a critical parameter in the design of polymer DCs. [102,103] However, owing to the poor thermal stability (i.e., low glass transition temperature Tg) of polymer dielectrics, finding a polymer dielectric with the desired ε and Tg remains a challenge. [14,104] To accelerate the discovery of suitable polymer dielectrics, an ML model was recently developed to instantly predict the frequency-dependent ε and the Tg of polymers. [105] As shown in Figure 3A, the training data set consisted of 1210 experimentally measured values of ε at different frequencies and of Tg. Using a fingerprinting scheme and the Gaussian process regression algorithm, the model was then applied to predict ε over the frequency range from 60 Hz to 10¹⁵ Hz and Tg for 11 000 synthesizable candidate polymers, as presented in Figure 3B. Finally, taking the desired ε and Tg as screening criteria, five representative polymers with ε > 5 and Tg > 450 K were selected as potential candidates for high-temperature capacitors, as shown in Figure 3C. This study demonstrates that ML can be successfully used to rapidly predict properties and discover novel materials that meet desired property requirements.
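The Gaussian process regression (GPR) step and the subsequent screening can be sketched in plain Python as follows. This is an illustrative sketch, not the model of the cited study: the two-dimensional "fingerprints" and the ε and Tg values are hypothetical toy numbers, the RBF kernel length scale and noise level are fixed by assumption rather than optimized, and only the screening thresholds (ε > 5, Tg > 450 K) come from the text. Besides a mean prediction, a GP returns an uncertainty, which helps judge how far to trust a prediction for an unseen polymer.

```python
import math


def rbf(a, b, length):
    """Squared-exponential (RBF) kernel with unit signal variance."""
    return math.exp(-0.5 * sum((ai - bi) ** 2 for ai, bi in zip(a, b)) / length ** 2)


def cholesky(A):
    """Lower-triangular L with A = L L^T (A must be symmetric positive definite)."""
    n = len(A)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            L[i][j] = math.sqrt(A[i][i] - s) if i == j else (A[i][j] - s) / L[j][j]
    return L


def solve_lower(L, b):
    """Forward substitution for L x = b."""
    x = []
    for i in range(len(b)):
        x.append((b[i] - sum(L[i][k] * x[k] for k in range(i))) / L[i][i])
    return x


class GPRegressor:
    """Exact GP regression with targets standardized internally."""

    def __init__(self, length=1.0, noise=1e-6):
        self.length, self.noise = length, noise

    def fit(self, X, y):
        n = len(y)
        self.X = X
        self.mu = sum(y) / n
        self.sd = math.sqrt(sum((v - self.mu) ** 2 for v in y) / n) or 1.0
        z = [(v - self.mu) / self.sd for v in y]
        K = [[rbf(a, b, self.length) + (self.noise if i == j else 0.0)
              for j, b in enumerate(X)] for i, a in enumerate(X)]
        self.L = cholesky(K)
        self.w = solve_lower(self.L, z)  # L^-1 z
        return self

    def predict(self, x):
        """Posterior mean and standard deviation at a single input x."""
        v = solve_lower(self.L, [rbf(x, xi, self.length) for xi in self.X])  # L^-1 k
        mean = sum(vi * wi for vi, wi in zip(v, self.w))  # k^T K^-1 z
        var = max(1.0 - sum(vi * vi for vi in v), 0.0)
        return self.mu + self.sd * mean, self.sd * math.sqrt(var)


# Hypothetical 2-D fingerprints with toy eps and Tg (K) values.
X = [[0, 0], [1, 0], [0, 1], [1, 1], [2, 1], [1, 2]]
eps = [3.0, 4.5, 4.0, 6.0, 7.0, 5.5]
tg = [350.0, 420.0, 400.0, 480.0, 520.0, 460.0]
gp_eps = GPRegressor().fit(X, eps)
gp_tg = GPRegressor().fit(X, tg)

candidates = [[1.5, 1.5], [0.2, 0.1], [1.8, 1.0]]
selected = []
for c in candidates:
    e_mean, _ = gp_eps.predict(c)
    t_mean, _ = gp_tg.predict(c)
    if e_mean > 5 and t_mean > 450:  # screening criteria quoted in the text
        selected.append((c, round(e_mean, 2), round(t_mean, 1)))
print(selected)
```

In a real workflow the kernel hyperparameters would be fitted, for example by maximizing the marginal likelihood, and one might screen on a conservative bound such as the mean minus one standard deviation instead of the mean alone.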
FIGURE 3 … and their glass transition temperature Tg at different frequencies for 11 000 unseen polymers. (C) Ten representative polymers with high Tg (≥450 K), IDs 1 to 10. Reproduced with permission. [105] Copyright 2020, Springer Nature. (D) The machine learning framework for predicting the intrinsic breakdown field as a function of bandgap and maximum phonon frequency for 209 perovskites. Reproduced with permission. [109] Copyright 2016, American Chemical Society
SHEN ET AL.
As another important parameter determining the reliability and maximum operating voltage of DCs, the breakdown strength has attracted much attention in the design of dielectric materials. [106,107] However, the complex mechanism of dielectric breakdown is still not well understood, so how to discover and design novel dielectrics with high breakdown strength remains unclear. [108] To explore the intrinsic factors governing breakdown strength and find novel dielectrics with high intrinsic breakdown strength, a transferable ML model was established to screen high-breakdown-strength dielectrics among 18 928 ABX₃ perovskites, as shown in Figure 3D. [109] First, based on the dynamic stability of the perovskite crystal structure, 209 insulators were systematically selected. Then, using a training data set from first-principles computations, the LASSO approach was used to find an explicit function of the intrinsic breakdown strength in terms of the two most relevant features, bandgap and phonon cutoff frequency. Finally, the intrinsic breakdown strengths of all selected materials were predicted, and three perovskite materials (SrBO₂F, SrBO₂F, and BSiO₂F) were identified as promising and worthy of further in-depth study. In addition, ML can also play a guiding role in screening and designing polymer-based dielectric composites. [110][111][112] For example, as shown in Figure 4A, a theoretical framework combining phase-field simulation and ML was established to study the electric-thermal-mechanical breakdown process and screen suitable nanofillers for optimizing the breakdown strength. [113] First, high-throughput simulations were performed for poly(vinylidene fluoride-hexafluoropropylene) (P(VDF-HFP))-based nanocomposites filled with nanoparticles of different properties. Then, taking the high-throughput phase-field simulation results as the database, ML was used to derive an analytical expression that predicts the breakdown strength from the dielectric constant, electrical conductivity, and Young's modulus. Based on the screening results in Figure 4B, it was found that some nanocomposites filled with oxides such as Al₂O₃, SiO₂, MgO, and TiO₂ could exhibit higher breakdown strength, which was also verified by targeted experiments. This work establishes a general workflow of materials screening and design for optimizing the performance of dielectric nanocomposites.
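The idea of deriving an explicit expression from a few primary features, as in the breakdown-strength studies above, can be illustrated with a deliberately simplified stand-in for LASSO-based descriptor selection: enumerate a handful of candidate descriptors built from two primary features (named here after the bandgap Eg and a phonon frequency ω), fit ln(Eb) linearly against each, and keep the best-scoring one. The candidate list, the log transform, and the power-law toy data are all assumptions for illustration; they do not reproduce the expressions or data of the cited studies.

```python
import math


def fit_line(x, y):
    """Least-squares fit y = c0 + c1*x; returns (c0, c1, R^2)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    c1 = sxy / sxx
    c0 = my - c1 * mx
    ss_res = sum((b - c0 - c1 * a) ** 2 for a, b in zip(x, y))
    ss_tot = sum((b - my) ** 2 for b in y)
    return c0, c1, 1.0 - ss_res / ss_tot


def rank_descriptors(eg, omega, eb):
    """Fit ln(Eb) against simple candidate descriptors built from two primary
    features; return (R^2, name, c0, c1) tuples sorted from best to worst."""
    candidates = {
        "Eg": eg,
        "omega": omega,
        "Eg*omega": [a * b for a, b in zip(eg, omega)],
        "sqrt(Eg*omega)": [math.sqrt(a * b) for a, b in zip(eg, omega)],
        "ln(Eg)": [math.log(a) for a in eg],
        "ln(omega)": [math.log(b) for b in omega],
        "ln(Eg*omega)": [math.log(a * b) for a, b in zip(eg, omega)],
    }
    target = [math.log(v) for v in eb]
    results = []
    for name, d in candidates.items():
        c0, c1, r2 = fit_line(d, target)
        results.append((r2, name, c0, c1))
    results.sort(reverse=True)
    return results


# Hypothetical data generated from a power law Eb = 2 * (Eg * omega)**0.5.
eg = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
omega = [10.0, 30.0, 20.0, 40.0, 15.0, 35.0, 25.0, 45.0]
eb = [2.0 * math.sqrt(a * b) for a, b in zip(eg, omega)]
ranking = rank_descriptors(eg, omega, eb)
for r2, name, c0, c1 in ranking[:3]:
    print(f"{name:15s} R^2 = {r2:.4f}  ln(Eb) = {c0:.3f} + {c1:.3f} * d")
```

With real, noisy data several descriptors may score similarly, so candidates are usually compared by cross-validated error rather than training R², and a sparse method such as LASSO becomes necessary once the candidate pool grows to thousands of combinations.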

FIGURE 4 (A) Schematic workflow of the machine learning strategy for producing an analytical expression for the breakdown strength. (B) Comparisons of breakdown strengths between the phase-field model and the machine learning prediction. Reproduced with permission. [113] Copyright 2019, Springer Nature
For LIBs, ML is also helping to accelerate the screening of novel battery materials, [114][115][116][117] such as active electrode materials and solid/liquid electrolytes, as illustrated in Figure 1C. In the case of solid electrolytes, for example, massive efforts have focused on searching for inorganic/polymer solid ion conductors with high ionic conductivity and suitable mechanical properties. [118][119][120][121][122] As shown in Figure 5A, an unsupervised learning approach based on clustering of Li-containing compounds was proposed, and 16 new fast Li conductors with room-temperature conductivities of 10⁻⁴ to 10⁻¹ S cm⁻¹ were then predicted by ab initio molecular dynamics simulations. [123] In this study, the unsupervised scheme used a limited quantity of conductivity data to prioritize a candidate list and thereby increase the accuracy of the screening results. In particular, three new materials systems, Li₈N₂Se, Li₆KBiO₆, and Li₅P₂N₅, exhibit room-temperature conductivities exceeding 10⁻² S cm⁻¹, higher than those of the best known solid Li-ion conductors. Compared with inorganic solid electrolytes, polymer electrolytes offer advantages in safety, cost, and flexibility and have been regarded as among the most promising electrolyte materials for LIBs. [125,126] To explore polymer electrolytes with higher ionic conductivity, a transfer-learned graph neural network was constructed to accurately predict the ionic conductivities of 10⁴ polymers in a polymer database, as illustrated in Figure 5B. [124] It was found that superionic conductors composed of charge-transfer complexes of aromatic polymers could exhibit ionic conductivities of around 10⁻³ S cm⁻¹ at room temperature. Contrary to the traditional concept of rubbery polymer electrolytes, this glassy design was proposed to achieve fast motion of ionic species decoupled from the polymer chains and thus to enhance mechanical and thermal stability. This study is an excellent example of the potential of ML to discover new materials without being fettered by human experience. In recent years, polymer blend and polymer-based composite electrolytes have been extensively studied to balance the trade-off between ionic conductivity and mechanical properties. [127][128][129][130] Recently, an ML method was employed for the multiobjective design of polymer blend electrolytes with respect to two objectives, ionic conductivity and viscosity, [131] and it was concluded that polymer blend electrolytes with unequal molecular weights hardly improve electrolyte performance.
In summary, many successful studies indicate that ML has become a powerful tool for discovering new phenomena and designing novel energy storage materials, tremendously promoting breakthroughs and innovation in this field. [97,105,132,133] At the same time, materials scientists are also strongly interested in figuring out the mechanisms underlying novel properties. As summarized in Section 2, by constructing physical/chemical descriptors, ML results can be made readable and understandable through qualitative trends. However, accurately establishing mathematical expressions remains a huge challenge. In the future, ML models combined with physics and mathematics may promote the birth of new physical theories and thereby lead to innovative breakthroughs in new materials.
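The bottom-up tree diagram in Figure 5A is the output of agglomerative hierarchical clustering. A minimal average-linkage version, together with the prioritization idea described for the Li-conductor study (compounds that cluster with known fast conductors are ranked first for costly simulations), might look like the sketch below. The two-dimensional descriptors, the choice of average linkage, and the known-conductor label are hypothetical; the cited study's actual structural descriptors are not reproduced here.

```python
import math
from itertools import combinations


def agglomerative(points, n_clusters):
    """Average-linkage agglomerative clustering (bottom-up).

    Starts with every point as its own cluster and repeatedly merges the two
    clusters with the smallest mean pairwise distance. Returns the final
    clusters (lists of point indices) and the merge history (the tree).
    """
    clusters = [[i] for i in range(len(points))]
    history = []

    def linkage(a, b):
        d = [math.dist(points[i], points[j]) for i in a for j in b]
        return sum(d) / len(d)

    while len(clusters) > n_clusters:
        i, j = min(combinations(range(len(clusters)), 2),
                   key=lambda ij: linkage(clusters[ij[0]], clusters[ij[1]]))
        history.append((clusters[i][:], clusters[j][:], linkage(clusters[i], clusters[j])))
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]  # j > i, so index i is unaffected
    return clusters, history


def prioritize(clusters, known_fast):
    """List unlabeled compounds that share a cluster with a known fast conductor."""
    picks = []
    for c in clusters:
        if any(i in known_fast for i in c):
            picks.extend(i for i in c if i not in known_fast)
    return sorted(picks)


# Hypothetical 2-D structural descriptors for five Li-containing compounds.
points = [[0.0, 0.0], [0.1, 0.0], [0.0, 0.2], [5.0, 5.0], [5.1, 5.2]]
clusters, history = agglomerative(points, n_clusters=2)
print(clusters)
print(prioritize(clusters, known_fast={0}))  # compound 0 assumed to be a known fast conductor
```

Cutting the merge history at a different height yields coarser or finer groupings, which is how a single tree can be used to prioritize candidates at several levels of similarity.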

| Enriching theoretical simulations
Over the past few decades, theoretical simulations have been viewed fundamentally as an endeavor to describe constitutive relationships in materials physics, mechanics, chemistry, and materials engineering. [134,135] Many well-developed theories can now be used to study materials phenomena at different time and length scales, such as quantum mechanics, [136] molecular dynamics, [137] phase-field theory, [138,139] and effective medium theory. [140] However, existing computational simulations generally incur high computational cost and long run times for complex or multiscale problems. ML can improve the efficiency and accuracy of theoretical simulations and extract novel insights from simulation results. [50,[141][142][143] In this section, we illustrate how ML can enrich computational simulations through several representative examples, from the atomic/molecular level to the mesoscopic/macroscopic level.
FIGURE 5 (A) Bottom-up tree diagram of Li-containing compounds and corresponding conductivity generated using the agglomerative hierarchical clustering method. Reproduced with permission. [123] Copyright 2019, Springer Nature. (B) Machine learning workflow of predicting properties and discovering new solid polymer electrolytes. Reproduced with permission. [124] Copyright 2020, American Chemical Society

For simulations at the atomic/molecular level, ML has been widely applied to improve the accuracy or efficiency of density functional theory and molecular dynamics calculations. [144][145][146][147] For example, an efficient gradient-domain ML approach was developed to construct accurate, energy-conserving molecular force fields from ab initio molecular dynamics simulations. [148] Moreover, Bayesian optimization was used to determine optimal Hubbard U parameters, improving the accuracy of density functional theory with a Hubbard U correction. [149] Beyond enhancing algorithms and models, ML can also help uncover hidden information. [150,151] For example, to guide the development of novel, highly conductive solid polymer electrolytes, molecular dynamics offers immense opportunities to study how molecular behavior affects ion transport. However, fully understanding what governs ion mobility in solid polymer electrolytes and achieving global optimization is beyond the capacity of conventional fully atomistic simulations. Thus, a new design approach integrating coarse-grained molecular dynamics with ML was proposed to design highly conductive polymer electrolytes, following the framework shown in Figure 6. [152] First, a continuous space of physically tunable and interpretable descriptors was constructed by coarse-graining the chemical species. Then, the authors used Bayesian optimization to efficiently explore the relationships between these molecular descriptors and the conductivity of solid polymer electrolytes. They concluded that modifying the $\mathrm{TFSI^-}$ anion, introducing secondary sites, and replacing the PEO backbone chains are beneficial for improving the ionic conductivity of the poly(ethylene oxide)-lithium bis(trifluoromethanesulfonyl)imide (PEO-LiTFSI) system. This method not only greatly reduces computational cost and time but also yields useful mechanistic insights for optimizing the functions of solid polymer electrolytes.
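The Bayesian-optimization step in this workflow can be illustrated with a minimal sketch: a Gaussian-process surrogate is fitted to the descriptor values evaluated so far, and expected improvement selects the next descriptor to "simulate." Everything specific here is an assumption for illustration, not the cited setup: a single continuous descriptor scaled to [0, 1], a squared-exponential kernel with a fixed length scale of 0.2, and a cheap hypothetical_conductivity function standing in for a coarse-grained MD run.

```python
import math
import numpy as np


def rbf_kernel(a, b, length_scale=0.2, variance=1.0):
    """Squared-exponential covariance between two 1-D arrays of descriptor values."""
    d = a[:, None] - b[None, :]
    return variance * np.exp(-0.5 * (d / length_scale) ** 2)


def gp_posterior(x_train, y_train, x_query, noise=1e-6):
    """Posterior mean and standard deviation of a zero-mean, unit-variance GP."""
    K = rbf_kernel(x_train, x_train) + noise * np.eye(len(x_train))
    K_s = rbf_kernel(x_train, x_query)
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y_train))
    mean = K_s.T @ alpha
    v = np.linalg.solve(L, K_s)
    var = np.clip(1.0 - np.sum(v ** 2, axis=0), 1e-12, None)  # prior variance is 1.0
    return mean, np.sqrt(var)


def expected_improvement(mean, std, best, xi=0.01):
    """Expected improvement over the best observed value (maximization)."""
    z = (mean - best - xi) / std
    cdf = 0.5 * (1.0 + np.array([math.erf(v / math.sqrt(2.0)) for v in z]))
    pdf = np.exp(-0.5 * z ** 2) / math.sqrt(2.0 * math.pi)
    return (mean - best - xi) * cdf + std * pdf


def hypothetical_conductivity(x):
    """Hypothetical stand-in for an expensive CGMD run; peaks at x = 0.7."""
    return -(x - 0.7) ** 2


def bayesian_optimize(objective, n_init=3, n_iter=10, seed=0):
    """Return the best descriptor value found and its objective value."""
    rng = np.random.default_rng(seed)
    grid = np.linspace(0.0, 1.0, 201)             # candidate descriptor values
    x = rng.uniform(0.0, 1.0, size=n_init)        # initial "simulations"
    y = objective(x)
    for _ in range(n_iter):
        y_norm = (y - y.mean()) / (y.std() + 1e-9)  # standardize targets for the GP
        mu, sigma = gp_posterior(x, y_norm, grid)
        ei = expected_improvement(mu, sigma, y_norm.max())
        x_next = grid[np.argmax(ei)]               # most promising next run
        x = np.append(x, x_next)
        y = np.append(y, objective(x_next))
    best = int(np.argmax(y))
    return float(x[best]), float(y[best])
```

In practice the kernel hyperparameters would usually be fitted to the data, and the candidate grid replaced by a continuous optimizer once the descriptor space has more than a few dimensions.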
FIGURE 6 Framework of the coarse-grained molecular dynamics-Bayesian optimization (CGMD-BO). This workflow starts by transforming the discrete conventional chemical species space into a continuous space of coarse-graining (CG) parameters (①-②). Then, Bayesian optimization (BO) is employed to predict the relationships between the transport properties and the associated CG parameters (②-③). Reproduced with permission. [152] Copyright 2020, American Chemical Society

At the mesoscopic level, microstructure, such as crystallization behavior and composite structure, is one of the most important factors influencing the performance of energy storage materials. [108,[153][154][155][156] Owing to the complexity and variability of microstructures, comprehensively understanding microstructure-property relationships remains a challenge. Combining ML with mesoscale simulations is becoming an effective way to tackle this issue. For example, to address the microstructure-property relations of polymer nanocomposites for dielectric energy storage, a materials development paradigm combining high-throughput phase-field simulations and ML was developed, as shown in Figure 7. [157] Based on 6615 phase-field simulation results, an ML strategy evaluated energy storage capability using a scoring function. The screening results revealed that using parallel perovskite nanosheets (e.g., $\mathrm{Sr_2Ta_3O_{10}}$, $\mathrm{Ca_2Nb_3O_{10}}$, $\mathrm{LaNb_2O_7}$) as nanofillers improves the breakdown strength of polymer nanocomposites. Guided by these ML results, a P(VDF-HFP)/$\mathrm{Ca_2Nb_3O_{10}}$ polymer nanocomposite with an ultrahigh discharged energy density of 35.9 J cm⁻³ at 853 MV m⁻¹ was fabricated. This study is a good example of combining ML and phase-field models to optimize and design composite structures.
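As an order-of-magnitude check on the reported figures, the linear-dielectric relation $U = \tfrac{1}{2}\varepsilon_0\varepsilon_r E^2$ can be inverted for the effective relative permittivity implied by 35.9 J cm⁻³ at a field of 853 MV m⁻¹. Real nanocomposites are not ideal linear dielectrics (discharged energy density is usually obtained from polarization-field loops), so this sketch only tests the scale of the numbers.

```python
EPS0 = 8.854e-12  # vacuum permittivity, F/m


def energy_density_j_per_cm3(eps_r, field_v_per_m):
    """Linear-dielectric energy density U = 0.5 * eps0 * eps_r * E^2, in J/cm^3."""
    return 0.5 * EPS0 * eps_r * field_v_per_m ** 2 / 1e6  # J/m^3 -> J/cm^3


def implied_eps_r(u_j_per_cm3, field_v_per_m):
    """Effective relative permittivity that would give energy density U at field E."""
    return 2.0 * u_j_per_cm3 * 1e6 / (EPS0 * field_v_per_m ** 2)


print(f"eps_r at 853 MV/m: {implied_eps_r(35.9, 853e6):.1f}")  # eps_r at 853 MV/m: 11.1
print(f"eps_r at 853 kV/m: {implied_eps_r(35.9, 853e3):.1e}")  # eps_r at 853 kV/m: 1.1e+07
```

An effective permittivity of about 11 is a plausible magnitude for a fluoropolymer-based composite, whereas the same energy density at 853 kV m⁻¹ would require a permittivity of order 10⁷, which indicates that the field must be on the MV m⁻¹ scale.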
Combined with theoretical simulations at the macroscopic level, ML is also applied to provide important guidelines for the safety assessment of LIBs in electric vehicles. [158][159][160] In an electric vehicle crash, the battery pack may be damaged by the tremendous collision force, resulting in an electrical short circuit, thermal runaway, and possibly fire or explosion. However, mechanical failure has been a blind spot for the battery community. There is thus an urgent need to identify the range of mechanical loading conditions that ensures the safe operation of battery cells, known as the "safety envelope." To overcome the challenge of insufficient mechanical test data, a "bottom-up" modeling approach combining constitutive models, finite element models, and ML models was proposed, as shown in Figure 8. [161] First, a high-accuracy finite element model of a pouch cell was used to generate ~2500 simulations, and then an ML model was used to predict the safety envelope. On the one hand, the constitutive models quantify the maximum deformation a cell can sustain before failure (Figure 8D). On the other hand, the data-driven model can quickly identify whether a short circuit occurs under different mechanical loading conditions (Figure 8F). Combining finite element simulations with ML not only reduces the number of computations required but also provides high-accuracy predictions of macroscopic performance.
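The data-driven step can be sketched as a surrogate classifier trained on labeled simulation outcomes and then queried for the largest safe deformation at a given loading condition. This is a minimal sketch under loudly stated assumptions: the two loading parameters (indentation depth and punch radius), the labeling rule inside hypothetical_fe_short_circuit that stands in for a finite element run, and the choice of a k-nearest-neighbour classifier are all illustrative and are not taken from the cited study.

```python
import heapq
import math
import random


def hypothetical_fe_short_circuit(depth_mm, punch_radius_mm):
    """Hypothetical stand-in for one finite element run (illustrative rule only):
    the cell shorts once indentation depth exceeds a radius-dependent limit."""
    return depth_mm > 2.0 + 0.5 * punch_radius_mm


def simulate_dataset(n, seed=0):
    """Generate n labeled loading conditions, mimicking a batch of simulations."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        depth = rng.uniform(0.0, 8.0)
        radius = rng.uniform(1.0, 10.0)
        data.append(((depth, radius), hypothetical_fe_short_circuit(depth, radius)))
    return data


def knn_short_circuit(train, query, k=5):
    """Predict a short circuit by majority vote of the k nearest simulated conditions."""
    nearest = heapq.nsmallest(k, train, key=lambda item: math.dist(item[0], query))
    return sum(label for _, label in nearest) > k // 2


def safe_depth_limit(train, radius, step=0.05, max_depth=8.0):
    """Largest indentation depth the surrogate predicts as safe for a given punch radius."""
    depth = 0.0
    while depth + step <= max_depth and not knn_short_circuit(train, (depth + step, radius)):
        depth += step
    return depth
```

Because each query needs only distance computations against stored simulations, such a surrogate answers in milliseconds what would otherwise require a new finite element run; a smoother classifier would give a less jagged envelope.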
Therefore, ML can enrich theoretical simulations at individual scales by optimizing potential functions or parameters, accelerating algorithms, reducing the number of calculations, analyzing and interpreting simulation results, and exploring new insights. However, many challenges remain in multiscale simulations of a single system from the microscale to the macroscale. Given its powerful learning ability, we believe ML has promising potential for connecting different theories to realize automatic multiscale simulations. [135,162]

FIGURE 7 The research and development paradigm of polymer nanocomposites, from theoretical prediction and machine learning strategy to targeted experiment. (A) The input data set was generated from high-throughput phase-field simulations. (B) Comparison of the predicted results of the phase-field simulations and machine learning. (C) The discharged energy density of the targeted experiment on P(VDF-HFP)/$\mathrm{Ca_2Nb_3O_{10}}$ (CNO) nanocomposites. Reproduced with permission. [157] Copyright 2021, Springer Nature

SHEN ET AL. | 185

| Assisting experimentation and characterization
The traditional experimental process depends heavily on human experience and intuition, resulting in a slow and expensive R&D cycle for energy storage materials. [48] Furthermore, materials research places ever higher demands on experimental characterization, which increasingly exceed the capabilities of existing techniques. To overcome these limitations, researchers are applying ML to assist experimentation and characterization, for example by exploring optimal compositions, optimizing experimental processing, and enhancing characterization technologies. [92,[163][164][165][166][167][168][169] Here, we highlight several successful examples to spark further thinking on exploiting the potential of ML in experimentation and characterization.
FIGURE 8 (A-C) Multiple scales of mechanical safety, from the EV level (A) to the battery cell level (B) and the materials level (C). (D-F) The "bottom-up" modeling approach includes constitutive models (D), finite element models (E), and the data-driven machine learning model (F). Reproduced with permission. [161] Copyright 2019, Elsevier

In terms of experimental optimization design, it is attractive to employ ML to guide experimental synthesis toward materials with better properties. [92,93,170] Recently, to find Pb-free $\mathrm{BaTiO_3}$-based dielectrics with higher energy storage density at low electric fields using as few experiments as possible, an ML-based approach was proposed to search for optimal compounds. [171] As shown in Figure 9A, a closed feedback loop from experiments to ML was developed to guide experimental design. The search started from physical intuition to reduce the search space in the $(\mathrm{Ba}_{1-x-y}\mathrm{Ca}_x\mathrm{Sr}_y)(\mathrm{Ti}_{1-u-v-w}\mathrm{Zr}_u\mathrm{Sn}_v\mathrm{Hf}_w)\mathrm{O}_3$ ceramic system. It is well known that ceramics located around the crossover region between the relaxor ferroelectric and ferroelectric phases in the composition-temperature phase diagram are more likely to exhibit better energy storage performance. Thus, the first step was to classify the compounds by predicting the phase diagram, as illustrated for the elements Zr and Ca in Figure 9B. Then, taking the preselected compounds in the crossover region as the new search space, a regression model trained on 182 experimental data points was employed to make predictions. The predictions from each ML iteration were verified experimentally, and the experimental results were fed back into the training data for the next round of training. After only two active learning loops, the largest energy storage density, ≈73 mJ cm⁻³ at 20 kV cm⁻¹, was found in the compound $(\mathrm{Ba_{0.86}Ca_{0.14}})(\mathrm{Ti_{0.79}Zr_{0.11}Hf_{0.10}})\mathrm{O_3}$, a 14% improvement over the best compound in the training data, as shown in Figure 9C. This study provides an exemplary framework for using ML to accelerate the search for multicomponent materials in experiments.
Moreover, based on partial experimental results, ML is also widely used to accurately predict the lifetime or safety of complex, nonlinear systems such as LIBs. [117,159,160,172,173] For example, an ML method was applied to assess battery cycle life. [174] First, the cycle lives of 124 commercial lithium iron phosphate/graphite cells were tested under fast-charging conditions to generate a training data set. Using discharge voltage curves, which reflect capacity degradation, the ML model quantitatively predicted cycle life with a 9.1% test error and classified cycle life into two groups with a 4.9% test error. In addition, the degradation patterns of LIBs were identified using Gaussian process ML. [175] More than 20 000 electrochemical impedance spectra of commercial LIBs were measured at different states of health, states of charge, and temperatures. Without feature engineering, this model could accurately predict the remaining useful life. ML can also be used to guide the battery cell manufacturing process, which is difficult to plan and control by human experience alone. [95] These works demonstrate the strong ability of ML to predict the performance of complex systems in experiments.
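A minimal sketch of turning discharge voltage curves into a cycle-life predictor is given below. It assumes a single scalar feature, the base-10 logarithm of the variance of the difference between discharge capacity curves Q(V) recorded at an early and a later cycle on a shared voltage grid, and a linear fit of log10(cycle life) against that feature. The feature, the fit, and the percentage-error metric are illustrative assumptions and do not reproduce the cited model.

```python
import math
import statistics


def delta_q_feature(q_early, q_late):
    """log10 variance of Q_late(V) - Q_early(V), both sampled on the same voltage grid."""
    diffs = [late - early for early, late in zip(q_early, q_late)]
    return math.log10(statistics.pvariance(diffs))


def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return my - slope * mx, slope


def predict_cycle_life(intercept, slope, feature):
    """The model is linear in log10(cycle life), so exponentiate the prediction."""
    return 10.0 ** (intercept + slope * feature)


def percent_error(predicted, observed):
    """Mean absolute percentage error, one possible way to report a test error."""
    return 100.0 * statistics.fmean([abs(p - o) / o for p, o in zip(predicted, observed)])
```

Working in log10(cycle life) keeps errors proportional across cells whose lives differ several-fold, which suits a percentage-type test error.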
FIGURE 9 Reproduced with permission. [171] Copyright 2019, Wiley-VCH

Another particularly promising area for ML in the R&D of energy storage materials is the enhancement of characterization techniques, including microscopy image processing [154,[176][177][178] and spectroscopy analysis. [179][180][181] Here, we mainly focus on recent applications of ML to image analysis to demonstrate its power for recognition and classification. In materials science, it is highly desirable to characterize local behaviors from the nanoscale to the microscale, which are closely correlated with the macroscopic properties of materials. Benefiting from advances in characterization techniques such as scanning electron microscopy, transmission electron microscopy, atomic force microscopy (AFM), and tomography, local information from the atomic level to the microstructure can be measured directly. However, much of this characterization data cannot be fully understood or exploited through human experience alone. ML can be an effective tool for analyzing characterization results, for example through resolution enhancement, real-time processing, identification and segmentation, and reconstruction. [154,[182][183][184][185] An interesting application of ML is an artificial intelligence atomic force microscope (AI-AFM) system. [186] As shown in Figure 10A, this system is capable not only of pattern recognition and feature identification in electrochemical systems and ferroelectric materials but also of classification via adaptive experimentation, with additional probing at critical locations such as domain walls and grain boundaries. The innovation in this study was the use of a support vector machine algorithm to realize high-fidelity, pixel-by-pixel recognition and classification in real time, enabling efficient analysis of AFM results without human intervention.
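Pixel-by-pixel classification of the kind described for the AI-AFM system can be sketched with a linear support vector machine. The sketch below assumes each pixel is described by two normalized channel values (for example amplitude and phase) and labeled +1 or -1 for two domain types; it trains the SVM with a Pegasos-style stochastic subgradient method in pure Python. The features, labels, regularization strength, and training method are assumptions for illustration, not the cited system's implementation.

```python
import random


def train_linear_svm(features, labels, lam=0.1, epochs=20, seed=0):
    """Pegasos-style stochastic subgradient training of a linear SVM.

    labels must be +1 or -1. A constant 1.0 is appended to each feature vector so the
    last weight acts as a bias (this simple variant also regularizes the bias).
    """
    rng = random.Random(seed)
    data = [(list(x) + [1.0], y) for x, y in zip(features, labels)]
    w = [0.0] * len(data[0][0])
    t = 0
    for _ in range(epochs):
        rng.shuffle(data)
        for x, y in data:
            t += 1
            eta = 1.0 / (lam * t)                        # decreasing step size
            margin = y * sum(wi * xi for wi, xi in zip(w, x))
            w = [(1.0 - eta * lam) * wi for wi in w]     # shrink: gradient of the L2 term
            if margin < 1.0:                             # hinge loss active: move toward y
                w = [wi + eta * y * xi for wi, xi in zip(w, x)]
    return w


def classify_pixels(w, image):
    """Assign +1 / -1 to every pixel of a 2-D grid of per-pixel feature tuples."""
    return [[1 if sum(wi * xi for wi, xi in zip(w, list(px) + [1.0])) >= 0.0 else -1
             for px in row]
            for row in image]
```

Once trained, classifying an image costs one dot product per pixel, which is why linear models are attractive for real-time use; nonlinear kernels trade some of that speed for more flexible decision boundaries.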
FIGURE 10 (A) The concept of an artificial intelligence atomic force microscope (AI-AFM) with the functions of pattern recognition and classification without human interference. Reproduced with permission. [186] Copyright 2018, RSC. (B) Comparison of the machine-learning-assisted segmentation results (B2) and traditional segmentation (watershed and separation algorithms) results (B3) for a few representative particles visualized by the hard X-ray phase-contrast nano-tomography technique (B1). Reproduced with permission. [187] Copyright 2020, Springer Nature

In addition, in LIBs, the microstructure evolution of a composite electrode largely determines battery performance during charging and discharging. However, experimentally characterizing particle behavior in the conductive matrix, and thus understanding the electrochemical response under operating conditions, remains a frontier challenge. To address this issue, an ML-revealed statistical approach was proposed to study the behaviors of active particles, the carbon/binder domain, and pore structures in a Ni-rich $\mathrm{LiNi_{0.8}Mn_{0.1}Co_{0.1}O_2}$ composite cathode. [187] To achieve statistical representativeness of complex microstructures with more than 650 active particles, an ML model was developed to segment and label them with superior accuracy and efficiency. As shown in Figure 10B, compared with traditional segmentation of a few representative particles imaged by phase-contrast tomography, the ML approach significantly improved identification performance.

Up to now, extensive research has demonstrated the huge potential of ML for improving the accuracy or efficiency of experimental characterization. [164,[188][189][190] In short, ML has already been widely integrated into the full life of experimental processes and shows promising potential for tackling complex problems. Its application domains in assisting experimentation and characterization range from materials preparation and microstructure characterization to performance evaluation. Although ML has made good progress in facilitating experimental research, some challenges still need to be addressed, such as multiobjective optimization [191,192] and inverse design. [193][194][195]
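Per-particle statistics such as those collected for the more than 650 active particles in the NMC cathode study also require a labeling step that turns a segmented foreground mask into individual particles. The sketch below is a plain 4-connected component labeling in pure Python; it assumes a binary mask produced by some segmentation model and does not attempt to split touching particles, which is the separation step where the cited work compared ML against watershed-based methods.

```python
from collections import deque


def label_particles(mask):
    """Label 4-connected foreground regions of a binary mask.

    Returns (labels, count): labels is a grid of ints where 0 is background and
    1..count identify individual particles.
    """
    rows, cols = len(mask), len(mask[0])
    labels = [[0] * cols for _ in range(rows)]
    count = 0
    for r in range(rows):
        for c in range(cols):
            if mask[r][c] and labels[r][c] == 0:
                count += 1                       # start a new particle
                labels[r][c] = count
                queue = deque([(r, c)])
                while queue:                     # breadth-first flood fill
                    i, j = queue.popleft()
                    for ni, nj in ((i + 1, j), (i - 1, j), (i, j + 1), (i, j - 1)):
                        if (0 <= ni < rows and 0 <= nj < cols
                                and mask[ni][nj] and labels[ni][nj] == 0):
                            labels[ni][nj] = count
                            queue.append((ni, nj))
    return labels, count


def particle_sizes(labels, count):
    """Pixel area of each labeled particle, in label order."""
    sizes = [0] * (count + 1)
    for row in labels:
        for value in row:
            sizes[value] += 1
    return sizes[1:]                             # drop the background bin
```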

| SUMMARY AND OUTLOOK
Focusing mainly on energy storage materials for DCs and LIBs, we have presented a short review of the applications of ML in the R&D process. It should be pointed out that ML has also been widely used in the R&D of other energy materials, including those for fuel cells, [196][197][198] thermoelectrics, [199,200] and supercapacitors. [201][202][203] So far, considerable progress has been made in implementing ML for discovering and designing novel materials, enriching theoretical simulations, and assisting experimentation and characterization, which has tremendously accelerated the R&D pace of energy storage materials. However, numerous challenges remain to be addressed. Here, we present our perspectives on some nascent but promising directions where ML may lead to further breakthroughs.
(1) Establishing a full life-cycle materials database. As reviewed above, most materials databases mainly consist of information about atomic/molecular/crystal structures and the corresponding intrinsic properties, which are the inherent genes that fundamentally determine materials properties. However, materials performance can also be influenced by external factors during the full life cycle, such as the preparation process and service environment. To tackle this issue from a data-driven perspective, the urgent task is to build a full life-cycle materials database by collecting the various types of data generated from preparation and characterization through to service. The key next step is then to establish a knowledge graph linking composition, microstructure, preparation, and service performance, which will be the basis for further mechanism analysis and optimization design. Among these links, microstructure, as the extrinsic gene of materials, is very sensitive to composition, preparation process, service environment, and performance, and it bridges all stages of the full life cycle. However, insufficient attention has been paid to studying microstructural evolution via ML. We believe that taking microstructure as the main thread connecting full life-cycle research could not only provide more fertile ground for scientific innovation but also be more conducive to the practical use of materials.
(2) Developing multiobjective optimization algorithms. Whether a material can be used in engineering applications often depends on whether its overall performance meets practical requirements. For example, a dielectric material should simultaneously satisfy multiple performance metrics rather than a single index, including energy density, efficiency, working voltage, thermal stability, and loss. However, materials performance is affected by various complex factors, and different properties may be inversely coupled, such as dielectric constant and breakdown strength in dielectric materials, or ionic conductivity and mechanical strength in solid electrolytes. Collaborative optimization of multiple performance indexes is a growing concern in materials research. Up to now, ML has made successful progress in optimizing single-objective performance, as discussed in this review, such as screening solid electrolytes with high ionic conductivity or dielectrics with high dielectric constant. However, when two or more properties must be optimized together, the problem becomes considerably harder. Therefore, beyond Pareto optimization and Bayesian optimization, new efficient algorithms urgently need to be developed.
(3) Improving quantifiable interpretability. Although many ML algorithms enable quick and accurate prediction, their black-box nature makes it difficult to understand the decisions of ML systems. In materials science, quantifying physical correlation or causality and further establishing quantified expressions between inputs and outputs is critical to understanding the underlying mechanisms of novel phenomena and to guiding further materials research. Thus, developing ML models with both high predictive ability and high interpretability has become a major pursuit of materials scientists. [87,113,[204][205][206][207] Recently, physics-informed ML, which integrates data and mathematical models, may provide new insight for discovering hidden physics and tackling high-dimensional problems. [50] Altogether, balancing quantifiable interpretability with intelligent forecasting is a daunting challenge that requires deep integration and collaboration across multiple disciplines.
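For two properties that trade off, such as dielectric constant and breakdown strength, a common first step in multiobjective screening is to keep only the non-dominated (Pareto-optimal) candidates. The sketch below assumes both objectives are to be maximized; the candidate values are hypothetical numbers chosen for illustration.

```python
def dominates(a, b):
    """True if candidate a is no worse than b in every objective (all maximized)
    and strictly better in at least one."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))


def pareto_front(candidates):
    """Return the non-dominated candidates, preserving input order (O(n^2) check)."""
    return [p for p in candidates if not any(dominates(q, p) for q in candidates)]


# Hypothetical (dielectric constant, breakdown strength in MV/m) pairs for illustration.
candidates = [(10, 400), (5, 600), (8, 300), (12, 200), (10, 350)]
print(pareto_front(candidates))  # [(10, 400), (5, 600), (12, 200)]
```

The pairwise check is O(n²), which is adequate for screening lists of modest size; Pareto-based or multiobjective Bayesian methods then decide which of the non-dominated candidates to test next.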