A Machine Learning–Based Design Rule for Improved Open‐Circuit Voltage in Ternary Organic Solar Cells

Organic solar cells (OSCs) based on ternary blends are among the most promising photovoltaic technologies. To further improve the power conversion efficiency (PCE), the materials selection criteria must be focused on achieving high open‐circuit voltage (Voc) through the alignment of the energy levels of the ternary blends. Hence, machine‐learning approaches are in high demand for extracting the complex correlation between Voc and the energy levels of the ternary blends, which are crucial to facilitate device design. Herein, the data‐driven strategies are used to generate a model based on the available experimental data, and the Voc is then predicted using available machine‐learning methods (the Random Forest regression and the Support Vector regression). In addition, the Random Forest regression is developed to find the appropriate energy‐level alignment of ternary OSCs and to reveal the relationship between Voc and electronic features. Finally, an analysis based on the ranking of variables in terms of importance by the Random Forest model is performed to identify the key feature governing the Voc and the performance of ternary OSCs. From the perspective of device design, the machine‐learning approach provides sufficient insights to improve the Voc and advances the comprehensive understanding of ternary OSCs.


Introduction
Organic solar cells (OSCs) are considered a viable clean energy technology for power-generating window and rooftop-mounted photovoltaic applications of the 21st century. [1][2][3][4][5] OSCs have made promising progress in power conversion efficiency (PCE), which has exceeded 14% with the two-component (binary) bulk-heterojunction (BHJ) structure. [6] Moreover, three-component (ternary) structures have recently drawn much attention because they improve light harvesting through greater coverage of the solar spectrum. [7] The device metrics of ternary OSCs [e.g., short-circuit current (Jsc), open-circuit voltage (Voc), and fill factor (FF)] are enhanced primarily by adding a novel third component that extends the absorption profile of the photoactive layer and improves photocurrent generation. The PCE of ternary OSCs is generally similar to that of binary OSCs and is calculated from the product of the three key photovoltaic parameters Jsc, Voc, and FF. Various organic materials (e.g., small molecules and conjugated polymers) can be used as the third component in ternary OSCs based on fullerene derivatives, where they may extend the light absorption profile, control the morphology, and generate suitable energy alignment, resulting in improved photovoltaic performance. In particular, ternary OSCs based on PM6:Y6:T-4F ternary blend active layers have achieved a PCE of 16.27%. [8] In addition, the introduction of a third component has addressed the charge transport and/or charge recombination problems associated with binary OSCs. For example, Hou and coworkers reported that the Voc increased from 0.937 to 0.945 V when Bis 70 PCBM (acting as a second acceptor) was added to the PBDB-T:IT-M system, which also increased the PCE from 10.8% to 12.1%. [9] Zhang and coworkers investigated the effect of adding MeIC1 to the INPIC-4F:PBDB-T binary blend. [10] The PCE increased from 12.55% in the INPIC-4F:PBDB-T binary OSCs to 13.73% in the INPIC-4F:PBDB-T:MeIC1 ternary OSCs. These examples demonstrate the advantages of ternary OSCs and provide practicable methods for enhancing their Voc. To develop ternary OSCs for use in commercial photovoltaics, however, fundamental design rules are still needed to improve their efficiencies.
Ternary OSCs offer unique means of improving the Voc, as demonstrated by rationally tuning the highest occupied molecular orbital (HOMO) and lowest unoccupied molecular orbital (LUMO) energy levels of the third component. Therefore, understanding and optimizing the Voc are central to developing high-performance ternary OSCs. [11,12] For instance, Thompson and coworkers reported that the PC61BM:P3HT75-co-EHT25:P3HTT-DPP system is a practical strategy for improving the Voc and efficiency of ternary OSCs. [13] A previous report suggested that, in two-donor/one-acceptor ternary blends, the Voc is defined by the offset between the higher of the two donor HOMO energy levels and the LUMO energy level of the acceptor. [14] Alloy-based ternary blends, comprising a fullerene-derivative acceptor and a nonfullerene acceptor, have been shown to increase the Voc of ternary OSCs. Furthermore, the Voc of alloy model-based ternary OSCs can be calculated according to Equation (1), [15] which relates Voc(Ternary) to Voc(Binary 1) and Voc(Binary 2). Here, Voc(Binary 1), Voc(Binary 2), and Voc(Ternary) are the Voc values of the two binary OSCs and the ternary OSC, respectively; Ne1 and Ne2 are the densities of the mixed new quasi-frontier orbitals of acceptors 1 and 2, respectively; and f1 and f2 are the respective weight ratios of acceptors 1 and 2 in the acceptor alloy. Alloy model-based ternary OSCs require precise density functional theory (DFT) calculations to obtain Voc results. For example, Ne can be calculated as Ne = n × l, where n is the unit molecular mass and l is the number of quasi-degenerate LUMO levels (<0.1 eV) per molecule. [15] Equation (1) was proposed to explain the observed tunable Voc and to investigate the relationship between Ne and Voc in alloy model-based ternary OSCs.
This finding motivated us to develop reliable machine-learning approaches for understanding the relationship between the energy level tuning mechanism and the Voc. Although many studies have aimed to optimize the Voc by adjusting the energy levels of the donor and acceptor components, it remains a worthwhile challenge to construct a model from easily accessible descriptors that can quantify the Voc enhancement, extract the complex correlation between the Voc and the frontier orbital energies of the active blends, and identify the key parameter governing the Voc and ternary OSC performance.
Although fundamental knowledge of the Voc dynamics has been gained for binary OSCs, which have one type of interface (e.g., P3HT:PC60BM) and two energy offsets, the design rule for improved Voc in ternary OSCs (which include three types of interfaces and six energy offsets) is not well understood. Recent studies indicate that intensive research efforts are still needed to establish predictive models for device efficiency, which are essential to facilitate device design and guide future research. [16][17][18] For example, Troisi et al. reported that microscopic models could be used to identify the relationship between an isolated acceptor property (e.g., the LUMO and LUMO+1 orbitals) and device efficiency. [19] With this in mind, the present study makes systematic use of machine-learning approaches to investigate the effects of the frontier orbital energy levels of fullerene derivative-based ternary blends on Voc optimization. This proof of concept is realized by 1) applying machine-learning algorithms to generate regression models; 2) using evaluation metrics to assess the performance of the prediction models; and 3) using the variable ranking method to help identify the key features that may guide the design of novel ternary blends. To the best of our knowledge, this is the first report providing a design rule for predicting the Voc of ternary OSCs using a machine-learning approach.

Discussion
The Random Forest regression process used in this work is shown in Figure 1. Random Forest regression was used to learn an accurate functional relationship between the input features (the electronic properties of the ternary blends) and the output variable (the Voc values). For these complex ternary systems, the influence of the electronic features of the organic materials on Voc was revealed by carefully analyzing the results of the machine-learning model. As evaluation metrics, R² values were used to measure the goodness of fit of a machine-learning model, and root-mean-squared error (RMSE) values were used to estimate the deviation of the individual Voc values from the overall trend. The Random Forest regression model is an ensemble learning algorithm. Figure 2a compares the predicted and measured Voc values. The green line (the 45° line) is a visual aid that marks a perfect fit of the regression model. The Random Forest regression model fits the measured results well, with R² = 0.84 and RMSE = 0.030 for the training set and R² = 0.77 and RMSE = 0.144 for the test set. The model gives reasonably good predictions of the Voc over a wide range. These results suggest that the Random Forest regression model provides more accurate Voc predictions than the Support Vector regression model. Note that the model performance was obtained by randomly shuffling all data points and repeating the experimental process (i.e., the training and evaluation procedure) ten times. The regression results of the Random Forest model, including the average R² (training set), average R² (testing set), best R² (training set), and best R² (testing set), are shown in Table S1, Supporting Information. For visualization purposes, the aforementioned plots used in the performance comparison were taken from Experiment 1.
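As an illustration of how the two metrics can be evaluated for a set of predictions, the following minimal Python sketch computes the RMSE and an R² value. It assumes that R² is taken as the squared correlation between measured and predicted values; the function names and the Voc values are hypothetical and are not taken from the dataset.

```python
import math

def rmse(measured, predicted):
    """Root-mean-squared error between measured and predicted values."""
    n = len(measured)
    return math.sqrt(sum((m - p) ** 2 for m, p in zip(measured, predicted)) / n)

def r_squared(measured, predicted):
    """Squared Pearson correlation between measured and predicted values
    (an assumed definition of R^2 for this sketch)."""
    n = len(measured)
    mean_m = sum(measured) / n
    mean_p = sum(predicted) / n
    cov = sum((m - mean_m) * (p - mean_p) for m, p in zip(measured, predicted))
    var_m = sum((m - mean_m) ** 2 for m in measured)
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    return cov ** 2 / (var_m * var_p)

# Hypothetical Voc values in volts, for illustration only.
measured_voc = [0.60, 0.70, 0.80, 0.90]
predicted_voc = [0.62, 0.69, 0.83, 0.88]

print(f"RMSE = {rmse(measured_voc, predicted_voc):.3f}")
print(f"R^2  = {r_squared(measured_voc, predicted_voc):.3f}")
```

A lower RMSE and an R² closer to 1 on the test set than on the training set would be unusual; the gap between the two sets is what indicates how well a model generalizes.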
The learning curves in Figure 3 show the effect of the training set size on the performance of the machine-learning models. In Figure 3a, the difference between the training and test RMSE decreases slightly as the training set grows, which indicates that the Random Forest regression model has a good capacity for generalization.
To evaluate the performance of the Random Forest model on separate unseen data, available experimental results (PC60BM:PBDB-T:tBTI-IDT (A1/D1/D2), [20] PC60BM:PBDB-T:tPDI2NHex (A1/D1/A2), [20] and PC70BM:PTB7-Th:ITIC-Th (A1/D1/A2) [21]) were used to predict the Voc values of ternary OSCs. The predicted and measured results are shown in Table 1. According to the reported literature, [20] the PC60BM:PBDB-T:tBTI-IDT-based and PC60BM:PBDB-T:tPDI2NHex-based OSCs were fabricated under the same laboratory conditions for comparison. The relative errors for the PC60BM:PBDB-T:tBTI-IDT-based and PC60BM:PBDB-T:tPDI2NHex-based OSCs are 11.6% and 10.3%, respectively. For the ternary OSC based on the PC70BM:PTB7-Th:ITIC-Th active layer, the predicted Voc (0.80 V) and measured Voc (0.81 V) are in good agreement, and the small relative error (1.2%) indicates that the Random Forest model generalizes well to this unseen system. These results suggest that an appropriate Random Forest algorithm may be sufficient to construct a Voc prediction model and provide rules for the rational design of ternary OSCs.
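The relative error quoted for the PC70BM:PTB7-Th:ITIC-Th device can be reproduced with a short calculation. The sketch below is only illustrative: the function name is hypothetical, and it reports the magnitude of the error as a percentage of the measured value.

```python
def relative_error_percent(predicted, measured):
    """Magnitude of the relative error, in percent of the measured value."""
    return abs(predicted - measured) / abs(measured) * 100.0

# Values quoted in the text for PC70BM:PTB7-Th:ITIC-Th (in volts).
predicted_voc = 0.80
measured_voc = 0.81

err = relative_error_percent(predicted_voc, measured_voc)
print(f"relative error = {err:.1f}%")  # prints: relative error = 1.2%
```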
A strategy for ranking the importance of variables, based on the Random Forest model, has been developed to provide multivariate feature importance scores and has been successfully applied to MXenes [22] and structure-based drug design. [23] From the perspective of physical intuition, the Random Forest model proved well suited to examining the feature importance over the whole ternary OSC dataset, which was determined using the bagging and Gini impurity methods. [24] The results for each potential electronic feature are shown in Figure 4. The Random Forest regression model identified the LUMO and HOMO of donor 1 as the top two features influencing the Voc. These results indicate that the rational design of donor materials plays an important role in fabricating ternary OSCs. [25] Moreover, the LUMO levels of the donor materials were closely correlated with the OSC characteristics. For example, an efficient thermodynamic driving force for charge separation can be achieved by appropriately matching the molecular energy levels of the electron donors and electron acceptors. Figure 5a shows the Voc contour as a function of the HOMO levels of the donor and the third component. Likewise, the predicted Voc values for the ternary OSCs versus the LUMO levels of D1 and the third component are shown in Figure 5b. These results indicate that the Voc can be tuned through the HOMO-HOMO and LUMO-LUMO offsets in D1-third component pairs, demonstrating that the energy level alignment between donors and third components is the main mechanism influencing the Voc. These machine-learning results therefore suggest that fine-tuning the energy levels of the donors is not only essential for providing optical absorption complementary to the other ternary materials but also beneficial for increasing the Voc.
Note to Table 1: The relative error (%) between the predicted and measured Voc values was calculated as (predicted value − measured value)/measured value.
Based on the aforementioned machine-learning model results, we further established fundamental design principles for the energy levels of all the organic materials involved (donors, acceptors, and third components) to optimize the Voc of ternary OSCs. The optimal Voc for fullerene-derivative-based ternary blends is predicted to be >0.9 V and corresponds to a combination of frontier orbital energies of −3.3 to −3.5 eV for the LUMO level of donor 1, −5.7 eV for the HOMO level of donor 1, −3.9 eV for the LUMO level of the third component, and −5.8 to −5.9 eV for the HOMO level of the third component (Figure 6). For instance, a ternary PTBTz-2/ITIC/PC71BM OSC (HOMO level of PTBTz-2 −5.48 eV, LUMO level […] [26] in excellent agreement with the machine-learning results. Therefore, the careful selection of materials (donors, acceptors, and third components) based on the relative energies of their frontier orbitals is desirable for tuning the Voc through the formation of an energy cascade alignment and for eliminating energy losses during the charge-transfer process. The present study also provides evidence that the self-organizing cascade electronic energy structure is the key concept for overcoming the Voc limitation in ternary OSCs. [27] In cascade structure-based OSCs, the HOMO and LUMO levels of the third component (known as the cascade) are located between the HOMOs and LUMOs of the donor and acceptor materials. [28] The energy levels of the donors, acceptors, and third components must be carefully selected to form an "energy cascade structure" that can stabilize the transfer of electrons and holes between the three materials, thus increasing exciton dissociation and reducing charge recombination at each interface. [29,30] Thus, designing cascade structure-based OSCs requires third components with appropriate energy levels to fabricate cascade energy heterojunction devices. Although the origin of the tunable Voc in ternary OSCs is still debated, cascade-type energy band structures have been demonstrated to increase the Voc of these devices.
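The cascade criterion described above (third-component HOMO and LUMO levels lying between those of the donor and the acceptor) can be expressed as a simple check. The following Python sketch is a minimal illustration, not part of the reported workflow: the `Material` class and `is_cascade` function are hypothetical names, the donor and third-component levels are picked from the predicted optimum ranges, and the acceptor levels are assumed placeholder values rather than measured data.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Material:
    name: str
    homo: float  # HOMO energy in eV (negative, relative to vacuum)
    lumo: float  # LUMO energy in eV (negative, relative to vacuum)

def is_cascade(donor, third, acceptor):
    """Return True if both frontier levels of the third component lie
    strictly between the corresponding levels of donor and acceptor."""
    lumo_lo, lumo_hi = sorted((donor.lumo, acceptor.lumo))
    homo_lo, homo_hi = sorted((donor.homo, acceptor.homo))
    return lumo_lo < third.lumo < lumo_hi and homo_lo < third.homo < homo_hi

# Donor and third-component levels chosen inside the predicted optimum ranges;
# the acceptor levels are assumed placeholders for a fullerene derivative.
donor = Material("donor 1", homo=-5.7, lumo=-3.4)
third = Material("third component", homo=-5.85, lumo=-3.9)
acceptor = Material("acceptor (assumed levels)", homo=-6.1, lumo=-4.1)

print(is_cascade(donor, third, acceptor))
```

Such a check only screens for the ordering of the levels; it says nothing about the magnitude of the resulting Voc, which is what the regression model estimates.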
A ternary cascade charge transfer system benefits the device design of ternary OSCs in three ways: 1) it produces a higher Voc; 2) it suppresses geminate charge recombination; and 3) it increases the number of charge-transfer routes (e.g., donor-third component, third component-acceptor, and donor-acceptor). In this regard, the performance of the Random Forest model is reliable enough for the quantitative analysis of the schematic energy level alignment in ternary cascade systems. This machine-learning approach not only offers accurate energy level matching for the three organic materials but also promotes an understanding of the physical phenomena in such ternary OSC devices. Nevertheless, the predictive capability for Voc is still limited (e.g., R² > 0.9 was not reached) because the derived machine-learning model did not consider experimental details such as the effects of solvents and side chains on aggregation and the donor/acceptor interface morphology. [31][32][33][34][35][36] However, based on the analysis of the machine-learning model, considerable efforts can be made to develop a hybrid modeling framework (e.g., genetic algorithm, [37] knowledge base, and database) relating thin-film characteristics (e.g., the appropriate ratio of D1:A1:T) to device fabrication conditions (e.g., annealing temperature and solvent additive). The present results may also provide more insight into material selection criteria for designing ternary OSCs with improved Voc values. It is hoped that this approach will enable significant progress toward simple, easily calculable, and physically interpretable models to assist the development of OSCs. Moreover, the approach can be adopted for various optoelectronic devices, such as perovskite solar cells, [38] nonfullerene OSCs, [39] and organic light-emitting diodes.

Conclusions
A cascade energy level alignment of all the organic materials involved (donors, acceptors, and third components) should be carefully designed to ensure efficient exciton dissociation and reduce charge trapping in ternary OSCs. Using machine-learning approaches, this study demonstrates that the Voc of ternary OSCs can be predicted from the energy levels of the ternary blends with high accuracy. The combination of domain knowledge, feature selection, and a machine-learning model can provide design principles (e.g., reliable energy level tuning) for fabricating promising fullerene derivative-based ternary OSCs and reveal new correlations between the electronic properties of the organic materials and the Voc. It is hoped that these results will drive experimental efforts to develop efficient ternary OSCs with high Voc.

Experimental Section
Dataset Collection: First, 121 Voc data points were collected as the available dataset from recently reported review articles. [40] A total of 121 fullerene derivative-based ternary OSCs (26 polymer donors and 90 third components) were used in the study. The dataset consists of six potential variables as its input features and one response variable (i.e., Voc) as its target output. All six potential electronic features were frontier orbital energy levels (HOMO or LUMO) of the components of the ternary blends. In most of the source studies, ultraviolet photoelectron spectroscopy (UPS) was used to determine the HOMO levels of thin films, and inverse photoelectron spectroscopy (IPES) was used to measure the LUMO levels of thin films. For outlier detection, the Voc distribution of the ternary OSC dataset was examined (see Figure S1, Supporting Information). The dataset was randomly split into two parts: a training set (80%) for model training and a testing set (20%) for evaluating the model's prediction ability on unseen data. In a small dataset, the training and testing splits are more likely to differ in distribution, so a single split can give performance estimates with a large variance. To make the evaluation of the machine-learning model more reliable, a previously reported procedure [41] was followed: 1) randomly shuffle the dataset ten times; 2) go through the training and evaluation process for each shuffle; and 3) report the average of the scores.
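The shuffle, split, and average procedure described above can be sketched in plain Python as follows. This is an illustrative outline under stated assumptions, not the code used in the study: the data are synthetic random numbers with the same shape as the dataset (121 samples, six features), `fit_mean_baseline` is a hypothetical placeholder standing in for the actual regression model, and the function names are invented for this example.

```python
import math
import random

def rmse(measured, predicted):
    """Root-mean-squared error between measured and predicted values."""
    n = len(measured)
    return math.sqrt(sum((m - p) ** 2 for m, p in zip(measured, predicted)) / n)

def fit_mean_baseline(train_features, train_targets):
    """Placeholder model: ignores the features and predicts the training mean."""
    mean = sum(train_targets) / len(train_targets)
    return lambda feature_row: mean

def repeated_holdout(features, targets, fit, n_repeats=10, test_fraction=0.2, seed=0):
    """Shuffle, split 80/20, train, evaluate; repeat and average the test RMSE."""
    rng = random.Random(seed)
    indices = list(range(len(targets)))
    n_test = max(1, round(test_fraction * len(indices)))
    scores = []
    for _ in range(n_repeats):
        rng.shuffle(indices)
        test_idx, train_idx = indices[:n_test], indices[n_test:]
        model = fit([features[i] for i in train_idx], [targets[i] for i in train_idx])
        predictions = [model(features[i]) for i in test_idx]
        scores.append(rmse([targets[i] for i in test_idx], predictions))
    return sum(scores) / len(scores), scores

# Synthetic stand-in data: 121 samples, six energy-level-like features (eV)
# and a Voc-like target (V). These numbers are random, not experimental.
data_rng = random.Random(42)
features = [[data_rng.uniform(-6.5, -3.0) for _ in range(6)] for _ in range(121)]
targets = [data_rng.uniform(0.5, 1.0) for _ in range(121)]

avg_rmse, scores = repeated_holdout(features, targets, fit_mean_baseline)
print(f"average test RMSE over {len(scores)} shuffles: {avg_rmse:.3f}")
```

Reporting the spread of `scores` alongside the average gives a rough sense of how sensitive the evaluation is to the particular split.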
Model Building: This section presents the methodology (exploring the data, cleaning the data, engineering new features, training the model, and evaluating the model) used to construct and verify the machine-learning algorithms for predicting the Voc of ternary OSCs. The Random Forest algorithm is an ensemble machine-learning approach that consists of different regression decision trees (known as weak predictors), which are trained through bagging and random variable selection. In this study, each regression tree in the forest gave a prediction of the Voc of ternary OSCs. The Random Forest method aggregates the results of many regression trees (treated as independent votes) to determine the best possible prediction; this is analogous to majority voting in classification, whereas for regression the tree predictions are averaged. [42,43] A Random Forest algorithm with such an aggregation function is a successful technique for ensemble prediction. [44] The Support Vector regression algorithm was considered as a nonlinear prediction method that solves the regression problem by fitting a function whose deviations from the training targets are kept within a tolerance margin wherever possible, in contrast to support vector classification, which searches for a hyperplane that maximizes the distance between classes of samples. [29]
Performance Evaluation: A common measure of error is the RMSE, defined in Equation (2), and the coefficient of determination (R²) is evaluated by Equation (3), where n is the total number of data points; y_i is the measured value (the measured Voc); y′_i is the predicted value (the predicted Voc); ŷ is the mean of the measured values; and ŷ′ is the mean of the predicted values. These two metrics were used to evaluate the prediction performance of the machine-learning model. A high value of R² indicates that the predicted Voc closely fits the measured Voc. The RMSE measures the sample standard deviation of the residuals between the predicted and measured values.
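For reference, standard forms of the two metrics that are consistent with the symbols defined above can be written as follows. This is a reconstruction under assumptions: the RMSE is given with the common 1/n normalization, and the R² is written as the squared correlation between measured and predicted values, which is inferred from the fact that the mean of the predicted values appears among the defined symbols.

```latex
\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(y_i - y'_i\right)^{2}}

R^{2} = \frac{\left[\sum_{i=1}^{n}\left(y_i - \hat{y}\right)\left(y'_i - \hat{y}'\right)\right]^{2}}
             {\sum_{i=1}^{n}\left(y_i - \hat{y}\right)^{2}\,\sum_{i=1}^{n}\left(y'_i - \hat{y}'\right)^{2}}
```

With these forms, a perfect linear relationship between measured and predicted values gives R² = 1, while the RMSE vanishes only when every prediction equals its measured value.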
The RMSE was used to identify large errors and to evaluate the fluctuation of the model response in terms of variance. To exclude any statistical bias, the learning-curve approach was used to assess the data dependence of the machine-learning modeling. The learning-curve method shows the relationship between the RMSE and the training sample size, which is an extremely useful tool for diagnosing machine-learning model performance. [45,46]

Supporting Information
Supporting Information is available from the Wiley Online Library or from the author.