Machine Learning for Ultra High Throughput Screening of Organic Solar Cells: Solving the Needle in the Haystack Problem

Over the last two decades the organic solar cell community has synthesised tens of thousands of novel polymers and small molecules in the search for an optimum light harvesting material. These materials were often crudely evaluated simply by measuring current–voltage (JV) curves under illumination to obtain power conversion efficiencies (PCEs). Materials with low PCEs were quickly disregarded in the search for higher efficiencies. More complex measurements, such as frequency/time domain characterisation, that could explain why a material performed as it did were often not performed because they were too time consuming or complex. This limited feedback forced the field to advance by a more or less random walk of material development and has significantly slowed progress. Herein, we present a simple technique based on machine learning that can quickly and accurately extract recombination time constants and charge carrier mobilities as a function of light intensity from light/dark JV curves alone. This technique reduces the time to fully analyse a working cell from weeks to seconds and opens up the possibility not only of fully characterising new devices as they are fabricated, but also of data mining historical data sets for promising materials the community has overlooked.

a) roderick.mackenzie@durham.ac.uk

I. INTRODUCTION
Over the last 22 years organic solar cell efficiencies have risen from 2.5% in 2001 1 to over 19% 2 today. Much of this increase in performance can be attributed to steady improvements in material systems 3,4. The first reported cells relied on blends of MEH-PPV/P3HT with C60 fullerene derivatives 1,5. In the late 2000s low band gap polymers started to emerge, such as alternating copolymers of fluorene with Donor-Acceptor-Donor (D-A-D) segments, for example PTPTB, with efficiencies around 10% 6. In the late 2010s the community moved away from fullerene-based acceptors towards small molecule acceptors, and with this came efficiencies nearing 20% [7][8][9]. Although efficiencies are steadily increasing at a rate of around 1% a year, it takes tremendous effort from thousands of researchers across the world to achieve this.
Furthermore, quantities such as device lifetime and efficiency still need to be significantly optimised before commercialisation of polymer cells can be considered 10,11. This points to another decade of slowly improving device performance that humanity can ill afford given rapidly rising global temperatures 12. Part of the reason for this slow progress in organic photovoltaic (OPV) development is a lack of timely and detailed feedback from device engineers to chemists 13,14. Typically a new material will be synthesised and then used to fabricate a few test devices using a handful of solvents and a few annealing temperatures.
Simple current-voltage (JV) curve sweeps are then performed to determine the Power Conversion Efficiency (PCE), Fill Factor (FF), Open Circuit Voltage (V_oc) and short circuit current (J_sc). These measurements take only seconds and allow the scientist to see whether the material has good photovoltaic properties. However, JV measurements do not reveal why the device/material works well or poorly, nor do they give hints as to how material form/function should be improved. To obtain this information one has to perform more time consuming measurements to extract key device parameters such as recombination rate, charge carrier mobility and measures of disorder. Examples of techniques that can extract this information are impedance spectroscopy (IS) 15,16, Intensity Modulated Photocurrent Spectroscopy (IMPS) 17, Intensity Modulated Photovoltage Spectroscopy (IMVS) 18,19, Transient Photocurrent (TPC) 20,21, Transient Photovoltage (TPV) [22][23][24] and charge extraction (CE) measurements 25,26.
Although considerable effort has gone into refining these methods, they remain complex and require expertise and equipment that are often not found in the same lab as the people with expertise in synthesis. Other approaches to obtaining fundamental device parameters, such as fitting numerical models to experimental data, can often take longer than the experiments themselves and also require expertise and models that are rarely found where the material is fabricated 27. Thus, without detailed characterisation, the scientist is very often left guessing as to why one molecule performs better than another or why devices fabricated under given conditions perform as they do. This makes it very difficult to determine the next steps in material/device optimisation.
Thus one can think of the development of OPV materials as a random walk, with chemists developing new materials and disregarding the majority of them because at first glance they do not perform. Some of the better performing materials are occasionally investigated with more comprehensive methods (such as P3HT:PCBM in the past and more recently PM6:Y6). This may well have led to promising materials being disregarded because they did not perform well in the first batch or two of fabricated devices, owing to the wrong choice of solvent, annealing conditions or molecular weight. We are in effect searching for a needle in a haystack, but in the dark.
Although this problem is serious in the academic setting, where a researcher may make a new material every few weeks, it is much worse in high-throughput labs where new materials are generated daily. Candidate materials are often only tested against a few standard combinations of donor/acceptor molecules and solvents, and annealed at a few temperatures, before being disregarded. Thus there exists a huge back catalogue of JV curves, both in the literature and in industry, for materials that were never fully analysed.
Our aim when writing this paper was to develop a method that can accurately extract the charge carrier recombination time (τ) and mobility (µ) as a function of light intensity using the simplest, quickest and easiest to perform set of experiments possible. We wanted a measurement technique that takes seconds to apply, that anybody without expensive lasers or frequency domain equipment can use, and that enables the feedback loop from device performance to material parameters to be efficiently closed for everyone in the community. We focused on the recombination time constant and charge carrier mobility because they can be used to identify whether recombination or transport is the key bottleneck in device performance, which can in turn give hints as to how to tune the molecular packing and/or morphology. Furthermore, when combined in the µ·τ product they give a standard benchmark for material performance [28][29][30]. Here we show that τ and µ can be extracted from JV curves alone using a combination of machine learning (ML) models trained on physically accurate device models. We compare the values of recombination rate and charge carrier mobility extracted by our new method to values extracted by more traditional frequency domain/transient measurements on both spin coated and evaporated cells. Thus we develop a high throughput tool that has the potential to close the feedback loop and accelerate device development.

A. Time domain measurements on evaporated devices
Two devices with the layer structure glass/ITO/nC60/C60/DCV-V-Fu-Ind-Fu-V:C60/MoO3/Ag were deposited by evaporation. In one device the substrate temperature was held at 50 °C during deposition of the active layer, while in the other device the substrate temperature was allowed to float at room temperature 31. The device structure is depicted in Figure 1a, while the molecular structures and example JV curves can be seen in Figure 1b. The 50 nm thick active layer was made by co-evaporating the small molecule donor DCV-V-Fu-Ind-Fu-V with C60. We performed TPV at open circuit and charge extraction at short circuit to measure the recombination time and the effective charge carrier mobility respectively. A summary of these measurements can be seen in Figure 2.
Both JV curve and transient measurements were performed at light intensities ranging from 0.025 Suns to 3 Suns. The charge carrier mobility measured at J_sc is a factor of two higher for the 50 °C device than for the room temperature device. This is attributed to slightly better transport properties caused by a more favourable morphology. Lifetimes at V_oc are almost identical for both devices; indeed, the JV curves in Figure 2d show that V_oc is very similar for both substrate temperatures. Our aim now is to see whether, using the JV curves alone (see Figure 1b) coupled with machine learning, we can predict all the data extracted using the transient measurements presented in Figure 2. JV curves are very quick and easy to measure, so if µ and τ could be extracted from these curves alone, months of measurement work could be saved. To do this we first set up the device structure in our drift-diffusion model OghmaNano 27,32.
The model solves Poisson's equation to account for electrostatic effects within the device, and the electron/hole continuity and drift-diffusion equations to describe carrier transport. To describe carrier trapping and recombination, the LUMO and HOMO Urbach tails are each split into 8 discrete trap levels, and a Shockley-Read-Hall capture/escape equation is solved for each energetic range. This approach allows carriers to be described in both energy and position space within the device. More detail about the model can be found elsewhere 27,33,34.
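The splitting of an exponential band tail into discrete trap levels can be illustrated with a short sketch. Assuming a tail density of states g(E) = (N_t/E_U) exp(-E/E_U), with E measured downward from the band edge and truncated at a depth e_max, the Python below divides the tail into 8 equal-width energy bins and integrates the density analytically in each. The equal-width binning, the truncation depth, the parameter values and the function name are illustrative assumptions, not the OghmaNano implementation.

```python
import math


def discretise_urbach_tail(n_total, e_urbach, e_max, n_levels=8):
    """Split an exponential (Urbach) tail into discrete trap levels.

    Assumes g(E) = (n_total / e_urbach) * exp(-E / e_urbach), E in eV below the
    band edge, truncated at e_max. Returns (bin_centre_eV, trap_density) tuples.
    """
    width = e_max / n_levels
    levels = []
    for i in range(n_levels):
        e_lo, e_hi = i * width, (i + 1) * width
        # Analytic integral of g(E) over [e_lo, e_hi]
        density = n_total * (math.exp(-e_lo / e_urbach) - math.exp(-e_hi / e_urbach))
        levels.append((0.5 * (e_lo + e_hi), density))
    return levels


# Hypothetical tail: 1e24 m^-3 traps, E_U = 50 meV, discretised down to 0.4 eV
for centre, density in discretise_urbach_tail(1e24, 0.05, 0.4):
    print(f"{centre:.3f} eV below band edge: {density:.3e} m^-3")
```

Each resulting level would then carry its own capture/escape balance; how the rates are parameterised is not shown here.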
Using this base device structure, 20,000 copies of the simulation file were made to form a sample set of 20,000 virtual devices. Each virtual device had randomly assigned electron/hole mobilities, trap densities, Urbach tail slopes and carrier trapping/escape constants. From these devices 20,000 corresponding sets of light and dark JV curves were generated. In addition, the calculated recombination rate at V_oc and charge carrier mobility at J_sc were stored for each device. This process is described in Figure 3. A neural network is then trained to map each JV curve onto these stored values; once the prediction error is sufficiently small, the weights are fixed and the model is ready to predict on experimental data. To test the ability of the network to extract µ and τ from previously unseen data, 20% of the 20,000 simulated devices are kept out of the training process and used at the end of training to assess the performance of the network. Once the model was trained on virtual data to our satisfaction, the experimental JV curves for each device in Figure 1b were fed into the neural network in an attempt to predict the values in Figure 2.
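A minimal sketch of how such a randomised sample set with a 20% hold-out might be drawn is shown below. The parameter names and ranges are hypothetical placeholders (the ranges actually used are those listed in Table I), sampling parameters that span several decades log-uniformly is an assumption, and the call to the drift-diffusion solver that would turn each parameter set into JV curves is not shown.

```python
import math
import random

# Hypothetical parameter ranges; the ranges actually used are those of Table I.
PARAM_RANGES = {
    "mu_e": (1e-9, 1e-5),      # electron mobility (m^2 V^-1 s^-1)
    "mu_h": (1e-9, 1e-5),      # hole mobility (m^2 V^-1 s^-1)
    "n_trap": (1e22, 1e26),    # trap density (m^-3)
    "e_urbach": (0.03, 0.10),  # Urbach energy (eV)
}


def sample_device(rng):
    """Draw one virtual device; ranges spanning > 2 decades are sampled log-uniformly."""
    device = {}
    for name, (lo, hi) in PARAM_RANGES.items():
        if hi / lo > 100:
            device[name] = 10 ** rng.uniform(math.log10(lo), math.log10(hi))
        else:
            device[name] = rng.uniform(lo, hi)
    return device


def make_split(n_devices=20_000, test_fraction=0.2, seed=0):
    """Return (training, held-out) lists of independently drawn parameter sets."""
    rng = random.Random(seed)
    devices = [sample_device(rng) for _ in range(n_devices)]
    n_test = int(n_devices * test_fraction)
    return devices[n_test:], devices[:n_test]


train, held_out = make_split()
print(len(train), len(held_out))  # 16000 4000
```

Because every device is drawn independently, taking the first 20% as the hold-out set is equivalent to a random split.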
The values of τ and µ predicted from the JV curves are shown in Figure 2 as solid triangles. The predicted values agree with the directly measured values to within one order of magnitude and accurately follow the trend of the experimental data. This demonstrates that there is indeed enough information in the JV curves alone to determine τ and µ. As V_oc is almost the same for both devices, the information gained from TPV is limited in our case. However, the machine learning model can also predict lifetimes at the maximum power point P_max. The inset in Figure 2b shows this prediction. As the maximum power point of the room temperature device lies at a lower voltage, the charge carrier density there may be lower than at the maximum power point of the 50 °C device, resulting in a longer carrier lifetime.
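Agreement "within one order of magnitude" can be quantified as the absolute deviation in decades, |log10(predicted/measured)|. The sketch below applies this to illustrative numbers, not to the measured data of Figure 2; the function name is hypothetical.

```python
import math


def decades_apart(predicted, measured):
    """Absolute deviation between two positive values, in decades."""
    return abs(math.log10(predicted / measured))


# Illustrative mobilities (m^2 V^-1 s^-1), not values read from Figure 2
predicted = [2.0e-8, 3.5e-8, 5.0e-8]
measured = [1.0e-8, 1.5e-8, 4.0e-8]

deviations = [decades_apart(p, m) for p, m in zip(predicted, measured)]
print(max(deviations) < 1.0)  # True: every point agrees within one decade
```

Working in decades rather than absolute differences treats over- and under-prediction symmetrically, which suits quantities that span many orders of magnitude.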

B. Frequency domain measurements on spin coated devices
In the previous section we compared the values of τ and µ extracted from JV curves by machine learning with those extracted from transient measurements. In this section, we demonstrate the general applicability of our ML approach by turning our attention to state- The above results represent a baseline against which to compare the machine learning.
Before we go further, however, it is worth underlining a point made in the introduction, namely that detailed characterisation is the bottleneck in device development: the above measurements took around 6 months to perform and analyse by hand. Again, the experimental JV curves for each device in Figure 6a were fed into the neural network in an attempt to predict the values in Figure 7. The predicted values are shown as solid triangles for mean values (the geometric mean in the case of charge carrier mobility), solid squares for electrons and solid circles for holes. Taking the top row of graphs first, the model predicts the electron mobility to be orders of magnitude higher than the hole mobility, in accordance with the literature 35. Furthermore, the predicted electron mobility is in good agreement with the experimental IMPS data. As electrons are the faster charge carrier species, they dominate the IMPS response; because of their low mobility, holes are unable to follow the high modulation frequencies. Examining the second row of graphs, the neural network predicts the absolute value of the recombination time constant as a function of light intensity very well, with the error being slightly higher at lower light intensities. Still, the error stays well below one order of magnitude, and the trend of the lifetime is also accurately reproduced. The bottom row of graphs compares the predicted µ_jsc,e·τ_Voc product to the measured values, with these trends also agreeing well.
Finally, it should be noted that the error bars on the ML results in Figure 7 were generated using a second neural network acting as an error estimation/confidence network. To train this error estimation network we used the 20% of the training set to which the µ/τ predicting network had not been exposed. The learning procedure was to ask the µ/τ network to guess τ and µ for a JV curve it had not yet seen, and then to ask the error estimation network to predict the expected error in that guess. The error estimation network was then iteratively trained to improve its estimate of how good the predicted values of µ/τ would be for a given JV curve. As is visible in Figure 7, the error estimation network is fairly confident in the predictive ability of the µ/τ network. This error should, however, not be treated as an absolute measure of accuracy, but rather as a flag indicating whether an experimental JV curve is far from anything the µ/τ network has encountered during training.
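The two-model scheme above can be sketched with a simpler stand-in. In this hypothetical example ordinary least-squares regression replaces both neural networks and the data are synthetic; the point is only the data flow: the value predictor is fitted on 80% of the data, and the error model is fitted solely on the held-out 20%, learning to map an input to the absolute error the predictor makes on it. All names here are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-in for (JV features -> log10 mobility); the noise grows with |x0|,
# so some inputs are intrinsically harder to predict than others.
n_samples, n_features = 5000, 30
X = rng.normal(size=(n_samples, n_features))
true_w = rng.normal(size=n_features)
y = X @ true_w + rng.normal(size=n_samples) * (0.05 + 0.5 * np.abs(X[:, 0]))


def fit_linear(features, target):
    """Least-squares fit with a bias term; returns a prediction function."""
    design = np.column_stack([features, np.ones(len(features))])
    coef, *_ = np.linalg.lstsq(design, target, rcond=None)
    return lambda f: np.column_stack([f, np.ones(len(f))]) @ coef


# 80/20 split: the value predictor never sees the held-out 20 %.
split = int(0.8 * n_samples)
predict_value = fit_linear(X[:split], y[:split])

# The error model is trained only on the held-out set: inputs -> |prediction error|.
X_hold, y_hold = X[split:], y[split:]
abs_error = np.abs(predict_value(X_hold) - y_hold)


def error_features(f):
    """Append |x| so that a linear model can represent input-dependent spread."""
    return np.column_stack([f, np.abs(f)])


predict_error = fit_linear(error_features(X_hold), abs_error)
```

Training the error model on data the predictor has never seen matters: on its own training data the predictor's residuals would understate the errors it makes on new curves.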

III. DISCUSSION
Above we have demonstrated that, using a combination of ML algorithms trained on simulated JV curves alone, one can build a tool to extract charge carrier mobility and recombination rate as a function of light intensity, thus removing the need for time consuming and costly characterisation. We anticipate this tool being used by the community to quickly screen new devices and materials, and also to screen the vast historical data sets available in the literature and in industry. The method can also be thought of as a tool to democratise the characterisation of OPV devices. Currently only well funded labs can perform mobility and lifetime measurements, as they require relatively expensive lasers. This tool will allow more people to start extracting this data.
In some ways it is remarkable that, using a simple drift-diffusion model and a machine learning algorithm, we are able to extract the carrier recombination time and charge carrier mobility as a function of light intensity. One would have thought that some type of transient measurement was needed to extract this information. However, this preconception comes from a human-centric view of solar cell measurements: one thinks measurements such as TPC and SCLC are needed to measure charge carrier mobility because that is what has been done in the past. Instead, we should approach the problem from the perspective of Shannon entropy. Entropy in information theory 36 is a measure of how much information a signal contains. For example, a photograph of a perfectly clear blue sky has low entropy (embodied information), as it simply tells you it is a sunny day. A picture of a cloudy sky, however, has higher entropy (embodied information), as it can tell you how high the clouds are, what type of clouds there are, and the likelihood of rain and of thunder. We should therefore think of electrical/optical measurements in the same context and ask how much embodied information a measurement signal contains. In this case our results show that JV curves do encode information about τ and µ that the neural network can find and decode.
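As a toy illustration of the sky analogy, the first-order Shannon entropy H = Σ p log2(1/p) of a signal's value histogram is zero for a uniform "clear sky" and grows as the signal takes on more distinct, equally likely values. The sketch below is a simplification: it ignores correlations between samples, on which the information content of a real measurement would also depend, and the example signals are invented.

```python
import math
from collections import Counter


def shannon_entropy(signal):
    """First-order Shannon entropy H = sum(p * log2(1/p)), in bits per sample."""
    counts = Counter(signal)
    n = len(signal)
    return sum((c / n) * math.log2(n / c) for c in counts.values())


clear_sky = [200] * 16                  # every "pixel" has the same value
cloudy_sky = [50, 120, 200, 255] * 4    # four equally frequent values
print(shannon_entropy(clear_sky))       # 0.0
print(shannon_entropy(cloudy_sky))      # 2.0
```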
Continuing this line of reasoning, there is no reason why we should focus our efforts on decoding JV curves, or other standard measurements such as TPC, alone. There may be another, as yet unknown, measurement that is as easy to obtain as a JV curve but contains more information that a machine learning algorithm can extract; in other words, an experiment designed for machine extraction rather than for human extraction. Indeed, it may be that the machine has to design its own perfect experiment to extract the maximum possible information from a solar cell. Regarding accuracy, although we demonstrated above that our method is accurate for the devices we chose, it should also be noted that it does not need to be completely accurate for all unusual classes of devices to be useful. Our method just needs to be good enough to show trends between devices and to flag up promising, unusual materials. This first sift can then be used to select devices to be investigated with more traditional experimental methods.
A general comment should be made about the measurement of τ and µ. The fundamental difficulty in measuring τ and µ in organic devices is that both are very strong functions of carrier density, due to the large number of trap states in these materials. Thus if the applied voltage, photon flux or contact materials are changed, τ and µ will change. It is therefore well known that different experiments, which subject a device to different conditions, will produce different values of mobility/lifetime.
For example, both Charge Extraction by Linearly Increasing Voltage (CELIV) and TPC are commonly used to measure charge carrier mobility. In CELIV the device is held at V_bi under constant illumination and a negative voltage ramp is applied to study charge carrier mobility, while in TPC the device is usually held at J_sc and the response of the device to a laser pulse is used to calculate the mobility. Generally such measurements produce values of mobility within an order of magnitude of each other, with trends that agree but values that are not identical.
Thus it should be noted that when we compare our simulated values to the experimental values we are not comparing identical quantities (as is always the case in organics). Our simulated effective mobility is defined as

$$\mu_{\mathrm{eff}} = \mu_{\mathrm{free}} \, \frac{n_{\mathrm{free}}}{n_{\mathrm{free}} + n_{\mathrm{trap}}}$$

where µ_free is the charge carrier mobility of completely free carriers, n_free is the density of completely free carriers and n_trap is the density of trapped carriers. The effective mobility is calculated for each charge carrier species separately, and an average mobility is calculated by taking the geometric mean:

$$\bar{\mu} = \sqrt{\mu_{\mathrm{eff},e} \, \mu_{\mathrm{eff},h}}$$

The lifetime τ is calculated by

$$\tau = \frac{n_{\mathrm{total}} - n_0}{R}$$

with n_total (p_total) being the total electron (hole) density in the device, n_0 the equilibrium free charge carrier density and R the total recombination rate.
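The three definitions, µ_eff = µ_free n_free/(n_free + n_trap), the geometric mean of the electron and hole effective mobilities, and τ = (n_total − n_0)/R, translate directly into code. The sketch below evaluates them for illustrative numbers; the function names and values are hypothetical, and in practice the densities and rates would come from the simulation output.

```python
import math


def effective_mobility(mu_free, n_free, n_trap):
    """mu_eff = mu_free * n_free / (n_free + n_trap): only the free fraction moves."""
    return mu_free * n_free / (n_free + n_trap)


def mean_mobility(mu_e, mu_h):
    """Geometric mean of the electron and hole effective mobilities."""
    return math.sqrt(mu_e * mu_h)


def lifetime(n_total, n_0, recombination_rate):
    """tau = (n_total - n_0) / R: excess carriers divided by the recombination rate."""
    return (n_total - n_0) / recombination_rate


# Illustrative SI values (m^-3, m^2 V^-1 s^-1, m^-3 s^-1), not simulation output
mu_e = effective_mobility(mu_free=1e-7, n_free=1e21, n_trap=9e21)   # about 1e-8
mu_h = effective_mobility(mu_free=1e-9, n_free=1e21, n_trap=9e21)   # about 1e-10
tau = lifetime(n_total=1e22, n_0=0.0, recombination_rate=1e27)      # about 1e-5 s
print(mean_mobility(mu_e, mu_h), mu_e * tau)  # geometric mean and mu*tau product
```

The geometric mean is the natural average here because electron and hole mobilities can differ by orders of magnitude, as in the PM6:DT-Y6 results above.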
Thus some of the discrepancy in the graphs may be due to slightly different definitions of mobility and time constant. Furthermore, it has been shown that charge carrier mobility results for the same device vary by up to one order of magnitude when different measurement techniques are used, and by up to a factor of three when different scientists analyse an identical data set 37.
The differences between the ML predictions and the experimental measurements are therefore within the expected experimental error.
Finally, in the above examples we used neural networks for the machine learning because we found them to be more accurate than other, more traditional methods. Neural networks do, however, require a lot of data and are relatively slow to train. For comparison, Figure 8 plots the results of four other machine learning methods: k-nearest neighbour regression (KNN) 38, random forest regression 39, extreme gradient boosting regression (XGBoost) 40 and support vector regression (SVR) 41.
The figure plots the R² score (accuracy) versus the time taken to train on the data set generated for the PM6:DT-Y6 device. The size of each bubble represents the size of the training data set; data sets of between 5,000 and 100,000 devices were used. The XGBoost algorithm is the fastest but also the least accurate, while SVR and KNN reach the same level of performance, with KNN being slower. The best performing method is the neural network, closely followed by the random forest. Each of these algorithms can be optimised; for example, the number and size of the layers in the neural network can be tuned to obtain the best performance. The results shown here represent our best efforts.
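For reference, the R² score is 1 − SS_res/SS_tot, where SS_res is the sum of squared prediction errors and SS_tot the sum of squared deviations of the true values from their mean. The stdlib-only sketch below computes it for a from-scratch k-nearest neighbour regressor on synthetic data and times the run. It illustrates the metric and the KNN idea only and does not reproduce the benchmark of Figure 8; the data set, k and function names are assumptions. Note that KNN does no real training, so its cost appears at prediction time.

```python
import random
import time


def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def knn_predict(train_x, train_y, query, k=5):
    """Average target of the k training points closest to query (Euclidean)."""
    nearest = sorted(
        zip(train_x, train_y),
        key=lambda pair: sum((a - b) ** 2 for a, b in zip(pair[0], query)),
    )[:k]
    return sum(y for _, y in nearest) / k


def target(x):
    """Synthetic ground truth standing in for a JV-derived quantity."""
    return x[0] + 2.0 * x[1]


rng = random.Random(1)
train_x = [(rng.random(), rng.random()) for _ in range(500)]
test_x = [(rng.random(), rng.random()) for _ in range(100)]
train_y = [target(x) for x in train_x]
test_y = [target(x) for x in test_x]

start = time.perf_counter()
pred = [knn_predict(train_x, train_y, q) for q in test_x]
elapsed = time.perf_counter() - start
print(f"R2 = {r2_score(test_y, pred):.3f} in {elapsed:.3f} s")
```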

IV. PREDICTING ON DATABASES
The real strength of the machine learning approach is revealed when large data sets have to be analysed, as it enables material parameters to be extracted that were never directly measured. Indeed, the devices may have been made and discarded years ago. As a demonstration of our method, the ML algorithm was used to predict mobility and trap state density from a set of over 10,000 historical JV curves held by Heliatek GmbH; the results can be seen in Figure 9. The original database only contained JV curves in the dark and at 1 Sun light intensity. The model identifies a clear correlation between V_oc and charge carrier mobility, as well as a clear correlation between PCE and trap density.
This technique would allow one to data mine these historical data sets and identify devices with optimal charge carrier transport properties that were potentially overlooked in the past.
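One way such a screen could be expressed is sketched below: compute the Pearson correlation between a measured quantity (V_oc) and a predicted one (log10 mobility), and flag devices whose PCE is below the median but whose predicted mobility is in the top decile, as candidates that may have been overlooked. The record fields, the thresholds and the five example devices are hypothetical; in practice the predicted values would come from running the trained model over every stored JV curve.

```python
import math
import statistics


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


def flag_overlooked(records, mobility_quantile=0.9):
    """IDs of devices with below-median PCE but top-quantile predicted mobility."""
    pce_median = statistics.median(rec["pce"] for rec in records)
    mobilities = sorted(rec["mu_pred"] for rec in records)
    cutoff = mobilities[int(mobility_quantile * (len(mobilities) - 1))]
    return [rec["id"] for rec in records
            if rec["pce"] < pce_median and rec["mu_pred"] >= cutoff]


# Hypothetical records: measured V_oc (V) and PCE (%), predicted mobility (m^2 V^-1 s^-1)
records = [
    {"id": "A", "voc": 0.70, "pce": 4.0, "mu_pred": 1e-9},
    {"id": "B", "voc": 0.75, "pce": 6.0, "mu_pred": 5e-9},
    {"id": "C", "voc": 0.80, "pce": 8.0, "mu_pred": 1e-8},
    {"id": "D", "voc": 0.85, "pce": 10.0, "mu_pred": 3e-8},
    {"id": "E", "voc": 0.72, "pce": 3.0, "mu_pred": 5e-8},
]
r = pearson([d["voc"] for d in records], [math.log10(d["mu_pred"]) for d in records])
print(round(r, 2), flag_overlooked(records))
```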

V. CONCLUSION
Above we demonstrated that one does not need complex time domain/frequency domain measurement techniques to access charge carrier mobilities and recombination time constants. This information is encoded within the far simpler to obtain current-voltage curves; one simply needs a relatively low cost computer to extract it. Furthermore, once trained, the machine learning models take a fraction of a second to apply, which means devices can be analysed as they are produced. This is important in the academic setting, but even more so in an industrial setting where tens of devices are produced per day. In addition, this approach will allow researchers to scour historical materials for promising candidates that we have skipped over as a community.

Current-voltage characterization.
A Keithley 236 SMU was used for voltage application and current measurement. AM1.5 illumination was provided by a Wavelabs LS-2 solar simulator. No aperture was used. The illumination was kept switched on for only two seconds per measurement to prevent the sample temperature from increasing. We measured from reverse bias to forward bias with no fixed sweep speed, because autoranging was enabled. Measurements were conducted in a nitrogen-filled glovebox.

Charge-extraction
For the charge extraction measurements, the same white light LEDs as used for the TPV measurements illuminate the device. The device is kept under short circuit conditions, and upon switching off the light, the decay of the current density from its steady-state short circuit value to zero dark current is recorded by measuring the voltage drop across a 50 Ω resistor connected to the 1 MΩ input of an oscilloscope (Tektronix TDS3032B) and converting the voltage to a current transient using Ohm's law. By integrating the current transient, the total carrier density can be calculated. This is used to calculate the effective mobility as previously described 44.
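The conversion described above, from recorded voltage to current via Ohm's law and then to carrier density via integration, can be sketched as follows. The device area and active layer thickness are hypothetical defaults, the baseline is assumed to be zero dark current, the integral uses the trapezoidal rule as an illustrative choice, and the subsequent mobility calculation of reference 44 is not reproduced.

```python
import math

Q_E = 1.602176634e-19  # elementary charge (C)


def extracted_carrier_density(times, voltages, r_load=50.0, area=1e-5, thickness=50e-9):
    """Carrier density (m^-3) from a voltage transient across the load resistor.

    times in s, voltages in V; area (m^2) and thickness (m) are assumed values.
    """
    currents = [v / r_load for v in voltages]          # Ohm's law, I = V / R
    charge = sum(                                       # trapezoidal integral of I dt
        0.5 * (currents[i] + currents[i + 1]) * (times[i + 1] - times[i])
        for i in range(len(times) - 1)
    )
    return charge / (Q_E * area * thickness)


# Synthetic transient: 50 mV peak decaying with a 1 us time constant
times = [i * 1e-8 for i in range(1001)]                 # 0 to 10 us
voltages = [0.05 * math.exp(-t / 1e-6) for t in times]
print(f"{extracted_carrier_density(times, voltages):.3e} m^-3")
```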

Details on training set generation
The device is replicated in the drift-diffusion simulation model OghmaNano, and 20,000 copies with randomly generated device parameters are made. For each copy the JV curves at the respective light intensities are simulated and saved together with simulation results such as charge carrier mobilities and recombination rates. The range of simulation parameters used by the drift-diffusion model is given in Table I. The simulated JV curves are sampled at -2.0, -1.0, -0.1, 0.0, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0 and 1.4 V.
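Sampling each simulated (or measured) curve at the fixed voltages listed above gives every device the same input length. The sketch below assumes linear interpolation and simply concatenates the dark and light curves into one 30-element vector; whether the currents are also normalised or log-scaled before entering the network is not stated here, so these choices, the toy curves and the function names are assumptions.

```python
import numpy as np

# Sampling voltages (V) as listed in the text
SAMPLE_VOLTAGES = np.array(
    [-2.0, -1.0, -0.1, 0.0, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0, 1.4]
)


def resample_jv(voltage, current):
    """Linearly interpolate a JV curve (increasing voltage) onto SAMPLE_VOLTAGES."""
    return np.interp(SAMPLE_VOLTAGES, voltage, current)


def feature_vector(dark, light):
    """Concatenate resampled dark and light curves: 15 + 15 = 30 inputs."""
    return np.concatenate([resample_jv(*dark), resample_jv(*light)])


# Toy diode-like curves on a fine grid from -2 V to 1.5 V (illustrative only)
v = np.linspace(-2.0, 1.5, 351)
j_dark = 1e-6 * (np.exp(v / 0.05) - 1.0)
j_light = j_dark - 100.0                 # constant photocurrent offset (A m^-2)
x = feature_vector((v, j_dark), (v, j_light))
print(x.shape)  # (30,)
```

`np.interp` requires the voltage array to be increasing, which matches the reverse-to-forward sweep direction used for the measurements.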
TABLE I: Range of simulation parameters used by the drift-diffusion model (columns: Parameter, Min, Max, Units).

The Neural Network
The neural network consists of an input layer, four dense layers with 200, 50, 50 and 50 neurons respectively, and an output layer. The TensorFlow Hyperband optimisation algorithm was used for the initial optimisation of the network topology. A full list of the hyperparameters used can be found in Table II.

We found it easy to access µ and τ using the machine learning model, but far harder to access the parameters upon which they depend. These include trapping-related parameters such as the Urbach energy, shown in Figure 10c, or the SRH capture cross-sections, shown in Figure 10d. This suggests that there is not enough information in the JV curve to independently extract n_trap, E_U and σ, and that other experiments containing more information (possibly temperature dependent measurements) would be needed to access them. We therefore define two types of parameters: visible macroscopic parameters, which can easily be extracted using ML, and hidden microscopic parameters, which, although important and with a measurable influence, cannot themselves be directly measured.
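The layer layout above can be written as a plain NumPy forward pass to make the shapes concrete. This is not the authors' TensorFlow model: the ReLU activations, He initialisation, the 30-element input (15 sampled voltages for each of the dark and light curves) and the two outputs (for example log10 µ and log10 τ at a single light intensity) are assumptions, since the activation function and output layout are not specified in this section.

```python
import numpy as np

# Input, four dense hidden layers (200, 50, 50, 50) and output; the input (30)
# and output (2) sizes are assumptions.
LAYER_SIZES = [30, 200, 50, 50, 50, 2]


def init_params(rng):
    """He-initialised weights and zero biases for each dense layer."""
    params = []
    for n_in, n_out in zip(LAYER_SIZES[:-1], LAYER_SIZES[1:]):
        w = rng.normal(0.0, np.sqrt(2.0 / n_in), size=(n_in, n_out))
        params.append((w, np.zeros(n_out)))
    return params


def forward(params, x):
    """ReLU hidden layers followed by a linear output layer."""
    for w, b in params[:-1]:
        x = np.maximum(x @ w + b, 0.0)
    w, b = params[-1]
    return x @ w + b


params = init_params(np.random.default_rng(0))
batch = np.zeros((4, LAYER_SIZES[0]))        # four dummy feature vectors
print(forward(params, batch).shape)          # (4, 2)
n_weights = sum(w.size + b.size for w, b in params)
print(n_weights)                             # 21452
```

Even with the wide 200-neuron first layer the model has only about twenty thousand trainable parameters, comparable in number to the 20,000 simulated devices.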

FIG. 1: a) Device architecture and schematic depiction of the transient techniques TPC/TPV and charge extraction; b) measured JV curves from 0.025 Suns to 1 Sun for the device evaporated at room temperature. Inset: the molecular structures of DCV-V-Fu-Ind-Fu-V and C60.

FIG. 2: a) Light intensity dependent charge carrier mobility measured using charge extraction for devices deposited at room temperature (blue) and 50 °C (red); b) light intensity dependent charge carrier lifetime measured using TPV for devices deposited at room temperature (blue) and 50 °C (red); c) the µ_jsc·τ_Voc product calculated from the above curves. In this figure the open triangles represent the experimental measurements and the solid triangles represent the ML results. d) JV curves for devices deposited at room temperature (blue) and 50 °C (red); the inset shows the charge carrier lifetime, with closed circles showing the predicted lifetime at the maximum power point P_max.
Generating this data set takes around two hours and provides the basis for training the machine learning algorithm. The advantage of training the machine learning algorithm on virtual data is that most machine learning algorithms are very data hungry, requiring thousands of examples to learn. Furthermore, it allows us to know exactly what the recombination rate at V_oc (and the mobility at J_sc) is, which would be hard to determine experimentally. The next task is to train the machine learning algorithm with the data, as depicted in Figure 4. For each device in turn, the light and dark JV curves are presented to the inputs of the neural network, and the network is asked to predict the charge carrier mobility and recombination rate as a function of light intensity on its outputs. At the start of training the model predicts these values quite poorly; however, as training progresses and the network sees more examples, the predicted values of µ and τ for each JV curve approach the correct values (more details on the training can be found in the SI). Once the network has been trained on all devices, the order of the devices is shuffled and training begins again; this process repeats until the network can correctly predict µ and τ for any given JV curve in the data set.
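The shuffle-and-repeat procedure described above is ordinary epoch-based mini-batch gradient descent. The sketch below shows that loop with a linear model standing in for the neural network and synthetic noiseless data; the learning rate, batch size, stopping tolerance and function name are hypothetical choices, not the settings used in the paper.

```python
import numpy as np


def train_sgd(X, y, lr=0.05, batch_size=32, tol=1e-4, max_epochs=200, seed=0):
    """Mini-batch gradient descent on 0.5*MSE with a fresh shuffle every epoch."""
    rng = np.random.default_rng(seed)
    w, b = np.zeros(X.shape[1]), 0.0
    for epoch in range(1, max_epochs + 1):
        order = rng.permutation(len(X))              # reshuffle the device order
        for start in range(0, len(X), batch_size):
            idx = order[start:start + batch_size]
            err = X[idx] @ w + b - y[idx]            # prediction minus target
            w -= lr * X[idx].T @ err / len(idx)      # gradient step for weights
            b -= lr * err.mean()                     # gradient step for bias
        mse = float(np.mean((X @ w + b - y) ** 2))   # error over the whole set
        if mse < tol:                                # small enough: freeze weights
            break
    return w, b, epoch, mse


rng = np.random.default_rng(1)
X = rng.normal(size=(1000, 5))
true_w = np.array([1.0, -2.0, 0.5, 0.0, 3.0])
y = X @ true_w + 0.7
w, b, epochs, mse = train_sgd(X, y)
print(epochs, mse)
```

Reshuffling each epoch prevents the model from seeing the devices in the same fixed order every pass, which would otherwise bias the sequence of weight updates.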

FIG. 3: Creation of the training data set by artificially generating devices with randomly assigned parameters in a drift-diffusion simulation. The JV curves in the dark and at 1 Sun, as well as the recombination rate at V_oc and the mobility at J_sc, are simulated and stored.
FIG. 5: a) Device structure; b) polymers of the active layer; c) device parameters as a function of DT-Y6 content.

FIG. 8: Comparison of the accuracy and training time of neural networks, k-nearest neighbour regression (KNN), random forest regression, extreme gradient boosting regression (XGBoost) and support vector regression (SVR) on the SN21 data set. The neural network performs best but is the slowest to train.

FIG. 9: Predicted device parameters for a database containing around 10,000 devices. The predictions are plotted against the experimentally determined V_oc or PCE. The colour code distinguishes planar and bulk heterojunction devices. a) Mobility at V_oc; b) trap state density for electrons.
Finally, we emphasise that experimental data should be viewed from an information-theory point of view. Maximising entropy by conducting the right combination of experiments will be key to optimising the use of machine learning.

VI. SUPPLEMENTARY MATERIAL

Fabrication of PM6:DTY6 devices

Materials: PM6 (95K) was purchased from Solarmer. DTY6 was provided by Prof. Lei Ying's group at South China University of Technology (SCUT), China. SnO₂ nanoparticles (Product N-31) were received from Avantama. The o-xylene solvent was purchased from Sigma-Aldrich. All materials were used as received without further purification.

Devices were fabricated in the inverted configuration ITO/SnO₂/PM6:DTY6/MoO₃/Ag. First, the ITO substrates were cleaned sequentially in water, acetone and isopropanol, then dried with compressed air. The SnO₂ nanoparticle dispersion was treated ultrasonically for 2 min and then filtered through a 0.45 µm polyamide (PA) filter before use. A 25 nm thick SnO₂ film was deposited on the ITO substrates by spin-coating and subsequently annealed at 200 °C for 30 min in air. Afterwards, active films with various donor/acceptor weight ratios (w/w), namely 1:0, 0.85:0.15, 0.7:0.3, 0.55:0.45, 0.3:0.7, 0.15:0.85 and 0:1, were spin-coated on top of the glass/ITO/SnO₂ stack in a nitrogen-filled glove box. For the ratios 1:0, 0.85:0.15 and 0.7:0.3, the total concentration was 9 mg/ml in o-xylene; for the ratios 0.55:0.45, 0.3:0.7, 0.15:0.85 and 0:1, it was 18 mg/ml in o-xylene. The film thicknesses were controlled by varying the spin speed. All films were annealed at 100 °C for 10 min in a nitrogen atmosphere. Finally, the devices were completed by depositing 10 nm MoO₃ and a 100 nm Ag electrode through a mask with an opening area of 0.104 mm² at 1 × 10⁻⁶ mbar.

Optical measurement of PM6:DTY6

The optical constants, i.e. the refractive index $n$ and the extinction coefficient $k$, were determined by spectroscopic ellipsometry (ME-L ellipsometer, Wuhan Eoptics Technology Co.). The samples were prepared on Si wafers under the same conditions used for device fabrication, without additional post-processing. Spectroscopic ellipsometry measures $\Psi$ (related to the amplitude of the polarised light) and $\Delta$ (related to its phase), which are associated with the complex Fresnel reflection coefficients $r_s$ (s-wave) and $r_p$ (p-wave):

$$\rho = \tan\Psi \, e^{i\Delta} = \frac{r_p}{r_s}$$

After obtaining $\Psi$ and $\Delta$, we used the Cauchy model to fit them and determine the thicknesses of the thin-film samples on the Si wafers, and then obtained the optical constants of the materials by fitting a Gaussian model and a Tauc-Lorentz model [42].
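To make the relation $\rho = \tan\Psi\, e^{i\Delta} = r_p/r_s$ concrete, the Python sketch below computes $\Psi$ and $\Delta$ for the simplest possible sample: a single ambient/substrate interface with no film on top. It is only an illustration of the definition, not the multilayer Cauchy, Gaussian and Tauc-Lorentz fitting described above. The function names are hypothetical, and the Fresnel sign convention for $r_p$ is an assumption, since textbooks differ; for absorbing media the branch of the complex square root must also be checked against the chosen convention.

```python
import cmath
import math


def fresnel_coefficients(n0, n1, theta0):
    """Return (r_p, r_s) for reflection at a single n0/n1 interface.

    theta0 is the angle of incidence in radians. Assumed sign convention
    (one of several in use):
      r_p = (n1 cos t0 - n0 cos t1) / (n1 cos t0 + n0 cos t1)
      r_s = (n0 cos t0 - n1 cos t1) / (n0 cos t0 + n1 cos t1)
    """
    cos_t0 = math.cos(theta0)
    # Snell's law n0 sin t0 = n1 sin t1; cmath.sqrt also accepts complex n1.
    sin_t1 = n0 * math.sin(theta0) / n1
    cos_t1 = cmath.sqrt(1 - sin_t1 ** 2)
    r_p = (n1 * cos_t0 - n0 * cos_t1) / (n1 * cos_t0 + n0 * cos_t1)
    r_s = (n0 * cos_t0 - n1 * cos_t1) / (n0 * cos_t0 + n1 * cos_t1)
    return r_p, r_s


def psi_delta(r_p, r_s):
    """Invert rho = tan(Psi) * exp(i * Delta) = r_p / r_s (angles in degrees)."""
    rho = r_p / r_s
    psi = math.degrees(math.atan(abs(rho)))
    delta = math.degrees(cmath.phase(rho))
    return psi, delta


if __name__ == "__main__":
    # Hypothetical transparent substrate with n = 1.5 in air, 70 deg incidence.
    r_p, r_s = fresnel_coefficients(1.0, 1.5, math.radians(70.0))
    print(psi_delta(r_p, r_s))
```

Two quick sanity checks follow from this convention: at the Brewster angle $\arctan(n_1/n_0)$ the p-reflection vanishes, so $\Psi \to 0$, and at normal incidence $r_p = -r_s$, so $\Psi = 45^\circ$ and $|\Delta| = 180^\circ$.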
Modulated and continuous illumination was provided by an Omicron A350 diode laser with a centre wavelength of 515 nm. A Zurich Instruments MFLI lock-in amplifier with the MF-IA, MF-MD and MF-5FM options was used to measure the sample current and voltage and to provide the voltage that modulates the laser. The illumination intensity was varied using neutral density filters mounted in a Thorlabs FW102C motorised filter wheel, combined with a continuously variable neutral density filter wheel. For the IMPS and IMVS measurements, the amplitude of the modulated illumination was set to 10% of the bias illumination intensity to ensure small-signal excitation. The laser was calibrated using a Newport 818-BB-21 biased silicon photodetector.

Transient Photovoltage

Transient photovoltage measurements were performed on complete devices to characterise the charge carrier lifetime at different charge carrier densities. To this end, the device is held at open circuit while an LED bias light (a ring of 6 cold-white and 6 warm-white LEDs) generates a background carrier density in the device. The LED light intensity is calibrated using the $J_{sc}$ value obtained with the solar simulator. An additional laser pulse (532 nm, 5 ns) from a Continuum Minilite Nd:YAG laser provides a small voltage perturbation, after which the voltage decays back to the steady-state open-circuit voltage. This voltage transient is recorded with a Tektronix TDS3032B oscilloscope using its 1 MΩ input and fitted with a single exponential. Finally, the small-perturbation lifetime is multiplied by the experimentally determined recombination order to yield the total charge carrier lifetime [43].
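The final correction step follows from a power-law recombination model. Assuming a recombination rate $R = k n^{\delta}$ with recombination order $\delta$, the small-perturbation lifetime probed by TPV is $\tau_{\Delta n} = (dR/dn)^{-1} = 1/(\delta k n^{\delta-1})$, whereas the total lifetime is $\tau_n = n/R = 1/(k n^{\delta-1})$, so $\tau_n = \delta\,\tau_{\Delta n}$. The Python sketch below fits the decay and applies this factor. It is a hedged illustration, not the authors' analysis code: the function names are hypothetical, and the single exponential is fitted by linear least squares on $\ln(V - V_{oc})$, a simplification that only works for clean decays lying above the baseline; noisy oscilloscope traces would more likely call for a non-linear fit.

```python
import math


def fit_single_exponential(t, v, v_ss):
    """Fit v(t) = v_ss + A * exp(-t / tau); return (A, tau).

    Uses ordinary least squares on ln(v - v_ss), which requires every
    sample to lie above the steady-state open-circuit voltage v_ss.
    """
    ys = [math.log(vi - v_ss) for vi in v]
    n = len(t)
    t_mean = sum(t) / n
    y_mean = sum(ys) / n
    s_tt = sum((ti - t_mean) ** 2 for ti in t)
    s_ty = sum((ti - t_mean) * (yi - y_mean) for ti, yi in zip(t, ys))
    slope = s_ty / s_tt                  # slope = -1 / tau
    intercept = y_mean - slope * t_mean  # intercept = ln(A)
    return math.exp(intercept), -1.0 / slope


def total_lifetime(tau_small_perturbation, recombination_order):
    """Convert the TPV small-perturbation lifetime to the total lifetime.

    For R = k * n**delta the total lifetime is tau_n = delta * tau_dn.
    """
    return recombination_order * tau_small_perturbation


if __name__ == "__main__":
    # Synthetic, noise-free transient: 5 mV perturbation on 0.8 V, tau = 2 us.
    t = [i * 0.5e-6 for i in range(21)]
    v = [0.8 + 5e-3 * math.exp(-ti / 2e-6) for ti in t]
    amplitude, tau_dn = fit_single_exponential(t, v, v_ss=0.8)
    # delta = 2.0 is a hypothetical recombination order, not a measured value.
    print(tau_dn, total_lifetime(tau_dn, 2.0))
```

For purely bimolecular recombination ($\delta = 2$) the total lifetime is therefore twice the measured small-perturbation lifetime.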

Feature and target normalisation
For both feature and target normalisation we employed rescaling (min-max normalisation):

$$x_{norm} = \frac{x - x_{min}}{x_{max} - x_{min}}$$

The minimum and maximum values are always inferred from the whole training data set before it is split into training and test sets. For the features, each light intensity is normalised separately. The rescaling maps all values in the simulated data set onto the interval [0.0, 1.0]. The minimum and maximum of the data set are stored and reused, so that the experimental data are rescaled consistently and the predicted values can be transformed back into their original value space.

Training of the model

To monitor the training, we use the mean squared error as the loss function,

$$\mathrm{MSE} = \frac{1}{n}\sum_{i=1}^{n} \left(y_i - \hat{y}_i\right)^2,$$

where $y_i$ are the target values and $\hat{y}_i$ the values predicted by the model. Mobility and recombination are in turn defined by other, more microscopic parameters such as the free charge carrier density $n_{free}$, the trapped charge carrier density $n_{trap}$, the Urbach energies $E_U$ and the capture cross-sections $\sigma$, or, mathematically:

$$\mu = f(n_{free}, n_{trap}, E_U, \sigma_{n,e}, \sigma_{p,h}, \sigma_{n,h}, \sigma_{p,e})$$

$$\tau = f(n_{free}, n_{trap}, E_U, \sigma_{n,e}, \sigma_{p,h}, \sigma_{n,h}, \sigma_{p,e})$$
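A minimal Python sketch of the rescaling and loss described above follows. It assumes, hypothetically, that the simulated features are stored as nested lists `curves[sample][intensity][voltage_point]` and that one pair of bounds is kept per light intensity, pooled over all samples and voltage points. The actual data layout and training code are not given here, so `Rescaler`, `fit_per_intensity` and `mse` are illustrative names, not the authors' implementation.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class Rescaler:
    """Min-max rescaling x_norm = (x - x_min) / (x_max - x_min).

    The bounds are kept so that experimental data can be rescaled with the
    same values and predictions mapped back with inverse().
    Assumes x_max > x_min.
    """
    x_min: float
    x_max: float

    @classmethod
    def fit(cls, values: Sequence[float]) -> "Rescaler":
        return cls(min(values), max(values))

    def transform(self, x: float) -> float:
        return (x - self.x_min) / (self.x_max - self.x_min)

    def inverse(self, x_norm: float) -> float:
        return x_norm * (self.x_max - self.x_min) + self.x_min


def fit_per_intensity(curves: List[List[List[float]]]) -> List[Rescaler]:
    """Fit one Rescaler per light intensity.

    curves[s][k] holds the current values of simulated sample s at light
    intensity k. Bounds come from the whole simulated set, before the
    train/test split, pooled over samples and voltage points.
    """
    n_intensities = len(curves[0])
    return [
        Rescaler.fit([j for sample in curves for j in sample[k]])
        for k in range(n_intensities)
    ]


def mse(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Mean squared error (1/n) * sum_i (y_i - y_hat_i)**2."""
    return sum((y - yh) ** 2 for y, yh in zip(y_true, y_pred)) / len(y_true)


if __name__ == "__main__":
    # Two hypothetical samples, two intensities, three voltage points each.
    curves = [
        [[-10.0, -5.0, 2.0], [-1.0, -0.5, 0.2]],
        [[-12.0, -6.0, 4.0], [-1.2, -0.4, 0.4]],
    ]
    scalers = fit_per_intensity(curves)
    print(scalers[0].transform(-12.0), scalers[0].transform(4.0))  # 0.0 1.0
```

Keeping the bounds inside the scaler objects mirrors the procedure in the text: the same `x_min` and `x_max` are reused for experimental curves, and `inverse` maps predicted targets back to physical units.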

FIG. 10: Confusion plots of the trained model's predictions on the test set for a) shunt resistance, b) mobility at $J_{sc}$, c) Urbach energy for electrons and d) SRH capture cross-section for free holes recombining with trapped electrons.

TABLE I: The range of simulation parameters used by the drift-diffusion model. Resistance and mobility were varied for the intensity-dependent data set. For the prediction on the database, all of the above parameters were considered.

TABLE II: Hyperparameters used for the ANN model for light-intensity-dependent prediction on JV curves.