Data‐Driven Cz–Si Scale‐Up under Conditions of Partial Similarity

In Cz–Si growth, the shape of the solid–liquid interface and the v/G ratio significantly impact crystal quality. This study utilizes a data‐driven approach, employing multilayer perceptron (MLP) neural networks and Bayesian optimization, to investigate the scale‐up process of Cz–Si under conditions of partial similarity. The focus is on exploring the influence of various process and furnace geometry parameters, as well as radiation shield material properties, on the critical measures of crystal quality. Axisymmetric CFD modeling produces 340 sets of 18D raw data, from which 14‐dimensionless derived data tuples are generated for the design and training of the MLP. The best MLP obtained demonstrates the ability to accurately assess the complex nonlinear dependencies among dimensionless numbers derived from CFD data and, on the output side, interface deflection and v/G. These relationships, crucial for scale‐up, are successfully generalized across a wide range of parameters.


Introduction
Scale-up is the process of transitioning from small-scale operations to commercial facilities, with the goal of maintaining physical similarity between the two systems and facilitating the transfer of small-scale results to the large scale without constraints.This is accomplished by employing the similitude principle, which involves keeping the dimensionless numbers that characterize transport phenomena constant during the scalingup process. [1]hese numbers (e.g., Reynolds Re, Grashof Gr, Schmidt Sc, Prandtl Pr, etc.) or their combinations (e.g., 1/Re⋅Pr, 1/Re⋅Sc, DOI: 10.1002/crat.202300342Gr/Re 2 , etc.) stem from the dimensionless form of the governing conservation equations for the transport phenomena occurring during crystal growth. [2]o establish similitude between crystal growth processes on two different scales, the following criteria must be satisfied: 1) geometric similarity: the model should have the same shape as the real application, typically scaled accordingly; 2) kinematic similarity: the flow patterns of the fluid should closely resemble each other; 3) dynamic similarity: the ratios of all the forces acting on corresponding fluid particles and boundary surfaces in the two systems must remain constant and 4) identical physical processes must be present at both scales.
In complex processes like Czochralski (Cz) crystal growth, adhering to similarity rules can be challenging due to several factors: 1) contradictory conditions: some conditions may conflict with each other, making it difficult to establish strict similarity; 2) material properties: real materials might not possess the same properties as the model materials used at a smaller scale (e.g., comparing salts to silicon); 3) phenomena differentiation: distinguishing between different phenomena, such as melt flow generated by crystal rotation, crucible rotation, or buoyancy, can be complex; 4) geometry variability: the geometry may only exhibit partial similarity; factors like the relative positions, material properties, shape, and aspect ratio of heaters, heating shields, and crucibles may vary, as depicted in Figure 1.
The following methodology, proposed in refs.[3-5], is aimed at investigating cases of partial geometrical similarity and uncertain dynamic similarity [3] in chemical engineering applications: 1) the processes are broken down into individual parts, which are then examined separately, 2) certain similarity criteria are deliberately abandoned and the subsequent effects on the overall process are checked, or 3) characteristic variables are defined differently.Unfortunately, the proposed methodology is not applicable to Cz growth, primarily due to the excessively strong correlation among the process parameters.
The aim of this study was to investigate the scale-up process under conditions of partial similarity and to examine the implications of such an approach the crystal quality focusing on Cz-Si growth as a case study.Cz-Si growth was selected due to its significance in modern electronics and high-performance photovoltaics, where increasing ingot size with substantial scale ratios while maintaining the crystal quality poses an ongoing challenge. [6]While there is no strict limit to the size of Cz-Si ingots, larger ingots require more energy and time to grow, are more prone to defects due to enhanced thermo-mechanical stresses and their handling is complex.
The conventional approach to scale-up in crystal growth involves a combination of computational fluid dynamics (CFD) simulations and model experiments conducted on a small scale, along with experiments that closely resemble real industrial conditions on a pre-industrial scale.Unfortunately, this approach is labor-intensive, costly, and time-consuming, and it lacks the ability to yield generalized findings.Consequently, it took 40 years to increase the silicon crystal diameter from 1 to 12 in. [7,8]n this study, the complex nonlinear relationship between crystal growth process parameters, furnace geometry, the s/l interface shape (Figure 2b), and the Voronkov criterion v/G [9,10] for defect-free silicon crystal was derived using artificial neural networks (ANN) and dimensionless numbers.For v/G values greater than 1.34 × 10 −3 cm 2 K −1 min −1 , the silicon crystal is vacancy-rich, while for v/G less than 1.34 The influence of the mentioned individual parameters on Cz-Si crystal quality has been the focus of extensive experimental and numerical research in the past, e.g.. [7,8, Typically, these studies have centered on assessing the effects of the parameter under investigation within the context of specific crystal and/or crucible sizes.
Figure 2. Key geometric and process parameters in Cz-Si growth related to: a) furnace hot zone (R cry -crystal radius, R cru -crucible radius, R SH -radial distance of the side heater to the crucible, H cry -crystal height, H SH -axial displacement of the side heater from the melt free surface, H shield -axial displacement of the radiation shield from the melt free surface, H m -melt height in the axis of symmetry, H BH -axial displacement of the bottom heater from the melt free surface, v pull -pulling velocity,  cry -crystal rotational rate,  cru -crucible rotational rate), b) interface deflection Δz.
For instance, Tsukada et al. [35][36][37] conducted a study to investigate how the radiation shield influenced stress within the crystal at various growth stages and pulling velocities, while maintaining a fixed crystal diameter of 0.406 m.Similarly, Kishida et al. [28] explored the impact of crucible rotational rates on the melt flow pattern while keeping both the crystal and crucible diameters fixed at 5 and 16 in., respectively.Furthermore, Friedrich et al. [26] examined the influence of pulling velocity on crystal quality in largescale processes, using crystal diameters of 210 mm and crucibles of 24 and 26 in.Derby and Brown [31] studied the effect of melt volume and crucible positioning relative to the heater on the crystal's radius and the shape of the melt/solid interface on a small scale.These simulations were conducted with a fixed crucible diameter of 14 cm and a pulling velocity of 5 cm h −1 .
In most recent large-scale Cz-Si growth studies, the impact of geometry and process parameters was explored in the presence of magnetic fields, [29,30] which lie outside the scope of this paper.
Despite a substantial body of research dedicated to Cz-Si growth, a comprehensive evaluation of the simultaneous influence of various factors affecting the process and equipment scaling-up is still lacking.
In our study, we opted for a data-driven approach using artificial neural networks (ANN) due to their demonstrated predictive accuracy and generalization capabilities, as shown in ref. [38].The numerical training datasets were generated through axisymmetric CFD simulations for Si ingot sizes ranging from 1 to 12 in.

Numerical Model and Methodology
The general methodology for a data-driven approach to scaling under partial similarity conditions comprises the following steps: 1) data generation utilizing CFD modeling; 2) extraction of data sets from CFD results, which encompass geometry data (D cru , D cry , H cry , H m , H SH , H BH , H shield , R SH (see Figure 2a), material data for the radiation shield at silicon melting temperature ( shield -heat conductivity of the radiation shield,  shieldemissivity of the radiation shield), process data (q BH -power of the bottom heater, q SH -power of the side heater,  cry ,  cru , v pull ), interface deflection along the crystal central axis (Δz) and the v/G criterion for defect-free crystal; 3) preparations of dimensionless numbers for fluid flow, heat transfer, furnace geometry, material properties, relative interface deflections Δz/R cry and relative Voronkov criteria (v/G)/1.34× 10 −3 cm 2 K min −1 ; 4) derivation of ANN to map relationships between the data; 5) comparison of various ANNs and selection of the best one using the Bayesian optimization method [39] ; 6) Implementation of the best selected ANN to create relationships between the dimensionless numbers for process and furnace design data in comparison to crystal quality measures (interface deflection and v/G) in a wide range.
This methodology outlines the systematic approach employed to analyze the relationships between various factors in the context of crystal growth and scaling.

CFD Model and Methodology
The training data for machine learning was generated through numerical axisymmetric quasi-steady-state CFD simulations of Cz-Si growth in a furnace equipped with a single side and bottom resistance graphite heater.The furnace also featured a radiation shield made of one of five different materials: graphite, quartz, SiC, TiC and ceramic, along with a quartz crucible.A schematic illustration of this set-up for Cz-Si growth in an Ar atmosphere is depicted in Figure 1.In total, we simulated 340 unique growth recipes across various furnace geometries.The following parameters were varied: crystal and crucible diameters, axial and radial position of the side heater, axial position of the bottom heater, power of heaters, pulling velocity, crystal and crucible rotation rates, material of the radiation shield, shield-to-melt distance, Si weight and fraction of Si crystalized.The parameter ranges are presented in Figure 3 in the form of parallel coordinates.
The transport phenomena occurring during Cz-Si growth are governed by the equations of continuity, Navier Stokes with the Boussinesq approximation, and energy balance.A detailed description of these equations is provided elsewhere. [2]he dimensional forms of the continuity, momentum, and heat transport equations in the melt, along with the corresponding thermal boundary conditions at the solid/liquid (s/l) interface, are provided in Equations 1-5.
The boundary condition for the momentum equation is defined as follows: the velocity component perpendicular to the s/l interface is zero, the velocity component parallel to the melt-solid interfaces equals that of the solid (crystal, i.e., the crucible wall), and the velocity component perpendicular to the melt free surface is also zero.
The dimensionless form of the continuity, momentum, and heat transport equations is derived in Equations 6-8 using the definitions of the dimensionless variables given in Equations 9-15 and the characteristic dimensionless numbers outlined in Equations 16-19.The notation is provided in Figure 2, apart from q SH (the power of the side heater) and q BH (the power of the bottom heater).
The results from the CFD simulations were presented as the central interface deflection and the ratio v/G along the symmetry axis.Interface deflection was measured relative to the threephase junction and varied from concave (Δz < 0) to convex (Δz > 0) shapes, as illustrated in Figure 2b.The findings from the CFD simulation were discussed with respect to dimensionless numbers, i.e., Reynolds number (Re), Grashof number (Gr) and Stefan number (Ste).
For modeling Cz-Si growth, the commercial code CGSim was used, and quasi steady-state axisymmetric simulations were conducted stepwise: 1) calculation of transport phenomena with a fixed s/l interface while adjusting the power of heaters; 2) calculation of transport phenomena with a calculation of the interface shape.

Machine Learning Methodology
A multilayer perceptron (MLP) type of feedforward ANN was chosen to approximate complex, nonlinear functions with 12 input and 2 output variables.The MLP comprises fully connected layers of neurons with a nonlinear activation function (sigmoid function in this study), divided into input, output, and hidden layers. [40]Prior to MLP training, 340 raw training data tuples originating from CFD-simulations in the form of 18D vectors (x1 … x16, y1,y2) were preprocessed, and the input variables were transformed from 16D to 12D and normalized between 0 and 1.As the data for the Re, Gr, and Ste numbers spanned many orders of magnitude, they were logarithmically transformed before normalization.Normalization ensures that all inputs have similar scales.The reduction in the number of inputs resulted from replacing direct crystal growth process variables with dimensionless numbers such as Re, Gr etc.
To efficiently and automatically determine the optimal MLP architecture, including the number of hidden layers and neurons in each layer, Bayesian optimization (BO) was applied.BO is a global optimization approach based on Bayesian statistics, particularly suitable for optimization of non-smooth functions. [39]sing the BO of a 10-fold cross validation MSE loss, the best architecture for the hidden layers was identified.For this study, we limited the maximum number of BO iterations to 500 and the maximum number of hidden layers to 14.
The fully connected MLP architectures searched by the BO included two kinds of combinations of hidden layers, each with a minimum of 2 neurons.The first kind featured a reduction in the number of neurons from the input layer to some latent layer, followed by an increase from that latent layer to the output layer.The second kind exhibited a non-decreasing or non-increasing trend from the input to the output layer.These configurations were constrained to ensure that the total number of unknown parameters, i.e., weights and biases of the network, did not exceed a predefined maximum.This maximum was determined by the amount of training data used in a 10-fold cross-validation, with 10% of the data allocated for early stopping, constituting 81% of the total number of inputs/outputs datasets.Early stopping is a technique used in training ANNs to prevent overfitting, particularly when dealing with small datasets and complex architectures. [41]verfitting occurs when an MLP performs outstandingly well on the training data but performs poorly on unseen or validation data.Early stopping helps in mitigating the overfitting problem by monitoring the network's performance on validation data during training and stopping it before it starts to overfit on them.

CFD Results and Training Database
The CFD simulations yielded 340 raw datasets, each represented as 18D vectors, comprising 16 CFD input parameters and 2 output parameters.Figure 4 illustrates all these raw data and their ranges using parallel coordinates.Each line in the diagram corresponds to a single data tuple (x1 … x16, y1, y2).This comprehensive raw database served as the foundation for machine learning training and analysis.
Visual inspection of the results reveals a significant influence of the pulling velocity, as confirmed by experimental investigations and our previous findings for Cz-Ge. [42]Logically, a higher pulling velocity corresponds to a higher v/G value.Conversely, as the crystal length increases (measured as the percentage crystallized), the v/G value decreases.To deepen our understanding of the relationships among the studied input and output variables, we selected several CFD results and created 2D plots that showcase all outputs in relation to one of the furnace geometric parameters (Figure 4), material properties of the radiation shield (Figure 5), and process parameters (Figure 6).These plots were generated while maintaining the remaining input parameters at a constant value.
The selected results highlighted that the geometric parameter H BH , the material properties of the radiation shield  and , pulling velocity v pull , and melt height H m are the most influential factors within the studied data range, contributing to an increase in v/G.Conversely, an increase in the geometry parameters H BH , H SH , and H m leads to a decrease in the value of the interface deflection ∆z.The sign of ∆z is defined in Figure 2b.
These results are consistent with the findings in the literature, e.g., they are in line with, [11,12] which explored the relationship between rotational rates, interface deflections, and temperature gradients in the crystal, as well as with, [17,24,32] which investigated  how radiation shield material affects interface deflection.Furthermore, the results align with the work of, [26] which explored the impact of pulling velocity on interface deflection.
The relationships between the raw data described above were established by selecting specific data pairs within a limited range.
To offer a more comprehensive approach applicable to a broader range of data and encompassing all variables, machine learning was utilized.
For the number of hidden layers ranging from 1 to 14, the corresponding counts of available MLP architectures, each characterized by a specific number of hidden layers and the number of neurons in each, as determined by the methodology outlined in Section 2.2-were as follows: 11, 58, 164, 325, 506, 670, 791, 868, 892, 871, 822, 754, 674, and 594, respectively.
These exactly 8000 combinations either featured a decreasingthen-increasing pattern in the number of neurons from the input layer to an intermediate layer and then to the output layer, or exhibited a non-decreasing or non-increasing trend from the input to the output layer.
To accelerate the search for the optimal MLP architecture among those 8000 available options, BO was employed.The five best MLP architectures and their corresponding MSE obtained after 500 iterations are given in Table 1.The lowest mean squared error (MSE) of 0.00525 was achieved for the MLP with 2 hidden layers, each containing 4 neurons (referred to as [4,4]).
The best-performing MLP was utilized to predict outputs for given inputs, and the results are visualized in Figure 7 as a parity plot, comparing datasets predicted by the top-performing MLP with those simulated using CFD.The precise prediction of two output variables based on twelve input variables, as depicted in Figure 7, allowed us to explore the scale-up of Cz-Si growth under conditions of partial similarity across a wide data range.

Scale up of Cz-Si Growth under Conditions of Partial Similarity
The motivation behind deriving the MLP neural network was to utilize it as a surrogate function for the fast and accurate prediction of the simultaneous influence of 12 parameters affecting the process and equipment scaling-up in Cz-Si growth under conditions of partial similarity.The selected results are presented in Figures 8-10.
The impact of growth process parameters, such as pulling velocity, crystal rotational rate, and heating power, which are inherent in Gr/Re cry 2 and Ste numbers, on the interface deflection and v/G is depicted in Figure 8.In this study, instead of using singular dimensionless numbers Gr and Re, we focused on the combined dimensionless Richardson number (Ri = Gr/Re 2 ).In Czochralski growth, both natural and forced convection coexist, and Ri signifies their ratio.The literature suggests that natural convection is negligible for Gr/Re 2 < 0.1, forced convection is negligible for Gr/Re 2 > 10, and neither is negligible when 0.1 < Gr/Re 2 < 10. [43] Our results highlight that for large Gr/Re 2 values (dominating natural convection), an increase in the Ste number (e.g., equivalent to a decrease in pulling velocity) leads to a decrease in Δz, indicating a downwards shift in the interface to negative Δz values.
In this scenario, less latent heat is generated at the solid/liquid interface, benefiting the interface shape.
For medium and small Gr/Re 2 values (dominating forced or mixed convection), an increase in Ste leads to an inflection point in Δz.This is a consequence of competing convections that may introduce instabilities in the interface shape.
Regarding v/G, for medium and large Ste values, both Gr/Re 2 and Ste have a very weak influence.Conversely, for small Ste values (equivalent to high pulling and growth velocities), a decrease in the Ste number leads to a strong increase in v/G, while a The influence of radiation shield material properties on interface deflection and v/G is illustrated in Figure 9.For both Δz and v/G, the emissivity of the radiation shield  shield is more impactful than its heat conductivity  shield .An increase in both  shield and  shield leads to a nearly linear increase in Δz and v/G across the entire studied range.These findings align with the well-known fact that radiation, associated with emissivity, becomes the predominant heat transport mechanism at temperatures above 600 °C.
The impact of specific furnace geometry parameters, such as the relative axial position of the side heater (H SH /H m ) and the radiation shield-to-melt distance (H shield /H cry ) on interface deflection and v/G is depicted in Figure 10.
For low H shield /H cry values (where the radiation shield is closer to the melt-free surface), an increase in H SH /H m values results in a substantial increase in Δz (i.e., the interface is shifted upwards) and a significant decrease in v/G.In contrast, for high H shield /H cry values, an increase in the H SH /H m number results in a moderate increase in Δz and a decrease in v/G.The combination of moderate H shield /H cry and small H SH /H m values leads to an inflection point in both Δz and v/G.5][36][37] Following the presented methodology based on the derived MLP network, it is easily possible to generate new data to investigate the influence of any combination of the 12 studied process and furnace design parameters on interface deflection and v/G.

Conclusion
MLP feedforward artificial neural networks, in conjunction with BO, were selected as data-driven methods to explore the scaleup process of Cz-Si under conditions of partial similarity.Using BO for the 10-fold cross-validation MSE loss, MLP was trained, and the best architecture was selected based on predefined constraints.
Numerical training datasets were generated through 340 axisymmetric CFD simulations for Si ingot sizes ranging from 1 to 12 inches.The MLP architecture with 2 hidden layers and 4 neurons in each accurately captured the complex nonlinear relationship among 12 input parameters, encompassing process and furnace geometry parameters, as well as radiation shield material properties, along with 2 output parameters: Δz and v/G.
Analyzing the relationships among these variables revealed a strong dependence on the parameter range, highlighting the limitations of the traditional approach in assessing parameter  effects within the context of a specific crystal and/or crucible size.Generalization, a crucial aspect of the employed data-driven approach, enabled the study of diverse scale-up scenarios across a wide range of parameters without similarity constrains.The results of this study offer a deeper understanding of the scale-up process, aiding in future developments and process optimization efforts.

Figure 1 .
Figure 1.Partial similarity of Cz-Si furnaces at different scales.

Figure 3 .
Figure 3.All 340 raw CFD results derived for various crystal growth process parameters and furnace designs, are displayed in a parallel coordinate system.The line color matches the value of v/G.The notation is provided in Figure2, apart from q SH (the power of the side heater) and q BH (the power of the bottom heater).

Figure 7 .
Figure 7. Parity plots for 340 normalized datasets obtained using the bestperforming MLP, compared to data obtained through CFD simulations for outputs: o 1 (normalized Δz/R cry ) and o 2 (normalized ratio of v/G to critical value v/G c ).

Table 1 .
Mean square error (MSE) of five best MLPs with various number of hidden layers and neurons resulted from 10-fold cross validation after 500 iterations of BO.