Development of the VGF Crystal Growth Recipe: Intelligent Solutions of Ill‐Posed Inverse Problems using Images and Numerical Data

Development of the Vertical Growth Freeze crystal growth process is a typical example of solving the ill‐posed inverse problem, which violates one or more of Hadamard's well‐posedness criteria of solution existence, uniqueness, and stability. In this study, different data‐driven approaches are used to solve inverse problems: Reduced Order Modelling method of Proper Orthogonal Decomposition with Inverse Distance weighting (ROM POD InvD), an approximation method of Kriging and Artificial Neural Networks (ANN) employing images, combination of images and numerical data and solely numerical data, respectively. The ≈200 training data are generated by Computational Fluid Dynamics (CFD) simulations of the forward problem. Numerical input data are related to the temperatures and coordinates in 10 characteristic monitoring points in the melt and crystal, while the image input data are related to the interface shape and position. Using the random mean squared error as a criterion, the Kriging method based on images and numerical data and the ANN method based on numerical data are able to capture the system behavior more accurately, in contrast to the ROM POD InvD method, which is based solely on images.


Introduction
The Vertical Growth Freeze (VGF) crystal growth process involves progressive upward freezing of a melt by shifting the axial DOI: 10.1002/crat.202300125temperature gradient in a furnace by changing the heating power over time.
The optimal growth recipe ensures a planar crystallization front and high growth rates.The numerical development of the growth process is a typical example of solving the ill-posed inverse problem and is therefore a challenging task.Ill-posed inverse (PDE) problems have been studied for a long time from an analytical and numerical point of view. [1,2]They include all problems that violate one or more of Hadamard's wellposedness criteria: solution existence, uniqueness, and stability (low sensitivity to input data changes). [3]The common optimization approach for process development implemented in commercial crystal growth software is based on the "inverse modelling" strategy, [4][5][6] which inherits Tikhonov regularization. [7]owever, this approach is limited to small number of specific independent parameters (power of heaters) and optimization parameters (temperature in monitoring points).Similar approaches proved to be unstable and costly. [7]he recent tremendous success of various data-driven applications in solid-state materials science and crystal growth has stimulated the present search for better solutions to inverse problem based on machine learning (ML). [8,9]The selection of the ML method depends on many factors and there is no ML method that fulfills all requirements at the same time.The choice depends mainly on the required accuracy and/or interpretability of the output, the number of outliers and missing data, the total number of data, the type of data (images, numeric data etc.), the ML goal (data classification, regression, clustering etc.), training time, data non-linearity and time dependence, etc.For regression task with high required accuracy, both for images and numeric data in the quasi-steady state, the following methods are promising: 1) Reduced Order Modelling method of Proper Orthogonal Decomposition (ROM POD), [10] 2) interpolation method of Kriging, [11] and 3) Artificial Neural Networks (ANN) of multilayer perceptron (MLP) type. [12]The first two methods were identified after a preliminary analysis comparing different prediction solvers for reduced models, using the coefficient of determination as the criterion for the ROM methods and Euclidean distance as the criterion for the pure interpolation methods.
The ANN was chosen because of its superior ability to pointwise approximate smooth nonlinear functions to any degree of accuracy.
The ROM POD is a technique for reducing the model complexity by reducing a large number of interdependent variables to a much smaller number of uncorrelated variables (analogue to Principal Component Analysis) while retaining as much as possible of the variation in the original variables.The interpolation method of Kriging predicts the value of a function at a given point by computing a weighted average of the known values of the function in the vicinity of the point.The MLP is the most popular feed forward ANN type and machine learning method in general that is used to estimate unknown functions that can depend on a large number of inputs.It is characterized by a high level of prediction accuracy, but also by the large amount of training data required.The above-mentioned approaches were assessed for different types of training data.
Two data types are common in VGF growth: image and numerical data.Images of interface position and deflection can be obtained from the characterization of grown crystals.For the determination of solid/liquid (s/l) interface shapes in VGF crystals, the Lateral-Photovoltage-Scanning-Method (LPS) [13] and Infrared transmission (IRT) analysis are common post-growth characterization methods.The LPS has been originally developed for the analysis of Si, SiGe, and Ge crystals but can be adjusted for the analysis of GaAs crystals. [14]Numerical data can be obtained from experimental measurements of process data or from the CFD simulations.Each data type has its own advantages and challenges.Numerical data from CFD simulations may have inherited biases due to unavailable or inaccurate material data, oversimplification of the furnace geometry, transport phenomena, and boundary conditions in the CFD model etc., [15] but have the important advantage that they can be generated for many different growth recipes.On the other hand, experimental data may have moderate data veracity and variety due to furnace ageing, and challengeable in situ in operando measurements in the aggressive environment with high purity requirements.Nevertheless, experimental observations are still the most convincing measure of the reliability of numerical predictions.
In our study, we used ≈200 numeric data sets and ≈200 images generated by axisymmetric quasi-steady-state CFD simulations of the forward problem.The ROM POD method was used for image input data, the Kriging interpolation method for both image and numerical data, while ANN was used for numerical data only.
Our data-driven methods were employed for the reconstruction of the growth recipe, i.e., for providing the heating power and growth rate values at different stages of the growth process (different s/l interface positions) for a given temperature field in melt and crystal in VGF-GaAs growth.
The VGF-GaAs process was chosen due to the recent strong revival of interest in the III-V crystals as the key future materials for the next generation of wireless communications, i.e., for the microelectronic and optoelectronic devices in 5G and 6G technologies. [16]ros and cons of the data-driven approaches will be discussed, and results will be compared.

Data Generation and Pre-Processing
A simplified geometry of the 4 inches VGF-GaAs furnace with five graphite resistance heaters is shown in Figure 1.The pyrolytic boron nitride (pBN) crucible contains 9 kg of GaAs charge covered with liquid boron oxide.The crystals were grown in Ar atmosphere.The heating power was varied in the range of 0-4 kW, and the growth rates were in the range of 0.1-5.4mm h −1 .Due to the rotational symmetry, the furnace was described by a 2D axisymmetric model.The ≈200 CFD simulations of the forward problem were performed using Crysmas commercial code, 130 of which were already used in our previous study. [17]orward simulations provided the values of temperature field in the furnace (Figure 2a) at different stages of the growth process (different s/l interface positions and deflection) for given heating power and growth rate.The interface deflection Δz was measured at the melt symmetry axis with respect to the three-phase junction (melt/crystal/crucible) and varied between concave (Δz > 0) and slightly convex (Δz < 0).
The data for the training database for inverse problems was extracted from these CFD results in the form of numerical and image data.Numerical input data were related to the temperatures and coordinates in 10 characteristic monitoring points in the melt and crystal (Figure 2c), while the image input data (Figure 2d,e) served as a surrogate for unavailable experimental images (e.g., LPS and/or IRT) and were related to the interface shape and position.All training image data were pre-processed in the form of black-and-white areas (Figure 3) and lines (Figure 4).
For image pre-processing, i.e., for extraction of the information on images, [18] we used two methods: Grey feature indicator function and Pixel number and the commercial software Odyssee A-Eye from Hexagon.The Grey feature indicator function converts the images into grey shades returning N features that can be interpreted as intensities of the grey levels.The best results were obtained for N = 10.The Pixel number counts the number   of pixels present in a specified min-max range which represents a color on the RGB scale.In our study, we set min = 0 and max = 1 to count the black pixels.
Numerical output data for inverse tasks were related to the growth rate y 1 and power of heaters y 2 -y 5 (Figure 1).

Data-Driven Methods
As mentioned before ROM-POD method was used for prediction of growth recipes with only images as input data.It included In- For prediction of growth recipes with images and numeric input data, the interpolation by means of Gaussian process (GP) regression (a.k.a.Kriging) was used.It allows to predict the value of an underlying random function Z = Z(x) at any arbitrary location of interest x 0 , i.e., the value Z(x 0 ), from the measured observations z(x i ) of Z(x) at the n ∈ N sample points x i .Due to the underlying GP, near-sample points get more weight in the prediction.Moreover, the GP provides a measure of uncertainty of the prediction in every point and considers the correlation among data, no matter whether that correlation is isotropic or anisotropic.In this study, the simulations were performed for quadratic basic function and stationarity h2.
For predictions using 10 numerical input data and 6 output data, the MLP networks were used.Before an MLP network can be trained, it is necessary to derive its architecture (i.e., the number of neurons and the number of hidden layers), the nonlinear activation function, and the learning method.If there are no additional restrictions on the network parameters, i.e., unknown connection weights and biases between neurons, then the number of those parameters should not be greater than the number of training data in order to provide their identifiability.However, this number is not always sufficient.The higher the risk that the network loses approximation capability, the closer the number of data is to the number of parameters.
If we want to use only 80% of the available data for training, then the above rule for the number of network parameters implies the following maximal numbers of hidden neurons: where N I is the number of inputs (10 in our study), N O is the number of outputs (6 in our case), N is the total number of datasets (200 in our case), N H1 is the number of neurons in the first hidden layer, and N H2 is the number of neurons in the second hidden layer.Finally, different architectures with 1 and 2 hidden layers were selected and compared using the 10-fold cross-validation method, [19] the sigmoid activation function, and the Levenberg-Marquardt training algorithm [20] with training to validation data ratio 9:1.In the next step, the number of hidden layers and hid-den neurons in each of them that had the smallest Root Mean Squared Error (RMSE) averaged over the validation sets from the 10 cross-validated networks were selected and used to train all of the data.
The RMSE is defined in Equation 3: where y p,i and y i are predicted and true values of outputs.
The MLP training and predictions were implemented using the commercial software, Matlab.
The motivation for using ANNs in this inverse task was fast and accurate prediction of the optimal growth recipe, i.e., the power of heaters and the growth rate, by setting desired  temperatures in monitoring points in GaAs and zero interface deflection during the growth.In this study, we chose the setvalues of temperatures in GaAs from the typical values of temperature gradients in the crystal and the melt in industrial processes, which are in the range of 15 K cm −1 and 2-10 K cm −1 , respectively. [21]n addition to the RMSE values, the results were also discussed with regard to the inputs (features) importance determined using the Random Forest (RF) method.
The RF is a supervised machine learning algorithm consisting of multitude of decision trees (DT). [22]Each DT has a tree structure, starting from the topmost root node with the initial dataset, undergoing recursive binary splitting down in each node until the terminal node is reached that either is in a given maximal depth or has at most a given minimum size.The splits are based on the sum of squared errors with respect to the means of the variable in the two subsets forming the split.From all possible splits of the dataset above the branch, a split is chosen that leads to the minimal sum of squared errors.Feature Importance is the sum of the RMSE reductions across all those interior nodes in which this input variable resulted in the highest RMSE reduction in the node.

Results and Discussions
The predictions of the growth recipe for different input data types (images, numerical, or both) using three different data-driven approaches (ROM-POD InvD, Kriging, and ANN) are presented in the form of parity plots for true and predicted values.
When only images were considered as training data, either BW or L type (see Figure 2d,e), the ROM-POD InvD method was unable to capture the system behavior as shown in Figure 5.Nevertheless, BW images showed better predictions (RMSE BW = 0.53762 vs. RMSE L = 0.892).Also, the correct trend was recognizable for the power of outer top and upper side heaters y 3 and y 4 .In this study, the accuracy of the predictions cannot be substantially improved by using both image types at the same time as they contain the same information.One of the reasons for the low predictive ability of ROM-POD InvD method may lie in the non-singularity of the solution, i.e., in the general experimental observation that the same interface shape and position can be achieved with different growth recipes.In contrast to the other two approaches, ROM-POD InvD did not have the temperature inputs, i.e., it knew only two instead of ten parameters.
For verification of the predictions of the inverse tasks and also testing the hypothesis of non-singularity of the solution, we used the predictions of inverse task y 1 -y 6 as inputs for the new forward task x 1,f -x 6,f and performed CFD simulations again.The resulting temperature distributions in GaAs provided new images of interface shape that were finally compared with the training images of the inverse tasks.Two cases/datasets A and B, shown in Table 1, were used for illustration purposes.
In case A, ROM-POD InvD method predicted similar growth recipe to the one that was used for generation of BW images (Table 1).This similar growth recipe ensures a similar interface position and a similar interface shape (Figure 6a,b).Still, a small difference of 4 K in maximal temperature in the furnace was observed.On the other hand, in case B (Table 1), ROM-POD InvD predicted different growth recipe that ensured a similar interface position y 1,f (Figure 6c,d), but with pronounced difference in interface deflection y 2,f .Also, a difference of 24 K in maximal Figure 8. Relative importance of inputs in forward problem obtained by Random Forest (RF) method.Notation: x 1,f -crystal growth rate, x 2,fheating power in inner top heater, x 3,f -heating power in outer top heater, x 4,f -heating power in upper side heater, x 5,f -heating power in lower side heater, x 6,f -heating power in bottom heater, y 2,f -interface deflection and y 3,f -temperature at GaAs free surface rim.The RF hyper parameters: n estimators = 50, minimal samples split = 2, minimal samples leaf = 1, maximal depth = 20, bootstrap = true.temperature in the furnace was observed.In view of this, the low prediction accuracy when using only images as input data is not only a consequence of the non-singularity of the solution but also of the nature of this data-driven method, the number and location of the input data in the parameter space, as reported in the literature. [23]hen considering the Kriging interpolation model with both images and numerical data for solving inverse tasks, the system behavior was fully captured as shown in the parity plots for 6 outputs in Figure 7.
The prediction accuracy was very high for all outputs except for the lower values of the growth rate y 1 .The reason may lie in the physics of the problem.During the VGF growth, heat is provided by the surrounding heaters and as a latent heat on the crystallization front.The faster the growth, i.e., the higher the value of y 1 , the more latent heat is generated, and its relative contribution with respect to the heating power of other heaters y 2 -y 6 is increasing.Still, the cumulative contribution of all other heaters is higher than the contribution of the y 1 .This conclusion can be supported by the estimation of the relative importance of inputs in forward problem using Random Forest method.The results are shown in Figure 8.
Consequently, at the lower values of the crystal growth rate y 1 , the physical system is less sensitive to latent heat changes, and the predictive accuracy of data-driven methods decreases.
Finally, when considering the ANN method with numerical data for solving inverse tasks, the system behavior was again fully captured, as shown in the parity plots for all 200 data for predicted versus true output values y i , i = 1,6 in Figure 9.
The most accurate predictions with the lowest RMSE value were obtained for the architecture 10-7-7-6 shown in Figure 10.
Very accurate predictions were obtained for all outputs except y 1, where moderate accuracy was observed by lower growth rates.The explanation of this behavior may lie in the physics of the problem, as discussed in the case of the Kriging method.
The motivation for derivation of ANN was to use it for fast and accurate prediction of the optimal growth recipe, i.e., power of heaters and the growth rate by setting desired temperatures in monitoring points in GaAs and zero interface deflection during the growth.Some temperature combinations were not feasible as input values because they resulted in negative predicted growth rates.In order to avoid the negative values of the growth rates y 1 , it is necessary to detect the range of input parameters x 1 -x 6 where they may occur.For this purpose, the following approach was taken.For each interface position x 1 , one of the corresponding cases out of 200 training data was selected.From this selected training data set, a reference case was derived by "flattening" the isotherms in the GaAs, i.e., by assuming that x 2 = 0, x 3 = x 9 , x 4 = x 7 , x 5 = x 6, and x 8 = x 10 .Further on, the temperature in one axial position/monitoring point was varied while keeping the other temperatures equal to the reference case.One example of the study of the dependences among variables was shown in Figure 11.From the obtained ANN predictions, it can be seen that increasing x 5 and x 6 strongly decreases the growth rate y1, while increasing x 3 and x 9 moderately decreases y 1 .Moreover, if x 5 = x 6 >1370 K, the growth rates reach negative values, and these input values should be avoided.
On the other hand, lower values of x 3 , x 9 , x 5 , x 6 and higher values of x 4 and x 7 accelerate crystal growth and are therefore advantageous.
Strictly speaking, a direct comparison of Kriging and ANN methods for solving inverse tasks is not feasible in this study because Kriging was trained on combined images and numeric data, while ANN was trained on numeric data only.Images provided more information about the interface shape than the numerical value of the interface deflection in the symmetry axis.With images, it was possible, e.g., to distinguish cases with "w", convex, and concave shapes.Nevertheless, some conclusions can be drawn with respect to the accuracy of the predictions.
Hence concerning the accuracy, both methods gave similar results and were able to predict the missing data with a similar level of confidence using small data volume.The corresponding error values were RMSE kri = 0.04255 and RMSE ANN = 0.04769, respectively.Both methods had a large variance in the region of lower growth rates.Concerning future applications, Kriging trained on experimentally obtained images from crystal characterization and small number of numeric data from CFD simulations, offers higher reliability of numerical predictions for a low cost.When very high accuracy is required based on solely CFD numeric data, the ANNs are recommended for inverse task as they can significantly improve their predictions, especially for the growth rate, when more numerical data points (i.e., variables) are placed in the radial direction in the melt.Unfortunately, more input variables require more training data to ensure model identifiability that is expensive.

Conclusion
The ROM POD InvD method using s/l interface shape images as input data to solve inverse problems in VGF-crystal growth is fast and requires small number of training data.Unfortunately, this method using only images was not able to capture the system behavior.Still, for some predicted variables correct trends were recognized.One of the reasons for the low prediction accuracy was the non-singularity of the solution when no information about the system is available outside of the s/l interface, e.g., no temperatures at different points in the melt and crystal.If Kriging interpolation model with both images and numerical data is considered, the method gives quick and accurate predictions using small amount of data.Potential usage of the experimental images can significantly improve data reliability and consequently the predictive performance, especially for the applications where CFD models are simple and material data insecure.The Kriging approach is suitable for the prediction of the growth recipe for certain positions, shape of s/l interface and all temperatures in the melt and crystal.
Artificial neural networks for solving inverse problems with small amount of numerical data captured accurately the system behavior and enabled study of dependencies among the variables.For inverse tasks, the feasible input temperatures must be evaluated prior to ANN predictions of the growth recipe.The prediction accuracy can be significantly improved when more input data are used and when large amounts of data are available.

Figure 1 .
Figure 1.Geometry of the 4-inch VGF-GaAs hot zone with two top heaters, two side heaters, and one bottom heater.Notation corresponds to the growth rate y 1 , and heating powers y 2 -y 6 , which are outputs for the inverse task.

Figure 2 .
Figure 2. Examples of CFD result for certain growth recipe and training data for inverse task extracted from it: a) the temperature field in the furnace, b)isotherms in GaAs with 4 K intervals (the s/l interface corresponds to T m = 1513 K) with red marked part used for image inputs, c) numerical inputs for inverse task in the form of interface position x 1 , interface deflection x 2 and temperatures x 3 -x 10 , d) image input in the form of a black area for right half of a GaAs crystal and a white area for a right half of a melt, which we will call BW images, and e) image input in the form of a line representing shape and position of the right half of s/l interface, which we will call L images.

Figure 3 .
Figure 3. Examples of image inputs in the form of a black surface for a right half of a GaAs crystal and a white surface for a right half of a melt for different growth recipes.

Figure 4 .
Figure 4. Examples of image inputs in the form of lines representing the shape and position of the right half of the s/l interface for different growth recipes.verseDistance weighting (InvD) method of multivariate interpolation, where the values in unknown points are calculated from weighted average of the values at known points.The weight is calculated by the inverse of the distance to each known neighboring point.The hyperparameters had the following values: number of modes = 0, number of closest neighbors = 2, shape of interpolation curvature (power) = 2.For prediction of growth recipes with images and numeric input data, the interpolation by means of Gaussian process (GP) regression (a.k.a.Kriging) was used.It allows to predict the value of an underlying random function Z = Z(x) at any arbitrary location of interest x 0 , i.e., the value Z(x 0 ), from the measured observations z(x i ) of Z(x) at the n ∈ N sample points x i .Due to the underlying GP, near-sample points get more weight in the prediction.Moreover, the GP provides a measure of uncertainty of the prediction in every point and considers the correlation among data, no matter whether that correlation is isotropic or anisotropic.In this study, the simulations were performed for quadratic basic function and stationarity h2.For predictions using 10 numerical input data and 6 output data, the MLP networks were used.Before an MLP network can be trained, it is necessary to derive its architecture (i.e., the number of neurons and the number of hidden layers), the nonlinear activation function, and the learning method.If there are no additional restrictions on the network parameters, i.e., unknown connection weights and biases between neurons, then the number of those parameters should not be greater than the number of training data in order to provide their identifiability.However, this number is not always sufficient.The higher the risk that the network loses approximation capability, the closer the number of data is to the number of parameters.If we want to use only 80% of the available data for training, then the above rule for the number of network parameters implies the following maximal numbers of hidden neurons:

Figure 5 .
Figure 5. Parity plot for ROM-POD InvD predictions for inverse task using either BW or L images as input data.

Figure 6 .
Figure 6.Examples of CFD results for the growth recipes given in Table 1 using power of heaters and growth rates equal: a,c) the training data for inverse tasks (true value) and b,d) predicted values from inverse tasks.The case A and B corresponded to the datasets with high and low ROM-POD InvD prediction accuracy, respectively.

Figure 10 .
Figure 10.The ANN architecture that assured the most accurate prediction for inverse tasks.

Figure 11 .
Figure 11.Examination of dependences among variables using the best ANN for inverse tasks assuming overall flat isotherms in GaAs, the interface position equal x 1 = 0.391m and the value of remaining inputs invariant and corresponding to the reference case x 3 = x 9 = 1520.75K;x 4 = x 7 = 1488.17K;x 5 = x 6 = 1292.5K;x 8 = x 10 = 1514.78K.

Table 1 .
Verification of ROM-POD InvD predictions using CFD simulations for the forward task.Cases A and B are selected examples with high and low prediction accuracy by inverse tasks, respectively.