Aspects of image data preparation to extend a classification scheme for cleaning mechanisms to realistic soils

Knowledge of the cleaning mechanism is necessary to choose a suitable model for a cleaning simulation. In the present work, an existing classification scheme for cleaning mechanisms is considered. Although this framework is quite promising, the generation of training data constitutes a bottleneck, since the labeling was done manually and very roughly in order to supply the necessary number of samples in a reasonable time. This, in turn, causes the scheme to be inaccurate when applied to more realistic data. The aim of the present work is to improve the preparation of training data by introducing a semi-automatic labeling procedure. The labeling procedure involves a new perspective on the data and the application of a gradient filter procedure. Furthermore, fully convolutional networks (FCNs) are employed to generalize different gradient filters. The labeling procedure is significantly faster and more consistent than manual labeling. In addition, a proof of concept is provided showing that FCNs are a suitable technique for the present classification task.


INTRODUCTION
Cleaning and decontamination is one of the most important topics in the food processing industry [1]. Multiple cleanings per day may be required since various products are produced on the same facilities [2]. Various authors report that cleaning incurs high economic and ecological costs. To name an example: Rad and Lewis [3] analyzed the water consumption of market milk processors and found that cleaning causes 28% of total water consumption, which is the highest among the different contributions. Alvarez et al. [4] found that cleaning of processing machines in the dairy industry causes a daily downtime of 4 to 6 h. Although the mentioned figures were collected a decade ago, the problem is more relevant than ever in times of the energy transition.
Although cleaning incurs high costs, its dimensioning is done mostly empirically [5]. Cleaning processes are not fully optimized yet, due to a lack of methods. Simulation of cleaning processes would be a feasible approach for systematic variation of operating parameters with the aim of reducing ecological and economic costs [6]. However, cleaning processes are very complex and often involve time-dependent material properties and turbulent multiphase flows [7]. A cost-efficient way of conducting cleaning simulations of film-like soils with accuracies reasonable for industrial application is the Boundary Condition Cleaning Model (BCCM) approach, first introduced by Joppa et al. [8]. In a preliminary step, the flow within the component of interest is simulated by means of a computational fluid dynamics (CFD) simulation without considering the dimensions of the soil. In the actual cleaning simulation, the flow field is frozen and a scalar transport equation is solved for the transport of the soil within the flow. The knowledge about the soil and its removal is transformed into a suitable boundary condition for the scalar transport equation. In follow-up works of the same authors, further cleaning models were developed based on the concept of decoupling flow simulation and calculation of soil removal [6,9,10]. However, the newer models no longer require solving a scalar transport equation, since soil removal is described differently. How this is done depends on the cleaning behavior of the soil, that is, the specific cleaning mechanism that is active in a particular situation [7]. Hence, knowledge about the cleaning mechanism is mandatory for the applicability of the aforementioned models.
Various authors distinguish soils according to their cleaning mechanism. A detailed discussion of the literature can be found in Golla et al. [7]. The cleaning mechanisms considered in the present paper are based on the work of Köhler et al. [11]. These cleaning mechanisms are diffusive dissolution, cohesive separation, adhesive detachment, and viscous shifting (depicted in Figure 1 of Golla et al. [7]). Diffusive dissolution is the transport of soil molecules into the cleaning fluid, driven by a concentration gradient. Cohesive separation occurs when the cohesive strength within the soil is exceeded by hydrodynamic loads and soil chunks are removed. Since cohesive separation of small soil chunks close to the soil surface is similar to diffusive dissolution, the two mechanisms are impossible to distinguish from a macroscopic point of view. Therefore, within this paper, the term cohesive separation addresses both aforementioned cleaning mechanisms. Adhesive detachment is the counterpart of cohesive separation. Here, the adhesive strength between soil and substrate is overcome by hydrodynamic loads and large soil patches are removed at once. Finally, in the case of viscous shifting, the soil is or becomes flowable due to certain physical or chemical effects (e.g., melting).
In previous work, the present authors developed a machine learning (ML) algorithm that automates and objectifies the identification of the cleaning mechanism based on grayscale image data from cleaning experiments, using feed forward neural networks (NNs) [7]. The NNs were trained using image data generated in cleaning experiments with model soils exhibiting a clear cleaning mechanism. Later, they were applied to more realistic soils with spatial and temporal variation of the cleaning mechanism. Excellent accuracies above 95% were achieved in predicting the dominating cleaning mechanism throughout the whole cleaning process on the model soils. However, when predicting the cleaning mechanism resolved in time, the algorithm achieved only up to 80% accuracy. The application to more realistic soils exhibited good qualitative agreement; however, no quantitative assessment of the performance was made, since the labeling procedure was not applicable to realistic soils. For the data labeling, the cleaning experiments were investigated frame by frame, and regions belonging to a certain cleaning mechanism for a certain time were tagged accordingly. No pixelwise investigation was done, since this would have been too cumbersome manually. On the other hand, pixel-by-pixel and frame-by-frame labeled data would be necessary to obtain both a classification scheme working on realistic soils and the possibility of quantitative assessment.
In the present paper, an improved labeling strategy is developed to fulfill this requirement, and a first realization of a classification algorithm processing these data is provided. To label the data, a procedure based on gradient filters is introduced. For the classification, FCNs are used, a family of algorithms designed to obtain pixel-by-pixel segmentation of image data [12].

CLEANING EXPERIMENTS AND DATA PREPARATION
The cleaning experiments utilized in this work are described in detail in Golla et al. [7], so that only a brief summary is given here. The experiments were conducted with dried starch (pregelatinized waxy maize starch), dried ketchup, and petroleum jelly, each soil type being representative of one cleaning mechanism. A duct with rectangular cross section was used, with one of the larger side walls transparent to observe the cleaning process and the opposite wall soiled with the substance of interest. The cleaning process was monitored using a grayscale camera, and the measured quantity was the gray value intensity. In Figure 2 of Golla et al. [7], sample images and evolutions of the gray value over time are shown for each cleaning mechanism.
One major change applied in the present work is the way the data are observed during labeling. Previously, the videos were investigated frame by frame: this makes it easy to recognize cleaning, but it is hard to precisely locate the pixels where a cleaning mechanism is active. In the present work, instead of observing the spatial plane in each frame, planes spanned by one spatial coordinate and time are now the starting point. An example of how such a view is created is shown in Figure 1. Presenting the data this way makes it easier to locate the cleaning process in space and time, and for the educated viewer, it is very easy to judge whether a suggested labeling is suitable or not. In the present study, a total data basis of four experiments for each cleaning mechanism was employed. Three of these four experiments were used for training purposes. In each of the remaining experiments, one space-time plane was labeled manually to allow quantitative assessment of the accuracy. The manual labeling was done by investigating the evolution of the gray value over time for each spatial location and took around 30 min for each space-time plane.
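The change of view can be illustrated with a minimal sketch. It assumes, purely for illustration, that the video is held as a nested list indexed as `video[frame][row][column]`; the function names and the choice of the fixed coordinate are hypothetical and not taken from the original implementation.

```python
def space_time_plane(video, row):
    """Return the plane spanned by one spatial coordinate and time.

    video: nested list indexed as video[frame][row][column] (assumed layout).
    The result is indexed as plane[frame][column] for the fixed row.
    """
    return [list(frame[row]) for frame in video]


def pixel_evolution(video, row, column):
    """Return the gray value over time for a single pixel."""
    return [frame[row][column] for frame in video]


# Tiny synthetic block: 3 frames, 2 rows, 4 columns.
video = [
    [[9, 9, 9, 9], [8, 8, 8, 8]],
    [[9, 5, 9, 9], [8, 8, 4, 8]],
    [[1, 1, 9, 9], [8, 2, 2, 8]],
]
plane = space_time_plane(video, row=0)
evolution = pixel_evolution(video, row=0, column=1)
```

Each column of `plane` is the gray value evolution of one pixel, so a cleaning event appears as a connected region in the plane rather than as a change between separate frames.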

Gradient filter procedure
A typical image processing technique is edge detection, which is fundamental to the method for detecting cleaning modes presented here. When applying edge detection, the image gradient is computed numerically using finite differences. This is equivalent to performing the convolution of the image with a gradient filter, such as $[-1\ 0\ 1]$ (see Gonzalez and Woods [13]). The gradient filter procedure applied here consists of three steps and is illustrated in Figure 2.
To detect cleaning, a generalized gradient filter with filter radius $\Delta$ is applied. A small filter radius serves well to locate steep gradients, while a large filter radius identifies flat gradients. As a first step, the temporal gradient was computed. The gray values are stored per pixel and frame, addressed by the pixel index in $x$-direction, the pixel index in $y$-direction, and the frame index. For a given frame, the absolute value of the temporal gradient is obtained by applying the filter to the evolution of the gray value of each pixel over time. To allow the computation of the first and last $\Delta$ values, padding was applied by extending the evolutions of the gray value by $\Delta$ copies of the first value and $\Delta$ copies of the last value. In a second step, a threshold was applied to the filter result: all gradient values above the threshold were considered to be associated with cleaning. The threshold can be set either as an absolute or as a relative value. In the last step, the interval where cleaning is detected can be enlarged by applying a dilatation radius $\Delta_d$. Thus, the gradient filter procedure has three hyperparameters: the filter radius $\Delta$, the threshold, and the dilatation radius $\Delta_d$. Applying this procedure, one filter was tuned for each experiment. The result of the filter can be easily evaluated using the new labeling view discussed in Section 2. Note that the gradient filter procedure does not detect which cleaning mechanism is present. This information must be provided manually.
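The three steps can be sketched for a single gray value evolution as follows. This is a minimal illustration, not the original implementation: it assumes that the generalized filter takes the difference between the values $\Delta$ frames ahead and $\Delta$ frames behind (the direct extension of $[-1\ 0\ 1]$), and that a relative threshold is a fraction of the maximum gradient. All function and parameter names are hypothetical.

```python
def temporal_gradient(signal, radius):
    """Absolute difference between the values `radius` frames ahead and behind.

    The signal is padded with `radius` copies of its first and of its last
    value so that the gradient is defined for every frame.
    Assumption: the filter has the form [-1, 0, ..., 0, 1].
    """
    padded = [signal[0]] * radius + list(signal) + [signal[-1]] * radius
    return [abs(padded[k + 2 * radius] - padded[k]) for k in range(len(signal))]


def threshold_mask(gradient, threshold, relative=False):
    """Mark frames whose gradient lies above the threshold.

    Assumption: a relative threshold is a fraction of the maximum gradient.
    """
    limit = threshold * max(gradient) if relative else threshold
    return [g > limit for g in gradient]


def dilate(mask, dilation_radius):
    """Enlarge every detected interval by `dilation_radius` frames on both sides."""
    n = len(mask)
    return [
        any(mask[max(0, k - dilation_radius):min(n, k + dilation_radius + 1)])
        for k in range(n)
    ]


def label_cleaning(signal, radius, threshold, dilation_radius, relative=False):
    """Run the three steps: gradient, threshold, dilatation."""
    gradient = temporal_gradient(signal, radius)
    mask = threshold_mask(gradient, threshold, relative)
    return dilate(mask, dilation_radius)


# Synthetic gray value evolution with a removal event in the middle.
signal = [10, 10, 10, 10, 8, 5, 2, 1, 1, 1, 1, 1]
gradient = temporal_gradient(signal, radius=1)
mask = label_cleaning(signal, radius=1, threshold=2, dilation_radius=1)
```

For this signal, the thresholded gradient marks three frames, and the dilatation extends the interval by one frame on each side. As in the text, the sketch only detects where cleaning takes place; the cleaning mechanism itself still has to be assigned manually.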

Fully convolutional networks
FCNs are a subfamily of convolutional neural networks (CNNs) dedicated to pixel-by-pixel segmentation of images [12]. CNNs are based on the convolution operation, and the filters are learned during the training procedure. For that reason, CNNs are well suited for the present task, which requires generalizing a set of filters generated with the gradient filter procedure. Typically, CNNs consist of two parts: in the first part, convolutional layers, pooling layers, and activation functions are used to build features from the data. In the second part, the features are passed through fully connected layers, similar to a feed forward NN, to obtain an output for a certain task. Almost every CNN can be transformed into an FCN by replacing the second part with a convolutional layer and ensuring that the number of inputs matches the number of outputs [14].
The architecture used in the present paper consists only of convolution blocks and is illustrated in Figure 3. Pooling layers are omitted to keep the architecture simple. Each convolution block $j$ has a set of channels $C_j = \{1_j, 2_j, \dots\}$. The data were passed batch-wise through the network, that is, the data were partitioned into batches and the trainable parameters of the algorithm were updated after every batch. One pass of the algorithm over the whole training data is called an epoch. Within each convolution block $j$, three operations were used:
1. One-dimensional convolution
2. Batch normalization
3. Activation function
The result of the first step was calculated using

$$\hat{z}^{\,b}_{k,c_j} = \sum_{c_{j-1} \in C_{j-1}} \sum_{i=1}^{3} w^{\,i}_{c_{j-1},c_j}\, z^{\,b}_{k-2+i,\,c_{j-1}}.$$

Herein, $k$ is the frame index, $b$ is the sample number within the present batch, and $c_j$ the channel of interest in block $j$, whose set of channels is denoted by $C_j$. The first sum iterates over all channels belonging to the previous block, while the inner sum corresponds to the discrete cross-correlation operator. The upper bound, 3, equals the size of the kernel and could be any odd number. The weights $w^{\,i}_{c_{j-1},c_j}$ represent the filter kernels and are the trainable parameters of the FCN. Each block $j$ contains $3 \cdot |C_{j-1}| \cdot |C_j|$ trainable parameters, so that the number of parameters is independent of the input size. The values $z^{\,b}_{k-2+i,\,c_{j-1}}$ represent the output of the channel $c_{j-1}$ belonging to the previous block. If the sum is evaluated for the first block ($j = 1$), the values $z$ correspond to the input data, which consist of a single channel only. For evaluating the sum at the first and last entry, the data were extended by applying zero padding. As a second step, batch normalization was applied, where the results from the first step were normalized with the mean and standard deviation computed over the entire batch. In the third step, the data were passed through a nonlinear activation function. This is necessary to filter for particularly important features. In the present study, ReLU and tanh were tested, and the results were found to be similar. The results shown later on were obtained with tanh. To ensure that only one output is generated per input, the final convolution block was configured with two channels representing cleaning and no cleaning, respectively. The result was finally passed through a softmax activation function to decide which label to assign. The rest of the network was built symmetrically. The number of convolution blocks employed was determined by the block depth, and the number of channels within each block followed from the hyperparameters. The procedure just described involves three hyperparameters defining the architecture, among them the number of channels of the first block, $|C_1|$, and the block depth. The FCNs were implemented in Python 3.11 using the PyTorch library.
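To make the three operations within a convolution block concrete, the following sketch evaluates one block in plain Python. It is an illustration under stated assumptions, not the original PyTorch implementation: the data layout `batch[sample][channel][frame]`, the function names, the small constant `eps`, and the omission of a learnable scale and shift in the batch normalization are all choices made here. In PyTorch, the first step would correspond to a `torch.nn.Conv1d` layer with `kernel_size=3`, `padding=1`, and `bias=False`.

```python
import math


def conv_block(batch, weights, eps=1e-5):
    """Apply cross-correlation (kernel size 3), batch normalization, and tanh.

    batch:   batch[b][c_in][k], samples x input channels x frames
    weights: weights[c_out][c_in][i] with kernel index i = 0, 1, 2
    returns: out[b][c_out][k] with the same number of frames as the input
    """
    n_frames = len(batch[0][0])

    # Step 1: cross-correlation with zero padding at both ends.
    pre = []
    for sample in batch:
        padded = [[0.0] + list(channel) + [0.0] for channel in sample]
        pre.append([
            [
                sum(
                    w_in[i] * padded[c_in][k + i]
                    for c_in, w_in in enumerate(w_out)
                    for i in range(3)
                )
                for k in range(n_frames)
            ]
            for w_out in weights
        ])

    # Steps 2 and 3: normalize each output channel over the entire batch
    # (no learnable scale or shift, an assumption), then apply tanh.
    out = [[None] * len(weights) for _ in batch]
    for c_out in range(len(weights)):
        values = [v for sample in pre for v in sample[c_out]]
        mean = sum(values) / len(values)
        var = sum((v - mean) ** 2 for v in values) / len(values)
        for b, sample in enumerate(pre):
            out[b][c_out] = [
                math.tanh((v - mean) / math.sqrt(var + eps))
                for v in sample[c_out]
            ]
    return out


def n_parameters(n_in, n_out, kernel_size=3):
    """Number of trainable kernel weights of one block."""
    return kernel_size * n_in * n_out


# Two samples, one input channel, five frames; four output channels.
batch = [[[1.0, 2.0, 4.0, 7.0, 11.0]], [[0.0, 0.0, 1.0, 0.0, 0.0]]]
weights = [
    [[-1.0, 0.0, 1.0]],   # gradient-like kernel
    [[1.0, 1.0, 1.0]],    # smoothing kernel
    [[0.0, 1.0, 0.0]],    # identity kernel
    [[1.0, -2.0, 1.0]],   # curvature-like kernel
]
out = conv_block(batch, weights)
```

The first kernel in the example equals the gradient filter $[-1\ 0\ 1]$, which illustrates why such a block can represent, and through training generalize, filters of the kind used in the gradient filter procedure. The parameter count `n_parameters(1, 4)` depends only on the channel numbers, not on the number of frames.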

Training
In total, four different FCNs were trained. One of the FCNs was trained with all nine experiments with the objective of identifying cleaning in general (FCN-All). Each of the remaining FCNs was trained to detect a single cleaning mechanism, each using three experiments (FCN-C, A, V). The training was conducted over a maximum of 100 epochs with a batch size of 10 000, where each experiment provides around 8000 samples. The samples were taken in such a way that the length of the samples, that is, the number of frames, was always the same, and the gray values were always normalized with the first value [7]. To avoid imbalances, class weights were assigned such that each class contributed equally to the total loss. Throughout the training, the cross entropy loss was utilized. At the beginning of the training, the Adam optimizer was used with an initial learning rate of 0.01. Every 30 epochs, the learning rate was reduced by a factor of 0.5. If the validation loss did not improve for 20 epochs, the optimizer was switched to SGD with a momentum of 0.9. This procedure provided better results. Every time the validation loss improved, the current model parameters were stored, overwriting the previously stored parameters. If the validation loss did not improve for 30 epochs, the training was stopped early.
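The control logic of the training schedule can be sketched independently of the network. The following illustration replays a list of validation losses and reports when the optimizer would be switched, which epoch would be checkpointed, and when training would stop. It assumes that the counter of epochs without improvement is not reset when the optimizer is switched, which the description above does not specify; all names are hypothetical.

```python
def learning_rate(epoch, initial=0.01, factor=0.5, step=30):
    """Learning rate reduced by `factor` after every `step` epochs."""
    return initial * factor ** (epoch // step)


def run_schedule(validation_losses, switch_patience=20, stop_patience=30):
    """Replay validation losses and report the schedule decisions.

    Assumption: the counter of epochs without improvement keeps running
    after the switch from Adam to SGD.
    """
    best = float("inf")
    best_epoch = None          # epoch whose parameters are stored
    since_improvement = 0
    optimizer = "adam"
    switch_epoch = None
    stop_epoch = None
    for epoch, loss in enumerate(validation_losses):
        if loss < best:
            best, best_epoch, since_improvement = loss, epoch, 0
        else:
            since_improvement += 1
        if optimizer == "adam" and since_improvement >= switch_patience:
            optimizer, switch_epoch = "sgd", epoch
        if since_improvement >= stop_patience:
            stop_epoch = epoch
            break
    return {
        "best_epoch": best_epoch,
        "switch_epoch": switch_epoch,
        "stop_epoch": stop_epoch,
    }


# Validation loss improves for 10 epochs and then stagnates.
losses = [1.0 - 0.01 * e for e in range(10)] + [0.95] * 40
report = run_schedule(losses)
```

In this example the best parameters stem from epoch 9, the optimizer is switched after 20 epochs without improvement, and training stops after 30 such epochs, well before the maximum of 100 epochs.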

Gradient filter results
First, the gradient filter procedure was applied to the manually labeled experiments. An individual filter could be tuned for each experiment within a minute, and the resulting classification mask is compared to the manual result in Figure 4. Quantitative comparison is done using the accuracy metric, which is defined as the number of correctly classified samples divided by the total number of samples classified. Since the samples are not equally distributed among the classes, the intersection over union (IoU) metric was used in addition. The IoU metric compares the pixel mask obtained manually to the mask obtained by the procedure to be assessed. As the name indicates, it is defined as the ratio between the intersection and the union of these masks.
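Both metrics can be written down in a few lines. The sketch below operates on flattened binary masks (1 for cleaning, 0 for no cleaning); the function names and the convention of returning 1.0 when both masks are empty are assumptions made for this illustration.

```python
def accuracy(predicted, reference):
    """Fraction of samples whose predicted label equals the reference label."""
    correct = sum(p == r for p, r in zip(predicted, reference))
    return correct / len(reference)


def iou(predicted, reference):
    """Intersection over union of the positive regions of two binary masks."""
    intersection = sum(1 for p, r in zip(predicted, reference) if p and r)
    union = sum(1 for p, r in zip(predicted, reference) if p or r)
    # Assumed convention: two empty masks agree perfectly.
    return intersection / union if union else 1.0


# The predicted interval is shifted by one frame relative to the reference.
reference = [0, 0, 1, 1, 1, 0, 0, 0, 0, 0]
predicted = [0, 0, 0, 1, 1, 1, 0, 0, 0, 0]
acc = accuracy(predicted, reference)
overlap = iou(predicted, reference)
```

In the example, a shift of one frame leaves the accuracy at 0.8 because most samples belong to the no-cleaning class, while the IoU drops to 0.5. This illustrates why the IoU is the more sensitive measure for unevenly distributed classes.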
All metrics computed are listed in Table 1. For the gradient filter procedure, the lowest accuracy is obtained for viscous shifting and the highest accuracy for adhesive detachment. The accuracy metric is very good in all cases, and Figure 4 confirms that misclassifications only arise from a slight under- or overestimation of the time when the cleaning process starts or ends. For all cases, IoU metrics above 70% were achieved. To gain more insight into the quality of the classification result, an evolution of the gray value over time was considered for each cleaning mechanism and the labelings were compared (Figure 4, last row). In the case of cohesive separation, the labels obtained manually start earlier than the result of the filter. The example highlights that it is hard to tell when the cleaning process really starts. The advantage of the filter procedure is that it tends to introduce a systematic error, while manual labeling always involves random uncertainties. Systematic errors can be compensated for using a suitable postprocessing technique, for example, by applying a one-sided dilatation to the classification here. The example of viscous shifting is the most complex, since it might involve reattachment of the soil to the wall. In this case, the filter procedure associates more regions with cleaning than were assigned manually.
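A one-sided dilatation of the kind mentioned above could look as follows. This is a hypothetical sketch: it extends each detected interval only toward earlier frames, which matches the cohesive separation example where the manual labels start earlier than the filter result. The function name and the radius are illustrative, not values from the study.

```python
def dilate_backward(mask, radius):
    """Extend each detected interval by `radius` frames toward earlier times.

    A frame becomes positive if any of the frames k, k + 1, ..., k + radius
    is positive in the input mask; the end of each interval is unchanged.
    """
    n = len(mask)
    return [any(mask[k:min(n, k + radius + 1)]) for k in range(n)]


mask = [False, False, False, True, True, False, False]
shifted_start = dilate_backward(mask, radius=2)
```

Only the start of the interval moves, so a systematic delay in the detected onset can be corrected without affecting the detected end of the cleaning process.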

Fully convolutional networks
Finally, FCNs were considered. Inspecting the masks in Figure 4, only a slight difference is visible between the results of the gradient filter procedure and those of the FCNs trained for each cleaning mechanism (FCN-C, A, V). In the case of cohesive separation, the size of the green region at the beginning of the cleaning is increased, indicating a stronger deviation from the manual labels. Table 1 shows that the metrics only decreased in the case of cohesive separation.
The FCN-All, which is trained to detect cleaning in general, performs worse than the previous candidates. Horizontal red stripes in Figure 4, fourth row, show that the FCN-All sometimes misclassifies samples further away from the cleaning process. In the case of cohesive separation, the FCN-All benefits from more variation in the data and achieves better metrics. The opposite effect occurs for the detection of adhesive detachment. Here, image noise is misinterpreted as viscous shifting.
The present results can now be compared to the results obtained with the online network from the previous work [7]. In this reference, accuracies up to 80% were achieved when the algorithm was trained with 10-20 experiments for each cleaning mechanism and substantial time was invested in designing appropriate features. Juxtaposing both results demonstrates that the metrics obtained with the present approach are promising, especially when considering the smaller amount of training data.

CONCLUSIONS
The present paper proposed an improved labeling strategy, which supports the data curation procedure for an existing classification scheme for cleaning mechanisms. The existing algorithm was analyzed, and it was concluded that the level of detail of the training labels is the bottleneck for further enhancement of the algorithm and its application to realistic soils. Hence, an improved labeling strategy was developed, which is based on three components: first, the perspective of investigation, which makes it easier to decide whether a labeling can be considered good or bad. Second, the gradient filter procedure, which provides a fast and consistent labeling suggestion by tuning only three hyperparameters. Third, the generalization of the filters using FCNs.
While FCNs can be used as labeling assistance, they are also a promising architecture for a classification algorithm, which will be investigated in the future. In the present work, the FCNs only consider the gray value evolution over time of a single pixel. To also include macroscale information, multiple gray value evolutions will be considered. Furthermore, active learning strategies can be used to improve the performance of FCNs used for labeling assistance.

ACKNOWLEDGMENTS
This research project is supported by Industrievereinigung für Lebensmitteltechnologie und Verpackung e.V. (IVLV), the Arbeitsgemeinschaft industrieller Forschungsvereinigungen "Otto von Guericke" e.V. (AiF) and the Federal Ministry of Economic Affairs and Climate Action (IGF 21334 BR).
Open access funding enabled and organized by Projekt DEAL.

FIGURE 1
Changed labeling view. (A) Original view on the spatial plane. (B) Video footage as three-dimensional block. (C) New view on the space-time plane. (D) Regions associated with cleaning in the space-time view.

FIGURE 2
Gradient filter procedure consisting of three steps, starting from a gray scale signal in time: 1. Application of the filter with radius $\Delta$, 2. Application of the threshold, 3. Application of the dilatation radius $\Delta_d$. The result is the labeling of the gray scale evolution by a cleaning mechanism, here the gray shaded time interval.

FIGURE 3
Data processing through the FCN. (A) Data processing within a convolution block of the FCN. Each sample refers to an evolution of gray value over time taken from a single pixel; the number of frames and the number of samples within a batch are annotated, and dashed squares indicate padding. For the convolution block shown, the previous block has one channel and the current block has four channels. (B) Architecture of the whole network.

FIGURE 4
Comparison of the classification results. Green regions are tagged only manually, red regions are tagged only by the respective technique employed, and purple indicates overlapping regions, where both labelings are positive. First row: original image, second row: results obtained with the gradient filter procedure, third row: results obtained by the FCNs trained for each cleaning mechanism, fourth row: results obtained by the FCN trained to detect cleaning in general, fifth row: sample evolutions of gray value over time with the labeling obtained by the gradient filter procedure.

TABLE 1
Resulting accuracy and IoU metrics obtained by the different techniques employed.