Optimized multi‐biometric enhancement analysis

Saliha Artabaz, Laboratoire de Méthodes de Conception de Systèmes (LMCS), Ecole nationale Supérieure d'Informatique (ESI), Oued-Smar, Algiers, Algeria. Email: s_artabaz@esi.dz

Abstract

A multi-biometric system uses different modalities to identify individuals more accurately. The authors analyse the fusion efficiency of a significant number of multi-biometric fusion schemes. To do so, the study applies different functions, generated using genetic programming (GP), to 2000 multi-biometric instances produced by fusing different biometric matching scores. The functions are represented as trees of arithmetic operations and are used for fusion at score level. First, genetic programming is run on the XM2VTS score database, where the GP optimizes the half total error rate of the fused matching scores. Then, a comparative study is performed based on our experiments on matching scores of different biometric baseline systems provided by the Biosecure database. This database provides 24 streams that we use to generate 2000 multi-biometric combinations. These multi-biometric instances combine matching scores of different instances, sensors and traits. To assess the quality of the fused scores and of the contributing biometric baseline systems, we use weighted functions based on user-specific and group-specific normalization. Then, we propose a hybrid cat swarm optimization (CSO), based on the average-velocity inertia-weighted CSO and the normal mutation strategy-based CSO, to compute the weights of the selected functions for the fused biometric systems. Finally, we present statistical significance tests to confirm that the proposed functions outperform existing functions based on arithmetic rules, normalization fusion and evolutionary algorithms.


| INTRODUCTION
Multi-biometrics addresses several drawbacks of unimodal biometric systems. It is mostly applied to reduce matching errors [1], since research experiments show that combining different traits reduces both false positive and false negative errors [2][3][4][5]. With recent user-friendly applications such as patient identification and healthcare monitoring [6,7], and mobile and smartphone identification for protecting sensitive information and services [8,9], biometrics faces new challenges such as variable configurations and adaptive modality combinations [10,11], in addition to complex and interactive environment constraints that induce high-level data confidentiality risks [12]. Providing similar contextual conditions in the testing framework, which allows testing different combinations and fusion methods, is highly desirable. The biometric system can then adapt its fusion method according to the acquired modalities and successful feature extraction.
Multi-biometric fusion is applied at different levels: signal level (namely image level), feature level, score level, decision level and rank level. The most used in the literature is the score level [13,14]. This level has many advantages:
- Scores are much easier to fuse than features [3].
- The score level provides more information than the decision level [3].
- Scores are easier to obtain from existing uni-biometric systems.
- Fusion at this level provides better accuracy than fusion at other levels [15,16].
Fusion at score level involves different factors or parameters that must be optimized to ensure the success of the applied fusion strategy. On the one hand, the authors in [17] propose an innovative scheme that fuses scores by combining the weighted mean and the quasi-arithmetic mean without any learning process, while [18] successively applies a suite of derivatives on training data to estimate the vector score integration adapted to the multiclass problem that minimizes the probability of error. On the other hand, many evolutionary methods with a learning process are applied to optimize these parameters, such as differential evolution (DE) [19,20], particle swarm optimization (PSO) [21] and quasi-convex optimization [22]. Swarm optimization is suitable for many optimization applications; its use is motivated by its ability to make intelligence emerge through collaboration between individuals that have limited or no intelligence of their own [23]. This makes it a good fit for high-dimensional problems that are too complex to solve by human insight. Cat swarm optimization (CSO) is one of the most widely used such methods; it is inspired by natural cat behaviour and models exploration and exploitation with an innovative technique that respects the trade-off between the two phases [24]. The authors in [24] describe the different applications of CSO in computer vision and biometric recognition and confirm, through their survey and performance evaluation, that CSO outperforms PSO, the Genetic Algorithm (GA), DE and other methods applied in different fields. Their results show that CSO is a competitive algorithm that reaches the optimal fitness value for different problems. To the best of our knowledge, this method has not yet been applied to computing score fusion factors such as weights. Therefore, we propose a CSO algorithm to compute the fusion weights, which are combined with the user-specific quality measure on score fusion functions pre-computed by genetic programming (GP).
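As a concrete illustration of the learning-free scheme mentioned above, the following sketch combines a weighted mean with a quasi-arithmetic mean. The function names and the particular generator g (log/exp, which yields a weighted geometric mean) are our own illustrative choices, not those of [17]:

```python
import math

def weighted_mean(scores, weights):
    """Plain weighted mean of matcher scores."""
    return sum(w * s for w, s in zip(weights, scores)) / sum(weights)

def quasi_arithmetic_mean(scores, weights, g=math.log, g_inv=math.exp):
    """Quasi-arithmetic mean: g_inv of the weighted mean of g(score)."""
    total = sum(weights)
    return g_inv(sum(w * g(s) for w, s in zip(weights, scores)) / total)

# Example: fuse three normalized matcher outputs with per-matcher weights.
scores = [0.8, 0.6, 0.9]
weights = [0.5, 0.2, 0.3]
fused = 0.5 * weighted_mean(scores, weights) + 0.5 * quasi_arithmetic_mean(scores, weights)
```

With g = log, the quasi-arithmetic mean reduces to a weighted geometric mean; other generators (e.g. powers) give other members of the family.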
Fusion at score level considers scores resulting from different traits, instances, sensors and algorithms. Therefore, to identify the best system adapted to the situation, we must consider all these parameters. Indeed, the experiments conducted in [16] achieved different gain ratios for different combination methods. The experiments showed that a successful fusion depends on many factors, including the availability of training data, the accuracy of the matchers, the correlation of the scores and the chosen fusion. Consequently, assessing the efficiency of a fusion strategy requires testing it on different score combinations before concluding whether the fusion method outperforms the individual accuracy of the baseline systems. However, most of the research found in the literature compares fusion methods relying on their own baseline systems [3,15,16], which cannot establish whether the enhancement results from the fusion itself or from the performance of the baseline systems used. Therefore, for a predefined fusion strategy, we can search for the best biometric combinations that work better than the fused baseline systems.
Here, we propose to combine GP with the proposed CSO algorithm to generate different weighted fusion functions represented as trees of arithmetic rules. Then, we compare our proposal to the main fusion strategies in the literature, such as the use of norms [5], DE, PSO and BSA [19][20][21][25]. Our first version and application of GP was proposed in [26]; extended experiments on the best trees are conducted here on the Biosecure score database [27]. We perform multiple simulations using GP to get closer to the optimized fitness value, within the defined number of generations, for the maximum number of individuals. These simulations allow us to select multiple trees that optimize the score fusion. These trees are used to benchmark and analyse fusion on an unlimited number of biometric output combinations, constructed from the scores provided by the Biosecure score database [27]. This database includes non-correlated baseline systems of different traits and a quality assessment of the processed signals. The quality assessment must consider both the acquired signal and the quality of the biometric system that processes it. The first evolves with each sample, while the second can be evaluated to provide a scalar that indicates its performance and reliability. Therefore, we aim to combine baseline system weighting based on an evolutionary method with a quality assessment computed from the baseline systems' quality.
The available quality measure gives a scalar indicator based on different sources of quality [28], such as the trait character, imaging conditions and environment. We use the measures provided by the Biosecure database to weight the fusion functions. The proposed approach outperforms the baseline biometric subsystems. A comparative study is then conducted between the best fusion functions obtained from the GP. Finally, the statistical analysis of the tested multi-biometric score fusions is described, allowing us to show which ones reduce the baseline biometric systems' errors, for instance the equal error rate (EER), with or without the hybrid cat swarm optimization (HCSO) weighting.
In Section 2, we present the score level fusion applied in multi-biometrics. Section 3 describes the GP used to generate the proposed functions, the HCSO algorithm applied to compute the baseline score weighting, and the tested databases. We then display and analyse the different proposed functions used to fuse multiple combinations of baseline biometric systems in Section 4. This leads us to propose a useful platform that gathers an unlimited number of multi-biometric fused scores to find the best combinations. Finally, we conclude and propose some perspectives on the work performed.

| MULTI-BIOMETRICS AND SCORE LEVEL FUSION
Multi-biometrics is gaining popularity since it addresses unimodal biometric system weaknesses such as non-universality, noisy signals, low accuracy and spoofing attacks. It is well-known that multi-biometrics is mostly applied to reduce uni-biometric system errors, according to the different studies published in this field [13]. Uni-biometric systems are becoming less and less reliable for protecting highly confidential data. The authors in [12] demonstrate the effectiveness of fusion applied to Support Vector Machine (SVM) classifier outputs, using the Dempster-Shafer evidence theory, which can provide reliable biometrics in complex interactive environments. In addition, fusion provides effective systems which, combined with template protection, can preserve and enhance system performance against spoofing attacks [29].
Fusing multiple traits faces many challenges in ensuring precise performance and avoiding biased results. These challenges include the incompatibility of multiple data sources, the normalization of matcher scores, and noise, all of which influence system performance and lead to false positive or false negative authentications.
Recognition in multi-biometric fusion schemes combines traits of different sources (multiple samples of modalities, or traits acquired with different sensors and processed with different algorithms) at different levels (sensor, feature, score, rank or decision):
- Sensor level: combines multiple images or signals obtained from the same sensor or different sensors. This level can be considered in signal reconstruction (multiple images of the same evidence) or fusion.
- Feature level: combines multiple feature vectors. The combination consists of fusion or selection of features from multiple sources.
- Score level: combines matching scores of different matchers or classifiers.
- Decision level: combines decisions made by different biometric systems.
- Rank level: combines ranking results of different biometric systems.

| Score level fusion
The score level fusion is the most cited and used fusion [8,9,15,30,31], which is due to its low computational complexity.
In addition, comparative studies have shown that fusion at the score level outperforms other levels [15]. The proposal in [19] discusses the use of differential evolution, which has a structure similar to the GA, for parameter tuning to reduce the overlap between the genuine and impostor distributions. The authors in [20] use DE to find the confidence factors of the belief assignments that reduce the weighted error rate; the belief assignments representing the transformed scores are then fused using proportional conflict redistribution (PCR). The same factors are tuned using PSO in [21], where the fusion is performed using Dempster-Shafer theory and PCR. PCR is also used to resolve conflicting beliefs from different classifiers whose performance is optimized with the evolutionary backtracking search optimization algorithm, which searches for the best confidence factors that boost or remove classifier scores [25]. Another method optimizes the fusion process using quasi-convex optimization [22], which requires domain-specific knowledge of the feature similarity score distributions. Table 1 describes some of the works proposed in the literature.
The comparative study in [30] shows that the score level slightly surpasses both the feature and decision levels. However, due to the limited number of tests, the work does not provide a significance test to confirm the superiority of the score level. A recent work proposes a unique blend of belief assignment and decision-making methods in the Dezert-Smarandache theory framework [15]. They tested fusion on different datasets, of which only one contains two traits (NIST BSSR1); the two other databases (FRGC, IRIS dataset) are used for multi-algorithm and multi-sample fusion (Table 2). Using multiple databases in multi-biometric fusion analysis can be suitable for comparing their performances. However, fusion cannot guarantee performance enhancement, and combined modalities that belong to different persons are not dependent. Despite this, multiple works perform their experimental study on combined databases to demonstrate the effectiveness of their fusion proposal. The use of standard multimodal databases remains the best option, subject to the availability of the chosen modalities.

TABLE 1 Review of fusion methods

| Modalities | Methods | References |
|---|---|---|
| Multi-spectral palmprint | Different t-conorms compared to sum, SVM | [8] |
| Iris, face | GA feature selection, SVM score fusion | [9] |
| Speech, lip-reading | Average | [30] |
| NIST BSSR1 multi-biometric score database (fingerprint and face), face, iris dataset | Dezert-Smarandache theory (DSmT) | [15] |
| Palm/phalanges print | Sum, product, min, max, Hamacher t-norm, Frank t-norm, Yager's t-norm | [5] |
| Face, palmprint, signature, speech | Sum rule compared to the unimodal system | [31] |
| Fingerprint, iris, left ear and right ear | DE, exponent control, and kernel mapping | [19] |
| Fingerprint and voice | DE and proportional conflict redistribution | [20] |
| Face and voice | PSO, Dempster-Shafer and belief functions | [21] |
| Two face databases | Quasi-convex optimization | [22] |
| Iris, finger vein and fingerprint | Evolutionary backtracking search optimization algorithm | [25] |

Abbreviation: GA, Genetic Algorithm.

| Normalization and weighting
Normalizing and weighting the systems' outputs is necessary to preserve and improve the provided system accuracy. Normalization is usually used to transform scores from different sources into a common domain. Recent research [5,8] uses normalization methods as fusion operations, combining pairs of scores in a stream. Their results seem interesting when compared to SVM and the sum rule, but need further investigation [8]. On the one hand, normalization is needed to combine the heterogeneous score outputs of uni-biometric systems; the parameters of the normalization methods are computed in the training step and tested in the validation step. On the other hand, the main problem encountered in biometric evaluation is the disparity between scores of samples belonging to the same class, genuine or impostor. This causes misidentification of the real class and results in false acceptances and false rejections [32]. Client-specific normalization is a good solution, but it can suffer from the lack of samples. Therefore, the authors in [32] discuss the usefulness of group-specific normalization methods to overcome this problem.
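For illustration, here is a minimal sketch of two common normalization schemes consistent with the training/validation split described above (min-max and z-score); the helper names are ours:

```python
def minmax_params(train_scores):
    """Estimate min-max bounds on the development (training) set."""
    return min(train_scores), max(train_scores)

def minmax_normalize(score, lo, hi):
    """Map a raw matcher score into [0, 1] using the training bounds."""
    return (score - lo) / (hi - lo)

def zscore_normalize(score, mean, std):
    """Centre and scale a score using training-set statistics."""
    return (score - mean) / std

# Parameters are fit on development data, then applied to evaluation scores.
lo, hi = minmax_params([10.0, 30.0, 50.0])
normalized = minmax_normalize(30.0, lo, hi)
```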
The quality measurement [33] provides a predictor of the system performances. Sample quality is defined as 'scalar quantity that is related monotonically to the performance of biometric matchers' [33].
Many recent investigations [2,[34][35][36][37]] are interested in sample quality. They use sample quality scalars to obtain a well-adjusted fusion function. Another application considers the system quality, to indicate to the fusion approach the relevance of each uni-biometric system. This quality evaluation allows the fusion process to provide an adapted combination according to the biometric signals identified for each authentication [38]. Here, the main issue is to preserve accuracy even if the signal quality changes from one authentication to another. This can be interpreted as treating the quality scalar as a quantification of evidence degradation due to noisy signals. The authors in [37] use an image reconstruction module that optimizes the false acceptance rate based on the quality measurement of the processed signal. It is also important to note that the notion of a supervised biometric system is outdated in recent user-friendly applications that require more convenience [39]. Therefore, quality evaluation is crucial to establish a reliable multi-biometric system adapted to the variation of the processed data [34]. To summarise, the studied works in this field employ signal or biometric system quality to ensure the consistency of the biometric decision. Other researchers consider quality evaluation of the biometric system's reliability [35].
Improving system accuracy using a fusion method on a limited number of multi-biometric schemes does not attest to its efficiency. Indeed, we may reach improved accuracy with a fusion strategy merely thanks to the chosen baseline system combination.
Here, the authors aim to experiment with different functions for fusing the score outputs of uni-biometric systems and to verify this assumption. Therefore, the authors propose a useful platform [26] to test the predefined fusion functions and classify the different biometric system combinations according to their accuracy enhancement. Thereby, the platform gives the best combinations and fusion methods according to the required security level (high, medium or low). In addition, it determines whether the used fusion functions give equivalent accuracy for different multi-biometric combinations, which ensures that the adopted fusion gives steady performance in different situations. It has been shown that typical operations are the best for combining a limited number of uni-biometric systems [14]. Therefore, the authors aim to propose different fusion functions based on these operations, and to study and compare them on different fused uni-biometric systems. Further, the authors analyse the impact of normalization and quality integration on the performance of the studied multi-biometric combinations.

| MATERIALS AND METHODS
Choosing a fusion method at score level is crucial to enhance the performance of the resulting multi-biometric system. This work provides a performance analysis of different fusion functions (see Table 8) that combine elementary operations such as sum, product, min and max, across many multi-biometric score combinations generated from the score database. To accomplish this, we start by explaining the GP applied previously [26] and enhanced here to be tested and compared on a new, different database. Then, we define the proposed HCSO used to reduce the errors of a subset of multi-biometric systems, which we analyse here. After that, we describe the databases used in our experimental study. Finally, we discuss the proposed experiments.

| GENETIC PROGRAMMING
We use GP to obtain a subset of optimal functions among the constructed ones, which are represented using a tree structure. This section explains the GP simulations that led us to the preliminary results discussed in [26]. Our aim is to re-explore the proposed optimization and test a diversity of functions with good performance. The mutation and crossover operations explore the search space of different trees by modifying the operations at the selected nodes. As we perform tests on a score database, we can vary the number of leaves of the initial population to match the number of input scores of the tested score database. Figure 1 recapitulates the GP parameters. Figure 2 summarizes the applied GP used to obtain the best trees, or functions, that we use for fusion and test on the score database. All the population trees are tested using the evaluation protocol offered by the XM2VTS database. As we perform testing on different databases, we give details of the train and test data later.
As shown in Figure 2, in each simulation we start with an initial population, generated randomly or from the previous simulation, varying the tree depth (between 2 and 8) and the number of nodes, which corresponds to a predefined interval for the number of leaves (at least 8 leaves to support the input scores). Afterwards, we evaluate the half total error rate (HTER) of each tree that combines the database scores. Then, according to the fitness of the population or the number of generations, the process continues with tree transformations until it reaches the best population. In our case, we succeed in optimizing the whole population to obtain a mixture of functions with HTERs in a reduced interval. Here we present details of the GP used to explore the search space:
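The tree representation and HTER fitness described above can be sketched as follows. This is a simplified illustration with an assumed operator set (sum, product, min, max), not the exact GP implementation used in the paper:

```python
import operator
import random

# A node is either ('leaf', input_index) or (op_name, left_subtree, right_subtree).
OPS = {'+': operator.add, '*': operator.mul, 'min': min, 'max': max}

def random_tree(depth, n_inputs, rng):
    """Grow a random arithmetic tree whose leaves index the input scores."""
    if depth == 0:
        return ('leaf', rng.randrange(n_inputs))
    op = rng.choice(list(OPS))
    return (op, random_tree(depth - 1, n_inputs, rng),
            random_tree(depth - 1, n_inputs, rng))

def evaluate(tree, scores):
    """Apply the fusion function encoded by the tree to one score vector."""
    if tree[0] == 'leaf':
        return scores[tree[1]]
    return OPS[tree[0]](evaluate(tree[1], scores), evaluate(tree[2], scores))

def hter(tree, genuine, impostor, threshold):
    """Half total error rate = (FAR + FRR) / 2 at a fixed decision threshold."""
    far = sum(evaluate(tree, s) >= threshold for s in impostor) / len(impostor)
    frr = sum(evaluate(tree, s) < threshold for s in genuine) / len(genuine)
    return (far + frr) / 2
```

A GP loop would score each tree with `hter` on the development protocol and apply crossover and mutation to the best individuals.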

| Hybrid CSO (AVICSO and NMCSO)
We now combine the average-velocity inertia-weighted CSO (AVICSO) [40] with the normal mutation strategy-based CSO (NMCSO) [41]. The first introduces a balance between global and local search, while the second reinforces the global search to avoid premature convergence to local optima. In our case, we only apply the averaging to the velocity. The problem is defined as the search for the score weights and fusion functions that optimize the EER of the studied multi-biometric system. Each cat of the CSO population changes its fusion function when we detect that the current function does not reduce the EER.
The CSO algorithm is based on two phases that simulate cat behaviour: seeking mode and tracing mode. To find the optimal weights, we base our optimization on the development set and test these weights on the evaluation set. The proposed HCSO takes the following steps to reach the optimal solutions:
Step 1 Initialize all parameters of the seeking and tracing modes and evaluate the fitness of N cats. Each cat is represented by its decision variables: a fusion function chosen randomly, combined with our score weights initialized randomly in the defined velocity range. The weights are modified in Steps 3 and 4, while the function number is controlled to diversify solutions if no solution reduces the function error.
Step 2 Classify the cats into seeking and tracing mode according to the mixture ratio (MR) that defines the ratio of cats in seeking mode.
Step 3 Go through the seeking mode and update the copies of the parent cats (SMP-1 copies, added according to the seeking memory pool) with normal mutation, instead of the traditional random mutation, and the SRD (seeking range of the selected dimension) parameter that defines the mutation ratio. The normal mutation updates the cat positions, which are randomly distributed according to the range of the resting cats. The CDC (counts of dimensions to change) parameter defines the ratio of randomly chosen dimensions that are modified. Then evaluate the copies and select the next position for each cat.
Step 4 Go through the tracing mode by updating the dimensions of the moving cats according to the inertia weight w applied to the standard velocity update, v_{k,d} ← w · v_{k,d} + r · c · (x_{best,d} − x_{k,d}), where r is a random number in [0, 1] and c is an acceleration constant. The decreasing value of w maintains a global search at the beginning and gradually moves towards a local search. The updated velocity for each dimension d is then added to the position of cat k.
Step 5 Check the bounds of the solution and explore other functions for the best solution if there is no fitness improvement after a defined number of iterations; in that case, we enhance the global search by exploring other, weaker solutions. Then, combine the cats of the seeking and tracing modes and repeat Steps 2 to 5 until reaching the fixed number of iterations.
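The five steps above can be sketched as a single loop. This is a simplified, hedged reading of the proposed HCSO, not the exact implementation: parameter names follow the text, while bound checking and the function-switching of Step 5 are omitted for brevity:

```python
import random

def hcso(fitness, n_dims, n_cats=20, mr=0.3, smp=5, srd=0.2, cdc=0.8,
         iters=100, w_max=0.9, w_min=0.4, rng=None):
    """Hybrid CSO sketch: seeking mode with normal mutation (NMCSO-style),
    tracing mode with a decreasing inertia weight and averaged velocities
    (AVICSO-style). Cats start in [0, 1]^n_dims; bound handling omitted."""
    rng = rng or random.Random()
    cats = [[rng.uniform(0, 1) for _ in range(n_dims)] for _ in range(n_cats)]
    vels = [[0.0] * n_dims for _ in range(n_cats)]
    best = min(cats, key=fitness)[:]
    for t in range(iters):
        w = w_max - (w_max - w_min) * t / iters        # inertia decreases
        seeking = set(rng.sample(range(n_cats), int(mr * n_cats)))
        for k in range(n_cats):
            if k in seeking:
                # Seeking mode: parent plus SMP-1 normally mutated copies.
                copies = [cats[k]]
                for _ in range(smp - 1):
                    c = cats[k][:]
                    for d in range(n_dims):
                        if rng.random() < cdc:          # CDC: dims to change
                            c[d] += rng.gauss(0, srd)   # normal mutation
                    copies.append(c)
                cats[k] = min(copies, key=fitness)
            else:
                # Tracing mode: inertia-weighted pull toward the best cat,
                # averaged with the previous velocity.
                for d in range(n_dims):
                    v_new = w * vels[k][d] + rng.random() * 2.0 * (best[d] - cats[k][d])
                    vels[k][d] = (vels[k][d] + v_new) / 2
                    cats[k][d] += vels[k][d]
        best = min(cats + [best], key=fitness)[:]
    return best
```

In the paper's setting, `fitness` would evaluate the EER of a weighted fusion function on the development set; here a generic minimizer interface is shown instead.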

| Databases
First, we present the databases used in our experiments. We choose these databases because they are constructed with respect to biometric database protocols. Furthermore, they are well suited to our comparative study as they provide a large number of scores computed from different biometric systems.

| XM2VTS score database
We use the XM2VTS score database [14], whose scores are computed on the publicly available XM2VTS face and speech database. These scores give access to the provided baseline systems, with their documentation, for further experiments. Publishing the score results of these baseline systems lets researchers compare them in detail with their own scores. However, fusion is less often considered in this context, even though it is more useful to combine scores under common settings than simply to compare outputs. This database provides well-defined protocols (train and test data are described in Table 4) and a set of evaluation tools such as the DET curve, the expected performance curve (EPC) and the HTER significance test, which we use in our fitness evaluation as an optimization metric.

FIGURE 2 Genetic programming on the XM2VTS score database

| Biosecure score database
The Biosecure score database is a benchmark for quality-dependent, client-specific and cost-sensitive fusion algorithms. It contains fingerprint, face and iris modalities acquired with different sensors and scenarios. Baseline systems are evaluated using the biometric verification systems cited in Table 5 [27]. The benchmark contains 24 streams of matching scores from two sessions, divided into a development set and an evaluation set as represented in Table 6. The database simulates a realistic test environment for fusion schemes, including cross-device scores, quality evaluation of the acquired traits, and failures due to sensors or unmeasurable quality. In addition, it provides tools for analysing system performance based on the processed data.

| Multi-biometric optimization analysis
Here, we aim to analyse the impact of fusion on the performance of biometric systems. Thus, we need to generate different combinations of existing biometric systems, as shown in Figure 3. Then, we apply multiple fusion strategies on the selected scores. This selection, carried out on the n provided scores, gives a subset of scores that are used in the fusion process. We use the development set to compute the normalization parameters, then test each multi-biometric combination on the evaluation set. The random selection provides 2000 multi-biometric instances and covers different modalities and trait instances, testing sensor cross-matching. We use the same fusion functions tested in [26] in the extended comparative study, and we compare them with typical fusion operations, normalization functions and evolutionary methods applied in score fusion.
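The random selection of multi-biometric instances might be generated as below. The bounds on the number of fused streams (`min_k`, `max_k`) are illustrative assumptions; the paper fixes only the total of 2000 instances drawn from 24 streams:

```python
import random

def random_combinations(n_streams=24, n_instances=2000, min_k=2, max_k=8, seed=42):
    """Draw distinct random subsets of stream indices, each subset forming
    one multi-biometric instance to be fused and evaluated."""
    rng = random.Random(seed)
    seen, combos = set(), []
    while len(combos) < n_instances:
        k = rng.randint(min_k, max_k)                       # number of streams to fuse
        combo = tuple(sorted(rng.sample(range(n_streams), k)))
        if combo not in seen:                               # avoid duplicate instances
            seen.add(combo)
            combos.append(combo)
    return combos
```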
First, we use the GP results obtained previously on the XM2VTS [26]. Figure 4(a) shows the graph oscillations that illustrate the progress of the GP from the first to the last simulation; it plots the minimal HTER obtained for the population at each generation. Figure 4(b) illustrates the average fitness according to the number of crossovers and mutations. The GP provides a set of best trees, each representing a fusion function of elementary operations. Figure 5 shows the enhancement of the average HTER over a GP simulation of 100 generations, using crossover to evolve the population towards the optimal value and roulette selection to enlarge the exploration; mutation avoids premature convergence. As a result, the HTER average is reduced to the range [0, 5%] in the last experiments [26] and decreases below 1% using two successive simulations. After that, we conduct our experimental study according to the following steps:
1. A set of trees is selected from our previous simulation. The selection was conducted randomly along all generations to get a set of generations according to the HTER; we then select different trees from each one.
2. We use these trees to fuse the matching scores provided by different biometric systems of the Biosecure database.
3. We use the development set to compute the min-max standard deviation applied in normalization.
4. We apply a user-specific weighting using the normalized database quality measures giving the sample fidelity to the claimed ID, multiplied by the min-max standard deviation of each score (computed from the development set). The weight depends on x_t, the template quality (claimed ID), and x_q, the query quality (true ID), and is a measure of dissimilarity between the two measurements.

FIGURE 3 Multi-biometric score fusion evaluation

5. Second, we use the Frobenius norm in the HCSO algorithm to rescale the client and impostor distributions using a group-centred strategy [32] that aggregates the matching scores of different samples belonging to the same group. In our context, we consider only two groups: genuine and impostor. The normalization is the 2-norm between the two vectors; it is computed from the development set and applied on the evaluation set.
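The weighting and normalization of items 4 and 5 above can be sketched as follows. Both the dissimilarity-based quality weight and the group-centred 2-norm are our own hedged readings of the text, not the paper's exact formulas:

```python
import math

def quality_weight(x_t, x_q, score_std):
    """Hypothetical user-specific weight: dissimilarity between the template
    quality x_t (claimed ID) and the query quality x_q (true ID), scaled by
    the min-max standard deviation of the stream's scores."""
    return abs(x_t - x_q) * score_std

def group_norm(dev_genuine, dev_impostor):
    """One reading of the group-centred normalizer: the L2 (Frobenius)
    distance between the mean genuine and mean impostor score vectors,
    computed on the development set."""
    mu_g = [sum(col) / len(col) for col in zip(*dev_genuine)]
    mu_i = [sum(col) / len(col) for col in zip(*dev_impostor)]
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(mu_g, mu_i)))

def normalize(score_vec, norm):
    """Rescale an evaluation-set score vector by the development-set norm."""
    return [s / norm for s in score_vec]
```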

| EXPERIMENTS AND DISCUSSION
In the experiments conducted, we obtain the optimized population illustrated in Figure 5 (the last population). As shown in this figure, the error range of the initial population, [0.67, 72.89]%, is reduced after the first simulation to [0.3, 0.87]%, then to [0.1, 0.34]%. We take the best trees from the generated populations and use them in the next experimental study. We compare the Area Under Curve and HTER we achieve on the XM2VTS score database with two evolutionary methods that use the same evaluation metrics (see Table 7).

| Experimental study
In the experimentation, we aim to test different configurations to analyse the impact of keeping the same functions while varying the fused uni-biometric scores. This study is essential as it demonstrates that fusion depends on both the fusion function and the scores that are fused [1]. We also compare the proposed functions and the HCSO function to typical operations, the fusion normalization proposed in the literature [5], and the evolutionary algorithms applied to fusion weighting [21,25], using the algorithm presented here. We perform the experiments on the Biosecure score database [27], a collection of matching scores computed from the Biosecure biometric databases that provides a quality evaluation of the captured signals based on feature richness, environmental effects and signal properties. The quality is crucial information about the studied fusion schemes, used to discard the undesired influence of high scores produced by poor-quality traits. We select a subset of 10 functions that provide a low HTER on the XM2VTS database and are representative of groups of functions with similar HTERs.
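The EER used to rank functions throughout this section can be approximated from fused genuine and impostor scores, as in this sketch:

```python
def eer(genuine, impostor):
    """Approximate the equal error rate: sweep candidate thresholds and
    return the smallest value where FAR and FRR meet."""
    best = 1.0
    for t in sorted(set(genuine + impostor)):
        far = sum(s >= t for s in impostor) / len(impostor)   # false acceptances
        frr = sum(s < t for s in genuine) / len(genuine)      # false rejections
        best = min(best, max(far, frr))
    return best
```

With finite score lists the FAR/FRR curves are step functions, so the crossing point is approximated by the threshold minimizing max(FAR, FRR).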
The selected functions are shown in Table 8. Based on these functions, we search for the optimized weights using the proposed HCSO algorithm, which explores the functions to find the best one. The experiments are run in MATLAB R2020a. The HCSO optimization is performed on two PCs (Intel i5, 8 GB RAM and Intel i7, 16 GB RAM) and takes three days for 1000 multi-biometric scores using parallel execution with 6 MATLAB workers. Figure 6 illustrates the statistics (MIN, MAX, AVG and STD) of the cited functions, tested on the evaluation set of the score database. These functions are classified according to the variance between the two sessions (see Table 6) to measure the variation in function performance. In this ranking, we compute the sum of ranks, considering Session 2 as more important, as recommended in the database documentation. We use the sum of these ranks to identify the best functions. As a result, functions 3, 8, 5 and 2, whose rank averages are less than 5, are the best ones; we consider two rankings similar if their difference is less than 1. Function 3 is the best function, with an EER less than 33%. In Figure 7, we can observe that functions 1 and 2 give the best result for the best multi-biometric fusion from two simulations and provide the lowest EER, 0.0006%. Our results remain coherent because function 2 is one of the best-ranked functions; however, it is the least selected by the HCSO algorithm, chosen for only 2.40% of the multi-biometric instances. In Figure 8, we present statistics of the tested fusion functions on multiple multi-biometric fusion schemes. It illustrates, for each subset of functions, the percentage of multi-biometric schemes whose performance is enhanced below the corresponding EER on the x-axis. As we decrease the number of functions required to provide a reduced EER, the number of multi-biometric instances increases.
For example, six different functions enhance the EER of 7.85% of the tested instances to below 3%, while 70.40% of them have only one function achieving the same EER. These results are interesting as they demonstrate the influence of the fused baseline output scores on the resulting EER. Nevertheless, we obtain 40 multi-biometric fusion schemes that are optimised to an EER of less than 0.001%. This no longer holds as we increase the number of functions, which explains the ratio decreasing to 0%. In addition, the simulation demonstrates a significant enhancement compared to the typical operations: sum, product, min and max. The proposed functions provide a lower EER than the typical operations for 89.5% of the 2000 multi-biometric combinations, considering only the quality weighting. We compare our results to the recently used Hamacher t-norm and Frank t-norm normalization methods, which we implement on the test data of the Biosecure database. Table 10 also gives an EER comparison with the state-of-the-art results. The HCSO weighting optimization gives the lowest average EER. As shown in Table 10, only 50% of the tested multi-biometric scores are fused successfully in the best case. Our functions outperform the two best norms that give good results in [5], PSO [21] and the backtracking search optimization algorithm [25]. The proposed HCSO algorithm enhances the selected function using score weighting and reaches more than 87.5% EER reduction compared to the typical operations (see Figure 9). After 50 iterations, more than half of the multi-biometric combinations have an EER of less than 1%, as demonstrated in Figure 11.
The experimental study confirms the assumption that good results after fusion do not by themselves establish the robustness of the chosen fusion method. Here, we obtain a significant STD for some functions, such as f1, as seen in Figure 6. This means that the resulting functions depend on the selected scores of the biometric systems. The robustness of the baseline systems used in fusion plays a crucial role in performance enhancement. Therefore, comparing different fusion functions on the same data and baseline systems is required to affirm whether fusion has improved the baseline system performances or not. Besides, testing different scenarios is always important because the ideal case does not exist. Generally, system failure is caused by weaknesses in the face of special scenarios that were not considered, such as sensor failure or insufficient image quality, which affect decision-making. Hence, we must deal with these drawbacks to prevent system failure in a real environment. The chosen score database is built upon some of these scenarios. In the experiments conducted, we test quality weights that we propose based on the score quality available in the Biosecure database. We compare them to normalized scores without weights. The results over 2000 multi-biometric combinations are presented in Figure 10. As shown, normalization provides the lowest values. However, if we omit zero values, mostly caused by missing values in the database, we can see that quality weighting provides a steady range and reduces significant EER fluctuation. The maximum peaks are caused by missing quality scalars for some scores in the testing database. Figure 12 compares the EER of 1000 combinations using user-specific quality weighting with and without the added HCSO weights. Combining the two quality measures, those for the baseline scores and those related to signal quality, performs better, as demonstrated by these results.
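The quality weighting described above can be sketched as a weighted combination of normalized baseline scores, with weights derived from per-system quality scalars. This is a simplified illustration under stated assumptions (min-max normalization, weights proportional to quality), not the paper's exact weighting formula, and the numbers are invented:

```python
import numpy as np

def min_max_norm(scores):
    """Map one baseline system's scores to [0, 1]."""
    s = np.asarray(scores, dtype=float)
    lo, hi = s.min(), s.max()
    return (s - lo) / (hi - lo) if hi > lo else np.zeros_like(s)

def quality_weighted_fusion(score_matrix, quality):
    """Fuse per-system score columns with quality-derived weights.

    score_matrix: (n_samples, n_systems) raw matching scores
    quality:      (n_systems,) quality scalars; the weights are the
                  qualities normalized to sum to 1 (an assumption).
    """
    q = np.asarray(quality, dtype=float)
    w = q / q.sum()
    normed = np.column_stack([min_max_norm(c) for c in score_matrix.T])
    return normed @ w

# Two baseline systems on different score scales, three samples.
scores = np.array([[0.9, 60.0], [0.2, 20.0], [0.5, 50.0]])
fused = quality_weighted_fusion(scores, quality=[0.8, 0.4])
print(fused)
```

A missing quality scalar (as noted for some scores in the testing database) would leave the corresponding system unweighted, which is consistent with the peaks observed in the EER curves.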
In conclusion, searching for the best combination, instead of proving the effectiveness of a single process, remains important for future studies. Here, we optimize performance to reach the best multi-biometric systems in the two simulations illustrated in Figure 7. This multi-biometric system gives similar errors for the ten functions on Session 1 of the evaluation set, with an EER in the range [0.0002%, 6.88%]. However, the best functions with small variance (less than 1.5%) between the two sessions are 1, 2 and 4 in this case. With the HCSO optimization, Function 1 is the most selected, chosen for 24.30% of the tested multi-biometric combinations.

| Significance test
To show that the proposed functions outperform the typical fusion operations, we use the t-test to assess the significance of the difference between the two sets. First, we use a paired test to analyse the average difference in one direction. We obtain a t-value of 1.63, less than the critical t-value of 1.64, so the hypothesis of significant improvement is assumed (P < 0.1). The proposed fusion is significantly better than the typical fusion operations at a 90% confidence level. Second, we use a paired test to analyse the average difference assuming that the two variances are different. The test is almost successful (P = 0.05 < 0.1 and t-value = 1.58 < 1.64). Next, we perform a significance t-test between pairs of our proposed functions and the four typical operations. The results are significant for all pairs, as seen in Table 12, except for Function 8, whose average is close to that of the sum. Our functions perform better than all the typical operations.
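The paired, one-directional test used here can be sketched directly from its definition: compute the per-combination EER differences, then the t statistic of their mean. The paired EER values below are hypothetical, not the paper's data:

```python
import numpy as np

# Hypothetical paired EERs (%) on the same multi-biometric
# combinations: a proposed GP function vs. the sum rule.
proposed = np.array([1.2, 0.8, 2.5, 3.1, 0.9, 1.7, 2.2, 1.1])
sum_rule = np.array([1.9, 1.0, 2.9, 3.0, 1.6, 2.4, 2.8, 1.5])

def paired_t(a, b):
    """Paired t statistic for the one-sided H1: mean(a - b) < 0."""
    d = np.asarray(a, float) - np.asarray(b, float)
    t = d.mean() / (d.std(ddof=1) / np.sqrt(len(d)))
    return t, len(d) - 1  # statistic and degrees of freedom

t, df = paired_t(proposed, sum_rule)
# One-tailed critical value at alpha = 0.05 for df = 7 is about 1.895.
print(t, df, t < -1.895)
```

A t statistic below the negative critical value rejects the null hypothesis of equal means in favour of the proposed function having the lower EER.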
The comparison between our functions and the typical fusion operators is illustrated in Figure 13. It shows the number of multi-biometric combinations, out of 1000 tested, for which the proposed functions outperform the typical operations. Fct2 yields the best enhancement compared to the Sum operator, while it reaches a percentage close to Fct1 compared to the Prod and Min operations. Fct4 to Fct10 give approximately the same results, outperforming Max and Prod for more than 60% of the multi-biometric combinations.
In summary, the success rate exceeds 90.4% compared to the typical operations. Figure 13 shows the total number of multi-biometric combinations for which at least one function outperforms sum, prod, max and min. The number of functions that succeed for each multi-biometric system is also counted, where each series represents one of the typical functions, in Figure 13(b). The results for max and sum approach those of min and prod, respectively. Finally, we use the ANOVA test to confirm that the results of our experiments are statistically significant. The H0 hypothesis of the ANOVA test is that the tested functions do not optimize the EER, whereas the H1 hypothesis states that they do. Table 11 shows the result of the ANOVA test; the probability under the H0 hypothesis is very low (p-value = 6.59e-32). This confirms that the results of our experiments are statistically significant and that the tested functions optimize the EER.
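The one-way ANOVA used above compares between-group to within-group variance of the EER across function groups. A minimal sketch of the F statistic, on invented EER samples rather than the paper's data:

```python
import numpy as np

def one_way_anova_F(groups):
    """F statistic for a one-way ANOVA across groups of EER samples."""
    all_vals = np.concatenate(groups)
    grand = all_vals.mean()
    k, n = len(groups), len(all_vals)
    # Between-group sum of squares: group means around the grand mean.
    ss_between = sum(len(g) * (np.mean(g) - grand) ** 2 for g in groups)
    # Within-group sum of squares: samples around their group mean.
    ss_within = sum(((np.asarray(g) - np.mean(g)) ** 2).sum() for g in groups)
    df_between, df_within = k - 1, n - k
    return (ss_between / df_between) / (ss_within / df_within)

# Hypothetical EER samples (%) for three fusion functions.
f_stat = one_way_anova_F([
    [1.0, 1.2, 0.9, 1.1],
    [2.0, 2.2, 1.9, 2.1],
    [3.1, 2.9, 3.0, 3.2],
])
print(f_stat)
```

A large F (here the group means are well separated relative to the within-group spread) yields a tiny p-value under H0, which is the situation reported in Table 11.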

| Multi-biometric fusion
In the experiments, we reach an EER under 0.013%, and we obtain three multi-biometric fusion schemes that optimize the EER to the range [0.0006%, 1.30%] from baseline scores with EERs in the range [1%, 99.55%]. This amounts to a range improvement of 98%. Our functions outperform the results obtained in [42] using a Random Subspace of AdaBoost (RS-ADA) for fusion, which reaches an EER of 1.98%. Table 13 describes the fused traits in these multi-biometric schemes: face, multiple instances of fingerprint, multiple captures using different sensors, and iris. Furthermore, the fusion is performed on scores resulting from matching a fingerprint query against templates from different sensors. This confirms the usefulness and significance of quality evaluation for weighting scores in such cases. Figure 14 gives the evaluation of the 10 fusion functions on one of the best multi-biometric fusions, over the two tested sessions. Function 2 is the best function in this case, as demonstrated previously in Table 9 using our ranking method. The HCSO reduces the EER of Function 1 from 1.33% to 0.9%.
The example illustrated in Figure 14 shows that fusion reduces errors caused by divergent scores. Indeed, Functions 2, 5 and 8 already give good results and optimize the area under the curve. In this instance, the fused scores represent, respectively, the face (CANON), face (CANON with flash), iris and two fingerprints taken with different devices, as described in the database. The other scores are filtered out because of dummy values included in the tested database.

| CONCLUSIONS AND FUTURE WORK
Here, the authors propose a new multi-biometric fusion scheme based on optimized and weighted functions. The fusion functions are generated using GP based on arithmetic operations. The score weighting optimization is performed using the HCSO, based on the average-inertia weighted CSO and the normal mutation CSO. The authors conduct the experimentation on the Biosecure score database to compare different combinations of the provided scores and find the best solutions that achieve a fixed range of EER. The proposed hybrid CSO optimizes the fusion score weights and reaches at least 80% enhancement for the tested multi-biometric combinations. The significance tests confirm the improvement of our weighted score fusion compared to typical fusion operations and to weight optimization using evolutionary algorithms applied to similar databases or comparable modalities. With different baseline systems, which may have divergent performances, system improvement cannot be ensured simply by combining their scores; therefore, associating weights with scores is essential in multi-biometric fusion. Quality weighting applied to user scores and classifier scores brings additional information about the sensor used, user behaviour and classifier performance. The experimental study compares a significant number of multi-biometric combinations instead of a limited one. This allows a more expressive analysis of how important fusion and quality integration in the biometric process are for obtaining better results, and confirms the effectiveness of each function in improving multiple configurations rather than a single configured one. It can also be used in different contexts to verify whether the merit of fusion methods outstrips the performance of the baseline systems. In future work, we plan to test another variant of CSO that reduces the time cost using parallel CSO and to combine it with the current version.