Signal-level Fusion for Indexing and Retrieval of Facial Biometric Data

The growing scope, scale, and number of biometric deployments around the world emphasise the need for research into technologies facilitating efficient and reliable biometric identification queries. This work presents a method of indexing biometric databases, which relies on signal-level fusion of facial images (morphing) to create a multi-stage data-structure and retrieval protocol. By successively pre-filtering the list of potential candidate identities, the proposed method makes it possible to reduce the necessary number of biometric template comparisons to complete a biometric identification transaction. The proposed method is extensively evaluated on publicly available databases using open-source and commercial off-the-shelf recognition systems. The results show that using the proposed method, the computational workload can be reduced down to around 30%, while the biometric performance of a baseline exhaustive search-based retrieval is fully maintained, both in closed-set and open-set identification scenarios.


I. INTRODUCTION
Biometric technologies have become an essential component of many personal, commercial, and governmental identity management systems around the world. Both positive (e.g. access control) and negative (e.g. forensics and surveillance) identification scenarios can benefit greatly from the use of biometrics. Biometrics rely on highly distinctive characteristics of human beings (see figure 1 for some popular examples), which make it possible for individuals to be reliably recognised using fully automated algorithms. The global market value for biometric technologies has been steadily growing in recent years and is currently estimated to be tens of billions of dollars [1]. Examples of actual application scenarios of biometrics beyond personal devices (see e.g. [2]) include, but are not limited to, border control (see e.g. [3], [4], [5], [6]), forensic investigations and law enforcement (see e.g. [7], [8], [9]), national ID systems (see e.g. [10], [11]), as well as voter registration for elections (see e.g. [12], [13]). Currently, the largest systems of this kind reach hundreds of millions or even over a billion enrolled subjects (see e.g. [14]).
The increasing scope, size, and prevalence of biometric system deployments necessitate the development of technologies capable of efficient and accurate processing of biometric data. The aforementioned systems often need to operate in biometric identification and duplicate enrolment check scenarios, where typically an exhaustive search (i.e. one-to-many comparison) is needed in order to identify a potential previous enrolment based on a biometric probe. In this context, solutions which help to achieve shorter practical system response times through algorithmic methods, rather than through mere scaling of the hardware architecture, are of interest; such algorithmic methods, referred to as biometric workload reduction and/or biometric indexing, can help to improve user experience and to reduce the monetary costs.
Fig. 1: Example images of some commonly used biometric characteristics (from MCYT [15], FRGC [16], and IITD [17])
The practical relevance of such methods is evidenced by a strong interest from the governmental side, manifesting itself through numerous benchmarks and competitions [18], [19], [20]. Accordingly, significant research efforts have been devoted to the development of methods for computationally efficient biometric identification systems. Beyond mere software and hardware-based implementation optimisations, the idea of computational workload reduction in biometric identification has received a lot of attention from researchers. Such methods typically aim to decrease the number of template comparisons needed in a biometric identification transaction by taking advantage of certain intrinsic properties of biometric data coupled with advanced search structures and algorithms. For more details on this subject, the reader is referred to subsection II-A and a survey of Drozdowski et al. [21]. In contrast to methods reported in the literature, which typically rely on access to feature vectors extracted by facial recognition algorithms, this article considers the more challenging case of a black-box biometric recognition system (i.e. any commercial off-the-shelf product with unknown internal representation of feature vectors in proprietary format); specifically, the method proposed in this work relies merely on raw facial biometric samples and the capability to compute comparison scores between them. Furthermore, the method proposed in this work requires neither additional training nor classifiers for the purpose of computational workload reduction.

A. Contribution and Organisation
This work proposes a concept wherein signal-level information fusion is utilised as a basis for a multi-stage indexing and retrieval scheme for facial biometric data. The proposed method relies on facial image morphing (see subsection II-B and a survey of Scherhag et al. [22]) and filtering of candidate short-lists based on biometric comparison scores. A comprehensive experimental evaluation shows that by using the proposed concepts, the computational workload associated with a biometric identification (both closed-set and open-set) transaction can be reduced down to around 30%, without negatively affecting the biometric performance in terms of recognition accuracy.
By virtue of relying on signal-level fusion, the proposed concepts could be integrated even into black-box face recognition systems, i.e. without accessing the underlying algorithms and feature representations. This contribution is in stark contrast to numerous other methods published in this area, as they very often rely on unrestricted access to biometric templates (i.e. extracted feature sets) to facilitate indexing and efficient retrieval.
The work presented in this article significantly extends the proof-of-concept publication of this method in [23]. More specifically, the conceptual and technical contributions beyond the original conference paper include:
• New methods of intelligent pairing of parent images to be morphed, which significantly improve the originally achieved results.
• A theoretical and empirical analysis of an extension of the originally proposed two-stage retrieval scheme to incorporate multiple stages.
• A larger evaluation dataset.
• Numerous additional open-source (OSS) and commercial off-the-shelf (COTS) morphing tools and face recognition systems used in the evaluation.
The remainder of this article is organised as follows: section II provides the relevant background information and surveys the related works. The proposed system is described in section III. The experimental setup is outlined in section IV, while the experimental results are presented in section V. Concluding remarks and a summary are given in section VII.

II. BACKGROUND
The scope of this article combines two areas within biometrics, namely computational workload reduction and signal-level fusion through morphing. The pertinent background information and key related works for those two fields of research are surveyed in subsections II-A and II-B, respectively.

A. Computational Workload Reduction
To accommodate a growing number of data subjects (i.e. an increase in the size of an enrolment database), optimisations or additional investment are needed to maintain fast biometric identification system response times. While expanding the hardware (e.g. by distributing the computations to many servers) can solve the problem, such solutions are naturally associated with monetary costs for the equipment purchase, installation, maintenance, etc. In this context, the goal of computational workload reduction methods is to decrease the necessary amount of computations for a given task of a biometric system, thereby mitigating some of the physical infrastructure costs. Performing the biometric template comparisons dominates the overall computational costs of a biometric identification transaction; therefore, most of the workload reduction approaches address this step of the system pipeline [21]. Two main classes of such approaches are feature transformation, which aims to reduce the computational cost of individual template comparisons (see e.g. [25]), and pre-selection, which aims to reduce the search space, i.e. the number of necessary template comparisons (see e.g. [26]). In the context of this work, the latter type of methods is of interest.
A simple and common method for database pre-filtering is the utilisation of geographic and/or demographic metadata to narrow down the potential search space (see e.g. Gehrmann et al. [27]); soft biometrics (see e.g. Dantcheva et al. [28]) can also be used in an analogous manner. The works of [29], [30], [31] rely on a concept for a two-stage retrieval. In the first step, a compact representation (e.g. binarised or with reduced dimensionality) of biometric data is used to pre-filter an enrolment database, whereas in the second step the actual biometric templates (with high discriminative power) are compared within the pre-selected short-list. Similar methods based on coarse-to-fine search, nearest-neighbour search, and clustering based on the feature sets extracted from biometric samples have also been published. In addition to such single-modal methods, a number of methods which rely on biometric information fusion (see e.g. [32], [33] for surveys on this topic) have been proposed. In [34] and [35], multi-instance binning and indexing methods were proposed for fingerprint and iris data, respectively. More generic multi-biometric indexing methods were proposed e.g. in [36], [37], [38]. Those methods relied on information fusion on the feature level in order to build some kind of an intelligent search structure (e.g. a tree). In [39], a multi-instance fingerprint indexing method which relies on rank-level fusion was proposed. A multi-biometric cascade where successively smaller candidate short-lists are created at score level was proposed in [40]. A decision-based cascade operating on the principle of sequential fusion of fingerprint and iris recognition systems was presented in [41].
Fig. 3: Overview of the proposed system
Notably absent in the aforementioned works (and indeed in a comprehensive survey of the field [21]) are approaches which rely on signal-level fusion. In the context of face recognition, signal-level fusion might be achieved with morphing (see subsection II-B). While most of the contemporary research concentrates on the vulnerability caused by morphs (i.e. average facial images derived from two or more parent images) and detection thereof, in this article they are considered from an entirely different angle: their properties are exploited to reduce the computational costs of biometric identification transactions. A short proof-of-concept for this idea was presented by Drozdowski et al. [23], who coupled signal-level fusion (morphing) with a concept for a two-stage pre-filtering and retrieval system. This article is based on and significantly extends said work both conceptually and experimentally.

B. Morphing
Image morphing is a long-standing field of research with numerous practical applications, most notably in medical imaging and special effects in the film industry [42]. In the context of biometrics, image morphing methods enable the creation of biometric samples which contain biometric information from two or more distinct data subjects. Such artificially created samples bear resemblance to the two (or more) original parent samples both in the feature and image domain. In other words, the unique link between individuals and their biometric reference data can be broken, as the subjects whose biometric samples are contained in the morphed image can both be matched (accepted) during subsequent biometric recognition transactions with the morphed reference image. Ferrara et al. [43] showed that administrative processes for issuance of biometric travel documents have vulnerabilities (in certain countries), which enable the submission of a morphed image within a passport application process. Once the (otherwise fully genuine) passport is produced, it makes it possible for multiple individuals to cross borders with biometric checks. This attack vector, dubbed the "magic passport", has been shown to be feasible against automated systems and human experts alike [44]. Since then, it has also been shown that other biometric characteristics might be susceptible to attacks which rely on variations of morphing, see e.g. [45], [46].
The facial image morphing process is relatively simple; viable and realistic morphs can be generated even by non-experts with a variety of inexpensive or free software tools [22]. A well-known case with practical relevance is that of a German activist group, "Peng!", who successfully applied for and received a passport with a morphed image of a group member and a high-ranking EU-level official [47]. A typical morphing process consists of the following steps: 1) facial landmark detection and triangulation in multiple input images, 2) landmark averaging to a single set of landmarks, 3) warping and alpha blending the information into a single output image, and 4) post-processing, such as image compression or artefact removal. Recently, morphing techniques based on generative adversarial networks (GANs) have been proposed, see e.g. Damer et al. [48]. Figure 2 shows an example of facial image morphing.
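Steps 2 and 3 of the process above can be illustrated with a minimal sketch. It assumes that landmarks have already been detected, that both images are already aligned (the warping part of step 3 is omitted), and that images are plain greyscale arrays; the function names `average_landmarks` and `alpha_blend` are hypothetical and do not correspond to any of the morphing tools discussed in this article.

```python
from typing import List, Tuple

Point = Tuple[float, float]
Image = List[List[float]]  # greyscale image as rows of pixel intensities


def average_landmarks(a: List[Point], b: List[Point], alpha: float = 0.5) -> List[Point]:
    """Step 2: blend two corresponding landmark sets into a single set."""
    if len(a) != len(b):
        raise ValueError("landmark sets must have the same length")
    return [
        ((1 - alpha) * xa + alpha * xb, (1 - alpha) * ya + alpha * yb)
        for (xa, ya), (xb, yb) in zip(a, b)
    ]


def alpha_blend(img_a: Image, img_b: Image, alpha: float = 0.5) -> Image:
    """Step 3 (blending only): pixel-wise weighted average of two aligned images."""
    if len(img_a) != len(img_b) or any(len(ra) != len(rb) for ra, rb in zip(img_a, img_b)):
        raise ValueError("images must have identical dimensions")
    return [
        [(1 - alpha) * pa + alpha * pb for pa, pb in zip(row_a, row_b)]
        for row_a, row_b in zip(img_a, img_b)
    ]


# With alpha = 0.5 both parents contribute equally to the result.
landmarks = average_landmarks([(0.0, 0.0), (2.0, 4.0)], [(2.0, 2.0), (4.0, 0.0)])
blended = alpha_blend([[0.0, 100.0]], [[200.0, 100.0]])
```

Setting `alpha` to 0.5 corresponds to the equal contribution of both parent samples that the indexing scheme in this article relies on.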
In recent years, facial image morphing has been a hot topic in the biometric research community. Significant efforts were put into the development of algorithms which can automatically and reliably detect morphed images. Survey articles by Makrushin et al. [49] and Scherhag et al. [22] provide a detailed overview of facial morph creation and detection methods. Those are, however, out of scope for this work; instead, the intention of this work is to take advantage of morphing with the goal of improving a biometric system. Preliminary works on this subject included e.g. Korshunov et al. [50], who used morphing for privacy preservation, and Drozdowski et al. [23], who conducted a proof-of-concept study on accelerating biometric identification transactions using morphing.

III. PROPOSED SYSTEM
In this section, the proposed system is described. Subsection III-A provides a conceptual overview of the proposed system. Subsections III-B and III-C describe the algorithms for retrieval in an identification transaction and selection of subjects to form morph pairs, respectively.

A. Overview
The proposed system relies on signal-level fusion (facial image morphing) to create a multi-stage retrieval structure for biometric identification transactions. The requisite changes w.r.t. a simple, exhaustive search-based biometric identification system mainly pertain to two subsystems of a generic biometric system as specified in ISO/IEC 19795-1 [51]:
Data storage subsystem: contains the biometric samples representing N enrolled subjects, from which an additional index is created through the application of signal-level fusion, whereby the resulting index-samples contain biometric information from multiple subjects (recall subsection II-B). This signal-level fusion can be performed for two or more parent samples, so that each sample contributes equally to the resulting morph. The number of parent samples contributing to a morph is referred to as "morph capacity" and denoted as n; thus, N/n morphs are created. In subsection III-C, the methods for pairing the samples to be morphed are described and analysed.
Comparison subsystem: the biometric probe is first compared against the N/n morphed samples exhaustively. A short-list of the most likely k candidates is pre-selected based on those comparison scores. On a secondary processing level, template comparisons between the biometric probe and the normal reference templates are conducted within the candidate short-list.
Figure 3 shows a conceptual overview of the proposed system. It is worth noting that the proposed system could be effortlessly combined with certain other methods of computational workload reduction, such as binning or pre-selection based on demographic or geographic attributes. More precisely, the combination with said methods could significantly reduce the computational effort required in the first step of the retrieval algorithm of the proposed system (see next subsection).
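The two subsystems described above can be sketched as follows. This is a toy illustration under strong assumptions, not the evaluated implementation: samples are stand-in numeric vectors rather than facial images, `toy_morph` (element-wise mean) stands in for a morphing tool, `toy_score` (negative Euclidean distance) stands in for a black-box comparison score, subjects are grouped in enrolment order rather than by the pairing methods of subsection III-C, and k is interpreted as the number of reference templates in the short-list. All names are hypothetical.

```python
import math
from typing import Callable, Dict, List, Sequence, Tuple

Sample = Sequence[float]
IndexEntry = Tuple[List[str], List[float]]  # (contributing subject IDs, morphed sample)


def toy_morph(parents: List[Sample]) -> List[float]:
    """Stand-in for signal-level fusion: equal-weight average of the parents."""
    return [sum(values) / len(values) for values in zip(*parents)]


def toy_score(a: Sample, b: Sample) -> float:
    """Stand-in for a black-box comparison score (higher means more similar)."""
    return -math.dist(a, b)


def build_index(refs: Dict[str, Sample], n: int) -> List[IndexEntry]:
    """Data storage subsystem: create roughly N/n morphs of capacity n."""
    ids = list(refs)
    groups = [ids[i:i + n] for i in range(0, len(ids), n)]
    return [(group, toy_morph([refs[s] for s in group])) for group in groups]


def identify(probe: Sample, refs: Dict[str, Sample], index: List[IndexEntry], k: int,
             score: Callable[[Sample, Sample], float] = toy_score) -> Tuple[str, int]:
    """Comparison subsystem: pre-select via morphs, then search the short-list."""
    n = len(index[0][0])
    # Stage 1: exhaustive comparison against all morphs.
    ranked = sorted(index, key=lambda entry: score(probe, entry[1]), reverse=True)
    comparisons = len(index)
    # Keep the best morphs so that about k reference templates are short-listed.
    shortlist = [s for group, _ in ranked[:math.ceil(k / n)] for s in group]
    # Stage 2: comparison against the reference samples in the short-list only.
    best = max(shortlist, key=lambda s: score(probe, refs[s]))
    comparisons += len(shortlist)
    return best, comparisons


refs = {f"s{i}": [10.0 * i, float(i % 2)] for i in range(8)}
index = build_index(refs, n=2)
best, comparisons = identify([51.0, 1.0], refs, index, k=2)
print(best, comparisons)  # s5 6, i.e. 6 comparisons instead of 8 for exhaustive search
```

Only the `score` callable touches the samples, which mirrors the black-box setting: the retrieval logic needs nothing beyond the ability to compute a comparison score between two samples.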

B. Retrieval
The proposed system relies on the fact that the morphed images retain enough discriminative power for the probe to exhibit better comparison scores against its correct (mated) morph than against the other (non-mated) morphs. This in turn makes it possible to robustly select a candidate short-list to be passed on to the next stage of the pipeline. The computational workload of an identification transaction, measured in terms of the number of necessary template comparisons and denoted $W_{\text{two-stage}}$, is the sum of the $N/n$ comparisons against the morphs in the first stage and the comparisons against the reference templates within the pre-selected short-list of size $k$ in the second stage. A decision space for the computational workload of the proposed system can be visualised by varying the $n$ and $k$ parameters. This is shown in figure 4. The y-axis expresses a fraction of the baseline system's computational workload (as proposed in [52]); the baseline workload is equal to 1.0. The x-axis expresses the size of the pre-selected candidate short-list as a fraction of the enrolment database size (i.e. $k/N$). The baseline (exhaustive search) does not depend on the aforementioned parameters and is therefore plotted as a horizontal line for reference. It can be seen that, provided a sufficiently small short-list (i.e. $k/N < 0.1$), the proposed system might significantly reduce the computational workload of a biometric identification transaction.
In addition to a two-stage retrieval, the concept can be extended to facilitate multiple stages. In such a scheme, the candidate short-list is successively filtered in a cascading manner (conceptually similar to the multi-stage and multi-biometric cascade of Drozdowski et al. [40]), with the number of subjects contributing to the morphs being reduced at each level. Instead of a single threshold for the short-list size, each level would require its own threshold. Such a multi-stage system depends on the following variables: $N$, the number of subjects represented in the enrolment database; $n_1$, selected from the set $\{2^x \mid x \in \mathbb{N}^+\}$ and denoting the number of subjects contributing to the morphs on the first level, thereby also determining the total number of levels of the cascade, i.e. $l = \log_2 n_1 + 1$; and the short-list size thresholds for each cascade level, i.e. $\{k_1, \ldots, k_l\}$. The computational workload in a multi-stage retrieval scenario, denoted $W_{\text{multi-stage}}$, is obtained analogously by summing the template comparisons carried out at each level of the cascade. A decision space for the computational workload of the proposed multi-stage system for a cascade with 3 levels of morphs (i.e. starting with morphs each consisting of 8 data subjects) and a final level with the actual reference images can be drawn as shown in figure 5. The x, y, and z-axes denote the short-list size (i.e. the $k_l$ parameter in relation to $N$) on each level of the cascade, whereas the colour space denotes the overall workload in relation to the exhaustive search-based baseline, whose workload equals 1.0. For clarity, only the configurations resulting in computational workload equal to or lower than the baseline are depicted. A cascade with 2 levels of morphs (i.e. starting with morphs each consisting of 4 data subjects) is also theoretically feasible, whereas a cascade with more than 3 levels of morphs does not appear theoretically feasible. This limitation is due to the prohibitively lowered discriminative power of the system, as too much information is lost by morphing so many (i.e. 16 or more) subjects with each other.
In order to reduce the computational workload of an identification transaction, the proposed system must require fewer template comparisons than a typical, exhaustive search-based one. In other words, the relation $W_{\text{proposed}} < N$ must be satisfied. From the theoretical calculations demonstrated in figures 4 and 5 for the proposed two-stage and multi-stage systems, respectively, it can be seen that there do exist configurations (based on the $k$ and $n$ parameters) which require significantly less computational workload than the baseline. In other words, provided that the discriminative power (i.e. biometric performance) of the system can be maintained, the proposed system could operate more computationally efficiently than the baseline. This trade-off between computational workload and biometric performance is evaluated empirically later on in this article.
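The workload model of this subsection can be explored numerically with a small sketch. The expressions below are a reconstruction from the textual description and should be read as assumptions rather than as the article's own equations: short-list sizes are counted in subjects, the two-stage workload is taken as $N/n + k$, and in the multi-stage case a short-list of $k_i$ subjects is assumed to be covered by $k_i / n_{i+1}$ morphs on the next level, where the morph capacity halves at each level. Only the $l - 1$ thresholds that trigger further comparisons are passed in. The function names are hypothetical.

```python
from typing import Sequence


def two_stage_workload(N: int, n: int, k: int) -> float:
    """Assumed model: N/n morph comparisons plus k reference comparisons."""
    return N / n + k


def multi_stage_workload(N: int, n1: int, ks: Sequence[int]) -> float:
    """Assumed model for a cascade starting with morphs of capacity n1.

    ks[i] is the number of subjects retained after level i + 1; the next
    level holds morphs of half the capacity (capacity 1 = reference images).
    """
    if n1 < 2 or n1 & (n1 - 1) != 0:
        raise ValueError("n1 must be a power of two greater than or equal to 2")
    levels = n1.bit_length()  # equals log2(n1) + 1
    if len(ks) != levels - 1:
        raise ValueError(f"expected {levels - 1} short-list sizes, got {len(ks)}")
    total = N / n1  # exhaustive comparison against the first-level morphs
    capacity = n1
    for k in ks:
        capacity //= 2
        total += k / capacity  # morphs (or references) covering the k subjects
    return total


N = 1024
print(two_stage_workload(N, n=4, k=51) / N)               # fraction of the baseline
print(multi_stage_workload(N, n1=8, ks=[256, 64, 16]) / N)
```

Under these assumptions the workload fraction tends towards $1/n$ (or $1/n_1$) as the short-lists shrink, which is the lower limit visible in the decision space of the two-stage system; the parameter values in the usage lines are arbitrary examples, not evaluated configurations.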

C. Selection of Morph Pairs
Deciding which parent samples to morph with each other is expected to have a non-trivial impact on the efficacy of the proposed system. This assumption is based on previous works by Damer et al. [53] and Röttcher et al. [54], who showed that morphing attacks can be improved by intelligently (i.e. based on quantitative criteria) selecting subjects to morph with each other.
Such a subject-pair selection belongs to a well-known and old class of combinatorial optimisation problems. It could be, for example, formulated as a stable roommates/marriage problem. However, in the practical experiments, this formulation ran into a number of issues regarding so-called "odd pairs" and solvability of the problem for a large number of data subjects (see [55], [56]). To circumvent this issue, instead of seeking a stable matching, one could try finding a matching based on a global (i.e. for the entire enrolment database) optimisation of a cost function, thereby allowing some poorly matched pairs (i.e. with a high cost). This formulation corresponds to the assignment problem and was successfully applied in the practical experiments.
Formally, given $S$ as the set of data subjects in the enrolment database and a weight function $C : S \times S \to \mathbb{R}^+$, the goal is to find a bijective mapping of this set to itself, $f : S \to S$, so that the cost function $\sum_{s \in S} C_{s,f(s)}$ is minimised and $\forall s \in S,\ f(s) \neq s$ (i.e. the subjects cannot be mapped to themselves). In this work, three selection methods were considered:
Random: the pairs of samples to be morphed are assigned by chance, i.e. without any kind of selection.
Soft-biometric: the pairs of samples to be morphed are assigned based on similarity in terms of soft-biometric attributes. In this work, the subject's sex, age, and skin colour were used.
Similarity-score: the pairs of samples to be morphed are selected based on similarity in terms of comparison scores computed with a facial recognition system.
For the latter two, the weight function is obtained by calculating the dissimilarity (soft-biometric or face recognition-based) scores. Thus, for an enrolment database of $N$ subjects, a square matrix is created as shown in equation 3, where $S_x$ denotes the $x$'th data subject, while $c_{x,y}$ denotes the dissimilarity score between the $x$'th and $y$'th data subject in the enrolment database (i.e. the cost of pairing the two subjects with each other). Due to the constraint that the subjects cannot be paired with themselves, the diagonal is set to $\infty$ (represented by the floating-point format size limit in the actual software implementation).

$$
\begin{array}{c|cccc}
 & S_1 & S_2 & \cdots & S_N \\ \hline
S_1 & \infty & c_{1,2} & \cdots & c_{1,N} \\
S_2 & c_{2,1} & \infty & \cdots & c_{2,N} \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
S_N & c_{N,1} & c_{N,2} & \cdots & \infty
\end{array}
\qquad (3)
$$
Using the above matrix-based formulation, this problem can be solved in polynomial time with the so-called Hungarian algorithm [57]. The expectation of such an intelligent assignment is an increased discriminative power of the first step(s) (i.e. the pre-selection) of the proposed system, whereby its overall results (biometric performance and computational workload) would be improved. For configurations where more than two data subjects contribute to the morphs in the cascade, i.e. n > 2, the pairing follows an iterative procedure. First, the pairings are computed for individual reference images as described above and two-subject morphs are created accordingly. Thereupon, the procedure is repeated for the resulting morphs (i.e. the cost matrix is re-computed and the morphing is done) in order to create morphs with four or eight contributing subjects.
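The assignment formulation can be made concrete with a small sketch. Note that it deliberately swaps in an exhaustive search over all fixed-point-free permutations instead of the Hungarian algorithm, so it is only usable for a handful of subjects; for realistic database sizes a polynomial-time solver such as `scipy.optimize.linear_sum_assignment` would be called on the same cost matrix. The dissimilarity values and the function name `best_assignment` are made up for illustration.

```python
from itertools import permutations
from typing import List, Sequence, Tuple

INF = float("inf")  # the diagonal: a subject cannot be paired with itself


def best_assignment(cost: Sequence[Sequence[float]]) -> Tuple[List[int], float]:
    """Brute-force minimiser of sum_s cost[s][f(s)] subject to f(s) != s."""
    n = len(cost)
    best_perm: List[int] = []
    best_cost = INF
    for perm in permutations(range(n)):
        if any(perm[i] == i for i in range(n)):
            continue  # skip mappings that assign a subject to itself
        total = sum(cost[i][perm[i]] for i in range(n))
        if total < best_cost:
            best_perm, best_cost = list(perm), total
    if not best_perm:
        raise ValueError("no feasible assignment (fewer than two subjects?)")
    return best_perm, best_cost


# Hypothetical dissimilarity scores between four subjects (lower = more similar).
cost = [
    [INF, 0.1, 0.8, 0.9],
    [0.1, INF, 0.7, 0.6],
    [0.8, 0.7, INF, 0.2],
    [0.9, 0.6, 0.2, INF],
]
mapping, total = best_assignment(cost)
print(mapping)  # [1, 0, 3, 2]: subjects 0 and 1 are paired, as are 2 and 3
```

In this example the optimal mapping happens to consist of mutual pairs. In general, a solution of the assignment problem is a permutation without fixed points and may contain longer cycles, so an additional step may be needed to turn it into disjoint pairs; that step is not described here and is left open in the sketch.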

IV. EXPERIMENTAL SETUP
The following subsections describe the experimental setup used for the evaluation of the proposed system. The datasets are presented in subsection IV-A, while the face recognition systems and image morphing tools are presented in subsections IV-B and IV-C, respectively. Finally, subsection IV-D specifies the evaluation metrics. All possible combinations of the face recognition and morphing tools, as well as settings of the proposed system, were evaluated. For the random pairing algorithm, a cross-validation over 10 folds was conducted. Table I summarises the counts of the individual parameters; accordingly, the total number of tested configurations was 144.

A. Datasets
The experiments were conducted using a combined database of over 1,000 subjects stemming from the FERET [58] and FRGCv2 [16] databases. A subset of images was selected based on conformance with ICAO requirements for passport images [59]. To facilitate the soft-biometric pairing method described in subsection III-C, the datasets were annotated for the following demographic attributes (where available, existing labels given by dataset providers were used directly): sex, age, skin colour. Table II gives an overview of the used datasets, while figure 6 shows example images from them. It should be noted that since the morphing process requires highly constrained images, in-the-wild datasets are not applicable in the context of the conducted research.

B. Face Recognition Systems
Three open-source face recognition systems ([60], VGGFace2 [61], and ArcFace [62]) with pre-trained models made available by the authors were utilised. Additionally, a very recent version of a strong commercial off-the-shelf (COTS) system from a well-established provider was utilised. For the used open-source systems, access to the underlying algorithms and feature representations is available, whereas the COTS system operates in a black-box manner and only allows the comparison score between two biometric samples to be computed.

C. Morphing Tools
Four tools were used to create morphed images for the experiments: the open-source OpenCV [63] and FaceMorpher [64] implemented with Python and dlib [65], as well as the proprietary FaceFusion [66] and UBO-Morpher [67]. The resulting images exhibit differences due to pre- and post-processing being applied (e.g. filters, cropping) and variations in the underlying morphing algorithms. Example images of morphs produced by the used tools are shown in figure 7. For the pair selection method described in subsection III-C, a SciPy [68] implementation of the Hungarian algorithm was used.

D. Evaluation Metrics
The proposed method and an exhaustive search-based baseline method were evaluated on two key aspects, biometric performance and computational workload, using ISO/IEC standard methods and metrics [51].

V. RESULTS
The following subsections present the results of the conducted experimental evaluations. First, in subsections V-A and V-B, experiments are conducted w.r.t. the choice of suitable face recognition and image morphing methods. The effects of subject pairing methods are evaluated in subsection V-C. Finally, the results achieved by the proposed pre-selection and overall multi-step system are presented in subsection V-D.

A. Selecting Face Recognition Systems
The systems described in subsection IV-B were first evaluated in a baseline scenario, where closed-set and open-set identification was conducted on an enrolment database of 1024 data subjects using an exhaustive search method (hence $W_{\text{Baseline}} = 100\%$). Those results are reported in table III. It can be seen that the COTS system and the open-source ArcFace system achieve high separability of the score distributions and perform well in both operation modes, whereas the biometric performance is strongly degraded for the other two open-source systems. This is especially the case for the open-set mode, which is much more challenging than closed-set. For the experimental results in the subsequent subsections, the COTS system is used along with the best open-source software system (ArcFace), which is henceforth referred to as "OSS".

B. Selecting Morphing Tools
A prerequisite for the proposed method to work is that all subjects contributing to a morphed image can be successfully matched against it. In other words, the amount of biometric information contained in the morph should be approximately equal for each contributing subject. To check if a morphing tool fulfils this prerequisite, the following experiment is conducted: a morph is created from two parent images (references) from two randomly selected subjects. Then, other images (probes) from those two subjects are compared to the created morph. The process is repeated for all data subjects and two score distributions are plotted for each of the morphing tools described in subsection IV-C and the face recognition systems chosen in subsection V-A.
Figure 8 shows the aforementioned score distributions. It can be seen that for the two tools depicted on the right side of the figure (FaceFusion and UBO-Morpher), the distributions for the two contributing subjects strongly diverge, especially for the COTS system. That is, the second subject tends to receive much worse comparison scores against the morphed image they contributed to; consequently, false negative errors would be more likely to occur for this subject in the context of the proposed multi-step system. The reason for this score divergence presumably lies in the specifics of the algorithm with which the images were morphed. More specifically, in order to avoid artefacts around the head and in the hair region, some tools only conduct morphing within the face region, whereas the surrounding area is cut in from only one (the first) of the subjects (cf. figure 7). Consequently, such morphs actually contain more biometric information from one of the subjects, which is then reflected in the differences in the comparison scores.
The distributions depicted on the left side of the figure (FaceMorpher and OpenCV) overlap perfectly for the COTS system; for the OSS system, the overlap is very tight as well, albeit slightly favouring the OpenCV method. To quantitatively confirm those observations, comparisons using the Wasserstein distance are performed. This distance metric is obtained by computing $\int_{-\infty}^{\infty} |S_1(x) - S_2(x)|\,dx$, where $S_1$ and $S_2$ denote the cumulative distribution functions for the data displayed in each of the plots in figure 8. The results of this evaluation are shown in table IV. The lowest distances between the distributions are observed for the OpenCV morphing tool for both OSS and COTS recognition systems. Based on those quantitative and qualitative results, the OpenCV tool was used for the experiments in the subsequent subsections.
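The distance used above can be computed directly from two sets of comparison scores. The following is a minimal sketch that integrates the absolute difference between the two empirical cumulative distribution functions; the function name and the example scores are made up for illustration, and `scipy.stats.wasserstein_distance` offers an equivalent library routine.

```python
from typing import Sequence


def wasserstein_distance(u: Sequence[float], v: Sequence[float]) -> float:
    """Integral of |CDF_u(x) - CDF_v(x)| over x for two empirical samples."""
    if not u or not v:
        raise ValueError("both samples must be non-empty")
    u_sorted, v_sorted = sorted(u), sorted(v)
    points = sorted(set(u_sorted) | set(v_sorted))
    total = 0.0
    i = j = 0  # number of values <= the left edge of the current interval
    for left, right in zip(points, points[1:]):
        while i < len(u_sorted) and u_sorted[i] <= left:
            i += 1
        while j < len(v_sorted) and v_sorted[j] <= left:
            j += 1
        # Both empirical CDFs are constant on [left, right).
        total += abs(i / len(u_sorted) - j / len(v_sorted)) * (right - left)
    return total


# Hypothetical scores of the two contributing subjects against their morph.
subject_1 = [0.0, 1.0, 3.0]
subject_2 = [5.0, 6.0, 8.0]
print(wasserstein_distance(subject_1, subject_2))
```

A distance close to zero indicates that both contributing subjects obtain similarly distributed scores against the morph, which is the property required of a morphing tool in this context.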

C. Impact of Pairing Algorithm and Morph Capacity
The choice of the image pairing method and the number of subjects contributing to a morph has a non-trivial impact on the system. As demonstrated in figure 9, by intelligently selecting the pairs, the score distribution for mated-morph comparisons moves towards the distribution of mated scores. The intuitive soft-biometric method performs slightly better than random selection of pairs, whereas significant improvements are observed for the method based on similarity-scores. The aforementioned figure also shows the expected effect of increasing the capacity of the morphs: as more subjects are morphed into one image, it tends towards an average face, thereby decreasing the discriminative power. For the tested face recognition systems, it appears that morph capacities beyond 8 are unlikely to be feasible for use in the proposed multi-step retrieval. To further evaluate the impact of the two aforementioned factors, closed-set identification experiments were carried out. The resulting CMC curves are shown in figure 10. It can be observed that the curves quickly converge to 100% HR for morph capacities of 2 and 4 subjects; in other words, only a very small short-list size would be required to avoid pre-selection errors. For the morph capacity of 8 subjects, the curves reach the 100% point much later, i.e. larger short-lists would be required. This does not necessarily disqualify configurations with such a morph capacity, as the overall computational workload depends not only on the size of the pre-selected short-list (see subsection III-B). Furthermore, such morphs could potentially be used in the multi-step system or for applications where some pre-selection errors might be tolerable. This aspect of the trade-off between biometric performance and computational workload is addressed in more detail in the next subsection. From figure 9, the impact of the pairing method is very clear: vast improvements can be observed when moving from random pairing to pairing based on soft-biometrics, and even more so when the pairing is based on similarity-scores. Both COTS and OSS face recognition systems perform well, with the COTS system being the better of the two, which is to be expected intuitively and based on the baseline results from subsection V-A.
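To make the relation between the CMC curves and the required short-list size concrete, the following minimal sketch computes the hit rate for every short-list size from a matrix of probe-versus-morph comparison scores. It assumes similarity scores (higher means more similar) and that the index of the morph containing each probe subject is known; the function name `cmc_curve` and the toy scores are hypothetical.

```python
import numpy as np


def cmc_curve(scores, mated_morph):
    """Hit rate for short-list sizes 1..n_morphs.

    scores      -- array (n_probes, n_morphs) of similarity scores
    mated_morph -- for each probe, index of the morph it contributed to
    """
    scores = np.asarray(scores, dtype=float)
    mated_morph = np.asarray(mated_morph)
    n_probes, n_morphs = scores.shape
    mated_scores = scores[np.arange(n_probes), mated_morph]
    # Rank of the mated morph: 1 + number of morphs scoring strictly higher.
    ranks = 1 + np.sum(scores > mated_scores[:, None], axis=1)
    # A probe is a "hit" at short-list size r if its mated morph ranks <= r.
    return np.array([(ranks <= r).mean() for r in range(1, n_morphs + 1)])


# Toy example: three probes compared against three morphs.
toy_scores = [[0.9, 0.1, 0.2],
              [0.3, 0.8, 0.5],
              [0.6, 0.7, 0.2]]
toy_mated = [0, 2, 0]
hit_rates = cmc_curve(toy_scores, toy_mated)
```

In this toy case the curve reaches a hit rate of 1.0 at a short-list size of 2, i.e. pre-selecting the two highest-scoring morphs would avoid all pre-selection errors.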

D. Overall Results
Table V summarises the results of the pre-selection in a two-step retrieval system for several commonly reported levels of hit rate. Since the computational workload is reduced relative to the baseline by using the proposed method (cf. table III), it follows that $0\% < W_{\text{Proposed}} < 100\%$. For HR > 99%, the lowest computational workload is achieved with pairing based on similarity-scores and a morph capacity of 4 contributing subjects for the COTS face recognition system. For the OSS face recognition system, its lower discriminative power has to be mitigated by selecting a lower morph capacity of 2. At the stringent 99.5% or even 100% HR level, both the OSS and COTS systems approach their respective lower limits (cf. figure 4) of computational workload for the selected morph capacities, i.e. approximately 1/2 and 1/4 of the baseline workload, respectively.
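The lower limits mentioned above can be illustrated with a simplified cost model. The sketch below assumes that every morph is compared once in the first stage and that each short-listed morph then triggers comparisons against all of its contributing reference images; this model is an assumption made for illustration (the exact formulation is given in subsection III-B), and the function and parameter names are hypothetical.

```python
def two_stage_workload(n_enrolled, capacity, shortlist_size):
    """Workload of a two-stage retrieval relative to exhaustive search.

    n_enrolled     -- number of enrolled subjects (baseline comparisons)
    capacity       -- number of subjects contributing to each morph
    shortlist_size -- number of morphs kept after the first stage
    """
    n_morphs = -(-n_enrolled // capacity)  # ceiling division
    stage_one = n_morphs  # probe compared against every morph
    # Probe compared against all references behind the short-listed morphs.
    stage_two = min(shortlist_size, n_morphs) * capacity
    return (stage_one + stage_two) / n_enrolled


# Hypothetical database of 1000 subjects.
limit_capacity_2 = two_stage_workload(1000, 2, 0)   # lower limit, 1/2
limit_capacity_4 = two_stage_workload(1000, 4, 0)   # lower limit, 1/4
with_shortlist = two_stage_workload(1000, 4, 10)    # (250 + 40) / 1000
```

Under this assumed model, the workload approaches 1/capacity when the short-list is small relative to the database, which matches the limits of approximately 1/2 and 1/4 quoted for morph capacities of 2 and 4.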
Table VI summarises the overall results of the best configuration of the proposed system in open-set and closed-set identification scenarios. For both COTS and OSS face recognition, the biometric performance of the baseline method (cf. table III) is maintained, while the proposed system allows for the computational workload to be reduced by approximately 70% and 50% for COTS and OSS face recognition systems, respectively. Furthermore, for the COTS system, a slight improvement (lower workload) from the two-step to the multi-step retrieval method can be observed.

VI. DISCUSSION
In subsection VI-A, the proposed system is compared with existing methods of computational workload reduction. Subsection VI-B discusses the scalability of the proposed system.

A. Comparison with Existing Systems
While methods achieving an even greater computational workload reduction than the proposed method have appeared in the academic literature (see [21]), they require access to more information or are less flexible than the proposed method. More specifically, such approaches often utilise the feature vectors of the underlying facial recognition system or additional (potentially error-prone) classifiers; furthermore, other prerequisites such as extensive training or simple template comparators may limit their flexibility. For this reason, a direct benchmark against such methods would be cumbersome, as they focus on different aspects of the system. More concretely, the proposed method focuses on a balance between computational workload reduction and high flexibility irrespective of the used face recognition system, whereas the related works tend to focus on maximising the computational workload reduction by being strongly interwoven with the underlying biometric recognition system. Consequently, the vast majority of the approaches proposed in the literature cannot be combined with COTS recognition systems, which operate as black boxes.

B. Scalability
The concepts underlying the proposed indexing scheme can, in theory, be scaled seamlessly with the growing size of the enrolment database and can be trivially parallelised or distributed. Furthermore, the proposed retrieval algorithm features a flexible design which facilitates dynamic adjustment of decision thresholds and setting pre-selection subset sizes relative to the enrolment database size. One important consideration w.r.t. the scalability would be the increased computational costs of the pairing algorithm (subsection III-C); those could, however, be mitigated by distributing the computation or by additional bucketing of the enrolment database. On the other hand, a larger enrolment database would likely increase the probability of finding suitable pairings for all the subjects (especially for the outliers). This in turn would be expected to improve the quality of the morphing process and the retrieval algorithm. Thus, the proposed methods can be expected to scale both in terms of biometric recognition performance and computational efficiency.
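The idea of setting the pre-selection subset size relative to the enrolment database size can be sketched as follows. This is a minimal, simplified two-step retrieval under stated assumptions and not the authors' implementation: the comparator `compare` stands in for a black-box recognition system that returns similarity scores, and all names, the `shortlist_fraction` parameter, and the one-dimensional toy "templates" are hypothetical.

```python
import math


def two_step_retrieve(probe, morphs, members, references, compare,
                      shortlist_fraction=0.1):
    """Return (best subject id, best score, number of comparisons).

    morphs     -- list of morph templates
    members    -- members[i] lists the subject ids contributing to morphs[i]
    references -- dict mapping subject id to reference template
    compare    -- black-box comparator returning a similarity score
    """
    # Short-list size grows with the number of stored morphs.
    k = max(1, math.ceil(shortlist_fraction * len(morphs)))
    # Step 1: compare the probe against every morph and keep the top k.
    morph_scores = [compare(probe, morph) for morph in morphs]
    order = sorted(range(len(morphs)), key=lambda i: morph_scores[i],
                   reverse=True)
    n_comparisons = len(morphs)
    # Step 2: compare only against references behind short-listed morphs.
    best_id, best_score = None, -math.inf
    for i in order[:k]:
        for subject_id in members[i]:
            score = compare(probe, references[subject_id])
            n_comparisons += 1
            if score > best_score:
                best_id, best_score = subject_id, score
    return best_id, best_score, n_comparisons


# Toy data: scalar "templates" and a negative-distance comparator.
toy_references = {0: 0.0, 1: 1.0, 2: 10.0, 3: 11.0}
toy_morphs = [0.5, 10.5]
toy_members = [[0, 1], [2, 3]]
result = two_step_retrieve(10.9, toy_morphs, toy_members, toy_references,
                           lambda a, b: -abs(a - b), shortlist_fraction=0.5)
```

Because the first step is a plain loop over independent comparisons, it could be split across workers or database shards without changing the result, which is the sense in which the scheme parallelises trivially. For open-set operation, the returned best score would additionally be checked against a decision threshold.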
As mentioned in subsection IV-A, the proposed method requires somewhat constrained data of certain quality. While directly testing the scalability of the proposed method with a larger dataset would certainly be of interest, currently no large-scale, publicly available datasets of facial images with sufficient image quality exist. Images in well-known large-scale datasets, e.g. LFW [70] or MegaFace [71], do not generally possess a sufficient image quality for the proposed method. While this can certainly be considered a limitation of the proposed method, it is worth noting that many existing biometric systems, e.g. border control, operate with high-quality images, thereby making the application of the proposed method theoretically feasible (see section VII for more details).

VII. CONCLUSION
In this article, facial image morphing, which constitutes a serious attack vector against biometric systems, has been reconceptualised for use in a beneficial manner. Specifically, a method of indexing biometric data with signal-level fusion has been presented. The proposed method relies on intelligent pairing and morphing of facial parent images to facilitate a multi-step retrieval for biometric identification transactions. In a comprehensive experimental evaluation with open-source and commercial systems, the proposed method has been shown to achieve a biometric performance nearly identical to that of an exhaustive search-based baseline, while simultaneously substantially reducing the computational workload of biometric identification transactions (down to ∼30% at 100% HR).
In contrast to related works, the proposed method could be effortlessly integrated even into black-box biometric recognition systems, as it merely requires access to the raw biometric samples (facial images) and the comparison scores. One limitation of the proposed method is the requirement of good-quality reference images. This notwithstanding, numerous operational systems could benefit from the proposed method, as they store or process images compliant with very strict quality standards [59], e.g. biometric samples used in passports or visa applications. Furthermore, the enrolment of new subjects into an existing system would require a periodic re-computation of the index.
Although out of scope for this article, the developed concepts may also be applied and benchmarked in conjunction with a feature-level fusion in the context of white-box systems, which would constitute an interesting item of future research.

Fig. 4: Computational workload per identification transaction of the proposed two-stage (1 stage with morphed images and 1 stage with reference images) system in relation to an exhaustive search-based baseline system. The parameter k is implicitly included in the calculation of x-axis values.

Fig. 5: Computational workload per identification transaction of the proposed multi-stage system (3 stages with morphed images and 1 stage with reference images) in relation to an exhaustive search-based baseline system. The parameters $k_l$ are implicitly included in the calculation of the axis values corresponding to their level ($l$) in the multi-stage system.
in a morphed image (subsection III-B): 3
Image pairing method for morphing (subsection III-C): 3
Face recognition system (subsection IV-B): 4
Image morphing tool (subsection IV-C): 4

Fig. 6: Example images from the used datasets

Fig. 7: Example morphs produced by the used tools (images in subfigures (a) and (b) taken from FRGCv2)

Fig. 8: KDEs of comparison score distributions between a morph and subjects contributing to it. Note the divergences between the distributions for some of the morphing tools.

TABLE II: Overview of the used datasets' subsets

TABLE III: Baseline results

TABLE IV: Wasserstein distances between the distributions depicted in figure 8

TABLE V: Summary of the pre-selection results (best result for each HR level for COTS and OSS face recognition marked in bold typeface)

TABLE VI: Summary of the proposed system results