Unimodal-Bio-GAN: Keyless biometric salting scheme based on generative adversarial network

Cancellable biometrics enables robust authentication systems by storing a secured, transformed version of the biometric template instead of the original one. A technique called biometric salting uses a parameter (key) and an invertible function to transform human biometric features into a secured format that can be stored safely in a biometric database system. The salting key plays a central role in this transformation and largely determines whether it is robust or vulnerable to security attacks. One of the main challenges currently facing biometrics researchers is how to design and protect such a salting key while balancing two basic measures: security and recognition accuracy. In this article, we propose unimodal-Bio-GAN, a reliable keyless biometric salting technique based on a standard generative adversarial network (GAN). In unimodal-Bio-GAN, a randomly permuted version of the human biometric data is implicitly used as the salting key and is required only during the enrolment stage, which increases the system's reliability against different security attacks. Experimental results of unimodal-Bio-GAN on the CASIA-IrisV3-Interval database outperform several previous salting methods, and its security is analysed against different attack types.


| INTRODUCTION
As we live in an age of technological revolution, security problems have become more widespread and challenging. Biometric techniques rely on extracting unique features of human traits (physiological or behavioral), such as fingerprint, voice, iris, handwritten signature, and face, to replace traditional (token/password) authentication systems and overcome their security challenges [1]. While human biometrics cannot be lost, shared, or forgotten like traditional credentials, they can be stolen if stored explicitly in a system database. Therefore, a transformed version of the biometric data is required to protect it, since biometric features cannot be changed or revoked [2]. In cancellable biometrics, there are two techniques to protect biometric features and transform them into a secured format: invertible (salting) and non-invertible schemes [3]. Non-invertible schemes rely on a non-invertible function to transform the biometric traits; they achieve high authentication security but suffer some degradation in recognition accuracy [4]. On the other hand, salting schemes use an invertible function and an additional parameter, called the salting key, to transform the biometric template [2,4]. While salting techniques achieve high recognition accuracy, their dependency on the salting key introduces security challenges [4].
To overcome the security challenges introduced by using a salting key for biometric data transformation, techniques such as biohashing [5,6] and the Bloom filter [7][8][9][10] hide the key values in a physically secured token. Biohashing uses a random key to compute a random projection of the biometric traits. The Bloom filter salting scheme generates many transformations of the same biometric traits by varying the values of a user-specific random key called a filter. While biohashing achieves high recognition accuracy and the Bloom filter accelerates the comparison of transformed biometrics, both are vulnerable to security attacks if the token is stolen [11][12][13].
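To make the token-based idea concrete, the following is a minimal sketch of a biohashing-style transform: a random projection seeded by the user's token, followed by thresholding. It is illustrative only; the function name `biohash`, the QR orthonormalization, and the zero threshold are assumptions, not necessarily the exact procedure of [5,6].

```python
import numpy as np

def biohash(features: np.ndarray, token_seed: int, n_bits: int) -> np.ndarray:
    """Hypothetical biohashing-style transform: token-seeded random projection + thresholding.

    Assumes n_bits <= features.size so that the projection vectors can be orthonormalized.
    """
    rng = np.random.default_rng(token_seed)         # the user's token fixes the projection
    basis = rng.standard_normal((features.size, n_bits))
    q, _ = np.linalg.qr(basis)                       # orthonormal projection vectors (columns of q)
    projected = features @ q                         # inner product with each projection vector
    return (projected > 0).astype(np.uint8)          # threshold at 0 to obtain a bit string

rng = np.random.default_rng(0)
x = rng.standard_normal(64)                          # stand-in real-valued feature vector
code_a = biohash(x, token_seed=1234, n_bits=32)
code_b = biohash(x, token_seed=1234, n_bits=32)      # same token: same code
code_c = biohash(x, token_seed=9999, n_bits=32)      # new token: re-issued, different code
print(np.array_equal(code_a, code_b), np.mean(code_a != code_c))
```

Re-issuing a template only requires a new token seed, which is also why a stolen token weakens such schemes.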
To solve the problems introduced by a stolen token, the bioencoding technique was developed to transform the biometric data using a non-secured (public) salting key stored in the authentication system database [3,14,15]. Bioencoding uses the stored public key to map a fixed-size iris template into a transformed one, called a 'bio-code', using a Boolean function. Although reconstructing the genuine iris template from the bio-code by random guessing is computationally hard, a non-genuine iris template can be constructed from the stored public key and the transformed bio-code; this template then acts like a fake legitimate code. This attack is known as the 'pre-image attack' [16,17].
More recently, another category of salting schemes was developed to overcome the threat of pre-image attacks by storing the salting key as a hidden key. Mayada et al. [18,19] proposed a salting scheme that avoids pre-image attacks by hiding the salting key in a bidirectional associative memory (BAM) model. Although that scheme overcomes pre-image and correlation attacks, the recognition accuracy of the authentication system depends mainly on the utilized salting key rather than on the biometric data.
In light of these limitations, we propose a biometric salting scheme that depends mainly on the biometric data rather than on an external salting key. The salting key is extracted from the input biometric data during the enrolment phase and is then used to produce the cancellable template for each enrolled user. Authentication is performed using only a test sample of the biometric data without any external key, which avoids the security vulnerabilities of external keys. The proposed scheme is considered the first cancellable salting scheme that depends only on the biometric traits for authorizing users. There is no need to store the salting key in a token, store it publicly, or hide it in the authentication system database. The proposed scheme uses a generative adversarial network (GAN) [20] to protect the biometric data: the GAN acts as a transformation function for the biometric traits, using a randomly permuted version of the biometric traits themselves. This permuted version acts as the salting key and is needed only during the enrolment phase.
For each enrolled person, a randomly permuted version of his or her biometric traits is fed into the input of the GAN discriminator network, and samples of the same biometric traits are fed into the input of the GAN generator network. Because the GAN model is reversible [21,22], the output produced by the generator network after training is binarized and Xor-ed with the mean of the input biometric samples to generate the genuine cancellable template for that person. Finally, the generated binary cancellable template and the generator network's weight values are stored in the system database for later use in the authentication phase. In the authentication phase, a test biometric sample of the person is fed into the generator network; the output is binarized and Xor-ed with the input sample to generate a test cancellable template. Finally, the test cancellable template is compared with the genuine cancellable template of the claimed person to verify the claimed identity.
This article is organized as follows: Section 2 provides background on GAN, Section 3 introduces the proposed scheme, Section 4 presents its security analysis, Section 5 evaluates its recognition accuracy through simulated experiments on unimodal iris biometric traits, and Section 6 concludes with the main results.

| GENERATIVE ADVERSARIAL NETWORK
A generative adversarial network (GAN) is an advanced machine learning technique used as a generative model based on a neural mathematical framework [20]. A GAN uses two competing networks (a generator and a discriminator) to produce realistic generated results [21]. GANs are widely used to synthesize realistic images from randomly generated noise inputs. Real image samples are fed into the discriminator network, while random noise is fed into the generator network. The output of the generator network is fed into the discriminator network, which decides whether it is a real image or not; this decision is used to update the generator network's weights. After a certain number of training epochs for both networks, realistic synthesized images are generated. A standard GAN is trained by updating the generator and discriminator alternately using the traditional backpropagation algorithm.
As shown in Figure 1, the generator network takes noise samples $z \sim p(z)$ as inputs and produces a generated output $G(z)$, whose distribution should be close to that of the real data (the discriminator inputs). The discriminator network learns to distinguish the real data $x$ from the generated output $G(z)$ [23]. The objective function of the standard GAN training process is formalized in Equation (1) [22,23]:

$$\min_{G}\max_{D} V(D,G) = \mathbb{E}_{x \sim p_{\text{data}}(x)}\left[\log D(x)\right] + \mathbb{E}_{z \sim p(z)}\left[\log\left(1 - D(G(z))\right)\right] \qquad (1)$$

where G represents the generator mapping function, D represents the discriminator transfer function, x denotes the real data drawn from the data distribution $p_{\text{data}}$, and z denotes the random noise data.
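Equation (1) can be evaluated numerically for a batch of discriminator outputs. The sketch below is a minimal Monte Carlo estimate of $V(D,G)$; the helper name `gan_value` and the clipping constant are illustrative assumptions.

```python
import numpy as np

def gan_value(d_real: np.ndarray, d_fake: np.ndarray, eps: float = 1e-12) -> float:
    """Monte Carlo estimate of V(D, G) in Equation (1).

    d_real: discriminator outputs D(x) on a batch of real samples x
    d_fake: discriminator outputs D(G(z)) on a batch of generated samples
    """
    d_real = np.clip(d_real, eps, 1 - eps)   # avoid log(0)
    d_fake = np.clip(d_fake, eps, 1 - eps)
    return float(np.mean(np.log(d_real)) + np.mean(np.log(1 - d_fake)))

# A discriminator that cannot tell real from generated data outputs 0.5 everywhere,
# giving V = 2 * log(0.5) = -log(4), the value at the theoretical optimum in [20].
print(gan_value(np.full(8, 0.5), np.full(8, 0.5)))  # approx -1.386
```

A confident discriminator (D(x) near 1, D(G(z)) near 0) pushes V towards 0, which is what the generator works against during training.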
Recently, GAN models have been used to synthesize realistic biometric images from random samples, such as iris [23], face [24], and fingerprint [25] images. Moreover, GAN models have been used for security purposes; in [26], a GAN is used to protect the stored biometric template by generating a permuted-indexing cancellable biometric.

| THE PROPOSED METHODOLOGY
The proposed methodology is inspired by the standard GAN model, which is used here to develop a keyless salting cancellable biometric scheme. Unlike traditional salting techniques, which require an external salting key to produce a cancellable template, the proposed salting scheme generates the cancellable template using only the biometric trait samples.
The proposed scheme uses the standard GAN model to produce a transformed version of the biometric data. Since the standard GAN architecture requires two types of input data (real and random samples) [20], the real data in the proposed Bio-GAN model is represented by a permuted version of the biometric data, while the random data is represented by the set of training biometric samples. As shown in Figure 2, the generator network creates candidate transformed templates from the input biometric samples, while the discriminator network evaluates them. The objective of the discriminator network is to distinguish the generator's transformed templates from the permuted biometric data, while the objective of the generator network is to create transformed templates close to the permuted biometric data. Based on the discriminator's decisions, the weights of both competing networks are updated.
The proposed architecture consists of enrolment and authentication phases. The enrolment phase builds and trains the GAN model and ends by generating the cancellable template, while the authentication phase uses the trained generator network to create a transformed template for test data and verify it. Figure 3 illustrates the stages of the enrolment phase of the proposed salting scheme, which consists of two main stages for each enrolled person. The first stage uses the GAN model to produce a transformed version of the biometric traits with the generator network. As shown in Figure 3, given m biometric trait samples for each enrolled person, a biometric template of size 1 × n is generated for each input sample by the feature extraction process. During GAN training, these templates are used as inputs to the generator network. The output of the generator network (a template of size 1 × n) is used as input to the discriminator network.
Using the same feature extraction process, another biometric template is generated from any selected sample among the input biometric trait samples. To produce a salting key for the enrolled person, this template is randomly permuted and used as the other input to the discriminator network. In each training epoch, the multi-layer discriminator network updates its weights using the traditional backpropagation learning algorithm [20], and the discriminator's output decision is used to update the generator network's weights with the same training algorithm. After a certain number of training epochs for the generator and discriminator networks, the output produced by the generator network is used to create the cancellable template for the enrolled person. Finally, the generator network's final weights are stored in the system database.
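The first enrolment stage can be sketched as follows. This is a minimal NumPy illustration under stated assumptions, not the authors' implementation: the generator is a single dense sigmoid layer, the discriminator is a single logistic unit (the paper uses a multi-layer discriminator), the generator uses the common non-saturating loss, and the function name `train_bio_gan`, the epoch count, and the learning rate are hypothetical.

```python
import numpy as np

def sigmoid(a: np.ndarray) -> np.ndarray:
    return 1.0 / (1.0 + np.exp(-a))

def train_bio_gan(samples: np.ndarray, key: np.ndarray, epochs: int = 50,
                  lr: float = 0.05, seed: int = 0):
    """Hypothetical sketch of the first enrolment stage for one user.

    samples: (m, n) binary templates fed to the generator (the 'random' input)
    key:     (n,) randomly permuted template fed to the discriminator (the 'real' input)
    Returns the trained generator weights (W_g, b_g).
    """
    rng = np.random.default_rng(seed)
    m, n = samples.shape
    S, r = samples.astype(float), key.astype(float)
    Wg, bg = rng.normal(0.0, 0.1, (n, n)), np.zeros(n)  # generator: one dense sigmoid layer (assumption)
    wd, bd = rng.normal(0.0, 0.1, n), 0.0               # discriminator: one logistic unit (assumption)
    for _ in range(epochs):
        O = sigmoid(S @ Wg + bg)                        # generator outputs G(s), shape (m, n)
        # Discriminator step: minimise -log D(key) - mean(log(1 - D(G(s))))
        d_real, d_fake = sigmoid(r @ wd + bd), sigmoid(O @ wd + bd)
        grad_w = (d_real - 1.0) * r + (d_fake[:, None] * O).mean(axis=0)
        grad_b = (d_real - 1.0) + d_fake.mean()
        wd, bd = wd - lr * grad_w, bd - lr * grad_b
        # Generator step (non-saturating loss): minimise -mean(log D(G(s)))
        d_fake = sigmoid(O @ wd + bd)                   # re-score with the updated discriminator
        dO = ((d_fake - 1.0) / m)[:, None] * wd[None, :]  # dLoss/dO
        dH = dO * O * (1.0 - O)                         # back through the output sigmoid
        Wg, bg = Wg - lr * (S.T @ dH), bg - lr * dH.sum(axis=0)
    return Wg, bg

# Usage: five noisy captures of one user's toy 64-bit template
rng = np.random.default_rng(42)
n, m = 64, 5
base = rng.integers(0, 2, n)
samples = np.array([np.where(rng.random(n) < 0.1, 1 - base, base) for _ in range(m)])
key = rng.permutation(samples[0])                       # salting key: permuted copy of one sample
Wg, bg = train_bio_gan(samples, key)
```

The two networks are updated alternately, as in the standard GAN of Section 2. Keeping `epochs` and `lr` small mirrors the experimental choice, described in the experimental results, of limiting how closely the generator output approaches its target.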
Although the transformed template produced by the generator network could be used as the final cancellable template, the original biometric traits could easily be revealed if the transformed template were publicly stored in the system database along with the generator network's final weight values. This security attack is possible because of the reversibility property of the GAN model [21,22]. Accordingly, a second stage is introduced to secure the stored cancellable template by producing a non-invertible cancellable template, which protects against system database attacks and also improves system performance. A simple XOR binding operation is used as an extra transformation level: the irreversible XOR operation binds a binarized version of the transformed template produced by the GAN with another binarized template computed from the mean of all the biometric training templates fed to the generator. The binary output of the XOR operation is stored in the system database as the final cancellable template. The enrolment process of the proposed scheme is given in Algorithm 1.

Figure 2: Bio-GAN scheme architecture. GAN, generative adversarial network.
TAREK ET AL.
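The second enrolment stage reduces to a threshold and an XOR. The sketch below assumes that the mean template is binarized with the same 0.5 threshold as the generator output (the exact fusion rule, Equations (2) and (3), is not reproduced here); the function name `make_cancellable_template` is hypothetical.

```python
import numpy as np

def make_cancellable_template(gen_output: np.ndarray, train_templates: np.ndarray) -> np.ndarray:
    """Hypothetical second enrolment stage: bind the binarized GAN output with a reference template.

    gen_output:      (n,) real-valued generator output in [0, 1]
    train_templates: (m, n) binary templates used to train the generator
    """
    g_bin = (gen_output >= 0.5).astype(np.uint8)                        # binarize GAN output at 0.5
    reference = (train_templates.mean(axis=0) >= 0.5).astype(np.uint8)  # binarized mean (assumed rule)
    return np.bitwise_xor(g_bin, reference)  # final cancellable template, stored with W_g

gen_output = np.array([0.9, 0.1, 0.6, 0.4])
train_templates = np.array([[1, 0, 0, 1], [1, 1, 0, 1], [1, 0, 1, 1]])
print(make_cancellable_template(gen_output, train_templates))  # [0 0 1 1]
```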

Algorithm 1 Unimodal-Bio-GAN Enrollment
12: Store $W_g$ and $T_{ref}$ in the authentication system database.

Figure 3: Enrollment phase for the proposed Bio-GAN scheme. GAN, generative adversarial network.
During the authentication phase, the proposed scheme requires only a test biometric sample to verify the identity claimed by the input person. As shown in Figure 4, a test template is first generated through the feature extraction process. This test template is the input vector for the trained generator network and is fed forward through the generator network using the stored weights. The computed output of the generator network is binarized and Xor-ed with the input test template to create the test cancellable template. Finally, the authentication decision is made by comparing the test cancellable template with the stored template of the claimed identity.
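The verification steps above can be sketched as follows. This assumes the same single-layer sigmoid generator as a stand-in for the trained network, a fractional Hamming distance as the comparison score, and a hypothetical `threshold` parameter; it is not the authors' code.

```python
import numpy as np

def sigmoid(a: np.ndarray) -> np.ndarray:
    return 1.0 / (1.0 + np.exp(-a))

def authenticate(test_template, Wg, bg, stored_template, threshold):
    """Hypothetical verification: test template -> generator -> binarize -> XOR -> Hamming distance."""
    out = sigmoid(test_template.astype(float) @ Wg + bg)       # feed forward with stored weights
    g_bin = (out >= 0.5).astype(np.uint8)                        # binarize generator output
    test_cancellable = np.bitwise_xor(g_bin, test_template.astype(np.uint8))
    hd = float(np.mean(test_cancellable != stored_template))     # fractional Hamming distance
    return hd <= threshold, hd

# Toy check with a generator that always outputs values close to 1
n = 8
t = np.array([1, 0, 1, 1, 0, 0, 1, 0], dtype=np.uint8)
Wg, bg = np.zeros((n, n)), np.full(n, 5.0)
accepted, hd = authenticate(t, Wg, bg, stored_template=1 - t, threshold=0.3)
print(accepted, hd)  # True 0.0
```

No salting key appears anywhere in this step; only the stored generator weights and cancellable template are needed.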

| SECURITY ANALYSIS
This section analyses the proposed keyless salting scheme from a security point of view. Any cancellable biometric scheme must satisfy the following properties: diversity, recoverability, and non-invertibility. Moreover, it must resist pre-image and correlation attacks [1,2]. The following subsections analyse each property in detail.

| Recoverability and diversity
Since biometric traits are stored in a biometric system database, a cancellable biometric scheme is said to be recoverable if, once the system database is compromised, a new template can be issued from the same biometric traits to replace the compromised one. This is achieved by creating a new version of the biometric template using a new salting key value [1]. Since the proposed salting scheme creates the salting key from a random permutation of the biometric template, many versions of the salting key can be generated from the same biometric traits. Therefore, the proposed scheme satisfies the recoverability property, with n! possible permutations for a biometric template of size n.
Similarly, for a biometric template of size n, the proposed scheme can generate n! different cancellable templates from the same biometric traits using n! different salting key values. In other words, each person can be enrolled in multiple authentication systems with a different cancellable biometric template for each system. Therefore, the proposed scheme satisfies the diversity property across different biometric authentication systems for the same biometric traits.
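The n! figure counts permutations of the template positions. Because the template is binary, permutations that only swap equal bits give the same key, so a template with k ones yields $\binom{n}{k}$ distinct permuted vectors. Both counts are astronomically large for the 9600-bit templates used in the experiments, as the sketch below shows on a log10 scale; the balanced choice k = n/2 is an illustrative assumption.

```python
import math

def log10_factorial(n: int) -> float:
    """log10(n!) computed via the log-gamma function to avoid huge integers."""
    return math.lgamma(n + 1) / math.log(10)

def log10_binom(n: int, k: int) -> float:
    """log10 of C(n, k), the number of distinct binary vectors with k ones."""
    return (math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)) / math.log(10)

n = 9600      # template length used in the experiments
k = n // 2    # assumed number of ones in the binary template (illustrative)
print(f"permutations n!                  ~ 10^{log10_factorial(n):.0f}")
print(f"distinct permuted vectors C(n,k) ~ 10^{log10_binom(n, k):.0f}")
```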

| Non-invertibility
A salting scheme is said to be non-invertible if the original biometric traits cannot be recovered from the parameters stored in the authentication system database. The stored parameters of the proposed scheme are the generator's weight values and the cancellable template. The use of neural networks to secure biometric data has been analysed mathematically in previous work [19] with respect to the non-invertibility property; that analysis concludes that, given only the network's weights without its input and output, it is computationally hard to recover either of them. Since the output of the generator network in the proposed scheme is binarized and Xor-ed with another template before being stored in the system database, an attacker cannot recover the exact output of the generator network, especially when a non-linear multilayer computation is used for the training process [18]. Furthermore, because the inputs of an XOR operation cannot be determined from its output (the cancellable template) alone, recovering the generator network's inputs from the stored cancellable template is computationally hard for any attacker. In summary, since the exact output of the generator network is not explicitly stored in the system database and the stored Xor-ed cancellable template cannot be decoded to reveal its inputs, an attacker cannot use the stored generator weights and cancellable template to recover the biometric data; hence, the proposed scheme satisfies the non-invertibility property.

Figure 4: Authentication phase for the proposed Bio-generative adversarial network scheme.
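The XOR argument can be illustrated with a toy example: for a stored output T, every possible value of one input determines a matching value of the other, so T alone is consistent with $2^{n}$ input pairs. This is a minimal sketch with a hypothetical 16-bit template and a hypothetical helper name; it does not model the generator network.

```python
import numpy as np

def implied_partner(output: np.ndarray, guess: np.ndarray) -> np.ndarray:
    """Given an XOR output and ANY guess for one input, return the unique other input."""
    return np.bitwise_xor(output, guess)

rng = np.random.default_rng(7)
n = 16
stored = rng.integers(0, 2, n, dtype=np.uint8)         # observed XOR output (stored template)
for _ in range(3):                                     # every guess yields a consistent pair
    guess = rng.integers(0, 2, n, dtype=np.uint8)
    partner = implied_partner(stored, guess)
    assert np.array_equal(np.bitwise_xor(guess, partner), stored)
print(f"{2 ** n} input pairs map to the same {n}-bit output")  # 65536 input pairs map to the same 16-bit output
```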

| Pre-image and correlation attacks
The pre-image attack means creating a non-genuine biometric feature (fake biometric) from the parameters stored in a biometric system database and using it as a real one in any authentication system [17,18]. To gain unauthorized access, the attacker tries to create a test cancellable biometric template that is similar enough to the stored real one. As shown in the previous subsection, the stored parameters do not allow an attacker to recover the biometric data, so constructing a pre-image biometric template fails. Accordingly, the pre-image attack is as computationally hard as random guessing (a brute-force attack): the attacker requires at most $2^{n}$ trials to generate a matching biometric template for the system, where n is the length of the binary biometric template. Therefore, the larger the biometric template, the more difficult it is to attack the biometric system.
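The size of the brute-force search can be checked directly. The $2^{n}$ figure applies to finding an exact template; since verification accepts any template within a Hamming distance threshold, the sketch also computes the probability that a single uniformly random guess falls within a given fractional distance of the target. The helper names and the 30% tolerance in the usage lines are hypothetical, not the threshold reported in the experiments.

```python
import math

def log10_search_space(n: int) -> float:
    """log10 of 2**n, the number of candidate n-bit templates."""
    return n * math.log10(2)

def log10_random_match_prob(n: int, max_frac_hd: float) -> float:
    """log10 of P(HD(random guess, target) <= max_frac_hd * n) for a uniform random n-bit guess."""
    k_max = math.floor(max_frac_hd * n)
    # natural log of C(n, i) / 2**n for i = 0..k_max
    logs = [math.lgamma(n + 1) - math.lgamma(i + 1) - math.lgamma(n - i + 1) - n * math.log(2)
            for i in range(k_max + 1)]
    top = max(logs)  # log-sum-exp for numerical stability
    return (top + math.log(sum(math.exp(v - top) for v in logs))) / math.log(10)

n = 9600
print(f"2^{n} ~ 10^{log10_search_space(n):.1f}")  # 2^9600 ~ 10^2889.9
print(f"P(random guess within 30% HD) ~ 10^{log10_random_match_prob(n, 0.30):.0f}")
```

Even with a tolerance, the success probability of a single random guess against a 9600-bit template remains negligible.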
Another type of attack, the correlation attack, aims to recover the original biometric template by correlating several cancellable templates derived from the same biometric traits [28]. To overcome this type of attack, the cancellable templates of the same person enrolled in different authentication systems must be different and unlinkable.
Across different biometric systems, the cancellable templates for the same biometric data vary according to the values of the salting keys. The proposed salting scheme uses different permutations of the same biometric traits as salting keys for different biometric systems. Therefore, each biometric system generates a cancellable template derived from a unique salting key and the weights of the trained generator network. In summary, by changing the GAN model architecture or creating different permuted salting keys for the same enrolled user across multiple authentication systems, it is nearly impossible to recover the original biometric template by correlating the stored cancellable templates across multiple compromised biometric systems.
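A quick sanity check of the key-diversity argument: two independent permutations of the same binary template disagree in roughly half of their bits, so the keys themselves carry no obvious linkage. This is a minimal sketch with a random stand-in template and a hypothetical helper name; it checks only the salting keys, not the full GAN-derived cancellable templates.

```python
import numpy as np

def frac_hamming(a: np.ndarray, b: np.ndarray) -> float:
    """Fraction of positions where two binary vectors differ."""
    return float(np.mean(a != b))

rng = np.random.default_rng(3)
n = 9600
template = rng.integers(0, 2, n, dtype=np.uint8)   # stand-in for one user's binary iris code
key_sys1 = rng.permutation(template)               # salting key for system 1
key_sys2 = rng.permutation(template)               # salting key for system 2
print(frac_hamming(key_sys1, key_sys2))            # close to 0.5 for a roughly balanced template
```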

| EXPERIMENTAL RESULTS
To measure the performance of the proposed system in terms of recognition accuracy and reliability, iris biometric traits from the CASIA-IrisV3-Interval dataset [29] are used in the experiments. The dataset contains images of the left and right eyes of 249 subjects. A binary iris code is extracted for each eye image using Libor Masek's code [27]. First, iris segmentation and localization are performed using the circular Hough transform and the linear Hough transform. Then, the segmented iris region is normalized using Daugman's rubber sheet model. Finally, the normalized iris region is encoded using 1D Log-Gabor filters and quantized to produce a bitwise iris template. The resulting iris code is reshaped into a single vector of 9600 bits, which is used as input to the generator and discriminator networks. A GAN model is constructed for each class. Both networks have the same number of neurons in their input layers. The generator network's input and output layers have the same number of neurons, while the discriminator network has a single neuron in its output layer.
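The encoding step can be sketched as row-wise 1D Log-Gabor filtering of the normalized (polar) iris image followed by 2-bit phase quantization. This is a minimal sketch, not Masek's code: the 20 × 240 polar resolution (which yields 20 × 240 × 2 = 9600 bits), the filter wavelength of 18 samples, and the bandwidth ratio of 0.5 are assumed values, and the function name is hypothetical.

```python
import numpy as np

def log_gabor_encode(polar: np.ndarray, wavelength: float = 18.0,
                     sigma_on_f: float = 0.5) -> np.ndarray:
    """Hypothetical Masek-style encoder: row-wise 1D Log-Gabor filtering + 2-bit phase quantization.

    polar: (rows, cols) normalized iris image from the rubber sheet model, e.g. (20, 240).
    Returns a flat binary vector of length rows * cols * 2.
    """
    rows, cols = polar.shape
    freqs = np.fft.fftfreq(cols)                     # signed frequencies (cycles per sample)
    f0 = 1.0 / wavelength                            # centre frequency of the filter
    gabor = np.zeros(cols)
    pos = freqs > 0                                  # positive frequencies only -> complex response
    gabor[pos] = np.exp(-(np.log(freqs[pos] / f0) ** 2) / (2 * np.log(sigma_on_f) ** 2))
    response = np.fft.ifft(np.fft.fft(polar, axis=1) * gabor, axis=1)
    bits = np.empty((rows, cols * 2), dtype=np.uint8)
    bits[:, 0::2] = response.real > 0                # first phase-quadrant bit
    bits[:, 1::2] = response.imag > 0                # second phase-quadrant bit
    return bits.reshape(-1)                          # single vector, as fed to the GAN

code = log_gabor_encode(np.random.default_rng(0).random((20, 240)))
print(code.shape)  # (9600,)
```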
A set of randomly selected training samples of the binary iris code is used as input to the generator network, and one randomly selected sample is chosen as the input pattern to the discriminator network. The weights of both networks are initialized randomly. Table 1 lists the experimental parameters used.
During the enrolment phase, the GAN model is trained to construct a cancellable template for each class. To prevent the GAN from producing a cancellable template that is too similar to the input biometric template, which is undesirable from a security point of view, the number of training epochs and the learning rate are kept small. Once training is completed, the output of the generator network is binarized using a threshold of 0.5 and then Xor-ed with a reference iris template to create the final cancellable iris template. The reference iris template for each class is computed using the previously mentioned fusion approach (i.e. Equations (2) and (3)).
To evaluate the performance of the proposed scheme, Libor Masek's code [27] is applied to all database iris images to generate a binary iris code for each image. Each iris code is used as input to the generator networks, and the resulting output is binarized and Xor-ed with the input iris code to generate the final test cancellable iris template. The intra-class and inter-class score distributions of the generated test cancellable templates are computed for all possible database comparisons using the Hamming distance. These distributions are shown in Figure 5; the small overlap region between the intra-class and inter-class distributions indicates promising recognition accuracy for the proposed scheme.
Additionally, Figure 6 shows the false acceptance rate (FAR) and false rejection rate (FRR) curves. The intersection point of the FAR and FRR curves gives the Hamming distance threshold used in the authentication stage. The detected threshold value is 65.5, which yields a small equal error rate (EER) of 2.17%.
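The FAR/FRR crossing described above can be computed from the genuine (intra-class) and impostor (inter-class) Hamming distance scores. The sketch below sweeps every observed score as a candidate threshold and accepts a comparison when its distance is at most the threshold; the function name and this acceptance convention are assumptions.

```python
import numpy as np

def far_frr_eer(genuine: np.ndarray, impostor: np.ndarray):
    """Sweep thresholds over distance scores (accept if score <= t) and locate the EER.

    genuine:  distances from same-subject comparisons (intra-class)
    impostor: distances from different-subject comparisons (inter-class)
    """
    thresholds = np.unique(np.concatenate([genuine, impostor]))
    far = np.array([np.mean(impostor <= t) for t in thresholds])  # impostors wrongly accepted
    frr = np.array([np.mean(genuine > t) for t in thresholds])    # genuine users wrongly rejected
    i = int(np.argmin(np.abs(far - frr)))                          # closest FAR/FRR crossing
    return thresholds[i], (far[i] + frr[i]) / 2, far, frr

t, eer, far, frr = far_frr_eer(np.array([0.1, 0.2, 0.3]), np.array([0.6, 0.7, 0.8]))
print(t, eer)  # 0.3 0.0  (perfectly separated toy scores)
```

With overlapping intra-class and inter-class distributions, as in Figure 5, the returned EER is non-zero.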
Moreover, the receiver operating characteristic (ROC) curve of the proposed Bio-GAN scheme is shown in Figure 7. As shown in the figure, the recognition accuracy of the proposed scheme is 97.82%.
To analyse the proposed system further, we compared its recognition accuracy with that of the original unprotected iris biometric system. Figure 8 shows the ROC curves of both systems; there is some degradation in recognition accuracy compared with the original unprotected system (the EER values of the original unprotected system and the proposed scheme are 1.78% and 2.17%, respectively). Despite this small degradation, the proposed scheme overcomes many security challenges: it achieves the diversity, recoverability, and non-invertibility properties and resists pre-image and correlation attacks, as discussed in detail in the previous section. Considering the trade-off between recognition performance and security, the proposed scheme produces acceptable recognition accuracy compared with the original unprotected system. Moreover, we compared the proposed scheme (Unimodal-Bio-GAN) with recent well-known biometric salting schemes. We selected three token-based schemes, namely Biohashing [5], Bloom filter [7], and Randomized bit sampling [30], whose transformation processes depend on an external key. The key is randomly generated for each user and stored externally in a token owned by the user. Additionally, we selected three tokenless schemes, namely Bioencoding [3], Hetro_Convolved [19], and Hetro_Xor-ed [18], whose transformation processes depend on an internal key stored in the authentication system's database. Bioencoding [3] uses a single key for all users, stored publicly in the system database, whereas Hetro_Convolved [19] and Hetro_Xor-ed [18] generate a random key for each user and encrypt it using BAM.
The parameters of each applied scheme are set according to the common settings reported in the corresponding works; these values were chosen to achieve a good balance between recognition accuracy and security requirements. The word length in the bioencoding scheme [3] is set to 3. For the biohashing scheme [5], the bit length is set to 300 bits. The word size and block size in the Bloom filter [7] are set to 10 and $2^{5}$ bits, respectively. For randomized bit sampling [30], the size of the hash function is set to 10, the block size to 160, and the security threshold and the number of hash functions for each block to 0.5 and 400, respectively. In the Hetro_Xor-ed scheme [18], the XOR binding process restricts the key size to the length of the input iris template (i.e. 9600), while in the Hetro_Convolved scheme [19] the key size is set to 128, the length of the output layer of BAM. For a fair comparison, the iris features used by all competing salting schemes are extracted using Libor Masek's code [27]. Table 2 summarizes the comparison in terms of recognition accuracy (measured by the EER), scheme category (whether it depends on a token or not), performance dependency, and reliability against pre-image and correlation attacks. We conclude that the proposed scheme provides a keyless biometric salting scheme with promising recognition accuracy that is robust against pre-image and correlation attacks. Moreover, its recognition accuracy depends only on the biometric data rather than on external values such as salting keys. Finally, Figure 9 shows the ROC curves of the competing schemes. The best recognition accuracy is achieved by the original unprotected iris biometric system; however, its iris templates are stored publicly in the authentication system database without any further protection.
On the other hand, the applied salting schemes achieve lower recognition accuracy but have the advantage of securing the iris templates, and their performance varies. As shown in the figure, the recognition accuracy of the proposed Bio-GAN exceeds that of the Biohashing [5], Bloom filter [7], Bioencoding [3], and Hetro_Convolved [19] schemes, while the best recognition accuracy among the applied salting schemes is achieved by randomized bit sampling [30], followed by Hetro_Xor-ed [18]. However, the authentication processes of Randomized bit sampling [30] and Hetro_Xor-ed [18] depend on an external token and a stored encrypted key, respectively, which makes them vulnerable to the security threats of key-based schemes. We conclude that the proposed Bio-GAN achieves good recognition accuracy with the advantage of being independent of any external salting key.

| CONCLUSION
This article proposed a keyless biometric salting scheme that depends only on the biometric data, avoiding the security challenges imposed by external parameters such as salting keys. The GAN model is used to produce a cancellable template using only the input biometric trait samples. To add an extra level of security and improve performance, the transformed template generated by the GAN is Xor-ed with a reference template generated from the input biometric traits to create the final stored cancellable template, which satisfies the non-invertibility security property. For each enrolled class, the irreversible cancellable template and the GAN generator's weight values are stored in the system database. The proposed scheme meets several security requirements of biometric salting schemes, namely non-invertibility, recoverability, and diversity. Moreover, the biometric data plays a key role in the recognition accuracy of the proposed system, which makes it independent of external keys that are vulnerable to some security attacks. We analysed the robustness of the proposed scheme against attacks such as pre-image and correlation attacks and evaluated it on the CASIA-IrisV3-Interval dataset. The simulated experiments indicate promising recognition accuracy with acceptable degradation compared with the original unprotected iris data.