A fully rotation invariant multi-camera finger vein recognition system

Finger vein recognition systems utilize the venous pattern within the fingers to recognize subjects. It has been shown that the alignment of the acquired samples has a major impact on the recognition accuracy of such systems. Although a lot of work has been done in this field, there is still no approach that solves all kinds of finger misplacements. In particular, longitudinal finger rotation still causes major problems. As capturing devices evolve towards contactless acquisition, solutions to alignment problems become more important. As an alternative to rotation detection and correction, the problem can also be addressed by acquiring the vein pattern from different perspectives. This article presents a novel multi-camera finger vein recognition system that captures the vein pattern from multiple perspectives during enrolment and recognition. Contrary to existing multi-camera solutions that use the same capturing device for enrolment and recognition, the capturing devices of the proposed system differ in the configuration of the acquired perspectives. The cameras of the devices are positioned so that the recognition rates around the finger are high and the number of cameras needed is kept to a minimum. The experimental results confirm the rotation invariance of the proposed approach.


| INTRODUCTION
Vascular biometric systems [1] have established themselves as a serious alternative to systems using traditional biometric traits such as fingerprint, face or iris. In particular, systems utilizing the structure of the blood vessels in the palm or fingers, commonly denoted as hand and finger vein biometrics, offer several advantages over traditional modalities. As the vein pattern is located inside the human body and is only visible in near-infrared (NIR) light, vein images can hardly be acquired without the knowledge of the human subject and no latent variants of them exist [2]. As NIR videos exhibit the blood flow in the vessels, it is possible to apply liveness detection techniques to prevent presentation attacks [3,4].
The performance of finger vein recognition systems mainly depends on the quality and alignment of the acquired sample data. The quality of the vein images is influenced by the physical design and the configuration of the capturing device, whereas the alignment suffers from misplacements of the finger during acquisition. The most typical finger misplacements are vertical or horizontal shifts, tilt, bending and longitudinal rotation. The problem of misaligned acquisition is not exclusive to finger vein recognition. Other modalities also suffer from it and apply different correction methods. In face recognition, the acquired images are registered towards the frontal view (face frontalization [5,6]). For fingerprint recognition, pose correction is particularly important when using contactless fingerprints [7,8]. In iris recognition, pose correction is done implicitly by applying Daugman's rubber sheet model [9]. As there is a trend towards contactless acquisition in finger vein recognition systems [10][11][12], problems due to finger misplacements will become more important.
The negative effect of various types of finger misplacements on the recognition rates, and how their impact can be reduced or eliminated, has been addressed in several publications. Lee et al. [13] utilized minutiae points of the vessel network of the finger for alignment. Huang et al. [14] reduced the influence of longitudinal finger rotation by normalizing the vein pattern assuming an elliptic finger shape. Kumar and Zhou [15] aligned the fingers based on their boundary to correct in-planar translations and rotations. The feature-point based recognition system proposed by Matsuda et al. [16] introduced a finger-shape model together with a non-rigid registration method. Yang et al. [17] introduced a system with an anatomy structure analysis based vein extraction algorithm and matching strategy. In [18], the authors proposed a recognition system which can handle different finger misplacements utilizing PCA-SIFT [19] together with bidirectional deformable spatial pyramid matching [20]. In [21], finger misplacements are detected by analysing the shape of the finger; the deformations are corrected using linear and non-linear transformations. Prommegger et al. [22] improved the resistance against longitudinal rotation by introducing additional comparisons to pre-rotated versions of the enrolment samples. In addition to these software-based solutions, there are also hardware-based ones which guide the subject to place the finger in the correct position in the first place (e.g. [23]). This way, finger misplacements are avoided during acquisition rather than corrected afterwards in the processing pipeline. Another approach is to acquire the vascular pattern from multiple perspectives. For example, Bunda [24] and Sonna Momo et al. [25] propose multi-camera systems that acquire vein images from three different perspectives. A system proposed by Kang et al. [26] applies finger vein recognition in the 3D space.
In [27], the authors presented two rotation invariant finger vein recognition systems. Contrary to traditional single-camera systems, both systems acquire the vein structure from multiple perspectives all around the finger for enrolment, while for recognition still only a single sample is captured. The first approach, Multi-Perspective Enrolment (MPE), compares the probe sample to all corresponding enrolment images. The final biometric comparison score is determined using a maximum rule score level fusion. The second approach, Perspective Cumulative Finger Vein Templates (PCT), uses the enrolment samples to generate a single template holding the vein information all around the finger. For recognition, the probe sample is compared to the generated template. The experiments confirmed the rotation invariance of both methods, although the recognition rates for MPE are better than those for PCT. In [28], an adapted version of MPE named Perspective Multiplication for Multi-Perspective Enrolment (PM-MPE) has been introduced. It effectively reduces the number of perspectives that need to be acquired during enrolment by introducing pseudo perspectives while the recognition rates are kept high. If enough perspectives are acquired during enrolment, negative effects of longitudinal finger rotation on the recognition performance can be inhibited for all three methods.
To counteract longitudinal finger rotation, a novel fully rotation invariant finger vein recognition system, Combined Multi-Perspective Enrolment and Recognition (MPER), is presented. Its rotational invariance is achieved by acquiring the vein pattern from several perspectives for both enrolment and recognition. The final biometric candidate score is calculated using a maximum rule score level fusion (MaxSLF) of the individual comparison scores of each enrolment and recognition perspective. The idea of acquiring multiple perspectives for enrolment and recognition is not new. While existing solutions, for example [24][25][26], use the same capturing device for enrolment and recognition, for MPER the two devices differ in terms of the acquired perspectives. The two capturing devices are designed in such a way that the rotational distance between the closest enrolment and probe sample as well as the number of perspectives involved is kept to a minimum. The experiments analyse the recognition performance of MPER with respect to rotation invariance and its applicability for real-world applications. The performance achieved with its camera configuration is compared to the performance achieved by state-of-the-art single-camera systems and by the camera configurations of existing multi-camera finger vein recognition systems. The experiments are carried out using the PLUSVein finger rotation data set (PLUSVein-FR, [29]).
The remainder of this paper is organized as follows: Longitudinal finger rotation and the problems it causes for finger vein recognition systems are described in more detail in Section 2. Section 3 holds all details on MPER. The experimental set-up together with its results is described in Section 4. Section 5 discusses the design of the required capturing devices. Section 6 concludes the paper along with an outlook on future work.

| LONGITUDINAL FINGER ROTATION
Virtually all commercially available finger vein scanners acquire the vein images from a single finger using a single camera. Such capturing devices are prone to different misplacements of the finger during image acquisition, including in-planar shifts and rotation, bending, tilt and longitudinal finger rotation. Misplaced fingers result in images that are misaligned or subject to certain deformations. There exist several countermeasures during acquisition (e.g. adding support structures to the device for finger positioning) or processing (during pre-processing, feature extraction or comparison) to avoid or compensate certain misplacements, but especially longitudinal finger rotation is hard to prevent.
The acquisition of vein images corresponds to a projection of the blood vessel structure in the finger (3D space) onto a 2D plane. Rotating the finger along its longitudinal axis results in a change of the acquired vein pattern. Figure 1 visualizes this effect. The top row shows schematic cross sections of the same finger rotated from −30° to +30° in steps of 10°, the bottom row holds the corresponding vein patterns. It can be clearly seen that the vein pattern changes with the rotation of the finger. These changes follow a non-linear transformation and depend on the relative positioning of the veins to each other. In the worst case, some veins can even disappear (merge). The vein patterns acquired at −30°, 0° and +30° differ considerably. Obviously, this is a major problem for finger vein recognition systems. An analysis of publicly available finger vein data sets [31] showed that longitudinal finger rotation is a real problem. Depending on the acquisition set-up (used capturing device and protocol), the examined data sets contain rotational distances of up to 77° between two samples of the same finger. In [30], it has been shown that the performance of widely used recognition schemes suffers from such deformations. The negative effect for single-camera systems can be reduced if appropriate countermeasures are taken [22]. There are also approaches, for example [24][25][26][27][28], that try to further reduce the negative influence of longitudinal finger rotation by acquiring multiple perspectives of the vein pattern.
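The non-linear nature of this effect can be illustrated with a toy model (illustrative only, not part of the described system): veins are modelled as points on a circular cross-section at polar angle θ and radius r, and the camera projection maps each vein to x = r·sin(θ). Rotating the finger by δ shifts θ, so the gaps between the projected veins change by different amounts rather than by a common offset:

```python
import math

# Hypothetical cross-section: three veins given as (polar angle in deg, radius in mm)
veins = [(-40.0, 6.0), (0.0, 5.0), (35.0, 6.5)]

def project(veins, rotation_deg):
    """Project vein positions onto the 2D image plane (x = r*sin(theta))
    after rotating the finger by rotation_deg about its longitudinal axis."""
    return [r * math.sin(math.radians(theta + rotation_deg)) for theta, r in veins]

for delta in (-30, 0, 30):
    xs = project(veins, delta)
    gaps = [round(b - a, 2) for a, b in zip(xs, xs[1:])]
    print(f"rotation {delta:+d} deg: gaps between projected veins = {gaps}")
```

The gaps between neighbouring projected veins shrink or grow depending on where each vein sits on the circumference, which is exactly why a simple global shift cannot undo longitudinal rotation.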

| COMBINED MULTI-PERSPECTIVE ENROLMENT AND RECOGNITION

Combined Multi-Perspective Enrolment and Recognition (MPER) is a novel fully rotation invariant finger vein recognition method. It achieves its invariance against longitudinal finger rotation by acquiring multiple perspectives during enrolment and recognition. For every recognition attempt, the acquired probe samples are compared to the corresponding enrolment samples utilizing simple vein pattern based finger vein recognition approaches, for example the one presented by Miura et al. in [32] combined with Circular Pattern Normalization (CPN) [22]. The final biometric candidate score is calculated using a MaxSLF of the individual comparisons. A downside of multi-perspective finger vein recognition is that the cost and complexity of the capturing devices increase with the number of perspectives involved. Multiple perspectives can be acquired either by using more than one camera (e.g. [24][25][26]) or by building the sensor in a rotating manner (camera and illumination module rotate around the finger, e.g. [29]). Since moving parts are susceptible to malfunction, it is assumed in the further course of the article that each perspective is acquired by a separate camera. Taking this into consideration, MPER was designed in a way that the number of perspectives a capturing device needs to acquire is kept to a minimum, while the rotational range for which it delivers good results is maximized.
Contrary to existing multi-camera solutions, where the same capturing device is used for enrolment and recognition, for MPER the finger vein scanner used for recognition differs from the one used for enrolment. The n enrolment cameras are evenly spaced around the whole finger (360°). Distributing them evenly around the finger ensures that the vein pattern is captured from all sides. The rotational distance between two adjacent enrolment perspectives is α = 360°/n. For the recognition device, m cameras are arranged symmetrically with respect to the desired acquisition perspective. The distance between adjacent perspectives φ depends on the distance between the enrolment cameras α and the number of acquired perspectives m. It is calculated as φ = α/m. For devices with an odd number of cameras, the middle camera is positioned exactly in the acquisition direction. For those with an even number, the two cameras closest to the desired acquisition perspective are rotated by φ/2. By this arrangement, the distance between the closest enrolment and recognition perspective δ is always δ ≤ δ_max = φ/2. If δ_max is kept smaller than the rotation the finger vein recognition system utilized by MPER can compensate, then MPER is invariant to longitudinal rotation. Figure 2 depicts the principle of the camera positioning for MPER for two different scenarios: Both scenarios acquire the same number of enrolment perspectives (n = 4, the cameras are visualized as filled blue dots) but a different number of recognition perspectives. The left side captures m = 2 recognition perspectives (visualized as red circles) and the right side m = 3, respectively. The enrolment cameras are distributed evenly around the whole finger. The rotational distance between two enrolment cameras is α_4 = 360°/4 = 90°. As described, the finger vein scanner used for recognition differs from the enrolment one. The distance between two adjacent recognition cameras φ is calculated as φ = α/m.
For m = 2 this results in a distance of φ_{4,2} = 45°, for m = 3 the distance is φ_{4,3} = 30°, respectively. The maximum distance between the closest enrolment and recognition camera is δ_max = φ/2, that is δ_max(4,2) = 22.5° and δ_max(4,3) = 15°. As all acquired probe perspectives are compared to all enrolment ones, the number of comparisons needed for one recognition attempt is N_c = n · m. Table 1 holds the details for all settings evaluated in the experiments of Section 4.
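The geometry rules above can be condensed into a few lines (a minimal sketch; the function name and return layout are illustrative, the formulas are the ones from the text):

```python
def mper_setting(n, m):
    """Derive the MPER camera geometry for n enrolment and m recognition cameras."""
    alpha = 360.0 / n        # spacing between adjacent enrolment cameras
    phi = alpha / m          # spacing between adjacent recognition cameras
    delta_max = phi / 2.0    # worst-case distance to the closest enrolment camera
    n_comparisons = n * m    # every probe perspective vs. every enrolment perspective
    return alpha, phi, delta_max, n_comparisons

for n, m in [(4, 2), (4, 3)]:
    alpha, phi, dmax, nc = mper_setting(n, m)
    print(f"n={n}, m={m}: alpha={alpha} deg, phi={phi} deg, "
          f"delta_max={dmax} deg, N_c={nc}")
```

For the two scenarios of Figure 2 this reproduces the values stated above: (α, φ, δ_max, N_c) = (90°, 45°, 22.5°, 8) for n = 4, m = 2 and (90°, 30°, 15°, 12) for n = 4, m = 3.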
It is known that the shifts executed during the comparison of binary vein templates have a noticeable influence on the performance of vein pattern based recognition systems (cf. [28,31]). The horizontal shifts in the comparison step of MPER only need to compensate the maximum distance between the closest enrolment and recognition perspective δ_max. The experiments showed that the required shift can be derived from δ_max and the height h of the ROI image after applying CPN.

| EXPERIMENTS
The experiments of this article carry out a recognition performance analysis of the novel MPER approach with respect to its rotation invariance for various camera configurations. Furthermore, these results are compared to those of state-of-the-art single-camera solutions and to those of camera configurations of existing multi-camera recognition systems. In the last part of the experiments, a runtime analysis for MPER is carried out.

| Data set
Most available finger vein data sets were acquired using single-camera capturing devices. However, for the analysis of multi-perspective finger vein recognition systems the vein pattern must be available from several well-defined perspectives. Currently, only a few devices exist, for example [24][25][26][29], that are capable of doing so, and not all of the data sets captured with these devices are available to the scientific community.
In the experiments, the performance of MPER all around the finger is analysed for different sensor configurations. Acquiring the data for each individual configuration separately using dedicated capturing devices is expensive in two ways: (1) the corresponding sensors have to be built, and (2) the data must be acquired for a sufficient number of subjects and perspectives. The effort increases with each examined MPER configuration. For the planned evaluations (seven different configurations, cf. Table 1), this is infeasible.
A viable alternative to building dedicated sensors is to acquire vein images all around the finger and to simulate different capturing devices by selecting images from the corresponding perspectives (only those a dedicated capturing device would acquire). This way, all possible sensor configurations can be evaluated even though the data was acquired only once. The PLUSVein Finger Rotation Data Set (PLUSVein-FR) [29] was acquired with this idea in mind. It provides finger vein images all around the finger (360°) with a resolution of 1°.
The PLUSVein-FR contains finger images captured from 63 different subjects, four fingers per subject, which sums up to a total of 252 unique fingers. Each finger is acquired five times. This results in 1,260 images per perspective and 454,860 vein images for the whole data set (the data set contains 361 perspectives as 0° and 360° have been acquired separately). The experiments are carried out on a subset of the PLUSVein-FR containing perspectives in steps of 5°, resulting in 73 different perspectives (0° and 360° are acquired separately). For more details on the data set, the interested reader is referred to the authors' previous publications [29,30].
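The data set sizes quoted above follow directly from the acquisition parameters (a sketch; variable names are illustrative):

```python
subjects = 63
fingers_per_subject = 4
samples_per_finger = 5
perspectives_full = 361          # 1-degree steps; 0 and 360 degrees acquired separately

unique_fingers = subjects * fingers_per_subject              # 252
images_per_perspective = unique_fingers * samples_per_finger  # 1,260
total_images = images_per_perspective * perspectives_full     # 454,860

# Subset used in the experiments: 5-degree steps, 0 and 360 degrees separate
subset_perspectives = 360 // 5 + 1                            # 73

print(unique_fingers, images_per_perspective, total_images, subset_perspectives)
```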

| Recognition tool-chain
All experiments have been executed using an automated tool-chain. For MPER it consists of the following components:
1. Finger region detection and finger alignment are based on [33].
2. Similar to [14], the region of interest (ROI) is normalized to a fixed width.
3. In order to enhance the contrast between background and vein pixels, a Circular Gabor Filter (CGF) [34] and simple CLAHE (local histogram equalization) [35] are applied to the vein image during pre-processing.
4. The binary feature images are generated using the well-established vein pattern based Maximum Curvature (MC) method [32].
5. The comparison score between two feature images is evaluated using a correlation-based method. For this purpose, the probe samples are compared to shifted and rotated versions of the enrolment images [36].
6. The final biometric candidate score is calculated using a maximum rule score level fusion of the individual comparison scores of the previous step.
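Steps 5 and 6 can be sketched as follows. The similarity function below is a toy stand-in (it merely counts matching bits) for the actual correlation-based Miura matcher with shifts and rotations; only the maximum-rule score level fusion structure reflects the tool-chain:

```python
def max_slf(probe_features, enrol_features, compare):
    """Maximum-rule score-level fusion: the final candidate score is the best
    comparison score over all probe x enrolment perspective pairs."""
    return max(compare(p, e) for p in probe_features for e in enrol_features)

def toy_similarity(a, b):
    """Illustrative stand-in for the Miura matcher: fraction of agreeing bits."""
    return sum(x == y for x, y in zip(a, b)) / len(a)

# Toy binary templates for two probe and three enrolment perspectives
probe = ["1010", "1100"]
enrol = ["1011", "0011", "1110"]
print(max_slf(probe, enrol, toy_similarity))
```

The max rule fits MPER well because only the pair of perspectives with the smallest rotational distance δ needs to produce a high score; all other, more misaligned pairs are simply ignored by the fusion.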
The single-camera performance results, except those for DFVR [18] and the CNN-based approach, have been taken from [22], those for MPE from [27] and those for PM-MPE from [28], respectively. The experiments of the remaining multi-camera systems [24][25][26] use the same tool-chain as MPER.
TABLE 1 Details of the different MPER settings: number of enrolment (n) and recognition (m) perspectives, distances between enrolment (α) and recognition (φ) perspectives, maximum distance between the closest enrolment and recognition camera (δ_max) and the number of comparisons for one recognition attempt (N_c)

Setting       n  m  α     φ     δ_max   N_c
MPER-60°/2    6  2  60°   30°   15°     12
MPER-90°/2    4  2  90°   45°   22.5°   8
MPER-120°/2   3  2  120°  60°   30°     6
MPER-60°/3    6  3  60°   20°   10°     18
MPER-90°/3    4  3  90°   30°   15°     12
MPER-120°/3   3  3  120°  40°   20°     9
MPER-180°/3   2  3  180°  60°   30°     6

As there exists no publicly available implementation of DFVR, this method has been implemented for the experiments. For the implementation, the code of Deformable Spatial Pyramid Matching for Fast Dense Correspondences proposed by Kim et al. [20] was extended by the vein-based key-point selection, PCA-SIFT [19] and bidirectional matching as described in [18]. The used CNN approach (Triplet-SqNet) was taken from [37]. It employs the SqueezeNet architecture [38] using the triplet loss function together with hard triplet online selection as described in [39].

| Evaluation protocol
The evaluation follows the FVC2004 test protocol [36]. This protocol requires the evaluation of all possible genuine comparisons, while for the impostor scores only the comparisons between the first sample of a finger and the first sample of all other fingers are executed. MPER requires the acquisition of multiple perspectives for both enrolment and recognition. Therefore, two separate subsets are needed. The first subset, used for enrolment, contains the first two samples; the second one, used for recognition, contains the remaining three samples. This ensures that every sample is used either for recognition or enrolment (but never for both). The split results in 63 · 4 · 3 · 2 = 1,512 genuine and (63 · 4) · (63 · 4 − 1) = 63,252 impostor comparisons, which sums up to a total of 64,764 comparisons. To assess the recognition performance of the examined recognition systems, the equal error rate (EER), the receiver operating characteristic (ROC) curve and the area under the ROC curve (AUC) are used.
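The comparison counts above can be verified in a few lines (a sketch; variable names are illustrative):

```python
fingers = 63 * 4                    # 252 unique fingers
enrol_samples = 2                   # first two samples per finger -> enrolment subset
probe_samples = 3                   # remaining three samples -> recognition subset

# Genuine: every probe sample of a finger vs. every enrolment sample of that finger
genuine = fingers * probe_samples * enrol_samples      # 1,512

# Impostor: first sample of each finger vs. first sample of every other finger
impostor = fingers * (fingers - 1)                     # 63,252

total = genuine + impostor                             # 64,764
print(genuine, impostor, total)
```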

| Performance evaluation for MPER
In this part of the experiments, the performance of the proposed method, MPER, with respect to its rotational invariance all around the finger (360°) is evaluated. In order to assess the performance of the proposed system, the recognition rates of every single (independent) perspective, the intra-perspective performance (IPP), serve as a reference. For the IPP, the 73 perspectives used in the experiments are evaluated independently of each other. This means that in principle every perspective represents its own single-camera system (only a single sample is acquired for enrolment and recognition using a single camera, where ideally the finger is placed in the same manner). As a result, the IPP results, with respect to longitudinal finger rotation, are subject to the same limitations as presented in [30]. The IPP is calculated once without applying any rotation correction or compensation method and once applying CPN [22]. As MPER claims to be invariant against longitudinal finger rotation, recognition rates in the range of (or better than) IPP without rotation compensation can be accounted as good. Be aware that the IPP results are completely independent from each other; therefore, although the results are presented together, no rotational independence can be concluded from them.
As described in Section 3, for MPER multiple probe samples are compared to multiple enrolment samples. The final score is calculated by fusing the scores of the different comparisons applying a maximum rule score level fusion. For these experiments, seven different scenarios, which differ in the number of acquired enrolment and recognition perspectives, are evaluated. For enrolment, four different camera settings are used: six cameras (α = 60°), four cameras (α = 90°), three cameras (α = 120°) and two cameras (α = 180°). For recognition, the sensors are equipped with either two or three cameras. The actual rotational distance between the recognition cameras depends on the number of cameras m and the distance of the enrolment cameras α and is calculated as φ = α/m. The details of the used sensor configurations can be found in Table 1.
The proposed method utilizes Maximum Curvature [32] features. This choice is based on the authors' previous work on analysing the influence of longitudinal finger rotation on single-camera recognition systems [22] and (PM-)MPE [27,28,41]. The results in [22] showed that simple vein pattern based systems (e.g. [14,32,42]) in combination with rotation detection or compensation schemes (i.e. [14,21,22]) outperform more sophisticated recognition systems (i.e. [16,17,43]) with respect to their robustness to longitudinal finger rotation. The work on (PM-)MPE, especially [41], also indicated that vein pattern based methods should be preferred over other methods. In the course of this article, the evaluations of [22] were extended by two further recognition schemes ([18,37]) and CPN (cf. Section 4.5).
The trend of the resulting EERs is depicted in Figure 3. In addition to MPER, the intra-perspective performance results for 'no correction' and CPN, respectively, are visualized. The performance of both intra-perspective methods shows similar trends, just at different EER levels: The best performance results are obtained in the palmar (0°, 360°) region followed by the dorsal (180°) region. The perspectives in between show inferior results, with the worst results around 90° and 270°. CPN outperforms 'no correction' over the whole range, on average by a factor of two.
The top plot visualizes the results for the settings which use two cameras for recognition. The EERs for MPER-60°/2 and MPER-90°/2 are just above those of the intra-perspective CPN results. The plot for both scenarios shows a noticeable drop in the recognition performance at those perspectives where the distance between the closest enrolment and recognition camera reaches its maximum δ_max. But they still achieve a better performance for almost all perspectives than IPP without rotation correction. For MPER-120°/2, the performance drops are more prominent. The reason for the high performance degradation is that MPER-120°/2 with δ_max = 30° brings the recognition system used (MC together with CPN) to its limits.
The bottom plot of Figure 3 shows the results for the three-camera recognition settings (please note the different scaling of the y-axis). All four evaluated scenarios again show the performance drop for those perspectives where the closest enrolment and recognition perspectives are the farthest away from each other. However, the performance degradation is only striking for MPER-180°/3, where δ_max is again 30° and thus can no longer be compensated.
The EER describes the behaviour of the system only for the single threshold where the false acceptance rate equals the false rejection rate. To describe the overall performance of the system, the ROC curve, where the system is evaluated for varying thresholds, can be used. The top row of Figure 4 shows the ROC curves for two selected MPER settings (MPER-90°/2 and MPER-120°/3). The evaluation of 73 perspectives for each MPER configuration results in 73 different ROC curves. For good readability of the plot, only a few selected perspectives are highlighted with different colours, whereas the ROC curves for all other perspectives are shown using the same grey hue. The highlighted perspectives are 0° (palmar view), 45°, 90°, 135° and 180° (dorsal view). The ROC curves confirm the result of the EER values and affirm a good performance for all 73 perspectives. In order to better assess the results of the individual perspectives, a detailed view of the most interesting region (upper left corner, FAR in [0, 0.1], TPR in [0.9, 1]) is shown. In this view, differences between the individual curves can be seen. These differences are determined by two factors: (1) the distance δ between the closest enrolment and recognition perspective, and (2) the performance at the examined perspective itself (cf. IPP performance). Table 2 holds the camera positions for enrolment and recognition, the evaluated perspective, the corresponding rotational distance δ and the EERs when applying MPER and IPP without any rotation correction, respectively.
For MPER-90°/2, δ is 22.5° for all five perspectives, which is the maximum δ can be for this setting. Therefore, the performance in relation to one another roughly corresponds to that of the IPP EER results: The best performance is achieved at the palmar view, the worst at 90°. The others are between these two curves. For MPER-120°/3 the situation is slightly different. There, δ differs for the highlighted curves. For 0° and 45°, δ is quite small (0° and 5°). For 180°, δ reaches its maximum of δ_max = 20°. The experiments at 0° and 45° also give the best results. 90° (δ = 10°) and 135° (δ = 15°) perform worst.
To better compare the different camera settings, this article uses the area under the ROC curve (AUC) to aggregate each ROC curve into a single value. The AUC is equivalent to the probability that a randomly selected genuine comparison attempt is ranked higher than a randomly chosen impostor one. High AUC values (close to 1) are an indicator of well performing systems (a perfect system achieves an AUC of 1). The AUC plots in the bottom row of Figure 4 show the AUC values for all camera settings and perspectives. The trend of the curves confirms the results shown for the EER. Again, the worst results are achieved for those perspectives in which the distance between the closest enrolment and recognition perspectives, δ, approaches δ_max.
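The probabilistic interpretation of the AUC stated above can be computed directly from raw scores (a minimal sketch; ties between a genuine and an impostor score are counted half, a common convention not spelled out in the text):

```python
def auc(genuine_scores, impostor_scores):
    """AUC as the probability that a random genuine score outranks a random
    impostor score (ties count half); a perfect system reaches 1.0."""
    wins = sum((g > i) + 0.5 * (g == i)
               for g in genuine_scores for i in impostor_scores)
    return wins / (len(genuine_scores) * len(impostor_scores))

# Toy scores: one genuine score (0.7) falls below one impostor score (0.75)
print(auc([0.9, 0.8, 0.7], [0.4, 0.3, 0.75]))  # 8 of 9 pairs ranked correctly
```

This pairwise formulation is equivalent to integrating the ROC curve, which is why a single scalar can summarize the 73 per-perspective curves of Figure 4.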
From the presented results it can be deduced that rotational invariance is only given if the maximum distance between the closest enrolment and recognition camera is less than the angle the used recognition system (in these experiments: MC, CPN and Miura matcher) can compensate. Prommegger et al. showed in [22] that, for the used recognition system, this angle should stay below 30° for simple vein pattern based systems. The aim of MPER is to find a camera setting that maximizes the recognition rates across the entire range all around the finger while minimizing the number of used cameras. For the evaluations, the intra-perspective recognition performance without rotation detection is defined as an indicator of good performance. As a result, all configurations that achieve recognition rates below the IPP without correction are rated as good. Of the evaluated settings, MPER-90°/2 and MPER-120°/3 fulfil this aim best. Since there are usually fewer enrolment than recognition stations, the number of perspectives used for recognition should not exceed the number of enrolment cameras (m ≤ n). Furthermore, it must be taken into account whether the proposed capturing devices can be built easily. Here, it is especially important to consider whether the illumination modules can be mounted for the proposed camera perspectives. More details on the required capturing devices can be found in Section 5.

| Comparison to single-camera recognition systems
In this part of the experiments, the proposed system is compared to existing single-camera systems. Most of the performance results are taken from [22]. In addition to the methods evaluated in [22], two further recognition schemes, Deformable Finger Vein Recognition (DFVR) [18] and a CNN-based system (Triplet-SqNet) [37], as well as Circular Pattern Normalization (CPN) [22] are evaluated. The added experiments follow the same protocol as described in [22]. Results taken from [22] are marked with an asterisk (*). The experiments analyse the recognition performance of single-camera finger vein recognition schemes with respect to longitudinal finger rotation. They intend to show that such traditional one-camera systems can compensate longitudinal finger rotation only to a limited extent. The evaluations include not only different recognition schemes, but also different state-of-the-art rotation correction and compensation methods. The recognition schemes under investigation are three simple vein pattern based recognition schemes, Maximum Curvature (MC) [32], Principal Curvature (PC) [42] and the Wide Line Detector (WLD) [14], where the biometric comparison score is calculated based on the correlation of the extracted binary feature images (as proposed in [44]); Deformation-Tolerant Feature-Point Matching (DTFPM) [16]; Finger Vein Recognition with Anatomy Structure Analysis (ASAVE) [17]; an approach based on classical SIFT [43]; and a more recent scheme, Deformable Finger Vein Recognition (DFVR) [18], that uses SIFT features as well. The latter four of these methods claim to be rotation tolerant to a certain extent. The last method, Triplet-SqNet [37], employs the SqueezeNet architecture [38] using the triplet loss function together with hard triplet online selection [39].
The evaluated rotation compensation methods are the Known Angle Approach [22], Elliptic Pattern Normalization (EPN) [14], Circular Pattern Normalization (CPN) [22], Geometric Shape Analysis Based Finger Rotation Deformation Detection and Correction (GADC) [21] and the Fixed Angle Approach proposed in [22]. The Known Angle Approach corrects the vein images based on the actual angle of rotation. Since the used PLUSVein-FR provides the actual angle of rotation, this method can be applied. EPN corresponds to a rolling of the finger. It assumes an elliptic finger shape and that the acquired veins are close to the finger surface. This way, non-linear deformations of the vein structure are reduced. CPN is very similar to EPN; the only difference is that it assumes a circular finger shape instead of an elliptical one. GADC analyses the shape of the finger and, based on this analysis, decides e.g. whether a finger is rotated to the right or left and, if so, corrects it accordingly. For the Fixed Angle Approach, the enrolment samples are rotated using a pre-defined angle in both directions. The probe sample is compared to all three enrolment images (the actually acquired one and the two rotated versions of it). The final biometric candidate score is determined using MaxSLF. By pre-rotating the enrolment images, the rotational distance between the probe sample and the (rotated) enrolment samples should be reduced.
First, the behaviour of the single-perspective recognition systems with respect to longitudinal finger rotation is evaluated. The rotational range under investigation is ±45° from the palmar view. Figure 5 depicts the results. All simple vein pattern based methods, MC, PC and WLD, follow similar trends. At the palmar view, they achieve EERs below 1%. Up to a rotation of ±15°, the performance remains quite stable. The performance drops sharply for larger rotation angles, hitting EERs close to 45% at ±45°. For the more sophisticated approaches, DFVR, DTFPM, SIFT and ASAVE, the performance at the palmar view is worse, but it degrades more slowly when the finger is rotated. The same holds true for the CNN-based Triplet-SqNet. The best performance is achieved by DFVR.
Applying different rotation correction approaches can improve the recognition rates of the different recognition systems. As Prommegger et al. [22] showed, simple vein pattern based methods benefit most from rotation correction. Therefore, only one such system, using MC features, is evaluated. The results are presented in Figure 6. The line labelled No Correction corresponds to the MC line in Figure 5. The results show that the GADC approach does not work, at least on the PLUSVein-FR data set. EPN and CPN improve the results almost to the same extent; however, CPN is slightly better at larger rotation angles. The results of the Fixed Angle method are better than those of CPN and almost match those of Known Angle. The overall best results are achieved if the Fixed Angle approach is applied together with EPN. For more details on the results for single-perspective recognition systems, the interested reader is referred to the authors' previous publication [22].

FIGURE 5: Trend of the EER for different single-camera recognition schemes across the rotation angles from −45° to 45° (0° corresponds to the palmar view).
For the comparison of the proposed systems to classical single-perspective ones, the best recognition system, DFVR, and the best rotation compensation method, the Fixed Angle approach together with EPN, have been selected. The results are presented in Figure 7. As the single-perspective systems are tolerant to longitudinal rotation only to a certain extent, only the rotational range of ±45° is evaluated. All methods start with an EER below 1% at the palmar view. As the rotation increases, the performance of the single-camera systems drops. With DFVR, an acceptable performance is maintained up to almost ±20°; after that, the recognition rate drops rapidly. For the Fixed Angle method with EPN (using MC features), this crucial point is reached at approximately ±25°. The multi-camera systems, and here in particular MPER-120°/3, achieve stable results across the entire range.

| Comparison to multi-camera recognition systems
The comparison to the camera configurations of other multi-camera recognition systems is split into two parts. In the first part, MPER is compared to systems that acquire multiple perspectives only during enrolment, while in the second part it is compared to recognition systems that utilize multiple perspectives for both enrolment and recognition. This comparison should demonstrate the importance of the sensor configuration, that is, how the positioning of the cameras, and thus the selection of the acquired perspectives, influences the behaviour of a recognition system. As the aim is to compare the different camera systems and not the whole recognition systems, the same processing chain (feature extraction and comparison) is used for all systems. This ensures that the results are not distorted by the use of different software.
In the first part, the two systems MPE [27] and PM-MPE [28] are evaluated. The main difference between MPE and MPER is the sensor configuration: while MPER acquires multiple perspectives during enrolment and recognition, MPE captures multiple perspectives only during enrolment, and just a single view for recognition. For recognition, both methods compare the acquired probe sample(s) to the enrolled ones. The final biometric comparison score is calculated using a maximum rule score level fusion. PM-MPE is an adapted version of MPE. It still acquires multiple samples during enrolment and just a single one for recognition. The difference is that PM-MPE generates additional perspectives, so-called pseudo-perspectives, by rotating the acquired enrolment samples in both directions by a defined rotation angle. During recognition, the probe sample is compared not only to the actually acquired enrolment images, but also to their rotated versions. This way, the rotational distance between the closest enrolment sample (including the generated pseudo-perspectives) and the probe sample should be reduced. The results for this comparison are taken from the original publications for MPE [27] and PM-MPE [28]. The settings MPE 45° and PM-MPE 60° best fulfil this paper's definition of good performance (EERs better than IPP without rotation correction with a minimum number of involved perspectives). Therefore, the comparison is done with respect to these two settings. Table 3 lists the details of the configurations. All four methods require less than 10 cameras, the two MPER settings even only six. PM-MPE requires the most computing capacity because it has to calculate the pseudo-perspectives during enrolment. With respect to the recognition performance, the most important factor is δ_max, the maximum distance between the closest enrolment and recognition perspective. δ_max must be smaller than the maximum distance that the used recognition scheme tolerates; in the case of MC and CPN this is δ_max < 30°.
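The maximum rule score level fusion shared by MPE, PM-MPE and MPER can be sketched in a few lines: every probe sample is compared to every enrolment sample (including any pseudo-perspectives), and the best similarity wins. The toy similarity function below, which simply decays with angular distance, is an illustrative placeholder for the actual comparator (e.g. the MC correlation score).

```python
def max_slf(probe_samples, enrolment_samples, similarity):
    """Maximum rule score level fusion (MaxSLF): the final
    candidate score is the best score over all probe/enrolment
    pairs, i.e. N_c = m * (n + N_p) comparisons."""
    return max(similarity(p, e)
               for p in probe_samples
               for e in enrolment_samples)

# toy example: perspectives identified by their angle, with a
# similarity that decays linearly with angular distance
def toy_similarity(p, e):
    d = abs(p - e) % 360
    return 1.0 - min(d, 360 - d) / 180.0

score = max_slf([10, 100], [0, 120, 240], toy_similarity)
print(round(score, 3))  # 0.944 -- closest pair is 10 deg vs 0 deg
```

The fusion itself is trivially cheap; the cost of a recognition attempt is dominated by the number of pairwise comparisons, which is why the runtime analysis later counts how often each step is executed.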
This condition is fulfilled for all four settings. The value of δ_max for PM-MPE 60° must be read differently: there, the pseudo-perspectives generated for the enrolment samples are also taken into account. Therefore, the maximum distance to the closest actually acquired perspective is 3 · δ_max = 30°. Figure 8 visualizes the performance results. The EERs of all four methods are in the same range. MPE 45° shows, with the exception of a few spikes of PM-MPE around 90°, 270° and 330°, the worst recognition rates. The EERs for MPER-90°/2 are slightly better than for MPE and PM-MPE. The overall best results are achieved for MPER-120°/3. For this setting, the EERs are below 1% for the range of ±60° around the palmar view and below 3% over the entire range (360°). All four methods can be considered invariant to longitudinal finger rotation.
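The quantity δ_max for a given camera configuration can be checked with a short brute-force scan: for every possible finger rotation, find the closest enrolment/recognition perspective pair, and take the worst case. The sketch below scans in 1° steps (the granularity provided by the PLUSVein-FR data set); the specific camera angles used in the example are taken from the MPER-120°/3 description.

```python
def circ_dist(a, b):
    """Angular distance on a 360-degree circle."""
    d = abs(a - b) % 360
    return min(d, 360 - d)

def delta_max(enrol, recog, step=1):
    """Worst-case distance between the closest enrolment and
    recognition perspective over all finger rotations (sketch;
    brute-force scan in 'step'-degree increments)."""
    worst = 0
    for t in range(0, 360, step):
        closest = min(circ_dist(e, (r + t) % 360)
                      for e in enrol for r in recog)
        worst = max(worst, closest)
    return worst

# MPER-120 deg/3: enrolment cameras at 0, 120, 240 deg;
# recognition cameras 40 deg apart (-40, 0, 40 deg)
print(delta_max([0, 120, 240], [-40, 0, 40]))  # 20
```

With δ_max = 20° this configuration stays safely below the 30° that MC together with CPN can tolerate, which is exactly the design rule the proposed MPER sensors follow.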
The main disadvantage of (PM-)MPE is the complexity of their enrolment devices. MPE 45° requires a device operating eight cameras; PM-MPE 60° still needs six. On the plus side, the recognition devices are traditional single-camera capturing devices. This might be beneficial if the number of recognition stations is much higher than the number of enrolment stations. The sensors used for the two MPER settings use four or fewer cameras.
Besides MPER, there are also other finger vein recognition systems that utilize multiple perspectives during enrolment and recognition. This part of the experiments compares such existing multi-camera systems, that is [24][25][26], to MPER. Again, the main focus of this comparison is to show the influence of the configuration (arrangement of the cameras) of the acquisition devices. Therefore, the evaluation is reduced to a comparison of the camera systems themselves. The experiments apply the same methodology as used for MPER: all relevant probe and enrolment samples are compared to each other, and MaxSLF is applied to get the final comparison score. Using the original algorithms/software as proposed in the respective papers would even distort the results of the intended analysis of the capturing devices. In any case, for none of the three systems is a reference implementation provided, so a comparison to the original systems would be a difficult task.
The systems taken into consideration are three 3-camera systems. The first one has been proposed by the University of Twente [24]. Its cameras are positioned towards the palmar view (0°) and at an angle of 22.5° in both directions from the palmar view. As the PLUSVein-FR data set provides only perspectives in steps of 1°, the cameras used in the experiments are placed at −22°, 0° and 23°. The second capturing device [25] is from Global ID SA, a commercial company. In principle, the design of the sensor is identical to that of Twente; only the distance between the cameras (45°) is larger than for the device from Twente (22.5°). The last capturing device was proposed by the South China University of Technology (SCUT) [26]. Its three cameras are positioned equally distributed around the finger, which results in a rotational distance between two adjacent cameras of α = 120°. Table 4 states the details of the three camera systems [24] (Twente), [25] (GlobalID) and [26] (SCUT). For more information on the capturing devices, the interested reader is referred to the original publications.

TABLE 3: Details on the selected settings: number of acquired enrolment samples (n), distance between enrolment perspectives (α), number of generated pseudo-perspectives during enrolment (N_p), number of recognition perspectives (m), rotation angle used to generate the pseudo-perspectives for PM-MPE (φ), distance between the recognition cameras for MPER (φ), maximum distance between the recognition and the closest enrolment perspective (δ_max), number of comparisons for one recognition attempt (N_c) and total number of cameras needed for one enrolment and one recognition device.

Figure 9 shows the results for the three camera systems. The green line with filled diamonds depicts the results for the Twente sensor. Around the intended acquisition perspective (palmar view, 0° or 360°), this sensor configuration shows good recognition results with EERs below 0.5%.
In the experiments, this performance remains quite stable up to ±55°. Starting with this rotation angle, the performance begins to degrade. Around ±70°, the recognition rate drops rapidly, arriving at EERs between 45% and 50% for rotations above ±85°. This is the expected behaviour: for a rotation of 70°, the distance between the closest enrolment and recognition camera is 25° (closest enrolment camera at 23°, recognition camera at 48°), which is, according to [22], close to the maximum rotation that MC together with CPN can compensate.

The red line with the empty square markers shows the results for the GlobalID sensor. Its behaviour is similar to that of the Twente sensor. The difference is that the range in which it delivers good recognition results is larger: for GlobalID, the performance degradation starts around 90° and drops rapidly beginning at 115°. The reason for this is the larger distance between the cameras. The chosen 45° is the largest possible distance the utilized recognition scheme (MC in combination with CPN) can handle.
The cameras of the SCUT sensor are positioned at the palmar view (intended acquisition perspective), 120° and 240°. Its results are visualized as an ochre line with asterisk markers. Up to a longitudinal rotation of ±25° from the palmar view, the system shows good recognition rates. Starting at this angle, the performance degrades quickly, reaching EERs above 40% between 40° and 80°. When the finger is rotated further towards 120°, the performance improves to the same level as achieved at 0°. This is the expected behaviour: because the cameras are evenly distributed around the finger (0°, 120° and 240°), a finger rotation of 120° yields the same views as 0°. The same behaviour can be observed at 240°.
It is obvious that none of the three camera systems achieves rotation invariance when applying the same methodology as used for MPER. The capturing devices proposed by the University of Twente [24], GlobalID [25] and SCUT [26] only show a good performance in the range that is covered by enrolment and recognition cameras, while MPER-90°/2 achieves EERs below 4% and MPER-120°/3 even below 3% all around the finger. Again, please note that the experiments evaluate only the capturing devices themselves using the same methodology as for MPER, and not the whole finger vein recognition systems as proposed in [24][25][26]. As a result, the comparison to the three capturing devices (Twente, GlobalID and SCUT) is not entirely fair: the Twente and GlobalID sensors were not built with rotation-invariant recognition all around the finger (360°) in mind, and the SCUT sensor was developed for a different CNN-based recognition system. Still, the results clearly show that the positioning of the cameras in the enrolment and recognition capturing devices is an essential factor for achieving rotation invariance in finger vein recognition. Especially the comparison to MPER-120°/3 is interesting: the sensors of Twente, GlobalID, SCUT and MPER-120°/3 use three cameras for both enrolment and recognition, but only MPER-120°/3 (using a very simple recognition scheme: MC features, CPN and MaxSLF) achieves rotational invariance. The invariance of MPER-120°/3 is achieved by placing the cameras in a way that the maximum distance between the closest enrolment and recognition perspectives stays below the distance the recognition system can handle.

| Runtime analysis
As MPER introduces additional processing steps compared to simple single-camera systems, the runtime costs are relevant in a practical application. In this analysis, the focus is set on the recognition step, as this is more important to end users than enrolment (enrolment is executed only once, whereas recognition is executed n times). The runtimes of the best settings with respect to recognition performance (MPER-90°/2, MPER-120°/3, MPE 45° and PM-MPE 60°) are compared to single-perspective systems with no correction, applying the Fixed Angle approach, and the combination of EPN and the Fixed Angle method. As shown in [41], systems like (PM-)MPE and MPER are applicable to all simple vein pattern based methods. Therefore, the runtime evaluation has been done for three such recognition schemes: MC, PC and WLD. Note that the implementations of the recognition algorithms used in these experiments are not optimized for runtime performance. Hence, the determined durations are only indicators for the additional costs imposed by the evaluated approaches. As there are no reference implementations for [24][25][26] available, a runtime analysis for these systems is not possible.
The relevant processing steps to evaluate are pattern normalization (PN), pre-processing (PP), feature extraction (FE), comparison (CMP) and MaxSLF. While PN and MaxSLF are independent of the recognition scheme, the other three (PP, FE and CMP) depend on it. For the estimation of the runtime of every processing step, the average time needed over 1260 repetitions (the number of images of each perspective in PLUSVein-FR) has been calculated. For the scheme-independent processing steps, it is worth mentioning that CPN is more than 12 times faster than EPN (7 ms instead of 87 ms); the reason is that the arc length of ellipses cannot be calculated directly. The MaxSLF is very fast, regardless of the number of scores involved (always < 0.001 ms). Table 5 lists the average processing times for the method-dependent steps in the recognition tool-chain.

FIGURE 9: Comparison of performance results (EER) of other multi-camera recognition systems to MPER (Prommegger and Uhl).
The number of times a processing step is executed varies between the different approaches. For example, the Fixed Angle approach does not need any pattern normalization, whereas MPER-120°/3 needs to execute it three times (it acquires three samples for recognition). Table 6 lists how often each step needs to be executed for the different approaches. The runtimes determined for the different approaches are given in Table 7. The first line holds the results for a single-perspective system without any rotation detection or compensation. One recognition attempt for PC and WLD needs only around 50 ms, whereas for MC one attempt takes 315 ms. These results also serve as a reference for calculating the relative increase of the runtimes (RI) of the other methods. RI is calculated as RI = (t_cur − t_ref) / t_ref, where t_ref is the execution time of the reference method (single-perspective system without rotation correction) and t_cur the time of the evaluated system, respectively. The runtime for the Fixed Angle approach increases only minimally, by two additional comparisons and the three-score MaxSLF. Combining the Fixed Angle approach with EPN adds another 78 ms, which more than doubles the execution time for PC and WLD. MPE 45° is slightly slower than No Correction and the Fixed Angle approach, but considerably faster than Fixed Angle combined with EPN. This is due to the additional comparisons and the use of CPN (CPN is faster than EPN). PM-MPE 60° is, due to the additional comparisons to the pseudo-perspectives, slower than MPE 45°, but still faster than Fixed Angle with EPN. As a result of the additionally captured probe samples, the two MPER setups, MPER-90°/2 and MPER-120°/3, are noticeably slower than the other approaches. However, MPER-90°/2 is still faster than Fixed Angle combined with EPN for PC and WLD.
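The relative increase can be computed as follows; the numbers in the usage example are illustrative (a ~50 ms PC baseline versus the same scheme with the Fixed Angle approach and EPN adding another 78 ms), not exact values from Table 7.

```python
def relative_increase(t_ref, t_cur):
    """Relative increase (RI) of the runtime of an evaluated
    system over the single-perspective reference, in percent."""
    return 100.0 * (t_cur - t_ref) / t_ref

# illustrative numbers: 50 ms baseline vs. 128 ms with
# Fixed Angle + EPN (78 ms of extra processing)
print(relative_increase(50.0, 128.0))  # 156.0
```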
All three multi-camera systems (MPE, PM-MPE and MPER) have the potential to improve their runtimes by means of parallelization. With the exception of the MaxSLF, all steps can be carried out in parallel. For (PM-)MPE, this affects only the comparison step; for MPER, it also affects the processing steps of the input images (CPN, PP, FE). Table 6 shows how often each step can be carried out simultaneously and therefore indicates the possible time savings. If the execution times are calculated without taking overhead costs into consideration, this would result in the same runtimes for all three approaches. The resulting runtime would be comparable to that of a single-perspective system without rotation correction or with the Fixed Angle approach. This clearly shows that all three approaches have the potential to be used in real-world applications.
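A parallelized MPER recognition attempt could be structured as in the sketch below: the per-sample chain (PN → PP → FE) and all pairwise comparisons run concurrently, and only the final MaxSLF is sequential. All function arguments are placeholders for the actual processing steps, not an implementation of them.

```python
from concurrent.futures import ThreadPoolExecutor

def recognise_parallel(probe_imgs, enrol_feats,
                       normalize, preprocess, extract, compare):
    """Sketch of a parallelized recognition attempt. The per-
    perspective steps (normalize -> preprocess -> extract) and
    all comparisons are dispatched to a thread pool; the final
    MaxSLF reduction is the only sequential part."""
    with ThreadPoolExecutor() as pool:
        # process each acquired probe perspective concurrently
        feats = list(pool.map(
            lambda img: extract(preprocess(normalize(img))),
            probe_imgs))
        # run all probe/enrolment comparisons concurrently
        scores = pool.map(
            lambda pe: compare(*pe),
            [(f, e) for f in feats for e in enrol_feats])
        return max(scores)  # MaxSLF
```

For CPU-bound feature extraction, a `ProcessPoolExecutor` (or a native implementation) would be the more realistic choice; the structure of the pipeline stays the same.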

| DESIGN PROPOSAL FOR MPER CAPTURING DEVICES
Multi-perspective finger vein recognition systems require the acquisition of the vein pattern from different views of the finger. Currently, only a few devices exist, for example [24][25][26][29], that are capable of doing so. The data used for the evaluation of MPER was not acquired using a dedicated capturing device, but has been simulated by taking data from the PLUSVein-FR data set. However, in order to apply MPER in practice, appropriate capturing devices are required. In this section, possible sensor designs for MPER are discussed. MPER-90°/2 needs a four-camera device for enrolment and a two-camera one for recognition; for MPER-120°/3, two three-camera devices are required. Figure 10 shows possible sensor designs for MPER-90°/2. For the enrolment device (left), each of the four acquired perspectives has its own camera (C1-C4). They are placed equidistantly with a rotational distance of 90° between each other. There is only one illumination module (L1 and L2) for every two cameras, placed on the side opposite the relevant cameras. L1 illuminates the finger for C3 and C4, and L2 for C1 and C2, respectively. The angle of incidence from the illumination modules to the relevant cameras is α/2. The recognition device on the right side consists of one illumination module and two cameras. The two cameras (C1 and C2) are placed φ = 45° from each other. The illumination module is positioned so that the angle of incidence to both cameras is φ/2 = 22.5°. Possible designs of the sensors needed for MPER-120°/3 are visualized in Figure 11. For the enrolment device (left), the three cameras are positioned equally distanced with α = 120° at 0°, 120° and 240°. Every camera has its own illumination module, which is placed on the opposite side of the finger. The recognition device on the right side consists of one illumination module and three cameras. One camera (C2) is placed opposite the illumination module.
The two other cameras (C1 and C3) are rotated by φ = 40° from C2 to the right and to the left. Similar devices have already been built: the enrolment device corresponds to that of SCUT [26] and the recognition device to the one of GlobalID [25], respectively.
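The angle-of-incidence claims above can be checked with a small helper that measures how far an illumination module deviates from the ideal transmission position directly opposite a camera. The concrete ring positions in the example (cameras at 0°, 90°, 180°, 270°; L1 at 45° opposite the C3/C4 pair, L2 at 225°; recognition cameras at ±22.5° with the LED at 180°) are assumptions inferred from the text, not measurements of a built device.

```python
def incidence_angle(camera_deg, light_deg):
    """Deviation of an illumination module from the position
    directly opposite a camera on the ring around the finger
    (0 deg = perfect transmission geometry)."""
    d = abs(camera_deg - light_deg) % 360
    return 180 - min(d, 360 - d)

# MPER-90 deg/2 enrolment: C3 at 180, C4 at 270, L1 at 45
print(incidence_angle(180, 45), incidence_angle(270, 45))  # 45 45
# MPER-90 deg/2 recognition: cameras at +/-22.5, LED at 180
print(incidence_angle(-22.5, 180.0))  # 22.5
```

Both results match the α/2 = 45° and φ/2 = 22.5° figures stated for the proposed designs.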

| CONCLUSION
We presented the novel multi-camera finger vein recognition system MPER. The system acquires multiple perspectives for enrolment and recognition. The capturing devices used are designed in such a way that the rotational distance between the closest enrolment and recognition sample, as well as the number of perspectives involved, is kept to a minimum. As a result, the two capturing devices differ. The experiments showed that rotation invariance can be achieved using as few as three cameras in both devices. The processing steps are very simple: in the course of biometric recognition, the binarized vein patterns of all enrolment and probe samples are compared with each other, and the final biometric candidate score is determined by applying a maximum rule score level fusion. The simplicity of the processing chain is also a strength of the proposed method. In contrast to other, more sophisticated multi-camera recognition systems, for example [26], implementations for all processing steps (pre-processing, feature extraction and biometric comparison) are available (among others, the one presented in [40]). The runtime analysis showed that MPER achieves results comparable to existing solutions and has the potential to be used in real-world applications. The capturing devices for MPER were not actually built but have been simulated using the PLUSVein-FR data set. This allowed us to evaluate many different sensor configurations. Interestingly, it turned out that for some of the configurations, sensors built for other recognition schemes could be used. For example, for MPER-120°/3 the sensor built by SCUT [26] can be used for enrolment and the one proposed by GlobalID [25] for recognition.
A drawback of multi-camera recognition systems compared to traditional single-camera systems is the increased cost and complexity of the capturing devices. Depending on the application, one has to decide which of the factors is more important. For systems that operate in a controlled environment, in which all users are cooperative and habituated, single-camera systems will be sufficient. The more freedom the user is given during acquisition (contactless, on-the-move), the more finger misplacements, including longitudinal finger rotation, will occur. In such environments, the added cost and complexity of multi-camera systems will be justified.
The comparison with other multi-camera recognition systems shows how important the positioning of the cameras is. Recognition systems that acquire multiple vein images for enrolment, but just a single one for recognition [27,28], achieve recognition rates all around the finger similar to MPER. The advantage of such a system is that the cost of the sensors used for recognition (using only a single camera) is kept low. However, the devices for enrolment (requiring at least six cameras) are complex and expensive. The comparison with capturing devices of recognition systems that also use multi-camera devices for both enrolment and recognition shows the importance of the arrangement of the cameras. Like MPER-120°/3, all three investigated systems [24][25][26] use three cameras for both devices. When using the same methodology as proposed for MPER for all sensors, only MPER-120°/3 achieves rotational invariance.

FIGURE 10: Possible multi-camera set-ups for MPER-90°/2. Left: the enrolment device consists of four cameras that acquire the vein pattern all around the finger; the cameras are spaced equally distanced all around the finger. Right: the two cameras of the recognition device cover the range between two enrolment cameras.

FIGURE 11: Possible multi-camera set-ups for MPER-120°/3. Left: the enrolment device consists of three cameras that acquire the vein pattern all around the finger; the cameras are spaced equally distanced all around the finger. Right: the three cameras of the recognition device cover the range between two enrolment cameras.
In our future work, we plan to actually build the capturing devices needed for MPER-90°/2 and MPER-120°/3 and put the systems into operation. We will focus on the usability of the system; especially the time needed for a recognition attempt will be optimized for real-world applications. Furthermore, we want to analyse existing multi-camera systems, using the originally proposed algorithms, with respect to their rotational invariance. Finally, motivated by the work of Kang et al. [26] and Xie and Kumar [45], we want to evaluate the rotation invariance of CNN-based recognition systems and, if they show good performance, whether such systems can help to simplify the required capturing devices (i.e. reduce the number of needed perspectives).