Personal identification based on acoustic characteristics of outer ear using cepstral analysis, Bayesian classifier, and artificial neural networks

Abstract: A hypothesis is discussed concerning the use of echograms of the external auditory canal for personal identification. The authors have developed a device for measuring the acoustic parameters of the external auditory canal. The obtained echograms can be used as biometric patterns for identification and authentication of subjects. Two types of biometric parameters are considered, based on spectral and cepstral analyses of the echograms. The authors used two approaches for recognizing ear patterns: the first based on Bayes' formula and the second on artificial neural networks (convolutional and fully connected). The Bayesian classifier was found to show a lower percentage of identification errors, with an equal error rate (EER) of 0.0053. The best result for the neural networks was EER = 0.0266. An experiment repeated with the same subjects six months after the initial data collection showed an insignificant deviation in the number of wrong decisions (EER = 0.008).


| INTRODUCTION
Biometric identification methods 'bind' digital accounts to a specific person. Because of these unique properties, biometrics has become a valuable asset for cybercriminals: by taking possession of a victim's biometric data, a malefactor can gain access to all the personal accounts associated with the compromised biometric pattern. In this regard, trust in biometric systems is largely determined not only by the number of erroneous decisions (the accuracy of person recognition) but also by the system's resistance to forgery acceptance (digital or physical 'fakes' of biometric patterns) as well as the ability to hide the biometric pattern from outside observation to protect it from becoming compromised.
Currently, there are several standards for protecting biometric templates from becoming compromised when their digital copies are stored and transmitted via communication channels (ISO/IEC 19792:2009, ISO/IEC 24761:2009, ISO/IEC 24745:2011, and the GOST R 52633 series) and for protecting biometric scanners from presentation attacks (spoofing) aimed at sensor fraud (ISO/IEC 30107 series).
Open biometric patterns (e.g. fingerprint, iris, face) are 'in sight', so they can be compromised in the natural environment. An intruder can steal biometric characteristics from the owner using non-contact or hidden methods (e.g. from a door handle or from photographs). The proposal is to use data on the internal structure of the outer ear, obtained using echography, as biometric patterns. The individual characteristics of the ear canal are hidden from direct observation and cannot be copied by photographing, and the 'plane' (flat) pattern of the ear is not informative enough for making a 'fake'. By acting on the external ear with sound, one can propagate acoustic waves through the ear canal; these waves are distorted as they are reflected off the walls of the canal, so the reflected signal differs from the original. The differences are due to the individual characteristics of the human auricle and ear canal. The parameters of the echo signal or its transfer function may contain information about the geometry of the auditory canal and auricle, so they can represent a vector of biometric parameters (features) characterizing the structural features of a person's external ear.
The resonance of the auricle is about 5 kHz on average, and the resonance of the ear canal is 2.5 kHz [2]. A person can hear sounds in the range from 16 Hz to 20 kHz; however, it is believed that the human ear is most sensitive to acoustic vibrations with frequencies from 1 to 5 kHz (this range includes speech frequencies [3]). As a rule, when a sound signal acts on the walls of the auditory canal, low frequencies (up to 1000 Hz) do not cause resonance; therefore, they are less informative for identifying the distinctive features of the ear's structure. The upper limit of the probing signal frequency is determined by the characteristics of the sound reproduction and recording system (useful information can also be present in overtones, so it makes sense to analyze high frequencies).
In addition to the geometry of the external auditory canal, otoacoustic emissions (OAEs) affect the registered echo signal. OAEs are faint sounds (less than 20 dB) that are recorded in the external auditory canal but originate in the cochlea of the inner ear as a by-product of the outer hair cells amplifying the oscillations of the cochlear basilar membrane. OAEs are categorized as spontaneous OAEs, which appear spontaneously, and evoked OAEs, which arise in response to a sound stimulus in the ear canal. An OAE can only be registered in a hermetically sealed ear canal using a highly sensitive microphone. The possibility of registering an evoked OAE in clinical practice is limited to the 0.5-8 kHz frequency range. An OAE is a component of the recorded echo signal because it 'merges' with the reflected signal. The influence of the individual features of the OAE on the echo signal is insignificant when a loud sound stimulus is used.

| FACTORS AFFECTING BIOMETRIC SIGNAL QUALITY
Let us consider hypothetical factors that can affect the parameters of the echo signal and the signal-to-noise ratio. These sources of distortion can be divided into the following categories: otological (due to the anatomy or pathologies of the ear), technical (due to the peculiarities of the equipment used), and those associated with the conditions of use (including the 'human factor').
Otological factors include the following: • Conductive hearing loss is a hearing disorder that makes it difficult for sound waves to travel from the outer ear to the inner ear. Disorders of the outer ear (e.g. neoplasms, abscesses) pose the greatest difficulty for the proposed method, since the signal will be distorted by the changed geometry of the auditory canal. If the eardrum or auditory ossicles are affected, the user may feel discomfort and pain when listening to the audio signal. These disorders rarely affect both ears at once, so the subject can use the healthy ear during the identification procedure (as will be shown below, identification by one ear increases the number of recognition errors by about four times).
• Sensorineural hearing loss is hearing loss caused by damage to the structures of the inner ear, the vestibulocochlear nerve, or the central parts of the auditory analyzer. If the inner ear is damaged, there may be no OAE, but this only slightly affects the characteristics of the echogram. Complete or partial deafness does not affect the parameters of the echo signal but can cause inconvenience to the user.
• Sulphur (cerumen) plugs constitute a pathological condition characterized by blockage of the lumen of the external auditory canal with a mixture of earwax, dust, and dead skin particles. Over time, sulphur accumulates in the auditory canal, but the effect of this factor on the parameters of the echo signal is minimal (as will be shown further, the accuracy of user identification remains almost unchanged six months after training the system). If the user has a sulphur plug at the identification stage that was absent at the training stage (or vice versa), the false rejection rate (FRR) for that user increases, but this does not prevent authorization completely and does not affect the false acceptance rate (FAR).
If FRR is increased to an uncomfortable level due to otological factors, retraining the biometric system will correct the situation.
Technical factors are determined by the following equipment parameters: (1) microphone sensitivity; (2) internal noise of microphone; (3) range of reproducible frequencies of the speaker and sound card; and (4) soundproof properties of the headphone housing. The degree of influence of the last factor has not been studied. The use of one model of equipment during the training and identification phases guarantees invariability of the signal.

Conditions of Use
The environment does not affect the echo signal when using headphones with a housing that has good sound insulation properties. Another reason for echo signal distortion during identification can be a loose-fitting headphone housing, active body and head movements, or food intake (e.g. jaw movements). However, these factors are eliminated by the execution of some simple requirements.

| DATA SET OF SUBJECT ECHOGRAMS
A device has been developed for recording the biometric characteristics of the ear (see Figure 1). It consists of two electret microphones (with a noise level of 36 dBA, a sensitivity of 60 mV/Pa, and a frequency range of 20-20000 Hz), a sound-insulating housing (in the form of headphones), a shielded copper wire, two speakers (with a power of 0.5 W and a frequency range of 850-20000 Hz), a 3.5 mm plug, and a CREATIVE sound card (with a sampling frequency of 44000 Hz and a bit depth of 24 bits). Biometric data were collected from 75 people (men and women in approximately equal proportions, aged 18 to 40 years and without otological pathologies). Each subject was asked to listen to a mono sound signal υ(t) of increasing and decreasing frequency (a sliding modulated sine) obtained by linear frequency modulation (chirp), where t is the time in discrete form. The signal frequency varied in the range from 1 to 14 kHz, the signal duration was 10 s, and the average signal volume was 80 dB. The signal was reproduced through two loudspeakers (for the right and left ear). The echo signal was simultaneously recorded by microphones mounted in the headphone housing. The sampling rate of the recorded echo signal was 44 kHz (the recording was performed in mono mode). All participants listened to the signal 15 times, each time taking off the headphones and then putting them on again (to account for the dependence of the echo signal on the mounting).
The recorded echo signal $u_{i,k}(t)$ can be called an echogram of the outer ear or an acoustic pattern of the subject's ear, where i is the subject's number and k is the number of the input attempt. The generated data set is presented as a set of WAV files (mono, 44 kHz, 16 bit) and is in the public domain [4].
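As an illustration of the stimulus and data format described above, the following Python sketch generates a 1-14 kHz linear chirp and reads one echogram WAV file. NumPy and SciPy, the up-then-down sweep profile, and the 16-bit scaling are assumptions; the authors' own acquisition software is not described here.

```python
import numpy as np
from scipy.io import wavfile
from scipy.signal import chirp

FS = 44_000                   # sampling rate of the stimulus and recordings, Hz
DURATION = 10.0               # stimulus length, s
F_LO, F_HI = 1_000, 14_000    # chirp frequency range, Hz


def make_stimulus() -> np.ndarray:
    """Linear chirp sweeping 1 -> 14 kHz and back over 10 s (mono).

    The exact sweep profile used by the authors is not specified; an
    up-then-down sweep is assumed here.
    """
    half = int(FS * DURATION / 2)
    t = np.arange(half) / FS
    up = chirp(t, f0=F_LO, t1=DURATION / 2, f1=F_HI, method="linear")
    down = chirp(t, f0=F_HI, t1=DURATION / 2, f1=F_LO, method="linear")
    return np.concatenate([up, down])


def load_echogram(path: str) -> np.ndarray:
    """Read one WAV echogram u_{i,k}(t) and scale 16-bit samples to [-1, 1]."""
    rate, samples = wavfile.read(path)
    assert rate == FS, "the dataset files are expected to be 44 kHz mono"
    return samples.astype(np.float64) / np.iinfo(samples.dtype).max
```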

| ANALYSIS OF SUBJECT ECHOGRAMS
Because the input devices and the signal υ(t) were identical for all subjects, the parameters of the echo signal $u_{i,k}(t)$ can be analyzed directly, without constructing transfer functions from $u_{i,k}(t)$ and υ(t), in order to search for individual differences in the structure of the subjects' ears. The short-time Fourier transform (STFT) was used to analyze the signals.
The signals $u_{i,k}(t)$ as well as their spectrograms $S_{i,k}$ are not very informative: the differences in the otoacoustic responses of different subjects are hardly distinguishable. The echo signals are generally variable across repetitions because they depend on how the headphones are mounted (see Figure 2). The timescale of the chirp signal is rigidly related to the frequency scale (see Figure 2). To extract useful information from the acoustic signal and reduce the variance of random outliers, the spectrograms were transformed into the amplitude spectrum $\bar{A}$ averaged over all windows (over all time intervals) (see Figure 3) when the signal is decomposed into Fourier series (Equations 1-4). Each column of the spectrogram is the amplitude spectrum of one window,

$$S(\tau) = \{A_{\nu_1,\tau}, \ldots, A_{\nu_{W_{size}/2},\tau}\}, \qquad (3)$$

and the averaged spectrum is obtained as $\bar{A}_{\nu} = \frac{1}{Q}\sum_{\tau=1}^{Q} A_{\nu,\tau}$, where Q is the number of time intervals of T seconds into which the echo signal $u_{i,k}(t)$ is divided given the window size $W_{size}$ = 65536 and the step $W_{step}$ = 32768; $\nu_k$ is the frequency of the k-th harmonic (in discrete form); $A_{\nu,\tau}$ is the amplitude of the harmonic with frequency $\nu$ in the time interval numbered $\tau$; and $\bar{A}_{\nu}$ is the average amplitude of the harmonics with frequency $\nu$. Since the frequency of the chirp signal υ(t) varied in the range from 1 to 14 kHz, the analysis was limited to this frequency range. With a window size $W_{size}$ = 65536 (T = 1.49 s, $\nu_1$ = 0.67 Hz), harmonics with frequencies from 1000.98 to 13998.31 Hz were taken into account; therefore, the spectrum $\bar{A}$ is composed of 19400 averaged amplitudes corresponding to the harmonics with numbers 1494 ≤ k ≤ 20893. The window size was chosen so that the frequencies of $u_{i,k}(t)$ could be analyzed with an accuracy of at least 1 Hz. The results of the described transformation, shown in Figure 3, are quite robust. The averaged spectrum $\bar{A}_{i,k}$ is much more informative than the echo signal $u_{i,k}(t)$ and its spectrogram $S_{i,k}(\tau)$: the spectra $\bar{A}_{i,k}$ differ between subjects, as shown in the graphs (see Figure 4).
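A minimal sketch of the averaged-spectrum computation under the stated parameters (rectangular window, $W_{size}$ = 65536, $W_{step}$ = 32768, harmonics 1494-20893) is given below; NumPy and the exact bin indexing are assumptions rather than the authors' code.

```python
import numpy as np

FS = 44_000
W_SIZE, W_STEP = 65_536, 32_768          # STFT window and step from the text
K_LO, K_HI = 1_494, 20_893               # harmonic numbers covering ~1-14 kHz


def averaged_spectrum(u: np.ndarray, window: np.ndarray = None) -> np.ndarray:
    """Average the STFT amplitude spectra of an echogram over all Q windows.

    Returns the 19 400 averaged amplitudes (the spectrum A-bar of the text).
    A rectangular window is used when `window` is None.
    """
    if window is None:
        window = np.ones(W_SIZE)
    columns = []
    for start in range(0, len(u) - W_SIZE + 1, W_STEP):
        frame = u[start:start + W_SIZE] * window
        amps = np.abs(np.fft.rfft(frame))        # A_{nu,tau}: one spectrogram column
        columns.append(amps[K_LO:K_HI + 1])      # keep only the 1-14 kHz band
    return np.mean(columns, axis=0)              # A-bar_nu averaged over tau = 1..Q
```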
The cepstrograms $K_{i,k}$ were constructed (see Figure 5) to reveal the local features of the averaged spectrum. Usually, the cepstrum is understood as the result of the inverse Fourier transform of the logarithm of the signal power spectrum [5]. However, it is proposed here that cepstrograms be obtained by applying the STFT without taking the logarithm of the values of $\bar{A}_{i,k}$, by analogy with how the spectrogram was constructed from the initial signal $u_{i,k}(t)$. The frequency scale $\nu$ of the spectral function $\bar{A}_{i,k}(\nu)$ was taken as the timescale, and the averaged spectrum was divided into frequency intervals $\Delta\nu = W^*_{size}/19400$ in accordance with the window size $W^*_{size}$ and the step $W^*_{step}$. Each interval was then expanded into a Fourier series. For the $\Delta\nu$ intervals, amplitude spectra (spectra of cepstral coefficients $C_{\kappa,\iota}$) were plotted; their aggregate is the cepstrogram (Equation 5), where $\iota$ is the number of the frequency interval and $\kappa$ is the number of the cepstral coefficient corresponding to the frequency $\kappa\Delta\nu$. The following optimal parameters for the STFT were established during the empirical research: $W^*_{size}$ = 16 and $W^*_{step}$ = 13. A higher number of subject identification errors was observed with $W^*_{size}$ < 16, and the number of errors did not decrease with $W^*_{size}$ > 16. The step $W^*_{step}$ was set so that the windows overlapped only slightly.

FIGURE 2: Echo signals (left) and their spectrograms; STFT parameters: rectangular window, W_size = 65536, W_step = 32768 (right).
FIGURE 3: The process of obtaining the averaged amplitude spectrum of the echo signal u_{i,k}(t).
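Under the same assumptions, the 'second STFT along the frequency axis' described above can be sketched as follows; the number of coefficients kept per interval is an implementation detail not specified in the text.

```python
import numpy as np

W_STAR_SIZE, W_STAR_STEP = 16, 13    # optimal values reported in the text


def cepstrogram(avg_spectrum: np.ndarray, window: np.ndarray = None) -> np.ndarray:
    """Second STFT taken along the frequency axis of the averaged spectrum.

    Column iota holds the cepstral coefficients C_{kappa,iota} of one frequency
    interval; no logarithm is taken, as proposed in the text. The exact
    indexing and number of coefficients kept by the authors may differ.
    """
    if window is None:
        window = np.ones(W_STAR_SIZE)             # rectangular W*_type
    columns = []
    for start in range(0, len(avg_spectrum) - W_STAR_SIZE + 1, W_STAR_STEP):
        interval = avg_spectrum[start:start + W_STAR_SIZE] * window
        columns.append(np.abs(np.fft.rfft(interval)))   # amplitude spectrum of the interval
    return np.stack(columns, axis=1)               # shape: (n_coefficients, n_intervals)
```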
The cepstrogram shows distinctive features for different subjects, which are difficult to notice on the averaged spectra (see Figure 5).
Note that the information content of the cepstrogram and the averaged spectrum depends on the type of window function. One can obtain more information about the structural features of the subjects' ear canals (see Figure 5) by combining different types of windows at the stage of calculating the averaged spectrum ($W_{type}$) and the cepstrogram ($W^*_{type}$). When the functions $u_{i,k}(t)$ and $\bar{A}(\nu_\kappa)$ are decomposed into Fourier series, different window functions can be optimal. The study used the following types of windows and their combinations: rectangular, Bartlett (triangular), classical Gaussian (with a shape parameter p = 1), Laplace, parametric Gaussian (with a parameter value of p = 1.5), Blackman, and Hamming.
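A possible mapping of the listed window functions onto SciPy implementations is sketched below; the width parameter and the approximation of the Laplace window by a generalized Gaussian with p = 0.5 are assumptions.

```python
import numpy as np
from scipy.signal import windows


def make_window(kind: str, size: int) -> np.ndarray:
    """Window functions considered in the study (approximate SciPy mapping)."""
    sig = size / 6.0    # assumed width parameter for the Gaussian-family windows
    table = {
        "rectangular": windows.boxcar(size),
        "bartlett": windows.bartlett(size),                          # triangular
        "gaussian": windows.general_gaussian(size, p=1.0, sig=sig),  # classical Gaussian
        "laplace": windows.general_gaussian(size, p=0.5, sig=sig),   # Laplace-like
        "gaussian_p15": windows.general_gaussian(size, p=1.5, sig=sig),
        "blackman": windows.blackman(size),
        "hamming": windows.hamming(size),
    }
    return table[kind]


# Best pairing reported in the text: Hamming at the spectrum stage (W_type)
# combined with a rectangular window at the cepstrogram stage (W*_type).
w_spec = make_window("hamming", 65_536)
w_ceps = make_window("rectangular", 16)
```

With the earlier sketches, this pairing corresponds to `cepstrogram(averaged_spectrum(u, w_spec), w_ceps)`.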

| FEATURE EXTRACTION
The amplitudes of the spectra and cepstrograms (hereafter referred to as spectral and cepstral features) can be used directly as features. However, without additional processing, the dimension of the spectral and cepstral feature spaces is large (n = 19400 and n = 11040, respectively). As empirical studies have shown, the number of spectral features can be reduced to n = 970 without loss of identification reliability if energy indicators (sums of 20 amplitudes with close frequencies) are used as features $a_j$ (Equation 6):

$$a_j = \sum_{k=20(j-1)+1}^{20j} \bar{A}_{\nu_k}. \qquad (6)$$

The number of cepstral features could be reduced only by a factor of 2 (to n = 5520) without reducing the reliability of subject recognition, using analogous energy sums as features (Equation 7). Each feature can be considered a random variable. The study showed that the distribution laws of the values of the features under consideration are close to normal for most of the subjects (which was checked using the Pearson chi-square test).
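The energy-binning step can be sketched as follows; the grouping of cepstral coefficients behind Equation (7) is assumed to be pairwise, since the reported reduction is exactly a factor of 2.

```python
import numpy as np


def spectral_features(avg_spectrum: np.ndarray, group: int = 20) -> np.ndarray:
    """Energy features a_j: sums of `group` neighbouring averaged amplitudes.

    With 19 400 amplitudes and group = 20 this gives n = 970 features.
    """
    usable = (len(avg_spectrum) // group) * group
    return avg_spectrum[:usable].reshape(-1, group).sum(axis=1)


def cepstral_features(ceps: np.ndarray, group: int = 2) -> np.ndarray:
    """Analogous energies over the flattened cepstrogram (assumed grouping).

    Summing pairs of neighbouring coefficients halves the feature count,
    matching the reported reduction from 11 040 to 5 520.
    """
    flat = ceps.flatten(order="F")          # interval by interval
    usable = (len(flat) // group) * group
    return flat[:usable].reshape(-1, group).sum(axis=1)
```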

| SUBJECT IDENTIFICATION BASED ON NAIVE BAYES CLASSIFIER AND EXPERIMENTAL RESULTS
The scheme of the 'naive' Bayesian classification can be reduced to the following algorithm [6,7]. The posterior probabilities of the hypotheses are calculated in n steps, where n is the number of features; each hypothesis is associated with a certain user registered in the system. In this work, the usual rule of the 'naive' Bayes classifier was followed: the prior probabilities of the competing hypotheses are considered equal in the absence of any information to the contrary. At each step, the posterior probabilities are recalculated according to Equation (8), and the posterior probability calculated at the previous step is taken as the prior probability. The decision is made in favour of the hypothesis with the maximum posterior probability at the last step:

$$P_h(a_j) = \frac{P_h(a_{j-1})\, p_h(a_j)}{\sum_{m=1}^{N} P_m(a_{j-1})\, p_m(a_j)}, \qquad (8)$$

where N is the number of hypotheses (identified subjects, N = 75), $P_h(a_j)$ is the posterior probability of the h-th hypothesis after the j-th feature ($P_h(a_0)$ = 1/N), and $p_h(a_j)$ is the conditional probability density of the h-th hypothesis at the j-th step of the classification. Probability densities $p_h(a_j)$ can be used in Bayesian classification instead of conditional probabilities (the dimensional factors in the numerator and denominator cancel in the calculations, and $P_h(a_j)$ takes values from 0 to 1). Because, in a simplified form, the features can be described by the probability density function of the normal distribution, $p_h(a_j)$ was calculated by Equation (9):

$$p_h(a_j) = \frac{1}{\sigma_{h,j}\sqrt{2\pi}} \exp\!\left(-\frac{(a_j - \mu_{h,j})^2}{2\sigma_{h,j}^2}\right), \qquad (9)$$

where $\mu_{h,j}$ is the mathematical expectation of the values of the j-th feature for the subject numbered h, and $\sigma_{h,j}$ is the standard deviation of the values of the j-th feature for the subject numbered h. To train the 'naive' Bayesian classifier means to calculate the parameters $\mu_{h,j}$ and $\sigma_{h,j}$ from the training sample of the subjects associated with the corresponding hypotheses.
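A compact sketch of this sequential scheme, assuming independent, normally distributed features as in Equations (8) and (9):

```python
import numpy as np


class NaiveBayesIdentifier:
    """Sequential 'naive' Bayes classifier over normally distributed features.

    Follows the scheme of Equations (8) and (9): equal priors, then the
    posterior from step j-1 is used as the prior at step j.
    """

    def fit(self, X: np.ndarray, y: np.ndarray) -> "NaiveBayesIdentifier":
        # X: (n_samples, n_features), y: subject numbers
        self.classes_ = np.unique(y)
        self.mu_ = np.stack([X[y == h].mean(axis=0) for h in self.classes_])
        self.sigma_ = np.stack([X[y == h].std(axis=0) + 1e-12 for h in self.classes_])
        return self

    def posteriors(self, x: np.ndarray) -> np.ndarray:
        n_classes = len(self.classes_)
        P = np.full(n_classes, 1.0 / n_classes)          # P_h(a_0) = 1/N
        for j in range(len(x)):
            # Equation (9): normal density of feature j under each hypothesis
            p = np.exp(-0.5 * ((x[j] - self.mu_[:, j]) / self.sigma_[:, j]) ** 2) \
                / (self.sigma_[:, j] * np.sqrt(2.0 * np.pi))
            # Equation (8): the previous posterior acts as the prior
            P = P * p
            P = P / P.sum()
        return P

    def predict(self, x: np.ndarray) -> int:
        return int(self.classes_[np.argmax(self.posteriors(x))])
```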
An experiment to identify subjects was carried out. A Bayesian classifier was built and trained on eight echogram examples from each subject; the other examples were used for testing. The probability of errors was calculated as the ratio of the number of incorrect decisions (when the maximum posterior probability corresponded to an erroneous hypothesis) to the total number of trials. Since identification was carried out on a closed set (N = 75), an incorrect decision simultaneously led to a 'false rejection' and a 'false acceptance'. In this regard, the percentage (probability) of errors calculated by the criterion of 'maximum posterior probability' (on a closed set) is numerically equivalent to the equal error rate (EER) at the threshold $P_h(a_j)$ = 0.5 (only one hypothesis can overcome it).
The equivalence of the 'maximum posterior probability' and EER criteria can be illustrated by calculating the errors as follows. A 'false rejection' error is recorded when the posterior probability of the hypothesis associated with the subject's stated login (number) does not exceed the threshold. A 'false acceptance' error is recorded when the posterior probability of any other hypothesis overcomes the threshold. Figure 6 shows that at the threshold $P_h(a_j)$ = 0.5 the equality EER = FRR = FAR holds (this is only true for a 'closed' set of classes). We can also conclude that the error indicators are very difficult to balance when using a Bayesian classifier: the ratio of FRR to FAR hardly changes as the threshold varies over the interval (0, 1).
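For illustration, the threshold-based error counting described above could be implemented as follows (a sketch for the closed-set case, where the claimed identity equals the true one):

```python
import numpy as np


def count_errors(posteriors: np.ndarray, true: np.ndarray, threshold: float = 0.5):
    """FRR and FAR at a fixed posterior threshold on a closed set of classes.

    posteriors: (n_trials, N) final posterior vectors from the classifier;
    true: the actual subject number of each trial.
    """
    idx = np.arange(len(true))
    p_true = posteriors[idx, true]
    frr = np.mean(p_true <= threshold)               # own hypothesis rejected
    others = posteriors.copy()
    others[idx, true] = 0.0
    far = np.mean(others.max(axis=1) > threshold)    # another hypothesis accepted
    return frr, far
```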
The results for the identification of the subjects based on the parameters of one ear are presented in Tables 1 and 2. As can be seen, the averaged spectra are less informative than the cepstrograms (the cepstral features give a lower percentage of errors). The best results are obtained by combining one of the Bartlett, Blackman, or Hamming windows with a rectangular window (see Table 2). Figure 7 shows the dynamics of the change in error probabilities depending on the number of features. When the Hamming window (at the stage of calculating the spectrum) is combined with a rectangular window (at the stage of calculating the cepstrograms), the lowest error level, EER = 0.0239, was achieved with the smallest number of features, n = 2326. Thus, the use of the first 2326 cepstral features is optimal for the data set under consideration. Figure 7 also shows the error rates when the 2326 cepstral features (Hamming window + rectangular window) from the right and left ears are combined (n = 4652). With identification in the 'two-channel' mode (when both ears of the subject are probed at once), the probability of errors becomes much lower and amounts to EER = 0.0053.
A similar experiment was carried out in the verification mode using cross-validation. In the verification mode, a separate Bayesian classifier was trained for each user, and two hypotheses were considered (N = 2): 'genuine' (the hypothesis is associated with the subject whose login/number is declared) and 'impostor' (the hypothesis is associated with the general population of all possible users). For the 'genuine' hypothesis, the conditional probability densities $p_h(a_j)$ were calculated using the parameters $\mu_{0,j}$ and $\sigma_{0,j}$ of the particular subject. For the 'impostor' hypothesis, the parameters $\mu_{1,j}$ and $\sigma_{1,j}$ had been preliminarily calculated from a data sample of other subjects (assuming a normal distribution of the feature values). The Bayesian classifiers were trained on the data of 50 subjects ('genuine') using eight random echogram examples from each. To calculate the FRR, seven test echogram examples of each 'genuine' subject that were not used in training were taken. Echograms of the remaining 25 subjects ('impostors') were used in testing to calculate the FAR.
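A sketch of the two-hypothesis verification rule under the same normality assumption (the pooling of the 'impostor' statistics is simplified here):

```python
import numpy as np


def genuine_posterior(x, mu_g, sigma_g, mu_i, sigma_i) -> float:
    """Two-hypothesis Bayes verification: 'genuine' vs 'impostor'.

    mu_g/sigma_g are the claimed subject's per-feature statistics; mu_i/sigma_i
    are pooled over other subjects. The returned posterior of 'genuine' is
    compared against the 0.5 threshold.
    """
    P = np.array([0.5, 0.5])                    # equal priors for the two hypotheses
    for j in range(len(x)):
        dens = np.array([
            np.exp(-0.5 * ((x[j] - mu_g[j]) / sigma_g[j]) ** 2) / (sigma_g[j] * np.sqrt(2 * np.pi)),
            np.exp(-0.5 * ((x[j] - mu_i[j]) / sigma_i[j]) ** 2) / (sigma_i[j] * np.sqrt(2 * np.pi)),
        ])
        P = P * dens
        P = P / P.sum()
    return float(P[0])
```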
The experiments were repeated three times; each time, a new set of 'impostors' that did not intersect with the previous one was determined. The probabilities of errors were then calculated at the threshold $P_h(a_j)$ = 0.5 (Figure 8). These probabilities at n = 4652 were FRR = 0.1028 at FAR < 0.0001 (no 'false acceptance' errors were recorded in the experiment). To determine the FAR with higher precision, a much larger sample size is needed. However, the obtained result can be called very optimistic because it satisfies practical goals (for biometric systems, it is important to have a FAR close to zero with an acceptable number of false rejections).

A natural experiment with the same subjects was repeated six months after the initial data collection. The Bayesian classifier trained on the 'closed' class set six months earlier was used. Each subject entered 10 examples of ear biometric data and was then identified. According to the results of the experiment, only six errors were recorded in 750 trials (EER = 0.008), which is insignificantly higher than the initial indicator (EER = 0.0053). We can conclude that the properties of the external auditory canal suitable for identifying a person do not change noticeably over time (moreover, they are not affected by the accumulation of sulphur in the ear canal).

| SUBJECT IDENTIFICATION BASED ON CONVOLUTIONAL AND FULLY CONNECTED NEURAL NETWORKS AND EXPERIMENTAL RESULTS
When using neural networks in classification problems, researchers usually try to increase the number of layers and reduce the number of features. The larger the dimension of the input data, the larger the number of parameters the network must have to analyze it efficiently. Neural networks with a large number of parameters can potentially provide higher decision accuracy, but the required volume of the training sample increases at the same time. Building an optimal architecture is a compromise amongst the dimension of the input, the size of the network, and the size of the training sample.
Eight different architectures of artificial neural networks (ANNs), six using convolutional layers and two without, with different configurations (activation functions and other layer parameters) were formed; they were focussed on processing both the averaged spectra and the cepstrograms. The use of convolutional neural networks is motivated by their ability to extract highly informative features from graphical objects and time series. The neural networks were implemented as a software module using the Keras.Net deep learning library (an extension of the Keras library for the VisualStudio development environment and the C# programming language).
Before the biometric data entered the ANN, the data were prepared as follows:
• The averaged spectra were calculated with the following parameters: $W_{type}$ Hamming window, $W_{size}$ = 65,536, $W_{step}$ = 32,768. The spectra were then interpolated (because their initial length was too large) to l = 300, l = 600, or l = 1200 amplitudes, and the amplitude values were reduced to the interval [0, 1];
• The cepstrograms were calculated using a rectangular window ($W^*_{type}$) with different options for window size and step ($W^*_{size}$ = 8, $W^*_{step}$ = 6; $W^*_{size}$ = 16, $W^*_{step}$ = 13; or $W^*_{size}$ = 32, $W^*_{step}$ = 24). The values of the cepstral coefficients were reduced to the interval [0, 1].
A simplified sketch of this preparation and of a typical convolutional classifier is given below.
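The authors implemented their networks in C# with Keras.Net; purely as an illustration, the following Python/Keras sketch shows the spectrum preparation described above and one possible one-dimensional convolutional classifier. Layer sizes, kernel widths, and the use of TensorFlow/Keras in Python are assumptions, not the authors' architectures.

```python
import numpy as np
from tensorflow import keras
from tensorflow.keras import layers


def prepare_spectrum(avg_spectrum: np.ndarray, l: int = 600) -> np.ndarray:
    """Interpolate the averaged spectrum to l points and rescale it to [0, 1]."""
    x_old = np.linspace(0.0, 1.0, len(avg_spectrum))
    x_new = np.linspace(0.0, 1.0, l)
    resampled = np.interp(x_new, x_old, avg_spectrum)
    span = resampled.max() - resampled.min()
    return (resampled - resampled.min()) / (span + 1e-12)


def build_model(l: int = 600, n_subjects: int = 75) -> keras.Model:
    """Illustrative 1-D convolutional classifier for the interpolated spectra."""
    model = keras.Sequential([
        keras.Input(shape=(l, 1)),
        layers.Conv1D(16, 9, activation="relu"),
        layers.MaxPooling1D(2),
        layers.Conv1D(32, 9, activation="relu"),
        layers.MaxPooling1D(2),
        layers.Flatten(),
        layers.Dense(128, activation="relu"),
        layers.Dense(n_subjects, activation="softmax"),
    ])
    model.compile(optimizer="adam",
                  loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])
    return model
```

Training would then proceed with `model.fit(...)` in blocks of epochs interleaved with testing, as described next.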
The volumes of the training and test samples were the same as in the previous experiment (eight examples for training and seven for testing from each subject). The ANNs were trained using the Adam optimizer. Each ANN was trained for 25 epochs at a time, after which intermediate testing was performed. When neural network overfitting occurred, the process was stopped, and the best result for the considered ANN model was recorded.
The three best ANN models were selected by comparing the accuracy of identification of the 75 subjects (see Figure 9 and Table 3). The identification results are shown in Figure 10. If the training sample is increased to 16, 24, 30, etc. examples, the time for training the biometric system increases to 5 min, 10 min, or more, whereas a training procedure should be quick and automatic. A possible solution is the development of special methods for the augmentation of echogram data, which should be a separate study.
Setting up a Bayesian classifier is easy to automate, while ANN training requires monitoring for overfitting. The Bayesian classifier can also easily be used in the verification mode, where there are only two hypotheses ('genuine' and 'impostor'). The 'direct' training of a neural network to solve the problem of pattern verification on an 'open' set of classes is difficult because the training set for 'genuine' is much smaller than the training set for 'impostor'. Such experiments were also carried out but were not successful. It is proposed to integrate the apparatus of multilayer neural networks and the Bayesian classifier by creating a pretrained deep autoencoder network that will extract the most informative meta-features from high-dimensional cepstrograms. After the cepstrograms are processed by the neural network, the meta-features will be fed to the input of the Bayesian classifier. Such a scheme could potentially work much more efficiently. In further studies, it is planned to test schemes with preliminary training using such feature extraction methods as i-vectors and d-vectors [8], which are traditionally used in speech and speaker recognition problems.

FIGURE 10: Receiver operating characteristic curves based on the results of identification by the patterns of two ears: (a) first artificial neural network (ANN) after 600 training epochs (cepstrograms, W*_size = 8, W*_step = 6); (b) second ANN after 50 training epochs (averaged spectra, l = 600); (c) third ANN after 100 training epochs (averaged spectra, l = 600).

| COMPARISON OF OBTAINED RESULTS AND RESULTS OF EARLIER PUBLICATIONS
The use of acoustic (echographic) properties of the ear for personal identification is a relatively new direction, so not many results can be cited. The main ones are presented in Table 4. For comparison, the most representative results of subject recognition based on 2-D and 3-D visible images of the outer ear (auricle) are also shown. However, it should be borne in mind that the auricle is often openly visible, making it easy to falsify. In addition, image-based recognition methods are subject to a number of inherent problems associated with pattern quality (changes in perspective, occlusion, and changes in lighting conditions [9]), although in recent years these problems have been addressed [10]. The acoustic approach considered here has a number of fundamental advantages over approaches based on the visible appearance of the ear (concealment of the biometric data from observation and listening, and absence of problems with pattern quality). The result obtained in the present work exceeds those previously achieved.

| CONCLUSION
The ear canal can be thought of as a resonant system. If it is acted on by sound waves, the waves reflected from the walls of the canal change the amplitude-frequency characteristics, so the reflected signal recorded by the microphone can characterize the individual features of the structure of the human ear. On the basis of this principle, we have developed a recording device and collected the acoustic patterns of the ears of 75 subjects. The parameters of the averaged amplitude spectrum of the reflected signal as well as the parameters of the cepstrograms were tested as distinguishing features. At the stage of calculating the averaged spectrum and the cepstrogram, more information about the structural features of the ear canal can be obtained by considering different types of windows. Among the considered window functions (rectangular, Bartlett (triangular), classical Gaussian (with the shape parameter p = 1), Laplace, parametric Gaussian (p = 1.5), Blackman, and Hamming), it was found that the best results are obtained by combining the Hamming window with a rectangular window.
Two approaches to the recognition of ear patterns have been tested: the first based on the Bayesian hypothesis formula and the second based on ANNs. The Bayesian classifier showed a lower percentage of identification errors (EER = 0.0053); the best result for the neural networks was EER = 0.0266. For the neural network classifiers, the training sample size of eight examples per subject is rather small. The small sample size is dictated by the requirements of practice: the learning process of a biometric system should be fast and automatic. Setting up a Bayesian classifier is easy to automate, while ANN training requires monitoring for overfitting.
To test the stability of the derived features over time, an experiment with the same subjects was repeated six months after the initial data collection. This experiment showed an insignificant deviation in the number of wrong decisions (EER = 0.008). It was concluded that the identification properties of the external auditory canal do not change significantly over time (in particular, they are not affected by the accumulation of sulphur in the ear canal).
The probabilities of errors in the pattern verification mode based on the Bayesian classifier were also calculated: FRR = 0.1028 at FAR < 0.0001 (no 'false acceptance' errors were recorded in the experiment). The obtained result can be considered satisfactory for practical purposes (for biometric systems, it is important to have a FAR close to zero with an acceptable number of false rejections).
It is obvious that when the parameters of the probing signal υ(t) change, the parameters of the echo signal $u_{i,k}(t)$ also change. If the signal υ(t) is considered secret information known only to a legitimate subject (by analogy with a password), then the signals υ(t) and $u_{i,k}(t)$ together constitute an identifier and an authenticator. This approach can potentially enhance the protective properties of the proposed identification method.
The obtained results can be called very encouraging. Further research is planned to create an autoencoder (a pretrained deep network) to extract the most informative meta-features from the cepstrograms of echo signals. These meta-features will be used as input to the Bayesian classifier. We also plan to test schemes with preliminary training using such feature extraction methods as i-vectors and d-vectors, which could potentially work much more efficiently. For experimental verification of this hypothesis, however, a larger data sample will be required. Testing of cepstrograms obtained from wavelet transforms is also planned.

ACKNOWLEDGEMENT
The reported study was funded by the Russian Ministry of Science (information security), project number 6.