CSWS: Suppressing confusion subregions in indoor image fingerprint localization

Image data provides rich content information and has therefore attracted considerable attention in the field of indoor fingerprint positioning. However, indoor images captured at different locations are characterized by high content repetition, and these repetitive regions cause insufficient differentiation between adjacent image fingerprints or even fingerprint misclassification. To solve this problem, this article proposes a confusion subregion weighted suppression strategy for the image fingerprint database. First, duplicate regions (confusion subregions) are extracted from the fingerprint database using an appropriate salient region detection method. Then the similarity of these duplicate regions in Euclidean space is defined, and the degree to which they reduce the distinguishability of the fingerprint database is measured. Finally, these duplicate regions are suppressed in the original image fingerprint database by introducing weighted suppression coefficients. Experiments show that the proposed algorithm achieves significant results in fingerprint localization tasks in real indoor scenes and effectively improves localization accuracy.


FIGURE 1 Repetition area diagram
In the offline phase, image fingerprint localization is performed by designing a location fingerprint,[8] collecting and constructing an image fingerprint database with location labels, and establishing an association between each location label and the image signal at that location. In the online phase, the localization problem is transformed into a matching retrieval problem between the data to be localized and the fingerprint database.[9] The location labels of the most similar images retrieved from the image fingerprint database are output as the location prediction results.[10-12] Existing research focuses on how to extract more robust image features[13] to build a stable fingerprint database of image information in the offline phase, and how to improve the matching retrieval accuracy and efficiency of query images in the online phase.[14] The rise of deep learning in recent years has produced several image representation techniques based on convolutional neural networks,[15] yet it is worth noting that hand-designed image feature representation methods such as SIFT[16] are still widely used today due to their stable and robust performance. Other research filters the candidate image sets by counting outliers[17] among matched point pairs between images to obtain the most similar image fingerprints and produce the localization output.
However, report [12] states that on the classic Pittsburgh dataset,[18] these outlier-based retrieval methods achieve a recall@1 of about 70% but a recall@50 of about 90%, which indicates that for roughly 20% of the query images the correct match is retrieved but not ranked first; the main reason is the neglect of similar regions shared between different images, which survive even spatial validation such as geometric-transformation checks. Similarly, in indoor scenes, due to the limitation of indoor space, images captured from different fingerprint points in the image fingerprint database also contain recurring object regions, as shown in Figure 1. These recurring subregions increase the similarity between database image fingerprints, leaving the fingerprint points lacking distinguishability.[19] If such recurring regions are not processed, they negatively affect both candidate-set generation and the further screening that yields the location information. In other words, such repetitive regions can seriously confuse the location recognition result.
The main contributions of this article are summarized as follows.
1. To improve the accuracy of indoor image fingerprint localization, this article analyzes in depth the causes and characteristics of confusion regions in the image fingerprint database, makes full use of the information in repeated image regions, and proposes an image processing algorithm based on weighted suppression of confusion subregions.
2. To give the image fingerprints higher distinguishability, we analyze the characteristics of repetitive areas in indoor image fingerprint databases and the problems they pose to the positioning task. We propose a weighted suppression algorithm for confusion subregions: we define a formula for calculating the confusion degree of the subregions in Euclidean space, quantify the influence of these repeated regions on the overall fingerprint database, introduce weighting coefficients that are positively correlated with the confusion degree, and achieve weighted suppression of the confusion subregions of the original fingerprint database with these coefficients.
3. We conducted extensive experiments on three typical indoor scenarios and a public dataset to verify the superior performance of the proposed confusion subregion weighted suppression strategy (CSWS) algorithm in suppressing the interference of confusion regions and improving localization accuracy.
To better demonstrate our work, the rest of this article is organized as follows. We introduce the related work in Section 2. We perform the problem analysis and give the details of the algorithm proposed in this article in Section 3. We list the specific experimental scenarios and experimental results in Section 4 and finally, we conclude the article in Section 5.

Image fingerprint localization
Depending on the size of the dataset, image fingerprint localization methods can be divided into two major categories. For small and medium-scale image datasets, a suitable image feature representation algorithm[20] is selected for the specific scene, and the nearest neighbor algorithm[16] is used for image matching retrieval and fingerprint location output. For large-scale datasets, Li et al.[21] proposed a localization algorithm with a hierarchical database structure for outdoor scenes; Lin et al.[22] proposed a visual localization system for outdoor scenes based on convolutional descriptors and global optimization. For indoor environments, Chiou et al.[23] established a large image dataset, ICUBE, and proposed a multi-view image fingerprint localization algorithm based on a graph neural network.
Solving the image fingerprint localization problem usually involves two main tasks: (1) characterizing and describing the original image data; (2) identifying a candidate set of database images similar to the query image, and further determining the most similar image and its corresponding location information. The first task is a typical computer vision problem, namely image representation. Many works have proposed image feature representations with high differentiation and robustness for different application scenarios based on image properties such as color and grayscale: classical hand-crafted descriptors such as the scale-invariant feature transform (SIFT)[16] and the SURF algorithm,[24] which offers rotation invariance and an effective response to illumination changes, as well as deep learning methods such as convolutional neural networks[15,25] in recent years.
The other task is the image matching retrieval problem, where research on the image fingerprinting task focuses on improving matching retrieval performance and increasing the correct similarity of image candidate sets. Work in this area includes methods for image database construction and indexing, such as building visual-word encoding libraries[26] and their derived optimization algorithms,[14,27,28] and, in recent years, retrieval designs incorporating neural networks[29] and end-to-end image recognition systems.[30] However, improving the retrieval method alone ignores the spatial structure of the image fingerprint database and cannot fully satisfy the need for high-precision (low false-positive rate) image localization. Therefore, other works add designs such as spatial rearrangement,[31] computing the geometric transformation between the query image and the candidate image set[17,32] and further filtering the candidates using outliers, thus further improving accuracy.

Saliency region detection
To obtain these duplicate regions in the database, this article uses the saliency region detection technique[33] for region extraction. Unlike methods such as target identification[34-36] and semantic segmentation,[37,38] salient region detection can detect and generate these repetitive subregions at low cost (in ground-truth annotations, etc.) and low algorithmic complexity. Using features such as color, intensity, local orientation, and frequency-spectrum properties of the image, the saliency level of each region in the image is calculated,[39] and reliable, fast saliency detection can provide valuable reference information for a variety of image tasks. Existing algorithms fall into two main categories: traditional detection methods and deep learning-based detection methods. In the traditional direction, Goferman et al.[40] proposed a context-aware saliency rule and detection algorithm in the spatial domain based on multi-scale salient information extracted by pixel contrast; Zhai and Shah[41] proposed a global contrast-based saliency detection method; Achanta et al.[42] proposed a detection method based on image frequency tuning; and Hou and Zhang[43] proposed a visual attention mechanism detection algorithm based on the spectrum. These classical detection methods have seen many years of practical application and still offer good detection performance and practical value.[44] In the deep learning direction, Zhao et al.[45] introduced a gating structure to balance encoding and decoding under multi-scale detection and designed a more balanced saliency region detection network; Gao et al.[46] improved the computational efficiency of the saliency detection network by designing a new convolutional module and a dynamic weight decay scheme.
It is worth noting that the deep learning-based detection methods require accurate ground-truth annotations of the dataset in advance, which adds extra workload in practical applications.

Problem analysis
Indoor scenes are limited in spatial extent, so image fingerprint databases collected at different locations often contain many subregions with high content repetition, such as specific markers that recur in images taken at different locations. Take the dataset IndoorB-LivingRoom (see Section 4.1 for details) as an example, as shown by the red circled areas in Figure 2; these areas occur frequently across the whole database. In this data scenario, the image matching results for two fingerprint points that are far apart are shown in Figure 3: a large number of matching point pairs are still generated between images that are actually far away from each other. This undesired matching significantly decreases the differentiation between fingerprint points, which distorts the similarity metric in the subsequent matching and retrieval steps.
In this article, these duplicated regions are called confusion subregions. Their existence reduces the fingerprint differentiation of the image fingerprint database, seriously interferes with fingerprint localization methods centered on inter-image similarity, and poses a great challenge to the whole localization task. Therefore, this article proposes an image confusion subregion suppression strategy for indoor fingerprint localization to solve the localization problem caused by such regions.

Image fingerprint localization system based on CSWS algorithm
FIGURE 2 Schematic diagram of the content of IndoorB-LivingRoom (partial)
FIGURE 3 Image matching diagram
FIGURE 4 Fingerprint positioning algorithm flow chart
The fingerprint localization task can be regarded as a query for the fingerprint points nearest to the location of the subject to be located. The image fingerprint localization system is designed in advance to correspond to the fingerprint reference points in the real scene environment, and the information at each reference point is represented by the image acquired at that point, correspondingly called an "image fingerprint." The set of image fingerprints corresponding to all reference points is called the image fingerprint information base. Localization is achieved by matching the images acquired at unknown locations with the image fingerprints in the database, which can also be considered pattern recognition or image matching retrieval. The image-based fingerprint localization system is divided into two main phases, the offline fingerprint database building phase and the online query localization phase; the overall structure is shown in Figure 4. First, the number of reference points and their distribution are designed and determined in the indoor area to be localized, and suitable equipment is selected to collect the raw image data at each reference point.
The scale-invariant feature transform (SIFT) description algorithm has become widely used in fingerprint localization systems due to its excellent scale invariance, rotation invariance, resistance to illumination changes, and so forth. This article also chooses this algorithm to characterize the original image. It first performs extremum detection with a difference-of-Gaussians operator by constructing a Gaussian pyramid over different scale spaces, where the scale coordinate of each layer of the pyramid is calculated as

σ_s = k^s · σ_0,  k = 2^(1/S),  (1)

where σ_0 is the initial scale (set to 1.6 in the original article), s is the layer index, and S is the total number of layers in the octave. Extremum detection uses the difference-of-Gaussians operator

D(x, y, σ) = (G(x, y, kσ) − G(x, y, σ)) * I(x, y) = L(x, y, kσ) − L(x, y, σ),  (2)

where I(x, y) is the original image and L(x, y, σ) is the convolution of a Gaussian function G(x, y, σ) at the transformed scale with the original image.
After obtaining the extremum points in the discrete space, a curve is fitted to the difference-of-Gaussians function in scale space using a Taylor expansion to obtain the offset X̂ = (x, y, σ)^T of each extremum point relative to the interpolation center; iterative interpolation is performed until this offset converges, and edge response points are rejected using the Hessian matrix to obtain robust image feature keypoints. Then the local features of the image are used to assign a reference direction to each keypoint via the image gradient, so that the descriptor is rotation invariant. Finally, the keypoint feature description is obtained by dividing the image area around the keypoint into blocks and calculating the gradient histogram within each block. The resulting keypoint descriptor is shown in Figure 5 (from the original article[16]). The feature representation of the original image is essentially composed of multiple keypoint descriptions, which include the location, scale, and orientation of each point and its neighborhood. This information enables the fingerprint location system to characterize the collected fingerprint images.
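To make the pyramid construction concrete, the following is a minimal numpy/scipy sketch of Equations (1) and (2): it builds Gaussian layers at geometrically spaced scales and takes adjacent differences. The function name and parameter defaults are illustrative; full SIFT additionally performs sub-pixel interpolation, edge-response rejection, and descriptor computation.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def dog_pyramid(image, sigma0=1.6, layers_per_octave=3, num_octaves=2):
    """Build a difference-of-Gaussians pyramid in the spirit of SIFT.

    Illustrative sketch only: real SIFT also does sub-pixel interpolation
    of extrema and Hessian-based edge-response rejection.
    """
    k = 2.0 ** (1.0 / layers_per_octave)   # scale multiplier between layers (Eq. 1)
    pyramid = []
    base = image.astype(np.float64)
    for _ in range(num_octaves):
        # Gaussian-blurred layers L(x, y, sigma) at geometrically spaced scales
        sigmas = [sigma0 * k ** s for s in range(layers_per_octave + 1)]
        gauss = [gaussian_filter(base, s) for s in sigmas]
        # D(x, y, sigma) = L(x, y, k*sigma) - L(x, y, sigma)   (Eq. 2)
        dogs = [gauss[s + 1] - gauss[s] for s in range(layers_per_octave)]
        pyramid.append(dogs)
        base = base[::2, ::2]              # next octave: half resolution
    return pyramid

# Candidate keypoints are local extrema of D across space and scale.
img = np.random.rand(64, 64)
pyr = dog_pyramid(img)
```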
However, as pointed out in the problem analysis, the original image collection acquired at this point contains many duplicate regions that degrade localization. According to Equation (1), the information elements of the image feature representation algorithm are keypoints and their neighborhoods, and its descriptive capability (Equation 3) is robust only against offsets within a certain neighborhood; it lacks any treatment of spatial location differences between different fingerprint points. Therefore, before characterizing the fingerprint images, this article applies the proposed confusion subregion weighted suppression (CSWS) algorithm to remove the confusion information that degrades the differentiation of the fingerprint database. Finally, these processed and described sets of image features constitute the image fingerprint database.
The main challenge of the indoor image fingerprint localization system is the interference of confusion subregions in the image fingerprint database of indoor scenes, and we develop the algorithm design around this key challenge. First, we use salient region detection to obtain the confusion subregions in the image fingerprint database; then we define the confusion degree as a measurement factor for these subregions in Euclidean space, and design weight coefficients according to the confusion degree to realize a weighted mask representation on the original image. This reduces the weight of the features corresponding to the confusion subregions in the original image, removes their interference with subsequent retrieval judgments, and improves the accuracy of the localization algorithm. The overall algorithm flow chart is shown in Figure 6.
Confusion subregion detection

First, the confusion regions in the image fingerprint database need to be acquired; this article uses an image salient region detection method for the extraction. Since salient region detection admits flexible application strategies, it must be designed for the specific downstream task. According to the indoor localization scenario of this article and the characteristics of the image fingerprint database pointed out in the problem analysis, combined with the general criteria for salient regions in the literature,[42] the confusion subregions extracted by the selected salient region detection method should have the following characteristics: the detected subregions are basically consistent across all images in the database; the remaining image regions outside the salient regions still contain useful information; the salient regions are uniformly highlighted; and, because the method is combined with subsequent algorithms, its computation time should be as short as possible. Subregions that can be successfully extracted with these characteristics are considered confusion subregions that can be effectively used in the subsequent algorithms.
In this article, a brief analysis of the principles of the most common and effective detection methods is presented, including three traditional saliency region detection methods[41-43] and a deep learning-based detection method.[46]

Global contrast-based detection method (LC)
Calculate the global contrast of a pixel over the whole image, that is, the sum of the color distances between that pixel and all other pixels in the image, as the saliency value of that pixel.[41] The saliency of a pixel I_k in an image I is calculated as

S(I_k) = Σ_{i=1}^{N} ||I_k − I_i||,

where N is the number of pixels in the image.
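As a concrete illustration, the LC saliency value can be computed efficiently through the grayscale histogram, since all pixels with the same value share the same saliency. This is a minimal sketch for grayscale images; the original method uses color distances.

```python
import numpy as np

def lc_saliency(gray):
    """Global-contrast (LC) saliency: the saliency of a pixel is the sum of
    its distances to every other pixel, computed via the grayscale
    histogram so the cost is O(HW + 256^2) rather than O((HW)^2)."""
    gray = gray.astype(np.int64)
    hist = np.bincount(gray.ravel(), minlength=256)   # pixel counts per level
    levels = np.arange(256)
    # dist_table[v] = sum_u hist[u] * |v - u|  (saliency of each gray level)
    dist_table = np.abs(levels[:, None] - levels[None, :]) @ hist
    sal = dist_table[gray].astype(np.float64)
    # Normalize to [0, 1] (a constant image has zero saliency everywhere)
    return sal / sal.max() if sal.max() > 0 else sal

img = np.random.randint(0, 256, (48, 48))
sal = lc_saliency(img)
```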

Detection method based on image frequency tuning (FT)
The image can be divided into a low-frequency part and a high-frequency part in the frequency domain.[42] The low-frequency part reflects the overall information of the image, such as the outline of objects and the basic constituent regions, while the high-frequency part reflects detailed information, such as object texture. In the actual calculation, Gaussian smoothing with a 5 × 5 window is used to round off the highest frequencies, and a DoG filter is chosen to approximate the Laplacian of the Gaussian. The saliency map of the input image I (W × H pixels) is

S(x, y) = ||I_μ − I_whc(x, y)||,

where I_μ denotes the average pixel value of the image and I_whc is the result of filtering and blurring the original image.
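A grayscale sketch of this computation follows; the original FT method operates in Lab color space, and the blur parameter here is an illustrative choice.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def ft_saliency(image):
    """Frequency-tuned (FT) saliency sketch: S(x, y) = |I_mu - I_whc(x, y)|,
    where I_mu is the mean image value (pure low frequency) and I_whc is a
    Gaussian-blurred version of the image with the highest frequencies
    rounded off. Grayscale version for illustration."""
    image = image.astype(np.float64)
    i_mu = image.mean()                      # global mean of the image
    i_whc = gaussian_filter(image, sigma=1)  # blur removes the finest detail
    return np.abs(i_mu - i_whc)

img = np.random.rand(32, 32)
sal = ft_saliency(img)
```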

Spectral residual model of visual attention mechanism based on image spectrum
The spectral residual model obtains the distribution of useful information in an image from an information-theoretic perspective.[43] Given an image I, the amplitude spectrum A(f) and the phase spectrum P(f) are first computed via the Fourier transform:

A(f) = |F[I]|,  P(f) = angle(F[I]).

The log amplitude spectrum L(f) = log A(f) is passed through an n × n mean filter h_n(f), giving the residual spectrum

R(f) = L(f) − h_n(f) * L(f).

Finally, the inverse Fourier transform is applied and the result is passed through a Gaussian blur filter g to obtain the saliency region map

S = g * |F^{-1}[exp(R(f) + iP(f))]|^2.
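The pipeline above can be sketched in a few lines of numpy; the filter sizes are illustrative choices.

```python
import numpy as np
from scipy.ndimage import uniform_filter, gaussian_filter

def spectral_residual_saliency(image, n=3):
    """Spectral residual (SR) saliency sketch. The residual between the log
    amplitude spectrum and its local mean h_n * log A(f) marks the
    'unexpected' part of the spectrum; the inverse transform with the
    original phase turns that residual back into a spatial saliency map."""
    f = np.fft.fft2(image.astype(np.float64))
    amplitude = np.abs(f)                                # A(f)
    phase = np.angle(f)                                  # P(f)
    log_amp = np.log(amplitude + 1e-9)                   # L(f)
    residual = log_amp - uniform_filter(log_amp, size=n) # R(f)
    # Inverse transform with the original phase, then Gaussian blur
    sal = np.abs(np.fft.ifft2(np.exp(residual + 1j * phase))) ** 2
    return gaussian_filter(sal, sigma=2)

img = np.random.rand(64, 64)
sal = spectral_residual_saliency(img)
```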

Lightweight SOD model based on deep learning
Using neural network principles, a lightweight network is designed, consisting of a feature extractor and a cross-stage fusion module, which can process features at multiple scales simultaneously through the introduced convolution module.[46] It is worth noting that this approach requires the training images to carry a large number of prelabeled ground-truth saliency regions.

Weighted suppression strategy
Denote the set of images in the offline fingerprint database as L = {l_i, i = 1, 2, …, N}, where l_i is an original image and N is the total number of images. Confusion subregions are detected separately in the N images, and the extracted subregions are denoted S = {s_i, i = 1, 2, …, N}. The confusion region detection algorithms of the previous section contain no a priori constraints on specific regions, so the subregions in S will not be identical across images. We want to automatically apply strong suppression to the parts of these regions with a high degree of overlap, removing strong interference, and weak suppression to the parts with low overlap, retaining as much useful information as possible.
To ensure consistent scale in data processing, we first use the SIFT description algorithm to map these confusion subregions uniformly into feature space, obtaining feature vector matrices Ds = {Ds_i, i = 1, 2, …, N} of the same dimensionality. We then define the confusion degree P_i of each confusion subregion in feature space as the average similarity between the subregion Ds_i and the other N − 1 subregions in the set:

P_i = (1 / (N − 1)) Σ_{j=1, j≠i}^{N} sim(Ds_i, Ds_j),
where j indexes the other subregions in the set of confusion subregions, Ds_i denotes the ith confusion subregion, and sim(·, ·) is the similarity in Euclidean space defined above. The confusion degree P_i reflects the influence of this confusion subregion on the distinguishability of the whole image fingerprint database. A higher confusion degree means the subregion overlaps more with the other subregions and thus seriously affects the distinguishability between fingerprints; a lower value means less influence. Therefore, if we suppress the corresponding subregions at their positions in the original image according to the confusion degree, reducing the weight of these confused subregions, we reduce the harm these regions bring to the image fingerprint database. We introduce the parameter α_i to achieve such suppression in the form of weighted fusion, and denote the suppressed image as

l̂_i = l_i − α_i · s_i,  (11)
where α_i is the weighted suppression coefficient, positively correlated with the confusion degree P_i; the positive correlation function is chosen to satisfy the convex function property, and the function selected in this article is

α_i = (2/π) arcsin(normalization(P_i)) ∈ (0, 1),

so that α_i ranges from 0 to 1. For subregions with high confusion, α_i is close to 1, achieving strong suppression of such subregions in the original image through Equation (11); for subregions with low confusion, α_i is close to 0, avoiding unnecessary suppression. The suppressed images l̂_i constitute a new set of offline database images L̂ = {l̂_i, i = 1, 2, …, N}, removing the interference of the confusion subregions and better supporting the subsequent location recognition tasks. Since these steps are completed in the offline stage, the algorithm is highly portable and can be deployed in a variety of image-matching-centered localization algorithms; the pseudo-code is given in Algorithm 1.
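Under these definitions, the offline CSWS step can be sketched as follows. The similarity function (an inverse Euclidean distance), the min-max normalization, and the representation of each subregion as a fixed-length feature vector plus a binary mask are assumptions of this sketch, not prescribed by the article.

```python
import numpy as np

def csws_suppress(images, subregion_masks, features):
    """Sketch of CSWS weighted suppression. `features` holds one descriptor
    vector per confusion subregion (assumed already aggregated to a fixed
    length); similarity is taken as 1 / (1 + Euclidean distance)."""
    n = len(features)
    feats = np.asarray(features, dtype=np.float64)
    # Pairwise Euclidean distances between subregion descriptors
    dists = np.linalg.norm(feats[:, None, :] - feats[None, :, :], axis=-1)
    sims = 1.0 / (1.0 + dists)
    # Confusion degree P_i: mean similarity to the other N - 1 subregions
    p = (sims.sum(axis=1) - 1.0) / (n - 1)      # subtract self-similarity (= 1)
    # Weight alpha_i = (2/pi) * arcsin(normalized P_i), in [0, 1)
    p_norm = (p - p.min()) / (p.max() - p.min() + 1e-9)
    alpha = (2.0 / np.pi) * np.arcsin(p_norm)
    # Suppress: subtract the weighted subregion content from each image
    suppressed = [img - a * mask * img
                  for img, mask, a in zip(images, subregion_masks, alpha)]
    return suppressed, alpha

np.random.seed(1)
imgs = [np.random.rand(16, 16) for _ in range(4)]
masks = [(np.random.rand(16, 16) > 0.5).astype(float) for _ in range(4)]
feats = np.random.rand(4, 8)
suppressed, alpha = csws_suppress(imgs, masks, feats)
```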

Algorithm 1. Confusion subregion weighted suppression algorithm
Input: offline image fingerprint database L
1: for l_i ∈ L, i = 1, …, N do
2:   s_i ← confusion subregion detection on l_i
3: end for
4: for s_i ∈ S, i = 1, …, N do
5:   P_i ← average similarity between Ds_i and Ds_j, for all j ≠ i
6: end for
7: for i = 1, …, N do
8:   α_i ← (2/π) arcsin(normalization(P_i))
9:   l̂_i ← l_i − α_i · s_i
10: end for
11: return L̂ = {l̂_i, i = 1, 2, …, N}

In the online stage, the original query image to be located is first described with the same characterization algorithm as in the offline stage (the SIFT description algorithm) to obtain its feature vector representation. This feature vector is then matched against the image fingerprint database constructed in the offline stage: using nearest-neighbor-based image matching retrieval, the feature vectors of the query image are matched with the feature vectors at each fingerprint point and ranked by similarity, and the top N image fingerprints form the candidate location set. This article then uses the RANSAC algorithm with local optimization to fit the homography between image features, filters the images in the candidate set by the spatial consistency of the local features, divides the matched feature points into inliers and outliers, and reranks the candidate images by the number of inliers. After this filtering, the location label of the top-ranked image fingerprint in the candidate set is output as the query result.
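The candidate-generation part of the online stage can be sketched as follows; ranking database fingerprints by the mean nearest-neighbor descriptor distance is an illustrative choice, and the RANSAC inlier reranking step is omitted here.

```python
import numpy as np

def retrieve_top_n(query_desc, db_descs, top_n=5):
    """Rank database fingerprints by mean nearest-neighbor distance between
    descriptor sets. A numpy-only sketch of candidate generation; a real
    system would rerank the candidates by RANSAC inlier counts."""
    scores = []
    for fingerprint in db_descs:
        # For each query descriptor, distance to its nearest DB descriptor
        d = np.linalg.norm(query_desc[:, None, :] - fingerprint[None, :, :],
                           axis=-1)
        scores.append(d.min(axis=1).mean())
    order = np.argsort(scores)      # lower mean distance = more similar
    return order[:top_n]

np.random.seed(0)
query = np.random.rand(10, 8)
db = [np.random.rand(12, 8) for _ in range(20)]
db[3] = query.copy()                # plant an exact duplicate of the query
candidates = retrieve_top_n(query, db, top_n=5)
```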

EXPERIMENTS
In this section, three different types of representative indoor scene data and a publicly available dataset 23 are selected for experiments, and the effectiveness of the proposed method in this article is evaluated for indoor location recognition tasks based on the performance evaluation metrics described in this section.

Experimental scene setting
IndoorA-BookStore: The experimental area is 24.2 m²; the real scenario and layout are shown in Figure 7. This is a classic indoor environment with a large number of repetitive bookshelves on both sides of the pedestrian corridor and high light intensity on the window side, suitable for verifying the robustness of our algorithm under complex scene lighting. The image capture device is a Huawei mate20pro rear Leica triple camera; a total of 15 reference points are collected in the scene, with a 0.6 m interval between adjacent sampling points. IndoorB-LivingRoom: Considering that indoor smart devices also include intelligent home services, we chose a living room scene with an experimental area of 24.6 m²; the real scenario and layout are shown in Figure 8. The scene includes home objects with complex textures, such as potted plants and murals, which can effectively verify the robustness of the algorithm to complex scene content. The image capture device is a Huawei mate20pro rear Leica triple camera; a total of 35 reference points are collected, with 0.4 m intervals between adjacent sampling points.
IndoorC-Mall: The second-floor area of a shopping mall is selected; the experimental area is 68.4 m², and the layout is shown in Figure 9. This is a typical complex indoor scene: the left side is a straight staircase passage area, and the remaining area is a merchandise display and shopping area. The scene contains rich content, variable lighting, and flows of people, which tests the comprehensive robustness of the algorithm. The image capture device is a Huawei mate20pro rear Leica triple camera; a total of 25 reference points are collected, with 0.8 m intervals between adjacent sampling points.
ICUBE-Lab: We also select a public dataset of indoor scenes.[47] The real scenario and layout are shown in Figure 10. The ICUBE dataset contains a total of 2896 images from three scenes (Mall, Lab, Office) involving 214 locations in an academic building, with a training set of 1712 images and a test set of 1184 images. We choose the Lab scene, with an area of 2480 m² and a total of 120 test points, where each fingerprint point contains four images in orthogonal directions; we use a single image per fingerprint point, all taken in the same direction, as the original information representation of the fingerprint.

Image retrieval performance
To evaluate whether the proposed algorithm actually works effectively in an image fingerprint localization system, we first verify its performance in the image matching retrieval process. We choose the mean average precision (mAP)[48] to evaluate image retrieval recognition performance; it averages the precision over all recall levels and provides a single numerical measure of the overall recognition performance of an algorithm. The most common image retrieval pipeline is the nearest-neighbor-based image retrieval algorithm,[16] denoted Baseline. Since the proposed CSWS algorithm is a processing strategy for the image fingerprint database in the offline stage, we add the CSWS algorithm to the database construction phase of the image retrieval pipeline[16] (Baseline-CSWS). On the datasets IndoorA, IndoorB, and IndoorC, the Baseline-CSWS algorithm is compared for retrieval recognition with the base nearest neighbor method (Baseline),[16] the BOW algorithm,[26] and the VLAD algorithm.[14] The BOW algorithm first transforms the original images into vector representations via image feature description, selects J central vectors in the image feature set as visual words, finds the nearest visual word for each remaining feature vector by computing similarity, and assigns each feature vector a discrete index according to its nearest visual word; the whole image database thus becomes a visual dictionary with the visual words as cluster centers and the remaining feature vectors classified by their discrete indexes.
Unknown images are retrieved against the visual words and corresponding indexes of this visual dictionary for content retrieval, and similar image results are returned. The VLAD algorithm is an improvement on the BOW method: after the visual words are formed, the discrete indexes are replaced by the accumulated residuals between the descriptors and their assigned centroids. Both algorithms are well established in the field of image retrieval. In the experiments, the visual dictionary codebook size of both the BOW and VLAD methods is set to k = 64, the SIFT algorithm is chosen as the feature description method, and the other parameters are kept consistent with the original algorithm designs. The average precision results are shown in Table 1.
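As a rough illustration of the two encodings just described, the following sketch contrasts the BOW word histogram with the VLAD residual accumulation on a toy codebook (a minimal NumPy version with hard nearest-word assignment; it is not the exact implementation used in the experiments, and real codebooks are learned by k-means over SIFT descriptors):

```python
import numpy as np

def bow_encode(descriptors, codebook):
    """BOW: hard-assign each local descriptor to its nearest visual word
    and return the L2-normalized histogram of word counts (k-dim)."""
    d2 = ((descriptors[:, None, :] - codebook[None, :, :]) ** 2).sum(-1)
    words = d2.argmin(axis=1)
    hist = np.bincount(words, minlength=len(codebook)).astype(float)
    return hist / (np.linalg.norm(hist) + 1e-12)

def vlad_encode(descriptors, codebook):
    """VLAD: accumulate the residuals (descriptor minus assigned centroid)
    per visual word, then L2-normalize the concatenated k*d vector."""
    k, d = codebook.shape
    d2 = ((descriptors[:, None, :] - codebook[None, :, :]) ** 2).sum(-1)
    words = d2.argmin(axis=1)
    v = np.zeros((k, d))
    for i, w in enumerate(words):
        v[w] += descriptors[i] - codebook[w]
    v = v.ravel()
    return v / (np.linalg.norm(v) + 1e-12)

# toy codebook of k = 2 visual words in 2-D descriptor space
cb = np.array([[0.0, 0.0], [10.0, 10.0]])
desc = np.array([[1.0, 0.0], [9.0, 10.0]])
h = bow_encode(desc, cb)   # one descriptor per word
v = vlad_encode(desc, cb)  # residuals [1,0] and [-1,0]
```

VLAD retains *where* each descriptor sits relative to its centroid, which is why it typically outperforms the plain BOW histogram on small codebooks.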
According to the results in the table, the final performance of our algorithm differs with the detection method (FT, LC, or spectral residual) used for confusion region extraction. The visualizations of these three detection methods are shown in Figure 11. The image regions obtained by the spectral residual method are not clear and cannot effectively extract the confusion regions, so introducing CSWS with it yields no obvious effect; the FT and LC methods, in contrast, extract distinct image regions that well support the subsequent computation of the confusion degree and the suppression coefficient. Overall, on these three small and medium-sized datasets, our proposed CSWS algorithm combined with Baseline significantly improves retrieval performance over the original method, and its average precision is better than that of the BOW and VLAD algorithms. Specifically, the highest mAP of 92% is achieved on IndoorB-LivingRoom, an indoor home scenario; our analysis is that this scenario contains the largest number of database confusion subregions, so suppressing them with the CSWS algorithm brings a significant performance improvement. We also note that among the other algorithms, VLAD outperforms BOW on all three datasets, which we attribute to the limited representation capability of a codebook constructed by the BOW algorithm directly on small and medium-sized datasets.
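For illustration, the LC detector mentioned above can be sketched as follows (a minimal NumPy version of histogram-based global luminance contrast, assuming an 8-bit grayscale input; the detector configuration used in the experiments may differ):

```python
import numpy as np

def lc_saliency(gray):
    """LC saliency: each pixel's saliency is the sum of its absolute
    intensity differences to all other pixels, computed via the global
    histogram in O(256^2 + N) instead of O(N^2)."""
    hist = np.bincount(gray.ravel(), minlength=256)
    levels = np.arange(256)
    # lut[v] = sum_i hist[i] * |v - i|
    lut = np.abs(levels[:, None] - levels[None, :]) @ hist
    sal = lut[gray].astype(np.float64)
    return sal / sal.max() if sal.max() > 0 else sal

# toy example: a bright square on a dark background is salient
img = np.zeros((32, 32), dtype=np.uint8)
img[12:20, 12:20] = 200
sal = lc_saliency(img)
```

Thresholding the resulting map (e.g. `sal > 0.5`) yields candidate salient regions of the kind used here as confusion subregions.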
The CSWS algorithm proposed in this article is a processing strategy oriented to the image fingerprint database, with good portability, and can be flexibly combined with other retrieval matching algorithms. Therefore, following the protocol of the classical VLAD algorithm, 14 we also selected the first 30 location points (sampled 1 m apart) of the public dataset ICUBE-Lab for image retrieval and combined the algorithm of this article with the VLAD method in the original fingerprint database stage; the retrieval performance comparison results are as follows.
As can be seen from Table 2, the mAP of our method combined with the VLAD algorithm is improved by 13%, which indicates that our algorithm effectively suppresses the confusion regions in the fingerprint database and generalizes well.
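For reference, the mAP figures reported above can be computed as follows (a minimal NumPy sketch assuming a binary relevance label for each ranked database image):

```python
import numpy as np

def average_precision(ranked_relevant):
    """AP for one query: ranked_relevant is a 0/1 sequence over the
    ranked retrieval results (1 = correct location)."""
    rel = np.asarray(ranked_relevant, dtype=float)
    if rel.sum() == 0:
        return 0.0
    # precision at each rank, averaged over the ranks of the relevant hits
    precision_at_k = np.cumsum(rel) / (np.arange(len(rel)) + 1)
    return float((precision_at_k * rel).sum() / rel.sum())

def mean_average_precision(all_rankings):
    """mAP: mean of the per-query average precisions."""
    return float(np.mean([average_precision(r) for r in all_rankings]))

# two queries: a perfect ranking and one with a miss at rank 1
map_score = mean_average_precision([[1, 1, 0, 0], [0, 1, 1, 0]])  # 19/24
```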

Positioning accuracy assessment
To verify the effectiveness of the CSWS algorithm in the localization system, this article combines it with the Baseline localization method built around the nearest-neighbor method (Baseline-CSWS). In the offline stage, the original image fingerprint database L is processed using the proposed CSWS algorithm to obtain the weighted-suppressed image fingerprint dataset L̃, and SIFT descriptors are used to obtain the image representation Des_ob of this dataset; the query image Q to be located in the online stage is represented in the same way to obtain the corresponding feature vectors. The query image vectors and the database vectors are ranked by their similarity, and the top N ranked images form the candidate image set. The candidate set is then filtered using the spatial-distribution consistency of local image features: assuming that the 3D structure visible in the query image and in each candidate image can be approximated by a small number of planes (1-5), this article uses the RANSAC algorithm with local optimization to fit the homography, 49 and the candidate images are reranked by their number of inliers. The location label of the top-1 image after filtering is output as the query location Q_location, completing the localization task. On the same three datasets, this article compares against the VLAD algorithm, which performed second best in the image retrieval results of the previous subsection, and the Baseline algorithm; in addition, a fingerprint localization approach using the EfficientNet network structure 50 for feature extraction with a softmax classification head is selected for comparison.
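The RANSAC-based reranking step described above can be sketched as follows (a simplified NumPy version: plain DLT homography fitting inside a basic RANSAC loop, without the local-optimization refinement used in the article):

```python
import numpy as np

def fit_homography(src, dst):
    """Fit a homography H (up to scale) to 4+ correspondences via the
    direct linear transform (DLT): stack the cross-product constraints
    and take the right singular vector of the smallest singular value."""
    rows = []
    for (x, y), (u, v) in zip(src, dst):
        rows.append([-x, -y, -1, 0, 0, 0, u * x, u * y, u])
        rows.append([0, 0, 0, -x, -y, -1, v * x, v * y, v])
    _, _, vt = np.linalg.svd(np.asarray(rows))
    return vt[-1].reshape(3, 3)

def ransac_inliers(src, dst, iters=200, thresh=3.0, seed=0):
    """Count the matched point pairs consistent with a single homography."""
    rng = np.random.default_rng(seed)
    src_h = np.hstack([src, np.ones((len(src), 1))])
    best = 0
    for _ in range(iters):
        idx = rng.choice(len(src), 4, replace=False)
        H = fit_homography(src[idx], dst[idx])
        proj = src_h @ H.T
        with np.errstate(divide="ignore", invalid="ignore"):
            err = np.linalg.norm(proj[:, :2] / proj[:, 2:3] - dst, axis=1)
        best = max(best, int((err < thresh).sum()))
    return best

# reranking: sort the candidate set by descending inlier count, e.g.
# candidates.sort(key=lambda c: ransac_inliers(q_pts[c], db_pts[c]), reverse=True)
```

In practice one would fit a separate model per candidate image and rerank the whole candidate set by the returned inlier counts.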
The EfficientNet network is a model scaling strategy designed by the Google team in 2019; by introducing compound scaling coefficients, it achieves faster inference and higher accuracy than other network structures and performs excellently in several fields such as image classification and recognition. We select this network for image feature learning, pretraining the model on ImageNet. In the training phase on the scene data of this article, the first 16 MBConv layers and a Conv layer are frozen to retain the effective pretrained network parameters, while the last Conv layer and the global average pooling layer are released to learn the actual scene data. The generated feature map is 512-dimensional; the network outputs the learned image feature, which after normalization is connected to an FC classification layer whose number of classes equals the number of reference points in the experimental scene.
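A minimal sketch of the classification head just described (NumPy only; the weights W and b are random stand-ins for the trained FC layer, and the input feature is a placeholder for the network's 512-dimensional output):

```python
import numpy as np

def classify_fingerprint(feature, W, b):
    """Classification head on top of the learned 512-d feature:
    L2-normalize the feature, apply the FC layer, take a softmax over
    the reference-point classes; the argmax is the predicted label."""
    f = feature / (np.linalg.norm(feature) + 1e-12)
    logits = W @ f + b
    p = np.exp(logits - logits.max())  # numerically stable softmax
    p /= p.sum()
    return int(p.argmax()), p

# toy head: 120 reference points (as in the Lab scene), 512-d features
rng = np.random.default_rng(0)
W, b = rng.standard_normal((120, 512)), np.zeros(120)
label, probs = classify_fingerprint(rng.standard_normal(512), W, b)
```

The predicted class index maps back to the reference-point location label, which is output as the position estimate.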
The cumulative positioning error results for the three scenes are shown in Figures 12-14, where the X-axis gives the localization error in meters and the Y-axis the cumulative error distribution function (CDF). In all three scenes, Baseline-CSWS achieves the best localization results due to its effective treatment of the confusion subregions. In the living room scene (Figure 12), the Baseline-CSWS method clearly stands out in the 0-2 m interval, because the IndoorB-LivingRoom dataset suffers the most serious confusion-region interference, which the CSWS algorithm in this article effectively suppresses.
In the bookstore scene (Figure 13), the overall accuracy of all algorithms decreases slightly; our analysis is that the more homogeneous scene content of the IndoorA-BookStore dataset (similarly laid-out bookshelves) makes the image fingerprints insufficiently distinguishable, but the method with CSWS still performs well in this case. The mall environment has the worst overall localization performance because the large amount of dynamic pedestrian movement strongly interferes with the fingerprint localization system. In Figure 14, Baseline-CSWS is only slightly higher than the EfficientNet method in the 0-2 m interval, since the neural network structure can capture the useful information in the image more effectively than the traditional method; nevertheless, the CSWS method in this article still effectively improves the localization accuracy of the Baseline method.
In addition, we calculated the average positioning error and the standard deviation of the positioning error in the different scenes to show the accuracy and stability of the positioning system. The results are shown in Table 3 and Figure 15.
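The average error, standard deviation, and cumulative error distribution reported here can all be derived from the per-query position errors, for example (a minimal NumPy sketch with toy coordinates):

```python
import numpy as np

def localization_stats(pred_xy, true_xy):
    """Per-query Euclidean errors, their mean and standard deviation,
    and the empirical CDF used for cumulative-error plots."""
    err = np.linalg.norm(np.asarray(pred_xy) - np.asarray(true_xy), axis=1)
    xs = np.sort(err)
    cdf = np.arange(1, len(xs) + 1) / len(xs)
    return err.mean(), err.std(), xs, cdf

# toy example with two queries
mean_e, std_e, xs, cdf = localization_stats([[0, 0], [3, 4]], [[0, 1], [0, 0]])
# errors are 1.0 and 5.0, so the mean error is 3.0
```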
It can be seen that introducing the CSWS algorithm improves localization accuracy by 39%, 43%, and 28% over the Baseline system in the bookstore, living room, and shopping mall scenes, respectively, and by 20%, 23%, and 10% over the next-best algorithm. The results show that introducing the CSWS algorithm can indeed significantly improve the performance of the localization algorithm. We also conducted experimental comparisons on the publicly available dataset ICUBE-Lab, choosing EfficientNet for feature extraction and achieving localization through the classification task; the CSWS algorithm proposed in this article was applied in the data loading phase of the EfficientNet network as an optimized variant for comparison (EfficientNet-CSWS). For fairness, our experimental setup and test set size are kept as consistent as possible with those of ICUBE-Lab in the paper, 47 and we retain the MVGs method mentioned in Reference 47 and a handcrafted-vision-feature-based (Hvision) method 48 as comparisons; the results are shown in Figure 16.

It can be seen that the CSWS algorithm proposed in this article, when combined with the EfficientNet method, significantly and consistently improves localization performance and outperforms the handcrafted-vision-feature-based (Hvision) method. The localization accuracy of our algorithm is on par with the MVGs method of Reference 47, mainly because our experiments use only a single-direction image per fingerprint point, so the information richness of each fingerprint point is smaller at the data level, whereas the MVGs method represents each fingerprint point by a sequence of four-direction images. The suppression of the confusion regions by the CSWS algorithm raises the overall localization accuracy of the algorithm to approximate that of MVGs with its multi-directional rich information, which illustrates the effectiveness of the suppression algorithm proposed in this article. Likewise, its performance improvement on the EfficientNet-based fingerprint localization algorithm illustrates the high portability of the CSWS algorithm.

CONCLUSIONS
In this article, we propose the CSWS algorithm for image fingerprint localization and demonstrate that it can effectively solve the duplicate-region interference problem in image fingerprint databases. The confusion subregions in the fingerprint database are extracted by appropriate detection methods; the degree of influence of these subregions on the whole fingerprint database is measured in the feature space under the Euclidean metric; and weight coefficients positively correlated with this degree of influence are designed to suppress the confusion subregions. The retrieval recognition performance and localization effect of the algorithm are validated experimentally on three real scenarios with different environments (IndoorA-BookStore, IndoorB-LivingRoom, IndoorC-Mall) and on a public dataset (ICUBE-Lab). The results show that the proposed algorithm significantly improves retrieval performance and localization accuracy while remaining highly portable. By introducing the CSWS algorithm, the localization accuracy of the Baseline method is improved by 39%, 43%, and 28% in the three real scenarios, respectively; on the public dataset ICUBE-Lab, combining the proposed algorithm with the VLAD algorithm improves the average retrieval precision by 13%; and introducing the CSWS algorithm in the data loading stage of the EfficientNet network effectively improves its localization accuracy, achieving the same localization effect as the MVGs algorithm of Reference 47 with a smaller data volume (single-direction image fingerprints).
In future research, we plan to design more effective confusion subregion extraction methods and, building on the ideas of this article, to further optimize the suppression algorithm from the perspective of the feature space.

ACKNOWLEDGMENTS
This work was supported by the National Natural Science Foundation of China (No. 61871054).

DATA AVAILABILITY STATEMENT
Research data are not shared.