A Real‐Time and Fast LiDAR–IMU–GNSS SLAM System with Point Cloud Semantic Graph Descriptor Loop‐Closure Detection

Herein, a real-time, fast, tightly coupled simultaneous localization and mapping (SLAM) system is proposed. A deep neural network is used to segment the point cloud semantically and construct a point cloud semantic graph descriptor, and the global navigation satellite system real-time kinematic position is used to detect loop closures. Finally, a factor-graph model is used for global optimization. The working principle of the SLAM system is introduced in detail, including the construction of the factor graph, the generation of the point cloud semantic graph descriptor, the generation of the ring graph, the loop-closure detection process, and the global optimization. Indoor and outdoor experiments are conducted to validate the overall trajectory estimation and loop-closure detection performance. Compared to traditional methods, the proposed method offers improved accuracy and real-time performance in trajectory estimation. It effectively addresses issues such as limited feature constraints, large computational consumption, and subpar real-time performance, allowing quick and accurate loop-closure detection. Moreover, the method reduces time consumption by two-thirds compared to traditional approaches while exhibiting superior overall loop-closure detection performance.


Introduction
Autonomous vehicles rely on highly efficient and robust positioning algorithms to ensure precise navigation with minimal failure rates. In recent years, continuous advances in autonomous driving technology and the emergence of consumer-level 3D light detection and ranging (LiDAR) devices have significantly improved the accuracy of LiDAR systems, and the development of autonomous vehicles has made significant breakthroughs.[3][4][5] Within LiDAR SLAM, methods based on the multisensor fusion of LiDAR are currently a hot research topic, in which the comprehensive performance of the algorithm is improved by fusing data from various sensors with the LiDAR data.[9][10] The integration of 3D LiDAR and inertial measurement unit (IMU) data can be classified into two main categories according to the fusion method: the loose-coupling and the tight-coupling SLAM fusion frameworks. The loose-coupling framework treats the 3D LiDAR and the IMU as two separate motion-estimation modules and then combines their pose estimation results; typically, the IMU measurements are used to assist the 3D LiDAR odometry, or the LiDAR and IMU data are fused through filtering.[15] In contrast, the tight-coupling scheme incorporates new state variables into the system state vector according to the characteristics of each sensor. It constructs error terms from the measurements of each sensor and optimizes the state vector by minimizing the total error. Prominent examples of this approach include LiDAR-inertial 3D plane SLAM (LIPS),[16] LiDAR inertial odometry (LIO)-Mapping,[17] and other related works.[18] Shan et al. proposed LIO-SAM, a tightly coupled LiDAR inertial odometry framework.[19] The information from multisource sensors is fused by factor-graph optimization. Using the idea of keyframes, only the poses of keyframes are optimized, which reduces the computation of the factor-graph optimization and ensures the real-time performance of the algorithm. Owing to the addition of a global positioning system (GPS) factor, the positioning accuracy and mapping quality of the algorithm are improved.
Xu et al.[20] proposed FAST-LIO, a SLAM method that fuses LiDAR and IMU data with a tightly coupled iterated extended Kalman filter. This method introduces a new Kalman gain formula, which significantly reduces the amount of calculation and effectively improves the comprehensive performance of the algorithm.
Xu et al.[21] further introduced FAST-LIO2, which aims to enhance the performance of FAST-LIO. FAST-LIO2 is also based on a tightly coupled iterated extended Kalman filter. In this method, the map is maintained with an incremental k-d tree data structure that supports incremental updates and dynamic rebalancing, enabling the algorithm to register the point cloud data directly to the map. By employing this approach, the algorithm achieves improved efficiency and accuracy in map registration.
In general, it is widely acknowledged that tight coupling yields better results in sensor fusion, as it makes effective use of the complementary properties of the sensors. This approach allows each sensor to mutually correct and optimize the state variables, leading to improved performance. Recent studies have demonstrated the superiority of tight coupling in achieving enhanced outcomes in sensor fusion.
Some scholars have also proposed using novel sensors for LiDAR SLAM. Guan et al.[22] proposed a loosely coupled multisensor fusion SLAM method based on visible light positioning (VLP) and LiDAR. This method uses LiDAR, odometry, and a rolling-shutter camera to realize a multisensor positioning system that provides accurate and robust robot positioning and navigation, with positioning accuracy at the centimeter level.
Based on LiDAR and VLP or visible light communication, Guan et al.[23,24] implemented a lightweight indoor mobile robot positioning system on the robot operating system (ROS). These methods use a distributed framework design and ROS's loose-coupling characteristics to keep the algorithm lightweight and accurate.
Huang et al.[25] proposed VWR-SLAM, a multisensor fusion framework that tightly couples VLP landmarks, a wheel odometer, and a red-green-blue-depth (RGB-D) camera to achieve SLAM. It achieves accurate and robust robot positioning and navigation in scenarios where LEDs are crowded or interrupted, with centimeter-level positioning accuracy while building a 3D environment map.
Loop-closure detection is a key part of SLAM that identifies places the robot has previously visited. As the system operates continuously, the front-end odometry inevitably accumulates drift error. Reliable loop-closure detection is therefore a crucial technology for correcting the cumulative drift error in a SLAM system.
A good loop-closure detection algorithm must search for loop closures efficiently and accurately. LiDAR-based loop-closure detection methods currently fall into two categories: those based on machine learning and those based on point cloud descriptors.
Among the methods based on machine learning, the most classic is SegMatch,[26] which segments the point cloud, finds candidate segments through a nearest-neighbor search, and finally matches them with a random forest classifier.[27] Both PointNetVLAD [28] and LPD-Net [29] build on PointNet [30] for point cloud feature extraction and loop-closure matching. In contrast, OverlapNet,[31] OverlapTransformer,[32] and LCDNet [33] are loop-closure detection methods that use deep neural networks to search for loop closures from different characteristics of the LiDAR data. It is worth noting that these approaches often require a large amount of training data for their network models, making the training process challenging.
The approach based on point cloud descriptors designs feature descriptors that capture the internal structural information of the point cloud; loop-closure detection is then conducted based on the Euclidean distance between these descriptors. At present, the mainstream point cloud descriptors are multiview 2D projection (M2DP),[34] signature of histograms of orientations (SHOT),[35] Delight,[36] viewpoint feature histogram (VFH),[37] clustered viewpoint feature histogram (CVFH),[38] ensemble of shape functions (ESF),[39] Iris,[40] Scan Context,[41] and Intensity Scan Context.[42] These methods use local keypoints or other global features to extract point cloud descriptors point by point and then compare the descriptor of the query scan with the descriptors of historical scans to identify previously visited places. They emphasize local details and ignore high-level feature constraints, whereas humans do not define a place they have visited through geometric points or other point-level features; instead, they understand the whole scene from a macro perspective by identifying the relative positions of objects in 3D space. Inspired by the way humans recognize locations, we use semantic objects in the scene to perform loop-closure detection and propose a loop-closure detection method based on point cloud semantic graph descriptors.
The main contributions of this paper are summarized as follows: 1) A point cloud semantic graph descriptor is proposed to quickly encode LiDAR data frames into point cloud descriptors.
2) A loop-closure detection method based on the semantic graph descriptor is proposed, facilitating efficient loop-closure detection in SLAM. 3) A tightly coupled LiDAR inertial SLAM system is constructed, in which the loop-closure constraint factor obtained from the semantic-graph-descriptor-based loop-closure detection is incorporated into the global factor graph. 4) The proposed SLAM system is comprehensively evaluated through a range of indoor and outdoor experiments.

Frames and Notions
The world frame in this paper includes the earth-centered earth-fixed (ECEF) frame and the world frame {W}. The sensor frames include the LiDAR, IMU, and global navigation satellite system (GNSS) receiver frames.
In Figure 1, the origin of the ECEF frame {E} is at the center of the earth. The X-axis of the ECEF frame points to the intersection of the prime meridian and the equator, and the Z-axis points to the North Pole. The world frame {W} is located on the ground, with its Z-axis along the normal direction of the local tangent plane. For the GNSS receiver frame {g}, this paper is concerned only with its origin coordinates in the {W} frame; the axis directions of {g} are arbitrary.

$R^{B}_{A}$ represents the rotation of a vector from frame {A} to frame {B}, $p^{B}_{A}$ is the coordinate of the origin of frame {A} in frame {B}, $v^{B}_{A}$ is the velocity of the origin of frame {A} in frame {B}, and ⊗ denotes quaternion multiplication.

Loop Closure Detection Algorithm Flow
Figure 2 shows the flow chart of the loop-closure detection algorithm in this paper. First, it is judged whether the current data frame is a keyframe. If it is a keyframe, its point cloud semantic graph descriptor is constructed, and the absolute position of the keyframe is obtained from GNSS real-time kinematic (GNSS-RTK) positioning. The descriptor and position information corresponding to the keyframe are stored in the point cloud semantic graph descriptor container as key-value pairs. During subsequent loop-closure detection, all descriptors in the container are used to construct a k-d tree, and the standard k-nearest-neighbor search algorithm is used to quickly search for loop-closure candidate frames.
When performing loop-closure detection for the current frame, its point cloud semantic graph descriptor is first constructed with the aid of the RangeNet++ [43] algorithm. At the same time, according to the GNSS-RTK absolute position of the current frame, the historical keyframes whose descriptor positions are closest to the current frame are searched in the descriptor container. If the distance is less than the set distance threshold, the two data frames may constitute a pair of loop-closure frames. We record the historical keyframes that meet this distance condition as loop-closure candidate frames.
Then, the number of nodes (the number of target objects) is matched for the loop-closure candidate frames that satisfy the distance condition, to further narrow the range of candidates. The remaining candidates are then subjected to double loop-closure verification: descriptor similarity verification and geometric verification.
If the aforementioned loop-closure matching fails and no loop-closure frame is matched, the descriptor generated from the current frame and its corresponding GNSS-RTK position are stored in the point cloud semantic graph descriptor container.
If the two data frames are determined to form a loop closure, the loop-closure constraint factor generated from them is added to the global factor graph for optimization, the vehicle pose is updated, and the keyframe point cloud is inserted into the global point cloud map, which is then updated.

Point Cloud Semantic Segmentation
When constructing the point cloud descriptor, randomly chosen target objects would make the descriptor unrepresentative. Therefore, common, stable objects are selected as the target objects for semantic segmentation in the outdoor scene; in this paper, the selected target objects are vehicles, trunks, and poles.
We use RangeNet++ to semantically segment the point cloud data and extract objects in the environment to construct the point cloud semantic graph descriptor. RangeNet++ uses deep learning to perform semantic segmentation of 3D point cloud data. The algorithm obtains accurate point cloud segmentation results and can run on a single embedded graphics processing unit (GPU).

Construct Point Cloud Semantic Graph Descriptor
The relative distance between two objects is invariant to rotation and translation. Therefore, relative-distance topology information is added to the descriptor to ensure that the descriptor is translation and rotation invariant.
For each candidate point cloud data frame, after RangeNet++ has semantically segmented the point cloud into the target objects, an undirected graph $G = \langle V, E \rangle$ is used to construct the point cloud descriptor. A node $V$ is the position of a target object in the undirected graph G (the centroid of the target object is used to represent its position in space), E is the set of edges between nodes, and an edge $e_{ij} = \langle V_i, V_j \rangle$ carries the Euclidean distance between nodes $V_i$ and $V_j$.
In the undirected graph G, the target objects are combined in pairs; with three node types, six edge types are obtained. The three node types are vehicle, trunk, and pole, and the six edge types are vehicle-vehicle, trunk-trunk, pole-pole, vehicle-trunk, trunk-pole, and pole-vehicle.
Figure 3a shows the point cloud semantic graph descriptor corresponding to a point cloud data frame. The blue, orange, and green nodes represent vehicles, trunks, and poles, respectively. The edges between nodes carry the Euclidean distance between the two nodes.
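To make the construction concrete, the following Python sketch clusters semantically labeled points into object instances, uses each instance centroid as a graph node, and groups the pairwise Euclidean distances by the six edge types. The label-to-class mapping, the DBSCAN instance clustering, and all function names are illustrative assumptions rather than the paper's implementation.

```python
import numpy as np
from itertools import combinations
from sklearn.cluster import DBSCAN

# Illustrative label ids for the three target classes (assumed, not from the paper).
CLASSES = {"vehicle": 1, "trunk": 2, "pole": 3}

def instance_centroids(points, labels, class_id, eps=0.5, min_pts=10):
    """Cluster the points of one semantic class into instances and return centroids."""
    cls_pts = points[labels == class_id]
    if len(cls_pts) < min_pts:
        return []
    clu = DBSCAN(eps=eps, min_samples=min_pts).fit(cls_pts)
    return [cls_pts[clu.labels_ == k].mean(axis=0)
            for k in set(clu.labels_) if k != -1]

def build_semantic_graph(points, labels):
    """Nodes: instance centroids per class. Edges: pairwise distances grouped by type."""
    nodes = {name: instance_centroids(points, labels, cid)
             for name, cid in CLASSES.items()}
    edges = {}  # e.g. edges[("pole", "trunk")] is a list of Euclidean distances
    names = sorted(CLASSES)
    for a, b in combinations(names, 2):            # vehicle-trunk, trunk-pole, ...
        edges[(a, b)] = [np.linalg.norm(p - q) for p in nodes[a] for q in nodes[b]]
    for a in names:                                 # vehicle-vehicle, trunk-trunk, ...
        edges[(a, a)] = [np.linalg.norm(p - q)
                         for p, q in combinations(nodes[a], 2)]
    return nodes, edges
```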

Distance Matching
When performing loop-closure detection, first, according to the GNSS-RTK position corresponding to the current frame, the historical keyframes whose descriptor positions are closest to the current frame are searched in the descriptor container. If the position distance is less than the set distance threshold (0.25 m in this paper), the two data frames may constitute a pair of loop-closure frames. We record the historical keyframes that meet this distance condition as loop-closure candidate frames.
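A minimal sketch of this distance matching step is given below, assuming the keyframe GNSS-RTK positions are held in a list alongside the descriptor container; SciPy's cKDTree performs the neighborhood query.

```python
import numpy as np
from scipy.spatial import cKDTree

DIST_THRESHOLD = 0.25  # meters, the distance threshold used in the paper

def distance_candidates(current_pos, keyframe_positions):
    """Return indices of historical keyframes whose GNSS-RTK position lies within
    the distance threshold of the current frame's position."""
    if len(keyframe_positions) == 0:
        return []
    tree = cKDTree(np.asarray(keyframe_positions))  # built over stored keyframe positions
    return tree.query_ball_point(np.asarray(current_pos), r=DIST_THRESHOLD)
```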

Node Matching
After the loop-closure candidate frames of the current frame have been obtained by distance matching, the numbers of the three types of target-object nodes in the point cloud semantic graph descriptors of the candidate frames must also be preliminarily matched. Let the number of object nodes of the current frame be V and the number of object nodes of a loop-closure candidate frame be V_total. A loop-closure candidate frame whose total count of the three types of target objects lies within the predetermined threshold range of V is kept as a candidate.
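A possible realization of node matching is sketched below, assuming per-class node counts are stored with each candidate; the count threshold value is illustrative.

```python
def node_candidates(current_counts, candidates, count_threshold=2):
    """Keep candidates whose total number of object nodes is close to the current frame's.

    current_counts and cand["counts"] are dicts of per-class node counts,
    e.g. {"vehicle": 4, "trunk": 7, "pole": 3}. The threshold is an assumed value.
    """
    v_current = sum(current_counts.values())
    kept = []
    for cand in candidates:
        v_total = sum(cand["counts"].values())
        if abs(v_current - v_total) <= count_threshold:
            kept.append(cand)
    return kept
```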

Construction of Ring Graph
The ring graph is used to calculate the similarity between two point cloud descriptors. Figure 3b shows the ring graph corresponding to the point cloud descriptor in Figure 3a. The number of rings in the ring graph equals the number of edge types E in the undirected graph G; the ring graph in this paper therefore has six rings, corresponding to the six edge types. We introduce only one of the six edge types, $E_{\text{trunk-pole}}$, and the process of constructing its corresponding ring; the other five edge types follow the same ring-generation method.
Suppose there are m edges of type $E_{\text{trunk-pole}}$ in the point cloud semantic graph descriptor, and these m edges form an edge set $S_{\text{trunk-pole}}$. The Euclidean distances of the edges in $S_{\text{trunk-pole}}$ lie in the range $[s_{\min}, s_{\max}]$, where $s_{\min}$ is the minimum and $s_{\max}$ the maximum Euclidean distance in $S_{\text{trunk-pole}}$.
The edges of $S_{\text{trunk-pole}}$ are sorted in ascending order of Euclidean distance to give the ordered edge set $S'_{\text{trunk-pole}}$. The ring corresponding to $S'_{\text{trunk-pole}}$ is divided into $m\,s_{\max}$ sectors, so the angle corresponding to each sector is $360^{\circ}/(m\,s_{\max})$. The number of sectors $N_i$ corresponding to each edge in $S'_{\text{trunk-pole}}$ is then determined, the sectors corresponding to each edge are arranged in the ring in turn, and one sector is used to separate the sectors of consecutive edges. The edges are arranged counterclockwise, with the horizontal right line as the starting line.
In this way, the $S'_{\text{trunk-pole}}$ edges of the point cloud descriptor are represented by one ring of the ring graph, denoted $C_{\text{trunk-pole}}$, whose sectors form the fan set of that ring. The remaining five edge types of the point cloud descriptor are represented by rings in the same way, forming the complete ring graph $C_{\text{graph}} = \{C_{\text{trunk-trunk}}, \ldots\}$. The ring graph corresponding to a point cloud semantic graph descriptor is thus $C_{\text{graph}}$. The ring graphs of a loop-closure candidate frame and the current frame can then be used for similarity matching, which quickly filters the loop-closure candidate frames to obtain the loop-closure frame.
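Since the exact sector-count formula is not reproduced above, the sketch below encodes one plausible reading: the edges of a given type are sorted in ascending order, each edge occupies a number of sectors roughly proportional to its Euclidean distance, and consecutive edges are separated by one empty sector. The proportionality rule and the sector budget are assumptions for illustration only.

```python
import numpy as np

def build_ring(edge_distances):
    """Encode one edge type (e.g. trunk-pole) as a ring of occupied sectors.

    Assumption: an edge of length s occupies roughly round(s) sectors, edges are laid
    out counterclockwise in ascending order from the horizontal right line, and one
    empty sector separates consecutive edges. The paper's exact formula may differ.
    """
    if not edge_distances:
        return np.zeros(0, dtype=int)
    s_sorted = sorted(edge_distances)                 # ascending Euclidean distance
    m, s_max = len(s_sorted), max(s_sorted)
    n_sectors = int(m * np.ceil(s_max)) + m           # extra room for separator sectors
    ring = np.zeros(n_sectors, dtype=int)
    pos = 0
    for edge_id, s in enumerate(s_sorted, start=1):
        n_i = max(1, int(round(s)))                   # sectors occupied by this edge
        ring[pos:pos + n_i] = edge_id
        pos += n_i + 1                                # one separator sector
    return ring

def build_ring_graph(edges):
    """One ring per edge type; together they form the ring graph C_graph."""
    return {etype: build_ring(dists) for etype, dists in edges.items()}
```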

Data Frame Similarity Matching: the First Verification of Loop Closure
All loop-closure candidate frames that satisfy distance matching and node matching are converted into point cloud semantic graph descriptors, and their ring graphs are generated at the same time.
The similarity score between the loop-closure candidate frame and the current frame is then calculated to judge whether the two data frames constitute a loop closure.
In ring graph matching, the ring graphs of the loop-closure candidate frame and the current frame are first aligned according to the circle center and the horizontal right line, and the rings corresponding to the different Euclidean-distance edge types are then matched one by one to obtain the ring similarity score. In this calculation, $C^{i}_{\text{graph}}$ represents the ring graph corresponding to the current frame, $C^{j}_{\text{graph}}$ represents the ring graph corresponding to a loop-closure candidate frame, and m is the number of rings in the ring graph (the number of edge types in the point cloud semantic graph descriptor). The threshold α determines whether there may be a loop closure between the current frame and the loop-closure candidate frame: if the similarity score is less than α, no loop-closure frame is matched, and the descriptor generated from the current frame and its corresponding GNSS-RTK position are stored in the point cloud semantic graph descriptor container. Otherwise, it is considered that there may be a loop closure between the current frame and the candidate frame. After this first verification of the loop closure is passed, the second verification, geometric verification, is required.
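The similarity equation itself is not reproduced above, so the following sketch uses a simple stand-in: the two ring graphs are aligned at the starting line, each ring is compared sector by sector, and the per-ring scores are averaged; a candidate passes the first verification only if the score reaches α. This scoring rule is an assumption, not the paper's exact formula.

```python
import numpy as np

def ring_similarity(ring_a, ring_b):
    """Fraction of sectors with matching occupancy after padding the shorter ring."""
    n = max(len(ring_a), len(ring_b))
    if n == 0:
        return 1.0
    a = np.zeros(n, dtype=int); a[:len(ring_a)] = ring_a
    b = np.zeros(n, dtype=int); b[:len(ring_b)] = ring_b
    return float(np.mean((a > 0) == (b > 0)))

def graph_similarity(cg_query, cg_cand):
    """Average ring similarity over the m rings of the two ring graphs."""
    scores = [ring_similarity(cg_query[t], cg_cand.get(t, np.zeros(0, dtype=int)))
              for t in cg_query]
    return float(np.mean(scores)) if scores else 0.0

# A candidate passes the first verification only if the score reaches alpha.
ALPHA = 0.85  # example value within the 0.8-0.9 range determined experimentally
```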

Geometric Verification: the Second Verification of Loop Closure
Geometric verification determines whether there truly is a loop closure between the current frame and a loop-closure candidate frame. After the first verification of the loop closure is satisfied, further geometric verification is needed; the focus of this step is to find a geometrically consistent correspondence for the current frame.
We use Equation (10) to evaluate the loop-closure detection loss between the loop-closure candidate frame and the current frame, where $p^{l}$ is the point cloud set of the current frame, $p^{h}$ is the point cloud set of the loop-closure candidate frame, n is the number of points in the current data frame, T is the transformation matrix, and $p^{l}_{i}$ and $p^{h}_{i}$ represent the ith 3D point of the current frame and the loop-closure candidate frame, respectively.
Whether there is a loop closure is determined according to the set threshold β. If the loss value of a loop-closure candidate frame is greater than β, the loop-closure frame is not matched, and the descriptor generated from the current frame and its corresponding GNSS-RTK position are stored in the point cloud semantic graph descriptor container. Otherwise, it is judged that a loop closure exists between the two places. When the loop closure is confirmed, the transformation matrix T is taken as the initial six-degree-of-freedom pose.
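A hedged sketch of the geometric verification step is shown below: the current-frame points are transformed by the candidate transformation T, the mean squared nearest-neighbor distance to the candidate-frame points is taken as the loss, and the loss is compared against β. This is an illustrative stand-in for the paper's Equation (10); the β default is arbitrary.

```python
import numpy as np
from scipy.spatial import cKDTree

def geometric_loss(points_current, points_candidate, T):
    """Mean squared nearest-neighbor distance after transforming the current frame
    into the candidate frame with T (4x4 homogeneous matrix)."""
    pts_h = np.hstack([points_current, np.ones((len(points_current), 1))])
    transformed = (T @ pts_h.T).T[:, :3]
    dists, _ = cKDTree(points_candidate).query(transformed)
    return float(np.mean(dists ** 2))

def verify_loop(points_current, points_candidate, T, beta=0.5):
    """Second loop-closure verification: accept only if the loss is below beta
    (the beta value here is illustrative, not the paper's threshold)."""
    return geometric_loss(points_current, points_candidate, T) < beta
```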

Construction of Loop Closure Constraint Factor
When the loop-closure frame is determined, the relative transformation between the two frames is calculated and, following the construction method of the LiDAR relative pose factor described below, the loop-closure constraint factor $\phi_{\text{loop}}(x)$ is constructed and added to the global factor graph.

Determination of Parameters in Loop-Closure Detection
In the aforementioned loop-closure matching process, the value of α in Equation (8) is obtained by experiment. The method for determining α is introduced below.
First, the range of α is preliminarily determined. Experimental tests show that when α is less than 0.8, the algorithm obtains too many loop-closure candidate frames, which increases the computation of the loop-closure detection module and affects the real-time performance of the SLAM system. When α is greater than 0.9, there are no loop-closure candidate frames, which leads to loop-closure matching failure. Therefore, the range of α is initially set to 0.8-0.9 in this paper.
To determine the actual value of α, the initially determined range is divided evenly at an interval of 0.1. For each value, ten current frames are taken from each test route, and their similarity scores are calculated against the historical keyframes in the point cloud descriptor container. The test of each route is repeated ten times, and the α value with the highest average loop-closure matching accuracy is finally selected. Figure 4 shows the indoor and outdoor test results used to determine α.

System Framework
This paper constructs a SLAM system based on the point cloud semantic graph descriptor loop-closure constraint. The overall system structure is shown in Figure 5. The system is divided into four parts: input, front end, back end, and output.
Figure 5 shows the framework of the SLAM system in this paper. The input data come from the 3D LiDAR, IMU, and GNSS-RTK sensors. The front end adopts a LiDAR inertial odometry model that fuses the LiDAR and IMU observations in a tightly coupled way. The IMU observations are used to correct the motion distortion of the point cloud. After the distortion is removed, the amount of data is reduced by region filtering and downsampling, and edge and surface features are then extracted according to the curvature of features in the environment for feature matching. The feature extraction and matching process follows the LOAM [12] or LEGO-LOAM [18] algorithm, and a local subgraph is constructed. In the loop-closure detection part, the point cloud semantic segmentation algorithm RangeNet++ is used to segment the objects that appear frequently in the scene, the loop-closure detection module based on the point cloud semantic graph descriptor is built, and the loop-closure constraint factor is established and added to the global factor graph. The construction of the global factor graph is similar to LIO-SAM.[19] Finally, the LiDAR relative pose factor, IMU pre-integration factor, loop-closure constraint factor, and marginalization factor are jointly optimized in the sliding window to estimate the vehicle pose and the IMU bias, and the keyframes of the subgraph are used to construct the global map.
Figure 6 shows the LiDAR inertial odometry factor-graph model of the front end of the SLAM system. With the continuous operation of the SLAM system, the number of state variables in the front-end odometry keeps increasing. If all state variables were optimized, the amount of calculation would grow greatly, reducing the real-time performance of the SLAM system. Therefore, to limit the amount of calculation during optimization and ensure the real-time performance of the system, sliding-window optimization is used: only the state variables inside the sliding window participate in the optimization.
In the global-factor-graph-optimization model of Figure 7, after the sliding window is introduced, the states of the data frames outside the window are no longer optimized. However, these earlier data still have constraint relationships with the data inside the sliding window; if they were simply discarded, the constraint information would be lost.
The state variables to be estimated in the whole SLAM system are the position, velocity, attitude, accelerometer bias, and gyroscope bias of the IMU in the optimization window relative to the global frame, together with the extrinsic parameters between the LiDAR and the IMU. The state and extrinsic parameters of the optimization window at time i are expressed as

$$x^{W}_{B_i} = \left[\, p^{W\,T}_{B_i},\; v^{W\,T}_{B_i},\; q^{W\,T}_{B_i},\; b^{T}_{a_i},\; b^{T}_{g_i} \,\right]^{T}, \qquad T^{L}_{B} = \left[\, p^{L\,T}_{B},\; q^{L\,T}_{B} \,\right]^{T}$$

where $x^{W}_{B_i}$ represents all the states of the ith IMU frame, $p^{W}_{B_i}$ is the position of the IMU in the world frame, $v^{W}_{B_i}$ is the velocity of the IMU in the world frame, $q^{W}_{B_i}$ is the attitude of the IMU in the world frame, $b_{a_i}$ is the accelerometer bias of the ith frame, and $b_{g_i}$ is the gyroscope bias of the ith frame. $T^{L}_{B}$ is the transformation from the IMU frame to the LiDAR frame, $p^{L}_{B}$ is the displacement from the IMU to the LiDAR, and $q^{L}_{B}$ is the attitude change from the IMU to the LiDAR.
According to the construction method of the factor-graph-optimization objective function, the cost function of the tight-coupling model of the SLAM system proposed in this paper can be written as

$$\min_{x}\ \Big\{ \phi_{\text{prior}}(x) + \phi_{\text{Lidar}}(x) + \phi_{\text{IMU}}(x) + \phi_{\text{loop}}(x) \Big\}$$

where $\phi_{\text{prior}}(x)$, $\phi_{\text{Lidar}}(x)$, and $\phi_{\text{IMU}}(x)$ are the squared norms of the residuals of the corresponding factors. $\phi_{\text{prior}}(x)$ is the a priori factor cost term, i.e., the constraint contributed by the states marginalized out before the sliding window; $\phi_{\text{Lidar}}(x)$ is the cost term of the LiDAR pose factor, i.e., the constraint between the poses of adjacent LiDAR keyframes in the sliding window; $\phi_{\text{IMU}}(x)$ is the IMU pre-integration factor, i.e., the constraint between the IMU motion states and biases of adjacent LiDAR frames in the sliding window; and $\phi_{\text{loop}}(x)$ is the loop-closure constraint factor.
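The paper does not name an optimization back end; LIO-SAM-style systems commonly use GTSAM, so the following minimal sketch assumes the gtsam Python bindings. It builds a small pose graph with a prior factor, LiDAR relative pose (between) factors, and a loop-closure between factor, then optimizes. The IMU pre-integration and marginalization factors are omitted for brevity, and all measurements are placeholders.

```python
import numpy as np
import gtsam

graph = gtsam.NonlinearFactorGraph()
initial = gtsam.Values()

prior_noise = gtsam.noiseModel.Diagonal.Sigmas(np.full(6, 1e-3))
odom_noise = gtsam.noiseModel.Diagonal.Sigmas(np.full(6, 1e-1))
loop_noise = gtsam.noiseModel.Diagonal.Sigmas(np.full(6, 1e-1))

# Prior on the first keyframe pose (integer keys 0, 1, 2, ... index keyframes).
graph.add(gtsam.PriorFactorPose3(0, gtsam.Pose3(), prior_noise))
initial.insert(0, gtsam.Pose3())

# LiDAR relative pose factors between consecutive keyframes (placeholder measurements).
for i in range(1, 5):
    delta = gtsam.Pose3(gtsam.Rot3(), gtsam.Point3(1.0, 0.0, 0.0))
    graph.add(gtsam.BetweenFactorPose3(i - 1, i, delta, odom_noise))
    initial.insert(i, gtsam.Pose3(gtsam.Rot3(), gtsam.Point3(float(i), 0.0, 0.0)))

# Loop-closure constraint factor between keyframe 4 and keyframe 0, using the
# relative transform returned by the geometric verification step (placeholder here).
loop_T = gtsam.Pose3(gtsam.Rot3(), gtsam.Point3(-4.0, 0.0, 0.0))
graph.add(gtsam.BetweenFactorPose3(4, 0, loop_T, loop_noise))

result = gtsam.LevenbergMarquardtOptimizer(graph, initial).optimize()
print(result.atPose3(4))
```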

LiDAR Relative Pose Factor Construction
The LiDAR relative pose factor is obtained from the constraint between the pose of a new keyframe and the poses of the keyframes in the subgraph. The relative pose factors involved in the optimization constrain the poses of adjacent LiDAR keyframes in the sliding window. In the optimization objective between adjacent poses, x is the optimization variable, $x_i$ is the motion state of keyframe i, $x_j$ is the motion state of keyframe j, and $P \in \mathbb{R}^{6\times 6}$ is the covariance matrix of the relative pose transformation.
In the process of sensor fusion, each state variable needs to be converted to the IMU frame, so the relative pose observation is expressed in the IMU frame, where $T^{L}_{B}$ is the extrinsic transformation between the vehicle body and the LiDAR frame, and $T^{W}_{B_k} \in SE(3)$, $k = i, j$, is the pose to be estimated in the LiDAR inertial odometry frame corresponding to keyframe k.
The residual between the observation and the expectation of the pose transformation between LiDAR keyframes j and i can then be formed; a standard formulation is sketched below.
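The paper's residual equation is not reproduced above; for reference, a standard SE(3) relative-pose residual used in sliding-window LiDAR-inertial systems has the form below, where $\hat{T}_{ij}$ is the observed relative pose between keyframes i and j, $T_i$ and $T_j$ are the estimated keyframe poses expressed in the IMU frame via the extrinsics $T^{L}_{B}$, and $\|\cdot\|^{2}_{P}$ is the Mahalanobis norm with covariance P. This is a generic formulation, not necessarily identical to the paper's equation.

```latex
% Generic relative-pose residual and LiDAR cost term (illustrative):
r_{ij}(x) = \mathrm{Log}\!\left( \hat{T}_{ij}^{-1}\, T_i^{-1} T_j \right), \qquad
\phi_{\mathrm{Lidar}}(x) = \sum_{(i,j)} \left\| r_{ij}(x) \right\|^{2}_{P}
```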

IMU Pre-Integration Factor Construction
The greatest contribution of pre-integration is that it can quickly process a large number of IMU observations: pre-integrating the IMU measurements between LiDAR keyframes yields an independent relative motion constraint that participates in the state estimation process.
The IMU used in the SLAM system in this paper acquires data at 200 Hz, while the LiDAR acquires data at 10 Hz, so the IMU frame rate is much higher than the LiDAR frame rate. To combine the two, the data collected by the IMU and the LiDAR must be synchronized. This paper pre-integrates the IMU data between two consecutive LiDAR keyframes.
The IMU pre-integration residual represents the constraint between the motion states of LiDAR keyframes k and k+1 and the IMU bias estimates. The residual between the adjacent keyframes k and k+1 in the sliding window follows the standard pre-integration formulation.[44]
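A much-simplified sketch of pre-integrating the 200 Hz IMU samples that fall between two 10 Hz LiDAR keyframes is given below. It accumulates the relative rotation, velocity, and position increments by Euler integration and deliberately ignores gravity handling, bias Jacobians, and noise propagation, which the full pre-integration formulation [44] provides.

```python
import numpy as np
from scipy.spatial.transform import Rotation

def preintegrate(gyro, accel, dt, bias_g=np.zeros(3), bias_a=np.zeros(3)):
    """Accumulate relative rotation/velocity/position between two LiDAR keyframes.

    gyro, accel: (N, 3) arrays of IMU samples between the keyframes; dt: IMU sample
    period in seconds. Simplified Euler integration for illustration only.
    """
    dR = np.eye(3)      # relative rotation
    dv = np.zeros(3)    # relative velocity increment
    dp = np.zeros(3)    # relative position increment
    for w, a in zip(gyro, accel):
        a_corr = a - bias_a
        dp += dv * dt + 0.5 * dR @ a_corr * dt ** 2
        dv += dR @ a_corr * dt
        dR = dR @ Rotation.from_rotvec((w - bias_g) * dt).as_matrix()
    return dR, dv, dp

# Example: 20 samples of a 200 Hz IMU span one 10 Hz LiDAR interval (0.1 s).
dR, dv, dp = preintegrate(np.zeros((20, 3)), np.zeros((20, 3)), dt=1.0 / 200.0)
```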

Marginalized A Priori Factor Construction
Tight coupling needs to deal with both IMU errors and LiDAR errors. We use a sliding-window-based optimization, so the parameters of data frames outside the sliding window are no longer optimized. However, those frames still constrain the data inside the sliding window; if they were simply dropped, the constraint information would be lost. Therefore, we add them to the global nonlinear optimization problem as prior information, i.e., as part of the error; this is the concept of marginalization. The introduction of the marginalization factor is conducive to the long-term real-time operation of the system. The core of sliding-window optimization is that when a new factor is added to the factor graph, the old factor is removed from the optimization window by the marginalization technique.[45] The resulting marginalized prior factor contributes the cost term $\phi_{\text{prior}}(x)$ in the global cost function.

Construction of Loop-Closure Constraint Factor
The loop-closure constraint factor is constructed as described in Section 3.

Experiments and Discussion
We conduct experiments in indoor and outdoor scenarios to evaluate the performance of the SLAM system (Video S1, Supporting Information). In the indoor scenes, as shown in Figure 8c, the SLAM system is mounted on the Tiago robot, and the trajectory reported by the Tiago robot is used as the ground-truth trajectory. In the outdoor scenes, as shown in Figure 8a,b, the system is carried on a vehicle, and a BeiDou (BD)/GPS/GLONASS three-mode differential positioning system with 1-2 cm positioning accuracy provides the ground-truth trajectory. All experiments were carried out on an ADVANTECH MIC-7700 industrial computer running Ubuntu 16.04; the LiDAR is a Velodyne VLP-16, and the IMU is an Xsens MTi-30. This paper uses the absolute trajectory error (ATE) to evaluate the positioning accuracy of the SLAM system. The ATE is the error between corresponding points of the estimated and true poses and is usually represented by the root-mean-square error (RMSE) of each trajectory.
Assuming that the number of trajectory data frames collected by the LiDAR is n, the RMSE of the absolute trajectory error over the n frames is

$$\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left\| P_i - T\,\hat{p}_i \right\|^{2}}$$

where T represents the pose transformation between the two trajectories, $P_i$ is the true trajectory value, and $\hat{p}_i$ is the trajectory estimated by the algorithm.
The average RMSE over repeated tests is

$$\overline{\mathrm{RMSE}} = \frac{1}{m}\sum_{j=1}^{m}\mathrm{RMSE}_{j}$$

where m is the number of tests. Finally, the results are output as charts, which is convenient for evaluating the performance of the SLAM system.
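Both evaluation quantities can be computed directly; the sketch below assumes the estimated and ground-truth trajectories have already been associated frame by frame and that T aligns the estimate to the ground truth.

```python
import numpy as np

def ate_rmse(gt, est, T=np.eye(4)):
    """Root-mean-square ATE between ground-truth positions gt and estimated
    positions est (both (n, 3)), after aligning the estimate with T."""
    est_h = np.hstack([est, np.ones((len(est), 1))])
    est_aligned = (T @ est_h.T).T[:, :3]
    return float(np.sqrt(np.mean(np.sum((gt - est_aligned) ** 2, axis=1))))

def mean_rmse(rmse_values):
    """Average RMSE over m repeated tests."""
    return float(np.mean(rmse_values))
```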

Indoor Experiments
The indoor experiments were carried out in two experimental scenes, with a total of three experiments.

Experiment Scene 1
Experimental Scene 1 is a corridor of the teaching building on the Baoshan Campus of Shanghai University. The experimental route is shown in Figure 9a, and the internal structure in Figure 9b. This area is located in the middle of the teaching building, and the surrounding trees and grass are dense, so the GNSS signal is heavily occluded and the GNSS-RTK signal is weak. The corridor has distinctive characteristics, with rich line, plane, and cylindrical features whose curvatures vary greatly, which benefits the front-end feature extraction and feature matching of the SLAM system in this paper.
This experiment mainly verifies the positioning accuracy and mapping quality of the proposed SLAM system when there is no loop closure and the GNSS-RTK signal is weak.

Experiment Scene 2
Experimental Scene 2 is an underground garage on the Baoshan Campus of Shanghai University. Figure 9c shows the interior of the underground garage. The experimental area is rectangular, with a total area of about 400 m². The interior has rich features such as edges, surfaces, and columns; there is no GNSS signal, and the environment is dark. This experiment mainly verifies the positioning accuracy and mapping quality of the proposed SLAM system when loop closures exist but no GNSS-RTK information is available.
The relatively stable cars, square columns, and landmarks in the environment are selected as the target objects of the point cloud semantic segmentation module in loop-closure detection.
We conduct comparative experiments with the LOAM and LEGO-LOAM algorithms because their research focus is similar to ours and, with the same sensors, they are among the best-performing LiDAR SLAM algorithms currently available; comparing against them therefore provides a meaningful evaluation of the proposed algorithm.
Experiment 1 was carried out in the corridor of the teaching building. Experiment 2 was carried out in the underground garage without an intermediate loop closure, and Experiment 3 in the underground garage with intermediate loop closures. The total trajectory lengths of the three experiments are 235, 223, and 345 m, respectively. The number of loop closures on the route of Experiment 2 is smaller than that of Experiment 3.
Each of the three experiments was conducted ten times. To ensure the reliability and representativeness of the data, the experiments were conducted in different time periods: runs 1-5 were carried out in the morning and runs 6-10 in the afternoon. The same applies to the outdoor scene experiments.
Real-time performance is a key indicator of algorithm performance. Table 1 shows the average time consumption of the feature extraction, loop-closure detection, and optimization modules in Experiment 3. The average time consumption of the feature extraction module of our algorithm is 2.963 ms, far less than 100 ms, which meets the real-time requirement of the 10 Hz LiDAR. The loop-closure detection module is the most time consuming, with an average of 70.257 ms; since loop-closure detection in our algorithm runs once every 1.5 s, it fully meets the real-time requirement and takes about one-third of the time of LEGO-LOAM. The optimization module takes the least time, with an average of 1.651 ms. The average time consumption of each module is lower than that of the LOAM and LEGO-LOAM algorithms, and the real-time performance is better than that of the other two algorithms. The positioning errors of the three experiments are shown in Figure 10. In the three experiments, the average positioning error of the proposed algorithm is 0.180 m, that of LOAM is 0.523 m, and that of LEGO-LOAM is 0.283 m; compared with LOAM and LEGO-LOAM, the positioning accuracy of the proposed algorithm is improved by 65.6% and 36.4%, respectively. Figure 11 shows the point cloud maps of our algorithm in the three experiments; in the corridor experiment of Figure 11a, the point cloud is registered to the satellite map. Figure 12 compares the indoor experimental trajectories of the three algorithms, and Figure 13 compares their indoor mapping results.
In Experiment 1, the experimental scene is the corridor of the teaching building, which lacks loop closures, and the GNSS-RTK signal is weak. Therefore, the loop-closure modules of the LEGO-LOAM algorithm and of the proposed SLAM algorithm do not participate in the overall system optimization. Comparing the degree of agreement between the point cloud maps and the satellite map shows that our algorithm agrees best, and its positioning error is also smaller than that of the other two algorithms.
In Experiments 2 and 3, the experimental scene is the underground garage, which has loop closures but no GNSS-RTK signal. Since the LOAM algorithm has no loop-closure detection module, its positioning error is affected by the cumulative error of the system. The loop-closure detection module of LEGO-LOAM uses a simple trajectory-based pose method, and its positioning performance is worse than that of the loop-closure detection method proposed in this paper. The mapping quality of our algorithm is also better than that of LOAM and LEGO-LOAM.
The three indoor experiments prove that the proposed SLAM algorithm performs well and can run in real time when the GNSS-RTK signal is weak or absent, while achieving better positioning accuracy and mapping quality than the LOAM and LEGO-LOAM algorithms.

Outdoor Experiments
The outdoor experiments were conducted on the East Campus of the Baoshan Campus of Shanghai University, as shown in Figure 9d. There are three test routes in the outdoor scene: a loop-closure route, a short-distance straight route, and a continuous turning route, with total lengths of 461, 450, and 396 m, respectively. The outdoor scene contains high-rise buildings, trees, grass, and ditches, and there are many pedestrians and vehicles on campus; it is a typical dynamic and complex outdoor scene. At the same time, the high-rise buildings cause GNSS-RTK signal instability, so the scene is very challenging. The relatively stable trunks, roadside cars, and flowerbeds in the environment are selected as the target objects of the loop-closure detection module.
The average trajectory errors of the three test routes are shown in Figure 14. On these three routes, the average positioning error of the proposed algorithm is 0.160 m, that of the LOAM algorithm is 0.428 m, and that of the LEGO-LOAM algorithm is 0.255 m. In the three outdoor experiments, the proposed SLAM algorithm improves the positioning accuracy over LOAM and LEGO-LOAM by 62.4% and 36.9%, respectively.
As shown in Table 2, the average time consumption of each module of our algorithm is lower than that of the LOAM and LEGO-LOAM algorithms, and its real-time performance is better than that of the other two algorithms.
Figure 15 shows the registration of the point cloud maps of our algorithm with the satellite map for the three outdoor routes. Figure 16 compares the positioning trajectories of the three algorithms, and Figure 17 compares their mapping results in the outdoor scenes.
On test route 1, the vehicle running the LOAM algorithm does not return to the starting point after driving around the library, mainly because LOAM has no loop-closure detection module. The LEGO-LOAM algorithm is more accurate than LOAM and returns to the vicinity of the starting point. The SLAM system proposed in this paper uses the tight-coupling model; integrates the LiDAR odometry factor, IMU pre-integration factor, and the loop-closure constraint factor based on the semantic graph descriptor; and returns exactly to the starting point after one circuit. The trajectory of our algorithm is closer to the real trajectory of the vehicle, and its positioning accuracy is higher than that of the other two algorithms. Its point cloud map is also fully consistent with the satellite map, and its mapping quality is better than that of the other two algorithms.
Test route 2 is a straight-line driving route. There are trees, buildings, and rivers in the scene, and pedestrians and vehicles are often on the road, so the environment places high demands on the SLAM algorithm. The feature extraction and matching at the front end of the algorithm handle these complex environmental features well. The experimental results show that the positioning accuracy of the proposed algorithm is higher than that of the other two algorithms, its point cloud map is fully consistent with the satellite map, and its mapping quality is better than that of the other two algorithms.
Test route 3 is a continuously curving route with three corners. Sudden changes of environmental features at the corners, such as pedestrians and moving cars, have a large impact on the feature matching of the overall SLAM system. The motion poses estimated by the LOAM and LEGO-LOAM systems deviate considerably from those of the proposed algorithm, and both exhibit drift. The main reason is that the SLAM system in this paper adopts a tightly coupled method and jointly optimizes the LiDAR odometry factor, IMU pre-integration factor, and loop-closure constraint factor, which largely eliminates the cumulative error and avoids drift, yielding more accurate positioning. Its positioning results and mapping quality are better than those of the other two algorithms.
The three outdoor experiments show that the proposed SLAM algorithm performs well, runs in real time in complex dynamic outdoor scenes, and achieves better positioning accuracy and mapping quality than the LOAM and LEGO-LOAM algorithms.

Conclusions
In this article, we construct a tightly coupled LiDAR inertial SLAM system and propose a loop-closure detection algorithm based on a point cloud semantic graph descriptor. The algorithm extracts stable objects in the environment through semantic segmentation to construct graph descriptors. At the same time, GNSS-RTK information is attached to the point cloud descriptors for fast loop-closure detection, and the resulting loop-closure constraint factor is added to the global-factor-graph optimization. We conducted experiments in two indoor scenes and three outdoor scenes and compared the system with the LOAM and LEGO-LOAM algorithms. The experimental results show that the proposed algorithm performs well in indoor and outdoor scenes with and without loop closures, can cope with a variety of complex dynamic environments, and achieves real-time performance, positioning accuracy, and mapping quality better than those of the LOAM and LEGO-LOAM algorithms.
In the future, adaptive adjustment of the parameters in the loop-closure detection algorithm will be studied to further improve loop-closure matching and the generalization performance of loop-closure detection. The mapping quality and positioning accuracy of the SLAM system should also be further improved.

Figure 1 .
Figure 1. Illustration of world frames and sensor frames. The origin of the earth-centered earth-fixed (ECEF) frame {E} is at the center of the earth. The X-axis points to the intersection of the prime meridian and the equator, and the Z-axis points to the North Pole. The global frame {W} is located on the ground, and its Z-axis points upward.

Figure 2 .
Figure 2. Loop-closure detection algorithm flow chart. It is divided into three steps: i) data preprocessing; ii) construction of the point cloud semantic graph descriptor for keyframes; iii) use of the point cloud descriptors for loop-closure detection, with historical keyframes stored in containers and the point cloud descriptor searched for in the container.

Figure 3 .
Figure 3. Point cloud semantic graph descriptor and its corresponding ring graph. a) The point cloud semantic graph descriptor obtained after semantic segmentation of the current point cloud data frame. b) The ring graph corresponding to the point cloud semantic graph descriptor.

Figure 4 .
Figure 4. Determination of the value of α in the indoor and outdoor tests. a) Determination of the α value in the indoor test. b) Determination of the α value in the outdoor test. The range of α is initially determined to be 0.8-0.9 and divided evenly at an interval of 0.1. For each α value, ten current frames are taken from each test route, and their similarity scores are calculated against the historical keyframes in the point cloud descriptor container. The test of each route is repeated ten times, and the α value with the highest average loop-closure matching accuracy is finally selected.

Figure 5 .
Figure 5. Simultaneous localization and mapping (SLAM) system framework. It is divided into four parts: input, front end, back end, and output. The input data come from the 3D light detection and ranging (LiDAR) sensor, the inertial measurement unit (IMU), and the global navigation satellite system real-time kinematic (GNSS-RTK) receiver. The front end is a LiDAR inertial odometry model. The back end optimizes the factors in the factor graph within a sliding window. Finally, the global map and vehicle location are output.

Figure 6 .
Figure 6. Factor graph of the LiDAR inertial odometry. The LiDAR relative pose factor and the IMU factor are integrated.

Figure 7 .
Figure 7. Global-factor-graph-optimization model. The LiDAR relative pose factor, IMU pre-integration factor, loop-closure constraint factor, and marginalization factor are integrated and jointly optimized in the sliding window.

Figure 9 .
Figure 9. Indoor and outdoor experiment route maps and indoor scene internal structure. a) Route of indoor Experiment 1 in the teaching building corridor; indoor Experiments 2 and 3 were carried out in the underground garage. b) Internal structure of the teaching building corridor. c) Structure of the underground garage. d) Three test routes of the outdoor scene.

Figure 8 .
Figure 8. Experimental hardware platform. a) The SLAM system mounted on an electric vehicle. b) The SLAM system includes an ADVANTECH MIC-7700 IPC, a Velodyne VLP-16, an Xsens MTi-30, and GNSS-RTK modules. c) The SLAM system mounted on the Tiago robot.

Figure 11 .
Figure 11. Point cloud maps of the algorithm in the three indoor experiments. a) Experiment 1: teaching building corridor point cloud. b) Experiment 2: underground garage point cloud without an intermediate loop closure. c) Experiment 3: underground garage point cloud with intermediate loop closures.

Figure 10 .
Figure 10. Comparison of the average positioning errors of the indoor tests: the average positioning errors of the three algorithms in the three test scenarios.

Figure 12 .
Figure 12. Comparison of the indoor experimental trajectories of the three algorithms. a) Experiment 1: teaching building corridor trajectory comparison: i) global trajectory; ii) enlarged local trajectory. b) Experiment 2: trajectory comparison in the underground garage without an intermediate loop closure: i) global trajectory; ii) enlarged local trajectory. c) Experiment 3: trajectory comparison in the underground garage with intermediate loop closures: i) global trajectory; ii) enlarged local trajectory.

Figure 13 .
Figure 13. Comparison of the indoor mapping results of the three algorithms. a) Experiment 1: teaching building corridor mapping comparison. b) Experiment 2: underground garage mapping comparison without an intermediate loop closure. c) Experiment 3: underground garage mapping comparison with intermediate loop closures.

Figure 14 .
Figure 14. Comparison of the average positioning errors of the outdoor test scenes: the average positioning errors of the three algorithms on the three test routes.

Figure 15 .
Figure 15. Point cloud and satellite image registration maps of the algorithm in the outdoor experiments: a) loop-closure route, b) straight route, and c) continuous turning route.

Figure 16 .
Figure 16. Comparison of the outdoor experimental trajectories of the three algorithms. a) Test route 1: loop-closure route trajectory comparison: i) global trajectory and ii) enlarged local trajectory. b) Test route 2: straight-line driving route trajectory comparison: i) global trajectory and ii) enlarged local trajectory. c) Test route 3: continuous turning route trajectory comparison: i) global trajectory and ii) enlarged local trajectory.

Figure 17 .
Figure 17. Comparison of the outdoor mapping results of the three algorithms. a) Test route 1: loop-closure route mapping comparison. b) Test route 2: straight driving route mapping comparison. c) Test route 3: continuous turning route mapping comparison.

Table 1 .
Average per-module time consumption of the algorithms in indoor Experiment 3.

Table 2 .
Average per-module time consumption of the algorithms in outdoor Experiment 1.