An innovative method for human height estimation combining video images and 3D laser scanning

Digitalization has increased the number of video surveillance systems, which sometimes capture images of crimes. Traditional methods of human height estimation rely on projective geometry. However, they cannot always be used, because the video surveillance camera may no longer be available or may have been moved, and there may be no reference lines in the frame. Scientific studies have developed a method for human height estimation using 3D laser scanning, but that model requires a series of approximations, which increase the final measurement error. To overcome this problem, in the present study the images of a subject are projected directly onto the 3D model, and the height of the subject is estimated on that model. This article describes the methodological approach through the analysis of a real case study in a controlled environment carried out by the Carabinieri Forensic Investigation Department (Italy). The aim is to obtain a human anthropometric measure from frames extracted from video, combined with the digital survey of the framed area obtained with 3D laser scanning and point cloud analysis. The result is the height estimation of five subjects filmed by a camera, obtained by combining 2D images extracted from a DVR/surveillance system with 3D laser scanning. Results show that most estimated measurements are lower than the real measurement of the subject, depending in part on the posture of the subject while walking. Furthermore, the differences between the real and estimated heights are analyzed with a statistical approach.


| INTRODUCTION
Body measurement, called anthropometry, is analyzed in different fields: bio-engineering, medicine [1], the textile industry, and even forensics [2]. General anthropometry includes the complete process of data collection, documentation, synthesis, and analysis [3]. Body measurement is a powerful tool for forensic comparison between an unknown person and a suspect [4]. Body sizes can be measured with different methods, for example, using traditional tools such as calipers and tapes [5] or automatically using 3D scanners [6]. The 3D scanner offers more accurate and comparable results [7], and it is faster than manual measurement [8]. During forensic analysis, however, the person is absent from the analyzed scene, and the only evidence available may be video of a crime scene. Process digitalization has increased the number of digital tools in companies and cities. Today we speak of "Smart Cities," with sets of interconnected digital tools, including video surveillance systems [9]. Shops, cars, companies, and houses have digital video surveillance systems installed, so these systems very often capture frames of a crime scene. If the video shows an unknown person, investigators can estimate the subject's height from it.
The aim of this paper is to estimate the height of a subject captured on video without measuring the subject physically. This is a powerful forensic investigation tool that provides information about a possible suspect at the crime scene. The scientific literature presents two main methods for measuring the height of an anonymous subject captured by video surveillance systems: projective geometry and 3D scanning. The expert has to choose the best model based on the conditions in which he is working. The two methods most commonly used in the forensic field are described below.
Projective geometry [10]: when the type of framing and the presence of comparison elements make it possible to establish the planes and vanishing lines of the projection, some software tools can reconstruct the perspective genesis of the image and establish the real measurements of some elements present in the scene (e.g., Amped FIVE). This method does not necessarily require revisiting the crime scene; what is needed is the ability to draw the main perspective lines and to identify an object of known height to establish the correct scale factor. The error is determined empirically and depends mainly on the quality of the image and the geometries present in the reference scene.
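As a toy illustration of the scale-factor idea only (not the full vanishing-line construction used by forensic software), the following sketch assumes the subject and a reference object of known height stand at the same distance from the camera in a fronto-parallel plane; all names and numbers are hypothetical:

```python
def estimate_height_cm(subject_px, reference_px, reference_cm):
    """Height from a single frame under a strong simplification of the
    projective-geometry method: the subject and a reference object of known
    height are assumed to stand at the same distance from the camera in a
    fronto-parallel plane, so one scale factor (cm per pixel) applies to both."""
    return subject_px * reference_cm / reference_px

# Hypothetical numbers: a 180 cm door frame spans 450 px, the subject 440 px.
print(estimate_height_cm(440, 450, 180.0))  # 176.0
```

In real footage the two objects rarely share a depth plane, which is exactly why the vanishing-line machinery (or the 3D method presented in this paper) is needed.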
Reverse photogrammetry [11,12]: the image is re-projected onto the scan, converted into a 3D model, through a reverse-photogrammetry algorithm that, using homologous points, estimates the origin of the photo and the lens correction. To do this, at least eight points that can be identified both in the image and in the 3D model of the scene are needed.
Photogrammetry detects the metric qualities of an object through a series of graphic operations carried out on one or more frames of the same object. Detection is based on the geometric principles of projective geometry and perspective images. Finally, considering the head point (PT) and the foot point (PP) of the subject in the overlaid frames, it is possible to estimate the height of the subject. This work presents a forensic application of the second model, which uses 3D laser scanning to estimate human height starting from a 2D video that captured an unknown person. This model is useful when the camera is not available or has been accidentally moved and, in addition, there are no reference lines in the frame (to manage the perspective vanishing lines) [13].
The 3D laser scanner is a powerful tool in forensic science used for different activities. The laser scanner can document the conditions of a scene (scene preservation) that may change over time: fire scenes, construction sites, and accident scenes that must be cleared immediately after the investigation for public safety reasons, such as road crashes, industrial accidents, etc.
With a 3D representation of the scene available, witnesses' views can be replicated and reorganized in real time [14,15]. A 3D reconstruction of an accident scene is created, including measurements, vehicle positions, damage, distances, and road surface information [16]. This tool also supports accurate structural analysis: documentation of structural damage (exterior, interior, foundation, building envelope) and construction defects in all areas of a building, including hard-to-reach areas. It is very helpful for presentations in the courtroom.
The creation of 3D paths allows juries/audiences to immerse themselves in an environment; witnesses' views can be replicated, and incidents can be rearranged in real time. It is also used by civil protection and fire brigades for the analysis of post-earthquake or fire events.
Based on the work of Johnson and Liscio [17], an experiment was carried out to evaluate the accuracy and reproducibility of the height measurement of five known subjects using laser scanning of the scene and applying reverse photogrammetry. The five subjects were filmed with cameras installed in a room while walking around, in order to verify the effect of movement on the measurement. Classical projective geometry methods can be used only when the camera is still available and has not been moved, or when there are reference lines in the frame to manage the perspective vanishing lines. Forensic investigations often develop months after the crime has been committed, so the cameras may have been moved and there may be no reference lines in the frame. The technique presented here, based on the acquisition of the scene with a 3D laser scanner, makes it possible to measure the height of a subject, with its relative error, even when the video surveillance system is no longer present and/or has been modified and there is no reference line in the frame. This model is innovative compared with those proposed in the literature because the 2D frame is overlaid on a 3D model developed from point cloud meshing, and the height of the subject can be measured directly.
Furthermore, the literature proposes older measurement methodologies in which the subject is represented by a large cylinder [18]; this leads to measurement problems, because the subject is approximated by a cylindrical figure. In the present case, a new measurement method was developed: the measure is performed directly on the subject, whose photo is projected onto the 3D environment. This reduces the probability of measurement error. The rest of the paper is organized as follows: Section 2 describes materials and methods; Section 3 presents the main results of the study; Section 4 discusses the results; finally, Section 5 summarizes the main conclusions of the research.

| Design of the methodological approach
The methodological approach was developed in different steps to obtain a hierarchical and integrated system. In the first stages, the images of the crime scene are evaluated by analyzing their quality, after which the virtual environment is built in three dimensions. Finally, the camera images are combined with the virtual environment and the height estimate is developed. This methodological approach makes it possible to integrate the technical needs of the analysis with its managerial and qualitative assumptions. The methodological approach of the case study is divided into seven steps, as shown in Figure 1.

| Experimental procedure
A detailed description of the methodological steps characterizing the experimental procedure is reported below.

| Step #1: Video acquisition
Five subjects walked in a room under a video camera surveillance system, wearing work shoes. Participants walked along a single path inside the room and were filmed by an AXIS P3245-LV fixed network camera (HDTV 1080p video quality, Lightfinder 2.0, optimized IR, two-way audio, I/O connectivity, signed firmware, and secure boot). The first step of the analysis is to download the video and load it into video management software (e.g., Amped FIVE). It is important to choose the correct tools to acquire the frame, because camera features can introduce many defects, as described in Table 1.

| Step #2: Images extrapolation
The second step involves extracting the images using video management software. Data extraction can be a problem, as DVR/surveillance systems typically save the acquired data in a proprietary format that can only be decoded with the player provided by the manufacturer. These players are typically very low-quality software, which often introduces additional quality issues and hampers accurate data access. Surveillance system footage is often encoded in proprietary formats, which means that the video file is encapsulated in a digital black box.
The video content is visible in the players provided directly by the surveillance system manufacturer. To convert the DVR/surveillance video into a "tractable" format, that is, one that can be analyzed and processed, it is necessary to learn as much as possible about the software supplied by the system manufacturer, not excluding the use of reverse engineering techniques to understand the system structure. In this case study, images were extracted using Amped FIVE, a forensic software. According to Criminisi et al. [10], useful images are those whose resolution allows one to identify the head and feet of the subject. In addition, the subject must be in a relatively upright and erect position, with the feet parallel to each other. Many still images of each subject were extracted as they walked the prescribed route during testing; Figure 2 shows some examples. Figure 4 shows the point cloud of the analyzed room obtained with 3D scanning.
The modeling software used to interpret the registered 3D point cloud data was 3DReshaper [22]; the model was developed by Michele Curuni, PhD. From the generated mesh, reference points were identified on the frames of interest (at least eight) and in the point cloud (Figure 6).
This allows for the proper and accurate orientation and positioning of the 2D video data relative to the 3D point cloud data.
Once the reference points in the 2D video are matched to the 3D point cloud from the same perspective as the 2D video data, the 2D video frame can be projected onto the 3D mesh (Figure 7). This powerful method overlays the 2D video frame onto the 3D point cloud, so ideally it is possible to measure all the elements present in the frame. Solving the camera returns:
• three position parameters (x, y, z);
• three orientation angles (ω, φ, κ);
• one direction.
The position of this camera point was called the Estimated Camera Point (ECP). It is necessary to measure the height of several objects in the frame whose value is known, to verify that the overlay of the image has occurred correctly. Furthermore, it is essential to quantify the error between the real position of the camera and the estimated position. There is no standard of acceptability for the error values: the control is not numerical but is a visual check by the operator. The process is validated by estimating the height of a metric rod of known height positioned close to the subject to be measured. The literature shows that repeated analyses reveal measurement problems when the parallax error is greater than 3 cm; in this case study, the error is less than 1 cm. If the model verification succeeds, it is possible to start the height estimation of the subject; otherwise the whole process is void and another frame must be validated.
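The camera resection performed by reverse photogrammetry from homologous points can be sketched with a Direct Linear Transform. This is a minimal stand-in for the commercial solver (no lens-distortion terms); the synthetic camera and point coordinates below are illustrative only:

```python
import numpy as np

def dlt_resection(pts3d, pts2d):
    """Estimate a 3x4 projection matrix from >= 6 homologous 2D/3D point
    pairs with the Direct Linear Transform (no lens-distortion terms)."""
    A = []
    for (X, Y, Z), (u, v) in zip(pts3d, pts2d):
        A.append([X, Y, Z, 1, 0, 0, 0, 0, -u * X, -u * Y, -u * Z, -u])
        A.append([0, 0, 0, 0, X, Y, Z, 1, -v * X, -v * Y, -v * Z, -v])
    _, _, Vt = np.linalg.svd(np.asarray(A, float))
    return Vt[-1].reshape(3, 4)  # right singular vector of smallest value

def camera_center(P):
    """Camera position (the Estimated Camera Point) = right null space of P."""
    _, _, Vt = np.linalg.svd(P)
    c = Vt[-1]
    return c[:3] / c[3]

def reprojection_error(P, pts3d, pts2d):
    """Worst-case pixel residual, used to check that the overlay is sound."""
    Xh = np.hstack([pts3d, np.ones((len(pts3d), 1))])
    proj = (P @ Xh.T).T
    return float(np.abs(proj[:, :2] / proj[:, 2:3] - pts2d).max())

# Synthetic check: a camera at C = (1, 2, 10) with 800 px focal length and
# eight noncoplanar scene points (all values illustrative).
K = np.array([[800, 0, 320], [0, 800, 240], [0, 0, 1]], float)
C = np.array([1.0, 2.0, 10.0])
P_true = K @ np.hstack([np.eye(3), -C[:, None]])
pts3d = np.array([[0, 0, 12], [1, 0, 13], [0, 1, 14], [1, 1, 15.5],
                  [2, 0, 16], [0, 2, 17], [2, 2, 18.3], [1, 2, 19]], float)
Xh = np.hstack([pts3d, np.ones((8, 1))])
uvw = (P_true @ Xh.T).T
pts2d = uvw[:, :2] / uvw[:, 2:3]
P_est = dlt_resection(pts3d, pts2d)
print(np.round(camera_center(P_est), 3))  # ≈ [1, 2, 10]
```

With noise-free points the recovered center matches the true camera position; with real image measurements, the reprojection error on objects of known height plays the role of the metric-rod check described above.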

| Step #6: 3D measurement
After projecting the frame of interest onto the mesh and defining the camera position in the 3D modeling environment, the two other entities needed to calculate the dimension of interest are defined, namely: (1) PP, coinciding with the support surface of the subject; and (2) PT, generally coinciding with the top of the head of the filmed subject. The choice of these points is the most important human bias of the method.
Thus, it is necessary to use an image of acceptable quality and to train the operator [24]. Finally, based on the principles of Euclidean geometry (similarity criteria for triangles), it is possible to define the height of the subject in the frame of interest (Figure 8).
Therefore, based on the scheme and the first similarity criterion for triangles, Equation (1) holds:

H : hi = (a + b) : b  (1)

from which it follows that hi = H * b / (a + b).

By indicating different head points (PT) and different foot points (PP), the software calculates the various combinations of height and returns the average value, the standard deviation, and the maximum and minimum values (an example is shown in Figure 9). This process was repeated for all five subjects and for the different frames of each subject extracted from the videos. Frames were taken both when the subjects were close to the camera and when they were far from it, captured from the front and from the back.
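Equation (1) and the software's aggregation over PT/PP combinations can be sketched as follows; the symbols H, a, and b follow the scheme of Figure 8, and all numeric values are hypothetical:

```python
from statistics import mean, stdev

def subject_height(H, a, b):
    """Equation (1): H : hi = (a + b) : b, hence hi = H * b / (a + b).
    H, a, and b follow the scheme of Figure 8, in consistent units."""
    return H * b / (a + b)

def combination_stats(segment_pairs, H):
    """Aggregate the heights from different PT/PP picks the way the
    modeling software does: mean, standard deviation, minimum, maximum."""
    heights = [subject_height(H, a, b) for a, b in segment_pairs]
    return mean(heights), stdev(heights), min(heights), max(heights)

print(subject_height(240.0, 0.25, 0.75))  # 180.0
# Three slightly different PT/PP picks for the same frame (hypothetical).
avg, sd, lo, hi = combination_stats([(0.25, 0.75), (0.26, 0.74), (0.24, 0.74)],
                                    240.0)
```

The spread between `lo` and `hi` reflects the operator's uncertainty in picking PT and PP, which the paper identifies as the main human bias of the method.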

| Step #7: Manual measurement
Finally, the height of each subject was manually measured using an anthropometer and a metal tape measure, with an error of 0.5 cm. Each subject was measured 13 times from the front and 13 times from the back. In this way, it was possible to compare the height estimates obtained with the model under analysis against the real heights of the subjects. Table 2 shows the manual measurements. The manual measurement was done after the estimation was finished, so the tests are blind and independent.

| RESULTS
Table 3 shows the 3D height measurements for each subject, obtained with the process described above and reported in Figure 9. Only frames of sufficient quality were extracted.
The number of frames varied for each subject, between 12 and 18. Each frame refers to a specific position of the subject (captured in front of or behind the camera) and a precise distance from the camera.
Table 4 compares the average 3D measurements for each subject with the average manual measurements.
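The comparison in Table 4 amounts to differencing per-subject averages; a minimal sketch with illustrative numbers (not the paper's data):

```python
from statistics import mean

def height_difference_cm(estimates_cm, manual_cm):
    """Average 3D estimate minus average manual measurement for one subject;
    negative values mean the subject was estimated shorter than reality,
    the dominant pattern reported in Table 4."""
    return mean(estimates_cm) - mean(manual_cm)

# Hypothetical per-frame estimates and manual measurements for one subject.
print(round(height_difference_cm([176.1, 175.4, 176.8], [177.5, 177.9]), 2))
# -1.6
```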

TABLE 4 Comparison between 3D and manual measurements (cm).

Before commenting on the results, it is necessary to clarify that in the initial frames the subject is far from the camera in the front position, in the final frames he is far from the camera in the back position, while in the middle frames he is near the camera. The distance between the camera and the subject varies: in the first frames the maximum distance is about 17 m and the subject is captured from the front; the distance then decreases to 2 m from the camera, after which the subject is captured from behind and moves away from the camera up to a distance of 15 m. In each figure, the x-axis is the number of the analyzed frame, while the y-axis is the height value; the black dotted line is the height value measured manually.

Figure 11 shows the results for estimating the height of subject 2. Fourteen frames were captured from the video. The results show that the measurement is more accurate when the subject is close to the camera and when the subject moves away from the camera from behind.
Figure 12 shows the results for estimating the height of subject 3. Fifteen frames were captured from the video. The results show that the measurement is more accurate when the subject is close to the camera. In this case, the average of the estimates is very low, because the subject lowers his head while walking.
Figure 13 shows the results for estimating the height of subject 4. Eighteen frames were captured from the video. The results show that the measurement is more accurate when the subject is close to the camera and when the subject moves away from the camera from behind.

| DISCUSSION
Results show that most estimated measurements are lower than the real measurements of the subject. This was predictable, as it is known in the literature [10] that a walking subject can be up to 6 cm shorter than their actual height measured in a fully erect position [25]. Obviously, it also depends on the posture of the subject while walking. For subject 3, the values obtained are much lower than the real height because the subject lowered his head during the walk, while for subject 5 some values are higher than the average because the subject raised their head considerably. During forensic investigations, suspects are almost never in the best position; in this respect, the study approaches real investigation conditions, with the head in different positions.
Table 4 shows that the difference in height is less than 2 cm for subjects 1, 4, and 5, while for subjects 2 and 3 it is greater than 2 cm. This is due not only to their posture but also to the different error sources of the process. The point cloud acquisition environment affects the type of analysis: if the floor of the environment is homogeneous (e.g., the floor of an office), fewer points can be acquired, whereas if the environment is not very homogeneous, many more points are needed to correctly reconstruct the whole environment. Furthermore, the higher the number of points, the heavier the file and the slower the processing, so the expert carrying out the analysis chooses a qualitatively optimal number of points without making the analysis too heavy. As for the photographic error, the larger the area captured, the greater the portion of the area contained within a pixel (less detail), and vice versa. These considerations are significant especially in the texture mapping phase, because it is difficult to identify with extreme precision the reference points present on a detailed 3D point cloud rendering of a scene and their equivalents on the mesh of the 2D video data, due to the much lower quality and resolution of the 2D video data. In general, the measurements of the subject taken from the front are lower than the exact value because it is more difficult to identify the correct PT and PP.
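The observation that a larger framed area means more scene per pixel (hence less detail) can be quantified with a simple ground-sample computation; the framing values below are illustrative:

```python
def ground_sample_cm_per_px(scene_width_m, image_width_px):
    """Approximate width of real scene covered by a single pixel, assuming a
    fronto-parallel view: framing a larger area means each pixel must
    represent more of the scene, hence less detail for point selection."""
    return scene_width_m * 100.0 / image_width_px

# Hypothetical framing: an 8 m wide scene on a 1920 px wide frame.
print(round(ground_sample_cm_per_px(8.0, 1920), 3))  # 0.417
```

At roughly 0.4 cm per pixel, a one-pixel slip in picking PT or PP already approaches the sub-centimeter tolerances discussed above, which is why point selection dominates the error budget.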

| CONCLUSIONS
Technological evolution has increased the number of cameras that often capture crime scenes. For forensic investigators, the height of a subject may be a key feature in investigating a crime. In this paper, we presented an innovative model for estimating human height by combining 2D video data with a 3D laser scan point cloud.
This method can be used when the classical projective geometry method cannot, because the camera is not available or has been moved and there are no reference lines in the frame.
The research presents a real case study to explain the potential of this alternative model of height estimation. After reviewing the scientific literature and describing the various methodological steps, the paper presented the results obtained and identified the limits of the methodology that emerged during the analysis. In particular, the estimated height was lower than the real height, and accuracy decreased as the distance from the camera increased.

Reverse photogrammetry allows one to estimate the position of the camera starting from the 2D images and the acquired point cloud. For reverse photogrammetry, PhotoModeler (one of the most popular software packages) can use three methods: (1) using control points: it can use control points to solve the camera during processing if the 3D positions of several points in the frames are known, or if several dimensions are known and the 3D positions of several points in the photo can be calculated relative to each other; (2) using shapes: if the frame shows distinct shapes in perspective, it can solve the camera and calculate the camera positions; the shape in the frame does not need to match any specific dimension, it just needs to conform to the shape's parameters (e.g., a box needs right angles at each corner); (3) using constraints: if the frame has strong three-point perspective with horizontal (left-right and front-back) and vertical features that can be marked, it can constrain the marked features. Once the camera is solved, surfaces can be marked to connect the constrained items.

| Step #3: 3D laser scanning

3D laser scanning is the process used to capture millions of points of "point cloud" data and convert them into a virtual environment: highly accurate and realistic 3D models usable in many applications. Laser scanners work by emitting an infrared laser beam, which is reflected by the surface it encounters in its path. Within this cone of concentrated light, several pulses are transmitted and used to estimate the distance. For this case study, a Leica P40 laser scanner [19] was used for the 3D scanning. It is an electronic instrument capable of emitting an electromagnetic pulse (laser) and receiving the reflected signal, thus calculating the distance between the instrument and the detected point, as well as the spatial coordinates of the point belonging to the impacted object with respect to the origin of the scan [20]. Two different laser scanning positions were selected (Figure 3), and the scan parameters were: (1) resolution setting: 3.1 mm @ 10 m; (2) sensitivity: normal; and (3) EDM mode: speed. During scanning, the points of the scene and the points related to the camera are acquired. To identify the real point of the camera, the central point of the circumference of the camera is considered.

FIGURE 2 Examples of extracted images of each subject. FIGURE 3 Laser scanning acquisition positions.
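The range computation a pulsed scanner performs follows directly from the pulse's round-trip time; a minimal sketch (the echo time below is an illustrative value):

```python
SPEED_OF_LIGHT_M_S = 299_792_458.0

def tof_distance_m(round_trip_s):
    """Pulsed time-of-flight ranging as performed by a laser scanner: the
    pulse travels to the surface and back, so the range is c * t / 2."""
    return SPEED_OF_LIGHT_M_S * round_trip_s / 2.0

# A pulse echo received after about 66.7 ns puts the surface near 10 m away.
print(round(tof_distance_m(66.7e-9), 2))  # 10.0
```

Combining each range with the instrument's angular encoders yields the spatial coordinates of the point with respect to the scan origin.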

| Step #4: Data analysis

The acquired laser scanner point clouds were imported and preprocessed in the Leica Geosystems Cyclone software [21]. The processing of the raw point cloud data consisted mainly of registering the different 3D laser scans to generate a single point cloud, which could then be used in various modeling software.
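Registering multiple scans into a single cloud is, at its core, a rigid alignment of corresponding points; a minimal Kabsch/SVD sketch (Cyclone's target- or cloud-based registration is more elaborate, and all values below are illustrative):

```python
import numpy as np

def register_scans(src, dst):
    """Rigid (rotation + translation) alignment of corresponding points via
    the Kabsch/SVD method, the core operation behind merging scans taken
    from different positions into one point cloud."""
    src_mean, dst_mean = src.mean(axis=0), dst.mean(axis=0)
    H = (src - src_mean).T @ (dst - dst_mean)  # cross-covariance matrix
    U, _, Vt = np.linalg.svd(H)
    d = np.sign(np.linalg.det(Vt.T @ U.T))     # guard against reflections
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    t = dst_mean - R @ src_mean
    return R, t

# Illustrative check: a scan rotated 30 degrees about Z and shifted.
theta = np.pi / 6
R_true = np.array([[np.cos(theta), -np.sin(theta), 0.0],
                   [np.sin(theta),  np.cos(theta), 0.0],
                   [0.0, 0.0, 1.0]])
t_true = np.array([2.0, -1.0, 0.5])
src = np.array([[0, 0, 0], [1, 0, 0], [0, 1, 0], [0, 0, 1], [1, 1, 1]], float)
dst = src @ R_true.T + t_true
R, t = register_scans(src, dst)
print(np.allclose(src @ R.T + t, dst))  # True
```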

FIGURE 4 Point cloud of the analyzed room. FIGURE 5 Mesh of the analyzed room.

3DReshaper is dedicated to the modeling of point clouds, meshes, parametric, and BIM models. This program allows for the generation of a mesh (Figure 5). The literature reports [23] that point clouds are often not optimal for graphic representation, but they are the starting point for the creation of surfaces such as polygonal meshes, used both in computer graphics and in CAD modeling. Texture mapping is a computer graphics technique capable of projecting one or more images onto the surface of a 3D model using the notions of inverse photogrammetry.
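Texture mapping rests on projecting 3D surface points into the 2D frame; a minimal pinhole-projection sketch with an illustrative camera matrix (no real calibration data from the case study):

```python
import numpy as np

def project_to_pixel(P, point3d):
    """Project a 3D mesh vertex into the 2D frame with a 3x4 projection
    matrix: the elementary operation of texture mapping, where each visible
    mesh point takes the color of the pixel it projects to."""
    X = np.append(np.asarray(point3d, float), 1.0)  # homogeneous coordinates
    u, v, w = P @ X
    return float(u / w), float(v / w)

# Illustrative pinhole camera: 800 px focal length, principal point (320, 240).
P = np.array([[800.0, 0.0, 320.0, 0.0],
              [0.0, 800.0, 240.0, 0.0],
              [0.0, 0.0, 1.0, 0.0]])
print(project_to_pixel(P, (0.5, 0.25, 2.0)))  # (520.0, 340.0)
```

Running this per visible mesh vertex, and sampling the frame at the returned pixel, is the inverse-photogrammetry step that dresses the mesh with the video frame.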

FIGURE 6 Reference points. FIGURE 7 Frame overlay on the 3D model. FIGURE 8 Euclidean geometry principles of similarity. FIGURE 9 Example of height estimation.

| Step #5: Model verification

Once the image has been positioned on the point cloud, the estimated position of the camera that took the scene is automatically determined. The virtual 3D camera placement is a critical step in the camera-match overlay process: there is one and only one position in the 3D point cloud that exactly matches the perspective of the 2D video camera, so the placement of this camera position (the plane of the sensor) in the 3D environment is critical. The position of the virtual camera is calculated automatically by the software after overlaying the real images on the virtual images. The identification of the position of the virtual camera takes place by manual collimation of pairs of homologous points between the image and the three-dimensional coordinates of the point cloud. By identifying the projection points, it is possible to estimate the position and orientation of the camera considering different parameters (reverse photogrammetry). This makes it possible to measure the elements present in the frame, including people present in other frames.
TABLE 2 Manual measurement results (cm).

Figures 10-14 provide a graphical representation of the results reported in Tables 2 and 3. The continuous black lines represent the manual measurement range. The red dot is the height estimate, while the red segments represent the standard deviation of the height estimate. The dashed vertical lines indicate the subject's distance from the camera: in the left area, the subject approaches the camera frontally; in the central area, he approaches the camera frontally and then moves away from the back; and on the right side, he moves away from the back. The figures also indicate the distance in meters between the subject and the camera. Figure 10 shows the results for estimating the height of subject 1. Twelve frames were captured from the video. The results show that the measurement is more accurate when the subject is close to the camera.

Figure 14 shows the results for estimating the height of subject 5. Sixteen frames were captured from the video. In this case, the result differs from the other subjects: the higher values are still related to the proximity of the subject to the camera, but the average of the values is higher, because the subject holds his head very high while walking.

FIGURE 12

The photographic error is committed during the texture mapping phase: the selection of shared known points in the 2D and 3D environments can lead to this type of error. It is the error associated with selecting match points in the 2D and 3D environments with the same virtual camera, and it leads to variability in the result. The pointing error depends on the minimum distance between the scanned points (set by the scan resolution chosen at the design stage of the scan itself), on the number of points imported and processed in the 3DReshaper environment, on the level of finish of the mesh being processed, and finally on the size of each interpolated triangle. The resolution error is determined by the size of the captured area, the size of each individual pixel that makes up the frame, and the degree of definition of the image. As for the pointing error, in the present case, for convenience related to the analyzed environment and the hardware capabilities (video card), it was chosen to import a reduced number of points, equal to about 1/10 of the total cloud, and a coarser mesh finish.

FIGURE 13 Results of subject 4.

Accuracy also depended on whether subjects were captured facing (front) or facing away from (behind) the camera. To study these limits, analyze their causes, and try to overcome them, future research intends to develop a new case study using a forensic mannequin (with a fixed height) to see how height estimates vary with position and distance from the camera.