Facial and mandibular landmark tracking with habitual head posture estimation using linear and fiducial markers

Abstract This study compared the accuracy of facial landmark measurements using deep learning‐based fiducial marker (FM) and arbitrary width reference (AWR) approaches. It quantitatively analysed mandibular hard and soft tissue lateral excursions and head tilting from consumer camera footage of 37 participants. A custom deep learning system recognised facial landmarks for measuring head tilt and mandibular lateral excursions. Circular fiducial markers (FM) and inter‐zygion measurements (AWR) were validated against physical measurements using electrognathography and electronic rulers. Results showed notable differences in lower and mid‐face estimations for both FM and AWR compared to physical measurements. The study also demonstrated the comparability of both approaches in assessing lateral movement, though fiducial markers exhibited variability in mid‐face and lower face parameter assessments. Regardless of the technique applied, hard tissue movement was typically seen to be 30% less than soft tissue among the participants. Additionally, a significant number of participants consistently displayed a 5 to 10° head tilt.

ropathies and paralyses [7]. Historically, soft tissue movement has been tracked either arbitrarily, by measuring anatomical distances between two fixed landmarks, or through fiducial markers of a known diameter affixed onto the participant [8]. The current study is the first to document and compare both methods in evaluating mandibular movements on the lateral plane within a South Australian population.
The current research aimed to achieve two main objectives: firstly, to assess the precision of facial feature measurements using fiducial markers versus arbitrary facial width measurements, and secondly, to provide a quantitative evaluation of the lateral soft tissue movement of the mandible relative to the hard tissue movement, and to detect head tilts using consumer-grade camera videos. To achieve both objectives, the study modified an in-house, open-access software application designed to detect distances between facial landmarks by converting pixels to millimetres using libraries such as OpenCV and Dlib [9,10]. It was hypothesised that no significant discrepancies would be observed between the two estimation techniques when used to measure facial parameters during jaw movement on the lateral plane.

Study design
The study was approved by the University of Adelaide Human Research and Ethics Committee (H-2022-185).

Development of the tracking software
The developed software comprises the Dlib face detector model for frontal face detection and FAN's face alignment model for side face detection, and is an adaptation of the Dental Loop FLT model [10]. Dlib, a C++-based library, was utilised along with the ResNet deep neural network for facial landmark recognition. The facial detection technique employed by Dlib uses the histogram of oriented gradients (HOG) method. This approach, with certain adaptations such as single-face-focused detection, was integrated to identify facial landmarks and outlines. To adapt ResNet for the specific task of facial landmark recognition, transfer learning techniques were applied to the dataset [11]. These tasks were accomplished through a pretrained model referred to as 'shape_predictor_68_face_landmarks.dat'. This file contains the weights and architecture of a trained model (a shape predictor based on regression trees) that predicts 68 specific landmarks on a human face. The study also utilised the 2D 'Face Alignment' algorithm extracted from the face alignment network (FAN) model [12]. It is a convolutional neural network-based approach for predicting the positions of facial landmarks, allowing for alignment and analysis of facial features in 2D images. This choice was made to address the limitations of Dlib in reliably detecting faces that are obscured or positioned at angles. To enable the detection of facial features from within images or frames, an algorithm was developed using Dlib's 'get_frontal_face_detector()' method, which detects a frontal face so that the landmarks of the detected face can then be predicted. Additionally, an extra memory functionality was implemented. This feature enhanced reliability and maintained focus on a single face within the frame consistently during a recording session for continuous tracking. Within a continuous loop, every frame of the video was extracted and transformed into a grayscale representation. The grayscale image was then passed to the loaded landmark and face detection models, which produced display values for the corresponding landmarks. A novel reference point highlighting the soft tissue over nasion (sN) was also implemented. The iterative procedure terminated once all frames had been processed. The workflow is reported in Figure 1.
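The single-face "memory" behaviour described above can be sketched as follows. This is an illustrative reconstruction, not the study's actual code: `select_tracked_face` is a hypothetical helper that keeps tracking whichever detected face lies closest to the face tracked in the previous frame.

```python
import math

def select_tracked_face(detections, last_center):
    """Pick the detection closest to the previously tracked face.

    detections: list of (x, y, w, h) bounding boxes from a face detector.
    last_center: (cx, cy) of the face tracked in the previous frame, or None.
    Returns the chosen box, or None if nothing was detected.
    """
    if not detections:
        return None
    if last_center is None:
        # First frame: fall back to the largest detected face.
        return max(detections, key=lambda b: b[2] * b[3])
    cx, cy = last_center
    # Otherwise, keep following the face nearest the previous position.
    return min(
        detections,
        key=lambda b: math.hypot(b[0] + b[2] / 2 - cx, b[1] + b[3] / 2 - cy),
    )
```

In the per-frame loop, the chosen box's centre would be stored and passed back in on the next frame, so a second face entering the frame does not steal the tracker's focus.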

Development of the circular fiducial marker approach
The circular fiducial marker approach involved the use of the Hough circle transform (HoughCircles) function from OpenCV to identify a standardised circular marker placed strategically on the participants' foreheads or necks, depending on the orientation of the face being detected. The Hough circle transform is a technique for detecting circular shapes within an image, even when they are partially obscured or vary in size or position [9]. The marker's diameter was then computed from its pixel-based representation. A predefined region of interest (ROI), determined by the locations of eyebrow landmarks, encapsulates the marker for detection. The marker's contrast against the skin tone ensured reliable identification, irrespective of colour variation.
The conversion from pixel-based measurements to millimetres was achieved through a customisable reference marker diameter, standardised in this study to commercially available 15 mm markers; this dimension was determined through trial and error. The conversion rate was computed as follows:

Conversion rate (mm/pixel) = Reference circle diameter (mm) / Detected circle diameter (pixels)

In this equation, the reference circle diameter represents the known physical diameter of the circular fiducial marker in millimetres, while the detected circle diameter corresponds to the measured diameter of the marker in pixels within the image.
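As a minimal sketch of this conversion step (function names are hypothetical, not taken from the study's software), the scale factor and its application reduce to:

```python
def fm_conversion_rate(reference_diameter_mm, detected_diameter_px):
    """Millimetres represented by one pixel, from the fiducial marker.

    reference_diameter_mm: known physical marker diameter (15 mm here).
    detected_diameter_px: marker diameter in pixels, as returned by the
    Hough circle transform.
    """
    if detected_diameter_px <= 0:
        raise ValueError("marker not detected")
    return reference_diameter_mm / detected_diameter_px

def px_to_mm(distance_px, rate_mm_per_px):
    """Convert a pixel distance between two landmarks to millimetres."""
    return distance_px * rate_mm_per_px
```

For example, a 15 mm marker imaged at 60 pixels yields a rate of 0.25 mm/pixel, so a 200-pixel landmark distance converts to 50 mm.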

Development of arbitrary width reference approach
In the arbitrary width reference approach, the inter-zygion reference was considered and labelled 'sZy left-sZy right', denoting the facial width between the soft tissue over the zygomatic prominences. The pixel-based measurements along this line corresponded to the detected face width measurements obtained from the multi-step process outlined earlier. Figure 2 demonstrates live examples of the two approaches. The pixel-based measurements were then converted to millimetres using the formula:

Distance (mm) = (Distance in pixels / Reference in pixels) × User-provided reference in millimetres
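The same conversion can be sketched for the arbitrary width reference approach (the function name is hypothetical); here the scale comes from the detected inter-zygion width and its physically measured counterpart rather than from a marker:

```python
def awr_px_to_mm(distance_px, reference_px, reference_mm):
    """Arbitrary width reference conversion.

    distance_px:  pixel distance between the two landmarks of interest.
    reference_px: detected inter-zygion width (sZy left-sZy right) in pixels.
    reference_mm: the same inter-zygion width measured physically with a ruler.
    """
    if reference_px <= 0:
        raise ValueError("reference width not detected")
    return distance_px / reference_px * reference_mm
```

For example, a 120-pixel landmark distance against a 480-pixel inter-zygion width, physically measured at 140 mm, converts to 35 mm.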

Landmark tracing
The landmarks registered for the purpose of the current evaluation were: (1) distance from soft tissue over nasion to subnasale, (2) distance from subnasale to soft tissue over pogonion, (3) left and right inner canthus lines, (4) distance from articularis to gonion to soft tissue over gnathion on the right and left sides, and (5) distance between the soft tissue over the right and left zygion. The order of landmarks and corresponding facial features was adapted from the open-source software Dental Loop FLT [10].

Quantifying mandibular lateral excursions and head tilt
To track lateral excursion, a reference line from subnasale to soft tissue over gnathion was established. To account for limitations in computational power, changes in parameters were automatically recorded at 60-frame intervals from 30 fps 1080p video recordings.
The primary goal of this tracking algorithm was to quantify the angular extent of the jaw's lateral movement. To achieve this, the algorithm calculated the angle between vectors representing the initial and final directions of movement (Figure 3). This angle, measured in radians, served as an indicator of the magnitude of the jaw's lateral excursion. Moreover, the algorithm also accounted for vertical and horizontal displacements of the specific chin landmark at the time of lateral excursion, to exclude movement anomalies (Figure 4).

Head tilt
To detect head tilt, a dedicated function, 'calculate head tilt', was defined. The function began by extracting Landmarks 27 and 29 from the Dlib model and computing the horizontal distance between them. To provide a reference point for this measurement, the function also computed the distance between the top nose landmark and the middle nose landmark (Figure 5). Thresholds were adjusted, and the directional angle of tilting was then calculated using the 'math.atan' function and differentiated by positive and negative values to indicate the direction of tilt. This determination was made based on the sign of the horizontal distance: a positive sign denoted a tilt in one direction, while a negative sign indicated a tilt in the opposite direction (Figure 6).
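A minimal sketch of such a tilt calculation, assuming the two nose-bridge landmarks (e.g. Dlib landmarks 27 and 29) are given as (x, y) pixel coordinates; this mirrors, but is not, the study's actual 'calculate head tilt' implementation:

```python
import math

def calculate_head_tilt(upper_nose, lower_nose):
    """Signed head tilt in degrees from two nose-bridge landmarks.

    A vertical nose bridge gives 0; the sign of the horizontal offset
    distinguishes the direction of tilt, as described in the text.
    """
    dx = lower_nose[0] - upper_nose[0]  # horizontal distance (signed)
    dy = lower_nose[1] - upper_nose[1]  # vertical reference distance
    if dy == 0:
        raise ValueError("degenerate landmark pair")
    return math.degrees(math.atan(dx / dy))
```

With the landmark pair offset 10 pixels horizontally over 30 pixels vertically, this returns roughly ±18.4°, with the sign indicating the tilt direction.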

Data collection
Based on a large effect size of F = 0.55 (G*Power; JPD paper), α = 0.05 and power of 0.80, it was determined that data from 37 participants would be required to make the comparisons. Thirty-seven adult participants aged between 18 and 65, with some or all of their natural dentition and without temporomandibular dysfunction, were recruited. The participants undertook a single session of video recording using a consumer camera (Brio-4K, Logitech, USA) at 1080p and 60 fps using a 13-megapixel lens with no ambient lighting regulation. Video outputs were at 2500 Kbps native bitrate, encoded using H.264 NVENC, and exported in Matroska Video (.mkv) format. In the clinic where data was collected, participants were uniformly positioned 45 cm away from the lens of a camera fixed on a tripod mount. This setup was carefully chosen for optimal focus in the prevailing lighting conditions. All data that could be used to identify subjects was anonymised, and recordings were deidentified in compliance with the guidelines provided by the university's human research ethics committee. Measurements for inter-zygion facial width, lower face height (from subnasale to the soft tissue over gnathion), and midface height (from the soft tissue over nasion to subnasale) were taken with a measuring ruler by a single operator, establishing the baseline for reference tracking.
The hard tissue measurements for lateral excursion were captured using an electrognathograph (JT-3D; BioResearch, USA) to establish comparative differences between hard and soft tissue movement. Before data collection, a second operator conducted ten independent readings to confirm the reproducibility and repeatability of measurements. This process resulted in an intraclass correlation coefficient (ICC) of 0.86 and an average variation of 1.5 mm. Intra-operator reliability was also assessed with five independent readings across two consecutive days, yielding an ICC of 0.89.

Data analysis
The data derived from the circular fiducial marker approach and the arbitrary face width reference approach were compared with the physically measured values estimated by the operator on the participants at the time of video recording. The discrepancies between each method's measurements and the ground truth values, taking into consideration a margin of error, were quantitatively analysed as relative and absolute errors. Testing for normality was carried out using the Kolmogorov-Smirnov test, and a one-way ANOVA was performed at multiple levels to compare across the different methods in tracking facial parameters and mandibular lateral excursions. The following formulae were used in the analyses of data:
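The two error metrics can be sketched as follows (hypothetical helper names; the ground truth is the operator's physical measurement):

```python
def absolute_error(measured, ground_truth):
    """Absolute error, in the measurement's own units (millimetres here)."""
    return abs(measured - ground_truth)

def relative_error(measured, ground_truth):
    """Absolute error expressed as a fraction of the ground-truth value."""
    return abs(measured - ground_truth) / ground_truth
```

For instance, a tracked lower face height of 52 mm against a physically measured 50 mm gives an absolute error of 2 mm and a relative error of 0.04 (4%).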

RESULTS
Data was collected from a sample of 37 participants, spanning three distinct demographic groups: East Asian (n = 10), Caucasian (n = 20), and South Asian (n = 7). Of these, 12 were male and 25 female. Fourteen participants demonstrated a habitual head tilt between 0° and 10° to the right, six demonstrated a head tilt of the same magnitude to the left, three tilted 10-15° to the right, while the remaining 14 maintained a straight posture.
The methodology for data collection and synthesis is illustrated in Figure 7. Both the relative and absolute errors of the two tracking methods across the participant pool are detailed in Table 1.
The relative error trends across 37 participants for lower face (Figure 8A) and mid face estimation (Figure 8B) were then visualised. Higher errors were observed in a few instances and are discussed in the following section.
When compared to physical measurements, there were notable differences across measurement techniques. In particular, using FM resulted in a higher standard deviation in estimating lower face height (Table 2).
The mean electrognathograph values were 8.85 ± 2.41 on the left side and 8.97 ± 2.10 on the right side, approximately 30% less than the soft tissue estimations performed by both the AWR and FM techniques (Table 3).

DISCUSSION
The initial hypothesis anticipated minimal disparities between the two estimation techniques when applied to facial parameters or during lateral movement monitoring. However, when comparing the estimated values to the physical arbitrary measurements, notable discrepancies arose, prompting the rejection of our initial hypothesis. Standard deviations and means for lateral excursions on both sides showed resemblance between the two measurement techniques. This suggests reasonably consistent results within groups, with variability potentially arising from external conditions and population-specific lateral excursion patterns [13].
It was observed that East Asians exhibited a slightly higher mean head tilt angle (9.133° ± 1.99°) compared to Caucasians (7.375° ± 2.05°). The findings are broadly relevant, as head tilting was observed in 62.2% of the current dataset and across ethnicities. Habitual head posture is an important determinant of jaw function and in some cases can predispose to long-term masticatory muscle complex dysfunction. While superior to the alternative of using traditional goniometers to measure habitual head postures [1], an objective analysis that accounts for head tilting, similar to the current methods, can be of limited value in individuals with undiagnosed neck pathologies such as cervical and spinal injuries that demonstrate compensatory tilting mechanisms [14].

Limitations of the two tracking systems
Each tracking method demonstrated inherent limitations, necessitating a thorough understanding of the shortcomings for accurate result interpretation. The fiducial marker approach exhibited higher variability in estimating mid-face height, attributed to its dependence on the Hough circle detector. The Hough circle detector is a specific image processing technique used to detect circles within a given image. In some cases, discrepancies arose when the contrast between the circle's background and perimeter did not meet expected criteria, causing fluctuating output values, while in other instances, hair over the forehead and makeup residues contributed to occluded-angle images and noise [15]. Notably, using physical arbitrary measurements from a single operator as a standard may have introduced some observer-generated biases and parallax errors.
Environmental factors, such as lighting and electromagnetic interference, can impact tracking systems variably, influencing readings. Although existing datasets had participants in suitable ambient lighting conditions, fiducial markers, in some instances, captured excessive skin surface reflections, causing momentary disruptions in data generation as the system temporarily lost track. While the arbitrary width reference approach is less susceptible to skin surface reflections, it is more vulnerable to sudden head movements.

Limitations in the image acquisition process
There were several complexities associated with determining the duration a participant took to complete a cycle of lateral excursions. As this process was subjective, variations were noted across different ethnic groups, which may have a significant correlation with head tilting. This variability was further compounded by sporadic, unanticipated head movements and occasional jittery motions that the participants performed during the data collection process. Consequently, these occasional irregularities necessitated manual frame-by-frame scrutiny to ensure the precision and fidelity of data retrieval from the recorded sequences, thereby preventing the workflow from becoming a fully unsupervised process.
The current model overzealously considered every frame and interval while continuously updating the generated report to the highest value, occasionally collecting irrelevant data due to unintended movements during facial expressions such as smirking. The current dataset's native sampling rate of 60 fps was deemed sufficient for capturing lateral movement values, although anatomical variations in the temporomandibular joint complex could have contributed to higher overall estimations [16,17]. To address some of these limitations, the software correlated movement with habitual head posture orientation through threshold adjustments. A similar technique was used by Hussein et al. for iris pattern recognition in ocular torsional changes due to head tilt [18].

Future recommendations
Attempting to create prediction models for jaw movement trends without accurately identifying habitual head postures may result in overfitting and inaccuracies. A potential solution is to gather repeated data from the same individual over several weeks, identifying trends in habitual head posture changes and fitting a model through clustering recurring patterns. This approach could be a focus for future research. Notably, deep learning-based time-series experiments at the Royal Adelaide Hospital in South Australia are already exploring trends influencing patient discharge [19].
In future software implementations, the utilisation of models like Anchorface, capable of capturing occluded faces, could enable the analysis of lateral excursions in the sagittal plane and anterior protrusion in the coronal plane from video recordings at obscured angles. This holds clinical significance in evaluating temporomandibular joint conditions and various other maxillofacial conditions [1,20]. The software's processing time is contingent upon the machine's capabilities. If the machine fulfils the minimum hardware requirements, specifically possessing CUDA-enabled graphics, the processing time will match the duration of the sample video. Notably, recent advancements include the introduction of dedicated AI cores within processors [21]. This development suggests that processing times are likely to decrease substantially in the coming years, making this a subject for future exploration.

CONCLUSION
The research successfully applied landmark tracking using consumer-grade camera footage, employing arbitrary width reference and fiducial marking methodologies. Both techniques yielded similar outcomes in tracking lateral excursions, each with unique limitations. However, in assessing midface and lower face attributes, the fiducial marker method exhibited greater variability. Regardless of the technique applied, hard tissue movement was typically seen to be 30% less than soft tissue movement among the participants. Additionally, a significant number of participants consistently displayed a 5-10° head tilt.
Open access publishing facilitated by The University of Adelaide, as part of the Wiley -The University of Adelaide agreement via the Council of Australian University Librarians.

FIGURE 1 Workflow of development.

FIGURE 2 Visual representation of the fiducial marker approach versus the arbitrary width reference approach.

FIGURE 4 Recording of vertical and horizontal displacements at the time of lateral excursion to exclude movement anomalies.

FIGURE 5 The custom landmarks and determination of direction of tilting.

FIGURE 6 Clinical demonstration of the detection of degree and direction in head tilting.

FIGURE 7 Summary flowchart of the process workflow.

FIGURE 8 (A) Relative errors in the two estimation methods for lower face tracking. (B) Relative errors in the two estimation methods for mid face tracking.

TABLE 1 Tracking method absolute and relative error.

TABLE 2 Evaluation of the three soft tissue landmarks using different tracking techniques.

TABLE 3 Measurements of maximum lateral excursion and their corresponding electrognathograph measurements on the left and right sides.