Data‐driven nonrigid object feature analysis: Subspace application of incidence structure

In object recognition, feature extraction algorithms are designed to capture the discriminative statistics of objects. Due to pose, deformation, and background clutter, the recognition of objects becomes nontrivial, particularly for nonrigid samples. Through incidence and geometric structure, this article reports on the data‐driven identification of critical features located on object exemplar profiles. The investigation is demonstrated using the features of a cat's head and the application of the Hough transform to extract planar geometric features. A data‐driven recognition routine is described that accumulates prior knowledge for evaluating the error contribution of critical features impacting recognition confidence. In measuring affine facial features, feature parallelism is tracked to determine rotations and elevations of a cat's head. Preliminary recognition errors of 8.2% and 17.8% are determined for a front training profile. An analysis proceeds to determine the contributions to this error due to the identified critical features.


INTRODUCTION
In computer vision problems, data-driven object recognition is having a growing impact on the discovery of geometric, structural, and semantic relationships between shapes. 1,2 In the data-driven approach, geometry and structure are strongly related to the activity, and hence the functionality, of the shape. Prevailing methods in object recognition study objects in isolation, transferring information from exemplar to target through correlation. Limitations identified in the review of data-driven 3D shape analysis in Reference 3 highlight these issues: generalization across different datasets; complexity and scalability of the datasets, incurring a computational training-time penalty; the size of available 3D datasets, where 2D datasets are used to infer 3D data; generation of training data for geometric shape processing tasks; and uncovering useful patterns within models to reveal information about the problem being solved. These issues, regardless of whether the geometry is a plane or a solid, are synonymous with object recognition and supervised learning. The method proposed in this work is aimed at systems with limited hardware capability, and therefore many of the more sophisticated algorithms are not applicable here. Through a data-driven object recognition routine, the method establishes recognition of an object through linear characterization of its critical geometric features. The object under test is a cat; this is a deformable nonrigid body, since its body parts can move independently of one another, presenting challenging and unpredictable viewpoints and poses. To define the purpose of this study, we detail the contribution of this work in conjunction with the current state of the art.
A subset of machine learning called deep learning 4 has improved the state of the art in computational models addressing visual tasks, 5 whereby deep learning allows multiple processing layers to learn higher levels of data abstraction. Beyond the study of objects in isolation, feature data contained in the pixel information of images are widely processed by convolutional neural networks (CNNs). 6,7 Current developments of the generic CNN model 6 are the faster region-based CNN 2,8,9 and the you only look once (YOLO) network. 10 The main challenges that computational model development attempts to address, through machine learning, deep or not, are accuracy, speed, cost, and complexity. In the review of deep learning, it is stated that "human and animal learning is largely unsupervised: we discover structure in the world by observing it and not by being told the name of every object." 4 A second thought given in Reference 11 argues that through logical and probabilistic inference, an existing gap between machine learning and machine reasoning can be defined, this being "the ability to manipulate prior knowledge in order to answer a new question." Whilst the creation of ImageNet 1,10 has increased the ability to present, prelabel, and manipulate millions of training images for such data-hungry models as, 7,9,10 it remains that the biological visual system does not require a comparably sheer amount of training data to handle the recognition of a complex object across multiple viewpoints, even though today's computing power alleviates the once demanding resources required to pre/postprocess and train these scales of image data. In Reference 12, a review is conducted on the progression of CNNs to the present day, whereby two future directions of deep learning object detection are presented: first, unsupervised object discovery; and second, video object detection, tracking temporal information across frames to understand object functionality.
The contributions of this work relate prior knowledge obtained from traditional computer vision methods to identify, through inference, feature hierarchies from unlabeled inputs and known templates. A common visual cue in the recognition of known rigid and known nonrigid objects is the spatial relationship between the object's parts and pose. Moreover, affine geometric object feature characterizations tracked through the viewpoint of an object drive the disambiguation of object patterns in this work.
In handling image content to determine feature descriptors, an invariant model can be deployed to initially mitigate geometric distortions, [12][13][14] typically described by rigid body movement. In References 15, 16, and 17, enhancements to CNNs to incorporate rigid body invariance have been investigated. However, out-of-plane rotations, which depict 3D objects, require even more training data to characterize discriminant feature data across viewpoints. In this work, we show how affine 2D geometric feature data from nonrigid objects are localized and measured from real images to recognize the object. These are tracked through a novel iterative model based on a data-driven recognition routine utilizing the Hough transform. 18 The model builds on the application of active shape/contour models [19][20][21] and models of articulation 22 using key points captured in a 2D frame to establish logical and probabilistic reasoning to recognize and track known nonrigid deformable objects through two axes of rotation. Furthermore, the approach presents a solution that converges both a bottom-up and a top-down approach to object detection and reasoning, whereby object parts are determined first to make up the whole, and recognition confidence via template matching is scored incrementally, as opposed to detecting the whole and inspecting for parts. For state-of-the-art hand-crafted feature detectors/extractors such as SIFT/SURF, 23 the range over which out-of-plane rotations can be accommodated is limited 24 to 25° to −35°. The analysis in this study is performed using a limited range of real cat head images and centers on the error contributions of each data-driven feature towards the overall recognition confidence of the cat's head. A study of occlusion combined with critical feature identification (via a data-driven procedure) establishes shape functionality that is representative of the holistic form of the object. 25,26
Most importantly, and through a novel combination of applicable techniques, the model observes, infers, and retrieves geometric features from real images based on a priori weakly supervised data. For applications and specimens (rigid and nonrigid) with appropriate and well-defined degrees of freedom, this study demonstrates that such inferences through logical and probabilistic reasoning can be algebraically implemented to enhance computational models of recognition. One problem that the authors in Reference 27 discuss is the prediction of regions of prominent objects in images. The authors simplify the problem to a single class of object where there is at least one in the captured scene. This is commonly known as foreground detection, but without manually drawing bounding boxes to allow isolated study of the object. Automatic image region prediction would enhance the speed and accuracy of object recognition systems because redundant image space would be disregarded, and feature extractors would be concentrated on important areas within a picture. In Reference 28, evidence accumulation inference is also used to determine regions of interest for shape-based recognition. Whilst data-driven shape analysis is also applied in segmentation for applications such as moving object detection, 29 the target application in this report is classification.
In this article we report on the importance of features for the classification of the complex object class cat. The collection of reliable object geometry data to infer structural relationships between the geometry and the precedence of critical features is investigated. The class object cat is chosen due to view invariance and object familiarity. The recognition routine is based on the principle of recognition-by-parts 30 and the application of the Hough transform. 18 Since the proposed method is aimed at systems with limited hardware capability, the model is simple and retrieves geometric features from real images based on a planar shape-driven routine. The model searches image space based on well-defined limits built into the training templates. These limits are based on the facial features (ear, eyes, and nose) of the object class cat, their average geometric lengths based on anatomical data, and the distances describing the spatial relationship of these features. The model then compares the retrieved feature data from the image space region to remove redundant templates and build up a semisupervised model of the image content. As more features are observed by known inferences from the previous feature, a cumulative recognition confidence increases or decreases. An advantage of the proposed approach is that, whilst recognition confidence builds up general class discernibility (between a cat and a dog, for example), the preemptive knowledge for higher abstractions in the object model increases intraclass discriminability without potentially affecting interclass separability. Considering the current capabilities of deep learning models in 3D object recognition and moving object recognition, and the additional requirements on training inputs, we think that the integration of optimized hand-crafted feature detectors to isolate (object-dependent) critical features is an area where hybrid learning and hand-crafted models may offer the most appropriate solutions. 26
A second advantage that the proposed method offers is a semisupervised search criterion for object parts that, through further investigation, can be adapted into a preprocessing layer of a deep learning model to reduce redundancies of image search regions. Moreover, this approach can be applied across multiclass facial and complex object recognition problems. It is intended that readers consider this report as a preliminary analysis. The article is organized as follows: first, the topic of data-driven recognition is discussed further to determine an appropriate recognition routine. The experiment and analysis of the determined object features are then detailed with respect to training data profiles, feature identification, affine feature tracking, and impacts on recognition confidence. The report ends with a discussion and summary of the importance of features in nonrigid object recognition.

DATA-DRIVEN FEATURE ANALYSIS
In this section, a linear method in exploratory statistics is discussed in order to determine an appropriate recognition routine. First, for an object (a cat head) that is based on a simplification of its parts, mechanisms to isolate, investigate, and analyze features are presented; their purpose is to generate congruent data pertaining to the geometric makeup of the object and its parts. Finally, the processing path is outlined and the signal processing issues are considered.
For an image displaying a planar object containing multiple planar object parts, there exists an ℜ² vector space of the image and a subspace ℜⁿ pertaining to the n image components. In this definition and application, a planar object can either be represented as one holistic component or as many components which follow the atomistic relationship of the whole. This mathematical tool enables the representation of Biederman's recognition-by-parts 30 assuming there exists the possibility to characterize object parts with Euclidean plane geometry. A challenge in the linear generalizations of problems at this level of statistical investigation is to identify the interactions between linear components of the whole and the parts, which are operated on by different scalar amounts, thus mirroring the inherent nonlinearity a real nonrigid object would exhibit at any instant of time.
Explorative data analysis techniques can be used to analyze vector spaces and subspaces. The discussion that follows looks at the purpose of these techniques to uncover underlying structural observations of linearly posed problems and how they evolve to help encompass and identify complex relationships. The primary technique discussed is a special case of projection pursuit analysis, 31 known as principal component analysis (PCA). 32 To begin this discussion the necessary statistics are defined: standard deviation (SD), variance, and covariance. SD is a measure of how spread out a dataset is; the SD, σ, is the rms distance from the mean, μ, of the dataset. Closely related is the variance, σ², an equivalent statistical measure from the normal distribution relating the spread of data to its mean value. In 2D, both the SD and the variance are 1D operators. To analyze the spread of data for n multiple dimensions, a separate calculation for each dimension is required. To provide information about the dispersion relationship between datasets, the covariance between two variables is used. In matrix algebra, the covariance matrix, eigenvectors, and eigenvalues, along with the second-order statistic covariance, encompass the essence of PCA. The purpose of PCA is to transform a large dataset, containing potentially correlated variables, into a smaller quantity of variables called principal components. In doing so, it allows analysts to identify patterns and trends, or their absence, in large datasets by using a projection of the larger dataset onto its principal components (the smaller dataset), retaining the behavior of the variables that exhibit maximum variance. Perhaps the most commonly known application of PCA is in face recognition. [33][34][35] In References 36 and 37, the authors present up-to-date detailed reviews of current facial recognition capabilities.
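As a concrete illustration of the statistics above, PCA can be sketched as a covariance eigendecomposition in a few lines of NumPy. This is a minimal sketch for exposition, not the implementation referenced in this article; the function name and the synthetic correlated data are illustrative assumptions.

```python
import numpy as np

def pca(data, k):
    """Project n-dimensional samples onto their top-k principal components.

    data: (n_samples, n_features) array. Returns (projection, components).
    """
    centered = data - data.mean(axis=0)      # remove the mean from each feature
    cov = np.cov(centered, rowvar=False)     # covariance matrix of the features
    eigvals, eigvecs = np.linalg.eigh(cov)   # eigendecomposition (symmetric matrix)
    order = np.argsort(eigvals)[::-1]        # sort by decreasing variance
    components = eigvecs[:, order[:k]]       # keep the top-k eigenvectors
    return centered @ components, components

# Correlated 2D cloud: almost all variance should lie along the first component.
rng = np.random.default_rng(0)
x = rng.normal(size=500)
data = np.column_stack([x, 2.0 * x + rng.normal(scale=0.1, size=500)])
proj, comps = pca(data, 1)
```

Projecting onto the first component here retains nearly all the dispersion of the original two correlated variables, which is exactly the dimensionality-reduction behavior described above.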
For the general application of statistical analysis, the reality of applicability to interesting problems lies in the interpretation of the results. PCA is unsupervised in the sense that the model does not require any prior information about the input to perform the analysis. It is therefore nonparametric, and in deciding whether to apply this technique, the primary assumption made in the algorithm's set-up must be considered: the dataset needs to be decorrelated by second-order statistics. 38 This is because each characteristic vector, or new vector space, must be orthogonal to present a clear distinction (distortion mitigation) between the principal components. In the literature, a simple example of this is demonstrated by applying PCA to non-Gaussian data. 39 Henceforth, the true goal is error minimization of the reduced dataset.
There are identified processing tasks that benefit from the use of prior knowledge, such as object localization within images and bounding box positioning for regions of interest. In the scheme of the proposed method of recognition, a data-driven methodology is applied to a recognition system's processes: feature detection, extraction, and recognition. The aim of utilizing prior knowledge in this manner is that likelihoods of object instances are discriminated along the processing path, as opposed to solely at the end stage of recognition. Examples of prior knowledge for the object class cat may include head position and pose; the existence of features such as a second eye or ear; the distances over which to search for additional features in the input image; and the appropriate profile with which to correlate detected features.

Recognition approach
For an animal such as a cat, a discriminative body part is the head, which contains the features ear, eye, nose, and mouth. The proposed method of recognition is based on a picture that contains the object only, whereby the critical questions to address are which facial features are available, which are important, and what the spatial relationship of those facial features is. For example, the cat head in Figure 1 illustrates these key features and a crucial triangulated observation of the feature space.
The key measurements of the features are the separation distances and the specific geometries of the features, for example, the approximate circular eye radius, the ear edge length and width, and the incident angle at the tip of the ear. Within images the feature distances are projective; when a cat's head turns, these projections are preserved in 2D. Since scale invariance becomes a critical issue, preserving projective measurements becomes important when selecting the appropriate template profile on which to base the recognition decisions. Hence, feature ratios, which are dimensionless and therefore invariant to scale displacement, are used to normalize the feature data.
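The scale invariance of such feature ratios can be checked with a short sketch. The feature values below are hypothetical, chosen only to show that the ratio matrix is unchanged when the same object is imaged at a different distance; the function name is illustrative.

```python
import numpy as np

def feature_ratios(features):
    """Pairwise ratio matrix R[i, j] = f_i / f_j; dimensionless and scale-invariant."""
    f = np.asarray(features, dtype=float)
    return f[:, None] / f[None, :]

# Hypothetical pixel measurements: eye separation, ear separation, ear length.
near = np.array([120.0, 150.0, 60.0])  # cat photographed close up
far = 0.25 * near                      # same cat, four times further away
```

Although every raw distance shrinks by the scale factor, `feature_ratios(near)` and `feature_ratios(far)` are identical, which is why ratios rather than absolute pixel lengths are compared against the template profiles.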
To clarify the purpose of the recognition routine, a detectable and identifiable process consists of geometric anatomical measurements and object part feature ratios. Second, due to the constraints on the number of cat pictures available and on computing time, the investigations in this report should be treated as a demonstration of the key processing steps. The approach to recognition, and how prior knowledge is generated and implemented, is as follows.

SIGNAL PROCESSING PATH AND ISSUES
To recognize a cat using linear geometry, the first step is to detect the edges within the image using standard edge detection methods, such as first- and second-order differentiators. 40 The edge image is input into the Hough transform (Appendix A), whereby an appropriate threshold is obtained by selecting a range of the maximum peaks in the parametric Hough accumulator: rho and theta. Thresholding of the accumulator is important due to weakly detected edges in the image. Once the Hough transform is applied and candidate feature locations are identified, the initial geometric measurements obtained are in units of pixels. To compare training data measurements across changes in scale, pixel distances must be converted back to lengths (eg, mm or cm) and then normalized with respect to the feature in question. This process is complex because the feature in question will have its SD removed from the average value of the feature. Therefore, the formation of a feature vector is more appropriately determined using the ratio between critical features. Take the eye as the feature in question: the initial process will signal that the feature in question has a high probability of being an eye (based on shape and size). The final determination must rely on the geometric relationships between the feature and the other features located subsequently. The application and the operating range required by the environment the system operates in will also dictate the range of scales and viewpoints at which it expects the object to present itself. The overall approach is as follows: a visual device may have an operating range of 5 m with a viewing angle of ±50°, whereby, for a defined interval of scale, rotation, and image resolution, a search range of the radius in the Hough transform can be applied. Once a candidate location has been found, closer inspection using a second Hough transform may reveal a smaller circular object (a pupil, perhaps).
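The accumulator step described above can be sketched as a minimal line Hough transform in NumPy. This is an illustrative implementation under simplifying assumptions (a boolean edge image and a coarse rho quantization), not the routine of Appendix A.

```python
import numpy as np

def hough_lines(edges, n_theta=180):
    """Accumulate votes in (rho, theta) space for each edge pixel.

    edges: 2D boolean array of detected edge pixels.
    Returns (accumulator, rhos, thetas); peaks in the accumulator indicate lines.
    """
    h, w = edges.shape
    diag = int(np.ceil(np.hypot(h, w)))                       # max possible |rho|
    thetas = np.linspace(-np.pi / 2, np.pi / 2, n_theta, endpoint=False)
    rhos = np.arange(-diag, diag + 1)
    acc = np.zeros((len(rhos), n_theta), dtype=int)
    ys, xs = np.nonzero(edges)
    for x, y in zip(xs, ys):
        # Each edge pixel votes for one rho per theta: rho = x cos(t) + y sin(t)
        r = np.round(x * np.cos(thetas) + y * np.sin(thetas)).astype(int) + diag
        acc[r, np.arange(n_theta)] += 1
    return acc, rhos, thetas

# Synthetic vertical edge at x = 5: the peak should occur near theta = 0, rho = 5.
img = np.zeros((20, 20), dtype=bool)
img[:, 5] = True
acc, rhos, thetas = hough_lines(img)
peak_r, peak_t = np.unravel_index(acc.argmax(), acc.shape)
```

Thresholding then amounts to keeping accumulator cells whose vote counts fall within a chosen range of the maximum peak, which suppresses weakly supported edges as discussed above.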
Information is hence accumulated that discriminates the candidate as an eye. For additional verification, eigenspace analysis 33 of the candidate eye image region(s) against a constrained database of eyes, ears, and noses can be applied. The only viable outcome is a probability that the feature is the feature of interest. However, cat eyes are elliptical, and although the major axis does not change, the minor axis does, depending on the closure of the cat's eyelid. Therefore, the starting point of the recognition process is an important consideration.
Since the eye is a soft object that can change shape under a transformation, the most stable object feature of the cat's face is the ear. Unlike dogs' ears, cats' ears are almost always perpendicular to the boundary of the head. Therefore, the starting point sequences to consider in the recognition process are: (1) ear, second ear, eye, second eye, nose; (2) eye, second eye, ear, second ear, nose; and (3) nose, eye, second eye, ear, second ear. Within these sequences there is the possibility that a second feature (eye or ear distance) may not be detectable or may not even exist, either because the system cannot detect it or because it is occluded from the input. Therefore, missing features play a role in determining pose as well as decreasing the recognition method's probability of success. A metric is established from the number of features located and their geometric relationships, indicating a level of confidence that the object is a cat.
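One way such a cumulative confidence metric over a feature sequence might look, with missing or occluded features simply contributing nothing, is sketched below. The weighting scheme and its values are illustrative assumptions, not values from this study.

```python
def recognition_confidence(detected, weights):
    """Accumulate a [0, 1] confidence score over an ordered feature sequence.

    detected: dict mapping feature name -> True / False / None (None = occluded).
    weights: relative importance of each feature (summing to 1).
    Undetected or occluded features contribute nothing, lowering the score.
    """
    score = 0.0
    for name, weight in weights.items():
        if detected.get(name):
            score += weight
    return score

# Hypothetical weights favouring the stable ear features over the nose.
weights = {"ear": 0.25, "second_ear": 0.25, "eye": 0.2, "second_eye": 0.2, "nose": 0.1}
full = recognition_confidence(
    {"ear": True, "second_ear": True, "eye": True, "second_eye": True, "nose": True},
    weights)
occluded = recognition_confidence(
    {"ear": True, "second_ear": None, "eye": True, "second_eye": True, "nose": False},
    weights)
```

With all five features found the score is 1.0; with the second ear occluded and the nose undetected it drops to 0.65, reflecting the reduced confidence that the object is a cat.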

EXPERIMENTS AND ANALYSIS
In this section a front profile of a cat's head is determined based on average distances between appropriate geometric features, measuring the lengths of object parts and the distances between parts. The measured feature values of the front profile training data are presented in Appendix B; these are obtained from the ImageNet database. 1 The generated front profile is analyzed to establish the critical features to be observed in recognition of the object. Leading on from the front profile, the analysis is developed to determine the critical features to track through rotated profiles of the cat head. The availability of suitable templates to establish rotated features from real pictures of a cat head at defined rotations is a nontrivial task. Based on the availability of this training data (see Appendix B), a single exemplar 3D model of a cat head 41 processed in AutoCAD is analyzed. The central arguments of this article report and discuss the contributions of the determined critical features to the recognition error. The method of recognition is adopted from the application of the Hough transform to quantify the linear geometry of the features.

Identifying critical features
Taken from a subcategory of the ImageNet database, 1 the 25 examples of a cat's front profile, Ω F(i) , are shown in Figure 2A.
These are used to obtain an average profile of the cat; these profiles are depicted in Figure 2B, whereby the approximated feature measurements (explicitly defined in Appendix B), taken using the Microsoft Word line measurement tool, are: (1) eye major axis, (2) eye minor axis, (3) head width, (4) head height, (5) nose width, (6) nose height, (7) ear length, (8) ear width, (9) eye separation, (10) ear separation, (11) eye to nose separation, and (12) eye to ear separation. Any nonmeasurable feature is given a zero measurement value. It is important to note that the features chosen are preliminary at this stage. For each Ω F(i) of the training set's feature vectors in Figure 2B, information about the feature measurements reveals those with the largest distance values: the head width and length, ear separation, and eye separation are the features with the largest linear distances. In these training images the ear length and width (features 7 and 8) exhibit the greatest variability of the approximated feature measurements. The average cat face Ω F in Figure 3A illustrates feature locations, feature relationship distances (cm), and geometric approximations of the features: the eyes are elliptical, the head is elliptical, and the nose and ears are triangular. The average cat face appears regular because each object part shape is simplified by a regular geometric shape. The average cat face Ω F and SD σ(Ω F ) in Figure 3B are important because the metric potentially indicates which are the most stable features: those with a low variance and a large mean. Comparing the statistics of Figure 3B to the coefficient of variation cv = σ(Ω F )∕Ω F in Table 1, the results show that the feature with the lowest dispersion is the ear separation distance, and the feature with the highest dispersion is the minor axis of the elliptical eye.
The head measurements can be considered redundant because the boundary of the head, even simplified to a circular shape, is difficult to define, since fur is characteristically a high-frequency component. The high dispersion of the eye closing resembles the variation across the training set in Figure 2A, where some eyes are partially open and some closed. The nose has a cv that is roughly equal to those of the remaining features (21.53% to 29.53%). By visual inspection, the cat's nose is a small, less well-defined object relative to an eye and an ear. Hence, the stated recognition sequences place the least importance on nose recognition in the recognition process.
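Ranking features by their coefficient of variation, as in Table 1, can be sketched as follows. The sample values below are illustrative stand-ins, not the Appendix B data; a well-separated, stable feature (such as ear separation) should rank first and a volatile one (such as the eye minor axis, which varies with eyelid closure) last.

```python
import numpy as np

def rank_features_by_cv(samples, names):
    """Rank features by coefficient of variation cv = sigma / mean (lower = more stable)."""
    samples = np.asarray(samples, dtype=float)
    cv = samples.std(axis=0) / samples.mean(axis=0)
    return sorted(zip(names, cv), key=lambda pair: pair[1])

# Illustrative measurements (cm) across five hypothetical training images.
names = ["ear_separation", "eye_separation", "eye_minor_axis"]
samples = [
    [6.0, 3.1, 0.8],
    [6.1, 3.0, 0.4],
    [5.9, 3.2, 0.9],
    [6.0, 2.9, 0.2],
    [6.0, 3.3, 0.6],
]
ranked = rank_features_by_cv(samples, names)
```

Because cv is dimensionless, this ranking is comparable across features of very different absolute sizes, which is what makes it a useful stability metric here.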
The recognition process is applied to a cat's front profile using two of the sample images in the training set, Ω T . As each feature is detected and identified, its similarity to the average cat face feature range Ω F(i) ± σ(Ω F(i) ) is scored through its relative error:

error(Ω T(i) ) % = |Ω T(i) − Ω F(i) | ∕ Ω F(i) × 100. (1)

Based on the measured feature distances in Table 2, the relative error obtained using Equation (1) demonstrates the recognition success of the cat by comparing the relative error between the average cat feature data and the measured cat feature data. A second point to consider is the consistency of the test data measurements used to obtain the average profile, whereby the same points in each test example are approximated. The measured distances of the location coordinates are in error with respect to the process of recognition; these are limited to pixel-level accuracy. The absolute error between the measured values of Ω T(1) and the observed values used in generating the average profile Ω T(0) quantitatively highlights the degree of systematic error in manually extracting the feature data in the training set for the sample used.
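The per-feature relative-error scoring of a test profile against the average profile can be sketched as below. The feature values are hypothetical, not the Table 2 data.

```python
import numpy as np

def relative_error_pct(measured, average):
    """Per-feature relative error (%) of a test profile against the average profile."""
    measured = np.asarray(measured, dtype=float)
    average = np.asarray(average, dtype=float)
    return 100.0 * np.abs(measured - average) / average

# Hypothetical distances (cm): ear separation, eye separation, ear length.
avg = np.array([6.0, 3.1, 1.5])    # average-profile values
meas = np.array([6.3, 3.1, 1.2])   # measured test-sample values
errs = relative_error_pct(meas, avg)
mean_err = errs.mean()
```

Averaging the per-feature errors gives a single recognition-error figure for the test sample, analogous to the averaged profile errors reported in this section.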
Since the eye's radius is approximated by a circle, the minor axis of the ellipse approximation to the eye in the test data is used to deduce a relative measurement of this feature. The head feature measurements are omitted due to the saliency of the feature. The feature vector of the measured values compared to the average profile is depicted in Figure 4A; the features with the greatest error are the ear, nose, and eye radius. However, the features with the lowest degree of error and the lowest measurement error are the distances between object parts based on approximated start and end point locations. To ascertain confidence in the method of recognition for the front profile, the process is repeated for a second test sample Ω T(2) (Figure 5A), whereby the observed and recorded measurements are documented in Figure 5B,C and Table 3. The average error, error(Ω T(2) ) % , of the measured feature vector values against the average profile is 8.2%. Attempting to measure the length of an edge via a thresholding process in the signal preprocessing stage may not always provide a reliable account of the feature. This is because the distances of the object parts are relatively small, so any fluctuations above the measurement value will distort the value, whereas larger separation distances seem to present more stable features that are less susceptible to the measurement error of an edge. This is investigated in the next section to select accurate profiles against which to compare feature distances. The ratios of such lengths for resolving a cat's pose/view are the primary mechanism to account for nonlinear feature interactions undergoing elevated and out-of-plane rotations.

FIGURE 6 Test rotated cat head profile 41 and identified cat head features used to track rotated profiles Ω F (θ, φ): out-of-plane rotation θ, rotation range −50° to 0°, Δθ = 10°, and elevation angle φ, rotation range 50° to −50°, Δφ = 10°

Rotated profile features
To begin the analysis of a rotated profile, the training object in Figure 6 displays what is considered the average cat head; this is a clear assumption on which these results and analysis are based. The cat head presents the appropriate features to track through a 2D rotation map Ω F(i) (θ, φ) of out-of-plane rotation θ and elevation angle φ; these are: ear separation, eye separation, nose to ear separation, nose to ear separation half distance, and eye to ear cross separation. Of these features, there are (more than) two length-trackable features through the θ and φ rotation planes. This choice is a factor of the feature distortion identified in the lesser measurable distances in the analysis of the front profile feature space. The cat head undergoing rotations in θ will always be symmetrical because the change in view is purely horizontal, except under occluded views. Under elevation angles φ this is not the case, as the change in view occurs vertically, whereby the top of the cat head is not symmetric to the bottom view.
For Ω F (θ = −50°:0°, φ = 0°), the features used to track rotation displacement are the ear separation, eye separation, and cross eye separation (left eye to right ear separation, or vice versa). These are chosen because of their length and their availability to measure up to ±50°. Whereas for Ω F (θ = 0°, φ = −50°:50°), the suitable features to track are the nose to ear separation and the nose to center ear separation distance. The features that are used for tracking combinations of 3D rotational displacement in 2D images are highlighted in Figure 6: ear separation, eye separation, nose ear center separation, nose ear separation, and eye ear cross separation (from left to right: denoted as L2R in Figure 6). The primary purpose of this step is to identify the most appropriate profile to use to identify the class object cat. This is seen as an iterative process that continues until variability in the profile selection process reduces to a stable and ideally singular response.
Consistent and reliable reference points for measuring distances are an issue identified in the cat's front profile. The propagation of such variability into the rotated profiles must be eliminated to enable isolation of the correct template profiles in the recognition process. To remedy the systematic error in the location of the measurement points in the rotation profiles, the 3D cat head model is processed through AutoCAD to fix the measurement point locations onto the object's triangular mesh.

Spatial relationship analysis of features
In this section, the features illustrated in Figure 6B (feature 1: ear separation, feature 2: eye separation, feature 3: nose ear center separation, feature 4: nose ear separation, feature 5: eye ear cross separation) are tracked through combinations of Ω F (θ, φ). For each of the feature ratio data plots in Figures 7-11, the case where the quotient's denominator equals the numerator is omitted because the quotient value is 1 for all rotations. Where feature 2 becomes unmeasurable, its value is fixed to zero. The obtained measured values are explicitly presented in Appendix B. For features that are perpendicular to each other (ear separation/nose ear center separation, and eye separation/nose ear center separation) there are no crossover points for the inversion of the ratio dependent on Ω F (θ, φ). All ratios involving features 3, 4, and 5 have a crossover inversion point; the ratio at this point is not symmetric. For features 4 and 5 there are two crossover inversion points; furthermore, these points are not symmetric and change as the elevation angle increases. The disproportionality of the ratio is accounted for by the vertical transitions of the features with the elevation angle of the cat's face. Features 1 and 2 do not change by large amounts vertically; therefore these ratios can be considered uniform across horizontal transitions. For the purpose of identifying the view of an unknown feature set, to select a recognition profile, the similarity error against the training dataset must be minimized. In using these features, an ambiguity can exist between two identified points along one trajectory of θ; therefore cross-checking between the inversion points is necessary. In the next section, the tracking of a cat head is demonstrated using the observations in the training set.

Object pose tracking and profile identification
The profile Ω_F(·, ·) is a 2D rotational map for each feature separation distance ratio. Since each feature is normalized by every feature, there are 25 feature ratio values, arranged as a 5 × 5 matrix, for each rotational combination, to compare with an input feature ratio vector of the same size. The comparison is calculated using the Euclidean distance between the feature ratio values. To demonstrate this tracking method, the cat head is rotated by, first, a horizontal transition and, second, a combination of Ω_F(·, ·) in between the discretized training set.
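The construction of the 5 × 5 ratio matrix can be sketched as follows. This is a minimal illustration, not the article's implementation; the five distance values are invented stand-ins for the measured feature separations.

```python
import numpy as np

# Hypothetical feature separation distances (arbitrary units) for the five
# facial features named in the text: 1 ear separation, 2 eye separation,
# 3 nose to ear-center separation, 4 nose to ear separation,
# 5 eye to ear cross separation. Values are illustrative only.
features = np.array([4.0, 2.5, 3.2, 3.8, 4.4])

# 5 x 5 matrix of feature ratios: entry (i, j) = feature_i / feature_j.
# The diagonal is identically 1 (numerator equals denominator), which is
# why those quotients are omitted from the plots in the article.
ratio_matrix = features[:, None] / features[None, :]

assert np.allclose(np.diag(ratio_matrix), 1.0)
```

Because entry (j, i) is the reciprocal of entry (i, j), only the off-diagonal entries carry pose information, consistent with the crossover inversion points discussed above.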

Error calculation method
To determine the location of the cat's head, the Euclidean distances are calculated between the input feature ratios obtained during the feature measurement process and the known values obtained through the training example (see Figure 6A). The process begins by determining the ratios between the feature measurements. As illustrated in Figure 12, each ratio is an entry into the feature space Ω_F(·, ·, R_n), where R_n is the set of feature ratios normalized by feature n. For the purpose of the following examples it is assumed that all features have been obtained. Each entry of the feature space Ω_F(·, ·) along R_n is compared with the measured values in the recognition process. The measured value parameters in this demonstration are denoted by Ω_T(·, ·, R_n). The equality of ratio proportionality, known as the ratio of ratios, provides the discriminant affine features for tracking the similarity of object pose.

Object pose tracking tests
There are two input signals in the tracking test (Tables 4 and 5): one for a transition taken from the training set Ω_F(·, ·), namely Ω_F(10°, 0°), and one for a transition in between the discretized training samples, Ω_F(35°, −25°). In the first test, for Ω_F(10°, 0°), the input feature ratio values Ω_T(10°, 0°, R_1:5) are exactly equal to the values in the training dataset. Hence the red square in Table 6 indicates the location of rotation of the cat's head that has the least error (exactly zero) compared with the input signal feature ratios. This demonstrates, for the ideal scenario, the computational method to track the cat head for known rotational combinations in the training set.

FIGURE 12
In the next test, the cat's head view is in between the discretized training samples, at the rotational combination Ω_F(35°, −25°). In this example the location of minimum error will not be equal to zero; this part of the investigation provides an account of the error contribution of each feature ratio combination of Ω_T(35°, −25°, R_1:5). The ratio errors for Ω_T(35°, −25°, R_1:5) are presented in Appendix C and summarized in Table 7, whereby the location of Ω_F(·, ·) for each feature ratio combination R_n is rounded to the nearest interval, Δ, in the training set.
The average ratio error for Ω_T(35°, −25°, R_1:5) is 0.0606. This figure of merit, using all features, quantifies the systematic error due to approximating a profile at exactly half the rotational interval of Ω_F(·, ·). For rotational angles in between, for example at a quarter or three-quarter distance from the training sample, the errors approach zero. At the position of maximum error, each R_n is decomposed into the individual feature ratios that contribute to the total error per R_n. First, the feature labels are restated: feature 1: ear separation; feature 2: eye separation; feature 3: nose to ear-center separation; feature 4: nose to ear separation; feature 5: eye to ear cross separation. In Table 7 the error contribution of each feature and its percentage of the ratio error are presented for an elevation of −25°. Observing the cat's head at Ω_F(35°, −25°), the feature combinations contributing least error to each ratio error of R_n (ignoring feature parallelism), in ascending order, are: feature 4/feature 1, feature 5/feature 3, feature 4/feature 5, feature 4/feature 2, and feature 3/feature 4. To compare the percentage ratio errors, Table 8 presents the same data but for an elevation of +25°. The difference between the two elevation angles is that the observation of the cat's head is from the bottom, whereby the features present themselves differently in the view. This is exhibited in the disproportionate symmetry in Figures 7-11.
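The per-feature decomposition of a ratio error, as tabulated in Tables 7 and 8, amounts to attributing each component's deviation to its share of the total. A minimal sketch follows; the two vectors are invented stand-ins for the measured and training ratios of the five features under one normalizing feature R_n, not the article's measurements.

```python
import numpy as np

# Hypothetical measured and training feature ratio vectors for one R_n.
measured = np.array([1.00, 0.62, 0.81, 0.95, 1.10])
training = np.array([1.00, 0.65, 0.78, 0.99, 1.05])

abs_err = np.abs(measured - training)        # error per feature ratio
percent = 100.0 * abs_err / abs_err.sum()    # share of the total ratio error

for i, (e, p) in enumerate(zip(abs_err, percent), start=1):
    print(f"feature {i}: error {e:.3f} ({p:.1f}% of ratio error)")
```

Ranking the features by their percentage contribution reproduces the kind of ascending-order listing given above for Ω_F(35°, −25°).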
Comparing the two elevation angles, the results of the analysis show that the average ratio error is increased for a positive elevation, and that the precedence of the error contribution from each feature ratio depends on the observation of the cat's head. One source of the increased average ratio error is the reduced availability of the cat's facial features compared with looking toward the top of the cat's head. A second source is that the feature ratio symmetry is disproportionate at a zero elevation angle. This symmetry also carries a small dependency on the out-of-plane head rotation (see Figures 7-11). Hence, the average ratio errors for in-between angles may follow the disproportionate symmetry. Observing the cat's head from underneath, at an elevation of +25°, the results demonstrate a greater reliance on features perpendicular to each other, as opposed to diagonal distances carrying both a vertical and a horizontal component of transition, such as ear and eye separation and nose to ear half-separation distance. In comparison, observing the cat's head from above, at an elevation of −25°, the most reliable feature ratios are those pairing a feature with either a horizontal or a vertical component and a feature carrying both transitional components, such as nose to ear separation and ear separation distance, or two angled features containing both transitional components.

DISCUSSION
The aim of this report has been to present a method of object recognition that acquires prior knowledge suitable for identifying nonrigid objects. The proposed method is aimed at systems with limited hardware capability, and therefore many of the more sophisticated algorithms are not applicable. We compare the findings of the proposed method to the complexities of deformation analysis of nonrigid objects, and to the general drawbacks and challenges of computer vision and deep learning.
Two key processes have been demonstrated: data-driven recognition and object view tracking. This is achieved through a novel algorithm using the well-established Hough transform across a line and a plane. For the object class cat, its salient features are analyzed to isolate those features appropriate for the recognition method. The linear analysis is based on the measurement and tracking of separation distances between features and of their ratios. The most discriminative and stable features are those measured over longer distances, such as ear separation and nose to ear separation. A second type of feature that can be relied upon are those oriented at an angle relative to the elevation or rotation of the head, so that a change in head position alters the length of the orthogonal component of the feature. Which feature is suitable depends on the actual movement of the head; it is therefore essential that a minimum of two features, with their directions at approximately 90° to each other, be used in the identification process. In tracking a 3D object through a 2D basis, the displacements between features occur both horizontally and vertically, and unequally. The complexity of feature tracking is exemplified by interactions between the feature ratio combinations of rotational displacements exhibiting nonlinear object movements. For recognition of a cat in a frontal view, the recognition error obtained using a training sample is as small as 8.2%. The consistency of the measurement location contributed to this error and, as such, introduces a bias into the error measurement. The second training example error is 17.8%. The contributions to these errors are measured using the difference between the recorded and recovered feature distances. This revealed the susceptibility of shorter feature distances to measurement point location distortion. An additional influence on this error is the small variations from the front profile approximation in the training images.
For a rotation of the cat's head Ω_F(35°, ±25°), the average error peaks at the half distance of the rotational intervals, at 0.104 and 0.0606, respectively. Considering the current capabilities of deep learning models in 3D object recognition and moving object recognition, and their additional requirements on training inputs, the proposed model utilizes inference and reasoning over unknown object parts to establish localization, detection, recognition, and a recognition confidence. The model is operational across viewpoints (two axes of rotation) to track deformable and nonrigid features, and the model has been analyzed in terms of the input object and feature importance. In Reference 42, the authors investigate active skeletons as a binary shape descriptor for nonrigid objects and reduced training sets. They base their study on a shape-based representation of animals. For a training set size of 15, they tested 313 binary representations with approximately 80% recognition success for changes in nonrigid object pose. The key object features were not analyzed, and neither is a recognition confidence level discussed. The authors in References 25 and 42 also present a model to encompass the viewpoint of 3D model data, but it does not attempt to address how shape-driven recognition of viewpoints has the potential to complement machine learning with machine reasoning. We think that the integration of optimized hand-crafted feature detectors to isolate (object-dependent) critical features is an area where hybrid learning and traditional computer vision models may offer the most appropriate solutions.

SUMMARY
The data-driven recognition procedure reports on a basis for decisions about the current feature (position, length, and orientation), obtained from the Hough transform parameters, to identify what to detect and measure next. The starting point looks for a stable feature that does not vary nonrigidly relative to other features. The analysis of the cat's facial features identifies the ear as the most stable object part, together with features with longer measurable distances belonging to object parts. By acquiring evidence along the processing path, the recognition method is nonparametric. The approach is data-driven and acknowledges that confidence in the probability of recognition is based on the accumulated evidence. Two of the key elements in the recognition procedure have been demonstrated: first, recognition of the cat's head from a front observation and, second, tracking of the cat's head through two axes of rotation: out-of-plane rotations and elevated views. The purpose of these investigations is to demonstrate and employ a method to select appropriate profiles and compare acquired feature distance measurements. Limitations of the investigation are as follows: the 25 examples of a cat's head in the front observation training set is not a definitive size; the method is tested using two examples from the training set; acquisition of a real cat's head at defined rotational intervals is a complicated task, hence the rotated profiles are based on computer-generated 3D models of a cat's head; and variance is required to be built into the rotated training profiles, one solution to which, other than acquiring many more 3D models, is to jitter the template feature geometries. Even so, the analysis of the recognition method revealed 8.2% and 17.8% recognition error rates for the cat from a front observation.
An estimate of the error impact at half-angle intervals of the selected training profile is given and is shown to vary for positive and negative elevation angles, at 0.104 and 0.0606, respectively. The source of this variance is seen to be the asymmetry of the cat's head across elevated views. These patterns, depicted in Section 3.2.1, synthesize a process of recognition for a (cat) head under motion. The recognition model determines nonrigid deformable objects independently of viewpoint through two axes of rotation. Based on substantiated evidence to recognize and track key nonrigid object features, initial reports for the proposed method are promising. The significance of supervised learning in deep learning is evident, as is the use of unsupervised algorithms to alleviate the amount of labelled training data required. 6,7 However, how to automatically learn good features in a fashion integrated into deep network layers is a matter of current debate and research. Furthermore, there presently remain applications where traditional computer vision models are still necessary 43 to build upon recent progress in deep learning object detection, such as 3D object recognition, moving object detection, and scene understanding. Based on the current capabilities of deep learning object recognition, we suggest, subject to further experimentation, that the data-driven inference routine can be represented as a series of processing layers within a standard deep learning model. This could reduce the search space of the input image, remove nonadditive noise, and concentrate the higher levels of learned abstraction on new or known object classes, recognizing complex object viewpoints through fewer training examples than would typically be encountered in a purely supervised model. Further research is needed to validate the training sets, particularly the rotated profiles.
Future research could explore the identification and tracking of these feature patterns through subspace decompositions.

APPENDIX A - HOUGH TRANSFORM
In this report the Hough transform has been applied to extract geometric features and quantify the structural shape of object parts. A line-to-plane parameterization is as follows. Consider two collinear points with coordinates p(x, y).
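The standard line Hough transform underlying this parameterization can be sketched as follows. This is a generic illustration of the technique, not the article's implementation; the sample points, accumulator resolution, and bounds are assumptions.

```python
import numpy as np

# Each point (x, y) votes for all (theta, rho) pairs satisfying
# rho = x*cos(theta) + y*sin(theta). Collinear points vote into the
# same (theta, rho) accumulator cell, producing a peak there.
points = [(x, 2 * x + 1) for x in range(5)]   # points on the line y = 2x + 1

thetas = np.deg2rad(np.arange(0, 180))        # 1-degree angular resolution
rho_max = 20                                  # assumed bound on |rho|
acc = np.zeros((len(thetas), 2 * rho_max + 1), dtype=int)

for x, y in points:
    # rho for every theta, rounded to the accumulator's unit resolution
    rhos = np.rint(x * np.cos(thetas) + y * np.sin(thetas)).astype(int)
    acc[np.arange(len(thetas)), rhos + rho_max] += 1

# The accumulator maximum recovers the line's (theta, rho) parameters.
t_idx, r_idx = np.unravel_index(acc.argmax(), acc.shape)
print(f"peak: theta = {np.degrees(thetas[t_idx]):.0f} deg, "
      f"rho = {r_idx - rho_max}, votes = {acc[t_idx, r_idx]}")
```

With all five sample points collinear, the peak cell collects one vote per point, which is the accumulation-of-evidence property the recognition routine relies on.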