3D surface texture analysis of high‐resolution normal fields for facial skin condition assessment

Abstract

Background: This paper investigates the use of a light stage to capture high‐resolution, 3D facial surface textures and proposes novel methods to use the data for skin condition assessment.

Materials and Methods: We introduce new methods for analysing 3D surface texture using high‐resolution normal fields and apply these to the detection and assessment of skin conditions in human faces, specifically wrinkles, pores and acne. The use of high‐resolution normal maps as input to our texture measures enables us to investigate the 3D nature of texture, while retaining aspects of some well‐known 2D texture measures. The main contributions are as follows: the introduction of three novel methods for extracting texture descriptors from high‐resolution surface orientation fields; a comparative study of 2D and 3D skin texture analysis techniques; and an extensive data set of high‐resolution 3D facial scans presenting various skin conditions, with human ratings as "ground truth."

Results: Our results demonstrate an improvement on state‐of‐the‐art methods for the analysis of pores and comparable results to the state of the art for wrinkles and acne using a considerably more compact model.

Conclusions: The use of high‐resolution normal maps, captured by a light stage, and the methods described, represent an important new set of tools in the analysis of skin texture.

reflectance properties. The collected data are photo-realistically rendered and presented to the general public for annotations indicating the presence of the studied skin conditions. These constitute the ground truth upon which the proposed methods are applied in order to learn models for detecting and assessing facial skin conditions. We compare our three methods on this new data set, including BTF Texton results as a gold-standard method, and classical 2D-texture measures (with 3D enhancements) as a baseline method.

| 2D texture analysis
Texture characterisation is key to a number of visual computing-related applications such as object recognition, content-based image retrieval and computer graphics. A number of efficient and powerful 2D texture analysis methods have been proposed in the literature.
These methods can be divided into three categories:

• Statistical methods, which assume that the texture is fully determined by the spatial distribution of pixel values in the image. Examples of statistical methods include the use of the Grey Level Co-occurrence Matrix, 1 the Autocorrelation function, the Symmetric Auto-correlation function (SAC) and its extensions (SRAC and SCOV) 2 and the well-known Local Binary Patterns (LBPs). [3][4][5]
• Structural methods, which consider texture as a structured layout of texture primitives, also called texture elements. Such methods divide into geometrical and topological approaches. In geometrical approaches, coarse geometrical properties such as perimeter and compactness are used to characterise texture primitives. 6 Topological approaches use various filtering methods to extract primitives such as lines, edges and blobs. The texture descriptor is then made of different properties of these extracted primitives, namely number, orientation and density. 7,8
• Model-based methods, in which the texture is represented with either a probabilistic model or a projective decomposition along a set of basis functions. These representations require the determination of a certain number of parameters or coefficients to characterise the texture. The Markov model-based methods constitute an important subset of these methods. Hidden Markov Models (HMMs) have been extensively used to characterise texture. 9,10 Cohen et al used a Gaussian Markov Random Field (GMRF) to model rotated and scaled texture. 11 Methods using sub-band decomposition techniques include the wavelet transform, 12,13 the steerable pyramid 14 and the Gabor filter bank. 13,15,16

The approach chosen generally depends upon the aspect of texture one wishes to capture. All 2D methods make the implicit assumption that apparent texture is independent of illumination and viewpoint.
While this assumption is approximately valid for smooth surfaces, the apparent texture of surfaces with rough relief is more obviously illumination- and viewpoint-dependent.

| 3D surface texture analysis
The appearance of a natural surface is not only determined by intrinsic reflectance properties (colour or albedo), but is also considerably affected by the interaction between geometrical structure, light and viewpoint. Various methods have been proposed to capture aspects of this variability. In the rest of this paper, we will refer to these types of texture methods, responsive to illumination/view changes, as 3D Surface Texture. These can be categorised into three families: 3D Texton-based methods, Bidirectional Texture-based methods and Geometrical methods.
• 3D Texton-based methods: The notion of a 3D Texton was introduced by Leung and Malik 17 and has been widely used and extended to represent natural surfaces' visual appearance. The main idea is to simultaneously encode the two attributes that most affect how a surface is visually perceived; these are the surface normals and reflectance properties. To characterise a given surface's texture, the approach exploits filter responses on several images of the same surface taken in different imaging conditions (illumination and viewpoint). In addition, these filter responses are quantised into a reduced set of texture prototypes. This results in a dictionary of tiny texture patch representations called 3D textons that cover all possible local surface configurations.
• Bidirectional Texture-based methods: In contrast to the 3D texton-based methods, the Bidirectional Texture Function (BTF) operates at a higher level of abstraction, representing surface properties that affect the apparent texture. This makes them useful for analysing as well as for synthesising natural texture (when used for analysis, they are generally combined with a texton-based quantisation layer). The notion of a BTF was first introduced by Dana et al 18 and has been called the most advanced and accurate representation of natural surfaces' visual properties to date. 19 The BTF models a surface's texture as a function of illumination and viewpoint. It is a seven-dimensional function, representing texture as a function of the spectral band, the planar position, and the view and light directions:

$$BTF\left(\lambda, r_x, r_y, \theta_i, \varphi_i, \theta_v, \varphi_v\right)$$

where $r_x$ and $r_y$ are the horizontal and vertical positions, respectively, $\lambda$ is the spectral band, $\theta_i$ and $\varphi_i$ are the elevation and azimuthal angles of the light direction, respectively, and $\theta_v$ and $\varphi_v$ the elevation and azimuthal angles of the viewing direction. BTF measurement generally involves a complex capture set-up in which automated devices coordinate changes in either the lighting conditions or the camera viewpoint or, in some systems, both. 18,[20][21][22] Although the BTF is extensively used in Computer Graphics, generally for photo-realistic texture synthesis and rendering purposes, it is also used to create and evaluate texture features that are robust to imaging conditions. Dana et al analysed skin texture using a BTF made of more than 3500 images to discriminate between skin disorders such as acne and psoriasis. 23 Suen and Healey introduced the notion of a dimensionality surface as a measure of appearance variability due to the effects of viewpoint and illumination changes on fine surface geometry. 24 From the CUReT Bidirectional Texture database, 18 they applied a set of multi-band correlation functions $R_{ij}(m, n)$ to each image of each material sample ($i$ and $j$ being spectral bands and $(m, n)$ an image region).
Caputo et al introduced the KTH-TIPS2 material database (11 materials each with four different imaging conditions) and used it to test the robustness of various state-of-the-art texture descriptors to pose and illumination change. 25 They experimented with including various numbers of pose and illumination conditions in their training set, and testing with samples from unseen pose/illumination conditions. One of their findings was that the more sample groups they add to the training set the better the classification method performs. More recent studies include the work of Liu et al in which they propose learning discriminative models for determining optimal texture filters for given illumination conditions. 26 The authors collected a BTF database using a dome of controllable LEDs and a fixed camera. The acquired database consists of 90 material samples captured under 6 spectral bands and 25 lighting directions.
• Geometric methods: The methods presented in the two preceding sections are image-based, as the intrinsic geometry of the material's surface is not known. The considerable number of image samples needed by these methods in order to capture the three-dimensional properties of the studied surfaces makes their use demanding in storage capacity. Some recent works have looked at characterising 3D texture directly from measured fine geometry, providing a more compact representation of the intrinsic three-dimensional properties. Smith et al propose computing a co-occurrence matrix from the orientation of measured surface normals. 27 Their method involves quantising the normals' orientation into a discrete space. For each normal, the slant and tilt angles are discretised into three equal intervals. This results in 9 levels upon which the co-occurrence matrix is constructed. Sandbach et al extracted Local Binary Pattern features from two different 2D representations of 3D geometrical data to classify 3D facial action units. 28 The two representations are a simple depth map and the Azimuthal Projection Distance Image. This latter representation encodes the 3D surface orientation in a 2D greyscale image, by projecting each surface normal onto the tangent plane and taking the $L_2$ norm of the projected point as a grey level.
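The normal co-occurrence idea of Smith et al can be sketched as below. This is a minimal NumPy illustration, not the authors' implementation: the angle-binning convention and the single horizontal-neighbour offset are assumptions.

```python
import numpy as np

def normal_cooccurrence(normals, offset=(0, 1)):
    """Co-occurrence matrix over quantised surface normal orientations.

    normals: (H, W, 3) array of unit surface normals.
    Slant and tilt are each quantised into 3 equal intervals, giving
    9 discrete orientation labels; a 9x9 co-occurrence matrix is then
    accumulated over pixel pairs at the given (dy, dx) offset.
    """
    nx, ny, nz = normals[..., 0], normals[..., 1], normals[..., 2]
    slant = np.arccos(np.clip(nz, -1.0, 1.0))      # angle from z, [0, pi/2] for outward normals
    tilt = np.mod(np.arctan2(ny, nx), 2 * np.pi)   # azimuth, [0, 2*pi)
    s_bin = np.minimum((slant / (np.pi / 2) * 3).astype(int), 2)
    t_bin = np.minimum((tilt / (2 * np.pi) * 3).astype(int), 2)
    labels = s_bin * 3 + t_bin                     # 9 orientation levels
    dy, dx = offset
    a = labels[: labels.shape[0] - dy, : labels.shape[1] - dx]
    b = labels[dy:, dx:]
    cooc = np.zeros((9, 9), dtype=int)
    np.add.at(cooc, (a.ravel(), b.ravel()), 1)     # count each label pair
    return cooc
```

Standard co-occurrence statistics (contrast, homogeneity, entropy) can then be computed from the matrix exactly as in the 2D grey-level case.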

| 3D skin micro-structure imaging
There is a family of techniques which concentrate not on general 3D surface texture, but on the specific problem of human skin micro-structure, motivated by medical (dermatological) applications and the increasing demand for photo-realistic solutions from the game and film industries. Cula et al used a bidirectional imaging system to capture the micro-structure of skin regions affected by diverse dermatological disorders (psoriasis, acne, contact dermatitis, etc) 23 and released these 3500 images as the Rutgers Skin Texture Database. They used two different mechanical set-ups that allowed them to capture skin regions under various viewpoints and light directions. Hong and Lee 29 used a mobile phone and a mirror system to capture and analyse acne in 3D. Zhou et al 30 captured 3D data of skin surfaces using a photometric stereo device and analysed them using differential geometry features and a linear classifier to classify malignant melanomas and benign lesions.
Ma et al used a light stage to capture three-dimensional facial skin structure down to the level of the pores. 31 They combined this with a polarised light technique to separate the diffuse and specular surface properties. The resulting data are in the form of normal maps.
They showed that specular normal maps capture most of the surface detail, while the diffuse maps are more subject to subsurface scattering. These polarisation- and wavelength-dependent measurements constitute very useful data for understanding how human skin interacts with light, as well as for modelling its micro-structure.
Many improvements and applications have been added to the capture system since. Graham et al proposed a measurement-based synthesis of facial microgeometry. 32 The authors measure the micro-structure of skin patches using a twelve-light hemisphere able to emit cross-polarised light. The acquired skin micro-structure images are processed to extract displacement maps. Another skin reflectance measurement using a light stage is conducted by Weyrich et al. 33 They augment their data with an extra skin subsurface scan using a fibre optic spectrometer, a device allowing measurements of subsurface properties such as haemoglobin or glucose concentration. PRIMO (http://www.gfm3d.com/) is a commercial solution for 3D skin measurements used in some automated skin disruption detection studies such as Choi et al. 35 It is a hand-held optical system using structured light and a high-resolution sensor, allowing measurements of skin micro-topography and roughness with a field of view of 45 × 30 × 30 mm. The Antera 3D (http://miravex.com/antera-3d/) is another hand-held commercial system for 3D skin imaging and measurement. Messaraa et al 36 compared skin health measurements such as roughness and wrinkle length/depth from the Antera 3D with 2D imaging and image analysis (using DermaTOP and image analysis on parallel-polarised images). The results showed good correlation between the 3D and 2D measurements, and the ability to detect changes due to the application of a cosmetic product.

| Literature review summary
In the previous sections, the state of the art in 2D/3D texture analysis and human skin micro-structure imaging techniques was introduced. It is clear that advances have been made in face imaging technology, as it is now possible to capture the skin's three-dimensional micro-structure down to the level of pores. However, these newly available data capture possibilities are not fully exploited on the analysis side, as most of the studies presented above use either two-dimensional image-based texture features or rather coarse three-dimensional surface properties. One of the few studies that exploited the skin's three-dimensional micro-structure used a BTF representation, 18 which takes into account changes in illumination and viewpoint, but is still an image-based representation, as the underlying surface geometry is not known.

| 3D MEASURES FOR SKIN TEXTURE CHARACTERISATION
In this paper, we introduce three novel 3D surface texture analysis methods: the Rotation Fields Pyramid; Local Orientation Patterns; and the Multi-scale Azimuthal Projection Distance. These take full advantage of the recent advances made in photometric stereo imaging techniques. In contrast to image-based methods, they operate directly on the skin's geometrical fine structure, captured using a light stage in the form of surface normal fields. We compare our novel 3D methods with both classic 2D texture descriptors and simplistic 3D extensions of these.

| Extensions of existing 2D descriptors to 3D
Before introducing our three proposed 3D texture descriptors, we describe here how a number of standard 2D feature extraction methods can be extended to 3D analysis, in order to provide a set of comparable baseline methods. We experiment with two widely used 2D texture descriptors, namely the Gabor filter bank 16,37 and rotation invariant LBPs. 2 Although the normal map estimated by the light stage can be represented in a 3-channel image, with the RGB channels being used to store the normal's x, y and z components, operating on them with filters etc does not correctly account for the non-linear manifold on which the normals lie. Instead of calculating the texture measures introduced above directly on the normal maps, we propose deriving these from either the slant-tilt space or the tangent space.

| Slant-tilt space
The normal's slant and tilt are extracted at each position (Figure 1). This results in a map which contains two values at each position, corresponding to the normal's elevation and azimuth. We keep the tangent values, so the slant-tilt map is normalised in [−1, 1]. With $n = (n_x, n_y, n_z)$ denoting a normal, the slant and tilt tangent values are obtained with:

$$\tan(\sigma) = \frac{\sqrt{n_x^2 + n_y^2}}{n_z}, \qquad \tan(\tau) = \frac{n_y}{n_x}$$
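The slant-tilt extraction can be sketched as below. This is a minimal NumPy illustration: the exact rescaling of the two channels to [−1, 1] is an assumption, and the normals are assumed to be unit vectors with a positive z component.

```python
import numpy as np

def slant_tilt_map(normals):
    """Two-channel slant-tilt map from a unit normal field.

    normals: (H, W, 3) array of unit normals with nz > 0.
    Returns an (H, W, 2) map: channel 0 = slant, channel 1 = tilt,
    each rescaled into [-1, 1] (an assumed normalisation).
    """
    nx, ny, nz = normals[..., 0], normals[..., 1], normals[..., 2]
    slant = np.arccos(np.clip(nz, -1.0, 1.0))   # elevation from the z-axis, [0, pi/2]
    tilt = np.arctan2(ny, nx)                   # azimuth, (-pi, pi]
    out = np.empty(normals.shape[:2] + (2,))
    out[..., 0] = slant / (np.pi / 2) * 2 - 1   # [0, pi/2]  -> [-1, 1]
    out[..., 1] = tilt / np.pi                  # (-pi, pi]  -> (-1, 1]
    return out
```

The resulting two-channel map can then be fed to any 2D filter bank in place of a grey-level image.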

| Tangent space
In this approach, the normals are considered as elements of a Riemannian manifold and are unfolded about the local means using a logarithmic mapping (Figure 1). This results in a tangent map whose elements are 2-dimensional coordinates, obtained with:

$$\mathrm{Log}_{\mu}(n) = \theta \, \frac{n - (n \cdot \mu)\,\mu}{\left\lVert n - (n \cdot \mu)\,\mu \right\rVert}, \qquad \theta = \arccos(n \cdot \mu)$$

where $\mu$ is the local normal mean, with spherical coordinates $(\theta_0, \varphi_0)$, and the 2-dimensional coordinates are read off in an orthonormal basis of the tangent plane at $\mu$. At each neighbourhood, the local normal mean is the one that minimises the mean of the geodesic distances to all the other normals in the same neighbourhood.
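The logarithmic mapping and its inverse on the unit sphere can be sketched as below; this is a minimal illustration of the unfolding, returning the tangent vector in ambient 3D coordinates (projecting it onto a basis of the tangent plane would give the 2D coordinates).

```python
import numpy as np

def sphere_log(mu, n):
    """Logarithmic map on the unit sphere: tangent vector at mu pointing
    towards n, with length equal to the geodesic distance arccos(mu.n)."""
    mu, n = np.asarray(mu, float), np.asarray(n, float)
    cos_t = np.clip(np.dot(mu, n), -1.0, 1.0)
    theta = np.arccos(cos_t)                  # geodesic distance on the sphere
    if theta < 1e-12:
        return np.zeros(3)
    perp = n - cos_t * mu                     # component of n orthogonal to mu
    return theta * perp / np.linalg.norm(perp)

def sphere_exp(mu, v):
    """Exponential map: fold a tangent vector v at mu back onto the sphere."""
    theta = np.linalg.norm(v)
    if theta < 1e-12:
        return np.asarray(mu, float)
    return np.cos(theta) * mu + np.sin(theta) * v / theta
```

The two maps are mutual inverses, which is what allows filtering in the tangent plane and projecting the result back onto the sphere.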

| 3D surface texture characterisation
We adopt a multi-scale scheme where, at each level, the texture filter (either Gabor filters 16,37 or rotation invariant LBPs 2 ) is applied on either the slant-tilt map or the tangent map. This results in two responses, one for each channel. The responses are normalised to the interval [0, 1]. Assuming $R_{c,l}$ denotes the response on channel $c$ at level $l$, the normalisation is performed with:

$$\hat{R}_{c,l} = \frac{R_{c,l} - \min\left(R_{c,l}\right)}{\max\left(R_{c,l}\right) - \min\left(R_{c,l}\right)}$$

The histograms of the two normalised responses are computed and concatenated to form the texture descriptor at level $l$. The same process is repeated at the subsequent level with a down-sampled version of the current normal map. As previously mentioned, a convolution should not be done directly on the normals (because they do not occupy a linear space), so the down-sampling is done in the tangent plane with a Gaussian low pass, followed by projecting the result back into the original 3-dimensional space using the manifold exponential chart.
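One pyramid level of this descriptor can be sketched as below: min-max normalisation of each channel's filter response, followed by histogramming and concatenation. The 32-bin histogram resolution is an assumed value, not taken from the paper.

```python
import numpy as np

def level_descriptor(resp_slant, resp_tilt, bins=32):
    """Texture descriptor for one pyramid level.

    Each channel's filter response is min-max normalised to [0, 1],
    histogrammed, and the two frequency-normalised histograms are
    concatenated into a single feature vector.
    """
    feats = []
    for r in (resp_slant, resp_tilt):
        r = np.asarray(r, float)
        rng = r.max() - r.min()
        r_hat = (r - r.min()) / rng if rng > 0 else np.zeros_like(r)
        hist, _ = np.histogram(r_hat, bins=bins, range=(0.0, 1.0))
        feats.append(hist / hist.sum())   # frequency-normalised histogram
    return np.concatenate(feats)
```

Concatenating the per-level vectors over all pyramid levels gives the full multi-scale descriptor.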

| Feature extraction and classification
For each sample, we build a 3-level multi-scale feature pyramid.

| Proposed Method I: Rotation fields pyramid
The first proposed new approach is based on multi-resolution rotation fields. Rotation Fields are a very good means of capturing high frequency information from surface orientation. Nehab et al employed these to correct the three-dimensional position of 3D mesh vertices with accurate high frequency data from normal maps captured with photometric stereo. 38 Frequency separation has been extensively used in the literature to represent two-dimensional texture. 39,40 This generally involves a pyramidal multi-resolution representation, which allows the capture of texture information at different scales. At each level of the pyramid, the low frequency information is separated from the high frequency; the former is related to global shape, and the latter can be a good representation of local texture. We propose a multi-resolution analysis scheme, where at each level of the pyramid the low frequency information in the normal map is separated from the high frequency in the form of rotation fields.

| Rotation fields
Let $N$ denote a normal map and $N_{i,j}$ the normal vector at pixel $p_{i,j}$. A smoothed version $N^s$ of $N$ is found by computing, at each pixel, either a weighted geodesic or Euclidean mean over a neighbourhood of radius $r$. A post-normalisation of the resulting normal is required in the case of the Euclidean mean. The weights $w_{i,j}$ are determined by a Gaussian with the same radius $r$ as the neighbourhood. The geodesic mean is defined as:

$$\bar{N} = \underset{N'}{\arg\min} \sum_{(i,j) \in \mathcal{N}} d\left(N_{i,j}, N'\right)^2$$

with $d\left(N_{i,j}, N'\right)$ the geodesic distance between $N_{i,j}$ and $N'$.
Pennec 41 shows that this can be recursively approximated by:

$$\mu_{t+1} = \mathrm{Exp}_{\mu_t}\left(\frac{1}{n}\sum_{i,j} \mathrm{Log}_{\mu_t}\left(N_{i,j}\right)\right)$$

Introducing the Gaussian weights $w_{i,j}$ gives:

$$\mu_{t+1} = \mathrm{Exp}_{\mu_t}\left(\frac{\sum_{i,j} w_{i,j}\, \mathrm{Log}_{\mu_t}\left(N_{i,j}\right)}{\sum_{i,j} w_{i,j}}\right)$$

where $\mathrm{Exp}_{\mu_t}$ and $\mathrm{Log}_{\mu_t}$ are the exponential and logarithm maps about the geodesic mean $\mu_t$. The rotation field $R$ is obtained by computing, at each pixel, the rotation to apply to the original normal to match the smoothed one. An axis-angle representation $(\vec{e}, \theta)$ can be adopted to characterise each rotation with four parameters.
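The weighted geodesic mean recursion and the extraction of one rotation-field entry can be sketched as below. This is a minimal NumPy illustration under stated assumptions: the initial estimate and the convergence tolerance are choices of this sketch, not of the paper.

```python
import numpy as np

def weighted_geodesic_mean(normals, weights, iters=10):
    """Weighted Karcher mean of unit normals via the Exp/Log recursion:
    lift to the tangent plane at the current estimate, average, map back.

    normals: (K, 3) unit vectors, weights: (K,) non-negative.
    """
    w = np.asarray(weights, float)
    w = w / w.sum()
    mu = normals[np.argmax(w)].astype(float)        # initial estimate (a choice)
    for _ in range(iters):
        cos_t = np.clip(normals @ mu, -1.0, 1.0)
        theta = np.arccos(cos_t)
        perp = normals - cos_t[:, None] * mu
        nrm = np.maximum(np.linalg.norm(perp, axis=1), 1e-12)
        logs = theta[:, None] * perp / nrm[:, None]  # Log map of each normal
        v = (w[:, None] * logs).sum(axis=0)          # weighted tangent mean
        t = np.linalg.norm(v)
        if t < 1e-12:
            break
        mu = np.cos(t) * mu + np.sin(t) * v / t      # Exp map back to the sphere
    return mu

def rotation_to(n_from, n_to):
    """Axis-angle rotation aligning n_from with n_to: one rotation-field entry."""
    c = np.clip(np.dot(n_from, n_to), -1.0, 1.0)
    axis = np.cross(n_from, n_to)
    s = np.linalg.norm(axis)
    angle = np.arctan2(s, c)
    axis = axis / s if s > 1e-12 else np.array([1.0, 0.0, 0.0])
    return axis, angle
```

Running `rotation_to` on each (original, smoothed) normal pair yields the rotation field for one pyramid level.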

| Rotation fields pyramid
In the two-dimensional case, most of the studies that use a pyramidal representation extract the high frequency information in several sub-bands. The main motivation for this is to capture different spatial configurations and orientations of the texture.
For example, Heeger and Bergen 39 employed steerable filters to capture anisotropic texture with the presence of elongated or oriented structures. However, in contrast to individual pixels in a 2D image, each surface orientation in the normal map encodes information about the surface gradient within its immediate neighbourhood. So, at each level of the pyramid, we use three sub-bands that correspond to the three components of the rotation vector, respectively. Figure 3 shows a 3-level rotation field pyramid of a wrinkly normal map patch.

| Riemannian distance on the rotation group SO(3)
After having represented the three-dimensional surface texture as an n-level pyramid of rotation fields, a metric is needed in the rotation space in order to analyse their spatial distribution. This problem has been well studied by Pennec. 42 Rotations can be represented not only by axis-angle, but also by 3 × 3 orthogonal matrices, which form the rotation group SO(3) and constitute a smooth manifold. 42 This means that the set of rotation matrices is differentiable and supports a Riemannian metric, allowing distances between rotations to be computed. If $\mathcal{R}_1$ and $\mathcal{R}_2$ are two rotation matrices and $R_1$ and $R_2$, respectively, the corresponding axis-angle representations (the conversion can easily be done with the Rodrigues formula), the Riemannian distance between $\mathcal{R}_1$ and $\mathcal{R}_2$ is given by 42 :

$$d\left(\mathcal{R}_1, \mathcal{R}_2\right) = \left\lVert R_2 \circ R_1^{-1} \right\rVert$$

Although the composition of rotations can be calculated by the product of the two matrices, Pennec 42 showed that it is more advantageous to use unit quaternions as an intermediate step because the result is easier to differentiate. The idea is to convert the axis-angle representation of the rotations to unit quaternions, multiply these and convert back into the axis-angle representation. Let $R$ be an axis-angle rotation (axis denoted by $\vec{e}_R$ and angle by $\theta_R$) and $Q$ its corresponding unit quaternion, represented by its scalar part $s$ and vectorial part $v$; the conversions are given by:

$$s = \cos\left(\frac{\theta_R}{2}\right), \qquad v = \sin\left(\frac{\theta_R}{2}\right)\vec{e}_R$$

And for two unit quaternions $Q_1\left(s_1, v_1\right)$ and $Q_2\left(s_2, v_2\right)$, the non-commutative multiplication is given by:

$$Q_1 Q_2 = \left(s_1 s_2 - v_1 \cdot v_2,\; s_1 v_2 + s_2 v_1 + v_1 \times v_2\right)$$

Combining these expressions, and replacing $s$ and $v$ from the conversion formula, yields:

$$d\left(\mathcal{R}_1, \mathcal{R}_2\right) = 2 \arccos\left(\left| s_1 s_2 + v_1 \cdot v_2 \right|\right)$$
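The quaternion route to the SO(3) distance can be sketched as below; a minimal illustration of the conversion, multiplication and distance steps just described.

```python
import numpy as np

def axis_angle_to_quat(axis, angle):
    """Unit quaternion (s, v) from an axis-angle rotation."""
    axis = np.asarray(axis, float)
    axis = axis / np.linalg.norm(axis)
    return np.cos(angle / 2.0), np.sin(angle / 2.0) * axis

def quat_mul(q1, q2):
    """Non-commutative product of two quaternions given as (s, v) pairs."""
    s1, v1 = q1
    s2, v2 = q2
    return s1 * s2 - np.dot(v1, v2), s1 * v2 + s2 * v1 + np.cross(v1, v2)

def so3_distance(axis1, ang1, axis2, ang2):
    """Riemannian distance on SO(3): the angle of the relative rotation
    R2 o R1^-1, computed through unit quaternions."""
    q1 = axis_angle_to_quat(axis1, ang1)
    q2 = axis_angle_to_quat(axis2, ang2)
    q1_inv = (q1[0], -q1[1])              # conjugate = inverse for unit quaternions
    s, _ = quat_mul(q2, q1_inv)
    return 2.0 * np.arccos(np.clip(abs(s), 0.0, 1.0))
```

Taking the absolute value of the scalar part handles the double cover of SO(3) by the unit quaternions, so the distance stays in [0, π].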

| Proposed new method II: Local orientation patterns
The second approach we propose for analysing 3D surface texture is based on the notion of Texture Units. The value of the Texture Unit associated with $p_0$ is determined from the 8 surrounding patterns by:

$$TU\left(p_0\right) = \sum_{i=1}^{8} f_i\left(p_0, p_i\right)\, n^{\,i-1}$$

The patterns $(f_i)_{1 \leq i \leq N}$ can be defined with any discrete two-dimensional function that has only $n$ possible values in $\mathbb{Z}^+$.
A Texture Unit is associated with each pixel contained in the image, and the Texture Spectrum is defined as the distribution of Texture Units over the whole image. This is represented by a histogram counting the frequency of each possible Texture Unit value over the image.
The main task here is to find good pattern functions that can represent the normals' orientation distribution over a Texture Unit. We propose two pattern functions for representing the normals' orientation distribution. The first function computes the dot product of two normals and compares the result with a threshold. The second function compares the azimuthal and polar angles of the normals directly.

| 1st pattern function
The first pattern function we propose evaluates the dot product between the central normal and one of the surrounding normals, and compares the result to a threshold $t$. Formally, it is given by:

$$f_i\left(N_0, N_i\right) = \begin{cases} 1 & \text{if } N_0 \cdot N_i \geq t \\ 0 & \text{otherwise} \end{cases}$$

With this pattern function, the number of bins needed for the histogram is $2^N$, as in Local Binary Patterns. As the normals are unit vectors, the dot product lies in [−1, 1] and depends only on the angle between the two normals. However, the problem here is to find a good threshold. It is clear that a good threshold depends on the local orientation distributions in the normal map; a good threshold for a dense and/or more or less uniform normal map may not be suitable for a sparser normal map. The threshold choice also depends on the application; for the same normal map, we may use different thresholds depending on whether we want to capture high or low frequency variations (although this would need to be combined with an adequate radius setting). Figure 4 shows the Local Orientation Pattern images of three skin patches using the first pattern function with radii of 1, 2 and 4.
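The first pattern function can be sketched as below: each of the 8 neighbour comparisons contributes one bit of the Texture Unit, and the Texture Spectrum is the histogram of the resulting codes. The threshold value 0.95 is an assumption for illustration, not a value from the paper.

```python
import numpy as np

def lop_histogram(normals, t=0.95, radius=1):
    """Local Orientation Patterns, first pattern function.

    normals: (H, W, 3) unit normal field. For every interior pixel,
    the dot product with each of the 8 neighbours at the given radius
    is thresholded at t; the 8 bits are packed into a Texture Unit in
    [0, 255], and the frequency histogram over the map is returned.
    """
    H, W, _ = normals.shape
    r = radius
    offsets = [(-r, -r), (-r, 0), (-r, r), (0, r),
               (r, r), (r, 0), (r, -r), (0, -r)]
    centre = normals[r:H - r, r:W - r]
    codes = np.zeros(centre.shape[:2], dtype=int)
    for bit, (dy, dx) in enumerate(offsets):
        neigh = normals[r + dy:H - r + dy, r + dx:W - r + dx]
        f = (np.einsum('ijk,ijk->ij', centre, neigh) >= t).astype(int)
        codes += f << bit                 # base-2 weighting: 2**8 Texture Units
    hist = np.bincount(codes.ravel(), minlength=256)
    return hist / hist.sum()
```

With n = 2 possible pattern values, the base-n weighting of the Texture Unit formula reduces to this bit-packing, exactly as in LBPs.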

| 2nd pattern function
In the second proposed pattern function, the azimuthal and polar angles of the central and surrounding normals are compared directly.

| Feature extraction and classification
A glance at the LOP (Local Orientation Patterns) images in Figure 4 and Figure 5 gives a first idea of the difference in behaviour between the two proposed pattern functions. The second pattern function tends to produce LOP images with higher frequency. This is probably due to the level of detail generated by using four patterns instead of just two. The important point here is that when the second pattern function is used to capture low frequency properties of a surface, a certain amount of noise, depending on how fine the surface structure is, can be picked up. In our applications, we think that it is more appropriate to use this second function for high frequency skin properties such as pores and some lines and wrinkles, while the first function is more appropriate for capturing lower frequency conditions such as acne.

| Proposed Method III: Multi-scale Azimuthal projection distance
The third novel method we propose is an extension of the Azimuthal Projection Distance Image (APDI) introduced by Sandbach et al as a 3D surface descriptor for facial Action Unit detection. 28 In their work, the authors used the APDI at a coarse scale to extract facial macro-structure. However, while these facial macro-structures are adequate for discriminating Action Units, they do not hold enough fine-scale surface detail to accurately characterise the skin conditions we are interested in (wrinkles, large pores and acne). We thus extend the APDI with three main additions: • We work with local surface normal means instead of a fixed surface mean as reference for the azimuthal projection.
• We have modified the APDI formula to take into account the surface normal azimuthal orientation, which is not considered in the original formulation.
• We have introduced a multi-resolution analysis scheme in order to capture different scales of skin deformations.
In the original formulation, 28 each normal is projected onto the tangent plane of a fixed mean surface normal using the azimuthal projection, and each pixel value of the APDI is given by the $L_2$ norm of the projected point $\left(x_{i,j}, y_{i,j}\right)$:

$$APDI(i,j) = \sqrt{x_{i,j}^2 + y_{i,j}^2}$$

| Modified APDI
As stated above, in the original formulation, the authors set a constant surface normal mean (0, 0, 1) over the whole face, thus projecting about a constant vector across the face. A direct consequence is that the projected value depends only on the polar angle of the normal. This is illustrated in Figure 7, where the mean surface normal is assumed to be aligned with the z-axis. It is easy to see that the distance $r$ from the centre of projection, which corresponds to the original formula, stays constant for all normals with the same polar angle $\theta$, even though the azimuthal angle $\varphi$ varies. This is overcome by replacing the $L_2$ norm with:

$$APDI(i,j) = r_{i,j}\,\varphi_{i,j}, \qquad r_{i,j} = \sqrt{x_{i,j}^2 + y_{i,j}^2}$$

This corresponds to the arc $c$ in the projection plane going from the x-axis to the projected point, and varies with $\varphi$ as well as $\theta$. Figure 8 shows the difference between using the $L_2$ norm (distance from the centre of projection) and the arc from the x-axis. In the first case ($L_2$ norm), the APDI appears less contrasted in comparison with the second case (arc), which presents more disparity and hence will be more discriminative, as shown in the classification results.
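The modified per-pixel value can be sketched as below. This is a minimal illustration that, for simplicity, assumes the local mean normal is aligned with the z-axis (in the full method it would be the local geodesic mean at each neighbourhood).

```python
import numpy as np

def modified_apdi(normals):
    """Modified APDI value per pixel for a unit normal field.

    normals: (H, W, 3) array of unit normals; the reference mean normal
    is assumed to be (0, 0, 1). The azimuthal equidistant radius r equals
    the polar angle theta; the returned value is the arc r * phi from the
    x-axis of the projection plane, so it varies with azimuth as well as
    with slant (unlike the plain L2-norm formulation).
    """
    nx, ny, nz = normals[..., 0], normals[..., 1], normals[..., 2]
    theta = np.arccos(np.clip(nz, -1.0, 1.0))    # polar angle about the z-axis
    phi = np.mod(np.arctan2(ny, nx), 2 * np.pi)  # azimuth from the x-axis
    return theta * phi                           # arc instead of plain radius
```

Note that two normals with the same polar angle but different azimuths now map to different values, which is exactly the disparity the modification is after.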

| Multi-resolution scheme
We employ a multi-scale APDI scheme for analysing the 3D skin texture. The down-sampling is performed in the tangent plane and the result is mapped back onto the sphere; by the definition of the exponential mapping, the result will always be a unit vector. Our scaling algorithm is based on Equation 25.
As we are only interested in down-sampling, we present an overview of the down-sampling algorithm below.

| Algorithm 1: Normal map downsampling algorithm
The full implementation includes border checking and index checking, which have been omitted here for brevity. To characterise the 3D skin texture, we build a multi-resolution pyramid of APDIs by down-sampling the normal map to different levels. At each level, the APDI is re-computed from the corresponding down-sampled normal map. The high levels contain higher frequency details adequate for texture analysis. The lower levels lose high frequency detail, but the low frequency changes related to the overall shape are highlighted. Figure 9 shows examples of image output of the modified multi-resolution APDI for 3 skin patches with presence of wrinkles, large pores and acne, respectively. It is interesting to notice how, at different scales, the level of high frequency information that is captured changes. For example, considering the patch with acne, one can see that on the first level, only the fine skin structure is captured. It is clear that stopping the texture extraction at that level would capture only partial information about the skin disruption and would certainly miss the big skin spots. These are captured better by the subsequent levels, as shown in Figure 9.
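The down-sampling step can be sketched as below. This is a minimal illustration, not the paper's Algorithm 1: it halves the resolution per level and uses a uniform 2 × 2 average in the tangent plane where the full method uses a Gaussian low pass, and it omits the border and index checking mentioned above.

```python
import numpy as np

def downsample_normal_map(normals):
    """Halve a normal map's resolution via the tangent plane.

    Each 2x2 block is lifted into the tangent plane at the block's
    normalised Euclidean mean, averaged there (uniform low pass, an
    assumption), and mapped back with the exponential chart so the
    result is again a unit vector.
    """
    H, W, _ = normals.shape
    out = np.empty((H // 2, W // 2, 3))
    for i in range(H // 2):
        for j in range(W // 2):
            block = normals[2 * i:2 * i + 2, 2 * j:2 * j + 2].reshape(4, 3)
            mu = block.mean(axis=0)
            mu = mu / np.linalg.norm(mu)             # reference point on the sphere
            cos_t = np.clip(block @ mu, -1.0, 1.0)
            theta = np.arccos(cos_t)
            perp = block - cos_t[:, None] * mu
            nrm = np.maximum(np.linalg.norm(perp, axis=1), 1e-12)
            logs = theta[:, None] * perp / nrm[:, None]   # log map about mu
            v = logs.mean(axis=0)                    # low pass in the tangent plane
            t = np.linalg.norm(v)
            out[i, j] = mu if t < 1e-12 else np.cos(t) * mu + np.sin(t) * v / t
    return out
```

Calling this repeatedly yields the levels of the multi-resolution pyramid, with the APDI re-computed from each level's map.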

| Feature extraction and classification
To extract features from a given normal map patch, the multi-resolution APDI pyramid is built. Then, a grey level histogram is computed at each level of the pyramid and the histograms are concatenated together. This produces a relatively large feature vector, depending on the number of levels and the histogram resolution (ie the number of bins). For example, a 128-bin histogram with a 4-level pyramid will produce a feature vector of length 512. This can be reduced using feature selection techniques.

| Data set
The algorithms described in this paper are intended to work with data acquired in a light stage. A light stage is a 3D surface acquisition device first proposed by Debevec et al 31 which is to date the most advanced set-up for capturing surfaces' fine structure.
Existing 3D face data sets that use photometric stereo include the Photoface database 44 and the 3D Relightable Facial Expression (ICT-3DRFE) database. 45 While the first was captured with low-cost cameras, the latter was captured using a light stage. Despite providing highly detailed 3D data, the ICT-3DRFE database is not suitable for this work, as the age range and skin types covered by the data set are limited.
To cover a wider age range and more skin types, we have collected a new data set using our own light stage. The capture and processing of the acquired data are detailed elsewhere. 46 Briefly, the data set comprises facial captures from 50 subjects ranging in age and skin condition.

| Region segmentation
Each face was segmented into 14 regions using a 3D template (set of landmarks) manually adjusted on the face ( Figure 10). As all processing (analysis or synthesis) is done on the measured normal maps, this segmentation is projected on the 2D texture space of each of the 3 photometric poses using the corresponding camera parameters.

| Data annotation
For data annotation, an experiment was conducted in which human participants were presented with skin patches from different regions of the face and asked to rate them on a scale of 1 to 5 according to the presence and visibility of wrinkles, acne and pores. We considered three regions of interest: cheek, forehead and eye corner, as these are the regions in which the skin conditions we are interested in occur most. All faces were segmented using the generic template shown in Figure 10. A photo-realistic animation was rendered for each patch showing it at different angles with a fixed point light. The photo-realism of this animation was critical to the rating process as the apparent texture of the skin is strongly affected by the lighting and viewing conditions. Figure 12 shows two skin patches rendered with two different viewing angles and the difference in apparent texture is clearly evident.
Our rating platform was set up as a web application (Figure 13). The pre-rendered animation of each skin patch was played to the participant at least once before any rating could be entered. Participants had the option to re-run the animation as many times as they wished and to change the viewing angle manually using a slider control.
To reduce potential ordering bias, the sequence allocated to each participant is randomised.
We assume that most of the skin conditions we are interested in are roughly symmetrical across the face (i.e., if a subject presents acne or large pores on the left cheek, it is likely that the same condition will be found on the right cheek). Thus, for each subject, instead of presenting both the left and right cheeks or eye corners to the raters, one side was picked at random.
To reduce the rating time and minimise the risk of participants withdrawing before finishing a session, the patches were categorised into blocks according to their location on the face, giving three blocks (cheek, eyelid and forehead) of 50 patches each. A participant chose a block to start with, with the option of rating a second or third block upon completion.
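A minimal sketch of how one such rating block could be assembled, with the per-subject random side pick and per-participant shuffled presentation order (the function name, tuple layout and seeding are illustrative assumptions, not the paper's implementation):

```python
import random

def build_session(subject_ids, region, seed):
    """Assemble one rating block: pick a random side per subject and
    shuffle the presentation order for this participant.

    Seeding makes each participant's (randomised) sequence reproducible.
    """
    rng = random.Random(seed)
    patches = [(sid, region, rng.choice(["left", "right"]))
               for sid in subject_ids]
    rng.shuffle(patches)  # per-participant randomised ordering
    return patches

# One hypothetical participant's cheek block over 50 subjects
session = build_session(range(50), "cheek", seed=7)
```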
Judgement of facial skin texture is a rather subjective task: the way people perceive and quantify the skin conditions that we are interested in will certainly be affected by many factors related to their own personal experience. Therefore, for our data set to be reliable, it was necessary to have it rated by many individuals. This also allowed analysis of the correlations between how different people perceive these skin conditions. A total of 25 participants rated the data set, with almost all of them having rated at least two blocks.

| Inter-rater agreement
As the data were rated by 25 participants, each sample has a set of ratings given by different individuals. Therefore, we can measure the data set's consistency by investigating the agreement between ratings. The low correlation measures on the raw data suggest some disagreement between raters, which may be due to differences in judgement, raters not understanding the instructions, or raters not providing genuine ratings. To achieve higher inter-rater agreement, we experimented with excluding the participants who correlate least with the rest. Participants are excluded successively in ascending order of their correlation with the rest, starting with the one with the weakest correlation value. However, excluding too many participants would reduce confidence in the ratings even though the apparent correlation increases. Hence, the exclusion policy we used was as follows: we keep the maximum number of raters that achieves a correlation greater than or equal to 0.5.
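The exclusion procedure can be sketched as follows, under the assumption that each rater's correlation is measured against the mean rating of the remaining raters (the paper does not specify the exact correlation statistic, so Pearson correlation is used here for illustration):

```python
import numpy as np

def prune_raters(ratings, threshold=0.5):
    """Successively exclude the rater least correlated with the rest
    until every remaining rater reaches the threshold.

    ratings : (n_raters, n_samples) matrix of ratings.
    Returns the indices of the retained raters.
    """
    keep = list(range(ratings.shape[0]))
    while len(keep) > 2:
        sub = ratings[keep]
        # Correlate each rater with the mean of the other retained raters
        corrs = [np.corrcoef(sub[i],
                             np.delete(sub, i, axis=0).mean(axis=0))[0, 1]
                 for i in range(len(keep))]
        if min(corrs) >= threshold:
            break  # maximum rater set meeting the 0.5 criterion
        keep.pop(int(np.argmin(corrs)))  # drop the weakest rater
    return keep

# Demo: five consistent raters plus one strongly anti-correlated rater
rng = np.random.default_rng(0)
base = rng.normal(size=100)
good = np.stack([base + 0.1 * rng.normal(size=100) for _ in range(5)])
kept = prune_raters(np.vstack([good, -base[None, :]]))
```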

| RESULTS SUMMARY
We summarise here the classification results obtained with the 3D texture descriptors proposed in this paper. We also compare these against the performance of a BTF texton-based method which, to date, is one of the most advanced approaches to representing illumination- and view-dependent texture. In this work, we use the Weka implementation of the multi-layer perceptron for training and classification, with a 10-fold cross-validation approach. This choice was motivated by preliminary investigations with other classifiers, including Random Forests and Support Vector Machines, both of which yielded poorer results. The number of nodes in the hidden layer is set to Weka's default, which is the mean of the number of classes and the number of attributes. The output of the classifier is a discrete rating of the presence or absence of the considered skin condition and, as defined in the ground truth, is a discrete number between "1" (meaning very low) and "5" (meaning very high). The results presented in Table 2 show the performance of each descriptor in terms of the F-measure, which is the harmonic mean of precision and recall. Further analysis of Table 2 shows a clear improvement of the proposed models.
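For reference, the per-class F-measure reported in Table 2 is the harmonic mean of precision and recall; a minimal computation on the 1-5 rating labels might look like this (the data shown are illustrative, not the paper's results):

```python
import numpy as np

def f_measure(y_true, y_pred, label):
    """Per-class F-measure: the harmonic mean of precision and recall."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    tp = np.sum((y_pred == label) & (y_true == label))  # true positives
    fp = np.sum((y_pred == label) & (y_true != label))  # false positives
    fn = np.sum((y_pred != label) & (y_true == label))  # false negatives
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

# Illustrative ratings on the 1-5 scale used as ground truth
y_true = [1, 2, 2, 3, 5, 5]
y_pred = [1, 2, 3, 3, 5, 4]
score = f_measure(y_true, y_pred, label=5)  # precision 1.0, recall 0.5 -> 2/3
```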

| CONCLUSIONS
In this paper, we have explored three new methods of characterising the 3D nature of surface texture and have applied these to facial skin texture analysis. In contrast to image-based methods, which use BTF data, the surface texture descriptors proposed in this paper operate directly on the captured surface microgeometry in the form of dense surface normals. Their performance is evaluated on classifying common skin conditions (wrinkles, large pores, acne) and compared against state-of-the-art methods represented by a BTF texton-based approach. We have also considered Convolutional Neural Networks; however, such a network requires a much more extensive data set than our limited set of facial region captures, hence the relevance of hand-crafting our convolutional layer and passing the results on to a multi-layer perceptron. Extending our data set and learning a set of meaningful convolutional nodes for 3D surface texture analysis and synthesis remains a strong candidate for future work.

| ACKNOWLEDGEMENTS

This work was sponsored by Unilever Research and by an Aberystwyth University PhD Scholarship. This work was completed while A. Seck was a PhD student at Aberystwyth University and is not affiliated with his current employer (Arm Ltd).