Plant leaf classification and retrieval using multi-scale shape descriptor

Plant leaf classification is a significant and challenging research problem in computer vision. In this study, an original multi-scale shape descriptor is presented for leaf classification and retrieval. Firstly, a novel, parameter-free iterative rule is proposed as the scale generation method. Secondly, leaf contour points are represented by angle information, which is calculated from their neighbour points under each scale. The angle representation is invariant to image rotation, translation and scaling. More importantly, it describes a leaf hierarchically by capturing leaf features from global to local variations. The fast Fourier transform is then applied to make the representation more compact and independent of the starting point. Subsequently, for leaf retrieval, the dissimilarity of each pair of leaves under each scale is computed using the city block metric, and a support vector machine is used as the classifier for leaf classification. Finally, experiments and comparisons with multiple state-of-the-art approaches are performed. The classification accuracy was 96.85% and 93.56% on the Swedish and Flavia leaf datasets, respectively. The mean average precision score was 66.42%, 76.69% and 44.14% on the Flavia, Swedish and MEW2012 leaf datasets, respectively. The results demonstrate that the proposed method has excellent performance.


This is an open access article under the terms of the Creative Commons Attribution License, which permits use, distribution and reproduction in any medium, provided the original work is properly cited. © 2021 The Authors. The Journal of Engineering published by John Wiley & Sons Ltd on behalf of The Institution of Engineering and Technology

INTRODUCTION
Plant species classification is an important topic in the agricultural information area and has recently attracted much attention from computer vision researchers [1]. The aim of plant species classification is to assign a test plant sample to a species based on its morphological characters, including root, stem, leaf, flower and fruit [2]. Plant leaves carry valuable discriminatory information for plant classification and are easy to capture with digital imaging equipment. Hence the leaf has become the most commonly used morphological character when automatic plant classification and retrieval tasks are implemented using computer vision techniques [3]. However, automatic leaf classification is a challenging task [4,5], because leaves from the same plant species may differ in size, colour, venation and/or shape [6]. Besides, leaves from different species may have very similar morphological characters. General leaf classification procedures involve several crucial components, including pre-processing of the leaf image, feature description, feature matching, and classifier design. Extracting efficient leaf features and designing effective classifiers are therefore the key issues for the automatic leaf classification task.
In recent years, many new leaf shape, colour and venation features have been designed and extracted for plant leaf classification and image retrieval [7,8]. Some works emphasise only the extraction of colour, texture or vein features, while leaf shape is utilized in more studies. Since the colour and venation of leaves may change with time and environmental conditions, leaf shape is often used for plant species classification owing to its rich morphological variation. Both existing and newly developed methods are used to exploit the discriminative ability of shape characters, including region-based and contour-based methods [9,10]. As a typical region-based descriptor, a combination of new and well-known geometric features is used for the leaf classification task in [11]. Contour-based methods are more popular, because the contour is the most obvious feature by which ordinary people identify plant species. Typical representatives of such approaches include the inner-distance shape context (IDSC) [12], the multi-scale R-angle (MSRA) [13] and the multi-scale arch height description (MARCH) [10]. Recently, Yang proposed the multiscale triangle descriptor for plant leaf recognition [14].
There are also works in which generic image descriptors such as SIFT [15], GIST [16] and HOG [17] are utilized for leaf description and classification, and fused description methods based on multiple features have been studied for leaf description [18]. A very recent survey of feature extraction techniques for plant leaf recognition is given in [19]. Recently, several works have used deep learning approaches for both leaf description and classification. In addition, learning-based approaches such as k-NN and the support vector machine (SVM) have also been studied for classifying plant leaves. For a detailed survey of recent progress in plant classification using learning-based approaches, please refer to [1,20].
In this study, a multi-scale shape descriptor is proposed to perform plant leaf image classification and retrieval tasks. This study is motivated by the works in references [10,13,21], where the scale parameters are determined manually. Theoretically, the relations between angle descriptors under different scales are useful information for multi-scale fusion. In a multi-scale shape description of plant leaves, angle features under fine scales describe local information, which changes rapidly, while angle features under coarse scales capture global information, which varies little. Hence more fine scales are needed and their intervals should be small, whereas fewer coarse scales are appropriate to avoid redundant description. The proposed descriptor uses a novel scale generation rule to derive the multi-scale angle descriptor. Unlike similar methods such as MARCH, the scale generation rule is free of parameters. Leaf image classification and retrieval performance is evaluated with standard evaluation metrics on four challenging and freely available leaf datasets: the Flavia leaf, the Swedish leaf, the MEW2012 leaf and the ImageCLEF 2012 datasets. Comparisons with multiple state-of-the-art approaches, including deep learning methods, are also performed.
The rest of this paper is structured as follows. Section 2 introduces related multi-scale approaches and discusses the key issue of these approaches. In Section 3, the detailed construction of the proposed method is presented. Section 4 introduces the experiments and comparisons with multiple remarkable approaches. The last section gives the conclusions of this paper.

RELATED WORKS
During the past decades, substantial effort has been made in plant leaf classification and retrieval using leaf shape descriptors.
In reference [22], a combination of leaf texture and shape features, including the well-known Fourier descriptor, Hu moment invariants, the grey-level co-occurrence matrix (GLCM), the local binary pattern (LBP) and Gabor features, is considered for classification. Fourier descriptors were also used in reference [23] for leaf classification, together with a set of leaf features including morphological features and shape-defining features. In reference [12], a variation of the well-known IDSC was used to calculate global and local information of leaf shape independently, and a pattern counting strategy was then advocated for leaf matching. The rich multi-scale convexity concavity (MCC) representation is proposed in reference [24] to represent convexities and concavities of the contour at multiple scale levels.
Recently, some new methods using a multi-scale framework have been proposed for plant leaf classification. Reference [25] presents an optimized multi-scale bending energy to find the parameters that best fit the descriptor for leaf classification. In reference [26], the multi-scale distance matrix (MDM) and its variations are proposed to capture multi-scale geometric properties of leaves, and the experiments show that MDM achieves better classification performance than IDSC. The multi-scale arch height (MARCH) descriptor was designed to describe leaf shape hierarchically using arch height features, and a leaf retrieval application for mobile devices was developed in reference [10]. MARCH extracts hierarchical arch height features for each leaf contour at K scales, and achieved better classification and retrieval performance on four publicly available datasets than several remarkable methods. However, how to set the scale parameter of the MARCH method in a reasonable way remains a topic worth studying. In reference [13], the multi-scale R-angle (MSRA) was proposed with an optimized multi-scale generation method to describe leaf contour curvature using angle features derived from contour points. This method outperformed numerous notable description methods, such as IDSC, MARCH, triangle-area representation (TAR) [27] and MDM [26]. However, the optimized multi-scale generation method uses multi-scale circles, centred on sampled points and with R-based radii, to set the intersections with the shape contour, where R is the largest scale. The R-angle therefore does not make the best use of angle features to capture fine leaf details, because the radius is a global description. In reference [28], a unified method is proposed to describe multi-scale geometric information of leaves, where a three-step technique is used for multi-scale generation, and experiments show excellent results on several datasets.
Angle features are also used to derive the angular pattern and its binary version, both of which are proposed in a multi-scale framework in reference [21]. However, in the binary angular pattern both line segments and concave strands are coded with 0, so line segments cannot be distinguished from concave strands. Hence, this method loses important discriminative information.
The aforementioned analysis and existing research results show that the angle feature derived from contour points has good properties and is easy to extend to a multi-scale description [13,21,28,29]. More specifically, the angle feature is inherently rotation, translation, and scaling (RTS) invariant, and can capture hierarchical features from local contour variations to global information well in a multi-scale framework. Hence the angle feature is a promising feature and has been used in several shape retrieval and leaf classification tasks. However, determining the scale parameters of angle features is a key issue, because existing choices lack theoretical support and offer no guarantee of better classification performance.
In this study, we propose a novel multiple scales generation rule to derive a hierarchical angle descriptor for classifying and retrieving leaf images. In comparison with previous related works, our work makes the following contributions. Firstly, a novel multiple scales generation rule is proposed to capture hierarchical features of leaf shape. The proposed rule uses quartering and bisection of the leaf contour to form the scale parameters. Secondly, the fast Fourier transform is utilized to make the feature more compact for effective leaf classification and retrieval. The feature dimension of the proposed method is 42, which is much lower than that of TAR, MCC or IDSC. Extensive experiments are conducted on several freely available leaf datasets using standard evaluation metrics to show the effectiveness of the presented work.

THE MULTI-SCALE SHAPE DESCRIPTOR
The multi-scale shape descriptor derived from the leaf contour is introduced in detail as follows. Each leaf image is first converted to a binary image. Leaf contours are extracted using the algorithm in [29] and sampled at equal spacing into N points. Each contour is then represented as a set of N ordered (clockwise) points, that is, $C = \{p_i \mid i = 1, \ldots, N\}$.

Proposed scale-generation rule
To derive angle feature for each contour point, the coordinates of paired neighbour points are required. The selection strategy for neighbour points has an important effect on the describing abilities of angle features. In this paper, paired neighbour points are selected using the novel multiple scales generation rule, with details as below.
Initially, for each point $p_i$ on the leaf contour $C$, $C$ is quartered with $p_i$ as one of the quartering points. The quartering points in the clockwise direction are $p_i$, $p_{i-L/4}$, $p_{i+L/2}$ and $p_{i+L/4}$. Indices wrap around the closed contour, so that $p_{i-L/4} = p_{i-L/4+L}$ and $p_{i+L/2} = p_{i+L/2-L}$. The two quartering points nearest to $p_i$ are then taken as the pair of neighbour points under the first scale, and the arc length between $p_i$ and the furthest quartering point is taken as the maximum benchmark.
Subsequently, the previous step yields the two shortest contour segments that have $p_i$ as an endpoint. The midpoints of these two segments are located and taken as a new pair of neighbour points under the next scale.
Lastly, the previous step is iterated until the following termination condition is satisfied and no new scale is generated: the ratio of the maximum benchmark to the arc length between the nearest neighbour point and $p_i$ exceeds one hundred.
From the above procedure, it can be seen that the generation rule selects multiple pairs of neighbours naturally. The number of pairs of neighbour points is also fixed: it equals the number of scales and does not depend on the number of sampled points. According to the calculation, there are 6 scales, which select 6 pairs of neighbour points for each point. An example of the pairs of neighbour points for $p_i$ on a sampled leaf, obtained with the above multiple scales generation rule, is given in Figure 1.
The leaf in Figure 1 is from the Swedish leaf dataset with 256 sampled points, and each pair of neighbour points is marked with one of six markers: square, right-pointing triangle, pentagram, plus sign, hexagram and asterisk, respectively.

FIGURE 1 Pairs of neighbour points for $p_i$ on sampled leaf contour
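
The scale generation rule can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the use of fractions of the contour length, and the rounding to sample indices are assumptions made for illustration. The termination test is read here as "a halved offset is kept only while the benchmark-to-offset ratio does not exceed one hundred", which is the reading consistent with the six scales stated above.

```python
def generate_scale_offsets(max_ratio=100.0):
    """Return neighbour offsets as fractions of the contour length L."""
    benchmark = 0.5  # arc length from p_i to the furthest quartering point (L/2)
    offset = 0.25    # first scale: the two nearest quartering points (L/4 away)
    offsets = []
    # Assumption: a new scale is accepted only while benchmark/offset
    # does not exceed max_ratio.
    while benchmark / offset <= max_ratio:
        offsets.append(offset)
        offset /= 2.0  # midpoint of the shortest segment that ends at p_i
    return offsets


def neighbour_indices(i, n_points, offsets):
    """Map each fractional offset to a pair of sample indices around point i.

    Rounding the arc-length fraction to a whole number of samples is an
    assumption of this sketch.
    """
    pairs = []
    for frac in offsets:
        step = max(1, round(frac * n_points))
        pairs.append(((i - step) % n_points, (i + step) % n_points))
    return pairs


offsets = generate_scale_offsets()
print(len(offsets), offsets)
print(neighbour_indices(0, 256, offsets))
```

With 256 sampled points, as in Figure 1, the six pairs lie 64, 32, 16, 8, 4 and 2 samples away from the point on either side.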

Multi-scale shape description
Each pair of neighbour points can then be combined with $p_i$ to form two vectors with $p_i$ as the start point, and the included angle between these two vectors is obtained. Taking one pair of neighbour points $p_l$ and $p_r$ as an example, the angle $\theta_i$ at $p_i$ can be obtained by the following formula:

$$\theta_i = \arccos\left(\frac{\overrightarrow{p_i p_l} \cdot \overrightarrow{p_i p_r}}{\lVert\overrightarrow{p_i p_l}\rVert \, \lVert\overrightarrow{p_i p_r}\rVert}\right)$$

where $\overrightarrow{p_i p_l}$ and $\overrightarrow{p_i p_r}$ are the vectors formed by $p_l$, $p_r$ and $p_i$. Since each point on the leaf contour has six scales of paired neighbour points, there are six scales for the angle. With a specific starting point $p_1$ for contour $C$, the extracted angle features of one leaf can therefore be expressed as $\theta = \{\theta_{ik} \mid i = 1, \ldots, N;\ k = 1, \ldots, 6\}$. The angle features under the first generated scale describe global leaf information well, and the angle features under subsequent scales capture local contour variations. Hence, angle features under different values of $k$ represent a leaf from coarse to fine, which gives them strong discriminative ability for plant species identification.
It is very easy to show that the extracted angle features $\theta$ are invariant to basic transformations such as translation, scaling and rotation, which facilitates matching between leaves [28]. However, if the starting point of $C$ changes, the angle features $\theta$ change accordingly. This problem should be addressed in the leaf matching stage.
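
The angle extraction and its invariance can be checked numerically with the short Python sketch below. It is a sketch under stated assumptions rather than the authors' code: the included angle is computed as the arccosine of the normalized dot product, neighbour offsets are rounded to sample indices, and the test contour is a toy ellipse sampled uniformly in its parameter (not at equal arc length). All function names are hypothetical.

```python
import math

# Six neighbour offsets as fractions of the contour length (L/4 ... L/128).
SCALE_FRACTIONS = [1 / 4, 1 / 8, 1 / 16, 1 / 32, 1 / 64, 1 / 128]


def included_angle(p, a, b):
    """Angle at p between the vectors p->a and p->b, in radians within [0, pi]."""
    ux, uy = a[0] - p[0], a[1] - p[1]
    vx, vy = b[0] - p[0], b[1] - p[1]
    norm = math.hypot(ux, uy) * math.hypot(vx, vy)
    if norm == 0.0:
        return 0.0  # degenerate case: a neighbour coincides with p
    cos_theta = (ux * vx + uy * vy) / norm
    return math.acos(max(-1.0, min(1.0, cos_theta)))  # clamp rounding noise


def multiscale_angles(contour):
    """Return theta[i][k]: the angle at contour point i under scale k."""
    n = len(contour)
    features = []
    for i, p in enumerate(contour):
        row = []
        for frac in SCALE_FRACTIONS:
            step = max(1, round(frac * n))  # assumption: round to sample index
            row.append(included_angle(p, contour[(i - step) % n],
                                      contour[(i + step) % n]))
        features.append(row)
    return features


def similarity_transform(contour, angle, scale, shift):
    """Rotate, scale and translate every contour point."""
    c, s = math.cos(angle), math.sin(angle)
    return [(scale * (c * x - s * y) + shift[0],
             scale * (s * x + c * y) + shift[1]) for x, y in contour]


def ellipse(n, a=2.0, b=1.0):
    """Toy closed contour with n points."""
    return [(a * math.cos(2 * math.pi * i / n),
             b * math.sin(2 * math.pi * i / n)) for i in range(n)]


contour = ellipse(256)
moved = similarity_transform(contour, angle=0.7, scale=2.5, shift=(10.0, -3.0))
theta = multiscale_angles(contour)
theta_moved = multiscale_angles(moved)
max_diff = max(abs(x - y)
               for r1, r2 in zip(theta, theta_moved)
               for x, y in zip(r1, r2))
print(len(theta), len(theta[0]), max_diff)
```

The printed difference between the two feature sets stays at the level of floating-point rounding, which matches the invariance claim. Changing the starting point, in contrast, circularly shifts the rows of `theta`.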

Leaf matching
Several metrics can be used for measuring leaf similarity, such as the commonly used city block metric, Euclidean distance and dynamic programming. The dynamic programming algorithm is unsuitable for large leaf image datasets and is not used in this paper because it is complex and time-consuming. The city block metric and Euclidean distance cannot be applied directly to $\theta$, because the starting point problem influences leaf matching accuracy. Hence the fast Fourier transform (FFT) is applied after the features are extracted. To make the final descriptor independent of the starting point and more compact, only the magnitudes of the transform are employed for leaf matching. The FFT is applied to the angle features under each scale $k$, that is, to the sequence $\theta_{1k}, \ldots, \theta_{Nk}$, which gives the FFT coefficients $F_{kt}$ with $t = 1, \ldots, N$ and $k = 1, \ldots, 6$. Moreover, only the magnitudes of $F_{kt}$ at lower frequencies are retained to make the final descriptor more compact, that is, $|F_{kt}|$ with $t = 1, 2, \ldots, n$ and $n \ll N$. For the leaf classification task, an SVM is trained using the $|F_{kt}|$ and used to classify testing images. For the leaf retrieval task, the similarity between each pair of leaves is determined using the simple city block metric.
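
The effect of keeping only the transform magnitudes can be sketched in Python as follows. This is an illustration under assumptions, not the authors' implementation: the coefficients are computed with a direct, unnormalized discrete Fourier transform sum instead of an FFT library call, the six scales and n = 7 retained coefficients are taken from the text, and the angle features are synthetic. Function names are hypothetical.

```python
import cmath
import math


def low_frequency_magnitudes(signal, n_keep=7):
    """Magnitudes of the first n_keep DFT coefficients of one angle sequence.

    Assumption: unnormalized DFT; index 0 is the lowest (constant) frequency.
    """
    n = len(signal)
    mags = []
    for t in range(n_keep):
        coeff = sum(x * cmath.exp(-2j * math.pi * t * i / n)
                    for i, x in enumerate(signal))
        mags.append(abs(coeff))
    return mags


def build_descriptor(angle_features, n_keep=7):
    """Flatten angle_features[i][k] into one vector of n_scales * n_keep values."""
    n_scales = len(angle_features[0])
    descriptor = []
    for k in range(n_scales):
        per_scale = [row[k] for row in angle_features]
        descriptor.extend(low_frequency_magnitudes(per_scale, n_keep))
    return descriptor


def city_block(a, b):
    """City block (L1) dissimilarity between two descriptors."""
    return sum(abs(x - y) for x, y in zip(a, b))


# Synthetic angle features: 64 contour points, 6 scales.
n_points = 64
features = [[1.0 + 0.5 * math.sin(2 * math.pi * (k + 1) * i / n_points)
             for k in range(6)] for i in range(n_points)]
# The same contour traversed from a different starting point.
shifted = features[10:] + features[:10]

d_a = build_descriptor(features)
d_b = build_descriptor(shifted)
print(len(d_a), city_block(d_a, d_b))
```

A change of starting point circularly shifts each angle sequence, which alters only the phase of its Fourier coefficients, so the two descriptors coincide up to rounding error and the descriptor has 6 × 7 = 42 entries.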

Leaf classifiers
Many classifiers have been proposed for image classification, such as SVM, k-NN and the artificial neural network (ANN). Generally speaking, SVM works better than the k-NN classifier. Compared with ANN, an SVM model needs less training data and is easier to train. SVM was therefore chosen as the classifier in the experiments. To train and test the SVM model for each leaf dataset, the training subset was first randomly selected and the remaining leaves were taken as the testing subset. The number of leaves in the training or testing subset depends on the dataset itself (please see Section 4 for details). The multi-scale shape descriptor was then extracted for each leaf. Subsequently, the features and labels of the leaves in the training subset were used to train an SVM model. In the testing stage, only the features of the leaves in the testing subset were fed into the SVM model to predict one label for each leaf. The predicted labels were evaluated against the ground truth labels.
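
The train/test protocol described above can be sketched in Python as below. The sketch deliberately swaps in a nearest-neighbour classifier with the city block metric as a stand-in, because an SVM would require an external library; it is therefore not the classifier used in the experiments. The function names and the toy data are hypothetical.

```python
import random
from collections import defaultdict


def split_per_class(labels, n_train, rng):
    """Randomly pick n_train samples per class for training; the rest are test."""
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    train, test = [], []
    for indices in by_class.values():
        shuffled = indices[:]
        rng.shuffle(shuffled)
        train.extend(shuffled[:n_train])
        test.extend(shuffled[n_train:])
    return train, test


def nearest_neighbour_predict(train_x, train_y, x):
    """Stand-in classifier (NOT the SVM): label of the closest training vector."""
    best = min(range(len(train_x)),
               key=lambda j: sum(abs(a - b) for a, b in zip(train_x[j], x)))
    return train_y[best]


def classification_rate(predicted, truth):
    """Percentage of test samples whose predicted label matches the ground truth."""
    correct = sum(1 for p, t in zip(predicted, truth) if p == t)
    return 100.0 * correct / len(truth)


# Toy data: 3 classes with 6 well-separated two-dimensional samples each.
features = [[10.0 * c + 0.1 * j, 10.0 * c - 0.1 * j]
            for c in range(3) for j in range(6)]
labels = [c for c in range(3) for _ in range(6)]

rng = random.Random(0)
train_idx, test_idx = split_per_class(labels, n_train=4, rng=rng)
train_x = [features[i] for i in train_idx]
train_y = [labels[i] for i in train_idx]
predicted = [nearest_neighbour_predict(train_x, train_y, features[i])
             for i in test_idx]
rate = classification_rate(predicted, [labels[i] for i in test_idx])
print(len(train_idx), len(test_idx), rate)
```

Repeating the split with different random seeds and averaging the resulting rates corresponds to the repeated random sub-sampling used in the experiments.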

EXPERIMENT RESULTS AND COMPARISONS
In this section, the performance of the proposed method for both leaf classification and image retrieval is tested on freely available plant leaf datasets: the Flavia leaf dataset, the Swedish leaf dataset, the MEW2012 leaf dataset and the ImageCLEF 2012 dataset. Three metrics are used in the experiments and comparisons. A desktop computer with an Intel i5-8400 2.8 GHz CPU and 8 GB RAM is used to carry out the experiments in MATLAB. The final dimension of the descriptor is 42, with n = 7 for each of the six scales. For the leaf classification task, LIBSVM, a simple and easy-to-use tool, is used to train the SVM with the recommended parameters c = 64.88 and g = 45.64 [30].

Evaluation methods
The performance of the proposed method is evaluated using standard evaluation metrics in the leaf classification and image retrieval experiments, so that it can be compared conveniently with other state-of-the-art methods. For the leaf classification task, each dataset is divided into a training subset and a testing subset, and the classification rate (%) is used. For the leaf retrieval task, the precision-recall metric [13,31], mean average precision (MAP) [13,32] and S-score [13] are used. The classification rate is the ratio of correctly classified leaves to all tested leaves. Precision is the ratio of correctly returned leaves to all returned leaves. Recall is the ratio of correctly returned leaves to the total number of relevant leaves in the whole dataset. In the MAP metric, the average precision (AP) of each query is calculated from the products $P_i \cdot cor_i$ over the returned leaves, where $n_1$ is the number of returned leaves, $P_i$ is the precision at cut-off rank $i$, and $cor_i$ is set to 1 if the leaf at rank $i$ is correct and 0 otherwise. The last metric, the S-score, is very suitable for unbalanced large leaf datasets and is defined as follows:

$$S = \frac{1}{U} \sum_{u=1}^{U} \frac{1}{P_u} \sum_{p=1}^{P_u} \frac{1}{N_{u,p}} \sum_{n=1}^{N_{u,p}} S_{u,p,n}$$

Here $U$ denotes the number of unique users in the testing subset, $P_u$ denotes the number of individual plants observed by the $u$-th user, $N_{u,p}$ denotes the number of leaves from the $p$-th plant observed by the $u$-th user, and $S_{u,p,n}$ is the reciprocal of the smallest rank at which the result is correct for the $n$-th leaf image from the $p$-th plant observed by the $u$-th user.
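
The retrieval metrics can be made concrete with the following Python sketch. The function names and the nested-dictionary input format are hypothetical. One point is an assumption rather than something stated in the text: the average precision is normalized here by the number of correct leaves among the returned ones, which is a common convention in information retrieval. A query with no correct result contributes 0 to the S-score in this sketch.

```python
def precision_recall(cor, n_relevant):
    """(precision, recall) after each rank, given 0/1 correctness flags."""
    points, hits = [], 0
    for rank, c in enumerate(cor, start=1):
        hits += c
        points.append((hits / rank, hits / n_relevant))
    return points


def average_precision(cor):
    """Sum of P_i * cor_i over the ranked list, divided by the number of hits.

    Assumption: normalization by the number of correct returned leaves.
    """
    hits, total = 0, 0.0
    for rank, c in enumerate(cor, start=1):
        if c:
            hits += 1
            total += hits / rank  # precision at this cut-off rank
    return total / hits if hits else 0.0


def mean_average_precision(cor_lists):
    """Mean of the average precision over all queries."""
    return sum(average_precision(c) for c in cor_lists) / len(cor_lists)


def s_score(first_correct_ranks):
    """S-score from first_correct_ranks[user][plant] = [rank or None, ...].

    Each rank is the smallest rank at which the correct species is returned
    for one leaf image; None means it was never returned.
    """
    user_scores = []
    for plants in first_correct_ranks.values():
        plant_scores = []
        for ranks in plants.values():
            scores = [1.0 / r if r else 0.0 for r in ranks]
            plant_scores.append(sum(scores) / len(scores))
        user_scores.append(sum(plant_scores) / len(plant_scores))
    return sum(user_scores) / len(user_scores)


flags = [1, 0, 1, 1]  # correctness of the four returned leaves for one query
print(precision_recall(flags, n_relevant=3))
print(average_precision(flags))
print(s_score({"user1": {"plant1": [1, 2]},
               "user2": {"plant1": [1], "plant2": [4]}}))
```

The nested averaging in `s_score` (first over the images of a plant, then over the plants of a user, then over users) is what prevents users or plants with many images from dominating the score on unbalanced datasets.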

Performance on Flavia leaf dataset
The Flavia leaf dataset is widely taken as a benchmark in many classification works because it is very challenging and easily obtainable [1,20]. The dataset contains 1907 leaf images of 32 different species, with 50 to 77 samples per species. Figure 2 shows leaf samples from the Flavia leaf dataset. The leaf classification performance of our method is first evaluated on this dataset. The evaluation metric adopted here is the classification rate, as in references [10,13,31].
All 32 species are used, and each species is randomly split into two subsets: a training subset with 40 samples and a testing subset with 10 samples. Thus 1280 leaves are used for training and 320 leaves for testing each time. The multi-scale shape descriptors of the leaves in the training subset are used to train an SVM model, which is subsequently used to classify the samples in the testing subset. This process was repeated ten times, and the final classification rate is computed by averaging. To make a comprehensive comparison, the classification rates of seven state-of-the-art approaches, including MARCH, MSRA, SIFT [15], HOG [17], GIST [1] and Deep CNN [33], are listed in Table 1. It is worth noting that both MARCH and MSRA are first-class multi-scale leaf descriptors [10,13]. In reference [10], MARCH is based on the absolute value and sign of the arch height function, and is fused with three existing global features in the leaf matching procedure for better classification and retrieval performance. Similarly, the MSRA descriptor is combined with two existing global contour features to improve the matching results [13]. Hence, to make a fair comparison, the global features and the sign information are not used in the matching stage for any of the three methods in the experiments. Table 1 shows that our method achieves better classification performance than these notable descriptors. Specifically, the proposed method achieves a 93.56% classification rate, which is 7.93% higher than MARCH and 7.0% higher than MSRA. It is also 5.64% higher than the Deep CNN, which is one of the most important deep learning methods, and 4.5% higher than the closely related Unified method. These results suggest that the proposed method is very effective. Leaf image retrieval performance is also evaluated on the Flavia leaf dataset, with each of the 1907 leaf images used as a query leaf to compute the final MAP score and S-score.
In addition to MARCH and MSRA, the scores of six other state-of-the-art approaches, including the Riemannian elastic metric, the Unified method [28], TAR [27], MDM-RM [26], IDSC [29] and Shape Contexts, are listed in Table 2 for comparison, where "-" denotes that the data was not recorded in the references.
The proposed method achieves the best retrieval performance among these notable methods. Table 2 shows that the MAP score of our method is the highest among these state-of-the-art approaches. For example, the MAP score of the proposed method is 0.12% higher than that of the second-best method, Shape Contexts, 1.33% higher than the Unified method and 2.92% higher than MSRA. The same table also shows that the proposed method achieves a higher S-score than the Unified method, MSRA and MARCH. The major reason is that the MSRA descriptor fails to capture local variations well because it is affected by the centroid parameter. As for the Unified method, the leaf contour is first trisected, whereas in the proposed method the leaf contour is quartered. More fine scales are therefore employed in the proposed method, and they can describe the local information of the contour. On the whole, the performance of the proposed method is very promising because it can describe the hierarchical information of a leaf, ranging from global to local details, by using the novel scale generation rule.

Performance on Swedish leaf dataset
The Swedish leaf dataset is a widely used benchmark for the leaf shape retrieval task, and it comes from a leaf classification project [34]. It contains 1125 isolated leaves in total, in 15 different categories with 75 samples per category. The dataset is also very challenging for leaf classification and retrieval owing to high inter-class similarity and large intra-class difference. Samples from each category are shown in Figure 3. The leaf classification rate of our method is first measured on this dataset. The same training-testing proportion as in reference [10] is adopted. All 15 categories are used, and each category is randomly split into two subsets: a training subset with 25 samples and a testing subset with 50 samples. An SVM model is trained and subsequently used to classify the samples in the testing subset. The classification experiment was also repeated 10 times to obtain the final averaged classification rate. The classification rate and feature dimension of our method are listed in Table 3 and compared with multiple top-ranking approaches, including the well-known IDSC, MCC, MDM, the shortest path texture context (SPTC) [29], TAR, the multi-feature fusion method, MSRA and MARCH. Table 3 demonstrates that the proposed method obtains the highest classification rate on the Swedish leaf dataset, at 96.85%. In terms of classification performance, it is 0.88% higher than the second-best method, TAR, 2.72% higher than the well-known IDSC, 3.65% higher than MARCH and 4.98% higher than MSRA. These results show that the proposed method is better than both similar methods and classical methods.
The feature dimensions of these methods are also listed in Table 3 for comparison, because feature dimension is a very important indicator of storage and computation time. Table 3 shows that the proposed method has the advantage of reducing storage usage and computation time. The feature dimension of the proposed method is 42, which is much lower than that of TAR, MCC or IDSC. It is also smaller than that of MSRA and MARCH. The Swedish leaf dataset is also used to test the leaf image retrieval performance of the proposed method. All 1125 leaf images are used as query images to compute the PR curves, the MAP score and the S-score. The PR curves of the proposed method, MSRA and MARCH are presented in Figure 4. The MAP scores and S-scores are given in Table 4.
Figure 4 shows that, at each level of recall, the proposed method obtains higher precision than the state-of-the-art multi-scale leaf descriptors. For example, when the recall value is 20%, the precision of the proposed method is 4.92% higher than that of MSRA and 9.35% higher than that of MARCH. Table 4 shows that the proposed method achieves a higher MAP and S-score than MSRA and MARCH. The average times for the feature extraction and matching stages are also listed in Table 4 for comparison. Retrieval experiments were repeated ten times to obtain the average time required for feature extraction and matching for each method. Table 4 shows that the proposed method requires less computation time for leaf feature extraction and matching.
These results show that our method has better leaf classification and retrieval performances compared with state-of-the-art

Performance on MEW2012 leaf dataset
The 2012 version of the Middle European Woods leaf dataset (termed MEW2012) [35] is used, which contains 9745 leaf images in total. There are 153 leaf classes in MEW2012, with 50 to 99 samples per class. This dataset is more challenging than the Swedish and Flavia datasets for leaf classification because it has not only more leaf classes but also a smaller inter-class distance.
Figure 5 shows nine of the 153 leaf classes with five samples each. The retrieval performance of our method is evaluated using all the leaf images in MEW2012 and compared with the Unified method, MARCH and MSRA. Table 5 shows both the MAP and S-scores of these four descriptors. The table shows that the proposed method achieves a higher MAP and S-score on MEW2012 than all three comparative methods, which is similar to the results on the Swedish and Flavia leaf datasets. Specifically, the proposed method achieves a 44.14% MAP score, which is 3.13% higher than the second-best Unified method, 10.09% higher than MSRA and 13.26% higher than MARCH.

Performance on ImageCLEF 2012 dataset
The last leaf dataset is ImageCLEF 2012 [36]. There are three kinds of image content in the whole dataset: scan, pseudo-scan, and photograph. Following references [10,13], only the "Scan" images are used in the experiments to facilitate performance comparison. The "Scan" subset contains 6630 images belonging to 115 plant species, with 2 to 249 samples per species. This leaf dataset is very challenging because there are more than 100 species with a smaller inter-class distance and large intra-class variations. The evaluation metric defined in [36] is adopted. A total of 1760 images are used as test leaves, and the remaining images are treated as ground truth leaves. The test leaves cover 105 species from 10 users, and the number of individual plants observed by each user ranges from 1 to 41. Figure 6 gives examples from each of the 105 test species. Table 6 lists the S-score of our method and of the compared state-of-the-art approaches, which are MCC, IDSC, TAR, MDM, TAR, FC7 [37], MSRA and MARCH. The S-scores of the compared methods are taken from references [10,13]. The proposed method achieves excellent performance on the ImageCLEF 2012 dataset. The table shows that our method achieves a higher S-score than most of the state-of-the-art methods, second only to MSRA. It is worth mentioning that the S-scores of most approaches on the ImageCLEF 2012 dataset are lower than those on MEW2012 and Flavia, which demonstrates that ImageCLEF 2012 is very challenging.

CONCLUSION
In this paper, a hierarchical leaf descriptor based on a novel scale generation rule is proposed for efficient leaf classification and retrieval. The proposed method can capture leaf features from global to local using angle information. The FFT operation is used to make the descriptor RST invariant and more compact. Four challenging leaf datasets are used to evaluate the leaf classification and retrieval performance, which is also compared with multiple state-of-the-art approaches. The empirical results show that our method achieves excellent performance in terms of classification accuracy, MAP, S-score, and precision-recall results. The proposed method is not suitable for leaves with complex backgrounds, because perfect segmentation of such leaves is a hard task in real-world settings, for example in the presence of occlusions.
In the future, we are interested in improving leaf recognition and retrieval using both the shape and vein features of leaves, because venation is a unique morphological characteristic of leaves. Deep learning methods are also helpful for extracting leaf vein features.