A root auto tracing and analysis (ARATA): An automatic analysis software for detecting fine roots in images from flatbed optical scanners

Buried scanners are often used to study fine root dynamics by continuously imaging roots at a fixed position. Accordingly, software tools have been developed to help operators quantitatively analyse fine roots in scanned images. However, image processing is still time‐consuming. Deep learning has achieved impressive results in recognising objects at the pixel level. In this study, we attempted to automate the image analysis of fine roots using a convolutional neural network. Using ARATA (A Root Auto Tracing and Analysis), we succeeded in extracting fine roots from scanned images and calculated the projected area of fine roots to follow their long‐term dynamics. Our software enables the automatic processing of scanned images acquired at various study sites and accelerates the study of fine root dynamics over extended time periods.


| INTRODUCTION
Fine root dynamics form an important component of the forest carbon cycle, and data on them can meaningfully improve ecosystem and terrestrial biosphere models (McCormack et al., 2017). Because of its high turnover rate, fine root production accounts for 15%–70% of net primary production (Matamala et al., 2003). Fine roots have morphological, architectural, physiological and biotic traits (McCormack et al., 2017), and these functional traits show plasticity in adapting to environmental conditions. To capture root dynamics continuously, rhizotrons (Bates, 1937), minirhizotrons (Böhm, 1974) and root scanners (Dannoura et al., 2008, 2012) have been developed as tools for acquiring time series of images. The series of images taken by a camera or scanner are then processed manually to trace the roots and measure their length and projected area using software such as WinRHIZO Tron MF (Regent Instruments Inc., Canada). However, the results differ depending on the operator and on time constraints, which indicates that these methods have low reproducibility (Kume et al., 2018). In addition, manual image analysis is time- and labour-intensive, depending on the complexity of the root system. As image acquisition becomes more automated and the number of images to be processed increases, software has been further developed to automate processing.
Currently, convolutional neural networks (CNNs) based on deep learning algorithms are applied in various fields (Li et al., 2020), including root image processing (Smith et al., 2020; Teramoto & Uga, 2020; Wang et al., 2019). In this study, we extracted roots from scanned images with high accuracy using the deep learning model DeepLabv3+ (Chen et al., 2018). Accurately identifying roots pixel by pixel, so that the projected root area can be quantified, requires a precise feature map. Fully convolutional networks (FCNs) (Long et al., 2015) made it possible to obtain a detailed output feature map and to make pixel-by-pixel predictions. An FCN removes all fully connected layers from a conventional CNN and uses deconvolution layers to expand the feature map, which shrinks during convolution, so that the output feature map has the same size as the input. DeepLabv3+ incorporates techniques such as atrous convolution and atrous spatial pyramid pooling, which enlarge the receptive field while keeping the computational cost low (Chen et al., 2018; Zhao et al., 2016). In this study, we developed software that uses DeepLabv3+ for semantic segmentation of fine roots to extract the root area from scanned images of soil.
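The receptive-field argument can be made concrete with a toy atrous (dilated) convolution. A k × k kernel with dilation rate r covers a window of k + (k − 1)(r − 1) pixels per side, but it still has only k² weights, so the receptive field grows without adding parameters. The NumPy sketch below is illustrative only and is not ARATA's code. The function names are hypothetical. In PyTorch, the same effect comes from the `dilation` argument of `torch.nn.Conv2d`.

```python
import numpy as np


def effective_kernel_size(k: int, rate: int) -> int:
    """Side length spanned by a k x k kernel whose taps are `rate` pixels apart."""
    return k + (k - 1) * (rate - 1)


def atrous_conv2d(image: np.ndarray, kernel: np.ndarray, rate: int) -> np.ndarray:
    """Single-channel atrous (dilated) cross-correlation with zero 'same' padding.

    Assumes a square kernel with an odd side length, so the output keeps the
    height and width of the input.
    """
    k = kernel.shape[0]
    pad = (effective_kernel_size(k, rate) - 1) // 2
    padded = np.pad(image.astype(float), pad)  # zero padding on all sides
    h, w = image.shape
    out = np.zeros((h, w))
    for i in range(k):
        for j in range(k):
            # Each tap reads the image shifted by a multiple of `rate` pixels.
            out += kernel[i, j] * padded[i * rate : i * rate + h, j * rate : j * rate + w]
    return out


# An impulse shows which input pixels contribute to the output.
impulse = np.zeros((21, 21))
impulse[10, 10] = 1.0
response = atrous_conv2d(impulse, np.ones((3, 3)), rate=6)
print(effective_kernel_size(3, 6))  # 13: a 3 x 3 kernel spanning a 13 x 13 window
print(np.count_nonzero(response))   # 9: still only nine weights are involved
```

Setting `rate=1` recovers an ordinary 3 × 3 convolution. Raising the rate spreads the same nine taps further apart, and this is the mechanism atrous spatial pyramid pooling uses at several rates in parallel.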

| Model structure
Our network model, ARATA, uses DeepLabv3+ with an Xception backbone (Chen et al., 2018; Chollet, 2017) (Figure 1). The encoder processes the input image and encodes the semantic information needed to recognise groups of pixels that form a characteristic category, such as roots. This information is pooled in the atrous spatial pyramid pooling module (Figure 1, left side). The decoder then decodes it and outputs, for each pixel, the probability that the pixel belongs to a root (Figure 1, right side). In our model, we added up-sampling layers using a pixel shuffler and a 3 × 3 convolution layer before the output layer (grey boxes in Figure 1).
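A pixel shuffler (sub-pixel convolution) up-samples by moving channels into space. A tensor with C·r² channels and spatial size H × W becomes a tensor with C channels and size rH × rW, and no interpolation is involved. The NumPy sketch below reproduces the channel-to-space ordering of `torch.nn.PixelShuffle`. The number of such layers in ARATA and their upscale factors are not stated here, so `r = 2` is only an example.

```python
import numpy as np


def pixel_shuffle(x: np.ndarray, r: int) -> np.ndarray:
    """Rearrange (N, C*r*r, H, W) into (N, C, H*r, W*r), as torch.nn.PixelShuffle does."""
    n, c_r2, h, w = x.shape
    if c_r2 % (r * r):
        raise ValueError("channel count must be divisible by r * r")
    c = c_r2 // (r * r)
    # Split channels into (C, r, r) sub-pixel offsets, then interleave them with H and W.
    x = x.reshape(n, c, r, r, h, w)
    x = x.transpose(0, 1, 4, 2, 5, 3)  # -> (N, C, H, r_row, W, r_col)
    return x.reshape(n, c, h * r, w * r)


features = np.arange(16).reshape(1, 4, 2, 2)  # four channels of 2 x 2
upsampled = pixel_shuffle(features, r=2)
print(upsampled.shape)  # (1, 1, 4, 4)
```

Each output pixel at (h·r + i, w·r + j) takes its value from channel i·r + j at (h, w). A convolution placed before the shuffle can therefore learn the fine detail that appears after up-sampling.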
During training, we used the accelerated gradient method of Nesterov (1983) with a momentum of 0.9 and a weight decay of 4 × 10⁻⁵. The learning rate followed the 'poly' policy (Liu et al., 2015):

$$\mathit{lr} = \mathit{lr}_{\mathrm{init}}\left(1 - \frac{\mathit{iter}}{\mathit{max\_iter}}\right)^{p}$$

Here, the initial learning rate $\mathit{lr}_{\mathrm{init}}$ was set to 0.007, $\mathit{iter}$ is the current learning iteration, $\mathit{max\_iter}$ is the maximum number of learning iterations, and the power $p$ was set to 0.94. We ran our software on an NVIDIA GeForce GTX1060 (NVIDIA) and used PyTorch version 1.9.1 as the deep learning framework.
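The 'poly' schedule is a one-line function of the iteration count. The sketch below evaluates it with the constants given above. The total number of iterations is not reported, so `max_iter` here is a hypothetical value.

```python
def poly_lr(iteration: int, max_iter: int, lr_init: float = 0.007, power: float = 0.94) -> float:
    """'Poly' learning-rate schedule: lr_init * (1 - iteration / max_iter) ** power."""
    if not 0 <= iteration <= max_iter:
        raise ValueError("iteration must lie in [0, max_iter]")
    return lr_init * (1.0 - iteration / max_iter) ** power


max_iter = 10_000  # hypothetical; not reported in the text
for it in (0, 2_500, 5_000, 9_999):
    print(it, round(poly_lr(it, max_iter), 6))
```

In PyTorch, a comparable set-up would use `torch.optim.SGD(model.parameters(), lr=0.007, momentum=0.9, nesterov=True, weight_decay=4e-5)` with `torch.optim.lr_scheduler.LambdaLR(optimizer, lambda it: (1 - it / max_iter) ** 0.94)`, stepped once per iteration. `LambdaLR` multiplies the initial learning rate by the returned factor, so it yields the same curve. The scheduler must not be stepped past `max_iter`, because a negative base raised to a fractional power gives a complex number in Python. We are assuming, not confirming, that ARATA's training script is organised this way.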

| Image data
Scanned images for training and validation were captured with flatbed image scanners (GT-S650 and GT-S600; EPSON) at various study sites. Each scanned image covered 216 × 296 mm² (5100 × 7019 pixels) at a resolution of 600 dpi, so one pixel measured 0.0423 mm. The images were saved with JPEG compression.

FIGURE 1 Deep learning network used in ARATA. The grey boxes represent the processes added to the original DeepLabv3+ model. Pixel shuffler and convolution layers were added to generate high-resolution feature maps in the output layers.
Fine roots were manually traced and painted with Paint 3D (Microsoft) on 246 original scanned images. The painted images were stored as binary images (255: root, 0: background). The images were split into 2800 small image patches of 16.9 × 16.9 mm² (400 × 400 pixels). The resulting dataset consisted of 2800 pairs of original and painted binary patches, of which 2520 were used for training and 280 for validation (Figure S1).
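The patch preparation can be sketched as follows. The sketch converts pixels to millimetres at 600 dpi, cuts an image and its mask into aligned 400 × 400 patches, and holds out 10% of the pairs for validation. Non-overlapping tiling, discarding border strips and a seeded random 90/10 split are assumptions. The text does not state how the 2800 patches were chosen (a full tiling of 246 images of 5100 × 7019 pixels would give far more patches) or how they were assigned to training and validation.

```python
import numpy as np

DPI = 600
PIXEL_MM = 25.4 / DPI  # about 0.0423 mm per pixel at 600 dpi


def tile_pairs(image, mask, size=400):
    """Cut an image and its binary root mask into aligned, non-overlapping size x size patches.

    Border strips narrower than `size` are discarded (an assumption of this sketch).
    """
    if image.shape[:2] != mask.shape[:2]:
        raise ValueError("image and mask must have the same height and width")
    h, w = mask.shape[:2]
    pairs = []
    for top in range(0, h - size + 1, size):
        for left in range(0, w - size + 1, size):
            window = (slice(top, top + size), slice(left, left + size))
            pairs.append((image[window], mask[window]))
    return pairs


def split_train_val(items, val_fraction=0.1, seed=0):
    """Randomly hold out a fraction of the items for validation."""
    order = np.random.default_rng(seed).permutation(len(items))
    n_val = round(len(items) * val_fraction)
    val = [items[k] for k in order[:n_val]]
    train = [items[k] for k in order[n_val:]]
    return train, val


print(round(400 * PIXEL_MM, 1))  # 16.9: side length of one 400-pixel patch in mm
train, val = split_train_val(list(range(2800)))
print(len(train), len(val))      # 2520 280
```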
To adapt ARATA to images of different quality, the number of images was increased to approximately 20,000 (7–8 augmented images for each of the 2800 patches) by image augmentation, including flipping and rotation, random scaling and swinging, and the addition of noise and blur (Figure S1). Contrast-limited adaptive histogram equalisation (CLAHE; Pizer et al., 1987) in OpenCV (version 3.3.1, OpenCV.org) was applied to the patches for local contrast correction, with the clip limit and grid size set to 2 and 11, respectively.
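One point that matters in augmentation is that geometric transforms must be applied identically to the image and its root mask, while photometric changes (noise, blur) must affect only the image. The NumPy sketch below shows this pairing. Restricting rotation to multiples of 90°, the noise level, the 3 × 3 box blur and the 50% probabilities are assumptions, and random scaling and swinging are omitted.

```python
import numpy as np


def box_blur(image: np.ndarray) -> np.ndarray:
    """3 x 3 mean filter over the first two axes, with edge replication."""
    pad = [(1, 1), (1, 1)] + [(0, 0)] * (image.ndim - 2)
    padded = np.pad(image, pad, mode="edge")
    h, w = image.shape[:2]
    return sum(padded[dy:dy + h, dx:dx + w] for dy in range(3) for dx in range(3)) / 9.0


def augment_pair(image, mask, rng, noise_sd=5.0):
    """Return one randomly augmented copy of a square uint8 patch and its mask.

    Geometric transforms are applied to image and mask alike so the labels stay
    aligned; noise and blur change only the image.
    """
    k = int(rng.integers(4))                  # rotate by k * 90 degrees
    image, mask = np.rot90(image, k), np.rot90(mask, k)
    if rng.random() < 0.5:                    # horizontal flip
        image, mask = np.fliplr(image), np.fliplr(mask)
    out = image.astype(np.float64)
    if rng.random() < 0.5:                    # additive Gaussian noise
        out = out + rng.normal(0.0, noise_sd, size=out.shape)
    if rng.random() < 0.5:                    # mild blur
        out = box_blur(out)
    return np.clip(out, 0, 255).astype(np.uint8), mask.copy()


rng = np.random.default_rng(42)
patch = rng.integers(0, 256, size=(400, 400, 3)).astype(np.uint8)
label = np.zeros((400, 400), dtype=np.uint8)
label[:, 190:210] = 255                        # a vertical "root"
augmented = [augment_pair(patch, label, rng) for _ in range(8)]
```

For the contrast correction, OpenCV offers `cv2.createCLAHE(clipLimit=2.0, tileGridSize=(11, 11))`, whose `apply` method takes an 8-bit single-channel image. Reading "grid size 11" as an 11 × 11 tile grid is our assumption.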

| Post-processing
Our software provides a function that corrects the extraction result by taking the temporal relationship between successive scanned images into account. We assumed that mature roots do not move unless disturbed by environmental factors, such as scanner displacement, contact with soil animals or movement of soil particles. First, the root probabilities obtained by the CNN for the target image and for the five images before and after it were weighted by a normal distribution centred on the target day. The weighted probabilities were then averaged, and a threshold was applied to the averaged image to classify each pixel as root or background.
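The temporal correction can be sketched as a Gaussian-weighted average of co-registered probability maps, followed by a threshold. The standard deviation of the weights (`sigma`), the threshold of 0.5 and the truncation of the window at the ends of the series are assumptions, and the images are assumed to be already aligned.

```python
import numpy as np


def temporally_smoothed_mask(probs, target, half_window=5, sigma=2.0, threshold=0.5):
    """Classify root pixels on image `target` from a co-registered probability series.

    probs: array of shape (T, H, W) with the CNN root probability of each pixel
    for T consecutive scans. Images within +/- half_window of the target are
    weighted by a Gaussian centred on the target; the window is truncated at the
    ends of the series.
    """
    probs = np.asarray(probs, dtype=float)
    idx = np.arange(max(0, target - half_window), min(len(probs), target + half_window + 1))
    weights = np.exp(-0.5 * ((idx - target) / sigma) ** 2)
    weights /= weights.sum()                              # weights sum to one
    averaged = np.tensordot(weights, probs[idx], axes=1)  # (H, W) weighted mean
    return averaged >= threshold


# Three pixels over 11 scans: a stable root, a one-off false positive, a one-off miss.
series = np.full((11, 1, 3), 0.1)
series[:, 0, 0] = 0.9   # root present in every scan
series[5, 0, 1] = 0.9   # spurious detection on the target scan only
series[:, 0, 2] = 0.9
series[5, 0, 2] = 0.1   # root briefly missed on the target scan
print(temporally_smoothed_mask(series, target=5))  # expected: True, False, True
```

Because a stationary root contributes high probabilities to every image in the window, it survives the threshold even on a scan where the CNN missed it. A detection that appears in only one scan is diluted by its neighbours.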

| Calculation of morphometric parameters
The total projected area (TPA) of the fine roots was calculated from the root area segmented in each scanned image.
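With square pixels of known size, TPA is the number of root pixels multiplied by the area of one pixel. The sketch below assumes the 600 dpi pixel size given above and accepts either boolean masks or the 255/0 binary images used for training. The mask contents are hypothetical.

```python
import numpy as np

PIXEL_MM = 25.4 / 600  # pixel side at 600 dpi, about 0.0423 mm


def total_projected_area_mm2(mask: np.ndarray, pixel_mm: float = PIXEL_MM) -> float:
    """TPA in mm^2: number of root pixels times the area of one pixel.

    Any nonzero value counts as root, so boolean and 255/0 masks both work.
    """
    return np.count_nonzero(mask) * pixel_mm ** 2


mask = np.zeros((1000, 1000), dtype=np.uint8)
mask[100:200, 200:250] = 255  # a hypothetical 100 x 50 pixel root segment
print(round(total_projected_area_mm2(mask), 2))  # 8.96 (mm^2)
```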
Temporal changes in the scanned images that are unrelated to root dynamics appear as noise in the morphometric parameters. We therefore used the ℓ1 trend filter (Kim et al., 2010) to extract the overall trend of TPA. The filter is a least-squares trend-extraction method with an ℓ1-norm penalty on the second differences of the trend, which suppresses outlying values. The trend $\widehat{\mathrm{TPA}}(t)$ was obtained by minimising the cost function

$$F_{\mathrm{TPA}} = \frac{1}{2}\sum_{t=1}^{n}\left(\mathrm{TPA}(t) - \widehat{\mathrm{TPA}}(t)\right)^{2} + \lambda\sum_{t=2}^{n-1}\left|\widehat{\mathrm{TPA}}(t-1) - 2\,\widehat{\mathrm{TPA}}(t) + \widehat{\mathrm{TPA}}(t+1)\right|$$

Here, $t$ and $n$ are the image number and the total number of images, respectively, and $\lambda$ is a regularisation parameter. We set $\lambda = 0.03$ in our analysis.
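The ℓ1 trend filter has no closed-form solution, but the problem is convex. The sketch below minimises ½‖y − x‖² + λ‖Dx‖₁, where D takes second differences, using the alternating direction method of multipliers (ADMM). ADMM is only one possible solver; Kim et al. use a primal-dual interior-point method, and we do not know which solver ARATA uses. The penalty `rho` and the iteration count are arbitrary choices suited to short series such as the 77-image Ryukoku sequence. How strongly λ acts depends on the scale of the TPA values, so λ = 0.03 only has its intended effect under the same scaling as in the original analysis.

```python
import numpy as np


def l1_trend_filter(y, lam, rho=10.0, n_iter=2000):
    """Minimise 0.5 * ||y - x||^2 + lam * ||D x||_1 by ADMM (D = second differences)."""
    y = np.asarray(y, dtype=float)
    n = y.size
    D = np.diff(np.eye(n), n=2, axis=0)               # (n-2, n): rows are [1, -2, 1]
    A_inv = np.linalg.inv(np.eye(n) + rho * D.T @ D)  # fixed matrix of the x-update
    z = np.zeros(n - 2)                               # split variable, z = D x
    u = np.zeros(n - 2)                               # scaled dual variable
    x = y.copy()
    for _ in range(n_iter):
        x = A_inv @ (y + rho * D.T @ (z - u))
        v = D @ x + u
        z = np.sign(v) * np.maximum(np.abs(v) - lam / rho, 0.0)  # soft-thresholding
        u = u + D @ x - z
    return x


t = np.arange(21, dtype=float)
tpa = 2.0 + 0.5 * t        # hypothetical TPA series with a linear trend
tpa_spiky = tpa.copy()
tpa_spiky[10] += 10.0      # one outlying scan
trend = l1_trend_filter(tpa_spiky, lam=50.0)
print(tpa_spiky[10], round(trend[10], 2))  # the outlier is pulled back towards the linear trend
```

Because the penalty acts on second differences, a strictly linear series is left unchanged, and a large λ pushes the solution towards a single straight line. Intermediate values give piecewise-linear trends with a few breakpoints.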

| Evaluation
Ten additional images of 1180 × 1180 pixels from the Ryukoku forest, also hand painted with Paint 3D (Microsoft), were used to evaluate the performance of ARATA (Figure S1). After 100 epochs of training, the performance of the model was measured with the following pixel-wise indices: the true positive rate (TPR), the positive predictive rate (PPR), accuracy, the Matthews correlation coefficient (MCC) and the Dice score. For long-term evaluation of model performance, an additional series of 77 images captured over two and a half years in the Ryukoku forest was used (Figure S1). The automatic extraction results were compared with results extracted manually by two operators using WinRHIZO Tron MF (Regent Instruments).
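The indices named in the Discussion (TPR, PPR, accuracy, MCC and Dice) can all be computed from the pixel-wise confusion matrix of a predicted mask against a manually painted mask. The sketch below uses the standard definitions. The paper's own equations are not reproduced in this text, so their exact agreement with these formulas is an assumption. The counts are converted to Python integers so that the MCC denominator cannot overflow on large images.

```python
import math
import numpy as np


def segmentation_scores(pred, truth):
    """Pixel-wise scores of a predicted binary root mask against a manual reference mask."""
    pred = np.asarray(pred, dtype=bool)
    truth = np.asarray(truth, dtype=bool)
    # Python ints avoid overflow in the MCC denominator for large images.
    tp = int(np.sum(pred & truth))
    fp = int(np.sum(pred & ~truth))
    fn = int(np.sum(~pred & truth))
    tn = int(np.sum(~pred & ~truth))
    nan = float("nan")
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return {
        "TPR": tp / (tp + fn) if tp + fn else nan,  # share of root pixels detected
        "PPR": tp / (tp + fp) if tp + fp else nan,  # share of predicted root pixels that are root
        "accuracy": (tp + tn) / (tp + fp + fn + tn),
        "MCC": (tp * tn - fp * fn) / denom if denom else nan,
        "Dice": 2 * tp / (2 * tp + fp + fn) if tp + fp + fn else nan,
    }


pred = np.array([1, 1, 1, 0, 0, 0, 0, 0, 0, 0])
truth = np.array([1, 1, 0, 1, 0, 0, 0, 0, 0, 0])
print(segmentation_scores(pred, truth))
```

For binary masks, the Dice score equals the F1 score, the harmonic mean of TPR and PPR.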
Furthermore, to assess the effect of the amount of training data, we calculated the relative error (RE) of the TPA of roots extracted by ARATA and compared it between two sites, the Himeji nature observation forest and the Ryukoku forest. The training data contained far fewer images from the Himeji nature observation forest (20) than from the Ryukoku forest (220) (Figure S1). RE was calculated as

$$\mathrm{RE} = \frac{\left|\mathrm{TPA}_{\mathrm{ARATA}} - \mathrm{TPA}_{\mathrm{manual}}\right|}{\mathrm{TPA}_{\mathrm{manual}}} \qquad (2)$$
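The per-site comparison comes down to computing one relative error per image and summarising the errors by their mean and spread. The sketch assumes that RE is the absolute difference between the ARATA and manual TPAs divided by the manual TPA, and that the spread is the sample standard deviation. The TPA values are hypothetical.

```python
import numpy as np


def relative_errors(tpa_arata, tpa_manual):
    """Per-image relative error |TPA_ARATA - TPA_manual| / TPA_manual."""
    a = np.asarray(tpa_arata, dtype=float)
    m = np.asarray(tpa_manual, dtype=float)
    if np.any(m <= 0):
        raise ValueError("manual TPA must be positive")
    return np.abs(a - m) / m


re = relative_errors([110.0, 95.0, 130.0], [100.0, 100.0, 100.0])  # hypothetical TPAs (mm^2)
print(re.mean(), re.std(ddof=1))  # per-site mean and sample standard deviation
```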

| RESULTS
We obtained binary images of extracted root segments from the scanned images (Figure 2). Root segments with high contrast to the background soil were accurately detected (Figure 2a), and they were also detected in low-contrast images (Figure 2b).
However, root-like objects with a shape and colour similar to those of roots (mostly mycelia) were also detected (Figure 2c). We evaluated automatic root detection by ARATA against manual segmentation on 10 scanned images, and the two extraction methods gave similar accuracy (Table 1).
A comparison of the projected areas of the corresponding fine roots on the scanner surface, obtained by manual extraction and by ARATA, showed that the manual results differed slightly between operators (Figure 3). In all cases, the temporal trends of the manual and automatic segmentations were in good agreement. Overall, the results from ARATA were slightly higher than those of manual extraction, but ARATA captured changes in the temporal trends, such as peaks of increase or decrease in root area (marked with inverted triangles). In a few cases, the trends obtained with ARATA differed from the manual extraction results (marked with asterisks). These differences were transient, and the values subsequently returned close to those obtained by manual extraction.
In most cases, fine roots were well extracted from images collected in the Himeji nature observation forest when the image quality was good enough (Figure 4a, HMJ-1), even though the training dataset contained fewer images from this site than from the Ryukoku forest (20 and 220, respectively). However, in a few cases with poor image quality, fine roots could not be extracted at all (Figure 4a, HMJ-2). For the Himeji images, the REs of the ARATA extraction results had a larger mean and a higher standard deviation than for the Ryukoku images (Figure 4b).

| DISCUSSION
Based on the TPR value, approximately 70% of the root pixels in an image were detected, and based on the PPR value, nearly 72% of the pixels judged to be roots were correctly assigned.
Furthermore, the MCC value was as high as 0.699. The Dice score of ARATA was 0.712, higher than that of SegRoot, another CNN-based root segmentation method (0.65; Wang et al., 2019).
These results suggest that our program detects roots reliably. The accuracy was unexpectedly high (0.981), probably because accuracy is not a suitable index for classifying objects whose occurrence rates are strongly imbalanced (Brown, 2018). Roots indeed often occupied less than 5% of the area of a scanned image, so a classifier that labelled every pixel as background would already reach an accuracy of at least 0.95.
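The effect of class imbalance is easy to demonstrate. In the sketch below, roots cover 5% of a hypothetical image, and a trivial model predicts background everywhere. Accuracy is still 0.95, yet the Dice score is 0, because no root pixel is found. MCC is undefined for such a constant prediction because its denominator is zero.

```python
import numpy as np

truth = np.zeros((200, 200), dtype=bool)
truth[:, :10] = True          # roots cover 5% of the image (hypothetical)
pred = np.zeros_like(truth)   # trivial model: every pixel is background

accuracy = np.mean(pred == truth)
tp = np.sum(pred & truth)
dice = 2 * tp / (2 * tp + np.sum(pred & ~truth) + np.sum(~pred & truth))
print(accuracy, dice)         # 0.95 0.0
```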
Roots were detected well in some images (Figures 2 and 4a, HMJ-1) but not in others (Figures 2c and 4a, HMJ-2). Our program falsely detected objects that share characteristics with roots, either because such objects were absent from the training data or because their colour or shape resembled that of roots. Because image quality varies under actual field conditions, images of leaf litter veins, hyphae and earthworms should be included in the training data to improve generalisation. The model performed well on images from the same site as those used for training. Because the soil that forms the background of the images differs from one site to another, we recognise that images from a given site should be used to train the model in order to take full advantage of ARATA's performance at that site (Figure 4b). In the future, this problem could be addressed by adding a package that allows users to train ARATA with images from their own sites. In addition, for long-term image series, the results deteriorated over time as dirt accumulated on the scanner surface (Figure 3). Human observers can still detect roots when the scanner surface is dirty, but ARATA cannot. This may be one of the limitations of automated image analysis software.

FIGURE 2 Scanned images of soil with fine roots and their extraction results. (a) Bright soil in a high-contrast image, (b) low-contrast image, (c) dark soil with hyphae.

TABLE 1 Evaluation results of the automatic detection of roots compared with the manual segmentation

FIGURE 3
Comparison of ARATA and manual extraction results for fine root area obtained from a time series of 77 scanned images in a Quercus serrata stand in the Ryukoku forest. Characteristic temporal changes, such as increases or decreases in fine root area, were correctly captured (inverted triangles). Cases where the trend obtained with ARATA differs from that obtained by manual measurement are marked with an asterisk.

FIGURE 4
Extraction results of fine roots by ARATA for test-site data (Himeji) with little training data. (a) Green lines show the manual extraction result, red lines show the extraction result by ARATA, and yellow segments indicate where the two extractions match. Image HMJ-1 is an example of good extraction by ARATA. Image HMJ-2, which is of poor quality, shows ARATA failing to extract the existing roots. (b) Comparison of the mean relative error (RE) between the Ryukoku site, based on 5 images, and the Himeji site, based on 15 images. The training data were based on 220 original images from Ryukoku but only 20 original images from Himeji.
Among deep learning-based software for root extraction, such as SegRoot (Wang et al., 2019), RootPainter (Smith et al., 2020) and TrenchRoot-SEG (Teramoto & Uga, 2020), ARATA provides a new option for root researchers. ARATA is equipped with a function that corrects positions and extraction results, designed for continuously acquired scanned images and for the spatiotemporal properties of root dynamics. All root extraction software relies on learning from target images, and comparing and examining their performance on data from various test sites remains a challenge for the future. We call for a collaborative effort between developers to tackle this task.

AUTHOR CONTRIBUTIONS
Ikeno wrote the manuscript with feedback from Arata Yabuki. All authors approved the final version of the manuscript for publication.

ACKNOWLEDGEMENTS
We thank the late Prof. A. Osawa for providing the long-term image data and for extensive discussions of this research; K. Hattori, K. Niihara, X. Cheng, R. Minaki and N. Kuwabe for providing the test data; Y. Kameda, C. Tsuji and S. Tsujii for analysing images with WinRHIZO; and R. Nakahata for teaching them how to use this software. We also thank all other members of the forest utilisation laboratory at Kyoto University for their help in the field and for enthusiastic discussions.
We are grateful to the following researchers for allowing us to work at their sites and for their collaboration and support: T. Miyaura and N. Kurachi for the Ryukoku forest; Y. Kominami and T. Miyama for the Yamashiro forest; and M. Ohashi for the Himeji forest and Lambir Hills National Park. We thank D. Epron for valuable support.

FUNDING INFORMATION
Part of the data acquisition was supported by JSPS KAKENHI Grant Number 16H05791.

PEER REVIEW
The peer review history for this article is available at https://publons.com/publon/10.1111/2041-210X.13972.