Automatic localization and segmentation of focal cortical dysplasia in FLAIR‐negative patients using a convolutional neural network

Abstract

Purpose: Focal cortical dysplasia (FCD) is a common cause of epilepsy, and surgery is its only treatment. Therefore, detecting FCD with noninvasive imaging technology can help doctors determine whether surgical intervention is required. Because FCD lesions are small and subtle, diagnosing FCD through visual evaluation of magnetic resonance imaging (MRI) scans is difficult. The purpose of this study is to detect and segment histologically confirmed FCD lesions in normal-appearing fluid-attenuated inversion recovery (FLAIR)-negative images using convolutional neural network (CNN) technology.

Methods: The technique involves training a six-layer CNN, named Net-Pos, which consists of two convolutional layers (CLs), two pooling layers (PLs), and two fully connected (FC) layers, comprising 60 943 learning parameters. We employed activation maximization (AM) with the trained Net-Pos to optimize a series of pattern image blocks (PIBs) most similar to a lesion image block. We then developed an AM and convolutional localization (AMCL) algorithm that applies the mean PIB in a convolution to locate and segment FCD lesions in FLAIR-negative patients. Five evaluation indices, namely, recall, specificity, accuracy, precision, and the Dice coefficient, were used to evaluate the localization and segmentation performance of the algorithm.

Results: The PIBs identified by the trained Net-Pos as most similar to an FCD lesion image block were blocks with brighter central areas and darker surroundings. The technique was evaluated using 18 FLAIR-negative lesion images from 12 patients. The subject-wise recall of the AMCL algorithm was 83.33% (15/18). The Dice coefficient for the segmentation performance was 52.68.

Conclusion: We developed a novel algorithm, referred to as the AMCL algorithm with mean PIBs, to effectively and automatically detect and segment FLAIR-negative FCD lesions.
This work is the first study to apply a CNN‐based model to detect and segment FCD lesions in images of FLAIR‐negative lesions.


| INTRODUCTION
Focal cortical dysplasia (FCD), first described by Taylor and colleagues in 1971, 1 is a malformation of cortical development and one of the most common causes of intractable epilepsy. Currently, the main treatment for drug-resistant epilepsy is surgery, and magnetic resonance imaging (MRI) is usually the ideal tool for detecting FCD.
Although voxel-based morphometry (VBM) and surface-based morphometry (SBM) perform well, the VBM technique employs only a few neurological features and is sensitive to artifacts, and the SBM technique has high computational complexity because of its three-dimensional (3D) surface reconstruction.
In 1998, LeNet-5, a convolutional neural network (CNN) proposed by LeCun et al., 11 was successfully applied to handwritten character recognition. In 2012, AlexNet 12 won an image classification competition by using the large database ImageNet. After AlexNet, a variety of new CNN models were proposed, such as the Visual Geometry Group (VGG) network of the University of Oxford, Google's GoogLeNet, 13 Microsoft's Residual Networks (ResNet), 14 and the Dense Convolutional Network (DenseNet). 15 Preliminary progress in understanding CNNs has also been achieved. Class activation mapping (CAM) 16 and gradient-weighted CAM (Grad-CAM) 17 can recover the localization ability of the convolutional layer (CL) and identify the locations of discriminative feature regions without location supervision. The deconvolution network (Deconvnet) 18 and guided backpropagation (GBP) can visualize the features extracted by each CL. Activation maximization (AM) 19 reveals how neural networks recognize features by generating a pattern image.
CNNs are highly accurate in object detection and segmentation tasks such as pedestrian detection, 20 action detection, 21 object detection, 22 and saliency detection. 23 Current research focuses on natural images, while research in medicine includes tumor detection 24 and tumor segmentation. 25 All of these examples show that CNNs are excellent algorithms for image recognition and segmentation. In epilepsy, CNNs have been employed to detect abnormal electroencephalogram (EEG) signals. 26-28 We identified two studies on FCD lesion recognition and localization using CNNs; both obtained effective results. The first study employed a CNN model to automatically segment FCD lesions in FLAIR images. 29 The second study applied a deep CNN to classify FCD and non-FCD patches in T1 images. 30 Current research is based mainly on either T1 images or T1 images combined with T2 or FLAIR images, and feature extraction is performed manually. The common findings of FCD are cortical or subcortical hyperintensities, which are especially visible in FLAIR images, and manually extracting features that distinguish FCD pixels from normal pixels is difficult. In this study, we propose a novel automated FCD detection and segmentation technique referred to as activation maximization and convolutional localization (AMCL). This technique, which is based on a CNN, is the first to automatically analyze the features of FCD lesions and locate them in FLAIR-negative epilepsy patients. The technique is noninvasive, effective, highly accurate, and inexpensive. It can alleviate uncertainty arising from doctors' subjective experience and the risk of misdiagnosis, and it can assist doctors in performing an effective preoperative evaluation and completely removing the lesion area during the operation to cure the epilepsy. The patient information is summarized in Table 1.

2.B | Methods
This study developed an AMCL method that consists of five steps: image preprocessing, network training, pattern image block (PIB) generation via activation maximization, convolutional localization and segmentation, and quantitative evaluation. To allow a comparison with the Grad-CAM localization method, a simple Grad-CAM procedure is implemented at the end of the proposed method.

2.B.1 | Image preprocessing
Training a CNN requires a large amount of training data; we therefore constructed sufficient training data from a limited number of cases. The dataset construction process is shown in Fig. 1. The FCD regions were manually marked by an epileptologist based on the operation location, as shown by the red area in Fig. 1. First, we extracted the brain region from the original image by performing the skull-stripping operation, resized the brain image to a uniform resolution of 256 × 256 pixels by using bilinear interpolation, and normalized the brain image to zero mean and unit variance. Second, we selected 28 × 28 pixel square blocks from the brain image as the training data, in which the positive image blocks were obtained from the lesion area (as shown in the right part of Fig. 1) and the negative image blocks were obtained from the normal area (as shown in the left part of Fig. 1).
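The resizing and normalization steps above can be sketched as follows; `bilinear_resize` is a hypothetical helper written here for illustration (in practice any imaging library's bilinear resampler would do), and the skull-stripping step is assumed to have already produced the `brain` array.

```python
import numpy as np

def bilinear_resize(img, out_h, out_w):
    """Resize a 2D image with bilinear interpolation."""
    in_h, in_w = img.shape
    # Map output pixel coordinates back into the input grid
    ys = np.linspace(0, in_h - 1, out_h)
    xs = np.linspace(0, in_w - 1, out_w)
    y0 = np.floor(ys).astype(int); y1 = np.minimum(y0 + 1, in_h - 1)
    x0 = np.floor(xs).astype(int); x1 = np.minimum(x0 + 1, in_w - 1)
    wy = (ys - y0)[:, None]; wx = (xs - x0)[None, :]
    top = img[np.ix_(y0, x0)] * (1 - wx) + img[np.ix_(y0, x1)] * wx
    bot = img[np.ix_(y1, x0)] * (1 - wx) + img[np.ix_(y1, x1)] * wx
    return top * (1 - wy) + bot * wy

def preprocess(brain):
    """Resize a skull-stripped brain image to 256 x 256 and z-score
    normalize it to zero mean and unit variance."""
    resized = bilinear_resize(brain.astype(float), 256, 256)
    return (resized - resized.mean()) / resized.std()
```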
The details of the training set are as follows. First, we selected a 28 × 28 block whose center was the first pixel in the upper left corner of the manually labeled lesion area (red), as shown by the blue rectangular block in Fig. 1. Second, we determined whether the block could be considered a lesion image block: if more than four-fifths of the pixels in the block were labeled as lesion pixels, or if the block contained at least four-fifths of the total number of lesion pixels, the block was considered a lesion image block and was output. If neither condition was satisfied, the block window was slid to the right or down to extract the next block for evaluation, until every pixel in the lesion area had been traversed as the center of a block. The number of lesion image blocks obtained from the lesion area was taken as the number of positive image blocks.
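The positive-block selection rule can be sketched as follows. This is a minimal illustration, assuming a binary `lesion_mask` aligned with the preprocessed image; blocks whose window would fall outside the image are simply skipped here, a boundary choice the paper does not specify.

```python
import numpy as np

def extract_lesion_blocks(image, lesion_mask, size=28):
    """Slide a size x size window centered on every lesion pixel and keep
    blocks satisfying the paper's criterion: more than 4/5 of the block's
    pixels are lesion pixels, or the block contains at least 4/5 of all
    lesion pixels."""
    total_lesion = lesion_mask.sum()
    blocks = []
    ys, xs = np.nonzero(lesion_mask)          # candidate center points
    for y, x in zip(ys, xs):
        y0, x0 = y - 13, x - 13               # 28 x 28 window: rows y-13..y+14
        if y0 < 0 or x0 < 0 or y0 + size > image.shape[0] or x0 + size > image.shape[1]:
            continue                          # skip windows leaving the image
        win = lesion_mask[y0:y0 + size, x0:x0 + size]
        n_lesion = win.sum()
        if n_lesion > 0.8 * size * size or n_lesion >= 0.8 * total_lesion:
            blocks.append(image[y0:y0 + size, x0:x0 + size])
    return blocks
```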
The normal image blocks were selected from regions symmetrical to the lesion about the center of the image; the selected areas are N1, N2, and N3. To ensure that the number of positive image blocks equaled the number of negative image blocks, we selected one-third of the negative image blocks from each of the N1, N2, and N3 regions. First, we selected a 28 × 28 block from the N1 area. If the block contained no lesion pixels, it was output as a normal image block; otherwise, the block window was slid to the right or down to extract the next block for evaluation. In selecting negative image blocks, we ensured that no lesion pixels were included in any block.

2.B.2 | Network training

Net-Pos consists of two CLs, two PLs, and two FC layers, including 60 943 learning parameters. The input image of the network is a 28 × 28 image block. The loss for a single sample is the cross-entropy shown in Eq. (1):

J(θ; X, y) = −[y log h_θ(X) + (1 − y) log(1 − h_θ(X))],  (1)

where h_θ(X) is the actual output of the network and y ∈ {0, 1} is the real label.
We establish the sample set {(X_1, y_1), …, (X_m, y_m)}, which contains m input images X. The overall cost function is defined as shown in Eq. (2):

J(θ) = −(1/m) Σ_{i=1}^{m} [y_i log h_θ(X_i) + (1 − y_i) log(1 − h_θ(X_i))].  (2)

To minimize the overall loss function J(θ), the stochastic gradient descent with momentum (SGDM) optimizer is employed in the proposed technique. The network parameters are updated as shown in Eq. (3):

θ_{n+1} = θ_n − lr · ∇_θ J(θ_n) + ρ(θ_n − θ_{n−1}),  (3)

where θ is the parameter vector, n is the iteration number, ρ is the momentum, and lr is the initial learning rate. In our work, ρ = 0.9 and lr = 0.01.
After several iterations, the network tends to converge, and the loss function tends to zero, thereby indicating that the network training is completed; then, image classification or localization can be carried out.
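A minimal numerical sketch of the overall cost in Eq. (2) and the SGDM update in Eq. (3) is given below. A logistic unit stands in for Net-Pos (an illustrative assumption; the real network is the six-layer CNN), and `lr` and `rho` default to the paper's 0.01 and 0.9.

```python
import numpy as np

def cost(theta, X, y):
    """Overall cross-entropy cost J(theta) over m samples (Eq. 2),
    with a logistic model h standing in for Net-Pos."""
    h = 1.0 / (1.0 + np.exp(-X @ theta))
    return -np.mean(y * np.log(h) + (1 - y) * np.log(1 - h))

def grad(theta, X, y):
    """Gradient of J for the logistic stand-in model."""
    h = 1.0 / (1.0 + np.exp(-X @ theta))
    return X.T @ (h - y) / len(y)

def sgdm(theta, X, y, lr=0.01, rho=0.9, iters=500):
    """SGDM update (Eq. 3): each step moves along -lr * grad plus rho
    times the previous update (the momentum term)."""
    velocity = np.zeros_like(theta)
    for _ in range(iters):
        velocity = rho * velocity - lr * grad(theta, X, y)
        theta = theta + velocity
    return theta
```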
In the subsequent image recognition, PIBs are used for template convolution; therefore, we employed two sets of data to train two networks to obtain the optimal PIBs. The two networks share the same structure.

2.B.3 | Activation maximization

The process of obtaining the PIB is shown in steps one to five in Fig. 3. The optimization of the PIB applies Eq. (4):

x* = argmax_x a_{i,l}(θ, x),  (4)

where x* is the final image (or PIB), a_{i,l} denotes the activation value of the ith neuron in layer l, x is the random input image, and θ denotes the fixed parameters of the trained network.
Step 1: Image block preprocessing. First, the input image was set to a random image block of size 7 × 7 and then resized to the CNN input size of 28 × 28. Second, z-score normalization was performed.
Step 2: Forward propagation. The preprocessed image block was input into the trained CNN for forward propagation, and the output is the probability that the image block contains disease.
Step 3: Backpropagation and update. The gradient of the output neuron's activation with respect to the input image block was computed by backpropagation, and the image block was updated along this gradient to increase the activation.

Step 4: Inverse operation. The iterated image block was inversely z-score normalized and resized back, and the 7 × 7 image block was obtained.
Step 5: Upscaling and blurring. The 7 × 7 image block was upscaled with a scale factor of 1.2, and the image block was then smoothed (blur size = 2). Steps 1-4 were repeated eight times to obtain the final noise-free image, which is referred to as the PIB x*.
In this study, we calculate the PIB of the output neuron. We set the output neuron a_{i,l}(θ, x) = −y; the output probability is then one when x* is input into the trained CNN, indicating that x* is 100% a lesion image block. Conversely, when we set a_{i,l}(θ, x) = y, the output probability of x* input into the CNN is zero, indicating that x* is 100% a normal image block. When we input the random image block into the trained CNN, iterative optimization of the block under this objective yields the corresponding PIB.
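The gradient-ascent core of steps 1-5 can be sketched as follows. A single linear-plus-sigmoid unit stands in for the trained Net-Pos (an illustrative assumption with an analytic gradient), and the resizing, upscaling, and blurring steps are omitted for brevity.

```python
import numpy as np

def activation_maximization(w, steps=200, lr=0.5, seed=0):
    """AM sketch (Eq. 4): starting from a random image block x, repeatedly
    move x along the gradient of the output neuron's activation so that
    the 'lesion' probability approaches one. The stand-in network is a
    linear unit with weights w followed by a sigmoid."""
    rng = np.random.default_rng(seed)
    x = 0.1 * rng.normal(size=w.shape)        # small random initial block
    for _ in range(steps):
        z = np.sum(w * x)
        p = 1.0 / (1.0 + np.exp(-z))          # output probability
        # dp/dx = p * (1 - p) * w for the stand-in model
        x = x + lr * p * (1 - p) * w          # ascend the activation
    return x, p
```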

2.B.4 | Convolutional localization and segmentation
Step 6: Template convolution. We employed the mean PIB from four runs of AM to perform a traversal pattern-matching convolution to locate and segment the lesion in the image to be recognized. The mean x* was slid along image I with stride one, and the convolution sum was computed over the whole image, as shown in Eq. (6):

Iconv(i, j) = sum(conv(I(i − 13 : i + 14, j − 13 : j + 14), x*)),  (6)

where I(i − 13 : i + 14, j − 13 : j + 14) extracts the 28 × 28 block centered at pixel (i, j) from the image to be recognized, and i and j index the pixel position along the length and width of the image, respectively.
Step 7: Thresholding. The lesion area is selected from the convolved image with an appropriate threshold: any value below 0.9 · max(Iconv) in the Iconv image is set to zero. This threshold was the most appropriate value obtained via the receiver operating characteristic (ROC) curve.
Step 8: Display. The localization result, the original image, and the ground truth image hand labeled by the epileptologist were merged into one color image to show the lesion. The manually labeled area appears in the red and green channels, and the detected area appears in the red and blue channels.
The whole process is shown in Fig. 3.
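Steps 6 and 7 together can be sketched as follows, with a toy image and a generic template in place of the 28 × 28 mean PIB; the 0.9 fraction is the paper's ROC-derived threshold.

```python
import numpy as np

def amcl_localize(image, pib, thresh_frac=0.9):
    """Slide the PIB over the image with stride 1, summing the elementwise
    product at each position (Eq. 6), then zero out responses below
    thresh_frac * max (Step 7's threshold)."""
    h, w = image.shape
    k = pib.shape[0]                          # 28 for the paper's PIB
    iconv = np.zeros((h - k + 1, w - k + 1))
    for i in range(h - k + 1):
        for j in range(w - k + 1):
            iconv[i, j] = np.sum(image[i:i + k, j:j + k] * pib)
    iconv[iconv < thresh_frac * iconv.max()] = 0.0
    return iconv
```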
Due to the limited number of patients, we employed the leave-one-out cross-validation (LOOCV) strategy to evaluate the proposed method: each patient in a set of N patients is classified by a model trained on the remaining N − 1 patients.
Because the Net-Pos training is based on image blocks extracted from the FLAIR-positive lesion images, the LOOCV strategy is utilized in the localization and segmentation of the FLAIR-positive lesion images. The FLAIR-negative lesion images constitute a new dataset for the Net-Pos network; therefore, the mean PIB is directly applied to locate and segment the FLAIR-negative lesion images.
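The LOOCV loop can be sketched generically as follows; `train_fn` and `test_fn` are placeholders standing in for Net-Pos training and AMCL evaluation, respectively.

```python
def leave_one_out(patients, train_fn, test_fn):
    """LOOCV: evaluate each patient with a model trained on the
    remaining N - 1 patients, returning one result per patient."""
    results = []
    for i in range(len(patients)):
        train_set = patients[:i] + patients[i + 1:]   # leave patient i out
        model = train_fn(train_set)
        results.append(test_fn(model, patients[i]))
    return results
```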

2.C | Grad-CAM localization

Step 1: The gradient of the category score y^c with respect to each feature map (FM) A^k of the last CL was calculated, and the gradients were spatially averaged to obtain the weight α_k^c, as shown in Eq. (7):

α_k^c = (1/Z) Σ_i Σ_j ∂y^c/∂A^k_{ij},  (7)

where y^c is the value of category c before the last activation function; A^k is the kth FM of the last layer of the CL; i and j index the pixels along the length and the width, respectively, of the FM; and Z is the number of pixels in the FM.

Step 2: The final mapping image L^c_Grad-CAM is obtained by the linear combination of the A^k and α_k^c followed by the ReLU activation function, as shown in Eq. (8):

L^c_Grad-CAM = ReLU(Σ_k α_k^c A^k).  (8)

When Output > 0.5, we perform backpropagation and obtain the L^c_Grad-CAM image. Combining the FMs with the gradient of the output with respect to the FMs distinguishes the lesion area from the normal area in the original image as much as possible and highlights the lesion area. The color region in the L^c_Grad-CAM image indicates the probability that the region is recognized as a lesion.
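The Grad-CAM combination of Eqs. (7) and (8) reduces to a few array operations once the feature maps and their gradients are available; the sketch below assumes both are given as (K, H, W) arrays (in practice they come from a backward pass through the trained network).

```python
import numpy as np

def grad_cam(feature_maps, grads):
    """Grad-CAM sketch: the weights alpha_k are the spatial averages of
    the gradients of y^c w.r.t. each feature map A^k (Eq. 7); the map is
    the ReLU of the weighted sum of feature maps (Eq. 8).
    feature_maps, grads: arrays of shape (K, H, W)."""
    alphas = grads.mean(axis=(1, 2))                  # Eq. (7)
    cam = np.tensordot(alphas, feature_maps, axes=1)  # sum_k alpha_k * A^k
    return np.maximum(cam, 0.0)                       # ReLU, Eq. (8)
```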

2.D | Quantitative evaluation
In this study, all images to be recognized contain lesion areas. Because each patient's images are selected in different orientations, the detection of each image can be evaluated subject-wise, and we used the recall to calculate the subject-wise detection accuracy. To measure the robustness of the proposed method, the FCD detection performance was also measured pixel-wise. The specificity, accuracy, recall, and precision were defined as follows:

Specificity = TN/(TN + FP), Accuracy = (TP + TN)/(TP + TN + FP + FN), Recall = TP/(TP + FN), Precision = TP/(TP + FP),

where TP, TN, FP, and FN are the numbers of true-positive, true-negative, false-positive, and false-negative pixels, respectively. To evaluate the segmentation of FCD lesions, we adopted the Dice coefficient, Dice = 2TP/(2TP + FP + FN).
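The pixel-wise metrics above can be computed directly from the confusion counts of a binary detection map against the manual label, as in this short sketch:

```python
import numpy as np

def pixelwise_metrics(pred, truth):
    """Recall, specificity, accuracy, precision, and Dice from the
    pixel-wise confusion counts of a binary prediction vs. ground truth."""
    pred = pred.astype(bool)
    truth = truth.astype(bool)
    tp = np.sum(pred & truth)
    tn = np.sum(~pred & ~truth)
    fp = np.sum(pred & ~truth)
    fn = np.sum(~pred & truth)
    return {
        "recall": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "accuracy": (tp + tn) / (tp + tn + fp + fn),
        "precision": tp / (tp + fp),
        "dice": 2 * tp / (2 * tp + fp + fn),
    }
```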

3.B | Localization results
The detection results of some images are shown in Fig. 6.

3.C | Quantitative results
A quantitative evaluation of Grad-CAM and the AMCL is shown in Table 2. The recall was calculated subject-wise; the specificity, accuracy, recall, and precision were calculated pixel-wise. A lesion was considered detected when the pixel-wise recall exceeded 20%. Although our purpose is to detect and segment the foci in the FLAIR-negative lesion images, we also present the segmentation results of the FLAIR-positive lesion images in the table, which is valuable for comparison.

The detection results in the second to fourth rows of Fig. 6 show that Grad-CAM is not effective, that is, it is unable to detect the lesions, whereas AMCL can identify the FCD lesions. Current MR technology provides excellent detection of FCD lesions that occur in extratemporal locations but is unreliable for detecting most FCD lesions, which predominantly occur in the temporal lobe. 33 The lesions of three patients occurred in the temporal lobe.

The result for one patient, P13, is a special case. The lesion was detected in the axial images but was undetected in the coronal images; the two contrasting images are shown in the last two rows of Fig. 6. The detection of the P13_1 coronal image failed completely, whereas the lesion area could be detected in the P13_2 axial image, although both images were obtained with the same series of MR parameters. The MR findings of patient P13 showed atrophy of some gyri and abnormal sulci in the right frontal lobe. The main reasons for the failure on the P13_1 coronal image are the small intensity change between the FCD pixels and the adjacent normal pixels and the large interclass similarity between the FCD pixels and the surrounding normal region. In the P13_1 coronal image, only tiny highlighted areas exist around the sulcus, and these are easily overlooked. Therefore, some defects remain in the detection of FCD lesion areas in single-direction slices of two-dimensional images.
Currently, the two studies that use CNN technology to detect FCD lesions also process 2D images. 29,30 Theoretically, we could obtain a three-dimensional (3D) pattern image block that is spatially correlated with FCD by training a 3D CNN. After expanding to three dimensions, however, the number of parameters that need to be trained will increase significantly, and numerous 3D training data will be required for the network to converge. Therefore, as the number of samples increases in the future, we will also consider 3D processing, which incorporates spatial neighborhood information and should, in theory, be more accurate.

The first study employed a CNN model to automatically segment FCD lesions in FLAIR images. 29 Because the purpose of that study was segmentation, better segmentation results were achieved. Wang et al. employed a deep CNN to classify cortical patches into FCD and non-FCD patches in T1 images. 30 Neither study mentioned whether the images were FLAIR-positive or FLAIR-negative. Compared with the results of these two CNN algorithms, the proposed AMCL is slightly superior, mainly because all the images in this study are negative lesion images, for which localization and segmentation are difficult. In addition, the lesions segmented by the doctor may not be perfectly accurate; any such inaccuracies may have reduced the results of the quantitative analysis. Therefore, the manual labeling of lesions needs to be further improved.

CONFLICT OF INTEREST
No conflict of interest.

AUTHORS' CONTRIBUTIONS
Cuixia Feng mainly designed and implemented the algorithms and was a major contributor in writing the manuscript. Hulin Zhao mainly sorted out the patient data and manually marked the lesions. Yueer Li was mainly responsible for proofreading the article. Junhai Wen provided overall guidance and full participation. All authors read and approved the final manuscript.