T2‐weighted magnetic resonance imaging texture as predictor of low back pain: A texture analysis‐based classification pipeline to symptomatic and asymptomatic cases

Low back pain is a very common symptom and the leading cause of disability throughout the world. Several degenerative imaging findings seen on magnetic resonance imaging are associated with low back pain, but none of them is specific for the presence of low back pain, as abnormal findings are prevalent among asymptomatic subjects as well. The purpose of this population‐based study was to investigate whether more specific magnetic resonance imaging predictors of low back pain could be found via texture analysis and machine learning. We used this methodology to classify T2‐weighted magnetic resonance images from the Northern Finland Birth Cohort 1966 data into symptomatic and asymptomatic groups. Lumbar spine magnetic resonance imaging was performed using a fast spin‐echo sequence at 1.5 T. A texture analysis pipeline consisting of textural feature extraction, principal component analysis, and a logistic regression classifier was applied to the data to classify them into symptomatic (clinically relevant pain with frequency ≥30 days and intensity ≥6/10) and asymptomatic (frequency ≤7 days, intensity ≤3/10, and no previous pain episodes in the follow‐up period) groups. The best classification results were observed when applying texture analysis to the two lowest intervertebral discs (L4‐L5 and L5‐S1), with accuracy of 83%, specificity of 83%, sensitivity of 82%, negative predictive value of 94%, precision of 56%, and receiver operating characteristic area‐under‐curve of 0.91. To conclude, textural features from T2‐weighted magnetic resonance images can be applied in low back pain classification.


| INTRODUCTION
Low back pain (LBP) is a complex condition in which biological, psychological, and social factors affect both the experience of back pain and the associated disability. 1 LBP is a very common symptom and the leading cause of disability throughout the world. 1 Consequently, LBP accounts for considerable annual costs worldwide when healthcare costs and indirect costs from sick leave are considered. 2 LBP may result from an injury or degenerative process of the lumbar innervated tissues such as facet joints, intervertebral discs (IVDs), ligaments, or muscles. Several studies have shown that LBP is related to annular tears, 3,4 disc height narrowing, 3 facet (or apophyseal) joint degeneration, 5 and endplate lesions such as Schmorl's nodes, fractures, erosion, and calcifications. 6 Magnetic resonance imaging (MRI) is the preferred imaging modality for most spinal diseases, as it allows illustration of vertebrae, IVDs, musculature, nerve roots, foramina, and facet joints with good contrast. 7 MRI studies are often used to confirm IVD herniation, nerve root entrapment, spinal canal stenosis, and more serious pathologies such as trauma or tumor metastases. 8 MRI can also show IVD degeneration and vertebral endplate changes that have been associated with clinically relevant LBP. [9][10][11] However, these abnormalities are also common among asymptomatic subjects: imaging studies have revealed that up to 87% of asymptomatic people have lumbar IVD abnormalities visible on MRI. [12][13][14] Thus, lesions or degenerative changes revealed in MRI studies may not be representative of clinical symptoms. 15,16 LBP that cannot be attributed to any known pathology is called nonspecific LBP.
Despite these diagnostic challenges of LBP, substantial effort has been made to find a connection between LBP and MRI findings. Disc degeneration is categorized into five grades, also known as Pfirrmann grades, correlating loss of IVD signal intensity and height in T2-weighted MRI to progressive degenerative changes. 17 Associations between LBP and degenerative changes seen in T2-weighted MRI have been observed. [18][19][20][21][22] Disc herniation revealed on MRI has been related to LBP in sciatica patients. 23 Multifidus fat infiltrations visible in T2-weighted MRI have been strongly associated with ever having LBP and leg pain. 24,25 The so-called Modic changes categorize degenerative changes in vertebral endplate and bone marrow into three types, 26 and a significant association has been found between them and pain as well. 27,28 Furthermore, correlation between Modic changes and Pfirrmann grades of disc degeneration has been observed. 29
Radiomics stands for the extraction of quantitative features from radiographic medical images. This process consists of image acquisition, reconstruction, segmentation, extraction of features, and building a data analysis pipeline for the given task. Radiomics studies are often conducted by means of texture analysis (TA), which refers to the characterization of images by their texture content. TA encodes images into feature vectors that characterize image properties such as roughness or smoothness by analyzing spatial variation in pixel intensities. Usually, a large number of features is collected, and afterward either the most important ones are selected by statistical methods or the data are transformed into a lower-dimensional space by dimensionality reduction techniques such as principal component analysis, to avoid the so-called "curse of dimensionality." Statistical measures or machine learning methods can then be applied to classify these feature data.
Recent scientific contributions to using this methodology in spinal MRI include, but are not limited to, assessment of fatty infiltrations in paraspinal musculature, 30 could classify lumbar MRI data to symptomatic and asymptomatic cases. Furthermore, our aim is to compare predictive ability between (1) discs and vertebral bodies and (2) upper and lower lumbar levels.
Finally, we aim to project our results to a subset of data exhibiting nonspecific LBP symptoms. Such AI-enhanced analysis could be beneficial in the ever-increasing flow of radiological images in terms of time and resource savings.
The level of evidence of this prospective cohort study is 2.

| Data
The  Table 1. Statistical methods used on the data in Table 1 were the independent t test for continuous variables (distribution normality was tested with the Kolmogorov-Smirnov test) and the χ² test (with contingency tables) for binary variables. Figure 1 shows example images of symptomatic and asymptomatic patients with typical spinal degeneration phenotypes seen in MRI.

| Image segmentation with U-net
Before texture analysis, lumbar vertebrae L1…L5 and IVDs L1-L2…L5-S1 were segmented from the MR images (Figure 2). A total of 200 samples were segmented by hand and used to train a U-net, 43 a deep learning convolutional neural network that is known to perform well in segmentation tasks. A subset of 15% of the training data was used for model validation. The U-net comprised five encoding and four decoding layers and was trained for 300 epochs using a combination of binary cross-entropy and Jaccard index (with equal weights) as the loss function (Figure 2). Separate models were trained to segment the vertebrae and IVDs. The trained networks were then used to segment the vertebrae and IVDs from the rest of the MR images.
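One way such a combined loss could be written is sketched below with NumPy. The exact formulation used for training is not given in the text, so the form here (0.5 × binary cross-entropy + 0.5 × (1 − soft Jaccard index)), the function name `bce_jaccard_loss`, and the smoothing constant `eps` are all assumptions made for illustration.

```python
import numpy as np


def bce_jaccard_loss(y_true, y_pred, eps=1e-7):
    """Equal-weight combination of binary cross-entropy and a soft Jaccard loss.

    y_true: binary ground-truth mask (0 or 1).
    y_pred: predicted foreground probabilities in [0, 1], same shape.
    The 0.5/0.5 weighting and the (1 - Jaccard) form are assumptions.
    """
    y_true = np.asarray(y_true, dtype=float)
    # Clip predictions so that log() never receives exactly 0 or 1.
    y_pred = np.clip(np.asarray(y_pred, dtype=float), eps, 1.0 - eps)

    # Pixel-wise binary cross-entropy, averaged over the mask.
    bce = -np.mean(y_true * np.log(y_pred) + (1.0 - y_true) * np.log(1.0 - y_pred))

    # Soft Jaccard index: intersection over union computed on probabilities.
    intersection = np.sum(y_true * y_pred)
    union = np.sum(y_true) + np.sum(y_pred) - intersection
    jaccard = (intersection + eps) / (union + eps)

    return 0.5 * bce + 0.5 * (1.0 - jaccard)


mask = np.array([[0, 1], [1, 1]])
print(bce_jaccard_loss(mask, mask))                     # close to zero
print(bce_jaccard_loss(mask, np.full(mask.shape, 0.5)))  # uninformative prediction
```

The cross-entropy term penalizes each pixel independently, whereas the Jaccard term rewards overlap of the whole mask, which is a common motivation for combining the two when the segmented structure occupies a small part of the image.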

| Feature extraction
The obtained segmentation masks were split to four regions-of-interest features, and local binary patterns. [44][45][46] Images were standardized (to zero mean and unit variance) before feature extraction. In addition to the textural features described above, Modic grading was added to the vertebral features and Pfirrmann grading was added to the IVD features.
A more detailed description of the features can be found in Table 2.
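To illustrate two of the steps named in this section, the sketch below standardizes an image to zero mean and unit variance and then computes a basic local binary pattern (LBP) histogram. It is a simplified example under stated assumptions: the plain 3×3, 8-neighbour LBP variant, the 256-bin histogram, and the function names are chosen for illustration and are not necessarily the configuration used in the study.

```python
import numpy as np


def standardize(image):
    """Rescale an image to zero mean and unit variance."""
    image = np.asarray(image, dtype=float)
    std = image.std()
    if std == 0:
        # A constant image carries no contrast; return zeros instead of dividing by 0.
        return np.zeros_like(image)
    return (image - image.mean()) / std


def lbp_8neighbour(image):
    """Basic 3x3 LBP code (0-255) for every interior pixel of a 2-D image."""
    image = np.asarray(image, dtype=float)
    h, w = image.shape
    center = image[1:-1, 1:-1]
    # Neighbour offsets (row, column), visited clockwise from the top-left corner.
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, 1), (1, 1), (1, 0), (1, -1), (0, -1)]
    codes = np.zeros(center.shape, dtype=np.uint8)
    for bit, (dy, dx) in enumerate(offsets):
        neighbour = image[1 + dy:h - 1 + dy, 1 + dx:w - 1 + dx]
        # Set this bit wherever the neighbour is at least as bright as the center.
        codes |= (neighbour >= center).astype(np.uint8) * np.uint8(1 << bit)
    return codes


def lbp_histogram(image):
    """Normalized 256-bin histogram of LBP codes, usable as a texture feature vector."""
    codes = lbp_8neighbour(standardize(image))
    counts = np.bincount(codes.ravel(), minlength=256).astype(float)
    return counts / counts.sum()


rng = np.random.RandomState(0)
roi = rng.normal(size=(32, 32))  # synthetic stand-in for a segmented region
features = lbp_histogram(roi)
print(features.shape)
```

In practice the histogram would be computed only over pixels inside the segmentation mask of each region of interest; that masking step is omitted here for brevity.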

| Classification
Sklearn (v. 0.21.2) machine learning library was used in Python
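The pipeline described in the abstract (principal component analysis followed by a logistic regression classifier) could be assembled in scikit-learn roughly as follows. This is a minimal sketch on synthetic data: the number of principal components, the train/test split, the solver settings, and the simulated feature matrix are all assumptions for illustration and do not reproduce the study's configuration.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline

# Synthetic stand-in for a (subjects x textural features) matrix.
rng = np.random.RandomState(0)
n_subjects, n_features = 200, 50
latent = rng.normal(size=n_subjects)        # hidden factor driving the label
loadings = rng.normal(size=n_features)      # how each feature reflects that factor
X = np.outer(latent, loadings) + rng.normal(size=(n_subjects, n_features))
y = (latent > 1.0).astype(int)              # imbalanced binary label

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0
)

# PCA reduces the feature space before the logistic regression classifier.
pipeline = Pipeline([
    ("pca", PCA(n_components=10, random_state=0)),        # component count is an assumption
    ("clf", LogisticRegression(solver="lbfgs", max_iter=1000)),
])
pipeline.fit(X_train, y_train)

probs = pipeline.predict_proba(X_test)[:, 1]  # predicted probability of the positive class
auc = roc_auc_score(y_test, probs)
print("ROC-AUC on synthetic data:", auc)
```

Wrapping both steps in a single `Pipeline` ensures that the PCA projection is fitted on the training data only and then reused unchanged on the test data, which avoids information leaking from the test set into the model.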

| Sensitivity analysis for nonspecific LBP
To investigate sensitivity for nonspecific LBP, the classification pipeline was run in two different scenarios. (1) Cases with sciatica symptoms were discarded from the symptomatic group. These cases were identified with the question "Have you had aches in your lower back that are associated with radiating pain or numbness below the knee?" In addition, protrusions and extrusions were evaluated from the MRI data of the symptomatic group.
If a subject had both radiating pain below the knee and a protrusion or extrusion, they were removed from this analysis (referred to as NS1).
(2) Additionally, cases exhibiting Modic 1 or 2 changes (MC) exceeding 25% of the height of the adjacent vertebrae were discarded as the larger MC were more strongly related to clinically relevant LBP in our previous study. 47 This other group of nonspecific symptomatic subjects (referred to as NS2) included 54 cases (11.7%).
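The two exclusion rules can be expressed as simple predicates. The sketch below is illustrative only: the `Subject` record and its field names are hypothetical and do not correspond to actual variables in the cohort data.

```python
from dataclasses import dataclass


@dataclass
class Subject:
    # All field names are hypothetical, introduced only for this example.
    radiating_pain_below_knee: bool   # questionnaire answer
    protrusion_or_extrusion: bool     # evaluated from MRI
    modic_type: int                   # 0 = no Modic change, otherwise 1, 2, or 3
    modic_height_fraction: float      # extent relative to adjacent vertebral height


def in_ns1(subject):
    """NS1: drop subjects with both radiating pain and a protrusion or extrusion."""
    return not (subject.radiating_pain_below_knee and subject.protrusion_or_extrusion)


def in_ns2(subject):
    """NS2: NS1 criteria, and additionally drop Modic 1 or 2 changes above 25% of height."""
    large_mc = subject.modic_type in (1, 2) and subject.modic_height_fraction > 0.25
    return in_ns1(subject) and not large_mc


symptomatic = [
    Subject(True, True, 0, 0.0),     # sciatica with herniation: excluded from NS1 and NS2
    Subject(True, False, 1, 0.40),   # kept in NS1, excluded from NS2 (large Modic 1)
    Subject(False, False, 2, 0.10),  # kept in both (small Modic 2)
]
ns1 = [s for s in symptomatic if in_ns1(s)]
ns2 = [s for s in symptomatic if in_ns2(s)]
print(len(ns1), len(ns2))
```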

| Data preprocessing
An example output from the segmentation network is shown in  (Table 3).

| Classification
Using demographic variables with Modic and Pfirrmann grades in the logistic regression analysis resulted in poor classifier performance, and when textural features were added to the analysis, classification results improved greatly (Table 4, Figure 4). When comparing the classification performance between different ROIs, the best results in the test set were obtained with the ROI containing the two lowest IVDs, with 83% accuracy, 83% specificity, 82% sensitivity, 94% negative predictive value (NPV), and 56% precision (Table 4). Classification with the three uppermost vertebrae yielded a ROC-AUC of 0.78 (Figure 4A), and the two lowest yielded a ROC-AUC of 0.84 (Figure 4B). The ROC-AUC scores for upper (Figure 4C) and lower (Figure 4D) IVDs were 0.76 and 0.91, respectively.
In the sensitivity analysis for nonspecific LBP, in the NS1 group (subjects with sciatica symptoms discarded) all the classification metrics for the lowest two discs improved slightly (0.94 ROC-AUC, 93% specificity, 95% NPV, and 63% precision) apart from sensitivity (71%; Table 5, Figure 5B). The other ROIs exhibited similar improvements (Table 5, Figure 5A-B). In the NS2 group, classification accuracy and ROC-AUC were slightly lower for the lowest discs and slightly higher for the other ROIs (Table 5, Figure 5C, D).
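The reported metrics all derive from the four cells of a binary confusion matrix. The sketch below shows the definitions; the example counts are hypothetical, chosen only so that the resulting values fall near those reported for the two lowest IVDs, and are not taken from the study.

```python
def classification_metrics(tp, fn, fp, tn):
    """Compute common binary classification metrics from confusion-matrix counts."""
    return {
        "accuracy": (tp + tn) / (tp + fn + fp + tn),
        "sensitivity": tp / (tp + fn),   # share of symptomatic cases detected
        "specificity": tn / (tn + fp),   # share of asymptomatic cases detected
        "precision": tp / (tp + fp),     # positive predictive value
        "npv": tn / (tn + fn),           # negative predictive value
    }


# Hypothetical counts for an imbalanced test set (22 symptomatic, 82 asymptomatic).
metrics = classification_metrics(tp=18, fn=4, fp=14, tn=68)
for name, value in metrics.items():
    print(f"{name}: {value:.2f}")
```

With few symptomatic cases relative to asymptomatic ones, even a modest number of false positives lowers precision considerably, while NPV stays high. This is one way a classifier can combine sensitivity and specificity above 80% with precision near 56%.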

| DISCUSSION
In this study, the texture of T2-weighted MR images was analyzed, and a machine learning method (logistic regression) was used to classify textural features according to a binarized pain variable based on a questionnaire. A subsample of N = 518 subjects from the NFBC1966 data was used in this population-based study. Various classification metrics were computed along with ROC analysis to assess the quality of the classifier.
Best classification accuracy (83%) and ROC-AUC (0.91) in the test set were achieved using the two lowest IVDs (Table 4, Figure 4).
The specificity score of 83% suggests that true negatives were relatively well-identified. The sensitivity score of 82% in turn suggests that true positives were also identified by the classification algorithm.
Our results suggest that texture in IVDs L4-L5 and L5-S1 in T2-weighted MRI plays a role in the manifestation of LBP in the data we used. This is also supported by a recent genetic study that showed a strong significant genetic correlation between IVD problems and back pain. 48
To conclude, texture analysis of lumbar MRI data shows promise as a diagnostic tool in the assessment of LBP. This methodology could be used, for example, to identify which tissues and anatomical regions account for the presence of LBP, or to process clearly negative cases to lighten the diagnostic workflow of medical professionals in routine imaging tasks.