Development of a standardized histopathology scoring system for intervertebral disc degeneration and regeneration in rabbit models: An initiative of the ORS Spine section

Abstract

Background: The rabbit lumbar spine is a commonly utilized model for studying intervertebral disc degeneration and for the preclinical evaluation of regenerative therapies. Histopathology is the foundation on which alterations to disc morphology and cellularity with degeneration, or following repair or treatment, are assessed. Despite this, no standardized histology grading scale has yet been established for the spine field for any of the frequently utilized animal models.

Aims: The purpose of this study was to establish a new standardized scoring system to assess disc degeneration and regeneration in the rabbit model.

Materials and Methods: The scoring system was formulated following a review of the literature and a survey of spine researchers. Validation of the scoring system was carried out using images provided by 4 independent laboratories, which were graded by 12 independent graders of varying experience levels. Reliability testing was performed via the computation of intraclass correlation coefficients (ICC) for each category and the total score. The scoring system was then further refined based on the results of the ICC analysis and discussions amongst the authors.

Results: The final general scoring system involves scoring 7 features (nucleus pulposus shape, area, cellularity and matrix condensation, annulus fibrosus/nucleus pulposus border appearance, annulus fibrosus morphology, and endplate sclerosis/thickening) on a 0 (healthy) to 2 (severe degeneration) scale. ICCs demonstrated overall moderate to good agreement across graders. An addendum to the main scoring system is also included for use in studies evaluating regenerative therapeutics, which involves scoring cell cloning and morphology within the nucleus pulposus and inner annulus fibrosus.

Discussion: Overall, this new scoring system provides an avenue to improve standardization and to allow a more accurate comparison between labs and a more robust evaluation of pathophysiology and regenerative treatments across the field.

Conclusion: This study developed a histopathology scoring system for degeneration and regeneration in the rabbit model based on reported practice in the literature, a survey of spine researchers, and validation testing.


| INTRODUCTION
The intervertebral discs of the spine are composite, avascular structures that reside between bony vertebral bodies and are responsible for bearing the often high-magnitude loads derived from complex spinal motion during activities of daily living. 1,2 The intervertebral disc is composed of the highly hydrated, proteoglycan-rich nucleus pulposus (NP), surrounded by the annulus fibrosus (AF), which is composed of lamellae of primarily type I collagen fibers with alternating ±30° orientation to the transverse plane. 3 Bounding the disc both superiorly and inferiorly are the bony and cartilaginous endplates (EPs), which play a key role in constraining the NP and regulating nutrient transport to and from the avascular disc. 4,5 Intervertebral disc degeneration is associated with a variety of structural, compositional, and mechanical perturbations to all of the disc substructures. 6 Degeneration of the intervertebral discs is considered to be a major contributor to back pain, which has become a primary cause of disability globally. 7,8 Studying the etiology and progression of disc degeneration in humans can be difficult, costly, and time-consuming, and as such, animal models are commonly used to study the progression of this disease or to evaluate the efficacy of new therapeutic interventions to affect disc regeneration. 9,10 The rabbit lumbar spine is a common preclinical model to study disc degeneration and to evaluate therapeutic interventions. 11,12 The rabbit model is cost-effective compared with larger animal models (eg, canine, sheep, goat) and, unlike the commonly utilized rodent tail models, possesses anatomy comparable to that of humans because of the presence of facet joints, paravertebral muscles, and ligaments. Compared to preclinical rodent tail models, the larger size of rabbits also means that surgical intervention can be more precise and device implantation is possible.
Normalized mechanical properties of the rabbit motion segment measured in axial compression and torsion are similar to those of human discs, as is the glycosaminoglycan content of the NP and AF. 13 Geometric properties of the rabbit lumbar disc, including disc height, dorso-ventral width, and NP area deviate by 26% from human geometry, after normalization. 14 Limitations of the rabbit model include the persistence of notochordal cells into adulthood, in contrast to humans, where the notochordal cell population is largely lost in adolescence. 15 Additionally, unlike mouse and rat models, well-established methods to assess pain in rabbit models of disc degeneration do not yet exist.
Histology is a common experimental endpoint for studies of disc degeneration and regeneration, yet standardized histopathology grading systems for animal models and humans do not yet exist for the spine field as they do, for example, in the cartilage field with the Osteoarthritis Research Society International (OARSI) score. 16 The lack of widespread adoption of a histologic scoring system in the field renders comparison of results across studies from multiple groups difficult. The purpose of this work was to establish and validate a standardized histological scoring system for grading intervertebral disc degeneration and regeneration in the rabbit model. To do so, we performed three studies: (a) a literature review of published papers utilizing the rabbit model, (b) a survey of spine researchers, and (c) validation of the new scoring system.

| Literature review
A PubMed search through 31 December 2019 was conducted using the search term "intervertebral disc degeneration rabbit model." Papers not published in English were excluded, as were papers with no in vivo component, such as those utilizing computational methods, or in vitro or ex vivo cell or organ culture models. The full texts of the included studies were obtained and reviewed for the type of histology performed, the stains utilized, whether histology grading was performed and which grading scheme was used, and which additional outcome measures were reported.

| Survey study
Based on the results of the literature review, a survey was generated in Google Forms to query spine researchers about their current practices for histological processing of rabbit discs and their opinions on which categories would be important to include in a consensus scoring system. A copy of the survey is included in the supplemental information.
TABLE 1 The scoring system drafted after review of the literature and collation of the survey responses, which was utilized for the validation studies

| Scoring system validation and refinement
Based on the literature review and the survey results, a scoring system for disc degeneration and regeneration was drafted (Table 1). NP cell number was replaced in the regeneration scoring system with cell cloning and cell morphology within the NP and inner AF. The degeneration scoring system was designed for use in studies investigating only a degenerative response, such that a higher score indicates a more degenerative disc. The regeneration scoring system was designed for use in studies investigating a regenerative therapeutic in comparison to a sham treatment (eg, saline injection), such that a higher score would indicate a more degenerative disc with no repair, while a lower score would be indicative of repair processes or a healthier disc.
Rabbit histopathology images were collected from four groups in the field who have used the rabbit model to study disc degeneration (n = 22 slides, from injury models such as puncture or nucleotomy) or to evaluate repair or regeneration (n = 19 slides, from growth factor treatment studies). Slides were sagittal sections stained with Hematoxylin and Eosin (H&E), Alcian Blue and Picrosirius Red, or Safranin-O and Fast Green. Images were collected, blinded, and distributed to 15 individual graders for scoring. Graders were asked to self-classify as "expert" (having at least 2 years of experience with rabbit histopathology grading) or "novice". Scoring was completed by five expert and seven novice graders. Graders scored the images twice, with 1 week in between scoring sessions to assess intra-rater reliability. Intra- and inter-rater reliability for each scoring category, as well as total score, were assessed by calculating intraclass correlation coefficients (ICC) in R (https://www.r-project.org/). ICC values above 0.9 were considered to indicate excellent agreement, whereas values between 0.75 and 0.9 were considered to indicate good agreement, values between 0.5 and 0.75 moderate agreement, and values below 0.5 poor agreement. 17

FIGURE 1 Results of the literature review summarizing the number of papers (as indicated next to the color legend) utilizing different A, mechanisms of inducing disc degeneration; B, metrics for assessing disc degeneration or regeneration; C, existing histology scoring systems; and D, histologic stains

Based on the ICC results from the draft scoring system, the language and example images in the scoring system were further refined.
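As an illustration of the reliability analysis described above, the following is a minimal sketch of an ICC computation and of the agreement bands used to interpret it. The study computed ICCs in R and does not state which ICC form was used; the two-way random-effects, absolute-agreement, single-rater form (often written ICC(2,1)) shown here is an assumption, as are the function names and the treatment of values falling exactly on a band boundary.

```python
def icc_2_1(ratings):
    """ICC(2,1): two-way random effects, absolute agreement, single rater.

    ratings: list of rows, one row per image and one column per grader.
    The choice of ICC form is an assumption; the paper does not specify it.
    """
    n = len(ratings)                      # number of images
    k = len(ratings[0]) if n else 0       # number of graders
    if n < 2 or k < 2 or any(len(row) != k for row in ratings):
        raise ValueError("need a complete n x k table with n >= 2 and k >= 2")

    grand = sum(sum(row) for row in ratings) / (n * k)
    row_means = [sum(row) / k for row in ratings]
    col_means = [sum(row[j] for row in ratings) / n for j in range(k)]

    # Two-way ANOVA sums of squares (images x graders, one score per cell)
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for row in ratings for x in row)
    ss_err = ss_total - ss_rows - ss_cols

    msr = ss_rows / (n - 1)               # between-image mean square
    msc = ss_cols / (k - 1)               # between-grader mean square
    mse = ss_err / ((n - 1) * (k - 1))    # residual mean square

    denom = msr + (k - 1) * mse + k * (msc - mse) / n
    if denom == 0:
        raise ValueError("ICC is undefined when all ratings are identical")
    return (msr - mse) / denom


def agreement_band(icc):
    """Map an ICC to the bands given in the text.

    Boundary handling (0.75 counted as good, 0.5 as moderate) is an assumption.
    """
    if icc > 0.9:
        return "excellent"
    if icc >= 0.75:
        return "good"
    if icc >= 0.5:
        return "moderate"
    return "poor"


# Hypothetical grades (0-2) for five images scored by three graders
example = [[0, 0, 1], [1, 1, 1], [2, 2, 1], [0, 1, 0], [2, 2, 2]]
print(agreement_band(icc_2_1(example)))
```

For intra-rater reliability, the same function could be applied to a table whose two columns hold one grader's scores from the first and second sessions.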

| Ethics statement
For the survey study, the distribution and collection of the survey was deemed exempt research by the Corporal Michael J. Crescenz Vet-

| RESULTS

| Literature review
The initial PubMed search yielded 236 manuscripts. Ninety-one papers fell within the exclusion criteria, and the remaining 145 papers were further reviewed. This literature search identified several methods of inducing disc degeneration in the rabbit model (Figure 1A), the most common of which was needle puncture, which accounted for 60% of all papers reviewed. Within the puncture models, the most common needle gauge utilized to induce degeneration was an 18G needle (26% of papers). Additional common methods of inducing degeneration included annular stab or defect creation, NP removal or aspiration, or mechanical loading. Less frequently used models, included in the "other" category, were EP disruption or injury, nicotine administration, aging, gene knockout, chemonucleolysis, or injection of a catabolic agent.

FIGURE 2 Results of the survey of spine researchers indicating opinions on which A, histology stain should be used for rabbit histopathology scoring and B, importance (1 = low importance, 5 = high importance) of including different categories in the new scoring system. Survey respondents were asked to select all features they considered important to consider when scoring, within the broad categories of C, AF/NP cellularity; D, NP morphology; E, Endplate; and F, AF morphology

between the annulus fibrosus and nucleus pulposus," "nucleus pulposus cellularity," and "nucleus pulposus matrix". In the Masuda scoring system, each category is scored from 1 to 3 points, such that a minimal score of 4 represents no degenerative changes and a maximum score of 12 represents severe degeneration.
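The arithmetic behind the Masuda score range can be sketched as follows. This is only an illustration, not an implementation from Masuda et al; the category count of four is inferred from the stated minimum (4) and maximum (12) totals, and the function name is hypothetical.

```python
MASUDA_CATEGORY_COUNT = 4   # inferred: minimum total 4 = 4 x 1, maximum total 12 = 4 x 3
MASUDA_GRADES = (1, 2, 3)   # each category is scored from 1 to 3 points


def masuda_total(grades):
    """Sum per-category grades into a total score after basic validation."""
    if len(grades) != MASUDA_CATEGORY_COUNT:
        raise ValueError("expected one grade per category")
    if any(g not in MASUDA_GRADES for g in grades):
        raise ValueError("each grade must be 1, 2, or 3")
    return sum(grades)


print(masuda_total([1, 1, 1, 1]))  # no degenerative changes
print(masuda_total([3, 3, 3, 3]))  # severe degeneration
```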

| Survey study
Sixteen responses were received for the rabbit histopathology survey.
Survey respondents indicated that paraffin processing (87.5% of respondents) should be utilized for rabbit spine histology, with discs sectioned in the sagittal plane (75% of respondents) at <10 μm thickness (100% of respondents) (Figure 2A). Next, respondents were asked to rank categories for histology scoring on a 1 (least important) to 5 (most important) scale. Over 50% of respondents ranked the categories of NP morphology, NP cellularity, AF morphology, AF-NP border, and EP as a 4 or 5. AF cellularity was ranked as the category with lowest importance, with more than 43.8% of respondents rating this category a 2 or 3 (Figure 2B). When asked to designate important features which should be included within each scoring category, most of the features provided as options were selected by respondents (Figure 2C-F).

| Recommendations for processing and use of the scoring system
To increase the consistency between studies across groups, we propose recommendations for histology processing for rabbit spinal motion segments, based on our review of common practices in the literature and among current spine researchers (Figure 3). We recommend that paraffin processing in the sagittal plane be utilized to obtain sections for use with the scoring system, as it allows for the acquisition of many high-quality, thin sections that are ideal for semi-quantitative histologic scoring. This generally involves fixation of the bone-intervertebral disc-bone motion segment (4% paraformaldehyde or 10% neutral buffered formalin), followed by decalcification and processing through paraffin. Mid-sagittal paraffin sections should be cut using a microtome at a thickness < 10 μm.

| Scoring system validation and refinement
For the initial proposed scoring system (Table 1), the ICC values and 95% confidence intervals (CIs) are listed in Table 2 for each scoring category, separated by expert and novice graders. Generally, there was better agreement between expert graders compared to novice graders. Inter-observer ICC values were also generally higher for degeneration scoring compared to regeneration, despite significant overlap in the categories in the scoring systems for degeneration and regeneration. Moderate (0.5 < ICC < 0.75) to good (0.75 < ICC < 0.9) agreement between graders was found for most scoring categories.
Categories with consistently poor agreement (ICC < 0.5) by both expert and novice graders included EP sclerosis/thickening for both degeneration and regeneration scoring systems, AF morphology for the regeneration scoring system, and NP/inner AF cell cloning and cell morphology for the regeneration scoring system. Intra-observer reliability indicated that the reproducibility of scoring with this system was good to excellent among all graders, with ICCs ranging from 0.81 (95% CI: 0.59-0.92) to 0.99 (95% CI: 0.97-1.0).
Based on the ICC results, and following discussion among the authors, the language of the scoring systems and the example images provided were refined to further clarify the features to be scored in the categories with poor ICC and maximize the usability of the scoring system. The final scoring system is summarized in Table 3 and example images most representative for each feature are provided in Figure 5. This main scoring system, yielding total scores of 0 (healthy) to 14 (severely degenerative), may be utilized for any study in the rabbit model, including induced degeneration and regeneration studies.
The cellular morphology observed during regeneration is typically characterized by extensive cell cloning (clusters of 4 or more cells), where cells have a rounded morphology with intense pericellular matrix staining. 18 This regenerative response is characterized by inherently different cellular morphology than observed in a normal, healthy rabbit disc, and thus cannot be captured with the same scoring system as for changes with degeneration compared to healthy controls. Therefore, we also propose an addendum to the main scoring system to be utilized only in studies involving comparison of a regenerative treatment to a sham treatment (Table 4, Figure 6). This repair score, yielding total scores of 0 (robust repair) to 4 (no repair), can be reported independently from the main score, or summed with the main score, yielding total scores from 0 to 18.
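The score ranges described above follow from summing per-feature grades, which can be sketched as below. The seven main features are those listed in the abstract; treating the repair addendum as two items (cell cloning and cell morphology), each graded 0 to 2, is an assumption inferred from the stated 0 to 4 range, and the function and variable names are hypothetical.

```python
MAIN_FEATURES = (
    "NP shape",
    "NP area",
    "NP cellularity",
    "NP matrix condensation",
    "AF/NP border appearance",
    "AF morphology",
    "EP sclerosis/thickening",
)
# Assumption: two addendum items, each 0-2, giving the stated 0-4 range
REPAIR_FEATURES = ("cell cloning", "cell morphology")
ALLOWED_GRADES = (0, 1, 2)


def total_score(grades, features):
    """Sum the grades for the given features; every feature must be graded 0, 1, or 2."""
    missing = [f for f in features if f not in grades]
    if missing:
        raise ValueError(f"missing grades for: {missing}")
    for f in features:
        if grades[f] not in ALLOWED_GRADES:
            raise ValueError(f"grade for {f!r} must be 0, 1, or 2")
    return sum(grades[f] for f in features)


# Hypothetical grades for one disc in a regenerative-treatment study
disc = {f: 1 for f in MAIN_FEATURES}
disc.update({"cell cloning": 0, "cell morphology": 2})

main = total_score(disc, MAIN_FEATURES)      # ranges from 0 to 14
repair = total_score(disc, REPAIR_FEATURES)  # ranges from 0 to 4
print(main, repair, main + repair)           # combined score ranges from 0 to 18
```

Keeping the two totals separate until the final step mirrors the option of reporting the repair score independently from the main score.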

| DISCUSSION
In this study, we have proposed a new standardized histopathology scoring system to assess disc degeneration and regeneration in the rabbit lumbar spine, based on previous literature and current practices by research groups in the field. Seventy-three percent of previously published papers using rabbit models for degeneration or regeneration did not perform histologic scoring in their studies, highlighting the significant need for a simple, standardized histopathology scoring system for rabbit models. As a majority of studies in the rabbit model that perform histology scoring have utilized the grading scheme proposed in 2005 by Masuda et al, 11,19 this scoring system was utilized as the basis of the proposed scoring system. The Masuda scoring system for disc degeneration included categories for characterizing changes to the AF and NP, but did not include scoring of the EP to assess cartilage and bony EP remodeling, or scoring for cellular changes occurring with regeneration which provides information on the biological action of regenerative strategies. Scoring of these features was therefore included in the proposed scoring system, modified from scoring systems utilized by Chujo et al 18 and Ashinsky et al. 20 Because the use of decalcified, paraffin processed sections is recommended for scoring, the endplate features included in the scoring system are focused solely on morphologic changes to this region. 
Researchers interested in

TABLE 3 The final main scoring system after refinement based upon the grader reliability analysis and discussion amongst the authors, to be used for studies of both degeneration and regeneration

Additional categories for grading NP morphology changes (area and shape), previously utilized in the rat model, 21,22 were also included to further increase the resolution of the scoring system for assessing degeneration and regeneration, as NP morphology was ranked by survey respondents as the most important category to be included in a histology grading scheme (Figure 2). Additionally, in the Masuda scoring system, a disc with no degeneration would have a total score of 4, which is less intuitive than a score of 0 for a healthy disc as utilized in the proposed scoring system. The proposed scoring system also allows researchers to quantify more subtle histological changes compared to scoring systems based solely on gross morphology, such as the Thompson grade. 23 Overall, the proposed scoring system provides the capacity to quantify changes across the whole motion segment; however, depending on each particular research question being asked in the rabbit model, adaptation of the scoring system might be necessary to exclude categories not relevant to the study design, or further expand upon existing categories.
Validation of the scoring system was performed by 12 independent graders of varying experience levels, on images collected from different laboratories. No training was given to graders beyond

TABLE 4 The repair scoring system to be used as an addendum to the main scoring system for studies evaluating regenerative therapeutics

These results also point to a need for more extensive training of graders prior to utilizing any histopathology scoring system. Research groups may wish to consider developing a "training set" of images obtained from their laboratory, which can demonstrate to graders how the features to be scored present in their group's model, and with their processing methods. Future efforts by this group will involve the development of virtual content and workshops that may be utilized for training new users of the scoring system. An additional approach which may be considered is scoring by consensus, as has been performed in OARSI scoring of human samples, 22 whereby scoring is conducted as a group and a consensus score is agreed upon by all graders for each image.
The scoring system proposed here is designed to serve as a first step towards standardized outcomes in disc degeneration and regeneration studies in the rabbit model to allow for more accurate comparison between labs and more robust evaluation of pathophysiology and regenerative treatments. We not only anticipate future refinement of the histopathology scoring system itself, but also the establishment of standards for other outcome metrics.

In conclusion, we have developed a new standardized histopathology scoring system for intervertebral disc degeneration and regeneration in the rabbit lumbar spine model. This scoring system was developed based on prior literature as well as input from spine researchers using this model, and was validated with images across multiple laboratories by 12 independent graders. It represents one of several parameters utilized in the assessment of intervertebral disc health in an animal model. As this scoring system is further validated and refined, it will facilitate a more objective comparison of results across laboratories, degeneration models, and regenerative therapies.