Root coverage stability: A systematic overview of controlled clinical trials with at least 5 years of follow‐up

Abstract
Objectives: To systematically assess the long‐term outcome (≥5 years) of root coverage procedures reported in controlled clinical trials.
Material and Methods: Literature search was performed according to the PRISMA guidelines with the following eligibility criteria: (a) English or German language; (b) controlled (CT) or randomised controlled clinical trials (RCT); (c) root coverage procedure with ≥5 years follow‐up; and (d) clinical treatment effect size and/or patient‐related outcome measures (PROMs) reported.
Results: Four CT and 14 RCT with a follow‐up of 5–20 years fulfilled the eligibility criteria; sample size per study ranged from 8 to 70 patients contributing 18–149 sites. Coronally advanced flap (CAF) and CAF + connective tissue graft (CTG) were the prevalent treatments (i.e., in 24 and 38% of the groups, respectively), while other flap designs and adjuncts (i.e., enamel matrix derivative, bone graft, collagen membrane) were represented only once. For single Miller class I/II gingival recessions (GR), CAF + CTG appeared advantageous compared to other techniques, and provided low residual recession depths (i.e., ≤0.5 mm) and complete root coverage in ≥2/3 of the patients; a similar tendency was observed for multiple GR. No data on Miller class III/IV GR are available. No meta‐analysis was feasible due to lack of similarity in the clinical and methodological characteristics across the trials and observed comparisons of interventions.
Conclusions: CAF + CTG appears to be the ‘gold standard’ technique for the treatment of single and multiple Miller class I/II GR also in regard to long‐term (i.e., ≥5 years of follow‐up) treatment outcomes. There is little information regarding the long‐term performance of other techniques and adjuncts.


| BACKGROUND
The most frequent reason for patients to undergo a cosmetic dentistry treatment (AACD, 2013), which includes periodontal plastic surgeries, was 'to improve physical attractiveness and self-esteem'.
Recently, two systematic reviews (including Dai et al., 2019) addressed the effect of time on the stability of root coverage procedures. One of them (Dai et al., 2019) summarised the available literature with at least 24 months follow-up until July 2018. Based on primarily pairwise meta-analyses (i.e., short- vs. long-term and comparisons of different techniques), the results indicated that mean root coverage (RC) worsened over time for CAF, but not for CAF + CTG; further, the complete root coverage (CRC) rate and keratinised tissue width (KTW) were significantly higher at the long-term outcome for CAF + CTG compared to CAF only. CAF + EMD displayed no significant changes in terms of CRC rate when comparing short- versus long-term results. In the other review, the effect of time (but not specifically the long-term effect) was assessed by means of network meta-analyses (NMA), including all studies presenting data for at least two different time points (e.g., after 3 and 12 months). The authors concluded that primarily CTG-based procedures appeared sufficient to achieve stable results over time, while flap only or flap with the addition of EMD or STS showed a tendency for relapse. Unfortunately, all studies presenting data with a follow-up >80 months (i.e., four studies in total) were excluded from this specific analysis. In this context, the 10th European Workshop on Periodontology (Tonetti et al., 2014) and the recently updated Cochrane systematic review (Chambrone, Ortega, et al., 2019) have advocated that 'long-term' should be considered as having at least 5 years follow-up. Hence, no comprehensive summary specifically of long-term outcomes of root coverage procedures (i.e., ≥5 years) exists so far, but it seems relevant considering the relatively high number of recently published individual long-term studies (de Santana et al., 2019; Kroiss et al., 2019; Petsos et al., 2020; Tavelli, Barootchi, Di Gianfilippo, et al., 2019).
Thus, the present systematic review aimed to address the following focused question according to the Population, Intervention, Comparison, Outcomes, Study Design (PICOS) criteria (Miller & Forrest, 2001): 'In patients with single or multiple GR, what is comparatively the long-term outcome (≥ 5 years) of root coverage procedures with a flap alone or flap with adjuncts (soft tissue grafts or substitutes, bone grafts or substitutes, membranes, or biologic agents) and/or different flap designs in terms of clinical outcome parameters?'. Further, the aim was to provide a hierarchy of interventions by means of NMA, wherever possible.

| Protocol and eligibility criteria
The present systematic review was reported according to the criteria of the Preferred Reporting Items for Systematic Reviews and Meta-analyses (PRISMA; Appendix 1; Liberati et al., 2009; Moher et al., 2009) and was registered at PROSPERO (CRD42020165024).
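The following inclusion criteria were applied during the literature search: (a) English or German language; (b) CT or RCT; (c) root coverage procedure with ≥5 years follow-up; and (d) clinical treatment effect size and/or PROMs reported.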

| Data collection and extraction
Two authors (K.B., K.M.) independently checked title, abstract, and finally full-text against the pre-defined eligibility criteria. Abstracts with unclear methodology or follow-up were included in full-text assessment to avoid exclusion of potentially relevant articles. One author (K.B.) repeated the literature search. Kappa scores regarding agreement on the articles to be included in the full-text analysis and those finally chosen were calculated. In case of ambiguity, consensus was achieved through discussion with a third author (A.S.).
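The agreement statistic mentioned above can be illustrated with a minimal sketch. It assumes that Cohen's kappa for two raters was meant (the text only states 'Kappa scores'); the function name `cohens_kappa` and the screening decisions are hypothetical and are not taken from the review.

```python
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters judging the same items (assumed statistic)."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("both raters must judge the same, non-empty set of items")
    n = len(rater_a)
    # Observed proportion of items on which both raters agree.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Agreement expected by chance, from each rater's marginal frequencies.
    freq_a, freq_b = Counter(rater_a), Counter(rater_b)
    expected = sum(freq_a[c] * freq_b[c] for c in set(freq_a) | set(freq_b)) / (n * n)
    if expected == 1.0:
        return 1.0  # both raters used a single identical category throughout
    return (observed - expected) / (1.0 - expected)


# Hypothetical title/abstract screening decisions for ten records.
reviewer_1 = ["include", "include", "exclude", "exclude", "include",
              "exclude", "exclude", "exclude", "include", "exclude"]
reviewer_2 = ["include", "include", "exclude", "exclude", "exclude",
              "exclude", "exclude", "exclude", "include", "exclude"]
kappa = cohens_kappa(reviewer_1, reviewer_2)
print(f"kappa = {kappa:.2f}")
```

Kappa corrects the raw agreement (here 9 of 10 records) for the agreement expected by chance, which is why it is lower than the raw proportion.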
Two authors (K.B., K.M.) extracted the following data twice, at baseline (i.e., before surgery; BL), at an intermediate time-point (IM) [i.e., after 6 or preferably 12 months if available], and at final evaluation (i.e., ≥60 months; FE): recession depth (mm; RD), RD reduction (i.e., BL to FE), RD stability (i.e., IM to FE), CRC (%), CRC stability (i.e., IM to FE), mean RC (%), RC stability (i.e., IM to FE), KTW (mm), KTW increase (i.e., BL to FE), KTW stability (i.e., IM to FE), and probing pocket depth (PD). Further, any evaluation of the aesthetic outcome and/or PROMs, study design, sample size, patient/tooth/GR characteristics, type of intervention, and evaluation time-points were recorded. Finally, a list of potential predictors for the outcome in general and its stability in the long term was created, and the frequency of reporting of each predictor in each paper was extracted: gingival phenotype/thickness, GR width, flap details (i.e., incision design, positioning), CTG details (i.e., donor region, harvesting technique, CTG thickness and coverage), root conditioning, details on any cervical lesion [i.e., detectability of the cemento-enamel junction (CEJ), absence/presence of a cervical step, restoration of any cervical step], time-point of suture removal, and details on the supportive periodontal treatment provided during the follow-up (i.e., interval, surveillance of oral hygiene habits).

| Risk of bias assessment
Two authors (K.B., K.M.) independently evaluated the risk of bias (RoB) of the studies eligible for NMA applying the Cochrane Collaboration's Tool for assessing RoB Version 2 [Cochrane Handbook for Systematic Reviews of Interventions; (Sterne et al., 2019)]. The following domains were evaluated as 'low risk', 'high risk', or 'some concerns': (a) randomisation process; (b) deviations from intended interventions; (c) missing outcome data; (d) measurement of the outcome; and (e) selection of the reported results. The overall risk of bias for an individual study was judged as: 'low risk', if all criteria were evaluated to be of low risk; 'high risk', if at least one criterion was evaluated to be of high risk; 'some concerns', if at least one criterion was evaluated to provide some concerns but no criterion was judged as high risk. One author (K.B.) repeated the assessment, and in case of ambiguity consensus was achieved through discussion with another author (A.S.). Additionally, any report on funding (e.g., self-supported, research grant, industry, etc.) was collected.
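The overall judgement rule described above can be written down as a short sketch. The domain labels follow the text; the function name `overall_rob` and the example judgements are hypothetical and do not reproduce the assessment of any included trial.

```python
DOMAINS = (
    "randomisation process",
    "deviations from intended interventions",
    "missing outcome data",
    "measurement of the outcome",
    "selection of the reported results",
)
LEVELS = {"low risk", "some concerns", "high risk"}


def overall_rob(judgements):
    """Combine the five domain judgements into one overall judgement."""
    missing = [d for d in DOMAINS if d not in judgements]
    if missing:
        raise ValueError(f"missing domain judgement(s): {missing}")
    values = [judgements[d] for d in DOMAINS]
    if any(v not in LEVELS for v in values):
        raise ValueError("judgements must be 'low risk', 'some concerns' or 'high risk'")
    if "high risk" in values:
        return "high risk"       # at least one domain at high risk
    if "some concerns" in values:
        return "some concerns"   # no high-risk domain, at least one with concerns
    return "low risk"            # every domain at low risk


# Hypothetical trial with concerns in a single domain.
example = {d: "low risk" for d in DOMAINS}
example["measurement of the outcome"] = "some concerns"
print(overall_rob(example))
```

The rule is a worst-case aggregation: a single high-risk domain determines the overall judgement regardless of the remaining domains.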

| Synthesis of results
Two primary outcome parameters (i.e., RD and CRC at FE) and several secondary outcome parameters (i.e., RD reduction, RD stability, CRC stability, RC at FE, RC stability, KTW at FE, KTW increase, KTW stability) were defined for statistical analysis. If necessary, outcome parameters were calculated (e.g., RD reduction by subtracting RD at FE from RD at BL, RD stability by subtracting RD at FE from RD at IM, etc.) and/or the authors of the original publications were contacted. Aesthetic outcome parameters, PD at FE, PROMs, and the potential predictors were summarised for overview in tables. Studies on single and multiple GR and/or Miller class I/II and III/IV were not considered comparable. All outcomes were measured using the mean difference, except for CRC, which was measured using the odds ratio in the logarithmic scale (log OR). NMA was intended for each outcome; for details see Appendix 2 (including Figure 1 and Appendix 3).
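The derived outcome parameters and the effect measure for CRC can be sketched as follows. The subtractions follow the definitions given above; the log OR and its standard error use the standard 2 × 2 table formulas without continuity correction, which is an assumption, as is the direction of the ratio (second versus first intervention). All function names and numbers are hypothetical.

```python
import math


def rd_reduction(rd_bl, rd_fe):
    """RD reduction: RD at baseline minus RD at final evaluation (mm)."""
    return rd_bl - rd_fe


def rd_stability(rd_im, rd_fe):
    """RD stability: RD at the intermediate time-point minus RD at final evaluation (mm)."""
    return rd_im - rd_fe


def log_odds_ratio(crc_first, n_first, crc_second, n_second):
    """Log OR of CRC (second vs. first intervention) and its standard error.

    Assumes no zero cells; no continuity correction is applied.
    """
    a, b = crc_second, n_second - crc_second  # second intervention: CRC yes / no
    c, d = crc_first, n_first - crc_first     # first intervention: CRC yes / no
    if min(a, b, c, d) <= 0:
        raise ValueError("zero or negative cell count; formula not applicable")
    log_or = math.log((a * d) / (b * c))
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    return log_or, se


# Hypothetical group means (mm) and CRC counts, not data from the included trials.
reduction = rd_reduction(rd_bl=3.0, rd_fe=0.5)
stability = rd_stability(rd_im=0.7, rd_fe=0.5)
log_or, se = log_odds_ratio(crc_first=10, n_first=20, crc_second=15, n_second=20)
print(f"RD reduction {reduction:.1f} mm, RD stability {stability:.1f} mm, "
      f"log OR {log_or:.2f} (SE {se:.2f})")
```

With these definitions, a positive RD stability means that the recession was shallower at FE than at IM, and a positive log OR means higher odds of CRC for the second intervention.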

| Study selection
The flowchart of the literature search is presented in Appendix 4.

| Study characteristics
An overview of study design, sample size, patient/tooth/GR characteristics, type of intervention, and evaluation time-points is given in Table 1.

| Study populations
The sample size in the various studies ranged from 8 to 70 patients, contributing 18 to 149 sites; the number of patients and sites at FE was always reported, except for a single study (Kroiss et al., 2019).
All participants were judged as healthy or at least as not having any systemic disease that could interfere with periodontal tissue healing; one study (Petsos et al., 2020) did not report on any systemic conditions. Six studies included only non-smokers, 8 studies mixed (former) smokers and non-smokers, and 4 studies did not report in detail on the smoking status. All studies reported loss of study subjects to follow-up, ranging from 0 to 15 patients among studies; one study (Rasperini et al., 2018), which had been originally a multi-centre study (Cortellini et al., 2009), reported the long-term outcome of only one specific centre (i.e., 25 out of original 85 patients).
FIGURE 1 A panel of network plots for the primary outcomes RD and CRC. The nodes refer to the interventions and the lines that link the nodes indicate the observed comparisons. The size of the nodes is proportional to the number of comparisons that include the node. The thickness of the lines is proportional to the number of trials that investigate the corresponding comparison. CRC, complete root coverage; RD, recession depth
The study was initiated as multi-centre study, but for the long-term outcome only the patients of one specific centre were reported (no loss to follow-up for this specific centre).
Authors report in the publication additionally on 14 adjacent sites, which had been treated with CAF only, but these sites have not been included herein.

| Description of defect and site characteristics
Most of the studies (n = 15) included Miller class I or II GR (Miller, 1985). [...] Except for one study (Moslemi et al., 2011) with a relatively high mean residual RD in the CAF + CTG group (i.e., RD 1.83 mm, RC 39.8%), the mean residual RD in the CAF + CTG groups remained ≤0.83 mm with an RC rate ≥75%. In this specific study (Moslemi et al., 2011), the CAF + ADMA group achieved superior results compared to the CAF + CTG group, but the mean residual RD was still high at 1.27 mm, and the RC and CRC rates were low at 54.6 and 20%, respectively. One of the CAF + CTG groups (Paolantonio et al., 1997) presented an improved (i.e., lower) mean residual RD in the long-term follow-up (i.e., comparing IM to FE), while the above-mentioned group treated by CAF + ADMA (Moslemi et al., 2011) lost almost 1 mm in RD in the follow-up after IM, which also resulted in a loss of 33.2 and 53.3% in the RC and CRC rate, respectively. Mean PD at FE remained ≤1.5 mm for all study groups.
For the following groups no single CT/RCT with a long-term outcome was available:
• Single Miller class III/IV GR

| Aesthetic outcome parameters and PROMs
Altogether, 10 out of the 18 studies reported either aesthetic outcome parameters and/or PROMs (Table 4). Based on these data it appears that procedures with CTG as adjunct might be less favourable in terms of colour, texture, and contour compared to the adjacent tissue, in terms of keloid formation, and in terms of patients' preference of the procedure, but patient satisfaction with the outcome was not affected and remained high (>80% in VAS). Further, dentin hypersensitivity showed in general an improvement, but 100% success should not be expected.

| RoB assessment and funding
Out of the seven studies that were originally considered for NMA, two studies (Kuis et al., 2013; Leknes et al., 2005) [...] (Dominiak et al., 2006; Francetti et al., 2018; McGuire et al., 2012; Paolantonio et al., 1997) did not report on any funding, and one study (Leknes et al., 2005) received the product free of charge but otherwise no funding.

| DISCUSSION
The primary aim of GR treatment is CRC, with natural appearance of the tissues and stability of the outcome in the long term. The present systematic review aimed to provide an overview of the available literature on the long-term outcome of root coverage procedures and to provide, if possible, recommendations on which techniques have the highest probability for a successful outcome in the long term.
The results of one of the most recent systematic reviews (Chambrone,

TABLE 4 Aesthetic outcome parameters and PROMs at final evaluation in relation to the gingival recession type
[...] analysis not feasible. Therefore, the overview presented herein is primarily of descriptive nature. When excluding the CAF group of a single study (Leknes et al., 2005) with significantly inferior outcomes compared to what is usually reported (i.e., mean residual RD of about 2.5 mm), the mean residual RD ranged for the CAF and CAF + CTG groups from 0.46 to 1.15 mm and from 0.19 to 0.5 mm, respectively; CRC ranged from 33 to 60% and from 66.7 to 88.2%, respectively. Thus, there seems to be a tendency for more favourable treatment outcomes with CAF + CTG. This is supported by the fact that two CAF + CTG groups (Francetti et al., 2018; Rasperini et al., 2018) showed 'creeping attachment' over time, resulting in a lower mean residual RD at FE compared to IM. In general, except for the above-mentioned CAF group presenting especially poor outcomes (Leknes et al., 2005) [...]
Regarding the treatment of multiple Miller class I/II (and III) GR, the few available studies showed better outcomes for CAF + CTG compared to only CAF (Pini-Prato et al., 2010; Zucchelli et al., 2014) or to ADMA (Kroiss et al., 2019). In particular, CAF groups slightly lost over the years from what was originally achieved (i.e., 0.2 mm increase in RD), while the addition of a CTG resulted in a minor improvement from IM to FE (i.e., 'creeping attachment' occurred); the mean residual RD for the CAF + CTG group was only 33-50% of that of the CAF group (Pini-Prato et al., 2010; Zucchelli et al., 2014). Use of ADMA as an alternative to CTG with CAF (Kroiss et al., 2019; Tavelli, Barootchi, Di Gianfilippo, et al., 2019) yielded similar or higher mean residual RD compared to what was reported in other studies for CAF alone and 2- to 9-times higher values compared to CAF + CTG.
Additionally, in both studies (Kroiss et al., 2019; [...]) [...] (Dominiak et al., 2006) showing inferior results for CAF + CTG compared to CAF + GTR, again the CAF + CTG group presented with a mean RD at baseline being 0.75 mm higher compared to the CAF + GTR group.
FIGURE 2 A panel of forest plots for all observed comparisons in RD and CRC. The unique observed comparisons and the included trials appear on the right and the left of the panel, respectively. The trials have been ordered chronologically. The x-axis refers to the mean difference and the log OR for the corresponding primary outcomes RD and CRC, respectively. The design of the trial (parallel group vs. split-mouth design), the level of RoB (some concerns vs. high), and the smoking status of the participants (mixed vs. non-smoker) are indicated with different line types (solid vs. dashed), colours (orange vs. red), and point shapes (circle vs. triangle), respectively. The vertical grey line above zero implies no difference between the compared interventions. A positive mean difference and log OR indicate that the second intervention in the comparison is more favourable. CRC, complete root coverage; RD, recession depth
In perspective, the lack of long-term data of controlled clinical trials with at least 5 years follow-up for several interventions and combination approaches (i.e., TUN + CTG, CAF + EMD, CAF + CM, CAF + CTG + EMD, etc.) has to be kept in mind. However, taking also non-prospective and/or shorter follow-ups into account, a 6-year retrospective analysis (Bhatavadekar et al., 2019) [...] including specifically the close surveillance of oral hygiene habits, is considered as a major determinant of long-term success (Cairo et al., 2014; Dai et al., 2019; Leknes et al., 2005; McGuire et al., 2014; Moslemi et al., 2011; Pini Prato, Franceschi, et al., 2018; Rasperini et al., 2018; Zucchelli et al., 2014; Zucchelli & De Sanctis, 2005); for example, a horizontal toothbrushing technique increased the risk for a relapse 11 times (Moslemi et al., 2011). Further, the tooth region appears to affect the probability of achieving CRC (Zucchelli et al., 2018; Zucchelli et al., 2019). Naturally, this cannot be altered by the surgeon, but one should consider assessing different techniques separately in the upper and lower anterior sextant, which have been described as having the best and worst probability for a good outcome, respectively (Zucchelli et al., 2019). Additionally, a low baseline KTW as well as KTW <2 mm after the intervention were reported as negative predictors (Pini Prato et al., 2011; Pini Prato, Franceschi, et al., 2018; Pini Prato, Magnani, & Chambrone, 2018; Pini-Prato et al., 2012; Tavelli, Barootchi, Di Gianfilippo, et al., 2019), while the gain of KTW might be affected by the chosen technique [e.g., by leaving the CTG exposed or not exposed (Dodge et al., 2018)].
In this context, although CAF + CTG is described as the 'gold standard', CTG as an adjunct might not be necessary in every single case; that is, recent studies indicated that a thick tissue at baseline might not need the addition of a CTG (Cairo, Cortellini, et al., 2016; Rasperini et al., 2019, 2020). Hence, one can speculate whether the gingival thickness at the end of the procedure is the main determining factor for long-term stability and not necessarily the addition of a CTG; for example, in one of the RCT, a gingival thickness ≥1.2 mm after 6 months was a significant predictor for stability of the gingival margin later on. However, the gingival phenotype/thickness was not reported in >70% of the studies included herein. Details on the CTG harvesting procedure itself might be another relevant factor (Tavelli, Ravidà, Lin, et al., 2019). Different harvesting techniques (split-flap procedure or de-epithelialised CTG) result in different tissue compositions of the graft (Bertl et al., 2015), which might affect the resulting tissue thickness but potentially also the risk for keloid formation. Nevertheless, a large variation in terms of reporting frequency (5.5-95%) of these and other potentially relevant factors was observed (Appendix 6).
Root coverage procedures are primarily cosmetic dentistry and aim to improve patient aesthetics (Cairo, 2017; Cairo, Pagliaro, et al., 2016). Hence, aesthetic outcomes (based on both professional evaluation and patient perception) and PROMs should also be evaluated. Ten out of 18 studies herein reported on various PROMs, ranging from dentin hypersensitivity to colour/texture/contour match, aesthetic scores, and patients' satisfaction. Although the analysis is again limited to an overview (Table 4), some tendencies can be deduced. Procedures including a CTG carry the risk of being less favourable in terms of colour, texture, and contour match compared to the adjacent tissues, in terms of keloid formation, and in terms of patients' preference of the procedure. However, at least among the studies included herein, the overall patient satisfaction with the final outcome was not negatively affected by the adjunct of a CTG. These findings are well in agreement with recently summarised short-term results on aesthetic and patient-related outcomes following root coverage procedures (Cairo et al., 2020).
Based on the primarily descriptive summary presented herein, a few recommendations for future studies and for clinical practice can be provided:
• RCT reporting on long-term outcomes of single and multiple Miller class III/IV (RT 2 and 3) GR are not available so far, and are therefore warranted.
• A higher number of long-term assessments of other flap designs (e.g., TUN) and/or adjuncts (e.g., EMD, CM) in comparison to the 'gold standard' CAF + CTG are required for all types of GR.
• Evaluation of the aesthetic outcome and PROMs should be included, as these might be, next to the higher morbidity due to the second surgical site, relative shortcomings of the 'gold standard' CAF + CTG.
• Details on CTG harvesting (i.e., region, graft type/harvesting technique, graft thickness, positioning in relation to the flap) should be either reported or direct comparisons be performed to be able to evaluate whether a specific graft type/technique is more likely to achieve 'creeping attachment' and/or more favourable aesthetic outcomes.
• Individual patient data and frequency distributions instead of primarily mean values would allow a better understanding of whether any loss in mean root coverage is due to a few patients with a severe relapse or due to all patients losing slightly.
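The last recommendation can be illustrated with a small numerical sketch. The two cohorts below are entirely hypothetical: both show the same mean increase in RD from IM to FE, but only the individual values reveal whether the loss is spread evenly or concentrated in a few patients. The 0.5 mm relapse threshold and the name `relapse_share` are assumptions chosen for illustration only.

```python
from statistics import mean

# Hypothetical RD increase from IM to FE (mm) for ten patients per cohort.
uniform_loss = [0.2] * 10                  # every patient loses slightly
clustered_loss = [1.0, 1.0] + [0.0] * 8    # two patients relapse severely


def relapse_share(changes, threshold=0.5):
    """Share of patients whose RD increased by at least `threshold` mm."""
    return sum(change >= threshold for change in changes) / len(changes)


for label, cohort in (("uniform", uniform_loss), ("clustered", clustered_loss)):
    print(f"{label}: mean loss {mean(cohort):.2f} mm, "
          f"relapse share {relapse_share(cohort):.0%}")
```

Both cohorts report a mean loss of 0.2 mm, yet the share of patients above the threshold differs, which is the information that mean values alone cannot convey.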

| CONCLUSIONS
• For single Miller class I/II GR, CAF + CTG appears as the 'gold standard' also on the long-term (i.e., ≥ 5 years of follow-up), providing low residual RD (i.e., ≤ 0.5 mm) corresponding to a high RC rate >80%, and CRC was achieved in at least 2/3 of the patients.
• Other interventions (e.g., CAF + EMD, CAF + CM, CAF + grafting) were tested too seldom to draw firm conclusions for the treatment of single GR; in the direct comparisons of the individual studies, CAF + CTG appeared as the advantageous technique.
• For multiple Miller class I-III GR, CAF + CTG appears also as the best technique, providing a low mean residual RD with the potential for 'creeping attachment'; ADMA as adjunct tended to have a higher relapse rate compared to CAF + CTG.
• For single or multiple Miller Class III/IV GR, no information on long-term outcome is available.
• In general, comparing the mean residual RD values after 6 or 12 months to the final outcome, CAF + CTG was the only intervention for which a 'creeping attachment' became apparent.
• As CAF was the prevailing flap design (i.e., in 86.5% of the groups), no conclusion on other flap designs (e.g., TUN, LPF, etc.) can be drawn.
• Although CAF + CTG can be considered as the 'gold standard' in terms of clinical parameters, there might be some shortcomings in terms of tissue integration and aesthetic appearance.