Assessing the uncertainty of treatment outcomes in a previous systematic review of venous leg ulcer randomized controlled trials: Additional secondary analysis

Abstract In this secondary analysis of a previous systematic review, we assessed randomized controlled trials evaluating treatments of venous leg ulcers in terms of factors that affect risk of bias at the study level and thus uncertainty of outcomes obtained from the interventions. Articles that assessed the wound bed condition in venous leg ulcers and that were published in English between 1998 and May 22, 2018 were previously searched in PubMed, Embase, CINAHL, CENTRAL, Scopus, Science Direct, and Web of Science. Duplicates and retracted articles were excluded. The following data were extracted to assess the risk of bias: treatment groups; primary and secondary endpoints that were statistically tested between groups, including their results and p values; whether blinding of patients and assessors was done; whether allocation concealment was adequate; whether an intention‐to‐treat analysis was conducted; whether an appropriate power calculation was correctly done; and whether an appropriate multiplicity adjustment was made, as necessary. Pre‐ and post‐study power calculations were made. The step‐up Hochberg procedure adjusted for multiplicity. Results were analysed for all studies, pre‐2013 studies, and 2013/post‐2013 studies. We included 142 randomized controlled trials that evaluated 14,141 patients. Most studies lacked blinding (72.5–77.5%) and allocation concealment (88.7%). Only 49.3% of trials provided a power calculation, with 27.5% having an appropriate calculation correctly done. Adequate statistical power of the primary endpoint was found in 27.2% of trials. The lack of multiplicity adjustment in 98.6% of studies affected the uncertainty of outcomes in 20% of studies, with the majority of the secondary endpoints (67.7%) in those studies becoming non‐significant after multiplicity adjustment. Recent studies tended to weakly demonstrate improved certainty of outcomes. 
Venous leg ulcer randomized controlled trials have a high degree of uncertainty associated with treatment outcomes. Greater attention to trial design and conduct is needed to improve the evidence base.

KEYWORDS randomized controlled trials, risk of bias, trial design, uncertainty of outcome, venous leg ulcers

| INTRODUCTION
There is a glaring gap between evidence and clinical practice in wound care, with many clinicians relying solely on their clinical experience and a traditional approach to care. 1,2 The application of evidence-based medicine (EBM) to wound care is further complicated by the wide variation in wound types and treatment options. Consequently, many clinical practice guidelines and recommendations have been based on expert opinion. 2 Limited evidence produced from wound care randomized controlled trials (RCTs) is a result of poorly designed studies that are underpowered with small sample sizes, have follow-up periods too short to properly assess wound outcomes, and employ poor analysis of endpoints. [3][4][5] The lack of a sound and applicable evidence base in wound care results in great clinical uncertainty that clouds clinical decision-making and can contribute to the use of suboptimal treatments, inequalities in care, and wasted resources. 3,[6][7][8][9] Uncertainty in outcome effects has also been a focal point of the GRADE system, 10 which has now been extended to the concept of the threshold or ranges to rate certainty of the evidence for an individual outcome. 11
Venous leg ulcers (VLUs) are among the most ubiquitous types of wound, 12 with an annual incidence rate estimated to be greater than 2%, costing the United States up to $14.9 billion each year. 13 Considered to be the highest level of evidence, 1,2 systematic reviews are the most relevant vehicle to evaluate the certainty of RCT outcomes. In health care, they are now used to develop clinical practice guidelines and are often required as a prerequisite to research funding. 14,15 A well-conducted systematic review can produce more reliable, precise, and generalizable results with limited bias to be used by providers, payers, researchers, and policymakers for therapeutic advancements. 14,16 To be able to properly assess the bias, uncertainty of outcomes, and external validity of VLU RCTs, a systematic review must assess the randomization process, allocation concealment, blinding, power analysis, attrition rates, study group similarities, eligibility criteria, primary outcome measures, the inclusion of an intention-to-treat (ITT) analysis, and multiplicity adjustment of secondary endpoints. 1,14,17,18
In 2019, Gethin et al published a systematic review of 144 RCTs involved in the treatment of VLUs to assess the quality of reporting of data related to their external validity. 19 Their results showed there was inadequate reporting of factors that aid the clinician in determining the applicability of research findings to their patient population, despite the recommendations from CONSORT being available for over 20 years. Generalizability of studies is 1 of the 5 key domains of the GRADE approach to conducting systematic reviews. 20 The goal of our study was to assess the same RCTs in terms of other factors that could affect risk of bias at the study level and assess certainty of outcomes obtained from the interventions. We therefore sought to determine the uncertainty of outcomes for patients with VLUs treated with any drug, biologic, or device compared to standard of care or placebo.

| Study selection
We included the same studies selected by Gethin et al in their 2019 systematic review. 19

| Data extraction
If no primary endpoint(s) could be identified, the most relevant and/or prominent endpoint was chosen. Secondary endpoints were defined as any remaining endpoint that was tested statistically between treatment groups regardless of whether such endpoints were explicitly identified as such by the study authors. Evidence of successful blinding and allocation concealment had to be supported by detailed statements in the study reporting.
The ITT population was defined as all patients who were randomized to treatment groups. Exceptions were patients who were inappropriately randomized; that is, consent form not signed, or patient later found to be ineligible due to inclusion/exclusion criteria.
To be appropriate, a primary endpoint power calculation had to be congruent with the primary endpoint it supported, with reasonable assumptions, method(s), and sufficient data that a power calculation could be replicated. If the calculation was incorrectly performed by the study authors, the result was scored as a 'no'.
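As an illustration of what replicating a primary endpoint power calculation can involve, the following minimal sketch approximates the power of a two-sided test comparing healing proportions between two groups. It is not the software or method used in this analysis; it assumes a normal-approximation z-test for two independent proportions, and the function name, parameters, and example figures are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def power_two_proportions(p1, p2, n1, n2, alpha=0.05):
    """Approximate power of a two-sided z-test for two independent proportions.

    p1, p2: assumed true proportions (e.g. proportion of ulcers healed) per group.
    n1, n2: number of patients per group.
    Uses the pooled variance under the null and the unpooled variance
    under the alternative (normal approximation; an assumption of this sketch).
    """
    z = NormalDist()
    z_crit = z.inv_cdf(1 - alpha / 2)
    p_bar = (n1 * p1 + n2 * p2) / (n1 + n2)
    se_null = sqrt(p_bar * (1 - p_bar) * (1 / n1 + 1 / n2))
    se_alt = sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    delta = abs(p1 - p2)
    upper = z.cdf((delta - z_crit * se_null) / se_alt)
    lower = z.cdf((-delta - z_crit * se_null) / se_alt)
    return upper + lower


# Hypothetical example: 60% vs 40% healed, with 20 or 100 patients per group.
for n in (20, 100):
    print(n, round(power_two_proportions(0.60, 0.40, n, n), 2))
```

Under these assumptions, the same effect size that is adequately powered with 100 patients per group is badly underpowered with 20 per group, which is the kind of check the replication criterion makes possible.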
Any discrepancies between our initial independent assessments were resolved by consensus.

| Statistical analysis
Pre- and post-study power calculations were made using PASS 13. Attrition rates for all treatment groups (that is, those patients whose outcomes became right-censored) were calculated based on the primary length of each study and expressed as a percentage of total patients in each treatment group. The overall attrition rate was calculated, as well as whether there was a difference of ≥20% between any treatment groups. 22 Adjustment for multiplicity of statistical testing used the step-up Hochberg procedure and was executed in Excel. The p values of all endpoints that were statistically tested, with the exception of primary endpoint(s), were entered into the adjustment calculation. If there were coprimary endpoints, these were entered in a separate calculation. If actual p values were not reported but it was clear from the text that a statistical test was carried out, the following conservative p value imputations were made: non-significant, 0.06; <0.05, 0.04; <0.01, 0.009; <0.001, 0.0009.
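The multiplicity adjustment described above can be sketched in a few lines. The analysis itself was executed in Excel; the Python version below is only an illustrative equivalent of the step-up Hochberg procedure, expressed as adjusted p values, together with the imputation rules for unreported p values. The function names, the "ns" label for non-significant results, and the example p values are assumptions made for this sketch.

```python
def impute_p(reported):
    """Convert a reported p value to a number, imputing when only a bound is given.

    Accepts a number, or a string such as "<0.05", "NS", or "0.03".
    The "ns" label is a hypothetical shorthand for "non-significant".
    """
    imputations = {"ns": 0.06, "<0.05": 0.04, "<0.01": 0.009, "<0.001": 0.0009}
    if isinstance(reported, (int, float)):
        return float(reported)
    key = reported.strip().lower().replace(" ", "")
    return imputations[key] if key in imputations else float(key)


def hochberg_adjust(p_values):
    """Return Hochberg step-up adjusted p values in the original order.

    With p values sorted ascending as p(1) <= ... <= p(m), the adjusted value
    for p(i) is the minimum over j >= i of (m - j + 1) * p(j), capped at 1.
    """
    m = len(p_values)
    # Walk from the largest p value (multiplier 1) down to the smallest (multiplier m).
    order = sorted(range(m), key=lambda i: p_values[i], reverse=True)
    adjusted = [0.0] * m
    running_min = 1.0
    for rank, idx in enumerate(order):
        running_min = min(running_min, (rank + 1) * p_values[idx])
        adjusted[idx] = running_min
    return adjusted


# Hypothetical secondary endpoints from one trial (primary endpoint excluded).
reported = ["<0.05", 0.03, "<0.01", "NS"]
raw = [impute_p(p) for p in reported]
for r, a in zip(raw, hochberg_adjust(raw)):
    print(f"raw={r:.4f} adjusted={a:.4f} significant={a < 0.05}")
```

In this hypothetical trial, three of the four secondary endpoints are nominally significant before adjustment, but only the smallest p value remains below 0.05 afterwards.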

| Reporting
The percentage of studies in which patients were blinded was calculated for all studies, pre-2013 studies, and 2013/post-2013 studies.
The same procedure was followed for blinded study assessment; adequate allocation concealment; ITT analysis (primary endpoint); reporting a study power calculation; appropriate power calculation; appropriate adjustment for multiplicity of statistical testing of secondary endpoints, if more than 1 endpoint was tested; the number of studies in which at least one secondary endpoint became statistically non-significant after adjustment; and the percentage of secondary endpoints that became statistically non-significant after adjustment.
The mean (standard deviation [SD]; range) of attrition rates across study breakpoints was also calculated.
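The attrition calculations can be illustrated with a short sketch. It assumes that the ≥20% criterion refers to a difference of at least 20 percentage points between the highest and lowest group attrition rates, which is one reading of the text; the function names and the example counts are hypothetical.

```python
from statistics import mean, stdev


def attrition_rates(groups):
    """Compute attrition for one study.

    groups: list of (randomized, lost) counts, one tuple per treatment group.
    Returns per-group attrition (%), overall attrition (%), and a flag that is
    True when group rates differ by at least 20 percentage points (assumed reading).
    """
    per_group = [100.0 * lost / randomized for randomized, lost in groups]
    total_randomized = sum(randomized for randomized, _ in groups)
    total_lost = sum(lost for _, lost in groups)
    overall = 100.0 * total_lost / total_randomized
    differential = (max(per_group) - min(per_group)) >= 20.0
    return per_group, overall, differential


def summarize(rates):
    """Return mean, sample SD, and (min, max) range of a list of attrition rates."""
    return mean(rates), stdev(rates), (min(rates), max(rates))


# Hypothetical two-arm study: 50 patients per arm, 5 and 16 lost to follow-up.
per_group, overall, differential = attrition_rates([(50, 5), (50, 16)])
print(per_group, overall, differential)

# Hypothetical overall attrition rates (%) for three studies in one period.
print(summarize([10.0, 20.0, 30.0]))
```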
TABLE 1 Percentage of studies (n) with adequate blinding, allocation concealment, and ITT analysis conducted for the primary endpoint
High risk of bias featured prominently in both lack of blinding (about three quarters of trials) and lack of allocation concealment (approximately 9 out of 10 trials; Table 1). Although statistical power was consistent (Table 2) (Table 3). These figures improved slightly in newer studies.
The strengths of these trials were that ITT analysis was performed in the majority (62.7%) (Table 1), and attrition rates were generally low (Table 4), with recent studies reporting much lower attrition rates than older studies. While there were no drastic differences between newer and older studies, recent studies tended to weakly demonstrate improved certainty of outcomes. However, our analysis also shows that VLU RCTs predominantly demonstrate a high risk of bias and low certainty of outcomes.

| Trial design
In wound care, robust RCT design is sometimes not possible, depending on the intervention or the condition studied. were of low quality due to inappropriate randomization, blinding, and nebulous exclusion criteria. 18,40 Researchers have to be more creative in how they tackle these issues in their trial design; for example, could inactive bioresorbable materials be made to approximate the appearance of any of the numerous cellular and/or tissue-based products (CTPs) currently being used in wound care so that the current subject blinding issue is ameliorated? While this seems an outlandish suggestion (it would have to be demonstrated that any such material did not affect wound healing and might also need FDA approval), an industrial consortium approach in which all CTP manufacturers contribute could explore its feasibility. Finally, better use of adaptive designs or even hybrid designs would allow for better identification of more responsive subjects or more efficient determination of safety/efficacy, as well as better generalizability of outcomes with attendant lower uncertainty. 39 requires two people to treat and assess the patient, which is not practical in many clinical situations, although it is recommended for trial situations. Gould and Li recommend that wound care trials implement a standardized wound assessment methodology that tackles blinding by using a blinded, on-site assessor, who is not the treating clinician, and a blinded, remote adjudication panel of two to three wound care experts. 43 Some products, such as CTPs, leave telltale marks in the wound area, which immediately inform an experienced assessor that a subject was treated with the intervention. This kind of problem can automatically invalidate blind assessment and is probably the most challenging aspect of assessing VLU treatment, but the use of artificial intelligence, such as computerized planimetry, and remote assessors to assess wounds could overcome these limitations. 43 Clearly this is not an ideal world, but one that many researchers still have to inhabit. Nevertheless, spending more time to develop new avenues to solving old problems before the trial starts, rather than ignoring them, is likely to pay off even under financial constraints.

| Study strengths and limitations
Rather than select a body of studies to examine a given treatment approach for our systematic review, we chose an existing review that focused on a particular assessment (study generalizability to broader populations) so we could add our results to visualize a bigger picture for one very common wound type. Consequently, our secondary analysis inherits some of the same limitations described by Gethin et al. 19 We did not perform a new, updated search of trials from June 2018 onward.
Gethin et al only used English-language articles that they could access freely, so their initial search may not have been the most comprehensive, and some publication bias is acknowledged. Further, we did not individually assess the risk of bias for each trial; our study design was based on assessing the overall risk of bias of these studies, including the newer versus older articles. We did not perform a comprehensive analysis of every factor that could potentially influence bias and uncertainty of outcomes in RCTs; for example, publication bias and consistency of treatment effects were not analysed in this review. However, allocation concealment, blinding, power analysis, attrition rates, primary outcome measures, the inclusion of an ITT analysis, and multiplicity adjustment of secondary endpoints are all major factors influencing uncertainty of outcomes and external validity that were not considered in the systematic review by Gethin et al. 1,14,[17][18][19] Finally, given our bundling of unclear with negative assessment categories, we recognize that in some instances, our results may be seen as too conservative. Nevertheless, the large dataset compiled from 142 VLU RCTs is a major strength of our analysis, and the results of our study demonstrate that more critical analysis of the uncertainty of outcomes in wound care is needed for other wound types and outcomes.

| CONCLUSIONS
VLU RCTs have a high risk of bias and high uncertainty of outcomes owing to lack of blinding and allocation concealment, insufficient statistical power associated with outcomes, and lack of multiplicity adjustment.
Newer studies tend to very weakly demonstrate improved certainty of outcomes compared to older studies. Greater attention to the uncertainty of outcomes and trial design and conduct is needed to improve the evidence base in wound care.

CONFLICT OF INTEREST
Kristen A. Eckert was a paid consultant of Strategic Solutions to this study; Marissa J. Carter: none to declare.