We are pleased that our work on shortened treatment regimens for patients with chronic hepatitis C genotype 4 has generated interest. However, the comments Dr. Schafer makes in his correspondence invite a response. We welcome the opportunity to discuss our methodology and results and to provide explanations and references for Dr. Schafer and for readers interested in learning more about the statistical methods.
Schafer points to the number of patients randomized to the fixed-duration group. As shown in the flow of patients through the trial, although 60 patients were initially randomized to the control group, 10 did not receive treatment. Therefore, only 50 patients were assigned to the fixed-duration (control) group and received the standard 48-week regimen. Schafer also claims that the early virological response (EVR) was only 3% in the variable-duration group. It is unclear where this figure comes from, because the article clearly states that in the variable-duration group 22% of patients attained rapid virological response and 70% had complete or partial EVR, comparable with the results achieved in the control group.
Table 4 in Kamal et al. shows the results of multiple logistic regression analyses, where the measure of association is the odds ratio. For those in Group B, we see that, adjusted for other factors in the model, those with histologic stage of 0 or 1 have 616 times the odds of sustained virological response (SVR) compared to those with histologic stage of 2 through 4.
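For readers less familiar with how an adjusted odds ratio arises from a multiple logistic regression model, a minimal sketch may help. The coefficients below are purely illustrative, not the fitted values from Table 4; the point is only that the ratio of the odds at two covariate values equals exp(beta) for the covariate's coefficient.

```python
from math import exp

def odds(p):
    """Convert a probability to odds."""
    return p / (1 - p)

def logistic_p(intercept, beta, x):
    """P(outcome) under a one-covariate logistic model."""
    z = intercept + beta * x
    return 1 / (1 + exp(-z))

# Illustrative coefficients only (not estimates from the study).
intercept, beta = -1.0, 0.8
p_x0 = logistic_p(intercept, beta, 0)   # probability when covariate = 0
p_x1 = logistic_p(intercept, beta, 1)   # probability when covariate = 1
odds_ratio = odds(p_x1) / odds(p_x0)    # equals exp(beta)
```

In a model with several covariates, the same calculation holds the other covariates fixed, which is what "adjusted for other factors in the model" means.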
Those with interest in logistic regression can find detailed explanations and examples in the text by Hosmer and Lemeshow.2
To perform the noninferiority test, one must specify a boundary, b, defining the noninferiority margin. Let p1 denote the proportion achieving SVR in the variable-duration group and p0 denote the proportion achieving SVR in the fixed-duration (control) group. Our null hypothesis is that variable-duration treatment is inferior to fixed-duration treatment, and our alternative hypothesis is that variable-duration treatment is noninferior, that is, no worse than fixed-duration treatment by more than the margin. In our case, we chose b = −0.01. Thus, we write our hypotheses as H0: p1 − p0 ≤ −0.01 versus H1: p1 − p0 > −0.01 and calculate a Wald test for a difference in two proportions from two independent groups, where the test statistic is compared to a standard normal distribution and a one-sided P value (the probability of obtaining a statistic greater than the observed value, rather than its absolute value, as would be used for a two-sided test) is calculated.
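The test just described can be sketched in a few lines. The counts below are illustrative placeholders chosen to roughly match the reported sample sizes and SVR percentages, not the study's actual data.

```python
from math import sqrt
from statistics import NormalDist

def wald_noninferiority(x1, n1, x0, n0, b=-0.01):
    """One-sided Wald test of H0: p1 - p0 <= b vs H1: p1 - p0 > b.

    x1/n1: successes/total in the variable-duration group;
    x0/n0: successes/total in the fixed-duration (control) group.
    """
    p1, p0 = x1 / n1, x0 / n0
    # Unpooled standard error of the estimated difference in proportions.
    se = sqrt(p1 * (1 - p1) / n1 + p0 * (1 - p0) / n0)
    z = (p1 - p0 - b) / se
    # One-sided P value: probability of exceeding the observed statistic.
    p_value = 1 - NormalDist().cdf(z)
    return z, p_value

# Illustrative counts only (approximating 68% of 308 and 58% of 50).
z, p = wald_noninferiority(209, 308, 29, 50)
```

Rejecting H0 (a small one-sided P value) supports the conclusion that the variable-duration regimen is not inferior to the fixed-duration regimen by more than the margin b.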
In this study, the sample size randomly assigned to variable-duration treatment was 308, and the sample size randomly assigned to fixed-duration treatment was 50. It is not clear why it is asserted that “this degree of accuracy is impossible in a study of this size.” Accuracy is not a function of sample size; rather, an accurate procedure is one that is free from bias, which depends on the properties of statistical procedures such as estimators (sample proportions are unbiased estimators of true proportions) and on issues of study design such as sample selection, treatment assignment, and so forth. One may instead be concerned that the study lacks the precision to estimate a difference in proportions within 1%; however, we note that the level of precision (a function of the standard error, which does depend on sample size) need not equal b. In particular, most conventional statistical tests reported in the biomedical literature are two-sided with b = 0, that is, H0: p1 − p0 = 0 versus H1: p1 − p0 ≠ 0. Performing a Wald test of this hypothesis is appropriate as long as the sample size is large enough for the normal approximation to the estimated difference in proportions to be accurate. One general guideline is that the number of “successes” (in our case, the number achieving SVR) and the number of “failures” (those not achieving SVR) expected under the null hypothesis should be at least 5 in both treatment groups. Those with interest in statistical methods for comparing proportions can find additional explanations in the text by Fleiss et al.3
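The “at least 5 expected successes and failures” guideline is easy to check. A minimal sketch, using the study's two sample sizes (308 and 50) but an assumed common SVR proportion that is illustrative rather than taken from the article:

```python
def normal_approx_ok(n, p, threshold=5):
    """Check the rule of thumb: expected successes n*p and expected
    failures n*(1-p) should both reach the threshold."""
    return n * p >= threshold and n * (1 - p) >= threshold

# Assumed common proportion under the null (illustrative value).
p_common = 0.65
checks = [normal_approx_ok(n, p_common) for n in (308, 50)]
```

With even the smaller group of 50, the expected counts comfortably exceed 5 for any common proportion in the range reported, so the normal approximation is not strained.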
The correspondence asserts that there is “an overall higher response rate with fixed duration of treatment.” However, we could not locate support for this anywhere in the article. On the contrary, the Results subsection “SVR by Treatment Group” and Figure 2 show that those assigned to the control group had a lower percentage attaining SVR than those assigned to the variable-duration group (58% versus 68%).
Finally, it is not clear what assumptions (the true difference in SVR proportions, etc.) were made and what test properties (type I error, desired power, noninferiority boundary) were specified to lead Schafer to conclude that “the study was far too small to demonstrate noninferiority” and that “the failure to achieve comparable groups in this study renders it impossible to support its conclusions.” In our study, the data provided enough evidence to reject the null hypothesis of inferiority. We also note that our published study is the first and largest study to examine shorter, viral-response-guided treatment regimens for patients infected with hepatitis C virus genotype 4. Tailoring treatment according to week 4 and week 12 viral responses promotes more efficient use of this combination therapy by stopping treatment in patients with a high likelihood of treatment failure and by shortening regimens in patients whose EVRs predict favorable treatment outcomes. However, as we mentioned in the article, we believe that treatment duration should be explored further, in particular by examining subgroups defined by early viral kinetics (for example, by designing a study with dynamic treatment regimes or by randomizing at 4 or 12 weeks after initiation of pegylated interferon) and by identifying patient-level factors that could help design customized, efficacious, and cost-effective treatment regimens. The estimates derived from our published study can be used to determine an appropriate sample size for designing a new study; the sample size formulas in Fay et al. are useful here.4
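To illustrate how the published estimates feed into planning a new study, here is a hedged sketch of the common normal-approximation sample-size formula for a noninferiority comparison of two proportions. This is the standard textbook approximation, not necessarily the exact method of Fay et al.; the inputs use the 68% and 58% SVR percentages from the article together with the b = −0.01 margin, purely for illustration.

```python
from math import ceil, sqrt
from statistics import NormalDist

def n_per_group(p1, p0, b, alpha=0.05, power=0.80, ratio=1.0):
    """Approximate size of group 1 for testing
    H0: p1 - p0 <= b vs H1: p1 - p0 > b (b < 0 for noninferiority),
    where ratio = n1/n0 (group 0 has n1/ratio subjects)."""
    z_alpha = NormalDist().inv_cdf(1 - alpha)   # one-sided type I error
    z_beta = NormalDist().inv_cdf(power)        # desired power
    variance = p1 * (1 - p1) + ratio * p0 * (1 - p0)
    return ceil((z_alpha + z_beta) ** 2 * variance / (p1 - p0 - b) ** 2)

# Illustrative planning inputs based on the published estimates.
n1 = n_per_group(p1=0.68, p0=0.58, b=-0.01)
```

The required size shrinks as the assumed true advantage of the variable-duration regimen grows relative to the margin, which is why the planning assumptions must be stated before asserting that any given study is "too small."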