Simplifying the estimation of diagnostic testing accuracy over time for high specificity tests in the absence of a gold standard

Many different methods for evaluating diagnostic test results in the absence of a gold standard have been proposed. In this paper, we discuss how one common method, a maximum likelihood estimate for a latent class model found via the Expectation-Maximization (EM) algorithm, can be applied to longitudinal data where test sensitivity changes over time. We also propose two simplified, nonparametric methods that use data-based indicator variables for disease status and compare their accuracy to the maximum likelihood estimation (MLE) results. We find that with high specificity tests, the performance of the simpler approximations may be just as good as that of the MLE.


INTRODUCTION
Evaluating results of diagnostic tests in the absence of a gold standard is a critical problem in patient care and is correspondingly the subject of a large body of research. In a recent review of methodology on diagnostic accuracy where a gold standard is missing or unavailable, Chikere et al. (2019) group the available approaches into four categories: (1) methods for settings where gold-standard results are missing, but a gold-standard test does exist; (2) correction methods using imperfect reference tests with known diagnostic accuracy; (3) methods using multiple results from the same or various imperfect tests with unknown diagnostic accuracy; and (4) less robust methods such as agreement and test positivity rates. For this paper, we are interested in methods that fall into group 3. These methods are likely to be needed when evaluating diagnostic tests for emerging infectious disease viruses such as Ebola virus or, more recently, SARS-CoV-2. For these viruses, true patient status is unknown. While it cannot be determined directly, some methods have been proposed to approximate it using discrepancy analysis (administering a resolver test when discordant results occur) (Hawkins et al., 2001; van Dyck et al., 2004; Juhl et al., 2013) or using a composite reference standard combining multiple other test results (Alonzo and Pepe, 1999). Rather than approximating disease status with imperfect tests, latent class models (LCMs) can also be used. These models do not suffer from the same biases as discrepancy analysis (Chikere et al., 2019) or require as many assumptions about testing accuracy as a composite reference standard. Most importantly, they do not require more than one type of test to be administered to a participant. LCMs can be fit using frequentist (Hui and Walter, 1980; Qu et al., 1996; Albert et al., 2001; Chu et al., 2009a, 2009b; Xu and Craig, 2009) or Bayesian (Black and Craig, 2002; Chu et al., 2009a; Ling et al., 2014) methods. To maximize the likelihood or posterior of an LCM, the Expectation-Maximization (EM) algorithm is often employed, maximizing over both the unknown patient status and the model parameters.
To measure viral persistence of a disease, multiple applications of the same non-gold-standard test can be administered over time to a population known to have been previously infected. In the study of adult male Ebola virus disease (EVD) survivors that serves as motivation for the work described here, participants were asked to supply samples to determine the presence of Ebola RNA in semen. This is an important public health issue as sexual transmission of Ebola virus has been documented. Survivors supplied semen samples that were then tested using a polymerase chain reaction (PCR) assay. While the exact diagnostic accuracy of this test for Ebola virus is not known, PCR assays are generally believed to have very high specificity (Lemmon and Gardner, 2008). In addition, 100% specificity is reported by Pinsky et al. (2015) for the Xpert Ebola Assay used in our motivating example, and test results by Pettitt et al. (2017) have also confirmed near perfect specificity. Semen samples from EVD survivors were tested every 2 or 4 weeks (if a survivor had a positive test then the next test was scheduled in 2 weeks, otherwise in 4 weeks) beginning at least 8 months after EVD symptoms resolved. As a result of the varying testing schedules and other reasons assumed to be unrelated to EVD, test results were sometimes unobserved at given time periods. These test results can be assumed to be missing at random, that is, not dependent on unobserved disease shedding status (although they clearly depend on observed shedding status and so are not missing completely at random). Generally speaking, the proportion of positive results declined over time. This decline can be modeled either as a declining prevalence of Ebola viral shedding, or as a declining sensitivity to detect Ebola virus in a viral "shedder," where the decline in sensitivity is attributed to a decreasing viral load rather than a decrease in the effectiveness of the test. Only the second can be expressed as an LCM, since there is only a single latent variable per participant rather than one latent variable per data point, which would make the model unidentifiable.
Because of the known high specificity of the PCR assay, simpler, data-based approximations of an LCM could also be considered. We will propose two data-based indicator variables to approximate the unobserved disease status. If we expect the specificity of the test to be nearly or effectively 100%, it could be possible to assume all positives are true positives and simply classify those who had at least one positive test as a "shedder" and those without a positive test as a "nonshedder." This reductive approach will break down with even extremely low rates of false positives, and it also carries bias associated with low sensitivity. However, we will show that modifications to this approach allow for a more effective data-based indicator of shedding status which can perform as well as the maximum likelihood estimation (MLE) found using the EM algorithm for an LCM when specificity is high and an appropriate number of test results are observed.
There are several advantages to these simpler approximations despite their bias. First, most LCMs require full specification of a model for viral decay. This does not allow for easy modeling of nonmonotonic decay without the inclusion of additional parameters which could make the model unidentifiable. Our simpler approximations make no assumption about the shape of the decline in sensitivity. Second, standard R packages which can be used to implement the EM algorithm for binomial mixture models, such as flexmix (Grün and Leisch, 2008), make assumptions of conditional independence. This assumption, which is difficult to verify, means that given the "shedding" status of an individual, their test results are considered independent, and this assumption is known to bias results if it is violated (Albert and Dodd, 2004;Pepe and Janes, 2007). It is possible to implement the EM algorithm and include a correlation structure without the use of a standard package, as in Menten et al. (2008); Xu et al. (2013); Daggy et al. (2014). However, this is much more complex to implement and assumptions about the conditional dependence structure must be carefully justified which involves a deep understanding of the biological pathways of both disease status and diagnostic test (Pepe and Janes, 2007). Misspecification may not be detected by a lack-of-fit assessment and still result in substantial bias in estimation (Schofield et al., 2021). This can be especially limiting in the case of emerging infectious diseases. The simpler, nonparametric estimates, like those proposed in this paper, can relax assumptions of conditional independence without having to justify a correlation structure by considering each time point separately. Third, the use of the EM algorithm to find the MLE can lead to a couple of computational issues. For one, the EM algorithm is not computationally efficient and may converge slowly. In addition, it is only guaranteed to converge to a local maximum and may have difficulty converging at all if the parameters are on the boundary of the parameter space (Nettleton, 1999). This is of particular relevance given the high specificity of PCR-based assays which will make parameter estimates close to the boundary. Our simpler estimates do not employ the EM algorithm and thus do not suffer from these issues. In fact, without bias corrections, their estimation boils down to the calculation of simple sample proportions. This gives rise to our fourth advantage of simpler approximations: easy to implement sample size and power calculations.
This paper will expand on existing methods to estimate diagnostic accuracy and prevalence in the absence of a gold-standard test by modeling decaying sensitivity and introduce new approximation methods that can have several advantages in an applied setting. These methods will be specifically applicable in settings with repeated PCR assay test results or other diagnostic tests known to have high specificity, an approach commonly used to test for the persistence of a variety of infectious diseases over time.

Latent class model
Let $n$ equal the number of individuals who have been tested and let $S_i$, $i = 1, \ldots, n$, represent the "shedding" status of each individual, where $S_i = 1$ indicates the individual exhibits prolonged viral shedding and $S_i = 0$ indicates the individual does not. Assume that
$$S_i \sim \text{Bernoulli}(\pi). \quad (1)$$
Notice that $S_i$ does not depend on time, which implies that participants who are labeled a "shedder" will always remain so and vice versa. Thus, it is important to note that the shedding status of an individual represents whether they at any time exhibited prolonged viral shedding since the commencement of the study, though their viral load may eventually decline to a level below detection via PCR assay. Let $Y_{it} = 1$ indicate a positive test result for person $i$ at time $t$, where each $t$ indexes a 2-week period. Assume
$$P(Y_{it} = 1 \mid S_i = 0) = \phi \qquad \text{and} \qquad P(Y_{it} = 1 \mid S_i = 1) = \beta(t).$$
Parameter $\phi$ can be interpreted as the rate of false positives among the "nonshedders" and thus $1 - \phi$ represents the test specificity. Function $\beta(t)$ represents the detection probability among those whose viral shedding is prolonged but continues to decline. Let $T$ be the largest number of 2-week periods over which individuals were sampled and let $\Delta_i = (\delta_{i1}, \ldots, \delta_{iT})$ be a collection of indicator variables denoting whether a test was observed for person $i$ at time $t$. We will assume that the distribution of $\Delta_i$ depends only on the observed $Y_{it}$ and on parameters other than $\pi$, $\phi$, and $\beta(t)$. This is a reasonable assumption for our EVD application since the scheduled timing of the next visit depends on the most recent test result. Given this assumption, we can obtain maximum likelihood estimates using only the observed test results, with shedding status unknown. The likelihood we wish to maximize is given in Equation (4):
$$L(\pi, \phi, \beta) = \prod_{i=1}^{n}\left[\pi \prod_{t:\,\delta_{it}=1} \beta(t)^{y_{it}}\{1-\beta(t)\}^{1-y_{it}} + (1-\pi)\prod_{t:\,\delta_{it}=1} \phi^{y_{it}}(1-\phi)^{1-y_{it}}\right]. \quad (4)$$
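To make the data structure concrete, the following R sketch (not the authors' code) simulates test results under this latent class model. It assumes, for illustration only, a logistic decline in $\beta(t)$ and complete observation of every test; all parameter values are arbitrary.

```r
## Illustrative simulation from the latent class model above (assumed parameter values).
set.seed(1)
n      <- 300                          # number of participants
T_max  <- 8                            # number of 2-week testing periods
pi_s   <- 0.30                         # P(S_i = 1), prevalence of prolonged shedding
phi    <- 0.01                         # false positive rate; specificity = 1 - phi
beta_t <- plogis(2 - 0.5 * (1:T_max))  # beta(t): detection probability for shedders

S <- rbinom(n, 1, pi_s)                # latent shedding status
Y <- t(sapply(S, function(s) {
  p <- if (s == 1) beta_t else rep(phi, T_max)
  rbinom(T_max, 1, p)                  # conditionally independent test results
}))                                    # n x T matrix of 0/1 results
```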

EM algorithm
In order to allow for time dependence in $\beta(t)$ and to limit the number of parameters, we define the following:
$$\text{logit}\{\beta(t)\} = \gamma_0 + \gamma_1 t. \quad (6)$$
The resulting parameter vector $\Theta = (\pi, \boldsymbol{\gamma}', \phi)'$, with $\boldsymbol{\gamma} = (\gamma_0, \gamma_1)'$, can be estimated using the EM algorithm via the R package flexmix. Let $t_i^*$ represent the time index of the first observed test for person $i$. To initialize the algorithm, we assign those with $y_{i t_i^*} = 1$, an initial positive test, to have $S_i = 1$ and all others to have $S_i = 0$. Then let $\hat{\phi}^{(0)}$ be the sample proportion of positive tests among the observed "nonshedder" data and let $\hat{\boldsymbol{\gamma}}^{(0)}$ be the coefficients from a logistic regression on the observed "shedder" data predicting positive tests by time point. Performing the "E-step" of this algorithm at iteration $k$, we find
$$w_i^{(k)} = P\left(S_i = 1 \mid \mathbf{y}_i, \Theta^{(k)}\right) = \frac{\pi^{(k)} \prod_{t:\,\delta_{it}=1} \beta^{(k)}(t)^{y_{it}}\{1-\beta^{(k)}(t)\}^{1-y_{it}}}{\pi^{(k)} \prod_{t:\,\delta_{it}=1} \beta^{(k)}(t)^{y_{it}}\{1-\beta^{(k)}(t)\}^{1-y_{it}} + (1-\pi^{(k)}) \prod_{t:\,\delta_{it}=1} (\phi^{(k)})^{y_{it}}(1-\phi^{(k)})^{1-y_{it}}}.$$
Maximizing the resulting expected log-likelihood, $Q(\Theta \mid \Theta^{(k)})$, we have the following updates for the algorithm at iteration $k$:
$$\hat{\pi}^{(k+1)} = \frac{1}{n}\sum_{i=1}^{n} w_i^{(k)}, \qquad \hat{\phi}^{(k+1)} = \frac{\sum_{i=1}^{n} (1 - w_i^{(k)}) \sum_{t:\,\delta_{it}=1} y_{it}}{\sum_{i=1}^{n} (1 - w_i^{(k)}) \sum_{t=1}^{T} \delta_{it}}.$$
We can find $\hat{\boldsymbol{\gamma}}^{(k+1)}$ numerically using standard generalized linear model (GLM) methods, fitting a logistic regression of the observed $y_{it}$ on $t$ with weights $w_i^{(k)}$. The expectation and maximization steps are repeated until convergence. Standard errors for each coefficient can be calculated using the observed information from the Louis method (Louis, 1982). Let $\hat{\pi}$, $\hat{\beta}(t)$, and $\hat{\phi}$ represent the resulting estimates.
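A minimal, self-contained EM sketch for this model is given below. It is not the flexmix-based implementation used in the paper: it assumes conditional independence, the logistic form of Equation (6), an $n \times T$ result matrix Y with NA for unobserved tests (for example, the matrix simulated in the earlier sketch), and a cruder "any positive test" initialization.

```r
## Minimal EM sketch for the mixture likelihood (not the authors' implementation).
## Y: n x T matrix of 0/1 results, NA where no test was observed.
em_lcm <- function(Y, max_iter = 500, tol = 1e-8) {
  n <- nrow(Y); Tm <- ncol(Y); tt <- seq_len(Tm)
  w    <- as.numeric(apply(Y, 1, function(y) any(y == 1, na.rm = TRUE)))  # crude start
  pi_s <- mean(w); phi <- 0.01; gam <- c(0, 0)
  for (k in seq_len(max_iter)) {
    ## E-step: posterior probability that each participant is a "shedder"
    beta_t <- plogis(gam[1] + gam[2] * tt)
    f1 <- apply(Y, 1, function(y) prod(dbinom(y, 1, beta_t), na.rm = TRUE))
    f0 <- apply(Y, 1, function(y) prod(dbinom(y, 1, phi),    na.rm = TRUE))
    w  <- pi_s * f1 / (pi_s * f1 + (1 - pi_s) * f0)
    ## M-step: update pi and phi in closed form, gamma by weighted logistic regression
    old  <- c(pi_s, phi, gam)
    pi_s <- mean(w)
    phi  <- sum((1 - w) * rowSums(Y == 1, na.rm = TRUE)) /
            sum((1 - w) * rowSums(!is.na(Y)))
    long <- data.frame(y = as.vector(Y), t = rep(tt, each = n), w = rep(w, times = Tm))
    long <- long[!is.na(long$y), ]
    gam  <- coef(suppressWarnings(
              glm(y ~ t, family = binomial, data = long, weights = w)))
    if (max(abs(c(pi_s, phi, gam) - old)) < tol) break
  }
  list(pi = pi_s, phi = phi, gamma = unname(gam), posterior = w)
}
# fit <- em_lcm(Y)   # Y from the simulation sketch above
```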

Data-based approximations of latent variables
We propose two indicator variables, $Z_i$ and $Z_i^*$, meant to approximate $S_i$:
$$Z_i = I\left(\textstyle\sum_{t:\,\delta_{it}=1} y_{it} > 0\right), \qquad Z_i^* = y_{i t_i^*},$$
that is, $Z_i$ indicates whether person $i$ ever had an observed positive test and $Z_i^*$ indicates whether person $i$'s first observed test was positive.
We hypothesize that these indicator variables may be useful for approximating $S_i$ because of the high specificity of our test; we expect most, if not all, positive tests to come from those who are actually shedding the virus. The rationale for $Z_i^*$ comes from the suspected decline in sensitivity to identify "shedders" over time: we are most likely to observe a positive from a "shedder" in their first test. Using $Z$ or $Z^*$ we can get estimates for $\pi$, $\beta(t)$, and $\phi$ using simple sample proportions. Assume that $Z_i = 1$ for at least one person and $Z_i = 0$ for at least one person (and the same for $Z_i^*$).
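In code, the two indicators and the corresponding sample-proportion estimates are straightforward to compute. The sketch below uses our notation (not the authors' code) and assumes an $n \times T$ matrix Y of 0/1 results with NA for unobserved tests and at least one observed test per person.

```r
## Data-based indicators and sample-proportion estimates (illustrative sketch).
first_obs <- apply(Y, 1, function(y) which(!is.na(y))[1])            # t_i^*
Z     <- as.numeric(apply(Y, 1, function(y) any(y == 1, na.rm = TRUE)))
Zstar <- as.numeric(Y[cbind(seq_len(nrow(Y)), first_obs)] == 1)      # first observed test positive

## Estimates of pi, beta(t), and phi using Z (replace Z with Zstar for the Z* versions)
pi_hat   <- mean(Z)
beta_hat <- colMeans(Y[Z == 1, , drop = FALSE], na.rm = TRUE)  # positivity among "shedders" at each t
phi_hat  <- mean(Y[Z == 0, ], na.rm = TRUE)                    # positivity among "nonshedders"
## Note: with Z, phi_hat is 0 by construction, since anyone with a positive test is a "shedder."
```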
All of these estimates are biased to a degree. See the Web Appendix for the expectations of each estimate. Rearranging the expectation equations, we can also find expressions for bias corrections; further details of these derivations can be found in the Web Appendix. Let $\hat{\pi}_Z^{bc}$ and $\hat{\beta}_Z^{bc}(t)$ be the bias-corrected versions of the estimates found using $Z$, and let $\hat{\pi}_{Z^*}^{bc}$ and $\hat{\beta}_{Z^*}^{bc}(t)$ be the bias-corrected versions of the estimates found using $Z^*$. Because $\hat{\phi}_Z = 0$ by construction, the $Z$-based corrections can account for the bias associated with $P(Z_i = 0 \mid S_i = 1) > 0$, but not for the bias associated with $P(Z_i = 1 \mid S_i = 0) > 0$.
Let $\lambda(t) = P(Z_i^* = 1 \mid S_i = 1, t_i^* = t)$; the bias corrections for the $Z^*$-based estimates are expressed in terms of $\lambda(t)$ (see the Web Appendix). Performing these bias corrections and substituting the sample estimate $\hat{\beta}_{Z^*}(t)$ for $\lambda(t)$ may result in values less than 0 or greater than 1. We thus take the minimum of the bias-corrected expression and $1 - \epsilon$ and the maximum of the bias-corrected expression and $\epsilon$, where $\epsilon$ is a small positive number close to 0. Unlike the bias-corrected estimates from $Z$, the bias-corrected estimates using $Z^*$ can correct for bias associated with both $P(Z_i^* = 0 \mid S_i = 1) > 0$ and $P(Z_i^* = 1 \mid S_i = 0) > 0$.
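For instance, the truncation just described amounts to a simple clamp; the sketch below is illustrative, with the value of eps and the bias-corrected inputs as placeholders.

```r
## Keep bias-corrected estimates inside [eps, 1 - eps]
eps   <- 1e-6
clamp <- function(x) pmax(eps, pmin(1 - eps, x))
# beta_bc_Zstar <- clamp(beta_bc_Zstar_raw)   # hypothetical bias-corrected values
```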

2.3.1 Smoothing decay in sensitivity
Although $\hat{\beta}_Z(t)$, $\hat{\beta}_{Z^*}(t)$, $\hat{\beta}_Z^{bc}(t)$, and $\hat{\beta}_{Z^*}^{bc}(t)$ are calculated from sample means at each individual time point, we may still wish to represent the decline in test positivity among shedders as a smoothed curve. This can be done using cubic splines to create a trend line that is not constrained by parametric assumptions as strong as those of our MLE.
We will apply this smoothing technique to our motivating example of EVD survivors. We can also fit a logistic regression curve weighted by the sample size observed at each time point among "shedders." We will consider the case of a simple logistic regression further when considering power calculations corresponding to the data-based indicator variables.
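Both smoothing options can be sketched in a few lines of R. The sketch below (ours, not the authors' code) assumes the pointwise estimates beta_hat (one value per 2-week period) and the matrix Y and indicator Z from the earlier sketches; smooth.spline fits a cubic smoothing spline, and the weighted logistic fit uses the number of "shedders" observed at each time point as weights.

```r
## Smoothing the pointwise sensitivity estimates (illustrative sketch).
tt  <- seq_along(beta_hat)
n_t <- colSums(!is.na(Y[Z == 1, , drop = FALSE]))      # "shedders" observed at each time point
ok  <- n_t > 0

## (1) cubic smoothing spline through the pointwise estimates
sp <- smooth.spline(tt[ok], beta_hat[ok])

## (2) logistic regression weighted by the number of "shedders" tested at each time point
dat <- data.frame(p = beta_hat[ok], t = tt[ok], w = n_t[ok])
fit <- glm(p ~ t, family = binomial, weights = w, data = dat)
smoothed <- predict(fit, newdata = data.frame(t = tt), type = "response")
```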

Model comparisons
2.4.1 Bias, MSE, and coverage probability
To compare the operating characteristics of the five estimates (the MLE using the EM algorithm, the sample proportions using $Z$, the sample proportions using $Z^*$, and the two bias-corrected estimates), it is useful to know the bias, mean squared error (MSE), and coverage probability of each under different circumstances. Bias and MSE can be determined analytically for the uncorrected sample proportions using the closed-form expressions for expected values calculated in the Web Appendix. Coverage probabilities, and the MSE and bias for the other estimates, will be calculated using the bootstrap.
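As an illustration of the bootstrap used for the data-based estimates, the sketch below resamples participants (rows of Y) and returns percentile intervals for the $Z$-based estimate of $\pi$; the same pattern applies to the other data-based estimates. The setup (Y and the function name) is ours, not the authors' code.

```r
## Nonparametric bootstrap over participants for the Z-based estimate of pi.
boot_pi_Z <- function(Y, B = 1000) {
  n <- nrow(Y)
  replicate(B, {
    Yb <- Y[sample.int(n, n, replace = TRUE), , drop = FALSE]
    mean(apply(Yb, 1, function(y) any(y == 1, na.rm = TRUE)))
  })
}
# quantile(boot_pi_Z(Y), c(0.025, 0.975))   # 95% percentile interval
```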

Simulations
There are two major assumptions in our likelihood-based model: (1) conditional independence and (2) logistic decay in sensitivity. Our simulations will take place under three scenarios: (1) Conditional independence and logistic decay in sensitivity (both assumptions met); (2) Conditional dependence for "shedders" and logistic decay in sensitivity (first assumption met only); (3) Conditional independence and nonmonotonic decay in sensitivity (second assumption met only).
In all three scenarios, we simulate from a population with 30% of participants showing persistent viral shedding.
Overall, sensitivity will decay in each simulation. Under scenarios 1 and 2, it will decay monotonically; in scenario 3, it will follow a polynomial trend as specified in Equation (32). Plots of the assumed decay in sensitivity to detect "shedders" can be found in Web Figure 1. We assume that $\delta_{it} = 1$ for all $i$ and $t$; the number of tests per participant, $T$, will vary across settings, but the sensitivity of the first and last test will always be the same. Specificity will also be varied from 95% to 100%. The simulation parameters are described in Equations (25-32); in particular, $T \in \{3, 4, 5, 6, 7, 8\}$ and $\phi \in \{0, 0.005, 0.01, 0.05\}$, and in scenario 2 conditional dependence among a "shedder's" results is induced through a participant-level random effect drawn from a Normal(0, 1) distribution. It should be noted that in the first scenario we are simulating data under the assumption that both the likelihood expressed in Equation (4) and the time dependency expressed in Equation (6) are correct. Thus, any bias or deviations in coverage probability for the MLE can be attributed to the possibility of the EM algorithm converging to a local maximum or to small sample size, rather than to model misspecification. Scenarios 2 and 3 will illustrate ways in which the conditionally independent MLE may fail due to misspecification. For each $T$ and $\phi$ specification, we will run 1000 iterations and calculate each of the five estimates and their corresponding MSE, bias, and coverage probability.

Study design simulations
Using the simplest approximations, power calculations are straightforward. Let $\Phi$ represent the cdf of the standard normal distribution and let $z(\alpha) = \Phi^{-1}(\alpha)$. For a desired margin of error, $m$, and significance level, $\alpha$, the following condition must be met:
$$n \geq \frac{z(1-\alpha/2)^2\,\pi(1-\pi)}{m^2}. \quad (33)$$
Because we have no estimate for $\pi$ before commencing the study, the most conservative estimate of the required sample size is found by assuming $\pi = 0.5$. According to Inequality 33, if we wish to have a margin of error of 5% (with $\alpha = 0.05$), we would need a sample size of at least 385 participants.
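A small R helper reproducing the conservative calculation in Inequality 33 (the function name n_margin is ours):

```r
## Sample size for a desired margin of error (Inequality 33), conservative at pi = 0.5.
n_margin <- function(m, pi = 0.5, alpha = 0.05) {
  ceiling(qnorm(1 - alpha / 2)^2 * pi * (1 - pi) / m^2)
}
n_margin(0.05)   # 385
```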
We can use the Wald statistic for a simple logistic regression to perform a slightly more complex power calculation to determine the sample size needed to ensure enough power to show significant decay in sensitivity.
To achieve a desired power for a two-sided Wald test of no decline in sensitivity ($\gamma_1 = 0$) at level $\alpha$, the sample size $n$ must satisfy
$$\Phi\left(\frac{|\gamma_1|}{\sqrt{\big[(n\pi\, X^{\top} W X)^{-1}\big]_{22}}} - z(1 - \alpha/2)\right) \geq \text{desired power}, \quad (35)$$
where $X_{T \times 2}$ is the fixed design matrix for a single participant, with rows $(1, t)$, and $W_{T \times T} = \mathrm{diag}\left\{\frac{\exp(\gamma_0 + \gamma_1 t)}{(1 + \exp(\gamma_0 + \gamma_1 t))^2}\right\}$. This expression cannot be solved analytically, but it can be plotted or solved numerically for $n$. If participants are not expected to be observed at every available time point, an estimate of missingness must also be considered and used to scale up the number of participants needed for the study. For example, if we only expect two-thirds of our participants to come in for a test at each time point, we would need to scale up the number of participants by a factor of 3/2. For simplicity, these simulations will assume that $\delta_{it} = 1$ for all $i$ and $t$, that is, that every individual is observed at each available testing time.
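The Wald-based calculation can be carried out numerically as sketched below. The scaling of the information by the expected number of "shedders" ($n\pi$), the default value of $\pi$, and the illustrative intercept and slope are our assumptions; the design matrix has rows $(1, t)$ and W is the diagonal matrix defined above.

```r
## Approximate power of the Wald test for a decline in sensitivity (illustrative sketch).
wald_power <- function(n, Tm, gamma0, gamma1, pi = 0.3, alpha = 0.05) {
  tt <- seq_len(Tm)
  X  <- cbind(1, tt)                                  # T x 2 design for one participant
  p  <- plogis(gamma0 + gamma1 * tt)
  W  <- diag(p * (1 - p))                             # exp(eta) / (1 + exp(eta))^2
  v  <- solve(t(X) %*% W %*% X)[2, 2] / (n * pi)      # approx Var(gamma1_hat)
  pnorm(abs(gamma1) / sqrt(v) - qnorm(1 - alpha / 2)) # approximate power
}
## Smallest n giving 80% power with 5 tests (assumed intercept and slope):
# uniroot(function(n) wald_power(n, Tm = 5, gamma0 = 2.1, gamma1 = -0.14) - 0.8,
#         interval = c(10, 1e5))$root
```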
These power calculations assume that the sample proportions are unbiased estimates, which is not true in this case. We will simulate data using these sample size specifications to determine whether these power calculations can be useful.
For each simulation, we will find the resulting margin of error for each of the five estimates. Because most of our estimates are biased, we will also calculate the coverage probability to demonstrate when, if ever, any of the estimates become too precise to capture the true value due to bias.
Second, we will simulate data to test whether each of our estimates achieves enough power to detect a difference between $\beta(1)$ and $\beta(T)$, using the following specifications: number of tests $T \in \{10, 15, 20, 25, 30\}$ and $\phi = 0.01$.
For each sample size, we will calculate the expected power and the observed power for each of the estimates.

Applications to Ebola virus data
To illustrate the use of these estimates, we will apply our methods to data on PCR assay results from semen samples from men who recovered from EVD in Liberia. We will then compare the results from each of the methods. We will also use the power calculations from Inequalities 33 and 35 as part of the design of a clinical trial of remdesivir in male survivors in the Democratic Republic of the Congo (DRC). This trial was motivated by the recognition that male EVD survivors can transmit Ebola virus during sexual intercourse and by the hypothesis that administration of remdesivir may reduce the frequency of viral shedding in male survivors. A similar trial conducted in Liberia and Guinea was stopped for futility due to an inability to identify male survivors who were still actively shedding. Many of the survivors in the DRC had been vaccinated or treated with more effective therapies than were available after the outbreak in West Africa, and it has been hypothesized that the decay rate among these men is faster than among those who lacked access to vaccines or effective therapies.

Simulation results
Simulation results comparing the operating characteristics of the five estimates for the first scenario are summarized in Figures 1-4. When $\phi = 0$, that is, when tests have 100% specificity, using $Z$ to estimate $\pi$ produces results with MSE and coverage probability levels very similar to those of the MLE. The bias-corrected version of this estimate effectively eliminates the small amount of bias, but it increases the variance of the estimate, resulting in larger MSE. As $\phi$ increases, estimates using $Z$ to approximate $S_i$, with and without bias corrections, begin to suffer, especially when a greater number of tests are performed. The performance of estimates using $Z^*$ to approximate $S_i$ is less dependent on the value of $\phi$.
With more tests, estimates of $\pi$ using this approximation tend to decline in performance, but much less dramatically than approximations using $Z$. Furthermore, both the bias-corrected and uncorrected versions of this approximation for $\beta(t)$ tend to improve with more tests (to a point). While the bias correction tends to decrease bias and increase coverage probability of these estimates in most situations, it again tends to increase MSE. Estimates using $Z^*$ to predict $\beta(t)$, with and without a bias correction, tend to perform slightly worse when $t = 1$, which makes some sense as all those with a positive initial test are classified as "shedders" under this approximation. It should be noted that $\beta(t)$ estimates for other time points have coverage more similar to that of the median and final tests displayed in Figures 3 and 4.

Simulation results comparing the operating characteristics of the five estimates for the second scenario (conditionally dependent sensitivity) are summarized in Web Figures 2-5. Estimates for $\pi$ and $\beta(1)$ show roughly the same trend as in the previous scenario, with slightly worse performance of our data-based estimates, mostly noticeable when $\phi$ is larger. At later time points, the MLE begins to suffer from increased bias and decreased coverage probability in comparison to our data-based estimates, especially when $\phi$ is low. Again we observe the $Z$ estimates to be useful only when $\phi$ is approximately 0, but $Z^*$ is more robust to lower specificity and even outperforms the MLE as the number of tests increases.
Simulation results comparing the operating characteristics of the five estimates for the third scenario (nonmonotonic decay in sensitivity) are summarized in Web Figures 6-9. Similar to the previous scenario, changing the functional form of $\beta(t)$ does little to affect the estimation of $\pi$. However, the MLE for $\beta(t)$ performs very poorly compared to the other estimates, especially when there are many tests. Again we observe $Z$ to be more useful with high specificity and fewer tests, and $Z^*$ to be more robust to lower specificity and to improve with more tests.
In these scenarios, we did not observe any instances where the EM algorithm failed to converge. However, in addition to the three scenarios outlined in Equations (29-32), when we explored instances of lower shedding prevalence ($\pi = 0.1$) we did find that the EM algorithm failed to identify a single "shedder" 2.9-14.7% of the time when $n = 100$ (varying with the number of tests and the false positive rate) and up to 4.2% of the time when $n = 300$. We did not observe these failures when obtaining estimates with $Z$ or $Z^*$, bias corrected or otherwise. We also experimented with other model misspecifications that violate both the conditional independence and logistic decay assumptions and found similar overall trends of decreased relative performance of the MLE, although they were not as dramatic as those observed in scenarios 2 and 3. This is likely attributable to decay that was monotonic (although not logistic) and to a lower overall level of conditional dependence.

Study design simulation results
Plots summarizing the results of the study design simulations can be found in the Supporting Information. Using Inequality 33, we determined that a sample size of 385 would be necessary to achieve a margin of error of 0.05. Web Figure 10 shows the expected power using Inequality 33 as a dashed line. Although not all estimates are simple sample proportions, the margin of error is not much different for each of the estimates. The coverage probability for the MLE, $Z^*$, and $Z^*$ with a bias correction stays consistently around 0.95 regardless of sample size. The coverage probability using $Z$, with or without a correction, decreases as the sample size grows. This finding is consistent with the earlier simulation results when $\phi = 0.01$.

[Figure captions: Figure 2, simulation results for $\beta(1)$ under scenario 1, summarizing mean squared error, coverage probability, and bias of the five estimates for different values of $\phi$ and varying numbers of tests; Figure 3, the same for $\beta(t)$ with $t$ the median test; Figure 4, the same for $\beta(t)$ at the final test; Figure 5, sensitivity to detect "shedders" using the MLE $\hat{\beta}(t)$, $\hat{\beta}_{Z^*}(t)$, and the bias-corrected $\hat{\beta}_{Z^*}^{bc}(t)$, smoothed using cubic splines.]
Web Figure 11 shows that power calculations assuming a simple logistic regression can be used to inform the appropriate sample size for all five estimates. The dashed line represents the expected power given Inequality 35. In fact, $Z^*$ shows higher than expected power, more similar to the LCM results than to what would be expected for a simple logistic regression.

Results using PREVAIL III data
We calculated each of the proposed estimates on data collected from a cohort of survivors of the 2013-2016 West African EVD outbreak. Overall, there were $n = 265$ participants with at least one test result. Tests were administered every 2 or 4 weeks, so each time point represents a 2-week interval. The first testing time point occurred around 8 months after EVD symptoms resolved.
In total there were $T = 50$ possible time points, but on average participants were observed at only a fraction of them. The estimated sensitivity to detect Ebola virus in a persistent "shedder" at $t = 1$, or 8 months after symptoms resolved, is 88%. The odds of detecting Ebola virus in a persistent "shedder" decline by a factor of 1.15 with each passing 2-week interval. The effect of time is statistically significant. Table 1 compares all five estimates for $\pi$; $\beta(t)$ for $t = 1, 11, 21, 31$; and $\phi$.
The estimates for $\pi$ range from 0.211 to 0.415. All four of the approximation confidence intervals overlap with the MLE's confidence interval. All estimates show a declining sensitivity to detect Ebola virus in "shedders." The sample proportion estimates using both $Z$ and $Z^*$ estimate the sensitivity to be exactly 1 at $t = 1$. It should be noted that the data are very sparse at this time point (only one individual was tested this early, and he had a positive result). All the confidence intervals using the approximations overlap with the MLE's confidence interval. Many of these time points have only a handful of observations, and thus the confidence intervals for $\beta(t)$ are quite wide. Based on the simulation results, we expected the MLE, $Z^*$, and $Z^*$ with a bias correction to perform best for this example. Figure 5 shows the trend for $\beta(t)$ using these estimates. The approximations are smoothed by fitting a weighted logistic regression.
Again we see overlapping confidence intervals for the three estimates showing they are not significantly different at the 95% confidence level.

Calculating sample size for a follow-up study
We can recommend sample sizes for a similar study of Ebola virus persistence that was planned to take place in the DRC. For the proportion of "shedders," if we assume $\pi = \hat{\pi}$, the recommended sample sizes for margins of error of $m$ = 0.01, 0.05, and 0.1 are 8608, 345, and 97, respectively. More conservatively, assuming $\pi = 0.5$, the recommended sample sizes to achieve the same margins of error would be 9604, 385, and 97, respectively.
For a similar study, where participants are tested every 2 weeks, we can make recommendations for the number of participants and the number of tests needed to have sufficient power to detect a significant decline in sensitivity. Overall, we find that to achieve 80% power with 3, 5, or 7 tests, we would need 5173, 946, and 312 participants, respectively. This is summarized in greater detail in the first three rows of Web Table 1. We could also test the hypothesis that the change in sensitivity for "shedders" in the DRC is different from what was observed in Liberia. Assuming the decline in the DRC is 1.5 times as fast, to achieve 80% power with 3, 5, or 7 tests we would need 397, 153, and 34 participants, respectively. The second half of Web Table 1 summarizes these results in greater detail. These values were calculated using a power expression analogous to Inequality 35. When participants are not expected to be observed at every time point, sample sizes must be scaled up to account for this loss.

DISCUSSION
When a gold-standard test is absent, an LCM can be used to represent the unobservable patient status. In a case where the test sensitivity depends on time but the specificity remains constant, the EM algorithm can be implemented, but it is only guaranteed to converge to a local maximum and the model may become unidentifiable with increased flexibility. For tests with high specificity, such as PCR assays, data-based approximations can be considered. We introduced two approximations for the true status of patient $i$, $S_i$: $Z_i$ and $Z_i^*$. While the more obvious $Z_i$ proved to be too simplistic to be useful outside the scenario where specificity is 100%, a slight modification of this approach, $Z_i^*$, yields results on par with the MLE in many scenarios. These nonparametric, data-based approximations were also observed to be more robust to violations of the likelihood assumptions such as conditional independence and logistic decay in sensitivity. The bias corrections were able to improve coverage probability and decrease bias when estimating the proportion of persistent "shedders," but in some scenarios they increased MSE, especially when estimating the sensitivity for "shedders" over time.
Sample size calculations are very straightforward for sample proportions and simple logistic regression. Designing a study with sufficient sample size to obtain precise proportion estimates and enough power to demonstrate a significant decline in sensitivity can easily be done for simple approximations. Furthermore, these sample size calculations assuming a simple model also prove useful for the more complicated bias-corrected and EM algorithm estimates.
In an application of these estimates to data on the persistence of Ebola virus in semen of men after recovery from EVD, we found estimates for the proportion with persistent shedding, the changing sensitivity of the PCR assay over time, and the specificity of the assay. From the MLE found via the EM algorithm and the $Z^*$ approximations, we found that around a fifth to a third of all men exhibited persistent viral shedding in semen 8 months after recovery from acute EVD symptoms. Among these persistent shedders, the sensitivity to continue to detect Ebola viral shedding in semen declined over the course of the study. This decline in sensitivity implies a decrease in viral load below detectable levels for the PCR assay. Our results show a high specificity of over 95% for all estimates, and over 98% using the MLE, which is consistent with previous studies showing high specificity of this PCR assay. Although this method can help us understand when the viral loads of "shedders" drop below detectable levels, incorporating continuous measurements of viral load among "shedders" over time could give a more precise and useful understanding of how the virus declines among "shedders." This remains a topic for future studies.
These estimation methods could be useful for the design of future studies on Ebola virus persistence in semen such as the cohort study planned in the DRC mentioned previously. Beyond Ebola virus, many other viruses are detected using high specificity PCR assays, for example, SARS-CoV-2. There are still many unanswered questions about viral persistence after recovery from COVID-19 (Qi et al., 2020) and estimation methods such as these could inform power calculations for a cohort study and could also be employed to analyze the prevalence and sensitivity to detect persistent viral shedding over time.

ACKNOWLEDGMENTS
This project has been funded in whole or in part with federal funds from the National Cancer Institute, National Institutes of Health, under Contract No. HHSN261200800001E. The content of this manuscript does not necessarily reflect the view or policies of the Department of Health and Human Services, nor does mention of trade names, commercial products, or organizations imply endorsement by the US Government.

DATA AVAILABILITY STATEMENT
The data that support the findings of this study are available on request from the corresponding author. The data are not publicly available due to privacy or ethical restrictions.