## 1. Introduction

Incidence estimation has long been the holy grail of Human Immunodeficiency Virus (HIV) epidemiological research. Estimates of incidence are needed to monitor ongoing transmission, to evaluate interventions aimed at reducing transmission and to plan resource allocation for prevention. Cohort studies that follow up uninfected individuals, the gold standard for incidence estimation, are expensive to run and can be subject to observational biases due to selection and follow-up adherence [1]. Thus, historically, much effort has been put into the development of indirect methods of estimation [2–6]. Methods based on ‘snapshot’, or cross-sectional, sampling [1, 5, 6] have attracted considerable interest in recent years as laboratory methods, based on characteristics of the antibody response soon after infection, are being continuously developed to identify recent infections.

The idea underlying these methods, or at least their simplified version, is as follows. Let *d* be the date on which a cross-sectional survey is conducted. The sampled individuals are tested for HIV and classified as negative or positive; those who test positive are further classified as *recently infected* or not according to the measured level of a chosen biomarker. The prevalence of *recent infection* *P*(*d*) at date *d* can be expressed in terms of the incidence density of HIV at time *t*, *I*(*t*), as

$$P(d) = \int_{0}^{\infty} I(d - t)\, S(t)\, \mathrm{d}t \tag{1}$$

where *S*(*t*) is the survival function of the time spent in the *recent infection* state, the so-called *window period*. HIV incidence *I*(*t*) is commonly estimated from Equation (1) under two assumptions. First, there exists a maximum window period *w*_{m} such that *S*(*w*_{m}) = 0. Second, the incidence is constant over the past *w*_{m} years, that is, over the calendar period [*d*−*w*_{m}, *d*]. Under these assumptions, Equation (1) simplifies as follows:

$$P(d) = I \int_{0}^{w_m} S(t)\, \mathrm{d}t = I\mu \tag{2}$$

where µ is the mean of the window period distribution (see [1] for more details). The problem of estimating *I* then becomes that of using a cross-sectional (random) sample to estimate the prevalence of those recently infected, and of acquiring the necessary knowledge of µ. Because Equation (2) relies on incidence being constant over [*d*−*w*_{m}, *d*], it is undesirable for *w*_{m} to be too large, that is, for the distribution of the window period to have a long tail.
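The role of the constant-incidence assumption can be checked numerically. The following is a minimal sketch, not part of the analysis in this paper. It assumes that Equation (1) takes the usual convolution form, with *P*(*d*) equal to the integral of *I*(*d*−*t*) *S*(*t*) over the window period, and it uses a purely hypothetical survival function that declines linearly to zero at *w*_{m} = 1 year. The function names and the incidence values are illustrative assumptions.

```python
def linear_survival(t, w_max):
    """Hypothetical S(t): window period uniform on [0, w_max], so S(w_max) = 0."""
    return max(0.0, 1.0 - t / w_max)


def recent_prevalence(incidence_before_survey, w_max, steps=10_000):
    """Trapezoidal approximation of the integral of I(d - t) * S(t) over [0, w_max].

    incidence_before_survey(t) plays the role of I(d - t): the incidence
    t years before the survey date d.
    """
    h = w_max / steps
    total = 0.0
    for k in range(steps + 1):
        t = k * h
        weight = 0.5 if k in (0, steps) else 1.0  # trapezoid end-point weights
        total += weight * incidence_before_survey(t) * linear_survival(t, w_max)
    return total * h


W_MAX = 1.0        # assumed maximum window period, in years
MU = W_MAX / 2.0   # mean window period implied by the linear S(t)

# Case 1: constant incidence of 0.02 infections per person-year (assumed value).
p_const = recent_prevalence(lambda t: 0.02, W_MAX)
estimate_const = p_const / MU  # Equation (2) rearranged: I = P / mu

# Case 2: incidence was higher in the past (0.03 a year ago, 0.02 at the survey).
p_trend = recent_prevalence(lambda t: 0.02 + 0.01 * t, W_MAX)
estimate_trend = p_trend / MU

print(f"constant incidence: estimate = {estimate_const:.4f}")
print(f"changing incidence: estimate = {estimate_trend:.4f}")
```

Under constant incidence the ratio *P*/µ recovers the assumed value of 0.02. When incidence has changed over the window, the same ratio returns roughly 0.023, a weighted average of past incidence rather than the incidence at the survey date. The longer the window period, the further back this average reaches.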

In the last 10 years a number of assays have been proposed to detect recent infections. The original procedure involved testing individuals using Sensitive/Less Sensitive (S/LS) commercial antibody assays (e.g. 3A11-LS, LS EIA), in order to detect differential HIV titre [7]. More recently a biomarker has been proposed based on the principle that antibodies produced early after infection bind less strongly to the antigen than those produced in established infection [8]. The strength with which the antibodies bind to the antigen, their *avidity*, can be measured using the Avidity Index (AI). The AI is calculated by dividing the sample-to-cutoff (S/CO) ratio from a low-avidity sample treated with guanidine by the S/CO ratio from a control sample; more details can be found in [9]. For early infection, weak binding causes the level of antibodies in the treated sample to be lower than that in the control, and hence the AI takes values less than one. For more established infection, antibody levels in the two samples are similar and hence the AI approaches a value of one. Conditional on the choice of a specific threshold, commonly 0.8, individuals with measured AI below the threshold are classified as *recently infected*, and the window period is the time spent below the chosen threshold.
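The AI calculation and the threshold rule described above can be sketched as follows. This is an illustration only: the function names and the S/CO values are hypothetical, and treating an AI exactly equal to the threshold as not recent is an assumption of the sketch.

```python
def avidity_index(sco_treated, sco_control):
    """AI = S/CO ratio of the guanidine-treated sample / S/CO ratio of the control."""
    if sco_control <= 0:
        raise ValueError("control S/CO ratio must be positive")
    return sco_treated / sco_control


def is_recent(ai, threshold=0.8):
    """Classify as recently infected when the AI lies below the threshold."""
    return ai < threshold


# Hypothetical S/CO ratios (treated, control) for two HIV-positive individuals.
samples = {"early infection": (1.2, 3.0), "established infection": (2.85, 3.0)}

for label, (treated, control) in samples.items():
    ai = avidity_index(treated, control)
    status = "recent" if is_recent(ai) else "not recent"
    print(f"{label}: AI = {ai:.2f} -> {status}")
```

With these made-up values the first individual has an AI of about 0.40 and is classified as recently infected, while the second has an AI of about 0.95 and is not.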

The window period is therefore a fundamental ingredient in the estimation of HIV incidence. It depends on the rate of antibody response and hence can vary considerably between individuals. By raising or lowering the associated threshold, the window period can be lengthened or shortened, respectively. If it is too short, very few individuals are classified as *recently infected*, resulting in a loss of precision for incidence estimation; if it is too long, the assumption of a constant incidence is no longer viable. Hence knowledge about the distribution of the window period, not just its mean, is essential. Despite this, a threshold is commonly chosen based on the diagnostic accuracy of classifying individuals as *recently infected*, where true recency is defined as a certain period post-seroconversion, rather than based on the resulting distribution of the window period.
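The dependence of the window period on the threshold can be illustrated with a toy growth curve. The sketch below assumes, purely for illustration, that an individual's AI rises as AI(*t*) = *a*(1 − exp(−*bt*)), where *a* is an asymptote and *b* a rate per year; this functional form and all parameter values are hypothetical and are not the model fitted in this paper. The window period for a given threshold is obtained by inverting the curve.

```python
import math


def window_period(threshold, asymptote, rate):
    """Time in years at which asymptote * (1 - exp(-rate * t)) reaches the threshold.

    Returns math.inf when the curve never reaches the threshold.
    """
    if threshold >= asymptote:
        return math.inf
    return -math.log(1.0 - threshold / asymptote) / rate


# Hypothetical individuals, each described by (asymptote, rate per year).
individuals = [(1.0, 4.0), (1.0, 3.0), (0.95, 2.0), (0.75, 3.0)]

for threshold in (0.6, 0.8):
    windows = [window_period(threshold, a, b) for a, b in individuals]
    finite = [w for w in windows if math.isfinite(w)]
    never_cross = len(windows) - len(finite)
    print(
        f"threshold {threshold}: longest finite window = {max(finite):.2f} years, "
        f"never crossing = {never_cross}"
    )
```

Raising the threshold from 0.6 to 0.8 lengthens every individual's window period in this toy example. An individual whose AI plateaus below the threshold never leaves the *recent infection* state, which is one way a window period distribution can acquire a long tail.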

The aim of this paper is to illustrate two statistical methods for estimating the distribution of threshold-specific window periods. The first method (Section 2) implements a doubly censored survival analysis approach to obtain a non-parametric estimate of the window period distribution. The second method (Section 3) is based on modelling the individual growth curves of the biomarker using mixed-effects models, and inverting the functional relationship to obtain estimates of the window period distribution. In Section 4, we apply the methods to data from a cohort of HIV-infected individuals. For each individual, AI measurements are available longitudinally and the dates of the last negative and first positive HIV antibody test are known. Finally, in Section 5, we make recommendations for the choice of threshold associated with the AI assay so that the resulting window period distribution is likely to satisfy the assumptions required for Equation (2).