Data‐driven analysis of JAK2V617F kinetics during interferon‐alpha2 treatment of patients with polycythemia vera and related neoplasms

Abstract Treatment with PEGylated interferon‐alpha2 (IFN) of patients with essential thrombocythemia and polycythemia vera induces major molecular remissions with a reduction in the JAK2V617F allele burden to undetectable levels in a subset of patients. A favorable response to IFN has been argued to depend upon the tumor burden, implying that institution of treatment with IFN should be as early as possible after the diagnosis. However, evidence for this statement is not available. We present a thorough analysis of unique serial JAK2V617F measurements in 66 IFN‐treated patients and in 6 untreated patients. Without IFN treatment, the JAK2V617F allele burden increased exponentially with a period of doubling of 1.4 year. During monotherapy with IFN, the JAK2V617F allele burden decreased mono‐ or bi‐exponentially for 33 responders of which 28 patients satisfied both descriptions. Bi‐exponential description improved the fits in 19 cases being associated with late JAK2V617F responses. The decay of the JAK2V617F allele burden during IFN treatment was estimated to have half‐lives of 1.6 year for the monoexponential response and 1.0 year in the long term for the bi‐exponential response. In conclusion, through data‐driven analysis of the JAK2V617F allele burden, we provide novel information regarding the JAK2V617F kinetics during IFN‐treatment, arguing for early intervention.

In this section we present the specific values of the parameters for the 64 fits to the patient-responses. The fits are shown visually in the supplementary D. For reference, the functional forms of the two models will be shown here. The mono-exponential decay has form: The bi-exponential model is defined somewhat differently than described in the main paper. This is done in the fitting procedure to ensure that some parameters are positive and that the slope at t = 0 was continuous. As such the functional form is: Where ν is the growth-rate before treatment is initiated.
To determine how well a given model fits a dataset a multitude of goodness-of-fit (GoF) measures can be used. We use the adjusted R-square,R 2 values as given by: Where SSE is the sum of squared errors, SST is the total sum of square, n is the number of data-points and m is the number of coefficients fitted. This measure is automatically calculated as part of the MATLAB fit procedure. In Supplemental table 1 the parameters found are shown, along with the corresponding adjusted R 2 goodnessof-fit values.

B Distribution of fit-parameters
In this section we will present the distributions of the fit-coefficients presented in supplementary A. Firstly, this section serves to illustrate the distributions of patient-responses. Secondly, to describe how the parameters for a population-level mono-exponential response and for a population-level bi-exponential response were determined. Before presenting the arguments for the choices of parameter-values, the specific values found is shown in Supplemental table 2. As such, the mono-exponential response from equation (1), can be expressed numerically as: With the growth-rate before treatment estimated as 0.49 year −1 (yielding a period of doubling of 1.41 years), the bi-exponential response from equation (2) can be stated as a numerical expression: By visual inspection of the fits shown in the supplementary D, a conservative threshold for the minimalR 2 accepted as a good fit is chosen asR 2 ≥ 0.6.

Mono-exponential
A total of 28 fits are above the threshold for the mono-exponential fit. In figure 1 a histogram of the α parameter-values of all fits of the mono-exponential model withR 2 above the threshold is shown. Additionally, the Gaussian distribution best fitting these values is shown in black, as well as the result of a Gaussian mixture modelling of the data with two Gaussian distributions, shown in dashed blue. The Gaussian mixture modelling was fitted using the fitgmdist MATLAB function included in the Statistics and Machine Learning Toolbox. The function also evaluates the Akaike Information Criterion (AIC) which is used to determine whether one or two Gaussians was the best description of the distribution. The single Gaussian has a mean of 0.62 and a standard deviation of 0.39, while the two Gaussians has means 0.46 and 1.35, standard deviations 0.19 and 0.23, with mixture coefficients of 0.82 and 0.18 respectively.
From the AIC the sum of the two Gaussian distributions is determined to be the better descriptor of the distribution of the values. The two Gaussian distributions split the α-parameters in two groups. Since the Gaussian with the lower mean is representative for the majority of the α parameter-values, the population-level mono-exponential decay expression is determined to have an α of 0.46. With a standard deviation of 0.19, this yields 95 % confidence intervals between 0.08 and 0.84. Note that the group with the higher mean consists of patients that responded very well to treatment, and as such, picking the lower groups as the population response allows for a conservative estimate of the efficacy of treatment.
Since the A parameter of the mono-exponential form is simply the JAK2 V617F allele burden at the initial time, determining a representative value for the parameter is unnecessary for present purposes. As the threshold ofR 2 greater than 0.6 might be considered to strict, we also include the histogram and distribution given a threshold of 0.3. The less strict threshold leads to 38 acceptable fits. While these values are worse fits to the given patient-responses, the distribution is found to be similar, and thus the previously found values should also be a representative value for the additional patients. Figure 2 shows the distributions of α for the lower threshold.

Bi-exponential
A total of 32 fits are above the threshold for the mono-exponential fit. Figure 3 shows two histograms of the β 2 parameter values of all fits of the bi-exponential model withR 2 above 0.6. One histogram (bottom) shows the β 2 distribution, while the other (top) displays the distribution of log(β 2 ). Additionally, the Gaussian distribution best fitting the log(β 2 ) distribution is shown in black, as well as the result of a Gaussian mixture modelling of the data with two Gaussian distributions, shown in dashed blue. This fitting of the Gaussian mixture model was done to the logarithmic values, as the distribution is assumed to be log-normally distributed.
From the distribution on a logarithmic scale, the single Gaussian has a mean of −0.39, with standard deviation of 0.74. The mixed Gaussians has means −0.63 and 0.54 with standard deviations 0.62 and 0.31 and mixture coefficients of 0.79 and 0.21 respectively. In the case of the bi-exponential model, using the AIC, the single Gaussian distribution is determined to be the better descriptor. As such we have log(β 2 ) = −0.39 or β 2 = 0.68 as our population-level value for β 2 . Since the standard deviation is 0.74, the corresponding 95 % confidence intervals are 0.15 and 2.97. As in the mono-exponential case, we also present a figure of the distributions found given a less strict threshold, namely a threshold ofR 2 ≥ 0.3. The less strict threshold leads to 37 acceptable fits. This is found in figure 4. The values for the c 2 parameters are found to vary across many orders of magnitude. This is to be expected as there is almost no difference between low values of c 2 once a certain low threshold has been reached, and similarly for high values of c 2 . As such we decide to use the median value of fits withR 2 > 0.6 for the population value of the c 2 parameter, yielding a value of 3.87. As in the mono-exponential case, the parameter B is determined by the JAK2 V617F allele burden at the initial time.

C Risk of secondary cancers
In this section we will derive a mathematical expression relating the time at which treatment is initiated with the mutational load, or the risk of secondary cancers. For patients experiencing exponential growth of some disease indicator (such as JAK2 V617F allele burden), and with treatment resulting in exponential decay of the disease indicator, the disease indicator can be described as: where A is some arbitrarily low detection limit, α is the exponential growth rate before treatment, β is the exponential decay rate during treatment and τ is the time at which treatment starts. We will use units of days, such that τ is in days, while α and β is in days −1 , but any matching units (e.g. years and years −1 ) could also be used.
Considering some initial day, t = 0, where a lower limit A of the indicator can be measured, i.e. f (0) = A, and some final day t f > τ at which the same detection limit is reached again, i.e. f (t f ) = A, the mathematical problem now has boundary conditions.
The number of days of treatment need to reach A is given as t f − τ = τ α+β β − 1 = α β τ . Thus is it clear that for α > 0 and β > 0, the days of treatment needed to reach A increases with τ .
Denoting the integral of f (t) from t = 0 to t = t f as F (t) we find: For ατ 1 we have e ατ −1 ≈ ατ , and the expression is linear in τ . For ατ 1, the expression is approximately Ae ατ 1 α + 1 β . As such the rate of doubling for large ατ is log (2) α . If the disease indicator f (t) is proportional to the number of mutated cells, there is a proportionality constant, M , relating f (t) to the number of mutated cells at any given time. The integral of M f (t) for all time yields the accumulated number of mutated cells per day. Since M f (t)dt = M F (t), the average number of mutated cells per day is found to also double with rate log (2) α . Assuming cells having a constant probability of dividing at any given time, the average number of division of mutated cells is given as M rF (t) where r is the rate of division. Denoting this as If every division of mutated cells has a given probability of resulting in a second mutation, D(τ ) is a measure of this probability. The doubling rate of D(τ ) is also approximately log (2) α for ατ 1 as was the case for f (τ ). Given α ≈ 0.0013 and assuming τ 1 α ≈ 770 days, we see that the risk of second mutations double approximately every 1.4 years treatment is postponed.

D Figures of fits
Data for all patients are shown in Supplemental figures 6 through 71 together with the two fit-types as well as an extrapolated growth following the exponential growth determined in the main paper. All figures are shown twice, with a linear y-axis of the left hand side and with a logarithmic y-axis on the right hand side. Data is shown as black stars, the bi-exponential fit as a full red line, the mono-exponential fit as a dashed blue line and finally the extrapolated growth is shown in dotted black.

E.2 Inclusion and exclusion criteria
Inclusion criteria • Age > 18 years of age at the time of signing the informed consent.
• No previous cytoreductive treatment with HU or IFN. However, HU was allowed from time of diagnosis to randomization. Previous phlebotomized PV patients were eligible.
• Ability to comply to study visits and requirements.

Exclusion criteria
• Pregnant or lactating females • Females of childbearing potential (FCBP) had to undergo pregnancy testing and the result had to be negative • Inadequate or lack of acceptance to use adequate contraceptive method by FCBPs (oral contraceptives, intra uterine device, implant, transdermal patch, vaginal ring, depot injection). Infertile patients were exempted from using contraceptive methods. To be considered infertile the patients had to be surgically sterilized (vasectomy, bilateral tubectomy, hysterectomy, or ovariectomy) or post menopausal defined as amenorrhea < 12 months at time of inclusion) • History of another malignancy within five years of enrolment (except basal cell carcinoma of the skin) • Eastern Cooperative Group Oncology Status ≥ 3 • Serum creatinine > 2 × the upper limit of the normal range (ULN)

E.5 IFN interruption or discontinuation due to adverse events
Adverse events were graded according to the Common Terminology Criteria for Adverse Events (CTCAE) version 4.0. Before IFN administration it was ensured that the platelet count > 100 · 10 9 /l, neutrophilic count > 1 · 10 9 /l and that haemoglobin level > 6.2mmol/l. A maximum treatment interruption period of 6 months was allowed. Reasons for exclusion from the study included: • Adverse events Supplemental table 5: Periods of JAK2 V617F doubling for three patients (A, B and C) with confidence intervals (CI), as well as the period of doubling for the pooled data.
Supplemental figure 1: Distribution of α for the mono-exponential fits, for the 28 fits with an adjusted R 2 value greater than 0.6. The single Gaussian is shown in full black lines, while the dashed blue lines show the distribution following two Gaussians.
Supplemental figure 2: Distribution of α for the mono-exponential fits, for the 38 fits with an adjusted R 2 value greater than 0.3. The single Gaussian is shown in full black lines, while the dashed blue lines show the distribution following two Gaussians.
Supplemental figure 3: Distribution of β 2 for the bi-exponential fits, for the 32 fits with an adjusted R 2 value greater than 0.6. Top: Logarithmic first axis, bottom: Linear first axis. The single Gaussian is shown in full black lines, while the dashed blue lines show the distribution following two Gaussians.
Supplemental figure 4: Distribution of β 2 for the bi-exponential fits, for the 37 fits with an adjusted R 2 value greater than 0.3. Top: Logarithmic first axis, bottom: Linear first axis. The single Gaussian is shown in full black lines, while the dashed blue lines show the distribution following two Gaussians.
Supplemental figure 5: Chart for IFN dose-escalation Supplemental figure 6: Data and model fits for patient 1. Data is shown as black stars, mono-exponential fit is shown as a dashed blue line and the bi-exponential fit is shown a full red line. An extrapolation of continued exponential growth is shown in a dotted black line. The right-hand figure displays the same as the left-hand figure, but on a logarithmic y-axis. On a logarithmic y-axis, the mono-exponential decay appears as a straight line, allowing for a simple visual check of the decay-rates.