Bayesian and deep‐learning models applied to the early detection of ovarian cancer using multiple longitudinal biomarkers

Abstract Background: Ovarian cancer is the most lethal of all gynecological cancers. Cancer Antigen 125 (CA125) is the best-performing ovarian cancer biomarker, but it is still not effective as a screening test in the general population. Recent literature reports additional biomarkers with the potential to improve on CA125 for early detection when used in longitudinal multimarker models. Methods: Our data comprised 180 controls and 44 cases with serum samples sourced from the multimodal arm of the UK Collaborative Trial of Ovarian Cancer Screening (UKCTOCS). Our models were based on Bayesian change-point detection and recurrent neural networks. Results: We obtained significantly higher performance for the CA125-HE4 model with both methodologies (AUC 0.971, sensitivity 96.7% and AUC 0.987, sensitivity 96.7%) compared with CA125 alone (AUC 0.949, sensitivity 90.8% and AUC 0.953, sensitivity 92.1%) for the Bayesian change-point (BCP) and recurrent neural network (RNN) approaches, respectively. One year before diagnosis, the CA125-HE4 model also ranked best, whereas at 2 years before diagnosis no multimarker model outperformed CA125. Conclusions: Our study identified and tested different combinations of biomarkers using longitudinal multivariable models that outperformed CA125 alone. We showed the potential of multivariable models and candidate biomarkers to increase the detection rate of ovarian cancer.


1 | INTRODUCTION
Ovarian cancer is the most lethal of all gynecological cancers. When detected at an early stage, survival is much more encouraging (5-year survival of >93% for Stage I disease) than when diagnosed at an advanced stage (5-year survival of 13% for Stage IV).1 Despite extensive efforts to improve treatment over the last 20 years, there have been only modest improvements in survival, and these have not had a significant impact. Major efforts in detecting ovarian and tubal cancer have spanned decades.3,4 Algorithm-based approaches to screening in the UK trial have demonstrated that longitudinal CA125 can lead to earlier detection (stage shift with multimodal screening), but with no impact on mortality from the disease.5 More recent data, however, suggest that there may be longer survival in women diagnosed with the most lethal subtype of ovarian cancer, high-grade serous cancer (HGSC), in the screened (multimodal) group compared with the control group. Since its identification in 1981, CA125 has been used in clinical practice and investigated in screening trials. Clinical decisions have been made based on the patient's risk of having a change point in serial CA125 with respect to their baseline. A statistical method to determine such a risk in a probabilistic way was developed (Risk of Ovarian Cancer Algorithm, ROCA).6,7 Efforts have since focused on exploring the value of CA125, HE4, and other promising markers in combination.10-12 Furthermore, p53 autoantibodies have been shown to detect ovarian/tubal cancers that do not express CA125 (16% of cases) many months prior to diagnosis (lead time of 22 months).13
The interpretation of multiple markers in longitudinal samples is challenging unless sophisticated mathematical modeling is applied. We have previously shown that the method of mean trends (MMT) algorithm has a performance comparable to that of the ROCA, which was used in the UK Collaborative Trial of Ovarian Cancer Screening (UKCTOCS).14 Here, we extend the observations made in 14-16 and describe a novel approach to the interpretation of multiple markers in samples preceding diagnosis, to assess whether any of these can improve on the sensitivity of the ROCA or offer a potential advantage in lead time to detection of ovarian/tubal cancer.

2 | METHODS

2.1 | Change-point detection algorithm

2.1.1 | Joint multivariable fully Bayesian model
Biomarker levels Y_ijk are modeled using a hierarchical Bayesian model (Figure S1). Here, subject-specific variables are indexed by i = 1, 2, ..., n_0, n_0 + 1, ..., N, where n_0 is the number of controls and the remaining subjects are cases. A specific biomarker is indexed by k = 1, 2, ..., K. Each patient i has a set of screening visits t_ij from zero up to the time of the last measurement d_i (in years), where j = 1, 2, ..., T_i.
Following the assumptions introduced in 7 to model cancer progression based on fully Bayesian screening, the longitudinal observations of biomarkers vary according to the nature of the patient. For control patients, the biomarker levels are expected to fluctuate randomly around a constant mean μ_ik. That is,

Y_ijk = μ_ik + ε_ijk,

where ε_ijk is a zero-mean Gaussian error term. For cases, we define a binary indicator I_ik to distinguish between two different model assumptions for the evolution of the biomarkers. If I_ik = 0, we assume that the marker level does not increase after the onset of cancer and follows the same behavior modeled for controls. If I_ik = 1, the marker levels vary around a mean μ_ik until an unobserved change-point time τ_ik. From this change point, we expect a positive slope θ_ik of the biomarker levels up to diagnosis. That is,

Y_ijk = μ_ik + θ_ik (t_ij − τ_ik)_+ + ε_ijk,

where (·)_+ is the positive part of the expression.
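The control and case trajectory models above can be simulated directly. The following numpy sketch is illustrative only: the values μ = 3.0, θ = 1.2, τ = 3 years, and the noise level are arbitrary assumptions, not fitted values from the paper.

```python
import numpy as np

def biomarker_trajectory(t, mu, theta, tau, sigma, rng):
    """Case model with I_ik = 1: levels fluctuate around mu until the
    change point tau, then rise with slope theta up to diagnosis:
    Y = mu + theta * (t - tau)_+ + noise. A control (or I_ik = 0)
    trajectory is recovered by setting theta = 0."""
    drift = theta * np.maximum(t - tau, 0.0)   # (t - tau)_+ : positive part
    return mu + drift + rng.normal(0.0, sigma, size=t.shape)

rng = np.random.default_rng(0)
t = np.linspace(0.0, 5.0, 6)   # six annual screens (years)
y_control = biomarker_trajectory(t, mu=3.0, theta=0.0, tau=0.0, sigma=0.1, rng=rng)
y_case = biomarker_trajectory(t, mu=3.0, theta=1.2, tau=3.0, sigma=0.1, rng=rng)
```

With these parameters the case trajectory is flat until year 3 and then climbs, mirroring the preclinical rise the model is designed to capture.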

KEYWORDS
CA125, change-point detection, longitudinal biomarkers, ovarian cancer, recurrent neural networks

Let Θ_i = {μ_ik, I_ik, τ_ik, θ_ik} be the set of subject-specific parameters. Thus, the probability density function of the observations conditional on the set of parameters Θ is

p(Y | Θ, t) = ∏_{i,j,k} (1/σ_k) φ([Y_ijk − μ_ik − I_ik θ_ik (t_ij − τ_ik)_+] / σ_k),    (1)

where Y denotes the set of values Y_ijk, t the set of screening times t_ij for each patient, and φ is the standard normal probability density function.
A key feature of the Bayesian hierarchical model adopted in this paper is the statistical dependence among the levels of different biomarkers, following the methodology proposed in 17; this element distinguishes it from the earlier work in 7,15,16. Dependence is explicitly introduced for the binary indicators I_i = {I_ik}_{k=1,...,K}, which are assumed to form a Markov random field (MRF). Their joint probability mass function (pmf) takes an Ising-type form, Equation (2), in which R is an upper triangular matrix weighted by a coupling coefficient ν_I. The parameter ν_I controls the sparsity of the model, given that not all biomarkers may increase at the onset of disease. From Expression (2), it follows that the probability of a change point in the level of biomarker k for patient i, given all the other markers, is such that a change point in one biomarker, for example I_ik = 1, increases the probability of a change point in the remaining biomarkers whenever ν_I > 0. If we set ν_I = 0, the biomarkers are independent (decoupled), and the probability of a change point for each single biomarker reduces to a Bernoulli distribution with mean parameter π_I. The population-level parameters in Equations (4) and (5) are assumed deterministic. The binary indicators I_i follow a Markov random field distribution, Equation (2), which we denote I_i ∼ MRF(ψ), where ψ = {π_I, ν_I}. Following 4,7,16, approximately 15% of patients with ovarian cancer do not show an increment in CA125 levels. In the absence of coupling, we assumed that the logistic transformation of π_I follows a Beta prior distribution with a mean of 0.85 and a standard deviation of 0.05; this accounts for the proportion of cases we expect to exhibit a change point. In this work, we assume that all the biomarkers under consideration follow the same rate. Similarly, we assumed a Beta prior distribution for the parameter ν_I, as in 17.
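The coupling behavior just described can be illustrated with an Ising-style conditional probability. This is a simplified sketch, not the exact parametrization of the MRF used in the paper: `alpha` (the logit of the marginal change-point probability π_I) and `nu` (standing in for the coupling ν_I) are assumed names.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def changepoint_prob(I_other, alpha, nu):
    """Conditional probability of a change point in marker k given the
    indicators of the other markers (Ising-style MRF conditional).
    alpha is the logit of the marginal change-point probability;
    nu >= 0 couples the markers: observed change points in other
    markers raise the conditional probability for this one."""
    return sigmoid(alpha + nu * np.sum(I_other))

alpha = np.log(0.85 / 0.15)   # prior: ~85% of cases show a change point
p_decoupled = changepoint_prob(np.array([1, 1]), alpha, nu=0.0)  # independent markers
p_coupled = changepoint_prob(np.array([1, 1]), alpha, nu=1.0)    # coupled markers
```

With `nu = 0` the conditional collapses to the Bernoulli mean 0.85 regardless of the other markers; with `nu > 0` the two observed change points push the probability higher, which is the qualitative behavior Expression (2) encodes.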
Individual random effects for the rate log(θ_ik) are assumed to follow an independent normal distribution, log(θ_ik) ∼ N(μ_θk, σ²_θk). The individual change point τ_ik is modeled as a truncated normal distribution as in 7, where the mean is centered at μ_τk = 2 years and the variance is σ²_τk = 0.75. The distribution is truncated to [d_i − δ*, d_i], reflecting the preclinical duration, which is assumed to be δ* = 5 years.
Finally, the conditional variance of the k-th biomarker level, σ²_k, is assumed to follow an inverse gamma prior distribution, IG(a, b), as in 15,16. From the description above, we define Φ_k as the set of parameters specific to each biomarker k. For a given set of observations Y = {Y_ijk}, the likelihood of the parameters Θ and Φ can be readily obtained from Equation (1): the likelihood function L(Θ, Φ) is Equation (1) evaluated at the observed Y. If we let P_0(Θ, Φ) denote the a priori probability of the model parameters, as described through Equations (2-9), the a posteriori probability distribution of Θ and Φ given the data Y has the form

P_Y(Θ, Φ) ∝ L(Θ, Φ) P_0(Θ, Φ).

This posterior distribution contains all the statistical information relevant to the model. Below, we discuss methods for the numerical approximation of P_Y(Θ, Φ). The hyperparameter values are chosen following the clinical considerations discussed in 7,15,16 (Table S1).

| Procedure
The posterior distribution of all unknown parameters can be approximated using a Markov chain Monte Carlo (MCMC) algorithm, following the procedure proposed and described in full detail in 17. Here, we highlight the main considerations for the iterative process. First, the choice between sampling schemes is based on whether the full conditional distributions can be computed in closed form. For the biomarker-specific parameters, the posterior distribution of the subset Φ_k can be sampled using Gibbs sampling at each iteration. In addition, {π_I, ν_I} are sampled using Metropolis-Hastings.
For the subject-specific parameters μ_ik, we use a Gibbs sampler. To obtain draws from the full conditionals of I_ik, τ_ik, and θ_ik, we use a reversible-jump step.18 This follows from the construction of the change-point parameter: depending on the value of I_ik, the pair (τ_ik, θ_ik) is either present in or absent from the model.

Initialization
For each parameter in Φ, we draw an initial sample from its a priori distribution (similarly for each patient and each parameter in Θ). We initialize the MCMC iteration by sampling from the priors of the model parameters (Table S1) and then estimate the posterior using MCMC as described above.

Iteration
We generate two independent chains, each with different initial values, and assess convergence to the same stationary distribution for each unknown parameter using trace plots. We also checked convergence using the Gelman-Rubin statistic. For each chain and unknown parameter, we simulate 40,000 samples with a burn-in period of 5000 samples. The remaining samples from the two chains are combined, giving 70,000 samples in total. These were used to compute the average joint probabilities and thereby estimate P(Y_i1k, ..., Y_ijk | o_i) for each patient at each screening time and biomarker k. Here, o_i ∈ {0, 1} indicates whether patient i is a control or has ovarian cancer.
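A minimal implementation of the Gelman-Rubin check referred to above might look as follows; two synthetic chains stand in for real MCMC output.

```python
import numpy as np

def gelman_rubin(chains):
    """Potential scale reduction factor (R-hat) for a set of chains,
    shape (n_chains, n_samples). Values near 1 indicate that the
    chains have mixed; values well above 1 signal non-convergence."""
    chains = np.asarray(chains, dtype=float)
    m, n = chains.shape
    chain_means = chains.mean(axis=1)
    B = n * chain_means.var(ddof=1)           # between-chain variance
    W = chains.var(axis=1, ddof=1).mean()     # within-chain variance
    var_plus = (n - 1) / n * W + B / n        # pooled variance estimate
    return np.sqrt(var_plus / W)

rng = np.random.default_rng(0)
mixed = rng.normal(0.0, 1.0, size=(2, 5000))  # two chains, same target
unmixed = np.vstack([rng.normal(0.0, 1.0, 5000),
                     rng.normal(3.0, 1.0, 5000)])  # chains stuck in different modes
rhat_mixed = gelman_rubin(mixed)
rhat_bad = gelman_rubin(unmixed)
```

In practice one computes R-hat per unknown parameter and per fold, as the paper does, and inspects trace plots alongside it.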

Screening
The screening methodology for a patient N′ from a testing cohort is based on the computation of the posterior probability of ovarian cancer, P(o_N′ = 1 | Y_N′). The variable Y_N′ denotes the longitudinal time series of the different biomarkers for the patient up to time t_ij. By Bayes' rule, this posterior combines P(o_N′), the prior prevalence estimated from population data (annual incidence of ovarian cancer in the United Kingdom by 5-year age group, 2016-2018 19), with P(Y_N′ | o_N′), which is estimated from the posterior predictive distribution for patient N′ at each screening time using the training data of N patients. The algorithm was implemented in RStudio, using the supporting code provided by 17.
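The Bayes-rule combination of a prevalence prior with the two model-based likelihoods can be sketched as follows. The numeric values are purely illustrative, not UK incidence figures or likelihoods from the fitted model.

```python
import numpy as np

def posterior_risk(log_lik_case, log_lik_control, prevalence):
    """Posterior probability of ovarian cancer from the model-based
    log-likelihoods under each hypothesis and the age-specific prior
    prevalence, computed on the log-odds scale for numerical stability."""
    log_odds = (np.log(prevalence) - np.log1p(-prevalence)
                + log_lik_case - log_lik_control)
    return 1.0 / (1.0 + np.exp(-log_odds))

# Illustrative numbers: even a likelihood ratio of e^4 (~55) applied to a
# rare-disease prior yields a modest absolute posterior risk.
risk = posterior_risk(log_lik_case=-10.0, log_lik_control=-14.0, prevalence=5e-4)
```

This is why screening posteriors are interpreted against a decision threshold calibrated to a fixed specificity rather than read as raw probabilities.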

| Recurrent neural networks
Each subject, either a patient with ovarian cancer or a control, has a sequence of measurements of different biomarkers (CA125, HE4, and glycodelin) taken at different ages. For the i-th patient, the biomarker level at visit j is denoted Y_ijk, as before. We propose to use recurrent neural networks (RNNs) for the prediction of ovarian cancer from longitudinal observations (Figure S2), in line with our previous study.15 We used the long short-term memory (LSTM) architecture,20-23 a special form of RNN that can both learn long-term dependencies and control the flow of information passed from one time step to the next by means of gate units.24 The LSTM approach offers a significant advantage in addressing the vanishing gradient problem common in traditional RNNs, making it generally more stable during training. It may suffer from computational complexity and potential overfitting, especially with small datasets; the latter was mitigated by cross-validation and by dropout, a regularization technique. More precisely, the LSTM equations can be written as in 20,24, where c_ijk and h_ijk denote the cell state and hidden state for patient i, time step j, and biomarker k, respectively, and ⊙ denotes point-wise multiplication. The cell state and hidden state at a fixed time step are vectors of size 1 × H, where H is the number of hidden neurons.
The cell state of the LSTM at each time step is controlled by input and forgetting mechanisms. The forget gate unit f_ijk modulates the effect of the cell state of the previous step, c_i,j−1,k. Similarly, the external input gate i_ijk weights the contribution of the candidate cell c̃_ijk. Finally, the memory information in the hidden state is controlled by the output gate o_ijk.
The gates c̃_ijk, i_ijk, f_ijk, and o_ijk are defined by sigmoid and tanh transformations of the current input and the previous hidden state. Thus, for each time step j, we obtain a temporal sequence of hidden states (h_i1k, h_i2k, ..., h_iT_i k) corresponding to the i-th subject and biomarker k. The state of the network at the last step is denoted h_iT_i k, corresponding to the last sample of the patient under consideration.
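The gate equations can be written out directly. The following numpy sketch implements one LSTM step in the standard formulation; the weights are random and untrained, purely for illustration, and H = 4 is an arbitrary choice.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_step(x, h_prev, c_prev, W, U, b):
    """One LSTM time step for a single biomarker stream.
    W are input weights (1 x H), U recurrent kernels (H x H), and
    b biases (1 x H), for the forget (f), input (i), output (o)
    gates and the candidate cell (c)."""
    f = sigmoid(x @ W["f"] + h_prev @ U["f"] + b["f"])        # forget gate
    i = sigmoid(x @ W["i"] + h_prev @ U["i"] + b["i"])        # input gate
    o = sigmoid(x @ W["o"] + h_prev @ U["o"] + b["o"])        # output gate
    c_tilde = np.tanh(x @ W["c"] + h_prev @ U["c"] + b["c"])  # candidate cell
    c = f * c_prev + i * c_tilde   # cell state update
    h = o * np.tanh(c)             # hidden state (memory output)
    return h, c

H = 4
rng = np.random.default_rng(0)
W = {g: rng.normal(0, 0.5, size=(1, H)) for g in "fioc"}
U = {g: rng.normal(0, 0.5, size=(H, H)) for g in "fioc"}
b = {g: np.zeros((1, H)) for g in "fioc"}

h, c = np.zeros((1, H)), np.zeros((1, H))
series = np.array([[3.1], [3.2], [3.3], [5.9]])  # one log-marker trajectory
for x in series:
    h, c = lstm_step(x[None, :], h, c, W, U, b)  # h ends as h_{iT_i k}
```

After the loop, `h` plays the role of the last hidden state h_iT_i k that the model later concatenates across biomarkers.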
Next, we concatenate the last hidden state associated with the output of each LSTM cell for the K different biomarkers. We also include the last hidden state of the LSTM processing the screening age t_ij as a longitudinal feature.25 That is, h_i is the last hidden state for the i-th patient, resulting from the concatenation of the last hidden state of each of the K biomarkers under consideration plus an additional one associated with the age of the patient, h_iT_i 0.
To define (20), we can consider multiple combinations of biomarkers to analyze how their joint interaction affects the model's classification performance. In this paper, whichever the combination, we always include the age of the patient as a feature. The resulting vector is of size 1 × Σ_l H_l, where each H_l denotes the number of hidden neurons used for the LSTM associated with each of the features, that is, biomarkers k = 1, 2, ..., K and age, respectively.
The final output of the proposed model is a risk score obtained by a sigmoid transformation of the affine map h̃_i W_eᵀ + b_e, where h̃_i denotes the last hidden state after dropout, W_e is a weight vector of size 1 × Σ_l H_l, and b_e is a scalar bias. We optimize the weight matrices and biases using the cross-entropy loss.26 Here, N is the number of patients, o_i is the true label of subject i (0 for controls and 1 for cases), and ô_i is the estimated probability of ovarian cancer. The RNNs must be adequately trained before they can be used for the classification of unknown subjects. For the training phase, we used batch gradient descent with dynamic learning rates updated through the Adam optimizer.24,27 The hyperparameters tuned are the number of hidden neurons and the dropout rate, while the learning rate and number of epochs are fixed as meta-parameters (Table S2). The weight matrix of the recurrent state is initialized with a random orthogonal matrix, while the input weight matrices are initialized using Glorot's scheme.28,29 The biases are initialized at zero.
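The cross-entropy objective just referred to can be written in a minimal numpy form; the labels and risk scores below are illustrative, not study data.

```python
import numpy as np

def binary_cross_entropy(o_true, o_hat, eps=1e-12):
    """Cross-entropy loss over N patients: o_true holds the labels
    (0 control, 1 case) and o_hat the estimated risks of ovarian cancer.
    Risks are clipped away from 0 and 1 so the logs stay finite."""
    o_hat = np.clip(o_hat, eps, 1.0 - eps)
    return -np.mean(o_true * np.log(o_hat) + (1 - o_true) * np.log(1 - o_hat))

loss_good = binary_cross_entropy(np.array([0, 1, 1]), np.array([0.1, 0.9, 0.8]))
loss_bad = binary_cross_entropy(np.array([0, 1, 1]), np.array([0.9, 0.1, 0.2]))
```

A well-calibrated model (risks agreeing with labels) yields a much smaller loss than one whose risks point the wrong way, which is exactly the gradient signal the optimizer exploits.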
As part of the preprocessing, the features are standardized during training by computing the mean and variance of each one. The same scaling transformation is then applied to the validation data.31-36 Deep-learning algorithms were implemented in Python 3.8.8 using TensorFlow version 2.11.0 and Keras version 2.11.0.
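Train-only standardization can be sketched as follows, with toy feature values; the essential point is that the validation data never contribute to the scaling statistics.

```python
import numpy as np

# Toy feature matrix: rows are patients, columns are features.
train = np.array([[12.0, 40.0],
                  [15.0, 55.0],
                  [30.0, 70.0]])
valid = np.array([[20.0, 60.0]])

mu, sd = train.mean(axis=0), train.std(axis=0)  # statistics from training only
train_z = (train - mu) / sd
valid_z = (valid - mu) / sd   # same transform reused for validation
```

Reusing the training statistics for validation avoids leaking information from held-out patients into the model, which matters when performance is the quantity being estimated.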

2.3 | Training, model selection, and evaluation
The detection scheme for each patient at each screening time is based on soft classification of ovarian cancer using multiple correlated longitudinal biomarkers (CA125, HE4, and glycodelin). Our methodology is based on fully Bayesian screening with change-point models (BCP) and on LSTM-based models (RNNs). As a preprocessing step for both methodologies, the biomarkers are transformed as Y = log(Z + 4), where Z is a particular biomarker.7,15,37 To evaluate the screening performance of both approaches, we use stratified 5-fold cross-validation with two repetitions (outer loop). This approach is particularly helpful in reducing bias in performance evaluation. Each fold divides the data into a training set and a testing set, preserving the proportion of cases and controls. Our main evaluation metrics were the area under the ROC curve (AUC) and sensitivity (at 90% specificity). Significance was determined using a permutation test for the mean of paired differences between two models.
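The preprocessing transform and the stratification logic can be illustrated as follows. `stratified_folds` is a simplified stand-in for the repeated stratified k-fold procedure actually used; the 180/44 split mirrors the study cohort.

```python
import numpy as np

def preprocess(z):
    """Variance-stabilizing transform applied to raw marker levels."""
    return np.log(z + 4.0)

def stratified_folds(labels, n_folds, seed=0):
    """Assign each patient to one of n_folds, preserving the
    case/control ratio in every fold (a minimal stand-in for
    stratified k-fold cross-validation)."""
    rng = np.random.default_rng(seed)
    fold = np.empty(len(labels), dtype=int)
    for cls in np.unique(labels):
        idx = np.flatnonzero(labels == cls)
        rng.shuffle(idx)
        fold[idx] = np.arange(len(idx)) % n_folds  # round-robin within class
    return fold

labels = np.array([0] * 180 + [1] * 44)   # 180 controls, 44 cases
fold = stratified_folds(labels, n_folds=5)
cases_per_fold = [int(np.sum((fold == f) & (labels == 1))) for f in range(5)]
```

Each fold ends up with 8 or 9 of the 44 cases, so every held-out test set sees roughly the same case prevalence as the full cohort.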
For the Bayesian method, we estimate the posterior distribution of the parameters at each fold using the N patients in the training data. In turn, P(Y_N′ | o_N′) is calculated from the posterior predictive distribution of the biomarker levels for each patient N′ > N in the testing data. The probability of having ovarian cancer is then calculated using (13). As mentioned before, convergence of the MCMC chains was checked using the Gelman-Rubin statistic for each of the posterior parameters (estimated from training data).
For the deep-learning-based models, in each of the 10 iterations of the outer loop we perform hyperparameter tuning on the number of hidden neurons and the dropout rate for model selection. The inner loop consists of 10-fold cross-validation (3 repetitions). Once the optimal set of hyperparameters is selected, we estimate the probability of ovarian cancer on the data held out from training for each patient (and each longitudinal observation). We repeat this step for every outer fold. A flow chart describing the design of the study is presented in Figure S3.

| Lead-time analysis
Patients developing cancer show detectable preclinical elevations of biomarkers. The literature reports that patients with ovarian cancer show an abnormal rise in these biomarkers approximately 3 years before diagnosis, with the elevations becoming most apparent in the last year before diagnosis. We assess the potential value of models based on multiple biomarkers for detecting cancer earlier than it would be diagnosed clinically.12,14,38,39 In the Results section, we determine the model-based lead time of each of the proposed screening tests. This is defined as the interval from being correctly classified as a case by the diagnostic test to the actual clinical diagnosis. To discriminate patients, we detect the earliest observation per patient considered abnormal using a threshold at 90% specificity.40 We then calculate the interval from this screening time point to the time of diagnosis. This procedure is repeated over the outer folds of the cross-validation procedure, from which we obtain summary statistics for each of the models under consideration.
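The lead-time definition above amounts to finding the earliest above-threshold screen. A sketch with hypothetical serial risk scores for one case patient:

```python
import numpy as np

def lead_time(times_before_dx, scores, threshold):
    """Model-based lead time: interval from the earliest screen flagged
    as abnormal (score at or above a threshold fixed at 90% specificity)
    to clinical diagnosis. times_before_dx are years before diagnosis,
    listed in screening order (decreasing)."""
    flagged = np.flatnonzero(np.asarray(scores) >= threshold)
    if flagged.size == 0:
        return None                        # case missed by the test
    return times_before_dx[flagged[0]]     # earliest abnormal screen

t = np.array([4.0, 3.0, 2.0, 1.0, 0.2])          # years before diagnosis
scores = np.array([0.02, 0.05, 0.40, 0.85, 0.97])  # hypothetical risk scores
lt = lead_time(t, scores, threshold=0.30)
```

Here the first abnormal screen falls 2 years before diagnosis, so the lead time is 2.0 years; repeating this per patient and per outer fold yields the summary statistics reported later.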

| Data
The data consist of 224 patients with serum samples sourced from the multimodal arm of the UK Collaborative Trial of Ovarian Cancer Screening (UKCTOCS; ISRCTN22488978; NCT00058032 41): 180 controls (healthy subjects) and 44 cases (diagnosed patients). Eligible patients attended screening and had an annual serum CA125 level measured as a baseline test, with transvaginal ultrasound as a second-line test. Additional biomarker assays, human epididymis protein 4 (HE4) and glycodelin (PAEP), were performed on a subset of the serial samples from the general population of the UKCTOCS trial.
The screening age range of cases is [52.0, 77.4] years, with an average of 65.3 years. The dataset includes up to 5 years of biomarker history per patient. Of the 44 cases, 12 were screened for 1 year prior to clinical diagnosis, 12 for 2 years, and 20 for 3-5 years. Of these cases, 10 have 2 samples, another 10 have 3 samples, and 24 have 5 samples. For controls, the screening age is within the range [50.3, 78.8] years, with an average of 63.6 years. Each control has four or five observations (2 and 178 controls, respectively).
All serum samples were assayed by ELISA (enzyme-linked immunosorbent assay).15,16 Previous reports discarded possible confounding effects of sample processing by examining the correlation between analyte concentration and the time between sample collection and spin.14 Longitudinal observations of biomarkers in case patients prior to clinical diagnosis are displayed in Figure 1. Loess curves were fit to reflect the mean levels over time. For case patients, the biomarker levels start to rise slowly between 1 and 2 years prior to diagnosis. This is particularly noticeable for CA125 and HE4, while for glycodelin the rise is limited. The rise becomes more pronounced within 1 year before diagnosis, when all the biomarkers show recognizable elevations. In addition, from a qualitative perspective, the rate of increase within 1 year appears to be led by CA125, followed by glycodelin and then HE4.

| Multivariable longitudinal models
Next, we evaluate the performance of different models in the detection of cases. The simulation studies consider multiple scenarios, including three joint multivariable screening tests that combine CA125 levels with other biomarkers (HE4 and glycodelin), as well as three single-biomarker tests. More specifically, we considered the following scenarios:
• m(1,2,3): CA125-HE4-glycodelin
• m(1,2): CA125-HE4
• m(1,3): CA125-glycodelin
• u(1), u(2), u(3): CA125, HE4, and glycodelin alone
The notation m(i,j,k) (or m(i,j)) indicates a joint multivariable test that relies on biomarkers i, j, and k, while u(i) refers to a test that uses the i-th biomarker alone. The biomarkers are numbered as CA125 (1), HE4 (2), and glycodelin (3). We use two classification methodologies, based on Bayesian change-point and recurrent neural network models.

FIGURE 1 (A-C) Box plots of protein biomarker levels (Y1: CA125, Y2: HE4, and Y3: glycodelin) in controls and cases. For each panel, the cases are grouped by time range: within 1 year, between 1 and 2 years, or more than 2 years before diagnosis. (D-E) Biomarker levels for cases and controls, respectively, for CA125, HE4, and glycodelin. Loess curve fits have been added to depict the trend prior to clinical diagnosis. As observed, the biomarkers for case patients exhibit a significant increase within the final year. The biomarker levels have been log-transformed in all plots.
The risk of ovarian cancer is estimated for each patient's longitudinal record, starting from two visits up to the full screening period. That is, for each screening time point t*_i from the second visit onward, the model makes a prediction based on the patient's previous observations. Unless otherwise stated, we report the performance metrics using the full screening period, which means that we estimate the risk of ovarian cancer at the patient's last screening time point.
Figure 2 shows the receiver operating characteristic (ROC) curves with the area under the curve (AUC) statistic, providing a summary of the classification performance in each case. These results are calculated by averaging the sensitivity at a given level of specificity across the outer folds obtained by cross-validation.
The ROC curves suggest an improvement from using additional biomarkers, as the sensitivity of the joint multivariable tests lies above that of the univariate ones for both the BCP and RNN models. Furthermore, the sensitivity of the joint multivariable models shows slightly narrower confidence intervals than the single-biomarker tests.
The combination of CA125 with HE4 improves the AUC and sensitivity over the univariate tests with both proposed methodologies. To test the significance of this improvement, we used the permutation test for the mean of paired differences with respect to CA125 for these two metrics, using BCP and RNNs. For the AUC, we obtain one-sided p = 0.043 and p = 0.002, respectively; for the sensitivity, p = 0.031 and p = 0.062.
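The permutation test for the mean of paired differences can be sketched as follows; the per-fold AUC values below are hypothetical, not those reported in the paper.

```python
import numpy as np

def paired_permutation_pvalue(a, b, n_perm=20000, seed=0):
    """One-sided permutation test for the mean of paired differences
    (H1: mean(a) > mean(b)). Under the null, the sign of each per-fold
    difference is exchangeable, so signs are flipped at random to build
    the null distribution of the mean difference."""
    rng = np.random.default_rng(seed)
    d = np.asarray(a, dtype=float) - np.asarray(b, dtype=float)
    observed = d.mean()
    signs = rng.choice([-1.0, 1.0], size=(n_perm, d.size))
    null = (signs * d).mean(axis=1)
    return (np.sum(null >= observed) + 1) / (n_perm + 1)   # add-one correction

# Hypothetical per-fold AUCs for two models over 10 outer folds:
auc_combo = np.array([0.97, 0.98, 0.96, 0.99, 0.97, 0.98, 0.96, 0.97, 0.99, 0.98])
auc_ca125 = np.array([0.94, 0.95, 0.95, 0.96, 0.93, 0.95, 0.94, 0.95, 0.96, 0.95])
p = paired_permutation_pvalue(auc_combo, auc_ca125)
```

Because the differences here are consistently positive across folds, the one-sided p-value is small; with few folds the permutation approach avoids the normality assumption a paired t-test would need.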
In Table 2a, using case patients only, we build a contingency table to compare the best two diagnostic tests (joint multivariable and univariate, respectively) based on data from all the outer folds. The tests used a threshold set at 90% specificity. As observed, the joint multivariable model m(1,2) (CA125 and HE4) attains higher sensitivity than the reference standard, CA125. This holds for both the RNN and BCP methodologies. Applying a McNemar test is unfeasible here, as its power depends on the number of discordant pairs, which in our case is rather small. However, the detection rate, together with the results above, suggests that the combination of CA125 and HE4 has the potential to improve current tests based on CA125 alone.
Next, we study the ability of our algorithms to detect ovarian cancer at earlier stages.12,14 In Figure 3, we show the summary statistics of the AUC and sensitivity (90% specificity) estimated by screening the last time point available per patient using the complete longitudinal history, and at 1 and 2 years prior to clinical diagnosis. We observe that both metrics decrease as we screen patients with biomarker levels taken at earlier times.

TABLE 1 Model performance for the detection of ovarian cancer diagnosed within 1 year: estimates of sensitivity (90% specificity) and AUC with their 95% confidence intervals (CI), comparing different models (joint multivariable and univariate) based on the cross-validation procedure.
The estimated mean lead time based on the joint multivariable tests spanned 1.6 to 1.9 years, and the median lead time 1.4 to 1.8 years prior to diagnosis, in comparison with the estimated range based on CA125 alone (Table 3). No multivariable algorithm significantly outperformed CA125 with either BCP or RNNs.
In line with the results above, there is a strong indication that the combination of CA125 and HE4 increases classification performance over CA125 alone, based on AUC and sensitivity at 90% specificity. Furthermore, we find that this combination outperforms CA125 and all other alternatives in terms of the ratio between cases correctly diagnosed and cases missed by the algorithm (Table 2b). Finally, its mean lead time is close to 2 years.
Overall, our results emphasize once again the benefit of using HE4 as a complementary biomarker that deserves further evaluation for the improvement of early detection of ovarian cancer compared with CA125 alone.

| DISCUSSION
The present study addresses two distinct issues-methodological, related to the integration of multiple longitudinal biomarkers into a single model, and practical, concerning the enhancement of performance rates in ovarian cancer detection through longitudinal data analysis.
We compared two longitudinal algorithms allowing the integration of more than one biomarker: a Bayesian change-point model and an LSTM architecture within the recurrent neural network approach. Our findings show that the combination of longitudinal CA125 and HE4 levels outperforms the CA125-only model with both the change-point model and the LSTM method, highlighting the complementary nature of HE4. The multimarker model did not improve the lead time but provided a higher area under the ROC curve (AUC) and sensitivity at a fixed specificity, potentially improving early cancer detection. Importantly, this work illustrates the advantage of the methodology for the simultaneous analysis of multiple longitudinal biomarkers, which may prove useful in any scenario where such biomarkers emerge, particularly in early cancer detection.
Previous research in ovarian cancer has mainly concentrated on multiple biomarkers at a single time point or on longitudinal CA125, the best-performing individual biomarker.8,37,40 However, the UKCTOCS trial demonstrated that monitoring CA125 alone does not provide significant mortality benefits.5 Consequently, there is an urgent need to explore additional potential biomarkers that could improve detection rates. Our previous work focused on the analysis of longitudinal HE4, CA72-4, and anti-TP53.12 The findings suggested that these biomarkers offer limited additional value over longitudinal CA125. However, that study was limited to the MMT approach, which emphasizes the significance of the present work.
The primary limitation of our study is the small sample size, which we attempted to address through the nested cross-validation approach. Both cancer cases and controls were randomly selected from the complete dataset. Since UKCTOCS was a randomized trial, this selection process should mitigate potential confounding factors and biases. Another limitation is that only two approaches have been tested in this study, and the biomarker panel comprised only three proteins. We focused on currently available approaches for analyzing time-series biomarker data in the cancer setting, utilizing the available dataset. While we incorporated some of the most prominent biomarkers, future studies may achieve improved performance with a larger panel of biomarkers and potentially new statistical and AI methodologies. The primary strength of our work is the use of the unique dataset obtained from the UKCTOCS trial. In conclusion, our investigation provides evidence of enhanced performance using a combination of longitudinal biomarkers compared with the best individual longitudinal marker, CA125. To the best of our knowledge, this is the first investigation in which statistical and artificial intelligence (AI) approaches have been employed jointly for the analysis of multiple longitudinal biomarkers, not only in the context of ovarian cancer but also in other healthcare settings. The findings from the current work could be applied to facilitate early detection, risk stratification, and the prevention and treatment of various diseases. The enhanced early detection capabilities of the CA125-HE4 multimarker model hold the potential to significantly improve patient outcomes by enabling timely diagnosis, assisting healthcare providers in more accurate clinical decision-making, and informing policymakers in shaping effective strategies for ovarian cancer screening and management.
These findings now warrant blinded validation in a larger longitudinal sample set to assess the potential for early detection in ovarian cancer screening.

ACKNOWLEDGMENTS
This study was funded by Cancer Research UK and EPSRC joint award EDDCPJT/100022. We thank all trial participants and all staff involved in the UKCTOCS trial. IPM acknowledges the support of grant PID2021-125159NB-I00 (TYCHE) funded by MCIN/AEI/10.13039/501100011033 and by 'ERDF A way of making Europe'. MIK acknowledges support by a grant for research centers in the field of artificial intelligence, provided by the Analytical Center for the Government of the Russian Federation in accordance with the subsidy agreement (agreement identifier 000000D730321P5Q0002) and the agreement with the Ivannikov Institute for System Programming of the Russian Academy of Sciences dated November 2, 2021, No. 70-2021-00142. OB acknowledges support from Barts Charity (G-001522).

CONFLICT OF INTEREST STATEMENT
U.M. and I.J. declare financial interest through UCL Business and Abcodia Ltd in the third-party exploitation of clinical trial biobanks, which have been developed

where σ(x) = 1/(1 + exp(−x)) denotes the sigmoid function. The terms W_c, W_f, W_i, and W_o denote the weight matrices, with dimension 1 × H; the terms U_c, U_f, U_i, and U_o denote the kernel matrices of size H × H; and the vectors b_c, b_f, b_i, and b_o denote the biases of size 1 × H. The terms W, U, and b are learned during training.
TABLE 1 Note: One-sided p-values are determined from a permutation test on the difference of mean AUC (or sensitivity) between diagnostic tests with respect to our baseline (CA125). Bold values indicate p < 0.05, considered statistically significant.

TABLE 2 Note: (A) Results obtained from contingency tables at the 90% specificity level. The tests are based on either recurrent neural networks (RNNs) or Bayesian change-point models (BCP). Estimates are based on the cross-validation procedure. (B) Number of correctly diagnosed and missed cases. For the calculation, we detect the earliest observation per patient considered abnormal using a threshold at 90% specificity. Values correspond to the totals over all outer folds.

TABLE 3 Note: Summary statistics of model-based (joint multivariable and univariate) lead time for the detection of ovarian cancer at 90% specificity. Estimates are based on the cross-validation procedure.