Towards a machine‐learning assisted diagnosis of psychiatric disorders and their operationalization in preclinical research: Evidence from studies on addiction‐like behaviour in individual rats

Abstract
Over the last few decades, there has been a progressive transition from a categorical to a dimensional approach to psychiatric disorders. Especially in the case of substance use disorders, interest in the individual vulnerability to transition from controlled to compulsive drug taking warrants the development of novel dimension‐based objective stratification tools. Here we drew on a multidimensional preclinical model of addiction, namely the 3‐criteria model, previously developed to identify the neurobehavioural basis of the individual's vulnerability to switch from controlled to compulsive drug taking, to test whether a machine‐learning assisted classifier could objectively identify individual subjects as vulnerable or resistant to addiction. Datasets from our previous studies on addiction‐like behaviour for cocaine or alcohol were fed into a variety of machine‐learning algorithms to develop a classifier that identifies resilient and vulnerable rats with high precision and reproducibility, irrespective of the cohort to which they belong. A classifier based on K‐median or K‐mean clustering (for cocaine or alcohol, respectively) followed by artificial neural networks emerged as a highly reliable and accurate tool to predict whether a single rat is vulnerable or resilient to addiction. Thus, each rat previously characterized as displaying 0 criteria (i.e., resilient) or 3 criteria (i.e., vulnerable) in individual cohorts was correctly labelled by this classifier. The present machine‐learning‐based classifier objectively labels single individuals as resilient or vulnerable to developing addiction‐like behaviour in a multisymptomatic preclinical model in rats. This novel dimension‐based classifier increases the heuristic value of these preclinical models while providing proof of principle for deploying similar tools in the future diagnosis of psychiatric disorders.

Postdoc Mobility Fellowship, Grant/Award Number: 191274. Edited by: Venetia Zachariou
KEYWORDS: addiction, clustering, individual vulnerability, machine learning, neural networks, substance use disorder

The past decades have seen a profound change in how abnormal psychology is delineated from adaptive psychology, with a progressive transition from a categorical to a multidimensional approach to the diagnosis of psychiatric disorders (Brooks et al., 2017; Ford et al., 2014; Insel, 2014; Woody & Gibb, 2015) that increasingly relies on transdiagnostic endophenotypes of vulnerability. This ongoing transition at the clinical level has been paralleled by growing interest, in the preclinical field of substance use disorder (SUD), in the neurobehavioural basis of the individual vulnerability to switch from controlled to persistent, or compulsive, drug seeking and/or taking (Augier et al., 2018; Cannella et al., 2020; Harada et al., 2021; Kasanetz et al., 2010, 2013; Luscher et al., 2020; Pascoli et al., 2018; Pohorala et al., 2021; Radwanska & Kaczmarek, 2012). These developments in both fields warrant new dimension-based diagnostic or stratification tools that objectively and systematically identify vulnerable and resilient individuals, a necessary step towards the standardisation of translational research in SUD in particular and in psychiatry in general.
As for many psychiatric disorders, the presence of a triggering factor, such as exposure to a drug in the case of SUD, is not sufficient for the development of the behaviours that, by the extreme nature of their manifestation along the continuum of their respective dimensions, characterize the pathology. This individual vulnerability to transition from controlled, recreational drug use to the compulsive drug seeking and taking that characterizes SUD (APA, 2013) has long been suggested to stem from the interaction between environmental, psychological, neurobiological and behavioural factors (Anthony et al., 1994; Conway et al., 2002; Ersche et al., 2012, 2020; Grant et al., 2001, 2004; Swendsen et al., 2009, 2010). However, it is difficult to identify and study the biobehavioural basis of the factors conferring this vulnerability in humans, not least because such endeavours require the study of large populations across their lifetime under controlled conditions, with little if any opportunity to carry out the invasive manipulations necessary to identify the underlying neural and cellular mechanisms. Over the past two decades, preclinical models have progressively evolved to incorporate these individual differences, thereby offering unique opportunities to overcome these limitations through prospective longitudinal studies of the psychological and neural basis of the vulnerability to developing addiction-like behaviour (Belin-Rauscent et al., 2016). Indeed, as in humans, not all rats that regularly self-administer or seek addictive drugs lose control over drug intake and develop persistent, compulsive drug-seeking and taking behaviours.
In this context, in the early 2000s, a multidimensional model of addiction was developed (Deroche-Gamonet et al., 2004) based on the intersection of specific behavioural characteristics that operationalize DSM-IV criteria (APA, 1994), namely increased motivation to take the drug, inability to refrain from drug seeking and continued drug use despite knowledge of aversive consequences. This approach enables the identification of divergent trajectories in the transition from controlled to compulsive drug intake, in that only 20% of a given population of outbred rats exposed to cocaine eventually display all three behavioural criteria following a prolonged (>60 daily sessions) history of self-administration. Importantly, rats identified as displaying the 3 criteria for addictive behaviours also show an increased tendency to escalate their drug intake when access is unlimited, and they are prone to relapse following abstinence (Belin et al., 2009), thereby displaying additional behavioural manifestations reminiscent of diagnostic criteria for which they were not selected. The construct and predictive validity of the 3-criteria model were further substantiated by the finding that these differences between vulnerable and resilient rats are not due to differential cocaine exposure, since all rats self-administer the same amount of drug before being identified as 3 vs 0 criteria (Deroche-Gamonet et al., 2004). However, while they do not take more cocaine, 3crit rats develop a binge-like pattern of intake that precedes the transition to addiction (Belin et al., 2009).
The 3-criteria model has led to several breakthroughs in our understanding of the vulnerability to addiction, which together represent a unique success story of translational research. This model first helped establish that impulsivity (Belin et al., 2008) and boredom susceptibility (Belin et al., 2011) confer a vulnerability to switch from controlled to compulsive cocaine intake. In contrast, sensation seeking, assessed as a greater locomotor response to novelty, and sign tracking, which predict an increased tendency to initiate drug self-administration and to respond to drug-paired cues, respectively, were revealed to confer resilience to addiction-like behaviours (Belin et al., 2008; Fouyssac et al., 2021). These observations in rats paved the way for studies in humans confirming that the factors associated with recreational cocaine use are dissociable from those specifically associated with the transition to SUD (Ersche et al., 2010). The evidence of a causal relationship between a high impulsivity trait and the subsequent vulnerability to developing compulsive behaviours (Ansquer et al., 2014; Belin et al., 2008) has far-reaching implications for our understanding of the neural basis of addiction (Besson et al., 2013; Dalley et al., 2007; Fouyssac et al., 2021), which the model helped reveal to be very different from the biological responses to drug exposure. Thus, the tendency to persist in drug taking despite adverse consequences is associated with a rigidity of drug-induced synaptic plasticity, not with the exacerbation observed following single or repeated drug administration (Kasanetz et al., 2010; Pascoli et al., 2014; Ungless et al., 2001).
This multidimensional approach has since been applied to the study of the neural and psychological basis of the vulnerability to developing alcohol use disorder (AUD) (Jadhav et al., 2017, 2018) or compulsive-like food seeking (de Jong et al., 2013), illustrating the high translational value of novel preclinical models that encapsulate the multidimensional nature of SUD and the importance of focusing on the individual.
However, these procedures all depend on defining a threshold above which a behaviour is deemed maladaptive, and deciding where on a continuum a behaviour should be considered abnormal is a very challenging question, especially during a transition from categorical to dimensional approaches. For example, in the 3-criteria model, the threshold used for an individual to be deemed positive for each addiction-like criterion is determined by the unique physical properties of the population distribution for one of the three criteria, namely resistance to punishment (measured as infusions during punishment as a percentage of the baseline number of infusions) (Belin et al., 2008, 2011; Deroche-Gamonet et al., 2004). Along the dimension of compulsiveness, while the majority of any population (60 to 70%) belongs to a log-normal distribution ranging from 0 to 30% resistance, the remaining 30-40% belongs to an abutting normal distribution centred on 85-100% resistance. The bimodal distribution of this dimension, which ranges from non-compulsive to absolutely persistent, compulsive drug self-administration, offers an objective threshold for the associated criterion. However, applying it to the two other criteria (inability to relinquish drug seeking even in the absence of the drug, and high motivation for the drug), which both follow a log-normal distribution (see Belin & Deroche-Gamonet, 2012 for review), relies on the assumption that a similar rupture in the continuum exists for them too, which is an inherent limitation. In addition, a distribution-based threshold to ascribe diagnostic scores puts too much emphasis on the population to which each individual belongs, the physical properties of which eventually contribute almost as much as the individual's own characteristics to its characterization as 'addicted-like' or 'resilient'.
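The cohort dependence described above can be illustrated with a minimal sketch of a distribution-based criteria count. This is not the published scoring procedure: the helper `criteria_count` and the `top_fraction = 1/3` cut-off are hypothetical stand-ins for the top 30-40% threshold, applied here to each dimension's quantile within the rat's own cohort.

```python
import numpy as np


def criteria_count(motivation, persistence, compulsivity, top_fraction=1 / 3):
    """Number of addiction-like criteria (0-3) met by each rat in one cohort.

    Hypothetical rule: a rat is positive on a dimension when its score exceeds
    the (1 - top_fraction) quantile of its *own cohort* for that dimension.
    """
    scores = np.column_stack([motivation, persistence, compulsivity]).astype(float)
    cutoffs = np.quantile(scores, 1 - top_fraction, axis=0)  # one cut-off per dimension
    return (scores > cutoffs).sum(axis=1)


# The same rat (scores 20, 10, 50) placed in two hypothetical cohorts.
cohort_a = criteria_count([5, 8, 10, 12, 15, 20], [2, 3, 4, 5, 6, 10], [5, 10, 15, 20, 30, 50])
cohort_b = criteria_count([20, 25, 30, 35, 40, 45], [10, 12, 14, 16, 18, 20], [50, 60, 70, 80, 90, 100])
```

With identical raw scores, the focal rat meets all three criteria in `cohort_a` (last position) but none in `cohort_b` (first position), which is the limitation that motivates a cohort-independent classifier.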
This precludes determining the vulnerability status of a given individual independently of a particular cohort, as is done in humans. This limitation, together with the associated need to train large cohorts at once over long periods of time, may have hindered the development of preclinical and/or translational research programmes using this or similar multidimensional preclinical models of addiction.
Recent developments in machine learning may offer unprecedented means to overcome these limitations, as they have been suggested to be ideal tools for refining the classification of individuals, including psychiatric patients, along dimensions into subgroups with shared underlying endophenotypes, an approach necessary for the implementation of more effective, personalised therapeutic strategies (Bzdok & Meyer-Lindenberg, 2018). These approaches also have an advantage over classical statistics (e.g., null hypothesis testing, ANOVAs), because they can uncover substructures/subgroups in data without specific instructions, i.e., in the absence of any a priori hypothesis regarding the data structure, and yet they endow the classifiers they underlie with the ability to systematically extrapolate patterns learnt from the training data to entirely new datasets with individual precision.
Hence, here we used the 3-criteria multidimensional models of cocaine or alcohol addiction to test the potential of machine-learning assisted classifiers to identify individuals with or without addiction-like behaviour by drawing on diagnostically relevant dimensions of addiction (APA, 1994, 2013). For this, we subjected the individual scores in each of the three addiction-like behaviours to different clustering algorithms and then validated the resulting labels using supervised prediction algorithms.

| Data
Data from four published papers (Belin et al., 2008, 2009, 2011; Fouyssac et al., 2021) were used to assess addiction-like behaviour for cocaine. The data from the first three studies were pooled as a heterogeneous cohort of 88 individuals (All_data_cocaine.csv) to train the classifier, while the data (n = 36, Cocaine_independent_dataset.csv) from the most recent publication (Fouyssac et al., 2021) were used as a completely independent dataset to test its generalisability and accuracy. Of the 11 studies published so far using this model (Belin et al., 2008, 2009, 2011; Cannella et al., 2013, 2018, 2020; Deroche-Gamonet et al., 2004; Fouyssac et al., 2021; Kasanetz et al., 2010, 2013; Pohorala et al., 2021), those selected for the present study were the only ones that used footshock as punishment and also provided a clear delineation of each of the four criteria groups and easy access to distributions. They also encompass large experimental and individual heterogeneity, including Sprague Dawley or Lister Hooded rats that nose-poked or lever-pressed for cocaine and were housed in different conditions, which was intended to ensure the robustness and generalisability of the classifier.
For addiction-like behaviour for alcohol, data were pooled from two published experiments (Jadhav et al., 2017, 2018) and one unpublished experiment into a cohort of 150 rats (All_data_alcohol.csv).
The instrumental responses (i.e., active lever presses/nose pokes) performed in each of the three behavioural tests (termed 'raw data') were used as the three dimensions fed into the algorithms, namely: increased motivation to take the drug, measured under a progressive ratio schedule of reinforcement; inability to refrain from drug seeking, measured during two periods within each daily session in which a discriminative stimulus signals that instrumental responding does not give access to the drug; and maintained drug use despite aversive consequences (compulsivity), measured as the persistence of responding despite punishment of the instrumental response. These dimensions have been shown to represent marginally overlapping, complementary aspects of addiction-like behaviour (Belin et al., 2008; Deroche-Gamonet et al., 2004; Jadhav et al., 2017).
The large datasets were randomly split 50 times into training (67%) and test (33%) sets to avoid a cohort-driven bias in the clustering of individual rats (Figure 1, ①).
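The 50 random 67/33 partitions can be sketched with scikit-learn's `ShuffleSplit`. This is a minimal sketch, assuming the three raw scores per rat are already loaded into an array; the helper `make_splits`, the seed and the placeholder data are hypothetical and not taken from the authors' scripts.

```python
import numpy as np
from sklearn.model_selection import ShuffleSplit


def make_splits(X, n_splits=50, test_size=0.33, seed=0):
    """Draw `n_splits` random Training/Test partitions of the rows (rats) of X."""
    splitter = ShuffleSplit(n_splits=n_splits, test_size=test_size, random_state=seed)
    return [(X[train_idx], X[test_idx]) for train_idx, test_idx in splitter.split(X)]


# Placeholder data: 88 rats x 3 dimensions (motivation, persistence, compulsivity).
X = np.random.default_rng(0).lognormal(mean=2.0, sigma=1.0, size=(88, 3))
splits = make_splits(X)
```

For 88 rats, scikit-learn rounds the test size up, giving 30 Test and 58 Training rats per split.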

| Algorithm
Individuals consuming drugs can be categorized as resilient or vulnerable to the development of SUD, the latter further being distributed along a clinical continuum of severity (Aguilar et al., 2020; Ersche et al., 2020; Morrow & Flagel, 2016), suggesting that any population could be segregated into two clusters. Nevertheless, the optimal cluster number to be used in the classifier was determined empirically: the 50 training and 50 test sets (i.e., 100 sets) (Cocaine_cluster_number_files.zip, Alcohol_cluster_number_files.zip) were subjected to the Silhouette algorithm to estimate the cluster number supported by the actual experimental data (cluster_numbers_cocaine.py, cluster_numbers_alcohol.py), and the number most commonly returned across the 100 sets was used as an input to the clustering algorithms of the classifiers tested in the study. The behavioural data of each pair of Training and Test sets were subjected to unsupervised clustering algorithms (Figure 1, ②,③), namely the Gaussian mixture model (GMM) (Reynolds, 2009) or K-mean/K-median clustering (Forgy, 1965) (SOM), to determine resilient and vulnerable rats in both sets (Figure 1, ④,⑤).

FIGURE 1 Workflow of the machine learning classifier. The steps are illustrated as numbers in the circles. Clustering algorithms used are the Gaussian mixture method and K-mean/K-median clustering. Classification algorithms used are K-nearest neighbour, logistic regression, support vector machines and artificial neural networks. The blue arrows indicate the clustering algorithms and the green arrows indicate the classification algorithms.
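The Silhouette-based choice of cluster number can be sketched as follows: for each set, keep the k with the highest mean silhouette, then take the most frequent k across sets. This is a minimal sketch under stated assumptions: scikit-learn's `KMeans` stands in for the clustering step (scikit-learn has no K-median estimator), the candidate range 2-6 is assumed (the silhouette is undefined for k = 1), and `best_k`, `consensus_k` and the synthetic sets are hypothetical.

```python
from collections import Counter

from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_score


def best_k(X, k_range=range(2, 7), seed=0):
    """Cluster number with the highest mean silhouette for one dataset."""
    scores = {}
    for k in k_range:
        labels = KMeans(n_clusters=k, n_init=10, random_state=seed).fit_predict(X)
        scores[k] = silhouette_score(X, labels)
    return max(scores, key=scores.get)


def consensus_k(datasets):
    """Most frequently selected cluster number across datasets (e.g. 100 sets)."""
    votes = Counter(best_k(X) for X in datasets)
    return votes.most_common(1)[0][0], votes


# Placeholder: 10 synthetic sets, each containing two groups of rats in 3-D.
datasets = [make_blobs(n_samples=60, centers=[[0, 0, 0], [10, 10, 10]],
                       cluster_std=1.0, random_state=i)[0] for i in range(10)]
k, votes = consensus_k(datasets)
```

The `votes` counter also gives the proportion of sets favouring each k, which is the form in which the cluster-number results are reported.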
We used four supervised classification algorithms (SOM), namely K-nearest neighbour (KNN) (Forgy, 1965), logistic regression (LR) (Cramer, 2002), support vector machines (SVM) (Pedregosa et al., 2011) and artificial neural networks (ANN) (Zou et al., 2008) (Figure 2), to fit the behavioural data of the Training set together with the labels assigned by the clustering algorithm, thereby generating a mathematical model that best explains the behavioural data and labels of the rats in the Training set (Figure 1, ⑥). For ANN, increasing numbers of hidden layers were used (5, 50 and 500), keeping the number of neurons in each layer constant, to test both a potential tendency to overfit and the ability of the algorithm to accommodate larger sample sizes in the future. Then, to predict the labels of the rats in the Test set, their behavioural data were submitted to the mathematical model (Figure 1, ⑦) generated by each of the four supervised classification algorithms.
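The fit-then-predict step can be sketched with scikit-learn pipelines, one per supervised algorithm. This is a minimal sketch, not the authors' configuration: feature standardisation and all hyperparameters are assumptions, the ANN is an assumed single hidden layer of 50 units (`hidden_layer_sizes` is where a 5/50/500 sweep would be varied), and `make_classifiers`, `fit_and_predict` and the placeholder data are hypothetical.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import KNeighborsClassifier
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC


def make_classifiers(seed=0):
    """One pipeline per supervised algorithm (all hyperparameters are assumptions)."""
    return {
        "KNN": make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=5)),
        "LR": make_pipeline(StandardScaler(), LogisticRegression()),
        "SVM": make_pipeline(StandardScaler(), SVC(kernel="rbf")),
        # Back-propagation (Adam) with early stopping on a held-out 10% validation split.
        "ANN": make_pipeline(StandardScaler(), MLPClassifier(
            hidden_layer_sizes=(50,), early_stopping=True, batch_size=16,
            learning_rate_init=0.01, max_iter=2000, random_state=seed)),
    }


def fit_and_predict(X_train, y_train, X_test, seed=0):
    """Fit each model on Training-set cluster labels, then predict Test-set labels."""
    predictions = {}
    for name, model in make_classifiers(seed).items():
        model.fit(X_train, y_train)
        predictions[name] = model.predict(X_test)
    return predictions


# Placeholder data: two well-separated groups of rats in the 3-D score space.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(5, 1, size=(100, 3)), rng.normal(20, 2, size=(100, 3))])
y = np.repeat([0, 1], 100)  # stands in for the labels produced by the clustering step
order = rng.permutation(len(X))
X, y = X[order], y[order]
preds = fit_and_predict(X[:134], y[:134], X[134:])
```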
When submitted to these mathematical models, the behavioural data of the Test set are used to ascribe a resilient or vulnerable label to each rat of the Test set (Figure 1, ⑧) (Pedregosa et al., 2011). Thus, each rat in the Test set is ascribed two labels, one by the unsupervised clustering algorithm and one by a particular supervised prediction algorithm. The goal of this approach is to determine the unsupervised clustering-supervised prediction combination that yields the greatest overlap between these two sets of labels for the Test set rats (Figure 1, ⑨).
The labels assigned to the Test set rats by the clustering algorithm (considered here as true labels) and the labels predicted for the same rats by a supervised prediction algorithm can be represented in a classification matrix (Table 1). Each pair of Training and Test sets was subjected to this pipeline four times, i.e., GMM clustering followed by each of the four supervised prediction algorithms. As there were 50 pairs of Training and Test sets, 50 iterations were processed for each combination of GMM clustering and supervised prediction algorithm, resulting in 50 accuracy, precision, recall and ROC-AUC scores. The same procedure was followed for K-median/K-mean clustering followed by the four supervised prediction algorithms. Results are depicted as kernel density estimates of the probability density function of these 50 iterations for all four performance evaluation metrics for each combination of unsupervised clustering and supervised prediction algorithm.
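The classification matrix and the four metrics for one iteration can be sketched with `sklearn.metrics`. Because cluster indices are arbitrary (cluster 0 in one set may correspond to cluster 1 in another), the sketch first recodes them; treating the cluster with the higher mean score as 'vulnerable' is an assumption, and `align_labels`, `evaluate` and the toy values are hypothetical.

```python
import numpy as np
from sklearn.metrics import (accuracy_score, confusion_matrix, precision_score,
                             recall_score, roc_auc_score)


def align_labels(X, labels):
    """Recode 0/1 cluster labels so that 1 marks the cluster with the higher mean score."""
    X, labels = np.asarray(X, dtype=float), np.asarray(labels)
    if X[labels == 1].mean() >= X[labels == 0].mean():
        return labels
    return 1 - labels


def evaluate(y_true, y_pred, y_score):
    """Classification matrix and metrics; y_true are the Test-set cluster labels."""
    return {
        "confusion": confusion_matrix(y_true, y_pred, labels=[0, 1]),  # rows: true, cols: predicted
        "accuracy": accuracy_score(y_true, y_pred),
        "precision": precision_score(y_true, y_pred, zero_division=0),
        "recall": recall_score(y_true, y_pred, zero_division=0),
        "roc_auc": roc_auc_score(y_true, y_score),  # y_score: e.g. predicted P(vulnerable)
    }


X_test = [[1, 2, 1], [2, 1, 1], [20, 30, 25], [25, 28, 30]]
y_true = align_labels(X_test, [1, 1, 0, 0])
m = evaluate(y_true, [0, 1, 1, 1], [0.1, 0.6, 0.8, 0.9])
```

Collecting these dictionaries over the 50 iterations yields the score distributions that are then summarised as kernel density estimates.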

| RESULTS
For addiction-like behaviour for cocaine, the Silhouette score revealed that the optimal number of clusters was '2' in 88% (training sets, K-median clustering), 76% (test sets, K-median clustering), 74% (training sets, GMM clustering) and 76% (test sets, GMM clustering) of the sets, while the second most commonly suggested cluster number ranged from 3 to 6. Similarly, for addiction-like behaviour for alcohol, the optimal number of clusters was '2' in 74% (training sets, K-mean clustering), 78% (test sets, K-mean clustering), 96% (training sets, GMM clustering) and 70% (test sets, GMM clustering) of the sets, while the second most commonly suggested cluster number ranged from 3 to 6. This analysis confirmed that two clusters should be used in subsequent analyses.
For addiction-like behaviour for cocaine, the K-median-KNN, K-median-LR and K-median-SVM (Kmedian_cocaine.py) classifiers yielded similar scores (Table 2A, Figure 3) that were overall superior to those of the GMM-KNN, GMM-LR and GMM-SVM (GMM_cocaine.py) classifiers with regard to median accuracy, precision, recall and ROC-AUC scores, as well as the proportion of iterations reaching the top ten percentile (Table 2B, Figure 4). The K-median-ANN and GMM-ANN classifiers gave similar median accuracy and ROC-AUC scores and a similar proportion of these scores in the top ten percentile (Tables 2A, 2B, Figure 3).
The performance of the K-median- and GMM-ANN (cocaine) or K-mean- and GMM-ANN (alcohol) classifiers was further improved by increasing the number of hidden-layer neurons in the ANN (Tables 2B and 3B). This increase in accuracy indicates that the K-median/K-mean-ANN classifiers can accommodate larger sample sizes and/or more dimensions, and it is unlikely to reflect overfitting since the ANN was trained by back propagation with early stopping (Caruana et al., 2001).
Together, these results demonstrate that a classifier based on K-median/K-mean clustering followed by an ANN is the most robust and future- and dimension-expansion-proof approach to accurately predict whether a single rat is vulnerable or resilient as assessed in our multisymptomatic model, which has great heuristic value with regard to the clinical definition of SUD (Figure 5). To cross-validate the classifier, the entire datasets related to cocaine and alcohol addiction-like behaviour (n = 88 and n = 150, respectively) were subjected to K-median (All_cocaine_Kmedian.py) and K-mean (All_alcohol_Kmean.py) clustering, respectively. All the rats originally characterized as 0crit or 3crit in their respective cohorts were correctly labelled as resilient or vulnerable, respectively, revealing an absolute intersection (Tables 4A and 4B) (Cocaine_crit_correspondence.xlsx, Alcohol_crit_correspondence.xlsx).

TABLE 2A Classifier based on K-median clustering followed by supervised algorithm-based predictions for addiction-like behavior for cocaine
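The correspondence check between the original criteria counts and the classifier labels can be sketched as a cross-tabulation. This is a minimal sketch, assuming both are available per rat and that label 1 means 'vulnerable'; `crit_correspondence`, `extremes_agree` and the toy values are hypothetical and do not reproduce the spreadsheets cited above.

```python
import numpy as np
import pandas as pd


def crit_correspondence(n_criteria, labels):
    """Cross-tabulate original criteria counts (0-3) against classifier labels."""
    crit = pd.Series(n_criteria, name="criteria")
    label = pd.Series(labels, name="label").map({0: "resilient", 1: "vulnerable"})
    return pd.crosstab(crit, label)


def extremes_agree(n_criteria, labels):
    """True when every 0crit rat is resilient and every 3crit rat is vulnerable."""
    n, lab = np.asarray(n_criteria), np.asarray(labels)
    return bool(np.all(lab[n == 0] == 0) and np.all(lab[n == 3] == 1))


n_criteria = [0, 0, 1, 1, 2, 3, 3]
labels = [0, 0, 0, 1, 1, 1, 1]
table = crit_correspondence(n_criteria, labels)
```

The intermediate 1crit and 2crit rows of such a table show how rats meeting only some criteria are distributed between the two clusters.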
Finally, to establish the predictive potential of the classifiers we developed, we applied them to a completely new dataset (Fouyssac et al., 2021) consisting of the 3-criteria behavioural scores of a cohort of 36 rats housed either in a standard environment (two individuals per standard cage) or in an enriched environment. While replicating previous findings that environmental enrichment decreases the tendency to self-administer cocaine (Bardo et al., 2001; Puhl et al., 2012), this study demonstrated that rats housed in an enriched environment were more vulnerable to developing addiction-like behaviour than rats raised in a standard environment (Fouyssac et al., 2021), in that all the 3crit rats identified in this heterogeneous cohort came from the enriched group. In line with the original study, none of the rats from the standard housing group were labelled as vulnerable by the classifiers, while those labelled as vulnerable matched exactly the 3crit rats from the enriched environment (EEES.xls) (Fouyssac et al., 2021).
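Why a trained classifier can label rats irrespective of their cohort can be shown with a minimal sketch: once the scaler and network are fitted on the training data, predicting a new cohort as a batch or one rat at a time gives the same labels. The placeholder data, the K-means labelling and the ANN settings are assumptions for illustration, not the published model.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(1)
# Placeholder training cohort (3 scores per rat); labels come from 2-cluster K-means.
X_train = np.vstack([rng.normal(5, 1, (60, 3)), rng.normal(20, 2, (30, 3))])
y_train = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X_train)

model = make_pipeline(
    StandardScaler(),
    MLPClassifier(hidden_layer_sizes=(50,), early_stopping=True, batch_size=16,
                  learning_rate_init=0.01, max_iter=2000, random_state=0),
).fit(X_train, y_train)

# Hypothetical independent cohort, e.g. rats from a different housing condition.
X_new = np.vstack([rng.normal(5, 1, (24, 3)), rng.normal(20, 2, (12, 3))])
batch_labels = model.predict(X_new)
# Label each rat on its own: scaling statistics were fixed on the training data,
# so a rat's label does not depend on which other rats are in the new cohort.
single_labels = np.array([model.predict(row.reshape(1, -1))[0] for row in X_new])
```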

| DISCUSSION
The next frontier in addiction research lies in understanding the environmental, psychological and biological mechanisms that mediate, in vulnerable individuals, the transition from controlled drug intake to the compulsive seeking and taking characteristic of SUD. Behavioural procedures that enable the study, under controlled conditions, of individual trajectories from a drug-naïve state to the development of addiction-like behaviour over the course of drug self-administration have only started to demonstrate their utility for understanding the mechanisms of individual vulnerability to addiction (Belin et al., 2008, 2009; Besson et al., 2013; Deroche-Gamonet et al., 2004; Fouyssac et al., 2021; Jadhav et al., 2017, 2018). These procedures have hitherto been limited by the lack of an objective diagnostic strategy, i.e., one that is not influenced by the physical data-distribution properties of the cohort to which an individual belongs. This results in the unwarranted need to train large cohorts of animals at any given time and distances the approach from the individual-centred diagnosis used in humans.
In this study, we drew on large datasets produced over the past two decades to develop new machine-learning assisted classifiers for cocaine or alcohol addiction-like behaviour that characterize single individuals as resilient or vulnerable with high accuracy, irrespective of the cohort to which they belong.
The role of clustering algorithms is to identify data points in a multidimensional space that are closer to one another than they are to any other data point in the cloud (Fung, 2001). In many real-life situations, the labels of such data points are obvious, e.g., males vs females for biological differences, or having voted for or against Brexit. In these situations, data clustering is not necessary. However, ascribing labels such as those that determine whether an individual meets the criteria of addiction-like behaviour cannot be informed by a natural dichotomous segregation of the population. This requires structuring the multidimensional space into delineated subspaces, which can be used to ascribe a specific label to each individual constituent of a cluster and to train supervised classification algorithms to predict the label (i.e., the cluster to which it most likely belongs) of a single individual whose data have never been used to train the classification algorithm.
The first step in developing such an algorithm was to objectively determine the number of clusters into which the multidimensional cloud should be structured, accommodating both the physical properties of the data and the objective of the classifier. In real life, individuals can be categorized as vulnerable or resilient, suggesting that any experimental population could be segregated into two clusters (Aguilar et al., 2020; Ersche et al., 2020; Morrow & Flagel, 2016). Nevertheless, a data-driven approach was used to ensure that such a dichotomy was present in the experimental datasets. The Silhouette algorithm, run on all the datasets used here, systematically revealed that their multidimensional space was predominantly structured around two clusters, an outcome compatible with the prerequisite of the algorithm: to segregate two subpopulations, namely vulnerable and resilient individuals, from heterogeneous groups. This also provided an unbiased cluster number for the various cluster analyses (GMM and K-median/K-mean) used in the potential classifiers tested in this study. Identification of resilient or vulnerable rats in the 3-criteria model was hitherto based on the bimodal distribution of each population for resistance to punishment (Belin et al., 2008; Deroche-Gamonet et al., 2004; Jadhav et al., 2017), which comprises a large, log-normally distributed subpopulation of non-compulsive rats (60-70% of the population) tailed by an independent, normally distributed subpopulation of compulsive rats (30-40% of the population). Since GMM clustering can fit bi- or multimodal data distributions (Lubke & Muthen, 2005), it was initially included to capture the distributional properties on which the selection threshold for addiction-like behaviour depends. However, the GMM-based classifier did not yield outputs superior to those of the K-median/K-mean-based classifier.

TABLE 2B Classifier based on GMM clustering followed by supervised algorithm-based predictions for addiction-like behavior for cocaine
This surprising outcome may be due to the fact that a GMM classifier, in contrast with the strategy we developed to apply the 30-40% threshold to the other two criteria, each characterized by a log-normal distribution, uses differential densities across quartiles in each variable independently to develop the classifier.
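One way to see where GMM and K-mean partitions differ is to compare them directly on the same data. In scikit-learn, `GaussianMixture` fits each component its own mean and (here full) covariance, whereas `KMeans` assigns each point to the nearest centroid. This is a minimal sketch, not an analysis reported in the study: `partition_agreement` and the placeholder data are hypothetical, and agreement is measured with the adjusted Rand index, which ignores which cluster is numbered 0 or 1.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import adjusted_rand_score
from sklearn.mixture import GaussianMixture


def partition_agreement(X, seed=0):
    """Adjusted Rand index between 2-cluster K-means and 2-component GMM partitions."""
    km_labels = KMeans(n_clusters=2, n_init=10, random_state=seed).fit_predict(X)
    gmm = GaussianMixture(n_components=2, covariance_type="full", random_state=seed).fit(X)
    return adjusted_rand_score(km_labels, gmm.predict(X))


# Placeholder: a large low-scoring group and a smaller high-scoring group.
rng = np.random.default_rng(2)
X = np.vstack([rng.normal(5, 1, (70, 3)), rng.normal(25, 2, (30, 3))])
ari = partition_agreement(X)
```

On well-separated groups the two methods agree almost perfectly; applied across the 50 Training/Test sets, lower values would flag the sets in which the two clustering methods disagree.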
K-median/K-mean clustering, which is based on the Euclidean distance between data points in a three-dimensional vector space in which the numbers of responses are plotted along three axes representing three different psychological constructs, proved to be the superior clustering method to accurately and consistently ascribe labels of resilience vs vulnerability. Not only are K-median/K-mean algorithms easy to implement, but they are scalable and can be used to separate nonlinearly separable data. These properties were exploited to develop a robust and universal classifier. Thus, the same clustering algorithms were applied to 50 randomly drawn sets from a large dataset comprising data from experiments carried out in different laboratories, on different strains that differ in addiction-relevant traits (McDermott & Kelly, 2008), namely Sprague-Dawley (Belin et al., 2009, 2011) or Lister-Hooded (Belin et al., 2008) rats for cocaine addiction-like behaviour and Wistar rats (Jadhav et al., 2017, 2018) for alcohol addiction-like behaviour, and using different instrumental responses (nose-pokes (Belin et al., 2009, 2011) or lever presses (Belin et al., 2008)). The ability of the classifier to survive randomization tests and to generalize across response modalities and strains demonstrates its potential use across a large repertoire of experimental idiosyncrasies that may reflect the behavioural heterogeneity observed by clinicians when a diagnosis is warranted.
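Because scikit-learn provides no K-median estimator, a minimal K-medians sketch in NumPy may help make the method concrete. It alternates two steps: assign each rat to its nearest centre, then move each centre to the coordinate-wise median of its members. Note an assumption: this sketch uses the L1 (Manhattan) distance, the formulation under which the coordinate-wise median is the optimal centre. This differs from the Euclidean distance mentioned above and is not the authors' implementation. `k_medians`, `_assign` and the toy data are hypothetical.

```python
import numpy as np


def _assign(X, centers):
    """Index of the nearest centre for each row of X under the L1 (Manhattan) distance."""
    dist = np.abs(X[:, None, :] - centers[None, :, :]).sum(axis=2)  # shape (n, k)
    return dist.argmin(axis=1)


def k_medians(X, k=2, n_iter=100, seed=0):
    """Minimal K-medians: L1 assignment, coordinate-wise median update."""
    X = np.asarray(X, dtype=float)
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        labels = _assign(X, centers)
        # The coordinate-wise median minimises the summed L1 distance within a
        # cluster; a cluster that becomes empty keeps its previous centre.
        new_centers = np.array([
            np.median(X[labels == j], axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return _assign(X, centers), centers


X = np.array([[0, 0, 0], [1, 0, 0], [0, 1, 0], [10, 10, 10], [11, 10, 10], [10, 11, 10]])
labels, centers = k_medians(X, k=2)
```

Medians are less sensitive than means to the extreme response counts typical of log-normally distributed behavioural scores, which is one practical difference between the two variants.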
Furthermore, the ability of the K-median/K-mean-supervised algorithm classifiers to accurately identify rats as vulnerable or resilient in a completely different dataset, generated with a heterogeneous cohort exposed to housing conditions very different from those used in the experiments exploited for the training datasets (Fouyssac et al., 2021), indicates that these tools can be deployed across many diverse experimental conditions. Nevertheless, the same classifier could not be generalized from addiction-like behaviour for one drug to another. While the K-median clustering-based algorithm used for addiction-like behaviour for cocaine systematically yielded the right vulnerable/resilient labels, even when applied to a dataset never used in its development (the enriched environment experiment (Fouyssac et al., 2021)), it was suboptimal for addiction-like behaviour for alcohol, for which the best classifier was based on K-mean clustering. The lack of generalizability of a given classifier across drugs is further evidence of construct and predictive validity, since alcohol and cocaine use disorders are independent diagnoses in humans and have long been shown to involve different psychological and neurobiological mechanisms (Nestler, 2005).

TABLE 3B Classifier based on GMM clustering followed by supervised algorithm-based predictions for addiction-like behavior for alcohol
In addition, while the 3-criteria model for cocaine relies on the assessment of compulsive cocaine intake (consummatory responses conflated with preparatory responses, as is the case under fixed-ratio schedules of reinforcement) (Belin-Rauscent et al., 2016), the 3-criteria model for alcohol is based on the assessment of the compulsive nature of a seeking response in a chained schedule in which lever pressing results in the procurement of alcohol. The ensuing consumption of alcohol occurs in a dedicated magazine and involves a set of behavioural responses independent of the instrumental component of the chain. Considering how neurally and psychologically dissociable preparatory and consummatory responses are (Blackburn et al., 1989; Everitt, 1990), it was not expected that a single classifier could be used across measures of compulsive taking and seeking. However, it will be interesting to test in future studies whether the alcohol-specific K-mean classifier can be applied to compulsive cocaine-seeking data, as measured under second-order or seeking-taking heterogeneous chained schedules of reinforcement, which dissociate seeking from taking/consummatory responses for orally and intravenously administered drugs (Everitt et al., 2018; Fouyssac et al., 2022; Murray et al., 2012; Pelloux et al., 2015).
Irrespective of the outcomes of these future studies, a one-size-fits-all approach does not seem to be a realistic expectation. Further research is also needed to determine whether such a classifier can accommodate potential differences between males and females in the multidimensional relationship between addiction-related behavioural criteria, which, to the best of our knowledge, have not yet been experimentally investigated. Another important avenue for future research is to identify mathematical tools that introduce dimensionality within the categories that are now accurately identified by the K-mean/K-median classifiers. The 3-criteria model was designed to have construct validity with regard to the diagnostic strategy of the DSM-IV (APA, 1994), i.e., before the development of the RDoC (Brooks et al., 2017). Nevertheless, the approach we developed at the time embedded a dimensional aspect. Rats were not only stratified as showing 0 criteria or 3 criteria (deemed resilient and showing addiction-like behaviour, respectively); 30-40% of any population was also stratified as showing 1 or 2 criteria. In some studies, 1crit and 2crit rats have been grouped with 0crit and 3crit rats, respectively (Cannella et al., 2018; Domi et al., 2019; Jadhav et al., 2017). However, molecular data, at least for cocaine addiction-like behaviour, support the notion that 1crit and 2crit rats represent an intermediate stage distinct from 0crit and 3crit rats.
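The criterion-based stratification referred to above is sketched below. It assumes, as in earlier 3-criteria studies, that a rat meets a criterion when its score lies in the upper third of its cohort's distribution. The cut-off rule, criterion names and scores are illustrative assumptions, not data from the study.

```python
from collections import Counter
from typing import Dict, List


def upper_third_cutoff(values: List[float]) -> float:
    # Assumed rule: a score at or above this value lies in the top third of the cohort
    ranked = sorted(values)
    return ranked[(2 * len(ranked)) // 3]


def count_criteria(scores: Dict[str, List[float]]) -> List[int]:
    """Number of criteria (0-3) met by each rat; scores maps criterion -> per-rat scores."""
    n_rats = len(next(iter(scores.values())))
    counts = [0] * n_rats
    for values in scores.values():
        cutoff = upper_third_cutoff(values)  # computed within this cohort only
        for i, v in enumerate(values):
            if v >= cutoff:
                counts[i] += 1
    return counts


# Hypothetical scores for a cohort of 9 rats (criterion names are illustrative)
cohort = {
    "persistence": [1, 2, 3, 4, 5, 6, 7, 8, 9],
    "motivation": [2, 1, 4, 3, 6, 5, 9, 7, 8],
    "compulsivity": [3, 1, 2, 9, 4, 5, 6, 8, 7],
}
crit = count_criteria(cohort)
print(crit)                           # -> [0, 0, 0, 1, 0, 0, 2, 3, 3]
print(sorted(Counter(crit).items()))  # -> [(0, 5), (1, 1), (2, 1), (3, 2)]
```

The cut-off in this sketch is computed within each cohort. As a result, the same raw score can count as a criterion in one cohort and not in another, which is one reason a cohort-independent classifier is useful.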
The rats identified as resilient by our classifier included all 0crit rats, most 1crit rats and no 3crit rats. Those identified as vulnerable included all 3crit rats, most 2crit rats and no 0crit rats (Cocaine_crit_correspondence.xlsx, Alcohol_crit_correspondence.xlsx). The present classifier therefore does not yet appear to provide the dimensional granularity necessary to distinguish levels of severity (2crit vs. 3crit) within the vulnerable population. Further research is warranted to determine whether adding endophenotypes (Belin et al., 2008, 2011; Jadhav et al., 2017, 2018) to our classifiers will enable them to fully capture the dimensional nature of the debilitating condition that is SUD. This could contribute to the several initiatives that aim to identify clinically relevant subtypes of SUD (Leggio et al., 2009) through cluster analysis of patients. Such subtyping seeks to better capture clinical heterogeneity (Blanco et al., 2013; Herzig et al., 2015; Kupfer et al., 2008; Kwako et al., 2016, 2019) with the aim of advancing personalized medicine (Mann & Hermann, 2010; Witkiewitz et al., 2019).
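The correspondence between criterion groups and classifier labels discussed above is, in essence, a cross-tabulation. The sketch below builds one from invented labels; it does not reproduce the supplementary spreadsheets. It then checks the property reported for the extremes: no 0crit rat labelled vulnerable and no 3crit rat labelled resilient. By construction, a binary label cannot separate 2crit from 3crit rats within the vulnerable cluster.

```python
from collections import Counter
from typing import Dict, List, Tuple


def correspondence(crit: List[int], labels: List[str]) -> Dict[str, Tuple[int, int]]:
    """Cross-tabulate criterion groups (0-3) against binary classifier labels."""
    table = Counter(zip(crit, labels))
    # Each row: (number labelled resilient, number labelled vulnerable)
    return {f"{n}crit": (table[(n, "resilient")], table[(n, "vulnerable")]) for n in range(4)}


def extremes_concordant(rows: Dict[str, Tuple[int, int]]) -> bool:
    # No 0crit rat labelled vulnerable and no 3crit rat labelled resilient
    return rows["0crit"][1] == 0 and rows["3crit"][0] == 0


# Hypothetical criterion counts and classifier labels for 11 rats
crit = [0, 0, 0, 1, 1, 1, 2, 2, 2, 3, 3]
labels = ["resilient"] * 5 + ["vulnerable"] + ["resilient"] + ["vulnerable"] * 4
rows = correspondence(crit, labels)
for group, (res, vul) in rows.items():
    print(group, "resilient:", res, "vulnerable:", vul)
print("extremes concordant:", extremes_concordant(rows))
```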

Conclusion
The present machine learning-based classifiers represent a unique tool to objectively identify whether a single experimental subject is resilient or vulnerable to cocaine or alcohol addiction-like behaviour (Figure 5). Such a tool allows a single individual to be considered irrespective of the experimental cohort to which it belongs (and the associated experimental conditions). It thereby opens a new frontier in the study of the individual vulnerability to developing SUD, bringing the focus back to the individual, as is the case in humans. It can be boldly envisioned that, with the advent of large human datasets from imaging, genomic and proteomic approaches, a successful back-translation strategy could see such machine learning-assisted tools applied to the personalized diagnosis of clinical populations.