Prediction of cell cycle distribution after drug exposure by high content imaging analysis using low‐toxic DNA staining dye

Interference in cell cycle progression has been noted as one of the important properties of anticancer drugs. In this study, we developed a cell cycle prediction model using high‐content imaging data of cells after drug exposure and DNA staining with a low‐toxic DNA dye, SiR‐DNA. For this purpose, we used HeLa and MCF7 cells expressing a fluorescent ubiquitination‐based cell cycle indicator (Fucci). Fucci‐expressing cancer cells were subjected to high‐content imaging analysis using OperettaCLS after 36‐h exposure to anticancer drugs; the nuclei were segmented, and the morphological and intensity properties of each nucleus characterized by SiR‐DNA staining were calculated using the imaging analysis software Harmony. For training, we classified cells into each phase of the cell cycle using the Fucci system. Training data (n = 7500) and validation data (n = 2500) were randomly sampled, and binary classification prediction models for the G1, early S, and S/G2/M phases of the cell cycle were developed using four supervised machine learning algorithms. We selected random forest as the model with the best performance through 10‐fold cross‐validation; the accuracy rate was approximately 75%–87%. Regarding feature importance, variables expected to be biologically related to the cell cycle, for example, signal intensity and nuclear size, were highly ranked, suggesting the validity of the model. These results showed that the cell cycle can be predicted in cancer cells simply by applying the current prediction model to fluorescent images of a DNA‐staining dye, and the model could be applied to future ex vivo drug sensitivity diagnosis.


KEYWORDS
cell cycle, cell line, high-content imaging analysis, machine learning

| INTRODUCTION
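Bioactive compounds exert some effect on living cells, causing phenotypic changes. Among these, those that are found to affect the proliferation and viability of cancer cells may become anticancer agents. The cell cycle has long been a focus of attention and remains an important target, as many compounds are known to show anticancer activity by affecting the cell cycle distribution of cancer cell populations. 1 Flow cytometry is often used to observe cell cycle progression, but it requires fixing the cells and nuclear staining with a DNA-staining dye such as propidium iodide. In contrast, the development of a fluorescent ubiquitination-based cell cycle indicator (Fucci) has made it possible to observe the cell cycle in real time using a fluorescent microscope. Fucci is a fluorescent probe that visualizes cell cycle progression by fusing a red fluorescent protein called monomeric Kusabira-Orange2 (mKO2) or mCherry to Cdt1 (Cdc10 dependent transcript), which is abundantly expressed in Gap 1 phase (G1) and degraded in S phase, and a green fluorescent protein called monomeric Azami-Green (mAG) to Geminin, which is abundantly expressed in S phase and degraded in late mitotic phase (M phase) to early G1 phase. 2 However, it takes some effort to introduce Fucci constructs into a cell line with lentiviral vectors and make them available for experimental use, and it is sometimes particularly difficult to perform lentiviral transduction of primary tumor cells.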
The development of a high-content imaging system in combination with the low-toxic DNA-staining dye SiR-DNA 3 enables time-lapse imaging of cultured cells to obtain real-time high-content imaging data such as cell nucleus area, fluorescence intensity, and texture analysis. 4 In this study, we aimed to develop a prediction model of the cell cycle using high-content imaging data from SiR-DNA staining.

| Image analyses using Harmony software
The captured images were analyzed using Harmony software (Revvity).
First, we applied a sliding parabola (SP) filter to reduce background noise, and nuclei were segmented using the FindNuclei building block.
Basic morphology (i.e., area, roundness, width, and length) and intensity properties (i.e., mean, standard deviation, median, max, min, sum, coefficient of variance, quantile-50%, and contrast) of each ROI (region of interest) of the segmented nucleus were calculated using the Calculate Intensity Properties (Standard) building block. To capture morphological properties, we utilized the Calculate Morphology Properties (Standard and STAR) and Calculate Texture Properties (SER and Haralick) building blocks. The SER (Spots, Edges, and Ridges) method uses eight different filters (bright, ridge, dark, saddle, edge, spot, hole, and valley) that highlight different patterns of staining intensity. The STAR method was used to calculate morphological properties (i.e., symmetry, threshold compactness, axial, radial, and profile) for each ROI. When running the STAR method, one of the eight SER filters was applied. Through these analyses, a total of 240 numerical parameters were generated. Detailed phenotypic profiles using texture and advanced morphology analyses are described elsewhere. 6
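As an illustration of the per-nucleus intensity properties listed above, the following minimal Python sketch summarizes the pixel intensities of a single ROI. It is not Harmony's implementation: the function name `intensity_properties`, the example pixel values, and the use of the population standard deviation are assumptions made for illustration only, and contrast is omitted because its definition is not given in the text.

```python
import statistics


def intensity_properties(pixels):
    """Summarize the pixel intensities of one segmented nucleus (ROI)."""
    mean = statistics.fmean(pixels)
    # Population SD is an assumption; the exact definition used by the
    # imaging software is not stated in the text.
    sd = statistics.pstdev(pixels)
    return {
        "mean": mean,
        "sd": sd,
        "median": statistics.median(pixels),
        "max": max(pixels),
        "min": min(pixels),
        "sum": sum(pixels),
        # Coefficient of variation, expressed as a percentage of the mean.
        "cv_percent": 100.0 * sd / mean if mean else float("nan"),
    }


# Made-up SiR-DNA pixel intensities for a single nucleus.
roi = [120, 135, 150, 110, 160, 145, 130, 150]
props = intensity_properties(roi)
```

The summed intensity and the coefficient of variation are the kinds of quantities that later appear as a highly ranked feature and as an exclusion criterion, respectively.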

| Data preprocessing for supervised machine learning
In this study, we used complete data without missing measurements. We determined thresholds from the distribution of the mean fluorescence intensity within the segmented ROI for the two Fucci colors, which were used as the variables for classifying nuclei into four groups: G1, early S, S/G2/M, and double negative (DN). For HeLa/Fucci cells, we defined G1 phase: mKO2 ≥ 700/mAG < 2500; early S phase: mKO2 ≥ 700/mAG ≥ 2500; S/G2/M phase: mKO2 < 700/mAG ≥ 2500; DN: mKO2 < 700/mAG < 2500. For MCF7/Fucci cells, we defined G1 phase: mCherry ≥ 150/mAG < 300; early S phase: mCherry ≥ 150/mAG ≥ 300; S/G2/M phase: mCherry < 150/mAG ≥ 300; DN: mCherry < 150/mAG < 300. In addition, apoptotic nuclei with CV values greater than 70% were excluded to avoid the unfavorable effects of substantial contamination of dead cells in MCF7 experiments. Nuclei for training (n = 7500) and validation (n = 2500) were then extracted by random sampling. To unify the scale among the explanatory variables, min-max normalization was performed so that the data fit between 0 and 1.
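The preprocessing steps described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' code: the function names (`classify_phase`, `min_max_scale`, `split_train_validation`) and the random seed are hypothetical, and only the threshold values are taken from the text.

```python
import random

# Thresholds quoted in the text: (red cutoff, green cutoff) per cell line.
THRESHOLDS = {
    "HeLa": (700, 2500),  # mKO2, mAG
    "MCF7": (150, 300),   # mCherry, mAG
}


def classify_phase(red, green, cell_line):
    """Assign a Fucci-based label from mean red/green intensities."""
    red_cut, green_cut = THRESHOLDS[cell_line]
    red_pos = red >= red_cut
    green_pos = green >= green_cut
    if red_pos and not green_pos:
        return "G1"
    if red_pos and green_pos:
        return "early S"
    if green_pos:
        return "S/G2/M"
    return "DN"  # double negative


def min_max_scale(values):
    """Rescale one explanatory variable to the range [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]  # constant column: no spread to scale
    return [(v - lo) / (hi - lo) for v in values]


def split_train_validation(items, n_train, n_valid, seed=0):
    """Randomly sample disjoint training and validation sets."""
    rng = random.Random(seed)  # the seed is an arbitrary illustrative choice
    picked = rng.sample(items, n_train + n_valid)
    return picked[:n_train], picked[n_train:]


# Example with the sample sizes used in the study (nucleus IDs are made up).
train_ids, valid_ids = split_train_validation(list(range(12000)), 7500, 2500)
```

Whether the scaling parameters were estimated on the training data only or on the full dataset is not stated in the text; the sketch leaves that choice to the caller.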

| Construction of cell cycle prediction model
We constructed binary classification prediction models with three patterns defined from the Fucci information: "G1" or "else," "early S" or "else," and "S/G2/M" or "else." The following four models were used in this study: logistic regression as a representative statistical classification model; support vector machine (SVM), which was used mainly before the advent of deep learning and still has relatively high classification performance; random forest, a model that creates a large number of decision trees in parallel; and neural network (a sequential model with 3 hidden layers), which is considered to be the basic model of deep learning. The model development methods are described in Data S1. We developed these models and compared their performance by performing 10-fold cross-validation. The validation data were then used to evaluate the performance of the selected models. Accuracy, sensitivity, specificity, and area under the curve (AUC) were calculated as evaluation indices. Additionally, we calculated the importance of the explanatory variables for the selected models. Histograms were drawn for the variables with the highest importance for each group defined by Fucci, and together with the importance of the explanatory variables, the validity of the predictive models was confirmed.
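The 10-fold cross-validation procedure can be illustrated with a short standard-library Python sketch. It is an assumption-laden toy, not the authors' pipeline: a one-feature threshold classifier (`fit_threshold`, hypothetical) stands in for the four models, and the feature values are synthetic numbers, not measured data.

```python
import random


def k_fold_indices(n_samples, k=10, seed=0):
    """Yield (train, test) index lists; each sample is tested exactly once."""
    idx = list(range(n_samples))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    for i in range(k):
        train = [j for f_i, fold in enumerate(folds) if f_i != i for j in fold]
        yield train, folds[i]


def fit_threshold(xs, ys):
    """Toy stand-in model: cut at the midpoint between the two class means."""
    pos = [x for x, y in zip(xs, ys) if y == 1]
    neg = [x for x, y in zip(xs, ys) if y == 0]
    return (sum(pos) / len(pos) + sum(neg) / len(neg)) / 2


def cross_validated_accuracy(xs, ys, k=10, seed=0):
    """Mean accuracy over k folds, refitting the model on each training split."""
    accuracies = []
    for train, test in k_fold_indices(len(xs), k, seed):
        cut = fit_threshold([xs[i] for i in train], [ys[i] for i in train])
        correct = sum((xs[i] >= cut) == (ys[i] == 1) for i in test)
        accuracies.append(correct / len(test))
    return sum(accuracies) / len(accuracies)


# Synthetic feature: label 1 ("S/G2/M") has roughly twice the value of label 0 ("else").
rng = random.Random(1)
xs = [rng.gauss(1.0, 0.2) for _ in range(100)] + [rng.gauss(2.0, 0.2) for _ in range(100)]
ys = [0] * 100 + [1] * 100
mean_acc = cross_validated_accuracy(xs, ys, k=10)
```

Comparing candidate models then amounts to computing this mean accuracy for each model on the same folds and selecting the highest value.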

| Development of cell cycle prediction models using high-content imaging data with SiR-DNA staining
To develop cell cycle prediction models using high-content imaging data after cell staining with SiR-DNA, we used two Fucci-expressing cancer cell lines, HeLa/Fucci and MCF7/Fucci. We first obtained fluorescent images of cells under the indicated experimental conditions (Figure S1) and prepared a dataset for supervised machine learning. In brief, nuclei were segmented, and 240 numerical parameters characterizing the morphological properties of each nucleus were generated by imaging analysis using SiR-DNA-stained images. At the same time, the segmented nuclei were classified into each phase of the cell cycle using the Fucci system. We then developed binary classification prediction models for each cell cycle phase ("G1" or "else"; "early S" or "else"; "S/G2/M" or "else") using the 240 parameters from the training data (n = 7500) by exploiting four machine learning algorithms as described in Materials and Methods. A comparison of the accuracy rates of the HeLa prediction models in 10-fold cross-validation revealed that the random forest model exhibited the highest accuracy of all models for G1, early S, and S/G2/M prediction (Table 1, upper panel). Interestingly, similar results were observed in another dataset obtained from experiments using MCF7/Fucci cells.
Using the validation dataset (n = 2500), we evaluated the prediction performance of the selected random forest models and found that they exhibited favorable scores, except for the models predicting early S, which had extremely low sensitivity (Table 1, lower panel).
Therefore, we decided to use the G1 and S/G2/M prediction models in the subsequent analysis.
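The evaluation indices reported for the validation data (accuracy, sensitivity, specificity, and AUC) can be computed for one binary "phase" or "else" model as in the following Python sketch. The labels and scores are made-up values, the 0.5 decision threshold is an assumption, and the function names are hypothetical.

```python
def binary_labels(phases, target):
    """Turn Fucci phase labels into 1 (target phase) or 0 ("else")."""
    return [1 if p == target else 0 for p in phases]


def auc(y_true, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def evaluate(y_true, scores, threshold=0.5):
    """Accuracy, sensitivity, specificity, and AUC for one binary model."""
    y_pred = [1 if s >= threshold else 0 for s in scores]
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return {
        "accuracy": (tp + tn) / len(pairs),
        "sensitivity": tp / (tp + fn) if tp + fn else float("nan"),
        "specificity": tn / (tn + fp) if tn + fp else float("nan"),
        "auc": auc(y_true, scores),
    }


# Made-up example: Fucci labels and predicted probabilities of "G1".
phases = ["G1", "G1", "early S", "S/G2/M", "G1", "DN"]
scores = [0.9, 0.4, 0.6, 0.2, 0.8, 0.1]
metrics = evaluate(binary_labels(phases, "G1"), scores)
```

Because accuracy pools both classes, it can remain high while sensitivity is low, which is why the indices are reported separately.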

| Cell cycle prediction in cell population after exposure to antitumor agents
To evaluate the cell cycle prediction performance after drug exposure, the breakdown of the nuclei counts predicted as "G1" or "else," "S/G2/M" or "else," along with the count in each phase of the cell cycle classified by the Fucci system, is shown in Figure S2.
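A breakdown of this kind is a cross-tabulation of Fucci-defined phases against model predictions. The Python sketch below shows one way to build it; the labels are made-up examples and the function names are hypothetical.

```python
from collections import Counter


def breakdown(fucci_phases, predictions):
    """Count nuclei for each (Fucci phase, predicted label) pair."""
    return Counter(zip(fucci_phases, predictions))


def fraction(labels, target):
    """Share of nuclei carrying a given label."""
    return sum(1 for label in labels if label == target) / len(labels)


# Made-up output of a "G1" or "else" model for six nuclei.
fucci = ["G1", "G1", "S/G2/M", "early S", "G1", "S/G2/M"]
predicted = ["G1", "else", "else", "else", "G1", "G1"]

counts = breakdown(fucci, predicted)
predicted_g1 = fraction(predicted, "G1")  # population-level estimate from the model
fucci_g1 = fraction(fucci, "G1")          # reference value from the Fucci system
```

Comparing the predicted and Fucci-derived fractions for each drug condition indicates how well the population-level cell cycle distribution is recovered, even when individual nuclei are misclassified.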
In this study, we selected five anticancer compounds with various modes of action: an antimetabolite 5-FU; a MEK1 inhibitor

TABLE 1 Accuracy of five prediction models by 10-fold cross-validation (upper panel) and the evaluation of random forest model using validation data (lower panel).
Training data were used for 10-fold cross-validation. Evaluation indices are the percentages of accuracy. The accuracy, sensitivity, specificity, and AUC of the random forest model were evaluated using validation data.

and MCF7 cells can be roughly estimated. We are now trying to develop a tumor type-agnostic cell cycle prediction model. To achieve this, it will be necessary to examine various cell types other than HeLa and MCF7 cells to identify common explanatory variables for use in cell cycle prediction. Indeed, the major explanatory variables selected in the current prediction models with the HeLa and MCF7 datasets commonly included variables that are clearly biologically related to the cell cycle, such as the fluorescence intensity of nuclei. The identification of such variables will enable us to develop a versatile prediction model. In addition, some textural features such as "Nucleus Radial Mean SER-Edge," "-Valley," and "-Dark" were also highly ranked in both datasets, suggesting their usefulness in cell cycle prediction. Texture analysis is expected to play an important role in extracting meaningful morphological features, and the radial mean is the mean object radius based on the intensity value weighted by the distance from the mass center; however, the causal relationships between these features and cell cycle progression, as well as the biological meaning of these textural features, remain to be elucidated. In summary, the current prediction model allows us to estimate the effect of anticancer drugs on the cell cycle in HeLa and MCF7 cell lines, and the identification of common features related to cell cycle progression suggests that a versatile prediction model could be obtained by increasing the number of tumor cell lines used for training and by fine-tuning the prediction model. Once these are realized, it will be possible to apply the models to patient-derived cells to diagnose their anticancer drug sensitivity ex vivo.

AUTHOR CONTRIBUTIONS
Participated in research design: Kazuma Takeuchi, Yumiko Nishimura, Masaaki Matsuura, and Shingo Dan. Conducted experiments: Kazuma Takeuchi, Yumiko Nishimura, Sho Isoyama, and Takayoshi Matsubara. Performed data analysis: Kazuma Takeuchi, Yumiko Nishimura, Takayoshi Matsubara, Sho Isoyama, Asuka Suzuki, and Masaaki Matsuura. Wrote or contributed to the writing of the manuscript: Kazuma Takeuchi, Masaaki Matsuura, and Shingo Dan.

ACKNOWLEDGMENTS
We are grateful to Drs. Takeshi Imamura (Ehime University, Ehime, Japan), Asako Sakaue-Sawano, and Atsushi Miyawaki (RIKEN, Saitama, Japan) for providing HeLa/Fucci (RCB2812) and MCF7/

FIGURE 2 Box plot of total fluorescence intensity and nuclear area at the top of feature importance using validation data. Box plot analyses of fluorescence intensities (A, C) and areas (B, D) of the nuclei classified in G1 and S/G2/M populations by the Fucci system in the HeLa (A, B) and MCF7 (C, D) datasets. The p-value was calculated by the Wilcoxon rank sum test. *Indicates p < .001. G1, Gap 1 phase; G2, Gap 2 phase; M, mitotic phase.