Matrix decomposition in meta‐analysis for extraction of adverse event pattern and patient‐level safety profile

Abstract The purpose of assessing adverse events (AEs) in clinical studies is to evaluate what AE patterns are likely to occur during treatment. In contrast, it is difficult to specify which of these patterns occurs in each patient. To tackle this challenging issue, we constructed a new statistical model including nonnegative matrix factorization by incorporating background knowledge of AE‐specific structures such as severity and drug mechanism of action. The model uses a meta‐analysis framework for integrating data from multiple clinical studies because insufficient information is derived from a single trial. We demonstrated the proposed method by applying it to real data consisting of three Phase III studies, two mechanisms of action, five anticancer treatments, 3317 patients, 848 AE types, and 99,546 AEs. The extracted typical treatment‐specific AE patterns coincided with medical knowledge. We also demonstrated patient‐level safety profiles using the data of AEs that were observed by the end of the second cycle.


| INTRODUCTION
In order to use medications properly, it is imperative to assess the adverse events (AEs). This assessment is made concomitantly with efficacy evaluations in clinical studies. The purpose of assessing AEs in clinical studies is to evaluate the AE patterns that are likely to occur during treatment. The occurrences of AEs are summarized per treatment, and a meta-analysis is often applied to the summarized AE data. [1][2][3][4] However, such a population-level method does not predict the occurrences of AEs at the patient level, and thus the meta-analysis cannot be used to determine a patientlevel treatment strategy. At the patient-level, a predictive model including Poisson-or logistic regression uses the occurrence of an AE as an outcome and data such as age, weight, and biomarkers as predictors. 5,6 In general, however, hundreds of different AE types are observed in clinical trials, with low occurrence frequency. It is impractical to predict all AE types, even though the data of patients characteristics are available. At the patient level, an AE pattern that is likely to occur is sufficient information to support the treatment strategy, even though the occurrence of each AE type cannot be predicted. Therefore, it is important to know which AE pattern occurs at the patient level. We refer to the extent to which a patient has each pattern component as "patient-level safety profile".
To obtain AE patterns and patient-level profiles, this paper utilizes co-occurrence relationships of patient-level AEs. For example, among the AEs of anticancer treatments, "vomiting" and "nausea" tend to occur simultaneously. Statistical models dealing with such co-occurrences include matrix decomposition which could probably extract the "vomiting and nausea pattern." In other fields, such as natural language processing, matrix decomposition with nonnegative constraints on the elements (nonnegative matrix factorization, NMF 7,8 ) is often applied to nonnegative data such as the number of occurrences. For example, one study applied NMF to the number of occurrences of words in web postings or product labels for drugs and was able to extract patterns. 9 To the best of our knowledge, however, NMF has not yet been applied to patient-level AEs. Meanwhile, a previous study applied factor analysis, a form of matrix decomposition, to data from approximately 200 patients in a single clinical study and focused on fewer than 20 AEs of interest. Due to the small amount of data, however, only two patterns were extracted and were not used for construction of the patient-level safety profiles. 10 Moreover, factor analysis contains strong constraints on the factor score matrix which can cause several issues. For example, the estimation results contain negative values, making the interpretation of patterns and safety profiles difficult.
The aim of this paper is to develop a method which identifies treatment-specific AE patterns and constructs patient-level safety profiles. We used patient AE information gathered from treatment onset to a predetermined time point. In this way, we could know the profiles and predict which AEs were likely to occur from the aforementioned time point to the end of treatment.
To this end, we intended to apply NMF to the AE data. However, this could not be realized without developing a new statistical model expanding NMF. Because AEs in clinical studies include both disease-specific and treatmentspecific events, we needed to build a model to separate them. Moreover, AE data from clinical trials includes severity (ordinal categorical data) and the information of AEs varies with severity; hence, our model was constructed to avoid losing this information. Also, because a single clinical study does not contain sufficient data for our desired analysis, we included the assumption that treatment-specific AE patterns are similar for drugs with the same mechanism of action. This allowed us to use a sufficient number of patients from multiple clinical studies within a meta-analysis framework. These expansions enabled us to use actual data to extract AE patterns and demonstrate patient-level safety profiles.
In Section 2, an example of patient-level AEs data handled in this paper, our statistical model, and patient-level safety profile are described. In Section 3, the results from application of these methods are presented. In Section 4, we summarize and discuss our findings.

| DATA AND ANALYSIS METHODS
Although our method was widely applicable to clinical study data with patient-level AEs, specific data was explained earlier to understand the proposed model in Section 2.2.

| Data
Three datasets (registration numbers 118, 120, and 127) were chosen from the Project Data Sphere, which contains patientlevel AE information. Each dataset corresponds to one Phase III study and has only control arm data. These datasets were selected because they were derived from three studies with the highest number of participants. They consisted of breast cancer patients who had undergone chemotherapy. The dosages and disease conditions slightly differed among the studies. Nevertheless, these discrepancies were not considered because we assumed that they had negligible influences on the analysis.
The datasets included AEs that occurred during baseline or treatment periods. Their severities were defined by the Common Terminology Criteria for Adverse Events (CTCAE, 1: mild, 2: moderate, 3: severe, 4: life-threatening, and 5: death) and ranged from 1 to 4. The baseline data represent AEs occurring during the pre-treatment period (from registration to randomization). This type of data is generally recorded. In this paper, because alopecia occurred in more than 70% of the patients, its pattern was clear and it was excluded from the analysis. Patients without AEs were also excluded, because their safety profiles were estimated to be zero and their data did not contribute to the estimation of AE patterns. There were 3317 patients and 99,546 AEs analyzed.
The outline of each arm included in the datasets is shown in Table 1. The index p is defined for the treatment periods to divide AEs by treatment. For example, a clinical study with s = 1 includes one arm named "AC ! T," in which 1625 patients were enrolled and treated with four cycles of AC (doxorubicin + cyclophosphamide) followed by four cycles of T (docetaxel). p = 1 represents the baseline period, and the AEs occurring during this period are stored in the matrix Y 1 as described below. p = 2 represents the treatment period using AC, and the AEs that occurred during this period are stored in Y 2 . Similarly, p = 3 represents the treatment period using T. The number of patients decreases to 1583 in p = 3 because 42 patients dropped out of the study. We also define the treatment index and the mechanism index to facilitate the description of equations. The index of the periods was p = 1, …, 10, the index of the studies was s {1,2,3}, the index of the treatments was t {1, …, 5} and the index of the mechanisms of action was m {1, 2}. The study and the treatment were determined by period p and the mechanism of action was determined by treatment t. Therefore, they were suffixed such as s p , t p , and m t p when they were accessed using p. For example, s 3 = 1, t 3 = 5 and m 5 = 2.
In this paper, AEs with the same name but different severities were distinguished. The combination of name and severity was treated as "AE type". For example, VOMIT with severity 2 is "VOMIT_2." Consequently, the total number of AE types was J = 848.
These datasets included the number of occurrences of each AE type per cycle. The number of occurrences was summarized per period p and the result was expressed as N p × J matrix Y p (p = 1, …, 10), where N p is the number of patients who participated in period p. Y p was a sparse matrix in which many elements were 0. Y p ð Þ ij , which is the (i, j) element of Y p , represents the number of occurrences of AE type j for patient i during period p.

| Statistical model
Here, the statistical model for generating Y p (p = 1, …, 10) is explained. Scalars are in italics, vectors are in bold font, and matrices are in bold italics.
Hereafter, one period p was selected and fixed. Y p was written as Y, N p as N, s p as s, t p as t, and m t p as m. The patient index was written as i and the AE type index was written as j.
Because Y was count data, it was assumed that each element of Y independently followed the following Poisson distribution: where C i is a constant representing the length of the treatment period of patient i (i.e., the number of cycles in the aforementioned data). Note: s represents the index of the studies and p represents the index of the periods. A is doxorubicin, C is cyclophosphamide, F is 5-fluorouracil, M is methotrexate, and T is docetaxel. AC, FAC, and CMF represents combination therapies. t represents the index of the treatments and m represents the index of the mechanisms. Docetaxel is a taxane-based anticancer drug that binds to microtubules and inhibits mitosis. The other drugs inhibit DNA or RNA synthesis and are often used in combination. Their AEs are relatively similar. Therefore, two mechanisms of action were assumed here, namely, inhibition of nucleic acid synthesis and inhibition of mitosis. Cycles represent the lengths of the treatment periods.
λ ij was separated into β t ð Þ ij derived from treatment t and α s ð Þ ij derived from study s. In other words, where α s ð Þ ij is a parameter per study representing the occurrence intensity of each AE caused by disease, radiotherapy, etc. For the baseline data, only α s ð Þ ij was considered and it was assumed that β The occurrence patterns of AEs are usually limited. 11 Therefore, it was assumed that they consisted of a combination of relatively few patterns with little information loss and the matrix decomposition was applied as follows: where K and L are J and represent the numbers of AE type occurrence patterns. ϕ and η are common to all patients and represent the occurrence intensities of all AE types determined for each pattern. θ and ξ represent the extent to which each patient has each pattern component. Expressing a matrix by the product of low-rank matrices generally facilitates interpretation and improves prediction accuracy by eliminating noise. The decomposition of β into θ and ϕ is not unique. The constraints for θ and ϕ make them identifiable except the order of the components and make the estimation possible. For example, the constraint that θ and ϕ are orthogonal matrices makes matrix decomposition equivalent to principal component analysis. However, the approximation of Y is poor because orthogonality is a strong constraint. It is difficult to interpret the result because certain elements of θ and ϕ are negative. Moreover, some λ ij , which represents the average number of occurrences, may also be negative. Therefore, in this paper, the constraint that all elements of θ and ϕ are nonnegative was added (NMF). θ (t) was reparametrized as follows: whereθ t ð Þ is an N × K matrix, and the exp function operates on each element and returns a matrix. For ϕ (t) , background knowledge of treatments was used, and it was assumed that ϕ (t) is constructed as follows: where Φ t ð Þ k represents the row vector of the k-th row of ϕ (t) . u = softmax(v) converts vector v into vector u whose elements are nonnegative and have a sum of 1, using the following formula: Equation (6) facilitates comparisons between treatments because the sum of J elements of each row (pattern) of ϕ (t) becomes 1. M (m) is a K × J matrix representing the mean effect of the mechanism of action m. R (t) is a K × J matrix representing the random effect of treatment t. M (1) and M (2) were considered for all data. For "AC," "FAC," "A," and "CMF," which inhibit nucleic acid synthesis, the random effects of R (1) , R (2) , R (3) , and R (4) were considered, respectively (see Table 1).
Similar constraints were assumed for α (s) as in β (t) . ξ (s) was reparametrized as follows to make each element nonnegative: η (s) was defined using L × J matrix S (s) as follows: If AEs caused by disease were of interest, S s ð Þ l could be divided into mean-and random effects as for the treatments. In an occurrence pattern, AE types with the same name and similar severities (e.g., NAUSEA_2 and NAUSEA_3) should have similar occurrence intensities. Therefore, M (m) , R (t) and S (s) were constrained using a NDLM (normal dynamic linear model), 12 which is often applied in dose-response curves: where J is a set of the combination of AE types j and j 0 satisfying the conditions that j and j 0 have the same AE name, their severities are adjacent, and j < j 0 .
The addition of a constant to each element of the argument vector does not change the output of the softmax function. Therefore, M (m) , R (t) and S (s) were constrained as follows so that they could be identified and estimated: where J is a set of AE type j satisfying the condition that j has the lowest severity in each AE name. Considering Equations (1)- (14) for all data Y p (p = 1, …, 10), the log of the posterior probability was constructed as follows: where the softmax function operates on each row of a matrix and returns a matrix, Poisson(y j λ) represents the probability mass at y of the Poisson distribution with parameter λ, and N y j μ, σ 2 ð Þrepresents the probability density at y of the normal distribution with parameters μ and σ 2 . In the Appendix A, we explain that the models and Equation (15) can be simplified under certain conditions.

| Estimation
, ξ (s) , and S (s) were estimated by maximizing the log-probability logP in Equation (15). σ G , σ M , σ R , and σ S could also be estimated if there was sufficient data. In this paper, they were regarded as hyperparameters and assigned fixed values. Although the number of parameters is large, the parameters are identifiable because matrix decomposition is an informative constraint, 7 and Equations (6), (8)- (14) do not cause the loss of identifiability.
The software used for this estimation were Stan 2.17. 13 Maximum a posteriori (MAP) estimation was performed using the limited-memory Broyden-Fletcher-Goldfarb-Shanno (L-BFGS) algorithm. The Stan and R 14 programs are available on GitHub, 15 and statisticians can use our method by modifying parts of the programs.

| Hyperparameter determination
The number of patterns K and L were hyperparameters that had to be fixed before estimation. Although there are several determination methods, the technique applied here ensured hyperparameter interpretation according to background information. If the number of patterns was too small, several were combined and extracted as one pattern. As a rule, however, this manipulation should be avoided. Conversely, if the number of patterns was large, similar patterns were extracted.
Other determinations were based either on predictive performance or information criteria, such as the Akaike Information Criterion (AIC) and the Widely Applicable Information Criterion (WAIC). 16 On the other hand, one study indicated that results derived from the number of patterns determined by these methods do not align with the actual interpretation. 17 In this paper, the hyperparameters were determined as follows. The small value L = 2 was selected because there are far fewer AEs associated with the disease itself than those originating from anticancer treatments used in the clinical studies. The number of detectable patterns associated with the disease was assumed to be small. In fact, in the data in Section 2.1, less than 10% of the total AEs could be attributed to the disease. K = 15 was set because similar patterns occur when K is assigned higher values. For actual clinical studies, L and K must be determined after trying several values and discussing the results.
σ G = 1.5 because the number of severities was as low as 1 to 4. σ M = 5 and σ S = 5 were sufficiently large to express any pattern and σ R = 0.5 because it was assumed that the magnitude of the random effect was about 10% that of the mean effect.

| AE patterns
The estimate of ϕ (t) is a set of AE type patterns specific to treatment t. Φ t ð Þ k is pattern k and a vector of occurrence intensities of all AE types. AE types most likely to occur in pattern k are readily confirmed by rearranging its intensities in descending order and plotting them in a table or graph. Patterns where severe AEs are most likely to occur can also be confirmed.
The estimate of θ t ð Þ i represents responsiveness of patient i to treatment t, and it provides condensed information of Y i through the patterns. θ  When prompt predictions are needed during a new clinical study, diverting the already estimated ϕ (t) (e.g., our estimated ϕ (t) ) enables rapid θ t ð Þ i 0 estimation. Assuming that ϕ (t) are sufficiently close to the true values and the influence of each study (i.e., α s ð Þ i 0 j ) is sufficiently small, we maximize the following log-probability to estimate θ t ð Þ i 0 : where C is a constant representing the length of the treatment period of patient i 0 . The posterior distribution may also be estimated by setting noninformative prior distributions or appropriate prior distributions forθ The estimate of θ t ð Þ i 0 and Equations (2)-(4) indicate that λ i 0 j can be calculated. λ i 0 j is the mean number of occurrences of AE type j in a future treatment period. AE types with high λ i 0 j must be carefully monitored.

| RESULTS
We applied the method described in Section 2.2 to the data in Section 2.1 for AE pattern extraction and patient-level safety profile estimation. The results corresponding to Sections 2.3 and 2.4 are explained below.

| AE patterns
Here, the ϕ (t) patterns of treatment AC (t = 1) are shown. For the patterns of all treatments, see Table S1.
The 15 patterns of ϕ (1) were numbered in descending order of the log-probability increase. The results of patterns 1 to 15 are shown in Table 2. Because the log-probability increase involving patterns 1 to 8 was greater than 80% of that for all patterns, we focused on the results from patterns 1 to 8.
For instance, pattern 2 is a "nausea and vomiting pattern" in which nausea and vomiting of severities 2 and 3, respectively, tended to occur simultaneously. However, nausea and vomiting of severity 1 were segregated into patterns 1 and 5 because the numbers of their occurrences were very large. Pattern 4 was interpreted as a "constipation pattern" and included constipation, stomatitis, and weight increase. Pattern 6 was the "neuropathy pattern" involving taste perversion, diarrhea, insomnia, and lacrimation disorder. Pattern 7 was another "neuropathy pattern" with nausea of severity 2, nail disorder, vasodilation, anorexia, and weight decrease.
Next, we confirmed the responsiveness of the 2106 patients being administered AC. The distributions of θ

| Patient-level safety profile
We demonstrated patient-level safety profiles using the method in Section 2.4. Suppose that four new patients (i  data of the four patients were assumed to be the same as i = 548,338,871,530, who received AC, respectively (see Data S1). The estimated θ  Note: The unit of occurrence intensity was converted to % because the sum of occurrence intensities per pattern was set to 1 by the softmax function. The top eight AE types whose occurrence intensities are more than 1% are presented for each pattern. We also show the top two AE types with severity 3 or higher and whose occurrence intensities are more than 0.1% (bold letters).
an increase in θ  (2) and (3) and listed the AE types most likely to occur in the future (Table 3). Those AE types should be carefully monitored. Note: The distribution was partially discrete because the number of occurrences of AEs and the treatment periods (that is, the number of cycles) were integers here Our statistical model with patient-level AE data extracted AE patterns specific for each treatment. We clarified that our method allowed interpretable patient-level safety profiling in a clinical study such as the illustration of the data analysis in this paper. Our statistical model can use the number of occurrences of each AE in a patient to predict future AEs if the total number of occurrences is sufficiently large so that the estimated θ is close enough to the true value. The proposed method is also widely applicable to data obtained from clinical studies with patient-level AEs, regardless of disease or treatment. We believe that this paper is of great significance to the current situation where patient-level data is being released.
In Section 2.1, we excluded alopecia due to its common occurrence in patients. Such a frequent AE is called "stopwords" in the field of natural language processing (e.g., prepositions). It is common to remove such terms before matrix decomposition. 18 If alopecia were included, many patterns would have contained alopecia, which would prevent pattern extraction and pattern interpretation. Therefore, we decided to exclude alopecia from the analysis in this paper.
A limitation that was noted in our method is that K must be determined before analysis (Section 2.2.3). For example, if we used K = 10, instead of pattern 6 and pattern 7, a mixture of those patterns would have been extracted. We used K = 15 after discussion because we thought they can be separated. Although nonparametric Bayesian methods can estimate K, further research is needed to examine whether it is effective for our model.
In our model, θ :k are strongly correlated, a patient may be advised to opt for alternate treatments wherein severe AEs seldom occur based on the correlation. In this paper, however, there was no strong correlation between treatments. For this reason, it was difficult to predict AEs caused by a certain treatment based on the AEs induced by a different one.
For the sake of simplicity, we assumed that (1) the dosages of each drug were the same across clinical studies, (2) treatments were not changed within a cycle, and (3) the order in which the AEs occurred was insignificant.
T A B L E 3 Top 10 AE types and top 10 severe AE types with the largest λ