Optimization of a modeling platform to predict oncogenes from genome‐scale metabolic networks of non‐small‐cell lung cancers

Dysregulation in cancer cells leads to abnormal regulation of cellular metabolic pathways. By simulating this metabolic reprogramming using constraint‐based modeling approaches, oncogenes can be predicted, and this knowledge can be used in prognosis and treatment. We introduced a trilevel optimization problem describing metabolic reprogramming for inferring oncogenes. This study first used RNA‐Seq expression data of lung adenocarcinoma (LUAD) and lung squamous cell carcinoma (LUSC) samples and their healthy counterparts to reconstruct tissue‐specific genome‐scale metabolic models. These models were then used to build the flux distribution patterns that provided a measure for the oncogene inference optimization problem for determining tumorigenesis. The platform detected 45 genes for LUAD and 84 genes for LUSC that lead to tumorigenesis. A high level of differential gene expression was not an essential factor for determining tumorigenesis. The platform indicated that pyruvate kinase (PKM), a well‐known oncogene with a low level of differential gene expression in LUAD and LUSC, had the highest computed fitness among the predicted oncogenes. By contrast, pyruvate kinase L/R (PKLR), an isozyme of PKM, had a high level of differential gene expression in both cancers. Phosphatidylserine synthase 1 (PTDSS1), an oncogene in LUAD, was inferred to have a low level of differential gene expression, and its overexpression could significantly reduce survival probability. According to the factor analysis, PTDSS1 characteristics were close to those of the template, but they were not evident in LUSC. Angiotensin‐converting enzyme 2 (ACE2) has recently garnered widespread interest as the SARS‐CoV‐2 virus receptor. We determined that ACE2 is an oncogene of LUSC but not of LUAD. The platform developed in this study can identify oncogenes with low levels of differential expression and can be used to identify potential therapeutic targets for cancer treatment.

The optimization problem is rewritten in the following simplified form to make the NHDE algorithm easier to explain.
Outer optimization problem: maximize the outer-level objective over the candidate individual z, subject to the inner optimization problems. The inner optimization problems consist of LP and QP problems, which are solved sequentially.
The NHDE algorithm is a stochastic optimization method based on hybrid differential evolution (HDE), which was extended from the original DE algorithm (Storn and Price, 1996; Storn and Price, 1997). The basic operations of the original DE and the modified NHDE are shown in Tab. S2-1. The detailed procedures have been discussed by Wang (2017). The computational procedures of NHDE are listed in Tab. S2-2. NHDE is a parallel direct search algorithm (as shown in Fig. S2-1) that uses a population of $N_p$ individuals (sets of enzymes) to find an optimal solution. The initialization process randomly generates $N_p$ individuals to cover the entire search space uniformly. Each individual in the population consists of a set of enzymes that are selected to be modulated.
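The initialization step described above can be illustrated with a minimal sketch. The integer encoding (each gene is an integer drawn from a common range), the function name `init_population`, and its parameters are assumptions made for illustration; they are not taken from the authors' implementation.

```python
import random


def init_population(n_individuals, n_genes, lower, upper, seed=None):
    """Draw each integer gene uniformly from [lower, upper] (inclusive).

    Hypothetical encoding: one integer per gene; the real NHDE encoding of
    the modulated enzymes may differ.
    """
    rng = random.Random(seed)
    return [
        [rng.randint(lower, upper) for _ in range(n_genes)]
        for _ in range(n_individuals)
    ]


# Example: 6 individuals, each with 4 integer genes in the range 0..9.
population = init_population(n_individuals=6, n_genes=4, lower=0, upper=9, seed=0)
```

Uniform sampling per gene is one simple way to spread the initial individuals over the search space; other sampling schemes would serve the same purpose.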
The mutation operator that NHDE adopts from DE is the essential component distinguishing it from other evolutionary algorithms. Unlike conventional evolutionary algorithms, the mutation operation of DE/NHDE uses the difference between two or four randomly chosen individuals as an evolutionary direction. The $i$th mutant individual $(\hat{z}^{G})_i$ in generation $G$ is obtained through the difference of two or four random individuals, as expressed in the following form:

$$(\hat{z}^{G})_i = \mathrm{round}\left[(z_p^{G})_i + \rho_m^{G}\left((z^{G})_{r_1} - (z^{G})_{r_2}\right)\right]$$

or, when four individuals are used,

$$(\hat{z}^{G})_i = \mathrm{round}\left[(z_p^{G})_i + \rho_m^{G}\left((z^{G})_{r_1} - (z^{G})_{r_2} + (z^{G})_{r_3} - (z^{G})_{r_4}\right)\right]$$

where $(z^{G})_{r_1}, \ldots, (z^{G})_{r_4}$ are the randomly chosen individuals, $(z_p^{G})_i$ is the parent individual, and the round operator rounds the real vector to an integer vector. In DE, the differential mutation factor $\rho_m^{G} \in [0, 1.2]$ is fixed and set by the user to obtain faster convergence. This factor controls the step length along the search direction. A random mutation factor is used in NHDE to obtain more diversified individuals. NHDE also includes an additional mutation strategy that applies a linear crossover between the $i$th individual and the best individual $(z^{G})_b$ to generate the parent individual. The parent individual is therefore expressed as follows:

$$(z_p^{G})_i = (z^{G})_i + p^{G}\left[(z^{G})_b - (z^{G})_i\right]$$

where the factor $p^{G}$ is a random number between zero and one generated by a uniform distribution generator.

The choice of the mutation factor in DE/NHDE is heuristic: it is fixed by the user in DE and drawn at random in NHDE. When population diversity is low, candidate individuals rapidly cluster together such that the individuals cannot be further improved, and premature convergence occurs. As in conventional evolutionary algorithms, the local population diversity can be increased by using a crossover operation such as a binomial crossover. NHDE uses the difference between two or four mutually independent individuals to determine the search direction and obtain a mutant individual. This differential mutation converges quickly, so that most individuals cluster around the best candidate individual after some generations.
Consequently, the population diversity and exploration capability diminish and clustered individuals are unable to reproduce more diversified individuals through the mutation operation because the weighted difference is nearly zero. The recombination of mutant individuals and their clustered parents further prevents the reproduction of a diversified population. Therefore, all individuals quickly cluster together and superior individuals cannot be generated through mutation and crossover operations.
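The mutation step and the clustering effect described above can be illustrated with a short sketch. It assumes that the mutant is the rounded sum of the parent individual and the scaled difference of two or four random individuals, and that the random mutation factor is drawn uniformly from [0, 1.2]; both the sampling range and the names `parent_individual`, `mutate`, `rho_m`, and `p` are assumptions for illustration, not the authors' implementation.

```python
import random


def parent_individual(z_i, z_best, p):
    """Linear crossover between the i-th individual and the best individual."""
    return [zi + p * (zb - zi) for zi, zb in zip(z_i, z_best)]


def mutate(population, i, best, rng, four=False):
    """Return an integer mutant for individual i (illustrative sketch only)."""
    p = rng.random()               # parent factor, uniform in [0, 1)
    rho_m = rng.uniform(0.0, 1.2)  # random mutation factor; range is an assumption
    parent = parent_individual(population[i], population[best], p)
    k = 4 if four else 2
    idx = rng.sample(range(len(population)), k)  # distinct random individuals
    mutant = []
    for j in range(len(parent)):
        diff = population[idx[0]][j] - population[idx[1]][j]
        if four:
            diff += population[idx[2]][j] - population[idx[3]][j]
        # Python's round() rounds halves to the nearest even integer; the
        # rounding rule used by the authors may differ.
        mutant.append(round(parent[j] + rho_m * diff))
    return mutant


# A fully clustered population: every difference is zero, so mutation
# can only reproduce the same individual.
clustered = [[3, 1, 4] for _ in range(5)]
same = mutate(clustered, i=0, best=1, rng=random.Random(1))
```

With the clustered population, `same` equals `[3, 1, 4]` for any random seed, which is the loss of exploration capability that motivates the migration operation.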
The migration operation of the NHDE algorithm helps individuals escape from a local cluster, but this operation is performed only if the population diversity falls below a desired level. The degree of population diversity is introduced to check whether the migration operation should be performed. Each element of the $i$th individual $(z^{G})_i$ in generation $G$ is referred to as a gene of the individual, and the gene diversity index $dz_{ji}$ is given by

$$dz_{ji} = \begin{cases} 0, & \text{if } z_{ji}^{G} = z_{jb}^{G}, \\ 1, & \text{otherwise}, \end{cases} \qquad j = 1, \ldots, n;\; i = 1, \ldots, N_p$$

where $z_{ji}^{G}$ and $z_{jb}^{G}$ are the $j$th gene of the $i$th and best individual at the $G$th generation, respectively, and $n$ is the number of genes in an individual. $dz_{ji}$ is set to zero if the $j$th gene of the $i$th individual is identical to the best gene; otherwise it is set to one (Chiou and Wang, 1999; Liao, et al., 2001). The population diversity degree $\rho$ is defined as the ratio of total gene diversities to the total number of genes other than those of the best individual:

$$\rho = \frac{\sum_{i=1,\, i \neq b}^{N_p} \sum_{j=1}^{n} dz_{ji}}{n\,(N_p - 1)}$$

The value of the population diversity degree ranges between zero and one. A value of zero implies that all of the genes are clustered around the best individual. A value of one indicates that the current candidate individuals form a completely diversified population. The desired tolerance for population diversity is assigned by the user. A tolerance value of zero implies that the migration operation in NHDE is switched off, and one implies that the migration operation is performed at every generation. Consequently, the user can set a tolerance value for the population diversity degree, $\varepsilon \in (0, 1)$. If $\rho$ is smaller than $\varepsilon$, then NHDE performs migration operations to regenerate a new population in order to escape from a local solution; otherwise, NHDE suspends the migration operation and maintains a constant search direction toward finding a new solution.

Figure S2-1. Flowchart of the parallel search algorithm in NHDE
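The diversity check that triggers migration can be sketched as follows. The sketch assumes integer genes compared by exact equality with the best individual, as stated in the text; the function names `population_diversity` and `should_migrate` are hypothetical and are not taken from the authors' code.

```python
def population_diversity(population, best):
    """Fraction of genes, over all non-best individuals, that differ from
    the corresponding gene of the best individual."""
    z_best = population[best]
    n_genes = len(z_best)
    total = sum(
        1
        for i, individual in enumerate(population)
        if i != best
        for gene, best_gene in zip(individual, z_best)
        if gene != best_gene  # gene diversity index is 1 when the genes differ
    )
    return total / (n_genes * (len(population) - 1))


def should_migrate(population, best, tolerance):
    """Migrate only when the diversity degree falls below the user tolerance."""
    return population_diversity(population, best) < tolerance


# Example: 4 individuals with 3 genes each; individual 0 is the best.
pop = [[1, 2, 3], [1, 2, 3], [1, 0, 3], [4, 5, 6]]
degree = population_diversity(pop, best=0)  # 4 differing genes out of 9
```

In this example four of the nine genes outside the best individual differ from it, so a tolerance of 0.5 would trigger migration, while a tolerance of zero never does because the diversity degree cannot be negative.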