Optimizing the widely used nuclear protein‐coding gene primers in beetle phylogenies and their application in the genus Sasajiscymnus Vandenberg (Coleoptera: Coccinellidae)

Abstract Advances in genomic biology and the increasing availability of genomic resources allow the development of hundreds of nuclear protein‐coding (NPC) markers, which can be used in phylogenetic research. However, at low taxonomic levels, it may be more practical to select a handful of suitable molecular loci for phylogenetic inference. Unfortunately, the degenerate primers used for NPC markers can be a major impediment, as their amplification success rate is low and they tend to amplify nontargeted regions. In this study, we optimized five NPC fragments widely used in beetle phylogenetics (i.e., two parts of carbamoyl‐phosphate synthetase: CADXM and CADMC, Topoisomerase, Wingless and Pepck) by reducing the number of degenerate sites in the primers and slightly shortening the target regions. These five NPC fragments and six other molecular loci were amplified to test the monophyly of the coccinellid genus Sasajiscymnus Vandenberg. The analysis of our molecular data set supported the monophyly of Sasajiscymnus, but confirmation with an extended sampling is required. A fossil‐calibrated chronogram was generated with BEAST, indicating an origin of the genus at the end of the Cretaceous (77.87 Myr). Furthermore, a phylogenetic informativeness profile was generated to compare the phylogenetic properties of each gene more explicitly. The results showed that COI provides the strongest phylogenetic signal among all the genes, but Pepck, Topoisomerase, CADXM and CADMC are also relatively informative. Our results provide insight into the evolution of the genus Sasajiscymnus and also enrich the molecular data resources for further study.


| INTRODUCTION
Selecting appropriate molecular markers is crucial for inferring phylogenetic relationships and evolutionary history at different taxonomic levels. From the perspective of sequence data acquisition, mitochondrial genes and rDNA genes can be easily handled using universal primers and standard PCR (Caravas & Friedrich, 2013; Lin & Danforth, 2004). However, base compositional heterogeneity and among-site rate variation in mitochondrial genes, and alignment problems and slow rates of evolution in rDNA genes, can have a negative impact on phylogenetic tree inference (Bernt et al., 2013; Cameron, 2014). Nuclear protein-coding (NPC) markers possess features that make them suitable for phylogenetic reconstruction.
They are less prone to biased base composition than mitochondrial genes. Because their evolutionary rates vary distinctly among genes (Lin & Danforth, 2004; Winkler et al., 2015), they have broad phylogenetic utility (Evangelista et al., 2019; Meusemann et al., 2010; Peters et al., 2017; Wipfler et al., 2019). A potential problem with NPC markers is that they are often more difficult to amplify than traditional markers via standard PCR. Usually, nested and hemi-nested PCR amplification strategies with degenerate primers are employed to obtain NPC genes (Shen, Liang, & Zhang, 2012; Shen, Liang, Feng, Chen, & Zhang, 2013; Wild & Maddison, 2008). Nonetheless, the amplification success rate of degenerate primers is low, and they tend to amplify nontargeted regions (Lin & Danforth, 2004; Pistone, Mugu, & Jordal, 2016). Although primer development and amplification across taxa can be relatively difficult for NPC markers compared with mitochondrial and rDNA markers, some researchers have argued that improved primer design may help to obtain targeted regions and sequencing regularity at appreciable levels (Pistone et al., 2016; Winkler et al., 2015).
The genus Sasajiscymnus Vandenberg belongs to the tribe Scymnini in the family Coccinellidae. The species of this genus are small and can be recognized by 9 antennomeres, a carinate prosternal process, a 1st abdominal ventrite with incomplete postcoxal lines, and tarsi with 3 tarsomeres (Chapin, 1962; Sasaji, 1971; Vandenberg, 2004). They are specialized predators of hemipteran pests such as aphids, mealybugs, and scale insects (Conway, Culin, Burgess, & Allard, 2010; Pang & Gordon, 1986). To date, sixty-three species of this genus have been described from Asia, Africa, and Oceania (Sasaji & McClure, 1997). Only a few species of Sasajiscymnus have been included in phylogenetic studies of Coccinellidae so far. Based on phylogenetic analyses of combined molecular and morphological data, Seago, Giorgi, Li, and Ślipiński (2011) proposed an affinity between Sasajiscymnus and Nephus, but with limited support. Subsequently, Robertson et al. (2015) presented a phylogeny of Cucujoidea based on molecular data and including 87 coccinellid terminals. They recovered a clade comprising Sasajiscymnus and Scymnus + Nephus, also with limited support. Che et al. (2017), in a phylogenomic study based on 95 NPC genes, recovered a close relationship between Sasajiscymnus and Axinoscymnus + Nephus. Because only a single species of Sasajiscymnus was included in the above-mentioned analyses, the monophyly of the genus has not been tested.
In this study, we optimized five NPC fragments widely used in beetle phylogenetics by reducing the number of degenerate sites in the primers and slightly shortening the target regions (Escalona et al., 2017; Wild & Maddison, 2008). These five fragments, combined with three mitochondrial genes (COI, 12S, 16S) and three nuclear genes (18S, 28S, H3), were amplified by polymerase chain reaction (PCR) to address the phylogenetic relationships of Sasajiscymnus, using maximum likelihood and Bayesian inference. We also generated a fossil-calibrated time tree to assess the time of origin and the pattern of diversification. A phylogenetic informativeness profile was also generated to evaluate the phylogenetic properties of the individual genes.

| Sampling
The study material was obtained from the Insect Collection, Department of Entomology, South China Agricultural University (SCAU). Specimens were collected in China with sweep nets in 2010. The samples were either stored in 95% ethanol or dried and pinned. Eight species of Sasajiscymnus were newly sequenced.
Two additional species included in previous studies (Escalona et al., 2017;Robertson et al., 2015) were added as in-group taxa.
Six species of Nephus (sequences were newly generated for two species and obtained from GenBank for the four others) were used as outgroup taxa, as Nephus is a possible sister group (Seago et al., 2011) or at least a close relative (Che et al., 2017; Robertson et al., 2015) of Sasajiscymnus. Two species of Endomychidae were included as distant outgroup taxa to root the trees. A total of 19 species were included in the analyses (Table S1).

| DNA extraction, PCR, and sequencing
Total genomic DNA was extracted using a DNeasy Blood and Tissue kit (Qiagen, China) and following the manufacturer's protocol. The dried specimens were pretreated with 0.9% NaCl buffer before DNA extraction according to the previous study (Huang, Xie, Liang, Wang, & Chen, 2019). Thirteen gene fragments were  Table S2.
For the PCR, we followed protocols by Wild and Maddison (2008), Robertson et al. (2013), and Escalona et al. (2017). Product yield, specificity, and potential contamination were monitored by agarose gel electrophoresis. DNA fragments were sequenced in both directions with sufficient overlap to ensure the accuracy of sequence data.

| Primer design and sequence alignment
We used the program Primer3 (Rozen & Skaletsky, 2000) in Geneious 7.1.4 (Kearse et al., 2012) to optimize the primers for CADXM, CADMC, Topoisomerase (TOPO), Wingless (WGL), and Pepck (PE) (Figure 1). Sequences of each fragment from different coccinellid taxa were downloaded from GenBank as templates (Table S3). For each primer pair, the following parameters were set: oligomer length from 18 bp to 28 bp, GC content from 30% to 70%, and Tm from 50 to 65 °C.
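The length and GC criteria above, together with the notion of primer degeneracy, can be illustrated with a minimal Python sketch. This is not the Primer3 procedure used in the study: the function names and the example primer are hypothetical, and Tm is left out because Primer3 derives it from a thermodynamic model that is not reproduced here.

```python
# IUPAC nucleotide codes mapped to the bases each code can stand for.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def degenerate_sites(primer):
    """Count the positions that carry an ambiguity code."""
    return sum(1 for base in primer.upper() if len(IUPAC[base]) > 1)


def fold_degeneracy(primer):
    """Number of distinct oligonucleotides the primer represents."""
    fold = 1
    for base in primer.upper():
        fold *= len(IUPAC[base])
    return fold


def gc_range(primer):
    """Minimum and maximum GC percentage over all primer variants."""
    p = primer.upper()
    lowest = sum(1 for b in p if set(IUPAC[b]) <= set("GC"))
    highest = sum(1 for b in p if set(IUPAC[b]) & set("GC"))
    return 100.0 * lowest / len(p), 100.0 * highest / len(p)


def passes_filters(primer, min_len=18, max_len=28, min_gc=30.0, max_gc=70.0):
    """Apply the length and GC limits quoted in the text (Tm not checked)."""
    lowest, highest = gc_range(primer)
    return (min_len <= len(primer) <= max_len
            and lowest >= min_gc and highest <= max_gc)


# Hypothetical primer, made up for illustration; not from Table S2.
example = "ATGGCNTGYAARGTCTTC"
print(degenerate_sites(example), fold_degeneracy(example), passes_filters(example))
```

Replacing an ambiguity code with a single base wherever the template alignment allows it lowers both counts, which is the sense in which a primer becomes less degenerate.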
All sequences were initially assembled into contigs and examined using Geneious (Kearse et al., 2012). Sequences of each protein-coding gene were aligned individually as codon-based multiple alignments using the MAFFT algorithm implemented in Geneious.

| Phylogenetic analyses
We used maximum likelihood (ML) and Bayesian inference (BI) approaches to explore the phylogenetic relationships. Before the analyses, PartitionFinder 2.1.1 (Lanfear, Frandsen, Wright, Senfeld, & Calcott, 2016) was used to determine the best-fit partitioning scheme and the corresponding models of nucleotide substitution for the concatenated data, using the greedy algorithm and the corrected Akaike Information Criterion (AICc). The ML analysis was conducted in RAxML 8.2.8 (Stamatakis, 2014) with 1,000 rapid bootstrap replicates on the CIPRES Science Gateway (Miller, Pfeiffer, & Schwartz, 2010). The BI analysis was implemented in MrBayes 3.2.6 (Ronquist & Huelsenbeck, 2003) with four chains (three heated chains and one cold chain) running for 30 million generations and sampling every 10,000 generations. We discarded the first 25% of trees as "burn-in" and used the remaining trees to generate a majority-rule consensus tree. The analysis was carried out on the CIPRES Science Gateway.
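As a quick check on what these MCMC settings imply, the short Python sketch below counts the sampled trees that remain after burn-in. It is only arithmetic under stated assumptions: the function name is hypothetical, the count is per run, and the starting state at generation 0 is not counted as a sample.

```python
def retained_samples(generations, sample_every, burnin_fraction):
    """Return (total, discarded, retained) sample counts for one MCMC run.

    Assumes samples are written at every multiple of `sample_every` and that
    the initial state (generation 0) is not counted.
    """
    total = generations // sample_every
    discarded = int(total * burnin_fraction)
    return total, discarded, total - discarded


# Settings quoted in the text for the Bayesian analysis.
total, discarded, retained = retained_samples(30_000_000, 10_000, 0.25)
print(total, discarded, retained)  # 3000 750 2250
```

Under these assumptions, the consensus tree is summarized from 2,250 sampled trees per run.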

| Divergence time estimation
Divergence times within Sasajiscymnus were estimated with BEAST 1.8.4 (Drummond, Suchard, Xie, & Rambaut, 2012) based on the concatenated sequence data through the CIPRES Science Gateway.
All partitions were treated as unlinked for substitution models but as linked for clock models and trees. An amber fossil of the genus Nephus (Nephus subcircularis) from the Lowermost Eocene was used as a calibration node (Kirejtshuk & Nel, 2012). We conservatively assigned a minimum age of 40 Myr to the crown node of Nephus.
The prior on the divergence date was specified as a lognormal distribution with a mean of 30 and a log(stdev) of 0.75, which is consistent with the dating analyses of Coleoptera by McKenna et al. (2015) and Toussaint et al. (2017). We performed the analyses using the Yule branching process prior and an uncorrelated lognormal relaxed clock model. The Markov chain Monte Carlo (MCMC) analysis was conducted using two independent runs of 50 million generations, sampled every 10,000 generations. Tracer 1.6 was used to examine the trace plots and effective sample size (ESS) values to ensure that each parameter had been appropriately sampled (ESS > 200). We used TreeAnnotator v1.8.4 to obtain the maximum clade credibility time tree after removing the first 25% of topologies as "burn-in", and viewed it in FigTree v1.4.0 (Rambaut, 2009).
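To get a feel for the calibration prior, the following Python sketch computes quantiles of an offset lognormal distribution. It rests on two assumptions that the text does not state: that the 40 Myr minimum age acts as the offset of the distribution, and that the mean of 30 is given in real space rather than log space. The function name is hypothetical and the sketch is not taken from the BEAST configuration used in the study.

```python
import math
from statistics import NormalDist


def lognormal_quantile(p, mean_real, log_stdev, offset=0.0):
    """Quantile of an offset lognormal prior.

    mean_real : mean of the lognormal in real space (assumption)
    log_stdev : standard deviation of the underlying normal distribution
    offset    : hard minimum age added to every draw (assumption)
    """
    # Convert the real-space mean to the mean of the underlying normal.
    mu = math.log(mean_real) - log_stdev ** 2 / 2.0
    z = NormalDist().inv_cdf(p)
    return offset + math.exp(mu + log_stdev * z)


# Median and central 95% interval implied by the quoted settings,
# under the assumptions listed above.
for p in (0.025, 0.5, 0.975):
    print(p, round(lognormal_quantile(p, 30.0, 0.75, offset=40.0), 2))
```

If the mean were instead specified in log space, the conversion line would be dropped and `mu` set directly, which gives a very different prior; checking this setting against the original XML file would settle which reading applies.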

| Phylogenetic informativeness estimation
We used phylogenetic informativeness (PI) profiles, as implemented in PhyDesign (http://phydesign.townsend.yale.edu/) (López-Giráldez & Townsend, 2011), to quantify the relative contribution of each partition to the resulting tree. An ultrametric tree file and an alignment are required for estimating phylogenetic informativeness. The input ultrametric tree was obtained from the above-mentioned divergence time analysis. We used the concatenated DNA alignment as the input alignment, with each gene as a partition. The peak of the PI distribution is taken to predict the time at which the corresponding partition is most informative.
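The shape of a PI profile, and why it has a single peak, can be sketched in a few lines of Python. The sketch assumes the per-site informativeness function from Townsend's method that PhyDesign is understood to implement, 16 λ² T exp(-4 λ T) for a site evolving at rate λ at time T. The site rates and time grid below are hypothetical, and the function names are not part of PhyDesign.

```python
import math


def site_informativeness(rate, t):
    """Per-site informativeness at time t for a site with the given rate."""
    return 16.0 * rate ** 2 * t * math.exp(-4.0 * rate * t)


def pi_profile(rates, times):
    """Net informativeness of a partition: sum over its sites at each time."""
    return [sum(site_informativeness(r, t) for r in rates) for t in times]


def peak_time(rate):
    """Time at which a single site is most informative: T = 1 / (4 * rate)."""
    return 1.0 / (4.0 * rate)


# Hypothetical per-site rates (substitutions per site per Myr).
fast_sites = [0.02, 0.01, 0.01]
slow_sites = [0.002, 0.001, 0.001]
times = list(range(1, 101))  # 1 to 100 Myr before present

fast = pi_profile(fast_sites, times)
slow = pi_profile(slow_sites, times)
print("fast partition peaks at", times[fast.index(max(fast))], "Myr")
print("slow partition peaks at", times[slow.index(max(slow))], "Myr")
```

Fast-evolving sites peak early and decay quickly, while slow sites produce the low, gradual profiles that rise toward older divergences, which is the contrast the profiles are meant to expose.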

| Estimation of divergence time and phylogenetic informativeness
The resulting consensus topology from the molecular clock calculation was largely identical to the Bayesian consensus tree from MrBayes and the ML tree obtained with RAxML (Figure 3). To assess the relative contributions of the individual genes to the phylogeny, the PI profiles were plotted over time against the chronogram (Figure 4). Phylogenetic informativeness profiles with relatively low and gradual peaks can be seen in 18S and pro-  Wild and Maddison (2008) in a phylogenetic analysis of beetles, and they were applied to genus-level phylogenetic inference in Coccinellidae (Escalona et al., 2017). In our PCR-based experiment, the amplification success rate of these genes was poor before optimization, especially for dry pinned specimens (data not shown). The novel primers produced a discrete band of the expected size and performed well for these five NPC fragments. The redesigned primers can facilitate the acquisition of molecular data in phylogenetic studies of other ladybird beetle lineages when these NPC fragments are used.
Although our enhanced test of the monophyly of Sasajiscymnus with six added "traditional genes" provided robust support for the status of this genus as a clade, we cannot conclude that the genus is monophyletic, because taxon sampling from other distribution regions was limited and the type species was not included in this study. Nephus was also confirmed as a monophyletic group, as in a previous study based on COI, 12S, 16S, 18S, and 28S (Magro, Lecompte, Magné, Hemptinne, & Crouau-Roy, 2010). An affinity between Sasajiscymnus and Nephus was revealed by simultaneous analyses of molecular and morphological data, but with limited support (Seago et al., 2011). Nevertheless, other studies recovered the genus Sasajiscymnus as the sister group of either Scymnus + Nephus (Robertson et al., 2015) or Axinoscymnus + Nephus (Che et al., 2017). Because species of Axinoscymnus and Scymnus are lacking in our sampling, we cannot draw a conclusion on the sister group of Sasajiscymnus.  (Mulcahy et al., 2012; Shen et al., 2015). Battistuzzi, Filipski, Hedges, and Kumar (2010) also pointed out that analyzing multiple loci can improve the precision of posterior time estimation using this approach. Therefore, future studies should increase the number of specimens and genes to improve the robustness of divergence time estimation of Sasajiscymnus and to elucidate evolutionary processes in this genus and related groups, including biogeographic patterns, species diversification, and phenotypic evolution.
Quantifying the relative contribution of the genes by using phylogenetic informativeness profiles is important for selecting appropriate molecular markers to conduct phylogenetic analyses on different taxonomic levels. Our analysis indicates that COI provides the strongest phylogenetic signal for the addressed issue (Figure 4).
In previous studies on the entire Coccinellidae, COI was commonly used and confirmed as informative (Escalona et al., 2017; Magro et al., 2010; McKenna et al., 2015; Robertson et al., 2015; Seago et al., 2011). Our results showed that the NPC fragments (except for WGL) are relatively informative and thus contribute to the reconstruction of the genus-level relationships. Moreover, their performance would likely improve if these sequence data could be obtained for every included taxon. Among the four rRNA genes, 28S rRNA was the most informative, followed by 16S and 12S, and finally 18S rRNA. This result is in agreement with Yang and Zhang (2015), who showed that 28S rRNA was very informative in phylogenetic analyses of Nymphalidae (Lepidoptera). Based on these results, we suggest that widely used NPC genes and "traditional markers" such as COI and 28S rRNA can be combined in future phylogenetic analyses of Coccinellidae.

| CONCLUSIONS
We propose the genus Sasajiscymnus may be monophyletic but con-

ACKNOWLEDGMENTS
We would like to express our great appreciation to Prof. Rolf Beutel (Friedrich-Schiller-Universität Jena, Germany) for his valuable comments on the early draft of this manuscript. We are also grateful to three anonymous reviewers for their helpful suggestions, which improved the manuscript. The present study was sup- funding acquisition (lead); investigation (lead); project administration (lead); supervision (lead); writing-original draft (lead).

DATA AVAILABILITY STATEMENT
The DNA sequences reported in this study have been deposited in GenBank under accession numbers MT096438-MT096511.