Peer-comparison overconfidence: Does it measure bias in self-evaluation?
Correspondence: Professor Shu Li, Institute of Psychology, Chinese Academy of Sciences, 4A Datun Road, Chaoyang District, Beijing 100101, China. Email: email@example.com
Overconfidence is generally regarded as one of the most robust findings in the psychology of judgment. A precise method for evaluating overconfidence is essential if researchers are to validate these findings. Although peer-comparison questions are a convenient tool for measuring overconfidence, their validity has been questioned. We employed a specific paradigm to verify the validity, and the respondents were asked to predict a verifiable future event in a real-world setting that allowed empirical checking and comparison between the actual result and the prediction. Studies 1 and 2 found that the actual percentile of overconfidence could be accurately predicted using our initial calculation of participants’ peer-comparison overconfidence in answering questions about academic performance. Study 3 found a similar effect when using questions related to job hunting. All studies indicated that peer-comparison questions are valid for measuring bias in self-evaluation. Thus, future studies could employ peer-comparison questions to investigate the domain specificity versus the domain generality of overconfidence.
An artisan must first sharpen his tools if he is to do his work well.
The Analects of Confucius
Overconfidence is generally regarded as one of the most robust and reliable findings in the psychology of judgment (De Bondt & Thaler, 1995; Lichtenstein, Fischhoff, & Phillips, 1982) and is defined as a positive difference between confidence and accuracy. A survey of the literature indicates that people often overestimate their own actual ability, performance, level of control, or chance of success (Krueger, 1998; Mamassian, 2008; Marottoli & Richardson, 1998; West & Stanovich, 1997).
A precise evaluation of overconfidence is a prerequisite for establishing the reliability of these “reliable” results (Li, Chen, & Yu, 2006; Yates, Lee, & Bush, 1997; Yates, Lee, & Shinotsuka, 1996; Yates, Lee, Shinotsuka, Patalano, & Sieck, 1998). Research into overconfidence using general-knowledge questions¹ has been somewhat controversial, raising doubts as to whether overconfidence is real or, rather, an artifact of the sampling process, in which difficult questions rather than a representative set are used (overconfidence is more common for difficult items; Juslin, Winman, & Olsson, 2000; see also Gigerenzer, Hoffrage, & Kleinbölting, 1991; Juslin, 1994), or the result of using data analysis methods that bias the results (Erev, Wallsten, & Budescu, 1994; Soll, 1996; Wallsten & Gu, 2003). However, regardless of the conclusions that researchers eventually reach about the usefulness of general-knowledge questions, it would be very helpful to ascertain whether other tools provide valid measurements of overconfidence.
One important component of overconfidence is termed overplacement, a phenomenon that has been examined in approximately 5% of the empirical overconfidence studies (Moore & Healy, 2008). Overplacement occurs when people believe themselves to be better than others. The peer-comparison question is a convenient tool for studying this effect (Lee et al., 1995; Li et al., 2006). Typically, for this type of question the subject is asked to make a percentile estimate about a specified population that consists of a random sample of people. An example of such items is the following:
Imagine a random sample of 100 University of Michigan students the same sex as you and who entered the University the same year you did. Assume that you yourself are one of those 100 students. Suppose that all 100 students in the sample are ranked according to MATHEMATICAL SKILLS (i.e., facility at solving various kinds of mathematics problems). What is your best estimate of the number of students in the sample (0–99) who would be MORE MATHEMATICALLY SKILLED than you are?:______ (Lee et al., 1995, pp. 63–69)
According to Lee et al. (1995), if the participants in general are neither overconfident nor underconfident, the group average of all individual estimates about their performance relative to that of their peers should be at the 50th percentile. Group-wide, over- or underconfidence would be indicated by a difference between the average subject's estimate of his or her percentile and 50%. Thus the statistic for peer-comparison overconfidence is

$$\text{Peer-comparison overconfidence} = \text{Self-evaluated percentile} - 50\% \tag{1}$$
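The group-level statistic described above can be sketched in a few lines of Python. This is only an illustration: it assumes that an answer of k (the number of peers judged to be better) corresponds to a self-evaluated percentile of 100 − k, and the function names and sample answers are hypothetical rather than taken from the study.

```python
from statistics import mean


def self_evaluated_percentile(n_better: int) -> float:
    """Convert an answer (number of peers ranked higher, 0-99) to a percentile.

    Assumption: answering k places the respondent at the (100 - k)th percentile.
    """
    if not 0 <= n_better <= 99:
        raise ValueError("answer must be between 0 and 99")
    return 100.0 - n_better


def peer_comparison_overconfidence(answers: list[int]) -> float:
    """Group-level statistic: mean self-evaluated percentile minus 50."""
    return mean(self_evaluated_percentile(a) for a in answers) - 50.0


# Hypothetical answers from five respondents (not data from the study).
answers = [30, 40, 25, 50, 45]
print(peer_comparison_overconfidence(answers))  # 12.0
```

A positive value indicates group-wide overconfidence and a negative value indicates underconfidence, in percentage points.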
This seems reasonable because, if individuals in general are appropriately confident in their standing relative to that of their peers, their percentile estimates should be at the 50th percentile. For an average person to claim accurately that he or she is “above average” or even “below average” is statistically impossible (Ehrlinger, Johnson, Banner, Dunning, & Kruger, 2008; Lee et al., 1995). Although Larrick, Burson, and Soll (2007) argued that the better-than-average (perceived percentile) is a predictor of overconfidence, questioning the validity of this formula is also reasonable. First, Colvin and Block (1994) argued that judgments about abstract others (e.g., the average person) may not be an appropriate criterion for assessing bias in self-evaluation, because such judgments do not distinguish persons who are accurate in describing themselves from those who are inaccurate (Gramzow, Elliot, Asher, & McGregor, 2003), not to mention the more abstract others (“imagine a random sample of 100 University of Michigan students who have the same gender as you and entered the university the same year you did”) employed in peer-comparison problems. Second, general-knowledge overconfidence, which is usually regarded as a reliable index of positive bias despite being questioned by some researchers, as mentioned above, may have little relation to peer-comparison overconfidence (Lee et al., 1995). Although some researchers suggested that the two types of overconfidence may depend on different mechanisms (peer-comparison overconfidence might very well involve self-esteem considerations, whereas general-knowledge overconfidence is not based on affective processes implicit in self-esteem theories; Lee et al., 1995), doubting whether researchers can use peer questions to gain a core understanding of overconfidence is also reasonable. 
Nevertheless, despite doubts, the peer-comparison method is frequently used by researchers (Li et al., 2006; Li, Bi, & Zhang, 2009), but, as yet, no direct research has been done to test the method or its validity (Li et al., 2010).
Given the existing doubts about the validity of the peer-comparison method together with a complete absence of supporting data, any strong conclusions drawn from peer-comparison questions seem unwarranted. To address this issue, the current study employed a paradigm in which respondents were asked to predict the outcome of a verifiable future event in a real-world setting so that the actual result could be checked empirically and compared with the prediction. The logic behind this paradigm was that, if peer-comparison questions accurately reflect the true nature of overconfidence, then the degree of peer-comparison overconfidence should be mirrored by the gap between the self-evaluated percentile and the percentile of the actual outcome. In general, we hypothesized that peer-comparison overconfidence would be a positive predictor of actual overconfidence.
Study 1: Academic-performance-related peer comparisons
The participants were 126 sophomore students (mean age = 21.1 years, SD = 0.99 years; 103 men) from the College of Computer Science and Technology at Jilin University.
Procedure and materials
Near the middle of the semester (approximately 2 months before the final examination), the participants were asked to estimate what percentile their own term score would fall in for an average of four main courses (English, Data Structure, Discrete Mathematics, and Computational Methods) which would be assessed in the final examinations. The format of the question was similar to the format of the “Mathematical Skills” question mentioned above. Our question read as follows:
Imagine a random sample of 100 Jilin University students the same sex as you and who entered the University the same year you did. Assume that you yourself are one of those 100 students. Suppose that all 100 students in the sample are ranked according to an average of term score of four main courses (English, Data Structure, Discrete Mathematics, and Computational Methods). What is your best estimate of the number of students in the sample (0–99) who would receive a higher score than you do?:_____
When the final scores for the courses were available, we contacted the Teaching Secretary of the College and collected the objective performances of the participants.² The Secretary was debriefed and thanked.
Results and analysis
Because the peer-comparison question asked the participants to compare themselves with peers who were “the same sex as you,” the subsequent analysis was conducted separately for each sex group.
We separately ranked the 103 male participants and the 23 female participants according to their objective performance. To make these ranks comparable with the self-evaluated percentile of the peer comparisons, each participant's position in the sample was transformed into a percentile using the following formula:

$$\text{Percentile of the actual outcome} = \frac{n_i}{N} \times 100\% \tag{2}$$

where $N$ denotes the sample size for each sex group and $n_i$ denotes the total number of people in the participant's own sex group who obtained a lower or worse outcome than the participant.
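Then the actual percentile of overconfidence was calculated using the following formula:

$$\text{Actual percentile of overconfidence} = \text{Self-evaluated percentile} - \text{Percentile of the actual outcome} \tag{3}$$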
The actual percentile of overconfidence was computed using Formula (3), and the usual peer-comparison overconfidence calculation was performed using Formula (1). Table 1 contains a summary of the means of the participants’ self-evaluated percentile, the percentile of the actual outcome, the peer-comparison overconfidence, and the actual percentile of overconfidence for each sex group.
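The two-step calculation described above can be illustrated with a short Python sketch. It assumes that the percentile of the actual outcome is the share of same-sex group members with a strictly lower score (n_i / N × 100) and that the actual percentile of overconfidence is the self-evaluated percentile minus that value; the treatment of ties, the function names, and the sample data are assumptions made for illustration.

```python
def actual_percentiles(scores: list[float]) -> list[float]:
    """Percentile of the actual outcome for each member of one sex group.

    n_i counts group members with a strictly lower score (this tie rule is
    an assumption), and N is the group size.
    """
    n = len(scores)
    return [100.0 * sum(other < s for other in scores) / n for s in scores]


def actual_overconfidence(self_evaluated: list[float],
                          scores: list[float]) -> list[float]:
    """Self-evaluated percentile minus percentile of the actual outcome."""
    return [sep - pa
            for sep, pa in zip(self_evaluated, actual_percentiles(scores))]


# Hypothetical group of four students (not data from the study).
scores = [62.0, 75.0, 88.0, 91.0]
self_evaluated = [50.0, 70.0, 60.0, 90.0]
print(actual_percentiles(scores))                     # [0.0, 25.0, 50.0, 75.0]
print(actual_overconfidence(self_evaluated, scores))  # [50.0, 45.0, 10.0, 15.0]
```

Ranking within each sex group simply means calling these functions once per group, so that N and n_i refer only to same-sex peers.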
Table 1. Summary of the Descriptive Data. Means of Self-evaluated Percentile (SEP), Percentile of Actual Outcome (PA), Peer-comparison Overconfidence (PO), Actual Percentile of Overconfidence (AO), and the Percentage of Participants who Overestimated their Actual Percentile (POA) as a Function of Type of Question and Sex
|Study|Question|Sex|SEP|PA|PO|AO|POA|
|---|---|---|---|---|---|---|---|
|Study 1||Male|61.7% (0.22)|49.8% (0.29)|11.7% (0.22)|11.8% (0.27)|68.9%|
|||Female|57.8% (0.21)|48.2% (0.29)|7.8% (0.21)|9.6% (0.30)|73.9%|
|Study 2|Field and Wave Electromagnetics|Male|47.8% (0.23)|51.2% (0.29)|−2.2% (0.23)|−3.5% (0.25)|48.8%|
|||Female|53.8% (0.20)|50.0% (0.29)|3.8% (0.20)|3.8% (0.24)|46.7%|
||Signals and Systems|Male|46.8% (0.23)|50.6% (0.29)|−3.2% (0.23)|−3.8% (0.25)|48.8%|
|||Female|51.9% (0.19)|49.1% (0.30)|1.9% (0.19)|2.8% (0.29)|50.0%|
||English|Male|55.8% (0.22)|52.1% (0.30)|5.8% (0.22)|3.7% (0.30)|55.0%|
|||Female|55.7% (0.19)|51.4% (0.30)|5.7% (0.19)|4.2% (0.34)|56.7%|
||Analogical Electronics|Male|50.8% (0.22)|50.5% (0.29)|0.8% (0.22)|0.4% (0.26)|52.5%|
|||Female|53.0% (0.21)|49.0% (0.30)|3.0% (0.21)|4.0% (0.30)|46.7%|
|Study 3|Day|Male|75.9% (0.18)|54.7% (0.33)|25.9% (0.18)|22.2% (0.35)|70.6%|
|||Female|77.1% (0.18)|44.9% (0.32)|27.1% (0.18)|32.2% (0.46)|71.4%|
||Salary|Male|70.4% (0.22)|55.4% (0.24)|20.4% (0.22)|15.0% (0.25)|70.6%|
|||Female|75.7% (0.15)|46.9% (0.27)|25.7% (0.15)|28.8% (0.29)|71.4%|
||Spend|Male|67.6% (0.27)|50.2% (0.30)|17.6% (0.27)|17.4% (0.32)|76.5%|
|||Female|47.9% (0.29)|44.9% (0.33)|−2.1% (0.29)|3.0% (0.43)|42.9%|
||Interview|Male|67.9% (0.25)|50.2% (0.29)|17.9% (0.25)|17.7% (0.39)|70.6%|
|||Female|71.7% (0.25)|44.9% (0.33)|21.7% (0.25)|26.8% (0.51)|57.1%|
The male participants overestimated their ability level by an average of 11.8% (actual percentile of overconfidence), and the peer-comparison percentile revealed an overestimation of 11.7% (peer-comparison overconfidence). Thus, no significant difference existed between the actual percentile of overconfidence and the peer-comparison overconfidence measurements, F(1, 102) = 0.003, p > .05. The relation between the peer-comparison overconfidence rating and the actual percentile of overconfidence was addressed using linear regression analysis, with the scores derived from the peer-comparison formula entered as the independent variable. The results indicated that the actual percentile of overconfidence was significantly predicted by the peer-comparison overconfidence, β = 0.309, t(102) = 3.27, p < .01.
A similar trend was seen in the female participants. No significant difference was found between the peer-comparison overconfidence (7.8%) and the actual percentile of overconfidence (9.6%), F(1, 22) = 0.085, p > .05, and the actual percentile of overconfidence was marginally predicted by the level of peer-comparison overconfidence, β = 0.356, t(22) = 1.75, p = .096.
In brief, these findings indicated that the participants who had a higher peer-comparison overconfidence level overestimated their percentile of academic performance to a greater degree.
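For readers who want to see how a one-predictor regression of this kind relates the two measures, the following Python sketch may help. It relies on the textbook fact that the standardized coefficient of a simple linear regression equals the Pearson correlation, and it uses the conventional n − 2 degrees of freedom for the slope test; the function names and sample values are hypothetical, and the sketch is not a description of the software or settings used in the study.

```python
import math


def standardized_slope(x: list[float], y: list[float]) -> float:
    """Standardized coefficient of a one-predictor regression (Pearson r)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def slope_t_statistic(beta: float, n: int) -> float:
    """t statistic for the slope, using n - 2 degrees of freedom."""
    return beta * math.sqrt((n - 2) / (1.0 - beta ** 2))


# Hypothetical overconfidence scores in percentage points (not study data).
po = [10.0, -5.0, 20.0, 0.0, 15.0, 5.0]   # peer-comparison overconfidence
ao = [12.0, -8.0, 10.0, 5.0, 20.0, -3.0]  # actual percentile of overconfidence
beta = standardized_slope(po, ao)
print(round(beta, 3), round(slope_t_statistic(beta, len(po)), 2))
```

As a rough check, `slope_t_statistic(0.309, 103)` comes out at about 3.27, in line with the value reported for the male group.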
Study 2: A replication of Study 1 with added items
Study 2 was a conceptual replication of Study 1, but also addressed an important limitation of Study 1. In Study 1, the participants were required to make only a single estimate of the mean self-evaluation percentile, which might deviate from the score derived from several estimates. Thus, the main objective of Study 2 was to see if the conclusion of Study 1 would hold true if an individual was required to make several estimates.
The participants were 110 sophomore students (mean age = 20.9 years, SD = 1.12 years; 80 men) from the School of Communication Engineering at Jilin University.
Procedure and materials
We used the same procedures as in Study 1 to obtain the self-evaluated percentiles and the final course scores.
The participants were asked to estimate the rank of their term scores for four main courses (Field and Wave Electromagnetics, Signals and Systems, English, and Analogical Electronics). This was different from Study 1, in which the participants were asked to estimate what percentile their own term score would fall in for an average of four main courses.
Results and analysis
The 110 participants were also separated based on sex and were ranked according to their actual scores within their sex groups, exactly as for Study 1.
First, we analyzed the four main courses separately, exactly as in Study 1 (see Table 2 for details). For each course, no significant difference was found between the actual percentile of overconfidence and the peer-comparison overconfidence measurements in either sex group. With the exception of the female participants in Field and Wave Electromagnetics (p = .460) and, marginally, in Signals and Systems (p = .099), the level of peer-comparison overconfidence significantly predicted the actual percentile of overconfidence in each sex group. These results largely replicated those of Study 1.
Table 2. The Relationship between Actual Percentile of Overconfidence and the Peer-comparison Overconfidence Measurements as a Function of Type of Question and Gender in Study 2
|Question|Sex|Difference between AO and PO|β|t test of β|
|---|---|---|---|---|
|Field and Wave Electromagnetics|Male|F(1, 79) = 0.144, p = .705|0.287|t(79) = 2.64, p = .010|
||Female|F(1, 29) = 1.496, p = .231|0.141|t(29) = 0.75, p = .460|
|Signals and Systems|Male|F(1, 79) = 0.037, p = .849|0.251|t(79) = 2.29, p = .025|
||Female|F(1, 29) = 0.027, p = .871|0.307|t(29) = 1.71, p = .099|
|English|Male|F(1, 79) = 0.394, p = .532|0.371|t(79) = 3.53, p = .001|
||Female|F(1, 29) = 1.375, p = .250|0.464|t(29) = 2.78, p = .010|
|Analogical Electronics|Male|F(1, 79) = 0.019, p = .890|0.279|t(79) = 2.57, p = .012|
||Female|F(1, 29) = 0.820, p = .373|0.365|t(29) = 2.07, p = .048|
Then, based on strong reliability data (self-evaluated percentile: α = 0.91 for males and 0.93 for females; percentile of the actual outcome: α = 0.82 for males and 0.73 for females; see Table 1 for details), we collapsed the data across the four questions and calculated the mean peer-comparison overconfidence level and the mean actual percentile of overconfidence for each participant. Neither measure differed significantly from zero. For the actual percentile of overconfidence, M = −0.80%, SD = 0.20, t(79) = 0.35, p = .73 for males, and M = 3.70%, SD = 0.21, t(29) = 0.96, p = .34 for females. For the peer-comparison overconfidence, M = 0.31%, SD = 0.20, t(79) = 0.14, p = .89 for males, and M = 3.60%, SD = 0.18, t(29) = 1.10, p = .28 for females. These results indicate that those surveyed were not overconfident. No significant difference between the actual percentile of overconfidence and the peer-comparison overconfidence measurements was found in either sex group: for males, F(1, 79) = 0.17, p = .67; for females, F(1, 29) = 0.001, p = .978.
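The reliability coefficients mentioned above are of the kind usually computed as Cronbach's alpha. The sketch below shows the standard formula, α = k / (k − 1) × (1 − Σ item variances / variance of the summed score), in Python; the function name and the sample values are hypothetical, and the source does not state how its coefficients were actually computed.

```python
from statistics import variance


def cronbach_alpha(items: list[list[float]]) -> float:
    """Cronbach's alpha; items[j] holds every participant's value on item j."""
    k = len(items)
    # Summed score per participant across the k items.
    totals = [sum(vals) for vals in zip(*items)]
    item_var = sum(variance(col) for col in items)
    return k / (k - 1) * (1 - item_var / variance(totals))


# Hypothetical percentiles for four participants on two items (not study data).
items = [
    [1.0, 2.0, 3.0, 4.0],
    [2.0, 1.0, 4.0, 3.0],
]
print(round(cronbach_alpha(items), 2))  # 0.75
```

Values near 1 suggest that the items vary together across participants, which is the usual justification for averaging them into a single score.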
We also tested the mean peer-comparison overconfidence as a predictor of the mean actual percentile of overconfidence using linear regression, finding for males, β = 0.322, t(79) = 3.0, p < .01, and for females, β = 0.368, t(29) = 2.1, p < .05. Replicating the findings of Study 1, the level of peer-comparison overconfidence was shown to predict the mean actual percentile of overconfidence when an individual responded to several items.
Study 3: Job-hunting-related peer comparisons
Studies 1 and 2 suggested that the level of peer-comparison overconfidence was a satisfactory predictor of the actual percentile of overconfidence. However, one might argue that taking an examination is a relatively high-frequency event for an academic student. The participant's estimation may be influenced by the often-repeated cycle of “Take an examination—Estimate your own score and percentile—Receive the actual test score and find out your percentile of actual outcome.” To determine whether the predictor is robust enough to extend across a wide variety of problems, we conducted a third study to replicate and extend our initial results. Rather than using academic-performance-related questions, we used job-hunting related questions to assess the generality of our findings, as they may be able to provide greater ecological validity.
A sample of 48 senior students (mean age = 22.35 years, SD = 1.04 years; 28 men) from the College of Foreign Languages at Jilin University was recruited in the month of October (approximately 8 months before their graduation). They were asked to estimate the percentile of four job-related questions (the number of days and the cost in money they would spend on the job search, the salary for their job and the number of companies or organizations that they would interview with in the process of job hunting). The format of the questions was also similar to the problem of “Mathematical skills.” As an example, one of the questions read as follows:
Imagine a random sample of 100 Jilin University students the same sex as you and who entered the University the same year you did. Assume that you yourself are one of those 100 students. Suppose that all 100 students in the sample are ranked according to the cost in money they will spend on their job search. What is your best estimate of the number of students in the sample (0–99) who will spend less money on their job search than you will spend?:_____
In order to contact the participants later to obtain their actual outcomes for the survey questions, they were also required to provide their telephone numbers on the questionnaire. The participants were instructed that there were no right or wrong answers and were encouraged to give their own honest and frank opinions.
Approximately 1 week before their graduation, we sent short messages to the participants to collect data about the actual number of days spent and the monetary cost of finding a job, the salary of their job, and the number of companies or organizations they interviewed with during their job-hunting process. Twenty-four complete responses were received (mean age = 22.4 years, SD = 1.2 years; 17 men). Because the items in the questionnaire were job-related, a response was counted as complete only if the participant had obtained a job by the end of the academic year. The remaining 24 students were excluded: two had not obtained a job by that time, one planned to go abroad for further study, five planned to pursue postgraduate studies, eight did not respond to the short message, and eight could not be reached.
Results and analysis
The 24 participants (17 men) were also separated and were ranked, exactly as for Studies 1 and 2. The actual percentile of overconfidence was computed using Formula (3), and the level of peer-comparison overconfidence was calculated using Formula (1).
For both groups, the Friedman nonparametric test was used to examine differences between the four questions with respect to the self-evaluated percentile and the percentile of the actual outcome (see Table 1 for details). No significant variation was found for either the self-evaluated percentile, χ²r(male) = 1.16, p > .5, and χ²r(female) = 2.36, p > .5, or the percentile of the actual outcome, χ²r(male) = 0.21, p > .5, and χ²r(female) = 0.11, p > .5. Therefore, we collapsed the data across the four questions and calculated the mean peer-comparison overconfidence level and the mean actual percentile of overconfidence for each participant. The sign test indicated that the percentile of the actual outcome was overestimated, both for males, M = 18.1%, SD = 0.20, p < .05, and females, M = 22.7%, SD = 0.24, p < .05. The level of peer-comparison overconfidence was also significant, for males, M = 20.4%, SD = 0.17, p < .05, and females, M = 18.1%, SD = 0.12, p < .05. Sign tests showed that the actual percentile of overconfidence and peer-comparison overconfidence were not significantly different for either males, p > .05, or females, p > .05. These overconfidence results essentially mirror those previously reported in the published literature (Ehrlinger et al., 2008; Li et al., 2006).
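The sign tests reported above can be illustrated with a minimal Python sketch of a two-sided exact sign test. Whether the study used an exact or approximate, one-sided or two-sided version is not stated, so this variant is an assumption, as are the function name and the sample data.

```python
from math import comb


def sign_test_p(differences: list[float]) -> float:
    """Two-sided exact sign test p-value; zero differences are dropped."""
    pos = sum(d > 0 for d in differences)
    neg = sum(d < 0 for d in differences)
    n = pos + neg
    if n == 0:
        return 1.0
    k = min(pos, neg)
    # Probability of k or fewer of the rarer sign under a fair coin.
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * tail)


# Hypothetical overconfidence scores: nine positive, one negative.
scores = [12.0, 8.0, 25.0, 3.0, 17.0, 9.0, 30.0, 5.0, 14.0, -6.0]
print(sign_test_p(scores))  # 0.021484375
```

Applied to each participant's overconfidence score, the test asks whether positive values (overestimation) outnumber negative ones more than chance would allow. Applied to paired differences between the two measures, it asks whether one measure is systematically larger than the other.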
Using linear regression, we tested the mean level of peer-comparison overconfidence as a predictor of the mean actual percentile of overconfidence, finding for males, β = 0.853, t(16) = 6.33, p < .01, and for females, β = 0.626, t(6) = 1.8, p = .13. Although the result for the female participants did not reach significance at the .05 level, the tendency was similar to that for the male participants. This indicated that a higher level of peer-comparison overconfidence corresponded to a greater mean actual percentile of overconfidence.
The results of Study 3 showed an overall good agreement with those of Studies 1 and 2. Once again, peer comparison overconfidence predicted the mean actual percentile of overconfidence when a less common, but more ecologically valid, event was used. However, one should be cautious when drawing such a conclusion, considering the small sample size used in the study.
Do all roads lead to Rome? Such questions remain unsettled until every “road” is tested. As a convenient method for the measurement of bias in self-evaluation, the peer-comparison question has been accepted by some researchers (Lee et al., 1995; Li et al., 2006, 2010; Li, Bi, & Rao, 2011; Yates et al., 1996). However, the validity of this as a diagnostic criterion has remained unproven. In the present research, a simple experimental design was employed to provide a direct test of the effectiveness of the practice.
Measured overconfidence is reported to be stronger among Asian than among Western subject groups (e.g., Li et al., 2011; Yates et al., 1997). In the present study, we demonstrated that not only “measured” but also “real” overconfidence can be detected by using peer-comparison questions, and showed a simple and easily used measurement tool to be a valid and reliable indicator of overconfidence. To summarize, using three studies, we demonstrated that the level of peer-comparison overconfidence and the actual percentile of overconfidence are fundamentally related. On the basis of these results, we concluded that the peer-comparison question has been demonstrated to be, in general, a good alternative for assessing overconfidence: The higher the predicted peer-comparison overconfidence level, the greater the degree of the actual percentile of overconfidence. In addition, this study adds new evidence to address the issue of using a self-evaluated percentile to predict overconfidence (Larrick et al., 2007).
The theoretical contributions of the study are three-fold. First, it validates the results of previous studies that employed peer-comparison problems as a research tool (e.g., Li et al., 2010). Second, the present study provides evidence that peer-comparison problems can be used as a reliable tool to measure overconfidence when actual participant performance is unavailable, which will expand the scope of future research on overconfidence. Third, the present result that such a simple and easily used measurement tool is proven as a valid indicator of overconfidence will lead to a better understanding of the nature of overconfidence.
The results of Study 1 are consistent with previous studies which have suggested that college students cannot provide an accurate estimate of where they stand (Dunning, Johnson, Ehrlinger, & Kruger, 2003; Krueger & Mueller, 2002). However, the participants in Study 2 did not show any overconfidence or underconfidence. This may be due to the repetitive examination-and-feedback cycle that we mentioned above, much as weather forecasters, who constantly receive feedback from the weather, are able to make good predictions (Murphy & Winkler, 1971, 1974). The difference in overconfidence between Study 1 and Study 2 is also consistent with support theory (Rottenstreich & Tversky, 1997; Tversky & Koehler, 1994), which states that the subjective probability of an event depends on the manner in which the event is described. This suggests that one reason people overestimated their percentile in Study 1 is that they did not unpack the task into its various subcomponents: the percentiles for English, Data Structure, Discrete Mathematics, and Computational Methods.
Substantial numbers of studies that have been carried out on samples from two major cultures—Asian cultures (e.g., China) and Western cultures (e.g., United States)—have documented that: (a) the Chinese from China exhibited higher degrees of overconfidence than did the Americans from the United States (Acker & Duck, 2008; Li et al., 2006, 2009; Yates, 2010; Yates et al., 1996, 1997, 1998); and (b) the Chinese were more risk seeking than were the Americans (Hsee & Weber, 1999; Weber & Hsee, 1998; Weber, Hsee, & Sokolowska, 1998). Our research suggests that the peer-comparison overconfidence measure is a strong predictor of the actual percentile of overconfidence, at least in the more overconfident Asian group. To determine whether the present finding is robust enough to persist in the less overconfident Western group remains an unanswered question.
Moreover, employing the tool of the peer-comparison question to examine the domain specificity and domain generality of overconfidence is another desirable direction, which is similar to the work done by Weber, Blais, and Betz (2002) on attitudes toward risk. Although overconfidence in ability estimation has been classified as a pervasive cognitive bias, the question as to whether overconfidence is “domain specific” or “domain general” is still under debate (Bornstein & Zickafoose, 1999; Klayman, Soll, González-Vallejo, & Barlas, 1999; Larrick et al., 2007; West & Stanovich, 1997). As they are immune to the hard–easy effect that makes general-knowledge questions difficult to evaluate, peer-comparison questions appear to be better alternatives for investigating the domain specificity versus the domain generality of overconfidence.
¹ General-knowledge questions are also called almanac items. For a general-knowledge question, the participants state which alternative they believe to be correct and then indicate how sure they are that the selected alternative is really correct. An example of such an item is the following (Lee et al., 1995):
Which city has a greater population? a. Mexico City; b. Cairo.
Chosen answer (circle one): a b
Probability that my answer is correct (50–100%): __%
The gap between the mean subjective probability and the proportion of correct answers indicates over- or underconfidence.
² Objective performance data were obtained with student consent. Participants were required to provide their name and student ID when the questionnaire was completed. They were also told that the experimenter would obtain their objective performance, which would be used only for research, from the Teaching Secretary. The same procedures were used in Study 2.
This research was partially supported by grants from the National Basic Research Program of China (973 Program, 2011CB711000), Knowledge Innovation Project of the Chinese Academy of Sciences (KSCX2-EW-J-8) and the National Natural Science Foundation of China (70871110). We would like to thank Drs. Rhoda E. and Edmund F. Perozzi for their constructive suggestions and extensive review of the English language and professional content of this article. We also wish to thank Li-Lin Rao and Hong-Yue Sun for their helpful discussion and comments on this manuscript, and Shuo Yang and Xiao Zou for their assistance with the data collection.