What does COVID‐19 testing results really say? The real statistics concealed behind the accessible data

Abstract Accurate and comprehensive testing is crucial for practitioners to portray the pandemic. Without testing there is no data; yet the exact number of infected people cannot be determined because testing is not comprehensive. The number of people seropositive for SARS‐CoV‐2 infection obviously depends on the extent of testing, and the true number of infections might still be far higher than the reported values. Comparing countries based on the number of seropositive cases is therefore misleading, as there may not be enough tests being carried out to properly monitor the outbreak. In this paper, we look closely at the COVID‐19 testing results and try to draw conclusions from the reported data. First, the presence of a possible relationship between COVID‐19 transmission and patients' age is assessed. Then, the COVID‐19 case fatality rate (CFR) is compared with the age‐demographic data for different countries. Based on the results, a method for estimating a lower bound (minimum) for the number of actual positive cases is developed and validated. The results show that CFR is a metric reflecting the spread of the virus, but it also depends on the extent of testing and does not necessarily show the real size of the outbreak. Moreover, no large difference in susceptibility by age has been found. The results suggest that the similarity between the age distribution of COVID‐19 cases and the population age‐demographic improves over the course of the pandemic. In addition, countries with lower CFRs have a COVID‐19 age distribution more similar to that of their population, which is a result of more comprehensive testing. Finally, a method for estimating the real number of infected people based on the age distributions, reported CFRs, and the extent of testing is developed and validated.


| INTRODUCTION
When a communicable disease outbreak such as COVID-19 begins, identification of cases, quick treatment, and immediate isolation are crucial to prevent the spread of the disease. Testing is the window onto the pandemic: it is crucial both for identifying, treating, isolating, or hospitalizing infected people and for understanding the spread of the pandemic so that effective policies for controlling the outbreak can be implemented. Accurate estimation of the number of COVID-19 confirmed cases is essential both for conducting non-pharmaceutical interventions (NPIs) and for implementing effective regulations, called "control orders," for preventing, protecting against, delaying, or otherwise controlling the incidence or transmission of COVID-19. [1][2][3] The implementation of these policies (e.g., travel restriction, quarantine, or lockdown) might seem effective and necessary, but it always comes at a price. Moreover, many countries might not benefit from applying these policies at all. 4 Therefore, it is necessary to take the right action at the right time, and the right action needs correct information.
The most influential factor in making such decisions is the real status of the pandemic in the region of interest. No country knows the actual number of people infected with COVID-19, as only the infection status of those who have been tested is known.
Many studies highlight the importance of statistical inference for assessing the percentage of people who become infected and the mortality rates. 5 The accuracy of these data depends on how much a country actually tests. The positive rate (cases seropositive for SARS-CoV-2 infection/total tested cases) even reaches 35% in some countries (e.g., Ecuador), whereas in others it is close to 0% (e.g., Australia). 6 Although this metric reflects the spread of the virus in the region, it is also a measure of how adequately countries are testing.
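As a minimal sketch of the positive-rate metric described above, the following Python snippet divides the number of positive results by the number of tests and compares the result with the 5% WHO indicator quoted in this paper. The function names and the test counts are hypothetical and serve only as an illustration; they are not reported data.

```python
WHO_BENCHMARK = 0.05  # 5% positivity indicator quoted in this paper


def positive_rate(positive_tests: int, total_tests: int) -> float:
    """Fraction of performed tests that returned a positive result."""
    if total_tests <= 0:
        raise ValueError("total_tests must be positive")
    if not 0 <= positive_tests <= total_tests:
        raise ValueError("positive_tests must lie between 0 and total_tests")
    return positive_tests / total_tests


def below_benchmark(rate: float, benchmark: float = WHO_BENCHMARK) -> bool:
    """True when the positive rate is under the chosen benchmark."""
    return rate < benchmark


# Hypothetical counts for two regions with very different testing coverage.
limited_testing = positive_rate(350, 1_000)
broad_testing = positive_rate(2, 1_000)
print(below_benchmark(limited_testing), below_benchmark(broad_testing))
```

The same number of infections can yield either rate depending on how many tests are performed, which is why the metric reflects testing adequacy as much as viral spread.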
According to the WHO, a seropositive rate below 5% is one indicator that the epidemic is under control in a region. 7 However, this metric is a function of the number of tests performed. In fact, limited testing heavily affects the results, and in regions with a high seropositive rate, the true number of infections is estimated to be much higher than the number of confirmed cases. Moreover, this estimation is always associated with errors for many reasons, for example, asymptomatic transmitters, limitations in tests, and false-seropositive/negative results. The performance of any quantitative analysis based on case fatality rate (CFR) evaluations is also influenced by the modes of transmission of the virus causing COVID-19, that is, respiratory infections, direct contact transmission, and droplet or airborne transmission. 8,9 Moreover, studies have suggested an association between lower SARS-CoV-2 positivity rates and higher circulating 25(OH)D levels. 10 Also, it has been shown that the initial viral load is an important factor in the transmission of the disease. 8,11 As of April 10, 2021, the number of confirmed cases in the United States was >95,800 (per 1 million population), and based on CDC reports, 45% of US coronavirus cases have been among people older than 65 years, a group that also accounts for more than 80% of US coronavirus deaths. 12 It must be noted that almost 17% of the American population was 65 years old or over. Based on these data, one may conclude that the susceptibility of elderly people to the coronavirus is high. On the contrary, in Italy, where almost 23% of the total population is aged 65 years and older, almost 23% of people with positive COVID-19 belong to this group. Here, the data suggest no large difference in susceptibility by age. 13 However, reports suggest that the risk of serious disease and death, as well as ICU admission and hospitalization rates, is higher among older adults.
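The contrast between the two countries can be made explicit with a short calculation. The sketch below divides the share of confirmed cases in the 65-and-over group by that group's share of the population, using the percentages quoted in the text; the name `representation_ratio` is an assumption introduced here for illustration and is not a metric defined by the authors.

```python
def representation_ratio(case_share: float, population_share: float) -> float:
    """Share of confirmed cases in an age group divided by its population share.

    A value near 1 means the group appears among confirmed cases in
    proportion to its size; a value above 1 means it is over-represented.
    """
    if population_share <= 0:
        raise ValueError("population_share must be positive")
    return case_share / population_share


# Percentages quoted in the text for people aged 65 years and over.
us_ratio = representation_ratio(0.45, 0.17)
italy_ratio = representation_ratio(0.23, 0.23)
print(f"US: {us_ratio:.2f}, Italy: {italy_ratio:.2f}")  # prints "US: 2.65, Italy: 1.00"
```

A ratio well above 1 in one country and close to 1 in another is consistent with differences in who gets tested rather than with a difference in susceptibility.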
Little is known about whether people of different ages have different susceptibility to the infection. 14 Li et al. reported that more than 50% of early patients with positive COVID-19 in Wuhan were adults aged 60 years or more. 8 However, the underrepresentation of younger people was suspected to be due to the fact that some of the young infected people were asymptomatic. 9,15,16 The true number of infected people is easily underestimated, as it depends on several factors. It also often differs between regions. 17,18 Many attempts have been made to predict the total number of infected people or the fatality rate. [17][18][19] In this paper, we try to better understand and interpret the COVID-19 testing results. First, the presence of a possible relation between COVID-19 transmission and the patients' age will be assessed. In this part, the progressive relation between the population demographic and COVID-19's demographic data will be studied. The aim is to examine whether a difference in susceptibility to COVID-19 by age exists, and also, how

| SOURCE OF DATA
The data on the number of COVID-19 confirmed cases and deaths were taken from the Lancet database. 20 The age-stratified data for the officially confirmed cases were obtained from available credible sources. [21][22][23][24][25][26] Demographic data and the populations of the geographical regions were taken from the United Nations Population, 2020 report. 27 These data sets are used for developing a model for estimating the true number of COVID-19 positive cases in the regions of interest. The relation between COVID-19 transmission and the patients' age was assessed by studying the progressive relationship between the population demographic and COVID-19's demographic.
All these data sources are publicly available.

| SYNOPSIS OF METHODS
Herein, we examine whether susceptibility to COVID-19 is related to the patient's age. This relationship was also studied over the course of the pandemic to see how it changes over time. In the second part, based on the results and correlations, we tried to develop a better estimation of the true number of COVID-19 positive cases. Here, we used the information obtained in the former section together with the following assumptions: on average, people older than 80 years have fewer social interactions than the younger population, 28 and the infection is more symptomatic in this age group. Studies support the validity of these assumptions: a lower number of social interactions is to be expected for people older than 80 years, especially during the course of a pandemic. [29][30][31] Moreover, the fraction of symptomatic cases among all positive cases is highest in this group. CDC reports also support the fact that the risk for severe illness increases with age, with older adults at the highest risk. 12,32 These two assumptions will be used for determining the lower- information, we refer to two or more health-related propositions that are logically inconsistent with one another. One of these conflicting pieces of information spread about the transmission of COVID-19 was its attribution to the patient's age. Early reports showed a markedly low proportion of COVID-19 positive cases among children, and very high susceptibility of elderly people, 33 which has also been supported by the reported data for some regions. For example, according to the CDC, 45% of US coronavirus cases have been among people older than 65 years, a share much larger than their age-demographic share (~17%). 34 On the contrary, in Italy, one of the hotspots of the pandemic, only 23% of coronavirus cases have been among people older than 65 years, a value much closer to their age-demographic share (~23%). 34 This claim has also been supported in some literature. 35
In this article, we first demonstrated the relation between the age distribution of COVID-19 cases and the age distribution of the respective countries' populations over time. Furthermore, the inconsistencies in the COVID-19 CFRs of different geographical regions were addressed. Finally, a method for estimating the true number of positive cases of COVID-19 was suggested and validated.  Table 1). The same trend is seen for Italy (DCP falls from 0.58 to 0.27; see Table 2) and for England (DCP falls from 0.72 to 0.28; see Table 3).

| Timely reports
It should be mentioned that the DCP-vs-timeline is a monotonically decreasing function, indicating that the age distributions of the population and the COVID-19 infected people became more similar as time passed. The daily tests per thousand people in the United States increased from <0.01 on March 8 to 2.54 on August 31. This is evidence indicating that more comprehensive testing results in more similarities between the population age distribution and COVID-19 age distribution.
Figure 1 presents the age distribution of COVID-19 cases and the population age distribution of the United States in different months. The calculated values for the US are also listed in Table 1. Figure 1 shows that the differences between the curves diminish over time.

The same trend for the value of DCP could be observed for Italy.
The last column of Table 2 presents the daily COVID-19 positive cases at the collection date in Italy. The data of Table 2 and Figure 2 suggest that as time passes, more asymptomatic people are tested along with COVID-19 positive cases. Figure 2 shows how COVID-19's age distribution changes over time. It can be seen that the similarity between COVID-19's age-distribution curve and the population's age distribution progressively increases. Moreover, this trend is monotonic. Therefore, in Figure 2H (which represents the Italy data for October), the confirmed cases and the population lines are closer, indicating more similarity between these values (compare Figure 2A and Figure 2H).
The same approach can be employed to analyze the weekly status reports provided for England. Here, more sets of data (almost 17 weekly reports) are available, and therefore, more precise conclusions can be drawn. The number of confirmed cases and the calculated DCP values for England are presented in Table 3. The DCP values for England also descend over time: from 0.72 in week 13, the value consistently decreased, reaching 0.28 at week 39 (see Table 3).
The respective age distributions of COVID-19 cases and the population's age distribution for England are depicted in Figure 3, and the same conclusion can be drawn. These data were provided by England's national health institution and cover a smaller time frame; therefore, the similarity between the age distributions of COVID-19 and the population over time can be observed more easily. As expected, after the onset of the outbreak, countries gradually became better prepared for COVID-19, and various strategies were adopted in many countries both to stop the spread and to track the positive cases more accurately (i.e., "COVID it is more likely that more tests had been devoted to this group during the very first stages of the outbreak, as they had been at higher risk and more visible symptoms were observed in this group. Therefore, the age distribution of COVID-19 was left-skewed in the first days of the outbreak. As time passed, more tests became available and more people with mild or even no symptoms were also tested. As a result, the skewness of the age distribution decreased, which suggests that age does not affect COVID-19 transmission, but that the disease is more symptomatic (with severe symptoms) in older people.

| Lower bound estimation of COVID-19 positive cases
An accurate estimation of the positive cases in the region is crucial for the health care officials to control the spread of the pandemic. As infected person. 8,11 In the former section, it has been concluded that the age distribution of COVID-19 infected cases and the population's age distribution for a region are expected to be similar in shape.
However, it is observed that in regions with relatively higher CFRs, the age distribution of the infected cases is skewed towards the older ages (see Figure 4B-A). Furthermore, the observed differences between the CFRs of regions with similar population age distributions go beyond what differences in health care quality and other such contributing factors can explain. To explain these discrepancies, we hypothesized that the inconsistencies stem from the scenarios under which testing occurred, especially at the early stages of the outbreak. As discussed, at the beginning of the outbreak, the testing shortage was the main obstacle preventing authorities from estimating the true extent of the pandemic spread. 37 However, the asymptomatic positive cases, more prevalent among the younger age groups, would normally have gone unnoticed, even though they contribute equally to the spread of the virus in society. 38 To address the aforementioned inconsistencies, the data on the oldest age group (i.e., >80 years old) have been considered. Assuming that, due to the severity of symptoms, "all" of the positive cases in this age group are identifiable, the ratio of the positive cases to the entire population has been calculated. Admittedly, it is not realistic to assume that "all" of the positive cases were identified; that's why the real number Lower bound estimation results are summarized in Table 4
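The idea behind the lower bound can be sketched in a few lines of Python. This is one possible reading of the approach under two loudly stated assumptions, and not necessarily the authors' exact procedure: (1) all positive cases in the oldest age group are detected, and (2) younger groups, having more social interactions, are infected at a rate at least as high as the oldest group. The function name, the age-group labels, and all numbers are hypothetical.

```python
from __future__ import annotations


def lower_bound_cases(
    confirmed: dict[str, int],
    population: dict[str, int],
    oldest: str = "80+",
) -> dict[str, float]:
    """Per-group lower bound on true positive cases.

    Assumption (not from the source data): the oldest group's confirmed
    infection rate is a floor for every other group's true infection rate.
    """
    bounds = {}
    for group, pop in population.items():
        # Cases the group would have at the oldest group's infection rate.
        scaled = confirmed[oldest] * pop / population[oldest]
        # The bound can never fall below what was actually confirmed.
        bounds[group] = max(confirmed[group], scaled)
    return bounds


# Hypothetical region: these figures are illustrative only.
population = {"0-39": 4_000_000, "40-79": 5_000_000, "80+": 1_000_000}
confirmed = {"0-39": 2_000, "40-79": 10_000, "80+": 5_000}

bounds = lower_bound_cases(confirmed, population)
total_confirmed = sum(confirmed.values())
total_lower_bound = sum(bounds.values())
print(total_confirmed, total_lower_bound)
```

In this hypothetical region the lower bound is roughly three times the confirmed count, and the gap comes entirely from the younger groups, where mild or asymptomatic infections are most likely to be missed.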

| CONCLUSION
Accurate estimation of the fraction of infected people is crucial when a communicable disease outbreak, such as COVID-19, occurs in a region. Testing is the window onto the pandemic; however, test results are prone to being misinterpreted when portraying the big picture of the outbreak, especially at its early stages. In this study, publicly accessible data for several developed countries were used to estimate the real dimensions of the pandemic. First, the age distribution of COVID-19 cases and the age distribution of the population of some affected countries were compared and analyzed. The results showed that these two distributions become progressively more similar as time goes by (as testing improves). In other words, no large difference in susceptibility to COVID-19 by age has been found.
In the second part, a method was developed for estimating the lower bound of the true number of positive cases in the region. The method was based on the reported test data of the oldest age group (people older than 80 years) and the regions' population age distributions. The proposed estimation method improved the expected similarity between the age distribution of positive cases and the region's population. Moreover, it was observed that regions with higher CFRs show more discrepancy between the age distribution of confirmed cases and the region's population. The discrepancy was quantified by calculating the error of the confirmed cases against our estimated lower bound. This leads to a more accurate estimation of true COVID-19 positive cases, which can help policymakers assess how the country/community is doing with regard to COVID-19 and when and how strict the mitigation policies should be.

CONFLICT OF INTERESTS
The authors declare that there are no conflicts of interest.