The Scale of COVID-19 Graphs Affects Understanding, Attitudes, and Policy Preferences

Mass media routinely present data on COVID-19 diffusion with graphs that use either a log scale or a linear scale. We show that the choice of the scale adopted on these graphs has important consequences for how people understand and react to the information conveyed. In particular, we find that when we show the number of COVID-19 related deaths on a logarithmic scale, people have a less accurate understanding of how the pandemic has developed, make less accurate predictions about its evolution, and have different policy preferences than when they are exposed to a linear scale. Consequently, merely changing the scale on which the data is presented can alter public policy preferences and the level of worry about the pandemic, despite the fact that people are routinely exposed to a lot of COVID-19 related information. Providing the public with information in ways they understand better can help improve the response to COVID-19; thus, mass media and policymakers communicating to the general public should always describe the evolution of the pandemic using a graph on a linear scale, or at least they should show both scales. More generally, our results confirm that policymakers should not only care about what information to communicate, but also about how to do it, as even small differences in the framing of data can have a significant impact.


Introduction
The coronavirus disease 2019 (COVID-19) pandemic is a formidable challenge. Absent a cure or a vaccine, it is crucial that people are adequately informed about the pandemic (Everett et al., 2020), so that they stand behind policies that aim to minimize the spread of the virus and adopt behaviors that can limit the risk of contagion (Bursztyn et al., 2020). However, research has shown how challenging it is to communicate scientific facts in a way that effectively conveys essential information to the general public (Pidgeon and Fischhoff, 2011). In this article, we highlight the importance of this problem by focusing on one of the most basic pieces of information about the pandemic: the number of deaths.
To provide information on the diffusion of the virus, mass media routinely publish graphs that depict the evolution in the number of COVID-19 related deaths in a given area. Many of these graphs present quantities on the Y-axis on either a linear scale (TheWashingtonPost, 2020, Vox, 2020) or a logarithmic scale (Guardian, 2020, Financial-Times, 2020, NewYorkTimes, 2020a). The New York Times, for instance, has explained that the logarithmic scale helps better visualize exponential growth (NewYorkTimes, 2020b). This follows advice given by epidemiology journals (Gladen, 1983, Levine et al., 2010) and data visualization handbooks (Kosslyn, 2006). However, what might be true for conveying information among experts might not hold when issuing information to a broader audience. The principle that logarithmic scales are better suited for exponential growth does not hold true if readers do not, in fact, comprehend them.
We show that scale choice has important consequences for how people understand and react to the information conveyed. In particular, we find that when people are exposed to a logarithmic scale they have a less accurate understanding of how the pandemic has unfolded so far, make less accurate predictions about its future, and have different attitudes and policy preferences than when they are exposed to a linear scale. Another study (Ryan and Evers, 2020), carried out a week after ours, confirms our finding that the scale of the graph affects policy preferences and that people have problems understanding logarithms. In contrast, a study with Canadian respondents finds that the scale of the graph has no impact on respondents (Sevi et al., 2020).[1] Previous studies have already shown that even experts have problems understanding graphs that use the logarithmic scale (Menge et al., 2018, Heckler et al., 2013). However, unlike most studies on graph comprehension, we test understanding of graphs that represent real-world, highly salient data about which the public is likely to have ample background information and to care deeply. The obvious relevance of the data depicted in the graphs also allows us to test the impact of the scale on which the data is plotted on preferences about important policy issues.
[1] However, their study uses a "catch all" question for pessimism and one on policy preferences. These catch-all questions might be unable to capture the nuanced impact of graph scale on policies and attitudes that we observe. For instance, we observe an impact on worry for the health crisis, but not on worry for the economic crisis.
Our finding that people struggle with the logarithmic scale is consistent with existing evidence that even engineering students and specialized scientists have trouble understanding information conveyed in logarithmic scale graphs (Menge et al., 2018, Menge et al., 2013). Since providing the public with clear information can help improve the response to COVID-19 (Van Bavel et al., 2020), mass media and policymakers should present data on the evolution of the pandemic using a graph on a linear scale, or at least they should show both scales.

Experiment
We devised a double-blind experiment to test people's graph comprehension and its effects on attitudes and policy preferences. The experiment was approved by Yale's Institutional Review Board, and we asked all participants to confirm that they were 18 years old or older at the time of taking the survey and to give their informed consent before participating. Participants who answered no to either of the two statements were not allowed to answer any question. All participants were recruited through Cloud Research, while the survey was structured on and administered via Qualtrics, from which we downloaded the anonymized data. We recruited a sample of approximately n = 2000 U.S. residents (after applying exclusion criteria; no regression has fewer than 1825 observations). Half of them were randomly assigned to the Linear Group, in which they were shown the evolution of COVID-19 deaths in the U.S. on a linear scale. The other half were assigned to the Log Group, in which participants saw the same data, but plotted on a logarithmic scale. The graphs were taken from the popular website www.worldometers.info (see Fig. 1).
We asked respondents three sets of questions: (i) attitudes and policy preferences, (ii) graph understanding, and (iii) standard demographic questions. In the supplementary material, we report the questions we asked and the order in which they were asked. The analyses can be grouped into: 1) determinants of worry, 2) policy preferences and 3) differences in understanding. In all three cases our primary variable of interest is "linear", a binary variable taking value 1 whenever the participant was exposed to the linear scale graphs, and 0 otherwise.
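To illustrate what the coefficient on a binary treatment indicator such as "linear" measures, the following minimal sketch regresses an outcome on an intercept and a 0/1 indicator. It is not the authors' specification (they report Logit and OLS with controls); the function name `ols_binary` and the toy data are made up for illustration. With no other regressors, the OLS slope equals the difference between the two group means.

```python
from statistics import mean

def ols_binary(outcome, linear):
    """OLS of `outcome` on an intercept and a 0/1 indicator `linear`.

    Returns (intercept, slope). With a single binary regressor, the intercept
    is the mean outcome when linear == 0 and the slope is the difference in
    mean outcomes between the linear == 1 and linear == 0 groups.
    """
    x_bar, y_bar = mean(linear), mean(outcome)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(linear, outcome))
    sxx = sum((x - x_bar) ** 2 for x in linear)
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return intercept, slope

# Made-up five-point Likert answers, NOT data from the experiment.
worry = [4, 5, 3, 4, 3, 2, 4, 3]
linear = [1, 1, 1, 1, 0, 0, 0, 0]  # 1 = saw the linear-scale graph

intercept, slope = ols_binary(worry, linear)
print(intercept, slope)
```

Because assignment to the two groups is random, a difference of this kind can be read as the effect of the scale rather than of pre-existing differences between respondents.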
We start by showing participants in the two groups the graph plotting the evolution of the total number of deaths on the scale to which they were randomly assigned. We then ask respondents in the two groups to indicate how worried they are about the health crisis and the economic crisis caused by COVID-19 on a five-point Likert scale from "not worried at all" to "extremely worried". Next, we ask respondents about their preferences on some policies that many States have adopted to mitigate the spread of COVID-19. In the first pair of policy questions we ask whether they support the policy of closing non-essential businesses (five-point Likert scale from "strongly disagree" to "strongly agree"), and until which date they would keep these businesses closed. In the second pair of policy questions we ask participants how often they would use a mask if the government sent a supply (five-point Likert scale from "never" to "always"). Moreover, we ask whether they would support a tax that finances the distribution of masks for everyone in their State (five-point Likert scale from "strongly oppose" to "strongly support").
We then turn to test respondents' understanding of the graphs. To increase external validity and to avoid priming respondents, we ask about attitudes and policy preferences before testing understanding. This allows us to obtain respondents' policy preferences before they are asked to think thoroughly about the graph and its meaning in a way that they would be unlikely to do when reading actual news.
We test understanding of graphs by asking three questions. First, we show respondents the COVID-19 graph on the scale to which they had been assigned and ask them whether the number of deaths increased more between March 31st and April 6th or between April 6th and April 12th. Second, we show them a graph describing non-COVID-19 related data on the number of deaths from a hypothetical infection Z (taken from Okan et al. (2016)) and ask them a similar question. As for the first graph shown to participants, people in the Linear Group saw the data plotted on a linear scale, whereas respondents in the Log Group saw data plotted on a logarithmic one. The goal of this question was to test whether respondents' ability to answer the first question correctly depended on prior information on COVID-19, or on a correct understanding of the scale on which their graphs are plotted.
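The comparison asked for in these questions hinges on a property of logarithmic axes: equal vertical distances correspond to equal ratios, not to equal absolute increases. The minimal sketch below uses made-up cumulative counts (not the values shown to respondents) to illustrate how a period with a larger absolute increase can still rise less on a log axis.

```python
import math

def linear_rise(start, end):
    """Vertical distance on a linear axis: the absolute increase."""
    return end - start

def log_rise(start, end):
    """Vertical distance on a log10 axis: the log of the growth ratio."""
    return math.log10(end) - math.log10(start)

# Hypothetical cumulative death counts at three dates (illustrative only).
day_a, day_b, day_c = 4_000, 12_000, 24_000

first_period = (day_a, day_b)   # +8,000 deaths, a 3x increase
second_period = (day_b, day_c)  # +12,000 deaths, a 2x increase

# On a linear axis the second period rises more ...
print(linear_rise(*first_period), linear_rise(*second_period))
# ... but on a log axis the first period rises more.
print(round(log_rise(*first_period), 3), round(log_rise(*second_period), 3))
```

A reader who judges "which period increased more" by the steepness of a log-scale curve is therefore comparing growth ratios, which can give the opposite answer to a question about the number of deaths.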
Third, we test whether respondents can make predictions based on the curve. In particular, we ask them to make a prediction on the total number of deaths on April 25th, one week after we launched the experiment.
Predicting the number of COVID-19 related deaths in a week is very difficult, but some predictions are more reasonable than others. We forecast the number of total deaths on April 25th using an ARIMA model, a standard forecasting method that has already been used to predict COVID-19 diffusion (Benvenuto et al., 2020). We use an ARIMA(0,2,1), as simulations show that it offers the best fit for the data, and forecast the number of cases and its 95% and 99% confidence intervals (CIs). On the 18th of April the number of deaths was 39,014. The 95% CI forecasted using the ARIMA(0,2,1) ranges from 49,203.15 to 62,559.27, whereas the 99% CI ranges from 46,895.47 to 64,685.95.
We remark that the actual number of deaths on the 25th of April was 54,256, while our ARIMA model predicted 55,791 deaths. The actual figure is well within the CIs we consider.
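To make the forecasting step more concrete, the sketch below computes a point forecast and symmetric normal intervals for an ARIMA(0,2,1) process from its closed form. It is not the authors' code and does not reproduce their estimates: the function name `arima021_forecast` and the inputs `theta` (moving-average coefficient), `sigma` (innovation standard deviation), `last_resid` (last in-sample residual) and `y_prev` are assumptions for illustration, and in practice the parameters would be estimated with a statistics package. Only the value 39,014 is taken from the text.

```python
from statistics import NormalDist

def arima021_forecast(y_prev, y_last, last_resid, theta, sigma, h, level=0.95):
    """h-step forecast for (1 - B)^2 y_t = e_t + theta * e_{t-1}.

    Returns (point, lower, upper). Parameters are taken as given, so the
    interval ignores estimation uncertainty.
    """
    # Forecast of every future first difference: the last observed
    # difference plus the moving-average correction.
    step = (y_last - y_prev) + theta * last_resid
    point = y_last + h * step

    # MA(infinity) weights of the process: psi_j = 1 + j * (1 + theta).
    psi = [1 + j * (1 + theta) for j in range(h)]
    std_err = sigma * sum(w * w for w in psi) ** 0.5

    # Two-sided standard normal quantile for the requested coverage level.
    z = NormalDist().inv_cdf(0.5 + level / 2)
    return point, point - z * std_err, point + z * std_err

# Hypothetical inputs chosen only to show the call; apart from
# y_last = 39_014 (deaths on April 18th) they are not from the paper.
inputs = dict(y_prev=37_000, y_last=39_014, last_resid=50.0,
              theta=-0.6, sigma=400.0, h=7)
point, lo95, hi95 = arima021_forecast(**inputs, level=0.95)
_, lo99, hi99 = arima021_forecast(**inputs, level=0.99)
print(round(point), (round(lo95), round(hi95)), (round(lo99), round(hi99)))
```

The interval widens with the horizon `h` because the squared weights accumulate, which is one reason a one-week-ahead forecast of a cumulative series carries a wide range.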
We use these CIs to divide predictions into three groups. In the first group, we include the predictions that fall within the forecast 95% confidence interval ("accurate range"). We consider these predictions "accurate". In the second group, we include the predictions that fall within the 99% confidence interval, but outside the 95% confidence interval ("unlikely range"). We refer to these predictions as "unlikely". Last, we consider the predictions that fall outside the 99% confidence interval ("unreasonable range") as "unreasonable".
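The three-way grouping can be written as a small function. This is a sketch of the rule as described, using the interval bounds reported in the text; the function name `classify_prediction` and the treatment of values exactly on a bound (counted as inside the interval) are assumptions.

```python
# Interval bounds for total deaths on April 25th, as reported in the text.
CI_95 = (49_203.15, 62_559.27)
CI_99 = (46_895.47, 64_685.95)

def classify_prediction(predicted_deaths):
    """Label a respondent's prediction by the narrowest interval containing it."""
    if CI_95[0] <= predicted_deaths <= CI_95[1]:
        return "accurate"      # inside the 95% interval
    if CI_99[0] <= predicted_deaths <= CI_99[1]:
        return "unlikely"      # inside the 99% interval but outside the 95% one
    return "unreasonable"      # outside the 99% interval

# Example inputs: the realized value from the text and three made-up guesses.
for guess in (54_256, 48_000, 64_000, 100_000):
    print(guess, classify_prediction(guess))
```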
Additionally, for each of the understanding questions we asked how confident respondents were about their answers. The level of confidence is important as it can shed some light on how much weight people will attach to the information represented in the graph.
We concluded by collecting standard demographic information on the respondents.

Results and Discussion
Because respondents exposed to the logarithmic scale understand the graphs less accurately and make less accurate predictions, using linear scale graphs reduces the risk of confusing the public. Moreover, the scale also affects people's level of worry about the health crisis (but not about the economic crisis) and their policy preferences. People in the Linear Group are more worried about the health crisis (see Table 4), and prefer that non-essential businesses remain closed for longer (Table 5). However, they support less strongly the idea of closing non-essential businesses in the first place (Table 5), and would wear government-supplied masks less often (Table 6). These results are statistically significant and robust to a series of different controls and specifications (the regressions presented use Logit and OLS and the results are robust to different sets of controls). The odds ratios show that the magnitude of the effects is non-negligible (Table 7).
These findings are remarkable because the data underlying the graphs is identical.
Merely changing the scale can alter public policy preferences and the level of worry, despite the endless flow of COVID-19 related information to which everyone is exposed.
We cannot know the mechanism leading to these preferences, but we conjecture that the shape of the curves could explain these findings. The flat logarithmic curve can give the impression that we have reached a plateau and that, while the present situation is very serious, things are about to get better soon. Thus, respondents in the Log Group might be less worried because they feel that the end of the pandemic is near.
For the same reason, they could strongly support closing non-essential businesses now, i.e. during the peak, but could want to reopen them as soon as the peak is over. Moreover, they might concentrate the use of masks during the peak. As the Log Group thinks we are at the peak, they could also expect a very high number of deaths in the short term, which would also explain their strong support for wearing masks and keeping businesses closed.
Conversely, the linear curve is constantly growing with no sign of improvement, hence it might give the impression that the crisis will go on for a long time and will be very serious.
Consequently, people in the Linear Group might be more worried and wish to reopen non-essential businesses later. However, they could support closing non-essential businesses relatively less, because they believe that the pandemic will last for a long time, and non-essential businesses cannot remain closed for too long. Yet, if the decision taken is to close non-essential businesses, they might feel that it would be pointless to do it for a short period of time. They would apply a similar logic to masks. As they believe that the pandemic will last for a long time, they could use them less frequently to ration them.
Regardless of the reasons behind our findings, it is noteworthy that changing the scale can alter policy preferences, intentions to adopt precautionary measures, and level of worry about the health consequences of the pandemic. Given that the scale affects policy preferences and that people have significant problems understanding the logarithmic scale, our findings suggest that representing data on a linear scale is preferable. Garfin et al. (2020) noted that during a public health crisis, the general public relies on the media to convey accurate and understandable information, so that it can make informed decisions regarding health protective behaviors. Absent information of this kind, people cannot form informed preferences or make informed decisions. Moreover, unclear information conveyed by the media could undermine how much people trust science, which is a key predictor of compliance with COVID-19 guidelines (Brzezinski et al., 2020, Plohl and Musil, 2020).

Tables
Table 1: Frequency Table for Demographic variables: Number, Percentage and Cumulative Percentage of respondents for the following variables: Age, Education, Income, Political orientation, Gender, Live in city with less than 50K people, Live in city with more than 500K people. Column 1 shows the overall distribution, Column 2 shows the distribution for the Linear Group and Column 3 the one for the Log Group.
p-values in parentheses; * p < 0.10, ** p < 0.05, *** p < 0.01
Exponentiated coefficients; standard errors in parentheses; * p < 0.05, ** p < 0.01, *** p < 0.001