Improving the quality of statistical questions posed for group comparison situations

Posing statistical questions is a fundamental and often overlooked component of statistical inquiry. In this paper, we provide an overview of shared understandings regarding what constitutes a good statistical question. We then describe three approaches—a checklist for improving statistical questions, a three‐phase feedback activity, and a matching game—that we find useful with preservice teachers to support the development of high‐quality statistical questions for the comparison of data sets.


| INTRODUCTION
Comparing groups is an important activity in statistics because it necessitates attention to, and application of, fundamental statistical ideas such as data, distribution, representation, center, variability, sampling, and inference. In addition, the comparison of distributions, which is, according to Konold and Higgins [9], "the heart of statistics," can be seen as a motivating activity from the early school years to adulthood and provides a compelling context for the development of statistical reasoning and for engagement in mathematical communication [3] and meaningful problem solving [4]. A general overview of research on group comparisons and a framework for distinguishing between different kinds of group comparison elements can be found in Biehler, Frischemeier, Reading, and Shaughnessy [2]. In this overview, the authors advise that group comparison activities should be conducted with real and meaningful data and should be embedded in a statistical investigative cycle like the PPDAC cycle [13], which consists of the phases problem, plan, data, analysis, and conclusions. Historically, a disproportionate focus has been placed on the DAC part of the PPDAC cycle, with a corresponding neglect of problem formulation. Shaughnessy [12] argues that greater attention is merited on the problem formulation, problem design, and data production components of statistical investigation: "If students are given only pre-packaged statistics problems, in which the tough decisions of problem formulation, design and data production have already been made for them, they will encounter an impoverished, three-phase investigative cycle and will be ill-equipped to deal with statistics problems in their early formulation stages" ([12], p. 963).
In this paper, we focus on the first P, the Problem phase of the statistical investigative cycle, applied to comparisons of two groups and suitable for school students, particularly at primary school level. The quality of the problem (ie, the statistical question) steers the direction of the investigation and has implications for each stage that follows [7]. The statistical question impacts the nature of the data collected or accessed from a preexisting dataset (the data phase), which in turn influences the representations and measures that determine the quality of the group comparison (the analysis phase) and the subsequent conclusions drawn (the conclusions phase). For example, statistical questions requiring only a yes/no answer do not necessarily motivate rich data explorations, whereas statistical questions directed to broader differences allow deeper explorations. Another problem that occurs when generating statistical questions is differentiating between statistical and survey questions. Statistical questions are investigative questions or research questions such as "How do boys and girls differ in spending time watching TV on the weekend?" In contrast, learners at times tend to generate survey questions, which are questions used in a questionnaire to collect the data, for example, "How much time do you spend watching TV on the weekend?" ([1], p. 19).
The copyright line for this article was changed on 16 May 2020 after original online publication.
Consequently, we identify two crucial aspects when approaching the first P-Phase of the PPDAC cycle: (a) the distinction between survey questions and statistical questions and (b) the quality of statistical questions needed to support a robust and deep exploration of data. Thus, this article has three purposes: first, we provide a checklist for improving statistical questions; second, we demonstrate the distinction between statistical questions and survey questions and illustrate an approach that highlights this distinction to students; and third, we provide teaching ideas for developing understanding of the characteristics of good statistical questions and for improving the quality of statistical questions in a think-pair-share setting.

| Characteristics of good statistical questions
Constructing statistical questions is not a trivial task. Characteristics of good statistical questions have been developed by Arnold [1], who identifies six fundamental criteria vital for the generation of good investigative questions (in our terms: statistical questions). To fulfill the first criterion, the variables of the question have to be precisely described, available to measure, and correctly identified from the actual question. The second criterion attends to whether the question focuses on individuals, on a sample, or on a population. The third criterion focuses on the type of question, for example, whether it is a summary, a comparison, or a relationship question. The fourth criterion is whether the data generated by the question will be sufficient to adequately answer the posed question. The fifth criterion deals with whether "the information obtained by answering the question will be useful to someone, i.e. there is a purpose for the investigation" ([1], p. 111). Finally, the sixth criterion addresses whether the question allows analysis based on a local view or a broader view of the data.
Building upon the work of Arnold [1] and with a focus on the quality of the questions and the exploration process, Frischemeier and Biehler [5] distinguish different levels of quality of statistical questions, ranging from yes/no questions to questions aimed at working out differences between distributions. An example of a yes/no question is "Is there a difference between boys and girls in their time spent on computer use?" In this case, the question can be answered with a simple yes or no. A more elaborate statistical question, they state, is one that aims at working out differences in group comparison situations, for example, "How does computer use differ between boys and girls?" The statistical question would be improved further by asking about the size or magnitude of the difference between the groups. Finally, a statistical question such as "Which differences exist between boys and girls regarding their leisure time activities?" serves as an example of "open and complex" questions, which involve at least two variables and are therefore more complex and sophisticated than the other questions.
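The difference between these question levels can be made concrete with a small analysis sketch. The following Python example is only an illustration under stated assumptions: the two lists contain invented values (minutes of computer use per day) and are not taken from any dataset discussed in this paper, and the function names `yes_no_answer`, `describe`, and `compare` are hypothetical. It shows that a yes/no question collapses the comparison into one Boolean value, whereas a "how do they differ?" question invites a look at center, spread, and the size of the difference.

```python
from statistics import mean, median, quantiles

# Invented illustrative values (minutes of computer use per day).
# These numbers are assumptions for the sketch, not real survey data.
boys = [0, 10, 20, 30, 30, 45, 60, 90, 120, 180]
girls = [0, 0, 15, 20, 30, 30, 40, 45, 60, 75]


def yes_no_answer(group_a, group_b):
    """Answer 'Is there a difference?' by comparing the two means only."""
    return mean(group_a) != mean(group_b)


def describe(values):
    """Summarise the center and spread of one group."""
    q1, _, q3 = quantiles(values, n=4)  # quartile cut points
    return {
        "mean": mean(values),
        "median": median(values),
        "iqr": q3 - q1,
        "min": min(values),
        "max": max(values),
    }


def compare(group_a, group_b):
    """Answer 'How do the groups differ?' as differences (a minus b) per summary."""
    summary_a, summary_b = describe(group_a), describe(group_b)
    return {key: summary_a[key] - summary_b[key] for key in summary_a}


if __name__ == "__main__":
    print("Yes/no question:", yes_no_answer(boys, girls))
    for key, difference in compare(boys, girls).items():
        print(f"Difference in {key} (boys minus girls): {difference}")
```

The yes/no function returns a single value and ends the exploration, while the comparison function returns several differences that each prompt a follow-up discussion, for example about why the maxima differ more than the medians.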

| Checklist for improving statistical questions
Taking into account the experiences of Arnold [1] and Frischemeier and Biehler [5], we have designed a checklist (see Table 1) with four categories and associated prompts, which focus on how to improve statistical questions for use in teaching-learning arrangements [8].

| An introductory example: Statistical project work with elementary preservice teachers: writing questions that motivate inquiry into a preexisting data set
Let us start with an example. In our lectures and seminars for elementary school preservice teachers at the University of Paderborn, we want to provide the opportunity for our students to explore real and meaningful data. For this purpose, we collected data on the leisure time and media activities of 809 primary school students in a federal state of Germany (the Primary School NRW dataset; please note that the dataset is not representative of the population of all primary school students in Germany). The task for our preservice teachers was to generate statistical questions and to explore the dataset with respect to their self-generated questions. Our participants were given the data analysis tool TinkerPlots [10] to analyze and explore the data according to their statistical questions. In TinkerPlots, the data are typically stored in a data cards stack (see Figure 1 left). The graph feature in Figure 1 (right) allows learners to separate, stack, or order the symbols (which represent the data cards) to create meaningful insights into the data when answering the statistical question. In Figure 1 (left), we see the data cards stack of 809 pupils in TinkerPlots, and each data card offers us information about several attributes like the city, gender, age, class, height, and shoe size. In this case, we see the case of a girl named "Bad Girl," who lives in Kleve (a town in Germany), is 8 years old, is in class 3, and has shoe size 38, etc. A first discussion point before exploring the data is the question of how the data (for example, for "Bad Girl") have been collected. We discuss with our students that it is necessary to set up survey questions to collect all the data for the attributes, and we then examine the survey questions for selected attributes for the case "Bad Girl."
Some of these survey questions are "In which town do you live?", "What is your fantasy name?", "Are you male or female?", and "How old are you?"

| MATCHING GAME FOR DISTINGUISHING SURVEY QUESTIONS FROM STATISTICAL QUESTIONS
As mentioned above, one challenge for students is the distinction between a statistical question for their project and a survey question, a question necessary to collect the data they require. Thus, we designed an activity in which students match statistical questions to a good survey question for the Primary School NRW dataset. The idea is that they play a kind of matching game. The idea of this game was established in a conversation with a master's student (Johanna Kellner) when discussing a teaching-learning arrangement on generating statistical questions in primary school (see Kellner [6] for details). As we see in Figure 2, some cards present statistical questions and others present survey questions.
Then the cards are turned and students play against each other to find matching pairs (see Figure 3). In Figure 3, for example, the survey questions "Are you male or female?" and "How do you come to school?" match the statistical question "In which way do boys and girls differ in how they get to school?" The goals of this game are, on the one hand, that the students learn to differentiate between both types of questions and, on the other hand, that the students can match survey questions to the corresponding statistical question.
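The matching rule of the game can be sketched as a simple lookup structure. The following Python example is a minimal sketch, not the material used in the seminar: the names `MATCHES`, `is_match`, and `is_survey_question` are hypothetical, and only the one pairing described for Figure 3 is included. Each statistical question is mapped to the survey questions whose answers provide the data it needs.

```python
# Each statistical question maps to the survey questions that collect
# the variables it needs. Only the pairing described for Figure 3 is
# listed; further cards would be added in the same way.
MATCHES = {
    "In which way do boys and girls differ in how they get to school?": {
        "Are you male or female?",
        "How do you come to school?",
    },
}


def is_match(statistical_question, survey_question):
    """Return True if the survey question supplies data for the statistical question."""
    return survey_question in MATCHES.get(statistical_question, set())


def is_survey_question(card):
    """Return True if the card text is listed as a survey question for any entry."""
    return any(card in needed for needed in MATCHES.values())


if __name__ == "__main__":
    question = "In which way do boys and girls differ in how they get to school?"
    print(is_match(question, "How do you come to school?"))
    print(is_match(question, "How old are you?"))
```

Representing the pairs this way also makes the distinction between the two card types explicit: statistical questions are the keys, and survey questions only ever appear as values.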

| AN ACTIVITY TO DEVELOP THE QUALITY OF STATISTICAL QUESTIONS IN GROUP COMPARISON SETTINGS
In the following, we report on the further course of our project, in which the Primary School NRW dataset was analyzed with respect to self-generated statistical questions, and present our activity for developing the quality of statistical questions along the way. The development of question quality is grounded in a think-pair-share setting [11]. The idea and structure of this activity can be seen in the snapshot of the task in Figure 4, which was given to our preservice teachers. First, in the "think phase," learners get minimal information about the problem and are asked to generate a statistical question for the exploration of the Primary School NRW dataset. We recommend that students work in pairs in the think phase so that they can communicate with each other about the statistical question and discuss its quality.
Then in the next step, in the "pair phase," two pairs, pair 1 and pair 2, of students come together. The idea is that pair 1 reviews the statistical question of pair 2 and vice versa. To support the students in providing adequate feedback, they are provided with the checklist "Checklist for improving statistical questions" (see Table 1). Thus, referring to items on the checklist, pair 1 gives feedback to pair 2 and pair 2 provides feedback for pair 1. Then both pairs are given some time to revise their statistical questions taking into account the feedback received. Finally, the statistical questions of all pairs are discussed in the whole class and the instructor provides expert feedback. After this discussion, each pair finalizes their statistical question. To illustrate the process described above, we will refer to the specific example of the two pairs Anna & Clara and Tom & Laura (preservice teachers for elementary school in their fourth semester at the University of Paderborn) who have taken part in our project.

| Phase 1 (Think)
In phase 1, the think phase, student pairs worked on the design of a statistical question that promoted the comparison of two groups in the Primary School NRW dataset (see Figure 5).
As we see, Anna and Clara have designed the question "Do girls spend more time on leisure activities per day than boys?" From our point of view, this is a statistical question whose quality can be further developed, because the question does not necessarily trigger a deep exploration process; rather, it requires only a single yes/no answer. Furthermore, the question is not specific with regard to the variable leisure activities and, to a lesser extent, the variable time.

| Phase 2 (Peer-feedback)
In phase 2, the question of Anna and Clara was given to their peer-pair Tom and Laura. The idea was that Tom and Laura provide peer feedback to Anna and Clara using the checklist in Table 1. In Figure 6, we see the feedback of Tom and Laura concerning the question of Anna and Clara.
Tom and Laura mention that the question can be answered with a single word (yes/no) and give a hint to revise the question. In addition to that, Tom and Laura also emphasize positively that the question is meaningful and that the question is a group comparison question.
With this feedback, Anna and Clara are asked to revise their initial question taking into account the feedback of Tom and Laura. The revised question of Anna and Clara after the pair phase can be seen in Figure 7.
In the question displayed in Figure 7, we see that their question no longer requires only a yes or a no as an answer, because it now allows investigation of the extent of differences, which could even involve investigation of different periods of time as well as a simple total.

| Phase 3 (Expert-feedback)
The final step was the production of the final statistical question, based on expert feedback that the instructor gave to all students when their questions were discussed in the classroom. The questions after the peer phase were analyzed concerning their general shortcomings, strengths, and desirable features that emerged in the revised questions. The instructor asked all pairs to present their statistical questions after the peer-feedback phase. Then all questions from the peer-feedback phase were discussed in the whole class with regard to possible improvements. In this way, the instructor provided feedback for each pair with respect to their individual question. Examples of expert feedback to specific questions can be seen in Table 2.
After the expert feedback, the students were asked to revise and improve their statistical questions taking into account the expert feedback. For the question of Anna and Clara, the expert feedback focused on how the variable time is measured and Anna and Clara were asked to be more specific with regard to their measurement. In Figure 8, we see this component incorporated in the question. Note that leisure activities would still need to be carefully identified or articulated.
TABLE 2 Examples of expert feedback to specific questions

Expert feedback: The question may not sustain curiosity as the answer is obvious.
Question: Are fourth graders older than first graders?

Expert feedback: There is no indication of the clarity of measurement.
Questions: In which way do fourth graders differ from third graders with regard to their time spent on using a tablet? To what extent do girls and boys differ in the time of their leisure activities per day?

Checklist item: Look at the relationship between the question and the data it will generate

Expert feedback: The question is a survey question and not an investigative question.
Question: How many games do you have on your smartphone?

Expert feedback: The question only requires a yes/no answer.
Question: Do students from the village need more time for their way to get to school than students from the city?

Checklist item: Look at (or imagine) the data

Expert feedback: Question cannot be answered with the given data or data to be collected.
Question: In how far do male students have better grades than female students?

This three-phase activity (see Figure 4) has also been trialed in situations where there is no preexisting large data set to motivate the design of statistical questions. Irish preservice elementary teachers in their fifth semester of study worked in pairs to design a statistical question that promoted comparison of two numerical variables. They engaged in a process that mirrored the activity of their German peers (see Figure 4); the only differences being that (1) there was no preexisting quantitative data set to explore and take into account when designing their questions and (2) they did not complete the matching card game. Tables 3 and 4 present the initial questions posed (phase 1), the peer feedback received (phase 2) and the final revised questions that followed the provision of expert feedback (phase 3) for two pairs of students. For the first pair, Niamh and Liam, the feedback pertains to the identification of two groups. For the second pair, Sínead and Clodagh, the feedback focused on posing a problematic question. As can be seen, despite the absence of a preexisting data set and no engagement with the matching game, the use of the three-phase feedback alone brought about improvements in the statistical questions posed. The final revised question could be further improved so that it does not appear to be a yes/no question. For example, "In what ways do men's times for the Olympic 100 m sprint in the last 10 years compare with the previous 10 years?"

TABLE 3 Using the three-phase activity for improvement of statistical questions for Niamh and Liam

Phase 2 Pair Feedback
Feedback provided by another pair:
• Definitely a unique context
• The data will be numerical, that is, the data will be counts or number of times of "saying hello"
• It is not clear that there are two groups; you will need to make that clearer

Phase 3 Expert Feedback
Feedback provided by expert: You need to consider who the comparison will be made between; identify the two different groups that will be compared.

Final revised question [designed by students]
November 21st marks the 21st world Hello Day which promotes world peace around the globe. Two classes of fifth grade children take part from School A and School B. On November 21st each child from both schools tries as hard as they possibly can to say Hello to as many people as possible and record the number. Which class of children promotes world peace the best?

TABLE 4 Using the three-phase activity for improvement of statistical questions for Sínead and Clodagh

Phase 1 Think [Initial question]
What were the winning times in the 100 m sprint in the Olympics among both males and females over the past 8 years?

Phase 2 Pair Feedback
Feedback provided by another pair:
• Good use of real data from a historical event
• There is not much analysis; it only involves finding times for competitors
• It needs more of a problem or comparison such as "Who are the fastest, males or females?"

Phase 3 Expert Feedback
Feedback provided by expert: You need to phrase the question so that two groups are identified. However, if your intent is to compare males and females, you need to consider that, given the different physical physiques of males and females, we already know the answer. So, if comparing male and female Olympic speeds, is it a problem if we already know the answer?

Final revised question [designed by students]
Are men completing the Olympic 100 m sprint faster in the last 10 years than in the previous 10 years?

| SUMMARY AND OUTLOOK
In this paper, we provide some ideas on how to support the development of rich statistical questions in school teaching-learning environments, focused on comparing two groups, and particularly for primary school level. As we mentioned in the introduction, there are two crucial aspects in the first P-Phase of the PPDAC cycle: the need to distinguish between survey questions and statistical questions and the need for a sufficiently high quality statistical question. Our checklist for improving statistical questions, the three-phase feedback activity and the matching game are three different (yet complementary) approaches that may serve as valuable tools for teachers who want to design teaching-learning arrangements around the PPDAC cycle to develop statistical questions. Also, researchers investigating the development of reasoning when generating statistical questions might find one or more of the three approaches useful when either setting up a framework to assess student reasoning when generating statistical questions or when providing feedback on statistical questions.