Students' approaches to exploring relationships between categorical variables

In the context of an afterschool program in which students explore relatively large authentic datasets, we investigated how 11- to 14-year-old students worked with categorical variables. During the program, students learned to use the Common Online Data Analysis Platform (CODAP), a statistical analysis platform specifically designed for middle and high school students, to create and interpret graphs. Following the program, we conducted individual clinical interviews, during which students used CODAP to answer questions about relationships between variables. Here, we describe how students engaged in exploratory data analysis that involved looking at relationships between two categorical variables. Students worked from data in table form and created “contingency graphs,” a variant of contingency tables, which they used to analyze and draw insights from the data. Our research identified four strategies that students used to examine the data in order to explore patterns, make comparisons, and answer questions with the data.


| INTRODUCTION
In this paper, we describe how middle school students examine relationships between categorical variables, especially variables that have five or more levels. Our interest in this topic arose naturally from our work developing materials to introduce students to data science. Because we use datasets around topics of interest to students aged 11 to 14 (middle school in the US), the statistical issues we focus on emerge from the interaction between students and the data and are somewhat unpredictable. We start with the datasets themselves, rather than the statistical concepts we wish to teach. This means we are forced to work with statistical issues that arise from the datasets, including the complexities of the categories the original investigators chose to use. The datasets at the core of one of our curriculum modules ("Injuries on and off the Field") include mainly categorical variables, which is common in many large, publicly available datasets, so the question of how to support students in reasoning about relationships among categorical variables becomes salient.
As we describe in our literature review below, most of the research on reasoning about relationships among categorical variables deals with variables that have two levels (often yes/no or male/female) or possibly three. By contrast, some of the categorical variables in the datasets we use have 10 or more possible values, which makes different and more complex kinds of data-based reasoning possible. This situation led us to examine how students work with this complexity and explore the strategies they employ to make sense of the data.

| STATISTICAL BACKGROUND AND TERMS
Key to the understanding of relationships between categorical variables are the notion of a "conditioning variable" and the concept of conditional probability. Consider Figure 1 below, which was used by Budgett and Puloka [1] in their study of both experts' and students' approaches to contingency table representations of the relationships between categorical variables. There are two possible kinds of questions one could investigate with these data, corresponding to two different choices of conditioning variable: (1) Choosing "gender" as the conditioning variable (focusing on the columns) allows one to answer questions such as "Is there a difference between boys' and girls' choices of the source of their lunch?" and "Are boys more likely to bring their lunch from home or buy it?"; (2) Choosing "lunch from" as the conditioning variable (focusing on the rows) allows one to answer questions such as "Is there a difference in gender distribution between those who bring lunch from home and those who buy lunch?" and "Is a tuck shop customer more likely to be a boy or a girl?" Note that each of these choices of conditioning variable implies a different way of calculating conditional probability. If "gender" is the conditioning variable, the denominator of the conditional probability is the total number of students of a particular gender (10 boys or 15 girls). If "lunch from" is the conditioning variable, the denominator is the total number of students who get their lunch from each source (14 for the tuck shop or 11 for home). Also note that while the questions that can be answered with each choice of conditioning variable are different, the differences are subtle. In addition, some questions do not specify a conditioning variable (eg, "Is there a relationship between gender and where the students get lunch?") and others are open to multiple interpretations, even by statistical experts (eg, "Who is more likely to get lunch from a tuck shop, boys or girls?"). 
Budgett and Puloka's research is described in more detail in the literature review below.
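The two choices of conditioning variable can be made concrete with a short computation. In the sketch below, the cell counts are inferred from the marginal totals above together with the percentages Budgett and Puloka report; they are an assumption about the exact cells, not values taken directly from Figure 1.

```python
# Figure 1 cell counts, inferred from the marginals (10 boys, 15 girls;
# 14 tuck shop, 11 home) and reported percentages -- an assumption.
counts = {
    ("boy", "tuck shop"): 6, ("boy", "home"): 4,
    ("girl", "tuck shop"): 8, ("girl", "home"): 7,
}

def conditional(counts, conditioning_value, axis):
    """P(other variable | conditioning variable == conditioning_value).

    axis=0 conditions on gender (a gender total is the denominator);
    axis=1 conditions on lunch source (a lunch-source total is the
    denominator).
    """
    total = sum(n for key, n in counts.items() if key[axis] == conditioning_value)
    return {key[1 - axis]: n / total
            for key, n in counts.items() if key[axis] == conditioning_value}

# Conditioning on gender: "Are boys more likely to bring lunch or buy it?"
print(conditional(counts, "boy", axis=0))        # tuck shop 60%, home 40%

# Conditioning on lunch source: "Is a tuck shop customer more likely
# to be a boy or a girl?"
print(conditional(counts, "tuck shop", axis=1))  # boys ~43%, girls ~57%
```

Swapping the `axis` argument swaps the denominator, which is exactly the choice of conditioning variable described above.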
It is also important to note that in more traditional treatments of two-way contingency tables, the chi-square test of independence is used to determine how likely it is that an observed relationship between two categorical variables is due to chance. A conditioning variable is generally not specified in this context; rather, the test assesses whether there is a statistically significant association between the two variables.
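For readers who want to see the formal counterpart, the statistic can be computed by hand for a 2 × 2 table. The cell counts here are inferred from Figure 1's marginals and the percentages reported in the literature review below (an assumption); this is a pure-Python sketch of the computation, not part of the study.

```python
# Chi-square test of independence for a 2x2 table, computed by hand.
# Cell counts are inferred from Figure 1's marginals -- an assumption.
observed = [[6, 8],   # tuck shop: boys, girls
            [4, 7]]   # home:      boys, girls

n = sum(sum(row) for row in observed)
row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]

# Expected count under independence: (row total * column total) / n.
chi2 = sum((observed[i][j] - row_totals[i] * col_totals[j] / n) ** 2
           / (row_totals[i] * col_totals[j] / n)
           for i in range(2) for j in range(2))

print(round(chi2, 3))  # 0.108, far below the 3.84 critical value
                       # (df = 1, alpha = 0.05): no significant association
```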

| LITERATURE REVIEW
According to the revised Guidelines for Assessment and Instruction in Statistics Education (GAISE II) framework, middle school students should learn to work with categorical data and should become "comfortable describing the manner in which [categorical] data are organized in two-way [contingency] tables as well as noticing the benefits a visual representation can provide" (p. 52) [2]. More generally, GAISE II stipulates that students in grades K-12 should have opportunities to explore patterns of association between two categorical variables.
Most research on students' understanding of relationships between categorical variables presents students with contingency tables, not the individual cases used to construct the table. There has been significant research on how students understand contingency tables, beginning with Inhelder and Piaget's research with adolescents showing that understanding associations in a table requires understanding proportionality, probability, and combinatorics [3]. More recent research continues to shed light on how people make sense of two-way contingency tables and also examines how people think about covariation more generally; this process is referred to by Garfield and Ben-Zvi as "covariational reasoning" [4]. Garfield and Ben-Zvi identify several significant challenges people face in reasoning about covariation, including students' tendencies to: let prior beliefs influence their reasoning; focus on certain cells in contingency tables more than others; have greater difficulty understanding inverse/negative correlations (as opposed to positive ones); and draw causal inferences where none may exist.
Context and beliefs also matter when interpreting contingency tables. Beginning in the 1990s, a number of studies have employed a contingency table where one variable is smoking (yes/no) and the other is lung disease (yes/no). Typically, the number of smokers in the table is larger than the number of nonsmokers, but the proportion of each group that has lung disease is identical, so there is no covariation between the two variables. Watson and Callingham developed a rubric to show levels of students' thinking about these data and concluded that students' knowledge of the dangers of smoking interfered with making a correct conclusion about the lack of association in the table [5].
A considerable amount of research has been devoted to understanding how students utilize additive vs multiplicative reasoning (or proportional reasoning more generally) in the context of contingency tables. In a study with young students (7-10 years old), most employed additive reasoning when using contingency tables to compare the number of chips of different colors in unequal-sized bags [6]. About 14% of the older students were able to use multiplicative or proportional reasoning and explain how they did so. Students had a hard time verbalizing their proportional reasoning, though it was easier for them to do so when there were simple ratios in the table. Saffran and colleagues found that students who examined a series of two-by-two contingency tables were better able to use proportional reasoning to explain correct conclusions that had been provided to them, compared to using proportionality to justify their own conclusions [7]. The authors found that explanations involving ratios were rare and that part of the reason may be due to limited working memory.
Recent research has examined the challenges described above in a more nuanced way. Natural language can be ambiguous about which variable is being focused on in a contingency table [8]. Decisions have to be made about what constitutes the whole when computing percentages. The decision about what constitutes the whole (in more formal statistical language, the choice of conditioning variable) determines the kinds of questions one can answer. For example, Budgett and Puloka [1] posed this question to both students and experts: "Who is more likely to get lunch from a tuck shop, boys or girls?" and showed them the contingency table in Figure 1 above.
Most experts treated gender as the conditioning variable and concluded that boys were more likely to get lunch from a tuck shop, as 60% of boys but only 53% of girls got lunch there. However, one statistician concluded that girls were more likely to get lunch from a tuck shop because 8 of the 14 tuck shop customers (57%) were girls while only 6 (43%) were boys. Articulated more precisely, this statistician was answering the following question: "Is a customer of a tuck shop more likely to be a boy or a girl?" Clearly, the decision about what constitutes the whole matters.
The ways in which technology may support students' reasoning about relationships between categorical variables remain underexplored. We do know that technology can support students' understanding of covariation more broadly. Much of the research in support of this claim focuses on how students investigate the relationship between numeric variables and employs technology that preceded the development of CODAP: The examples included in Biehler et al.'s overview [9] are from Cobb's mini-tools [10] and TinkerPlots [11]. Part of our interest in focusing on relationships between categorical variables is curiosity about how CODAP might support student reasoning about categorical variables, as well as understanding how students' naive approaches might serve as scaffolds for more sophisticated statistical reasoning.
In sum, the literature shows that it is quite challenging for people of all ages and levels of mathematical experience to reason about relationships between categorical variables. Choosing one vs another variable as the conditioning variable changes the questions that can be answered, but this is a subtle realization. For example, a person may start a sentence with "Boys are more likely to play baseball than…" but fail to finish the sentence with the object of the comparison: is it "girls" or "soccer"? Some data visualizations may highlight a particular variable as the conditioning variable and thus switch one's conclusions about relationships in ways that are analogous to the figure/ground shifts in perception relating to visual illusions like the one involving either a vase or two faces. Radford discusses how visual perception is intertwined with mathematical conceptualization, postulating that the learner's eye must become "domesticated" to look in the right places. In the case of categorical data, students must make decisions about how to look at the data (by cells, by rows, or by columns) as they make different kinds of comparisons [12].

| CONCEPTUAL CHALLENGES FOR STUDENTS
Based on the literature review, we have focused on three conceptual challenges that students confront in analyzing relationships between categorical variables. These appear in our analysis of students' interviews, and we return to them in the discussion.

• Coordinating counts and percentages
Most reasoning about relationships between categorical variables requires thinking in terms of percentages, whether they be "row" or "column" percentages. This multiplicative/proportional reasoning can be difficult for people of all ages. But the absolute numbers of cases also make a difference, as rows (or columns) with small numbers of cases provide less evidence than those with larger numbers.
• Matching the conditioning variable to the question being asked
Different questions require different choices of conditioning variables (and, thus, of "row" or "column" percentages). The correspondence between a question and conditioning variable can be confusing.

• Taking multiple levels of both variables into account at once
It is simpler to make a statement about one of two categorical variables than to make a statement that takes both into account at once (and thus involves all cells in a contingency graph).
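The first of these challenges, coordinating counts and percentages, can be made concrete: the same percentage carries far less evidence when it is based on four cases than on 400. In the sketch below the counts are invented for illustration, and the standard-error formula is our addition (a statistical gloss, not something the students used).

```python
import math

# Same percentage, very different evidential weight.
# (cases in the cell, total cases in the row) -- invented counts.
groups = [(3, 4), (300, 400)]

for yes, total in groups:
    p = yes / total
    se = math.sqrt(p * (1 - p) / total)   # rough standard error of p
    print(f"{100 * p:.0f}% of {total} cases (SE about {100 * se:.0f} points)")
```

Both rows show 75%, but the standard error shrinks by a factor of 10 as the count grows from 4 to 400, which is why rows with few cases support much weaker conclusions.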

| THE CONTEXT: DATA CLUBS
The Data Clubs project [13], an NSF-funded collaborative research project, is based in both urban and rural areas in Massachusetts and Maine. The project's goal is to introduce data science to middle school students in out-of-school settings (afterschool programs and summer camps), with a focus on students historically underrepresented in STEM (ie, students of color, girls, and rural students). We partnered with community organizations to help us reach these students, including nonprofit organizations, school districts' afterschool programs, and summer camps.
Each of the three modules focuses on a topic (as opposed to particular statistical techniques) and includes multiple publicly available datasets that our team has curated and modified to make them suitable for students to explore. All of the students whom we report upon below had participated in the "Injuries on and off the Field" module, using data from the National Health and Nutrition Examination Survey (NHANES) and the National Health Interview Survey (NHIS) [14,15]. None of the students had used CODAP before the module, but participating in the seven-hour module gave them considerable experience with it.

| AFFORDANCES OF CODAP
Students participating in our modules used CODAP [16] to manipulate data and create visual representations, so CODAP's capabilities were instrumental to their work with categorical data. Data were initially presented to students in a case table in which each row was a case and each column a variable. This allowed students to see the entire case, including all of the variables. Creating a graph in CODAP is accomplished simply by dragging each variable name from the table to a graph axis. A graph with a categorical variable on each axis contains a point for each case, arranged in a grid according to the values for both variables (see Figure 2 for an example). While this representation is isomorphic to a contingency table, it differs in that each case is individually represented; we call these representations "contingency graphs." All representations in CODAP are linked, so the points in a contingency graph are linked to the cases in a table, and clicking on a point highlights its corresponding row in the table; in this way, all of the information about any particular case is easily available in the course of the analysis.

Our primary research question was: "What range of strategies did students use in CODAP to explore the injuries dataset as they investigated the relationship between injury type (10 levels) and whether an injury resulted in a trip to the emergency room (2 levels)?" Our research sheds light both on the affordances of CODAP and on students' natural ways of reasoning about relationships between categorical variables.
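The structure of a contingency graph can be made concrete outside CODAP: each cell corresponds to a (row value, column value) pair, and its count is the number of individual cases that land in it. A sketch in Python, where the variable names mirror the NHIS dataset but the specific case rows are invented for illustration:

```python
from collections import Counter

# Hypothetical case-level rows, one per injury, mirroring the structure
# of the NHIS case table; the specific values are invented.
cases = [
    {"Type of injury": "Sprain, strain, or twist", "Went to ER?": "No"},
    {"Type of injury": "Sprain, strain, or twist", "Went to ER?": "No"},
    {"Type of injury": "Sprain, strain, or twist", "Went to ER?": "Yes"},
    {"Type of injury": "Broken bone or fracture",  "Went to ER?": "Yes"},
    {"Type of injury": "Cut",                      "Went to ER?": "No"},
]

# Each cell of the grid is a (injury type, ER status) pair; its count is
# the number of individual points CODAP would draw in that cell.
cells = Counter((c["Type of injury"], c["Went to ER?"]) for c in cases)

for (injury, er), count in sorted(cells.items()):
    print(f"{injury:25s} ER={er:3s} n={count}")
```

Because each case remains an individual record, any cell count can be traced back to its rows, which parallels the linking between graph points and table rows described above.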
In this paper, we consider data from eight students who participated in the injuries module of an afterschool program in a suburban town in southern Maine. Each of the eight students participated in a 35 to 45 min one-on-one virtual interview after the module was over. Within the interview, students engaged in several data investigations in CODAP. The segment of the interview we used as the basis for our analysis was the final section; prior to this part, students worked with a different dataset (extracted from the Census at School dataset). If students could not remember how to create graphs or make other moves in CODAP, we reminded them. In the results reported below, pseudonyms are used to refer to individual students.
In the portion of the interview we analyzed, students investigated variables found in the NHIS dataset, which was familiar to them from the Injuries module. This dataset provides a sample of 868 cases of injuries requiring medical attention taken from the 2017 NHIS, with 11 variables, nine of which are categorical (to view the dataset in CODAP, visit https://bit.ly/nhis2017). Each case represents one injury, not one person (individuals can be represented multiple times if they had multiple injuries). Eight variables provide information on the nature of the injury ("Month of injury," "Main cause of injury," "Main body part hurt," "Other body parts hurt?," "Type of injury," "Went to an Emergency Department?," "Activity at time of injury," and "Location at time of injury"). The remaining three variables provide demographic information about the patient ("Age," "Age group," and "Gender"). Of the nine categorical variables, four have two levels, one has five levels, and four have 10 levels or more. Students were first asked to explore a relationship between any two variables that they found interesting. If they had not already answered it, they were then asked a standard question: "How would you investigate the relationship between the injury type and whether the injury resulted in a trip to the emergency room (ER), using the tools in CODAP?" This question purposely did not specify a conditioning variable so that students could frame the question and explore it for themselves. Students constructed their own graphs to address the standard question as well as additional questions that emerged for them as they explored the data.
If students did not use the count or percentage features of CODAP, they were prompted to do so in the later part of the interview. Video recordings of the interviews were transcribed, including all verbal interactions as well as the work students did in CODAP to make graphs, compute summary information (such as counts or percentages), manipulate the dataset, and point out comparisons with their cursors.

| ANALYSIS
We analyzed the student data as follows: First, we extracted and cleaned the interview transcripts and inserted graphs made by the students within the relevant parts of the transcript. Collectively, the research team read the transcripts, identified the segments where students were exploring the focal relationship between two categorical variables, and wrote thematic notes describing the ways students were interacting with the data. During this phase, the unit of analysis was an analytic move or set of moves made by the student that ended in a statement expressing an insight or inference drawn from or connected to the students' manipulation of the data.
FIGURE 9 Ellie's second graph, with row percentages.
The research team then came together to discuss excerpts from transcripts selected to represent the range of student approaches and began to identify strategies that were repeatedly captured in the thematic notes. As we worked through the excerpts, we began to converge on a set of codes that captured important similarities and differences in the students' analytic approaches. Individual researchers were then independently assigned to code a small number of transcripts to test the reliability of the codes and consider whether there were strategies that fell outside the codes. Coming back together, we examined excerpts that tested distinctions between our codes, requiring further fine-tuning and clarification. We found no evidence of strategies that fell outside the codes. Finally, we coded the full corpus of interviews and identified examples of each type of strategy.

| RESULTS
To reiterate, we focus on this research question: What range of strategies did students use in CODAP to explore the injuries dataset as they investigated the relationship between injury type (10 levels) and whether an injury resulted in a trip to the emergency room (2 levels)? With little or no prompting, all the students we interviewed at some point made a graph similar to the one in Figure 2 by dragging one of the variables onto the horizontal axis and the other onto the vertical axis.
Students typically spent 9 to 11 min engaging with the investigation by making and examining their contingency graphs, asking additional questions, using the count and percentage tools, and speculating about the possible reasons for patterns they were seeing in the data. Our analysis identified four core strategies that students used in their analysis of the contingency graphs they had created. Over half of the students used at least two of these strategies, depending on what they were drawn to in the graph and the relationships they wanted to explore. Some went back and forth between different strategies. Typically, students gained different insights from different strategies. Below, we present the four core strategies.

| Strategy 1: Zoom in on a case (Josh)
During the course of their analysis, some students examined individual cases by mousing over individual points on the graph. CODAP shows partial information about the case (the values of the two variables in the graph, plus one additional variable), as shown in Josh's graph in Figure 3 below.
In this example, Josh mouses over points on the graph for "cuts" that did not go to the emergency room, highlighting individual cases, and saying, for example: "Well the 85 year old [who was cut] did not even go to the emergency room." Josh does the same thing for each of the four cases in the burn category, noting with concern that the case that did not go to the ER is a one-year-old baby. (The interviewer and Josh discussed possible reasons why a one-year-old with a burn would not go to the ER, allaying his concern.)

| Strategy 2: Zoom in on a cell (Lauren)
Some students were immediately drawn to the cell with the highest number of cases. For example, Lauren looks at the entire graph shown below and says: "I am seeing a lot of people did not go to the emergency room for a 'Sprain, strain, or twist.' That is by far my biggest number, so my eyes are drawn to it right away." Note that Lauren does not compare the focal cell to any others in its row or column, except to note that it is larger than any other cell in the whole graph.
While most students employing this strategy were drawn to the cell with the most cases, some of them also noticed cells with very few or no cases. For example, after noticing the largest cell, Lauren also looks at the "Do not know" category, noticing that there are no cases in one of the "Do not know" cells and only three in the other. Then, she questions how a person could "not know" what kind of injury they had.
If students used the percentage tools as part of this strategy, they chose "cell" percentages, consistent with focusing on a single cell in the contingency graph. Later in her analysis, Lauren adds both counts and cell percentages to her graph and finds that 27% of cases in the entire sample were in the category of interest (ie, a "Sprain, strain, or twist" that did not go to the emergency room). Figure 4 below shows the contingency graph after Lauren added both counts and cell percentages.
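CODAP's cell, row, and column percentage options differ only in the denominator applied to the same cell count (which option counts as "row" vs "column" depends on which axis holds which variable). In the sketch below, the broken-bone counts (94 and 76) are taken from Ellie's graph described later in the paper; the sprain counts are invented, chosen only to echo Camila's 73%/27% split, so the resulting percentages are illustrative rather than the study's actual values.

```python
# Counts for four cells of a contingency graph. Broken-bone counts come
# from Ellie's graph (Figure 8); sprain counts are hypothetical.
counts = {
    ("Sprain, strain, or twist", "No"):  220,
    ("Sprain, strain, or twist", "Yes"): 80,
    ("Broken bone or fracture",  "No"):  76,
    ("Broken bone or fracture",  "Yes"): 94,
}
cell = ("Sprain, strain, or twist", "No")

grand_total = sum(counts.values())
injury_total = sum(n for (inj, er), n in counts.items() if inj == cell[0])
er_total = sum(n for (inj, er), n in counts.items() if er == cell[1])

# Three denominators for the same cell count:
print(f"% of all cases:        {100 * counts[cell] / grand_total:.0f}")
print(f"% of this injury type: {100 * counts[cell] / injury_total:.0f}")
print(f"% of this ER status:   {100 * counts[cell] / er_total:.0f}")
```

The first computation corresponds to Lauren's cell percentage; the other two are the two conditioning choices that Strategies 3 and 4 rely on.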
Note that neither Strategy 1 nor 2 actually helps answer the question about a relationship between injury type and whether or not someone went to the ER. They are, rather, ways of orienting to the dataset, often in preparation for using Strategies 3 and/or 4.

| Strategy 3: Collapse or filter to focus on a one-dimensional distribution (Tim and Camila)
This strategy involves focusing on only one variable (either injury type or whether or not the person went to the ER) and looking at the distribution of cases across levels of that variable. Students achieved this focus in a variety of ways, sometimes by actually manipulating the graph in CODAP, but more often by focusing their attention on only one variable. The two students described below took different approaches but had a similar analytical goal: to focus on the distribution of only one of the two variables in question.
In the graph in Figure 5 below, Tim focuses on just one category of injury; he does not use the counts at first, but makes a rough estimate from looking at the data, saying: "The 'Sprain, strain, or twist' has definitely the most-there are like 300 [cases] there if you combine those two [gesturing that he is conceptually combining the 'Yes' and 'No' cases and comparing across injury type]." Tim combined yes/no, essentially collapsing the ER variable. This allowed him to note the injury type that yielded the most cases in the entire sample, regardless of ER outcome. In using this strategy, Tim is answering the question: "What is the most common injury type (whether or not the person ended up in the ER)?"

Related to this strategy is a common move that many students made: to focus on a single injury type (a single level of the "Type of injury" variable) and note the distribution of cases that went to the ER or not. For example, another student, Camila, focuses on the "Sprain, strain, or twist" category, and says, "73% of all 'Sprain, strain, or twist.' They did not go to the emergency department… and 27% did." Note that Camila refers to percentages, not counts, so she has chosen to add row percentages to her graph shown in Figure 6 below (note that Camila's axes are reversed in comparison to Tim's graph in Figure 5 above).
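Tim's move of combining the "Yes" and "No" cases amounts to marginalizing out the ER variable, leaving a one-way distribution over injury type. A sketch with invented cell counts (only the broken-bone values echo counts quoted elsewhere in the paper):

```python
from collections import defaultdict

# Hypothetical cell counts: (injury type, went to ER?) -> n.
cells = {
    ("Sprain, strain, or twist", "Yes"): 80,
    ("Sprain, strain, or twist", "No"):  220,
    ("Broken bone or fracture",  "Yes"): 94,
    ("Broken bone or fracture",  "No"):  76,
    ("Cut",                      "Yes"): 40,
    ("Cut",                      "No"):  60,
}

# Collapse ("marginalize out") the ER variable: add the Yes and No
# counts within each injury type, as Tim did by gesture.
by_injury = defaultdict(int)
for (injury, er), n in cells.items():
    by_injury[injury] += n

# The resulting one-way distribution answers Tim's question:
# "What is the most common injury type, regardless of ER outcome?"
most_common = max(by_injury, key=by_injury.get)
print(most_common, by_injury[most_common])
```

Camila's variant keeps both variables but fixes one level of injury type, which is the row-percentage computation rather than this collapse.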

| Strategy 4: Coordinating both variables (Zeke)
This strategy takes both variables into account by conditioning on one variable and then looking at the distribution of cases across levels of the other. While it is not a "formal" method, it comes closest to canonical methods in its consideration of all levels of both variables and its incorporation of proportional reasoning. In the graph in Figure 7 below, Zeke has added column percentages to his contingency graph, using "Type of injury" as his conditioning variable. He then examines how the data for different injury types are distributed between going to the ER or not. In looking across all of the levels of "Type of injury," Zeke notes a similar pattern for the proportion of cases going to the ER in almost all of the injury categories. As part of his attempt to discern a pattern across injury types, he describes percentages for "Yes" and "No" that are in the 40% to 60% range as roughly "even." After working through a column by column analysis, he says: "Most of these categories [injury types], they all kind of have the similar proportions of 'Yes' and 'No' [for going to the emergency room], around 40% to 60%. There are a few outliers, I think, especially with sprain and maybe cuts as well." While Zeke does not state his conclusion explicitly, his analysis suggests that he would say there is not a relationship between injury type and whether or not the person went to the ER, since for most injuries the percentages for both categories are "similar." This comparison across multiple rows or columns (as opposed to focusing on the distribution within a single row or column) differentiates this strategy from Strategy 3.
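Zeke's column-by-column analysis can be sketched as computing, for each injury type, the percentage of cases that went to the ER and then flagging the types that fall outside his rough 40% to 60% "even" band. The counts below are invented, chosen only to echo the pattern he described (sprains and cuts as the outliers).

```python
# Hypothetical counts per injury type: (went to ER, did not go).
cells = {
    "Sprain, strain, or twist": (80, 220),
    "Broken bone or fracture":  (94, 76),
    "Cut":                      (68, 32),
    "Bruise":                   (45, 55),
    "Scrape":                   (42, 58),
}

# Condition on injury type (column percentages) and compare the
# resulting proportions across all levels of the other variable.
for injury, (yes, no) in cells.items():
    pct_er = 100 * yes / (yes + no)
    band = "roughly even" if 40 <= pct_er <= 60 else "outlier"
    print(f"{injury:25s} {pct_er:4.0f}% to ER  ({band})")
```

Comparing the conditional distributions across every column, rather than inspecting one column at a time, is what distinguishes Strategy 4 from Strategy 3.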

| Case study: Using multiple strategies (Ellie)
Unsurprisingly, in an extended investigation of data, students generally do not use only one strategy. Questions that arise from the application of one strategy often prompt a switch to another. In order to demonstrate this complexity, we provide a detailed description of one student's (Ellie's) work as she considered the question about the relationship between injury type and whether the injured person ended up in the emergency room or not. In this case, Ellie uses two of the four strategies described above as well as several different CODAP tools (counts, row percentages, and column percentages) to examine the relationship. She spends 9 min exploring the data, starting with the interviewer asking her if she can make a prediction about the relationship. Ellie begins by looking at the "Type of injury" column in the raw data table and predicts, based on her knowledge of injury types: "I think 'Broken bone or fracture' and 'Sprain' is going to be more likely to go to the emergency room than 'Cut' or 'Other.'" The interviewer then asks Ellie to make a graph in CODAP to see if her prediction is valid. She makes the graph shown in Figure 8 below, adding cell counts.
Ellie examines the graph and says "It looks like… more people with broken bones went to the emergency room, but not that much… Yeah, 94 and 76" (pointing to the numbers of people with broken bones who went to the ER vs did not go to the ER). While Ellie is comparing within an injury type, her earlier prediction was about the proportion of people who went to the ER across injury types. The interviewer asks for clarification:
Interviewer: "What is the comparison there?"
Ellie: "More people with broken bones went to the emergency room than… more people went to the emergency room when they had a broken bone."
Interviewer: "More than what?"
Ellie: "Than others?"
Interviewer: "This is tricky language. I mean you could say more people went to the emergency room with broken bones than with animal bites, or you could say more people went to the emergency room with broken bones than did not go. I am curious which way you are thinking about it."
Ellie: "I think when someone had a broken bone, they were more likely to go to the emergency room, or most of them went to the emergency room."
Ellie's final statement provides some evidence that she is comparing "went to ER" vs "did not go to ER" in the broken bones column, an application of Strategy 3. But the interchange also illustrates the ambiguity of many of the statements the students made.
The interviewer then asks Ellie to look at some other categories of injuries. She keeps the graph with counts open (Figure 8 above) and looks at the "Sprain, strain, or twist" category.
Interviewer: "So what do you see there?"
Ellie: "Most of the people with a sprain or twist did not go to the emergency room."
Ellie continues to make a within-injury comparison, as she had with broken bones. As long as she is looking at one single injury type at a time, using counts rather than percentages is sufficient. But the interviewer wants to see how Ellie might use percentages to extend her analysis. The interviewer asks Ellie if there is any way that she could use the percentage tool on the graph to help make the comparison. In response, Ellie turns off the count feature, briefly clicks on the column percentage option, then changes her mind and clicks on the row percentage option, as seen in Figure 9 below.
Using the percentages in this way effectively conditions on "went to the ER"/"did not go to the ER" rather than on injury type, so Ellie cannot use this graph to make a statement about which kinds of injuries were most likely to go to the ER vs not. At first she tries to do so, saying: "I can see for each type of injury what percentage… did go to the emergency room and how much did not go to the emergency room." But then Ellie switches her perspective and makes several observations about the "went to the ER" row, saying: "Most of the people who go to the emergency room have a broken bone or sprain or cut." She then adds: "27% of the people who go to the emergency room have a broken bone or fracture," pointing out the modal cell in the row. These are all examples of Strategy 3.
The interviewer realizes that the row percentages do not allow Ellie to answer her original question, so she prompts her: "What happens if you look at the percentages the other way?" Ellie changes the row percentages to column percentages, producing the graph in Figure 10 below.
Ellie now moves from one column to the next, saying: "I can see for sprains, most of the people who get a sprain or a twist, they do not go to the emergency room. And the same with bruises." Ellie concludes: "For most injuries, except for cut and broken bones, when people have an injury they normally don't go to the emergency room." While the observations of individual columns are examples of Strategy 3, Ellie's summary of the general pattern across injury types is an example of Strategy 4 and comes closest to a "canonical" view of the data that considers all levels of both variables. Her final comment can be seen as a response to the original question: "Is there a relationship between injury type and whether or not the person went to the ER?"

| Overall trends
Ellie was not unique. Most of the students we interviewed spent significant time engaged in exploring the relationship between injury type and whether or not a case went to the emergency room; over half of them used two or more strategies. For example, a student might start by noticing a cell with the largest number of injuries (Strategy 2). Then, they might use CODAP's percentage tool to compare the cells representing the proportion of fractures that resulted in a trip to the emergency room vs those that did not (Strategy 3). At that point, they might move on to another injury, complete the same process, and then compare that injury with their initial analysis of fractures.
Overall, we saw three out of the eight students using Strategy 4; of these three students, all also used Strategy 3, and two of them also used Strategy 2. Every student used a version of Strategy 3 at some point in their analysis, and five students used only Strategy 3 (although several different versions of it). Strategy 1 was the least common in our analysis (but see Section 10.4 for additional observations on this point).

| DISCUSSION AND CONCLUSIONS
We start by noting that the types of reasoning students used in our study are in many ways different from the reasoning reflected in the literature on contingency tables. While contingency tables used in the literature are usually two-by-two tables, students in our study dealt with a variable that had 10 levels ("Type of injury"). Students in our study initially saw the data as a case table with multiple variables beyond the two that they eventually examined in their contingency graph, and they could refer back to these additional details during their analysis. Finally, students in our study were using CODAP, which allowed them to create and modify their own visualizations, use tools such as "count" and "percent," and link back to the table of individual cases.
We now return to the conceptual challenges we identified in the introduction to reflect on how students' strategies indicate progress with respect to these challenges and to draw some implications for pedagogical approaches. We also draw some preliminary connections to the seminal Konold et al. paper [17] on students' lenses on data distributions.

| Coordinating counts and percentages
Both Strategies 3 and 4 require students to attend to the relative number of cases in a row or column, so both indicate students' ability to engage in proportional reasoning. In general, we noted that whichever strategy students used, they were able to switch back and forth between counts and percentages (with different choices of cell, row, or column percentages), either on their own initiative or in response to an interviewer's question. The cell sizes in the contingency graphs had a large range, with one cell holding over 230 cases and others holding fewer than five, or none at all, which forced students to consider how much "weight" to give to the patterns they saw in cells of different sizes. Several students noticed the very low number of burns (four), but also noted that burns had the highest percentage of cases going to the ER (75%). This combination prompted one student, Jon, to comment: "For certain ones like 'Burn' there is 25% [that did not go to the ER] vs 75% [that did go to the ER] which makes you think that there is a lot on this side [pointing to the did go to the ER side] and then not so much on this side [pointing to the did not go to the ER side] when there is really only three [cases] vs one [case, for did/did not go to the ER]." He discounts the high ER percentage for burns, arguing that the counts are too small for the percentages to be very useful.
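Jon's caution can be expressed as a simple check. The sketch below uses the counts quoted above (burns: 3 went to the ER, 1 did not; sprains, strains, and twists: 86 went, 235 did not); the flagging threshold is our own illustrative heuristic, not something from the study or from CODAP.

```python
# A percentage is only as trustworthy as the count behind it.
def percent_and_total(went, did_not):
    """Return (percent of cases that went to the ER, cell total)."""
    total = went + did_not
    return round(100 * went / total, 1), total

def small_cell(total, threshold=10):
    """Illustrative heuristic: flag a percentage based on few cases."""
    return total < threshold

burn_pct, burn_n = percent_and_total(3, 1)         # 75.0% -- but of only 4 cases
sprain_pct, sprain_n = percent_and_total(86, 235)  # 26.8% of 321 cases

print(burn_pct, small_cell(burn_n))      # high percentage, flagged as small
print(sprain_pct, small_cell(sprain_n))  # lower percentage, well supported
```

The same 75% looks very different once the cell total is shown alongside it, which is essentially the move Jon made verbally.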
On the other hand, in some categories the counts were high, but the percentages told a different story. For example, a high number (86) of the sprains, strains, and twists went to the ER, but that number is relatively small compared to the number that did not go (235). During the course of the interview, some students noticed this difference between using counts and percentages with respect to sprains, strains, and twists, and in some cases, it prompted them to decide that percentages told a better story than counts about injuries that went to the ER vs those that did not.

FIGURE 10 Ellie's third graph, with column percentages.

| Matching the conditioning variable to the question being asked
This aspect of dealing with relationships between categorical variables appeared to give many students trouble. As seen in Ellie's case study, it is easy to use the CODAP percentage tool in a way that does not match the question being asked. Ellie was interested in whether people were more likely to go to the ER with a particular kind of injury, but her choice of "row" percentages did not provide that information. Rather, it showed her which injuries were more common in the ER. While the choice of "row" or "column" percentages was confusing for many students (and sometimes for the interviewer), there are ways in which it was also beneficial. Once they saw the percentages in the graph, many students realized that the choice of "row" or "column" actually gave them different information. While we did not follow up on this realization by formalizing the idea of a conditioning variable, the students were likely primed by their experience to grasp this difficult notion when they encountered it again.
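The row/column distinction Ellie navigated can be made concrete with a small sketch. The counts below are the four figures quoted in the excerpts above (broken bones: 94 went to the ER, 76 did not; sprains: 86 went, 235 did not); because the table is restricted to two of the ten injury types, the row percentages here will not match those in Ellie's full graph.

```python
# Contingency table restricted to two injury types, using counts quoted
# in the interview excerpts. Rows condition on ER status, columns on injury.
table = {
    "Went to ER": {"Broken bone": 94, "Sprain": 86},
    "Did not go": {"Broken bone": 76, "Sprain": 235},
}

def row_percentages(table):
    """Percent of each ROW total: 'of those who went to the ER, what
    share had each injury?' (Ellie's first, mismatched choice)."""
    return {row: {col: round(100 * n / sum(cells.values()), 1)
                  for col, n in cells.items()}
            for row, cells in table.items()}

def column_percentages(table):
    """Percent of each COLUMN total: 'of those with each injury, what
    share went to the ER?' (the conditioning Ellie's question needed)."""
    col_totals = {col: sum(cells[col] for cells in table.values())
                  for col in next(iter(table.values()))}
    return {row: {col: round(100 * n / col_totals[col], 1)
                  for col, n in cells.items()}
            for row, cells in table.items()}

print(column_percentages(table)["Went to ER"])
# Broken bones: a majority went to the ER; sprains: most did not.
```

Only the column percentages answer "which injuries were most likely to go to the ER," which is why the interviewer's prompt to switch orientations mattered.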

| Taking multiple levels of both variables into account at once
Using Strategy 4 required students to take multiple levels of both variables into account at the same time. In our example, it was easy to take both "Yes" and "No" for "Went to an Emergency Department" into account, and students often did that for just one or two of the levels of "Type of injury." The move from Strategy 3 to 4 is accomplished by looking across multiple levels of "Type of injury" and noting similarities or differences in the percentage of injuries that ended up in the ER. While only three out of eight students made this move, it was encouraging to see that the combination of an interesting dataset, a powerful tool, and a probing interview was enough for several students to accomplish this multifaceted coordination. Notably, no student actually made a statement like: "There is/isn't a relationship between injury type and whether the person went to the ER," so there is still a gap between what students are noticing and the canonical response to this question.

| Additional observations
We present here some additional observations that go beyond our analysis of the focal data, but that suggest additional findings and future research directions. Some of these observations come from other parts of the interview, which we did not fully analyze for this paper.
While only one student used Strategy 1, zooming in on a case, in the part of the interview we analyzed in detail, a retrospective examination of the earlier part of the interview uncovered many more instances of this strategy. In that portion of the interview, students explored a dataset in which the cases were students and two of the variables were age and how students traveled to school (by car, bus, bicycle, etc.). It is possible that students used this strategy more frequently earlier in the interview because they were reacquainting themselves with CODAP. Another possibility is that they were more curious about individual cases in the earlier dataset since the cases were students like them. For example, one student commented as he focused on one case: "That one takes the subway and they're 16, because if you're 10 your parent wouldn't trust you [to take the subway by yourself to school]." Some of the cases were particularly intriguing, such as a student who took a boat to school, and as such also seemed to draw students' attention. Future research might probe when and how students return to looking at individual cases in an analysis of a large dataset.
Even when they were not focusing on an individual case, students drew on their personal experience and real-world knowledge when examining the data. When we asked students to make a prediction about which kinds of cases would be more likely to go to the emergency room, they might bring up an experience they or a family member had with an injury and reflect on whether or not they went to the emergency room. In general, though, students were readily able to switch their attention back to the bigger picture, especially when reminded of the question they were investigating.
We were only able to complete a partial analysis of the ways in which students' investigations unfolded and how they moved from focusing on a variable of their own choosing to the question we wanted them to investigate. Our interest in this study was to compare students' approaches to a central question, so we did not give them much time to explore the dataset in depth before introducing the focal question. Future research could profitably give students more time to explore a particular dataset in more depth, possibly without the introduction of a prompted question.
In reviewing these results, we see some connections to the framework from Konold et al. [17]. Our first strategy-zooming in on a case-is similar to their "case" lens, in which a student is primarily interested in the details of a single case. The distinction between "classifier" and "aggregate" thinking in Konold and colleagues' article has some aspects in common with students' choice to use percentages rather than counts in order to see a group of cases as a part of the whole (rather than in isolation). Future research could find additional parallels with the lenses identified in that paper.

| Pedagogical implications
What might these insights mean for how we introduce students to thinking about categorical variables? First, our approach starts with the actual data for each case. Students are free to examine individual cases and to create their own contingency graphs using the variables that interest them. They can choose to find counts or percentages, and they need to decide which kind of percentages to add to their graph: by cell, row, or column. Our research shows that there are many ways that students can extract different insights from the data. By working with a dynamic platform (CODAP) that allowed students to manipulate the data in different ways, we believe they were encouraged to think about how to group the data and how to draw inferences that reflected the grouping they chose. Students do not usually encounter this kind of analysis until high school, but our research suggests that it is both possible and beneficial for younger students to be engaged in the examination of relationships between complex categorical variables.
Pedagogically, the starting point for introducing categorical data depends on the instructional goal. If one focuses narrowly on preparing students for later mastery of the chi-square test of independence, it might be useful to start with examining two-by-two tables. But there is much more to the analysis of categorical data than can be captured in small contingency tables and chi-square tests. As the field shifts toward a broader conception of "data science education" rather than the more narrowly defined "statistics education," we believe students need to encounter complex and genuine categorical data (with many levels), like the data that are actually being used in most STEM disciplines. We have found that students almost universally enjoy having the choice to investigate their own questions about relationships between variables in a dataset, including relationships among categorical variables with more than two levels. Such a task structure led to students engaging in the type of exploratory data analysis that would be recognizable to a data analyst. We conjecture that this type of exploration is an important foundation for a deeper, more formal understanding of large datasets with multiple categorical variables.
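For readers who want the formal endpoint mentioned above, the chi-square statistic for a two-by-two table can be sketched in a few lines (a toy illustration in plain Python; a real analysis would use a statistics library and would also report degrees of freedom and a p-value):

```python
def chi_square_2x2(a, b, c, d):
    """Chi-square statistic for the table [[a, b], [c, d]]:
    sum over cells of (observed - expected)**2 / expected, where
    expected = row_total * column_total / grand_total."""
    n = a + b + c + d
    observed = [a, b, c, d]
    expected = [
        (a + b) * (a + c) / n, (a + b) * (b + d) / n,
        (c + d) * (a + c) / n, (c + d) * (b + d) / n,
    ]
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

# Broken bones (94 went to the ER, 76 did not) vs sprains (86 vs 235),
# using counts quoted earlier: a large statistic is consistent with ER
# visits depending on injury type.
stat = chi_square_2x2(94, 76, 86, 235)
```

Note how much of the exploratory work described in this paper (choosing a conditioning variable, weighing cell sizes) happens before this formula is ever applied.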