Teaching R in the undergraduate ecology classroom: approaches, lessons learned, and recommendations

Ecology requires training in data management and analysis. In this paper, we present data from the last 10 years demonstrating the increase in the use of R, an open-source programming environment, in ecology and its prevalence as a required skill in job descriptions. Because of its transparent and flexible nature, R is increasingly used for data management and analysis in the field of ecology. Consequently, job postings targeting candidates with a bachelor’s degree and a required knowledge of R have increased over the past ten years. We discuss our experiences teaching undergraduates R in two advanced ecology classes using different approaches. One approach, in a course with a field lab, focused on collecting, cleaning, and preparing data for analysis. The other approach, in a course without a field lab, focused on analyzing existing data sets and applying the results to content discussed in the lecture portion of the course. In our experience, each approach had strengths and weaknesses. We recommend above all that instructors of ecology and related subjects include R in their coursework. Furthermore, instructors should be aware of the following: learning R is a separate skill from learning statistics; writing R assignments is a significant time sink for course preparation; and there is a tradeoff between teaching R and teaching content. Determining how one’s course fits into the curriculum and identifying resources outside of the classroom for students’ continued practice will help ensure that R training is successful and extends beyond a one-semester course.

students to these modern approaches. Though there is a potential framework in place to teach R, there are still few publications that directly address the training of undergraduate ecology students in R.

In this paper, we first argue that data analysis using R is important in ecology by quantifying its increased use for statistical analysis in recent journal papers, as well as in job advertisements that require R as a qualification. We then examine our two different approaches to teaching students how to use R, in conjunction with RStudio, to analyze ecological datasets. We identify advantages common to both approaches, strengths and weaknesses of each approach, and lessons learned from our approaches. Finally, we make recommendations for those wishing to integrate R into their ecology course design. From our approach, we endeavor to begin an important conversation and to lay the framework for improving pedagogy in teaching R to ecology undergraduates.

To determine the degree to which R is used in the analysis of ecological data, we examined the first two issues of Ecology published in 2008, 2013 and 2018. For each paper that included data analysis, we recorded the statistical software used. For papers in which R was used, we also noted any packages that were identified. To measure the degree to which skills in

There were important differences between the courses as well (Table 1). Community Ecology was organized as a twice-weekly seminar course (with no lab) that met for 1.5 hours each class period, whereas FE met for one 1.5-hour class and one 5.5-hour extended lab period each week. Approximately two-thirds of CE students had previous, though limited, experience with R, whereas no students in FE had used R before. In CE, class time was devoted mainly to lecture on community ecology concepts and theories, with some time dedicated to in-class activities and discussion as well as assessments (weekly quizzes and one mid-term exam). In FE, the 1.5-hour lecture periods were primarily used for lecture on forest ecology and for lessons in R. We used the 5.5-hour lab sessions, which for the first 2.5 months of the semester were set primarily in local forests, for gathering data on forest structure as well as discussing papers from the primary literature. During the remainder of the semester, the lab period was used both for lecture and for introducing and getting started on the two major data analysis problem sets assigned during the semester.

Because of the distinct structures of our two courses, our approaches to teaching data management and analysis with R were also different, though complementary. In CE, students were given one assignment per week (for a total of eight assignments), each worth 5 to 15 points and based on a specific task that was tied into the content of the lecture (example assignments from both CE and FE are in Appendix S1). The task for these assignments required each student to use RStudio to import and analyze a data set and answer questions, applying ecological knowledge. For example, when we covered competition and niches in lecture, the R assignment for that week focused on how to use the spaa package (Zhang 2016) to calculate niche overlap among MacArthur's warblers (MacArthur 1958). As the students gained more experience with R, assignments were designed to encourage them to recall how to do steps that they had done before (e.g., import data from a .csv file) rather than explicitly instruct them each time. Therefore, each student was expected to build on previous knowledge as they progressed through the assignments.
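As a rough illustration of the kind of calculation such an assignment involves, the sketch below computes a pairwise niche overlap matrix in base R. It does not use the spaa package and does not reproduce the actual assignment. The choice of Pianka's index is an assumption (the text does not say which overlap measure was used), and the matrix `use`, its species and zone names, and its counts are invented for illustration; they are not MacArthur's (1958) data.

```r
# Hypothetical resource-use counts: rows are foraging zones, columns are species.
# All names and numbers are invented for illustration.
use <- data.frame(
  sp_a = c(30, 10, 5, 0),
  sp_b = c(5, 25, 15, 5),
  sp_c = c(0, 5, 20, 25),
  row.names = c("crown", "upper", "middle", "lower")
)

# Convert counts to the proportion of each species' observations in each zone.
p <- sweep(as.matrix(use), 2, colSums(use), "/")

# Pianka's overlap for species j and k:
#   O_jk = sum_i(p_ij * p_ik) / sqrt(sum_i(p_ij^2) * sum_i(p_ik^2))
pianka <- function(pj, pk) {
  sum(pj * pk) / sqrt(sum(pj^2) * sum(pk^2))
}

# Fill a species-by-species overlap matrix.
species <- colnames(p)
overlap <- matrix(NA_real_, length(species), length(species),
                  dimnames = list(species, species))
for (j in species) {
  for (k in species) {
    overlap[j, k] <- pianka(p[, j], p[, k])
  }
}

round(overlap, 2)
```

Values near 1 indicate nearly identical use of the zones, and values near 0 indicate little shared use; each species has an overlap of 1 with itself.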

Students used the desktop version of RStudio for their analyses after downloading it independently to their computers. All assignments were written by the instructor or adapted from multiple sources, including exercises from textbooks such as Gardener (2014). Students were also assigned readings from GSWR and applied what they learned from those readings to write their own R code for analysis of data relevant to our course. Students were expected to use R for a final project in which they analyzed a dataset in three different ways: first, through appropriate statistical analysis; second, through some form of visual analysis (either a graph or a map); and third, through analysis of community structure (e.g., diversity or niche overlap). The instructor assigned real datasets from the Ecological DataWiki (https://ecologicaldata.org/home) so students could practice data management skills, such as selecting and formatting the data they needed to do their analyses. Each student was required to meet with the instructor twice during the semester; the first time to discuss the three analyses that they planned to do and the second

In FE, R and RStudio were presented early in the semester. During the first R lesson, to motivate further R learning, students imported and worked with their own data, collected from a local forest during the first lab period. Thereafter, we devoted less class time to R until the last third of the semester, but each week, students completed one to four short "low stakes" R assignments, each worth two points. We worked through most of the GSWR book (Chapters 1-6 and 8, of 9 chapters). In each week's set of assignments, the first was to read the assigned chapter of GSWR and submit an R script showing that the student had worked through the material in the chapter. The later assignments during the same week asked students to use subsets of the data they collected in the field to complete tasks similar to those covered in that week's GSWR chapter. All homework assignments were submitted as R scripts via email to the instructor. As in CE, toward the start of the semester, assignment instructions were more detailed, including, for example, lines of R code that students should use as well as hints. As the semester progressed, assignments became less detailed so that students had to build on prior knowledge in order to complete the assignments. At both mid-semester and at the end of the semester, students completed a problem set based on analyzing aspects of their forest data. Each problem set listed specific required end products (figures, analyses, etc.) that students were asked to produce without any instruction, thus pushing students to gain more independence in managing data, hypothesis testing, data visualization and writing R code. During the last third of the semester, we devoted class or lab time to discussing the application of statistical and ecological analyses (e.g., community ordination) with R.

The analysis workflow presented over the course of the semester followed a similar focus to GSWR in how to approach a data set and statistical testing, introduced in Chapter 4 (Table 2). In this workflow, the first three steps could be considered "data management" rather than exploratory or statistical data analysis.

instructions emphasized "cleaning" the data, e.g., looking for and correcting mistakes in data entry, dealing with NAs, examining data for outliers, etc. In both of the problem set assignments, students were asked to demonstrate, using their R code, that they had completed all 7 of the GSWR steps in working with their data. This requirement reinforced the need for data management, including cleaning and repair, prior to analysis, steps often left out of the undergraduate curriculum.
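The cleaning tasks named above (correcting data-entry mistakes, dealing with NAs, and examining data for outliers) might look something like the following in base R. This is a minimal sketch, not the GSWR workflow of Table 2 and not the course's actual assignment code. The data frame `trees`, its column names, and its values are hypothetical, and the 1.5 × IQR rule is only one common convention for flagging possible outliers.

```r
# Hypothetical field data; names and values are invented for illustration.
trees <- data.frame(
  plot    = c(1, 1, 2, 2, 3, 3),
  species = c("sugar maple", "Sugar Maple", "sugar maple ", "red oak",
              "red oak", "Sugar maple"),
  dbh_cm  = c(32.1, 28.4, NA, 41.0, 410, 25.7),
  stringsAsFactors = FALSE
)

# 1. Inspect the structure and tabulate labels to spot data-entry mistakes.
str(trees)
table(trees$species, useNA = "ifany")  # several spellings of one species

# 2. Repair inconsistent labels: trim whitespace and standardize case.
trees$species <- factor(tolower(trimws(trees$species)))
levels(trees$species)

# 3. Deal with NAs: count them per column, then drop rows with no measurement.
colSums(is.na(trees))
trees_complete <- trees[!is.na(trees$dbh_cm), ]

# 4. Examine the measurements for outliers using the 1.5 * IQR rule.
summary(trees_complete$dbh_cm)
q <- quantile(trees_complete$dbh_cm, c(0.25, 0.75))
fence <- 1.5 * diff(q)
is_outlier <- trees_complete$dbh_cm < q[1] - fence |
  trees_complete$dbh_cm > q[2] + fence
trees_complete[is_outlier, ]  # rows to check against the field sheets
```

A flagged value is a prompt to go back to the original field sheets, not an automatic deletion; here the flagged diameter looks like a misplaced decimal point.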

Putting R in Context

We examined 56, 54 and 44 Ecology papers (154 total) from 2008, 2013 and 2018, respectively. In 2008, only 59% of published papers indicated the software used for data analysis, and only 12% of those papers (N = 4) indicated using R. Both indication of software used for analysis and use of R for published data analyses increased dramatically over time, reaching 92% and 80%, respectively, by 2018 (Figure 1). Authors were also more detailed about

Table 3.

associates. There were also two posts for workshops that required previous knowledge of R (one was for distance sampling and not specific to learning more about R) and one post for an internship that required both a bachelor's degree and knowledge of R. Figure 2 summarizes these findings and shows an increasing requirement for undergraduates in ecology to be able to use R.

Approaches to teaching R

Students in FE completed 22 low-stakes R assignments and two larger problem sets in which they had to apply their data management and R skills. Students in CE completed 8 weekly R assignments and a large final project. By the end of the semester, students in both courses were confident in their ability to independently import .csv files into R, install and load packages, create and save R scripts, and create and save figures. Students in FE were able to clean and repair datasets and look for outliers prior to analysis. In both classes, some of the students were able to run a series of statistical tests independently, and others with assistance. The R skills students developed and the R packages they were exposed to are listed in Table 4.

Our work has shown that R has become the standard tool for ecological data analysis. Further, experience working with R has become a commonly required skill for post-baccalaureate employment and admission to graduate school. However, R has a fairly steep learning curve as a scripted programming language; students with no background in programming may find it more difficult to learn than they would a graphical-user-interface-driven software application. Thus, it is imperative that undergraduate programs in biology and ecology begin teaching R to adequately prepare students for the next stages in their careers.

We have presented two different approaches to teaching R in the context of undergraduate ecology courses. In FE, the primary emphasis was on collecting and managing data, with limited statistical analysis, whereas in CE, the primary emphasis was on using R to answer specific community ecology questions with already existing datasets. We found distinct advantages to teaching R regardless of approach and found that there were distinct strengths and weaknesses in each of these two approaches.

Overall, we found that the scientific thought process was reinforced as students made observations from the dataset they were assigned (CE) or which they created (FE) and asked questions they could answer in a stepwise fashion that was clearly traceable in their code. In both courses, students were required to prepare their data for analysis by first fixing mistakes in the data set, removing missing data points (NAs), finding outliers, and subsetting data sets to obtain variables relevant to their question. With R, data management steps such as these can be accomplished in a few lines of code, making them easy to include in teaching. R is far more flexible than a spreadsheet in allowing students to conduct exploratory data analyses, to quickly visualize data, and to look for outliers prior to statistical testing. Further, students can use these visuals to actively predict how a statistical test might turn out, a good practice in scientific thinking. By writing and commenting code, students are able to separate their scientific thoughts from analytical steps, resulting in more clarity of thought regarding their analysis. The process of writing and commenting code as part of an analysis also helps students learn practices in reproducible research. Further, the use of comments and code by students simplified grading of data analysis assignments and aided in troubleshooting problem areas.

The community ecology (CE) course did not have a laboratory component. As a result, students used published datasets that were freely available. By using these datasets in R, students were given an opportunity to apply their understanding of theory and concepts they learned in lecture, and to see expected patterns within the data. Based on student feedback, the most valuable datasets in terms of student interest and learning were those taken from papers students discussed in the course. In CE, the focus was on the application of R to specific community ecology problems, and so students used specialized packages not covered in the GSWR book.

As far as student attitude and interest in having R introduced in this course, LAA found that because the course description explicitly stated that R would be included, students who enrolled expected that learning R would be part of the coursework. In addition, students were encouraged to learn from their errors. For example, in the first lab assignment LAA included an error log to give students a place to record common errors in their code that they could return to later for reference. In-class student reviews were generally positive about learning R, and at least three of the students in the course used R in their senior-year capstone projects.

The consequence of having a course without a lab is that students were limited to already existing data sets. While this saved time so that more theoretical content was covered, students did not have the experience of collecting their own data or practicing good data management for each dataset. However, for their final project, students were required to "clean up" a dataset and subset variables, with minimal experience, before they moved on to data analysis. Another result of focusing more on content was that LAA did not spend a lot of time in class going over R assignments. Feedback was mostly limited to comments on assignments submitted via the learning management system, or in one-on-one meetings during office hours. Finally, there are few textbooks that incorporate R into theory in the field. Our textbook was purely conceptual and most of the assignments were adapted by LAA. Therefore, there was a disconnect between readings and hands-on assignments that may be better integrated with a text that uses R to work through relevant community ecology problems.

Forest ecology strengths and weaknesses

As with CE, students knew from the outset of Forest Ecology that learning R would be a focus of the course. Students used R to manage and analyze data they had collected themselves. A benefit of this course design was that students had the opportunity to directly relate their field observations to the data management and analysis process. When they saw, for example, that the factor variable of "Tree Species" included 4 different versions of "sugar maple," they were able to easily understand that the error was the result of their own errors in data entry and not an abstract problem. Further, because of their connection to the forests, there was a strong motivation for learning R for data analysis to better understand the patterns and processes the students had been observing in the field. This motivation was particularly helpful when the analysis being performed introduced a new concept. For example, near the end of the semester we compared the forests via ordination with the 'vegan' package. Ordination is a multivariate technique that is generally not included in introductory statistics classes. Reducing multivariate data sets was thus not familiar to these students. By the end of the semester, however, their familiarity with R allowed us to focus less on the technical side of how to do the ordination, and more on the conceptual side of how to understand what the results of the ordination meant. Familiarity with the forests from which the data were collected allowed the students to consider the results of the ordination relative to their personal experience with each forest, adding an element of understanding.
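To make the idea of reducing a multivariate data set concrete, the sketch below ordinates four plots by their species composition. It is a stand-in under stated assumptions, not the course's analysis: the course used the 'vegan' package, whereas this example uses only base R so that it runs without extra packages. The ordination method (principal coordinates analysis on Bray-Curtis dissimilarities) is an assumption, since the text does not name the method used, and the matrix `comm`, its plot and species names, and its counts are hypothetical.

```r
# Hypothetical stem counts: rows are forest plots, columns are tree species.
# All names and numbers are invented for illustration.
comm <- matrix(
  c(12, 3,  0, 1,
    10, 4,  1, 0,
     1, 2,  9, 7,
     0, 1, 11, 8),
  nrow = 4, byrow = TRUE,
  dimnames = list(c("forestA_1", "forestA_2", "forestB_1", "forestB_2"),
                  c("sugar_maple", "beech", "hemlock", "white_pine"))
)

# Bray-Curtis dissimilarity between two plots x and y:
#   sum(|x - y|) / sum(x + y)
bray <- function(x, y) sum(abs(x - y)) / sum(x + y)

# Build the plot-by-plot dissimilarity matrix.
n <- nrow(comm)
d <- matrix(0, n, n, dimnames = list(rownames(comm), rownames(comm)))
for (i in seq_len(n)) {
  for (j in seq_len(n)) {
    d[i, j] <- bray(comm[i, ], comm[j, ])
  }
}

# Principal coordinates analysis: summarize the dissimilarities on two axes.
ord <- cmdscale(as.dist(d), k = 2)
colnames(ord) <- c("Axis1", "Axis2")
ord

# Plots with similar composition fall close together in the ordination.
plot(ord, type = "n", xlab = "Axis 1", ylab = "Axis 2")
text(ord, labels = rownames(ord))
```

In this invented example the two plots from each forest sit on opposite sides of the first axis, which is the kind of pattern students could compare against what they saw in the field.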

The primary weakness of this course design was the loss of time devoted to conceptual content. The extended field time meant less lecture time; dividing lecture time between content and learning R meant that we covered less forest ecology content in less depth. Student reviews were positive, both in terms of learning R and in terms of learning field skills; some students expressed a desire to have learned more course content. That five of the 11 students in FE opted to enroll in a course to expand their R skills the following semester is testament to the fact that students found value in their growing ability in R.

Recommendations to other ecology instructors

Because R is becoming increasingly prevalent in the ecology field and undergraduates with an R background will be better prepared for post-baccalaureate positions, we first and foremost recommend that other ecology instructors use R in their courses when conducting data analysis. We feel that, depending on their course goals, instructors could take either approach we outlined for our courses and successfully incorporate R into the classroom. Regardless of the approach taken, the following considerations should be made for a successful experience:

1. Be aware that teaching R is different from teaching statistics. In our experience students were weak in statistical skills and, prior to our courses, were unaware of the concepts of reproducibility and documenting steps in data analysis. Students are able to learn R without having a strong background in statistics, but may need some statistical practice in addition to learning the programming language. Sarvary (2014) recommends an approach in which R programming is taught alongside statistics early in a lab section, and then students use both of these skills concurrently throughout the remaining coursework.

programming through senior-year research projects. We have also encouraged students to attend a local R Users Group to learn and practice new skills. Furthermore, advising students to take additional courses that are available, such as an advanced statistics or ecological modelling course, may also be a way to continue their preparation. Both LAA and ELB have found that, after teaching an R course, we have had students return with R-specific questions about independent research or to seek advice on coursework that would expand their R knowledge. By incorporating R into the curriculum, we have increased the institutional support for learning R and have served as resources for our current and former students.

In the last decade, R has become the de facto application for data analysis in ecology and its use is increasingly required for post-baccalaureate students joining the ecology workforce or pursuing graduate school. The National Science Foundation "Vision and Change" document (AAAS 2011) identifies several core competencies for undergraduate biology education, three of which can be developed through teaching R: ability to apply the process of science, ability to use quantitative reasoning and ability to use modeling and simulation. As more and larger data sets become available, and as there is a growing push for reproducible research, exposing students to basic data management skills and basic programming in addition to statistics will become even more important. We encourage those instructors who have not yet done so to consider adding some instruction in R to their course designs.

The authors would like to thank the Saint Lawrence University Forest Ecology students

positions that are not specified as masters or doctoral positions, "Job" refers to a non-degree-seeking position, "MS" is a Masters Program, "Other" is anything requiring R experience not included in the other categories (e.g. workshops), "PhD" is a doctoral program, and "Postdoc" is a postdoctoral program.