The Data Mine model for accessible partnerships in data science

The Data Mine at Purdue University is a pioneering experiential learning community for undergraduate and graduate students of any background to learn data science. The first data-intensive experience embedded in a large learning community, The Data Mine had nearly 1300 students in academic year (AY) 2022–2023 and nearly 1700 students for AY 2023–2024. The Data Mine embodies data-infused education, research, and collaboration. Students learn Python, R, SQL, and shell scripting while working on weekly projects within a high-performance computing (HPC) cluster. In the Corporate Partners cohort, students work on teams of 5–15 students, led by a paid student team leader. Each cohort follows an Agile approach, working on data-intensive projects provided by industry partners and mentored by company employees. Students develop professional and data skills throughout the academic year, from August through April. Many students return to the program in subsequent years, increasing their tenure with a Corporate Partner. Student teams are inherently interdisciplinary; students from 133 different majors are involved in the program, ranging from new incoming students through PhD-level students. These interdisciplinary teams bring new perspectives to challenging problems in which data science is a key part of the solution, and they foster an environment in which ideas and solutions are synthesized. Students bring different life experiences, different levels of technical skill, and, because of the variety of majors represented, different ways of navigating a path to a solution, resulting in more creative and robust solutions than those of a traditional data science program.

To maintain a vibrant research culture in Computational Science, Data Science, and Statistics, it is imperative to have meaningful, rigorous, impactful experiences for students. Early research experiences for students are often precursors for careers related to these STEM fields and their broad applications across many domains. Such activities take many shapes, including (but not limited to) research experiences for undergraduates (REU) programs, capstone experiences, co-op programs, internships, research seminars, and advanced coursework. In this article, The Data Mine gives an overview of one such comprehensive research experience, which originated at Purdue University but has now been expanded to several partner institutions across the USA. Because The Data Mine has become more than just a program and is now emerging as a full-fledged model for student engagement, The Data Mine believes that a critical, in-depth look at the many aspects of this model is valuable. Jaiswal (2022) wrote a PhD thesis about this emerging model. There are also a handful of papers about the educational aspects of The Data Mine, but there is not (yet) a comprehensive survey about the technical aspects of The Data Mine or the extensive partnerships with industry. This WIRE review is intended to fill that gap.
The Data Mine is a living and learning community at Purdue University, in its sixth year as of fall 2023. It has experienced organic growth in students, experiential projects, and staff, and continues to expand locally and regionally as an emerging leader of experiential student learning in the data sciences. Students are recruited into The Data Mine during university recruiting events for high school students during the fall and spring. In addition, The Data Mine team actively visits various classes throughout the year to recruit existing Purdue students. The team also meets with academic advising teams so that the advisors can pass on the opportunities to their students. During many of these on-campus recruiting events, prospective students say that they are planning to apply to Purdue (or did apply) specifically because they want to join The Data Mine program. Students from various majors are able to live together in the same residence hall, work on Data Mine projects together, attend the Seminar held in their residence hall dining court, and show up to in-person office hours in that same residence hall. Aside from the convenience this programmatic structure offers to its students, this community creates a space conducive to feelings of belonging and inclusiveness.
In fall 2023, nearly 1700 students, representing more than 130 majors, participated in The Data Mine. These students ranged from freshmen through Masters and PhD levels. The Data Mine equips learners from any major or program to develop data competency skills within their field of specialty. In addition, the diverse range of majors increases the students' ability to solve real-world challenges. The communal environment is conducive to student collaboration. The students can more easily share their ideas, perspectives, tools, and approaches while problem solving. Furthermore, these skills allow graduating students to differentiate themselves from their peers as they seek job opportunities. The Data Mine model empowers the students to excel early in their careers. Students in the program can make sense of large, complex data sets using modern data analysis environments, tools, libraries, and methodologies.
In addition to the diverse majors represented by the students, The Data Mine is passionate about creating an environment that embraces diversity, inclusiveness, and accessibility. All students are welcome and encouraged to apply to the program, including students from marginalized communities, diverse racial and ethnic backgrounds, women, LGBTQA+ students, deaf and hard of hearing students, blind students, and students from lower socio-economic communities. The current applicant pool for The Data Mine at Purdue West Lafayette, including applicants to spring 2024 and/or fall 2024, is made up of 81% undergraduate students and 19% graduate students; 16% first generation students; 34% female students; 32% Asian, 3% Black or African American, 3% Hispanic/Latino, and 3% two or more races; and 9% of students listing a survey response indicative of a disability. The Data Mine is committed to ensuring a safe space for students from any background and ability level to further their competencies. Currently executing on a three-year grant from the National Science Foundation, The Data Mine additionally includes 300 students from dozens of Minority Serving Institutions (MSIs) throughout the United States. These students have been able to participate in the research coordinated with Corporate Partnerships started at The Data Mine's program at Purdue University. They work on data-driven projects alongside their Purdue teammates, collaborating in a synchronous, real-time modality, using online meetings and collaborative environments.
Not only is The Data Mine an opportunity for students to learn and grow technically and professionally within their major, but it also represents a model within a large university setting that is fiscally sustainable, without the need for ongoing backstop funding. For example, by partnering with corporations who contribute nominal donations toward these student-led projects, The Data Mine can employ a team of staff to support the various goals and objectives within the program. These small, organic partnerships also have the potential to lead to larger Sponsored Research contracts.
Especially for companies that operate in the same regional economy as the university and have a vested interest in the development of students within the state of Indiana and the Manufacturing Belt region, the program creates a fiscally responsible pipeline for student talent into the job market.

| ABOUT THE DATA MINE
2.1 | Why (living) learning communities?
The benefits of peer learning, experiential learning, and residential learning models are well known and thoroughly studied. In particular, the concepts of learning and working in communities have been examined for more than 30 years (see Lave & Wenger, 1991, for one of the foundational studies about "Situated Learning"). For example, situated learning researchers posit that the process of learning occurs as a component of social practices rather than through more traditional classroom-based processes (Lave & Wenger, 1991). Moreover, learning with one's peers in a communal environment bolsters the potential for persistence and increases one's self-efficacy in STEM disciplines. See, for instance, Margolis and Fisher (2002), which also considers, through a four-year longitudinal study, the impact of gender and race on persistence in the computational sciences. Gundlach and Ward (2021) and Ward (2015) also emphasize the benefits of learning communities in Statistics. Caviglia-Harris (2022) examines the positive effect of living learning communities on undergraduate student retention and GPA. In addition, Timms et al. (2018) researched how students experience a sense of well-being as a result of participating in learning communities. Furthermore, Pistilli (2009) studied how female STEM students who participated in learning communities experienced success as a result of participation. The increasing research and literature on learning communities evidences the importance of the concept and its impact on student learning outcomes.

| History of the statistics living learning community
The origins of The Data Mine can be traced back 15 years, to a series of workshops coordinated by Nolan and Temple Lang (2010). Dr. Mark Daniel Ward, Executive Director of The Data Mine at Purdue University and co-author of this article, attended their 2008 and 2009 workshops at UC-Berkeley. Many of the attendees of these early data science workshops have established innovative computational, data science, and/or statistical programs at their own institutions. These early workshops can certainly be considered as formative in the pathways that have given rise to the broad introduction of data science coursework and learning throughout the USA and beyond.
In 2012, Ward and his colleagues wrote a National Science Foundation grant proposal for a new model of data science learning. In this proposed new model, students would live together in a residence hall, while taking a full year of sophomore-level coursework, pursuing research with a faculty member over the same 12-month period, and learning with their peers in a seminar series that highlights the experiences of academic and industry professionals. This proposal was funded by the National Science Foundation's program on Mentoring through Critical Transition Points in the Mathematical Sciences (DMS-#1246818). As a result, the Statistics Living Learning Community (STAT-LLC), supported by a $1.5 million NSF grant titled "Sophomore Transitions: Bridges into a Statistics Major and a Big Data Research Experiences via Learning Communities," ran from fall 2014 to summer 2019.
The first cohort of 20 sophomore undergraduate students began their courses and research in August 2014. During the 5 years (2014-2015 through 2018-2019), 20 sophomores per year (for a total of 100 students throughout the life of the program) lived together in a residence hall to learn data science through experiential projects (3 credits), probability and mathematical foundations (3 credits) and work on large data-driven research problems with faculty mentors (stipend). The students received a 12-month research stipend to include full-time research during the summer after the academic year program. This experience was not limited to students majoring in statistics but welcomed all undergraduate sophomore students from any program at Purdue University. Of the 100 students during the five-year program, 49 majored in Statistics, often combined with Math or Actuarial Science. Students from the STAT-LLC program authored 177 journal papers and conference talks during their time in the living and learning community. In the final year of the program (2018-2019), Purdue institutionalized this program by introducing, in parallel with the 20 students from 2018 to 2019, a new program called The Data Mine.
Recognizing that not all students have the ability to include data science credits in their schedule, The Data Mine changed the number of credits for the data science course. Originally, students in the STAT-LLC took a 3-credit data science course in the fall semester. In The Data Mine model, students instead take a 1-credit data science course each semester (fall and spring). This added flexibility has several advantages: it allows students to join mid-way through the academic year; it fits more easily into a student's course load; and it creates a perception of a lighter and more accessible pathway into the data sciences, especially for students learning these topics for the very first time. The Data Mine offered this revised, experiential, project-based course (now known as The Data Mine "Seminar") to 100 students, at 1 credit-hour each, in the fall and spring semester. This 1 credit-hour Seminar remains an essential aspect of The Data Mine at present. All undergraduate students are required to take this Seminar. Graduate students are welcome too (but are not required to join).
Building off of a 1 credit-hour pilot offered in 2016-2017, this Seminar enrolled approximately 100 students in The Data Mine (in addition to the 20 STAT-LLC sophomores) during 2018-2019. The program grew rapidly, to 600 students in 2019-2020, 800 students in 2020-2021, 1000 students in 2021-2022, 1400 students in 2022-2023, and about 1700 students for 2023-2024. Students apply to participate in The Data Mine through a Qualtrics application that is processed by The Data Mine staff. The application contains a series of questions, including their cohort preference, career aspirations, and background experience. See Section 2.3 below for a breakdown of cohorts The Data Mine offers. The Data Mine staff review applications and assign students to cohorts based on availability and preference.

| General cohort/seminar
For students who have limited availability in their schedule, and therefore cannot join a research experience with a professor or with a Corporate Partner, The Data Mine provides a "General cohort," in which the students take the 1-credit Seminar mentioned above. The majority of students in this General cohort are undergraduate students. They are pursuing more than 130 major programs of study at Purdue. These students range from new, incoming first-year undergraduate students (who often have very little knowledge of data science) to senior undergraduate students who have a broad range of technical skills. Some seniors enter The Data Mine for the first time without knowing any data science; others have been in The Data Mine for the duration of their time at Purdue and are well into advanced data analytics and related technologies. With students from over 130 majors, the majority come from technical backgrounds like computer science, computer information technology, data science, statistics, actuarial science, or various disciplines within engineering. However, many of the students also come from other disciplines in the sciences and liberal arts, such as chemistry, biology, physics, pharmacy, education, English, history, and so on. See Figure 1 below for a picture of the Seminar course.
Students in the General cohort participate for the full academic year and complete one Seminar project per week. In the first semester of The Data Mine, students focus on learning how to perform data analytics with R. In the second semester, they learn Python. In more advanced Data Mine Seminar courses, they also use some bash and SQL. They learn how to scrape webpages, construct data pipelines, use APIs, build models, analyze text and images, and so on. A total of eight levels of Seminar are available. Students in this General cohort also participate in three professional development events each semester as part of their curriculum. These events include various seminars, workshops, and skill-development sessions hosted by other organizations on the Purdue campus, which The Data Mine students can choose from and attend.
Students have a resource called The Examples Book, containing thousands of pages of learning materials and hundreds of videos, over a wide range of topics. A wealth of examples about Python, R, SQL, shell, data visualization, and so on are provided for the students. Beyond topics about tools and languages, the students also have modules about advanced topics. All faculty and staff in The Data Mine team can make contributions to The Examples Book. Students can request permission to have access to the underlying GitHub account too, so that they can add materials.
The students each receive 1 credit per semester for participating in the Seminar. There are no formal lectures or slides associated with the Seminar. Instead, the students work in a project-oriented way, analyzing large data sets, answering questions (which are often open-ended), building models, and trying new libraries and features of the languages and tools that are appropriate for analyzing large data sets. Students are welcome and encouraged to work together on their Seminar projects. The course is held in one of the residence hall dining courts seen in Figure 1.
Teaching assistants are available to support the students from morning until evening each day, onsite in the residence hall where many of the students live. When designing projects for the students to work on, The Data Mine staff consistently chooses data sets that are open source and are focused on topics of broad interest for students, for example, the Internet Movie Database, baseball, donations to federal election campaigns, transportation, alcohol consumption, death records, rental properties, and so on; the data sets are listed at https://the-examples-book.com/projects/data-sets/introduction.

| Corporate partners cohort
Students in the Corporate Partners cohort (3 credits in both fall and spring) work on data-driven projects in partnership with corporate partners for the full academic year (fall and spring semester). The project teams consist of 5-15 undergraduate and graduate students, a paid Teaching Assistant, and a Corporate Partner mentor. Undergraduate students in the Corporate Partners cohort are required to also take the 1-credit Seminar course simultaneously. Graduate students have the option of taking the Seminar course as well, but it is not required. Students in the Corporate Partners cohort work approximately 8-10 h per week on their student-led projects, which includes a 50-min virtual synchronous team meeting with the Corporate Partner mentor, as well as a 2-h in-person lab with their project team. Additional details of the Corporate Partners cohort will be discussed in Section 3.
[...] at the same time, specific to the academic domain (e.g., Actuarial Science cohort students take Probability and Data Validation in the fall, in the same section of the course) and also take the 1-credit Seminar course. Students work with faculty mentors, following a pre-defined curriculum. A key goal of participating in an Academic cohort is to learn how the data sciences are relevant in various disciplines in the university. Data science training and modules are valuable supplements to the standard curriculum in many departments on campus. In the long run, data science modules can be integrated into the curriculum of many departments, to provide a modern, data-driven approach to learning.

| Faculty research cohorts
Many of the students, both graduate and undergraduate, from any major, choose to work on academic research with a faculty member in small groups. Areas of focus range across a broad array of topics, including Nursing, Biology, and Biomedical Engineering. When faculty are applying for federally funded grants, The Data Mine is a natural partner. In particular, The Data Mine can assist faculty members with student recruiting plans, student training about data-driven tools and technologies, student professional development, and so on. The Data Mine also has a much more diverse group of students than is typically found in a STEM-related department, so the team is well prepared to assist with plans for broadening participation.

| Computational infrastructure requirements
The undergraduate and graduate students all work on the Anvil computing cluster described at https://www.rcac.purdue.edu/compute/anvil. This cluster, funded by a $10 million National Science Foundation grant, features 1000 computing nodes with 128 CPU cores each. The students use Jupyter Lab with a kernel that allows them to execute Python, R, SQL, or bash shell commands, including the potential to switch tools from cell to cell. The Data Mine team has also recently expanded the offering on Anvil to include the popular code editors Visual Studio Code and RStudio.
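The idea of moving between languages within one working session can be illustrated without any particular kernel. The following is a minimal sketch, assuming only the Python standard library: Python prepares the data and SQL (through an in-memory SQLite database) performs the aggregation. The table name, column names, and sample rows are hypothetical, and the sketch does not reproduce the kernel or the data sets used on Anvil.

```python
import sqlite3

# Hypothetical sample data: (title, year, rating) rows invented for illustration.
ROWS = [
    ("Film A", 1999, 8.1),
    ("Film B", 1999, 6.4),
    ("Film C", 2005, 7.3),
]


def average_rating_by_year(rows):
    """Load rows into an in-memory SQLite table and aggregate them with SQL."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE films (title TEXT, year INTEGER, rating REAL)")
        conn.executemany("INSERT INTO films VALUES (?, ?, ?)", rows)
        query = (
            "SELECT year, AVG(rating) FROM films "
            "GROUP BY year ORDER BY year"
        )
        return conn.execute(query).fetchall()
    finally:
        conn.close()


if __name__ == "__main__":
    # Back in Python: format the rows returned by the SQL query.
    for year, avg_rating in average_rating_by_year(ROWS):
        print(f"{year}: {avg_rating:.2f}")
```

In a notebook, the SQL step and the Python step would typically sit in separate cells; here they are combined in a single script so that the example stays self-contained.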

| The data mine staff
The Data Mine requires a dynamic team of individuals working together to establish an effective working model that combines students, registration and advising, recruiting, industry partners, academic partners, faculty, and technical and administrative support. The team is led by a faculty director who supports the team of Managing Directors and Managers, broken down into areas such as Operations, Academic Programs and Outreach, Corporate Partnerships, and Data Science. The entire Data Mine team (https://datamine.purdue.edu/about/welcome.html) of 24 professionals works together cohesively to support the student population, its most important stakeholder group.

| CORPORATE PARTNERS
The Corporate Partners (CRP) cohort is the largest cohort within The Data Mine, with 900 students in AY 2022-2023 on 80 research projects. Project teams consist of an average of 12 students, typically three first-year undergraduate students, six upperclass undergraduate students, and three graduate students. Students from the first year of college through the doctoral level collaborate on data-driven projects in partnership with mentors in industry. The CRP experience is designed for students to be mentored by industry professionals while working on projects proposed by the industry partner for the full length of the academic year, August to April. In these projects, the students receive credit, rather than getting paid, for their research. The projects are registered within the Purdue University course catalog as courses, and students are eligible to sign up for specific projects (i.e., courses) upon acceptance to The Data Mine. The credit-bearing experiences are more like courses, in which the students are learning how to apply data science skills in practice. Because tuition for full-time undergraduate students at Purdue allows for enrollment of up to 18 credits, for most students, participating in The Data Mine requires no extra expenses beyond their existing tuition fees. In addition, as mentioned below in Section 3.3, The Data Mine has had several sponsored research contracts, in which the students are paid and the nature of the contract is much more like industry consulting.

| Corporate partner mentors
Corporate Partner Mentors, often simply called mentors, are the employees at the industry partners who commit to mentoring and leading the projects during the academic year. The mentors propose the research projects for the student teams to work on using the Agile framework originally derived from http://agilemanifesto.org/iso/en/manifesto.html. The Agile framework is a methodology that continues to evolve but is based on the premise that software development (and work processes in general) should be iterative and incremental, with multiple revisions of planning, review, and execution. The Agile framework refers to these iterations as sprints, and each of these sprints lasts 2 weeks in The Data Mine. These Corporate Partners projects usually arise because of a need at the company, for example, to build a machine learning model, perform predictive analysis, construct a natural language model, do some front-end or back-end stack development, and so on.
During the months before the academic year begins, each mentor contributes a brief slide deck describing the project. These project descriptions are written in (relatively) non-technical language on two to three PowerPoint slides, so that students can broadly understand the nature of the project. The first slide consists of a high-level description of the company's mission and the role of the team that the students will work with. The second slide consists of a description of the student project itself, including any necessary background skills, tools that the students will likely learn, potential impact of the project, and so on. In this way, it is easy for students to read many of the project descriptions and decide which team is most suitable for them. The ability to (largely) self-select their project bolsters the students' sense of belonging. Mentors also complete a project charter (common in the Agile framework) to identify objectives and project scope in more detail than the project description. The project charter is only shared with students after they register and sign the non-disclosure agreement (NDA). All of the credit-bearing projects in The Data Mine are covered by an agreement between the university and the Corporate Partners. These projects are governed by a 5-year agreement available at https://datamine.purdue.edu/corporate/sponsoracknowledgment.docx. The annual planning cycle is outlined in Figure 2.
During the academic year itself, the mentors meet with the student team for 50 min each week during the team meeting. Typically, this meeting is held virtually, but occasionally mentors travel to campus to meet in-person. Using Agile terminology, the mentors serve as the product owners of the projects. The mentors provide business perspective to the project and sometimes shift scope of the project depending on company priorities. While the mentors provide project guidance, they are so much more to the TAs and students. The TAs and students value building a relationship with the mentors and having weekly interactions with working professionals. One undergraduate student wrote, "The close interaction with the Corporate Partner Mentors each week is not something I expected to be so intimate or interactive to the extent that it had been so far." At the end of the academic year in May, students submit an evaluation (approximately 1600 student responses over 4 years of assessment). Students report learning professional skills such as communication, project management, and leadership skills from their mentors. By the end of the nine-month academic year, it's difficult to distinguish the students from the mentors. A mentor from industry reported, "It's literally like working with colleagues when working with you guys [students]-your ideas are so great, your thought processes are so wise."

| Teaching assistants
Corporate partner teaching assistants (CRP TAs or TAs) are key to the success of the projects. In Agile terms, the CRP TAs serve as the scrum masters. They are responsible for leading weekly team meetings and weekly labs, grading the student sprint reports every 2 weeks, and seeking assistance when the team faces roadblocks. The CRP TAs are the primary liaison connecting the students, mentors, and The Data Mine staff. Their primary role is that of project manager, not necessarily technical expert or contributor. Each team has at least one CRP TA, so there were more than 80 CRP TAs in AY 2022-2023. Teams that are larger, have more advanced topics, or include hybrid students might have two CRP TAs. The CRP TAs are paid hourly for up to 10 h per week, broken down approximately as follows: 1 h for team meeting, 2 h for team lab, 1 h for administrative work/planning, 1 h for grading, 2 h for individual student support, 2 h for learning/supporting technical needs, 2 h for buffer as needed.
TA hiring for the next academic year commences in January with an application and then an invitation to a group interview. It is essential to complete the interview process before students leave campus for summer. The group interview is key to interviewing 200+ candidates each year in a timely and effective manner. The design was modeled on resident assistant (RA) hiring processes at Purdue. Group interviews allow the candidates to interact in small groups of their peers while The Data Mine staff observe their behaviors in situations that mimic expected TA responsibilities. The groups of five candidates participate in five activities in a 1-h session, with each student leading one activity and acting as a peer in the other four (Figure 3). One Data Mine staff member is assigned per group and annotates observations on a rubric that includes topics on attire, timeliness, general professionalism; communication skills; candidate enthusiasm; initiative and leadership; ability to follow and remain engaged; and teamwork and interpersonal skills. Typically, group interviews are run with four to five groups per hour, so The Data Mine team can interview 25 candidates with the support of five staff members in one session. The Data Mine hosts 8 to 10 sessions between spring break and the end of spring semester. After the group interviews conclude, candidates are moved to the Seminar or CRP track based on their preferences and group interview notes. All candidates who move forward have a 1:1 interview online for 15 min.
FIGURE 3 Student team lead (TA) applicants work together in a group interview activity to assess their teamwork and collaboration skills in a situation with limited resources or instruction.
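The capacity figures quoted for the group interviews follow from simple arithmetic. The short sketch below, written in Python purely for illustration, multiplies the numbers given in the text (five groups of five candidates per session, and 8 to 10 sessions); the function name and its parameters are hypothetical.

```python
def interview_capacity(groups_per_session, candidates_per_group, sessions):
    """Return how many candidates can be interviewed across all sessions."""
    return groups_per_session * candidates_per_group * sessions


# Figures from the text: five groups of five candidates per one-hour session,
# with 8 to 10 sessions held each spring.
per_session = interview_capacity(5, 5, 1)
fewest = interview_capacity(5, 5, 8)
most = interview_capacity(5, 5, 10)

print(f"Candidates per session: {per_session}")
print(f"Candidates per season: {fewest} to {most}")
```

With five full groups in every session, the season accommodates 200 to 250 candidates, which is consistent with the 200+ candidates interviewed each year.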
During the summer, CRP TAs are placed on a project and introduced to their CRP mentor to begin summer planning. TAs are also expected to complete a series of in-person training and virtual onboarding modules before the first day of fall semester. Details are available in their resource book: https://the-examples-book.com/crp/TAs/introduction.

| Students
Students reported (from longitudinal surveys administered annually during the previous 4 years by Purdue's Center for Regional Development) that the Corporate Partners cohort provides them with valuable real-world experience via application of skills, collaboration, and mentorship. One graduating senior reported, "I really think The Data Mine is the most useful class at Purdue. All other classes are very theoretical and do not provide any real-world applications. The Data Mine provides a way to use these skills in real-world scenarios and, besides internships, it is the only way to master your data science skills." Approximately 90% of The Data Mine students are undergraduate students, 9.9% are graduate students, and 0.1% are professional students; however, the current pool of applicants for spring 2024 and fall 2024 indicates an uptick in the number of graduate applicants for future terms. Students in The Data Mine report that they have experienced the value that data science brings to all disciplines. The cohort is accessible to students of any background or level in their academic career. They earn three credits per semester for Corporate Partners, and undergraduate students must also take the one-credit Seminar concurrently. Students are expected to commit to both the fall and spring semester in Corporate Partners and cannot switch projects mid-year, as it would be too challenging to catch up from a semester of missed work. The 5-15 students per team are interdisciplinary in the sense of major, level of experience, technical background, and domain background. For example, a team working on computer vision models of dogs benefits from the domain experience of veterinary students, the technical experience of data science and computer science students, and the project management skills of management students. Working with industry partners on real-world problems allows the students to experience the data science lifecycle and understand the time-consuming and challenging nature of data science projects. The teams apply Agile project management to coordinate the projects, which allows for flexibility as the students learn and develop their skills and outcomes. Sprints last 2 weeks; each begins with a planning meeting and concludes with a retrospective and review meeting.
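The two-week sprint cadence described above can be laid out as a simple calendar. The following Python sketch is illustrative only: the start and end dates are hypothetical (they are not taken from the Purdue academic calendar), and the function name `sprint_schedule` is an assumption introduced for this example.

```python
from datetime import date, timedelta


def sprint_schedule(start, end, sprint_days=14):
    """Split the span from start to end into consecutive sprints.

    Each sprint is returned as (number, first_day, last_day). The final
    sprint is cut short if the span does not divide evenly.
    """
    sprints = []
    number = 1
    first = start
    while first <= end:
        last = min(first + timedelta(days=sprint_days - 1), end)
        sprints.append((number, first, last))
        first = last + timedelta(days=1)
        number += 1
    return sprints


# Hypothetical semester dates, chosen only for illustration.
schedule = sprint_schedule(date(2023, 8, 21), date(2023, 12, 8))
for number, first, last in schedule:
    print(f"Sprint {number}: planning on {first}, review and retrospective on {last}")
```

Under these assumed dates, the planning meeting falls on the first day of each sprint and the review and retrospective on the last; a real schedule would also need to account for breaks and holidays.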
At the end of the academic year, students in The Data Mine's Corporate Partners cohort complete a feedback survey and assessment and report learning outcomes on key professional skills including communication, collaboration, leadership, and persistence. The teams present regularly to their mentors throughout the academic year and once publicly at the symposium in April. The students practice preparing business reviews, technical reviews, and research posters. The students also practice their interpersonal and collaboration skills by working with peers of different experiences, learning to leverage unique strengths, and appreciating the value of teamwork in achieving goals. Most students report never having worked on a project with this many collaborators or for this length of time, which enables them to learn leadership skills such as task delegation, conflict resolution, time management, and critical thinking. Sometimes over the academic year, the project scope changes due to the business needs of the partner or technical barriers, which gives the students the opportunity to pivot. Summer internships are often only 8 to 12 weeks; 9 months with The Data Mine provides comprehensive learning and reflection.
Overall, students report in the feedback surveys that The Data Mine influences their career aspirations. Often students gain a newfound or increased interest in data science related careers by discovering career paths they were previously unaware of or had not considered. The Data Mine has many partnerships with companies in domains outside of "big tech" such as aerospace, agriculture, consumer goods, manufacturing, and pharmaceuticals, so students learn about career paths in data science at companies they would not have previously considered. In addition to finding their passion for data science, students also gain self-efficacy in their ability to work on large data-driven projects. The Data Mine experience provides students with concrete experiences in data science that can be used as conversation topics in interviews, which students report makes them more competitive for jobs.

| Sponsored research
As mentioned above, the majority of the Corporate Partnerships in The Data Mine are focused on student experiences for which the students receive credit, and the students are not being paid. Over time, however, as relationships between The Data Mine and its Corporate Partners evolve and mature, these companies are eager to work on Sponsored Research projects with The Data Mine. This often happens after just 1 or 2 years of collaborative work. During a Sponsored Research project, the relationship between the university and the Corporate Partners is much more like a real-world consulting experience for the students, where they are paid for their time and expected to complete an agreed-upon deliverable. The contract is more formal. The budget is larger and more complex. Students and faculty members can be held to timelines and can be obligated to produce deliverables that are specified in the contract.
The quality of mentoring from employees at the companies that offer Sponsored Research projects has (so far) tended to be excellent. Companies are committed to helping students learn how data science is used in practice and how it supports all aspects of the business world. Students learn how data scientists in industry are embedded in many different ways at companies: sometimes data scientists work as consultants who are centrally managed but work on projects throughout their company, and other times data scientists are embedded directly in small or medium sized teams of three to five people. In this latter model, the data scientists are especially well positioned to enable students to learn not only data science skills but also domain-specific skills related to the algorithms, methodologies, models, and so on, that the students learn from the team.
Students typically sign NDAs for credit-bearing projects. In contrast, Sponsored Research projects are tailored to the specific partnership and company. The university typically assumes some potential legal liability from Sponsored Research projects, for example, due to possible breaches of data security, confidentiality, mission-critical projects not delivered as agreed to, and so on. Students meet with their mentors from the companies much more regularly during Sponsored Research projects, often on a daily basis. The students work in the same modality as employees, usually with computing accounts at the company, broad access to databases or cloud-based computing resources, hardware from the company, and so on.
Despite the complexities of establishing Sponsored Research projects, it is impossible to over-emphasize how meaningful these projects are for students. Such opportunities give students a tremendous amount of insight and experience about how data science is used in practice in industry.

| Symposium and end-of-academic-year
In late April each year, The Data Mine students present their work at the Corporate Partners Symposium. The Symposium was held virtually from 2020 to 2022 due to the pandemic, but in April 2023, The Data Mine hosted the first in-person symposium at the Purdue Memorial Union (i.e., in the student union). A total of 80 posters (one per project) were on display. Many mentors and alumni were among the more than 1000 attendees (Figures 4 and 5). Since most of the work is bound by NDA, the students value the opportunity to visit their peers' posters and learn about the diverse topics of research completed. The posters from the 2020 to 2023 symposiums can be found here: https://datamine.purdue.edu/symposium/index.html. During this celebration of the year's activities, many partners participated in a luncheon, networking, a keynote address about the power of data science especially in broadening communities and participation in the STEM workforce, and so on.
Immediately following the Symposium, The Data Mine has a meeting of the External Advisory Council (EAC), with one representative per Corporate Partner. The number of participants varies from year to year based on the number of Corporate Partners with which The Data Mine has active projects and which companies choose to participate, but in past years it has generally been approximately 40-50 representatives. A rotating external chair (selected annually by The Data Mine staff) leads the meeting through five questions requesting live feedback from partners, though anonymous feedback is also welcome. The Data Mine staff observes during this meeting but does not contribute, so that unbiased feedback can be collected. It is a time for The Data Mine to listen to live feedback, thoughts, questions, comments, insights, and so on, from the mentors on the EAC. The EAC chair summarizes the feedback in a report to The Data Mine staff. The Data Mine staff responds with a plan of action in a follow-up report and publishes it to the partners, so that The Data Mine is held accountable at the next EAC meeting.
Also, at the end of the academic year, the students and the teaching assistants all complete a comprehensive feedback survey about their experience. Such feedback allows The Data Mine to make adjustments and corrections before the next academic year. These anonymous surveys also provide longitudinal tracking and evaluation from year to year. A key goal is that, as The Data Mine continues to rapidly increase its population of students, industry partners, and academic partners from year to year, it still provides high quality projects and research experiences for students.

| EXPANSION
The Data Mine model of student learning, Corporate Partnerships, and research experiences constitutes a welcoming immersion into the data sciences and their applications across many domains. Institutional profiles, however, vary broadly. Student experiences and backgrounds also vary considerably from one institution to another, and indeed, even from department to department and from program to program, within the same institution. Even so, The Data Mine model is sufficiently mature and has evolved such that it can be replicated at many other institutions. Moreover, The Data Mine staff are firmly dedicated to making data science opportunities accessible and supporting students, faculty, research scientists, staff members, and Corporate Partners mentors, at a wide range of institutions (e.g., public/private companies, public/private universities and colleges in all disciplines, including two-year schools, etc.).
The National Data Mine Network (NDMN) provides funding for 300 undergraduate students over the three-year life of the grant (i.e., an average of 100 students per academic year) to participate in The Data Mine, including the Seminar course and a choice between the Corporate Partners program or performing research with a professor. Funding is split into nine monthly payments made directly to the students by the American Statistical Association (ASA) during the academic year, with a small portion set aside for conference travel reimbursement. The NDMN students are undergraduates pursuing degrees at Minority Serving Institutions, including Historically Black Colleges and Universities, Hispanic Serving Institutions, and Tribal Colleges and Universities. Communications were sent to all Minority Serving Institutions in the United States. During the first year of the program, 38 colleges and universities applied to participate, and all eligible students who applied (e.g., undergraduate US citizens) were accepted and were represented in the program. Students who applied but were either graduate or international students were also accepted but were not eligible for stipend funds, as articulated within the government funding terms. During the second year of the program, twice as many students applied to participate as compared to the first year.

| National Data Mine Network
Many of the schools from which NDMN student applicants come do not have in-house data science programs or living learning communities. NDMN provides an opportunity for students to be exposed to a foundational data science learning community from an R1 institution, while simultaneously gaining real-world experiences. NDMN students have the option to either work on a research project with a faculty member, or to participate (in virtual, synchronous meetings) with one of the Corporate Partners teams. When working with faculty and students at other institutions, NDMN staff endeavor to support potential new academic and corporate partnerships that develop organically and blossom into full-fledged sustainable relationships between The Data Mine and NDMN, their academic partners within the NDMN, and their associated corporate partnerships. The workshops (mentioned in Sections 2.2 and 5.1) are a key method of supporting faculty at other institutions who aspire to build Data Mine models that are appropriate for their own institutions.

| Indiana Data Mine
The Indiana Data Mine is another initiative geared toward supporting students throughout the state of Indiana. The mission of this program is to develop regional partnerships with specific technical areas of focus. Moreover, the intention is to meet other departments and colleges where they are, in terms of their level of technical expertise, maturity of corporate relationships, curricula for students, and so on. For example, Purdue Fort Wayne (PFW) is a regional campus situated in a biotechnology corridor in northeastern Indiana. The Department of Mathematics at PFW has relationships with actuarial companies and has recently been starting some new data science opportunities for students, including curriculum, student activities and a new student organization, and new partnerships. The Data Mine at Purdue in West Lafayette has been including students from Fort Wayne in several of the Corporate Partnerships mentioned earlier. Students participate virtually but synchronously, working with the mentors from the companies in real time. In this way, students at Purdue campuses like Purdue Fort Wayne can take advantage of activities at the main campus in West Lafayette.
Beyond the regional campus at Fort Wayne, in fall 2023, The Data Mine team is planning data science opportunities with Purdue-Northwest, Trine University, the University of Notre Dame, and Valparaiso University. Last year, The Data Mine coordinated a partnership with Taylor University; given the university's smaller institutional profile, the partnership was paused for 1 year because of faculty commitments and time conflicts, with the intention to resume working together in the future. For the upcoming 2023-2024 academic year, The Data Mine has approximately 55 students from Indiana universities beyond Purdue West Lafayette who are already committed to participating. The Data Mine staff meet regularly with faculty at many other universities throughout the state of Indiana to plan potential new partnerships, including Butler University, Franklin College, Goshen College, Hanover College, Indiana State University, University of Evansville, and University of Indianapolis. Moreover, the staff have reached out to every college throughout the state to explore the possibility of partnerships.

| DEAF PODS
The National Science Foundation awarded a Convergence Accelerator $750K grant to Purdue in December 2022, for a new program called Developing Experiential Accessible Framework for Partnerships and Opportunities in Data Science (DEAF PODS). Altogether, 19 Deaf and hard of hearing students worked with DEAF PODS during spring 2023. A total of 32 students participated in summer 2023. The students worked with a total of five Corporate Partnerships, three of which were Deaf-owned businesses. Four of these teams worked remotely but synchronously with each other, enabling the students to develop friendships with peers beyond their own institution. One team worked onsite at Purdue during summer 2023. The five companies were 5 Star Interpreting, ASL Education Center, DEAFCYBERCON, the Indiana Family & Social Services Administration (FSSA), and Nationwide Mutual Insurance Company.

| IMPLEMENTATION
The Data Mine aspires to impact students, universities, and industry partners far beyond Purdue University in West Lafayette. Regardless of your affiliation, The Data Mine offers the options below as next steps. The Data Mine developed an academic partners life cycle to support the launch of programs at other institutions. It begins with customer discovery to explore a partnership, meet key stakeholders, and discuss logistics. The first year of the partnership starts with a pilot phase with the commitment of at least one faculty mentor to facilitate a Seminar. The partner may choose to work on a Corporate Partners project supported by Purdue. Academic partners are welcome to utilize Purdue infrastructure for the first year. As a partner transitions to the second-year ramp-up phase, the aim is to increase student participation and establish a corporate partnership with a regional industry partner. At this stage, course management and infrastructure should be primarily supported by the academic partner. By the third year, the academic partner should be planning for sustainability by increasing corporate partnerships and support from its university.
The Data Mine started with one Corporate Partner supporting one project in AY 2018-2019 and grew to 70 partnerships in AY 2022-2023. Partnerships can come from an individual's network including connections with colleagues, university leadership, alumni, and professional societies. Universities can leverage their network via alumni offices, external partnership offices focused on engagement or industry partnerships, or partnerships with companies that already recruit students. Alumni of The Data Mine who have graduated and entered the workforce have launched partnerships between their employers and The Data Mine. These alumni serve as excellent early career mentors to the students, and mentoring provides them with leadership opportunities.
Projects are scoped with Corporate Partners to be value-added: for example, proof-of-concept work or exploring the "art of the possible," low-hanging fruit that has been set aside due to competing priorities, and work that would be a quick win if the partner had a larger staff. Projects should avoid mission-critical work; deliverables are not guaranteed since these are credit-bearing experiential projects and not contracts. Overall, the focus should be on connection and engagement, since the quality of the engagement has more impact than the project scope given that the student team spends (at least) 1 h per week with the Corporate Partner Mentor.
The partner benefits by gaining access to interdisciplinary talent for 9 months compared to 10 weeks in the summer and by building name recognition with students on campus. Partners can also test proof-of-concept projects by exploring innovative student solutions before allocating internal company resources, and can complete value-add projects that might be stuck in the backlog.

| How your company can partner with The Data Mine
The Data Mine developed a roadmap to partnership to simplify the process of engagement. The Data Mine is often referred to as a catalyst to deeper engagement with the university because of its easy engagement model. Supplemental links are available on the roadmap to partnership webpage here: https://the-examples-book.com/crp/mentors/partner
1. Prospective partners start with a 25 min discovery call with a Corporate Partners team lead. This meeting covers details about The Data Mine, scopes potential project ideas, and concludes with next steps detailed in the onboarding checklist.
2. The onboarding checklist includes action items that are completed as early as spring and throughout the summer for planning.
   a. Project descriptions: Partners provide a project description (PD) that enables students to review a summary before registering. Project descriptions are brief two to three slide PowerPoints that include a one slide introduction to the company and one to two slides about the project. No NDA is needed to review the project description.
   b. Project charters: Project charters, unlike project descriptions, scope out the details of the project including roles and responsibilities, computing resources, and schedules. Computing resources typically fall under one of these three options: utilizing Purdue's computing infrastructure (currently Anvil), virtual desktops provided by the company, or physical hardware such as laptops sent to the students. Project charters are only shared with students once they are registered for the project and have signed the NDA (if required).
   c. Legal paperwork: The Data Mine in coordination with legal counsel developed a standard sponsor acknowledgment (https://datamine.purdue.edu/corporate/sponsoracknowledgment.docx) to cover up to 5 years of partnership with a short addendum specific to each annual project. Most students also sign an NDA, and some partners also require an assignment or license of intellectual property (IP) at no cost. This is permissible because students at Purdue retain their own IP, although this is not the case at all universities. The NDA and the IP documents are between the company and an individual.

| CONCLUSION
The Data Mine acts as an exemplar of the power of collaboration between academic institutions and industry partners, and the potential for student success when students participate in experiential learning in a community-based environment. This model provides students with a safe space and adequate support to take on challenges, develop innovative solutions, and apply their problem-solving ability to real-world problems. This program started with only a passionate faculty member and wholehearted encouragement and support from university leadership. It has grown from one Corporate Partner and a handful of students just a few years ago to 80 project teams and to nearly 1700 students in just 5 years. This model has transformed the way that students are learning data science. This program empowers students to transition from their academic careers into their professional careers as emerging leaders in their fields. The Data Mine students have frequent opportunities to participate in professional development events, join field trips to various Corporate Partners' offices and plants, and build relationships with industry professionals. Invariably, students have increased hiring opportunities for permanent positions. This organic recruiting opportunity that exists for both the Corporate Partners and the students is especially powerful because the relationship between the students and their industry mentors is built over several months and is founded on trust and performance. Not only do students in The Data Mine reap enormous benefits of studying data science with real companies and real data, they also benefit from collaborating and building relationships with other students and professionals from a variety of backgrounds, degree programs, fields, and industries. As evidenced by many students from The Data Mine being recruited into full-time roles after graduation, The Data Mine expects the relationships stemming from this model to continue to grow over months and years and serve these students well into their careers.
The Data Mine's Seminar serves as the backbone of the program. It offers a foundational data science curriculum grounded in project-based student learning. Students work on weekly projects whose topics build on each other over several consecutive weeks. The environment is conducive to hands-on, immersive, problem-solving techniques. Projects are completed by the students using the Anvil computing cluster, one of the most powerful computing clusters in the United States. Although Anvil is physically located at Purdue's Rosen Center for Advanced Computing, it is an NSF-funded cluster whose access is managed by the ACCESS team of colleagues at several universities. Anvil operates under a 5-year grant, and planning has commenced for the computing needs of The Data Mine after the grant's end date. Currently, computing allocation requests are available to teams anywhere in the USA. The size of the data and the processing power needed for each project ensure that students become accustomed to utilizing the Anvil system along with tools like R and Python to complete their projects. Students gain access to this data on Anvil from the first day of each term. They quickly acclimate to using these data science tools, rather than being tempted to use desktop tools like Excel or trying to see all of the data with their eyes. They learn to rely on scripts and data science models. With numerous video tutorials that the students can access asynchronously, as well as The Data Mine's Examples Book, a repository of thousands of pages of detailed technical resources, students are equipped to excel in data analysis as it relates to their major or field.
The Data Mine's Corporate Partners program brings together students and industry experts. These students will be the next generation of data-competent professionals and researchers. Operating in an Agile framework, students work together in groups of 5-15 with a student team leader and a corporate mentor. Nearly 100 project teams are planned for 2023-2024. Students participate in an hour-long meeting each week with their mentor and a two-hour in-person lab with their project team. During these labs, the students work on their projects; they then spend another 3 to 6 h outside of meeting times working on the projects, either individually or with their teammates. When the students encounter technical issues that they are unable to resolve independently, the students (or the student team leader) contact The Data Mine's data science team members for technical assistance. The Data Mine team employs professional data scientists to provide student support specifically for such instances.
In addition to Seminar and Corporate Partners, The Data Mine also offers academic projects in various disciplines, including Nursing, Biomedical Engineering, EAPS (Earth, Atmospheric, and Planetary Sciences), Physics, Actuarial Science, Biology, Sports Engineering, Athletics, Student Life, and so on. These students work on academic research projects under the guidance of a faculty mentor. They present their research at various conferences and produce research papers for publication.
Critical to the success of The Data Mine program is the support from Purdue's leadership, including the Office of the Vice Provost for Teaching and Learning. Ongoing support and guidance from university leadership means that The Data Mine is equipped to focus on the students and has the autonomy to configure the structure of the program to best suit the model. By collaborating with academic partners from around the region and the United States, especially Minority Serving Institutions (including Historically Black Colleges and Universities, Hispanic-Serving Institutions, and Tribal Colleges and Universities), and universities supporting Deaf students, The Data Mine is uniquely positioned to equip students with critical technical knowledge and skills that they might not have otherwise been exposed to. The Data Mine is truly a program for any student from any background, experience, race, ethnicity, gender, socioeconomic status, ability status, and so on to learn data science. When students are provided with adequate support and guidance, and the tools and resources necessary to thrive in a STEM environment, the only limits are their imaginations.
Students may also participate in a variety of Academic cohorts. Some examples of Academic cohorts include Actuarial Science, Physics, and Analyzing Digital Gaming and Culture. Students in the Academic cohorts take a course together
FIGURE 1 The Data Mine students meet in a residence hall atrium for Seminar to work on assignments in the one credit-hour course. TAs and instructors mingle to support the active learning environment.
Annual cycle timeline that the Corporate Partners program follows for planning and execution of the program.

1. What is The Data Mine doing well (including education, technology, culture)?
2. What could The Data Mine do better?
3. What does a "successful outcome" in a corporate partnership with The Data Mine look like? (e.g., name recognition, hiring full time, internships, completion of a project, fostering a welcoming and diverse culture)
4. What can The Data Mine do to bring more value to their corporate partners?
5. What can The Data Mine do to better prepare future data scientists from a variety of backgrounds, including both technical and professional development skills?
FIGURE 4 Corporate Partners students working with Raytheon celebrate their research with Corporate Partner Mentor Mike Douglass at The Data Mine Symposium in April 2023. This is only one of 80 posters that were on display during the event.
FIGURE 5 Corporate Partners students working with DORIS celebrate their research with Corporate Partner Mentor Meghan Tooman at The Data Mine Symposium in April 2023. This is only one of 80 posters that were on display during the event.

The National Data Mine Network (NDMN) is an NSF-funded initiative (NSF #2123321), coordinated through the American Statistical Association (ASA) and administered by Purdue University. The principal investigators are Katherine Ensor (Past-President of the ASA, and Noah G. Harding Professor of Statistics at Rice University), Monica Jackson (Deputy Provost & Dean of Faculty at American University), Talitha Washington (inaugural Director of the Atlanta University Center (AUC) Data Science Initiative, a Professor of Mathematics at Clark Atlanta University and an affiliate faculty at Morehouse College, Morehouse School of Medicine, and Spelman College), Donna LaLonde (Associate Executive Director of the ASA), and Mark Daniel Ward (Executive Director of The Data Mine at Purdue University).

5.1 | How to launch The Data Mine at your university
Through expansion efforts discussed in Section 4, The Data Mine has enabled multiple universities to launch Data Mine programs. During spring 2023, The Data Mine hosted a series of five workshops on how to launch a program at your university. Recordings and slides from the first workshop series are available here: https://the-examples-book.com/univ/workshop_outline.