Disrupting evaluation? Emerging technologies and their implications for the evaluation industry

This article surveyed different emerging technologies (ET), in particular artificial intelligence, and their burgeoning application in the evaluation industry. Evidence suggests that evaluators have been relatively slow to adopt ET in their practice. However, more recent data suggest that ET adoption is increasing. The article then analyzed whether, and how, ET affect the evaluation industry and evaluation practice. It finds that program evaluation is one of several competing forms of knowledge production informing decision-making, particularly in the government and not-for-profit sectors. Therefore, evaluation faces a number of challenges stemming from ET. It is argued that evaluators must, albeit critically, embrace ET. Most likely, ET will complement evaluation practice and, in some instances, displace human tasks.


INTRODUCTION
Digital technologies have radically changed social life in the 21st century. Today, a large percentage of the global population is connected online. Reportedly, 5.2 billion of the globe's 8 billion inhabitants are online (Datareportal, 2023). Most use digital services for private and/or professional purposes. We are currently witnessing an exponential growth in globally generated data and new ways that such data are put to use in the private, public, and not-for-profit sectors (Nielsen et al., 2017).
Digital users leave footprints about their interests, preferences, consumer habits, and physical whereabouts. Behind user interfaces, powerful computers capture, store, and process data about our online behavior. Data is the new gold.

DISRUPTING EVALUATION
Professionally, digital technologies are also pervasive and affect most industries. Some tasks are automated, or augmented, by digitally driven emerging technologies (ET). ETs are defined as technologies expected to have significant future impact on the domains wherein they are applied (Rotolo et al., 2015).
In a comprehensive analysis, management consulting firm McKinsey concluded: "Our analysis of more than 2000 work activities across more than 800 occupations shows that certain categories of activities are more easily automatable than others. They include physical activities in highly predictable and structured environments, as well as data collection and data processing. These account for roughly half of the activities that people do across all sectors. The least susceptible categories include managing others, providing expertise, and interfacing with stakeholders" (Manyika & Sneader, 2018, p. 2).
No profession or industry is left unscathed by digital technologies. It is not a question of whether an industry and its professions will be affected; it is a question of how. According to McKinsey (2018), the public and social sectors are among those that will be affected the most.
Knowledge-intensive industries are also among those affected. Research, foundational and applied, conducted by academics or practitioners, is equally affected by ET. A recent study focusing solely on artificial intelligence (AI) also observes a potentially profound impact on higher-wage occupations with routine cognitive tasks (Eloundou et al., 2023). Evaluation is no exception.
Whilst it is debatable whether evaluation can be considered a profession and whether its practice demarcates an industry, it remains a form of knowledge that is often procured and delivered in the marketplace (Nielsen et al., 2018a, 2018b) and in competition with adjacent forms of knowledge production (Nielsen & Hunter, 2013). Nielsen et al. (2018c) argue that "evaluation services share an evaluative purpose and are offered through a set of activities informed by specialist evaluative knowledge" (pp. 21-22).
Public and social sectors were among those expected to be most affected by ET. Organizations within these sectors are also vastly dominant procurers of evaluation services in the United States of America (U.S.) and elsewhere (Kinarsky, 2018; Lemire et al., 2018).
Effectively, this means that market dynamics frame whether we evaluate, how we evaluate, and what we evaluate. ETs already are, or are likely to become, part and parcel of what and how evaluators evaluate. A highly salient question beckons: What do ETs mean for the evaluation industry?
In this article, I seek to answer this overarching question. More specifically, the research questions are: (1) Will ETs affect evaluation as an industry? And (2) if yes, how will ETs affect evaluation practice?
To do so, I first review relevant extant literature. I then describe the environmental scan methodology applied. Findings pertaining to the research questions and a discussion of their implications for evaluation practice follow.

Emerging technologies and evaluation
The rapid evolution of ETs and dramatically decreasing costs of storage have enabled innovation in techniques that instantaneously capture, analyze, and visualize huge data repositories. Largely, these developments concern new sources for data capture, such as online searches, social media platforms, satellites, drones, the Internet of Things, mobile phones, telecom records, and administrative registries (structured and unstructured data). Also, new sources for data storage and management, such as cloud computing, distributed ledger technologies, and edge computing, have emerged. Finally, sources for data processing, such as AI/machine learning (ML)-driven text analytics, other quantitative approaches, and visualization, have rapidly expanded. These techniques process structured and unstructured, quantitative and qualitative data. Currently, massive sums are being invested in AI. Some AI solutions are tailored, while others are commodified. Figure 1 presents various ETs considered in this article.

Characteristics of the evaluation market and its industry
There is a relative dearth of research on the commercial aspects of evaluation. Nielsen and his colleagues (2018c) surveyed existing studies as part of a special issue of New Directions for Evaluation (NDE) dedicated to the topic (Nielsen et al., 2018a). They noted a general lack of empirical research on the evaluation market.
In their NDE volume, various contributors offered analyses of the largest segments of evaluation commissioners in the U.S.: the federal government (Lemire et al., 2018) and philanthropies (Kinarsky, 2018). Peck (2018) offered an incisive analysis of the large evaluation providers in the U.S. market, and Hwalek and Straub (2018) surveyed smaller evaluation providers. The issue also presented analyses of evaluation markets in Canada, the United Kingdom, and Denmark (Davies et al., 2018; Lahey et al., 2018; Nielsen & Winther, 2014; Nielsen et al., 2018d).
New publications have appeared since, including a recent issue of NDE dedicated to smaller evaluation enterprises (Martínez-Rubin et al., 2019). These contributions add further empirical findings to our knowledge but do not contradict the overarching conclusions offered by Lemire et al. (2018). I shall therefore summarize key findings here.
The evaluation market is segmented and multi-layered. It may be divided into national, regional, type-of-client, domain, or methodological segments. Different clients and different providers may dominate in each segment.
On the demand side, the public sector, particularly national government agencies, is a dominant procurer of evaluation services. Public institutions are subject to procurement regulations such as framework contracts, which provide access to a portfolio of potential contracts, and requests for proposals (RfPs) for individual contracts. The nature of the contracting schemes (win or nothing) and a limited number of contracts effectively limit market access. Framework contracts tend to favor larger evaluation providers.
Given the dominance of public government, shifting government priorities and sourcing strategies effectively, and quickly, alter market size and composition. Such policies contribute to the longer-term ebbs and flows in evaluation demand.
On the supply side, evaluation services are offered by management and research consulting firms, (semi-public) research institutes, universities, and individual evaluation consultants. A provider's market position depends on which segment it caters to. However, for the largest contracts, a limited number of larger consultancies are dominant (Lemire et al., 2018; Peck, 2018).
Evaluators differentiate themselves in terms of methodological expertise, domain expertise, and utility focus. Larger firms tend to have both a broader range and depth of expertise and offer services beyond evaluation. Often there are strategic partnerships between market actors. Larger evaluation contracts are often carried by consortia consisting of a number of different firms and experts that deliver different parts of a project (Peck, 2018).
Market access barriers are low. There are no formal entry barriers such as credentialing programs. One can easily set up shop and offer evaluation services. Access barriers for larger contracts typically take the form of the resources needed to respond to RfPs and of access to framework contracts.
There are no clear boundaries between evaluation services and adjacent services such as performance auditing, policy analysis, monitoring, and business intelligence. Effectively, many evaluation providers also offer (some of) those services. This may help them remain resilient towards shifting demand for evaluation.

METHODOLOGY
Before moving on to analyzing the research questions, let me outline the environmental scan methodology applied. Unlike a systematic review, environmental scans examine unpublished literature and publicly available information.
The research is based on existing data from published reports and articles. In the field of evaluation, there are relatively few peer-reviewed articles on ETs and their implications for evaluation practice. I conducted a search of titles, keywords, and abstracts of research articles in nine major evaluation journals from 2013 to 2023. The search terms were: "Big Data," "Artificial intelligence," "Machine learning," "Text analytics," or "Internet of Things." This search identified 18 distinct articles. In comparison, a Google Scholar search with the same search terms yielded between 5 and 1,480 results when combining each search term with "evaluation." When scanning the documents, the majority referred to evaluating the predictive performance of AI/ML rather than to integration with evaluation practice. Therefore, grey literature is the most likely source of data.
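The journal search described above can be sketched as a simple term-matching filter over article records. The record fields and sample data below are invented for illustration; the actual screening was performed against journal databases.

```python
# Illustrative sketch: flag records whose title, keywords, or abstract
# contain any of the search terms. Record fields are hypothetical.
SEARCH_TERMS = [
    "big data", "artificial intelligence", "machine learning",
    "text analytics", "internet of things",
]

def matches(record: dict) -> bool:
    """True if any search term appears in the record's searchable fields."""
    text = " ".join(
        record.get(field, "") for field in ("title", "keywords", "abstract")
    ).lower()
    return any(term in text for term in SEARCH_TERMS)

records = [
    {"title": "Machine learning for program evaluation", "abstract": ""},
    {"title": "Stakeholder engagement in evaluation", "abstract": ""},
]
hits = [r for r in records if matches(r)]  # keeps only the first record
```

In practice such screening would also handle stemming and near-synonyms; the sketch only shows the core case-insensitive matching step.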
Apart from online searches, hand searching and citation chasing were applied as search strategies. Herein, relevant reports, books, and articles were identified. Emphasis was placed on documents that provided a discussion of ET for evaluation practice, and preferably provided empirical evidence to that effect. Initial scans indicated that most documents could be categorized as use cases or reflective case studies on the application of ET in evaluation. Only a few manuscripts provided data pertaining to broader market coverage. Given the disparity of the content, no uniform data extraction form was developed.
Instead, an inductive thematic analysis was applied. Thematic analysis is a qualitative research method used across a range of epistemologies and research questions. It can be used for identifying, analyzing, organizing, describing, and reporting themes found within a data set such as existing literature (Nowell et al., 2017). Given the paucity of empirical data on the implications for the industry, most analyses are based on available knowledge of industry dynamics and practical ET case applications.

FINDINGS
In this section I present the findings for each of the two research questions.

Will ET affect evaluation as an industry?
Some professions have rapidly embraced the opportunities offered by ET. Evaluation has been slower in adopting ET (Picciotto, 2020; Raftree & Bamberger, 2014), but there is some work in the area.
Among the first books to focus on the interlacing between ET, specifically Big Data, and evaluation was the anthology edited by Petersson and Breul (2017). Herein, a survey among self-reported evaluators in the mid-2010s documented that about 10% had experience with Big Data (Højlund et al., 2017). No other survey of the demand or supply side was identified. Since then, York and Bamberger (2020) have echoed these findings.
The evaluation community appears to have shown increasing interest in the application of ET in recent years. Examples of ET in use are starting to appear in the peer-reviewed literature (Bonfiglio et al., 2023; Cintron & Montrosse-Moorhead, 2022; Roy & Rambo-Hernandez, 2021). Protagonists call for further cooperation and integration with data science (Bruce et al., 2020; Hejnowicz & Chaplowe, 2021; Raftree, 2020; York & Bamberger, 2020).
According to Raftree (2020), writing in the context of international development evaluation, ETs have begun to proliferate in this field in three distinct waves. The first wave essentially allowed evaluation practitioners to keep doing what they did, but amplified by new sources for data capture (geospatial data, large administrative registries, and mobile phones). The second wave focused on new forms of data capture, such as the Internet of Things, satellites, and drones, and a burgeoning focus on data analytics techniques such as AI and ML. The third wave appeared almost concurrently with the second and explored new technologies for data capture, storage, and data processing. Importantly, Raftree observes, "New disciplines (such as software development and data science) are entering the MERL field, bringing new ideas and ways of working" (2020, p. 15), where MERL denotes monitoring, evaluation, research, and learning.
Raftree's identification of waves may be an appropriate metaphor for the adoption of ET in evaluation at large. Currently, only tangential empirical evidence exists about how ET has spread across domain segments in the industry, and about the extent to which practitioners today have more competencies and experience with ET. The recent proliferation of peer-reviewed articles suggests that, in particular, new ways of processing data such as texts and photographic images are part of the third wave (Cintron & Montrosse-Moorhead, 2022; York & Bamberger, 2020).

When considering these developments, the market dynamics are crucial. As observed by Lemire et al. (2018), the evaluation market is demand-driven. Commissioners of evaluation services frame what is demanded in terms of scope, budget, timeframe, and competencies (and often methodology). The more explicitly commissioners demand ET, the more likely it is to spread throughout the industry. Only one, somewhat dated, study has noted that RfPs in international development evaluation did not request the application of ETs (Forss & Norén, 2017). This may help to explain the relatively slow proliferation of ETs in evaluation. However, there are several indications of emerging use in this domain (e.g., Franzen et al., 2022).
The evidence suggests that it is a question of "how" rather than "if" ET will affect the evaluation industry. Let us therefore consider this question in further detail in the next section.

How will ET affect evaluation practice?
Two decades ago, Maynard (2000) noted that growth in evaluation practice was driven by application to new domains and by methodological innovations that conferred competitive advantages in the market. This observation rings true today. However, let us briefly consider what it implies in the current digital era. Several factors are likely to determine how ET will be applied in evaluation practice. These include:

• Competitive strategies
• Size and duration of contracts
• Nature of the evaluation service
• Capability of the evaluator
• Appropriateness of the technology

In what follows, I provide an analysis of each of these factors.

Competitive strategies. Most evaluation practice is contracted. For larger, and especially government-procured, contracts, the financial terms are fixed. Evaluation commissioners set the terms through requests for proposals (RfPs). Evaluation providers compete on price, quality, and timeframe.
Quality differentiators are methodology, subject matter expertise, and utility (Lemire et al., 2018). Price differentiators are the overall sum and, sometimes, the price for each staffing category. The timeframe differentiator is how quickly the work can be delivered.
Evaluators may compete on price when technology enables the provider to lower the price or deliver more for a fixed price. One example of such application may be to apply drones or satellite imagery to collect data on indicators for household income rather than conducting field visits in difficult-to-reach areas (York & Bamberger, 2020). Another example is the application of a machine learning algorithm to classify the emergence of cybercrime across an entire dataset of police crime registries rather than in a manually coded sample (Naess et al., n.d.).
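The sample-to-population classification strategy described above can be illustrated with a minimal sketch (not the cited study's actual method): a tiny Naive Bayes text classifier is trained on a manually coded sample and then applied to the full registry. All report texts and labels below are invented.

```python
# Hypothetical sketch: multinomial Naive Bayes trained on a manually
# coded sample, then used to label the uncoded registry automatically.
from collections import Counter
import math

coded_sample = [
    ("fraudulent online bank transfer reported", 1),        # cybercrime
    ("phishing email led to stolen credentials online", 1),
    ("bicycle stolen from front yard", 0),                  # not cybercrime
    ("burglary of residential garage", 0),
]

def train(samples):
    """Count word frequencies and class priors for Naive Bayes."""
    counts = {0: Counter(), 1: Counter()}
    priors = Counter()
    for text, label in samples:
        priors[label] += 1
        counts[label].update(text.split())
    return counts, priors

def predict(text, counts, priors):
    """Pick the class with the highest Laplace-smoothed log posterior."""
    vocab = set(counts[0]) | set(counts[1])
    best, best_score = None, -math.inf
    for label in (0, 1):
        total = sum(counts[label].values())
        score = math.log(priors[label] / sum(priors.values()))
        for word in text.split():
            score += math.log((counts[label][word] + 1) / (total + len(vocab)))
        if score > best_score:
            best, best_score = label, score
    return best

counts, priors = train(coded_sample)
registry = ["stolen online credentials reported", "garage burglary reported"]
predictions = [predict(text, counts, priors) for text in registry]  # [1, 0]
```

A production pipeline would use a far larger coded sample and a library implementation, but the division of labor is the same: humans code a sample; the model scales that coding to the whole registry.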
Another strategy focuses on providing a higher-quality service through the application of ET. One driver may be empirical (new sources, new data collection methods, new kinds of analyses). An example is the use of algorithms for building predictive and prescriptive models of child abuse or neglect, combining machine learning and quasi-experimental design (Schwartz et al., 2017).
Another driver may be utility (variability, immediacy, frequency, and granularity of reporting). One example is predicting and preventing evictions in a homelessness prevention program, where real-time reporting delivered granular data at street/block level for immediate action (Nielsen et al., 2017).
Size and duration of contracts. The size and duration of the contract, set by the commissioners of evaluation studies, are likely to be of importance for how ET affects the evaluation market. The human capability, technology, and time invested in conducting algorithm-based text analytics are significant (Franzen et al., 2022; Naess et al., n.d.). It may therefore prove too costly, time-consuming, and cumbersome for small-budget evaluations.
Nature of the evaluation service. Using Nielsen and colleagues' (2018c) definition of evaluation service, such services encompass building monitoring and evaluation systems, conducting evaluation studies, and evaluation capacity building. ET is more likely to be applied in services wherein tasks concern large, recurrent, and potentially automated and scalable data capture, storage, and processing.

TABLE 1 Potential application of ET in different evaluation service lines (monitoring and evaluation systems, evaluation studies, and evaluation capacity building) across categories of tasks.
Most likely, building monitoring and evaluation (M&E) systems will be most directly affected. Such systems provide recurrent streams of data, as opposed to the episodic data from evaluation studies (Nielsen & Ejler, 2008). The streams of data imply recurrent data collection, analytic, and reporting activities that hold the potential for automation. In some instances, M&E system data streams may be more or less automated by ET and displace some human tasks (HTs). Consider automated customer engagement surveys in the private sector: once conceptualized, such services are more or less fully front-end and back-end automated.
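The automation potential of such recurrent data streams can be sketched as follows: each reporting cycle, newly captured records are aggregated into headline indicators without manual processing. The field names and the satisfaction indicator are illustrative assumptions, not drawn from a specific M&E system.

```python
# Minimal sketch of one step in an automated M&E data stream: aggregate
# a cycle's automatically captured survey responses into indicators.
from statistics import mean

def report(records: list) -> dict:
    """Summarize one reporting cycle of monitoring data."""
    scores = [r["satisfaction"] for r in records]
    return {
        "n_responses": len(scores),
        "mean_satisfaction": round(mean(scores), 2),
    }

# One cycle's responses, e.g. pulled from an online survey platform
cycle = [{"satisfaction": 4}, {"satisfaction": 5}, {"satisfaction": 3}]
summary = report(cycle)
# A scheduler (cron, a workflow engine) would run this each cycle and
# push `summary` to a dashboard or email digest, closing the loop.
```

The point of the sketch is the design, not the arithmetic: once capture, aggregation, and delivery are wired together, the recurrent reporting activity runs without a human in the loop.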
Overall, this is less likely in short-term and episodic engagements that rely more heavily on specialist evaluator knowledge, such as many capacity building activities. Here such skills are crucial (but may be augmented by AI tools in training).
Table 1 presents a concise assessment (high, medium, and low) of the potential for displacing HTs with ET automation.
Capability of the evaluator. The application of ET will rely on the skills of the evaluation team. Teams often possess a combination of evaluation methodology and subject matter expertise (Hwalek & Straub, 2018; Peck, 2018). Echoing Raftree (2020), to make full use of ET, evaluation teams will need to add data science competencies to their typical composition.
Appropriateness of the technology. Several aspects of the appropriateness of the technology play a role in how ET affects the evaluation market. Much has been written about the potential ethical problems and inequities of AI (Jasper et al., 2023; Reid, 2023). Undoubtedly, further issues will be raised as ET becomes more widespread.
Across these different factors looms the question of whether technologies such as generative AI applications will replace humans. An in-depth analysis of the implications would require a more thorough task-level analysis for each service comprising the evaluation industry than is permitted here. One can assume two overarching ways ETs may affect human tasks (HTs): displacing or augmenting them.
Most likely, we face a future where evaluators will work alongside, and apply, digital ETs to a much larger extent. Here, some tasks will be fully, or partially, displaced by technology, while others will be augmented.
More or less automatable tasks, such as interview transcription (Da Silva, 2021), translation, screening of documents, and high-level coding of texts, are likely to be delivered by AI-powered solutions (Leeuw, 2020). Some services may be more at risk of substitution than others. For example, sources of data, types of data collection, and tools for data management, processing, and reporting may change. The revolution in technologies such as the Internet of Things and AI-driven tools for data processing implies that what data is collected, how it is processed, and who analyzes and reports it will likely undergo significant changes (Jasper et al., 2023).
Wilson and Daugherty (2018) speak of collaborative intelligence, where professionals and nonprofessionals play a critical role in training, explaining, and sustaining AI to make full use of its potential.
Speaking directly to this point, the McKinsey Global Institute notes: "… the transitions that will accompany automation and AI adoption will be significant. The mix of occupations will change, as will skill and educational requirements. Work will need to be redesigned to ensure that humans work alongside machines most effectively" (Manyika & Sneader, 2018, p. 3).
Some tasks will most likely still be left to humans, such as overall research design, establishing evaluation criteria and standards, selection of sources, critical questioning, evaluative synthesis, and judgment. AI-powered solutions (still) do not consider context, the informative and performative particulars of a text, or what information is missing from a key stakeholder interview. Yet there remains a potential for such tasks to be augmented by AI.
These considerations all assume a continuous demand for evaluation services. However, the market size may be affected by shifts in demand. Currently, the promise of AI seems to be at the top of the hype curve. This implies that demand for evaluation services may be partially substituted by demand for other services. Ebbs and flows in demand have come and gone in the evaluation industry. Depending on their market segment and position, evaluation providers are likely to be affected differently.
Compared to highly specialized boutique evaluation firms, larger firms have more business lines, a broader range of requisite competencies, and greater financial strength, and may therefore be better able to cope with shifts in demand within and beyond evaluation services.
In sum, there are several indications that evaluation at large is in an emerging-approaches phase, and several factors are likely to influence how ET is used. In the next section, building on this analysis, I discuss the wider implications and challenges for the evaluation industry.

DISCUSSION
The emergence of digital technologies is considered a profound challenge to how we apply social science methods (Alvarez, 2016). Homing in on evaluation, it is an industry with considerable buyer power, particularly from the public sector. Therefore, much will hinge on evaluation commissioners. This sector is expected to embrace ET (particularly AI) to a much larger extent than hitherto seen in the U.S. (Desouza, 2018), Europe (Misuraca & van Noordt, 2020), and elsewhere. When what is being evaluated is bound to change, how it will be evaluated, and by whom, will likely change too.
As noted above, AI and other ETs will most likely challenge evaluation practice by displacing some HTs and augmenting others. They will not entirely disrupt the industry and its reliance on specialist evaluation knowledge. That being said, the competency challenge is significant. On the supply side, it is worth considering what different actors should do.
Individual evaluators may want to develop AI literacy to critically understand, identify ethical issues in, apply, and assess ETs (Ng et al., 2021). This kind of upskilling will be necessary as AI evolves at a rapid pace.
Evaluation providers, as in any other professional service industry, must choose adaptation strategies: grow requisite competencies from within, hire the talent, or collaborate with data scientists. Depending on their market position, evaluation providers will choose different strategies. Larger providers will likely invest in competency development programs and hire talent, whereas smaller providers will more likely seek collaboration.
Voluntary organizations for professional evaluation (VOPEs) play a crucial role in positioning evaluation as an indispensable form of knowledge for decision-makers. Boundaries between evaluation and adjacent services are permeable. Evaluation is one of several competing forms of knowledge production informing decision-making, particularly in the government and not-for-profit sectors.
VOPEs must also face the challenge through targeted upskilling and reskilling programs and by incorporating AI literacy in competency frameworks. This requires, on the one hand, clarity on the competencies that evaluators bring to ET-enhanced service delivery and analytics, such as their understanding of theory, causality, validity, ethics, equity, valuing, and judgment (Leeuw, 2020), and, on the other hand, clarity on what evaluators need to learn in order to collaborate with data science.
Educational institutions offering evaluation programs and training must incorporate ET in the curriculum and as a learning outcome.

CONCLUSION
ETs will significantly affect the evaluation industry. How they do so hinges on a number of factors, such as providers' competitive strategies, the size and duration of contracts, the nature of the evaluation service, the capability of the evaluator, and the appropriateness of the technology. ETs will most likely displace some HTs and augment others. They will not entirely disrupt the industry and its reliance on specialist evaluation knowledge, yet evaluators and institutional structures must embrace the potential of ETs.

FIGURE 1 Kinds of emerging technologies available for program evaluation. Note. Adapted from York and Bamberger (2020) and Bruce et al. (2020).