SurF: an innovative framework in biosecurity and animal health surveillance evaluation

The New Zealand Competent Authority, the Ministry for Primary Industries (MPI), conducts surveillance for biosecurity hazards to support New Zealand's biosecurity system. Surveillance evaluation should be an integral part of the surveillance life cycle, as it provides a means to identify and correct problems and to sustain and enhance the existing strengths of a surveillance system. The surveillance evaluation framework (SurF) presented here was developed to provide a generic framework within which the MPI biosecurity surveillance portfolio, and all of its components, can be consistently assessed. SurF is an innovative, cross-sectoral effort that aims to provide a common umbrella for surveillance evaluation in the animal, plant, environment and aquatic sectors. It supports the conduct of the following four distinct components of an evaluation project: (i) motivation for the evaluation, (ii) scope of the evaluation, (iii) evaluation design and implementation and (iv) reporting and communication of evaluation outputs. Case studies, prepared by MPI subject matter experts, are included in the framework to guide users in their assessment. Three case studies were used in the development of SurF in order to assure practical utility and to confirm usability of SurF across all included sectors. It is anticipated that the structured approach and information provided by SurF will not only be of benefit to MPI but also to other New Zealand stakeholders. Although SurF was developed for internal use by MPI, it could be applied to any surveillance system in New Zealand or elsewhere.

and to protect itself from biological risks through the early detection of pests and diseases, and the provision of evidence of pest or disease freedom. Given the importance of these activities to New Zealand stakeholders, it is essential that the performance of these programmes can be assessed to provide assurances regarding the quality of delivery and outputs of these programmes. The importance of understanding, and being able to assess, the quality of surveillance programmes was a focus of New Zealand's Biosecurity Surveillance Strategy 2020 (Ministry of Agriculture and Forestry [MAF], 2009), which identified three strategic goals related to the delivery of quality surveillance:
• The most appropriate mix of surveillance activities is chosen to ensure surveillance programmes meet their specific objectives
• Surveillance delivery is effective, efficient and responsive to changes in the biosecurity environment
• The outputs of surveillance programmes can be relied upon by decision makers.
It is also critical to ensure that surveillance programmes are responsive to change and continually evolve to meet changing biosecurity needs in an efficient and responsive manner. As concluded by Drewe et al. (2015), evaluation can be used to help both identify and correct problems, as well as to protect, enhance and provide assurance on the strength of a surveillance system. Furthermore, in the animal health context, the assessment of surveillance systems is a component of both the import risk analysis and the veterinary services assessment procedures documented by the World Organization for Animal Health (Hendrikx et al., 2011).
The continuous evolution of surveillance systems therefore warrants periodic re-evaluation of their continued relevance and effectiveness and underscores the importance of surveillance evaluation in the surveillance life cycle (Figure 1).
The surveillance evaluation framework (SurF) was developed to provide a consistent generic framework for the assessment of the MPI biosecurity surveillance portfolio, including all of its components. It was also envisaged that, in meeting MPI's cross-sector requirements, this framework could be applied more broadly by others delivering biosecurity surveillance activities. This novel cross-sectoral effort aims to provide a common umbrella for surveillance evaluation in the animal, plant, environment and aquatic (including marine, aquaculture and freshwater) sectors. Here, we present technical details of the framework and its development.

| MATERIALS AND METHODS
In order to collate available information and example materials to inform development of the New Zealand biosecurity evaluation framework, a scoping review methodology was used to rapidly map the key concepts underlying surveillance evaluation in different sectors. The terminology proposed by Hoinville et al. (2013) was used wherever possible to align with existing standards. A surveillance evaluation framework was developed based on these findings. Three case studies were developed to test the developed framework and provide applied guidance to future users.

| Review methodology
A scoping review technique was used for the purpose of creating a common evidence base for the planning and development of the framework. Scoping reviews are considered a useful and increasingly popular way to collect and organize important background information and to gain an overview of the existing evidence base (Armstrong, Hall, Doyle, & Waters, 2011).
Initially, relevant documents were identified through discussions with stakeholders and surveillance experts. Reference lists of identified publications were considered as additional sources of information.
As two extensive reviews, including a full and systematic review of surveillance evaluation in the animal and human health field, have recently been completed (Calba et al., 2015; Drewe, Hoinville, Cook, Floyd, & Stärk, 2012), it was considered most efficient to build on these rather than duplicating the work already conducted. However, to cover most recent publications, the literature search query developed by Drewe et al. (2012) was re-run in Web of Science covering articles published between 2011 and 15 February 2015.
To identify relevant non-animal surveillance publications, a scanning search of the scientific literature database Web of Science was conducted using the Boolean query: Topic = surveillance AND Title = ((surveillance AND (evaluat* OR analy* OR perform*)) OR (evaluat* AND perform*)) AND (environ* OR marine* OR plant*).
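To illustrate the logic of this query, the wildcard matching it implies could be sketched as follows. This is a hypothetical Python filter, not the actual Web of Science implementation: the function names are illustrative, and only the Title portion of the query is modelled.

```python
import re

def term(stem: str) -> str:
    # A wildcard term such as evaluat* is treated as a stem followed by
    # any word characters, anchored at a word boundary.
    return rf"\b{stem}\w*"

def title_matches(title: str) -> bool:
    # Apply the Title portion of the Boolean query:
    # ((surveillance AND (evaluat* OR analy* OR perform*))
    #   OR (evaluat* AND perform*))
    # AND (environ* OR marine* OR plant*)
    t = title.lower()

    def has(stem: str) -> bool:
        return re.search(term(stem), t) is not None

    clause1 = has("surveillance") and (has("evaluat") or has("analy") or has("perform"))
    clause2 = has("evaluat") and has("perform")
    sector = has("environ") or has("marine") or has("plant")
    return (clause1 or clause2) and sector

print(title_matches("Evaluating the performance of marine pest surveillance"))  # True
print(title_matches("Plant disease outbreaks in Europe"))  # False
```

In this sketch, a title qualifies only if it satisfies one of the surveillance/evaluation clauses and also mentions a non-animal sector term.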
Through the use of wildcards (*), articles containing any variation of each of the search terms were identified. All articles published in the last 20 years (1995 and later) were included. To cover unpublished work, the grey literature was investigated through a Google web search built on the core search terms as described above (surveillance AND (evaluat* OR analy* OR perform*)). The first 200 results were assessed and, if relevant, findings were included in this report.

FIGURE 1 Evaluation as part of the surveillance life cycle

| Framework development

A project team was assembled consisting of subject matter experts from the biosecurity sectors that the framework was aiming to cover. This included MPI experts from the environmental, aquatic, plant and terrestrial animal surveillance teams plus two external epidemiologists. Taking into account the literature review outcomes, the framework was specified during regular face-to-face group meetings over an 18-month period. Case studies were prepared by MPI subject matter experts between September and December 2015, using data and information that were already available. The objective of the case studies was to provide a proof of concept, demonstrating that the framework was robust, complete, fit-for-purpose and user-friendly across the different biosecurity sectors it targets. Further, the case studies were used to identify any framework components that needed rewording or further refinement.

| Review results
The updated search by Drewe et al. (2012) identified a total of 1,531 articles. All titles were scanned by the assessor. If a title appeared relevant to this review, the abstract was retrieved and reviewed. Although a large number of titles were returned by the search, only one additional article (Hoinville et al., 2013) was identified as relevant for inclusion.

| Framework development
Any framework for biosecurity surveillance evaluation must be flexible and generic, as it has to accommodate not only programmes with different objectives but also programmes in different sectors. Following the scoping review and expert discussions, it was concluded that several existing evaluation frameworks, while not originating from a cross-sectoral biosecurity surveillance perspective, could be readily adapted to the New Zealand requirements.
Following a series of expert meetings, it was concluded that SERVAL and EVA were the most suitable tools to build upon. SurF supports four distinct components of an evaluation project: (i) motivation for the evaluation, (ii) scope of the evaluation, (iii) evaluation design and implementation and (iv) reporting and communication of evaluation outputs. Each component describes the activities and decisions related to a phase within an evaluation project. Table 1 provides a schematic overview of the four components and their individual content. The framework and the supporting guidance notes describe the aspects to be considered during each specific activity of the evaluation process. Depending on the situation and the system under evaluation, it might not be possible to assess or describe all components in full detail; any abbreviations from the full protocol are therefore documented to ensure consistency. Further, for convenience, SurF provides users with an evaluation template to support consistency of outputs (Supporting Information 1 SurF Evaluation Template).
SurF includes a total of 29 different attributes (Table 2), which are divided into core attributes (n = 10; highlighted in bold) and accessory attributes (n = 19). Inclusion and categorization of attributes were jointly decided by the different experts participating in the framework development. This included experts representing each of the biosecurity sectors. However, attributes, their definitions and recommended methods for assessment build on existing frameworks, in particular SERVAL and EVA, but also the review of Drewe et al. (2015) and the Centers for Disease Control and Prevention (2001) and ECDC (2014) guidelines on surveillance evaluation and monitoring. SurF also includes some additional attributes, which were developed with the objectives and scope of SurF in mind, for example "Field and laboratory services." Also, some previously proposed attributes were modified to provide the framework with sufficient flexibility to be used across the whole spectrum of New Zealand's biosecurity surveillance portfolio. This was an important component of the development, as existing frameworks were focused on surveillance of human or animal disease while the biosecurity context of this project required extending several definitions to also encompass other risk organisms such as invasive aquatic species or pests of plants. Therefore, consideration was given to compatibility with plant and aquatic health and surveillance terminology. Ecological concepts and related terminology also had to be included to encompass the non-animal health sectors.
As in the SERVAL framework (Drewe et al., 2015), SurF uses traffic-light coding to provide a standardized summary appraisal for each of the attributes.
Within SurF, attributes are grouped into five "Functional Attribute Groups" based on the logic presented in Figure 2. Each group includes at least one attribute that is considered to be a core attribute. Core attributes assess essential aspects common to all surveillance systems, and it is recommended that they be included in all evaluations. If for any reason this is not done, justification has to be provided. The choice of accessory attributes is left to the evaluator's judgement and is not specified in SurF. The choice will ultimately be situation- and sector-specific and may be influenced by factors such as the evaluation question, the surveillance objective or the surveillance system's design.
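The core-attribute rule described above can be sketched as a simple completeness check: every core attribute must either be assessed or carry a documented justification for its exclusion. The attribute names and function below are illustrative placeholders, not SurF's actual attribute list or tooling.

```python
# Hypothetical placeholder names; SurF defines 10 core attributes.
CORE_ATTRIBUTES = {"timeliness", "sensitivity", "data quality"}

def unjustified_omissions(assessed, justifications):
    """Return core attributes that are neither assessed nor justified."""
    missing = CORE_ATTRIBUTES - set(assessed)
    return {a for a in missing if a not in justifications}

# "data quality" is flagged: it was omitted without a documented reason,
# while the omission of "sensitivity" carries a justification.
print(unjustified_omissions(
    assessed={"timeliness"},
    justifications={"sensitivity": "no laboratory data available"},
))
```

A check of this kind would support the framework's requirement that any departure from the full set of core attributes is documented rather than silent.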
Detailed guidance for the assessment of each SurF attribute is provided in dedicated guidance notes. While the aim was to align with existing standards such as those proposed by SERVAL (Drewe et al., 2015), the EVA Tool (Comin et al., 2016; The RISKSUR Project Consortium, 2013) or Hoinville et al. (2013), at times the wording of the guidance had to be adapted to meet the needs of the non-terrestrial animal health sectors. For example, the text had to be extended to also apply to unwanted pest organisms (such as invasive plant or insect species) and hence had to consider, for example, an organism's habitat or the search efficiency of an activity. In addition, a methods catalogue has been compiled to further support users (Figure 3). The aim was to develop a generic framework to allow sufficient flexibility for use across the wide range of MPI surveillance systems and to compare and assess system performance.

FIGURE 3 Visual outputs of performance assessment of attributes using the SurF framework. The format allows comparison between different evaluations or systems (described here as "System 1" and "System 2"). Attributes assessed positively are always placed at the top of the process box, while those in potential need of attention are placed below

| Framework testing
The case studies were commissioned with the goal of testing SurF and providing applied guidance to future SurF users. As such, they provide non-peer-reviewed example evaluations that illustrate the framework in use and are readily available to support MPI users of the framework.
Attribute assessment by SurF is supported by a visual output. At the individual evaluation level, this allows quick assessment of a system's strengths and weaknesses and, in addition to the evaluation template, standardizes the reporting of SurF results across different evaluations. An additional element of SurF is the framework's ability to support the assessment of the performance of MPI's surveillance systems and programmes to provide assurances around the quality of delivery and the outputs of those programmes. This may include business intelligence reporting requirements such as the number of MPI surveillance systems that have elements in need of attention, or the percentage of systems with the majority of attributes rated as good or excellent. However, the latter functionality should be applied with caution as it assumes that all attributes have the same weight. This is almost certainly not the case. Furthermore, previous results could be used to benchmark performance over time, if evaluations are conducted consistently and results are reported in a comparable format. We recommend using this feature mainly for providing a quick overview. Users should still refer to the detailed evaluation text to gain an in-depth understanding of each attribute and its assessment.
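The summary metric mentioned above, the share of systems in which a majority of assessed attributes are rated good or excellent, could be computed as follows. This is a minimal sketch: the rating labels, system names and attribute ratings are hypothetical, and, as noted above, it weights all attributes equally.

```python
def majority_good(ratings):
    """True if more than half of a system's attribute ratings are good or excellent."""
    good = sum(r in ("good", "excellent") for r in ratings.values())
    return good > len(ratings) / 2

def percent_majority_good(systems):
    """Percentage of systems whose attributes are majority good/excellent."""
    n = sum(majority_good(r) for r in systems.values())
    return 100 * n / len(systems)

# Illustrative traffic-light-style ratings for two systems.
systems = {
    "System 1": {"timeliness": "good", "sensitivity": "excellent", "coverage": "poor"},
    "System 2": {"timeliness": "poor", "sensitivity": "fair", "coverage": "good"},
}
print(percent_majority_good(systems))  # 50.0
```

Consistent with the caution above, such a figure is best used as a quick overview alongside, not instead of, the detailed attribute-level evaluation text.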
As outlined by Drewe et al. (2012), until recently there has been little comprehensive evaluation taking into account all aspects of a programme, with quantitative indicators dominating at the cost of qualitative descriptors such as flexibility or acceptance of the programmes (Stärk, 2012). While economic evaluation is strongly recommended as an integral part of a comprehensive evaluation framework, it is not commonly done and can be practically challenging (Drewe et al., 2015). Stakeholder participation or consultation is highly recommended in the literature to capture the programmes' acceptability, sustainability and impact (Calba et al., 2015). The importance of a high standard of documentation, including the value of visual outputs to support practical implementation of an evaluation effort, has been highlighted (Drewe et al., 2015). These were all important considerations in the development of SurF.
Differences in the use of terminology can pose major challenges to collaboration and cross-sectoral efforts such as SurF. Conversely, the use of consistent, specified terminology that is understood across sectors facilitates internal and external communication and the implementation of any evaluation. The development of SurF aided the project team in understanding where terminology and methods differ between sectors, and this new appreciation will likely lead to improved cross-sectoral collaboration in the future. The proposed terminology is based on current good practice of animal surveillance evaluation in an international context (The RISKSUR Project Consortium, 2017) but was extended in close collaboration with subject matter experts to align with the requirements of other sectors. However, it is noted that terminology is dynamic and can vary between sectors. It was therefore recommended that terminology is discussed and updated regularly as the framework is used, to assure a common understanding among users.
Designing and implementing surveillance programmes are becoming increasingly challenging (The RISKSUR Project Consortium, 2013) as factors such as climate change and globalization affect population health and increase the risk of biosecurity incursions. A structured, transparent and logical evaluation process supports outputs that can become a source of assurance and credibility for the system examined (Drewe et al., 2015), both nationally and internationally. To our knowledge, SurF is the first framework of its kind, providing a unique cross-sectoral approach to surveillance evaluation. SurF is accessible via the MPI website: https://www.mpi.govt.nz/dmsdocument/18091-surveillance-evaluation-framework-surf-main-document and https://mpi.govt.nz/dmsdocument/18094-surveillance-evaluation-framework-surf-appendix-1-surf-methods-catalogue.

ACKNOWLEDGEMENTS
This work would not have been possible without recent advances in animal health surveillance evaluation. We would in particular like to acknowledge SERVAL (Julian Drewe and colleagues) and the EVA Tool (RISKSUR project team), which have both informed the development of SurF. The recent work on surveillance terminology by Linda Hoinville has also provided a foundation for this work to build on. Funding was provided by the Ministry for Primary Industries (New Zealand).