The Prevalence of Burnout and Secondary Traumatic Stress in Professionals and Volunteers Working With Forcibly Displaced People: A Systematic Review and Two Meta‐Analyses


residence but do not cross an international border, remaining in their country of origin, are referred to as "internally displaced" (Crisp, 2010). Individuals can also become displaced in the context of natural disasters, such as hurricanes or earthquakes (James et al., 2014). The term FDP is used throughout the current manuscript to refer to asylum seekers, refugees, refused asylum seekers, and internally and externally displaced individuals.
Due to increases in conflict worldwide, the number of FDP has increased substantially, "from 43.3 million in 2009 to 70.8 million in 2018, reaching a record high" (UNHCR, 2019). In addition to the political currents that FDP must negotiate, the journeys that they face are often dangerous (Bouhenia et al., 2017; Dolma et al., 2006; Farhat et al., 2018; Gerard & Pickering, 2013; Tello et al., 2017), and the processes of seeking asylum in host countries can be difficult to access (Bouhenia et al., 2017; Farhat et al., 2018). In a systematic review of 38 studies, which pooled data from 39,518 internally and externally displaced adults from 21 countries, the prevalence rates for posttraumatic stress disorder (PTSD), depression, and anxiety disorders were found to vary from 3% to 88%, 5% to 80%, and 1% to 81%, respectively (Morina et al., 2018). This evidences the potential emotional impact of forced migration and subsequent repercussions on the mental health of FDP. The observed increase in FDP also places greater pressure on statutory and voluntary services supporting this population globally.
Research suggests that professionals and volunteers working with FDP also experience psychological effects, such as burnout, secondary traumatic stress (STS), and compassion fatigue (Apostolidou, 2016; Guhan & Liebling-Kalfani, 2011; Jones & Williamson, 2014; Robinson, 2013), due to their exposure to the trauma narratives reported by FDP. Burnout is considered to be a syndrome (World Health Organization [WHO], 2019a) comprising three key dimensions: "an overwhelming exhaustion, feelings of cynicism and detachment from the job, and a sense of ineffectiveness and lack of accomplishment" (Maslach & Leiter, 2016, pg. 103). As stated in the 11th revision of the International Classification of Diseases (ICD-11), burnout "refers specifically to phenomena in the occupational context and should not be applied to describe experiences in other areas of life" (WHO, 2019b) or viewed as a standard clinical diagnosis. In contrast, STS describes a worker's trauma reactions that are "secondary to their exposure to clients' traumatic experiences" (Trippany et al., 2011) and can be defined as "the natural consequent behaviors and emotions resulting from knowing about a traumatizing event experienced by a significant other-the stress resulting from helping or wanting to help a traumatized or suffering person" (Figley, 1995, pg. 7). The definition of compassion fatigue has evolved over time, leading to a lack of clarity and validity when quantifying this concept. Originally, compassion fatigue was seen as synonymous with STS (Figley, 1995; Stamm, 2005) and "the equivalent of PTSD" (Figley, 1995, pg. xv). More recently, compassion fatigue has been hypothesized to be a combination of burnout and STS (Stamm, 2010) and is more broadly defined as "a condition characterized by emotional and physical exhaustion leading to a diminished ability to empathize or feel compassion for others, often described as the negative cost of caring" (The British Psychological Society, 2020, pg. 4).
Based on the described impact these syndromes have on one's professional quality of life and the resulting repercussions for service provision quality, it is important to research their prevalence in individuals working with FDP.
Despite ongoing issues related to the conceptualization and operationalization of STS and burnout, these two constructs have been shown to be distinct concepts that can be combined to form a measure of compassion fatigue (Geoffrion et al., 2019). As such, the current review focuses on the concepts of STS and burnout as two separate experiences. The term STS will be used in reference to the trauma symptomology a worker may experience as a result of being exposed to the traumatic accounts of FDP, making it distinct from burnout. Burnout remains distinct from STS in that it does not require a worker to be exposed to potentially traumatic narratives.
To date, no review of which we are aware has systematically identified and pooled prevalence data regarding levels of burnout and STS in professionals and volunteers working with FDP. The current systematic review addresses this knowledge gap. This insight has important implications with regard to professional quality of life, resource allocation, and recommendations to address these areas of need. The current review also identifies and critically evaluates measures of burnout and STS prevalence, with a focus on their capacity to conceptualize and operationalize these concepts. In the absence of a consistent diagnostic approach to identifying burnout in workers and varying approaches to operationalizing STS, a discussion is presented to explore the pros and cons of the measures identified within the review process, how measure selection impacts reported prevalence, and the limitations thereof when attempting to estimate pooled prevalence.

Eligibility Criteria
We included studies that collected data from participant pools defined as professionals and/or volunteers of any age who worked with client groups described as having experienced psychological trauma in any degree, description, or severity, labeled as asylum seekers, refugees, internally or externally displaced individuals, forced migrants, refused asylum seekers, or refused refugees. Studies were excluded if the participant pool did not reflect individuals who were primarily working directly with the previously defined client group at the time of assessment.
The review considered studies in which the prevalence of burnout and/or STS was reported or could be obtained. Only studies that reported data obtained using validated measures were included, with no restriction on the region in which the data were collected. Studies that reported data collected using nonvalidated measures were excluded, as were studies that reported purely qualitative findings and those that used single case study designs. Peer-reviewed manuscripts, defined as those published in an externally peer-reviewed journal, that were written in the English language were considered. Manuscripts that were not peer reviewed, including grey literature, unpublished theses, and dissertations, were excluded. No limitations were placed on the publication date of the manuscript.
Searches were conducted in September 2019. CINAHL Complete, E-Journals, ERIC, MEDLINE Complete, OpenDissertations, PsycARTICLES, and PsycINFO were searched historically to September 2019. The list of search terms used can be viewed in Table 1. All search terms within each individual column were combined using the operator "OR." The combined results of each column were then combined using "AND." As such, all papers that contained one or more search terms from each column in any part of the manuscript were identified for inspection. We included additional search parameters, such as "apply related words" and "apply equivalent subjects," to increase the inclusivity of the search. Two reviewers (Fritha Roberts, Jennifer Lee) independently assessed the titles and abstracts of the generated papers against the eligibility criteria. The concordance rate between the reviewers was 96.7%. The remaining 3.3% of the papers (n = 9), for which the reviewers did not reach an agreement, were reviewed by the research team, and a final decision was reached. The reference lists of the final pool of included papers were screened by the first reviewer for further relevant papers. All identified papers generated from the reference search that either met or were close to meeting the inclusion criteria were sent to the full research team for confirmation. From the references search, a further search term, "torture survivor," was identified, as it was observed to be used within the research literature in reference to FDP. The original search was re-run to include this additional search term, and the resulting new papers were screened by the first reviewer.
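The Boolean combination of search terms described above can be illustrated with a minimal sketch. The terms shown are placeholders, as Table 1 is not reproduced here; only "torture survivor" is taken from the text, and the function name `build_query` is hypothetical rather than part of any database interface.

```python
def build_query(columns):
    """Join the terms within each column with OR, then join the columns with AND."""
    groups = []
    for terms in columns:
        if not terms:
            raise ValueError("each column needs at least one search term")
        quoted = [f'"{term}"' for term in terms]
        groups.append("(" + " OR ".join(quoted) + ")")
    return " AND ".join(groups)


# Placeholder columns (assumption); the terms actually used are listed in Table 1.
columns = [
    ["refugee", "asylum seeker", "torture survivor"],
    ["burnout", "secondary traumatic stress"],
]
query = build_query(columns)
print(query)
```

Under this structure, a paper is retrieved only if it matches at least one term from every column, which is why adding a term to a column can only widen the search.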

Data Analysis
Where available, the prevalence data were extracted from the identified papers. For instances in which the prevalence data were collected but not reported or further clarifications were needed, the author or authors of the paper were contacted. Statistical heterogeneity was assessed using I². Two prevalence meta-analyses were conducted using the statistical program OpenMetaAnalyst (Wallace et al., 2012). The first meta-analysis pooled burnout prevalence data, and the second meta-analysis pooled STS prevalence data. A random-effects model was used for both meta-analyses. For each comparison, the pooled prevalence was calculated and presented with 95% confidence intervals. Data were presented through forest plots. We did not construct a funnel plot to assess small sample size publication bias because no single outcome measure was assessed by 10 studies or more, making such a graph meaningless (Sterne et al., 2011).
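To make the pooling step concrete, the following is a minimal sketch of a random-effects prevalence meta-analysis. It assumes the DerSimonian-Laird estimator applied to raw (untransformed) proportions; the text does not state which estimator or transformation was selected in OpenMetaAnalyst, so this may differ from the analysis actually run. The function name `pool_prevalence` and the counts are hypothetical.

```python
import math


def pool_prevalence(events, totals):
    """Pool study proportions with a DerSimonian-Laird random-effects model.

    Assumption: raw proportions with within-study variance p(1 - p) / n.
    """
    if len(events) != len(totals) or len(events) < 2:
        raise ValueError("need matching lists with at least two studies")
    props = [e / n for e, n in zip(events, totals)]
    if any(p <= 0 or p >= 1 for p in props):
        raise ValueError("proportions of 0 or 1 need a transform or correction")
    variances = [p * (1 - p) / n for p, n in zip(props, totals)]

    # Fixed-effect (inverse-variance) estimate, needed for Cochran's Q.
    w = [1 / v for v in variances]
    fixed = sum(wi * p for wi, p in zip(w, props)) / sum(w)
    q = sum(wi * (p - fixed) ** 2 for wi, p in zip(w, props))
    df = len(props) - 1

    # Between-study variance (tau^2), truncated at zero.
    c = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
    tau2 = max(0.0, (q - df) / c)

    # Random-effects weights add tau^2 to each within-study variance.
    w_re = [1 / (v + tau2) for v in variances]
    pooled = sum(wi * p for wi, p in zip(w_re, props)) / sum(w_re)
    se = math.sqrt(1 / sum(w_re))
    i2 = max(0.0, (q - df) / q) * 100 if q > 0 else 0.0
    return {
        "pooled": pooled,
        "ci95": (pooled - 1.96 * se, pooled + 1.96 * se),
        "q": q,
        "i2": i2,
    }


# Hypothetical counts (cases, sample size) for three studies; not review data.
result = pool_prevalence(events=[12, 30, 45], totals=[100, 120, 90])
print(result)
```

Because the random-effects weights include the between-study variance, small studies carry relatively more weight than they would under a fixed-effect model, which matters when only a handful of studies are pooled.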

Search Strategy Results
In total, 370 papers were identified via the initial search strategy. A total of 15 studies met all the inclusion criteria (Table 2). A further 35 papers were identified following the inclusion of the search term "torture." None of these additional papers met the inclusion criteria (Figure 1).

Assessment of Methodological Quality
Papers selected for retrieval were assessed for methodological quality by the first reviewer (Fritha Roberts), using Items 1-5 and 9-11 of the Critical Appraisal Skills Programme (CASP) cohort study tool (CASP, 2018). Further items were not included as they did not apply to the observational design of the studies under appraisal or were qualitative and thus could not be rated as "yes," "no," or "can't tell" for the purposes of reporting and assessing rater reliability. Answers to items rated as "yes" were deemed satisfactory, whereas answers to items rated as "no" or "can't tell" were deemed unsatisfactory. There was a range of 25%-100% (M = 71.9%) satisfactory answers to the selected appraisal questions, indicating a varying but, on average, acceptable methodological quality within the studies identified. A second reviewer (Bonnie Teague) checked a sample of 20% of the final pool of papers (n = 3) to assess the reliability of the first reviewer's ratings of methodological quality. The first reviewer identified these three papers as being difficult to rate. The ratings were observed to be 100% concordant on satisfactory (i.e., "yes") versus unsatisfactory (i.e., "no" or "can't tell") answers between the two reviewers. This confirmed the validity of the first reviewer's assessment of methodological quality.
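The quality rating described above is the percentage of the eight selected CASP items (Items 1-5 and 9-11) answered "yes." A minimal sketch of that calculation follows; the function name `percent_satisfactory` and the example ratings are hypothetical, and the sketch assumes that every selected item receives one of the three permitted answers.

```python
CASP_ITEMS = (1, 2, 3, 4, 5, 9, 10, 11)  # items used in the review
ALLOWED = {"yes", "no", "can't tell"}


def percent_satisfactory(answers):
    """Return the percentage of selected CASP items rated 'yes'.

    `answers` maps a CASP item number to 'yes', 'no', or "can't tell".
    """
    for item in CASP_ITEMS:
        if answers.get(item) not in ALLOWED:
            raise ValueError(f"item {item} needs a valid rating")
    satisfactory = sum(1 for item in CASP_ITEMS if answers[item] == "yes")
    return 100 * satisfactory / len(CASP_ITEMS)


# Hypothetical ratings for one paper: two satisfactory answers out of eight.
example = {item: "can't tell" for item in CASP_ITEMS}
example[1] = "yes"
example[2] = "yes"
print(percent_satisfactory(example))
```

With eight items, the reported minimum of 25% corresponds to two satisfactory answers and the maximum of 100% to eight.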

Measures Identified in the Included Papers
Two measures were used to assess STS only: the German questionnaire for secondary traumatization (i.e., Fragebogen für Sekundäre Traumatisierung [FST]; Daniels, 2006; Weitkamp et al., 2014) and the Secondary Traumatic Stress Scale (STSS; Bride et al., 2004). The Maslach Burnout Inventory (MBI; Maslach et al., 1996) was used to assess burnout only. The revised fourth edition and fifth edition of the Professional Quality of Life Scale (ProQOL; Stamm, 2005, 2009, 2010) were used to assess both burnout and STS.

FST.
The FST is a standardized, 31-item self-report questionnaire. Participants are instructed to rate items either based on the last week or on the most distressing week in relation to their work, scoring items on a 5-point Likert scale ranging from 1 (never) to 5 (very often). The FST consists of five subscales: Intrusion, Avoidance, Hyperarousal, Parapsychotic Sense of Threat, and PTSD Comorbidities. Of the five included studies that used this measure, one reported an alpha coefficient for internal reliability of .94 (Weitkamp et al., 2014), indicating an excellent internal consistency. A total score of 65-82 is classified as moderate STS, and a total score of 83 or higher is classified as "severe" STS (Weitkamp et al., 2014). The number of participants who reported scores falling within the moderate and severe ranges were combined and used to indicate the presence of STS for the purpose of the meta-analysis. These cutoffs were selected due to the observably lower prevalence rates reported when using the FST as compared to other measures of STS.

[Figure 1. Study selection flow diagram. 15 full-text articles assessed for eligibility; removed: participant pool not primarily working with FDP (n = 1), used the same participant pool as another included study (n = 1); included: references of all papers inspected for further papers (n = 2; references also checked). Search rerun to include key term "torture" in reference to the term torture survivor identified through references search; 35 new papers identified; removed: nonexperimental designs (n = 20), did not focus on working directly with FDP (n = 13), only qualitative data reported (n = 2). No further papers identified.]

STSS.
The STSS is a 17-item, self-report questionnaire scored on a 5-point Likert scale ranging from 1 (never) to 5 (very often). Participants are asked to rate items based on the past week. The STSS consists of three subscales: Intrusion, Avoidance, and Arousal. Of the three included studies that used this measure, all reported on the reliability, with Cronbach's alpha values of .93 (Espinosa et al., 2019), .95 (Lusk & Terrazas, 2015), and .93 (Yeunhee, 2017) indicating excellent internal consistency. A total score of less than 28 (i.e., at or below the 50th percentile) indicates little or no STS, a score of 28-37 (i.e., 51st-75th percentile) indicates mild STS, a score of 38-43 (i.e., 76th-90th percentile) indicates moderate STS, a score of 44-48 (i.e., 91st-95th percentile) indicates high STS, and a score of 49 or above (i.e., above the 95th percentile) indicates severe STS (Bride, 2007). For the purpose of the meta-analysis, the number of participants who reported scores falling within the moderate to severe ranges (i.e., above the 75th percentile) were combined and used to indicate the presence of STS. These cutoffs were selected to align with the 75th percentile cutoff referenced within the ProQOL (Stamm, 2010), with an aim of increasing the homogeneity of these data in preparation for meta-analysis.
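The STSS bands and the case definition used for pooling can be written out as a short sketch. The cutoffs are those cited above from Bride (2007); the function names `stss_band` and `stss_case` are hypothetical, and the valid range of 17-85 is inferred from 17 items each scored 1-5.

```python
def stss_band(total):
    """Map an STSS total score to the severity band cited from Bride (2007)."""
    if not 17 <= total <= 85:
        raise ValueError("STSS totals range from 17 to 85")
    if total < 28:
        return "little or no"
    if total <= 37:
        return "mild"
    if total <= 43:
        return "moderate"
    if total <= 48:
        return "high"
    return "severe"


def stss_case(total):
    """STS counted as present when the score is moderate or above (38+)."""
    return stss_band(total) in {"moderate", "high", "severe"}


# Hypothetical totals for five participants; not review data.
for score in (20, 30, 38, 45, 60):
    print(score, stss_band(score), stss_case(score))
```

Dichotomizing at 38 places the case threshold just above the 75th percentile, which is the alignment with the ProQOL cutoff that the paragraph describes.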

MBI.
The MBI is a 22-item questionnaire on which responses are scored using a 7-point Likert scale ranging from 0 (never) to 6 (every day). The MBI consists of three subscales: Emotional Exhaustion, Cynicism (previously labeled Depersonalization), and Professional Efficacy (previously labeled Personal Accomplishment). These scales are reported individually and cannot be combined. Of the three subscales, Emotional Exhaustion was previously considered to most closely reflect a measure of experienced work stress (Maslach et al., 1996); however, more recently, the Cynicism subscale has been reported to more closely reflect the negative endpoint of burnout. Of the two included studies that used this measure, one reported the reliability of the Cynicism subscale as Cronbach's α = .80 (Yeunhee, 2017), indicating a good internal consistency. On the Cynicism subscale, a score of 5 or below indicates low-level burnout, a score of 6-11 indicates moderate burnout, and a score of 12 or higher indicates high-level burnout. For the meta-analysis, the number of participants who reported scores falling within the high-level burnout range on the Cynicism subscale was used to indicate the presence of burnout.

ProQOL.
The most recent version of the ProQOL (i.e., ProQOL-V) is a standardized, 30-item self-report questionnaire on which items are scored on a 5-point scale ranging from 1 (never) to 5 (very often). The previous version of the ProQOL (i.e., ProQOL R-IV; Stamm, 2005) is largely similar to the ProQOL-V but uses slightly different item phrasing and a 6-point Likert scale (0 = never; 5 = very often). The ProQOL consists of three 10-item subscales: Compassion Satisfaction, Burnout, and STS (referred to as Compassion Fatigue in the ProQOL R-IV). As currently theorized within the most recent ProQOL manual (Stamm, 2010), scores from the Burnout and STS subscales can be combined to produce a measure of compassion fatigue or analyzed separately. Using the t-score table reported in the concise ProQOL manual (Stamm, 2010), the following scores can be observed at the 25th and 75th percentiles, cited as the thresholds for the following cutoffs: 0-15 for low, 16-25 for average, and 26-50 for high-level burnout; and 0-7 for low, 8-16 for average, and 17-50 for high-level STS. Of note, there is an error within the ProQOL-V (Stamm, 2009;) that incorrectly cites the same cutoffs for the Compassion Satisfaction, Burnout, and STS subscales. This raises issues when considering the validity of prevalence data reported using the ProQOL-V for the purposes of pooling within a meta-analysis.
Of the seven included studies that used this measure, five reported on the reliability, with Cronbach's alpha values ranging from .66 to .86 (James et al., 2014; Kjellenberg et al., 2014; Lusk & Terrazas, 2015; Mehus & Becher, 2016; Raynor & Hicks, 2019), indicating internal consistency ratings for the Burnout subscale ranging from questionable (i.e., Cronbach's α = .66; Mehus & Becher, 2016) to good (i.e., Cronbach's α = .84; Raynor & Hicks, 2019) and internal consistency for the STS subscale ranging from acceptable (i.e., Cronbach's α = .70; James et al., 2014) to good (i.e., Cronbach's α = .86; Raynor & Hicks, 2019). For the meta-analysis, the numbers of participants who reported scores that fell above the 75th percentile (i.e., "high" cutoff) were used to indicate the presence of burnout and STS.
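Because the Burnout and STS subscales of the ProQOL use different cutoffs, checking a dataset against them amounts to applying subscale-specific thresholds and counting the participants in the "high" band. The sketch below uses the cutoffs listed above; the names `PROQOL_CUTOFFS`, `proqol_band`, and `high_prevalence` and the example scores are hypothetical.

```python
PROQOL_CUTOFFS = {
    # subscale: (lowest "average" score, lowest "high" score)
    "burnout": (16, 26),
    "sts": (8, 17),
}


def proqol_band(subscale, score):
    """Classify a subscale score as 'low', 'average', or 'high'."""
    average_from, high_from = PROQOL_CUTOFFS[subscale]
    if not 0 <= score <= 50:
        raise ValueError("subscale scores range from 0 to 50")
    if score >= high_from:
        return "high"
    if score >= average_from:
        return "average"
    return "low"


def high_prevalence(subscale, scores):
    """Proportion of participants whose score falls in the 'high' band."""
    if not scores:
        raise ValueError("no scores supplied")
    high = sum(1 for s in scores if proqol_band(subscale, s) == "high")
    return high / len(scores)


# Hypothetical STS subscale scores for four participants; not review data.
print(high_prevalence("sts", [5, 17, 30, 10]))
```

Keeping the thresholds in a single table per subscale makes it harder to apply one subscale's cutoffs to another by mistake.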

Meta-Analyses
For studies in which further clarification or data were needed for the meta-analysis, the study author or authors were contacted. Following return communications, further prevalence data were obtained or clarified for two outcomes of burnout (Kjellenberg et al., 2014; Raynor & Hicks, 2019) and six outcomes of STS (Espinosa et al., 2019; Kindermann et al., 2017; Kjellenberg et al., 2014; Raynor & Hicks, 2019; Weitkamp et al., 2014). We were unable to obtain further information regarding two burnout outcomes (n = 8, James et al., 2014; n = 179, Yeunhee, 2017) and one STS outcome (n = 8, James et al., 2014) due to a lack of author response. It should also be noted that STS outcome data could only be obtained from a subsample of 165 participants out of the total 196 recruited in Weitkamp et al.'s (2014) study.
Due to the errors found within the ProQOL-V manuals (Stamm, 2009, 2010), all ProQOL-V prevalence data were deemed to be potentially invalid until the cutoffs used could be confirmed. Emails were sent to the authors whose findings were potentially affected to request access to their datasets (Lusk & Terrazas, 2015; Mehus & Becher, 2016; Posselt et al., 2019). All but one (n = 31, Lusk & Terrazas, 2015) of the ProQOL-V data sets were obtained and/or checked for the number of participants who fell within each cutoff range as previously defined. As such, the ProQOL data from Lusk and Terrazas (2015) were excluded from the analysis. As Lusk and Terrazas (2015) also reported STS data using the STSS, the STSS outcome data were extracted for the purpose of the meta-analysis. This left a total of three burnout outcomes (n = 218) and one STS outcome (n = 8) outstanding.

Burnout
Two prevalence meta-analyses were conducted. The meta-analysis that combined burnout prevalence data collected using the MBI and ProQOL Burnout scale included six studies. The pooled prevalence of burnout was 29.7%, 95% CI [13.8%, 45.6%], with a large degree of heterogeneity found between studies, Q(5) = 112.42, p < .001, I² = 95.6% (Figure 2). A subanalysis was conducted to assess differences by assessment measure. The assessment tool used to measure burnout had an observable effect on prevalence, with burnout prevalence higher when the ProQOL was used, 32.1%, 95% CI [10.0%, 54.3%], compared to the MBI, 18.9%, 95% CI [13.7%, 24.1%] (Table 3). Following the removal of two studies with a quality rating of 60% or less (i.e., 25%, Raynor & Hicks, 2019; 50%, Mehus & Becher, 2016), the pooled prevalence of burnout was 26.4%, 95% CI [17.7%, 35.1%] (n = 341), with a large degree of heterogeneity found between studies, Q(3) = 6.69, p < .001, I² = 55.1%, indicating that the meta-analysis was not overly affected by study quality ratings. We chose the 60% threshold to allow for the removal of the studies with the most questionable methodological quality while retaining enough studies to provide a meaningful pooled prevalence.
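The heterogeneity percentages reported above follow from the Q statistics and their degrees of freedom through the standard relation I² = (Q − df) / Q × 100%, truncated at zero. A minimal sketch of that check follows; the function name `i_squared` is hypothetical.

```python
def i_squared(q, df):
    """Return I-squared (as a percentage) from Cochran's Q and its degrees of freedom."""
    if q <= 0:
        return 0.0
    return max(0.0, (q - df) / q) * 100


# Q statistics and degrees of freedom taken from the burnout analyses above.
full_analysis = i_squared(112.42, 5)
sensitivity_analysis = i_squared(6.69, 3)
print(round(full_analysis, 1), round(sensitivity_analysis, 1))
```

Both values agree with the reported figures of 95.6% and 55.1% to within about 0.1 percentage points, with the small difference attributable to rounding of Q.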

Discussion
The results of the present systematic review and meta-analyses demonstrate the pooled prevalence of burnout and STS in individuals working professionally and voluntarily with FDP.
A total of 15 studies met the criteria for the systematic review. Prevalence data were obtained from 14 of these studies, including 13 outcomes for STS and six outcomes for burnout. The pooled prevalence of burnout was found to be 29.7%, 95% CI [13.8%, 45.6%], and the pooled prevalence of STS was found to be 45.7%, 95% CI [26.1%, 65.2%]. These findings indicate that just under one-third of the population sampled reported high levels of burnout, and just under half of the participants reported moderate-to-severe levels of STS. These findings are supported within the literature, which evidences that individuals working with FDP experience high levels of burnout, STS, and compassion fatigue (Apostolidou, 2016; Jones & Williamson, 2014; Robinson, 2013).
A significant effect of measure was observed for both the burnout and STS meta-analyses, with observable clustering in the data, as shown within the forest plots. We found that the ProQOL Burnout scale produced a higher estimate of pooled burnout prevalence than the MBI Cynicism scale, suggesting that of the two measures, the ProQOL has a lower threshold for classifying high burnout. The FST was found to produce the lowest estimate of pooled STS prevalence, whereas the ProQOL STS scale was found to produce the highest as compared to the other measures of STS. This suggests that the FST has a higher threshold for classifying the presence of STS, whereas, again, the ProQOL has a lower threshold. This brings the validity of the ProQOL 75th percentile cutoffs into question with regard to a potentially oversensitive threshold for assessing the risk of burnout and STS. This propensity toward overinclusiveness is discussed in the ProQOL-V manual (Stamm, 2010), which argues that a lower cutoff threshold is preferable, as it is better to include someone who is not at risk than to exclude someone who is at risk of burnout or STS when screening for these areas of difficulty. When considering which measure to use to assess STS or burnout prevalence, it is worth noting the varying degrees of prevalence these measures produced and comparing them with the reliability of the measure and potential complexities of the population of interest. For example, although the FST was found to have excellent internal consistency, it currently only appears to be validated in the German language with participants working in Germany. It also appears to have a higher threshold for classifying STS, meaning that lower prevalence figures are likely to be observed. The ProQOL STS and Burnout subscales could be used instead, with careful consideration of the cutoffs and an awareness of how comparable research has applied them.
It should be noted, however, that the ProQOL had the poorest internal consistency ratings, which ranged from questionable to good. The proposed construct validity of the ProQOL is also debatable, with studies reporting a two-factor model as opposed to the proposed three-factor model of professional quality of life (Geoffrion et al., 2019), leading to questions regarding the ability of the ProQOL to differentially conceptualize and operationalize burnout and STS and discriminate these concepts from other concepts (Cieslak et al., 2014). Despite this concern, the ProQOL has been widely used and is available in numerous languages, making it appealing for global research. The STSS may be the tool of choice when assessing STS prevalence, as it has numerous, clear, and well-defined cutoff thresholds, selected with reference to percentile scores. It has also been shown to demonstrate excellent internal consistency, as measured in three studies within the present review, and very low levels of heterogeneity between studies (i.e., I² = 0.7%). A further benefit is that the thresholds used to categorize high levels of STS appear to sit conservatively between the lower ProQOL and higher FST thresholds. At 17 total items, the STSS is also briefer than the 31-item FST. This conclusion is supported by the results of a further systematic search (Watts & Robertson, 2014), which suggests that the STSS is the only validated measure for assessing STS and recommends its wider application.
Based on the results of the present review, we were unable to draw conclusions regarding which measure of burnout may be preferable. The current literature, however, points away from the ProQOL and toward the MBI as the more robust, valid, and rigorously tested measure of burnout (Cieslak et al., 2014; Watts & Robertson, 2015). As such, future research seeking to assess burnout may consider using the MBI in favor of the ProQOL Burnout subscale.
To our knowledge, there is no current publication that reports on the prevalence of STS and burnout in the general public; thus, there is no comparator. However, the current prevalence figures can be compared with those reported in other helping workforces in highly stressful occupations. First, in a meta-analysis that combined MBI data collected from 464 nurses working in obstetrics and gynecology, the pooled prevalence of high burnout was lower than the pooled prevalence found in the current study (i.e., 19% vs. 29.7%; De la Fuente-Solana et al., 2019) as measured using the MBI Cynicism scale, which is the subscale most closely related to burnout. In a further meta-analysis, which pooled a sample of 1,600 pediatric nurses, 21% of participants reported high burnout on the MBI Cynicism scale (Pradas-Hernández et al., 2018), which is also observably lower than the present pooled prevalence of 29.7%. In a study that assessed STS in a sample of 128 trauma nurses, 27.3% reported high-level STS on the ProQOL R-IV (Hinderer et al., 2014), which was lower than the present pooled prevalence of 45.7%. In a further study that assessed STS using the STSS in a sample of 63 United Kingdom-based police officers, 27% of the total sample reported STS within the moderate-to-severe range (MacEachern et al., 2019), which was also lower than the present pooled prevalence of 45.7%. In a third study, which assessed STS in a sample of 118 clinicians based in a hospital treating traumatically injured patients, 19.5% of the sample reported STS within the moderate-to-severe range of STSS scores (Roden-Foreman et al., 2017), which was again lower than the present pooled prevalence. Taken together, this research suggests that individuals working globally with FDP experience higher levels of burnout and STS compared to the majority of those in other helping professions.
The present results and comparators suggest that further support should be offered to individuals working with FDP in light of the high risk for burnout and STS, with a focus on preventing, monitoring, and treating these areas of difficulty. This support would hopefully allow these professionals and volunteers to continue to provide compassionate, high-level service provision while maintaining their own well-being and work satisfaction. In addition, it may also help reduce potential staff turnover, allowing for improved continuity of service and service-user experience. As we observed that individuals working with FDP experienced higher levels of burnout and STS than the majority of other helping professionals, staff wellbeing support interventions from other similar professions could be adapted and implemented with those working with FDP.
The strengths and limitations of the review and meta-analyses should be discussed. Due to the observable effect of measure on reported prevalence, the ability to pool different measures within a meta-analysis is debatable, thus bringing the validity of the estimated pooled prevalence reported herein into contention. This was reflected in the high degree of statistical heterogeneity observed when pooling the measures (i.e., I² values). The differences in measure-specific reported prevalence may be due to inconsistency in the score cutoffs, the validity and/or reliability of the measures used, or heterogeneity across the differing samples from which these data were extracted. For example, it is important to note that all five studies that used the FST were conducted in Germany, meaning that the data extracted may be less generalizable and comparable to other populations. The scores observed in studies that used the FST may then be viewed as potentially more reflective of the current context of individuals working with FDP in Germany and the overall political stance of acceptance toward refugees in Germany, which is the country shown to host the largest number of refugees in Europe (UNHCR, 2019). The pooled prevalence of studies that used the FST (15.3%) is observably much lower than the pooled prevalence of studies that used the ProQOL (78.5%) and STSS (48.1%), suggesting that pooling data derived from the FST with other measures may have negatively skewed the pooled prevalence identified via the STS meta-analysis. When selecting which score cutoffs to pool for the meta-analysis, we used the 75th percentile as a cutoff when possible, with an aim to increase the homogeneity of the data. This was achievable with the STSS and ProQOL; however, it was not possible with the FST or MBI. To account for this, subanalyses of measures were carried out, and data were reported for each pooled measure.
This allows the reader to consider the overall pooled prevalence in the context of how this relates to the measure. The ability to pool measures may have also been affected by the differing and numerous conceptualizations and operationalizations of burnout and STS, as previously discussed. For example, the common unifying quality in the definition of burnout is exhaustion (Cieslak et al., 2014). As the present review pooled data obtained from the MBI Cynicism subscale, which is reported to deviate from this common approach to defining burnout (Cieslak et al., 2014), the resulting pooled prevalence figures may not be representative of the more common definition of burnout.
When conducting the sensitivity analyses, we found that some studies had a larger impact on the estimate of pooled burnout prevalence. In part, this may be because there were fewer studies that could be pooled for the burnout meta-analysis as well as a high degree of outstanding data, meaning that each study held more weight in regard to the pooled prevalence estimate. On inspecting the six studies included in the burnout meta-analysis, the quality rating was found to vary from 25% to 100%, with a mean of 75%, indicating that there was some variance in the methodological quality of the studies, which may have impacted the reported prevalence. Upon inspecting the study that had the largest negative impact on pooled prevalence (Mehus & Becher, 2016), we found that this study had a quality rating of 50%, which may have impacted the reported prevalence. Similarly, upon inspecting the study that had the largest positive impact on pooled prevalence (Raynor & Hicks, 2019), we found that this study also had a lower quality rating (i.e., 25%), which may also have affected the reported prevalence. However, after the removal of poorer-quality studies, including those by Mehus and Becher (2016) and Raynor and Hicks (2019), the pooled prevalence of burnout was reduced by only 3.3 percentage points (from 29.7% to 26.4%), suggesting that the study quality ratings did not overly affect the pooled prevalence.
A significant strength of the present review was the systematic approach and high degree of rigor with which it was conducted. An initial search and abstract screening were conducted independently by two reviewers, with a high level of agreement in screening outcomes (96.7%). Inconsistencies were assessed by the research team. Where the search strategy was found to be limiting, it was updated with additional search terms to improve inclusivity. The methodological quality of the papers was assessed, and the ratings for 20% of the papers were verified by a second reviewer, with 100% concordance for satisfactory versus unsatisfactory answers. All but one paper identified as having collected prevalence data were able to be included, with only four outcomes outstanding (n = 3 burnout, n = 1 STS). All of these elements contributed to the rigorous, systematic, and inclusive nature of the search and the validity of the findings.
The present systematic review and meta-analyses highlight the high levels of burnout and STS experienced by professionals and volunteers working with FDP, suggesting that these syndromes require systematic attention in those working with FDP. Future research could explore how these syndromes may be mitigated and evaluate the efficacy and acceptability of any resulting interventions. It may also be beneficial to monitor burnout and STS in individuals working with FDP, with the STSS suggested as the measure of choice to assess STS and the MBI as the measure of choice to assess burnout. We recommend that, where possible, the ProQOL not be used to measure STS or burnout and that, when it is used as a comparator, the cutoffs applied be carefully considered.

Open Practices Statement
The review reported in this article was not formally preregistered. The data have not been made available on a permanent third-party archive; requests for the data should be sent to the lead author at fritha.roberts@nsft.nhs.uk.