Emotion constructs and outcome measures following false positive breast screening test results: A systematic review of reporting clarity and selection rationale

(i) To systematically identify constructs and outcome measures used to assess the emotional and mood impact of false positive breast screening test results; (ii) to appraise the reporting clarity and rationale for selecting constructs and outcome measures.


| BACKGROUND
Since the introduction of population-based breast screening programmes, there has been ongoing debate about whether the benefits outweigh the harms. 1,2 National breast screening programmes aim to reduce breast cancer mortality and minimise aggressive treatments through early-stage cancer detection, but also produce harms such as overdiagnosis and false positive screening test results. 1 The latter occurs when a screening abnormality requires additional diagnostic tests that ultimately indicate cancer is not present. 3 Globally, millions of women will receive false positive breast screening test results. In England alone, approximately 65,000 women receive false positive screening test results each year under the National Health Service Breast Screening Programme. 3 Receiving a false positive screening test result can be distressing for women. The impact of such results is often broadly conceptualised (e.g. psychological impact) and measured in terms of emotional and mood disruptions (e.g. anxiety, depression). Robust evidence on such impact is needed to understand the balance of benefits and harms from screening. [5][6][7][8][9] There is some evidence that disease-specific outcomes are present longer-term. 4 However, other studies have observed relatively transient effects of a few weeks to months. 6,8,9 Therefore, there are inconsistencies in the evidence base regarding the prevalence and duration of quantitative effects. [5][6][7][8][9] General outcome measures, as well as non-validated outcome measures with unknown or inadequate psychometric properties, may be less sensitive and therefore more likely to produce no effect.
The inconsistent findings from studies employing quantitative questionnaire measures appear to contrast with the relatively more consistent qualitative literature. A systematic review of qualitative research of women's false positive experiences indicated that, while reactions varied, many women encountered significant anxiety, worry and intrusive thoughts about further tests, breast cancer and mortality. 10 While relief often followed a false positive screening test result, many women experienced it as a 'close call' with cancer. 10 Some experienced years-long worry, resurfacing before subsequent screenings, implying a persistent and intrusive impact. 10 The possible discrepancies both within the quantitative literature and also compared to the qualitative literature may stem from limitations related to quantitative outcomes and outcome measures.
Previous systematic reviews and related research have identified the following key limitations of this field: (i) a lack of clear conceptual understanding of and consensus on the most pertinent outcomes related to the false positive experience, 11 (ii) explanations that do not integrate into theoretical accounts of why these emotions occur, 11 (iii) outcome measure heterogeneity, 8,12,13 (iv) the widespread use of possibly less sensitive general outcome measures 12 and (v) a lack of clinical validation of outcome measures. 6,14 Moreover, systematic reviews generally result in heterogeneous findings in meta-analysis, which are complex to interpret. Instead, reviews are typically limited to narrative synthesis of findings.
A critical review of the false positive breast screening literature, published two decades ago, emphasised the need to resolve conceptual ambiguities and contradictions in common emotion constructs (e.g. fear, anxiety and worry) to facilitate the empirical differentiation, accumulation and testing of related outcomes in a more systematic manner. 11 Including relevant theory in false positive research can help to define and clarify the concepts and constructs under investigation and provide a framework for understanding the relationship between specific constructs and variables. It follows that by grounding the choice of outcome measure in theory, researchers can ensure the measurements align with their theoretical understanding of the emotional impact of false positive screening test results. However, there has been no systematic investigation of the theoretical or methodological reasoning used to justify the measures that researchers choose to assess the emotional experience of false positive screening test results. This task could shed light on our understanding of the emotional impact of false positive screening test results.
There is a need to update the outcome measurement field with a thorough and comprehensive systematic review of the available literature on false positive breast screening test results. Previous relevant systematic reviews pertaining to measurement issues were published approximately two decades ago and the searches are thus outdated. 6,12 Recent systematic reviews of psychological outcomes and outcome measures have reviewed studies across the spectrum of cancer screening types, but only identified a small minority (i.e. ≤10 studies) of the relevant literature on false positive breast screening test results. 13,14 Therefore, the present systematic review aimed to identify the range of emotion and mood constructs and the associated measures employed in the false positive breast screening test result field to date, and to precisely describe measurement reporting practices related to constructs and outcome measures. The specific objectives were to: a) Systematically identify relevant emotion and mood outcomes and outcome measures used in primary research of the impact of false positive breast screening test results in population-based breast screening. b) Identify the specific constructs assessed by outcome measures. c) Describe the characteristics of outcome measures. d) Describe and appraise the primary study authors' measurement reporting practices and rationale for selecting constructs and outcome measures.

| METHODS
The systematic review protocol was registered on PROSPERO (CRD42023394949) and the Open Science Framework (OSF; DOI 10.17605/OSF.IO/TQKBA). The present report follows the PRISMA guidelines 15 (Supporting Information S1).

| Search strategy
The search strategy was adapted from a relevant systematic review 4 (Supporting Information S2). The lead author conducted searches on the electronic databases MEDLINE, CINAHL and PsycINFO from 1970 (when breast screening using mammography was introduced) to 22 November 2022. The search terms were tailored to the indexing language of each database, including medical subject headings and other index terms, key words and synonyms. The review protocol was registered after searches were complete, to allow the protocol to be finalised, and before any data extraction began.
Retrieved articles were managed in EndNote. The lead author removed duplicates and screened all titles and abstracts. The full texts of potentially eligible articles were reviewed against the eligibility criteria. A second reviewer (MH) reviewed 20% of articles at the full-text stage. Agreement was reached through discussion of the article in question against the eligibility criteria. Forward and backward citation searches and hand-searching of the reference lists were conducted for all included articles.

| Eligibility criteria
Eligibility criteria were based on the Population, Intervention, Comparison, Outcomes, Study design (PICOS) tool (Table 1). Only full-text, English language articles published in a peer-reviewed journal were included.
Outcome measures of emotion and mood were eligible. The American Psychological Association (APA) defines emotions as complex patterns of response, encompassing experiential, behavioural and physiological elements. 16 These reactions arise when individuals confront personally significant matters or events, during which the emotion's specific nature (e.g. fear, shame) is determined by the event's significance. 16 By contrast, according to the APA, moods represent (a) brief, low-intensity emotional states (e.g. cheerfulness, irritability) and/or (b) a predisposition to prolonged emotional responses, potentially lasting hours, days or weeks, often without a clear cause. 17 [21] General and disease-specific outcome measures were eligible.
General outcome measures are broad, standardised measures used to assess health-related outcomes that are applicable across various health conditions or interventions (e.g. a measure of general anxiety).
Disease-specific outcome measures are tailored to a particular health condition or disease, and assess outcomes specific to that condition (e.g. a measure of breast cancer-specific anxiety).

| Data extraction
Templates for data extraction sheets 1-4 are available in Supporting Information S3.

| Primary study characteristics
The lead author recorded all article details in data extraction sheet 1 including those related to the inclusion criteria, reported constructs and any outcome measure amendments (e.g. use of only certain questionnaire subscales). When reported, details on sample sociodemographic variables associated with disadvantage were extracted into an Excel sheet, across factors described by the acronym PROGRESS+. 22

| Outcome measure development and their characteristics
Once relevant standardised outcome measures were identified, the original outcome measure development articles and user manuals were obtained (see reference list in Supporting Information S4). From these articles, the following outcome measure details were extracted in data extraction sheet 2: the purpose or focus, original population, domains, subscales, number of items, response options and scoring.
Psychometric properties of the standardised outcome measures were recorded, specifically reliability (i.e. test-retest, internal consistency), content validity, construct validity, and structural validity. These properties were informed by the COnsensus-based Standards for the selection of health Measurement INstruments (COSMIN) guidelines. 23 See Supporting Information S4 for reference list of publications that provide evidence on the psychometric properties of measures.
Construct refers to the conceptual component of a variable targeted by an outcome measure. Outcome measures are used to operationalise a construct, which, for the purposes of this review, can be delineated into standardised outcome questionnaire measures and non-standardised single scales and single items (see eligibility criteria in Table 1).

| Constructs measured by outcome measures
For each standardised questionnaire measure, two reviewers (HAL, SH) independently extracted the target construct from the descriptions provided in the original questionnaire measure development articles. These details were recorded in data extraction sheet 3. Some articles employed multidimensional questionnaires as outcome measures, comprising multiple subscales, each measuring an individual construct. For example, the 90-item Symptom Checklist (SCL-90-R) comprises nine subscales each measuring a specific construct (e.g. 'anxiety') that can be aggregated into a total score measuring the higher-level construct 'psychopathology'. For these, it was necessary to extract the target constructs from subscale labels, subscale descriptions and/or subscale item content. All constructs identified within a questionnaire measure were extracted unless the questionnaire had been partially administered, in which case only the constructs assessed by the employed subscales or items were extracted.

that are comprised within a multidimensional questionnaire measure of an overall construct (e.g. a 'depression' subscale within a questionnaire measure of health-related quality of life).
• Non-standardised outcome measures of emotion and mood constructs, e.g. a single scale or single item pertaining to an emotion or mood construct developed or repurposed by study authors for the purposes of their study.
(2) Any text in the primary study report related to authors' rationale, justification and conceptual considerations for their choice of emotion and mood constructs and outcome measures (Box 1).

Population
• Outcome measures measuring behaviour (e.g. screening attendance), physiology (e.g. blood cortisol levels) or cognition (e.g. breast cancer risk perceptions).
• Outcome measures whereby a clinician rated participants or provided answers for participants.
• Outcome measures not used as research outcome measures (e.g. measures used to screen potential participants for study eligibility).

Study design
• Primary studies of any quantitative or mixed-method design that meet the above criteria.
To identify the target constructs of non-standardised outcome measures, the reviewers (HAL, SH) coded the primary study authors' presentations of the single item and content of items in a scale.
Extracted content and decisions were compared and percentage agreement calculated. This process was necessary to produce an accurate and comprehensive list of relevant constructs assessed in the false positive breast screening test result field.
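The percentage agreement mentioned above can be illustrated with a minimal Python sketch. It assumes each reviewer assigns exactly one construct label per outcome measure, which is a simplification; the function name, measure names and labels are hypothetical illustration data, not data or code from the review.

```python
def percent_agreement(codes_a, codes_b):
    """Percentage of measures for which two reviewers extracted the same label.

    Assumption for this sketch: one construct label per measure per reviewer.
    """
    if codes_a.keys() != codes_b.keys():
        raise ValueError("Both reviewers must code the same set of measures")
    if not codes_a:
        raise ValueError("No measures to compare")
    # Count measures where the two reviewers' labels match exactly
    matches = sum(1 for measure in codes_a if codes_a[measure] == codes_b[measure])
    return 100.0 * matches / len(codes_a)


# Hypothetical extraction decisions for four measures
reviewer_1 = {
    "measure_1": "general anxiety",
    "measure_2": "depression",
    "measure_3": "worry",
    "measure_4": "distress",
}
reviewer_2 = {
    "measure_1": "general anxiety",
    "measure_2": "depression",
    "measure_3": "disease-specific anxiety",
    "measure_4": "distress",
}

agreement = percent_agreement(reviewer_1, reviewer_2)
print(f"{agreement:.0f}% agreement")
```

Percentage agreement is simple to report but does not correct for agreement expected by chance, so it tends to read higher than chance-corrected statistics for the same decisions.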

| Quality appraisal
As the present review aimed to assess the quality of reporting and rationale for outcome measure selection, quality appraisal focussed on this, rather than a traditional overall appraisal of study quality.

| Rationale for conducting an appraisal of measurement reporting practices
The appraisal of measurement reporting practices in the primary study articles was structured by five overarching questions. These questions were formulated based on the Strengthening the Reporting of Observational Studies in Epidemiology (STROBE) guidelines for reporting observational studies 24 and additional published guidance to promote transparent reporting of measurement practices. 25 These guidance papers advocate for the explicit and transparent reporting of constructs and their associated measures employed as outcomes, 24,25 as well as the reasons for all outcome and measurement decisions, to facilitate judgements of study validity. 25 The appraisal questions were iteratively developed, piloted, and refined (see Supporting Information S5 for additional detail on this process). The five questions were divided into distinct 'items' to facilitate consistent application across articles and given response options: 'yes', 'somewhat', 'no' or 'can't tell' (based on the information reported).
Responses to items and justifications for these were recorded in data extraction sheet 4 (Supporting Information S3). The final questions and items are presented in Box 1. Items 1-5 and 15 were answered for each construct, while items 6-7, 10-11, 13-14 were answered in relation to each outcome measure. Items 8-9 and 12 specifically relate to the reporting of non-standardised scales and items.
The lead author appraised the measurement reporting practices in the included articles. This appraisal was performed at the article level (rather than by study). This task involved identifying and assessing article details (or study protocol reports) concerning the clarity of the selected constructs and outcome measures. Further, information related to the theoretical and practical justifications given by primary study authors for selecting constructs and outcome measures was assessed.

| Assessing quality of rationale for constructs
Assessment of whether there was a clear rationale for constructs (Item 3) indicated that some articles employed multidimensional outcome questionnaire measures that comprised several scales measuring individual constructs, in order to evaluate a higher-order construct (e.g. an overall measure of quality of life). For these cases (k = 20, where 'k' represents the number of articles), a clear rationale for the higher-order construct was considered sufficient and these details were thus appraised. Similarly, in such cases, these articles were assessed on whether they had clearly reported the overarching construct in the Aim (Item 4); these articles were not expected to report every questionnaire subscale construct in the Aim.
Misalignment between the operationalisation of an outcome measure and the conceptualisation of the target construct leads to confusion and misinterpretation (e.g. interpreting a measure as a measure of anxiety when it more accurately represents a different construct, such as worry or stress).
Practical rationale for outcome measures included explicit considerations of their suitability and appropriateness for a given study.
Relevant factors included (but were not limited to) the suitability for a breast screening sample, the decision between general or disease-specific and standardised or non-standardised outcome measures, language compatibility and/or translation needs, cultural sensitivity, and the suitability for different cultural or linguistic groups within a sample, among others. Evidence of explicit psychometric considerations was identified and recorded, but not assessed for accuracy; the focus was solely on whether it was reported.

| Search results
LONG ET AL.
The electronic database searches identified 4973 records (4609 records after duplicates removed). The lead author screened the full texts of 185 articles. A second reviewer (MH) screened 20% (k = 37) of full texts (68% kappa). In total, 47 articles (reporting 39 unique studies) were eligible for inclusion. See flow chart of study inclusion process in Supporting Information S6 and reference list of included articles in Supporting Information S4.
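For readers unfamiliar with the kappa statistic reported for the dual full-text screening, the following is a minimal Python sketch of Cohen's kappa for two raters, assuming this is the statistic meant. The function name and the ten include/exclude decisions are hypothetical and are not the review's screening data.

```python
from collections import Counter


def cohens_kappa(ratings_a, ratings_b):
    """Cohen's kappa: agreement between two raters, corrected for chance."""
    if len(ratings_a) != len(ratings_b) or not ratings_a:
        raise ValueError("Ratings must be non-empty and of equal length")
    n = len(ratings_a)
    # Observed proportion of items on which the raters agree
    observed = sum(a == b for a, b in zip(ratings_a, ratings_b)) / n
    # Agreement expected by chance, from each rater's marginal frequencies
    freq_a = Counter(ratings_a)
    freq_b = Counter(ratings_b)
    expected = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    if expected == 1.0:
        raise ValueError("Kappa is undefined when chance agreement is 1")
    return (observed - expected) / (1 - expected)


# Hypothetical full-text decisions by two reviewers (not the review's data)
lead_author = ["include"] * 4 + ["exclude"] * 6
second_reviewer = ["include"] * 3 + ["exclude"] * 6 + ["include"]

kappa = cohens_kappa(lead_author, second_reviewer)
print(f"kappa = {kappa:.2f}")
```

In this illustration the raters agree on 8 of 10 articles, but because half of that agreement would be expected by chance given their marginal rates, kappa is noticeably lower than the raw 80% agreement.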

| Study characteristics
Study characteristics are reported in Supporting Information S7. In nine articles, the first measurement was taken before women attended initial screening. Of the 45 articles that reported specific outcome measurement follow up points, outcomes were measured up to six months (k = 24), up to 12 months (k = 12), up to 18 months (k = 1) and between 18 months and 3.5 years (k = 8) after the screening or recall results.

| Constructs extracted from outcome measures
See Supporting Information S9 for a diagram of how data were extracted and processed in this review.
In summary, two reviewers independently extracted a total of 75 unique constructs (93% agreement) from all outcome measures. See Supporting Information S10 and S11 for a list of these constructs, including the name and number of outcome measures that were identified as assessing each construct.

BOX 1 Questions and items used to appraise measurement reporting practices
8: If single or multiple items are developed by the study authors for the purposes of their study, the number of items is reported.
9: If single or multiple items are developed by the study authors for the purposes of their study, the item wording (stem) is described.
D. Is a clear rationale for construct and measure provided?
10: The measure selection is justified on appropriate theoretical grounds (e.g. explicit theoretical rationale is reported to justify decision to measure certain constructs using certain measures).
11: The measure selection is justified on appropriate practical grounds (e.g. explicit practical rationale, including considerations of psychometric adequacy, is reported to justify decision to measure certain constructs using certain measures).
12: If single or multiple items are developed by the study authors for the purposes of their study, the process of developing the items is described (e.g. how developed and, ideally, why developed).
13: (when applicable) Any measure modifications are described.
14: (when applicable) Any measure modifications are justified (e.g. theoretical and/or practical considerations are reported).
E. Is consistent wording used to describe constructs?
15: The same or very similar terminology is used to introduce, define, and report specific constructs throughout study report, so that these remain distinct and clear to the reader (i.e. no conflating of constructs in a way that leaves reader confused or unclear about what has been measured or what results show).
Note. Responses to individual items were qualitatively synthesised to generate an overall response ('yes', 'somewhat', 'no' or 'can't tell') to each of the five questions.

| Outcome measures
In total, 22 standardised general outcome questionnaire measures and three standardised disease-specific outcome questionnaire measures were identified.

| Appraisal of measurement reporting practices
The key findings of the measurement reporting practices appraisal are reported below (see Supporting Information S12 for additional results not reported here (e.g. for Question E) and Supporting Information S13 for the full results at the article-level).

| Question A. Is it clear what the constructs are?
The reporting clarity of constructs was appraised a total of 157 times across the 47 articles, ranging from one to 12 constructs per article.
Notably, none of the constructs were explicitly defined.

| Question B. Is a clear rationale for constructs provided?
Overall, 26 (55%) of the 47 articles reported a clear case for the target constructs, justifying the selection of constructs. The rationale for investigating the construct general anxiety was relatively clear (60% of the time), while the rationale for investigating depression was generally lacking (88% of the time, with no explicit rationale provided at all in these 14 cases). A clear rationale for distress was generally reported (75% of the time) (Table 3). Almost half of the included articles employed a non-standardised scale or item.
It was generally clear from the reporting what constructs and outcome measures were employed, but reporting related to why these decisions were made was lacking. Notably, anxiety was generally justified, but the rationale for depression was almost always absent. Similarly, while the names of outcome measures were clearly reported, any rationale for their selection was mostly not provided. Psychometric evidence for outcome measures was generally lacking, as were other practical justifications. Theoretical considerations for outcome measure selections were entirely absent.
As suggested by research conducted two decades ago, 11 the present findings indicate that explanations of the emotional impact associated with false positive screening test results typically do not integrate theory to explain why these emotions arise.Study rationale for measuring certain constructs generally centred on the empirical need to further investigate outcomes in an inconsistent field, and the specific reasoning for selecting certain constructs was often absent.
For example, while depression ranked as the second most investigated construct, it was arguably the least substantiated relative to other outcomes. [10] Moreover, the frequent investigation of depression in this field appears to contrast with other cancer screening literatures in which depression is rarely measured, for example, the impact of lung screening 27 and human papillomavirus (HPV) in cervical screening. 28 This raises the question of why depression is assessed following false positive breast screening test results when there is no evidence that it is a suitable outcome.
In line with previous review findings, 8,12,13 the present findings provide evidence of significant heterogeneity in outcome measure choices in this field. The current review explicitly maps out the range and frequency of outcome measure use to date. The STAI, HADS and PCQ remain the most used, in line with previous patterns of outcome questionnaire measure use. 12 This review also demonstrates significant outcome heterogeneity in this field. A wide range of constructs were extracted from the outcome measures. Notably, many 'distinct' constructs were measured by more than one outcome measure. For example, general anxiety and disease-specific anxiety were measured by 13 and 11 different outcome measures respectively (Supporting Information S10). It is important not to assume that constructs such as disease-specific anxiety are equivalent solely because the instruments claim to measure the same construct (i.e. the 'jingle fallacy'). 25 Moreover, considering disease-specific anxiety and worry as examples, these constructs were measured using different non-standardised scales/items in nine and 10 articles respectively.
There is also a risk that scales whose titles or item content suggest they are measuring different constructs may in reality measure the same or closely related constructs (i.e. the 'jangle fallacy'). 25 In this field, conclusions are often drawn across a body of research with little consideration of the similarity of specific outcome measures, based on narrative reviews of constructs with insufficient consideration of precisely what is measured. Instruments that measure different phenomena under the same construct name are found in the fields of depression 29 and emotion. 30 As such, greater transparency regarding the theoretical and operational definition of a construct may help to prevent 'jingle-jangle'. 25

| Study limitations
The present review has several limitations. Non-English language articles and unpublished literature were excluded, therefore other relevant constructs and outcome measures may have been missed.
Only one reviewer extracted and appraised information related to measurement reporting practices. Although data accuracy was regularly checked, errors may have been introduced and the reliability of appraisal decisions was not assessed. However, even if minor errors exist, the overall pattern of results is likely to remain the same. The material extracted is included in Supporting Information S13 to evidence the reasons for appraisal decisions and to allow other researchers to further interrogate the evidence base. Lastly, to identify constructs, the non-standardised outcome measures were necessarily considered at the item-level, whereas constructs within standardised outcome questionnaire measures were extracted from information given in the original outcome questionnaire measure development reports. Some questionnaires were treated at the subscale-level and others at the item-level when it was not possible to identify specific constructs from a subscale label (e.g. a subscale broadly labelled 'psychological health'). Thus, there may be additional constructs not identified here.
The current review also has strengths. The protocol was preregistered on PROSPERO and the OSF. It was conducted and reported according to recommended guidelines. 15 The appraisal of measurement reporting practices was informed by published relevant guidelines and research on measurement reporting quality, and previous examinations of outcome questionnaire measure selection in false positive screening test result research. 8,11,12,24,25 The guiding questions and items underwent several iterations to optimise their suitability and to facilitate consistent application across articles.

| Implications
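Understanding the balance of benefits and harms from screening depends crucially on our ability to evaluate appropriate outcomes using appropriate measures. High outcome and outcome measure heterogeneity coupled with the absence of theoretical rationale for outcome measures has implications for our broader conceptual understanding of the emotional impact of false positive screening test results in breast screening. Issues regarding a lack of clear understanding of the most relevant constructs, the need to interpret the presence of effects in light of relevant theory, and outcome measure heterogeneity were first raised 20 years ago, 11,12 yet are still a problem today. We believe this field needs a clearer conceptual and theoretical understanding of the nature of any impact and a standardised measurement approach to investigate this (e.g. a Core Outcome Set [31][32][33]). Alternatively, the omission of these details and considerations could be explicitly discussed as a study limitation (e.g. see recommendations for discussing limitations specific to outcome and outcome measure choices 35 ).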
While appropriate guidelines were used to inform the specific psychometric properties by which outcome questionnaire measures were characterised, we did not undertake an assessment of the methodological quality of questionnaires, as is advocated by COSMIN. Future research could perform a more rigorous examination of the quality of questionnaire measures identified by the present review through application of the COSMIN Risk of Bias tool, as has been done in comparable and relevant research in the colorectal cancer screening field. 36 There is a need to bring together experts (academic, public, patient, healthcare professional) in cancer screening harms, emotional processing, and scale measurement methodology to reach agreement on what to measure (outcomes) and how to measure it (outcome measures). 31 Existing relevant psychological theories could be utilised, such as those related to emotional response and processing (e.g. appraisal theories of emotion 37,38 ); those that explain why cognitive affective processes such as worry may persist (e.g. Metacognitive theory 39 ); and stress, uncertainty and coping theories (e.g. Theory of Stress and Coping 40 and the Model of Illness Uncertainty 41 ). The relevant qualitative literature should be inductively applied to further inform our conceptual understanding. 10,42 In the process, relevant socioeconomic and demographic factors associated with disadvantage (e.g. those covered within the PROGRESS+ framework) 22 that the present review indicated have been historically overlooked and underreported in this field could be prioritised and better described in future relevant research.
There is an ongoing need to evaluate the psychological harms associated with screening, as is happening in the latest research, 43 and given recent advancements in novel cancer testing, diagnostics and technology that may one day be routinely implemented. 44,45

positive breast screening test results in population-based breast screening. b) Identify the specific constructs assessed by outcome measures. c) Describe the characteristics of outcome measures. d) Describe and appraise the primary study authors' measurement reporting practices and rationale for selecting constructs and outcome measures.
These properties were informed by the COnsensus-based Standards for the selection of health Measurement INstruments (COSMIN)
Construct refers to the conceptual component of a variable targeted by an outcome measure. Outcome measures are used to

| Assessing quality of rationale for outcome measures
Part of the appraisal involved assessing the primary study reports on information related to the rationale and justification for selecting outcome measures to investigate constructs. Theory-based rationale and practical rationale for outcome measures were identified. Theoretical rationale for outcome measures included explicit consideration by primary study authors of the extent to which the theoretical foundation of their selected outcome measure (e.g. State-Trait Anxiety Inventory (STAI)) aligned with the defined construct (e.g. anxiety), and any theoretical framework in which the construct was situated (e.g. the nomological network). In theory, if authors deemed that the outcome measure accurately represented the construct definition through its item content, then it can be said to have acceptable construct validity. Misalignment between the operationalisation of an outcome measure and the conceptualisation of the target construct leads to confusion and misinterpretation.
Study sample details according to the PROGRESS+ characteristics are reported in Supporting Information S8. In summary, the 39 studies included observational studies (k = 35), randomised controlled trials (k = 3) and a non-randomised controlled trial (k = 1). The 47 articles were published across five decades, from the 1980s (k = 1), 1990s (k = 13), 2000s (k = 15), 2010s (k = 17), to the 2020s (k = 1).

The most measured general constructs were anxiety (k = 20) and depression (k = 16). Anxiety and depression were also measured in a further four and two articles respectively, within outcome questionnaire measures of psychiatric morbidity and psychopathology. The third most measured general construct was distress (k = 4). The most measured disease-specific constructs were anxiety and worry (each k = 12).

| Question C. Is it clear what outcome measures were used?
Standardised general and disease-specific outcome questionnaire measures
Forty (85%) articles employed standardised outcome questionnaire measures (ranging from one to three questionnaire measures per article). Among these, 25 (53%) articles used general questionnaire measures only, while 12 (26%) articles used disease-specific questionnaire measures only. Three (6%) articles used both general and disease-specific questionnaire measures.
TABLE 2 Standardised outcome questionnaire measures used in k ≥ 4. Note: BSI-53 and BSI-18: 53- and 18-item Brief Symptom Inventories, which are later abbreviated versions of the SCL-90-R.
which were general anxiety and depression and disease-specific worry and anxiety. These were assessed by a variety of outcome measures, notably the STAI, Hospital Anxiety and Depression Scale (HADS) and Psychological Consequences Questionnaire (PCQ).
using appropriate measures. High outcome and outcome measure heterogeneity, coupled with the absence of theoretical rationale for outcome measures, has implications for our broader conceptual understanding of the emotional impact of false positive screening test results in breast screening. Issues regarding a lack of clear understanding of the most relevant constructs, the need to interpret the presence of effects in light of relevant theory, and outcome measure

15-item coding scheme developed for the purposes of the present review could be adopted by other researchers (e.g. to inform their understanding and consideration of measurement reporting practices in their own studies or other published articles, or as a systematic framework to appraise other measurement reporting practices).
• Adult (age 18+ years) females who have received a false positive screening test result from population-based breast screening.
• Samples of women with false positive and other screening test results (e.g. normal, positive) from population-based screening (e.g. screening that occurs through organised national screening programmes with a defined age group of women) or symptomatic screening (e.g. opportunistic screenings that are scheduled in response to a potential breast cancer sign or symptom) services, when findings related to false positive screening test results from population-based breast screening are reported separately to other result types.

Samples that solely included women who:
• Were diagnosed with breast cancer (invasive and ductal carcinoma in situ (DCIS)).
• Had not received their final screening test result.
• Were deemed high risk for breast cancer due to family history or genetics.

Intervention
• The emotion and mood impact of false positive screening test results received in population-based breast screening (including programme extension trials).

Comparison
• Either baseline measurement and one or more follow-up measurements, or
• A control group of women with normal screening test results and/or a comparison group of women with other screening test results.

Outcomes
(1) Outcome measures used to evaluate emotion and mood (as defined in 2.2. Eligibility criteria). Specifically:
• Standardised general and disease-specific outcome questionnaire measures of emotion and mood, e.g. multidimensional questionnaire measures comprising several subscales, each measuring an individual construct, which can be aggregated into a higher-level construct. The original report(s) detailing questionnaire measure development and characteristics must be published.
• Relevant subscales (and/or individual items within subscales)

A. Is it clear what the constructs are?
1: Construct is explicitly defined (e.g. described using supporting theory and/or research).
2: (when applicable) If not defined in Item 1, construct is clear (e.g. reported in a way that makes it clear to the reader what has

Table 2 indicates the standardised outcome questionnaire measures used in at least four articles.
). Without this, issues regarding the inclusion of 47

This goes beyond outcomes associated with false positive screening test results and includes true positive, false negative and abnormal test results, as well as other changes in cancer screening and early detection practices. Increasing interest in multi-cancer early detection blood tests means that false positive screening test results are likely to become more common if these tests are implemented at scale. 46 Thus, this work is timely and has enormous potential to reduce research waste across the spectrum of cancer early-detection research. 47

5 | CONCLUSIONS

The present systematic review identified high heterogeneity in emotion and mood constructs and outcome measures used in the false positive breast screening test result literature to date. This heterogeneity, plus a lack of clear, compelling rationale for decisions related to selecting constructs and outcome measures, leaves significant gaps in our understanding of the emotional impact of false positive screening test results. The ambiguity in this field will likely continue unless these measurement issues are addressed. Future research could develop a shared conceptual understanding of, and standardised approach to measuring, impact from cancer screening test results.