The Pain Assessment in Impaired Cognition scale (PAIC15): A multidisciplinary and international approach to develop and test a meta‐tool for pain assessment in impaired cognition, especially dementia

Background: Over the last decades, a considerable number of observational scales have been developed to assess pain in persons with dementia. The time seems ripe now to build on the knowledge and expertise implemented in these scales to form an improved, "best-of" meta-tool. The EU-COST initiative "Pain in impaired cognition, especially dementia" aimed to do this by selecting items out of existing observational scales and critically re-assessing their suitability to detect pain in dementia. This paper reports on the final phase of this collaborative task.
Methods: Items from existing observational pain scales were tested for "frequency of occurrence (item difficulty)," "reliability" and "validity." This psychometric testing was carried out in eight countries, in different healthcare settings, and included clinical as well as experimental pain conditions.
Results: Across all studies, 587 persons with dementia, 27 individuals with intellectual disability, 12 Huntington's disease patients and 59 cognitively healthy controls were observed during rest and movement situations or while receiving experimental pressure pain, respectively. The psychometric outcomes for each item across the different studies were evaluated within an international and multidisciplinary team of experts and led to a final selection of 15 items (5x facial expressions, 5x body movements, 5x vocalizations).
Conclusions: The final list of 15 observational items has demonstrated psychometric quality and clinical usefulness both in their former scales and in the present international evaluation; accordingly, they qualified twice to form a new internationally agreed-on meta-tool for Pain Assessment in Impaired Cognition, the PAIC15 scale.
Significance: Using a meta-tool approach by building on previous observational pain assessment scales and putting the items of these scales through rigorous empirical testing (using experimental as well as clinical pain studies in several European countries), we were able to identify the best items for pain assessment in individuals with impaired cognition. These selected items form the novel PAIC15 scale (pain assessment in impaired cognition, 15 items).

Funding information
The initiative was funded by the EU-COST (action TD 1005).

| INTRODUCTION
The need for better pain assessment in cognitively impaired individuals who are not able to verbally communicate their pain, including people with dementia, has been widely acknowledged (Achterberg et al., 2013; Hadjistavropoulos et al., 2014). Standard pain assessment methods that rely heavily on self-report lead to a dramatic under-detection and under-treatment of pain in these patient groups (Gibson & Lautenbacher, 2017; Hadjistavropoulos et al., 2014). To improve this situation, a considerable number of diverse observational scales have been developed, which aim to assess pain by observing behavioural responses, mainly facial expressions, body movements and vocalizations (see Zwakhalen, Hamers, Abu-Saad, & Berger, 2006, for a review). Most of these tools have undergone initial psychometric testing; however, many were not developed using evidence-based methods and they lack comprehensive psychometric data from larger samples of patients. Few define the specific situation in which assessment should take place (e.g. rest vs. activity of daily living), and most were not developed for ease of use in clinical settings. Moreover, given the considerable number of scales, it is difficult to gather comparable data. As a result, no widely accepted and internationally agreed-upon tool for detecting pain in individuals with cognitive impairment exists, and national guidelines vary in their recommendations.
To change this, we initiated a collaboration and combined clinical, research and methodological expertise from different European countries and disciplines, with the aim of developing an internationally agreed-upon tool to assess pain in individuals with cognitive impairment, especially dementia. After reviewing and discussing the many existing scales, we came to the conclusion that all relevant pain-related observational items had been identified. However, existing scales include pain-irrelevant items or items of poor psychometric quality. Therefore, the main task was to reduce and refine the number of items. Thus, rather than starting from scratch, we included items from existing assessment tools, empirically evaluated each item, and reached expert consensus for each item to be included in our pain assessment tool. To accomplish this, we followed a Delphi-like consensus procedure (Hsu & Sandford, 2007), which is described in detail in the Methods section. We chose such a consensus procedure because it enables interaction within a large multidisciplinary and international panel of experts and ensures a rigorous consensus approach.
This consensus procedure was undertaken by the members of the EU-COST Action "Pain in impaired cognition, especially dementia" (TD 1005) between 2011 and 2017. Members of the EU-COST Action were originally recruited through repeated calls in EU and associated countries for experts interested in pain assessment in cognitively impaired individuals, especially those with dementia. Representatives from 16 European nations and from diverse disciplines participated in this collaborative task. This paper reports on the final phase of this collaborative task, namely a thorough empirical testing (see also accompanying article by de Waal et al.) and evaluation of items from established observational scales by an international and multidisciplinary team of experts, and its final product: an internationally agreed-upon tool for Pain Assessment in individuals with Impaired Cognition (PAIC15).

| METHODS
The Delphi-like consensus procedure used to develop the PAIC meta-tool was carried out within the EU-COST Action "Pain in impaired cognition, especially dementia" (TD 1005). Members included representatives from 16 European nations, belonging to diverse disciplines (viz., nurses, geriatricians, psychiatrists, neurologists, anaesthesiologists, neuroscientists, neuropsychologists, psychotherapists, physiotherapists and dentists) and contributing different expertise (viz., palliative care experts, experts in psycho- and clinimetrics, experts in geriatric medicine and geriatric nursing, experts in orofacial pain, dementia researchers, experts in neuropathology and psychopathology and murine model researchers). The group also included a number of developers (Husebo, Strand, Moe-Nilssen, Husebo, & Ljunggren, 2010; Pickering et al., 2010) or translators (Zwakhalen, Hamers, & Berger, 2007) of published observational pain assessment tools. Five Working Groups (Psychometrics and Algesimetry; Nursing and Care; Clinical Evaluation and Epidemiology; Experimental Evaluation; and Palliative Care) were created to oversee specific areas, with regular plenaries ensuring effective collaboration. The consensus procedure comprised four rounds, which are depicted in Figure 1 and are described below. The focus of this article lies on round 4; rounds 1-3 are therefore described only briefly.

| Rounds 1-3 of the consensus procedure
The first three rounds have been described in detail elsewhere (see Corbett et al., 2014) and are also depicted in Figure 1. In short, in the first round an expert panel (Working Group "Psychometrics and Algesimetry") of the EU-COST Action conducted an extensive systematic review of the literature to identify review articles on observational pain assessment tools published between 2005 and 2012. Based on the recommendations given in the review articles (considering reliability, validity, feasibility and usefulness in patients with cognitive impairment), the expert panel identified 12 eligible observational pain assessment tools for individuals with cognitive impairment (ABBEY Pain Scale [Abbey et al., 2004], ADD [Kovach, Weissman, Griffie, Matson, & Muchka, 1999], CNPI [Feldt, 2000], DS-DAT [Warden, Volicer, Hurley, & Rogers, 2001], DOLOPLUS-2 [Lefebvre-Chapiro, 2001], EPCA-2 [Morello, Jean, Alix, Sellin-Peres, & Fermanian, 2007], MOBID-2 Pain Scale [Husebo et al., 2007], NOPPAIN [Snow et al., 2004], PACSLAC [Fuchs-Lacelle & Hadjistavropoulos, 2004], PAINAD [Warden, Hurley, & Volicer, 2003], PADE [Villanueva, Smith, Erickson, Lee, & Singer, 2003], and PAINE [Cohen-Mansfield, 2006]).
In round 2, all items of the selected pain assessment tools were extracted, and the expert panel (Working Group "Psychometrics and Algesimetry") grouped all items according to the categories "facial expressions," "body movements" and "vocalizations" (the three categories outlined by the American Geriatrics Society (AGS, 2002), on which the review articles generally agreed), removed all duplicates and further suggested subcategories in the case of heterogeneity within each category to ensure that the items systematically covered a broad scope within each category. This process was supervised in plenaries by all five Working Groups of the COST Action to support the refinement process and to reach a second consensus.
In round 3, 36 items were selected to be included in the research version of the PAIC, based on scrutiny of the evidence, consensus of expert opinion, frequency of use and alignment with the American Geriatrics Society guidelines (AGS, 2002). Again, all members of the COST Action participated in forming this consensus through several expert panel discussions and in plenaries. The 36 items included in the research version were: (15 face items) pained expression, frowning, narrowing eyes, closing eyes, raising upper lip, opened mouth, tightened lips, clenched teeth, empty gaze, seeming disinterested, pale face, teary eyes, looking tense, looking sad and looking frightened; (10 body movement items) freezing, curling up, clenching hands, resisting care, pushing, guarding, rubbing, limping, restlessness and pacing; (11 vocalization items) using offensive words, using pain-related words, repeating words, complaining, shouting, mumbling, screaming, groaning, crying, gasping and sighing. On the scoring form, a short description of the meaning (with synonyms) was given for each item. Items were scored on a 4-point scale: 0 "not at all," 1 "slight degree," 2 "moderate degree" and 3 "great degree." These first three rounds have been described in detail before (Corbett et al., 2014), so the present study focuses on the description of round 4 (see Figure 1), the final round of our consensus process, and its outcome, the final version of the scale (PAIC15).
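The item set and scoring scale of the research version can be written down as a small data structure. The following is only an illustrative sketch: the item names and score labels are taken from the text above, whereas the names `SCORE_LABELS`, `RESEARCH_VERSION_ITEMS` and `validate_rating` are hypothetical and not part of any published PAIC software.

```python
# Score labels of the 4-point scale, as given in the text.
SCORE_LABELS = {
    0: "not at all",
    1: "slight degree",
    2: "moderate degree",
    3: "great degree",
}

# The 36 items of the PAIC research version, grouped by category.
RESEARCH_VERSION_ITEMS = {
    "facial expressions": [
        "pained expression", "frowning", "narrowing eyes", "closing eyes",
        "raising upper lip", "opened mouth", "tightened lips", "clenched teeth",
        "empty gaze", "seeming disinterested", "pale face", "teary eyes",
        "looking tense", "looking sad", "looking frightened",
    ],
    "body movements": [
        "freezing", "curling up", "clenching hands", "resisting care", "pushing",
        "guarding", "rubbing", "limping", "restlessness", "pacing",
    ],
    "vocalizations": [
        "using offensive words", "using pain-related words", "repeating words",
        "complaining", "shouting", "mumbling", "screaming", "groaning",
        "crying", "gasping", "sighing",
    ],
}


def validate_rating(item: str, score: int) -> None:
    """Raise ValueError if the item is unknown or the score is off the 0-3 scale."""
    known_items = {name for names in RESEARCH_VERSION_ITEMS.values() for name in names}
    if item not in known_items:
        raise ValueError(f"unknown item: {item!r}")
    if score not in SCORE_LABELS:
        raise ValueError(f"score must be one of {sorted(SCORE_LABELS)}, got {score!r}")


# Example: check a single rating and look up its label.
validate_rating("frowning", 2)
print(SCORE_LABELS[2])
```

Keeping the three categories as separate lists makes it straightforward to analyse facial items alone, as was necessary for the studies that only rated facial expressions.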

| Round 4 of the consensus procedure
The aim of round 4 was to further reduce the number of items from the PAIC research version to create a final version of the scale (see Figure 1). The exclusion of items was based on empirical evidence that was gathered on the PAIC research version in a variety of experimental and clinical studies and on consensus of a multi-professional and international panel of experts, and followed a step-wise hierarchical process (see Figure 2). The panel of experts in the final round of the consensus procedure comprised seven members of the COST Action with expertise in the topic of "pain and dementia" and multi-professional backgrounds in psychology, geriatric medicine, geriatric psychiatry, dentistry, palliative care and epidemiology. Round 4 itself consisted of several steps, which are described in the following.
In a first step, our COST Action aimed at gathering evidence on the psychometric characteristics of all 36 items of the PAIC research version. To test which of these items are best suited to detect pain in cognitively impaired individuals, a variety of studies was conducted throughout the different European countries participating in the COST Action. For this, the PAIC research version was translated into seven European languages (Danish, Dutch, German, Norwegian, Serbian, Spanish and Italian), using the forward and backward translation approach (Ohrbach, Bjorner, Jezewski, John, & Lobbezoo, 2009). Table 1 gives an overview of all the studies. The various studies can be roughly grouped into three categories: (i) clinical pain studies following the same standardized testing protocol, (ii) experimental pain studies following the same standardized testing protocol and (iii) other (clinical pain studies with various testing protocols).
(i) Standardized clinical pain protocol: The design of these studies is described in detail in the accompanying article by de Waal et al. In short, using an observational study design, individuals with dementia were observed by healthcare professionals in two situations, namely at rest (e.g. sitting, lying in bed) and during movement (e.g. repositioning, standing up, being transferred). All individuals with dementia were observed at least once by two observers to allow for assessing inter-rater reliability. During or directly after the observation, healthcare professionals rated the observed behaviour using the PAIC research version. This standardized clinical pain protocol was developed by the Working Group "Clinical Evaluation and Epidemiology" of the COST Action and was carried out in four countries (see studies 1-4 in Table 1).

(ii) Standardized experimental pain protocol: In an experimental study design, phasic (7 s duration) pressure stimuli of various intensities (50, 100, 200, 400, 500 kPa) were applied in ascending order above the right and left trapezius muscle using a handheld pressure algometer (either Somedic [Hörby, Sweden] or Fisher [Wagner Pain Test, Greenwich, USA]). An ascending order was chosen (a) to reduce anxiety in patients and (b) to be able to stop the stimulation protocol immediately if a participant showed signs of severe distress at any given intensity level (this did not occur in the tested samples). The face of the individuals receiving the pressure pain was filmed during the application and the video recordings were later shown to observers, who rated the observed facial responses using the face items of the PAIC research version (the experimental pain intensities are too weak and too short in time to consistently elicit body movements or vocalizations among all participants). This standardized experimental pain protocol was developed by the Working Group "Experimental Evaluation" of the COST Action. The protocol was carried out in six countries and included diverse samples of individuals being observed (see studies 5-10 in Table 1).
(iii) Other: Three additional studies were carried out that did not follow the two standardized protocols described above. The reason was that two of them (studies 11 and 12, see Table 1) started before the standardized clinical protocol was finally approved. In study 11, nurses were asked to observe persons with dementia during rest and/or during movement and rate the facial expressions using the face items of the PAIC research version (for more details see Lautenbacher et al., 2016). In study 12 (see Table 1), the complete PAIC research version was used to observe persons with dementia in a clinical setting during a transfer situation. In contrast to the "standardized clinical pain protocol," persons with dementia were observed by only one healthcare professional. In study 13 (see Table 1), 24 video clips showing individuals with dementia at rest and during movement were presented to nurses (via a laptop screen). These video clips were taken from recordings made during the "standardized clinical pain protocol" of studies 1-3. Nurses rated these videos using all items of the PAIC research version.
Ethics approval was obtained for each of the 13 studies separately from the local ethics committee, consistent with local procedures. Written informed consent from the individuals being observed and/or from a legal guardian (e.g. family) was obtained in all studies. The data of each study were registered in local databases. Analyses of the collected data were carried out by two of the authors (MK, MdW).

| Step 2: Evaluation of floor/ceiling effects (item difficulty)
Based on the empirical evidence collected in step 1, we started the exclusion of items by dropping items based on floor and ceiling effects. This step is comparable to the analysis of item difficulty. Item difficulty is one of the key steps in psychometric item analysis and refers to the frequency with which a response is scored. Thus, in our context, item difficulty refers to the frequency (in percentage) with which a behaviour is classified as being present and ranges from 0 (never) to 100 (always). If item difficulty is either close to 0 or close to 100, these items produce insufficient variance because of being almost never (floor effect) or almost always (ceiling effect) present. However, variance is the basis for all reliability and validity computations (see steps 3 and 4). Thus, items that are almost never observed ("item difficulty" <10), as well as those items that are observed almost always ("item difficulty" >90), were excluded. This seems reasonable, given that the studies included non-painful as well as painful intensities (experimental studies) or pain-free as well as pain patients (clinical studies), respectively, and thus, no item should be observed almost always or hardly ever. We separately computed for each study (see Table 2) how frequently a behaviour described in an item was observed (percentage of occurrence) during movement (clinical studies) or during painful pressure stimulation (400/500 kPa) (experimental studies), respectively. We defined an item as being observed if it was scored with a number >0. We excluded items that showed poor item difficulty (percentage of occurrence <10% or >90%, respectively) in at least half of the conducted studies.

TABLE 1 Studies (N = 13) conducted to gather evidence on the psychometric characteristics of all items from the PAIC research version (step 1)
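The occurrence rule of step 2 can be sketched in a few lines. This is a minimal illustration rather than the analysis code used in the studies: the function names are hypothetical and the example scores are invented. Only the thresholds (<10% or >90%), the definition of "observed" (score >0) and the "at least half of the studies" criterion come from the text.

```python
def item_difficulty(scores):
    """Percentage of observations in which the behaviour was present (score > 0)."""
    if not scores:
        raise ValueError("no observations")
    observed = sum(1 for score in scores if score > 0)
    return 100.0 * observed / len(scores)


def has_poor_difficulty(per_study_scores, low=10.0, high=90.0):
    """True if the item shows a floor or ceiling effect in at least half of the studies.

    per_study_scores holds one list of 0-3 scores per study.
    """
    poor_studies = sum(
        1 for scores in per_study_scores
        if not (low <= item_difficulty(scores) <= high)
    )
    return 2 * poor_studies >= len(per_study_scores)


# Invented example data for a single item in three studies.
study_a = [0] * 19 + [1]       # 5% occurrence: floor effect
study_b = [0, 1, 2, 0, 0]      # 40% occurrence: acceptable
study_c = [0] * 12             # 0% occurrence: floor effect
print(item_difficulty(study_b))
print(has_poor_difficulty([study_a, study_b, study_c]))
```

Note that the rule only flags an item for consideration; as described in the Results, the expert panel could still decide to retain a flagged item.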

| Step 3: Evaluation of inter-rater reliability
After excluding items based on the criterion "item difficulty," the next step was to determine the reliability of the remaining items. The data collected using the standardized clinical pain protocol (see studies 1, 3-4 in Table 1) allowed computing inter-rater reliability (between two healthcare professionals) for the rest situation and the movement situation and were analysed using percentage of agreement in scores (De Vet, Terwee, Knol, & Bouter, 2006). Given that fewer pain-related behavioural responses were observed during rest (most items were scored as zero), we only report on the results for inter-rater reliability assessed during movement (additional results, e.g. inter-rater agreement at rest and intra-rater agreement across different time points, can be found in the accompanying article by de Waal et al.).
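As an illustration of the agreement measure used for the clinical protocol, the sketch below computes the percentage of observations on which two raters gave the same score. It assumes that "agreement" means identical scores on the 0-3 scale; the text does not state whether scores were dichotomized first, so this is an assumption, and the function name and example data are hypothetical.

```python
def percentage_agreement(rater_a, rater_b):
    """Percentage of paired observations on which both raters gave the same score."""
    if len(rater_a) != len(rater_b):
        raise ValueError("both raters must score the same observations")
    if not rater_a:
        raise ValueError("no observations")
    agreements = sum(1 for a, b in zip(rater_a, rater_b) if a == b)
    return 100.0 * agreements / len(rater_a)


# Invented scores of two healthcare professionals for one item during movement.
scores_rater_1 = [0, 1, 2, 3, 0, 0, 1, 0]
scores_rater_2 = [0, 1, 1, 3, 0, 0, 2, 0]
print(percentage_agreement(scores_rater_1, scores_rater_2))
```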
Only two studies using the standardized experimental pain protocol (studies 5-6) were used to compute reliability values (see Table 1), namely those studies where all video recordings were evaluated by five or more observers in parallel using the PAIC research version. This permits a very controlled comparison of rater agreement between more than two observers across a larger set of observations, which was one of the major reasons to run these experimental studies. Here, inter-rater reliability was computed using the intraclass correlation (ICC), which allows for comparing agreement of ratings between larger numbers of observers. ICC was also used to compute inter-rater reliability on the data collected in study 13 (Table 1). The data assessed in the remaining studies were not suitable for consideration in reliability analyses.
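For readers unfamiliar with the ICC, the following sketch computes one common variant from a table of ratings (rows: rated video recordings, columns: observers). The text does not specify which ICC form was used; the two-way random-effects, absolute-agreement, single-rater form, often written ICC(2,1), is assumed here purely for illustration. The function name and the example ratings are hypothetical.

```python
def icc_2_1(ratings):
    """ICC(2,1) from a list of rows; each row holds one score per observer.

    Assumption: two-way random effects, absolute agreement, single rater.
    """
    n = len(ratings)                      # number of rated recordings
    k = len(ratings[0]) if ratings else 0  # number of observers
    if n < 2 or k < 2 or any(len(row) != k for row in ratings):
        raise ValueError("need a complete table with at least 2 rows and 2 columns")

    grand_mean = sum(sum(row) for row in ratings) / (n * k)
    row_means = [sum(row) / k for row in ratings]
    col_means = [sum(row[j] for row in ratings) / n for j in range(k)]

    # Sums of squares of the two-way ANOVA without replication.
    ss_rows = k * sum((m - grand_mean) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand_mean) ** 2 for m in col_means)
    ss_total = sum((x - grand_mean) ** 2 for row in ratings for x in row)
    ss_error = ss_total - ss_rows - ss_cols

    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))

    denominator = ms_rows + (k - 1) * ms_error + k * (ms_cols - ms_error) / n
    if denominator == 0:
        # Happens when every rating is identical: no variance to analyse.
        raise ValueError("ratings show no variance; ICC is undefined")
    return (ms_rows - ms_error) / denominator


# Invented scores of five observers for four video recordings of one item.
example = [
    [0, 0, 1, 0, 0],
    [2, 2, 2, 3, 2],
    [1, 1, 1, 1, 2],
    [3, 3, 2, 3, 3],
]
print(round(icc_2_1(example), 2))
```

The explicit check for zero variance mirrors the point made in step 2: items that are almost never or almost always scored leave too little variance for a meaningful reliability estimate.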

| Step 4: Evaluation of construct validity
After excluding items based on poor reliability, the next step was to evaluate the construct validity of the remaining items.
For the data assessed using the standardized clinical pain protocol (studies 1, 3 and 4) as well as for study 13 (see Table 1), each item was compared between rest and movement observations, and we recorded whether each item score increased, decreased or stayed the same from rest to movement. Similarly, for experimental pain, we compared item scores between non-painful pressure (50 kPa) and painful pressure (400/500 kPa) intensities (studies 5-10). Given that pain should be more likely to occur during movement than at rest, or during painful versus non-painful pressure, respectively, those PAIC items whose scores did not increase during movement/painful stimulation were considered to have poor construct validity. Studies 2, 11 and 12 were not included in the validity computations, given that a clear differentiation between rest and movement was not possible (observers did not always indicate in these studies whether observations were performed at rest, during movement or during both combined).
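The validity check amounts to classifying the direction of change per study and counting, per item, how many studies failed to show an increase. The sketch below assumes that mean item scores per condition are compared within each study (the text speaks only of a "numerical" change) and uses the "more than half of the studies" criterion reported in the Results. All names and numbers are hypothetical.

```python
def direction_of_change(low_pain_score, high_pain_score):
    """Classify the change from the low-pain to the high-pain condition."""
    if high_pain_score > low_pain_score:
        return "increase"
    if high_pain_score < low_pain_score:
        return "decrease"
    return "no change"


def has_poor_construct_validity(changes):
    """True if the score failed to increase in more than half of the studies."""
    if not changes:
        raise ValueError("no studies")
    not_increased = sum(1 for change in changes if change != "increase")
    return 2 * not_increased > len(changes)


# Invented mean scores (rest, movement) of one item in four studies.
study_means = [(0.2, 0.8), (0.5, 0.4), (0.3, 0.3), (0.1, 0.6)]
changes = [direction_of_change(rest, movement) for rest, movement in study_means]
print(changes)
print(has_poor_construct_validity(changes))
```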

| Step 5: Evaluation of content validity, usability and comprehensibility as well as consideration of knowledge from the literature
In step 5, the expert panel discussed each of the remaining items by taking into consideration (a) a content analysis that was conducted on the PAIC research version, (b) knowledge from the literature (e.g. studies on facial expressions of pain; Prkachin, 1992) and (c) the usability and comprehensibility of each item in different care situations. The content analysis of the PAIC research version has been published earlier (van Dalen-Kok et al., 2018). In short, a questionnaire survey was administered to clinical nursing home experts (nurses, physicians) to assess which of the PAIC items are indicative of pain or of other disorders (anxiety disorder, delirium, dementia or depression). Thus, step 5 assimilated all the existing evidence on content validity, usability and comprehensibility as well as knowledge from the literature to come to a well-informed, comprehensive expert consensus.

| Step 6: Feedback from invited "external" reviewers
The step-wise process of decision making (steps 1-5) was thoroughly documented (see also Figure 1) and was sent to a group of external reviewers (N = 5) for feedback on the process of decision making and on the item selection. The group of external reviewers was composed of other members of the COST Action TD1005 and added expertise that was partially missing in the expert group responsible for the step-wise evaluation in steps 2-5, namely expertise in nursing sciences (N = 2), neurology (N = 1), cognitive impairment that is not dementia related (N = 1) and physiotherapy (N = 1). Each external reviewer independently gave written feedback.

TABLE 2 Psychometric characteristics of the items from the PAIC research version (steps 2-4)
Column groups: Step 2: Item difficulty (mean percentage of occurrence in situations where pain is more likely); Step 3: Reliability (mean inter-rater reliability scores in situations where pain is more likely); Step 4
Note: Grey shading: items that performed poorly in more than 50% of the studies with regard to the respective psychometric quality and were considered for exclusion. Black shading: items that were excluded because of poor psychometric quality in previous steps.

| Step 7: Consensus on final item set
All statements and feedback of the invited external reviewers were discussed within the expert panel. For each suggestion from the reviewers, the options were to stay with the original item selection or to adapt the selection. Adaptations were classified as either major revisions (e.g. include a whole new item) or minor revisions (e.g. change wording). Figure 2 gives an overview of the seven steps taken in the final round of the consensus process and their outcomes. It shows that the step-wise process of item selection was accompanied by a thorough discussion amongst the seven members of the expert panel (see the grey fields "expert discussions").

| RESULTS
| Step 1: Clinical and experimental studies conducted to gather empirical evidence on the psychometric characteristics of each item of the PAIC research version
Thirteen studies conducted in eight countries tested psychometric properties of the 36 items of the PAIC research version (Table 1). Across the 13 studies, 587 persons with dementia, 27 individuals with intellectual disability, 12 Huntington's disease patients and 59 cognitively healthy controls were observed during rest and movement situations (clinical studies) or while receiving different intensities of experimental pressure pain, respectively. The persons with dementia were mostly at moderate (mostly in the experimental pain studies) to more advanced stages of the disease (clinical pain studies). Observations were mostly undertaken by healthcare professionals (nurses and physicians, N = 251) who did not receive any special training in how to assess pain using the PAIC research version.

| Step 2: item difficulty
As can be seen in Table 1, the data of all 13 studies were used to compute "item difficulty" values. However, given that several studies (studies 5-11) only focused on facial expressions, these studies could not be used to evaluate body movements or vocalizations.
The mean values of "item difficulty" are displayed in the left columns of Table 2 and values are given separately for each of the three categories of studies conducted. For our item selection, we wanted to exclude those items describing behaviour that is either hardly ever (<10%) or almost always (>90%) observed. Although there were some variations across studies, there were seven items detailing behaviour which was hardly ever observed across studies (percentage of occurrence was <10% in at least half of the studies). These items included "using offensive words" (<10% in six of six studies), "pacing," "screaming" and "crying" (<10% in four of six studies), "pushing" and "rubbing" (<10% in three of six studies) and "teary eyes" (<10% in 6 of 13 studies). These seven items, which are shaded in grey in Table 2, were considered for exclusion and each item was thoroughly discussed in the expert panel. No item reached a percentage of occurrence >90%; thus, no item was considered for exclusion because of a ceiling effect. Based on an expert consensus, it was decided to nevertheless retain four of the seven items showing a floor effect (see Figure 2), because it was argued that these items are still established indicators of pain in other contexts, despite being observed so infrequently in our studies, and that excluding them at the very beginning of the step-wise decision process might cause a premature loss of pain-relevant information. Thus, step 2 resulted in the exclusion of only three items (see Figure 2).

FIGURE 2 Overview of the last round (round 4) of the consensus procedure with its step-wise item selection approach based on empirical evidence and on consensus of a multi-professional and international panel of experts

| Step 3: inter-rater reliability
As can be seen in Table 1, the data of seven studies could be used to compute "reliability" values. Given that step 2 led to the exclusion of three items, reliability was tested for 33 items (see Figure 2). The mean inter-rater reliability values across studies are displayed in the middle columns of Table 2. Poor reliability was defined as <70% agreement or an intraclass correlation (ICC) <0.70, respectively. Overall, reliability values were quite good for the 33 PAIC items. As can be seen in Table 2, inter-rater reliability values were higher in the clinical studies than in the experimental pain studies and somewhat lower for the facial expression items than for the body movement and vocalization items. Based on the reliability outcomes, only two items ("seeming disinterested" and "looking sad," shaded grey in Table 2) showed poor reliability values in more than half of the studies. The item "seeming disinterested" showed poor reliability in five of seven studies, and "looking sad" in four of seven studies. These two items were considered for exclusion. Based on an expert consensus, it was decided to exclude both items (see Figure 2).

| Step 4: construct validity
As can be seen in Table 1, data from 10 studies were available for the "validity" analyses of the facial expression items. For the "validity" analyses of the body movement and vocalization items, four studies were used. After the exclusion of five items in the preceding steps, validity analyses were run for the remaining 31 items (see Figure 2). For that purpose, we computed the number of studies that found a numerical increase, a decrease or no change for each PAIC item between rest and movement or between non-painful and painful pressure stimulation, respectively. These numbers are combined across studies and are displayed in Table 2 (right columns). Across studies, there were seven items with scores that either remained unchanged or even decreased (rest vs. movement or non-painful vs. painful pressure stimulation) in more than half of the studies (these items are shaded in grey in Table 2). These items were "closing eyes," "empty gaze," "pale face," "clenching hands," "restlessness," "screaming" and "gasping" (see Figure 2) and were considered for exclusion. After the expert panel discussed each item, consensus was reached to retain the item "restlessness" (see Figure 2), because it was argued that, due to the clinical testing protocol, "restlessness" was more difficult to observe in a movement situation and thus the decrease in "restlessness" might be a methodological artefact. The other six items were excluded.

| Step 5: content validity, usability and comprehensibility as well as knowledge from the literature
The expert panel discussed each of the remaining 25 items by (a) considering the content analysis conducted on the PAIC research version (van Dalen-Kok et al., 2018), (b) considering knowledge from the literature (e.g. on facial expressions of pain (Prkachin, 1992) or on the differentiation between pain and discomfort (van der Steen et al., 2015)), as well as (c) evaluating the usability and comprehensibility of each item in different care situations (e.g. bedridden patients, palliative care settings). Based on this thorough and comprehensive discussion, 10 items were excluded (see Figure 2).
In detail, four items were excluded because of low content validity for a pain indicator ("looking frightened," "repeating words," "crying" and "sighing"); two facial items were excluded because of low support from the literature, where these have never been reported as being pain related ("tightened lips" and "clenched teeth"; e.g. Prkachin, 1992); and four items were excluded based on restricted usability and difficult comprehensibility: "pained expression" (not descriptive enough, low correlation with self-reported pain ratings), "curling up" (can only be observed if the patient is lying in bed), and "limping" and "pacing" (can only be observed in those patients who can still walk; problems might also be due to conditions other than pain, e.g. stroke).

| Step 6: feedback from invited "external" reviewers
The list of the remaining 15 items along with the description of the step-wise selection (steps 2-5) was sent to a group of "external" reviewers (N = 5) to gain feedback on the process. All external reviewers agreed that the process of item selection was well described, carefully conducted and methodologically sound. Furthermore, all reviewers agreed that the final number of 15 items is satisfactory, although fewer items would be preferable for day-to-day care practice. Overall, there were only two suggestions for major revisions, namely to not eliminate the item "crying" and to add a new item "opening of the eyes." Furthermore, two suggestions were made for minor revisions, namely to change the item "freezing" to "stiffening" and the item "opened mouth" to "opening of the mouth" (see Figure 2).

| Step 7: Consensus on final item set
The changes suggested by the invited "external" reviewers were discussed within the expert panel. Consensus was reached that "crying" should not be added, because it was observed so infrequently in our studies and might not have enough pain specificity. Furthermore, "opening of the eyes" was not added, because this would be a completely new item that has never been included in other observational pain scales and has not been described within the literature on facial expressions of pain (e.g. Prkachin, 1992). As for the suggestions for minor revisions, consensus was reached to change the item "opened mouth" to "opening of the mouth" so as to align it with the wording of the other facial expression items. Consensus was also reached to not change "freezing" to "stiffening," because "freezing" is a more common item in other observational pain scales and also performed well in the content analysis (van Dalen-Kok et al., 2018).
The final scale with its 15 items, which we call PAIC15, is displayed in Figure 3.

| DISCUSSION
After a thorough empirical evaluation and expert discussions of items previously used in established scales for pain assessment in dementia, we selected the most promising items (N = 15 items) to assess pain in individuals with cognitive impairments, especially dementia. These items form the PAIC15 scale (Pain Assessment in Impaired Cognition, 15 items).

| Novelty of forming a meta-tool
The completely novel feature of this scale is that it represents a meta-tool, which is based on the best items (most reliable, most valid) from the best scales previously developed for this purpose. Accordingly, the 15 items were taken from the following scales: Abbey (4 items), ADD (6 items), CNPI (6 items), DS-DAT (6 items), DOLO-Plus (2 items), EPCA (4 items), MOBID2 (5 items), NOPPAIN (4 items), PACSLAC (10 items), PAINAD (8 items), PADE (4 items) and PAINE (6 items). This list clearly shows that many of the best-known instruments contributed to the PAIC15 and that all these scales contributed some but not all of their items. The advantage of our meta-tool approach was that it provided two rather than just one quality filter for each included item. The first quality filter was put in place during the development of the respective original scales; the second quality filter was added by our European initiative.
The strength of such an approach was also recently highlighted by Ersek et al. (2018), who followed a similar line of reasoning. They also included items from previous observational pain scales and tested which items best predicted clinicians' evaluations of pain intensity in persons with dementia. Thus, the similarity between their approach and ours is that both build on previous knowledge about observational pain assessment tools in order to build a meta-tool. That two independent research groups pursued a meta-tool approach at the same time stresses that this was timely. However, there are also several differences between the approaches. Whereas Ersek et al. (2018) used a national approach, using a clinical testing protocol and basing their item selection solely on the power to predict clinicians' evaluations, we used an international approach, using clinical as well as experimental testing protocols and basing our item selection on a variety of item characteristics. Despite these differences, there is considerable overlap in selected items. More precisely, their final list included eight items (Ersek et al., 2018), with four of these items being identical to our item selection (freezing/stiffening, complaining, frowning, groaning) and the remaining four items being at least comparable to our selected items (bracing (comparable to the PAIC15 item "guarding"), expressive eyes (comparable to the PAIC15 item "narrowed eyes"), agitated (comparable to the PAIC15 item "restlessness"), grimacing (comparable to the PAIC15 item "looking tense")).

| Strength of combining evidence from clinical and experimental pain studies
Most observational scales to assess pain in dementia have only been tested in clinical settings. Clinical settings such as hospitals and nursing homes are where these scales will be used and thus provide the highest ecological validity. However, the disadvantage of studying pain in individuals with dementia in a clinical setting is that there is no true certainty whether the observed individual is in pain or not, given that the self-report is often invalid. We and others have tried to tackle this problem by using substitutes for self-report ratings, such as clinicians' overall pain ratings (Ersek et al., 2018;Lautenbacher, Niewelt, & Kunz, 2013;Lautenbacher, Walz, & Kunz, 2018) or by comparing resting versus movement situations (Herr, Bjoro, & Decker, 2006), given that pain is more likely to occur during movement (Hadjistavropoulos, LaChapelle, MacLeod, Snider, & Craig, 2000;Srikandarajah & Gilron, 2011). However, relying on these substitutes is only an approximation. In order to gain certainty that a behaviour is truly a response to pain, experimental pain induction methods are needed. Experimental pain allows for a standardized, controlled nociceptive input and thereby allows comparison of behavioural indications of pain between non-noxious and noxious situations (Kunz, Scharmann, Hemmeter, Schepelmann, & Lautenbacher, 2007). Thus, combining the evidence from clinical and experimental pain studies allowed us to combine ecologically valid data (clinical studies) with data of high construct validity (experimental studies).
Two experimental studies focused on non-dementia-related cognitive impairment and included individuals with intellectual disability or with Huntington's disease, respectively. The analysis of the PAIC results showed that it can be used for diverse populations. Although the sample sizes were too small to allow for statistical comparisons between different patient groups, the descriptive data point to elevated facial responses in individuals with intellectual disabilities compared to individuals with dementia and Huntington's disease. However, future studies are needed to confirm this impression.

| Strength of using an international and multi-professional expert team
We based the item selection procedure on a thorough consensus procedure that included multiple discussion rounds of a very large team of international and multi-professional experts. This was novel and enabled us to take diverse expertise and viewpoints into consideration. Trying to find consensus within this large international and diverse group of experts is time-consuming, but has led to a scale development that is not limited to one type of expertise or to one particular cultural background.

| Items included in the PAIC15 scale
PAIC15 is composed of five items on facial expressions, five items on body movements and five items on vocalizations. Thus, all three widely accepted categories of non-verbal pain behaviours are equally represented in the scale. The five selected items on facial expressions are in agreement with previous findings on facial expressions of pain (Prkachin, 1992;Kunz, Meixner, & Lautenbacher, 2019). In a recent review article (Kunz et al., 2019), the findings of 37 studies on facial activity elicited during clinical and/or experimental pain were analysed, and it was found that a consistent subset of pain-related facial responses emerged across studies: frowning, narrowing eyes, raising upper lip and opening mouth. These are the same facial responses that emerged as the best facial items in our PAIC15 item selection process. The only additional item that we found was the item "looking tense," which is more a subjective impression, whereas the other items are more anatomical descriptors. This combination of objectively descriptive items with a few subjective items follows the advice of a recently published content analysis on tools to assess pain or lack of comfort in dementia (van der Steen et al., 2015). Here, the authors conclude that subjective items are informative in individuals with dementia if they are combined with objective descriptors. Combining several items instead of relying on a single item, such as "grimacing" or "pained expression," also follows the advice of previous publications. It has been consistently found that pain-related facial responses are most often not displayed all at once, but are differently combined (Kunz et al., 2019). This is an important finding, because it stresses that observational pain assessment scales should account for this variability by including several facial expression items.
With regard to the five body movement items, we also find strong agreement with previous findings (Keefe & Block, 1982;Prkachin, Schultz, & Hughes, 2007;Strand et al., 2019). A recent systematic review article (Strand et al., 2019) on pain-indicative body movements in older people with cognitive impairment found strong (restlessness, rubbing, guarding) or moderate criterion validity for all five items included in the PAIC15. Similar to facial expressions, pain is not accompanied by one single prototypical body movement but rather by a combination of different movements (Walsh, Eccleston, & Keogh, 2014).
With regard to the category of vocalizations, the PAIC15 encompasses one very pain-specific item, namely "using pain-related words." The other four items proved to be good indicators of pain in our studies, although they might be less specific (e.g. shouting, mumbling). However, as already indicated above, the lack of pain specificity of single items might be of less relevance, given that the key is the combination of different items.

| Limitations
Although the variety of studies conducted to test the psychometric characteristics of the PAIC items (research version) is a strength of our approach, it is also a limitation. Conducting studies across countries, in different settings, and including seven different language versions has inevitably led to variations between studies that make it difficult to precisely compare the outcomes. We therefore chose a more liberal approach when excluding items, and only considered excluding an item when it performed poorly across the majority of studies. Another limitation regards the approach we used to determine "construct validity." For experimental pain, items were selected as pain indicative when they occurred more often during painful compared to non-painful stimulation. However, experimental pain mainly mirrors acute pain states, yet much of the pain in dementia is chronic (although with potential acute exacerbations). Thus, our item selection might have favoured those items that are indicative of acute pain. We tried to address the challenge of detecting chronic pain by also including a movement situation in the clinical studies, with the assumption that movements should elicit more acute exacerbations of pain than rest situations. This assumption surely holds for some individuals but not for all. Analgesic trials might help to also capture behaviour or change in behaviour that is indicative of more or less chronic pain.

| DEVELOPMENTS
After a thorough empirical evaluation of items derived from previously established observational scales for pain assessment in dementia by an international and multidisciplinary team of experts, 15 items proved to be best suited to form a new internationally agreed-on meta-tool for Pain Assessment in Impaired Cognition, especially dementia (the PAIC15). The PAIC15 scale is available in several languages and can be downloaded at https://paic15.com. Given that many of the items showed floor effects during rest situations, we advise the user to apply the PAIC15 scale during movement situations, where the occurrence of pain is more likely. To encourage usage and implementation of the PAIC15 in clinical care, a freely available e-training for PAIC has also been developed (https://paic15.org). For the future, the following tasks are required: (a) study of implementation barriers, (b) analgesic trials to assess sensitivity to change, (c) empirical definition of cut-off scores for different pain intensities and (d) testing of the PAIC15 in various clinical pain types and in different groups of cognitively impaired individuals.