The City Gesture Checklist: The development of a novel gesture assessment

Background: People with aphasia rely on gesture more than healthy controls to get their message across, but use a limited range of gesture types. Gesture therapy is thus a potential avenue of intervention for people with aphasia. However, currently no gesture assessment evaluates how they use gesture. Such a tool could inform therapy targets and measure outcomes. In gesture research, many different coding categories are used to describe gesture forms and functions. These coding methods are prohibitively time-consuming to use in clinical practice. There is therefore a need for a ‘quick and dirty’ method of assessing gesture use. Aims: To investigate current practice among UK-based clinicians (speech and language therapists) in relation to gesture assessment and therapy, to synthesize gesture-coding frameworks used in aphasia research, to develop a gesture checklist based on the synthesized coding frameworks suitable for use in clinical practice, and to investigate the interrater reliability (IRR) of the checklist among experienced and unfamiliar users. Methods & Procedures: The research team synthesized seven gesture-coding frameworks and trialled three resulting prototype checklists at a co-design workshop with 20 clinicians. Attending clinicians were also consulted about their current clinical gesture practice using a questionnaire. A final City Gesture Checklist (CGC) was developed based upon outcomes and feedback from the workshop. The IRR of the CGC was evaluated between the research team and 11 further clinicians within a second workshop. Both groups used the CGC to count gestures in video clips of people with aphasia talking to a conversation partner. Main Contribution: A total of 18 workshop attendees completed the current practice questionnaire. Of these, 10


What is already known on the subject
• People with aphasia rely on gesture more than healthy speakers, yet use a more limited range of gesture types. Gesture therapy is used by clinicians with the aim of helping people with aphasia to compensate for their language impairment and/or to facilitate speech.

Introduction
Gesture is a universal phenomenon of human communication, observed alongside speech across all cultures (Kita 2009) and present even when a speaker is not visible to their listener (Alibali et al. 2001). Its ubiquity extends to people with congenital blindness who use gesture while speaking, even though they have never seen others gesture (Iverson and Goldin-Meadow 1998). The field of gesture research has bloomed since the 1970s (Kendon 2004) with researchers producing different definitions, systems for classifying gesture types, descriptions of how gestures are used and explanations for why people gesture.
Gesture is a difficult phenomenon to pin down. Kendon (2004) defined gesture as 'visible action when it is used as an utterance or as a part of an utterance' (p. 7), a definition which includes movements of the body and face as well as the hands. McNeill (1992) described 'Kendon's Continuum', which classifies types of gesture according to their language-like properties. At the most structured end of the continuum are sign languages, in which gestures have fixed, established meanings and are used with linguistic organization in the absence of speech. Emblems, such as a thumbs-up gesture, also have conventionalized meanings and can stand alone as complete utterances in the absence of speech, but do not have linguistic structure. Pantomime gestures do not have fixed meanings, but can be used alone, without speech. At the least language-like end of the continuum are the spontaneous gestures, or gesticulation, accompanying speech that the speaker may be unaware of using. Kendon's Continuum therefore describes the different functions that gesture can take on and how, in certain situations, it has the capacity to take on the full functions and structural properties of language.
McNeill (1992) further classified gesticulation as imagistic or non-imagistic. Imagistic gestures convey some sort of image, which could represent a shape or an action, whereas non-imagistic ones do not. Instead, non-imagistic gestures include rhythmical movements during speech (beats) and pointing (deictic gestures).
Imagistic gestures are further subdivided into iconic and metaphoric gestures. Both of these gesture types involve the depiction of a shape or movement, but differ according to how the speaker uses the gesture in discourse. Iconic gestures are more concrete, depicting an action or image that is present in the accompanying speech (e.g., 'this lift is going up', accompanied by an upwards gesture), whereas metaphoric gestures are less clearly related to the spoken words. The image depicted in a metaphoric gesture represents an abstract concept. An example of a metaphoric gesture given by Kendon (2004: 100) is of a person describing someone else revealing details about themselves using the words: 'She spoke very rapidly and this was all coming out quite spontaneously.' A gesture depicting a substance gushing out of herself accompanied the words 'all coming out'. As Kong et al. (2015) note, this and other commonly used gesture-coding systems combine both forms (e.g., deictics) and functions (e.g., beats) of gesture, which is problematic.
The question of why people gesture remains controversial. The obvious communicative functions of gesture do not account for the full range of gesture types produced across different contexts, suggesting that gesture also helps speakers to organize their thoughts or to find words. In healthy speakers, gesture and speech are intimately intertwined and they collaborate in conveying meaning (Kendon 2000, McNeill 2005). Some researchers have argued that gesture can facilitate speech, although there is debate about whether this occurs at the conceptual (e.g., Melinger and Kita 2007) or word form level (e.g., Krauss et al. 2000).
The field of gesture research is therefore relevant to clinicians (speech and language therapists) working with clients with communication disabilities such as aphasia. Aphasia can have a profound impact on a person's ability to communicate, but varies in the forms of communication that it disrupts and spares. Clinicians need to be able to analyse changes to a person's understanding and use of gesture in order to have a holistic understanding of their communication skills. They may also seek to harness gesture to enhance a person with aphasia's communication skills, either as a compensatory means of communication (e.g., Roper et al. 2016) or to facilitate speech (e.g., Lanyon and Rose 2009).
Researchers exploring the gesture abilities of people with aphasia have reported a range of findings. Some have reported cases where gesture was relatively preserved and people with aphasia were able to use complex gesture to compensate for their impaired language skills (e.g., Kemmerer et al. 2007, Wilkinson et al. 2010). Other studies have compared groups of people with aphasia with healthy controls, examining both the functions and forms of gesture.
In terms of function, several studies have explored the communicative role of gestures (e.g., Akhavan et al. 2018, van Nispen et al. 2017) and how this interacts with speech. These studies have reported that people with aphasia rely on gesture more than healthy controls to get their message across. For example, van Nispen et al. (2017) examined the importance of gesture in successfully conveying information by categorizing gestures as essential to understanding an utterance, as conveying meanings similar to speech, or as conveying additional information that supplemented speech but was not key to understanding the message. They found that people with aphasia used more essential gestures than healthy controls, indicating that gesture carried a heavier communicative load. Johnson et al. (2013) investigated use of speech and gesture during tasks requiring participants to convey spatial information. They found that people with aphasia were more reliant on gesture than healthy controls, using more gesture in the absence of speech, when verbal spatial language was unavailable to them.
Research into the types of gesture produced by people with aphasia has generally found that they use a more limited range of gesture forms than healthy controls. People with aphasia are more reliant on 'shape' or 'outlining' gestures, which depict the physical properties of an object, rather than conveying its function (e.g., showing the shape/size of a ball, rather than pretending to kick/throw/bounce a ball) (Cocks et al. 2011, 2013, Mol et al. 2013, van Nispen et al. 2015). However, other researchers have argued that, if confounding factors such as limb apraxia and comprehension impairments are controlled for, people with aphasia's gesture production can remain functional, even when language is severely impaired (Akhavan et al. 2018). Johnson et al. (2013) reported marked differences in the abilities of their participants to use gesture to compensate for their difficulties accessing verbal spatial language, arguing that this indicated the importance of clinicians investigating the language and gesture profile of individuals with aphasia.
Whether using gesture helps people with aphasia to resolve word-finding difficulties remains an area of controversy. Some authors have reported that people with aphasia are more likely to resolve their lexical retrieval difficulties when they are accompanied by semantically rich or iconic gestures (e.g., Akhavan et al. 2018, Kistner et al. 2019). However, others have observed that people with aphasia gesture during instances of word-retrieval difficulties, but have not found evidence that such gestures are facilitatory (e.g., Pritchard et al. 2013, Kong et al. 2019). In interpreting these findings, we need to allow for the fact that not every gesture made by a person with aphasia is intended to aid lexical access. Gestures can serve other purposes, such as conveying part of the message. This point links with a difficulty with many coding systems that was raised by Kong et al. (2015), namely that the coding of forms and functions is not clearly differentiated. These authors point out that one form of gesture can serve one or more functions. For instance, iconic gestures can facilitate word finding (e.g., Kistner et al. 2019) or they may help listeners (e.g., van Nispen et al. 2017). Kong et al. argue that the mixed coding of gesture forms and functions makes studies difficult to compare, is conceptually problematic and may create confusion when it comes to interpreting gesture use.
Clinicians' role includes assessing communication skills across all modalities, to identify areas of strength, difficulty and aspects that may be suitable targets for therapy (RCSLT 2005). In the case of gesture, this poses many challenges. Unlike spoken language, it is hard to describe in written form due to its holistic, imagistic, transitory nature. Researchers have met this challenge in a variety of ways. Some gesture therapy studies have focused on the intelligibility of gestures before and after therapy, either to unfamiliar observers or familiar conversation partners (Marshall et al. 2012, Caute et al. 2013, Roper et al. 2016, Hogrefe et al. 2012). For example, in Marshall et al. (2012), judges were shown clips of individual gestures and asked to guess what they were, first with no clues and second from a choice of four multiple-choice options. In Hogrefe et al. (2012), participants were asked to retell 10 short video clips. Their narrations were shown to naïve observers who attempted to match them to the original video clips. These approaches give an objective measure of intelligibility but no clues about how information was conveyed.
In contrast, among studies comparing gesture abilities of people with aphasia with healthy controls, a key approach has been to describe and code gestures according to a set of gesture types (e.g., Kistner 2017, van Nispen et al. 2015, Hogrefe et al. 2012). In most of these studies, gestures were videoed within a discourse context, such as conversation or narrative (e.g., Armstrong et al. 2007, Kistner 2017, Sekine et al. 2013, Sekine and Rose 2013). However, some studies used assessment tasks such as the Scenario Test (van der Meulen et al. 2010) or a barrier task (e.g., van Nispen et al. 2016) as a stimulus. The gestures were then analysed, often using coding software such as ELAN (Lausberg and Sloetjes 2009). A few studies used a formal coding system, such as the Hamburg Notation System for Sign Languages (Hogrefe et al. 2012), while the majority used a novel coding scheme.
One of the challenges of reading and interpreting these research studies is that they employ different but overlapping classification methods that vary in the range and number of different categories included. For example, Mol et al. (2013) used four coding categories (outlining/moulding, handling, object/enacting and deictic), while Sekine et al. (2013) used 12 categories (concrete deictics, iconic character viewpoint, iconic observer viewpoint, emblems, metaphoric gestures, numbers, pointing to self, referential, time, pantomime, beats and letter gestures). Sometimes researchers use different labels for categories that are the same or very similar. For example, the rhythmic movements that accompany speech have been described as batons (Armstrong et al. 2007) or beats (Sekine et al. 2013). Likewise, gestures depicting someone holding and manipulating an object have been described as handling gestures (Mol et al. 2013, van Nispen et al. 2016) or kinetographs (Armstrong et al. 2007).
There are advantages and disadvantages to the different approaches to evaluating gesture. Coding of gesture types has the key benefit of providing rich, descriptive data. However, it is hugely labour intensive and time-consuming. In future, a potential solution to this may lie in technology. Novel research has explored the use of motion-tracking technology to analyse the kinematic features of gesture. However, this methodology is not expected to enable the automatic qualitative classification of gestures and hence replace manual coding (Trujillo et al. 2019). Rating methods that involve showing clips of gestures to unfamiliar observers offer the advantage of providing objective data about how communicatively effective a person's gesture use is. However, they offer no clues as to how the person with aphasia communicated the information, and are also very labour intensive and time-consuming. Neither approach is therefore a realistic option for the busy, fast-moving environment of clinical practice.
For clinicians working with people with aphasia, several commercially available assessments include a gesture subtest. For example, the Comprehensive Aphasia Test's (CAT, Swinburn et al. 2004) cognitive battery includes a 'Gesture Object Use' subtest, which assesses a person's ability to produce pantomime gestures from picture stimuli. In addition, several apraxia batteries include gesture subtests that assess for limb apraxia. For example, the Birmingham Cognitive Screen (Bickerton et al. 2012) assesses recognition and production of transitive and intransitive gestures, as well as imitation of meaningless gestures, while the limb apraxia subtest of the Apraxia Battery for Adults (ABA-2) (Dabul 2000) assesses the ability to produce transitive and intransitive gestures following verbal instructions.
These assessments have the advantage of being quick to administer and score, but have several disadvantages. These include a reliance on auditory comprehension skills in order to understand verbal instructions (e.g., ABA-2), the subjective nature of the scoring procedures due to the limited guidelines (e.g., CAT), a limited range of gesture types assessed (e.g., CAT only assesses pantomime gestures in response to picture stimuli), and a lack of information about gesture abilities in naturalistic situations.
A clinician embarking on a programme of therapy targeting a particular modality, such as speech, reading or writing, would typically be aiming to improve the client's communicative functioning either through developing their skills at an impairment level or by harnessing strengths to compensate for their weaknesses. They would therefore start by assessing the client's impairment and function, identifying processing strengths and weaknesses, and how these impact on their ability to carry out communicative tasks (RCSLT 2005). Impairment-based assessments may also reveal features of a client's underlying processing difficulties, for example, the integrity of their semantic system, and there is a developing evidence base that this may be the case for gesture (e.g., Cocks et al. 2011). At present, the gesture assessment tools available to clinicians do not facilitate the crucial first step of identifying the forms that a client is able to produce and those they struggle with. There is therefore a need for a 'quick and dirty' assessment tool that enables clinicians to capture a wide variety of gesture types in naturalistic communicative situations, such as conversation or narrative. Such a tool could support clinicians in goal setting by identifying targets for intervention or strategies that could be used to compensate for weaknesses. It could support clinicians in considering the frequency, complexity and variety of gestures that a client uses.
The aims of the current study are:
• To investigate current practice among UK-based clinicians in relation to gesture assessment and therapy.
• To synthesize gesture-coding frameworks used in research with people with aphasia in order to identify their key, common features.
• To develop a gesture checklist based on the synthesized frameworks that is suitable for use in clinical practice.
• To investigate the IRR of the checklist among the research team and speech and language therapists (SLTs).

Methods and procedures
This study received ethical approval from the School of Health Sciences Research Ethics Committee at City, University of London.
A co-design workshop and a survey of current practice were undertaken to address the first and third aims above. Participants were recruited via advertisement on Twitter, an SLT alumni mailing list, researcher contacts, the British Aphasiology Society and related clinical excellence networks. Attendance was open to all practising UK speech and language therapists interested in gesture in aphasia.

Current practice
Before workshop attendance, participants were invited to complete a brief survey to report current gesture practice. Respondents were asked whether they currently undertook gesture assessment or therapy within their aphasia practice, which techniques they used if so, and what their top tips were for gesture therapy in aphasia.

Synthesis of gesture-coding categories
The researchers (three SLTs and one clinical linguist) identified seven recently published articles that had used gesture-coding categories to describe and evaluate the types of gestures produced by people with aphasia. The studies used a range of elicitation techniques and their participants differed in the severity and type of aphasia, ranging from mild to severe and global aphasia. The search was conducted in January 2017 and all articles included were written in the previous decade. The authors were from a range of countries, including Australia, Japan, the Netherlands and the UK. The articles included do not represent an exhaustive or systematic search of the literature, but rather represent a purposive sample, with the authors drawing on their knowledge of the literature and aiming for good coverage of different approaches to analysing gesture. Details of these studies are shown in table 1.
The researchers held a discussion about which categories were distinct and which could be merged, how each category related to either form or function of gestures, which were feasible to use in the checklist, and how the categories could be synthesized into a framework which grouped related categories together. Following the suggestion by Kong et al. (2015), we decided it was best to aim for a list of categories that described gesture form only.

Development of a gesture checklist
A co-design workshop was undertaken to explore the acceptability and usability of three prototype checklists developed by the researchers following the synthesis exercise described in the previous section. The three prototypes were designed to have two key features in common. First, they all included the six synthesized coding categories. Second, they were all frequency measures designed to record the number of instances of particular gesture types. They differed in other respects, such as layout, use of images and instructions.
The workshop followed principles of co-design illustrated in Sanders and Stappers (2008). It lasted 3 hours and comprised three elements: a discussion of current practice (based on the outcomes of the current practice survey), a co-design session and a presentation by the researchers on gesture theory and research into gesture therapy and assessment in aphasia.
The co-design session began with small group work, with each group using one of the three prototypes to code three different video stimuli:
• A person with aphasia describing a procedural narrative (making scrambled eggs).
• A person with aphasia retelling a video story to her husband.
• A person with aphasia describing a book she had read to an SLT.
Each video was shown once only. Following review and scoring of the videos using the allotted prototype checklist, smaller groups fed back to the whole group about their experiences. They were prompted to reflect on the terminology used, the clarity of instructions, the layout of the document, use of images and the clinical relevance of the identified categories. Attendees were also asked to reflect on stimuli and elicitation techniques they could envisage using for the checklist, for example, live conversation or using a video recording.
A further iterative process took place to refine the City Gesture Checklist (CGC). After the workshop, the researchers produced a final version of the CGC which incorporated the workshop attendees' key preferences and recommendations. This version was emailed to attendees who were asked to give feedback on it, specifically on how they had used it, what was good about it, any barriers they had experienced and suggestions for how it could be improved. The final version was also presented at a second workshop at a national SLT conference where delegates trialled using the CGC and gave feedback to the researchers in response to two questions: What makes the CGC fit for purpose in your clinical practice? What prevents it being fit for purpose?

Interrater reliability (IRR)
IRR was evaluated both for the research team, who were highly familiar with the CGC, and for a group of novel users who had received no training in using it.
The four familiar researcher raters used the CGC to describe the gestures in 20 video clips, each lasting 3 min. Videos were a convenience sample drawn from previous gesture research studies and had been filmed either in the home or clinic context. Clips included participants with a range of severities of aphasia, ranging from mild to severe, a balance of gender and a mix of ages within the range 40-90 years. All videos showed the person with aphasia talking to a single conversation partner (a familiar family member or clinician, or an unfamiliar clinician or student clinician). Videos included a range of types of discourse, including procedural discourse (e.g., describing how to wrap a present or make scrambled eggs) and narrative discourse (e.g., talking about when they had a stroke, or describing the content of a video they had watched). This choice of videos sought to represent a variety of different interaction contexts, to reflect the aim that the CGC might be usable in a variety of situations.
The four researchers watched and rated the videos concurrently, but did not discuss their ratings or confer. The mean tally of gestures identified per video across all 20 videos was 7.93 (minimum = 2; maximum = 19) for the four familiar researcher raters.
Reliability among novel, unfamiliar users of the CGC was evaluated at a third workshop, held at City, University of London. This was advertised through networks of aphasia clinicians, to the university's alumni and through Twitter. Eleven clinicians attended this workshop. On arrival, attendees were presented with the CGC and asked to read the cover sheet and key, which described and gave examples of the different gesture-coding categories. The attendees rated 10 videos at the workshop, a representative sample of the 20 rated by the researchers, including a range of types of discourse and levels of aphasia severity. The mean tally of gestures identified per video across all 10 videos was 6.17 (minimum = 0; maximum = 15) for the 11 unfamiliar clinician raters.
After both groups had rated the videos, the number of times each rater had recorded each category of gesture for each video was recorded in an Excel spreadsheet. The scores used to calculate the intraclass correlation coefficients (ICCs) were therefore frequency counts within gesture categories, for each video. These frequency scores were recorded for each participant and each video as a total number of gestures within each of the following eight categories: concrete pointing, abstract pointing, emblems, outlining/shape, pretending, number, air writing and other. ICC estimates and their 95% confidence intervals were calculated using SPSS (v. 25, SPSS Inc., Chicago, IL, USA) based on a single rating, consistency agreement, two-way random effects model.
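As a rough illustration of the kind of coefficient described here, the sketch below computes a single-rating, consistency-type ICC from a table of frequency counts using the standard two-way ANOVA mean squares, ICC = (MS_rows - MS_error) / (MS_rows + (k - 1) MS_error). It is not the SPSS procedure used in the study and does not compute confidence intervals. The function name `icc_consistency_single`, the assumption that each row is one video-by-category cell and each column is one rater, and the toy counts are all illustrative assumptions rather than details from the study.

```python
def icc_consistency_single(ratings):
    """Single-rating, consistency-type ICC from a two-way layout.

    ratings: list of rows; each row holds one rated target's scores
    (assumed here: one video-by-category cell), one value per rater.
    """
    n = len(ratings)        # number of rated targets (rows)
    k = len(ratings[0])     # number of raters (columns)
    grand = sum(sum(row) for row in ratings) / (n * k)
    row_means = [sum(row) / k for row in ratings]
    col_means = [sum(row[j] for row in ratings) / n for j in range(k)]

    # Two-way ANOVA sums of squares without replication
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((v - grand) ** 2 for row in ratings for v in row)
    ss_error = ss_total - ss_rows - ss_cols

    ms_rows = ss_rows / (n - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))
    return (ms_rows - ms_error) / (ms_rows + (k - 1) * ms_error)


# Invented toy counts: 5 video-by-category cells scored by 4 raters
toy_counts = [
    [3, 2, 3, 4],
    [0, 0, 1, 0],
    [5, 4, 6, 5],
    [1, 1, 0, 2],
    [2, 3, 2, 2],
]
toy_icc = icc_consistency_single(toy_counts)
```

Because the consistency definition ignores systematic differences between raters, a rater who always counts one gesture more than a colleague does not lower this coefficient.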
The ICC estimates provide the following information about reliability (Koo and Li 2016):
• < 0.5 = poor reliability.
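The Koo and Li (2016) guideline describes values below 0.5 as poor, values between 0.5 and 0.75 as moderate, values between 0.75 and 0.9 as good, and values above 0.9 as excellent reliability. A minimal sketch of that lookup follows. The function name `interpret_icc` is hypothetical, and the treatment of values falling exactly on a boundary is an assumption, since the guideline does not specify it.

```python
def interpret_icc(icc):
    """Map an ICC estimate to a reliability band (after Koo and Li 2016).

    Boundary handling (which band 0.5, 0.75 and 0.9 fall into) is an
    assumption made for this sketch.
    """
    if icc < 0.5:
        return "poor"
    if icc < 0.75:
        return "moderate"
    if icc <= 0.9:
        return "good"
    return "excellent"


band = interpret_icc(0.681)
```

Applied to a point estimate of 0.681, the lookup returns the "moderate" band; the confidence interval limits can be passed through the same function to see whether the whole interval sits in one band.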

Current practice
A total of 20 UK clinicians registered to attend the workshop and a further 10 were on the waiting list, indicating a high level of interest in the topic. A total of 18 of the 20 attendees responded to the survey. Of these, 10 of 18 (56%) reported that they currently undertook the assessment of gesture in their practice and 11 of 18 (61%) reported undertaking gesture therapy.
All those who undertook assessment reported using informal methods, including gesture-to-object matching, demonstrating object use and informal object/picture gesture. In addition, three respondents reported using the CAT gesture subtest and two reported using formal apraxia assessment.
Of 11 respondents who reported undertaking therapy, 10 described the therapy techniques and targets that they used. Specific therapy techniques included modelling gesture, using a hierarchy of reducing prompts and Visual Action Therapy (Helm-Estabrooks et al. 1982), while others reported using basic principles such as ensuring that the client and those around them were motivated, starting with successful gestures and extending them, and providing opportunities for practice and consistency. The targets included functional objects personal to the client, using gesture to describe picture cards and using gesture to elicit spoken verbs.

Synthesis of coding categories
Analysis of the gesture-coding categories revealed that there were considerable similarities and overlap between the categories used in the papers, although the terminology used was sometimes different. The first step was therefore to identify categories that were the same or very similar, but had been described using different terminology. Examples of different terminology used for the same gesture types were deictic and pointing, batons and beats, handling and kinetographs, and outlining/moulding and shape (for definitions, see table 1). The next step in the analysis was to separate coding categories into those describing form or function. Form categories were defined as those describing the physical properties of a gesture, while function categories described how the gestures were used in an utterance (Kong et al. 2019). Table 1 illustrates how different gesture classifications were classed as form or function.
Following discussion about which aspects of gesture would be feasible to code online, a decision was taken to focus on categories that described the form of gestures. Although ideally the checklist would have described both the forms and functions, the researchers decided that it would not be possible to analyse gesture function online, as this would require analysis of that gesture's function in a whole utterance. The researchers therefore extracted the form categories for further analysis.
Having merged categories that were the same or very similar, and extracted those that described form, the researchers identified five main categories of gesture form: deictic (pointing), emblems, iconic, number and air writing. A sixth category of 'other' was included for gestures that did not fit into the main categories, such as gestures representing time or personalized gestures. Deictic gestures were subdivided into concrete and abstract gestures, while iconic gestures were further classified as outlining/shape, handling, enacting and object. Table 2 illustrates how the descriptions of gesture types in the seven papers were grouped into these six categories.
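The synthesized framework can be pictured as a small two-level lookup. The sketch below encodes the six form categories and their subtypes as listed above, together with some of the equivalent labels mentioned in the text. The names `FORM_CATEGORIES`, `EQUIVALENT_LABELS` and `find_category` are hypothetical, and the full mapping used by the researchers (table 2) is not reproduced here.

```python
# Six synthesized form categories and their subtypes, as listed in the text
FORM_CATEGORIES = {
    "deictic": ["concrete", "abstract"],
    "emblems": [],
    "iconic": ["outlining/shape", "handling", "enacting", "object"],
    "number": [],
    "air writing": [],
    "other": [],
}

# Alternative labels reported for the same gesture types (partial, illustrative)
EQUIVALENT_LABELS = {
    "pointing": "deictic",
    "kinetographs": "handling",
    "outlining/moulding": "outlining/shape",
    "shape": "outlining/shape",
}


def find_category(label):
    """Return the main form category for a label, or None if it is not listed."""
    label = label.lower()
    label = EQUIVALENT_LABELS.get(label, label)
    for main, subtypes in FORM_CATEGORIES.items():
        if label == main or label in subtypes:
            return main
    return None


category = find_category("kinetographs")
```

Labels that are not among the form categories in this sketch return `None` rather than being forced into 'other', because the text reserves 'other' for gestures observed in use that do not fit the main categories.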

Development of prototype checklists
The three prototypes were all frequency measures, including the six categories of gesture form, but differed in their instructions, layout, use of images, terminology and descriptions of gesture types. Two of the prototypes included some additional questions, for example, whether gestures accompanied or replaced speech, the variety of gesture types used, other communication methods used alongside gesture, presence of hemiplegia/hemiparesis, motor difficulties and perseveration.

Feedback on prototype checklists
Two key recommendations emerged from consultation with workshop attendees:
• Layout: Attendees expressed a preference for a score sheet that fitted on one side of A4 only, so that they did not need to turn the page while using the checklist.
• Explanation and examples of gesture classification categories: Attendees identified that a separate reference sheet should be used to give further detail about the categories. This could sit alongside the score sheet and be referred to while using the checklist.

Use of images
Attendees felt that images were useful for illustrating the different gesture types. They preferred line drawings to photographs. The images in the final version were used with permission from british-sign.co.uk (https://www.british-sign.co.uk, n.d.).

Instructions
Attendees commented on the clarity of the instructions in the three prototypes. They found the term 'on-line' to be ambiguous and recommended the phrase 'This checklist is for use in real time'. They also thought that the checklist should ideally be used while observing a conversation, rather than whilst engaging in a conversation. They recommended adding the wording 'You could use it while watching a video or observing a conversation'. They recommended using the wording 'Tally the number of each gesture type you observe the client using' in order to clarify that the boxes should be used for counting the number of gestures observed in each category, rather than for writing descriptions of the gestures.

Gesture categories
Attendees stated they did not feel it was useful to distinguish between three out of the four subtypes of iconic gestures, namely handling, enacting and object. They gave examples of gestures that would be difficult to classify into these subcategories, for example, a gesture for feeling hot could be described as enacting or object (pretending to use one's hand as a fan). However, they felt it was useful to make a distinction between these types of iconic gestures and shape/outlining gestures. They proposed instead that it would be more helpful to subcategorize iconic gestures as either 'iconic pretending' (i.e., handling, enacting or object) or shape/outlining gestures.
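A minimal sketch of how this merge feeds a frequency tally is given below, assuming the eight score categories named in the Methods (concrete pointing, abstract pointing, emblems, outlining/shape, pretending, number, air writing and other). The names `SCORE_CATEGORIES`, `MERGED_INTO_PRETENDING` and `tally_gestures`, and the example observations, are hypothetical and are not part of the CGC itself.

```python
# Eight score categories used for the frequency counts (from the Methods)
SCORE_CATEGORIES = [
    "concrete pointing", "abstract pointing", "emblems", "outlining/shape",
    "pretending", "number", "air writing", "other",
]

# Iconic subtypes that attendees proposed collapsing into 'pretending'
MERGED_INTO_PRETENDING = {"handling", "enacting", "object"}


def tally_gestures(observations):
    """Count observed gestures per score category; every category starts at 0."""
    counts = {category: 0 for category in SCORE_CATEGORIES}
    for label in observations:
        if label in MERGED_INTO_PRETENDING:
            label = "pretending"
        if label not in counts:
            # Fail loudly rather than guess which category was intended
            raise ValueError(f"unknown gesture label: {label!r}")
        counts[label] += 1
    return counts


# Invented observations from one hypothetical video clip
clip_counts = tally_gestures(
    ["handling", "concrete pointing", "enacting", "number", "outlining/shape"]
)
```

Keeping a zero entry for every category means that tallies from different raters and videos share the same shape, which simplifies any later reliability calculation.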

Terminology
Attendees were asked to comment on the use of terminology, particularly whether they preferred technical terms to describe gesture types (e.g., iconics/deictics) or descriptions using more everyday terms (e.g., 'pretending' rather than iconics/'pointing' rather than deictics). The final terminology used was agreed with attendees, who felt the term 'iconics' was useful, but otherwise preferred less technical terms or to use both the technical term and description (e.g., emblems/conventional gestures).

Inclusion of additional questions
Attendees felt that the additional questions were clinically useful. They suggested adding a space for additional comments or notes.

Further feedback after the workshop
The presence of a reference sheet was reported to be useful for illustrating the different gesture types. Attendees recommended that images on the score sheet should also be accompanied by very brief written descriptions to explain the examples illustrated, for example, 'scissors', 'hello'. Attendees further requested the addition of a cover sheet to introduce the CGC, explain its aims, background, elements and structure, and recommendations for how to use it. Without this, participants did not always orientate themselves to all parts of the CGC before starting to use it, for example, some did not realize that there was a reference sheet at the back of the checklist.
The final version of the CGC comprises four pages: cover sheet, score sheet, additional questions and space to write comments/notes, and finally reference sheet. It is intended that users read all four pages before carrying out the assessment. They then use the score sheet while observing the conversation or video, making use of the reference sheet if necessary. The additional questions and comments are designed to be completed immediately after the observation. The CGC (see Appendix 2 in the supplemental data online) is also available at https://figshare.com/s/04a60389e5ba500a88da.

IRR
The IRR was calculated separately for experienced and novel users. The experienced users were the four researchers who each rated twenty 3-min video clips of people with aphasia in conversation. ICCs were calculated (using the model, type and definition reported in the Methods) and agreement between the experienced users was moderate (ICC = 0.681, 95% confidence interval (CI) = 0.617-0.740).
The novel users were 11 SLTs attending a workshop who had received no training in using the CGC. They used the CGC to rate a subset of the videos rated by the researchers, comprising ten 3-min video clips. ICCs were calculated and agreement between the novel users was also moderate (ICC = 0.513, 95% CI = 0.429-0.606).
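For readers less familiar with the statistic, the following is a minimal sketch of how an ICC can be computed from a table of gesture counts (one row per rated item, one column per rater). The ICC model, type and definition used in this study are those reported in the Methods and are not reproduced here; purely for illustration, the sketch assumes a two-way random-effects, absolute-agreement, single-rater ICC, often written ICC(2,1). The function name and the example counts are hypothetical.

```python
def icc_2_1(ratings):
    """ICC(2,1): two-way random effects, absolute agreement, single rater.

    `ratings` is a list of rows, one per rated item (e.g., a gesture
    category within a video clip); each row holds one count per rater.
    This ICC form is an assumption made for illustration only.
    """
    n = len(ratings)       # number of rated items
    k = len(ratings[0])    # number of raters
    grand = sum(sum(row) for row in ratings) / (n * k)
    row_means = [sum(row) / k for row in ratings]
    col_means = [sum(row[j] for row in ratings) / n for j in range(k)]

    # Two-way ANOVA sums of squares (no replication).
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for row in ratings for x in row)
    ss_error = ss_total - ss_rows - ss_cols

    ms_rows = ss_rows / (n - 1)                # between-item mean square
    ms_cols = ss_cols / (k - 1)                # between-rater mean square
    ms_error = ss_error / ((n - 1) * (k - 1))  # residual mean square

    return (ms_rows - ms_error) / (
        ms_rows + (k - 1) * ms_error + k * (ms_cols - ms_error) / n
    )


# Hypothetical tallies: five rated items, three raters.
example_counts = [
    [4, 5, 3],
    [0, 1, 0],
    [7, 6, 8],
    [2, 2, 1],
    [5, 3, 4],
]
print(round(icc_2_1(example_counts), 3))
```

In practice an established statistics package would normally be used, since it also returns the confidence interval; the sketch is intended only to show what the coefficient measures.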

Discussion
This study explored current clinical practices for gesture assessment and created an evidence-based gesture checklist for clinical use. Just over half (55%) of the clinicians we surveyed reported undertaking assessment of gesture in their current clinical practice. They reported limited or no use of formal tools to assess gesture production, highlighting the gap this study aims to fill. Our research synthesis resulted in a gesture checklist containing six categories of gesture form, which was refined and amended following user feedback. On trialling the gesture checklist, we found moderate reliability for both experienced and untrained users.
Despite the fact that several assessment batteries include a gesture subtest, the only standardized tool used for the assessment of gesture by respondents to our survey was the CAT (Swinburn et al. 2004). The cognitive battery of the CAT includes a 'Gesture Object Use' subtest that assesses gesture production. The person with aphasia is shown six photographs of everyday objects (e.g., comb, toothbrush) and is asked to imagine that the examiner has put each object in their hand. They are then asked: 'Show me what you would do with it.' The assessor rates the gesture on a scale of 0 to 2, with 2 indicating that the gesture was correct with no ambiguity, 1 indicating that the action or orientation was incorrect, or that a body part was used as an object (e.g., pretending that one's hand is a comb/toothbrush, as opposed to pretending to hold a comb/toothbrush). A score of 0 indicates that the gesture was considered incorrect. The CAT Gesture Object Use subtest has the advantage of being quick to administer and it gives an insight into a person's ability to produce pantomime gestures from picture stimuli. Its disadvantages include the subjective nature of the scoring due to the limited guidelines, and the limited information about intelligibility and how an item was conveyed (the form of the gesture). For example, a gesture using a finger to represent a toothbrush could be clearly intelligible, but would be marked down due to being classified as a 'body part as object' error. Furthermore, this subtest only assesses one type of gesture production and so does not give a holistic picture of a person's use of gesture. Indeed, people with aphasia may find transitive gestures involving action on an object more difficult than other types of gesture such as intransitive gestures (e.g., actions such as swimming or running) or emblems (Power and Code 2006). Finally, it gives no insight into a person's use of gesture in more naturalistic communicative contexts. Research also suggests there is a complex relationship between people with aphasia's ability to perform pantomime gestures from a picture stimulus/to command and their spontaneous use of gesture in more naturalistic situations such as conveying a narrative (e.g., Hogrefe et al. 2012).
In the present study, respondents were more likely to report using informal methods and clinical judgement, perhaps motivated by a desire to evaluate use of gesture in more naturalistic communicative contexts. However, the use of informal methods would still entail choosing an aspect of gesture to evaluate, leading the clinician to the same choices faced by researchers: choosing to evaluate gesture intelligibility (e.g., Marshall et al. 2012, Caute et al. 2013, Roper et al. 2016, Hogrefe et al. 2012) or how a gesture is conveyed (e.g., Kistner 2017, van Nispen et al. 2015, Hogrefe et al. 2012); and deciding between gesture function (e.g., Akhavan et al. 2018, van Nispen et al. 2017) and gesture form (e.g., Mol et al. 2013, Sekine et al. 2013).
The 11 clinicians who completed our survey all reported informal methods of evaluating gesture such as gesturing the content of a picture or demonstrating object use. They described the targets rather than how they evaluated the gesture. As argued above, clinicians need to include gesture in their holistic appraisal of their client's communication skills. Research shows that people with aphasia rely on gesture more than healthy controls to get their message across (e.g., Akhavan et al. 2018, van Nispen et al. 2017); and that they use a more limited range of gesture forms than healthy controls (e.g., Cocks et al. 2011, 2013, Mol et al. 2013, van Nispen et al. 2015). For an effective appraisal of a client's gestural abilities, clinicians will therefore need to be able to evaluate both gesture intelligibility and the range of forms a client can produce.
The aim of this study was to create a usable checklist of gesture forms. We synthesized the range of frameworks used for coding gesture identified in the evidence base. Our aim was to identify key, common features in order to amalgamate the varied frameworks into a single checklist for clinical use. As a result of merging categories that were similar, and feedback from clinicians in the co-design workshops, the resulting checklist contains six categories describing gesture form: pointing, emblems, iconic, number, air-writing and 'other', with the categories pointing and iconic each having two subcategories. Following discussion with clinicians about the gesture they have seen in their own clients, as well as the gesture videos we provided, the pointing category is subdivided into concrete versus abstract pointing. This coding distinction parallels one made by Sekine et al. (2013) between 'referential' (aligning with the term 'abstract deictic' in our checklist) and 'concrete deictic' gestures. In their study comparing the spontaneous co-speech gestures of people with aphasia and healthy controls, they found that whilst both groups used 'referential' gestures, only the PWA group used 'concrete deictic' gestures. This suggests that this distinction is useful clinically, thereby justifying its inclusion in our checklist.
Similarly, in our checklist, iconics are subdivided into pretending versus shape/outlining. The clinicians in our co-design workshops did not see any clinical utility in distinguishing iconic gestures that depicted 'handling', 'enacting' and 'object'. Interestingly, Mol et al. (2013) did make these coding distinctions but found that they were of more use in describing the gestures of healthy controls than those of PWA. In contrast, the clinicians in our co-design groups did feel it would be clinically useful to distinguish iconic pretending from shape/outlining gestures. This parallels the evidence base in which PWA have been found to use more shape/outlining iconic gestures than healthy controls (Cocks et al. 2011, 2013, Mol et al. 2013).
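The category structure described above can be represented as a simple tally. The sketch below is illustrative only: the label strings and the function name are hypothetical and are not taken from the CGC score sheet, which is a paper form. It shows how a frequency count per category, and each category's share of all gestures observed, might be derived from a list of observations.

```python
from collections import Counter

# Six form categories, with pointing and iconic each split into two
# subcategories as described in the text. Label strings are illustrative.
CATEGORIES = (
    "pointing: concrete",
    "pointing: abstract",
    "emblem",
    "iconic: pretending",
    "iconic: shape/outlining",
    "number",
    "air-writing",
    "other",
)


def gesture_profile(observed):
    """Return {category: (count, proportion)} for a list of observed labels."""
    unknown = set(observed) - set(CATEGORIES)
    if unknown:
        raise ValueError(f"unrecognized categories: {sorted(unknown)}")
    tally = Counter(observed)
    total = sum(tally.values())
    return {
        cat: (tally[cat], tally[cat] / total if total else 0.0)
        for cat in CATEGORIES
    }


# Hypothetical observations from one short clip.
clip = ["emblem", "pointing: concrete", "emblem", "iconic: pretending"]
for category, (count, share) in gesture_profile(clip).items():
    print(f"{category:26s} {count:2d}  {share:.0%}")
```

Keeping every category in the output, including those with a count of zero, reflects the aim of profiling the range of gesture forms a client does and does not use.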
In line with the suggestions of Kong et al. (2015), we made an a-priori decision to create a checklist for the form of gesture only, and not also include codes for gesture function. The driver for this decision was practical. It was thought that it would not be possible to analyse gesture function in real time, as this would require an analysis of that gesture's function in a whole utterance. However, there are other good reasons for not combining form and function in a single checklist: doing so can make analyses difficult to compare and may create confusion when it comes to interpreting gesture use. Many commonly used gesture-coding frameworks do cover both, including frameworks used in the aphasia literature (Sekine et al. 2013) and frameworks used more widely (McNeill 1992). When they do, however, the functions included tend to be fairly limited (i.e., beats or batons which reflect features of intonation or help to pace the language). The wider range of functions of gesture is more usually evaluated using different frameworks. For example, van Nispen et al. (2017) used three functional categories (essential, similar and additional), and Akhavan et al. (2018) used several functional categories of gesture (matching, complementary, compensatory, social cueing and facilitating). Whilst not necessarily needing to code at this level of detail, we assumed that clinicians would want to be able to assess a client's ability to use gesture for more than one function. For this reason, we included some additional questions on the checklist about whether gesture was effective and also how it was used. Specifically, we asked whether a client used gesture in place of speech (i.e., was it essential, compensatory or facilitatory) or alongside speech (i.e., was it similar, additional, complementary or matching). The clinicians in our co-design sessions agreed that these additional questions helped provide a more holistic evaluation of a client's gesture use. They also indicated that the additional questions had further clinical use, such as aiding decisions about treatment goals.
The workshops were therefore successful in allowing us to iteratively refine the coding categories; to reach a consensus on the need for additional questions; and finally, to gather feedback on presentational features (instructions, layout, terminology and use of images). The workshops also provided us with an opportunity to measure the IRR of the checklist with two user groups: experienced gesture researchers and new users with no training. We found that reliability was moderate for both experienced users and new users. While new users were at the lower end of the moderate range, experienced users were at the higher end (Koo and Li 2016). Indeed, in other guidelines for interpreting ICCs, the experienced users' reliability would have been classed as good (Cicchetti and Sparrow 1981). This suggests that the CGC can be used by clinicians with no training, but that reliability could be improved with practice.
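The two sets of interpretive guidelines cited above can be compared directly. The sketch below encodes the commonly quoted cut-offs for each (Koo and Li 2016: below 0.50 poor, 0.50 to 0.75 moderate, 0.75 to 0.90 good, above 0.90 excellent; Cicchetti and Sparrow 1981: below 0.40 poor, 0.40 to 0.59 fair, 0.60 to 0.74 good, 0.75 and above excellent). The function names are hypothetical, and the treatment of values that fall exactly on a boundary is an assumption.

```python
def koo_li_2016(icc):
    """Label an ICC point estimate using the Koo and Li (2016) bands."""
    if icc < 0.50:
        return "poor"
    if icc < 0.75:
        return "moderate"
    if icc < 0.90:
        return "good"
    return "excellent"


def cicchetti_sparrow_1981(icc):
    """Label an ICC point estimate using the Cicchetti and Sparrow (1981) bands."""
    if icc < 0.40:
        return "poor"
    if icc < 0.60:
        return "fair"
    if icc < 0.75:
        return "good"
    return "excellent"


# Point estimates reported in the IRR results.
for group, icc in [("experienced users", 0.681), ("new users", 0.513)]:
    print(group, icc, koo_li_2016(icc), cicchetti_sparrow_1981(icc))
# prints: experienced users 0.681 moderate good
# prints: new users 0.513 moderate fair
```

The sketch labels point estimates only; a fuller interpretation would also consider where the bounds of the 95% confidence interval fall.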
Limitations of the study relate to the synthesis of the literature, co-design process, evaluation of IRR and challenges inherent in carrying out a frequency count of gesture types.
The choice of articles to include in the synthesis was not carried out systematically, but rather was based on purposive sampling, selecting a range of articles available at that time.
There was a relatively small number of participants in the co-design process. It should also be noted that workshop attendees were clinicians with an interest in attending an aphasia gesture workshop and so cannot be taken as representative of the wider population of SLTs. Ideally the clinicians would have been involved in the co-design process from the outset in order to ensure that the checklist met the needs of end-users.
Limitations relating to the evaluation of reliability were the small number of videos used for coding and the small number of participants in the two groups of experienced and novice raters. One challenge of using the CGC was the difficulty of deciding where one gesture began and ended. Clinicians attending the co-design workshop did not raise this as an obstacle to using the CGC. However, this could have been a source of inconsistency between raters because it introduces a subjective aspect to conducting a frequency count of gesture types. Identifying the beginning and end of gestures may have been harder for some types (e.g., beats) than others (e.g., iconic/shape gestures). The CGC aims to give an indication of how frequently the different gesture types are used, even if this is an approximate rather than an exact figure. The emphasis is on providing a rapid profile of types of gesture used and their relative frequency.
Future research is needed to further evaluate the psychometric properties of the CGC, exploring IRR in more detail, but also evaluating intra-rater reliability. Its sensitivity to change and potential to be used as an outcome measure should also be explored. Further studies could investigate how to improve IRR, for example, by developing training, and enhancing the reference materials and examples provided. It would also be informative to analyse IRR separately for the identification of gesture from people with different types of aphasia (e.g., fluent versus non-fluent) or levels of severity, as well as exploring sources of disagreement among raters, for example, whether certain gesture types are more challenging to code. Finally, further research is needed to explore the feasibility of using the CGC in clinical practice, for example, whether it can be used to analyse a live conversation rather than a video.
In conclusion, the CGC is a novel method of assessing different types of gesture form produced by people with aphasia. It can be used in real time to evaluate gesture use in conversation without the need for coding. It was developed following a co-design process involving clinicians with an interest in gesture and aphasia. Findings suggest that it can be used by novel users who have not received training in its use, with a moderate level of IRR. Therefore, it offers clinicians a novel means to assess gesture, enabling the exploration of how an individual uses gesture to communicate.