Development of a measure of genome sequencing knowledge for young people: The kids‐KOGS

Abstract Genome sequencing (GS) is increasingly being used to diagnose rare diseases in paediatric patients; however, no measures exist to evaluate their knowledge of this technology. We aimed to develop a robust measure of knowledge of GS (the "kids‐KOGS") suitable for use in the paediatric setting as well as for general public education. The target age was 11 to 15 year olds. An iterative process involving six sequential stages was conducted to develop a set of draft true/false items. These were then administered to 539 target‐age school pupils (mean age 12.8 years; SD ± 1.3) from the United Kingdom. Item‐response theory was used to confirm the psychometric suitability of the candidate items. None of the items was identified as a misfit. All 10 items performed well under the two‐parameter logistic model. The internal consistency of the test was 0.84 (Cronbach alpha value), indicating excellent reliability. The mean kids‐KOGS score in the sample overall was 4.24 (SD 2.49), where 0 = low knowledge and 10 = high knowledge. Age was positively associated with score in a multivariate linear regression. The kids‐KOGS is a short and reliable tool that can be used by researchers and healthcare professionals offering GS to paediatric patients. Further validation in a clinical setting is required.


| INTRODUCTION
In 2019, genome sequencing (GS) will become part of the NHS England commissioned national genomic medicine service for rare disease and cancer, facilitating systematic access to genomic testing across the country. 1 This service follows from the 100 000 Genomes Project, the largest national sequencing project of its kind in the world, which is delivering research on how best to use genomics in healthcare and interpret data to help patients. 2 Children and young people are significant beneficiaries of GS technology: of the rare disease proband participants in the 100 000 Genomes Project, around a quarter were 15 years of age or under at the time of taking part (data accessed from the Genomics England Research Environment, 11th November, 2018). GS has been shown to improve diagnostic yield over targeted gene sequencing in the paediatric setting: the reported diagnostic yield in previously unsolved paediatric cases is already around 40% and will likely continue to increase as knowledge grows. 3 This offers an end to the "diagnostic odyssey" for many children (and their parents) with rare diseases. 4 Other benefits include enabling targeted therapy for some, reproductive planning and opportunities to make contact with other families whose children have similar conditions. 5 A key challenge in implementing GS in clinical practice is how to counsel patients so that they can make an informed choice, defined as a decision "that is based on relevant knowledge, consistent with the decision-maker's values and behaviourally implemented". 6 With regard to knowledge, in the context of GS this will require health professionals to explain to patients (including parents and their children) what GS is, the possible genomic results that may be revealed and the limitations and uncertainties. 7 In 2018, our group developed the knowledge of GS (KOGS) measure to address the need for a valid, reliable measure that can be used with individuals in a range of settings. 8 The measure is "context-neutral," that is, the items can be administered to patients or other stakeholders (eg, healthcare providers, students, general public) regardless of the clinical or other context. This measure was developed with and administered to adults aged 18 years and over.
Engaging young people as active participants in the decision-making process is increasingly seen as good clinical practice 9 and the few qualitative studies about GS that have been conducted with this age group suggest that they want to be engaged in that process. [10][11][12] In the 100 000 Genomes Project, 11 to 15 year olds invited to take part in the study were given tailored participant information sheets, encouraged to be active participants in the decision-making process and, if they wanted to take part, asked to sign an "assent" form, in addition to their parents ultimately consenting on their behalf. 13 (Patients aged 16 years and upwards were, in contrast, considered adults and consented on their own behalf.) Currently, no knowledge measure exists that is specifically aimed at young people who may be involved in decision-making about GS. Given that they make up a significant pro-

| Target age groups
The measure was developed for a target age group of 11 to 15 year olds.

| Selection of knowledge domains
During the development of the adult KOGS, 8

| Item development
A set of 10 true/false knowledge items (Table 1) was developed over six sequential phases to cover each of the three domains.
The first phase involved speaking directly with young people aged 11 to 15 years to identify the key questions they would want answered if they were considering having GS. To do this, CL con- […] preparation) and CL and SS conducted two school visits in London (one primary school and one secondary school). To ensure that the pupils had some background understanding of genomics, they were first shown a short section of an educational video. 14 Ten broad questions were identified in this phase which covered the three domains, including "What is a genome?", "How do you do genome sequencing?", "How accurate are the results from genome sequencing?" and "Will you always get an answer from genome sequencing?" (see Table S1).
TABLE 1, item 10: If someone with a health problem has genome sequencing, they will always find helpful information about the cause of the problem
The second phase was to map items from an early 17-item draft of the (adult) KOGS measure 8 (including both true and false items) onto the questions that had been identified by young people for potential inclusion in the kids-KOGS (see Table 1). After this exercise, three questions remained that had not been addressed using the draft 17-item (adult) KOGS: "What is DNA?", "How does our genome affect our health?" and "How similar is our genome to other people's?" Three items were developed specifically to address these questions.
The third phase was to develop a true and false version of each of the items. These were either taken from the previously developed true and false items from the 17-item draft KOGS or developed specifically for the kids-KOGS by CL and SS. This resulted in 20 items (10 true and 10 false) (see Table S2). In the fourth phase, we randomly chose five numbers between 1 and 10 using a random number generator, and the true versions of these items were selected, resulting in 5 true and 5 false items. In the fifth phase, cognitive interviews were conducted with two young people taking part in the 100 000 Genomes Project (who had taken part in the qualitative interviews), two science teachers and one adult parent to provide feedback on wording and comprehension.
Minor changes to wording were made at this stage. In the final sixth phase, the 10 items were then administered to a group of 83 pupils at a school in the East of England aged 11 to 12 and feedback was sought on the wording and comprehension. At this stage, one of the false items was swapped for a true item as it was considered by pupils to be ambiguous ("whole genome sequencing is done through an X-ray") leaving 6 true and 4 false items in the knowledge scale.
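The random allocation described in the fourth phase can be sketched as follows. This is a minimal illustration in Python, not the authors' procedure: the text states only that a random number generator was used, so the function name, the seed and the use of `random.sample` are assumptions.

```python
import random

def choose_true_versions(n_items=10, n_true=5, seed=None):
    """Pick which item numbers (1..n_items) keep their true wording.

    Hypothetical helper: the paper reports only that five numbers between
    1 and 10 were drawn with a random number generator.
    """
    rng = random.Random(seed)
    # Sample without replacement so that no item number is drawn twice.
    true_items = sorted(rng.sample(range(1, n_items + 1), n_true))
    false_items = [i for i in range(1, n_items + 1) if i not in true_items]
    return true_items, false_items

true_items, false_items = choose_true_versions(seed=2018)  # arbitrary seed
print("True versions kept for items:", true_items)
print("False versions kept for items:", false_items)
```

Sampling without replacement guarantees exactly five distinct items in each group, which matches the 5 true and 5 false items reported before the later swap.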

| Questionnaire administration
The 10-item knowledge scale (Table 1) […] Programming Environment. 15 Both parallel and scree plot analysis were used to identify the number of factors in the data.

| Psychometric analyses
We used item response theory (IRT) to analyse the psychometric properties of the 10-item scale. IRT is widely used to evaluate the relationship between the test takers' ability (in this case, knowledge of genome sequencing) and their responses to the individual questions assessing that ability. 16 IRT is increasingly favoured as a psychometric analysis tool as it offers deeper insights into the way in which questionnaires function. Under IRT paradigms, further investigation into psychometric performance is possible using sophisticated methods (eg, differential item functioning, local dependency, person and item fit). We employed the two-parameter logistic (2PL) IRT model, which is often applied to scales with dichotomous (eg, yes/no or true/false) responses, for item analyses, including item difficulty and discrimination parameters. 17 We assessed the quality of the nascent scale by evaluating the fit of the data to the model at the item, person, and whole scale levels.
Additionally, we ensured that the data did not violate the assumptions of the model, namely: unidimensionality (assessed using the factor analysis described above), local independence of items, and the absence of differential item functioning. Further details of these analyses are provided in Supporting Information: Methods section.
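For readers unfamiliar with the 2PL model, the sketch below shows its item characteristic function in the common difficulty/discrimination form, P(correct | θ) = 1 / (1 + exp(−a(θ − b))), where a is the item's discrimination and b its difficulty. It is an illustration in Python rather than the software used for the analyses. The parameter values are hypothetical, and some software uses an equivalent slope/intercept form or an additional scaling constant.

```python
import math

def p_correct(theta, a, b):
    """2PL probability that a respondent with ability theta answers correctly.

    a: discrimination (steepness of the curve), b: difficulty (location).
    """
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

# Hypothetical items for illustration only (not estimates from this study).
easy_item = {"a": 1.5, "b": -1.0}
hard_item = {"a": 1.5, "b": 1.0}

for theta in (-2.0, 0.0, 2.0):
    print(
        f"theta={theta:+.1f}",
        f"easy={p_correct(theta, **easy_item):.3f}",
        f"hard={p_correct(theta, **hard_item):.3f}",
    )
```

When θ equals b the probability is exactly 0.5, so a larger b corresponds to a harder item. A larger a makes the curve steeper around b, meaning the item separates respondents just below and just above that ability level more sharply.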
Participants were split into two groups based on the median age

| Statistical analyses
The proportion of "correct" responses for each of the 10 items was described. Bivariate analysis was conducted to examine whether the total kids-KOGS score differed according to age, sex and school. We used a t test to examine the association between the kids-KOGS score and sex, a Pearson's correlation for age and an analysis of variance (ANOVA) for school. A multivariate linear regression was conducted to explore the independent associations between the dependent variable (total kids-KOGS score) and the three independent variables (age, sex and school). All tests were two-tailed and the significance level was set at P < .05. Statistical analyses were conducted using SPSS v22.
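The three bivariate tests can be illustrated through their test statistics. The analyses in this study were run in SPSS; the Python sketch below is only an illustration on made-up scores, with hypothetical function names, and it stops at the test statistics (P values would additionally require the t and F distributions).

```python
import math
from statistics import mean, variance

def pooled_t(a, b):
    """Student's t statistic for two independent groups (equal variances assumed)."""
    na, nb = len(a), len(b)
    sp2 = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / (na + nb - 2)
    return (mean(a) - mean(b)) / math.sqrt(sp2 * (1 / na + 1 / nb))

def pearson_r(x, y):
    """Pearson's correlation coefficient between two equal-length sequences."""
    mx, my = mean(x), mean(y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    syy = sum((yi - my) ** 2 for yi in y)
    return sxy / math.sqrt(sxx * syy)

def one_way_f(groups):
    """F statistic from a one-way ANOVA across two or more groups."""
    values = [v for g in groups for v in g]
    grand = mean(values)
    ss_between = sum(len(g) * (mean(g) - grand) ** 2 for g in groups)
    ss_within = sum((v - mean(g)) ** 2 for g in groups for v in g)
    return (ss_between / (len(groups) - 1)) / (ss_within / (len(values) - len(groups)))

# Made-up data: total scores (0 to 10) for a handful of imaginary pupils.
scores_girls, scores_boys = [5, 6, 4, 7], [3, 5, 4, 4]
ages, scores = [11, 12, 13, 14, 15], [2, 4, 3, 6, 7]
scores_by_school = [[3, 4, 5], [4, 6, 5], [6, 7, 8]]

print("t (sex):", round(pooled_t(scores_girls, scores_boys), 2))
print("r (age):", round(pearson_r(ages, scores), 2))
print("F (school):", round(one_way_f(scores_by_school), 2))
```

With only two groups the one-way ANOVA and the pooled t test are equivalent (F equals t squared), which is why the t test suffices for sex while school, with more than two groups, needs the ANOVA.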

| Sample characteristics
In total, 554 participants were offered and attempted the questionnaire (

| Psychometrics
Scree-plot analysis confirmed that the scale assesses a single underlying construct (ie, knowledge of genome sequencing; see Figure 1). The eigenvalue of the first factor was 4.97, with no other factor greater than 1. Monte-Carlo analysis suggested that some mild multidimensionality was present, but the additional factors did not account for much variance and may have been spurious, given the marginality of the result and known issues with this procedure. [20][21][22] Factor loadings were all greater than 0.30, ranging between 0.35 and 0.66 (see Table S3). It was, therefore, deemed appropriate to treat the scale as a unidimensional measure and continue with IRT analyses on the 10 items. We fitted data from 539 respondents to the 2PL IRT model. Twenty-nine participants were removed from further analysis because of aberrant response patterns (eg, correct responses to hard questions and incorrect responses to easy questions; |ZH| > 2). We refitted the 2PL model using data from the remaining 510 participants, and both items and persons fitted the model (see Figure 2A and Table 3). Local dependency was not evident for any items. The item difficulty and discrimination parameter estimates for the final 10 items are shown in Table 4. Item 1 is the easiest item, with a difficulty (θ) estimate of −1.66, while item 4 is the hardest, with a difficulty (θ) estimate of 1.65. All items have acceptable discrimination values, ranging from 1.13 (item 2) to 2.51 (item 3). The item characteristic curves for each item are displayed in Figure 2B. No items were flagged as displaying DIF for either age or sex (Table S4A,B), indicating that respondents with the same underlying ability from different sub-sample groups (sex and age) did not have a different probability of endorsing the same response.
With reference to the model fit, the M2 statistic was 73.01

| Descriptive analyses
As shown in Table 5, the item that was most frequently answered correctly was "Our DNA is inside our cells" (true, 83.1% correct) followed by "Our DNA doesn't have an effect on how our body works" (false, 69.2% correct). The items that were least often answered correctly were "Around 1% of our genome is the same as other people's" (false, 14.1% correct) and "Our complete set of DNA is called our genome" (true, 21.9% correct).
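The proportion of correct responses per item can be sketched as below. The answer key uses only the four items whose wording and correct answer are quoted above; the short item labels, the response data and the function name are made up for illustration, and counting a missing answer as incorrect is an assumption of this sketch.

```python
# Answer key for the four items quoted in the text (True = statement is true).
# The short labels are hypothetical; the item wording is given in the comments.
ANSWER_KEY = {
    "dna_in_cells": True,       # "Our DNA is inside our cells"
    "dna_no_effect": False,     # "Our DNA doesn't have an effect on how our body works"
    "one_percent_same": False,  # "Around 1% of our genome is the same as other people's"
    "genome_definition": True,  # "Our complete set of DNA is called our genome"
}

def proportion_correct(responses, key=ANSWER_KEY):
    """Return {item: share of respondents whose answer matches the key}.

    `responses` is a list of dicts mapping item label to True, False or None.
    None (or an absent label) stands for a missing answer and is counted as
    incorrect here, which is an assumption of this sketch.
    """
    return {
        item: sum(r.get(item) == correct for r in responses) / len(responses)
        for item, correct in key.items()
    }

# Made-up responses from two imaginary pupils.
made_up_responses = [
    {"dna_in_cells": True, "dna_no_effect": False,
     "one_percent_same": True, "genome_definition": None},
    {"dna_in_cells": True, "dna_no_effect": True,
     "one_percent_same": True, "genome_definition": True},
]

proportions = proportion_correct(made_up_responses)
for item, share in proportions.items():
    print(f"{item}: {share:.0%} correct")
```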

| Statistical analyses
The mean (SD) kids-KOGS score in the sample overall was 4.24 (2.49), where 0 = low knowledge and 10 = high knowledge. There were differences by sex, age, and school in bivariate analyses. The mean kids-

| DISCUSSION
This is the first measure of KOGS that has been developed specifically for young people. The strengths of the methodological approach are that (a) the questions were developed with young people (including those with rare diseases) to ensure they addressed aspects of GS they thought were important, (b) feedback on wording was sought with a range of stakeholders including young people at multiple stages, (c) analysis of dimensionality and item properties was conducted using a rigorous psychometric measure development approach, and (d) the measure can be used in a range of settings including with paediatric patients in clinic as well as with young people in schools.
As GS becomes mainstreamed into clinical care to diagnose young people with rare diseases, it will be increasingly important to assess their understanding of genetics and genomics as well as the limitations and uncertainties of the technology. This is particularly […] which may lead to feelings of frustration and disappointment. 24,25 Studies have also shown that age may be a factor in young people's understanding of genetics, 26 as may a family history of a genetic condition. 27 Using this measure, health professionals may be able to identify those young people who might have limited knowledge or misunderstandings about GS and who may therefore require more in-depth counselling or information provision.
We found age to be significantly associated with knowledge, with older pupils scoring higher on the kids-KOGS than younger pupils. This finding differs from that of Sabatello et al., 28 who did not find any differences in objective genomic knowledge between 14 and 17 year olds. However, their study included a more limited age range and a different set of questions. Our findings might reflect the National Curriculum in England, where concepts such as genetics and DNA are only formally introduced into science lessons at Key Stage 3, which is during the first 3 years of secondary school (ages 11-14), and genomics at General Certificate of Secondary Education (GCSE) level (ages 15-16). 29 The term "genome" is also a more recent concept and may therefore be less well understood within the public sphere. A recent report on genome editing found that there was confusion about the term "genome" even among patients and families affected by rare diseases who might be considered likely to encounter the term. 30 Introduction to the concept of DNA from the age of 11 might also explain why the two questions most frequently answered correctly related to the location and function of DNA rather than to genomics, which is only formally introduced at school at GCSE level.
The main limitation of this study is that the sample was not evenly balanced between male and female participants; however, sex was not found to be significant in the linear regression. We have also not yet had the opportunity to use the kids-KOGS in a clinic setting with young people. Another limitation is that a nested structure may exist because the scale was administered to students from different schools, indicating possible dependencies in the data structure.
Hence, future research is warranted to investigate the possibility of multilevel dependencies between schools in greater detail. In

CONFLICT OF INTEREST
Nothing to declare.
TABLE 5 Proportion of correct responses to each of the 10 knowledge of genome sequencing measure for young people (kids-KOGS) items

DATA ACCESSIBILITY
An SPSS file of the total dataset is available on request to the corresponding author.