Patient‐reported outcomes in childhood head and neck rhabdomyosarcoma survivors and their relation to physician‐graded adverse events—A multicenter study using the FACE‐Q Craniofacial module

Abstract
Introduction: Adverse events (AEs) of treatment are prevalent and diverse in head and neck rhabdomyosarcoma (HNRMS) survivors. These AEs are often reported by physicians; however, patients' perceptions of specific AEs are not well known. In this study, we explored patient‐reported outcomes measuring appearance, health‐related quality of life (HRQOL), and facial function in HNRMS survivors. Second, we assessed the relationship between physician grading of AEs and patient reporting.
Materials and Methods: Survivors of pediatric HNRMS, diagnosed between 1993 and 2017, who were at least 2 years after completing treatment were invited to an outpatient clinic as part of a multicenter cross‐sectional cohort study. At the outpatient clinics, survivors aged ≥8 years completed the FACE‐Q Craniofacial module, a patient‐reported outcome instrument measuring issues specific to patients with facial differences. AEs were systematically assessed by a multidisciplinary team based on the Common Terminology Criteria for Adverse Events system.
Results: Seventy‐seven survivors with a median age of 16 years (range 8–43) and a median follow‐up of 10 years (range 2–42) completed the questionnaire and were screened for AEs. Patient‐reported outcomes varied widely between survivors. Many survivors reported negative consequences: 82% on appearance items, 81% on HRQOL items, and 38% on facial function items. There was a weak correlation between physician‐scored AEs and the majority of patient‐reported outcomes specific to those AEs.
Conclusions: Physician‐graded AEs are not sufficient to provide tailored care for HNRMS survivors. Findings from this study highlight the importance of incorporating patient‐reported outcome measures in survivorship follow‐up.


| INTRODUCTION
Rhabdomyosarcoma (RMS) accounts for around 4% of all childhood cancers and originates in the head and neck (HN) area in 40% of patients. 1 Survival has increased significantly since the introduction of multimodality therapy, including local treatment with radiotherapy and, in some cases, surgery. However, both radiotherapy and surgery damage healthy tissues. This damage can cause a wide range of adverse events (AEs) in survivors, including visible facial differences, ocular impairment, hearing impairment, speech abnormalities, and endocrinopathies. [2][3][4][5][6][7] As more patients become long-term survivors, AEs are an increasingly important topic. The Common Terminology Criteria for Adverse Events (CTCAE) 8 is a clinical grading system used to report AEs. 9 However, the relation between the grade of AEs and patients' perception of those AEs is inconsistent in adult studies [10][11][12] and is not well described for children and adolescents. A better understanding of patients' perceptions could improve the quality of care for survivors.
Our group 13 has previously reported on the psychosocial well-being of a partially overlapping cohort of 65 childhood HNRMS survivors. That study showed that the health-related quality of life (HRQOL) of survivors was comparable to general population norms in most psychosocial domains. However, survivors reported disease-specific issues such as negative self-image and dissatisfaction with appearance. To further characterize these issues, condition-specific patient-reported outcome (PRO) instruments can be used. It was previously shown that the majority of available PRO instruments for children and youth with craniofacial conditions contain few appearance and facial function items and lack content validity. 14 To address this limitation, the FACE-Q Craniofacial module was developed. 15 This PRO instrument is composed of a comprehensive set of independently functioning scales that are applicable to a wide range of conditions associated with facial differences, including childhood cancer. The scales measure outcomes related to appearance, HRQOL, and facial function.
The aim of the present study was to explore specific PROs for appearance, HRQOL, and facial function within a cohort of pediatric HNRMS survivors, using relevant scales from the FACE-Q Craniofacial module. We explored differences between survivors in terms of gender, age at diagnosis, attained age, follow-up period, tumor site, laterality, and local treatment strategy. Second, we assessed relationships between physicians' grading of AEs and specific PROs.

| Setting
Survivors were recruited from five international centers: Great Ormond Street Hospital, London, United Kingdom; University of Florida Health Proton Therapy Institute, Florida, United States; Institute Gustave Roussy, Paris, France; Emma Children's Hospital, Amsterdam, which later transferred all pediatric care to the Princess Máxima Center for pediatric oncology, Utrecht, The Netherlands. Survivors of pediatric (0-18 years) HNRMS, diagnosed between 1993 and 2017, who were ≥2 years after completion of treatment were eligible. All survivors were treated with multiagent chemotherapy and local treatment. 1,16,17 Four local treatment strategies were available during the study period: definitive external beam radiation with photons (RT); definitive external beam radiation with protons (PT); microscopically (R0) radical surgery combined with RT or PT (the Paris-method); and macroscopically radical surgery combined with brachytherapy (AMORE). 18 Data on AEs were collected during standardized multidisciplinary outpatient clinics held between January 2017 and December 2019. Survivors aged ≥8 years were also invited to complete the FACE-Q Craniofacial scales before the clinic visit; the questionnaires were sent by mail or handed out on arrival at the outpatient clinic. Oral or written informed consent was obtained according to national and local standards. In the United Kingdom and United States, this study was approved by the national and local ethics committees, and written consent was obtained from all participants. In the Netherlands and in France, this study was exempted from ethical approval because it fell under regular healthcare practice.

| Patient-reported outcomes
We used 11 of the FACE-Q Craniofacial module scales, 15 which were developed as part of the CLEFT-Q 15 and field-tested in a large sample of noncleft craniofacial patients. 19,20 Each scale contains 7-12 items answered on a 1-4 Likert scale. The instrument assesses concepts from three domains: appearance (face, nose, teeth, lips, and jaw), HRQOL (psychological, social, and school function, and speech distress), and facial function (speech function and eating & drinking). The appearance scales ask how much respondents like their current appearance. The HRQOL and facial function scales ask respondents how often or how much a set of statements applied to them in the previous week. Participants completed only relevant scales (e.g., jaws, for participants aged ≥12 years; school, for participants aged ≤18 years and attending school). The eating & drinking scale was used only as an item checklist. 21 For all other scales, the sum score of items was converted to a Rasch-transformed score 22 from 0 to 100, with lower scores reflecting worse outcome. Internal consistency of the scales was good, 23 with Cronbach's alpha between 0.83 and 0.97 in our cohort. If <50% of a scale's items were missing, the mean of the completed items was used for the missing items; otherwise, the scale score was excluded for that survivor.
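The missing-data rule above can be sketched as follows. This is a minimal illustration that assumes responses are coded 1-4 with `None` for a missing item, and that each missing item is filled with the respondent's mean of the completed items before summing. The helper name `prorated_raw_sum` is hypothetical. The subsequent Rasch conversion to a 0-100 score is assumed to use the instrument's published conversion tables, which are not reproduced here.

```python
from typing import Optional, Sequence


def prorated_raw_sum(responses: Sequence[Optional[int]]) -> Optional[float]:
    """Raw sum score for one scale with person-mean imputation (hypothetical helper).

    `responses` holds one entry per scale item: an integer 1-4 for a completed
    item or None for a missing one. Returns None when half or more of the
    items are missing, mirroring the exclusion rule described in the text.
    """
    n_items = len(responses)
    if n_items == 0:
        raise ValueError("a scale must have at least one item")
    completed = [r for r in responses if r is not None]
    for r in completed:
        if not 1 <= r <= 4:
            raise ValueError(f"response {r} is outside the 1-4 Likert range")
    n_missing = n_items - len(completed)
    # A score is only computed when fewer than 50% of items are missing.
    if n_missing / n_items >= 0.5:
        return None
    person_mean = sum(completed) / len(completed)
    # Each missing item is filled with the mean of the completed items.
    return sum(completed) + person_mean * n_missing
```

For a 7-item scale with one missing answer, `prorated_raw_sum([4, 3, None, 2, 4, 3, 3])` adds the mean of the six completed items (19/6) to their sum (19). The resulting raw sum would then be looked up in the conversion table.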

| AE assessment
A predefined list of AEs, graded according to CTCAE 4.0, is provided in Supplemental Data A. We assessed musculoskeletal deformity, short stature (< −2 SD), speech abnormalities, oral malfunction (trismus, xerostomia, taste alterations), hearing impairment, ocular impairment, and facial nerve paresis. AEs were dichotomized into grade <2 versus grade ≥2 to reflect the absence or presence of a clinically relevant problem (i.e., being symptomatic, requiring alterations in activities of daily living, and/or needing an intervention or medication) (Supplemental Data A).
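The dichotomization can be expressed as a simple threshold on the CTCAE grade. The sketch below assumes that each survivor's grades are stored per AE as integers from 0 (absent) to 5; the function names are hypothetical. The text does not state whether the "number of different AEs" used in the analysis counts only grade ≥2 AEs; `count_clinically_relevant` assumes that it does.

```python
from typing import Dict, Mapping

# Threshold from the text: grade >= 2 counts as a clinically relevant AE.
CLINICALLY_RELEVANT_GRADE = 2


def dichotomize_ae_grades(grades: Mapping[str, int]) -> Dict[str, bool]:
    """Map each AE's CTCAE grade (0-5) to True if clinically relevant (>= 2)."""
    result: Dict[str, bool] = {}
    for ae, grade in grades.items():
        if not 0 <= grade <= 5:
            raise ValueError(f"{ae}: CTCAE grade {grade} is outside 0-5")
        result[ae] = grade >= CLINICALLY_RELEVANT_GRADE
    return result


def count_clinically_relevant(grades: Mapping[str, int]) -> int:
    """Number of different AEs graded >= 2 for one survivor (assumed definition)."""
    return sum(dichotomize_ae_grades(grades).values())
```

For example, a survivor with grade 1 musculoskeletal deformity and grade 2 hearing impairment would be classified as having a clinically relevant hearing impairment but no clinically relevant deformity.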

| Statistical analysis
Data were analyzed with SPSS version 26.0. To explore PRO scores, means and standard deviations (SD) were calculated for each scale, for the whole cohort and for subgroups. Subgroups were based on gender, age at diagnosis, attained age, follow-up period, tumor site, laterality, and treatment strategy. Differences between subgroups were tested with one-way ANOVA and/or independent-samples t-tests. Differences between appearance scale scores within survivors were tested with paired (dependent) t-tests.
Effect sizes (Cohen's d) were calculated and interpreted as small (0.2), medium (0.5), or large (≥0.8). 24 Correlations between scale scores were calculated with the Pearson correlation coefficient (r) and interpreted as weak (0.1), medium (0.3), or strong (≥0.5). 24 For more detailed insight, we also performed item-level analyses. We calculated the percentage of survivors who reported negatively on items of the appearance scales (i.e., "not at all," "a little bit"), the psychological, social, and school scales (i.e., "never," "sometimes"), and the speech distress, speech function, and eating & drinking scales (i.e., "always," "often").
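The effect-size conventions and the item-level percentage can be sketched as follows. The text does not specify which variant of Cohen's d was computed; this sketch assumes the common pooled-standard-deviation form for two independent groups. The "negligible" label below the smallest cutoff is an added assumption, and all function names are hypothetical.

```python
import math
import statistics
from typing import AbstractSet, Iterable, Sequence


def cohens_d(group_a: Sequence[float], group_b: Sequence[float]) -> float:
    """Cohen's d for two independent groups, using the pooled standard deviation."""
    n_a, n_b = len(group_a), len(group_b)
    if n_a < 2 or n_b < 2:
        raise ValueError("each group needs at least two observations")
    # statistics.variance is the sample variance (denominator n - 1).
    var_a = statistics.variance(group_a)
    var_b = statistics.variance(group_b)
    pooled_sd = math.sqrt(((n_a - 1) * var_a + (n_b - 1) * var_b) / (n_a + n_b - 2))
    if pooled_sd == 0:
        raise ValueError("d is undefined when both groups have zero variance")
    return (statistics.mean(group_a) - statistics.mean(group_b)) / pooled_sd


def label_cohens_d(d: float) -> str:
    """Cutoffs 0.2 / 0.5 / 0.8 from the text; 'negligible' is an assumption."""
    size = abs(d)
    if size >= 0.8:
        return "large"
    if size >= 0.5:
        return "medium"
    if size >= 0.2:
        return "small"
    return "negligible"


def label_pearson_r(r: float) -> str:
    """Cutoffs 0.1 / 0.3 / 0.5 from the text; 'negligible' is an assumption."""
    size = abs(r)
    if size >= 0.5:
        return "strong"
    if size >= 0.3:
        return "medium"
    if size >= 0.1:
        return "weak"
    return "negligible"


def percent_any_negative(survivors: Iterable[Sequence[str]], negative: AbstractSet[str]) -> float:
    """Percentage of survivors with >= 1 item answered in a negative category."""
    answers = list(survivors)
    if not answers:
        raise ValueError("need at least one survivor")
    flagged = sum(any(answer in negative for answer in items) for items in answers)
    return 100 * flagged / len(answers)
```

For the appearance scales, `negative` would be `{"not at all", "a little bit"}`. For the psychological, social, and school scales it would be `{"never", "sometimes"}`, and for the speech and eating & drinking scales `{"always", "often"}`. A negative d here simply means the first group scored lower.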
To assess the relation between AE grading and PRO scores, we compared the mean scale scores of survivors with a clinically relevant AE to those of survivors without that AE, using independent-samples t-tests and Cohen's d. For the psychological and social scales, the relation with every AE was assessed. In addition, for each AE, the scales relevant to that AE were examined. The relation between the number of different AEs and the psychological and social scale scores was examined with Spearman's rho.
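Spearman's rho is the Pearson correlation computed on ranks. Tied values, which are common when counting AEs per survivor, receive the average of the ranks they span. The analysis itself was run in SPSS; the following is a generic, self-contained sketch of the statistic, and the function names are hypothetical. It computes only the coefficient, not the p-value that SPSS also reports.

```python
import math
from typing import List, Sequence


def average_ranks(values: Sequence[float]) -> List[float]:
    """Rank values 1..n (ascending), giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend `end` over a run of tied values.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        # Sorted positions start..end correspond to ranks start+1..end+1.
        mean_rank = (start + end) / 2 + 1
        for pos in range(start, end + 1):
            ranks[order[pos]] = mean_rank
        start = end + 1
    return ranks


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equal-length sequences."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined when a variable is constant")
    return sxy / math.sqrt(sxx * syy)


def spearman_rho(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman's rho: the Pearson correlation of the average ranks."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length sequences with at least 2 values")
    return pearson_r(average_ranks(x), average_ranks(y))
```

In this setting, `x` would hold the number of different AEs per survivor and `y` the corresponding psychological or social scale scores, one entry per survivor.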

| Survivors
Ninety-five survivors aged ≥8 years attended the clinics. Seventy-seven (81%) completed the questionnaire. The 18 nonparticipants were more often treated with the Paris-method than the participants (p = 0.004) (Table S1). Table 1 presents the survivors' demographic and clinical characteristics.

| Exploring patient-reported outcomes
The face, psychological, school, and social scales are presented in Table 2. Table S2 shows the scales concerning specific aspects of the face (nose, teeth, lips, jaw), and the speech distress and speech function scales. The prevalence of negative reporting at item level is presented in Table 3.

| Appearance
Scores on the face scale varied widely (range 7-100). The mean face score was significantly higher for survivors aged 8-12 years than for survivors aged 13-17 years (d 0.6). The mean lips score was significantly higher for survivors aged 8-12 years than for older survivors (13-17 years, d 0.7; ≥18 years, d 0.8). Mean lips and jaw scores were significantly higher for survivors with orbital tumors than for those with parameningeal (PM) tumors (d ≥0.9). The mean face score was significantly lower for survivors treated according to the Paris-method than for survivors treated with protons (d −1.2). The mean lips score was significantly lower for survivors treated according to the Paris-method than for survivors treated with protons (d −1.3) or AMORE (d −1.2).
Within survivors, scores on appearance of the lips, nose, and jaw were significantly higher compared to their face score (d 0.9, 0.8, 0.5, respectively).
Sixty-three (82%) survivors reported negatively on ≥1 of the appearance-scales items. Every item of the face, jaw, and teeth scales was reported on negatively by >20% of survivors. Sixty percent of survivors reported negatively on the item "…how well both sides of your face match."

| HRQOL
The mean psychological scale score was significantly higher for survivors aged 8-12 years than for older survivors (13-17 years, d 0.7; ≥18 years, d 1.0). Survivors with ≥10 years of follow-up had a lower mean psychological score than those with shorter follow-up (6-9 years, d −0.8). The mean psychological score was significantly higher for survivors treated with protons than for survivors treated with RT (d 0.7) or the Paris-method (d 1.2).
Sixty-two (81%) survivors reported negatively on ≥1 of the HRQOL-scales items. Nearly half (47%) of all survivors reported negatively on the item "I feel good about how I look."

| Facial function
The mean speech function score was significantly higher for AMORE-treated survivors compared to the survivors treated with RT (d 1.1), protons (d 0.9), or the Paris-method (d 1.3). Eighteen percent of survivors reported that they need to speak slowly to be understood. Twenty-nine (38%) survivors reported negatively on ≥1 of the speech function items. Twenty-eight (36%) survivors reported negatively on ≥1 of the eating & drinking items.
Strong correlations (r ≥ 0.5) across the domains were seen for the: face and psychological scale; face and social scale; and speech function and speech distress scale (Table S3).

| Relation between AEs and PROs
Both the highest and the lowest scores on the face scale were reported by survivors with a grade 0 or 1 deformity (Figure 1). No differences were seen between survivors with or without a musculoskeletal deformity grade ≥2 on any of the tested scales (Table 4).
Large (d ≥ 0.8) differences in some PRO scale scores between survivors with and without a clinically relevant AE were seen for: speech abnormality, oral malfunction, and facial nerve paresis (Table 4), with lower scores for the survivors with the AE present.
The number of different AEs was weakly and nonsignificantly associated with the mean psychological and social scores (r = −0.106 and −0.129, respectively) (Figure S2).

| DISCUSSION
PRO scores for appearance, HRQOL, and facial function varied widely in this cohort of HNRMS survivors. Many survivors reported negative consequences: 82% on appearance items, 81% on HRQOL items, and 38% on facial function items. PRO scores across the three domains were associated with each other. The correlation between the presence of a clinically relevant AE as graded by physicians and PROs was weak for the majority of the tested PROs and strong for only a few.

TABLE 3
Percentage of survivors reporting negatively on the scale items of (A) appearance, that is, "not at all" or "a little bit"; (B) psychological, social, and school, that is, "never" or "sometimes"; (C) speech distress, speech function, and eating & drinking, that is, "always" or "often." Items reported negatively by ≥20% of survivors are in bold.

Our group previously reported on a partially overlapping cohort 13 and showed that HNRMS survivors experienced negative disease-specific issues. In the current study, we further characterized these issues using a questionnaire designed to measure facial appearance and function in addition to HRQOL. The FACE-Q Craniofacial module is the first PRO instrument designed for children and young adults to appraise their appearance rather than measure appearance distress.
In general, the scores of survivors with clinically relevant AEs did not differ significantly on appearance, HRQOL, and facial function scales from those of survivors without these AEs. We observed lower scores on only a few specific scales for survivors with a speech abnormality, oral malfunction, or facial nerve paresis compared with survivors without these problems. These findings suggest that AE categorization by physicians does not capture the patient perspective. Similar findings have been reported in the adult cancer literature, with multiple studies showing weak to moderate correlations between CTCAE grading and the associated PROs. 12 These findings led to the development of a patient-reported version of the CTCAE (PRO-CTCAE), 25 which complements the CTCAE and incorporates patient reporting of symptoms more systematically into research and decision making. The weak correlation we found between physician reporting and PROs further supports theories proposing that factors other than the presence of a chronic condition determine how that condition affects an individual's psychosocial well-being. [26][27][28][29] Overall, HRQOL is lower in groups of people with a visible facial difference than in groups without such a difference, but large individual variations exist. [30][31][32][33] These variations may be attributable to multiple psychological and social factors (e.g., personality, coping strategies, social support) 28,34-36 that warrant further investigation.
In our study, younger survivors (8-12 years) and those with shorter follow-up (<10 years) scored significantly higher on appearance and HRQOL than older survivors and those with longer follow-up. Similar findings were observed in a large international cohort of patients with cleft lip/palate, assessed with partly overlapping scales from the CLEFT-Q. 21 These age and time effects might be explained by the importance of appearance during different developmental stages. 29 In addition, in HNRMS survivors, facial deformity may worsen over time with growth of the facial bones. Some differences in scores on appearance, HRQOL, and facial function scales were seen between survivors treated with different local treatment strategies. These differences should be interpreted cautiously because of differences in patient characteristics (Table S4), especially in terms of tumor site, attained age, and follow-up time. Moreover, the Paris-method is used in a specific subgroup of PM-site tumors with a worse prognosis and is aimed at improving survival, which might lead to a different definition of acceptable toxicity. Additionally, the local treatment strategy partly depends on the country of treatment, so differences in scores might reflect underlying country-specific differences in HRQOL.
Within our cohort, we did not find differences between subgroups based on gender, age at diagnosis, or laterality. Previous studies on HRQOL in childhood cancer survivors have described more negative scores on emotional health for females than for males, 37,38 and on worry and social function for patients who were older at diagnosis than for those who were younger. 38 This difference from our results might be explained by the specific (rather than generic) HRQOL items included in the current study, which do not address these general HRQOL domains.

| Strengths and limitations
We present an international cohort of HNRMS survivors with long follow-up. Our results on specific aspects of appearance, HRQOL, and facial function give a detailed description of the issues HNRMS survivors experience.
An important limitation of the study is inherent to the population under investigation: patient numbers are small and cohorts heterogeneous. Therefore, the results are mainly exploratory and the analyses have limited power.
To date, normative values are not available for the FACE-Q Craniofacial module, which limits interpretation of our results relative to the general population. Ideally, our data would be compared with a general population control group or with a group of childhood cancer survivors whose treatment did not affect the head and neck area. The majority of the present cohort was included in a validation study that is in preparation for publication, 39 and reference values are expected to follow from it. However, given its intended use to improve care for individual survivors, we believe that the FACE-Q Craniofacial module adds value in the clinical setting even without normative values: it gives clear insight into the specific problems an individual survivor experiences and thereby helps address unmet medical needs.
Once reference values become available, future research can use them to evaluate whether interventions (psychological and/or surgical) initiated on the basis of problems identified with the FACE-Q Craniofacial module improve individual patients' outcomes. Furthermore, changes in an individual survivor's scores over time can be objectified. The differences in patient and treatment characteristics between participants and nonparticipants must also be taken into account. Nonparticipants were more often treated with the Paris-method and had PM site tumors. This combination is unsurprising, since the Paris-method was developed for PM site tumors. The method includes extensive surgical tumor resection and thereby carries a risk of significant facial deformation. Consequently, some of the objectively more severely affected children were not included in the current study. However, only a minority of HNRMS patients internationally are treated according to this method. The reasons for nonparticipation were not documented, as most ethics boards do not permit asking this question.

| Clinical implications
Many survivors reported negatively on appearance, HRQOL, and facial function items. Because the correlation between AEs and the majority of PRO scores is weak, relying on physician-graded AEs is not enough to provide tailored care to survivors. We recommend that health care professionals pay attention to issues in all three domains in every HNRMS survivor; the FACE-Q Craniofacial module can be used for this purpose. Training physicians to use PROs in clinical care and to discuss them with their patients is recommended, so that the patients' perspective is incorporated alongside objective measures of AEs. 40 The systematic use of questionnaires can be facilitated by electronic portals such as the Dutch "Kwaliteit van Leven In Kaart" (KLIK) PROM portal. 41 In this portal, patients complete PROs online at home before a consultation. Scores are then converted into an individual electronic profile and discussed during the consultation. The use of PROs in clinical practice has been shown to be beneficial, resulting in increased discussion of patient outcomes, enhanced patient-clinician communication, higher patient satisfaction, better HRQOL, and improved treatment outcomes. 42,43 Furthermore, children should, where possible, be offered psychosocial interventions to empower them in coping with the consequences of their disease. 44 We recommend adding PRO assessment to outpatient clinic visits, but no more than once a year, because scores may change over time depending on the survivor's age and on development of the face and, consequently, facial function. Currently, in the Netherlands, all head and neck sarcoma survivors are invited to a multidisciplinary follow-up clinic every 2 years, at least until the age of 18 years, and we will invite them to complete the questionnaire at each visit.

| CONCLUSION
PRO scores for appearance, HRQOL, and facial function varied widely between HNRMS survivors, though many survivors reported negative consequences in all three domains. The presence of clinically relevant AEs as graded by physicians was weakly correlated with the majority of disease-specific PRO scores. We therefore advise a systematic assessment of potential concerns from the patient perspective, for example with the FACE-Q Craniofacial module, in the care of every individual HNRMS survivor.

AUTHOR CONTRIBUTIONS
Marinka L.F. Hol contributed to conception and design; contributed to acquisition, analysis, and interpretation; drafted the manuscript; gave final approval; and agreed to be accountable for all aspects. Michèle Morfouace contributed to analysis and interpretation; drafted the manuscript; and agreed to be accountable for all aspects. Reineke A. Schoot contributed to conception and design; contributed to acquisition, analysis, and interpretation; critically revised the manuscript; gave final approval; and agreed to be accountable for all aspects. Olga Slater contributed to acquisition, analysis, and interpretation; critically revised the manuscript; gave final approval; and agreed to be accountable for all aspects. Daniel J. Indelicato contributed to acquisition, analysis, and interpretation; critically revised the manuscript; gave final approval; and agreed to be accountable for all aspects. Frédéric Kolb contributed to acquisition, analysis, and interpretation; critically revised the manuscript; gave final approval; and agreed to be accountable for all aspects. Prof. Ludwig E. Smeele contributed to conception and design; contributed to acquisition, analysis, and interpretation; critically revised the manuscript; gave final approval; and agreed to be accountable for all aspects. Johannes H.M. Merks contributed to conception and design; critically revised the manuscript; gave final approval; and agreed to be accountable for all aspects. Charlene Rae contributed to conception and design; contributed to acquisition, analysis, and interpretation; critically revised the manuscript; gave final approval; and agreed to be accountable for all aspects. Heleen Maurice-Stam contributed to acquisition, analysis, and interpretation; critically revised the manuscript; gave final approval; and agreed to be accountable for all aspects. Anne F. Klassen contributed to conception and design; contributed to acquisition, analysis, and interpretation; critically revised the manuscript; gave final approval; and agreed to be accountable for all aspects. Martha A. Grootenhuis contributed to acquisition, analysis, and interpretation; critically revised the manuscript; gave final approval; and agreed to be accountable for all aspects.