Quality assessment of patient‐reported outcome measures for patients with multiple ear complaints

There is an increasing demand for well-validated patient-reported outcome measures (PROMs) in otology. This study aims to systematically assess the methodological quality of all published PROMs for patients with multiple ear complaints and to identify the most suitable PROM for use by clinicians treating these patients.


| INTRODUCTION
The high prevalence of ear complaints, the call for shared decision making, and the need to evaluate and improve treatment modalities in order to improve health care have contributed to a rising interest in patient-reported outcome measures (PROMs). PROMs can be used as an essential tool alongside clinical outcomes and are a core element of Value-Based Health Care, in which the outcomes 'that matter most to the patient' are weighed against the costs of achieving these outcomes.
PROMs can be generic or specific. They can be developed to obtain information on a specific complaint, disability or disease, or on perceived quality of life (QoL), and may be developed for a specific population (e.g., adults or children). A generic PROM can be used to measure, for example, pain, depression, fatigue or anxiety.
The successful application of a PROM depends on the type and quality of the questionnaire. In otology, a questionnaire that focuses on multiple complaints can be a domain- or disease-specific PROM. A focus on multiple complaints is thought to be important, as many patients with ear diseases have more than one complaint and the variety of diagnoses is wide.
Most patients with hearing loss also have tinnitus, and patients with otorrhoea can have otalgia but may also experience dizziness and/or itch.
The patient perspective is a subjective outcome and does not necessarily correlate with other clinical outcomes; for example, audiometric results may differ from perceived hearing disability. This is explained by the fact that the impact of disease is not only the direct result of symptoms but can also be heavily affected by accompanying personal cognitive and emotional factors and by environmental factors. 1 It can be challenging for the clinician to choose which PROM to use in daily practice. Questionnaire selection is often guided by prior experience or by copying from the work of peers, and the most suitable questionnaire for the patient in their specific situation is not always selected. 2
The objective of this study is to identify and systematically assess all validated closed-ended multiple complaint questionnaires for adults that are published in English and cover more than one ear complaint. The secondary objective is to create a comprehensive overview of these questionnaires and their clinimetric assessment. This review is intended as a valuable addition to the current literature and should facilitate the selection of questionnaires by caregivers.

| METHODS & MATERIALS
[5][6][7][8] An extensive systematic mapping review of all otology questionnaires up until 26 August 2019 was conducted by Viergever et al. 9 The search for that review was repeated on 28 April 2021 to identify English-language, peer-reviewed journal articles published between 2000 and April 2021.

| Search and article selection
The inclusion and exclusion criteria used for the assessment of questionnaire eligibility are listed in Table 1. We specifically wanted to evaluate potential ear domain-specific questionnaires that address more than one ear complaint in adult patients and that are not reported by proxy respondents.

| Assessment of methodological quality of included studies
The questionnaires were assessed independently by the first (Jeroen T. Kraak) and last (Donald L. Patrick) authors, one questionnaire at a time. All differences in assessment were discussed in detail, after which agreement was reached in all cases.
The COSMIN methodology for systematic reviews of patient-reported outcome measures and the COSMIN Risk of Bias checklist were used to assess the methodological quality of each study. Studies were rated as having very good, adequate, doubtful or inadequate methodological quality. If no data were available about an item, or if it was not performed, we assessed it as not available (NA).
During this assessment, a new COSMIN guideline was published in 2021. 10 This reporting guideline was developed for research groups reporting on studies that evaluate the measurement properties

Key points
This work will facilitate choosing the best ear questionnaire.
If implementation of a PROM is considered, the authors advise taking a critical look at the design of the questionnaire.
Modifications of earlier versions of PROMs or combinations of multiple questionnaires lead to ongoing (cross-cultural) validation of questionnaires despite potentially mediocre design and validation.
Most disease-specific questionnaires cover chronic otitis media.
The two ear domain-specific questionnaires are the COQOL and OQUA, both of adequate quality but with a different focus.
TABLE 1 Inclusion and exclusion criteria used for the assessment of questionnaire eligibility.

| RESULTS
The systematic mapping review on all otology questionnaires was con- The mapping review and the repeated search thus revealed 16 multiple complaint questionnaires. Four of these 16 questionnaires were not eligible for methodological assessment and were excluded. The CEDRA 11 and SOFI 12 questionnaires were excluded because they are not PROMs for complaint evaluation but were developed for risk assessment. The Dizziness Symptom Profile 13 (DSP) was excluded because of a difference in its overall design: the objective of the DSP is to generate a differential diagnosis, not to evaluate complaints over time. The Dizziness, tinnitus and taste disturbances questionnaire 14 was excluded because it was developed for persons with a cochlear implant.
The development paper of the included COMOT-15 15 was written in German; although a non-English development paper was an exclusion criterion, this questionnaire was included. Its inclusion was deemed necessary to assess the ZCMEI-21, which was developed on the basis of the COMOT-15 and CES.
TABLE 2 Basic PROM characteristics.
The Stapesplasty Outcome Test (SPOT-25) was found by reference checking but was excluded, as it has not been validated in the English-language literature. 16 The SPOT-25 is a disease-specific HRQOL PROM for patients undergoing stapes surgery for otosclerosis.
After this large mapping review, 12 questionnaires were eventually eligible for quality assessment.
The included questionnaires and their basic characteristics are listed in Table 2.4][25] The other two questionnaires were designed from the outset to be ear domain-specific and are considered potentially useful for any patient with one or more ear complaints presenting to the ENT surgeon. All but one of the questionnaires were developed, translated or cross-culturally validated in English. The number of items in the questionnaires varied between 7 (ETDQ-7) 17 and 68 (Gopen-Yang Superior Semi-circular Canal Dehiscence Questionnaire). 22 The number of subscales varied between three and nine. Most of the subscales were based on the different complaints, but impact, use of medical resources, and a more general subscale were mentioned as well.

| PROM design
The overall PROM design was assessed as 'inadequate' in eight of the twelve questionnaires. The most frequent reason for this unsatisfactory result was that the questions were developed without consulting patients affected by the complaint or disease. The other four questionnaires were rated 'adequate': the COQOL, 26 ETDQ-7, 17 OQUA 27 and ZCMEI. 24 The ETDQ-7 and ZCMEI 24 are disease-specific questionnaires, whereas the other two are domain-specific otology questionnaires.
The complete assessment leading to these scores is listed in Data S1.

| Content validity
Content validity scores are listed in Table 3. The GYSSCDQ 22 questionnaire is designed based on a reflective model. None of the included studies describes a confirmatory factor analysis (CFA) except that of the OQUA. 27 This questionnaire is multidimensional; the study reports a CFA for the impact questions only and an exploratory factor analysis (EFA) for the rest of the questionnaire, for which the design model is formative. The only other study that elaborates on an EFA is that of the PAN-QOL. 18 The COMQ-12 study 25 performed a principal component analysis (PCA); EFA and PCA are both variable reduction techniques but differ in their purpose. The structural validity of the COMQ-12 using PCA was scored 'very good'.
Internal consistency was assessed more frequently than structural validity. Five questionnaires scored 'very good', with a Cronbach's alpha >0.7. Six questionnaires scored 'acceptable' and one 'inadequate'.
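As an illustration of the internal consistency statistic mentioned above, the following minimal sketch computes Cronbach's alpha with the standard textbook formula. It is not the procedure of any of the reviewed studies; the function name `cronbach_alpha` and the example scores are hypothetical.

```python
from statistics import variance


def cronbach_alpha(responses):
    """Cronbach's alpha for a list of respondents, each a list of item scores."""
    n_items = len(responses[0])
    if n_items < 2 or len(responses) < 2:
        raise ValueError("need at least two items and two respondents")
    # Sample variance of each item (column) across respondents.
    item_variances = [
        variance([row[i] for row in responses]) for i in range(n_items)
    ]
    # Sample variance of each respondent's total score.
    total_variance = variance([sum(row) for row in responses])
    return n_items / (n_items - 1) * (1 - sum(item_variances) / total_variance)


# Hypothetical data: 5 respondents x 4 items, each scored 0-4.
scores = [
    [0, 1, 0, 1],
    [1, 1, 2, 1],
    [2, 3, 2, 2],
    [3, 3, 4, 3],
    [4, 4, 3, 4],
]
alpha = cronbach_alpha(scores)
print(f"alpha = {alpha:.3f}; above the 0.7 threshold: {alpha > 0.7}")
```

The sketch assumes complete data and a total score that varies between respondents; missing answers would need to be handled before such a calculation.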
Five questionnaires have had their questions translated into one or more languages. The method of translation is often described and acceptable. The studies discuss the validity of the questionnaire in a different population and describe it as good or valid, but none of the 'translation' studies actually interviewed patients to see whether the original questionnaire needed to be adapted with respect to relevance, comprehensiveness or comprehensibility.
None of the studies developed their questionnaire using an Item Response Theory (IRT) model.

| Remaining properties
Measurement error is part of reliability and is important in order to interpret the data correctly. Repeated measurements may display variation arising from several sources, leading to measurement error. A Standard Error of Measurement (SEM), Smallest Detectable Change (SDC) or Limits of Agreement (LoA) were not calculated in any of the studies. Although they did not report an SEM, SDC or LoA, four studies were rated 'acceptable' for measurement error. Eight studies had 'inadequate' measurement error reporting.
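For readers less familiar with these three parameters, the sketch below shows how they are commonly derived from test-retest data. The formulas are the usual textbook definitions and are not taken from the reviewed studies; the function names and the numbers in the example are hypothetical.

```python
from math import sqrt
from statistics import mean, stdev


def standard_error_of_measurement(sd, reliability):
    """SEM = SD * sqrt(1 - reliability), with reliability e.g. a test-retest ICC."""
    return sd * sqrt(1 - reliability)


def smallest_detectable_change(sem):
    """SDC for an individual patient at the 95% level: 1.96 * sqrt(2) * SEM."""
    return 1.96 * sqrt(2) * sem


def limits_of_agreement(test, retest):
    """Bland-Altman 95% limits: mean difference +/- 1.96 * SD of the differences."""
    diffs = [a - b for a, b in zip(test, retest)]
    centre, spread = mean(diffs), 1.96 * stdev(diffs)
    return centre - spread, centre + spread


# Hypothetical example: score SD of 10 points and a test-retest ICC of 0.84.
sem = standard_error_of_measurement(sd=10, reliability=0.84)
sdc = smallest_detectable_change(sem)
lower, upper = limits_of_agreement(test=[10, 12, 14], retest=[11, 11, 15])
print(f"SEM = {sem:.2f}, SDC = {sdc:.2f}, LoA = ({lower:.2f}, {upper:.2f})")
```

A change in an individual patient's score smaller than the SDC cannot be distinguished from measurement error, which is why these values matter when a PROM is used to follow complaints over time.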
Test-retest reliability is the second part of reliability. The test-retest period varies greatly between the questionnaires, ranging from 1 to 2 days up to 6 months. Eight out of 12 studies reported their Cronbach's alpha, which varied between 0.761 and 0.91.

| Interpretability and feasibility
Distribution of scores and subscores and change scores of all relevant (sub)groups were often mentioned in the articles addressing the PROM.
The other items of interpretability were often lacking. Specifically, a minimal important change or difference was often not calculated or mentioned. Considering these limitations, interpretability was scored 'doubtful' in five questionnaires, 'adequate' in six questionnaires and 'very good' in one questionnaire. None of the questionnaires scored 'inadequate'.
For the feasibility assessment, the type and ease of administration, the length of the instrument and the completion time could best be assessed objectively. There was very limited or no information on the other feasibility items. Feasibility was scored 'doubtful' in three questionnaires, 'adequate' in eight questionnaires and 'very good' in one questionnaire. None of the questionnaires scored 'inadequate'.

| DISCUSSION
This article provides ENT surgeons with a more comprehensive understanding of the available questionnaires, thus aiding them in selecting the most suitable ear domain-specific questionnaire for their specific clinical context or for integration into their healthcare facility. The aim of such a questionnaire is to give a broad view of the complaints before and after an intervention.
Among the questionnaires considered, 10 were tailored to specific diseases, focusing on chronic otitis media, vestibular disorders and vestibular schwannoma, while two were designed to address ear-related issues more broadly (OQUA and COQOL). This finding may be attributed to ENT surgeons' preference for disease-specific questionnaires, which tend to be less time-consuming, more relevant to the patient and more sensitive. However, the potential advantages of the wider view of complaints offered by a domain-specific questionnaire should not be underestimated, as ear complaints are almost never 'stand-alone'. In an era marked by escalating healthcare demands and the necessity for cost-effective interventions based on PROMs, the utilisation of a single, domain-specific questionnaire could yield substantial benefits, even within budget constraints.
The OQUA and COQOL can be used in an ear-specific manner rather than only at the level of the single patient, which can be of additional value, particularly in research.
Established principles for good PROM design include well-documented validity, reliability and responsiveness. In the context of domain-specific ear PROMs, additional considerations become paramount. Patient-centred care is becoming increasingly important, recognising that the role of surgery in ear diseases may not always be straightforward. These PROMs should adeptly capture the patient's unique perspective and experience, empowering ENT specialists to customise treatments based on individual patient needs.
Concept elicitation in PROMs refers to the process of systematically gathering information directly from patients about their experiences, symptoms and perceptions related to a specific health condition or treatment. By actively involving patients in the development of the questionnaire, researchers can ensure that the questionnaire items accurately reflect the patients' experiences and concerns. The majority of multiple complaint questionnaires assessed in this article lacked good design with concept elicitation and patient involvement, resulting in ongoing (cross-cultural) validation of questionnaires. This might be a result of the absence of clear guidelines for the development of PROMs, coupled with a frequent lack of focus on aligning patient perspectives, functionality and the associated burden from a medical standpoint.
The COSMIN reporting guideline is an important step towards being able to assess questionnaires at a higher qualitative level. 10 The absence of detailed reporting, such as whether appropriate interview guides were used and whether interviews were recorded and transcribed verbatim, and the limited number of patients included in interviews or pilot testing, along with the absence of quality parameters for the interviews, may raise concerns about the reliability of the results. In essence, the overall assessment score for a given measurement property was derived by taking the lowest score among all the items, using the 'worst score counts' method. It is worth noting that more favourable scores on these items might have been attainable had more comprehensive information been provided. Inadequate reporting of details is a common shortcoming observed in many studies.
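The 'worst score counts' rule described above can be written down in a few lines. The following is only a sketch of that rule, using the four rating levels named in the Methods; the function name is hypothetical, and skipping items marked NA is an assumption about how such items are handled.

```python
# Rating levels ordered from best to worst.
RATING_ORDER = ["very good", "adequate", "doubtful", "inadequate"]


def overall_rating(item_ratings):
    """Return the lowest rating among the items ('worst score counts')."""
    # Assumption: items marked "NA" are ignored rather than penalised.
    rated = [rating for rating in item_ratings if rating != "NA"]
    if not rated:
        return "NA"
    # The rating with the highest position in RATING_ORDER is the worst one.
    return max(rated, key=RATING_ORDER.index)


# Hypothetical item ratings for one measurement property of one study.
print(overall_rating(["very good", "adequate", "NA", "doubtful"]))
```

Because a single poorly reported item determines the overall result, incomplete reporting has a direct effect on the final rating under this rule.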
While interpretability and feasibility are not considered formal measurement properties, because they do not refer to the quality of an instrument, they do play a pivotal role in the successful implementation of a PROM. 28 Scores, change scores and the minimal important change (MIC) should be available for relevant (sub)groups (e.g., for normative groups, subgroups of patients or the general population).
In the COSMIN taxonomy of 2010, 6 the COSMIN panel concluded that interpretability often receives insufficient attention.
Furthermore, from the standpoint of a practising clinician, a questionnaire should be quick to administer, concise, user-friendly and cost-effective, especially in a busy clinical setting. Regrettably, many studies assessed in this article have not placed sufficient emphasis on addressing interpretability and feasibility concerns.
Although the current standard for questionnaire development according to recent guidelines is high, the standard required for applying PROMs to individuals might be considered higher than that for the group level. Individual assessments require high measurement precision and reliability. Few studies focus on individual patients; most report at group level. Multiple factors should be taken into consideration when assessing the outcome of the PROM. 28,29 Group-averaged comparisons cancel out measurement error, but individual PROM scores do not. 30
The issue of applying PROMs initially developed and validated in selected populations to more diverse or mixed populations is a critical concern in healthcare research and clinical practice. PROMs validated in highly selected populations may not be generalisable to broader or mixed populations. Mixed populations can be highly heterogeneous in terms of disease severity, comorbidities and patient demographics.
In cases where a PROM developed in a highly selected population needs to be applied to a mixed population, it may be necessary to adapt and revalidate the instrument to ensure its suitability and reliability across diverse groups.Researchers and healthcare providers should continuously assess the performance of PROMs in mixed populations.

| Strengths and limitations
While it is acknowledged that the assessment inherently involves a degree of subjectivity, the authors made diligent efforts to safeguard against any form of bias.This was accomplished by adhering to well-defined criteria, specifically the COSMIN criteria, which had been collectively agreed upon prior to the initiation of the assessment process.This rigorous approach underscores the authors' commitment to maintaining the integrity and impartiality of the evaluation.
Given that the Otology Questionnaire Amsterdam (OQUA) was developed collaboratively by the first and second authors (Jeroen T. Kraak and Paul Merkus), a conscious effort was made to mitigate any potential conflicts of interest during the assessment process.To achieve this, the evaluation of the OQUA was entrusted to the final author (Donald L. Patrick).It is important to note that the inclusion of the final author (Donald L. Patrick) on the research team occurred prior to the commencement of the questionnaire assessments.This strategic decision was taken to minimise the risk of bias that might have arisen.

| Recommendations & future work
Additional otological questionnaires, for example on tinnitus or hearing loss, can be assessed using the mapping review 9 as a backbone.
This will help to further untangle the puzzling collection of different disease-and symptom-specific questionnaires.
The two identified domain-specific questionnaires are the COQOL and OQUA, both of adequate quality but with a different focus.
The COQOL quantifies quality of life, whereas the OQUA measures and evaluates the severity and impact of ear complaints separately. Both need further work on interpretability and cross-cultural validation.
None of the studies had an Item Response Theory (IRT) model.
An IRT model contains all the questions in the PROM, but based on the previous answer(s), the model establishes a link and selects questions that are relevant for the patient. 31 An IRT model would give a more precise value/score of a PROM and is potentially less time-consuming for the patient. In order to set up an IRT model, however, data from thousands of patients are needed for calibration, making this a high hurdle.
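The adaptive question selection described above is usually implemented as computerised adaptive testing on top of a calibrated IRT model. The sketch below illustrates the idea under strong simplifying assumptions: a two-parameter logistic (2PL) model for yes/no items, whereas PROM items are typically ordinal and would need a polytomous model. The item names and parameter values are hypothetical and are not calibrated on any patient data.

```python
from math import exp


def probability(theta, discrimination, difficulty):
    """2PL probability of endorsing an item at trait level theta."""
    return 1 / (1 + exp(-discrimination * (theta - difficulty)))


def information(theta, discrimination, difficulty):
    """Fisher information of a 2PL item at theta: a^2 * P * (1 - P)."""
    p = probability(theta, discrimination, difficulty)
    return discrimination ** 2 * p * (1 - p)


def next_item(theta, items, answered):
    """Pick the unanswered item that is most informative at the current theta."""
    candidates = [name for name in items if name not in answered]
    if not candidates:
        return None
    return max(candidates, key=lambda name: information(theta, *items[name]))


# Hypothetical item bank: name -> (discrimination, difficulty).
item_bank = {"q1": (1.0, 0.0), "q2": (1.0, 2.0), "q3": (2.0, 0.0)}
first = next_item(theta=0.0, items=item_bank, answered=set())
second = next_item(theta=0.0, items=item_bank, answered={first})
print(first, second)
```

In a real application the estimate of theta would be updated after every answer, and the discrimination and difficulty parameters would come from the large calibration sample mentioned above.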

| CONCLUSION
Existing PROMs for patients with multiple ear complaints were evaluated according to the COSMIN criteria. Two types of multiple complaint questionnaires were identified and assessed: disease-specific and ear domain-specific. Depending on the specific need, the presented work can facilitate the selection of the most suitable questionnaire for addressing specific issues and for integration into healthcare facilities.
If implementation of a PROM is considered, the authors advise taking a critical look at the design of the questionnaire, as it is of major importance that, besides proper validation, reliability and responsiveness, a PROM actually measures the problem one wants to measure.
For the majority of questionnaires, the quality assessment was inadequate, as only a few questionnaires were developed after building a construct based on a broad consultation with patients in the target population (concept elicitation). Modifications of earlier versions of PROMs or combinations of multiple questionnaires lead to ongoing (cross-cultural) validation of these questionnaires despite mediocre design and validation. Most disease-specific questionnaires cover chronic otitis media, and some are modifications of earlier versions or combinations of multiple questionnaires. The two domain-specific questionnaires are the COQOL and OQUA, both of adequate quality but with a different focus: the COQOL quantifies quality of life, whereas the OQUA measures and evaluates the severity and impact of ear complaints.

AUTHOR CONTRIBUTIONS
Jeroen T. Kraak has made substantial contributions to the concept and design of the work, assessments of the PROMs, analysis of the data and interpretation of data, and has drafted the work. Donald L. Patrick has made substantial contributions to the assessment of the PROMs and interpretation of data, and has substantially revised the work. Paul Merkus has made substantial contributions to the concept and design of the work and has substantially revised the work. All authors agree to be accountable for all aspects of the work.

is based on other validated questionnaires; for this reason, relevance, comprehensiveness and comprehensibility were adapted from 'inadequate' to 'doubtful' in the assessment of content validity. The COMQ-12 and ZCMEI-21 are also based on other questionnaires (CES and COMOT-15), but since the content validity of these questionnaires is inadequate, the assessment of the ZCMEI-21 and COMQ-12 was also inadequate. Six studies performed a pilot test and made adaptations to their first version of the questionnaire. Six questionnaires were not pilot tested, which often meant that the first and only version of the questionnaire was used in the study population. The complete assessment leading to these scores is listed in Data S2.
| Internal structure
Structural validity is a parameter that can only be assessed if the PROM is based on a reflective model. Most of the studies do not elaborate on this. If a formative model is used or if the model is not mentioned, then an exploratory factor analysis (EFA) is most applicable. A confirmatory factor analysis (CFA) can thus only be performed if the

TABLE 3 Content validity assessment.
Note: Ratings were scored as 'very good' (V), 'adequate' (A), 'doubtful' (D) and 'inadequate' (I).
a Since this questionnaire was developed from another (validated) questionnaire, the assessment in the absence of an interview was graded 'doubtful' instead of 'inadequate'.