Identifying monoclonal gammopathy of undetermined significance from electronic health records

Abstract
Background: Monoclonal gammopathy of undetermined significance (MGUS) precedes multiple myeloma (MM). Use of electronic health records may facilitate large‐scale epidemiologic research to elucidate risk factors for the progression of MGUS to MM or other lymphoid malignancies.
Aims: We evaluated the accuracy of an electronic health records‐based approach for identifying clinically diagnosed MGUS cases for inclusion in studies of patient outcomes/progression risk.
Methods and Results: Data were retrieved from Kaiser Permanente Southern California's comprehensive electronic health records, which contain documentation of all outpatient and inpatient visits, laboratory tests, diagnosis codes and a cancer registry. We ascertained potential MGUS cases diagnosed between 2008 and 2014 using the presence of an MGUS ICD‐9 diagnosis code (273.1). We initially excluded those diagnosed with MM within 6 months after MGUS diagnosis, then subsequently those with any lymphoid malignancy diagnosis from 2007 to 2014. We reviewed medical charts for 100 randomly selected potential cases for evidence of a physician diagnosis of MGUS, which served as our gold standard for case confirmation. To assess sensitivity, we also investigated the presence of the ICD‐9 code in the records of 40 randomly selected and chart review‐confirmed MGUS cases among patients with a laboratory report of elevated circulating monoclonal (M‐) protein (a key test for MGUS diagnosis) and no subsequent lymphoid malignancy (as described above). The positive predictive value (PPV) for the ICD‐9 code was 98%. All MGUS cases confirmed by chart review also had confirmatory laboratory test results. Of the confirmed cases first identified via M‐protein test results, 88% also had the ICD‐9 diagnosis code.
Conclusion: The diagnosis code‐based approach has excellent PPV and likely high sensitivity for detecting clinically diagnosed MGUS. The generalizability of this approach outside an integrated healthcare system warrants further evaluation.

While population-based screening may be considered the gold standard for observational research, this approach requires the availability of archived biospecimens. For MGUS, the diagnosis is not clinically actionable at present, 1 and thus widespread clinical screening to detect MGUS is not justifiable. Moreover, laboratory assays are expensive and therefore may not be feasible for broad-scale screening to detect MGUS. Alternatively, manual medical chart review could be conducted to confirm MGUS diagnoses, but this approach is time- and labor-intensive and would be prohibitively expensive for use in large epidemiological studies.
Additionally, issues such as variation in reviewers' attention to detail can introduce ascertainment errors. 7 More recently, electronic algorithms have been developed to identify MGUS cases from electronic health records using diagnosis and utilization codes (e.g., for oncologist visit[s] and relevant lab tests, without incorporating lab results). Studies of such algorithms have reported positive predictive values (PPV) between 76% and 88%. 8,9 To build on these efforts to facilitate large-scale epidemiologic research on MGUS using electronic health records (in particular, studies of factors associated with risk of progression to malignancy or other outcomes), we evaluated the performance of an electronically searchable diagnosis code-based algorithm to identify patients with clinically diagnosed MGUS using electronic health records from a large integrated health care delivery system.

| MGUS case identification algorithm and eligibility criteria for chart review confirmation
The algorithm that we evaluated for identifying patients with clinically diagnosed MGUS had the following steps:
i. We searched for patients with a first ICD-9 diagnosis code of 273.1 between 2008 and 2014 (i.e., the "index" ICD-9 code).
ii. We excluded those with an MM diagnosis within 6 months following the record of the index ICD-9 code. 10
iii. Of the potential MGUS cases identified by steps (i) and (ii), we further restricted to those with at least 1 year of continuous health plan membership prior to the date of the index ICD-9 code for the manual chart review confirmation (so that sufficient medical records would be available to confirm the MGUS diagnosis).
iv. We then randomly sampled 100 individuals from the remaining eligible putative MGUS cases for chart review.
The initial chart reviews revealed that some recorded electronic ICD-9 codes for MGUS corresponded to a work-up that led to diagnoses of other lymphoid malignancies (since M-protein may also be used to monitor disease status in patients with other lymphoid malignancies 10,11). We thus subsequently revised the case-identification algorithm outlined above to further restrict the sample of potential cases to those without evidence of other lymphoid malignancies from 2007 to 2014 and applied the same revision to the randomly selected subsample.
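As a rough illustration only, the eligibility logic described above (an index code in 2008 to 2014, no MM within 6 months, at least 1 year of prior membership, no lymphoid malignancy in 2007 to 2014, then random sampling) could be sketched as follows. The `Patient` record layout, the field names, the day counts used for "6 months" and "1 year", and the random seed are all assumptions made for this sketch and are not taken from the study's actual programs. The sketch also applies the lymphoid malignancy exclusion before sampling, whereas in the study that revision was applied to the already selected subsample.

```python
from dataclasses import dataclass, field
from datetime import date, timedelta
from typing import List, Optional
import random

# Assumed day counts; the source does not state how "6 months" or "1 year"
# were operationalised.
SIX_MONTHS = timedelta(days=183)
ONE_YEAR = timedelta(days=365)
STUDY_START, STUDY_END = date(2008, 1, 1), date(2014, 12, 31)
LYMPHOID_WINDOW = (date(2007, 1, 1), date(2014, 12, 31))


@dataclass
class Patient:  # hypothetical record layout, for illustration only
    patient_id: str
    first_mgus_code: Optional[date]  # date of first ICD-9 273.1 code
    enrollment_start: date           # start of continuous plan membership
    mm_dx: Optional[date] = None     # multiple myeloma diagnosis date, if any
    lymphoid_dx: List[date] = field(default_factory=list)  # lymphoid malignancy dates


def is_eligible(p: Patient) -> bool:
    idx = p.first_mgus_code
    # Step i: index ICD-9 code recorded within the study window.
    if idx is None or not (STUDY_START <= idx <= STUDY_END):
        return False
    # Step ii: no MM diagnosis within 6 months after the index code.
    if p.mm_dx is not None and idx <= p.mm_dx <= idx + SIX_MONTHS:
        return False
    # Step iii: at least 1 year of continuous membership before the index code.
    if p.enrollment_start > idx - ONE_YEAR:
        return False
    # Revision: no evidence of other lymphoid malignancy from 2007 to 2014.
    lo, hi = LYMPHOID_WINDOW
    if any(lo <= d <= hi for d in p.lymphoid_dx):
        return False
    return True


def sample_for_chart_review(patients, n=100, seed=0):
    # Step iv: random sample of eligible putative cases (seed is arbitrary).
    eligible = [p for p in patients if is_eligible(p)]
    return random.Random(seed).sample(eligible, min(n, len(eligible)))
```

Keeping each criterion as a separate check makes it straightforward to report how many putative cases are lost at each step, which is the kind of information a case-identification flowchart summarizes.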
When developing the case-identification algorithm, we had considered developing a second algorithm based on records reporting serum M-protein and immunofixation test results indicative of MGUS.
We found that while serum M-protein results can be queried as a discrete data field in Kaiser Permanente Southern California's electronic health records, they are sometimes not quantifiable, hindering their interpretation to determine the presence or absence of MGUS or a more advanced condition. 2 Further, immunofixation results exist only as free text and thus cannot be readily queried without language processing tools that were not available to the project. Given these limitations, we could not develop a comprehensive case-identification algorithm based on laboratory results. Nonetheless, we used the initial M-protein-based efforts to identify a separate sample of plan members with clinician-diagnosed MGUS in whom we could assess the sensitivity of the ICD-9 diagnosis-code based approach, as described below.
2.3 | Chart review confirmation for clinically diagnosed MGUS among individuals with an ICD-9 diagnosis code for MGUS
We manually reviewed medical chart notes within ±6 months of the first recorded ICD-9 code (273.1) for documentation of a physician diagnosis to confirm clinically diagnosed MGUS for the randomly selected putative cases. For any unconfirmed cases, we further reviewed the entire medical history to understand potential reasons for the inaccuracies. All reviews and confirmation dispositions were verified by a second chart reviewer. Because our purpose was to validate the diagnosis code-based algorithm for identifying patients with clinically diagnosed MGUS (rather than to ascertain all diagnosed and undiagnosed MGUS in the Kaiser Permanente Southern California population or to determine the accuracy of the physician diagnosis against standard diagnostic criteria for MGUS 2), a physician diagnosis of MGUS in the chart notes was considered the gold standard for confirmation.
During chart review, we collected information on relevant test results, including serum or urine M-protein, immunofixation and free light chain tests when available, and on the presence of clinical signs of end organ damage that contribute to a diagnosis of full-blown MM, such as hypercalcemia, renal failure, anemia and bone lesions.
The latter information was not always documented, and when present, the underlying conditions leading to the associated form of end organ damage were often not specified (e.g., renal failure could be due to long-term diabetes rather than to MM or other malignancy).
These challenges supported our decision to rely on evidence of a physician diagnosis as the gold standard for confirming clinically diagnosed MGUS in plan members with the corresponding ICD-9 code rather than relying on reported clinical symptoms. We initially sought to determine the timing of the physician diagnosis. However, as the chart notes often had insufficient documentation of this timing, it was rarely possible to distinguish patients with newly diagnosed MGUS from those with a history of MGUS that predated our study period.
To address potential misclassification between MGUS and smoldering MM, we specifically searched the chart notes to capture potential smoldering MM diagnoses for those without a quantifiable M-protein value (i.e., those whose smoldering MM would have remained undetected by a review of M-protein test results). We also searched for, and confirmed by chart review, any electronic ICD-9 code for an MM diagnosis (203.0) within 2 years after the index date among all confirmed MGUS cases, as misclassified smoldering MM cases would have a higher probability than true MGUS cases of progressing that quickly after the MGUS diagnosis. 1,12

2.4 | Estimation of the sensitivity of the ICD-9-based MGUS case identification algorithm
To estimate the sensitivity of the electronic ICD-9 code-based case-identification algorithm, we used the same process described

2.5 | Hematologist adjudication to explore the accuracy of the clinician diagnosis of MGUS
As an exploratory exercise to assess the accuracy of the MGUS diagnosis made by physicians in the chart notes in comparison to current MGUS diagnostic criteria, 2 two hematologists (co-authors Hoda Pourhassan and Scott Goldsmith) independently adjudicated 10 randomly selected chart review-confirmed MGUS cases. This chart review process was a separate exercise from the steps described above to confirm the index ICD-9 diagnosis code for MGUS via medical chart review. The two hematologists reviewed relevant clinical information available within 6 months of the index MGUS diagnosis code and provided an assessment of their certainty of the presence of MGUS by designating the putative MGUS case as "definite, probable, possible, no evidence of MGUS, or unable to determine." They also provided notes articulating the rationales for their assessments. Discrepancies between the two hematologists were resolved by discussion.
A priori, we considered the cases adjudicated as "definite" and "probable" MGUS as confirmed cases and those with "possible," "no evidence of MGUS," or "unable to determine" as unconfirmed cases according to current diagnostic criteria.
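The a priori rule above amounts to a fixed mapping from the five adjudication categories to a binary confirmation status. A minimal sketch of that mapping is shown below; the function and variable names are hypothetical, and the example labels are made up for illustration rather than taken from the adjudicated charts.

```python
from collections import Counter

# Category groupings as stated in the text; names are illustrative.
CONFIRMED = {"definite", "probable"}
UNCONFIRMED = {"possible", "no evidence of MGUS", "unable to determine"}


def classify(label: str) -> str:
    """Map one adjudication category to 'confirmed' or 'unconfirmed'."""
    if label in CONFIRMED:
        return "confirmed"
    if label in UNCONFIRMED:
        return "unconfirmed"
    raise ValueError(f"unknown adjudication category: {label!r}")


def summarize(labels):
    """Count confirmed and unconfirmed cases among adjudicated labels."""
    return Counter(classify(label) for label in labels)


# Made-up labels, for illustration only.
example = ["definite", "probable", "possible", "unable to determine"]
print(summarize(example))
```

Raising an error on an unrecognized category, instead of silently treating it as unconfirmed, guards against typos in the recorded designations.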

| Statistical analysis
The distributions of demographic characteristics (age, sex, race/ethnicity) and the Charlson comorbidity index were obtained for the subsample of 100 ICD-9 algorithm-identified potential cases randomly selected for chart review. We also utilized the chart review findings to calculate the PPV for this subsample to inform the probability that an MGUS patient identified by the electronic record ICD-9 code-based algorithm truly had a physician diagnosis of MGUS (the gold standard for case confirmation for this project). Specifically, among the subsample of ICD-9 algorithm-identified putative MGUS cases subjected to chart review, the PPV was calculated as:

\[
\mathrm{PPV} = \frac{\text{\# of putative MGUS cases confirmed by chart review to have a physician diagnosis of MGUS}}{\text{\# of ICD-9 algorithm-identified putative MGUS cases subjected to chart review}}
\]

Figure 1). These 100 individuals were randomly selected from all putative MGUS cases identified by the original algorithm, that is, health plan members with a first ("index") ICD-9 code in the electronic record between 2008 and 2014, no subsequent diagnosis of MM (within 6 months after the index ICD-9 code date), and at least 1 year of continuous plan membership prior to the index ICD-9 code.

| Chart review findings on the confirmed clinically diagnosed MGUS cases
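\[
\mathrm{Sensitivity} = \frac{\text{\# of putative MGUS cases confirmed by chart review to be clinically diagnosed and who had an electronically recorded ICD-9 code of 273.1 before or within 1 year after the M-protein test}}{\text{\# of M-protein-identified putative MGUS cases confirmed by chart review to be clinically diagnosed}}
\]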

| Sensitivity estimation
In the analyses to estimate the sensitivity of the ICD-9-based algorithm, 40 of the 54 potential MGUS cases initially identified by an eligible M-protein test result were confirmed to be true clinically diagnosed MGUS cases by manual chart review. Of those 40 confirmed clinically diagnosed MGUS cases, 35 had an ICD-9 diagnosis code before or within 1 year after the index M-protein test, corresponding to an estimated sensitivity of 87.5% (Table 3).
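The arithmetic behind this estimate (35 of 40) can be reproduced with a few lines of code. The sketch below also computes a Wilson score interval to convey the uncertainty that comes with a denominator of 40. The interval is an illustrative addition under the assumption of a simple binomial model, it is not reported in the source, and the function name is hypothetical.

```python
from math import sqrt


def proportion_with_wilson_ci(successes: int, total: int, z: float = 1.96):
    """Return (estimate, lower, upper) using the Wilson score interval."""
    if total <= 0 or not 0 <= successes <= total:
        raise ValueError("need 0 <= successes <= total and total > 0")
    p = successes / total
    denom = 1 + z * z / total
    center = (p + z * z / (2 * total)) / denom
    half = z * sqrt(p * (1 - p) / total + z * z / (4 * total * total)) / denom
    return p, max(0.0, center - half), min(1.0, center + half)


# Counts from the text: 35 of 40 confirmed cases had the ICD-9 code.
sensitivity, low, high = proportion_with_wilson_ci(35, 40)
print(f"sensitivity = {sensitivity:.1%}")  # prints "sensitivity = 87.5%"
```

The Wilson interval is used here instead of the simpler normal approximation because it behaves better for small samples and for proportions near 0 or 1.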
Figure 1. Flowchart for ICD-9 code-based case identification and chart review confirmation.
c We defined confirmed cases as those with evidence in the medical chart of a physician diagnosis of MGUS.
d The PPV suggests that approximately 98% of health plan members identified by the algorithm summarized above truly had a physician diagnosis of MGUS (our "gold standard" for defining a confirmed case of MGUS).

| Hematologist adjudication findings
The hematologists' adjudication of the 10 reviewed cases classified 5 cases as "probable." The remaining 5 were designated as "possible" due to the lack of complete information on the presence or absence of end organ damage (e.g., hypercalcemia, renal failure, anemia, and bone lesions) and/or the lack of findings from a bone marrow study.
The unavailability of those details in the medical charts prevented the hematologists from ruling out a more advanced diagnosis than MGUS, for example, smoldering MM or MM, for the "possible" cases. (The hematologists did not note any findings that refuted the presence of MGUS, smoldering MM, or MM in any of the 10 charts they reviewed.)

| DISCUSSION
We confirmed an algorithmic approach that can be used to efficiently and accurately identify clinically diagnosed prevalent MGUS cases for population-based research to study outcomes of MGUS and factors associated with progression to malignancy. Our results suggest that the diagnosis code-based algorithm has excellent PPV and likely also satisfactory sensitivity for ascertaining individuals with clinically diagnosed MGUS.
Our findings are aligned with results of a Danish study that used ICD diagnosis codes to identify patients with MGUS who were subsequently confirmed with chart reviews. 9 That study reported an initial PPV of 82.3% but subsequently applied additional exclusion criteria to eliminate cases diagnosed with malignant monoclonal gammopathy prior to or within 1 year of the MGUS diagnosis. This change resulted in a PPV improvement of approximately 4 percentage points.
Similarly, when we expanded our exclusion criteria to include other lymphoid malignancies, the PPV of our diagnosis code approach increased from 90% to 97.8%.
More recently, an algorithm was developed to identify MGUS cases using electronic health records from a large, community-based healthcare group. 8,13 The criteria required at least two MGUS diagnosis codes entered on different dates in a 12-month period, as well as at least one serum protein electrophoresis or immunofixation test proportion (potentially the vast majority) of clinically diagnosed MGUS patients at any given point in time. As an additional limitation, we note that our study period ended prior to the transition from ICD-9 to ICD-10 code use, which occurred in 2015. Potential differences in relevant code(s) in the ICD-10 system should be considered when applying the ICD-based algorithm to more recent years, as should the incorpora-