Quality control of uroflowmetry and urodynamic data from two large multicenter studies of male lower urinary tract symptoms

The International Continence Society (ICS) has standardized quality control and interpretation of uroflowmetry and urodynamics. We evaluated traces from two large studies of male lower urinary tract symptoms (UPSTREAM and UNBLOCS) against ICS standards of urodynamic equipment and practice.

Inconsistent documentation of service records means equipment accuracy is uncertain.

KEYWORDS
LUTS, overactive bladder, standards, urodynamics, uroflowmetry

| INTRODUCTION
Lower urinary tract symptoms (LUTS) in men are a common reason for referral to urologists, leading to consideration of surgery for benign prostatic obstruction (BPO). The diagnostic assessments for evaluating male LUTS are clearly set out in several good quality guidelines, 1 based on a history and examination, symptom score, urinalysis, bladder diary, and uroflowmetry. 2 Uroflowmetry is intended to measure maximum flow rate (Q max ), voided volume (VV) and postvoid residual (PVR), with an assessment of flow pattern to give some initial impression of potential causative processes explaining LUTS. 3 Additional testing in the form of urodynamic studies (UDS) may detect detrusor overactivity (DO) during bladder filling, and whether bladder outlet obstruction (BOO) or detrusor underactivity (DU) is responsible for the men's voiding symptoms. Both the American Urological Association 4 and the European Association of Urology 2 Guidelines indicate that UDS are an optional part of assessment, but state that only pressure-flow studies (PFS) are able to differentiate DU from BPO as the cause of the men's symptoms. Both uroflowmetry and UDS facilitate therapy decisions in men with LUTS, and this is especially significant when considering surgical intervention. Diagnostic assessment is mainly designed to focus therapy on causative mechanisms and therefore try to minimize the risk of "unnecessary surgery," or to predict which men may have suboptimal outcomes if surgery is performed.
Any staff undertaking diagnostic tests need to have adequate training and experience in ensuring that testing is done to appropriate quality standards and understanding the implications of results for decision-making. The International Continence Society (ICS) working groups have proposed the current approaches to standardization, quality control, and interpretation of uroflowmetry and UDS. 3,5-8 There are several key aspects with implications for test results:
1. Adherence to the ICS good urodynamic practices 5,6 ensures standardization so that tests reliably derive the main observations and results can be compared.
2. Monitoring the technical aspects in real time during the test helps ensure prompt intervention to reduce the incidence of artifacts. 9,10
3. Post-test processing is needed to ensure that key metrics to derive urodynamic observations are not taken at the time of an artifact and to calculate indices.
4. Sometimes a test is identified as unrepresentative of day-to-day life, notably as a result of patient anxiety or the artificial nature of the testing situation.
5. A department doing clinical testing regularly needs to ensure that the equipment measures values reliably, undertaking departmental calibration checks every 10 tests. 11
6. Annual maintenance and checks of system performance should also be undertaken. 7
These are complex issues requiring experience to achieve UDS testing in line with the required standards. Nonetheless, a wide range of staff come into contact with patients during the male LUTS assessment pathway, including doctors and nurses at various career stages. Thus, each department needs to ensure adequate training and scrutiny of performance.
UPSTREAM 12 and UNBLOCS 13 are two large UK-based studies set in 30 different urology departments, studying the therapeutic pathway for male LUTS. UPSTREAM is a randomized controlled trial assessing urodynamic testing for the diagnosis and management of BOO in men. UNBLOCS is a randomized controlled trial to determine the clinical and cost effectiveness of thulium laser vaporesection of the prostate vs standard transurethral resection. The aim of this paper is to assess the quality of a sample of the uroflowmetry and UDS data from the two studies.

| METHODS
Uroflowmetry was undertaken in both UNBLOCS and UPSTREAM trials, with UDS testing also undertaken in the UPSTREAM trial. Two sites overlapped between the two trials. Ten percent of uroflowmetry (30 sites) and urodynamics traces (26 sites) were selected at random from sites. A data capture template was designed from the ICS Fundamentals of Urodynamic Practice checklist 7 to examine parameters that could be identified from source traces and data (Table 1). Two pretrained blinded assessors (MA, JJ) independently extracted the data for each trace. Where there was disagreement, a third assessor (MJD) arbitrated the conclusion. Trace scrutiny evaluated the presence of recognized urodynamic features and artifacts 9,10 and adherence to the ICS good urodynamic practices 5,6 ; nonadherence was categorized as an error of standardization. Apparent failure to ensure real-time monitoring and correction of artifacts was categorized as an error of technique. Post-test processing was checked to ensure that urodynamic observations were not taken at the time of an artifact, and to validate the derived indices (BOO Index [BOOI] and Bladder Contractility Index [BCI] 14 ). Inaccuracy at this stage was categorized as an error of interpretation. Where applicable, patterns of flow rate were categorized as normal, intermittent, compressive (classically BPO), or constrictive (typically urethral stricture) as specified in the ICS good urodynamic practices documents. 5,6 In addition, an "indeterminate" category was included, where a flow pattern could not be attributed to an established category.
In addition, each participating research site in the UPSTREAM trial was asked to provide routine departmental records of regular calibration checks to evaluate against ICS standards on urodynamic equipment. 11 These were examined to look for evidence of the calibration of the flow meters (volume and flow rate measurement) and the pressure transducers used to measure vesical (P ves) and abdominal pressure (P abd). Copies of annual equipment maintenance records were also requested. The anonymized database of UDS trace reviews and calibration/maintenance records is given in Supporting Information Material S1.

TABLE 1 Quality and interpretation recommendations (derived from Gammie and Drake 3)
Documented calibration checks: Ensure calibration is checked and documented regularly
Equipment maintenance records: Ensure equipment maintenance is carried out and recorded

| Equipment calibration and maintenance
Twenty-five of twenty-six UPSTREAM sites (96%) responded to a review of departmental calibration and annual maintenance records. Calibration of flow meter volumes was undertaken by 12 out of 25 (48%). Calibration of flow meters and pressure transducers was catalogued in 4 out of 25 (16%) and 8 out of 25 (32%), respectively. Transducer calibration was performed with a water column by 4 out of 8 (50%), with the rest relying on internal equipment "self-calibration." Where calibration checks were undertaken, 66% of sites performed them monthly or more often. Seven out of twenty-five (28%) urodynamic departments reported no calibration checks. In terms of urodynamic equipment maintenance, 19 out of 25 (76%) departments held an on-site annual service record. A further two sites (8%) reported an annual service contract with records held externally by the manufacturer of the urodynamic equipment. Four sites (16%) were unable to provide confirmation of annual servicing, of which two also did no regular departmental equipment calibration checks.

| Uroflowmetry testing
Three hundred thirteen uroflowmetry traces selected at random were reviewed from 30 sites. Of these, 14 traces from 10 centers (4.5%) could not be analyzed, as the patient was unable to do a flow test or did not comply with test requirements (10 of these had a bladder diary showing the largest void volume was above 150 mL). Accordingly, 299 traces were used for analysis. These came from 236 patients, as 48 patients from 15 sites did two flows, and another 15 from four sites did more than two flows. The uroflowmetry traces did not display the volume or maximum flow rate (Q max ) values in 47 out of 299 (15.7%) from 17 sites. Voided volume was missing in 11 out of 299 (3.7%), Q max in 12 (4.0%), and PVR in 43 (14.4%). Flow pattern was categorized as normal in 55 out of 296 (18.6%), indeterminate in 49 (16.6%), intermittent in 3 (1.0%), compressive in 163 (55.1%), or constrictive in 5 (1.7%), and 21 were considered uninterpretable (7.1%).
In 32 out of 296 traces (10.8%), the Q max information was affected by an artifact. This took the form of an interruption of flow followed by a spiked increase of flow rate in 13, a spike without preceding interruption in 5, and suspected Valsalva or strain in 14. In only 2 out of 32 was a correction applied after the study to derive Q max data from a part of the trace unaffected by an artifact.
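The effect of a spike artifact on Q max can be illustrated with a short sketch. It assumes a 2-second sliding average, a smoothing approach described in ICS good urodynamic practice guidance, and a hypothetical sampling rate of 10 Hz; the function name and parameters are illustrative only, and this is not the manual correction procedure used by the central readers in this study.

```python
def smoothed_q_max(flow_ml_s, sample_hz=10, window_s=2.0):
    """Return the peak of a sliding-average flow curve, in mL/s.

    flow_ml_s: flow rate samples in mL/s (hypothetical input format).
    sample_hz and window_s are assumptions, not values from the study.
    """
    n = max(1, int(round(sample_hz * window_s)))  # samples per window
    if len(flow_ml_s) < n:
        raise ValueError("trace is shorter than the smoothing window")
    window_sum = sum(flow_ml_s[:n])
    best = window_sum
    # Slide the window one sample at a time, tracking the largest sum.
    for i in range(n, len(flow_ml_s)):
        window_sum += flow_ml_s[i] - flow_ml_s[i - n]
        best = max(best, window_sum)
    return best / n


# Steady flow of 10 mL/s with a single one-sample spike to 30 mL/s.
trace = [10] * 100
trace[50] = 30
raw_q_max = max(trace)                  # 30, dominated by the spike
corrected_q_max = smoothed_q_max(trace)  # 11.0 after smoothing
```

In this example the raw peak would overstate the flow rate threefold, whereas the smoothed value stays close to the sustained flow. Smoothing of this kind does not remove the need to inspect the trace, since straining or Valsalva artifacts can last longer than the averaging window.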
Consistency of information between the source trace and the test report was checked in 289 traces. Errors in values on the test report were identified in 90 (31%) traces from 14 sites, with 3 of the 14 sites each responsible for 10 or more consistency errors. The errors affected documentation of VV in 26 out of 90 (28.9%), Q max in 23 (25.6%), and PVR in 41 (45.6%).

| Urodynamics review
One hundred twenty-three traces were provided from 26 sites, of which 13 were excluded (1 due to poor quality reproduction, 1 because the test was not done, 7 because only a typed report was provided, and 4 for unspecified/ unknown reasons), giving an analysis set of 110 traces. In these, some features of the filling cystometry traces were not clearly reproduced for 11 traces. Features of the PFS could not be evaluated in three traces, including two patients who were unable to void during the test.
Three particular errors of standardization were identified:
1. Wrong display sequence of traces. In 11 records (10%), the pressure traces of key data (P ves or P det) were incomplete as high values went off-scale, as a result of being placed too high in the display sequence.
2. Failure to set atmospheric pressure as the zero reference (Figure 1). In 30 out of 110 (27.3%) traces from 14 sites, the zero-reference pressure was not set to atmospheric pressure; instead, the "zero all" software command was given when transducers were recording from the patient.
3. Annotations to indicate aspects of the test were often not provided. For example, the position in which the test was run (seated or standing) was not marked in 90 (81.8%), and markers indicating provocation tests were not present in 98 (89.1%). Normal and abnormal subjective sensation markers (at least one of the first sensation of filling, normal desire to void, strong desire to void, or urgency) were generally included; they were missing in six (from four sites, with three from one site). At the start of the PFS, permission to void (indicated by markers such as "start void"/"voiding"/"PFS") was not marked in 15 out of 110 (13.6%) from six sites.
Several errors of technique were identified in a high proportion of cases: 1. Unreliable pressure recording (Figure 1), which was not dealt with before voiding, was seen in 1 case (bad pressure transmission of P abd); and the cough test was apparent and showed good subtraction in 61 (55.5%). Cough testing after voiding was done in 79 (71.2%), but the resulting cough spikes were significantly different (≥30% height discrepancy between the respective spikes in the abdominal and vesical pressure traces) in 13 out of 79 (16.5%), resulting in large subtraction artifacts on P det. Both the pre- and post-voiding cough tests were missing in 24; in another 16 studies, one of the two cough tests was missing and the other was inaccurate (without action taken to correct the poor quality).
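The ≥30% cough-spike criterion above can be expressed as a simple check. The sketch below is illustrative only: it assumes the discrepancy is measured relative to the taller of the two spikes, which is our assumption rather than a definition taken from the ICS documents, and the function names are hypothetical.

```python
def cough_spike_discrepancy(p_ves_spike, p_abd_spike):
    """Relative height difference between paired cough spikes (0 to 1).

    Spike heights are in cmH2O above the pre-cough baseline.
    Assumption: the difference is normalized by the taller spike.
    """
    taller = max(p_ves_spike, p_abd_spike)
    if taller <= 0:
        raise ValueError("no positive cough spike recorded")
    return abs(p_ves_spike - p_abd_spike) / taller


def cough_test_acceptable(p_ves_spike, p_abd_spike, threshold=0.30):
    """True if the spikes differ by less than the threshold fraction."""
    return cough_spike_discrepancy(p_ves_spike, p_abd_spike) < threshold


# Example: a vesical spike of 100 cmH2O against an abdominal spike of
# 60 cmH2O differs by 40% and would be flagged as poor transmission.
flagged = not cough_test_acceptable(100, 60)
```

A flagged cough test indicates that P det values recorded nearby, including the detrusor pressure at Q max, should be treated with caution until pressure transmission has been corrected.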

| Interpretation of urodynamics traces
PFS traces with derived voiding parameters (BOOI and BCI) were available for 107 patients (97.3%): central review diagnosis was assumed to be the correct calculation from the raw data. For diagnosis of BOO, there was agreement between sites and central review for 71% of traces where BOO was present (BOOI > 40), 79% where it was equivocal (BOOI 20-40), and 42% of men who were unobstructed (BOOI < 20) (Table 2). Particularly notable was the group of 19 patients for whom central review was unable to derive a BOOI value, although sites gave a diagnosis for 15 of them, among whom 10 were given a diagnosis of BOO. The level of agreement was lowest for the diagnosis of DU, being only 52% where DU was present (BCI < 100) and 31% where it was absent (BCI > 100). For 21 patients, central review was unable to derive a BCI value, although sites gave a diagnosis for 4 of them (2 of these being given a diagnosis of BOO). Data about the urodynamic observation of DO were available for 99 traces (90.0%). Fewer traces were evaluable for DO, as some sites did not provide traces that included filling cystometry. There was agreement between central review and site categorization in 57% of available traces where DO was present, and 51% where it was absent (Table 2). The reviewers also identified additional factors needing caution when deriving BOOI and BCI, namely, straining at the time of Q max in 15 out of 99 (15.2%), voiding data derived from a DO incontinence episode in 4 out of 99 (4.0%), and a major drop in P abd sufficient to alter BOO diagnosis in one case. Annotation of key volumes (trace or attached data) was lacking for PVR in 37 out of 110 (33.6%), cystometric capacity in 35 (31.8%), and VV in 15 (13.6%). Twenty-four of those traces missing key volume data (21.8% of the 110-sample population) were missing two or more of the volumes.
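For reference, the derivation of the two indices and the categories used above can be sketched as follows. The formulas are the standard definitions (BOOI = P det at Q max minus 2 × Q max; BCI = P det at Q max plus 5 × Q max), and the thresholds are those quoted in this section. The function names are illustrative, and the handling of a BCI of exactly 100 is an assumption, because the text only defines BCI < 100 and BCI > 100.

```python
def booi(p_det_q_max, q_max):
    """Bladder Outlet Obstruction Index: PdetQmax - 2 * Qmax."""
    return p_det_q_max - 2 * q_max


def bci(p_det_q_max, q_max):
    """Bladder Contractility Index: PdetQmax + 5 * Qmax."""
    return p_det_q_max + 5 * q_max


def classify_boo(booi_value):
    """Categories as used in this section: >40, 20-40, <20."""
    if booi_value > 40:
        return "obstructed"
    if booi_value >= 20:
        return "equivocal"
    return "unobstructed"


def classify_du(bci_value):
    """DU present if BCI < 100. Treating exactly 100 as absent is an assumption."""
    return "DU present" if bci_value < 100 else "DU absent"


# Example: PdetQmax of 80 cmH2O with Qmax of 8 mL/s.
example_booi = booi(80, 8)   # 64, classified as obstructed
example_bci = bci(80, 8)     # 120, classified as DU absent
```

Both indices depend on the same two measurements, so an artifact at the moment of Q max, or a pressure recording error affecting P det, propagates into both the BOO and the DU classification.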

| DISCUSSION
Questions exist regarding the influence of technique on the utility and perceived value of urodynamics. 15 The findings from the analysis of UDS and uroflowmetry data indicate that a detailed quality control process is essential for research involving urodynamics, and support the importance of developing accreditation/reaccreditation schemes for all health care professionals running these types of test. The UK Continence Society recently reviewed its minimum standards for urodynamics, mindful of the fact that there is concern over the quality of urodynamic testing in practice, 16 and the report has been endorsed by the ICS. Experience of central reading of urodynamic traces from multicenter studies on therapies has reinforced this concern and made it clear that this is an international issue (Abrams and Gammie, personal communication, December, 2019). Therefore, it is most important that all research studies with reliance on UDS assess the quality of their tests. In the current data review, we identified an erroneous diagnosis of BOO in 6 cases (5.5%), indicated by a site diagnosing BOO when the central review categorized the BOOI as less than 20.
This has to be regarded as a serious error of interpretation, as it could lead to major consequences, such as unnecessary surgery. There is a significant chance that this figure could underestimate the true extent of the problem in wider practice. This is because we applied a high level of confidence before labeling a conclusion as erroneous. Furthermore, these were well-resourced high-quality research studies, which implies a clearer focus on diagnostics than what may be applied in other contexts.
Copies of on-site records of calibration and equipment maintenance were also checked for compliance with the ICS recommendation that calibration checks are done every 10 tests. 11 If the calibration of any piece of equipment is inaccurate, then it should be recalibrated by a qualified individual before the next test to avoid recording inaccurate data for a patient. In addition, annual maintenance checks will cover the system in more detail, for example, whether the infusion pumps are delivering liquid at the specified rate. Twenty-eight percent of urodynamic sites did not record any departmental calibration checks, and 16% were unable to provide confirmation of annual service. This means that the equipment accuracy may not necessarily have been reliable. While the extent to which this genuinely impacted individual cases cannot be measured, it is an avoidable risk that can be minimized by adhering to suitable standards.
The need for caution when interpreting studies also arose because sites sometimes failed to identify a potential recording problem at the crucial moment of Q max. For uroflowmetry across two major trials in the United Kingdom, the Q max information was affected by an artifact in 11%, and in the majority no correction was applied after the study to derive flow rate data from a part of the trace unaffected by the artifact ("corrected Q max"). For PFS, cough testing after voiding was omitted in 29% and was significantly inaccurate (≥30% height discrepancy between the pressure spikes) in 17%, making the reliability of the P det Q max value uncertain. Sites sometimes did not run tests in accordance with the ICS standardizations. For UDS tests, this included not setting the reference zero pressure to atmospheric pressure (27%) and limited annotation of traces to help interpretation (eg, permission to void not marked in 14%). This generally does not have a direct implication for an individual patient test, though sometimes it may obscure a pressure recording error not picked up by discrepancy in the cough pressure spikes. However, it makes comparisons between centers very difficult, which is unhelpful for standard audit and exploratory research. In the United Kingdom, the Improving Quality in Physiological Services (IQIPS; https://www.ukas.com/services/accreditation-services/physiological-services-accreditation-iqips/) scheme allows sites to become accredited against good practice standards that encompass patient experience, facilities, safety, and technical quality. The results we present here show that there is a need for such accreditation to enable departments to demonstrate and maintain good practice for patient benefit.
The strengths of the current study were the use of quality control protocols derived from the ICS Fundamentals of Urodynamic Practice document, 7 random trace-selection, and using specifically trained central readers blinded to the sites under review with expert adjudication for resolution of discrepancy. The main limitation was the necessity to sample 10% of traces, as this may give an unrepresentative picture of individual site performance, though this does not affect the overall evaluation.

| CONCLUSIONS
This quality control study has shown that there are significant issues with respect to equipment maintenance and calibration, testing procedures, and the interpretation of results for both uroflowmetry and urodynamics in a large number of urodynamic units. This has led to a small proportion of men being wrongly characterized as having BOO, and hence at risk of being counseled inappropriately to surgery.