Transparency of high‐dimensional propensity score analyses: Guidance for diagnostics and reporting

Abstract Purpose The high‐dimensional propensity score (HDPS) is a semi‐automated procedure for confounder identification, prioritisation and adjustment in large healthcare databases that requires investigators to specify data dimensions, prioritisation strategy and tuning parameters. In practice, reporting of these decisions is inconsistent and this can undermine the transparency, and reproducibility of results obtained. We illustrate reporting tools, graphical displays and sensitivity analyses to increase transparency and facilitate evaluation of the robustness of analyses involving HDPS. Methods Using a study from the UK Clinical Practice Research Datalink that implemented HDPS we demonstrate the application of the proposed recommendations. Results We identify seven considerations surrounding the implementation of HDPS, such as the identification of data dimensions, method for code prioritisation and number of variables selected. Graphical diagnostic tools include assessing the balance of key confounders before and after adjusting for empirically selected HDPS covariates and the identification of potentially influential covariates. Sensitivity analyses include varying the number of covariates selected and assessing the impact of covariates behaving empirically as instrumental variables. In our example, results were robust to both the number of covariates selected and the inclusion of potentially influential covariates. Furthermore, our HDPS models achieved good balance in key confounders. Conclusions The data‐adaptive approach of HDPS and the resulting benefits have led to its popularity as a method for confounder adjustment in pharmacoepidemiological studies. Reporting of HDPS analyses in practice may be improved by the considerations and tools proposed here to increase the transparency and reproducibility of study results.

• Diagnostic tools can offer important insights into the properties of the features selected for inclusion in HDPS models.
• We provide considerations for reporting to increase the transparency and reproducibility of HDPS analyses.
• We hope more widespread use of the guidance and tools presented will help to breakdown 'black-box' criticisms of the HDPS.

Plain Language Summary
The high-dimensional propensity score (HDPS) algorithm is an established method for identifying and selecting confounders in large healthcare databases. The implementation of HDPS approaches can be complex, requiring investigators to identify data for consideration and specify tuning parameters. We present graphical tools and sensitivity analyses for evaluating HDPS analyses and discuss key reporting considerations to help promote transparent implementation of HDPS approaches. These are illustrated using a study of proton pump inhibitor users from the UK Clinical Practice Research Datalink. Seven items for reporting key aspects of the HDPS procedure were identified, including clear specification of the types of input data considered and the number of variables selected by the HDPS. Graphical visualization tools included summaries of the types and properties of variables selected by the HDPS.
Sensitivity analyses included investigation of the robustness of results to the key decision surrounding number of variables selected. We hope the tools and considerations provided allow researchers to better scrutinize HDPS analyses and improve the reporting of implementation details.

| INTRODUCTION
Bias arising from confounding is a key concern for pharmacoepidemiological studies and its mitigation depends on the ability to identify, measure and adjust for underlying differences between patients receiving different therapies. 1 Successful adjustment for confounding often hinges on capturing hard to measure concepts, such as markers of frailty, disease severity, or health-seeking behaviour.
The high-dimensional propensity score (HDPS) algorithm 2,3 is a method for variable identification, prioritisation, and adjustment tailored for large healthcare databases. The HDPS conceptualises information in these databases as proxies to key underlying constructs; some are likely to be strongly correlated with other measured variables, but others act as proxies for constructs that would otherwise be unmeasured. The procedure treats these features as additional covariates for adjustment with the aim of optimising confounding capture and control.
Whilst the HDPS often incorporates several hundred additional covariates, the types of features included are rarely communicated leading some to label the HDPS a 'black-box' approach. Diagnostic tools can offer important insights into the properties of these features, enhancing our knowledge of the factors driving treatment decisions and checking for possible errors, e.g., the presence of certain codes in the pool of selected HDPS covariates can highlight possible errors relating to linkage error or the application of exclusion criteria.
Despite studies highlighting the potential lack of robustness to investigator decisions (e.g., the number of covariates chosen), 4,5 reporting of sensitivity analyses remains inconsistent and this can undermine the transparency and reproducibility of HDPS analyses. Recent guidelines surrounding the reporting of pharmacoepidemiological studies state that "high dimensional proxy adjustment" methods should be reported in full; guidance is needed about what this entails. 6 Building on existing guidance for propensity score (PS) analysis, [7][8][9] we describe and illustrate diagnostic tools and sensitivity analyses for HDPS analyses. We also provide considerations for reporting relevant information.

| HIGH-DIMENSIONAL PROPENSITY SCORES
The generic five steps of the HDPS procedure are as follows 2 : • Step one, investigators specify the data structure. This can involve declaring data dimensions capturing different aspects of care in the database under investigation.
• Step two, pre-exposure features are generated, and a prevalence filter is typically applied (often selecting the top 200 most common features from each dimension). Features are usually in the form of codes or free-text information and grouped at a specific granularity level. For example, codes might be truncated to the first three digits if they are International Classification of Diseases, 10th edition (ICD-10) codes.
• Step three, the recurrence of features is assessed in a preexposure period, creating binary covariates based on a set of frequency-based cut-offs. 2 The standard implementation of the HDPS defines three indicators for each patient capturing whether a feature was recorded: ≥once, ≥the median, and ≥the 75th percentile.
• Step four, the large pool of covariates generated in the previous step is prioritised. This is typically achieved using the Bross formula, which uses univariate associations of covariates with treatment and outcome, to identify those with the highest potential to bias the treatment-outcome relationship. 2,3 • Step five, a number of HDPS covariates (typically the top 200-500 from the covariate prioritisation) 2,10 are selected to augment a set of pre-defined variables (selected by the investigators based on background knowledge) used for estimation of the PS model. Standard PS methods (e.g. matching or weighting) 8,11 are used to estimate treatment effects based on both sets of covariates. The guidance presented subsequently should be considered additional to existing practices surrounding the reporting of PS methods e.g., summarising weights by exposure groups or presenting the proportion of patients unmatched. 7

Item 1: Specify data dimensions
Data dimensions identified should be summarised, indicating which aspects of care they capture and possibly note data quality and completeness metrics. These summaries should include a description of the features included in the data dimensions (e.g. codes, free-text information, laboratory test results) and any corresponding coding systems used. Investigators should describe how features are generated, e.g. specifying the code granularity for a particular coding system (e.g. 3-digit ICD-10) or how free-text information has been processed. 12 Furthermore, investigators may describe the number of candidate features available within the specified dimensions.
Ongoing debate in the literature surrounds the use of marginal prevalence for prioritising features in Step 2 of the HDPS procedure. 13 The main concern is the possible omission of influential features where despite a low marginal prevalence there exist strong imbalances within exposure groups. Investigators should indicate whether the prevalence filter is used and if so, state the number of features selected per dimension.

Item 3: Describe feature recurrence assessment
Whilst feature recurrence is typically assessed using the cut-offs outlined by Schneeweiss et al, deviations from these cut-offs exist and should be described in full. 10,14 One example suggests explicitly considering the proximity to exposure start. 10

Item 4: Specify covariate prioritization method
Investigators should describe the method of covariate prioritisation used. Whilst ranking is typically based on the Bross formula, exposure-based ranking (prioritising covariates based on the confounder-exposure association) has been employed in settings with few outcome events. 2,5 Recent evidence indicates the potential for machine-learning methods to enhance the performance of HDPS, both for covariate prioritisation or by reducing the set of covariates prioritised by the Bross formula. [15][16][17] Item 5: Specify total number of HDPS covariates to select The number of HDPS covariates selected for inclusion in the PS model should be reported, in addition to routine reporting of the investigator-identified covariates. Machine learning-based approaches to determine the number of codes selected should be described in full. 4,5,17 Item 6: Specify software Investigators should describe which software was used to implement the HDPS. There are commonly used packages available in R, 18 SAS,19 or Aetion.  20 The CPRD is a database capturing information pertaining to contacts with primary care services (including clinical diagnoses, referrals and prescriptions) and is broadly representative of patients registered at general practitioners in the UK. 21 The study investigated whether a pharmacokinetic interaction between clopidogrel and use of proton pump inhibitors (PPI) could reduce clopidogrel effectiveness, leading to increased risk of vascular events. Results indicated an increased risk of MI associated with PPI use, which was hypothesised to be largely due to residual confounding between treatment groups. 20 A reanalysis using the HDPS obtained results much closer to the hypothesised null association, 20,22,23 suggesting an improved ability to account for between-patient characteristics that were important for confounding control. 14

| Summary of HDPS analysis
We defined three dimensions assessing clinical, referral, and therapy information in the year prior to cohort entry. We applied a prevalence filter selecting the top 200 features from each dimension and adjusted for the top 500 HDPS covariates (prioritised by the Bross formula). 14 The PS was estimated using multivariable logistic regression including both pre-defined and HDPS covariates. Hazard ratios (HR) for the treatment effect were obtained using Cox regression weighted by inverse probability of treatment weights. Standard errors for treatment effects were obtained using robust standard errors. 14

| DIAGNOSTIC AND VISUALISATION TOOLS
In this section, we illustrate and discuss novel and established PS diagnostics for assessing HDPS models (summarised in Table 3).

| Model summaries
We recommend simple descriptions for communicating the covariates included in HDPS models, e.g., highlighting the proportion of selected codes that came from each data dimension. Investigators may also summarise high-level clinical concepts captured by the covariates included in the HDPS. Our study categorised codes using British National Formulary (BNF) paragraph level (prescription dimension) and ICD-10 (clinical and referral dimensions). We exploited the hierarchy of these coding systems to investigate codes aggregated by the chapter level. Figure 1 shows that in the clinical and referral dimensions, the majority of covariates selected corresponded to codes relating to symptoms, signs and abnormal findings. Additionally, covariates derived from the therapy dimension corresponded most to prescriptions from the cardiovascular system or nutrition and blood BNF chapters.

| Comparison of PS distributions
Inspecting the distributions of the estimated PS by treatment group is a common diagnostic highlighting the ability of covariates included in the PS model to predict treatment received in the population being studied. As with all PS analyses, investigators should verify the positivity assumption, 9 a violation of which is lack of overlap. One common approach for handling this is PS trimming. 26,27 Whilst inspection of the estimated PS distribution is rec- This highlights important between-patient differences not apparent when only including the pre-defined covariates. These differences would not be accounted for under the investigator-led PS analysis.

| Covariate balance
To investigate the overall balance of HDPS covariates we can plot the prevalence of selected covariates between the two treatment groups (shown in Figure 2). 28 Figure 3 highlights that for most covariates there is a similar prevalence in both groups, with slightly higher prevalence amongst the PPI users. There are several covariates from the prescription dimension (Figure 3, prevalence ratio >2.0) with moderate to high prevalence amongst PPI users and a low prevalence amongst the non-users.
Measures of covariate balance (e.g. absolute standardised differences) are commonly used when assessing PS models to check for imbalances. In the HDPS setting, investigators should check the balance in the HDPS covariates before and after adjustment. Figure 4 indicates some covariates with large imbalances (substantially >10%) in the unweighted population but all achieve good balance in the HDPS weighted population.
There is a concern that adjusting for many additional HDPS confounders can make achieving balance in pre-defined confounders more difficult, as the PS model tries to simultaneously balance many more variables. If the HDPS variables are weak confounders or even not true confounders, addition of these variables can result in unnecessarily increased bias and variance. 29,30 Achieving balance is more T A B L E 2 Summary of Clinical Research Practice Datalink study used for illustration investigating the association between proton pump inhibitor use and risk of myocardial infarction in a population of clopidogrel and aspirin users F I G U R E 2 Overlap plot comparing the propensity score distributions including only 10 pre-defined pre-exposure covariates and additionally including the 500 top-ranked high-dimensional propensity score covariates F I G U R E 1 Summary of high-level concepts captured in the top 750 Bross-prioritised high-dimensional propensity score pre-exposure covariates separated and colour-coded by data dimension important in strong confounders compared to weak confounders. 31 Therefore, we recommend assessing the balance on selected key confounders before and after inclusion of all selected HDPS covariates. 32 For illustrative purposes, we assume that all pre-defined covariates are important confounders and Figure 5 presents the balance of these covariates under models additionally including 250, 500 and 750 HDPS covariates. We observe that even after adjusting for 750 HDPS covariates, we achieve good balance in the pre-defined covariates, indicating the suitability of any of these models for preserving balance in the pre-defined covariates.
Another approach investigates the covariate balance in both the pre-defined and a set of key HDPS confounders ( Figure 6); we additionally assume all key HDPS confounders are in the top 250. Figure 6 highlights that in the pre-defined weighted population, a number of the top-ranked HDPS covariates remain imbalanced. However, when weighting by our primary HDPS model we achieve good balance in both the pre-defined and top 250 covariates.
In covariates. Overall summaries of imbalance could be modified to put more weight on imbalance in covariates thought to be stronger confounders (in which imbalance is more likely to result in confounding bias); Table 4 presents one method for achieving this.
The HDPS aims to optimise confounder adjustment but there is a potential trade-off between better adjustments for a broader array of potential confounders versus tighter balance on key confounders.
How much imbalance we are willing to permit in key confounders is primarily driven by how strongly these confounders are associated F I G U R E 5 Comparison of absolute standardised differences in a set of key covariates between unweighted, pre-defined covariate weighted, and pre-defined and highdimensional propensity score covariate weighted samples F I G U R E 6 Comparison of absolute standardised differences in the pre-defined and top 250 highdimensional propensity score covariates between unweighted, predefined and HDPS (+500 covariates) weighted samples with the outcome. Therefore, a lack of imbalance in pre-defined and HDPS covariates does not necessarily mean all confounding has been removed and key unmeasured confounders may still exist.

| Identification of potentially influential covariates
Whilst the full list of covariates selected is sometimes provided, 2 this is not easily digestible when interrogating several hundred HDPS covariates. However, manually inspecting the top covariates included can identify groups of codes relating to previously overlooked concepts that are important for minimising confounding bias. 33 An initial step is to investigate the distribution of Brossderived bias values; Figure 7 shows the ranking score for the top 500 covariates. 4 The colour coding indicates which dimension the covariates originated from and highlights that the majority of covariates were from the prescription dimension.
Furthermore, this plot allows investigators to observe highly ranked covariates, which might have a large amount of influence in the PS model.
The data-driven nature of the HDPS approach does not preclude adjustment for certain variables, such as instrumental variables (IVs) and colliders, which are typically excluded from PS models. 29,30,34,35 Whilst Step 4 of the HDPS often attempts to down-weight covariates with these properties (e.g. prioritisation by the Bross formula downweights IVs), these variables could inadvertently be included, especially if the total number of covariates available is small relative to the proportion selected. However, the potential reduction in confounding bias from the inclusion of these covariates will often outweigh any increase in bias and variance induced. 30,34,36 Whilst there are no statistical tests for classifying these types of variables, we can attempt to T A B L E 4 Comparison of the mean absolute standardised differences in the unweighted, pre-defined, and pre-defined and high-dimensional propensity score (HDPS) weighted populations Machine learning approaches have been proposed to determine the number of covariates selected for adjustment, but these have not yet been widely adopted. [15][16][17]28 Investigators are usually agnostic about how many covariates to select and therefore should assess the sensitivity of results to this decision. Another approach investigates the impact of incrementally adjusting for the empirically selected variables ( Figure 9B). 4 Figure 9B indicates stabilised results with the inclusion of between 250 and 600 covariates. Where results do not stabilise, investigators should try to understand the driving factors and avoid undue focus on a specific HDPS analysis. Instead, it may be more suitable to report a range of effect estimates.

| Quantifying impact of potentially influential covariates
In this section we quantify the impact of potentially influential covariates on results obtained in our primary analysis.
The distribution of Bross values (Figure 7) highlights that the top three ranked HDPS covariates are modestly higher than the rest. To understand the extent to which these covariates explain changes in In Section 5.3 we identified covariates that behave empirically like IVs. To test the sensitivity of results to their inclusion, we conducted analyses based on Figure 8 (removing seven near-IVs) and the two cut-offs previously described. Removing empirically identified IVs altered results in the 2nd decimal point only, indicating no change in the overall interpretation (Table 5). Furthermore, removal of the empirical near-IV variables resulted in reduced variance around the treatment effect estimate compared to the primary HDPS analysis (Table 5).

| DISCUSSION
The HDPS approach has become a popular and scalable method for augmenting confounder adjustment in a given data source. 10 However, as with PS analyses more generally, use of diagnostics and reporting of the details of the implementation is suboptimal. 37,38 Using data from the UK CPRD 14,20 we highlighted diagnostic tools for assessing HDPS models and proposed considerations for reporting key features.
Drawing on established PS methodology, we described the importance of inspecting the estimated PS distributions before and after inclusion of the HDPS covariates. We recommended assessing covariate balance on important key confounders before and after inclusion of the HDPS covariates to investigate the potential impact of adjusting for many covariates on a set of strong confounders. Additionally, we described diagnostic tools more specific to the HDPS setting, for example, for identifying instrumental-like variables and informing sensitivity analyses surrounding influential covariates.
We recommend that thorough sensitivity analyses should be conducted and reported when applying the HDPS. A key issue surrounds the number of covariates selected for inclusion in the PS model, 4,17 especially since the optimal number in a given setting is often unknown. Where inconsistencies are found, efforts should be made using the tools described to understand the drivers of variability.
HDPS covariate prioritisation is often based on univariable associations (e.g., via the Bross formula) and this can potentially lead to the inclusion of covariates which conditionally are not confounders. This has motivated recent developments focussing on the refinement of covariate prioritisation and selection within the HDPS procedure, especially using machine-learning methods. 15,17,28,39 Whilst such developments can potentially improve HDPS analyses, no single approach is always optimal and applying the diagnostic tools described here is important to better understand the differences between these approaches.
We hope reporting of these analyses may be improved through more widespread use of the considerations and tools presented here.