Establishing a balanced scorecard measurement system for integrated care organizations in China.

School of Management, Zhejiang University, Hangzhou, China
Business School, Sichuan University, Chengdu, China
Correspondence: Xinli Zhang, Business School, Sichuan University, Chengdu 610065, China. Email: zhangxinli@scu.edu.cn
Funding information: Sichuan University, Grant/Award Number: 2018hhs-54; Health and Family Planning Commission of Sichuan Province, Grant/Award Number: 17ZD047; Zhejiang Soft Science Research Project, Grant/Award Number: 2017C35G2010204; National Key Research and Development Program of China, Grant/Award Number: 2017YFB1400601; National Natural Science Foundation of China, Grant/Award Number: 71832013

Integrated health care is a relatively new trend in international health care reform. However, over two decades ago, in 1996, the World Health Organization (WHO) published Integration of Health Care Delivery to suggest the establishment of such a system of regional health care that would focus on the integration of primary health care providers, including the full integration of hospitals and their services. 1 Since then, the Institute of Medicine has published Crossing the Quality Chasm: A New Health System for the 21st Century and indicated that the integration and coordination of all parts of the medical system could generate more health benefits. 2 Globally, many countries face difficulties and pressure delivering effective medical services, especially in developing countries such as China. The severe imbalance (see Figure 1) between the limited supply of medical service resources and the huge demand for them makes health care service delivery a major challenge today.
One of the problems is that health care services, which represent a large proportion of health system resources, are often concentrated in large or top-class hospitals and are difficult for patients to access. At the same time, the aging of the population has increased medical demand, and people expect better treatment. According to WHO statistics, the global population has been aging rapidly. 3 Between 2000 and 2050, the proportion of the population over 60 is expected to double, increasing from 11% to 22%. Over the same period, the absolute number of those over 60 is estimated to increase from 605 million to 2 billion.

| Characteristics of integrated care organizations
Among the new trends in development and reform in international health care, the establishment of integrated care organizations (ICOs) and the legitimate integration of health resources have been proposed to deal with the mismatch in the supply and demand of medical services. The increasing attention on integrated health care may help build a more reasonable health care system model as determined by policymakers, managers, and practitioners.
ICOs can be found in many different countries but with different aims and features. In China, ICOs are seen as important initiatives to extend health reform. 4 Integrated delivery networks (IDNs), which prevail in the United States, link together health care organizations or medical staff at all levels to form medical service networks. 5,6 These networks aim to help patients avoid discontinuous medical care or dependence on emergency services, because such patients usually encounter the most fragmented forms of medical care and need lower-cost care.
FIGURE 1 Severe imbalance between supply and demand of medical resources
Integrated care networks (ICNs) used in England are aimed at coping with the past financial constraints on the National Health Service (NHS) and the increasing pressure created by population growth and aging. 7,8 ICNs are expected to coordinate general practice, community, and hospital services to satisfy the demand of patients, especially those with severe diseases.
Compared with the elderly population problems in the United States and England, China faces additional problems: a large number of elderly people in need, a rapid rate of aging, and the fact that aging preceded industrialization. Thus, solving these problems through ICOs may be more difficult in China. The medical system involving three main levels of medical institutions and organizations is relatively mature in developed countries such as the United States and England. As a result, their ICOs aim to enhance collaboration and improve medical services for patients. However, the capabilities and medical resources of primary medical and health care institutions in China fall far short of those of the top-class public hospitals. Chinese ICOs (Figure 2) therefore seek to establish networks involving all levels of hospitals and medical institutions with the top-class hospitals at their core. Through the establishment of information-sharing platforms and the syncing of high-quality medical resources, the homogeneity of medical examinations throughout the consortium can be achieved.
Thus, the overall medical service capacity can be improved. Further, closed-service, business, and value chains can be attained.

| Balanced scorecard-based performance measurement
Since the end of the 20th century, a variety of ICOs have emerged without any effective performance measurement.
In addition, although seemingly integrated, all of these failed due to a lack of long-term strategic thinking. 9 In this context, a performance measurement is seen as playing a key role in evaluating and uncovering the ICO system's main problems.

FIGURE 2 The integrated care organization model in China
Since Kaplan and Norton 10 initially introduced the balanced scorecard (BSC), a popular performance management system consisting of four measurement perspectives (financial, customer, internal business process, and learning and growth), numerous articles have appeared on its development and application. It is a holistic method that converts an organization's mission and strategy into comprehensive performance measures that provide the basis for a successful strategic measurement and management system. 11 To date, the BSC has been applied successfully worldwide in many organizations including government, private, and nonprofit organizations. 12 Today, the BSC is increasingly popular among medical organizations and institutions as well. For example, in a study assessing the performance and condition of the Tria Dipa Hospital in Indonesia, the hospital was found to perform well from all four core BSC perspectives except learning and growth. The study also detailed the dissatisfaction of medical staff in the hospital. 13 In Australia, a performance measurement system specific to nonprofit health care organizations was first developed by validating a nonprofit version of the BSC in this application domain. 14 In another example, researchers used the BSC to evaluate the strategic value of implementing cloud computing solutions at a Saudi hospital. 15 In China, the BSC began to be used in health care in 2000, engendering a wide range of research and applications. 12 The wide application of the BSC reflects a change in management approach by policymakers and managers. That is, in the information society, the traditional method of performance management is not comprehensive enough and needs more balance.
This includes balancing financial and nonfinancial indicators, long- and short-term objectives, outcome and motivational indicators, internal and external organizational groups, and leading and lagging indicators. Just as the BSC is used throughout the world today to support high-level corporate strategy, it can be used by organizations to support specific strategies, such as sustainability, and thus to link sustainability objectives with appropriate actions and performance outcomes. 16 Therefore, the application of the BSC to ICO performance measurement offers significant value by enabling balanced and sustainable ICO planning and management.

| Study objective
The objective of our study is to establish a comprehensive evaluation indicator system valid for Chinese ICOs from the perspective of experienced executives who have insights into ICO operations. Further, the indicator system will reflect the characteristics of the ICOs in China and offer guidance and be of practical significance for their sustainable planning and management.
From the perspective of quantity, Chinese ICOs have attained a certain level of success. However, whether the ICOs bring benefits to their core medical hospitals and institutions remains under discussion. Hence, a comprehensive performance evaluation system is needed to test the operational effectiveness of these ICOs. ICOs face some unique challenges in adapting the BSC to their environment. BSC theory is traditionally applied to both private corporations and nonprofit organizations such as public hospitals. Although the degree of emphasis on the financial, customer, internal process, or learning and growth perspectives depends on the nature of the organization, the organizations are considered single entities with unified strategic targets.
However, because top-class public hospitals and primary medical and health care institutions coexist within an ICO, it is unreasonable to treat ICOs as simple single organizations. Large gaps exist between top-class public hospitals and primary medical and health care institutions, for example, in their health service capabilities and the bottlenecks they face. Consequently, the ICO can be regarded as a cross-organizational system that is required to balance the conflicts and benefits among multiple organizations. Furthermore, owing to the ICO's specialization, the order of importance among the four core perspectives (financial, customer, internal process, and learning and growth) may differ from the traditional order for profit or nonprofit organizations as well.
The study was structured as shown in Figure 3, and we present each of the study elements in this paper. The rest of the paper is organized as follows. In Section 2, we present our methods including our survey of ICO experts. In Section 3, we show the results of the weighting of the indicators. In Section 4, we discuss the results from theoretical and practical standpoints. Suggestions for ICOs are also given from the financial, patient, internal process, and learning and growth perspectives of the BSC. Finally, we summarize the key findings, strengths, and limitations of our study and point to possible directions for further research.

| METHODS
We began our study by searching the China policy database for the keyword ICO and found that although ICO was referred to in the content of some policies, it was rarely referred to in policy titles. However, the number of such policies increased dramatically in 2017 compared with 2016 (from nine to 51). Through June 2018, there were more than 5000 Chinese ICOs. 17 However, according to the Beijing Report on Social Development (2015-2016), only around one of 3000 outpatient emergency treatments in Beijing was referred through ICOs from 2013 to 2015. 18 The report points out that the "contract rate" of ICOs has been the main measure assessed by government departments to date. However, the first-visit rate and referral rate, which reflect the actual operations of the ICOs, have not been included, which is evidence of the need for other measures.
Subsequently, we identified the strategic targets of the ICOs according to the characteristics of top-class public hospitals and primary medical and health care institutions and the interactions between them. The fieldwork included an investigation among experts across the field as well as questionnaires completed by the staff members of a large Chinese ICO and survey analysis. Finally, by synthesizing the expert opinions, we established the BSC-based performance evaluation indicator system for ICOs in China.

| The Delphi method
In this study, we used the Delphi method, a structured communication technique originally developed as a systematic, interactive forecasting method that relies on a panel of experts. We selected 216 experts with significant work experience and knowledge from the ICO of West China Hospital, one of the most famous top-class public hospitals in China. We administered self-designed questionnaires to these experts, who provided suggestions for modifying the indicator framework and graded the importance of the indicators. We then revised the indicator system according to the feedback. Figure 4 shows the revised version of the BSC-based evaluation indicator framework of an ICO in China.
The importance of the four first-level indicators was scored so that the four scores summed to 100%. The importance of each second-level indicator was rated on five levels ranging from 1 (very unimportant) to 5 (very important).

| Development of the BSC-based evaluation indicator system via case studies
We constructed an indicator system framework based on BSC theory. Guided by ICOs' strategic targets, the framework was created by consulting ICO experts and summarizing the interactions and problems that ICOs face. The design also draws on research 19-23 from China and other countries on hospital performance evaluation.

| Strategic target
Considering all the elements of the ICOs, the strategic development target is constructed with the causality of "input-output-purpose-goal" as shown in Figure 5, based on the logical framework application (LFA) method, 24 combined with China's guiding opinions on promoting the establishment and development of ICOs and the problems facing medical providers.

| Principles of the indicator design
Cross-organizational synergies in ICOs require breaking down information barriers between organizations and decomposing the long service flow path at hospitals at all levels into a multinode path. This allows essential productive factors in each node, like registration, consultation, examination, treatment, hospitalization, and rehabilitation, to flow freely as "public objects" in the ICO and medical resources to be dispatched without restrictions. At the same time, for the purpose of realizing the integration and resharing of fragmented medical resources, the ICO reorganizes and re-innovates across these nodes. Thus, in addition to following the general principles of a single organization's performance evaluation indicators, such as objectivity, science, validity, and operability, 25 the ICO evaluation index system should be established from the cross-organizational perspective and based on other special principles as follows.

Pertinence
Although the mechanisms underlying ICOs have begun to be improved, their actual effects lack pertinence verification. Correct analysis and evaluation of the characteristics of an object are critical criteria for evaluation indicator selection. Thus, when selecting indicators, the pain and pressure points during the development of the ICO should be examined. For instance, although the ICO assists in the use of information multilaterally by creating public resources that break through multiorganizational boundaries, the lack of homogeneity in the information from the various organizations hinders its smooth flow and sharing. As a result, information synergy needs to be evaluated through pertinent indicators.

Completeness
ICOs include many stakeholders such as patients and medical staff at hospitals at all levels, involving dual referrals and other aspects of operational processes. Accordingly, indicators should be designed as multidimensional based on a single organization's evaluation indicators combined with the changes the ICOs create for hospitals at all levels and the synergies in each node across organizations.

Hierarchy
In establishing the index system, the hierarchy of its structure should be constructed. That is, the strategic target of the ICO is decomposed into specific subtargets. The dimensions and the index layers are set up so that the indicators can be refined and quantified according to the data and guided by the dimensional layer.

Unity
In the analysis of a certain dimension, the indicators should relate closely to the core content of the strategic target and remain coordinated accordingly. Further, the impact of the indicators should be consistent, namely, the greater the index, the greater the impact. At the same time, each index should be standardized.
On the basis of universality, representativeness, and relevance, we selected a large hospital and its ICO for our main case study. The questionnaires were completed by medical staff including managers, doctors, and paramedics. The questionnaires consist of two main parts shown in Tables 1 and 2, respectively.
In the study, the participants at the ICO were selected by random sampling. The questions covered the indicators to be graded. These were divided into four BSC dimensions: financial, process, patient, and learning and growth. The respondents were asked to score each dimension and the index based on their knowledge of the dimensions and indicators listed. Figure 6 presents the process used for data collection.

| Sample size estimation
The sample size was first calculated with an estimated error of not more than 6% at P = 0.5, as in formula (1):
$$n = \frac{Z_{\alpha/2}^{2}\,P(1-P)}{\Delta_{P}^{2}} \qquad (1)$$
where n is the sample size, $Z_{\alpha/2}$ is the critical value, and $\Delta_{P}$ is the limit error. Considering the invalidity of some questionnaires, we needed to attain the number calculated in formula (2). We collected 216 valid questionnaires, including 109 from the large hospital and 107 from the primary-level hospitals in its ICO. Figure 7 presents the process used for data analysis.
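The standard sample size calculation for estimating a proportion can be sketched as follows. This is a minimal illustration, not the authors' computation: the function name `sample_size` is hypothetical, and the two critical values (1.645 and 1.96) are assumptions, because the confidence level is not stated in the text. The adjustment for invalid questionnaires in formula (2) is not reproduced here.

```python
import math


def sample_size(z: float, p: float, delta: float) -> int:
    """Smallest n such that a proportion p is estimated within +/- delta.

    Implements n = z^2 * p * (1 - p) / delta^2, rounded up.
    """
    if not 0.0 < p < 1.0:
        raise ValueError("p must lie strictly between 0 and 1")
    if delta <= 0.0:
        raise ValueError("delta must be positive")
    return math.ceil(z ** 2 * p * (1.0 - p) / delta ** 2)


# P = 0.5 and a limit error of 6% come from the text.
# The critical values below are ASSUMED for illustration only.
n_90 = sample_size(z=1.645, p=0.5, delta=0.06)  # assumed 90% confidence
n_95 = sample_size(z=1.96, p=0.5, delta=0.06)   # assumed 95% confidence
print(n_90, n_95)
```

Setting P = 0.5 maximizes P(1 - P), so the resulting n is the most conservative choice when the true proportion is unknown.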

| Agreement among the expert suggestions
Due to the diversity of the 216 individuals we interviewed, we needed to test the agreement among these, as high consistency represents the high reliability of the results. The consistency of the scores of each indicator is defined by its coefficient of variation, which is used to explain the convergence among the internal sample data. The coefficient

First-level indicators
To analyze the average level and agreement of the scores by the first-level indicators, the four BSC areas of emphasis, the mean, standard deviation, and coefficient of variation were calculated as shown in Table 3. As all the first-level

Second-level indicators
The mean, standard deviation, and coefficient of variation were calculated, as shown in Tables

| K-means clustering-based first-level indicator weight
The main objects of the weight analysis of the four perspectives are the scores provided by the 216 respondents on the basis of their degree of importance. However, due to the large coefficient of variation for the four first-level indicators, that is, the difference in the score distribution was significant, K-means clustering analysis 26 was used additionally to refine the weighting further. We used SPSS 20.0 to explore the respondents' common knowledge of each indicator implied by the data. In this case, we chose K as three, which means the data would be divided into three clusters and we would receive a new set of results, that is, three new cluster centers including a different number of cases in every operation.
Step 1. Determination of the initial set of cluster centers.
SPSS 20.0 gives the initial cluster centers after fast clustering analysis.
Step 2. Determination of the final set of cluster centers.
The K-means clustering algorithm calculates the distance from each point to each cluster center, assigns each point to the nearest cluster, and determines new cluster centers by updating the coordinate mean of the points in each cluster. This process is repeated until the cluster centers no longer move significantly.
Step 3. Determination of the final cluster center.
By comparing the number of cases in the final three clusters, the cluster with the most cases is chosen as the basis of the weight calculation for the first-level indicators.
Step 4. Determination of the weights of the first-level indicators.
The weight of each first-level indicator is the ratio of the chosen cluster center's coordinate value for that indicator to the sum of that center's coordinate values over all four indicators. In the study, the initial cluster number was three. After five iterations, the cluster centers no longer moved. The final weights of the four first-level indicators are denoted $W_{\cdot j \cdot}$ (j = 1, 2, 3, 4).
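The four steps above can be sketched in NumPy as follows. This is an illustrative re-implementation of the standard K-means iteration, not the SPSS 20.0 procedure used in the study: the initial centers are drawn at random (SPSS selects them differently), the function names `kmeans` and `first_level_weights` are hypothetical, and the scores are synthetic data generated only to make the example runnable.

```python
import numpy as np


def _assign(points, centers):
    """Index of the nearest center (Euclidean distance) for every point."""
    dists = np.linalg.norm(points[:, None, :] - centers[None, :, :], axis=2)
    return dists.argmin(axis=1)


def kmeans(points, k, n_iter=100, seed=0):
    """Plain K-means; initial centers are k distinct random points (assumption)."""
    rng = np.random.default_rng(seed)
    centers = points[rng.choice(len(points), size=k, replace=False)]
    for _ in range(n_iter):
        labels = _assign(points, centers)
        # New center = coordinate mean of the points in each cluster;
        # an empty cluster keeps its previous center.
        new_centers = np.array([
            points[labels == c].mean(axis=0) if np.any(labels == c) else centers[c]
            for c in range(k)
        ])
        converged = np.allclose(new_centers, centers)
        centers = new_centers
        if converged:
            break
    return centers, _assign(points, centers)


def first_level_weights(scores, k=3, seed=0):
    """Weights from the center of the cluster that contains the most cases."""
    centers, labels = kmeans(scores, k, seed=seed)
    counts = np.bincount(labels, minlength=k)
    center = centers[counts.argmax()]
    return center / center.sum(), counts


# Synthetic scores (NOT the study's data): 216 respondents score four
# perspectives so that each respondent's scores sum to 100.
rng = np.random.default_rng(42)
profiles = np.array([[20, 30, 35, 15], [40, 20, 20, 20], [10, 25, 25, 40]], dtype=float)
sizes = [150, 40, 26]
scores = np.vstack([p + rng.normal(0, 2, size=(n, 4)) for p, n in zip(profiles, sizes)])
scores = 100 * scores / scores.sum(axis=1, keepdims=True)

weights, counts = first_level_weights(scores, k=3)
print(counts, weights.round(3))
```

Because each respondent's scores sum to 100, the chosen center's coordinates also sum to 100, so dividing by the sum amounts to converting percentages to decimals.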

| Coefficient of variation-based second-level indicator weight
Although the mean of the scores can evaluate the overall situation of the indicator, the coefficient of variation can compensate better for the defects of equalization. When dealing with weighting the second-level indicators, those under the same perspective were regarded as a whole. The means and standard deviations of the scores were calculated, and the coefficient of variation (CV) was calculated by the ratio of the standard deviation to the mean. Using the CV method, the index weight was determined by the ratio of the CV of a certain index to the sum of the CV of all the indicators; the BSC with the index weight was established as well. The specific calculation process is as follows.
Step 1. Definition of the indicator scores.
The kth index score of the jth dimension of the ith respondent is defined as in formula (3),
where N is the number of respondents, M is the number of first-level indicators (perspectives), and L is the number of second-level indicators for each perspective.
Step 2. Calculation of the mean, standard deviation, and coefficient of variation.
The mean, standard deviation, and coefficient of variation of each indicator are calculated, respectively, as in formulas (4) to (6).
Step 3. Determination of the weights of the second-level indicators.
According to the CV method, the indicator weight is obtained from the weighted mean as shown in formula (7).
Step 4. Determination of the synthetic weight.
The synthetic weight takes both first-level and second-level weights into account, calculated as in formula (8).
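The CV-based weighting described above can be sketched as follows. Several details are assumptions rather than the authors' stated choices: the sample standard deviation (n - 1 denominator) is used, the synthetic weight is taken to be the product of the first-level and second-level weights, and the names `cv_weights` and `synthetic_weights`, the perspective labels, and the ratings are all hypothetical.

```python
import numpy as np


def cv_weights(ratings):
    """Second-level weights for one perspective.

    ratings: array of shape (N respondents, L indicators), 1-5 importance scores.
    Each weight is the indicator's CV divided by the sum of the CVs of all
    indicators under the same perspective.
    """
    mean = ratings.mean(axis=0)
    std = ratings.std(axis=0, ddof=1)  # sample SD (assumption)
    cv = std / mean                    # coefficient of variation
    return cv / cv.sum()


def synthetic_weights(first_level, ratings_by_perspective):
    """Synthetic weight = first-level weight * second-level weight (assumed product)."""
    return {
        name: first_level[name] * cv_weights(ratings)
        for name, ratings in ratings_by_perspective.items()
    }


# Hypothetical ratings from four respondents (NOT the study's data).
ratings_by_perspective = {
    "perspective_A": np.array([[5, 4, 3], [4, 4, 5], [5, 3, 4], [4, 5, 2]], dtype=float),
    "perspective_B": np.array([[3, 5], [4, 4], [5, 5], [2, 4]], dtype=float),
}
first_level = {"perspective_A": 0.6, "perspective_B": 0.4}  # hypothetical weights

combined = synthetic_weights(first_level, ratings_by_perspective)
total = sum(float(w.sum()) for w in combined.values())
print(combined, total)
```

Under the product assumption, the synthetic weights of all second-level indicators sum to 1 whenever the first-level weights do, which makes indicators comparable across perspectives.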

| RESULTS
The case study had 216 respondents and 40 second-level indicators from four perspectives, forming 8640 sample points (216 × 40), as shown in formula (9).
The number of indicators in each dimension is shown in formula (10).

| Weight of first-level indicators
In the study, the initial clustering number was three (k = 3). After five iterations, the maximum absolute coordinate change of any center was 0.000, which means that the cluster centers no longer moved. The process is shown in Table 8.
The final set of cluster centers is shown in Table 9.
According to the clustering results shown in Table 10, the 157 cases in cluster 2 represent 73% of all effective cases, which indicates that over 70% of ICO work participants agreed on the degree of importance of the four perspectives. Thus, the final center of cluster 2 was taken as the basis for the weight determinations of the first-level indicators. After transforming the center values of each dimension of cluster 2 (shown in Table 10) into decimals, the weights of the first-level indicators are as shown in Table 11.
According to the order of importance, the final result of the four perspectives (first-level indicators) can be logically described as in Figure 8.
Convergence was achieved because the cluster centers showed no or only small changes. The maximum absolute coordinate change for any center was 0.000. The current iteration was five. The minimum distance between the initial centers was 83.857.
Bold figures emphasize that this is the key result of this step and the basis for allowing the next step.

| Weights of second-level indicators
The final BSC-based performance evaluation indicator system for the ICOs, with the index weights, is shown in Table 12.

| Analysis of BSC-based performance evaluation indicator system
The weighted order of the four first-level indicators in the BSC-based performance evaluation system was as follows: patient, internal process, learning and growth, and financial. Each indicator had a different weight at different levels.
In the analysis of the agreement of the first-level indicators, the score distribution of learning and growth was the most discrete with the biggest deviation. Compared with the other three perspectives, this requires a long time to verify its validity, which reflects the ICOs' future development direction and upward trend. Learning and growth   uation. This implies a lack of synergy in the understanding among the ICO work participants, which to some extent will influence the further development of the ICOs.
In the weight determination of first-level indicators, the weights of the patient, internal process, learning and growth, and financial perspectives decreased in turn. The weight of the patient perspective was the largest and that of the financial perspective the smallest, which represents the characteristics of the Chinese ICOs. That is, the weights of the first-level indicators emphasize "human-orientation." China is committed to solving the problem of difficult and expensive access to medical care caused by the uneven distribution of medical resources from a social welfare perspective, with medical economic benefits relatively less of a consideration.

| Implications for further planning and management of the ICO
According to our study results, attention should be paid to the following aspects of ICO performance improvement.

| Implications from the financial perspective
In the agreement analysis, from the financial perspective, the mean of the indicators based on the strategic target of increasing economic benefits was relatively large and their CV was small, whereas the indicators based on the target of controlling medical costs showed the opposite pattern. This illustrates that the ICO staff held divergent views on controlling medical costs and were more attentive to improving the profitability of the ICO.

| Implications from the internal process perspective
From the internal process perspective, the mean of the indicators based on the strategic target of achieving homogeneity and mutual recognition was relatively large and their CV was small, which points to the significant need for standardization to improve ICO internal processes. In contrast, the indicators based on the target of increasing referral efficiency showed the opposite pattern. This may be because ICO participants work in hospitals of different levels: participants from superior hospitals care more about indicators representing downward referrals (transfers from top-class hospitals to primary medical institutions or lower-level hospitals), whereas those from primary-level hospitals care more about indicators representing upward referrals (transfers from primary medical institutions or lower-level hospitals to top-class hospitals).

| Implications from the patient perspective
From the patient perspective, the mean of the indicators based on the target of enhancing patient acceptance of community first-treatment and dual referral was relatively large and their CV was small; however, the indicators based on enhancing patients' satisfaction showed the reverse pattern. This may be because ICO workers focus more on directly related patient behaviors than on patients' personal experience with the ICO medical services.

| Implications from the learning and growth perspective
From the learning and growth perspective, except for the indicator based on the target of the amount of scientific research, projects, and papers at primary-level hospitals, the means of the indicators were large and their CVs were small. This may be because ICO workers at primary-level hospitals currently place more emphasis on improving practical abilities than on medical research.

| Strengths, limitations, and suggestions for future research
We believe this is a pioneering study of ICO BSC measurements and evaluations of the collaborative performance of cross-organizational medical services. The ICO's performance evaluation index system is based on the actual situation among China's ICOs and a comprehensive consideration of the actual principles, methods, and operations. Based on the comprehensiveness of the BSC, it can reflect all aspects of the ICO's performance measurements, and the method is feasible as well as beneficial in the acquisition of practical information. The analysis of the indicator weighting shows that when the BSC is applied to actual performance measurements, data-driven index weight calculation is necessary, and clustering analysis and the CV method can help in the BSC design.
However, this study has some limitations that point to areas of further study:
1. The BSC here is designed based on the practice at a large hospital in China, and the indicators are considered from the perspective of its characteristics. The applicability of the specific BSC indicators to other settings requires further examination.
2. The weight analysis is divided into two parts: first-level and second-level indicator layers. The weights of the two levels are determined independently. The relationship between cross-dimensional indicators and the mutual influence of the weighting in the two parts are not explained and are worthy of further study.
3. The K-means clustering algorithm and the CV method are used to determine the weights. Although both methods are relatively efficient and flexible, their results are driven by the sample data and will change with different samples. Thus, problems remain as to how to determine reasonable initial cluster centers in the K-means calculation and how to make the CV method more reliable. Because the weights are driven by the concrete data, the data processing method still needs improvement in follow-up research. Nevertheless, the general form of the index importance ranking given here has considerable reference value.
4. Although the evaluation system reflects the general experience of internal experts in a large hospital in China and has significant reference value, the index system has not yet been implemented in the hospital. At the same time, Chinese ICOs are still in a semi-mature stage of development. Thus, there are some subjective elements and uncertain factors in the index system, and its validity still needs further testing. Based on the results of the system's practical application and the deviation between them and the actual quality improvements at ICOs, a follow-up study could integrate the results of the measurement evaluation of the indicators into the BSC to further improve the indicators.

AUTHOR CONTRIBUTION
The study was designed by X.W. in collaboration with all coauthors. The data were collected by S.L. and N.X. The first and final drafts were written by N.X. The paper was critiqued by S.L. The results were analyzed by X.W. and D.W.
The research and key elements of the models were reviewed by X.W. The writing of the corresponding parts and the major revisions of this paper were completed by X.Z.