Matching on poset‐based average rank for multiple treatments to compare many unbalanced groups

In this article, we propose an original matching procedure for multiple treatment frameworks based on partially ordered set (poset) theory. In our proposal, called matching on poset‐based average rank for multiple treatments (MARMoT), poset theory is used to summarize individuals' confounders, and the relative average rank is used to balance confounders and match individuals in different treatment groups. This approach proves particularly useful for balancing confounders when the number of treatments considered is high. We apply our approach to estimating the neighborhood effect on fractures among older people in Turin (a city in northern Italy).

A set equipped with such a relation is said to be ordered. If the comparison is drawn using several variables, it may be that some elements are neither equal nor ordered, in which case they are defined as incomparable 2. The word "partially" is added to "ordered set" when some of its elements are incomparable, so the order relation has to be replaced by a partial order relation, which takes the incomparability (indicated with ||) of the elements into account:
Incomparability: $x \parallel y \leftrightarrow x \nleq y \wedge y \nleq x$, with $x, y \in P$.
Comparing the individuals in a population gives rise to a list of comparabilities and incomparabilities, which can be represented in a graphic form called a Hasse diagram. This diagram represents the elements in a poset: each node is an element, two or more equal elements still form one node, and every line segment is an order relation between comparable objects.
[Figure 1: part (a) shows the Hasse diagram for the individuals in Table 1; part (b) lists all the linear extensions for these individuals; and part (c) their exact average rank.]
Let us suppose that we have a population comprising six individuals characterized by three dichotomous variables, as represented in Table 1: age (which takes a value of 0 for individuals who are between 60 and 70 years old, and 1 if they are older); education (which takes a value of 0 if they have a higher education, and 1 otherwise); and homeowner (which takes a value of 0 if they own the house in which they live, and 1 otherwise). The set of observed characteristics of each individual is called a "profile". These variables are ordered according to the risk of experiencing the outcome.
In this example, for the sake of simplicity, we included only dichotomous variables, but categorical and discrete variables may also be considered in a poset. However, to contain the complexity of the poset, it is recommended to reduce each discrete variable to a few meaningful classes.
A Hasse diagram can be used to visualize the order relations between the elements in a poset, and it is based entirely on the order of the elements, disregarding any quantitative information.
In Figure 1(a), the six individuals are represented by their profile in the Hasse diagram, where each node stands for a profile. When two individuals are comparable, they are connected by line segments in the diagram, like A and B (where A and B have the same values for education and homeowner, and B has a higher value for age than A) or B and E (where B and E have the same values for age and homeowner, and E has a higher value for education than B), whereas there is no ascending or descending path between incomparable elements, like B and C (where B and C have the same value for homeowner, and B has a higher value for age than C but a lower value for education).
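The comparisons described above can be sketched in a few lines of Python. This is only an illustration: the age and education values for A, B, C and E follow the relations stated in the text, while the shared homeowner value (set to 0 here) and the helper names `leq` and `relation` are assumptions, since Table 1 is not reproduced in this section.

```python
from itertools import combinations

# Profiles as (age, education, homeowner), each coded 0/1 by increasing risk.
# Age and education follow the relations described in the text; the homeowner
# value 0 is an assumption (the text only says these individuals share it).
profiles = {
    "A": (0, 0, 0),
    "B": (1, 0, 0),
    "C": (0, 1, 0),
    "E": (1, 1, 0),
}


def leq(p, q):
    """True if profile p is lower than or equal to q on every variable."""
    return all(a <= b for a, b in zip(p, q))


def relation(p, q):
    """Classify the order relation between two profiles."""
    if p == q:
        return "equal"
    if leq(p, q):
        return "below"
    if leq(q, p):
        return "above"
    return "incomparable"


# Relation of the first element of each pair with respect to the second.
relations = {
    (x, y): relation(profiles[x], profiles[y])
    for x, y in combinations(sorted(profiles), 2)
}

for pair, rel in relations.items():
    print(pair, rel)
```

Two profiles are comparable only when one is lower than or equal to the other on every variable, which is why B and C, each higher on one variable and lower on another, come out as incomparable.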
The list of all the ranks that each individual may occupy is shown in part (b) of Figure 1, where all the linear extensions of the poset are listed. Linear extensions are all the possible rankings of elements in the poset that respect its comparabilities (the connections in the Hasse diagram) and incomparabilities 1,2 . The average rank (AR) of a node represents the mean of all the ranks that the element occupies in all possible linear extensions, starting from the known order relations, as listed in Figure 1 part (c).
The AR is a single value for each element in the set that describes the relative position of a given element with respect to the rest of the population. It can be normalized in the interval [0;1].
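As a minimal sketch of the exact computation, the snippet below enumerates the linear extensions of a small poset by brute force and averages the ranks. The four-element poset (A below B and C, both below E, with B and C incomparable) is a hypothetical reduction built from the relations described for A, B, C and E, not the six-individual example of Table 1, and the function names are illustrative.

```python
from itertools import permutations

# Strict order relations (lower, higher), including the transitive pair (A, E).
# Hypothetical four-element poset: A < B < E, A < C < E, B and C incomparable.
elements = ["A", "B", "C", "E"]
below = {("A", "B"), ("A", "C"), ("B", "E"), ("C", "E"), ("A", "E")}


def is_linear_extension(order):
    """A ranking is a linear extension if it respects every order relation."""
    position = {e: i for i, e in enumerate(order)}
    return all(position[lo] < position[hi] for lo, hi in below)


# Brute force over all rankings; feasible only for very small posets.
linear_extensions = [p for p in permutations(elements) if is_linear_extension(p)]


def average_rank(element):
    """Mean of the ranks (1 = lowest) the element takes over all extensions."""
    ranks = [ext.index(element) + 1 for ext in linear_extensions]
    return sum(ranks) / len(ranks)


average_ranks = {e: average_rank(e) for e in elements}
print(linear_extensions)
print(average_ranks)
```

Only two rankings respect the order relations here, so B and C each occupy ranks 2 and 3 once and share an average rank of 2.5, while A and E are fixed at ranks 1 and 4.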

A.2 - Approximating the average rank
If the number of individuals and variables increases, the linear extensions become too numerous to enumerate, and computing the exact AR as in the example in Table 1 becomes practically infeasible. That said, satisfactory approximations of the number of linear extensions of a poset can be found in works by Dyer 3 and De Loof 4.
Researchers have used two main approaches to compute the AR efficiently: sampling linear extensions 5,6, or defining an approximation formula. Different approximation formulas have been proposed in the literature, such as the Local Partial Order Model 7 or the one based on Mutual Probabilities 4. The present work is based on De Loof's approach (2009) 8 because it is more accurate than other methods when the sample size is large 8.
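To give a feel for the sampling route, the sketch below estimates the AR with a simple Markov chain that swaps adjacent incomparable elements of a linear extension. It is an illustration of the general idea only and is not claimed to be the algorithm of the cited works; the four-element poset, the function names, the number of steps and the seed are all assumptions.

```python
import random

# Hypothetical poset: A < B < E, A < C < E, with B and C incomparable.
# `below` lists every strict order relation, including the transitive (A, E).
elements = ["A", "B", "C", "E"]
below = {("A", "B"), ("A", "C"), ("B", "E"), ("C", "E"), ("A", "E")}


def comparable(x, y):
    return (x, y) in below or (y, x) in below


def sampled_average_ranks(start, n_steps, seed=0):
    """Estimate average ranks by a random walk over linear extensions."""
    rng = random.Random(seed)
    order = list(start)  # must already be a valid linear extension
    totals = {e: 0 for e in order}
    for _ in range(n_steps):
        i = rng.randrange(len(order) - 1)
        # Lazy step: with probability 1/2, try to swap neighbours i and i+1.
        # Swapping two adjacent incomparable elements keeps the ranking valid.
        if rng.random() < 0.5 and not comparable(order[i], order[i + 1]):
            order[i], order[i + 1] = order[i + 1], order[i]
        for rank, e in enumerate(order, start=1):
            totals[e] += rank
    return {e: totals[e] / n_steps for e in totals}


estimates = sampled_average_ranks(["A", "B", "C", "E"], n_steps=20000)
print(estimates)
```

Because every accepted move is symmetric, the walk visits the linear extensions uniformly in the long run, so the averaged ranks approach the exact AR as the number of steps grows; on larger posets the number of steps needed grows accordingly.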
Two concepts help us to understand this approximation, for a sample $P$ with $|P|$ elements:
The rank probability $\Pr(\mathrm{rank}(x) = i)$ is the fraction of linear extensions in which the rank of element $x$ equals $i$, where $i$ takes every possible rank in the sample of size $|P|$, so $i = 1, \dots, |P|$.
The mutual rank probability $\Pr(x > y)$ of two elements $x, y \in P$ is the fraction of linear extensions in which element $x$ is ranked higher than element $y$.
Now we can establish a relation between these two concepts and the real AR of an element $x$, $h(x)$, starting from a sample with $|P|$ elements, including $x$ and $y$:
$$h(x) = \sum_{i=1}^{|P|} i \cdot \Pr(\mathrm{rank}(x) = i) = 1 + \sum_{y \in P \setminus \{x\}} \Pr(\mathrm{rank}(x) > \mathrm{rank}(y)) \qquad (1)$$
In other words, the first part of formula 1 describes the real AR value, $h(x)$, as an expected value, multiplying each possible rank value $i$ by the fraction of linear extensions in which the rank of $x$ equals $i$. The second part of formula 1 expresses the real AR value as one plus the sum of all the mutual rank probabilities that involve the element $x$. Starting from this formula, we need to find an approximation for the mutual rank probability. To do so, we have to define three subsets of the poset $P$, given a generic element $x \in P$: the downset $D(x) = \{y \in P : y \leq x\}$, the upset $U(x) = \{y \in P : y \geq x\}$, and the set of elements incomparable with $x$, $I(x) = \{y \in P : y \parallel x\}$.
If $y \in D(x)$ with $y \neq x$, then $\Pr(\mathrm{rank}(x) > \mathrm{rank}(y))$ equals 1, and if $y \in U(x)$, then $\Pr(\mathrm{rank}(x) > \mathrm{rank}(y))$ equals 0, so the mutual rank probabilities only need to be approximated with respect to the reciprocal ranks of the incomparable elements. The following approximation was proposed by Brüggemann 9 *
[Equation not recoverable from the source: Brüggemann's approximation of the mutual rank probability for incomparable elements]
and the AR approximation proposed by De Loof 4 is
$$\hat{h}(x) = |D(x)| + \sum_{y \in I(x)} \widehat{\Pr}(\mathrm{rank}(x) > \mathrm{rank}(y)) \qquad (5)$$
That is to say that, using formula 5, the AR of $x$ is given by the number of elements in its downset plus the sum, over all elements $y$ incomparable with $x$, of the probability that $y$ is part of the downset of $x$, using the approximation of the mutual rank probabilities. Following the toy example in Table 1, the steps needed to approximate the AR with the De Loof 4 approach are worked through in Table 2, including the estimation of the AR.
In the present work, the approximated AR was computed in R, using a function called deloof, proposed by Caperna 10,11, that can cope with large datasets 12,13.
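The decomposition behind formula 5 can be checked numerically on a small example. The sketch below is an illustration under stated assumptions: it uses the exact mutual rank probabilities obtained by enumerating linear extensions rather than the Brüggemann approximation, it counts the element itself as part of its downset, and the four-element poset and function names are hypothetical. It is not the deloof R function.

```python
from itertools import permutations

# Hypothetical poset: A < B < E, A < C < E, with B and C incomparable.
elements = ["A", "B", "C", "E"]
below = {("A", "B"), ("A", "C"), ("B", "E"), ("C", "E"), ("A", "E")}

# All rankings that respect the order relations (brute force, small posets only).
extensions = [
    p for p in permutations(elements)
    if all(p.index(lo) < p.index(hi) for lo, hi in below)
]


def mutual_rank_probability(x, y):
    """Fraction of linear extensions in which x is ranked higher than y."""
    wins = sum(ext.index(x) > ext.index(y) for ext in extensions)
    return wins / len(extensions)


def average_rank_from_mutual(x):
    """Downset size (x included, by assumption) plus the mutual rank
    probabilities of x against the elements incomparable with it."""
    downset = [y for y in elements if y == x or (y, x) in below]
    incomparable = [
        y for y in elements
        if y != x and (x, y) not in below and (y, x) not in below
    ]
    return len(downset) + sum(mutual_rank_probability(x, y) for y in incomparable)


def average_rank_direct(x):
    """Mean rank of x over all linear extensions, for comparison."""
    return sum(ext.index(x) + 1 for ext in extensions) / len(extensions)


ar_mutual = {x: average_rank_from_mutual(x) for x in elements}
ar_direct = {x: average_rank_direct(x) for x in elements}
print(ar_mutual)
print(ar_direct)
```

With exact probabilities the two computations coincide; an approximation formula replaces only the `mutual_rank_probability` step, which is the part that becomes infeasible to enumerate on large posets.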

C - ATENC ESTIMATES AND CORRESPONDING P-VALUES FOR THREE GEOGRAPHICAL PARTITIONS (10 DISTRICTS, 23 AREAS AND 70 ZONES)
In the table below, we report the ATENC estimates and corresponding p-values for three geographical partitions (10 districts 3, 23 areas 4 and 70 zones 5). Moreover, asterisks indicate whether a False Discovery Rate below certain thresholds (1%, 5%, 10%) is guaranteed according to the Benjamini-Hochberg procedure 14.
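For readers unfamiliar with the Benjamini-Hochberg procedure, the following sketch computes adjusted p-values and flags them against the three thresholds. The p-values are hypothetical and are not the ATENC results; the mapping from thresholds to the number of asterisks is also an assumption, since the convention is not stated here.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def stars(q):
    """Assumed convention: *** for 1%, ** for 5%, * for 10%."""
    if q <= 0.01:
        return "***"
    if q <= 0.05:
        return "**"
    if q <= 0.10:
        return "*"
    return ""


# Hypothetical p-values, one per treatment group being compared.
p_values = [0.001, 0.008, 0.039, 0.041, 0.60]
adjusted = benjamini_hochberg(p_values)
flags = [stars(q) for q in adjusted]

for p, q, flag in zip(p_values, adjusted, flags):
    print(f"p = {p:.3f}  adjusted = {q:.5f}  {flag}")
```

Rejecting every hypothesis whose adjusted p-value falls below a threshold keeps the expected share of false discoveries among the rejections at or below that threshold, under the usual assumptions of the procedure.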