Convex transform order of Beta distributions with some consequences

The convex transform order is one way to make precise the comparison of skewness between probability distributions on the real line. We establish a simple and complete characterization of when one Beta distribution is smaller than another according to the convex transform order. As an application, we derive monotonicity properties for the probability of Beta distributed variables exceeding the mean or mode of their distribution. Moreover, as a byproduct, we obtain a simple alternative proof of the mode-median-mean inequality for unimodal distributions that are skewed in a sense made precise by the convex transform order. This new proof also gives an analogous inequality for the anti-mode of distributions that have a unique anti-mode. Such inequalities for Beta distributions follow as special cases. Finally, some consequences for the values of distribution functions of Binomial distributions near their means are mentioned.


Introduction
How to order probability distributions according to criteria that have interpretable probabilistic consequences is a common question in probability theory. Naturally there exist many different order relations, each one highlighting a particular aspect of the distributions. Classical examples are given by orderings that capture size and dispersion. In reliability theory some ordering criteria are of interest when dealing with ageing problems. These help decide, for example, which lifetime distributions exhibit faster ageing. An account of different orderings, their properties, and basic relationships may be found in the monographs of Marshall and Olkin [13] or Shaked and Shanthikumar [19].
In this paper we shall be interested primarily in two such orderings, known in the literature as the convex transform and the star-shape transform orders. These orders are defined by the convexity or star-shapedness of a certain mapping that transforms one distribution into another. The convex transform order was introduced by van Zwet [22] with the aim of comparing skewness properties of distributions. Oja [16] suggests that any measure of skewness should be compatible with the convex transform order, and that many such measures indeed are. Hence, this ordering gives a convenient formalisation of what it means to compare distributions according to skewness.
With respect to the ageing interpretation, the convex transform order may be seen as identifying ageing rates even in the case of lifetimes that did not start simultaneously, while the star-shape order requires the same starting point for the distributions under comparison, as described in Nanda et al. [15].
Establishing that one distribution is smaller than another is often difficult and tends to rely on being able to control the number of crossing points between various affine transforms of distribution functions. Based on such techniques, explicit characterisations of the ordering relationships within the Gamma and the Weibull families are given in Arab and Oliveira [2,3] and Arab et al. [4].
The family of Beta distributions is a two-parameter family of distributions supported on the unit interval. It appears in various contexts, for example in the study of order statistics and in Bayesian statistics as a conjugate prior for a variety of distributions arising from Bernoulli trials.
The main contribution of this paper is to characterise when one Beta distribution is smaller than another according to the convex and star-shaped transform orders. This characterisation implies various monotonicity properties for the probabilities of Beta distributed random variables exceeding the mean or mode of their distribution. This allows one to derive, in some cases, simple bounds for such probabilities. These differ from concentration inequalities such as Markov's inequality or Hoeffding's inequality in that they study the probability of exceeding, without necessarily significantly deviating from, the mean or the mode.
A well known connection between the Beta and the Binomial distributions allows us to translate these results into similar monotonicity properties for the family of Binomial distributions. Such probabilities of exceeding means have received attention in the context of studying properties of randomised algorithms, see for example Karppa et al. [12], Becchetti et al. [5], or Mitzenmacher and Morgan [14]. They have also found applications in dealing with specific aspects of machine learning problems, such as in Doerr [8], Greenberg and Mohri [9], with sequels in Pelekis [18] and Pelekis and Ramon [17], or Cortes et al. [7] for more general questions. Such an inequality for Binomial random variables was also used by Wiklund [20] when studying the amount of information lost when resampling.
These properties also allow one to compare the relative location of the mode, median, and mean of certain distributions that are skewed in a sense made precise by the convex transform order. Such mode-median-mean inequalities are a classical subject in probability theory. While our condition for these inequalities to hold has previously been suggested by van Zwet [23], our proof appears novel. The proof also allows us to establish a similar inequality for absolutely continuous distributions with unique anti-modes, meaning distributions having densities with a unique minimum. For an account of the field we refer the interested reader to van Zwet [23] or, for more recent references, to Abadir [1] or Zheng et al. [21].
This paper is organised as follows. In Section 2 we define the relevant concepts. The main results, characterising the order relationships within the Beta family, are presented in Section 3. Consequences are discussed in Section 4, while proofs of the main results are presented in Section 5. Some auxiliary results concerning the main tools of analysis are given in Appendix A.

Preliminaries
In this section we present the basic notions necessary for understanding the main contributions of the paper.
Let us first recall the classical notion of convexity on the real numbers.
We will also need the somewhat less well known notion of star-shapedness of a function on the real numbers.
Star-shapedness could generally be defined on general intervals with respect to an arbitrary reference point. For our purposes it suffices to consider functions on the nonnegative half-line, star-shaped relative to the origin.
It is immediate that a convex f : I → R on an initial segment of the non-negative half-line that satisfies f(0) ≤ 0 is star-shaped. Moreover, f is star-shaped if and only if f(x)/x is increasing in x ∈ I. We refer the reader to Barlow et al. [6] for some more general properties and relations between these types of functions.
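The ratio criterion for star-shapedness lends itself to a quick numerical sanity check. The following is a minimal sketch, not part of the original argument: the helper name `is_star_shaped`, the grid size, and the two test functions are illustrative assumptions. It tests on a grid whether x → f(x)/x is non-decreasing, for a convex function with f(0) ≤ 0 and for a convex function with f(0) > 0.

```python
def is_star_shaped(f, upper, n=1000, tol=1e-12):
    """Grid check (hypothetical helper) that x -> f(x)/x is non-decreasing on (0, upper]."""
    xs = [upper * i / n for i in range(1, n + 1)]
    ratios = [f(x) / x for x in xs]
    return all(r2 >= r1 - tol for r1, r2 in zip(ratios, ratios[1:]))


def convex_nonpositive_at_zero(x):
    # Convex with f(0) = -0.5 <= 0, so f(x)/x = x - 0.5/x is increasing.
    return x * x - 0.5


def convex_positive_at_zero(x):
    # Convex but f(0) = 0.5 > 0; f(x)/x = x + 0.5/x decreases near the origin.
    return x * x + 0.5


if __name__ == "__main__":
    print(is_star_shaped(convex_nonpositive_at_zero, 1.0))
    print(is_star_shaped(convex_positive_at_zero, 1.0))
```

A grid check of this kind can only refute star-shapedness or make it plausible; it does not replace a proof.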
Our main concern in this paper is to establish certain orderings of the family of Beta distributions, which are defined on [0, 1].
Definition 3 (Beta distribution). The Beta distribution Beta(a, b) with parameters a, b > 0 is a distribution supported on the unit interval and defined by the density given for x ∈ (0, 1) by
\[ f(x) = \frac{x^{a-1}(1-x)^{b-1}}{B(a,b)}, \qquad (1) \]
where B(a, b) denotes the Beta function, that is, the integral of t^{a−1}(1 − t)^{b−1} over t ∈ (0, 1).
We shall be interested in orderings determined by the convexity or star-shapedness of a certain mapping, primarily the following order due to van Zwet [22]. In order to avoid working with generalised inverses, we restrict ourselves to distributions supported on intervals.
Definition 4 (Convex Transform Order ≤ c ). Let P and Q be two probability distributions on the real line supported by the intervals I and J, with strictly increasing distribution functions F : I → [0, 1] and G : J → [0, 1], respectively. We say that P ≤ c Q or, equivalently, F ≤ c G, if the mapping x → G −1 (F (x)) is convex. Moreover, if X ∼ P and Y ∼ Q, we will also write X ≤ c Y when P ≤ c Q.
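To make Definition 4 concrete, consider the special case Beta(a, 1), whose distribution function is F(x) = x^a on [0, 1]. For F from Beta(a, 1) and G from Beta(a′, 1) the transform is G^{-1}(F(x)) = x^{a/a′}, which is convex exactly when a ≥ a′. The sketch below checks this numerically through second differences; the function names, grid size, and tolerance are assumptions chosen for illustration only.

```python
def transform_map(a, a_prime):
    """Return x -> G^{-1}(F(x)) for F from Beta(a, 1) and G from Beta(a_prime, 1)."""
    def h(x):
        u = x ** a                      # F(x) = x^a
        return u ** (1.0 / a_prime)     # G^{-1}(u) = u^(1/a')
    return h


def is_convex_on_grid(h, lo=0.0, hi=1.0, n=400, tol=1e-12):
    """Grid check: all second differences of h are non-negative (up to tol)."""
    xs = [lo + (hi - lo) * i / n for i in range(n + 1)]
    ys = [h(x) for x in xs]
    return all(ys[i - 1] - 2.0 * ys[i] + ys[i + 1] >= -tol for i in range(1, n))


if __name__ == "__main__":
    # a = 3 >= a' = 2: the map x^(3/2) is convex.
    print(is_convex_on_grid(transform_map(3.0, 2.0)))
    # a = 2 < a' = 3: the map x^(2/3) is concave, so the order fails in this direction.
    print(is_convex_on_grid(transform_map(2.0, 3.0)))
```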
It is immediate from the definition that if X ∼ P and Y ∼ Q then both X ≤ c Y and Y ≤ c X if and only if there exist some a > 0 and b ∈ R such that X has the same distribution as aY + b. In other words, the convex transform order is invariant under orientation preserving affine transforms.
Although this order relation is popular in reliability theory, the convex transform order was first introduced by van Zwet [22] to compare the shape of distributions with respect to skewness properties. The idea is roughly as follows. Let X and Y be random variables having, say, absolutely continuous distributions given by distribution functions F and G, respectively. Then G^{-1}(F(X)) has the same law as Y. Convexity of x → G^{-1}(F(x)) implies that the transformed distribution tends to be spread out on the right tail while being compressed on the left tail. In other words, Y will have a distribution more skewed to the right. Indeed, if ψ is an increasing function then X ≤ c ψ(X) if and only if ψ is convex.
In the reliability literature the convex transform ordering is known as the increasing failure rate (ifr) order. Indeed, assuming that F and G are absolutely continuous distribution functions with derivatives f and g and failure rates r_F = f/(1 − F) and r_G = g/(1 − G), the order F ≤ c G holds if and only if u → r_F(F^{-1}(u))/r_G(G^{-1}(u)) is increasing in u ∈ (0, 1).
The second order of interest is defined analogously to the convex transform order, but now with respect to star-shapedness.
Definition 5 (Star-shaped order ≤ * ). Let P and Q be two probability distributions on the real line supported by the intervals I = [0, a] and J = [0, b], for some a, b > 0, with strictly increasing distributions functions F : I → [0, 1] and G : J → [0, 1], respectively. We say that P ≤ * Q or, equivalently, F ≤ * G, if the mapping x → G −1 (F (x)) is star-shaped. Moreover, if X ∼ P and Y ∼ Q, we will also write X ≤ * Y when P ≤ * Q.
If X ∼ P and Y ∼ Q for appropriate P and Q then X ≤ * Y and Y ≤ * X if and only if there exists an a > 0 such that X has the same distribution as aY .
The star transform order can be interpreted in terms of the average failure rate, which is why it is sometimes known as the increasing failure rate in average (ifra) order. In fact, F ≤ * G if and only if u → \bar r_F(F^{-1}(u))/\bar r_G(G^{-1}(u)) is increasing in u ∈ (0, 1), where \bar r_F(x) and \bar r_G(x) are known as the failure rates in average of F and G, respectively, and are defined by
\[ \bar r_F(x) = -\frac{\ln(1 - F(x))}{x}, \qquad \bar r_G(x) = -\frac{\ln(1 - G(x))}{x}. \]
While the star-shaped order is strictly weaker than the convex transform order for distributions having support with a lower end-point at 0, such as the Beta distributions, it is of some independent interest and a useful intermediate step in proving ordering according to the convex transform order.
The stochastic dominance order is also known as first stochastic dominance (fsd) in reliability theory, and captures the notion of one distribution attaining larger values than the other. It is generally easier to verify than the convex transform order or star-shaped order and will serve here primarily to establish necessity of the sufficient conditions for convex transform ordering between two Beta distributions.
Definition 6 (Stochastic dominance ≤ st ). Let P and Q be two probability distributions on the real line with distribution functions F : R → [0, 1] and G : R → [0, 1], respectively. We say that P ≤ st Q or, equivalently, F ≤ st G, if F(x) ≥ G(x) for all x ∈ R.

Main results
The main results of this paper describe the stochastic dominance, star-shape transform order, and convex transform order relationships within the family of Beta distributions. The proofs are postponed until Section 5.
The stochastic dominance relationships within the family of Beta distributions are known and fairly straightforward to establish. In this paper they will serve to establish necessity of the sufficient conditions for being ordered according to the convex or star-shaped transform order.
The star-shape ordering relationships within the family of Beta distributions have been addressed previously by Jeon et al. [11, Example 4], but only for the case of integer valued parameters that satisfy certain conditions. Here we extend this to a complete classification.
As mentioned above, the star-shape transform order is of some independent interest and it serves as an intermediate step when establishing the ordering according to the convex transform order. It turns out to be the case that two Beta distributions are ordered according to the convex transform order if and only if they are ordered according to the star-shaped order.

Some consequences of the main results
A first simple result follows from the invariance of the convex ordering under affine transformations. Recall that the family of Gamma distributions with parameters α, θ > 0, denoted Gamma(α, θ), is defined by density functions on x ≥ 0; for θ = 1, the case appearing in the limit below, the density is x^{α−1}e^{−x}/Γ(α). It is easily seen that if X_b ∼ Beta(a, b) for a, b > 0, then, leaving a fixed, the distributions of bX_b converge weakly to Gamma(a, 1) as b tends to +∞. The following proposition is an immediate consequence of the transitivity of the transform orders and Theorems 8 and 9.
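The weak convergence of bX_b to Gamma(a, 1) can be illustrated in the one case where everything is available in closed form, namely a = 1. There Beta(1, b) has distribution function 1 − (1 − x)^b, so bX_b has distribution function 1 − (1 − x/b)^b, and Gamma(1, 1) is the standard exponential distribution. The following sketch, with hypothetical helper names and an arbitrarily chosen grid, measures the largest gap between the two distribution functions for growing b.

```python
import math


def scaled_beta1_cdf(x, b):
    """CDF of b * X for X ~ Beta(1, b), using P(X <= t) = 1 - (1 - t)^b."""
    t = min(max(x / b, 0.0), 1.0)   # clip to the support [0, 1] of X
    return 1.0 - (1.0 - t) ** b


def gamma1_cdf(x):
    """CDF of Gamma(1, 1), i.e. the standard exponential distribution."""
    return 1.0 - math.exp(-x) if x > 0 else 0.0


def max_gap(b, upper=20.0, n=2000):
    """Largest absolute CDF difference on a uniform grid of [0, upper]."""
    return max(
        abs(scaled_beta1_cdf(upper * i / n, b) - gamma1_cdf(upper * i / n))
        for i in range(n + 1)
    )


if __name__ == "__main__":
    for b in (2, 10, 100, 1000):
        print(b, max_gap(b))
```

The printed gaps shrink as b grows, in line with the stated limit; the case of general a is not covered by this sketch.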

Probabilities of exceedance
It was noted already by van Zwet [22] that the probabilities of random variables being greater than (or smaller than) their expected values are monotone with respect to the convex transform ordering of their distributions. This is essentially an immediate consequence of Jensen's inequality. The idea generalises directly to any functional that satisfies a Jensen-type inequality.
Theorem 11. For any interval I, measurable function h : I → R and X ∼ P with P supported in I, denote the distribution of h(X) by P_h. Let F be a set of continuous probability distributions on intervals in R and T : F → R a functional satisfying h(T(P)) ≤ T(P_h) for all P ∈ F and h convex and increasing with P_h ∈ F. If X ∼ P and Y ∼ Q with P, Q ∈ F satisfy X ≤ c Y, then P(X ≥ T(P)) ≥ P(Y ≥ T(Q)). If T satisfies instead h(T(P)) ≥ T(P_h) then, under the same assumptions on X and Y, the conclusion becomes P(X ≥ T(P)) ≤ P(Y ≥ T(Q)).
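Proof. Assume T satisfies the first inequality, h(T(P)) ≤ T(P_h). Let F and G be the distribution functions of X and Y, respectively, and h(x) = G^{-1}(F(x)). Since X ≤ c Y, the function h is convex and increasing, and h(X) has the same distribution as Y, so that P_h = Q. Hence P(Y ≥ T(Q)) = P(h(X) ≥ T(P_h)) ≤ P(h(X) ≥ h(T(P))) = P(X ≥ T(P)). The second statement, for T satisfying h(T(P)) ≥ T(P_h), follows by reproducing the same argument with the inequality reversed.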
Note now that the standard Jensen inequality implies that we may take as T in Theorem 11 the expectation operator T (P ) = E(X) for X ∼ P . Hence the following is immediate.
As a direct consequence of Theorem 9 and Corollary 12, we have the following monotonicity properties of Beta distributed random variables exceeding their expectation.
This provides immediate bounds for the probabilities of Beta distributed random variables exceeding their expectation.
Proof. Compute P(X a,b ≥ E(X a,b )) for a = 1 or b = 1, use the monotonicity given in Corollary 13, and, finally allow a, b → +∞ to find both numerical bounds.
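The boundary computations in this proof are explicit. For b = 1 the distribution function is x^a and the mean is a/(a + 1), so the exceedance probability is 1 − (a/(a + 1))^a. For a = 1 the distribution function is 1 − (1 − x)^b and the mean is 1/(1 + b), giving (b/(b + 1))^b. The sketch below evaluates these two expressions and their limits 1 − e^{−1} and e^{−1}; the function names are illustrative assumptions.

```python
import math


def exceed_mean_b1(a):
    """P(X >= E X) for X ~ Beta(a, 1): CDF x^a, mean a / (a + 1)."""
    return 1.0 - (a / (a + 1.0)) ** a


def exceed_mean_a1(b):
    """P(X >= E X) for X ~ Beta(1, b): CDF 1 - (1 - x)^b, mean 1 / (1 + b)."""
    return (b / (b + 1.0)) ** b


if __name__ == "__main__":
    for t in (1.0, 2.0, 5.0, 50.0, 1e6):
        print(t, exceed_mean_b1(t), exceed_mean_a1(t))
    print("limits:", 1.0 - math.exp(-1.0), math.exp(-1.0))
```

Along these two boundary families the probability of exceeding the mean increases in a and decreases in b, and both families start from the value 1/2 at a = b = 1.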
Using Theorem 11 we may prove similar monotonicity properties for the probabilities of exceeding modes or anti-modes. Recall that an absolutely continuous distribution is unimodal if it has a continuous density with a unique maximum and uniantimodal if it has a continuous density with a unique minimum.
Corollary 15. Let X ∼ P and Y ∼ Q be two real valued random variables with absolutely continuous distributions P and Q supported on some intervals I and J and such that X ≤ c Y .
If P and Q are unimodal with modes mode(X) and mode(Y ), respectively, then P(X ≥ mode(X)) ≤ P(Y ≥ mode(Y )).
Proof. We prove only the result for modes, the statement about anti-modes being proved analogously. To prove the statement we shall rely on Theorem 11.
Define F as the set of absolutely continuous unimodal distributions supported in some interval in R and T : F → R the functional defined by T(P) being equal to the unique mode of P, for every P ∈ F. By Theorem 11 it suffices to prove that T satisfies h(T(P)) ≥ T(P_h), for every P ∈ F, and h convex and increasing such that P_h ∈ F. For this purpose, choose f a continuous and unimodal version of the density of P, and denote, for notational simplicity, the unique mode by m. A simple computation yields that g(x) = f(h^{-1}(x))/h′(h^{-1}(x)) is a density for P_h. Since P_h has some continuous density with a unique mode and h is increasing and convex, g must be such a density. Denote the mode T(P_h) by m_h.
Since m is a mode of P it follows that f (m) ≥ f (h −1 (m ′ )) and, by the unimodality of P h , it follows that The conclusion now follows immediately from Theorem 11.
Similarly to Corollary 13, the previous result implies monotonicity properties for the probability of exceeding the mode or anti-mode for Beta distributions. For that, we must restrict ourselves to parameters a and b such that Beta(a, b) actually has a unique mode or anti-mode, that is to say, when a, b > 1 or a, b < 1, respectively. In either case the mode or anti-mode, respectively, is (a − 1)/(a + b − 2).
If a, b < 1 let anti-mode(X a,b ) = (a−1)/(a+b−2), then the mapping (a, b) → P(X a,b > anti-mode(X a,b )) is increasing in a and decreasing in b.
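The location (a − 1)/(a + b − 2) of the mode or anti-mode can be sanity-checked by a brute-force search over the unnormalised density x^{a−1}(1 − x)^{b−1}. This is a rough sketch under stated assumptions: the helper names, the midpoint grid, and the example parameters are chosen for illustration and are not taken from the paper.

```python
def beta_kernel(x, a, b):
    """Unnormalised Beta(a, b) density; the normalising constant does not move extrema."""
    return x ** (a - 1.0) * (1.0 - x) ** (b - 1.0)


def grid_extremum(a, b, n=20000, find_max=True):
    """Grid point in (0, 1) maximising (or minimising) the Beta kernel."""
    xs = [(i + 0.5) / n for i in range(n)]   # midpoints avoid the endpoints 0 and 1
    pick = max if find_max else min
    return pick(xs, key=lambda x: beta_kernel(x, a, b))


def closed_form(a, b):
    """The expression (a - 1) / (a + b - 2) from the text."""
    return (a - 1.0) / (a + b - 2.0)


if __name__ == "__main__":
    print(grid_extremum(3.0, 5.0), closed_form(3.0, 5.0))                    # mode, a, b > 1
    print(grid_extremum(0.5, 0.8, find_max=False), closed_form(0.5, 0.8))    # anti-mode, a, b < 1
```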
Recall that B ∼ Bin(n, p) if P(B = k) = \binom{n}{k} p^k (1 − p)^{n−k} or, equivalently, if it is a sum of n independent and identically distributed indicators that are equal to 1 with probability p. Using a link between the Beta and the Binomial distributions allows us to prove some monotonicity properties for the probabilities that a Binomial variable exceeds certain values close to its mean. As noted in the Introduction, the quantity P(B_{n,p} ≤ np), where B_{n,p} ∼ Bin(n, p), has garnered some interest recently. The map p → P(B_{n,p} ≤ np) is not monotone even when restricting to p = 0, 1/n, . . . , 1 − 1/n, 1, where np is an integer. Using our results we prove that slightly changing np restores monotonicity.
Proof. For each a, b > 0 let X_{a,b} ∼ Beta(a, b). It is well known that P(X_{k+1,n−k} ≥ p) = P(B_{n,p} ≤ k), for k = 0, . . . , n. The equality can for example be established by repeated integration by parts. As the distribution of X_{k+1,n−k} has mean (k + 1)/(n + 1) and mode k/(n − 1), it follows from Corollaries 13 and 16 that k → P(B_{n,(k+1)/(n+1)} ≥ k) is decreasing and k → P(B_{n,k/(n−1)} ≥ k) is increasing. Reparameterising in terms of p yields k = np + p − 1 and k = np − p, so the result follows.
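The identity P(X_{k+1,n−k} ≥ p) = P(B_{n,p} ≤ k) used in the proof can be checked numerically. The sketch below is an illustration under assumptions: it integrates the Beta density with a composite Simpson rule (our choice, not the integration-by-parts argument mentioned above), restricts to integer parameters so that the normalising constant is a ratio of factorials, and uses k = 0, . . . , n − 1 so that both Beta parameters are positive. All function names are hypothetical.

```python
from math import comb


def binom_cdf(k, n, p):
    """P(B <= k) for B ~ Bin(n, p)."""
    return sum(comb(n, j) * p ** j * (1.0 - p) ** (n - j) for j in range(k + 1))


def beta_tail_int(p, a, b, steps=2000):
    """P(X >= p) for X ~ Beta(a, b) with integer a, b >= 1, by composite Simpson."""
    # 1 / B(a, b) = (a + b - 1)! / ((a - 1)! (b - 1)!) = a * C(a + b - 1, a)
    const = a * comb(a + b - 1, a)

    def density(x):
        return const * x ** (a - 1) * (1.0 - x) ** (b - 1)

    h = (1.0 - p) / steps                    # steps must be even for Simpson's rule
    total = density(p) + density(1.0)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * density(p + i * h)
    return total * h / 3.0


if __name__ == "__main__":
    n, p = 10, 0.37
    for k in range(n):
        print(k, beta_tail_int(p, k + 1, n - k), binom_cdf(k, n, p))
```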

(Anti)mode-median-mean inequalities
If X a,b ∼ Beta(a, b) then the random variable 1−X a,b is distributed according to Beta(b, a). As the convex transform order is invariant with respect to translations, Theorem 9 implies that when a ≤ b we have that −X a,b ≤ c X a,b . Since the convex transform order orders only the underlying distribution the following definition due to van Zwet [23] is justified.
Definition 18 (Positive/negative skew). Let P be a probability distribution and X ∼ P a random variable with distribution P . We say that P is positively skewed if −X ≤ c X and that P is negatively skewed if X ≤ c −X.
Thus, according to this definition, the Beta distributions have positive skew when a ≤ b and negative skew when a ≥ b.
As noted by van Zwet [23] Definition 18 provides an intuitive condition for inequalities between the mode, median and mean to hold. We give an alternative proof of this fact, based on the results in the previous section. This alternative proof yields a similar inequality for the anti-mode.
Theorem 19. Let P be a positively skewed distribution.
If P is unimodal with mode m_0, then there exists a median m_1 of P such that m_0 ≤ m_1.
If P has finite mean m_2, then there exists a median m_1 of P such that m_1 ≤ m_2.
If P is uniantimodal with anti-mode m 3 , then there exists a median m 1 of P such that m 1 ≤ m 3 .
Proof. We prove only the first statement as the remaining ones are proved analogously. Let X be a random variable with distribution P and m_0 the mode of P. Then m_1 = sup{m | P(X ≤ m) ≤ 1/2} is a median of P. Since P is positively skewed it follows by Corollary 15, applied to −X ≤ c X, that P(X ≤ m_0) = P(−X ≥ −m_0) ≤ P(X ≥ m_0) = 1 − P(X ≤ m_0), so P(X ≤ m_0) ≤ 1/2 and hence m_0 ≤ m_1. For the second statement apply Corollary 12 instead of Corollary 15.
Having a median lying between the mode and mean is usually called satisfying the mode-median-mean inequality. Analogously we will say that a distribution satisfies the median-anti-mode inequality if it has a median smaller than its anti-mode.
As already noted before, when a ≤ b, the distribution Beta(a, b) is positively skewed. The following slight generalisation of the known result concerning the ordering of the mode, median, and mean of the Beta distribution is now immediate. Beta(a, b) satisfies the mode-median-mean inequality. If a ≤ b ≤ 1 then Beta(a, b) satisfies the median-mean and median-anti-mode inequalities.
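The mode-median-mean inequality for Beta distributions can be illustrated numerically for integer parameters. The sketch below is an assumption-laden illustration: it evaluates the Beta distribution function through the Beta-Binomial identity in the form P(X_{a,b} ≤ x) = P(B_{a+b−1,x} ≥ a), finds the median by bisection, and compares it with the mode (a − 1)/(a + b − 2) and the mean a/(a + b). The helper names and example parameters are not from the paper.

```python
from math import comb


def beta_cdf_int(x, a, b):
    """CDF of Beta(a, b) at x for positive integers a, b (binomial-sum form)."""
    n = a + b - 1
    return sum(comb(n, j) * x ** j * (1.0 - x) ** (n - j) for j in range(a, n + 1))


def beta_median_int(a, b, tol=1e-12):
    """Median of Beta(a, b) by bisection on the (strictly increasing) CDF."""
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        if beta_cdf_int(mid, a, b) < 0.5:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0


def mode_median_mean(a, b):
    """Return (mode, median, mean) for integers a, b > 1."""
    return (a - 1.0) / (a + b - 2.0), beta_median_int(a, b), a / (a + b)


if __name__ == "__main__":
    for a, b in ((2, 5), (3, 3), (2, 9)):
        print((a, b), mode_median_mean(a, b))
```

For the examples with a ≤ b the three values appear in the order mode, median, mean, with equality in the symmetric case a = b.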

Proofs
This section collects all the proofs related to establishing Theorems 7, 8 and 9, stated in Section 3. To improve the readability, results relevant to each Theorem are presented in a separate subsection.
Most of the proofs rely on keeping track of sign changes of various functions. Throughout S(x ∈ I → f (x)) = S(x → f (x)) = S(f (x)) = S(f ) ∈ S = {0, -, +, -+, +-, . . . } denotes the sequence of signs of a function f : I → R. Formal definitions, notation, and standard results concerning sign patterns can be found in Appendix A.
The following technical lemma summarises the basic strategy used throughout the proofs of the main results in the upcoming sections.
Lemma 21. For a, b, a′, b′, c > 0 and d < 1 denote by F and G the distribution functions of Beta(a, b) and Beta(a′, b′), and let ℓ(x) = cx + d.
≤ σ 1 · σ 2 · S(x ∈ I → p 4 (x)), where and ). By construction the first and third terms are just a single sign that coincides with the first and final sign of S(x ∈ I → F (x) − G(ℓ(x))) and can hence be dropped. This proves (2).

Stochastic dominance ordering
Before actually proving Theorem 7, we shall prove that the stochastic dominance is a necessary condition for ordering compactly supported distributions with respect to the star-shape transform or the convex transform orders. Although the result concerning the stochastic dominance is well established, we present a proof using sign patterns.
A first result concerns a simple relation between the star-shaped transform ordering and the stochastic dominance order.

Proposition 22. Let X ∼ P and Y ∼ Q be random variables with distributions P and Q supported on
Proof. Let F and G be the distribution functions of X and Y, respectively. As G^{-1}(F(x))/x is increasing, it follows that G^{-1}(F(x))/x ≤ G^{-1}(F(1)) = 1, thus G^{-1}(F(x)) ≤ x and, since G is increasing, F(x) ≤ G(x) for all x ∈ [0, 1].
Since the convex transform order implies the star-shape transform order, the following is immediate.
Corollary 23. Let X ∼ P and Y ∼ Q be random variables with distributions P and Q supported on [0, 1].
In the above statement the use of the unit interval is for notational convenience. Using invariance under orientation preserving affine transformations the statement generalises to distributions on any bounded interval.
Using the above we may now establish necessary conditions for one Beta distribution to be smaller than another according to the convex or star-shaped transform orders. We do this by characterising when one is smaller than the other according to stochastic dominance.
The proof is elementary, but since it illustrates well the style of the upcoming proofs we formulate it in terms of an analysis of sign patterns.
Proof of Theorem 7. Let F, G, f, and g be the distribution and density functions of Beta(a, b) and Beta(a′, b′). Denote H(x) = F(x) − G(x). We need to prove that S(H) = - if and only if a ≥ a′ and b ≤ b′. We have Since the case a = a′ and b = b′ is trivial, we may assume H is not constant 0 and so, since Beta(a, b) and that ≤ st is a partial order covers the remaining cases.
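The sign condition on H = F − G can be explored numerically for integer parameters. The following sketch is illustrative only: it evaluates the Beta distribution function through the binomial-sum form P(X_{a,b} ≤ x) = P(B_{a+b−1,x} ≥ a) and reports the largest and smallest values of H on a grid. The helper names and the parameter choices are assumptions.

```python
from math import comb


def beta_cdf_int(x, a, b):
    """CDF of Beta(a, b) at x for positive integers a, b (binomial-sum form)."""
    n = a + b - 1
    return sum(comb(n, j) * x ** j * (1.0 - x) ** (n - j) for j in range(a, n + 1))


def cdf_difference_range(a, b, a2, b2, n=1000):
    """Return (min, max) of H(x) = F(x) - G(x) on a uniform grid of [0, 1]."""
    values = [
        beta_cdf_int(i / n, a, b) - beta_cdf_int(i / n, a2, b2)
        for i in range(n + 1)
    ]
    return min(values), max(values)


if __name__ == "__main__":
    # a >= a' and b <= b': H never becomes positive on the grid.
    print(cdf_difference_range(3, 2, 2, 4))
    # a >= a' but b > b': H takes both signs, so the two laws are not ordered.
    print(cdf_difference_range(3, 4, 2, 2))
```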

Star-shape ordering
We now prove Theorem 8, showing that, apart from reversing the order direction, we find the same parameter characterisations as for the stochastic dominance.
Proof of Theorem 8. The necessity follows from Proposition 22 and Theorem 7. As for the sufficiency, it is enough to prove the statement when a > a′, b = b′ and when a = a′, b < b′. The general statement then follows by transitivity since then Beta(a, b) ≤ * Beta(a, b′) ≤ * Beta(a′, b′). Moreover, we may assume that either b ≤ b′ ≤ 1 or 1 ≤ b ≤ b′, since the remaining case, b ≤ 1 ≤ b′, follows again by transitivity.
Let F and G be the distribution functions of Beta(a, b) and Beta(a ′ , b ′ ), respectively, with f and g the corresponding density functions as in (1). By Proposition 38 we need to prove that for every c > 0 As the assumptions on the parameters are the same as in Theorem 7, it follows that G −1 (F (x)) ≤ x, meaning (7) is trivially satisfied when c ≥ 1. Moreover, both G −1 and F are increasing, so (7) is again trivial for c ≤ 0. It is therefore enough to consider c ∈ (0, 1). The conclusion follows by analysing three different cases. (6) gives Since This concludes the proof.
As will become apparent in the next section, this characterisation of the star-shape transform ordering is an essential first step towards proving the corresponding statement for the convex transform order.

Convex transform ordering
To characterise how the Beta distributions are ordered according to the convex transform order we will apply a strategy similar to the one used in previous sections. According to Proposition 38 we need to prove that for a ≥ a ′ and b ≤ b ′ the distribution functions F and G of Beta(a, b) and Beta(a ′ , b ′ ), respectively, satisfy for every affine function ℓ with positive slope. First we need an auxiliary result, generalising Theorem 6.1 in [4], which corresponds to taking x 0 = y 0 = 0 = inf I in the statement below.
The analogous conclusion holds considering the sign pattern +- and taking x_0 ≥ sup I satisfying ℓ(x_0) ≤ y_0.
Proof. The proof is not too difficult. The main idea is given graphically in Figure 1.
The above statement may be combined with the characterisation of how Beta distributions are ordered according to the star-shaped transform order that was established in the previous section. Doing so allows us to immediately take care of a number of affine ℓ in (8).

Corollary 25. Let F and G be the distribution functions of Beta(a, b) and Beta(a′, b′), respectively, and assume that a ≥ a′ and b ≤ b′. If ℓ is any affine function satisfying ℓ(0) ≥ 0 or ℓ(1) ∉ (0, 1) then (8) is satisfied.
Proof. We analyze three different cases.
For any c ∈ R we may apply this to c′ = 1/c, which gives
The proof of Theorem 9, establishing the convex transform ordering within the Beta family, is achieved through the analysis of several partial cases. For improved readability we present these in several lemmas.
Proof. Let F and G be the distribution functions of the Beta(a, b) and Beta(1, b) distributions, respectively, and f and g their densities. Taking into account Proposition 38 and Corollary 25, we need to show that, for every affine function ℓ(x) = cx + d satisfying ℓ(0) = d < 0 and ℓ(1) = c + d ∈ (0, 1) one has that (8) is satisfied. We need to separate the arguments into three cases.  (5) gives we have S(F (x) − G(ℓ(x))) ≤ + · σ 2 · + ≤ +-+ no matter the value of σ 2 .
The second lemma is similar, but covers the case where a ≤ 1.
Proof. Let F and G represent the distribution functions of Beta(1, b) and Beta(a, b), respectively. Note that the meanings of the symbols F and G are interchanged relative to their use in the proof of Lemma 26. Taking into account Proposition 38 and Corollary 25, we need to show that (8) holds for ℓ(x) = cx + d such that ℓ(0) = d < 0 and ℓ(1) = c + d ∈ (0, 1). This is equivalent to Reversing the roles of F and G the proof is now analogous to that of Lemma 26 except that a < 1 and we wish to establish S(x ∈ I → G(x) − F(ℓ*(x))) ≤ -+- for ℓ*(x) = c*x + d* with ℓ*(0) = d* ∈ (0, 1) and ℓ*(1) = c* + d* > 1 on the interval I = (0, (1 − d*)/c*).
Comparing the distributions for more general pairs of parameters a and a ′ requires separate analyses depending on whether b > 1 or b ∈ (0, 1).
Proof. We may assume, without loss of generality, that a − a′ < 1. Indeed, if a − a′ ≥ 1, one may choose, for sufficiently large N, a sequence a_0 = a, a_1, . . . , a_N = a′ such that a_{i−1} − a_i < 1 for all i = 1, . . . , N and apply transitivity to conclude Beta(a_0, b) ≤ c · · · ≤ c Beta(a_N, b).
Let F and G be the distribution functions of Beta(a, b) and Beta(a ′ , b), respectively. Based on Proposition 38 and Corollary 25, it is enough to prove that (8) holds for every ℓ(x) = cx + d such that ℓ(0) = d < 0 and ℓ(1) = c + d ∈ (0, 1). Using Lemma 21 we have for For Restricting to I we have that q 2 is decreasing and concave. Hence, as b ≤ 1, it follows that x ∈ I → −Cq 2 (x) 1−b is non-decreasing and convex. A simple computation yields q ′ 1 (x) = ((a − a ′ )cx + (a − 1)d)/(x 2−a ℓ(x) a ′ ) which has unique root at Case 2. a > a ′ > 1: A direct verification shows that x 0 ∈ I so that I 1 = (−d/c, x 0 ] and I 2 = (x 0 , 1) are well defined and non-empty. Since I = I 1 ∪ I 2 and I 1 < I 2 S(x ∈ I → q(x)) = S(x ∈ I 1 → q(x)) · S(x ∈ I 2 → q(x)).
We now state, without proof, a straightforward result, helpful for the conclusion of the final characterisation within the Beta family.
Proposition 30. Let X ∼ P and Y ∼ Q be random variables with some distributions P and Q, then X ≤ c Y if and only if 1 − Y ≤ c 1 − X.
We now have all the necessary ingredients to prove the main theorem.
Proof of Theorem 9. The necessity is a direct consequence of Theorem 7. The sufficiency follows from Lemmas 26, 27, 28, 29, and the transitivity of the convex transform order. First note that we obtain Beta(a, b) ≤ c Beta(a ′ , b), when a = a ′ (trivial), b > 1 (use Lemma 28), b ≤ 1 and either 1 > a > a ′ or a > a ′ > 1 (use Lemma 29). The order relation (10) also holds if b ≤ 1 and a > 1 > a ′ by combining Lemmas 26 and 27, since then Beta(a, b) ≤ c Beta(1, b) ≤ c Beta(a ′ , b). Using this and Proposition 30 we also have Beta(a ′ , b) ≤ c Beta(a ′ , b ′ ), concluding the proof.

A An algebra for sign variation
The main tool of all proofs concerning the ordering within the Beta family is the study of sign patterns of functions. While such techniques have a long tradition in probability theory, for our purposes it turns out to be computationally convenient to give a presentation slightly more algebraic as compared to what appears to be the convention, using a suitable monoid (see, for example, Jacobson [10]).
We can now describe the sign variations of a function in terms of the simple sign function.
Definition 35 (Sign patterns and finite sign variation). Given I ⊆ R, we say that a function f : I → R is of finite sign variation if the set {Sign(f (x 1 )) · Sign(f (x 2 )) · · · · · Sign(f (x n )) | n ∈ N, x 1 ≤ · · · ≤ x n ∈ I} has a (unique) maximal element in S. This maximal element is then denoted by S(x ∈ I → f (x)) and called the sign pattern of f . When unambiguous, we will abbreviate S(x ∈ I → f (x)) = S(x → f (x)) = S(f (x)) = S(f ) and write for readability S(f ) = S(f ).
The proposition below gives some standard rules of calculation for sign patterns which are straightforward to prove and used without explicit mention throughout the proofs.
Proposition 36. Let I ⊂ R and f, g : I → R be such that f and f − g are of finite sign variation.
2. For any J ≤ K such that I = J ∪ K one has S(x ∈ I → f (x)) = S(x ∈ J → f (x)) · S(x ∈ K → f (x)).
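To convey how sign patterns and the concatenation rule in item 2 behave, here is a small grid-based sketch. It rests on assumptions that the text above does not spell out: zeros are treated as neutral, equal adjacent signs are merged, and the product of two patterns merges a repeated sign at the junction. The functions `sign_pattern` and `concat` are hypothetical helpers, and a finite grid can only approximate the pattern of a function on an interval.

```python
def sign_pattern(f, xs):
    """Sign pattern of f along the increasing points xs, e.g. '+-+' (or '0' if f vanishes)."""
    pattern = []
    for x in xs:
        value = f(x)
        sign = "+" if value > 0 else "-" if value < 0 else ""
        # Assumption: zeros are skipped and a repeated sign is recorded only once.
        if sign and (not pattern or pattern[-1] != sign):
            pattern.append(sign)
    return "".join(pattern) or "0"


def concat(s, t):
    """Assumed product of patterns: '0' is neutral, equal signs at the junction merge."""
    if s == "0":
        return t
    if t == "0":
        return s
    return s + (t[1:] if s[-1] == t[0] else t)


def example(x):
    # Positive, then negative on (0.3, 0.7), then positive again.
    return (x - 0.3) * (x - 0.7)


if __name__ == "__main__":
    grid = [i / 100 for i in range(101)]
    left, right = grid[:50], grid[50:]          # J <= K with J union K = grid
    print(sign_pattern(example, grid))
    print(concat(sign_pattern(example, left), sign_pattern(example, right)))
```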