A new look at conditional probability with belief functions

We discuss repeatable experiments about which various agents may have different information. This information can vary from a full probabilistic description of the experiment in the sense that the probabilities of all outcomes are known to the agent, to having no information whatsoever, except the collection of possible outcomes. We argue that belief functions are very suitable for modeling the type of information we have in mind. We redevelop and rederive various notions of conditional belief functions, using a viewpoint of relative frequencies. We call the two main forms of conditioning contingent and necessary conditioning, respectively. The former is used when the conditioning event may also have not occurred, whereas the latter is used when it turns out that the event on which we condition occurs necessarily. Our approach unifies various notions in the literature into one conceptual framework, namely, the updated belief functions of Fagin and Halpern, the unnormalized conditional belief function of Smets, and the notions of updating and focusing as used by Dubois and Prade. We show that the original Dempster–Shafer definition of conditional belief functions cannot be interpreted directly in our framework. We give a number of examples illustrating our interpretation, as well as the differences between the various notions of conditioning.


MODELING PARTIAL INFORMATION
In this article, we are concerned with (in principle) repeatable experiments about which various agents may have different information. This information can vary from a full probabilistic description of the experiment in the sense that the probabilities of all outcomes are known to the agent, to having no information whatsoever, except the collection of possible outcomes. Between these two extremes, we have situations in which agents have partial information in a specific form. To illustrate what kind of situations we have in mind, here is a first introductory example.
Example 1. Suppose that an experiment results in a number between 1 and 10, inclusive. If the actual choice is uniformly at random, then a fully informed agent will assign probability 1/10 to each of the 10 possible outcomes.
However, an agent who is only told that the probability of an even number is 1/2 and that the probability of an odd number is also 1/2 has partial information about the experiment. This agent would assign probability 1/2 to the set {1, 3, 5, 7, 9} and probability 1/2 to the set {2, 4, 6, 8, 10}.
The most extreme situation is the one in which an agent receives no information whatsoever about how the choice of the number is made. For this agent, the only meaningful thing is to assign probability 1 to the full set of outcomes 1, 2, … , 10.
How would each of the agents in Example 1 assess the probability that the experiment results in a prime? When we think of probabilities as describing relative frequencies, then the first agent knows that when the experiment is repeated many times, roughly 4 out of 10 times the outcome will be prime.
As far as the second agent is concerned, for all he knows, the actual experiment could possibly consist of drawing a 9 with probability 1/2, and drawing a 6 with probability 1/2. Such an experiment would be consistent with his knowledge but would never result in a prime. Obviously, the third agent knows even less.
Therefore, depending on the information of the various agents, they draw different conclusions about the event of seeing a prime. If they were to repeatedly buy a bet on the event that the result is prime, and the bet would pay out 1 if the actual result is in fact prime, then the first agent may reasonably be willing to pay up to 4/10 for such a bet. For the second and third agents, however, it would not be an unreasonable position to decide not to pay anything at all for such a bet. Indeed, their knowledge is consistent with an experiment in which a prime never shows up, and betting any positive amount could lead to losing a lot of money when done repeatedly. In other words, their knowledge is incomplete, to the effect that betting on a prime might lead to a sure loss. Although they are of course not forced to abstain, this means that not betting is a reasonable position. Here is a second example.
Example 2. Consider a situation in which we have two coins and a die. Someone rolls the die and follows these instructions. If he rolls 1 or 2, he puts both coins with heads facing up. If he rolls 3 or 4, he puts both coins with tails facing up. If he rolls 5 or 6, we have no information about what he does. In particular, we do not even know whether in this case the outcome can be described by a probability distribution at all.
How can we describe our partial knowledge of this experiment? Let Ω = {(h, h), (h, t), (t, h), (t, t)}, where the first and second coordinates represent respectively the side of the first and second coins facing up (where "h" stands for heads and "t" stands for tails). The information we have can be summarized by assigning probability 1/3 to each of the sets {(h, h)}, {(t, t)}, and Ω. This assignment precisely captures the idea that we know that with probability 1/3, two heads will be up; with probability 1/3, two tails will be up; and with probability 1/3, we have no information whatsoever. Note that we do not assume that the process that leads to the position of the coins can be completely described by some underlying probability measure. We have no idea what happens when the die shows 5 or 6, and for all we know, the outcome may be deterministic in that case or may depend on nonrandom things we are not aware of. Assigning mass 1/3 to Ω simply represents total ignorance about what happens when the die shows 5 or 6.
Situations as described in Examples 1 and 2 can be conveniently modeled with belief functions, which we introduce now. The following definitions go back to Shafer (1976). We start with a mapping on the subsets of a finite outcome space Ω.

Definition 1. A basic belief assignment is a mapping m : 2^Ω → [0, 1] such that m(∅) = 0 and ∑_{A⊆Ω} m(A) = 1.
The mapping m naturally defines a probability measure P on the space 2^Ω via P(𝒞) = ∑_{C∈𝒞} m(C) for any collection 𝒞 of subsets of Ω. Note that P({C}) = m(C) for all C ⊆ Ω. Shafer defines the belief in A ⊂ Ω as the probability that when we choose a subset according to m, the outcome implies A.

Definition 2. The mapping Bel : 2^Ω → [0, 1] given by

Bel(A) = P({C ⊆ Ω : C ⊆ A}) = ∑_{C⊆A} m(C)

is called the belief function corresponding to the basic belief assignment m.
It is well known (see Shafer, 1976, for a proof) that m can be retrieved from Bel via

m(A) = ∑_{B⊆A} (−1)^{|A∖B|} Bel(B),

where | · | denotes cardinality. Note that Bel is a probability distribution if and only if m concentrates on atomic events.
Definition 3. The plausibility of an event A is defined as

Pl(A) = 1 − Bel(Aᶜ) = ∑_{S : S∩A≠∅} m(S).

The connection between these definitions and the examples above is as follows. Each agent in Example 1 has his own m, and therefore also his own belief function. The m corresponding to the first agent is just the uniform probability measure on the outcome space. The m of the second agent is given by m({1, 3, 5, 7, 9}) = m({2, 4, 6, 8, 10}) = 1/2 and m(A) = 0 for all other subsets of the outcome space. The third agent corresponds to the m that puts all probability mass on the set Ω itself. This latter belief assignment corresponds to having no information apart from the relevant outcome space.
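In computational terms, Bel and Pl are simple sums over the focal sets of m. The sketch below (function and variable names are ours) evaluates the belief and plausibility of the set of primes for the three agents of Example 1:

```python
def bel(m, A):
    # Bel(A): total mass of focal sets S with S ⊆ A
    return sum(p for S, p in m.items() if S <= frozenset(A))

def pl(m, A):
    # Pl(A): total mass of focal sets S with S ∩ A ≠ ∅
    return sum(p for S, p in m.items() if S & frozenset(A))

# The three agents of Example 1, each with his own basic belief assignment.
m1 = {frozenset({k}): 1 / 10 for k in range(1, 11)}   # full description
m2 = {frozenset({1, 3, 5, 7, 9}): 1 / 2,
      frozenset({2, 4, 6, 8, 10}): 1 / 2}             # parities only
m3 = {frozenset(range(1, 11)): 1.0}                   # total ignorance

primes = {2, 3, 5, 7}
print(bel(m1, primes))                   # 4/10: four of ten equally likely outcomes
print(bel(m2, primes), bel(m3, primes))  # 0 0: no focal set implies a prime
print(pl(m2, primes))                    # 1.0: every focal set is compatible with a prime
```

The three agents thus assign belief 4/10, 0, and 0, respectively, to seeing a prime, in line with the betting discussion above.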
How should we now interpret Bel( A) and Pl( A)? One interpretation of the belief in A is the probability that A is implied by the outcome of an experiment. This interpretation goes back to Pearl (1990), who calls this "the probability of provability." An application where this interpretation is used is Kerkvliet and Meester (2016), in a forensic context. Dubois and Prade (1992) offer, among others, the following interpretation. (We briefly comment on their other interpretations in the last section.) According to them, any random experiment whose outcomes cannot be precisely observed lends itself to a treatment by belief functions. They see m(S ) as the frequency of making observation S when the experiment is repeated, meaning the frequency that we learn precisely that the truth is in S, and nothing else.
This idea can be formalized as follows. Consider a random vector (X, Y), where X takes values in Ω and Y takes values in the subsets of Ω. The joint distribution Q of (X, Y) must be such that Q(X ∈ Y) = 1. We assume that we know the marginal distribution of Y, namely, the one given by m. What can we say about the marginal distribution of X? Clearly, the fact that we assume X ∈ Y implies that Q(Y ⊆ A) ≤ Q(X ∈ A) ≤ Q(Y ∩ A ≠ ∅), so that, writing Bel for the belief function corresponding to m, we have

Bel(A) ≤ Q(X ∈ A) ≤ Pl(A).    (6)

In Fagin and Halpern (1989), the collection of all probability distributions Q on Ω for which Bel(A) ≤ Q(A) ≤ Pl(A) for all A ⊆ Ω is called the collection of consistent probability distributions. The belief in an event A in this context boils down to the minimum probability of A within that collection (called a lower probability or a lower envelope). Indeed, it is not hard to see that for any A ⊂ Ω, there is a consistent probability measure for which the lower bound in (6) is an equality, and similarly for the upper bound. A useful way to think about the relation between the basic belief assignment m and the consistent probability distributions is to think of m(A) as the probability mass that is confined to A, which however can be distributed over the elements of A in any desired way.
Although the position in Dubois and Prade (1992) and Fagin and Halpern (1989) is elegant and useful, we prefer not to make the assumption that the pair (X, Y ) has a joint distribution. We prefer to only assume that there is an outcome (x, y) of the experiment, with x ∈ y and such that the second coordinate is described by a marginal distribution function determined by m. This really corresponds to having no information whatsoever about x other than that it is contained in y. Within this interpretation, Bel(A) can be interpreted in terms of a statement concerning relative frequencies, as we see below. We remark, though, that within the framework of consistent probability distributions that we described above, versions of all our theorems can also be formulated and proved, in very much the same way as we do.
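In this finite setting, the bounds in (6) can also be checked by brute force: the extreme points of the set of consistent distributions arise by allocating the mass of each focal set to a single one of its elements. A small sketch (names ours), applied to the second agent of Example 1:

```python
from itertools import product

def envelope(m, A):
    # Min and max of Q(A) over all ways of allocating each focal set's
    # mass to a single one of its elements (the extreme points of the
    # set of consistent distributions).
    focal = list(m.items())
    values = []
    for choice in product(*[sorted(S) for S, _ in focal]):
        q = {}
        for (S, mass), x in zip(focal, choice):
            q[x] = q.get(x, 0.0) + mass
        values.append(sum(p for x, p in q.items() if x in A))
    return min(values), max(values)

# Second agent of Example 1: only the parities have known probability.
m2 = {frozenset({1, 3, 5, 7, 9}): 0.5, frozenset({2, 4, 6, 8, 10}): 0.5}
lo, hi = envelope(m2, {2, 3, 5, 7})
print(lo, hi)   # 0.0 and 1.0, matching Bel = 0 and Pl = 1 for the primes
```

The minimum and maximum agree with the lower and upper bounds in (6) for the event that the outcome is prime.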
Suppose we repeat the experiment many times; that is, we have (x_1, C_1), (x_2, C_2), …, where x_i ∈ C_i for all i and where the distribution of the C_i is known and described by m. The C_i are observed. The x_i are not observed, and we only know that x_i ∈ C_i for all i. For C = (C_1, C_2, C_3, …), we write

𝒳(C) = {x = (x_1, x_2, x_3, …) : x_i ∈ C_i for all i}

for the collection of all realizations compatible with observing C. We write P^∞ for the infinite-dimensional product probability distribution capturing independent drawings of subsets of Ω according to P.
The following result expresses Bel(A) in terms of relative frequencies.

Theorem 1. For P^∞-almost all C, we have

inf_{x ∈ 𝒳(C)} lim inf_{n→∞} (1/n) ∑_{i=1}^{n} 1_A(x_i) = Bel(A).

Moreover, the infimum is attained P^∞-almost surely.

Although the x_i are not observed, we have that the minimum limit inferior of the relative frequencies of A over all realizations compatible with our observations is precisely Bel(A). This may seem a somewhat overly complicated way to characterize Bel(A), but by using this formula, we prepare for similar expressions concerning conditional beliefs in the next section.
Proof. Let C = (C_1, C_2, C_3, …) be a sequence of subsets of Ω, and let y ∈ 𝒳(C) be chosen such that y_i ∈ A if and only if C_i ⊆ A; this is possible since whenever C_i ⊄ A, we can take y_i ∈ C_i ∖ A. Then (1/n) ∑_{i=1}^{n} 1_A(y_i) ≤ (1/n) ∑_{i=1}^{n} 1_A(x_i) for all x ∈ 𝒳(C) and all n. Hence,

inf_{x ∈ 𝒳(C)} lim inf_{n→∞} (1/n) ∑_{i=1}^{n} 1_A(x_i) = lim inf_{n→∞} (1/n) ∑_{i=1}^{n} I_i,

where I_i = 1 if C_i ⊆ A and I_i = 0 otherwise. The strong law of large numbers tells us that the last lim inf is in fact almost surely a limit and equal to ∑_{S⊆A} m(S) = Bel(A). □
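Theorem 1 can be illustrated by simulation. In the sketch below (names ours), we draw focal sets according to the basic belief assignment of Example 2 and count how often C_i ⊆ A; this is the relative frequency obtained when an adversary places x_i outside A whenever possible, and it approaches Bel(A) = 2/3 for A = {(h, h), (t, t)}:

```python
import random

random.seed(0)

# Basic belief assignment of Example 2 (two coins placed by the die roller).
omega = frozenset({("h", "h"), ("h", "t"), ("t", "h"), ("t", "t")})
focal_sets = [frozenset({("h", "h")}), frozenset({("t", "t")}), omega]
masses = [1 / 3, 1 / 3, 1 / 3]

A = {("h", "h"), ("t", "t")}   # both coins show the same side

# The adversary places x_i outside A whenever C_i is not contained in A,
# so the lowest achievable relative frequency of A counts the draws C_i ⊆ A.
n = 20000
draws = random.choices(focal_sets, weights=masses, k=n)
freq = sum(C <= A for C in draws) / n
print(freq)   # close to Bel(A) = 2/3
```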
The type of information that can be described with belief functions in our interpretation seems rather natural and intuitive. The first goal of the present paper is to convince the reader that belief functions can be successfully used to model partial information of agents in many situations. To this end, we give more examples below and in Section 4.
The second and main goal of this paper is to introduce a new way of looking at conditional beliefs and to show that this new angle leads to satisfactory and understandable results in applications. In the literature, a number of suggestions for conditional (sometimes also called "updated") beliefs have been developed. In his original book, Shafer (1976) arrived at a notion of conditioning as a special case of Dempster's rule of combination. In Fagin and Halpern (1989), another notion of conditional belief was developed, based on the interpretation that belief functions express uncertainty about the underlying classical probability measure of an experiment. Dubois and Prade (1992) distinguish between "focusing" and "updating" and suggest different formulas for either type of conditioning; see also Jaffray (1990). Finally, there is Smets's unnormalized conditioning (see, e.g., Cuzzolin, 2016), developed in yet another context.
The main contribution of this paper is, in brief, as follows. We approach conditional belief via frequentistic reasoning on the level of the basic belief assignment, and we place the various notions of conditional belief into one conceptual framework. To this end, we draw a new distinction between contingent and necessary conditioning. When an agent receives information upon which he wants to update his beliefs, it matters whether the information is about the outcome of an experiment or about the structure of the experiment itself. The former case leads to contingent conditioning, whereas the latter case leads to necessary conditioning.
Although we develop the various notions of conditioning independently, it turns out that contingent conditioning leads to the same formula as the updated belief functions in Fagin and Halpern (1989) and the "focusing" version of updating belief functions in Dubois and Prade (1992). The notion of necessary conditioning turns out to lead to a formula that is a special case of Smets's unnormalized conditioning. In our interpretation, there seems to be no place for the original Dempster-Shafer (DS) notion of conditional beliefs as developed in Shafer (1976), and we will explain why this is so.
The next well-known example shows that the distinction between necessary and contingent conditioning is meaningful.
Example 3 (The Monty Hall problem). Consider three boxes, numbered 1, 2, and 3. One box contains a prize; the other two are empty. The candidate chooses one of the boxes but does not open it. Monty Hall subsequently opens another box that is empty, and the candidate is offered the possibility to swap to the third (unopened) box. Should he do this? We can model this with a belief function as follows. Assume that the box that contains the prize is chosen according to a classical probability distribution so that box i is chosen with probability p(i), where p(1) + p(2) + p(3) = 1. Further suppose that the initial choice of the candidate is uniformly distributed; he simply makes a random guess. Set Ω = X³, where X = {1, 2, 3}, and let P denote the box containing the prize, S the box initially chosen, and O the box opened by Monty Hall. The coordinates of ω = (ω_1, ω_2, ω_3) ∈ Ω correspond to P, S, and O, in that order. Now, we set the basic belief assignment from the viewpoint of the candidate as follows:

m({x} × {y} × (X ∖ {y})) = p(x)/3

for all x, y ∈ X. Indeed, the position of the prize and the choice of the candidate are independent, and all we know at this point is that Monty Hall will open a box that was not chosen by the candidate. It is now easy to compute that Bel(P = S) = 1/3, as it should be. Next, we want to incorporate the fact that Monty Hall opens a box that does not contain the prize. Hence, we want to condition on the event H = {O ≠ P}, and we are interested in the conditional belief in the event that S = P, conditioned on H. The key question now is whether Monty Hall knows the position of the prize and hence necessarily opens an empty box, or whether he is ignorant and happens to open an empty box. We will see below that the difference is significant and leads to different answers. Necessary conditioning leads to the classical answer, but contingent conditioning does not.
At the end of this introduction, we position our article in the context of epistemic interpretations of probability, that is, uncertainty having to do with partial knowledge rather than with objective facts. Epistemic probability cannot always be satisfactorily described with the classical axioms of Kolmogorov, something which has been recognized and confirmed by many researchers from such different disciplines as mathematics (Cohen, 1977; Marinacci, 1999; Royall, 1997; Shafer, 1976, 1981, 2008; Walley, 1991; Walley & Fine, 1982), legal science (Cohen, 1977), and philosophy (see Dietz, 2010; Fagin & Halpern, 1989; Haenni, 2009; Walley, 1991, and the references therein). These authors have argued that the classical axioms of probability are too restrictive, for various reasons. From Example 1, we already see that rational betting behavior cannot always be captured by a classical probability measure. Indeed, the belief functions of the second and third agents are not probability measures.
In the literature, several arguments are given why epistemic probability does not always comply with the usual axioms. To name just two of them: (1) Classical probability cannot distinguish between disbelief in A, by which we mean belief in Aᶜ, and lack of belief in A, since P(Aᶜ) = 1 − P(A) for any probability measure P; lack of supporting evidence for A should not necessarily be supporting evidence for Aᶜ. (2) It is impossible to model complete ignorance with classical probability. Often, one resorts to uniform prior probabilities, but this habit is not well founded (see, e.g., Royall, 1997; Shafer, 1976). Ignorance cannot be modeled with a uniform distribution.
The belief functions as originally developed by Shafer (1976) are one of several attempts to develop a notion of probability or belief that better suits the needs of epistemic interpretations of probability. Other approaches can be found in Cohen (1977) and Walley (1991). The approach of Walley (1991) is in fact related to belief functions, as follows: We interpret the belief in an event A as the amount of money an agent maximally wishes to pay for a bet on A. In Walley's formulation, these betting amounts are formalized into his lower previsions. We have shown elsewhere (Kerkvliet & Meester, 2016) that a belief function is in fact characterized as a lower prevision with one extra weak requirement of consistency.
In the next section, we develop contingent conditioning. In Section 3, we develop necessary conditioning, and in Section 4, we extensively discuss two examples illustrating the difference between classical, contingent, and necessary conditioning. After that, we explain how the classical DS conditioning relates to our approach, and we end with a concluding section in which we relate the various forms of conditioning to the existing notions mentioned above.

CONTINGENT CONDITIONING
In classical probability theory, one of the most convincing ways to think about conditional probability is to consider many repetitions of the same experiment. Conditioning on an event H then means that we simply ignore all outcomes that are not in H, and the conditional probability of an event A given H is then given by the relative frequency of A in the resulting subsequence.
This idea can be, and has been (Dubois & Prade, 1992; Fagin & Halpern, 1989), used in the context of belief functions as well. The problem, however, is that we do not always know whether A and/or H actually occurs, because we do not receive complete information about the outcome of the experiment. Indeed, when we receive the information that the truth is in S, or that S occurs, this may be compatible with both H and Hᶜ, and similarly for A.
However, we can, in the same spirit as Theorem 1, compute the largest lower bound of the limit inferior of the relative frequency of an event A relative to H, given the information we have. This then naturally leads to a definition of conditional belief.

Theorem 2. For all A, H ⊆ Ω with Pl(H) > 0, we have for P^∞-almost all C

inf_{x ∈ 𝒳(C)} lim inf_{n→∞} ( ∑_{i=1}^{n} 1_{A∩H}(x_i) ) / ( ∑_{i=1}^{n} 1_H(x_i) ) = Bel(A ∩ H) / ( Bel(A ∩ H) + Pl(H ∖ A) ).

Moreover, the infimum is attained P^∞-almost surely.
Proof. Let C = (C_1, C_2, C_3, …) be a sequence of subsets of Ω. Choose y ∈ 𝒳(C) such that y_i ∈ A ∩ H if and only if C_i ⊆ A ∩ H, and such that y_i ∈ Aᶜ ∩ H whenever C_i ∩ Aᶜ ∩ H ≠ ∅. We claim that

( ∑_{i=1}^{n} 1_{A∩H}(y_i) ) / ( ∑_{i=1}^{n} 1_H(y_i) ) ≤ ( ∑_{i=1}^{n} 1_{A∩H}(x_i) ) / ( ∑_{i=1}^{n} 1_H(x_i) )    (12)

for all x ∈ 𝒳(C) and for all n for which the denominators are nonzero. To see this, write 1_H = 1_{A∩H} + 1_{Aᶜ∩H} and note that a/(a + b) is nondecreasing in a ≥ 0 and nonincreasing in b ≥ 0. By construction, we have 1_{A∩H}(x_i) ≥ 1_{A∩H}(y_i) and 1_{Aᶜ∩H}(x_i) ≤ 1_{Aᶜ∩H}(y_i), and thus (12) holds. Hence, the infimum in the theorem is attained at y, and we can rewrite the left-hand side of (12) as

( (1/n) ∑_{i=1}^{n} 1_{A∩H}(y_i) ) / ( (1/n) ∑_{i=1}^{n} 1_{A∩H}(y_i) + (1/n) ∑_{i=1}^{n} 1_{H∖A}(y_i) ).

It follows from the law of large numbers that (1/n) ∑_{i=1}^{n} 1_{A∩H}(y_i) converges almost surely to Bel(A ∩ H) and that (1/n) ∑_{i=1}^{n} 1_{H∖A}(y_i) converges to Pl(H ∖ A). Hence, the whole expression in fact converges almost surely, with limit Bel(A ∩ H)/(Bel(A ∩ H) + Pl(H ∖ A)). □
This leads to the following definition.
Definition 4 (Contingent conditioning). Assume that Pl(H) > 0. We define the conditional belief in A given H by

Bel_H(A) = Bel(A ∩ H) / ( Bel(A ∩ H) + Pl(H ∖ A) ).    (16)
We make four remarks.

1. One can also directly arrive at (16) by replacing the lim inf and inf by lim sup and sup, respectively, in Theorem 2. The proof of this completely parallels the proof of Theorem 2.

2. Note that if Bel is a probability distribution, we find Bel_H(A) = Bel(A ∩ H)/(Bel(A ∩ H) + Bel(H ∖ A)) = Bel(A ∩ H)/Bel(H), which means that this notion of conditioning generalizes the classical notion.

3. The formula in Definition 4 was obtained by Fagin and Halpern (1989) and by Dubois and Prade (1992). In their interpretation, a belief function describes the uncertainty we have about an underlying probability distribution. We, on the other hand, make no assumption of such an underlying probability distribution, and our derivation is very different from and more elementary than theirs.

4. In Fagin and Halpern (1989), it was shown that Bel_H is again a belief function if Bel(H) > 0. As such, it corresponds in a one-to-one way to a basic belief assignment, which we denote by m_H.
We next give two examples. In the first example, we see that contingent conditioning is not commutative. The second example shows that the law of total probability does not hold. Both issues have been discussed in the literature (see, e.g., Pearl, 1990).
Example 4. Suppose we condition on H = {a, b, c} in a contingent way, and compute the conditional beliefs Bel_H({a}) and Bel_H({b, c}). From these beliefs, we can reconstruct the corresponding conditional basic belief assignment m_H. Note that the belief in {b, c} is higher after conditioning, whereas the belief in {a} remained the same. Obviously, this cannot occur if Bel is a probability distribution: In that case, the ratios between the probabilities remain the same after conditioning. The reason that it happens in this example is that there is a chance of 1/3 on {c, d}, which is consistent with both H and Hᶜ. Precisely for that reason, as described before, the beliefs in the various subsets of H need not change by the same factor. If in general one wants to condition on two events H and J, the most natural thing to do is to condition on H ∩ J, and not on H and J in any given order.
Example 5 (Example 2 continued). The outcome space is given by Ω = {(h, h), (h, t), (t, h), (t, t)}, where the first and second coordinates represent respectively the side of the first and second coins facing up (here, "h" stands for heads and "t" stands for tails). Let A = {(t, t), (h, h)} be the event that both coins have the same side facing up. It is now easy to check that Bel(A) = 2/3. Now suppose that we are given the information that the second coin is heads. We write H = {(h, h), (t, h)} and compute Bel_H(A) = Bel(A ∩ H)/(Bel(A ∩ H) + Pl(H ∖ A)) = (1/3)/(1/3 + 1/3) = 1/2. In exactly the same way, we compute Bel_{Hᶜ}(A) = 1/2. This might be somewhat surprising: If the belief is 1/2 both in the case where we learn that the second coin is heads and in the case where we learn that it is tails, one might think that the unconditional belief should also be 1/2, because we know beforehand that the second coin is either heads or tails. However, we must be very careful with the interpretation here. The quantity Bel_H(A) gives the belief in A contingent upon H, and Bel_{Hᶜ}(A) gives the belief in A contingent upon Hᶜ. While it is true that either H occurs or Hᶜ occurs, it is clearly not true that either every time that H occurs the outcome is ignored, or every time that Hᶜ occurs the outcome is ignored. Therefore, we do not know at all beforehand that the belief in A should be considered either contingent upon the second coin being heads or contingent upon the second coin being tails. Hence, the law of total probability does not hold for contingent conditioning.
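The computations in this example are mechanical; here is a sketch (names ours) of contingent conditioning as in Definition 4, applied to Example 5:

```python
def bel(m, A):
    # Bel(A): total mass of focal sets contained in A
    return sum(p for S, p in m.items() if S <= frozenset(A))

def pl(m, A):
    # Pl(A): total mass of focal sets intersecting A
    return sum(p for S, p in m.items() if S & frozenset(A))

def bel_contingent(m, A, H):
    # Definition 4: Bel_H(A) = Bel(A ∩ H) / (Bel(A ∩ H) + Pl(H \ A)),
    # defined whenever Pl(H) > 0.
    A, H = frozenset(A), frozenset(H)
    num = bel(m, A & H)
    return num / (num + pl(m, H - A))

omega = {("h", "h"), ("h", "t"), ("t", "h"), ("t", "t")}
m = {frozenset({("h", "h")}): 1 / 3,
     frozenset({("t", "t")}): 1 / 3,
     frozenset(omega): 1 / 3}

A = {("h", "h"), ("t", "t")}   # both coins show the same side
H = {("h", "h"), ("t", "h")}   # the second coin is heads
print(bel(m, A))                        # 2/3
print(bel_contingent(m, A, H))          # 1/2
print(bel_contingent(m, A, omega - H))  # 1/2, although Bel(A) = 2/3
```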

NECESSARY CONDITIONING
Now, we consider updating according to the information that some set H ⊆ Ω necessarily occurs. This must be understood in the context of the chance experiment: We learn that in every repetition of the experiment, H occurs. This is information about the experiment rather than about an individual outcome. This means that this type of conditioning can only be used for updating if every outcome C ⊆ Ω of the experiment that has positive probability (i.e., P({C}) > 0) is consistent with H, that is, must satisfy C ∩ H ≠ ∅. Hence, we must have Pl(H) = 1. The viewpoint of the limit inferior of relative frequencies, in the same vein as in Theorems 1 and 2, is again useful here. To this end, for H ⊆ Ω and C = (C_1, C_2, C_3, …) as before, we set

𝒳_H(C) = {x ∈ 𝒳(C) : x_i ∈ H for all i}.

That is, 𝒳_H(C) is the collection of realizations compatible with our observation C and for which H always occurs. Note that 𝒳_H(C) = ∅ if C_i ⊆ Hᶜ for some i.

Theorem 3. Assume Pl(H) = 1. Then, for P^∞-almost all C,

inf_{x ∈ 𝒳_H(C)} lim inf_{n→∞} (1/n) ∑_{i=1}^{n} 1_A(x_i) = Bel(A ∪ Hᶜ).

Moreover, the infimum is attained P^∞-almost surely.

Proof. Since Pl(H) = 1, every C_i with positive probability intersects H, so that 𝒳_H(C) ≠ ∅ almost surely. Choose y ∈ 𝒳_H(C) such that y_i ∈ A if and only if C_i ⊆ A ∪ Hᶜ; this is possible since if C_i ⊆ A ∪ Hᶜ, then C_i ∩ H ⊆ A, and otherwise C_i ∩ H ∖ A ≠ ∅. Then

(1/n) ∑_{i=1}^{n} 1_A(y_i) ≤ (1/n) ∑_{i=1}^{n} 1_A(x_i)

for all x ∈ 𝒳_H(C) and for all n, so that the infimum in the theorem is attained at y. Because y_i ∈ A precisely when C_i ⊆ A ∪ Hᶜ, the law of large numbers implies that the last limit inferior is in fact a limit almost surely and equal to Bel(A ∪ Hᶜ). □
This leads to the following definition.
Definition 5 (Necessary conditioning). Assume that Pl(H) = 1. We define

Bel_{H,nec}(A) = Bel(A ∪ Hᶜ).

We make some remarks. First of all, our notion of necessary conditioning is very close to the notion of Smets's unnormalized conditioning. Ours is technically more restrictive because we insist that Pl(H) = 1 when using it. In our frequentistic interpretation, we need this technical condition, as is clear also from the proof of Theorem 3.
Contrary to contingent conditioning, necessary conditioning is commutative, which seems to be what intuition requires. To show this, for H, J ⊆ Ω such that Pl(H) = Pl(J) = 1, we have

(Bel_{H,nec})_{J,nec}(A) = Bel_{H,nec}(A ∪ Jᶜ) = Bel(A ∪ Hᶜ ∪ Jᶜ) = Bel_{J,nec}(A ∪ Hᶜ) = (Bel_{J,nec})_{H,nec}(A).

Similarly to contingent conditioning, necessary conditioning does not satisfy the law of total probability, as the following example, in the spirit of Example 2, illustrates.
Example 6. We set Ω = {(h, h), (h, t), (t, h), (t, t)} as before, and define a basic belief assignment by m({(h, h), (h, t)}) = 1/2 and m({(t, h), (t, t)}) = 1/2. Let A = {(h, h), (t, t)} be the event that the two coins have the same face showing up. We compute Bel(A) = 0, since neither focal set is contained in A. Next, we condition on the event E_1 = {(h, h), (t, h)} that the second coin is heads. The sets {(h, h), (h, t)} and {(t, h), (t, t)} are the only outcomes with positive basic belief assignment, and because they are both consistent with E_1, necessary conditioning applies. Intersected with E_1, they become respectively {(h, h)} and {(t, h)}, so we find Bel_{E_1,nec}(A) = 1/2. In exactly the same way, we find Bel_{E_2,nec}(A) = 1/2 for the event E_2 = E_1ᶜ that the second coin is tails, even though Bel(A) = 0.
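A sketch (names ours) of necessary conditioning as in Definition 5, assuming the two focal sets of this example carry mass 1/2 each:

```python
def bel(m, A):
    # Bel(A): total mass of focal sets contained in A
    return sum(p for S, p in m.items() if S <= frozenset(A))

def pl(m, A):
    # Pl(A): total mass of focal sets intersecting A
    return sum(p for S, p in m.items() if S & frozenset(A))

def bel_necessary(m, A, H):
    # Definition 5: Bel_{H,nec}(A) = Bel(A ∪ H^c); requires Pl(H) = 1.
    omega = frozenset().union(*m)   # union of all focal sets
    assert pl(m, H) == 1, "necessary conditioning needs Pl(H) = 1"
    return bel(m, frozenset(A) | (omega - frozenset(H)))

# Example 6, assuming mass 1/2 on each of the two focal sets.
m = {frozenset({("h", "h"), ("h", "t")}): 0.5,
     frozenset({("t", "h"), ("t", "t")}): 0.5}

A = {("h", "h"), ("t", "t")}    # the two coins show the same face
E1 = {("h", "h"), ("t", "h")}   # second coin heads
E2 = {("h", "t"), ("t", "t")}   # second coin tails
print(bel(m, A))                # 0
print(bel_necessary(m, A, E1))  # 1/2
print(bel_necessary(m, A, E2))  # 1/2: total probability fails again
```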

COMPARING CONTINGENT AND NECESSARY CONDITIONING: TWO FURTHER EXTENSIVE EXAMPLES
In the following examples, we illustrate the difference between conditioning in the classical theory, contingent conditioning, and necessary conditioning. We also discuss under which circumstances any of these options should be applied.

Example 7.
Consider N + 1 individuals, labeled 1, 2, … , N + 1 forming the space X = {1, 2, … , N + 1}, an empty vase, and a fair coin. We flip the coin. If heads comes up, person number 1 puts a black ball in the vase while all other persons put a white ball in the vase. If tails comes up, then everybody puts a black ball in the vase. Next, a ball is drawn from the vase, and we are told that the drawn ball is black. Is the drawn ball put in the vase by person number 1? As we will see below, our assessment of the answer to this question will depend on the (lack of) information we have about how the black ball was drawn. In particular, we cannot assume from the outset that the ball was chosen uniformly at random, and we should keep in mind that the draw of the ball may depend on the coin flip or on other things.
We work in the space Ω = {0, 1} × {B, W}^{N+1} × X. The first coordinate refers to the outcome of the coin flip, the next N + 1 coordinates refer to the color of the balls put in the vase by persons 1, 2, …, N + 1 (in that order), and the last coordinate refers to the origin of the drawn ball. We denote the coordinates by A, Γ_1, …, Γ_{N+1}, and F, respectively.
Case 1 (Classical situation). Suppose that we are informed that the ball drawn from the vase is chosen randomly, with each ball being equally likely. We are now in a classical situation, which can be satisfactorily described with probabilities. To allow comparison with the upcoming cases, we however prefer to give the basic belief assignment m_c corresponding to this case. This basic belief assignment is given by

m_c({(0, B, B, …, B, i)}) = m_c({(1, B, W, …, W, i)}) = 1/(2(N + 1)) for all i = 1, …, N + 1.

Because m_c concentrates on atomic events, the corresponding belief function Bel_c is a probability distribution.
Write H = {Γ_F = B}, the event that the drawn ball is black, and E = {F = 1}, the event that the drawn ball came from person 1. We are now interested in (Bel_c)_H(E) (which is the same as the classical probability of E given H because Bel_c is a probability distribution). We compute

(Bel_c)_H(E) = Bel_c(E ∩ H) / ( Bel_c(E ∩ H) + Pl_c(H ∖ E) ) = (1/(N + 1)) / ( 1/(N + 1) + N/(2(N + 1)) ) = 2/(2 + N).

Case 2 (Necessary conditioning). In Case 1, the procedure we had to model included the fact that the ball from the vase was randomly drawn. Without this information, the basic belief assignment m we start out with has the following form:

m({(0, B, B, …, B)} × X) = 1/2    (41)

and

m({(1, B, W, …, W)} × X) = 1/2.    (42)

Indeed, in this case, we have no information whatsoever about the way the ball is drawn from the vase. Now suppose that we do receive such information, namely, that the procedure is such that the drawn ball has to be black. This is, therefore, information that H occurs necessarily, and we have to compute Bel_{H,nec}(E). This is by definition equal to Bel(E ∪ Hᶜ), that is, the belief in the event that if F ≠ 1, then Γ_F = W. It is easy to see that only the set in (42) implies this event, and therefore, the belief in this event is equal to 1/2.

Case 3 (Contingent conditioning). Suppose now that we start out as in Case 2, that is, with the same basic belief assignment. This time, however, we do not get the information that the drawn ball is necessarily black, but simply learn that the outcome is a black ball. In this case, we therefore need to condition in a contingent way. Starting out with the same belief function as in Case 2, it is easy to see that neither of the sets in (41) and (42) is contained in E ∩ H, and therefore, we find that Bel_H(E) = 0.

Discussion.
We have distinguished between three cases, obtaining three different answers: 2/(2 + N), 1/2, and 0. These are different answers that correspond to different situations, and we next argue that each of these answers is reasonable given the information we have.
In the first (classical) case, we simply draw a ball uniformly at random, and we asked for the probability that the ball came from person 1 given that it was black. It is not a surprise that the answer tends to 0 as N gets larger. Indeed, most instances where the drawn ball is black will come from a situation in which A = 0, in which case the probability that the ball came from person 1 is only 1/(N + 1).
In Case 2, we are sure that the drawn ball will be black. However, only when A = 1, are we sure that the drawn ball did come from 1. Indeed, when A = 0, everybody put a black ball in the vase. Because belief functions correspond to adding up the basic beliefs of sets which imply the target set, the answer 1/2 is completely reasonable. Indeed, for all we know, it might be the case that only when A = 1 the black ball coming from 1 is chosen, and when A = 0, another black ball is chosen. There is no way we can rule out this scenario given the information we receive. Therefore, we cannot be sure that the relative frequency (when the procedure would be repeated many times) will be larger than 1/2, and assigning conditional belief 1/2 to E is completely reasonable.
The situation in Case 3 is very different and can be compared with that in Case 1. Indeed, in both Cases 1 and 3, the concept of contingent conditioning is used. In Case 1, however, we have much more information about the way the procedure is carried out. In Case 3, we do not know how the balls are drawn, and we cannot exclude the possibility that the procedure is such that it will never be the case that a black ball coming from Person 1 is drawn. Hence, it is completely reasonable not to have any conditional belief in E.
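The three answers can be checked mechanically for any fixed N. The sketch below (names ours) encodes the basic belief assignments of Cases 1-3 with N = 3 and evaluates both conditioning rules:

```python
N = 3
X = range(1, N + 2)   # persons 1, ..., N + 1

def bel(m, A):
    # Bel(A): total mass of focal sets contained in A
    return sum(p for S, p in m.items() if S <= A)

def pl(m, A):
    # Pl(A): total mass of focal sets intersecting A
    return sum(p for S, p in m.items() if S & A)

# Outcomes are triples (a, colors, f): coin flip, ball colors, drawn ball.
all_black = ("B",) * (N + 1)
one_black = ("B",) + ("W",) * N
configs = [(0, all_black), (1, one_black)]

def outcomes(pred):
    return frozenset((a, c, f) for a, c in configs for f in X if pred(a, c, f))

omega = outcomes(lambda a, c, f: True)
H = outcomes(lambda a, c, f: c[f - 1] == "B")   # drawn ball is black
E = outcomes(lambda a, c, f: f == 1)            # ball came from person 1

# Case 1: the ball is drawn uniformly at random (atomic focal sets).
m_c = {frozenset({(a, c, f)}): 1 / (2 * (N + 1)) for a, c in configs for f in X}

# Cases 2 and 3: nothing is known about how the ball is drawn.
m = {frozenset((a, c, f) for f in X): 0.5 for a, c in configs}

def bel_contingent(m, A, H):
    num = bel(m, A & H)
    return num / (num + pl(m, H - A))

def bel_necessary(m, A, H):
    return bel(m, A | (omega - H))

print(bel_contingent(m_c, E, H))   # Case 1: 2 / (2 + N)
print(bel_necessary(m, E, H))      # Case 2: 1/2
print(bel_contingent(m, E, H))     # Case 3: 0
```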
Example 8 (The Monty Hall problem, Example 3 continued). We recall the situation. We assume that the probability that the prize is in box i is given by p(i) and that the initial choice of the candidate is uniformly distributed. Set Ω = X^3, where X = {1, 2, 3}, and let P denote the box containing the prize, S the box initially chosen, and O the box opened by Monty Hall. The coordinates of ω = (ω_1, ω_2, ω_3) ∈ Ω correspond to P, S, and O, in that order. Now we set our basic belief assignment as follows:

m({x} × {y} × (X \ {y})) = p(x)/3 for all x, y ∈ X. (43)

Indeed, the position of the prize and the choice of the candidate are independent, and all we know at this point is that Monty Hall will open a box that was not chosen by the candidate. Next, we want to condition on H = {O ≠ P}, the event that Monty Hall opens an empty box. What is the conditioned belief in A = {P = S}?
First, we consider contingent conditioning. For this, we need to compute Bel(A ∩ H) and Pl(H ∩ A^c). This is not difficult. The basic belief assignment in (43)
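As a check on this computation, here is a Python sketch. The mass function encodes our reading of (43) from the surrounding description (prize position independent of the uniform choice; Monty Hall opens some box other than the one chosen), so treat the focal sets m({x} × {y} × (X \ {y})) = p(x)/3 as an assumption. With p uniform, it yields Bel(A ∩ H) = 1/3, Pl(H ∩ A^c) = 2/3, and hence contingent conditional belief 1/3 in A = {P = S}.

```python
from fractions import Fraction
from itertools import product

X = (1, 2, 3)
OMEGA = frozenset(product(X, repeat=3))  # coordinates: (P, S, O)

# Assumed reading of (43): mass p(x)/3 on {x} x {y} x (X \ {y}),
# i.e. prize at x with prob p(x), uniform choice y, Monty opens a box != y.
p = {x: Fraction(1, 3) for x in X}  # uniform prize distribution
m = {}
for x, y in product(X, X):
    focal = frozenset((x, y, o) for o in X if o != y)
    m[focal] = p[x] * Fraction(1, 3)

def bel(A):
    return sum((mass for S, mass in m.items() if S <= A), Fraction(0))

def pl(A):
    return sum((mass for S, mass in m.items() if S & A), Fraction(0))

A = frozenset(w for w in OMEGA if w[0] == w[1])  # P = S
H = frozenset(w for w in OMEGA if w[2] != w[0])  # O != P (empty box opened)

num = bel(A & H)                 # 1/3: only the focal sets with x = y fit
den = num + pl((OMEGA - A) & H)  # 1/3 + 2/3 = 1
print(num / den)                 # contingent Bel_H(A) = 1/3
```

The value 1/3 for the belief that sticking with the initial choice wins agrees with the familiar classical analysis of the problem.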

DS CONDITIONING
The theory of belief functions is detailed and subtle enough as to allow for various ways of conditioning. Which conditioning (or updating) rules are appropriate in a given context may depend on the interpretation one has in mind. In our interpretation, a belief function refers to a (in principle) repeatable experiment about which an agent has some, but typically not all, information.
When the agent does receive information, then it is a reasonable question to know whether or not the information is about the whole experiment itself, or only about a particular outcome of one given experiment. It is very different to learn that the outcome will always be a black ball because that is how the experiment is set up, than to learn that this time the color of the ball happens to be black. Necessary conditioning leads to adapting the basic belief assignment: The new information reveals that the experiment is actually set up differently from what the agent knew or thought before, and the new basic belief assignment accounts for precisely that. There is some transfer of probability mass taking the new information into account.
Contingent conditioning, on the other hand, applies when the agent receives information about an outcome of the experiment. The agent accounts for this by investigating what this new information means in terms of relative frequencies. As such, the new conditional beliefs are obtained by rescaling, analogous to what happens when we perform conditioning in classical probability theory. Indeed, in classical probability theory, P(A|B) is the frequency of A in the sequence of outcomes where B occurs so that outcomes that do not give B are discarded. This is what happens in contingent conditioning as well.
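The rescaling view of classical conditioning can be made concrete by enumeration. The dice example below is our own illustration, not from the text: discarding all outcomes where B fails and taking the relative frequency of A among the remainder gives exactly P(A ∩ B)/P(B).

```python
from fractions import Fraction
from itertools import product

# Two fair dice; all 36 outcomes are equally likely.
outcomes = list(product(range(1, 7), repeat=2))

A = lambda o: o[0] == o[1]        # doubles
B = lambda o: o[0] + o[1] >= 10   # sum at least 10

# Conditioning as discarding: keep only outcomes where B occurs,
# then take the relative frequency of A among what remains.
kept = [o for o in outcomes if B(o)]
p_A_given_B = Fraction(sum(A(o) for o in kept), len(kept))

# The same number via the usual ratio P(A ∩ B) / P(B).
ratio = Fraction(sum(A(o) and B(o) for o in outcomes),
                 sum(B(o) for o in outcomes))

print(p_A_given_B, ratio)  # 1/3 1/3
```

Contingent conditioning generalizes exactly this discard-and-rescale step to the belief-function setting.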
However, there is one major notion of conditioning in the theory of belief functions that we have not introduced so far, namely, the original DS way of conditioning introduced in Shafer (1976). We now discuss this notion of conditioning and place it into our context and interpretation.
Consider a basic belief assignment m and corresponding belief function Bel. Suppose that we want to condition on the event H. In the DS way of conditioning, one proceeds by transferring the basic belief assignment of a set A to the set A ∩ H, discarding the mass of sets for which this intersection is empty. After this, normalization takes place by dividing by Pl(H) = 1 − Bel(H^c), so that the sum of the new basic belief assignments is 1 again. It is not difficult to check that this leads to the belief function Bel_{H,DS} given by

Bel_{H,DS}(A) = (Bel(A ∪ H^c) − Bel(H^c)) / (1 − Bel(H^c)),

with corresponding plausibility

Pl_{H,DS}(A) = Pl(A ∩ H) / Pl(H).

When we compare DS conditioning to our notions of contingent and necessary conditioning, we see that DS conditioning combines ingredients from both. The transfer of basic belief assignment from A to A ∩ H is a "necessary component" in the sense that the basic belief assignments are different from before. However, the subsequent rescaling is a "contingent component" in the same vein as classical and contingent conditioning. In fact, it is precisely contingent conditioning after the preceding transfer of mass in the necessary component. As such, DS conditioning is, from our point of view, a hybrid form of conditioning that is difficult to interpret in our context because it is neither necessary nor contingent, sharing characteristics with both.
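The two-step recipe (transfer mass from A to A ∩ H, then rescale by the surviving mass Pl(H) = 1 − Bel(H^c)) is easy to implement and to check against the closed form Bel_{H,DS}(A) = (Bel(A ∪ H^c) − Bel(H^c)) / (1 − Bel(H^c)). The toy mass function below is our own illustration, chosen so that the normalization step is actually exercised.

```python
from fractions import Fraction

OMEGA = frozenset("abcd")

def bel(m, A):
    """Belief: total mass of focal sets contained in A."""
    return sum((mass for S, mass in m.items() if S <= A), Fraction(0))

def ds_condition(m, H):
    """DS conditioning: move m(S) to S ∩ H, drop empty intersections,
    then divide by the surviving mass Pl(H) = 1 - Bel(H^c)."""
    new = {}
    for S, mass in m.items():
        T = S & H
        if T:
            new[T] = new.get(T, Fraction(0)) + mass
    total = sum(new.values())  # equals Pl(H)
    return {T: mass / total for T, mass in new.items()}

# Toy example: half the mass ({b,d}) misses H entirely, forcing rescaling.
m = {frozenset("ab"): Fraction(1, 2), frozenset("bd"): Fraction(1, 2)}
H = frozenset("ac")
m_ds = ds_condition(m, H)

# Closed form for comparison.
A = frozenset("a")
Hc = OMEGA - H
closed = (bel(m, A | Hc) - bel(m, Hc)) / (1 - bel(m, Hc))
print(bel(m_ds, A), closed)  # 1 1
```

The mass transfer is the "necessary component" and the final division is the "contingent component" discussed above; the code makes the two steps visibly separate.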
To reinforce this point, DS conditioning can sometimes be shown to be a concatenation of necessary and contingent conditioning: first applying necessary conditioning (on a suitable event) and then contingent conditioning recovers Bel_{H,DS}.

DISCUSSION AND CONCLUSIONS
In this paper, we made a distinction between what we called necessary and contingent conditioning. Compared to the focusing and updating of Dubois and Prade (1992), our caesura is slightly different. Focusing, in their setup, is frequency based, as is our contingent conditioning. Because they implicitly use the fact that the outcome could have been different, it is perhaps no surprise that they arrive at the same formula as our contingent conditioning. To be precise, they consider the same collection of consistent probability distributions as the one in Fagin and Halpern (1989), which we discussed before, that is, the collection of all probability distributions P for which Bel(A) ≤ P(A) ≤ Pl(A) for all A ⊆ Ω. They explain that their focusing form of conditional belief in A given H can be seen as the infimum of P(A|H), where the infimum runs over all consistent P. This is in line with the arguments in Fagin and Halpern (1989).
The updating interpretation in Dubois and Prade (1992) seems to refer to an experiment that takes place only once; the example they give is that a friend happens to give some information about the outcome of an experiment to the agent in question. If the situation is, at least in principle, repeatable, then a frequency-based interpretation still makes sense. For instance, the Monty Hall Example 3 might be of this type. The friend (or Monty Hall) may give information either about this particular outcome or about the way the experiment is set up. In the first case, contingent conditioning still seems meaningful, and in the second case, necessary conditioning is appropriate. Hence, our caesura is different from theirs. We ask whether the information is necessarily true or contingently true, and this question seems to be relevant in all cases mentioned in Dubois and Prade (1992).
For their updating type of conditioning, Dubois and Prade arrive at the original DS form of conditioning. This form of conditioning can be derived from a special case of Dempster's rule of combination and essentially boils down to assigning the mass m(S) that was originally assigned to the set S to the set S ∩ H when conditioning on H, followed by proper normalization.
We have explained why we think that this form of conditioning is difficult to interpret in a frequentist framework such as ours, and with our interpretation of belief functions. Dubois and Prade seem to realize this when they remark (p. 303): "Hence, as an updating rule in the frequentist framework, (5) makes sense only if [...] the piece of evidence B does not contradict the set-valued statistics on Ω." (Here, equation (5) is the DS form of conditioning.) This remark is essentially the same as our earlier remark that in the definition of necessary conditioning on H, we need to assume that Pl(H) = 1.
Nevertheless, DS conditioning does pop up as a certain mixture of necessary and contingent conditioning. Given that the very definition of DS conditioning is somewhat hybrid (as we have argued), this is perhaps not a big surprise. This form of conditioning is interpretable in special circumstances, but not in all.
In this paper, we have not used Dempster's combination rule in our derivation or motivation. Instead, we have only used frequency-based ideas, which appears reasonable given our interpretation of repeatable experiments about which an agent has limited information. We see the fact that our derivations do not depend on Dempster's rule as an advantage. The rule is somewhat controversial, and we thought it best not to immediately lose those readers who reject it. If rules of conditioning can be derived from first principles (as we do here), then we prefer that over using a rule that not all researchers endorse.
Finally, our new caesura of contingent versus necessary conditioning seems to be applicable in many situations and seems to lead to understandable answers. It relates to existing notions of conditioning in the literature and approaches the concept from a new direction that we hope readers will find illuminating.