A Gaussian alternative to using improper confidence intervals

The problem posed by exact confidence intervals (CIs) which can be either all‐inclusive or empty for a nonnegligible set of sample points is known to have no solution within CI theory. Confidence belts causing improper CIs can be modified by using margins of error from the renewed theory of errors initiated by J. W. Tukey—briefly described in the article—for which an extended Fraser's frequency interpretation is given. This approach is consistent with Kolmogorov's axiomatization of probability, in which a probability and an error measure obey the same axioms, although the connotation of the two words is different. An algorithm capable of producing a margin of error for any parameter derived from the five parameters of the bivariate normal distribution is provided. Margins of error correcting Fieller's CIs for a ratio of means are obtained, as are margins of error replacing Jolicoeur's CIs for the slope of the major axis. Margins of error using Dempster's conditioning that can correct optimal, but improper, CIs for the noncentrality parameter of a noncentral chi‐square distribution are also given.


INTRODUCTION
Using exact confidence intervals (CIs) C(y) to estimate a scalar parametric function ψ(θ) when some C(y)'s are equal to (−∞, ∞) over a nonnegligible set of sample points y, although their confidence coefficient 1 − α is strictly between zero and one, has been studied with the confidence distribution method and with the generalized confidence method (Weerahandi, 1993, 2013; Bebu & Mathew, 2008). The additive structured model for continuous cases (Plante, 1979a, 1979b), which uses both Fraser's (1971) and Dempster's methods, can be used within the renewed theory of errors, but at the price of introducing an extra assumption. In the additive structured model, the upper and lower probabilities (or upper and lower error measures) coincide. This puts some distance between the additive structured model and those primarily using discrete distributions in the Dempster and Shafer tradition, such as those used by Edlefsen, Liu & Dempster (2009), Martin et al. (2010), and Martin & Liu (2013, 2014b).
The article is organized as follows. Section 2 explains the problem posed by improper CIs. Section 3 discusses how R. A. Fisher and J. W. Tukey contributed to the renewed theory of errors. An outline of the renewed theory of errors is given in Section 4, where it is also shown how one can use Dempster's conditioning within a continuous model to repair improper CIs such as the uniformly most accurate CIs estimating the noncentrality parameter of a noncentral chi-square distribution. An extended Fraser's frequency interpretation is defined in Section 5, which also explains the relationships between the renewed theory of errors, the improper Bayesian approach, and CI theory. Section 6 describes the error simulation algorithm used to estimate any (computable) parameter derived from the five parameters of the N₂(μ_x, μ_y, σ_x², σ_y², ρσ_xσ_y) distribution. Applications of this algorithm to estimate a ratio of means are given in Section 7 and the slope of a major axis in Section 8. Possible objections to the renewed theory of errors, such as Stein's fiducial example, marginalization paradoxes, the Buehler and Feddersen paradox and so on, are discussed in Section 9, which also points out that marginalization is both unavoidable and manageable. Section 10 is a general discussion. In addition to the Appendix, mostly containing proofs, a Software Web Appendix containing the prototype programs used (accompanied by a README file) is available in the Supplementary Material. A Web Appendix, also in the Supplementary Material, contains data, tables, additional elementary examples, bootstrap and auxiliary Monte Carlo experiments, and connections with the improper Bayesian and CI approaches.

How Improper CIs Arise
Improper confidence sets for a parametric function ψ(θ) usually occur when generated by an improper pivot T{y; ψ(θ)} = e. The pivot is improper when the set T{y; ψ(Θ)} can have a probability less than one for a nonnegligible set of sample points, where Θ is the parameter space (Plante, 1979b, 1991). If ψ(θ) is a scalar parameter, this means that the p-value function p(ψ) = F{T(y₀, ψ)} (Fraser, 2019; Fraser, Reid & Wu, 1999; Schweder & Hjort, 2002; Fraser, Reid & Wong, 2004) is not essentially onto the error space (0, 1) (modulo a uniform distribution) for a nonnegligible set of sample points, where F(·) is the distribution function of T{y; ψ(θ)} and where y₀ is the observed value of y. Given a measurable space (U, 𝒰) and a probability space (V, 𝒱, R), a measurable function φ: U → V is essentially onto V modulo R if φ⁻¹(H) = φ⁻¹(H′) implies that R(H △ H′) = 0 for any H and H′ in 𝒱, where △ is the symmetric-difference operator, as is explained in Plante (1991, Section 4.1), and where φ⁻¹ is the inverse image set function of the point function φ. (Several p-value functions are illustrated in the Web Appendix in the Supplementary Material.) Note that using a p-value function as a means to construct CIs is introduced in Corollary 3.5.1 in Lehmann & Romano (2005), i.e., Corollary 3 of Chapter 3 in Lehmann (1959). Also, if y and ψ are scalars, and if T(y, ψ) is the (continuous) distribution function of y, then the distribution function of T(y, ψ) restricted to the interval (0, 1) is an identity function. This is the formulation, described with a different notation, in Fisher (1930). Plante (1976) examined a conjecture by Fisher (1930) concerning the monotonicity of a p-value function associated with a maximum likelihood estimator. Using "natural CIs," recommended by Stein (1959) as preferable to using fiducial intervals, provides a good example of how improper CIs arise.
In his example a random variable d² has a noncentral chi-square distribution with p degrees of freedom and noncentrality parameter λ² ≥ 0, and is associated with a pivot F(d²; λ², p) = e, where F(d²; λ², p) is the distribution function of d² and where e has a uniform density on the interval (0, 1). Since F(d²; λ², p) < F(d²; 0, p) for every λ² > 0, if d² > 0, it follows that for any fixed d² > 0, the set F{d²; [0, ∞), p} has a length smaller than one. Consequently, the lower confidence bound λ²₋ = λ²₋(d², 1 − α) = inf{λ² : λ² ≥ 0 and F(d²; λ², p) ≤ 1 − α} for λ² is equal to zero, and thus improper, with probability F{F⁻¹(1 − α; 0, p); λ², p} > 0. Figure 1 illustrates this. Note that for a fixed p, the natural CIs are the uniformly most accurate CIs to estimate λ² because the family of noncentral chi-square distributions is strictly ordered according to the nondecreasing likelihood ratio and because of Corollary 3.5.1 in Lehmann & Romano (2005).
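Stein's example is easy to check numerically. The sketch below is my illustration with arbitrary values of p, α, and λ² (scipy's ncx2 plays the role of F); it computes the natural lower bound λ²₋ and the probability that it is improper:

```python
# Sketch of Stein's "natural" lower confidence bound for the noncentrality
# parameter lambda^2 of a noncentral chi-square, and the probability that
# the bound is improper (equal to zero). Values of p, conf, l2_true are
# arbitrary illustration choices, not from the article.
from scipy.stats import chi2, ncx2
from scipy.optimize import brentq

def lower_bound(d2, p, conf=0.95):
    """inf{l2 >= 0 : F(d2; l2, p) <= conf}; zero when F(d2; 0, p) <= conf."""
    if chi2.cdf(d2, p) <= conf:        # F(d2; 0, p) is the central chi-square cdf
        return 0.0
    hi = 1.0                           # F(d2; l2, p) decreases in l2: bracket a root
    while ncx2.cdf(d2, p, hi) > conf:
        hi *= 2.0
    return brentq(lambda l2: ncx2.cdf(d2, p, l2) - conf, 1e-10, hi)

p, conf, l2_true = 5, 0.95, 1.0
# Probability that the bound is improper: P{d2 <= F^{-1}(conf; 0, p)} under l2_true
prob_improper = ncx2.cdf(chi2.ppf(conf, p), p, l2_true)
print(prob_improper)          # strictly positive
print(lower_bound(2.0, p))    # small d2: bound is 0, i.e., improper
print(lower_bound(40.0, p))   # large d2: strictly positive bound
```

The root-finding step simply inverts the monotone map λ² ↦ F(d²; λ², p) at the level 1 − α.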
It is shown in the Appendix that this lower confidence bound λ²₋, although uniformly most accurate, is improper with probability 1 − α in the limit when p → ∞, provided λ² = o(p^{1/2}), and that it fails, with probability one, to underestimate λ² for the remaining coverage probability. Note that this solution makes sense if λ² = 0. (A referee pointed out that the signal-to-noise ratio tends to zero if λ² = o(p^{1/2}).) Gleser & Hwang (1987) showed how improper CIs can unavoidably occur due to a singularity in the parametric function ψ(θ) being estimated. A confidence belt containing 100% improper CIs can even be uniformly most accurate, as occurs, for instance, when estimating a mixture parameter (Plante, 1991, where the mis-copied symbol "≤" on line 14 of p. 391 must be inverted). Restricted parameter problems also produce improper CIs (Quenouille, 1958, Sections 6.6 and 6.9; Mandelkern, 2002). Several authors have unsuccessfully attempted to produce improved CIs which would be proper at all levels. For instance, Casella (2002) suggested an attractive frequentist solution for a 68% upper confidence bound when y is from an N₁(μ, 1) (μ ≥ 0) distribution with a sample of size one, but his method, which consists in gently modifying the model, still gives an empty one-sided interval at the 45% confidence level when the observation is y₀ = −1. Other improper CIs, besides the ones discussed here, exist. For example, the natural CIs to estimate the parameter in the general model of the Corollary to Theorem 1 below can be improper for the same reason that the natural CIs to estimate a noncentrality parameter and the CI for a restricted normal mean can be improper.

Corollary 1 (Corollary to Theorem 1 in the Appendix). Let F(d; θ) be the distribution function of a statistic d taking its values in an interval [d₀, ∞) for every θ in the parameter space. Suppose that F(d; θ) is continuous as a function of two variables and that the partial function F(d; ·) is one-one for every d in (d₀, ∞). Then F(d; θ) is an improper pivot.
(The graphs of the various distribution functions cannot cross each other because the distribution functions are continuous and the partial function with fixed d is one-one; consequently, the family of distributions is stochastically ordered. Therefore, either none of the graphs can be above the graph of F(·; θ₀) or none can be below it. In either case the family of graphs does not fill the strip [d₀, ∞) × (0, 1), which is a clear sign that the pivot is improper.)

The Problem with Fieller's CIs for a Ratio of Means
Where correction is needed can be determined if one adopts the viewpoint of a user of Fieller's method who has only his observed sample and a bivariate normality assumption. Let x̄, ȳ, s_x, s_y, and r be the usual estimators of μ_x, μ_y, σ_x, σ_y, and ρ. Suppose that x̄ > 0 and that r ≥ 0. Imagine a modified mean sample point (x̄′, ȳ′) = (kx̄, kȳ) (k ≠ 0 an accessory variable), from possible N₂ distributions, moving under the influence of k on a given radius at an angle with the x axis sufficiently less than π/2, towards a point at infinity and starting at a 0+ distance from the origin of the plane. If only the mean sample point is varied, initially Fieller's 95% CIs are the interval (−∞, ∞); they then become the complement of an interval, except if kȳ/(kx̄) = (s_y/s_x)r and kx̄ = (s_x/√N)t_{α/2}, according to the proof, in the Appendix, of Proposition 3 in Section 7. Next, they become two-sided intervals. As k increases, CIs for a ratio eventually become satisfactory, once the denominator kx̄ of the ratio kȳ/(kx̄) is far from zero. However, between the limit point where CIs have an infinite length and points where CIs look quite regular, there is an interval along the given radius where the CIs are transitionally misplaced. This is essentially the region for which I can suggest a correction; the correction also applies at some points where the CIs have an infinite length. Fortunately, for a ratio of means, this treacherous region appears short.
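The transition just described can be reproduced with a small classifier of the standard Fieller quadratic Aθ² + Bθ + C ≤ 0 (a sketch with made-up sample figures, not the article's data):

```python
# Sketch of how Fieller's confidence set for a ratio of means changes as the
# mean point (k*xbar, k*ybar) moves out along a radius (k is the accessory
# variable in the text). Sample figures below are invented for illustration.
import math
from scipy.stats import t as tdist

def fieller_set(xbar, ybar, sx, sy, r, N, conf=0.95):
    """Classify Fieller's solution set {theta : A*theta^2 + B*theta + C <= 0}."""
    tq = tdist.ppf(1 - (1 - conf) / 2, N - 1)
    c = tq * tq / N
    A = xbar**2 - c * sx**2
    B = -2.0 * (xbar * ybar - c * r * sx * sy)
    C = ybar**2 - c * sy**2
    disc = B * B - 4.0 * A * C
    if disc < 0:
        # no real roots; theta = ybar/xbar always satisfies the inequality,
        # so this can only happen with A < 0: the set is the whole line
        return ("whole line", -math.inf, math.inf)
    r1 = (-B - math.sqrt(disc)) / (2.0 * A)
    r2 = (-B + math.sqrt(disc)) / (2.0 * A)
    lo, hi = min(r1, r2), max(r1, r2)
    if A > 0:
        return ("interval", lo, hi)       # [lo, hi]
    return ("complement", lo, hi)         # (-inf, lo] U [hi, inf)

xbar, ybar, sx, sy, r, N = 1.0, 0.5, 2.0, 2.0, 0.0, 10
for k in (0.5, 1.35, 5.0):
    print(k, fieller_set(k * xbar, k * ybar, sx, sy, r, N))
```

With these figures the set passes from (−∞, ∞), to the complement of an interval, to a two-sided interval as k grows, matching the description above.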
In a ratio of means problem, the DPE becomes progressively flatter as the factor k in kȳ/(kx̄) decreases in absolute value, with the result that it becomes more and more difficult to calculate the 2.5% and 97.5% quantiles of the DPE in order to obtain a 95% ME. Thus, the number of significant figures determining a ME is important. Using a laptop computer, I found it reasonable to aim at results with 4 ± 1 significant figures. To simulate this required 10 million iterations. Using the DPE technique to obtain a generalized fiducial solution (Hannig, 2009; Hannig et al., 2016), i.e., approximating the coverage frequency for the entire parameter space, is a technically different problem from the one investigated here, which concerns a unique sample. Computing speed of a greater order of magnitude is required to deal with sufficiently many points in a five-dimensional space.

The Problem with Cox's Revised CI Theory
After Neyman (1954) recognized that his theory of CIs needed improvement, Cox (1958) proposed a revised CI theory possibly inspired by Egon Pearson's attitude towards the joint Neyman-Pearson theory (Lehmann, 1994). This revised theory was expanded in Cox & Hinkley (1974). It included, besides the basic concepts of alternative hypothesis and power, an emphasis on likelihood, sufficiency, minimal sufficiency, ancillary statistics, and maximum likelihood, all concepts borrowed from Fisher's approach. Cox even included the idea of a confidence distribution which kept some components of Fisher's fiducial distribution. For example, he considers that if the parameter space Θ is of the form Θ₁ × Θ₂, a confidence distribution for θ₁ is a distribution P*_y(·) over θ₁-events such that P*_y{S_{1−α}(y)} = 1 − α for all exact 100(1 − α)% confidence sets S_{1−α}(y) for θ₁ in a large class and for all α in the interval (0, 1).
However, a corollary to Theorem 1 in Plante (1991) is that if a σ-field 𝒮 of confidence belts is generated by a continuous pivot T(y; θ), then a confidence distribution exists over every y-section of 𝒮 iff T(y; θ) is a proper pivot, i.e., iff the difficulty met by Neyman (1954) does not occur. Furthermore, confidence sets are always determined by a pivot except in esoteric cases irrelevant to statistics (Plante, 1991, Lemma 1). Therefore, Cox's confidence distribution theory does not solve the problem raised by improper CIs.
That a confidence distribution must be induced by a proper pivot is bad news for confidence distribution users, for it implies that parameters like 100σ/|μ|, μ + σ²/2, and so on cannot be estimated using a natural confidence distribution based on the normal model for observations, as was shown by Plante (1991). Furthermore, being restricted to proper pivot models also implies that models that have more than one error term, as do standard variance component models, are not estimable using a natural confidence distribution. All variance component problems share the difficulty illustrated (using Stein's natural CIs) in Section 2.1 when underestimating a noncentrality parameter. (The case of a noncentrality parameter is solved using MEs in Section 4.2.) An ad hoc conditional fiducial solution was proposed in Quenouille (1958, Section 6.9) for variance components, and a systematic solution is available using an "error" or formal third space given in Cisewski & Hannig (2012), where Dempster's conditioning (defined in Section 4.2) is possible.

A Fundamental Problem with Neyman's Coverage Frequency Interpretation
In the physical sciences, observation errors are quantified so that estimates from one investigation can be used in other investigations and so that the precision of the final results can be appraised. Errors are partially due to the observer's perceptions and manipulations. Neyman added a subjective potential error-causing component: the decision, which is supposed to be based on these scientific observations. Wald (1939) criticized both the theory of the testing of hypotheses and CI theory, pointing out that they cannot always provide a solution, since these theories are at their base founded on the concept of a discrete action space {accept, reject} rather than on the concept of a continuous space, where randomization ensures that every problem has a solution, as is the case in the theory of games. Wald's approach is entirely based on a system of weights. Bartholomew (1967, 1971), claiming to avoid known paradoxes, described several numerical applications of such a modified Neyman-Pearson theory. Plante (1987) proposed a solution to the problem of choosing between two simple hypotheses consisting of a Bayesian-posterior-like, everywhere-randomized procedure in which prior probabilities are replaced by losses for taking a wrong decision, as factors of the likelihood function.

The Unavoidable Persistence of Improper CIs
When commenting on improper CIs occurring in the restricted problem (y has an N₁(μ, 1) distribution, μ ≥ 0), Gleser (2002) suggested that the Neyman-Pearson method should be restricted to the design stage of an investigation and that Fisher's fiducial and likelihood methods should be used to analyze data.
My position, based upon the counter-example logic explained in Plante (1971), is that although coverage frequency remains an important concept, the strong repeated sampling principle (Cox & Hinkley, 1974, p. 45) can no longer be an axiom used as a general validity criterion, because one counterexample is sufficient to prove a general principle wrong. Since, in the case of a ratio of means, CI theory needs to be improved (as Neyman, 1954, admitted); since Koschat (1987) proved, using a simplified version of the ratio of means problem, that Fieller's exact CI solution is the only reasonably continuous one; since observation errors are the main cause of coverage errors; and since observation errors are included within Gauss's theory of errors, I suggest using observation errors in order to provide minimal, but necessary, corrections to confidence belts containing improper CIs. Fisher (1912), while a graduate student specializing in the theory of errors, deviated from Gaussian theory when introducing his concepts of likelihood function and the maximum likelihood estimation method (both not then yet named). These concepts do not conform exactly with Gauss's concept of observation error density, nor with his maximum error density estimation method. However, Fisher's fiducial distribution (Fisher, 1930, 1935, 1959) can be considered a major step forward within the theory of errors. Yates (1960, Chapter 7) adapted the theory of errors to sampling techniques. Tukey (1957) warned military engineers and scientists about the technical difficulties hidden within CI theory and advised them to use Gauss's error propagation formula instead, or to use the extensions of the Gaussian formula that his article provided, which were also capable of yielding approximate DPEs.
He demonstrated how his approach was consistent with Kolmogorov's axiomatization of probability (Kolmogorov, 1933, 1956), in which a probability measure and an error measure obey the same axioms, although the connotations of the two words are different. It seems evident that errors (which concern past events) and probabilities (which concern future events) should be used differently, as the past and the future are not symmetrical.

Three Spaces and a Pivot
The theory of errors founded by the astronomer Gauss is based on a physics postulate very different from the mathematical repeated sampling axiom of Neyman-Pearson theory. Gauss's (1809; 1855, Note I, Section 1, par. 1) error postulate can be stated as follows: Postulate 1 (Gauss's error postulate). When, apart from systematic errors, fortuitous observation errors are the only reason (corrected) observations and the quantities measured do not coincide, fortuitous observation errors are propagated according to the laws of probability.
The renewed theory of errors initiated by Tukey (1957) and continued below uses three spaces before the sample is drawn: a sample space 𝒴, a pivot space ℰ, and a parameter space Θ_p. They are assumed to be Borel subsets of Euclidean spaces. The symbol p designates the dimension of the Euclidean space for Θ_p. There is a known family of absolutely continuous probability distributions {P_θ}_{θ∈Θ_p} containing an unknown unique member associated with a "unique" observed random sample y₀ occurring in 𝒴. Before sampling, the sample is a random element y of 𝒴 governed by one and only one of the P_θ's; after sampling, y₀ is a known element of 𝒴. To apply Gauss's postulate, it is necessary and sufficient to be willing to assert that fortuitous observation errors are the "only" reason why observations and the quantities measured do not coincide.
In this renewed Gaussian theory of errors, a DPE is the end product of the error analysis process, whereas a ME is just a summary of a DPE, useful for being expedient and for reducing publishing costs. Since the theory of errors is a physics theory, every assumption formulated to solve a problem should have a physical basis.
To initialize a renewed theory of errors solution for a problem one needs an invertible pivot T(y; θ) = e for which the pivot space is reduced so that the partial function T(y; ·): Θ_p → ℰ is one-one and onto its range space for (almost) each y in 𝒴. Both this partial function and its inverse must be measurable. One hopes to find either boundedly complete minimal sufficient statistics to estimate θ or a Fraser structural model for this initial reduction, but one may be left with maximum likelihood estimators and ancillary statistics. This produces a reduced sample space Θ̂_p, a duplicate of Θ_p, from which a reduced pivot space ℰ_p can (theoretically) be easily obtained. One may now suppose that the P_θ's are defined on Θ̂_p and the pivot T on Θ̂_p × Θ_p. Different pivots will produce different DPEs, unless it is proven otherwise, since calculations from each pivot, while complying with the laws of probability, may concern different fields of events.
The pivot distribution Q is obtained from the sampling distribution of an estimator θ̂ by means of the equation Q(A) = P_θ{θ̂ : T(θ̂; θ) ∈ A} for every pivot-event A. Up to now, the model concerns the future, since the real time is after y₀ is known. Because of Postulate 1, the distribution Q is then kept as an error distribution instead of a probability distribution. The space ℰ_p is now an error space and a pivot-event A is now called an error-event. The DPE P*_{y₀} over parameter-events is the distribution of T⁻¹(y₀; e) when e is distributed according to Q, where T⁻¹(y₀; ·) is the inverse of the partial point function T(y₀; ·) when the first argument is held fixed.
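The passage from Q to a DPE can be sketched in a minimal scale-model case (an illustrative pivot of my choosing, not one of the article's examples): hold the observation fixed, draw errors from Q, and invert the pivot.

```python
# Minimal sketch (illustrative pivot, not from the article): turning a pivot
# distribution Q into a DPE by holding the observation fixed and inverting.
# Pivot: T(s2; sigma2) = n*s2/sigma2 = e, with e ~ chi-square(n).
import numpy as np

rng = np.random.default_rng(1)
n, s2_obs = 12, 4.0                  # hypothetical observed variance estimate
e = rng.chisquare(n, size=200_000)   # draws from the pivot (now error) dist. Q
sigma2_star = n * s2_obs / e         # invert T(s2_obs; .) -> DPE draws
lo, hi = np.quantile(sigma2_star, [0.025, 0.975])  # a 95% ME summary of the DPE
print(lo, hi)
```

Here the DPE draws are simply the inverse images T⁻¹(s²₀; e) of the simulated errors.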
A renewed theory of errors approach does not imply a unique answer to a problem, but promises to evaluate faithfully any error resulting from the invertible pivot used. If one uses only part of the data, this inefficiency should be indicated by a larger resulting error.
A derived scalar parameter ω = ω(θ) has a DPE induced from P*_{y₀}, as is done in an improper Bayesian approach after a posterior distribution is known over the parameter space. This process is often called "marginalization." Section 9.4 points out that marginalization is unavoidable and that it is not difficult to choose which pivot to use in the few cases in which more than one of them is available.

Dempster's Conditioning as an Error Propagation Method Capable of Solving Some Improper CI Problems
The noncentrality parameter (λ²) estimation problem in Section 2.1 can be solved by using what I call Dempster's conditioning, which is to use a conditional error measure, given an error-event which is known to have happened, although revealed by one or by several statistics which may be nonancillary (Dempster, 1967, 2008). This was used by Plante (1979a, 1979b), Edlefsen, Liu & Dempster (2009), Hannig (2009), Martin et al. (2010), Martin & Liu (2013, 2014a, 2014b) and Hannig et al. (2016). The solution, which is explained in the Appendix, is λ²₋ = F₋₁{d²₀; (1 − α)F(d²₀; 0, p), p}, where d²₀ is the observed value of d² and where the lower index −1 refers to the inverse function of the restricted partial function F(d²₀; ·, p) when the first and third arguments are fixed. A corresponding two-sided ME would be [F₋₁{d²₀; (1 − α/2)F(d²₀; 0, p), p}, F₋₁{d²₀; (α/2)F(d²₀; 0, p), p}].
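A Monte Carlo version of this conditioning can be sketched as follows (my illustration with arbitrary p and d²₀): the uniform error is restricted to the event (0, F(d²₀; 0, p)), which is the set of error values that can actually occur when λ² ≥ 0, before the pivot is inverted.

```python
# Sketch of Dempster's conditioning for the noncentrality parameter of a
# noncentral chi-square (Section 2.1): the uniform error e is conditioned on
# the event e < F(d2_obs; 0, p) before inversion. p and d2_obs are arbitrary.
import numpy as np
from scipy.stats import chi2, ncx2
from scipy.optimize import brentq

def invert(d2, p, e):
    """Solve F(d2; l2, p) = e for l2 >= 0 (F decreases in l2)."""
    hi = 1.0
    while ncx2.cdf(d2, p, hi) > e:
        hi *= 2.0
    return brentq(lambda l2: ncx2.cdf(d2, p, l2) - e, 1e-10, hi)

rng = np.random.default_rng(2)
p, d2_obs = 5, 9.0
e_max = chi2.cdf(d2_obs, p)                # = F(d2_obs; 0, p): the error-event
e = rng.uniform(0.0, e_max, size=2_000)    # conditioned uniform errors
dpe = np.array([invert(d2_obs, p, u) for u in e])
lo, hi = np.quantile(dpe, [0.025, 0.975])  # a two-sided 95% ME for lambda^2
print(lo, hi)
```

Unlike the natural lower bound of Section 2.1, the resulting ME is never improper: every draw is a nonnegative value of λ².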

An Observation-Centred Frequency Interpretation
Error analysis at its elementary level uses only inequalities such as |y₀ − θ| < Δ, where y₀ is an observation, θ the quantity measured, and Δ the largest error (in absolute value) judged possible; thus it does not require the use of probability or measurements of uncertainty. Following the Gaussian advance from |y₀ − θ| < Δ to y₀ − θ = e, 19th-century metrologists did not object to y₀-centred errors. This is likely because, in Gauss's theory, a physical law expressed by an equation is deterministic and errors are associated with the observation of a phenomenon, consequently with an estimate rather than with a parameter. However, after the introduction of CI theory in the 20th century, errors became considered to be θ-centred.
Fraser's frequency interpretation and its extension, provided in Section 5.3, help bridge this gap between a Gaussian and a Neymanian viewpoint. Fraser's frequency interpretation for linear or for group-constructed pivots, briefly described here (Fraser, 1961; Cox & Hinkley, 1974, p. 248), and its extension, useful for nonlinear pivots, both show how a pre-sampling θ-centred error random variable can produce the needed post-sampling y₀-centred DPE.

Fraser's Frequency Interpretation
Suppose that a linear pivot y − θ = e is available. Then, consider a large number of observations y_{1,0}, … , y_{N,0} observed under apparently the same conditions and corresponding to known values θ₁, … , θ_N of a parameter θ. Let y_{N+1,0} be a new observation for which the parameter value θ_{N+1} is unknown and for which a reference distribution for θ_{N+1} centred on y_{N+1,0} is needed. Fraser's frequency interpretation can then be activated as follows. Consider the errors e_{i,0} = y_{i,0} − θ_i and slide each pair (y_{i,0}, θ_i) by the same amount, so that every observation falls at y_{N+1,0} while its error e_{i,0} is unchanged; the slid parameter values θ*_{i,0} = y_{N+1,0} − e_{i,0} then provide the needed reference distribution (Fraser, 1961). The corresponding DPE is that of θ_{N+1} = y_{N+1,0} − e_{N+1}.
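The sliding argument is immediate to simulate (illustrative numbers of my choosing):

```python
# Sketch of Fraser's frequency interpretation for the linear pivot
# y - theta = e: past (y_i, theta_i) pairs give errors e_i; sliding every
# y_i to the new observation turns the e_i into a DPE for theta_{N+1}.
import numpy as np

rng = np.random.default_rng(3)
N = 100_000
theta = rng.uniform(-50, 50, size=N)    # known past parameter values
y = theta + rng.normal(0, 1, size=N)    # past observations; errors ~ N(0, 1)
e = y - theta                           # observed errors e_{i,0}
y_new = 2.7                             # new observation, theta_{N+1} unknown
theta_star = y_new - e                  # slid values: a DPE for theta_{N+1}
print(theta_star.mean(), theta_star.std())
```

The slid values are centred on the new observation with the error law of the pivot, here approximately N(y_{N+1,0}, 1).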

An Extended Fraser's Frequency Interpretation
Fraser's frequency interpretation can be extended to a nonlinear pivot as follows. Suppose that an invertible nonlinear pivot T(θ̂, θ) = e is available. Consider a large number of estimates θ̂_{1,0}, … , θ̂_{N,0} observed under apparently the same conditions and corresponding to known values θ₁, … , θ_N of a parameter θ. Let θ̂_{N+1,0} be a new estimate for which the parameter value θ_{N+1} is unknown and for which a reference distribution for θ_{N+1} centred on θ̂_{N+1,0} is needed. Compute the errors e_{i,0} = T(θ̂_{i,0}, θ_i) and define θ*_{i,0} as the solution of T(θ̂_{N+1,0}, θ*_{i,0}) = e_{i,0}. In both Fraser's frequency interpretation and in this extension, the θ*_{i,0}'s are an answer to the question: what would the θ*_{i,0} values have to be for all the θ̂_{i,0}'s to fall at θ̂_{N+1,0} and all the e_{i,0}'s to be unchanged?
With a nonlinear pivot it is not possible to slide each y_{i,0} and θ_i by the "same" amount as can be done with a linear pivot. However, a transformation comparable to a "motion" in differential geometry (Matsushima, 1972, Section II.9) is achieved. The suggested frequency interpretation extension is a class of transformations π: Θ̂_p × Θ_p → Θ̂_p × Θ_p leaving errors invariant, with (θ̂_{N+1,0}, θ*) as the image, where θ̂_{N+1,0} is fixed and θ* is variable. Since these transformations satisfy the equation π ∘ π = π, each can be called a projection into the hyperplane θ̂ = θ̂_{N+1,0}. If the number of points projected is large, the result is essentially a DPE. Note that the set of points corresponding to a fixed error e in ℰ_p is T⁻¹({e}), where T⁻¹: 𝒫(ℰ_p) → 𝒫(Θ̂_p × Θ_p) is the inverse image set function of the point function T: Θ̂_p × Θ_p → ℰ_p and where 𝒫 indicates the power set of a set. Suppose that the partial function T(·, θ): Θ̂_p → ℰ_p is also a bijection and that, for every fixed θ, every partial derivative of T(θ̂, θ) with respect to the components of θ̂ and θ, and of T⁻¹(e, θ) and T⁻¹(θ̂, e) with respect to the components of e, θ̂, and θ, exists and is continuous. Thus, a projection π is comparable to a motion within the manifold T⁻¹({e}), although an error is not a metric as in the differential geometry definition. If the e-space has dimension one, this "motion" is made along the curve T⁻¹({e}). When examining the relationship between Gauss's postulate and Neyman's repeated sampling axiom/principle for a scalar parameter, it is interesting to find that Gauss's condition that "the only reason why θ̂₀ ≠ θ is the presence of error e" naturally supposes the potentially useful facts that: (i) the origin 0_p of E_p is in ℰ_p and (ii) T(θ̂₀, θ) = 0_p iff θ = θ̂₀ for any observation θ̂₀.
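A corresponding sketch with a nonlinear pivot (an illustrative multiplicative pivot of my choosing, not from the article) shows the projection at work: each past pair is carried to the hyperplane θ̂ = θ̂_{N+1,0} while its error is left invariant.

```python
# Sketch of the extended frequency interpretation with a nonlinear
# (multiplicative) pivot T(est, theta) = est/theta = e: each past pair is
# "projected" onto est = est_new while its error e_i is unchanged.
import numpy as np

rng = np.random.default_rng(4)
N = 100_000
theta = rng.uniform(0.5, 50, size=N)           # known past parameter values
est = theta * rng.lognormal(0, 0.2, size=N)    # past estimates; errors lognormal
e = est / theta                                # invariant errors e_{i,0}
est_new = 3.0                                  # new estimate
theta_star = est_new / e                       # projected values: DPE draws
print(np.median(theta_star))
```

Since the median error is one, the DPE draws have median equal to the new estimate, mirroring the y₀-centred behaviour of the linear case.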

Margin of Error as a Posterior Credible Set Using an Improper Prior
Fisher (1959, p. 101) called the strong repeated sampling axiom of Neyman-Pearson theory "artificial" likely because it supposes that the unknown value of the parameter is controlled. Actually, to suppose that an unknown value is controlled is illogical and does not correspond to the situation met in scientific research. Furthermore, applying the strong repeated sampling principle rigidly leads to too many improper CIs, while if problematic confidence belts are excluded, too many problems are left without an answer.
Gauss's error postulate implies a necessary assumption concerning how parameter values occur in scientific work. This assumption, which is analogous to an improper Bayesian prior, appears in his first theorem in Gauss (1809; 1855, Note I). Proposition 1 in Section 5 of the Web Appendix in the Supplementary Material expresses this assumption in modern statistical language. To observe the right average coverage frequency experimentally for an "error belt" made up of MEs (of the same level and on estimates from the same estimator) when the pivot is linear, one must simulate an almost improper prior to produce parameter proxy values, instead of controlling parameter values as is done when using the strong repeated sampling axiom. Examples of almost improper priors are given in Section 6 of the Web Appendix in the Supplementary Material.
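The average-coverage experiment just described can be sketched as follows (a wide uniform stands in for the almost improper prior; the numbers are illustrative):

```python
# Sketch of the "average coverage" experiment: parameter proxy values are
# drawn from an almost improper (very wide uniform) prior rather than being
# controlled, and the 95% MEs y +/- 1.96 are scored on coverage.
import numpy as np

rng = np.random.default_rng(5)
M = 200_000
theta = rng.uniform(-1000, 1000, size=M)   # almost improper prior proxy values
y = theta + rng.normal(0, 1, size=M)       # linear pivot y - theta = e
covered = np.abs(y - theta) <= 1.96        # is theta inside y +/- 1.96?
print(covered.mean())
```

The observed average coverage is close to 0.95, as the linear-pivot case of the discussion above predicts.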
If the pivot is nonlinear, there always exists an improper prior for which the posterior would provide the same posterior credible interval as the corresponding ME at the same level. Thus a ME cannot be improper. Unfortunately, we can only know "a posteriori" what this prior is, because it depends on the observation y₀, being given by the Schweder and Hjort-Hannig-Fraser theorem (obtained in three different contexts) stated in Section 5 of the Web Appendix.
Dempster's conditioning, mentioned in Section 2.1, uses an extra assumption because an invertible pivot does not exist. It is assumed, however, that a conditional error measure does exist. It is easily confirmed by simulation that in Mandelkern's (2002) restricted mean problem, mentioned in Section 2.1, Dempster's conditioning solution has the proper average coverage frequency if a very small uniform prior density is assumed for μ ≥ 0, for μ², or for μ⁴. Bebu & Mathew (2008), among others, have used algorithms based on the N₂ distribution to generate estimation distributions. Here, I use Fisher's step-by-step argument (Fisher, 1959, pp. 120, 163-164, 171-175). Bebu & Mathew's method, based on generalized pivots, has much in common with the renewed theory of errors method, so the results of both are expected to be close. This is approximately verified when examining their ratio of bivariate-lognormal-means example. Based on 5,000 iterations and expressed with the "±" notation for each of their two versions of the generalized pivot, their 95% generalized CI is 0.7857 ± 0.4650 and 0.7550 ± 0.4488 for the ratio of means of two health care costs, with N = 98, x̄ = 6.41, ȳ = 6.50, s²_x = 2.73, s²_y = 3.48, r = 0.450 (using the logs of the data). In comparison, the renewed theory of errors method gives the 95% ME 0.733 ± 0.403 (Median DPE = 0.6248; Mean DPE ≈ 0.653). Bebu and Mathew pointed out that once their generalized pivots are established, they could estimate any (computable) parameter derived from the five parameters of the N₂ distribution. Furthermore, their method provided numerical tests with larger power compared to the modified log-likelihood test in Barndorff-Nielsen (1991).
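For Mandelkern's restricted mean problem mentioned above, the Dempster-conditioning DPE itself is simple to sketch (my illustration, using the troublesome observation y₀ = −1 from Casella's example in Section 2.1): since μ ≥ 0 forces the error e = y₀ − μ to satisfy e ≤ y₀, the standard normal error is conditioned on that event before inversion.

```python
# Sketch of Dempster's conditioning for the restricted normal mean
# (y ~ N(mu, 1), mu >= 0): the error e = y0 - mu must satisfy e <= y0, so
# the N(0, 1) error is conditioned on that event before inversion.
import numpy as np
from scipy.stats import truncnorm

y0 = -1.0                                       # the troublesome observation
# e ~ N(0, 1) conditioned on e <= y0; then mu* = y0 - e >= 0
e = truncnorm.rvs(-np.inf, y0, size=100_000, random_state=6)
mu_star = y0 - e
lo, hi = np.quantile(mu_star, [0.025, 0.975])   # a 95% ME, never empty
print(lo, hi)
```

Every DPE draw is a nonnegative value of μ, so the resulting ME is proper even where the modified CI of Section 2.1 is empty.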
Using the prototype program included in the Software Web Appendix in the Supplementary Material, I can also produce MEs for any (computable) parameter. This is done by changing within the program the instruction ω = μ_y/μ_x (literally: "w = muy/mux") into ω = "new formula." To exemplify this I used the data on vitamin B-12 in Section 8 and the satellite parameters: ω = exp{(μ_y − μ_x) + (σ_x² + σ_y² − 2ρσ_xσ_y)/2} for the mean ratio of a bivariate lognormal distribution; ω = exp{(μ_y − μ_x) + (σ_y² − σ_x²)/2} for the ratio of means of a bivariate lognormal distribution; ω = {σ_y² − σ_x² + [(σ_y² − σ_x²)² + 4ρ²σ_x²σ_y²]^{1/2}}/(2ρσ_xσ_y) for the slope of the major axis of a bivariate normal distribution; ω = Sign(ρ)σ_y/σ_x for the slope of the reduced major axis; ω = μ_y − βμ_x (β the slope of the major axis) for the y-intercept of the extended major axis; the ratio of the lengths of the minor and major axes; and the model II regression of v on u for various values of u. DPEs could also be easily computed for the more than a dozen parameters for which Berger & Sun (2008) have determined exact matching priors and associated constructive posteriors with exact frequentist matching. However, I am primarily interested in parameters for which one cannot have exact matching, as CIs for them can be improper.
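The "new formula" step amounts to transforming DPE draws of the five parameters. In the sketch below, the draws are placeholders standing in for the output of the error simulation algorithm (they are not produced by the article's program), so only the transformation step is illustrated:

```python
# Sketch of the "new formula" step: once DPE draws of the five N2 parameters
# are in hand, any computable derived parameter is just a transformation of
# the draws. The draws below are placeholders, not the algorithm's output.
import numpy as np

rng = np.random.default_rng(7)
M = 50_000
mux, muy = rng.normal(1.0, 0.05, M), rng.normal(2.0, 0.05, M)
sx2 = rng.chisquare(20, M) / 20 * 0.25          # placeholder sigma_x^2 draws
sy2 = rng.chisquare(20, M) / 20 * 0.36          # placeholder sigma_y^2 draws

w_ratio = muy / mux                             # w = muy/mux
w_lnratio = np.exp((muy - mux) + (sy2 - sx2) / 2)   # lognormal ratio of means
lo, hi = np.quantile(w_ratio, [0.025, 0.975])   # 95% ME for the ratio
print(lo, hi)
```

Changing the derived parameter changes only the one-line transformation, exactly as described for the prototype program.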

An Algorithm Based on Fisher's Step-by-Step Procedure
The first step in establishing a DPE on any estimate of a (computable) derived parameter such as ψ = μ_y/μ_x is to identify an invertible pivot T(θ̂; θ) = e for the five-dimensional problem corresponding to the five parameters and their jointly sufficient estimators for the bivariate normal distribution 𝒩₂(μ_x, μ_y, σ_x², σ_y², ρσ_xσ_y). Although Fisher did not describe explicitly all of his requirements for his argument, Hotelling (1953) provided a simplified notation clarifying which variables and parameters are present, or absent, at each stage of Fisher's procedure. The general process starts with p jointly sufficient estimators θ̂₁, …, θ̂_p for parameters θ₁, …, θ_p and assumes that the marginal density of θ̂₁, …, θ̂_i, for 1 ≤ i ≤ p, is f(θ̂₁, …, θ̂_i; θ₁, …, θ_i). The marginal density of θ̂₁ is f(θ̂₁; θ₁); the conditional density of θ̂₂ given θ̂₁ is f(θ̂₂ | θ̂₁; θ₁, θ₂); …; and the conditional density of θ̂_i given θ̂₁, …, θ̂_{i−1} is f(θ̂_i | θ̂₁, …, θ̂_{i−1}; θ₁, …, θ_i). Accordingly, the step-by-step argument uses successive probability integral transformations corresponding to these distribution functions.

The Canadian Journal of Statistics / La revue canadienne de statistique

Let x̄, ȳ, s_x, s_y, r be estimators of μ_x, μ_y, σ_x, σ_y, ρ. The sample size is N and n = N − 1. To use a bivariate normal distribution, first apply the transformations σ_x = α^{1/2}e^{−β/2}, σ_y = α^{1/2}e^{β/2}, s_x = a^{1/2}e^{−b/2}, s_y = a^{1/2}e^{b/2} (Hotelling, 1953). The reverse transformations are α = σ_xσ_y, β = ln(σ_y/σ_x), a = s_xs_y, b = ln(s_y/s_x). Since {x̄, ȳ}, {s_x, s_y, r} and standardized residuals are three independent sets of random variables, and since (x̄, ȳ) are sufficient for (μ_x, μ_y), two sets of parameters must be estimated: first ρ, α, and β, and then μ_x and μ_y. The fast-converging expansion of the density of r in Hotelling (1953), given in Equation (2), is central to the algorithm.

Proposition 1 (Partial five-dimensional pivot).
With the notation introduced above, the pivotal relation in Equation (3) holds, where χ²_{2n} is a random variable with the chi-square distribution with 2n degrees of freedom.
There are two possible pivots available for obtaining a DPE for estimates of μ_x and μ_y from the bivariate normal distribution of x̄ and ȳ with means μ_x and μ_y, standard deviations σ_x/√N and σ_y/√N, and correlation coefficient ρ. Thus, if z₁ and z₂ are two independent error variables, each with the standard normal distribution, we may set the pivots as in Equations (5) and (6). These two distributions appear identical when simulated, although negligible sampling variations differentiate them. However, as our main objective is to construct a ME for a ratio of means, Equation (5) is preferable to Equation (6) since the numerator is usually compared to the denominator. Random numbers u₁, u₂, u₃, z₁, z₂, and thus simulated values for ρ, σ_x, σ_y, μ_x, μ_y, are obtained 10 million times. A simulated value of, say, ψ = μ_y/μ_x is obtained at each iteration and a histogram bin is updated. Details may be obtained from a VBA for Excel (FORTRAN-like) program in the Software Web Appendix in the Supplementary Material.
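As a concrete illustration of the iteration just described, here is a simplified Python sketch (not the article's VBA program). The σ draws use the elementary chi-square pivot s√(n/χ²_n), the mean draws use correlated standard normal errors, and, as a deliberate simplification, ρ is held at the observed r instead of being drawn via Hotelling's expansion for the density of r.

```python
import math
import random

def simulate_dpe(xbar, ybar, sx, sy, r, N, iters=50_000, seed=1):
    """Simplified sketch of the step-by-step simulation: each iteration
    inverts elementary pivots to produce one draw of (sigma_x, sigma_y,
    mu_x, mu_y) and one value of psi = mu_y / mu_x.  The correlation step
    (Hotelling's expansion for r) is omitted: rho is fixed at r."""
    rng = random.Random(seed)
    n = N - 1
    psi = []
    for _ in range(iters):
        # chi-square pivot: sigma = s * sqrt(n / chi2_n); chi2_n = 2*Gamma(n/2, 1)
        sig_x = sx * math.sqrt(n / (2.0 * rng.gammavariate(n / 2.0, 1.0)))
        sig_y = sy * math.sqrt(n / (2.0 * rng.gammavariate(n / 2.0, 1.0)))
        # correlated standard normal errors for the two means
        z1 = rng.gauss(0.0, 1.0)
        z2 = r * z1 + math.sqrt(1.0 - r * r) * rng.gauss(0.0, 1.0)
        mu_x = xbar - sig_x * z1 / math.sqrt(N)
        mu_y = ybar - sig_y * z2 / math.sqrt(N)
        psi.append(mu_y / mu_x)
    psi.sort()
    return psi
```

With the Cushny and Peebles summary statistics this produces a DPE histogram for ψ whose median sits near ȳ/x̄ ≈ 0.32; the article's program refines every step of this sketch and uses 10 million iterations.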
It is shown in the Appendix that the five-dimensional pivot is invertible.

USING A RENEWED THEORY OF ERRORS TO IMPROVE FIELLER CONFIDENCE SETS
Given a sample of size N from a 𝒩₂ population, Fieller's pivot for constructing Fieller's CIs for ψ is given in Equation (7), where the variable t has the Student distribution with N − 1 degrees of freedom. For clarity, Proposition 3 describes the typology of Fieller's confidence sets in terms of sets of "Studentized means" (x̄*, ȳ*) with positive area, where x̄* = √N x̄/s_x and ȳ* = √N ȳ/s_y. Confidence sets for a ratio of means are also described in Fieller (1954) and in Scheffé (1970).
Proposition 3 (Essential typology of Fieller's confidence sets). With the above notation, and writing t_{α/2} for the upper 100(α/2)% percentage point of the Student distribution with N − 1 degrees of freedom, consider the plane x̄*Oȳ*. The ellipse in Equation (8) has a major axis ȳ* = sign(r)x̄* and is inscribed within the square max{|x̄*|, |ȳ*|} = t_{α/2}. To determine a 100(1 − α)% confidence set for ψ based on the pivot in Equation (7), there are three regions with positive area determining the type of the resulting confidence sets: (i) the region interior to the ellipse in Equation (8), where the confidence set is the interval (−∞, ∞); (ii) the region where |x̄*| > t_{α/2}, where the confidence set is a finite interval; and (iii) the region within which |x̄*| < t_{α/2} but which is exterior to the ellipse in Equation (8), where the confidence set is the complement of a finite interval.

A solution with an upper and a lower confidence coefficient, using both a "fiducial" coefficient calculated from a confidence distribution and a confidence coefficient, was suggested in Plante (1994). Tsao & Hwang (1998) applied an estimated confidence approach due to Kiefer (1977) to remove the awkwardness of calling (−∞, ∞) a "95% CI." Hannig, Wang & Iyer (2003) indicated an asymptotic approach. Liseo (2003) analyzed the ratio of means problem from different viewpoints: frequentist, likelihood, and Bayesian. Moineddin, Beyene & Boyle (2003) investigated properties of various location quotient methods ("…a way of measuring the relative contribution of one specific area to the whole for a given [geographical] outcome"). Sherman, Maity & Wang (2011) compared various methods to estimate a ratio in a sample survey context. Figure 2 illustrates differences between standard CI solutions and renewed theory of errors solutions to the estimation problem of a ratio of means using the famous Cushny & Peebles data from Fisher (1925-1970, Section 26.2, since 1946) (N = 10, x̄ = 2.33, ȳ = 0.75, s_x = 2.002, s_y = 1.789, r = 0.7952).
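For readers who want the algebra, the pivot in Equation (7) leads to the quadratic inequality Aψ² − 2Bψ + C ≤ 0 with A = Nx̄² − t²s_x², B = Nx̄ȳ − t²rs_xs_y, and C = Nȳ² − t²s_y², writing t for t_{α/2}; the three regions of the typology correspond to the signs of A and of the discriminant B² − AC. A small Python sketch (my own helper, with the Student critical value supplied by the caller):

```python
import math

def fieller_confidence_set(xbar, ybar, sx, sy, r, N, tcrit):
    """Fieller's confidence set for psi = mu_y/mu_x as the solution set of
    A psi^2 - 2 B psi + C <= 0.  tcrit is t_{alpha/2, N-1}.  With |r| < 1
    the set is never empty, so a non-positive discriminant implies A < 0
    and the whole real line."""
    t2 = tcrit * tcrit
    A = N * xbar * xbar - t2 * sx * sx
    B = N * xbar * ybar - t2 * r * sx * sy
    C = N * ybar * ybar - t2 * sy * sy
    disc = B * B - A * C
    if disc <= 0:
        return ("whole_line", None)
    root = math.sqrt(disc)
    lo, hi = sorted(((B - root) / A, (B + root) / A))
    if A > 0:
        return ("interval", (lo, hi))
    return ("complement_of_interval", (lo, hi))
```

For the Cushny and Peebles data (N = 10, t_{0.025,9} ≈ 2.2622) the set is a finite interval containing ȳ/x̄ ≈ 0.322; shrinking both means with a small factor k pushes (x̄*, ȳ*) toward the ellipse of Equation (8) and the set switches to the complement of an interval, then to (−∞, ∞), exactly the typology above.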
I am assuming normality (probably falsely), correlation between the random variables behind the samples compared for the two most peaked curves, and (falsely) no correlation for the two less peaked curves. CIs are represented by approximate confidence densities derived from Fieller's pivot in Equation (7), while the DPE densities are derivatives of spline-smoothed empirical distribution functions derived from error simulations with 10 million iterations. Interestingly, the most peaked curve has longer and more asymmetrical tails than does the next most peaked curve. (Details are shown in Table 1.)

Table 1 illustrates, using the Cushny and Peebles data, the suggested change to interval estimation of a ratio when the denominator is close to zero. The factor k is such that only the means are modified by the transformation (x̄, ȳ) → (kx̄, kȳ), so that kȳ/kx̄ is constant. One can see on line three of the table (which is beside the region where CIs cannot be used in lines one and two) that CIs can be extremely skewed. The median (unbiased) estimates, however, are more stable than are the ends of the CIs, which are closer to a singular point. This can be seen by comparing the fourth and the seventh columns with the second and fifth columns of the table.

TABLE 1: CIs and MEs ([ME_L, ME_U]) for a ratio of means with large values of the coefficient of variation of the denominator estimate of mean. Symbol "M" indicates a median of a distribution. Data are the Cushny and Peebles data. Symbol k indicates a common factor of both means, other estimates being invariant. Column M is the unbiased median estimate or 0+% interval. When the lower limit is larger than the upper limit, this indicates that the confidence set is the complement of the interval with the given limits. The precision achieved using 10,000,000 iterations in the simulations is indicated by the number of significant figures quoted for the MEs.

Note that the line with k = 1 corresponds to the result quoted in Fisher (1925-1970, Section 26.2, since 1946). We see clearly in Table 1 confidence sets that are complements of intervals [a, b] occurring at the 5% level. One could also obtain a CI of the form (−∞, ∞). Rare unacceptable CIs can also occur in repeated sampling at the 95% confidence level. Their overall effect is to decrease slightly the length of finite Fieller's CIs.

Jolicoeur's Pivot
Improper CIs are an incapacitating difficulty when estimating the slope β = [σ_y² − σ_x² + {(σ_y² − σ_x²)² + 4ρ²σ_x²σ_y²}^{1/2}]/(2ρσ_xσ_y) of the major axis of a 𝒩₂ population. Setting CIs for the slope is complicated both by a singularity of this derived parameter at ρ = 0 and, if ρ ≠ 0, because β = ±1 when σ_x = σ_y; side effects due to singularities are to be expected. Also, the pivotal equation giving the desired CIs provides four solutions, as was described by Jolicoeur (1973) when exploring possible complications occurring with CIs having infinite length. Furthermore, there are warnings by software providers for researchers in biology (Warton et al., 2006) concerning using CIs for the slope of the major axis using Jolicoeur's method. Warton et al. suggest that the slope of the reduced major axis Sign(ρ)σ_y/σ_x be used instead of β to avoid estimation problems occurring with β. I provide an example of a maximum likelihood estimate of the major axis of a 𝒩₂ distribution and a CI based on that estimate where, in a routine data analysis, both the point estimate and the CI were found to be unacceptable, both visually and analytically. A renewed theory of errors approach appears definitely preferable to using a CI approach to estimate the slope of a major axis and related parameters, although the two approaches occasionally do give congruent answers.
The maximum likelihood estimator β̂, corrected for biased estimation of variances, is obtained by substituting s_x, s_y, and r for σ_x, σ_y, and ρ in the formula for β. Anderson (1963) found asymptotic CIs for β based on a chi-square distribution, and Jolicoeur (1968, 1973) proposed CIs based on the pivot v in Equation (10), which has an F distribution with 1 and N − 2 degrees of freedom. The pivot in Equation (10), being improper, gives confidence sets for β that are either an "inclusive" interval [o₁, o₂]; an "exclusive" interval on the extended real number system, the complement of (o₂, o₁) (when o₁ > o₂); or the interval (−∞, ∞). Jolicoeur (1973) distinguished 13 types of samples relevant for the four possible roots resulting from Equation (10).
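Since Equation (10) itself is not reproduced here, a useful minimal check is the point estimate feeding it. The sketch below (my own helper, not the article's code) plugs s_x, s_y, and r into the standard bivariate-normal major-axis slope formula β = [σ_y² − σ_x² + {(σ_y² − σ_x²)² + 4ρ²σ_x²σ_y²}^{1/2}]/(2ρσ_xσ_y); on the vitamin B12 summary statistics of Section 8 it reproduces the maximum likelihood estimate 1.67082 quoted below.

```python
import math

def major_axis_slope(sx, sy, r):
    """Plug-in estimate of the major-axis slope beta of a bivariate normal
    distribution: substitute s_x, s_y, r for sigma_x, sigma_y, rho in
    beta = [sy^2 - sx^2 + sqrt((sy^2 - sx^2)^2 + 4 r^2 sx^2 sy^2)]
           / (2 r sx sy).
    Undefined at r = 0, one of the singularities discussed in the text."""
    d = sy * sy - sx * sx
    return (d + math.sqrt(d * d + 4 * r * r * sx * sx * sy * sy)) / (2 * r * sx * sy)
```

Note the behaviour at the singularities named above: as r → 0 the estimate blows up, and for s_x = s_y it equals ±1 whatever the strength of the correlation.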

The Amount of Vitamin B12 in the Blood of Mothers and Their Newborns
The routine data analysis in question shows how a standard CI method based on the maximum likelihood method and on Jolicoeur's pivot in Equation (10) can unexpectedly provide misplaced CIs and point estimates.
The data subset of vitamin B12 birth-levels for mothers and their newborn infants, used in the next example, is extracted from a data set with several hundred variables collected from 207 births in the Baudelocque maternity clinic in Paris during a 5-month period between 1984 and 1985 for an environmental study. Medical and environmental information, as well as further statistical analysis using these data, may be found in Fréry et al. (1992). The data were kindly provided by these researchers from the Institut National de la Santé et de la Recherche Médicale de France.
I omitted 57 couples from the vitamin B12 data with either missing information or with dubious components, in order to avoid being confronted with a mixed distribution not relevant for illustrating a difficulty in Jolicoeur's technique. What remained resembles a sample from a bivariate lognormal distribution. Using the lognormality assumption, I excluded one outlier couple in which the mother had too much vitamin B12 in her blood. (A remark on a Mayo Clinic website warned against this possibility, often due to vitamin supplements.) Consequently, the intermediate data summary (before reducing the number of significant figures) is N = 149, x̄ = 5.51774, ȳ = 6.38079, s_x = 0.44264, s_y = 0.616214, and r = 0.628373, where x̄ and ȳ are the means of the natural logarithms of the amount of vitamin B12 (pg/ml) in, respectively, the mother's blood and the newborn's (cord) blood, while r is the sample correlation coefficient between x and y, and s_x and s_y are the sample standard deviations of x and y. The (x, y)s in Figure 3 are assumed to resemble a random sample from a bivariate normal distribution.
In order to perform a chi-square goodness of fit test, the scatter diagram in Figure 3 was split into 20 classes. Assuming parameters equal to the corresponding estimates from the vitamin B12 data, class probabilities with a precision of at least four significant figures were calculated by simulation for simplicity. After merging classes having fewer than five expected observations with adjacent classes, but admitting one class with a frequency between 1 and 5, 14 classes remained and the chi-square test had eight degrees of freedom. The bivariate "sample" did not appear significantly different from one drawn from a bivariate normal distribution (P = 0.313).
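The simulation step of this goodness-of-fit test is easy to reproduce. The sketch below is my own minimal version (the rectangular cells are illustrative, not those of Figure 3): it estimates cell probabilities from bivariate normal draws and forms the chi-square statistic from observed counts.

```python
import math
import random

def bvn_draw(rng, mx, my, sx, sy, r):
    """One draw from a bivariate normal via two independent N(0,1) errors."""
    z1, z2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
    return mx + sx * z1, my + sy * (r * z1 + math.sqrt(1.0 - r * r) * z2)

def estimate_cell_probs(cells, mx, my, sx, sy, r, iters=100_000, seed=2):
    """Estimate the probability of each rectangular cell by simulation,
    as in the text (cells are (x0, x1, y0, y1) rectangles)."""
    rng = random.Random(seed)
    counts = [0] * len(cells)
    for _ in range(iters):
        x, y = bvn_draw(rng, mx, my, sx, sy, r)
        for j, (x0, x1, y0, y1) in enumerate(cells):
            if x0 <= x < x1 and y0 <= y < y1:
                counts[j] += 1
                break
    return [c / iters for c in counts]

def chi_square_stat(observed, probs, N):
    """Pearson's statistic against simulated expected counts N * p_j."""
    return sum((o - N * p) ** 2 / (N * p) for o, p in zip(observed, probs) if p > 0)
```

The statistic would then be referred to a chi-square distribution whose degrees of freedom account for merged cells and estimated parameters, as described above.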
FIGURE 3: A misplaced CI for the slope of a major axis. Vitamin B12 in blood at birth data on logarithmic scales (x for mother, y for newborn), with lines through the point (x̄, ȳ) with slopes corresponding to the 95% anomalous CI for the slope of the major axis and to the maximum likelihood estimate of the major axis.

Furthermore, both sample skewness and kurtosis coefficients for marginal distributions of log-observations are moderate. Interestingly, as shown in Figure 2 in Fréry et al. (1992), the conditional distributions of the amount of vitamin B12 in mothers' blood, given their ages, do not generally follow simple lognormal distributions. Figure 3 shows, using logarithmic scales, the scatter diagram of vitamin B12 birth-level data in mothers' blood (x) and in their newborns' cord blood (y), with three lines through the point (x̄, ȳ), with slopes corresponding to the 95% CI ends and to the maximum likelihood estimate of the major axis slope. The scatter diagram major axis is within the 95% confidence cone, but the cone extends too much on the left towards the minor axis. Furthermore, the maximum likelihood major axis estimate "visually" misses the scatter major axis. Figure 4, which uses original coordinates, shows that the results obtained using the renewed theory of errors are as expected.

FIGURE caption fragment: data from Jolicoeur (1963), where Jolicoeur's method succeeds (dotted line); according to Jolicoeur (1973), only the right side of the graphs should be considered.

Misplaced Maximum Likelihood Estimate and CI for the Slope of a Major Axis
Note that the horizontal and vertical scales are different in both figures. The 95% ME for the slope is [1.3724, 1.6062] (DPE median = 1.48513) and the 95% CI is [1.37523, 2.07054] (maximum likelihood estimate = 1.67082).
The p-value function for these data, given in Figure 5, explains why the CIs based on maximum likelihood estimates for this problem are unsatisfactory. A 10% CI for the slope of the major axis would be empty! A correction of corresponding side effects due to singularities affecting CIs with finite length is beyond the scope of a detailed analysis in this article. However, the main problem with using CIs for the slope of the major axis of a bivariate normal distribution is that "it is not always evident whether a particular CI is trustworthy," since it may be offset and elongated due to the complex effect of several singularities. While Jolicoeur's technique is unsatisfactory with the vitamin B12 data from mothers' and newborns' blood, the following example illustrates that it can also give a "good" answer. Using data in Jolicoeur (1963) concerning humerus dimensions of male Martes americana, with N = 92, s_x = 0.014276, s_y = 0.010744, and r = 0.593854 (before reducing the number of significant figures due to data imprecision), the resulting 95% CI and 95% ME for β are congruent. The 95% CI is 1.70 ± 0.48 with a median unbiased estimate equal to 1.60, and the 95% ME is 1.70 ± 0.41 with a median DPE equal to 1.60.
The suggested correction to CIs for the slope of the major axis of a 𝒩₂ distribution is to replace them by MEs based on a renewed theory of errors, as illustrated in Figure 4 using the original coordinates.

Users of ''Fiducial'' Techniques
Manipulating error variables according to "fiducial" techniques (i.e., probability laws) was first done by Gauss (1821, 1855) and by Fisher (1935, 1959). Fraser (1968) used hundreds of such manipulations. Formal fiducial techniques are also used in Bunke, H. (1975), Bunke, O. (1976), Barnard (1977), Plante (1979a, 1979b, 1994), and Dawid & Stone (1982). More recently, Weerahandi (1993, 2013), Iyer, Wang & Mathew (2004), Hannig (2009) and Hannig et al. (2016) performed fiducial as well as other procedures on formal random variables to obtain approximate confidence distributions using simulation. Reid & Cox (2015) associated these approaches by Weerahandi, by Hannig, and by their collaborators with Cox's confidence distribution approach. Hannig & Xie (2012) have indicated relationships between manipulation rules in the confidence distribution approach, in the Dempster-Shafer theory, and in generalized fiducial inference. Mauldon (1955), Lindley (1958), Stein (1959), Buehler & Feddersen (1963) and Dawid, Stone & Zidek (1973) have provided counterexamples to fiducial techniques. As this section explains, the renewed theory of errors is not affected by these objections, although it uses integration to eliminate unwanted parameters. Kolmogorov (1956, pp. 14-15) proved that his system of axioms for probability is consistent, and commented that "Our system of axioms is not, however, complete, for in various problems in the theory of probability different fields of probability have to be examined" (Kolmogorov, 1956, p. 3). This incompleteness has several implications. One must be aware that when applying Kolmogorov's formalism to a statistical problem for which both a pre-data probability model and a post-data renewed theory of errors model are needed, "the same formalism is being applied to different σ-fields."
Moreover, within error analysis itself, one must not pass inadvertently from one σ-field into another by algebraically modifying an expression referring to the observations. Paradoxes in Mauldon (1955), as pointed out by Fisher (1959) in the last sentence of his book, as well as marginalization paradoxes in Dawid, Stone & Zidek (1973), indicate an inadvertent crossing from one σ-field into another when one attempts to create a paradox against a renewed theory of errors. Thus, using this type of paradox against a renewed theory of errors would be invalid. In other words, such a so-called "paradox" would exhibit a situation in which one has intuitive expectations that are not supported by Kolmogorov's theory of probability. In fact, Fisher (1960) solved a similar alleged paradox claimed by Lindley (1958).

Objections to ''Fiducial'' Techniques
Problematic inconsistencies can also exist within an "alleged" Gaussian theory of errors approach. For example, Buehler & Feddersen (1963) and Brown (1967) used Buehler's version of fiducial terminology, such as "relevant subset," in which observations would not be treated as mathematical constants in post-data algebraic manipulations (Buehler, 1959, 1980), whereas an observation y₀ should be so treated within a theory of errors and within a fiducial approach. Consequently, Buehler and Feddersen's conclusions have no relevance for criticizing a theory of errors. Stein (1959) introduced the noncentrality coefficient underestimation problem treated in Section 2 with the added condition that λ² = o(p²).

Dismissal of Stein's Fiducial Example
He then showed that, under his added condition in Equation (11), the coverage frequency of his alleged fiducial solution has limiting value zero as p → ∞, where d₀² is the observed value of the variable d². However, any bounded (measurable) fixed subset of Θ such as S_p(τp) = {(μ₁, …, μ_p) : λ² = μ₁² + ⋯ + μ_p² ≤ τ²p²}, with a fixed τ and, because of Equation (11), for any (small) τ², has a prior probability equal to zero when the prior distribution is "uniform" and improper in R^p. With a large enough hypersquare K_p(k) with a side length 2k, in order to have simultaneously λ²/p² < τ² and V{S_p(τp)}/V{K_p(k)} < ε, for any ε > 0, where V(·) is the hypervolume measure, it is sufficient to take k larger than τp/[{εΓ(p/2 + 1)}^{1/p}(π/4)^{−1/2}] (Cramér, 1946, p. 120). Consequently, the set of parameter points satisfying the condition in Equation (11) has a negligible effect on 100(1 − α)% improper Bayesian credible intervals for the problem. This is true for MEs as well, according to Section 5.4. Furthermore, two Monte-Carlo experiments described in the Web Appendix in the Supplementary Material confirm that, without the condition in Equation (11), the 100(1 − α)% MEs in Equation (12) are also 100(1 − α)% improper Bayesian credible intervals, although the speed of convergence of the coverage frequency decreases as p increases, as predicted by Pinkham (1966). It is remarkable that the mean of the DPE on λ² is not a consistent point estimator of λ². Better point estimators are considered in Evans & Shakhatreh (2014).
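The hypervolume bound above is easy to verify numerically. The sketch below is my own check (using the log-gamma function to avoid overflow): it computes V{S_p(τp)}/V{K_p(k)} and the sufficient k of the text; at that k the ratio equals ε exactly, and any larger k drives it below ε.

```python
import math

def ball_cube_ratio(p, tau, k):
    """V{S_p(tau p)} / V{K_p(k)}: volume of the ball of radius tau*p in R^p
    over the volume (2k)^p of the hypersquare of side 2k."""
    log_ball = (p / 2) * math.log(math.pi) + p * math.log(tau * p) \
        - math.lgamma(p / 2 + 1)
    return math.exp(log_ball - p * math.log(2 * k))

def sufficient_k(p, tau, eps):
    """Sufficient k from the text:
    k > tau*p / [{eps * Gamma(p/2 + 1)}^(1/p) * (pi/4)^(-1/2)]."""
    g = math.exp((math.log(eps) + math.lgamma(p / 2 + 1)) / p)
    return tau * p / (g * (math.pi / 4) ** -0.5)
```

So for any fixed τ the ball S_p(τp) occupies a vanishing fraction of a sufficiently large hypersquare, which is why the set of parameter points satisfying Equation (11) carries no weight under the improper uniform prior.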
When comparing the results of these Monte-Carlo experiments and Pinkham's (1966) conclusion with Stein's result, it is clear that the convergence to zero of the coverage of the interval in Equation (12), subject to Stein's condition in Equation (11), is due to this condition and definitely not only to fortuitous errors-as is required in order to use Gauss's error postulate. Therefore, Stein's alleged fiducial interval in Equation (12) cannot be a ME as used in a renewed theory of errors.
Note that Equation (1) in Section 4.2 also provides a ME for Stein's noncentrality coefficient underestimation problem.

Marginalization is Unavoidable but Manageable
Stein's (1959) reckless use of marginalization in his counterexample (he has more parameters than observations!) has unfortunately given the impression that marginalization is a dangerous method that can get out of control. This is a gross exaggeration. Fieller's pivot to estimate a ratio of means and Jolicoeur's pivot for a major axis slope are improper, whereas the method demonstrated here to improve the corresponding confidence belts is to use a joint proper pivot for the five parameters of a 𝒩₂ distribution, followed by a marginalization process. Marginalization is also the natural method of eliminating unwanted parameters in a Bayesian approach to a problem.
Often a proper pivot for a particular parameter does not exist, as Gleser & Hwang (1987) have shown. To be able to find a proper pivot in a space of higher dimension capable of providing a DPE for that parameter by using marginalization is then invaluable, provided one is willing to adopt the Gaussian error postulate with its extended Fraser's frequency interpretation as a guideline. A comparison of Figures 3 and 4 emphasizes that a unique sample criterion, like that of the theory of errors, has definite advantages over the long-run criterion of CI theory.
There are rare occasions when more than one pivot exists to estimate a parameter. For example, using the Cushny & Peebles data when assuming normality, the pivot for the five parameters of the 𝒩₂ distribution provides a ME 1.580 ± 0.9622 for Δ = μ_x − μ_y, which is 9% longer than the usual 95% CI (or ME) 1.580 ± 0.8800 from the pivot (D̄ − Δ, 9s_D²/σ_D²) for the two parameters Δ and σ_D, where D_i = x_i − y_i (i = 1, …, 10). This is 3% longer per marginalized real parameter. It is not difficult to choose between these two interval estimates using an efficiency criterion. A comparable situation occurs with Behrens-Fisher MEs and approximate CIs (or MEs) based on the Welch-Satterthwaite approximate pivot. When multiplying each mean of a ratio of means by a factor of 15 using the Cushny-Peebles data, the 95% Fieller's CI for the ratio ψ = 15μ_y/15μ_x is 0.321235 ± 0.027356, while the 95% ME is 0.32128 ± 0.02813. Marginalization thus produced an almost 3% (manageable) increase of the interval's length.
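The two-parameter interval quoted above is a one-line computation from the summary statistics, since the differences D_i = x_i − y_i have variance s_D² = s_x² + s_y² − 2rs_xs_y. A small Python check (my own helper; the Student critical value t_{0.025,9} ≈ 2.2622 is supplied by hand):

```python
import math

def paired_t_interval(xbar, ybar, sx, sy, r, N, tcrit):
    """CI (or ME) for Delta = mu_x - mu_y from the two-parameter pivot:
    center xbar - ybar, half-width tcrit * s_D / sqrt(N), with
    s_D^2 = s_x^2 + s_y^2 - 2 r s_x s_y."""
    s_d = math.sqrt(sx * sx + sy * sy - 2 * r * sx * sy)
    half = tcrit * s_d / math.sqrt(N)
    center = xbar - ybar
    return center - half, center + half
```

On the Cushny and Peebles data this returns 1.580 ± 0.880, matching the two-parameter interval above; the five-parameter ME 1.580 ± 0.9622 pays about 3% per marginalized parameter for the extra generality.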

DISCUSSION
Fisher's criterion of mutual logical consistency of estimation statements (Fisher, 1934; Wilkinson, 1977; Barnard, 1981; Plante, 1984, 1991, 1994) has been largely ignored by statisticians except when improper CIs occur. This article provides an alternative to using improper CIs. The technical difficulties in using these CIs (tacitly denounced by Tukey) are illustrated for a ratio of means in Section 7 and for the slope of a major axis in Section 8. Gauss introduced error analysis as one of the most fertile fields of application of mathematics to natural phenomena, although the theory of errors is presently underdeveloped when compared to its proper and improper Bayesian and Neyman-Pearsonean neighbours. A review of problems which can be treated using what is essentially Gauss's method would be an encyclopedic endeavour.
Many current problems need attention, such as:
• improper CIs of the kinds found in Gleser & Hwang (1987) and in Batschelet (1981, Section 5.3).
Two significant theoretical problems are:
• Estimation of the parameters of the three-dimensional and multi-dimensional normal distributions could be useful. One could use Fisher (1962).

CONCLUSION
Joseph Bertrand added a "prophetic" remark to his translation of Gauss's work on the theory of errors (Gauss, 1855), when stating that Gauss was aware of critical judgements about his theory, but that Gauss firmly believed that geometers would entirely adopt his ideas once his memoirs, then quite rare, would be more publicized.

APPENDIX
Proof of a statement in Section 2.1 identifying improper CIs for a noncentrality parameter when the number of degrees of freedom tends to infinity. In Section 2.1 it is stated without proof that uniformly most accurate 100(1 − α)% CIs for the noncentrality parameter λ² of a noncentral chi-square distribution can be improper with probability 1 − α at the limit if λ² = o(p^{1/2}). The following Theorem 1 clarifies this statement. Let d² be a random variable with a noncentral chi-square distribution with p degrees of freedom and noncentrality parameter λ², and let F(d²; λ², p) be the distribution function of d². Suppose that λ² must satisfy the condition in Equation (A1). The natural 100(1 − α)% CIs of the form [λ²₋(d²), ∞) are the inverse images F^{−1}{d²; (0, 1 − α], p} of the partial point function F(d²; ·, p) when the first and the third arguments are fixed.
Theorem 1 (Limiting behaviour of natural CIs for λ²). Under the condition in Equation (A1), (i) the probability that the natural 100(1 − α)% (0 < α < 1) CIs of the form [λ²₋, ∞), for the noncentrality parameter λ² of a noncentral chi-square distribution with p degrees of freedom, be equal to [0, ∞) is 1 − α at the limit when p → ∞; (ii) the conditional probability that the natural 100(1 − α)% CIs cover λ², given that they are not equal to [0, ∞), is zero at the limit when p → ∞.
Proof of Theorem 1. Without loss of generality, suppose that y_i² > 0 for all i. (i) Stein (1959) pointed out that the noncentral chi-square distribution with p degrees of freedom and noncentrality parameter λ² is approximated by a 𝒩₁(p + λ², 2p + 4λ²) distribution for large p, the approximation being uniform in λ². Expression (10) in Stein (1959) for the natural CI [λ²₋, ∞), when translated into the above notation (and using 1 − α instead of Stein's α), is [F^{−1}(d²; 1 − α, p), ∞). This should be replaced by the inverse image F^{−1}{d²; (0, 1 − α], p}, since the partial function F(d²; ·, p) has range (0, F(d²; 0, p)] (I have assumed that each y_i² is positive). The probability that the 100(1 − α)% CI equals [0, ∞) can then be evaluated: using the Cornish-Fisher expansion (Abramowitz & Stegun, 1964, 1972), the normal approximation gives S ≈ F(p + z_α(2p)^{1/2}; λ², p). Thus, because of the condition in Equation (A1), the limit in part (i) follows. (ii) Because of the condition in Equation (A1) and of part (i) of the theorem, one can conclude that the conditional coverage probability vanishes at the limit. The proof of the statement in Section 2.1 is now complete.

Proof of Equation (1) in Section 4.2.
Assume that 𝒟 = {d²} = (0, ∞) (d² > 0, without loss of generality), Θ = {λ²} = [0, ∞), ℰ = {e} = (0, 1), and that the distribution Q(·) has a uniform density over ℰ-events. The model in Equation (A2), where F^{−1}(·; λ², p) is the inverse of F(·; λ², p), is a "structured model" (Fraser, 1971). It is known that the family of distributions of d² is strictly ordered according to the nondecreasing likelihood ratio ordering, thus is strictly stochastically ordered. This implies that the model in Equation (A2) is additive (Plante, 1979a), i.e., for all e, F^{−1}({e}; λ², p) ∩ F^{−1}({e}; λ′², p) = ∅ if λ′² ≠ λ², where now F^{−1}(·; λ², p) is the inverse image set function of F(·; λ², p). Therefore Dempster's upper and lower error measures (or probabilities) are equal. The "set of possible antecedents," after observing d₀², is F^{−1}{d₀²; (0, 1)}, the inverse image of the point function F(d₀²; ·, p) when the first and third arguments are fixed, and where 𝒫(·) designates the class of all subsets of the argument. This set of possible antecedents has a positive length because of the assumption that d₀² > 0. The unique "cofunction" (Plante, 1979a) is the corresponding point function. The following statement holds for any ℰ-event A (Plante, 1979b, Theorem 1). By definition, the "structured measure," i.e., the conditional error distribution given the set of possible antecedents, has a uniform density on the error-interval of possible antecedents. To check that observation errors and coverage errors are not far apart when using Equation (1) in the main article, I made an elementary Monte-Carlo experiment at the 95% level based on 100,000 iterations, using the "null" distribution F(d²; 20, 18). The resulting coverage frequency was 0.949.

since the case x̄*² = t²_{α/2} occurs with probability zero. The solutions, when they exist and if x̄*² ≠ t²_{α/2}, of the equation defined in Equation (A6) set to zero are the values of ψ given in Equation (A8), whose divisor is s_x(x̄*² − t²_{α/2}). When the divisor s_x(x̄*² − t²_{α/2}) changes from being positive to negative as x̄*² decreases, the set of ψs satisfying Equation (A8) changes from an interval to the complement of an interval, except for the two points for which ȳ* = ±rt_{α/2}, where it becomes (−∞, ∞). It is not necessary to consider points on the lines x̄* = ±t_{α/2} and points on the boundary of the ellipse in Equation (A7), since they occur with probability zero. ◼