The 250 Year Dispute About Bayes Rule - BSQ Research

In an article published in the Bulletin of the American Mathematical Society in 2012, the famous American statistician Bradley Efron celebrated the upcoming 250th anniversary of Bayes Rule, a fundamental principle of inference in mathematical statistics. The proper use of Bayes Rule has been a matter of contention for two and a half centuries, with the Bayesian and frequentist schools of thought waging a still-unresolved philosophical war that has many practical implications for scientific and mathematical practice. We provide below the principal points of Efron’s argument and his proposal that data-intensive methods can mediate between the Bayesian and frequentist camps of statistics.

Efron starts with a practical inference problem as an illustration. A couple has been told that they are going to have twin boys. What is the probability that the twins will be identical, given that the prior odds of fraternal vs. identical twins are 2:1? These prior odds mean that Prob{Identical}/Prob{Fraternal} = 1/2. We also know that these twins are of the same sex. Identical twins are always of the same sex, while fraternal twins are of the same sex only half the time, so the likelihood ratio of this event is Prob{same sex|Identical}/Prob{same sex|Fraternal} = 1/(1/2) = 2. Bayes Rule tells us that Posterior odds = Prior odds * Likelihood ratio = (1/2)*2 = 1. Therefore we conclude that the twins are equally likely to be identical or fraternal.
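The arithmetic of the twins example can be checked with a few lines of Python. This is only a sketch of the odds form of Bayes Rule; the function names are ours and are not taken from Efron's article.

```python
from fractions import Fraction


def posterior_odds(prior_odds, likelihood_ratio):
    """Odds form of Bayes Rule: posterior odds = prior odds * likelihood ratio."""
    return prior_odds * likelihood_ratio


def odds_to_probability(odds):
    """Convert odds in favour of an event into the probability of that event."""
    return odds / (1 + odds)


# Prior odds of identical vs. fraternal twins, as given in the text (1:2).
prior = Fraction(1, 2)
# Prob{same sex | Identical} = 1 and Prob{same sex | Fraternal} = 1/2.
likelihood_ratio = Fraction(1) / Fraction(1, 2)

odds = posterior_odds(prior, likelihood_ratio)
prob_identical = odds_to_probability(odds)
print(odds, prob_identical)  # prints: 1 1/2
```

Posterior odds of 1 correspond to a probability of 1/2 that the twins are identical.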

This is an example of Bayesian inference, which involves an unknown state of nature θ that we wish to learn more about, prior beliefs about θ that can be expressed as a probability distribution π(θ), an observation x that tells us something about θ, and a probability model f(θ,x) that says how x is distributed for each value of θ. Bayes Rule states: π(θ|x) = c · π(θ) · f(θ,x), where c is a normalising constant that does not depend on θ. The crucial elements are π(θ), the prior belief distribution, and the likelihood f(θ,x), which is viewed as a function of the parameter θ with the observed data x held fixed.

There would be no controversy about Bayes Rule if the prior belief in question were always a genuine one, as in the twins example above. However, in most practical cases there is little prior experience to go on, and the disagreement between Bayesians and frequentists hinges on how prior beliefs are formed. Following Laplace and Jeffreys, Bayesians have often used so-called uninformative or flat priors, but this option has often been frowned upon, and frequentists like Keynes, Fisher and others have pointed out that the use of non-genuine priors could lead the analyst completely astray.

Frequentists take a different view of things. Instead of a prior distribution π(θ), they focus on a statistical procedure t(x), which may be an estimate, a confidence interval, a test statistic or a prediction rule. In this case, scientists attempt to find an optimal rule t(x) that is correct, say, 95% of the time, irrespective of the true value of the unknown parameter θ. The method is called frequentist because it is based on procedures that are justified by how frequently they give the right answer in repeated use. The optimal frequentist methods for parameter estimation were first derived by Sir Ronald Fisher, and those for testing and confidence intervals by Jerzy Neyman. This Fisher-Neyman frequentist method became the dominant paradigm of statistics in the 20th century and has been used extensively in all the sciences, from physics and astronomy to psychology, economics and sociology.
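The idea of a rule that is right about 95% of the time, whatever θ is, can be illustrated by simulation. The sketch below is our own illustration, not something from Efron's article: it assumes normally distributed data with a known standard deviation and uses the textbook 1.96-standard-error interval for the mean.

```python
import math
import random


def z_interval(sample, sigma, z=1.96):
    """Approximate 95% confidence interval for a normal mean with known sigma."""
    n = len(sample)
    mean = sum(sample) / n
    half_width = z * sigma / math.sqrt(n)
    return mean - half_width, mean + half_width


def coverage(theta, sigma=1.0, n=20, trials=4000, seed=0):
    """Fraction of simulated intervals that contain the true mean theta."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        sample = [rng.gauss(theta, sigma) for _ in range(n)]
        low, high = z_interval(sample, sigma)
        hits += low <= theta <= high
    return hits / trials


# The coverage should come out close to 0.95 for every choice of theta.
for theta in (-5.0, 0.0, 3.0):
    print(theta, coverage(theta))
```

No prior on θ appears anywhere in this calculation; the guarantee is about the behaviour of the procedure over repeated samples.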

However, Bayesian methods can, in principle, be far more efficient, and they have built-in optimality properties. They can perform much better than frequentist methods and provide much narrower confidence limits if some reasonable prior beliefs can be formed about the parameter distribution. To see what applying such a Bayesian method involves, consider the problem of determining some parameter of a two-variable problem that is well described by a bivariate Gaussian distribution. This distribution is described by five free parameters: two means, two variances and the covariance between the two variables. In general, the parameter of interest is a function of all five free parameters, and therefore its determination is complicated by the need to deal with four extra parameters, which are called nuisance parameters. A Bayesian procedure can get around the problem by integrating over the nuisance parameters, but using Jeffreys priors to do so is not optimal, and biases the confidence limits away from the optimal frequentist limits.

Modern data-intensive methods promise a way out of the dilemma. On the frequentist side, the bootstrap is a method for finding confidence limits by simulation. It proceeds by generating artificial data sets by sampling repeatedly, and with replacement, from the estimated sample distribution. The variation of the estimate over the bootstrap samples then provides the confidence limits.
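A minimal sketch of this resampling scheme is given below. The data values are hypothetical, and the percentile interval used at the end is just one simple way of turning bootstrap variation into confidence limits; it should not be read as the specific construction Efron recommends.

```python
import random
import statistics


def bootstrap_estimates(data, statistic, n_boot=2000, seed=0):
    """Recompute `statistic` on n_boot resamples drawn with replacement from data."""
    rng = random.Random(seed)
    n = len(data)
    return [statistic(rng.choices(data, k=n)) for _ in range(n_boot)]


def percentile_interval(estimates, level=0.95):
    """Read confidence limits off the sorted bootstrap estimates."""
    ordered = sorted(estimates)
    alpha = (1 - level) / 2
    lower = ordered[int(alpha * len(ordered))]
    upper = ordered[int((1 - alpha) * len(ordered)) - 1]
    return lower, upper


# Hypothetical measurements, used only to exercise the functions.
data = [4.1, 5.3, 2.8, 6.0, 5.5, 3.9, 4.7, 5.1, 6.4, 3.2, 4.9, 5.8]

estimates = bootstrap_estimates(data, statistics.fmean)
low, high = percentile_interval(estimates)
print("sample mean:", statistics.fmean(data))
print("bootstrap 95% limits:", low, high)
print("bootstrap standard error:", statistics.stdev(estimates))
```

Any other statistic, such as a median or a correlation, can be passed in place of `statistics.fmean` without changing the resampling code.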

Automated computationally intensive methods are also available for Bayesian calculations. Given the prior and the data, the Markov Chain Monte Carlo (MCMC) procedure produces samples from an otherwise mathematically intractable posterior distribution π(θ|x).
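To make the idea concrete, here is a small random-walk Metropolis sampler, one of the simplest MCMC procedures. The model is our own toy assumption (a normal prior on θ and normal noise with known variance), chosen because its posterior is known in closed form, so the samples can be checked; in real applications MCMC is used precisely when no such formula exists. The data values are hypothetical.

```python
import math
import random
import statistics


def log_posterior(theta, data, prior_mean=0.0, prior_sd=1.0, noise_sd=1.0):
    """Unnormalised log posterior: log prior plus log likelihood."""
    log_prior = -0.5 * ((theta - prior_mean) / prior_sd) ** 2
    log_lik = sum(-0.5 * ((x - theta) / noise_sd) ** 2 for x in data)
    return log_prior + log_lik


def metropolis(data, n_samples=20000, step=1.0, burn_in=2000, seed=0):
    """Random-walk Metropolis sampler for the posterior of theta."""
    rng = random.Random(seed)
    theta = 0.0
    current = log_posterior(theta, data)
    samples = []
    for i in range(n_samples + burn_in):
        proposal = theta + rng.gauss(0.0, step)
        candidate = log_posterior(proposal, data)
        # Accept with probability min(1, posterior ratio); the normalising
        # constant c cancels in the ratio, so it is never needed.
        if rng.random() < math.exp(min(0.0, candidate - current)):
            theta, current = proposal, candidate
        if i >= burn_in:
            samples.append(theta)
    return samples


# Hypothetical observations.
data = [1.2, 0.4, 2.1, 1.7, 0.9]
samples = metropolis(data)

# Closed-form posterior for this conjugate toy model, for comparison.
exact_mean = sum(data) / (len(data) + 1)
exact_var = 1 / (len(data) + 1)
print(statistics.fmean(samples), exact_mean)
print(statistics.pvariance(samples), exact_var)
```

The sampler only ever evaluates the product of prior and likelihood, which is why it still works when the normalising constant cannot be computed.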

A combination of the two methods can be found in the so-called Empirical Bayes method. We know that the marginal density of the observations is the noise density integrated with respect to the prior density. The true posterior density can only be determined if the prior density is known, an impossibility according to frequentists, although Bayesians are probably willing to assume some sort of Jeffreys-type prior. However, if there is a large amount of data, we can arrive at a two-step parametric solution. Suppose the noise in the observations is additive and normal. Then the marginal density can be estimated by smoothing the histogram of the empirically observed values. This estimate carries the information about the prior density that is needed to derive the posterior density of any parameter. This holds in general whenever the posterior distribution can be described parametrically, and thus we have a frequentist estimate of a Bayes prior!
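The two-step logic can be sketched in a deliberately simplified setting. Instead of smoothing a histogram, the example below assumes that the prior is itself normal, so that the marginal distribution of the observations is summarised by its mean and variance; this is our simplifying assumption, not the general method described above. The simulated "true" parameters and the unit noise variance are likewise hypothetical.

```python
import random
import statistics


def empirical_bayes_shrinkage(observations, noise_var=1.0):
    """Two-step parametric empirical Bayes under an assumed normal prior."""
    # Step 1: estimate the marginal distribution of the observations.
    marginal_mean = statistics.fmean(observations)
    marginal_var = statistics.pvariance(observations, marginal_mean)
    # For normal prior plus normal noise: marginal var = prior var + noise var.
    prior_var = max(marginal_var - noise_var, 0.0)
    # Step 2: plug the estimated prior into the normal posterior-mean formula.
    weight = prior_var / (prior_var + noise_var)
    estimates = [marginal_mean + weight * (z - marginal_mean) for z in observations]
    return estimates, prior_var


def mean_squared_error(estimates, truth):
    return statistics.fmean((e - t) ** 2 for e, t in zip(estimates, truth))


# Simulate many parameters from a prior that the analyst does not know,
# each observed once with additive normal noise of variance 1.
rng = random.Random(0)
true_params = [rng.gauss(0.0, 2.0) for _ in range(2000)]
observed = [mu + rng.gauss(0.0, 1.0) for mu in true_params]

eb_estimates, estimated_prior_var = empirical_bayes_shrinkage(observed)
raw_error = mean_squared_error(observed, true_params)
eb_error = mean_squared_error(eb_estimates, true_params)
print("estimated prior variance:", estimated_prior_var)
print("error of raw observations:", raw_error)
print("error of empirical Bayes estimates:", eb_error)
```

The prior variance is never supplied; it is recovered from the spread of the data, which is the sense in which a frequentist estimate stands in for the Bayes prior.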

To sum up, Efron compares the two statistical philosophies:

Bayesian practice is bound to prior beliefs, while frequentism focuses on the behavior of estimates. The Bayesian requirement for a prior distribution is rejected by frequentists. On the other hand, frequentist analysis begins with the choice of a specific method, which strikes Bayesians as artificial and incoherent.

Bayesianism is a coherent and fully principled philosophy, while frequentism is a grab-bag of opportunistic, individually optimal methods.

Bayesians consider only one posterior distribution, but frequentists must consider a family of possible distributions.

The simplicity of the Bayesian approach is especially appealing in dynamic contexts, where data arrives sequentially, and where updating one’s beliefs is a natural practice. Thus, the Kalman filter is best formulated as a Bayesian problem.

In the absence of genuine prior information, Bayesian methods are inherently subjective.

Bayesians aim for the best possible performance versus a single (presumably correct) prior distribution, while frequentists hope to do reasonably well no matter what the correct prior might be.

Bootstrap methods are an attempt to reduce frequentism to a one-distribution theory and there are deep connections between Bayesianism and the bootstrap.
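The point made in the summary about sequential data can be illustrated with the simplest possible case: a constant unknown quantity observed repeatedly with normal noise. The sketch below is our own toy version of the scalar Kalman update, with hypothetical numbers; it is not a full Kalman filter, since the state has no dynamics.

```python
def update(mean, var, observation, noise_var):
    """One Bayesian update of a normal belief after one noisy observation."""
    gain = var / (var + noise_var)  # the Kalman gain in the scalar case
    new_mean = mean + gain * (observation - mean)
    new_var = (1 - gain) * var
    return new_mean, new_var


def batch_posterior(prior_mean, prior_var, observations, noise_var):
    """Posterior from all observations at once (normal prior, normal noise)."""
    precision = 1 / prior_var + len(observations) / noise_var
    mean = (prior_mean / prior_var + sum(observations) / noise_var) / precision
    return mean, 1 / precision


# Hypothetical prior belief and incoming measurements.
prior_mean, prior_var, noise_var = 0.0, 10.0, 2.0
observations = [4.8, 5.6, 5.1, 4.4, 5.9]

mean, var = prior_mean, prior_var
for x in observations:
    # Yesterday's posterior becomes today's prior.
    mean, var = update(mean, var, x, noise_var)

print("sequential:", mean, var)
print("batch:     ", batch_posterior(prior_mean, prior_var, observations, noise_var))
```

Updating one observation at a time gives the same answer as processing all the data together, which is what makes the Bayesian formulation so natural when data arrive sequentially.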
