Motivating probability as rational degree of belief

“Probability does not exist” – Bruno de Finetti, Theory of Probability (1970)

What did the scholar de Finetti mean by this? Why does it matter?

  1. Motivation
  2. Examples (to follow in the next post)
  3. Applications (to follow in the post after that)

Section One: Motivation

Here, I look at Bayesian, subjective, conditional probability as extended logic, and compare it with orthodox (‘frequentist’) statistics and its ad hoc devices. I weigh the pros and cons of probability, utility and Bayes’ logic, and ask why they are not used more often.

In the title, I have used a quote from de Finetti, the Italian probabilist known for his intellect and beautiful writing. He meant that your probability of an event is subjective (up to a point) rather than objective: probability does not exist ‘out there’ in the world, any more than the ‘ether’ that physicists believed in before the Michelson-Morley experiment of 1887.

Probability is relative, not objective. It is a function of your state of knowledge, the possible options you are aware of, and the observed data that you may have and trust. When these have been used up, we equivocate between the remaining alternatives: we choose our probability distribution so as to use all the information we have, not throwing any away, and so as not to smuggle in ‘information’ that we do not have. As you find out more, or get more data, you update your probabilities. This up-to-date probability distribution is one of your key tools for making decisions. Many people never write it down; the information stays tacit. If you work with probability informally, you may well not be using the above logic, i.e. probability theory, at all; you may be making decisions by some other process. Be aware!
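As a minimal sketch of that updating cycle (my own illustration, with invented numbers, nothing from de Finetti): suppose the quantity of interest is the unknown chance p of a binary event, we encode our state of knowledge as a Beta distribution, and we revise it as observations arrive.

    def update_beta(alpha, beta, successes, failures):
        """Bayesian update for a Bernoulli chance p under a Beta prior.

        Beta(alpha, beta) encodes the current state of knowledge about p;
        new observations simply add to the counts.
        """
        return alpha + successes, beta + failures

    # Start by equivocating between the alternatives: Beta(1, 1) is uniform.
    alpha, beta = 1.0, 1.0

    # Update as data arrive: say we then observe 7 successes and 3 failures.
    alpha, beta = update_beta(alpha, beta, successes=7, failures=3)

    # The up-to-date distribution is the decision-making tool.
    print(f"posterior mean of p: {alpha / (alpha + beta):.3f}")  # 0.667

The conjugate Beta-Bernoulli pair is chosen only to keep the arithmetic transparent; the updating logic is the same for any model.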

Recently, I attended a lecture by the Nobel Laureate Professor ’t Hooft. He shared the 1999 prize with Martinus Veltman for putting the electroweak theory of subatomic particle forces on a sound mathematical footing. At the lecture, he expounded a newer theory, his ‘cellular automaton’ interpretation of quantum mechanics, in which everything that happens is determined in advance. It is, so far, just a theory.

Why did I tell you that? Well, let us go back to de Finetti. Even if the world were deterministic, we could never know all the ‘initial conditions’ in their minute detail, so our account of the world is subjective, based on our state of knowledge. This leads to theories of reasoning under incomplete information, including probability logic, which is my topic here.

As human beings, we find this situation really tricky. There may be false intuition. There may be ‘groupthink’. Alternatives may be missing from the calculations altogether. The famous ether episode mentioned above is an example of the great majority of top physicists, in fairly modern times, believing in something that later turned out literally not to exist, like the Emperor’s New Clothes.

In the ‘polemic’ section of his 1976 paper comparing estimation intervals, ‘Confidence Intervals vs. Bayesian Intervals’, the late, eminent physicist E. T. Jaynes wrote: ‘…orthodox arguments against Laplace’s use of Bayes’ theorem and in favour of “confidence intervals” have never considered such mundane things as demonstrable facts concerning performance.’

Jaynes went on to say that on such grounds (i.e. that we may not make probability statements about anything but random variables*) we are forbidden to use Bayesian derivations, ‘which in each case lead us more easily to a result that is either the same as the best orthodox frequentist result, or demonstrably superior to it’.

*In his book, de Finetti avoids the term ‘variable’, since it suggests a number that ‘varies’, which he considers a strange concept. It belongs to the frequentist picture of many idealised identical trials, in which the parameter we want to describe is fixed while the data vary; probability logic reverses this viewpoint, treating the observed data as fixed and the parameter as uncertain.

Jaynes went on: ‘We are told by frequentists that we must say “the percentage of times that the confidence interval covers the true value of the parameter”, not “the probability that the true value of the parameter lies in the credibility interval”.’ And: ‘The foundation stone of the orthodox school of thought is the dogmatic insistence that the word probability must be interpreted as frequency in some random experiment.’ Often that ‘experiment’ involves made-up, randomised data in some imaginary model that is merely descriptive rather than prescriptive. Often we cannot repeat the experiment, or even perform it once! Many organisations want a prescription for their situation in the here and now, rather than a description of what may happen with a given frequency in some ad hoc, imaginary model fed with any amount of made-up data.
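To make that distinction concrete, here is a small simulation (my own sketch, with invented numbers, not an example from Jaynes): the frequentist ‘95%’ is a long-run property of the interval-producing procedure over many imagined repetitions, not a statement about the one interval computed from the data in front of us.

    import random
    import statistics

    random.seed(1)
    TRUE_MU, SIGMA, N, REPS = 10.0, 2.0, 25, 10_000
    half_width = 1.96 * SIGMA / N ** 0.5  # 95% interval, sigma known

    covered = 0
    for _ in range(REPS):
        sample = [random.gauss(TRUE_MU, SIGMA) for _ in range(N)]
        m = statistics.mean(sample)
        if m - half_width <= TRUE_MU <= m + half_width:
            covered += 1

    # The 95% describes the procedure across imagined repetitions.
    print(f"long-run coverage: {covered / REPS:.3f}")  # about 0.95

In this particular model (known sigma, flat prior) the Bayesian 95% credible interval happens to be the numerically identical interval, but it is read the other way round: given this one sample, the probability that the parameter lies in it is 0.95.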

Liberally quoting again, Jaynes continues: ‘The only valid criterion for choosing is: which approach leads us to the more reasonable and useful results?’

‘In almost every case, the Bayesian result is easier to get at and more elegant. The main reason for this is that both the ad hoc step of choosing a statistic and the ensuing mathematical problem of finding its sampling distribution are eliminated.

‘In virtually every real problem of real life the direct probabilities are not determined by any real random experiment; they are calculated from a theoretical model whose choice involves ‘subjective’ judgement…and then ‘objective’ or maximum entropy calibration of what we don’t know. Here, ‘maximum entropy’ simply means not putting in any more information once we’ve used up all the information we believe we actually have.
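In symbols (a standard textbook statement of the principle, not a line from Jaynes’s paper): among all distributions consistent with the constraints that encode what we actually know, pick the one of maximum entropy,

    maximise    H(p) = − sum_i p_i log p_i
    subject to  sum_i p_i = 1   and   sum_i p_i f(i) = F   (the known averages).

With no constraint beyond normalisation this gives the uniform distribution, i.e. pure equivocation between the alternatives; fixing a known average gives the exponential (Gibbs) form p_i ∝ exp(−λ f(i)).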

‘Our job is not to follow blindly a rule which would prove correct 95% of the time in the long run; there are an infinite number of radically different rules, all with this property. [Things never stay put for the long run.] Our job is to draw the conclusions that are most likely to be right in the specific case at hand; indeed, the problems in which it is most important that we get this theory right are just the ones where we know from the start that the experiment can never be repeated.’ (See blog three in this series for some application sectors.)

‘In the great majority of real applications long run performance is of no concern to us, because it will never be realised.’

And finally, E. T. Jaynes said that ‘the information we receive is often not a direct proposition, but is an indirect claim that a proposition is true, from some “noisy” source that is itself not wholly reliable’. The great Hungarian mathematician and problem-solver George Pólya deals with such situations in his 1954 two-volume work Mathematics and Plausible Reasoning.
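A toy calculation in that spirit (the numbers are mine, purely illustrative): a source that reports correctly 80% of the time, whether or not proposition A is true, tells us that A holds; Bayes’ theorem says how far to shift our belief.

    # Prior degree of belief in proposition A, before hearing the report.
    p_A = 0.5

    # Reliability of the source: chance it reports "A" when A is true,
    # and chance it wrongly reports "A" when A is false.
    p_report_given_A = 0.8
    p_report_given_not_A = 0.2

    # Bayes' theorem: P(A | report) = P(report | A) P(A) / P(report).
    p_report = p_report_given_A * p_A + p_report_given_not_A * (1 - p_A)
    p_A_given_report = p_report_given_A * p_A / p_report

    print(f"belief in A after the report: {p_A_given_report:.2f}")  # 0.80

Starting from even odds, one report from an 80%-reliable source moves the belief to exactly the source’s reliability; a second independent report would move it further, to about 0.94.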

Most people are happy to use logic when dealing with certainty and impossibility. It is the standard framework behind trillions of pounds’ worth of electronic devices, for example. Where there is uncertainty between these two extremes, let us use the theory of probability as extended logic.
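Concretely (this is the standard Cox-Jaynes formulation of extended logic, not a quotation): only two rules are needed, and ordinary Boolean logic reappears when every probability is 0 or 1,

    P(AB | C) = P(A | BC) P(B | C)     (product rule)
    P(A | C) + P(not A | C) = 1        (sum rule).

Deduction is the limiting case: if P(A | C) = 1 and P(B | AC) = 1, the product rule forces P(B | C) = 1, which is just modus ponens.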

I will next post a second letter here, giving examples of how probability logic works in comparison with frequentist statistics.

If you’d like to contact me about the above letter, please write to ‘teiresaas’ at ‘cantab’ dot ‘net’
