When to use the Bayesian approach

In the following situations, I might want to use Bayes’ approach:

• I have quantifiable beliefs beforehand. These may come from experienced internal colleagues, external ‘experts’, or other subjective sources.

• The data may be ‘sparse’ or limited (now or for the foreseeable future), and certainly not ‘big’. It often will, but may not, dominate my prior, subjective beliefs.

• There is medium or high uncertainty involved.

• I wish to make consistent, sound decisions in the face of my uncertainty, while acknowledging it.

• I wish to do this in such a way that I can be honest with my stakeholders, shareholders, team, wider staff, investors, board, and so on.

• The model or data-generation methods will involve one or multiple parameters (such as profit, share price, average customer lifetime, transaction value, sales, cost, COGS, and so on).

• I cannot [wait to] run a trial in an idealised experiment. In dynamic environments, this is one of the key problems with frequentist approaches: we never have the same situation and data twice. The Bayesian approach naturally revises and updates as new data arrives.

• I want to know what it is best to do, or understand what the options are and which ones are better or worse for me and my team in the here and now, for *this* occasion and situation. In life, it’s rare to be able to wait for ‘the long run’, but recent prior data is often useful.

• I want to use all the new data available to me, and be able to eliminate noise as best I can.

• I don’t want to choose an arbitrary approach; I want to use logic. I want the logic of the inferences to be ‘leakproof’, so that only the assumptions can be inappropriate. Throughout this book, we’ll see some simple and more complicated examples of using logical probability.

Finally, Bayesian methods keep *type*. As Jaynes (1976) explained, if the data used is imaginary or pseudo-random, the probability distributions will be imaginary or pseudo-random; if the data is real, e.g. real frequencies, the output probabilities will be real frequencies; if the prior data is taken from what is reasonable to believe, the output probabilities will also represent what is reasonable to believe; and so on. In summary: the outputs will be of the same character as the inputs.
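The revise-and-update behaviour described in the list above can be sketched with a conjugate Beta-Binomial model. This is a minimal illustration under stated assumptions, not a method prescribed by the text: the `BetaBelief` helper, the Beta(2, 8) prior and the weekly counts are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BetaBelief:
    """Beta(a, b) belief about an unknown success rate (hypothetical helper)."""
    a: float
    b: float

    @property
    def mean(self) -> float:
        # Expected success rate under the current belief.
        return self.a / (self.a + self.b)

    def update(self, successes: int, failures: int) -> "BetaBelief":
        # Conjugate update: add the observed counts to the prior pseudo-counts.
        return BetaBelief(self.a + successes, self.b + failures)


# Assumed prior: colleagues expect about a 20% success rate, held with
# the weight of roughly ten earlier observations (2 successes, 8 failures).
prior = BetaBelief(2, 8)

# Made-up sparse data arriving in two small batches.
after_week_1 = prior.update(successes=3, failures=2)
after_week_2 = after_week_1.update(successes=1, failures=4)

for label, belief in [("prior", prior),
                      ("week 1", after_week_1),
                      ("week 2", after_week_2)]:
    print(label, round(belief.mean, 3))
```

Updating batch by batch gives the same belief as updating once with all the data, so the model can be revised whenever new information arrives without reprocessing what came before.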

We first compare approaches to statistics and probability.

**Comparing the Frequentist and the Bayesian approaches to probability**

*In idem flumen bis descendimus, et non descendimus* – Heraclitus, via Seneca, L. A., Epistulae Morales LVIII.23

Frequentist statistics are still the most commonly used. Here we define frequency and compare the frequentist with the Bayesian approach. The frequency definition of probability is the orthodoxy: if an event succeeds m times in a large number n of identical trials, the probability is taken to be the frequency m/n. The laws of large numbers lead us to believe that, for high enough n, this gives a good description of the propensity of the event to happen.
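To make the frequency definition concrete, the following sketch simulates identical trials and reports m/n for increasing n. The true success chance of 0.3, the seed and the function name `relative_frequency` are assumptions chosen purely for illustration.

```python
import random


def relative_frequency(p_true: float, n: int, rng: random.Random) -> float:
    """Return m/n for n simulated identical trials with success chance p_true."""
    # m counts the trials in which the simulated event succeeds.
    m = sum(1 for _ in range(n) if rng.random() < p_true)
    return m / n


rng = random.Random(0)  # fixed seed so the run is repeatable
for n in (10, 1_000, 100_000):
    print(n, relative_frequency(0.3, n, rng))
```

With a fixed true chance, the ratio settles near that chance as n grows. That stability depends on the trials really being identical, which is exactly the assumption in question.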

However, a problem with frequency statistics is highlighted by analogy in the above saying, which Seneca attributes to the philosopher Heraclitus, and by the apocryphal Buddhist monks. The river changes; we never step into the same river twice, though we go down to the ‘same river’.

In the table below comparing approaches, we see that the dynamics of what is being modelled, i.e. ‘reality’, are best approached with a model that changes in real time with the latest information, rather than one that is descriptive and notes the unusualness of sample or batch information. One approach is subjectivist and relativist, while the other remains objectivist. We have seen how subjectivist theories like quantum mechanics and general relativity have superseded what went before. These are two very finely-tested theories. Is this the moment subjectivist approaches in probability logic will arrive?

**Summary comparison: Bayesian vs Frequentist Approaches**

| Bayesian | Frequentist |
| --- | --- |
| Inferential, prescriptive | Descriptive |
| The here and now, and the next… | Long-run behaviour, hoping things persist as-is |
| Useful, intuitive result | Possibly large number of conflicting results |
| Elegant, simple mathematics | Arbitrary convention and complexity |
| Weight of evidence, credible intervals | Significance, ‘p-values’, confidence intervals |
| Probability as rational degree of belief | Probability as frequency of occurrence |
| Leakproof, logical probability theory | Ad hoc devices, possibly irrelevant information |
| Equivocation, best model choice | Model then test samples |
| Unique outcome of one experiment | Accept or reject batch vs population |
| Emphasises revision as data comes in | Notes the sample data |
| Data fixed, parameters unknown | Data is just one of many possible realisations |
| Probability can describe unknown constants | Probability applies only to random variables |
| Doesn’t apply in all situations | Doesn’t either, but works most of the time with minimal assumptions |
| Uses all of the data, optimally | Often does not use all of the data, or not fully |
| Doesn’t require us to understand degrees of freedom or sufficient statistics | We must understand and compute the degrees of freedom |
| Common-sense results; inappropriate inference tracks back transparently to the assumptions | Sometimes non-common-sense results, poor inference, or failure without obvious recourse |
| Focus on the scientific, mathematical or business merits | Focus on overcoming technical difficulties of the methods |

Table: Comparison of Bayesian (left column) and Frequentist (right column) methods
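The contrast between the two kinds of interval in the table can be illustrated on a small sample. The sketch below is only one possible comparison, under assumptions not taken from the text: a flat Beta(1, 1) prior, the Wald form of the confidence interval (one of several frequentist choices), a simple grid approximation for the posterior quantiles, and made-up data of 3 successes in 10 trials. Both function names are hypothetical.

```python
import math


def wald_confidence_interval(m: int, n: int, z: float = 1.96) -> tuple[float, float]:
    """Approximate 95% confidence interval for a proportion (Wald form)."""
    p_hat = m / n
    half_width = z * math.sqrt(p_hat * (1 - p_hat) / n)
    return max(0.0, p_hat - half_width), min(1.0, p_hat + half_width)


def beta_credible_interval(a: float, b: float, mass: float = 0.95,
                           grid_size: int = 20_000) -> tuple[float, float]:
    """Equal-tailed credible interval for Beta(a, b), by grid approximation."""
    # Evaluate the unnormalised Beta density at grid midpoints inside (0, 1).
    xs = [(i + 0.5) / grid_size for i in range(grid_size)]
    weights = [x ** (a - 1) * (1 - x) ** (b - 1) for x in xs]
    total = sum(weights)
    tail = (1 - mass) / 2
    lower = upper = None
    cumulative = 0.0
    for x, w in zip(xs, weights):
        cumulative += w / total
        if lower is None and cumulative >= tail:
            lower = x  # first point where the lower tail mass is reached
        if cumulative >= 1 - tail:
            upper = x  # first point where only the upper tail remains
            break
    return lower, upper


m, n = 3, 10  # made-up data: 3 successes in 10 trials
print("confidence interval:", wald_confidence_interval(m, n))
# Flat Beta(1, 1) prior plus the data gives a Beta(1 + m, 1 + n - m) posterior.
print("credible interval:  ", beta_credible_interval(1 + m, 1 + n - m))
```

The credible interval is a direct statement about where the unknown rate plausibly lies given this one data set and the stated prior. The confidence interval describes how the procedure would behave over many hypothetical repetitions.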

I have deliberately left out the issue of ‘prior’ distribution selection, which some find contentious, but I cover it in my book:

*The Art of Decision*, out soon with Big Bold Moves Publishing.