Understanding Subjective Probabilities
Any time a controversial prediction about the future comes up, there's a type of interaction that's pretty much guaranteed to happen.
Alice: "I think this thing is 20% likely to occur."
Bob: "Huh? How could you know that? You just made that number up!"
Or in slightly more technical Twitterese:
"Our posteriors come from our priors, but our priors come from our posteriors." — Noah Smith 🐇🇺🇸🇺🇦 (@Noahpinion) June 10, 2021
That is, any time someone attempts to provide a specific numerical probability for a future event, they'll be inundated with claims that the number is meaningless and they just made it up. Is this true? Does it make sense to assign probabilities to things in the real world? If so, how do we calculate them?
Interpretations of probability
The traditional interpretation of probability is known as physical probability, with two main variants: frequentist probability and propensity probability. Under frequentism, the chance of an event happening is equal to the fraction of times it's occurred among all similar events in the past. Under the propensity interpretation, items have some intrinsic "quality" of being some % likely to do one thing vs. another. For example, a coin has a fundamental probabilistic essence of being 50% likely to come up heads when flipped.
These are useful mathematical abstractions, but they break down when applied to the real world. There's no fundamental "50% ness" to a real coin. Flip it the same way every time, and it will land the same way every time. Flip it without perfect regularity but with some measure of consistency, and you can get it to land heads more than 50% and less than 100% of the time. In fact, even coins flipped by people who are trying to make them 50/50 are actually biased towards the face that was up when they were flipped. This intrinsic quality posited by the propensity interpretation doesn't exist in the real world; it's not encoded in any way by the atoms that make up a real coin, and our laws of physics don't have any room for such a thing.
Frequentist interpretations run into their own problems. What group of real-life events counts as "similar enough"? How do we factor in additional information about the event? Say Alice is considering starting a company, and wants to know the chance that it succeeds. She could look at the history of past companies and try to calculate the fraction that succeeded, but what if there's something different about hers? Maybe she's come up with a new business plan that's never been tried before, or maybe society has changed over the years such that something that would have worked 20 years ago won't work anymore.
And there are philosophical problems here too: does frequentism mean that the first time someone ever flipped a coin, it had no particular probability of landing heads? And at the macroscopic scale of coins, our universe is effectively deterministic: given the same starting conditions, the end result will be the same every time, which leaves no room for an objective physical "chance" of the outcome.
One potential solution is the Bayesian interpretation of probability, under which a probability is not a physical property of the world but a degree of belief held by a particular observer, to be updated as new evidence comes in.
Dealing with "fuzzy" probabilities
That's all well and good, but how do we get a number on more complicated propositions? In the previous example, we know that the coin is >50% likely to come up heads, but is it 55%? 75%? 99%?
This is the hard part.
Sometimes there's a wealth of historical data, and you can rely on the base rate. This is what we do with coins; millions of them have been flipped, and about half have come up heads and half tails, so we can safely assume that future coins are likely to behave the same way.
The same goes even with much less data. There have only been ~50 US presidents, but about half have come from each political party, so we can safely assume that in a future election, each party will have around a 50% chance of winning. We can then modify this credence based on additional data, like that from polls.
How do we do that modification? In theory, by applying Bayes' rule. In practice, it's infeasible to calculate the exact number of bits of evidence provided by any given observation, so we rely on learned heuristics and world models instead; that is, we guess.
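In principle, that update is a one-line application of Bayes' rule. Here's a minimal sketch; every number in it is hypothetical (a 50% prior from the historical base rate, plus invented likelihoods for how probable a given poll lead would be under each outcome):

```python
def bayes_update(prior, p_obs_given_h, p_obs_given_not_h):
    """Posterior credence in hypothesis H after an observation, given
    the probability of that observation under H and under not-H."""
    numerator = prior * p_obs_given_h
    return numerator / (numerator + (1 - prior) * p_obs_given_not_h)

# Hypothetical numbers: a 50% prior that party A wins (the base rate),
# and a poll lead we'd expect to see 70% of the time if A wins
# but only 40% of the time if A loses.
posterior = bayes_update(0.50, 0.70, 0.40)
print(round(posterior, 3))  # 0.636
```

The formula is the easy part; as noted above, the hard part is that nobody actually knows the likelihood terms precisely, which is why real-world updating falls back on heuristics.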
It's obvious once you think about it for a moment that guessing is a valid way of generating probabilistic statements; we do it all the time.
Alice: "We're probably going to run out of butter by tomorrow, so I'm going to go to the grocery store."
Bob: "Preposterous! Have you been tracking our daily butter consumption rate for the past several weeks? Do you have a verifiably accurate statistical model of the quantity of butter used in the production of each kind of meal? Anyway we just bought a new stove, so even if you did have such a model, it could no longer be considered valid under these new conditions. And how would you even know what dinner we're going to make this evening; I haven't yet told you what I plan on eating! This is such nonsense for so many reasons, you can't possibly know our exact likelihood of running out of butter tomorrow."
Clearly Bob is being unreasonable here. Alice does not need a detailed formal model suitable for publication in a scientific journal; a rough, experience-based estimate is plenty to justify a trip to the grocery store.
And these sorts of probabilistic judgements are ubiquitous. A startup company might try to guess what fraction of the population will use a new technology they're developing. A doctor might work out the odds of their patient dying, and the odds of each potential treatment, in order to know what to recommend. A country's military, when considering any action against another country, will estimate the chance of retaliation. Etc.
Does it matter that Alice said "probably" rather than assigning a specific number to it? Not at all. Verbal probabilities are shorthand for numerical ranges, as can be demonstrated by one of my all-time favorite graphs:
We have to be careful though, since probability words are context-dependent. If my friend is driving down the highway without wearing a seatbelt, I might tell them "hey, you're likely to die if we get into an accident, maybe put your seatbelt on". Their actual probability of dying in an accident is still less than 50%, but it's high relative to their chance if they were wearing a seatbelt, so I'm communicating that relative difference.
This is a very useful feature of probability words. If I say "nah, that's unlikely" then I'm communicating both "the probability is <50%" and "I think that probability in this context is too low to be relevant", which are two different statements. This is efficient, but it also increases the chance of miscommunications, since the other person might disagree with my threshold for relevance, and if they do there's a huge difference between 1% and 10%. When precision is needed, a specific number is better.
"The IPCC says it is 'very unlikely that methane from clathrates will undergo catastrophic release (high confidence)'. This sounds reassuring, but in the official language of the IPCC, 'very unlikely' translates into a 1% to 10% chance, which then sounds extremely alarming. I don't know what to make of this, as the context suggests that it was meant to reassure."
Probabilities are no different from other units
Humans have a built-in way to sense temperature. We might say "this room is kind of warm", or "it's significantly colder than it was yesterday". But these measures are fuzzy, and different people could disagree on exact differences. So scientists formalized this concept as "degrees", and spent years trying to figure out exactly what it meant and how to measure it.
Humans don't naturally think in terms of degrees, or kilometers, or decibels; our brains fundamentally do not work that way. Nevertheless, some things are warmer, farther, and louder than others. These concepts started as human intuitions, and were slowly formalized as scientists figured out exactly what they meant and how they could be quantified.
My subjective judgements about temperature are not precise, but that doesn't mean they're meaningless. As I type this sentence, I'm not sure if the room I'm sitting in is 21 degrees, 22, or maybe 23. But I'm pretty sure it's not 25, and I'm very sure it's not 15.
Can precise numbers impart undue weight to people's claims?
Sure, they can. If someone says "it was exactly 22.835 degrees Centigrade at noon yesterday", we'd tend to assume they have some reason to believe that other than their gut feeling, and the same goes for probabilities. But "I'd be more comfortable if you lowered the temperature by about 2 degrees" is a perfectly reasonable statement, and "I think this is about 40% likely" is the same.
Since the science of probability is newer and less developed than the science of temperature, numerical probabilities see less everyday use. In our current society, even a vague statement like "around 40%" can sometimes be taken as implying that some sort of in-depth study has been performed. This is unfortunate, but self-correcting: the more people normalize using numerical probabilities outside of formal scientific fields, the less exceptional they'll be, and the lower the chance of a misunderstanding. Someone has to take the first step.
More importantly, this criticism is fundamentally flawed, as it rests on the premise that bad actors will voluntarily choose not to mislead people. If someone has the goal of being trusted more than they know they should be, politely asking them not to do that is not going to accomplish anything. Encouraging people not to use numerical probabilities only harms good-faith dialog by making it harder for people to clearly communicate their beliefs; the bad actors will just come up with some bogus study or conveniently misunderstand a legit one and continue with their unearned confidence.
And encouraging people to never use numerical probabilities for subjective estimates is far worse. As the saying goes, it's easy to lie with statistics, but it's easier to lie without them. Letting people just say whatever they want removes any attempt at objectivity, whereas asking them to put numbers on their claims lets us constrain the magnitude of the error and notice when something doesn't seem quite right.
On the contrary, very often (not always!), the people objecting to numerical probabilities are the ones acting in bad faith, as they try to sneak in their own extremely overconfident probabilistic judgements under the veneer of principled objection. It's not uncommon to see a claim along the lines of "How can you possibly know the risk of [action] is 20%? You can't. Therefore we should assume it's 0% and proceed as though there's no risk". This fails elementary-school-level reasoning; if your math teacher asks you to calculate a variable X, and you don't know how to do that or believe it can't be done with the information provided, you don't just get to pick your own preferred value! But under the guise of objecting to all numerical probabilities, the last sentence is generally left unsaid and only implied, which can easily trip people up if they're not familiar with the fallacy. Asking people to clearly state what they believe the probability to be prevents them from performing this sort of rhetorical sleight of hand, and provides a specific crux for further disagreement.
What does it mean to give a probability range?
In my experience, the people who use numerical probabilities are vastly more worried about accidentally misrepresenting their own level of confidence than the people who dislike numerical probabilities. A common way they do this is to give a probability range; rather than say "I'm 40% sure of this", they'll say "I think it's somewhere from 20% to 50% likely".
But what does this mean exactly? After all, a single probability is already a quantification of uncertainty; if they were sure of something, it'd just be 0% or 100%. Probability ranges seem philosophically confused.
To some extent they are, but there's a potentially compelling defense. When we talk philosophically about Bayesian probability, we're considering an ideal reasoner who applies Bayes' theorem to each new bit of evidence in order to update their beliefs.
So a probability range can be an attempt to communicate our expected level of error. If someone says "I think this is 20% to 50% likely", they might mean something like "I'm 95% confident that an ideal Bayesian reasoner with the same information available to me would have a credence somewhere between 20% and 50%". Just like how one might say "I'm not sure exactly how hot it is in here, but I'm pretty sure it's somewhere from 19 to 23 degrees."
There's also a second justification: a probability range shows how far your probability might update given the most likely pieces of information you expect to encounter in the future. Not all probabilities are created equal; the expected value of bets placed on your credences depends on how many bits of evidence those credences are based on. For example, your probability of a coin flip coming up heads should be 50%, and if someone offers to bet you on this at better-than-even odds, accepting has positive expected value.
But now say someone offers a slightly different game, where they've hidden a coin under a cup, and you can bet on it being heads-up. Going into this game, your credence that it's heads will still be 50%, since you have no particular information about which face is up. But if the person offering the game offers you a bet on this, even a 2:1 bet that seems strongly in your favor, you should probably not accept, because the fact that they're offering you this bet is itself evidence that suggests it's probably tails.
Both of these usages of probability ranges are fairly advanced, and I think that for most everyday usages it's fine to just give a single number estimate.
Making good estimates
Just like some experience measuring things will help you make better estimates of distance, some experience forecasting things will help you make better estimates of probability. People who want to be good at this generally practice by taking calibrated probability assessments: answering many questions with explicit confidence levels, then checking whether the things they called "80% likely" actually happened about 80% of the time.
For relatively unimportant everyday judgements, nothing more is needed. "Hey, are you going to be able to finish this project by tomorrow?" "Eh, 75%". But sometimes you might want more certainty than that, and there are a number of ways to improve our forecasts.
First, always consider the base rate. If you want to know how likely something is, look at how many times it's happened in the past, out of some class of similar situations. Want to know how likely you are to get into a car accident in your next year of driving? Start with the overall per-kilometer accident rate in your region. Kahneman's story about his own textbook-writing team shows how badly things can go when the base rate is ignored:
"After meeting every Friday afternoon for about a year, we had constructed a detailed outline of the syllabus, written a couple of chapters, and run a few sample lessons. We all felt we had made good progress. Then, an exercise occurred to me. I asked everyone to write down their estimate of how long it would take us to submit a finished draft of the textbook to the Ministry of Education. I collected the estimates and jotted the results on the blackboard. They were narrowly centered around two years.
Then I turned to Seymour, our curriculum expert, and asked whether he could think of other teams similar to ours that had developed a curriculum from scratch. Seymour said he could think of quite a few, and it turned out that he was familiar with the details of several. I asked him to think of these teams when they were at the same point in the process as we were. How much longer did it take them to finish their textbook projects?
He fell silent. When he finally spoke, it seemed to me that he was blushing, embarrassed by his own answer: “You know, I never realized this before, but in fact not all the teams at a stage comparable to ours ever did complete their task. A substantial fraction of the teams ended up failing to finish the job.”
This was worrisome; we had never considered the possibility that we might fail. My anxiety rising, I asked how large he estimated that fraction was. “About 40 percent,” he said. By now, a pall of gloom was falling over the room.
“Those who finished, how long did it take them?”
“I cannot think of any group that finished in less than seven years,” Seymour said.
This came as a complete surprise to all of us—including Seymour, whose prior estimate had been well within the optimistic consensus of the group. Until I prompted him, there was no connection in his mind between his knowledge of the history of other teams and his forecast of our future.
We should have quit that day. None of us was willing to invest six more years of work in a project with a 40 percent chance of failure. Yet although we must have sensed that persevering was not reasonable, the warning did not provide an immediately compelling reason to quit. After a few minutes of desultory debate, we gathered ourselves and carried on as if nothing had happened. Facing a choice, we gave up rationality rather than the enterprise.
The book was completed eight years later. The initial enthusiasm for the idea in the Ministry of Education had waned, and the textbook was never used."
(Daniel Kahneman, Thinking, Fast and Slow)
Unfortunately, this can only get you so far before you're playing "reference class tennis": arguing endlessly over which class of past events counts as "similar enough" to give a meaningful distribution.
When this occurs, another option is to take heuristic judgements of which you're relatively confident and combine them together, following the axioms of probability, in order to derive the quantity you're unsure of. Maybe you were able to look up that a regular driver has a 0.1% chance of getting into an accident per year, and you know from personal experience that you tend to drive about 30% faster than the average. Now all you need to look up is the global relationship between speed and accident rate, and then you can calculate the probability you want to know.
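A sketch of that kind of combination follows. Every number here is made up, including the exponent; empirically-derived relationships between speed and accident risk exist, but this is not one of them:

```python
# Hypothetical inputs: a 0.1% annual accident rate for the average
# driver, and a driving speed 30% above average.
base_rate = 0.001
speed_ratio = 1.30

# Assumed (illustrative, not empirical) power law: accident risk
# scales with roughly the 4th power of speed.
risk_exponent = 4

personal_rate = base_rate * speed_ratio ** risk_exponent
print(f"{personal_rate:.4%}")  # 0.2856%
```

The point isn't the particular exponent; it's that once you've decomposed the question into pieces you can look up or estimate separately, the axioms of probability do the rest of the work.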
Another option that can go hand in hand with the previous ones is to use several different methods to make multiple probability estimates, and compare them against each other. If they're wildly different, this is a hint that you've made an error somewhere, and you can use your specific knowledge of the system and your likely biases to try to figure out where that error is.
For example, when considering whether to trust "gut feelings" over explicit models, consider whether the situation is one that your brain's heuristics were designed for or trained on. If it's a situation similar to the ones that humans encountered frequently in the past million years, then natural selection will have made us reasonably good at estimating it. Or if it's a situation that you've encountered many times before, you've probably learned a good heuristic for it on an intuitive level. But if it's something that you and your ancestors would have been unfamiliar with, the explicit model is likely to be more reliable.
When you have multiple independently-obtained probability estimates, after adjusting them for the reliability of the method used to obtain them, take the geometric mean of the odds and convert the result back into a probability.
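A minimal sketch of that pooling step, with three invented estimates of the same event:

```python
import math

def pool_probabilities(probs):
    """Pool several probability estimates by taking the geometric mean
    of their odds, then converting the pooled odds back to a probability."""
    odds = [p / (1 - p) for p in probs]
    mean_log_odds = sum(math.log(o) for o in odds) / len(odds)
    pooled_odds = math.exp(mean_log_odds)
    return pooled_odds / (1 + pooled_odds)

# Three hypothetical estimates from different methods:
print(round(pool_probabilities([0.20, 0.40, 0.50]), 3))  # 0.355
```

Note that this differs from a straight average of the probabilities (which would give about 0.367 here): averaging in log-odds space treats extreme estimates like 1% and 99% symmetrically, rather than letting values near 50% dominate.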