Understanding Subjective Probabilities
Any time a controversial prediction about the future comes up, there's a type of interaction that's pretty much guaranteed to happen.
Alice: "I think this thing is 20% likely to occur."
Bob: "Huh? How could you know that? You just made that number up!"
Or in slightly more technical Twitterese:
"Our posteriors come from our priors, but our priors come from our posteriors." — Noah Smith 🐇🇺🇸🇺🇦 (@Noahpinion) June 10, 2021
That is, any time someone attempts to provide a specific numerical probability for a future event, they'll be inundated with claims that the number is meaningless and they just made it up. Is this true? Does it make sense to assign probabilities to things in the real world? If so, how do we calculate them?
Interpretations of probability
The traditional interpretation of probability is known as physical probability, with two main variants: frequentist probability and propensity probability. Under frequentism, the chance of an event happening is equal to the fraction of times it's occurred among all similar events in the past. Under the propensity interpretation, items have some intrinsic "quality" of being some % likely to do one thing vs. another. For example, a coin has a fundamental probabilistic essence of being 50% likely to come up heads when flipped.
These are useful mathematical abstractions, but they break down when applied to the real world. There's no fundamental "50% ness" to a real coin. Flip it the same way every time, and it will land the same way every time. Flip it without perfect regularity but with some measure of consistency, and you can get it to land heads more than 50% and less than 100% of the time. In fact, even coins flipped by people who are trying to make them 50/50 are actually biased towards the face that was up when they were flipped. This intrinsic quality posited by the propensity interpretation doesn't exist in the real world; it's not encoded in any way by the atoms that make up a real coin, and our laws of physics don't have any room for such a thing.
Frequentist interpretations run into their own problems. What group of real-life events counts as "similar enough"? How do we factor in additional information about the event? Say Alice is considering starting a company, and wants to know the chance that it succeeds. She could look at the history of past companies and try to calculate the fraction that succeeded, but what if there's something different about hers? Maybe she's come up with a new business plan that's never been tried before, or maybe society has changed over the years such that something that would have worked 20 years ago won't work anymore.
And there are philosophical problems here too: does frequentism mean that the first time someone ever flipped a coin, it had no particular probability of landing heads? And at the macroscopic scale of coins, our universe is effectively deterministic: given the same starting conditions, the end result will be the same every time, which leaves no room for an objective physical "chance" of the outcome.
One potential solution is the Bayesian interpretation of probability, under which a probability is not a physical property of the world but a degree of belief held by a particular observer, to be updated as new evidence comes in.
Dealing with "fuzzy" probabilities
That's all well and good, but how do we get a number on more complicated propositions? In the previous example, we know that the coin is >50% likely to come up heads, but is it 55%? 75%? 99%?
This is the hard part.
Sometimes there's a wealth of historical data, and you can rely on the base rate. This is what we do with coins; millions of them have been flipped, and about half have come up heads and half tails, so we can safely assume that future coins are likely to behave the same way.
The same goes even with much less data. There have only been ~50 US presidents, but about half have come from each political party, so we can safely assume that in a future election, each party will have around a 50% chance of winning. We can then modify this credence based on additional data, like that from polls.
How do we do that modification? In theory, by applying Bayes' rule. In practice, it's infeasible to calculate the exact number of bits of evidence provided by any given observation, so we rely on learned heuristics and world models instead; that is, we guess.
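In principle, that update is a one-line application of Bayes' rule. Here's a minimal sketch; every number in it is hypothetical (a 50% prior from the historical base rate, plus invented likelihoods for how probable a given poll lead would be under each outcome):

```python
def bayes_update(prior, p_obs_given_h, p_obs_given_not_h):
    """Posterior credence in hypothesis H after an observation, given
    the probability of that observation under H and under not-H."""
    numerator = prior * p_obs_given_h
    return numerator / (numerator + (1 - prior) * p_obs_given_not_h)

# Hypothetical numbers: a 50% prior that party A wins (the base rate),
# and a poll lead we'd expect to see 70% of the time if A wins
# but only 40% of the time if A loses.
posterior = bayes_update(0.50, 0.70, 0.40)
print(round(posterior, 3))  # 0.636
```

The formula is the easy part; as noted above, the hard part is that nobody actually knows the likelihood terms precisely, which is why real-world updating falls back on heuristics.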
It's obvious once you think about it for a moment that guessing is a valid way of generating probabilistic statements; we do it all the time.
Alice: "We're probably going to run out of butter by tomorrow, so I'm going to go to the grocery store."
Bob: "Preposterous! Have you been tracking our daily butter consumption rate for the past several weeks? Do you have a verifiably accurate statistical model of the quantity of butter used in the production of each kind of meal? Anyway we just bought a new stove, so even if you did have such a model, it could no longer be considered valid under these new conditions. And how would you even know what dinner we're going to make this evening; I haven't yet told you what I plan on eating! This is such nonsense for so many reasons, you can't possibly know our exact likelihood of running out of butter tomorrow."
Clearly Bob is being unreasonable here. Alice does not need a detailed formal model suitable for publication in a scientific journal; a rough, experience-based estimate is plenty to justify a trip to the grocery store.
And these sorts of probabilistic judgements are ubiquitous. A startup company might try to guess what fraction of the population will use a new technology they're developing. A doctor might work out the odds of their patient dying, and the odds of each potential treatment, in order to know what to recommend. A country's military, when considering any action against another country, will estimate the chance of retaliation. Etc.
Does it matter that Alice said "probably" rather than assigning a specific number to it? Not at all. Verbal probabilities are shorthand for numerical ranges, as can be demonstrated by one of my all-time favorite graphs:
We have to be careful though, since probability words are context-dependent. If my friend is driving down the highway without wearing a seatbelt, I might tell them "hey, you're likely to die if we get into an accident, maybe put your seatbelt on". Their actual probability of dying in an accident is still less than 50%, but it's high relative to their chance if they were wearing a seatbelt, so I'm communicating that relative difference.
This is a very useful feature of probability words. If I say "nah, that's unlikely" then I'm communicating both "the probability is <50%" and "I think that probability in this context is too low to be relevant", which are two different statements. This is efficient, but it also increases the chance of miscommunications, since the other person might disagree with my threshold for relevance, and if they do there's a huge difference between 1% and 10%. When precision is needed, a specific number is better.
"The IPCC says it is 'very unlikely that methane from clathrates will undergo catastrophic release (high confidence)'. This sounds reassuring, but in the official language of the IPCC, 'very unlikely' translates into a 1% to 10% chance, which then sounds extremely alarming. I don't know what to make of this, as the context suggests that it was meant to reassure."
Probabilities are no different from other units
Humans have a built-in way to sense temperature. We might say "this room is kind of warm", or "it's significantly colder than it was yesterday". But these measures are fuzzy, and different people could disagree on exact differences. So scientists formalized this concept as "degrees", and spent years trying to figure out exactly what it meant and how to measure it.
Humans don't naturally think in terms of degrees, or kilometers, or decibels; our brains fundamentally do not work that way. Nevertheless, some things are warmer, farther, and louder than others. These concepts started as human intuitions, and were slowly formalized as scientists figured out exactly what they meant and how they could be quantified.
My subjective judgements about temperature are not precise, but that doesn't mean they're meaningless. As I type this sentence, I'm not sure if the room I'm sitting in is 21 degrees, 22, or maybe 23. But I'm pretty sure it's not 25, and I'm very sure it's not 15.
Can precise numbers impart undue weight to people's claims?
Sure, they can. If someone says "it was exactly 22.835 degrees Centigrade at noon yesterday", we'd tend to assume they have some reason to believe that other than their gut feeling, and the same goes for probabilities. But "I'd be more comfortable if you lowered the temperature by about 2 degrees" is a perfectly reasonable statement, and "I think this is about 40% likely" is the same.
Since the science of probability is newer and less developed than the science of temperature, numerical probabilities see less everyday use. In our current society, even a vague statement like "around 40%" can sometimes be taken as implying that some sort of in-depth study has been performed. This is unfortunate, but self-correcting: the more people normalize using numerical probabilities outside of formal scientific fields, the less exceptional they'll be, and the lower the chance of a misunderstanding. Someone has to take the first step.
More importantly, this criticism is fundamentally flawed, as it rests on the premise that bad actors will voluntarily choose not to mislead people. If someone has the goal of being trusted more than they know they should be, politely asking them not to do that is not going to accomplish anything. Encouraging people not to use numerical probabilities only harms good-faith dialog by making it harder for people to clearly communicate their beliefs; the bad actors will just come up with some bogus study or conveniently misunderstand a legit one and continue with their unearned confidence.
And encouraging people to never use numerical probabilities for subjective estimates is far worse. As the saying goes, it's easy to lie with statistics, but it's easier to lie without them. Letting people just say whatever they want removes any attempt at objectivity, whereas asking them to put numbers on their claims lets us constrain the magnitude of the error and notice when something doesn't seem quite right.
On the contrary, very often (not always!), the people objecting to numerical probabilities are the ones acting in bad faith, as they try to sneak in their own extremely overconfident probabilistic judgements under the veneer of principled objection. It's not uncommon to see a claim along the lines of "How can you possibly know the risk of [action] is 20%? You can't. Therefore we should assume it's 0% and proceed as though there's no risk". This fails elementary-school-level reasoning; if your math teacher asks you to calculate a variable X, and you don't know how to do that or believe it can't be done with the information provided, you don't just get to pick your own preferred value! But under the guise of objecting to all numerical probabilities, the last sentence is generally left unsaid and only implied, which can easily trip people up if they're not familiar with the fallacy. Asking people to clearly state what they believe the probability to be prevents them from performing this sort of rhetorical sleight of hand, and provides a specific crux for further disagreement.
What does it mean to give a probability range?
In my experience, the people who use numerical probabilities are vastly more worried about accidentally misrepresenting their own level of confidence than the people who dislike numerical probabilities. A common way they do this is to give a probability range; rather than say "I'm 40% sure of this", they'll say "I think it's somewhere from 20% to 50% likely".
But what does this mean exactly? After all, a single probability is already a quantification of uncertainty; if they were sure of something, it'd just be 0% or 100%. Probability ranges seem philosophically confused.
To some extent they are, but there's a potentially compelling defense. When we talk philosophically about Bayesian probability, we're considering an ideal reasoner who applies Bayes' theorem to each new bit of evidence in order to update their beliefs.
So a probability range can be an attempt to communicate our expected level of error. If someone says "I think this is 20% to 50% likely", they might mean something like "I'm 95% confident that an ideal Bayesian reasoner with the same information available to me would have a credence somewhere between 20% and 50%". Just like how one might say "I'm not sure exactly how hot it is in here, but I'm pretty sure it's somewhere from 19 to 23 degrees."
There's also a second justification: a probability range shows how far your probability might update given the most likely pieces of information you expect to encounter in the future. Not all probabilities are created equal; the expected value of bets placed on your credences depends on how many bits of evidence those credences are based on. For example, your probability of a coin flip coming up heads should be 50%, and if someone offers to bet you on this at better-than-even odds, accepting has positive expected value.
But now say someone offers a slightly different game, where they've hidden a coin under a cup, and you can bet on it being heads-up. Going into this game, your credence that it's heads will still be 50%, since you have no particular information about which face is up. But if the person offering the game offers you a bet on this, even a 2:1 bet that seems strongly in your favor, you should probably not accept, because the fact that they're offering you this bet is itself evidence that suggests it's probably tails.
Both of these usages of probability ranges are fairly advanced, and I think that for most everyday usages it's fine to just give a single number estimate.
Making good estimates
Just like some experience measuring things will help you make better estimates of distance, some experience forecasting things will help you make better estimates of probability. People who want to be good at this generally practice by taking calibrated probability assessments: answering many questions with explicit confidence levels, then checking whether the things they called "80% likely" actually happened about 80% of the time.
For relatively unimportant everyday judgements, nothing more is needed. "Hey, are you going to be able to finish this project by tomorrow?" "Eh, 75%". But sometimes you might want more certainty than that, and there are a number of ways to improve our forecasts.
First, always consider the base rate. If you want to know how likely something is, look at how many times it's happened in the past, out of some class of similar situations. Want to know how likely you are to get into a car accident in your next year of driving? Start with the overall per-kilometer accident rate in your region. Kahneman's story about his own textbook-writing team shows how badly things can go when the base rate is ignored:
"After meeting every Friday afternoon for about a year, we had constructed a detailed outline of the syllabus, written a couple of chapters, and run a few sample lessons. We all felt we had made good progress. Then, an exercise occurred to me. I asked everyone to write down their estimate of how long it would take us to submit a finished draft of the textbook to the Ministry of Education. I collected the estimates and jotted the results on the blackboard. They were narrowly centered around two years.
Then I turned to Seymour, our curriculum expert, and asked whether he could think of other teams similar to ours that had developed a curriculum from scratch. Seymour said he could think of quite a few, and it turned out that he was familiar with the details of several. I asked him to think of these teams when they were at the same point in the process as we were. How much longer did it take them to finish their textbook projects?
He fell silent. When he finally spoke, it seemed to me that he was blushing, embarrassed by his own answer: “You know, I never realized this before, but in fact not all the teams at a stage comparable to ours ever did complete their task. A substantial fraction of the teams ended up failing to finish the job.”
This was worrisome; we had never considered the possibility that we might fail. My anxiety rising, I asked how large he estimated that fraction was. “About 40 percent,” he said. By now, a pall of gloom was falling over the room.
“Those who finished, how long did it take them?”
“I cannot think of any group that finished in less than seven years,” Seymour said.
This came as a complete surprise to all of us—including Seymour, whose prior estimate had been well within the optimistic consensus of the group. Until I prompted him, there was no connection in his mind between his knowledge of the history of other teams and his forecast of our future.
We should have quit that day. None of us was willing to invest six more years of work in a project with a 40 percent chance of failure. Yet although we must have sensed that persevering was not reasonable, the warning did not provide an immediately compelling reason to quit. After a few minutes of desultory debate, we gathered ourselves and carried on as if nothing had happened. Facing a choice, we gave up rationality rather than the enterprise.
The book was completed eight years later. The initial enthusiasm for the idea in the Ministry of Education had waned, and the textbook was never used."
(Daniel Kahneman, Thinking, Fast and Slow)
Unfortunately, this can only get you so far before you're playing "reference class tennis": arguing endlessly over which class of past events counts as "similar enough" to give a meaningful distribution.
When this occurs, another option is to take heuristic judgements of which you're relatively confident and combine them together, following the axioms of probability, in order to derive the quantity you're unsure of. Maybe you were able to look up that a regular driver has a 0.1% chance of getting into an accident per year, and you know from personal experience that you tend to drive about 30% faster than the average. Now all you need to look up is the global relationship between speed and accident rate, and then you can calculate the probability you want to know.
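A sketch of that kind of combination follows. Every number here is made up, including the exponent; empirically-derived relationships between speed and accident risk exist, but this is not one of them:

```python
# Hypothetical inputs: a 0.1% annual accident rate for the average
# driver, and a driving speed 30% above average.
base_rate = 0.001
speed_ratio = 1.30

# Assumed (illustrative, not empirical) power law: accident risk
# scales with roughly the 4th power of speed.
risk_exponent = 4

personal_rate = base_rate * speed_ratio ** risk_exponent
print(f"{personal_rate:.4%}")  # 0.2856%
```

The point isn't the particular exponent; it's that once you've decomposed the question into pieces you can look up or estimate separately, the axioms of probability do the rest of the work.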
Another option that can go hand in hand with the previous ones is to use several different methods to make multiple probability estimates, and compare them against each other. If they're wildly different, this is a hint that you've made an error somewhere, and you can use your specific knowledge of the system and your likely biases to try to figure out where that error is.
For example, when considering whether to trust "gut feelings" over explicit models, consider whether the situation is one that your brain's heuristics were designed for or trained on. If it's a situation similar to the ones that humans encountered frequently in the past million years, then natural selection will have made us reasonably good at estimating it. Or if it's a situation that you've encountered many times before, you've probably learned a good heuristic for it on an intuitive level. But if it's something that you and your ancestors would have been unfamiliar with, the explicit model is likely to be more reliable.
When you have multiple independently-obtained probability estimates, after adjusting them for the reliability of the method used to obtain them, take the geometric mean of the odds and convert the result back into a probability.
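A minimal sketch of that pooling step, with three invented estimates of the same event:

```python
import math

def pool_probabilities(probs):
    """Pool several probability estimates by taking the geometric mean
    of their odds, then converting the pooled odds back to a probability."""
    odds = [p / (1 - p) for p in probs]
    mean_log_odds = sum(math.log(o) for o in odds) / len(odds)
    pooled_odds = math.exp(mean_log_odds)
    return pooled_odds / (1 + pooled_odds)

# Three hypothetical estimates from different methods:
print(round(pool_probabilities([0.20, 0.40, 0.50]), 3))  # 0.355
```

Note that this differs from a straight average of the probabilities (which would give about 0.367 here): averaging in log-odds space treats extreme estimates like 1% and 99% symmetrically, rather than letting values near 50% dominate.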