# Disease Statistics

*This was posted in March 2020, and has not been updated since then other than to add this disclaimer. It's no longer particularly relevant, but I'm leaving it up for historical purposes.*

There have been a lot of graphics shared recently that contain misleading or outright false information about COVID-19 and diseases in general. The people sharing these are, at best, misinformed. At worst, they are using the public's poor understanding of disease spread and statistics to further a political agenda. I'd like to dive into these statistics in an effort to help people understand why they're not accurate.

First off, here are some examples of the infographics in question.

They boil down to a few core statements:

- "COVID-19 only seriously affects the elderly and immunocompromised."
- "The death count is very low."
- "The total number of diagnosed cases is very low."
- "The death rate is very low."
- "I've survived other epidemics, this one will be the same."

Let's discuss some basics. Any infectious disease has what's called its **Effective Reproduction Number**, R. This is the average number of people that will catch the disease from one person who's already infected. This number is a major (but not the only) factor in determining how widely a disease will spread. If on average each person infects more than one other person (R > 1), then the number of cases will increase exponentially. If R = 1, the number of cases will remain constant. (This assumes people recover eventually. If there's no recovery and people remain infected forever, as with HIV, then the total number of cases instead grows linearly.) If R < 1, the disease will eventually die out.
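To make the three cases concrete, here's a minimal sketch that multiplies the number of new cases by R once per "generation" of spread. The function name and the one-case starting point are my own assumptions for illustration, and this deliberately ignores everything else that matters in a real outbreak (recovery time, limited population, changing behavior).

```python
def new_cases_per_generation(r, generations, initial_cases=1.0):
    """Expected new cases in each generation, assuming every case infects r others."""
    cases = [initial_cases]
    for _ in range(generations):
        cases.append(cases[-1] * r)
    return cases


# Compare a growing, a steady, and a shrinking outbreak over 10 generations.
for r in (2.0, 1.0, 0.5):
    series = new_cases_per_generation(r, generations=10)
    print(f"R = {r}: " + ", ".join(f"{c:g}" for c in series))
```

With R = 2 the tenth generation has over a thousand new cases from a single starting case, with R = 1 it stays at one, and with R = 0.5 it shrinks toward zero.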

The R of a disease has two main contributors. First off, there are the details of the pathogen itself: how it spreads, how long it can remain viable outside of the body, etc. (This is known as its **Basic Reproduction Number**, or R_{0}.) However, how people react to the disease also affects its R. Things like a vaccine, quarantine measures, and good hygiene can all reduce the R of a disease by making it harder to spread. When these measures reduce the R to below 1, that's when the disease starts dying off.

As an example, measles has an extremely high R_{0} of about 15. (It varies a lot depending on the conditions. Even without preventative measures, societal differences like how much contact people tend to have with each other in their daily lives will affect R_{0}.) In countries with developed health care systems, however, R is significantly less than 1, mainly due to the existence of a measles vaccine, rendering it mostly nonexistent in those countries. (R has been increasing slightly over the last few years due to the growing pro-disease movement. Measles was in fact declared eliminated in the United States in 2000, but has made a resurgence in the last 3 years.)
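Here's a rough sketch of how immunity pulls R down from R_{0}. It assumes a very simple model that isn't spelled out above: everyone mixes evenly, and immune people can't catch or pass on the disease at all, so R = R_{0} × (1 − immune fraction). The function names and the sample immunity levels are mine, chosen for illustration only.

```python
def effective_r(r0, immune_fraction):
    """Effective R when immune_fraction of the population cannot be infected.

    Assumes even mixing and perfect immunity (a simplification).
    """
    return r0 * (1.0 - immune_fraction)


def herd_immunity_threshold(r0):
    """Immune fraction at which effective R drops to exactly 1 in this model."""
    return 1.0 - 1.0 / r0


r0_measles = 15.0  # the rough figure used in this post
print(f"Immune fraction needed to hold R at 1: {herd_immunity_threshold(r0_measles):.1%}")
for immune in (0.80, 0.90, 0.95):
    print(f"{immune:.0%} immune -> R = {effective_r(r0_measles, immune):.2f}")
```

Under these assumptions, a disease as contagious as measles needs more than nine out of ten people to be immune before R falls below 1, which is why even a small drop in vaccination rates matters.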

Now let's address these misleading statistics.

*"COVID-19 only seriously affects the elderly and immunocompromised."*

Well, that depends on what you mean by "seriously". Even healthy people are going to be subject to several days or weeks of fever, coughing, and diarrhea. If all you're concerned about is whether you'll die, then no, healthy people are pretty unlikely to die. However, you'll still pass it on to others, who may be more susceptible than you.

*"The death count is very low."*

At the time of writing, the death count in the USA is 37. Focusing on this fact overlooks a point that should be incredibly obvious, but somehow isn't: the death count is going to increase. Statements about how the death count is low *now* and so COVID-19 clearly isn't a serious problem are akin to saying "well, the tsunami has only killed two people on boats, so it clearly won't do anything worse when it hits land" or "the nuclear fallout from this morning hasn't killed anyone yet, so clearly we don't need to worry about radiation poisoning."

*"The total number of diagnosed cases is very low."*

The misconception here is similar to that in the section above: the number of cases may be low *now*, but it's going to increase. Additionally, if you're only looking at the *diagnosed* cases, that ignores all of the ones that *haven't* been diagnosed. Especially with a disease like COVID-19, where the symptoms can be mild or nonexistent while you're already capable of spreading the pathogen to others, there will be a significant number of people who don't realize they're sick or don't go to the hospital for financial reasons. The number of infected people may be many, many times the number that have actually been tested and found positive.

*"The death rate is very low."*

Well, it is and it isn't. There are two unrelated factors here. The first one is that the death count lags behind the total infections. To illustrate this, let's consider a hypothetical disease that kills 95% of the people it infects after 2 weeks. The first case is introduced to a country on January 1st. Two weeks later, 10,000 people are infected, but only the first one has died. This gives us a death rate of 0.01% if you compare the deaths to the total infections, implying that there's practically no risk. Of course, if you wait a few more weeks, the outlook becomes much more grim. The thing to realize here is that, while an outbreak is still growing, it's incorrect to calculate a death rate as a percentage of deaths compared to total infections. Rather, it should be deaths compared to the number of resolved cases, meaning deaths plus successful *recoveries*. With our hypothetical disease, applying this method instead after two weeks would give us a death rate of 100% (1 death, no recoveries), which is much closer to the correct number. (Note that this method of estimating a death rate will slightly overestimate it, since people die earlier than they make a full recovery. In other words, recoveries lag behind deaths just like deaths lag behind infections.)
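The arithmetic for the hypothetical disease can be sketched like this. I'm reading "deaths compared to recoveries" as deaths divided by all resolved cases (deaths plus recoveries), which is my interpretation, and the function names are made up for this example.

```python
def naive_death_rate(deaths, total_infections):
    """Deaths as a share of everyone infected so far, including people still sick."""
    return deaths / total_infections


def resolved_death_rate(deaths, recoveries):
    """Deaths as a share of cases that have already ended (death or recovery)."""
    resolved = deaths + recoveries
    if resolved == 0:
        return None  # nothing has resolved yet, so there is no estimate
    return deaths / resolved


# Two weeks in: 10,000 infected, 1 death, nobody has recovered yet.
deaths, recoveries, infections = 1, 0, 10_000
print(f"Naive estimate:    {naive_death_rate(deaths, infections):.2%}")
print(f"Resolved estimate: {resolved_death_rate(deaths, recoveries):.0%}")
```

Neither number is the true 95%, but the second one is wrong in the cautious direction, and it settles toward the real figure as more cases resolve.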

The second factor is that people simply aren't very good at understanding probabilities. If you're a young and healthy adult infected with COVID-19, your chance of dying is about 0.2%, or 1 in 500. That's certainly rather low, but it's around the same likelihood as rolling two nat 1s in a row on a d20, or about twice as likely as drawing an all-land opening hand of seven cards from a 60-card deck with 24 lands. I don't know about you, but I wouldn't want to risk my life on those odds.
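If you'd like to check those comparisons yourself, here's a quick sketch. It assumes a "nat 1" means rolling a 1 on a 20-sided die and that the hand is a seven-card opening hand. Those are my assumptions about the games involved, as are the variable names.

```python
from math import comb

covid_young_adult = 1 / 500      # the 0.2% figure quoted in this post
two_nat_ones = (1 / 20) ** 2     # rolling a 1 on a d20 twice in a row

# All 7 cards drawn from the 24 lands, out of every possible 7-card hand from 60.
all_land_hand = comb(24, 7) / comb(60, 7)

print(f"Young adult dying of COVID-19: {covid_young_adult:.3%}")
print(f"Two nat 1s in a row:           {two_nat_ones:.3%}")
print(f"All-land opening hand:         {all_land_hand:.3%}")
print(f"Ratio to all-land hand:        {covid_young_adult / all_land_hand:.1f}x")
```

The hand probability is a hypergeometric calculation: the number of ways to pick 7 lands divided by the number of ways to pick any 7 cards.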

*"I've survived other epidemics, this one will be the same."*

This is known as survivorship bias. The fact that you survived the last few epidemics doesn't affect your chances of surviving this one.

Please take COVID-19 and other health risks seriously. Do not panic or hoard supplies; this takes them away from other people and does not improve the overall situation. However, please do exercise good hygiene, avoid contact with other people, and understand what a statistic is saying before sharing it.

For more information, I would highly recommend checking out this article for a more in-depth read, or this article for a shorter one.

If you'd like some better infographics/memes to share, here are some that are more likely to be helpful.

And if you want to get a better feel for how diseases transmit across the globe in real life, just play Pandemic.