The Stories We Tell with Data

A friend of mine recently posted some data, cited by Elon Musk, that compares deaths from the flu with deaths from COVID-19, and then lists death statistics for tobacco, alcohol, obesity, road accidents, and pneumonia. “I’m only sharing data,” my friend noted. “Please feel free to interpret it as you see fit.”

Of course, that’s quite disingenuous. Statistics tell a story; they always tell a story. Pretending that a curated data set is simply data, and that there’s no perspective implied, is, well, somewhat mistruth adjacent.

Context matters

Here’s a related example. Consider the data about Amazon’s failure to protect their workers in fulfillment centers. We all know the data; lots of Amazon’s warehouse workers are getting sick. But that’s a lie in a single statistic! Yes, people are getting sick. But without context, there’s no way to interpret the data. The question is not whether Amazon’s line workers are getting sick, or even if they’re getting sick at a higher rate than the general population. The interesting question is whether they’re getting sick at a higher rate than workers in comparable, critical jobs. Are they getting sick more or less frequently than people working in Walmart stores? Typically we don’t know (but see below). The story we’re being told, and told loudly, is a lie.

The point is that context matters. A collection of data establishes context by which we can interpret the world; lack of context, or the wrong context, leads to misinterpretation. And from that perspective, I believe the statistics my friend posted also tell a lie. Mixing together infectious and non-infectious deaths creates a false equivalency. A neighbor’s drinking problems do not increase my risk of alcohol poisoning; in contrast, his exposure to COVID-19 significantly increases my own risk.

If we remove all the non-infectious causes, we are left with only the initial comparison: “290,000 and 650,000 people die of flu related infections each year. The mortality rate is tagged at 0.1% of the population. … COVID-19 on the other hand currently has a body count of 340,000 out of 5,300,000 true positives putting the mortality rate at 6.41% of the known infected population.” [Note: caveats elided.] That’s a pretty well explored comparison at this point: COVID is much, much worse than the flu. It is both more infectious and far deadlier (and the suffering even for those who recover is incomparably worse).
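For anyone who wants to check the arithmetic in that quote, here is a minimal sketch. It uses only the figures quoted above; the function name `case_fatality_rate` is my own, not anything from the original post. Note that the quote describes the flu figure as a share of "the population" and the COVID figure as a share of known infections, so the ratio at the end is a rough comparison at best.

```python
def case_fatality_rate(deaths: int, confirmed_cases: int) -> float:
    """Return deaths as a percentage of confirmed (known) cases."""
    if confirmed_cases <= 0:
        raise ValueError("confirmed_cases must be positive")
    return 100.0 * deaths / confirmed_cases


# Figures exactly as quoted in the post (not independently verified here).
covid_deaths = 340_000
covid_confirmed = 5_300_000
flu_rate_percent = 0.1  # quoted flu mortality rate

covid_rate_percent = case_fatality_rate(covid_deaths, covid_confirmed)

# Rough ratio only: the two quoted rates may not share a denominator.
ratio = covid_rate_percent / flu_rate_percent

print(f"COVID rate among known infections: {covid_rate_percent:.2f}%")
print(f"Roughly {ratio:.0f} times the quoted flu rate")
```

The computed value is about 6.415%, which rounds to 6.42%; the quoted 6.41% appears to be the same number truncated rather than rounded.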

What story should we tell?

And yet, it does feel as though the other mortality statistics should tell us something. So what story should we be telling with them?

For me, the unifying concern is whether people are empowered to protect themselves. It’s not why people die, it’s how well they were able to live. It is hard for me to protect myself against the flu, and much harder to protect myself against COVID.

But it’s much easier for me than for many.

And the same is true for alcohol, obesity, and most of the others.

The context that matters would describe who is forced by circumstance to put themselves at risk, while others are able to stay safe. The interesting distinction is not how people die, but how underlying populations are affected differently. For example, it’s been well documented that COVID disproportionately affects the poor, and people of color. Similarly, many of the causes cited affect those populations more profoundly, and at a younger age. (Automobile deaths are an exception, because poorer people use public transportation, which is significantly safer than cars.)

So from that perspective, consider this data set from the New York Times. It compares COVID exposure across a number of dimensions, including exposure to the virus and proximity to others. It’s interesting data, but there are no real surprises here, either: healthcare workers are highly exposed and physically close to others; architects and software developers – and multi-billionaire business owners – are not. (Warehouse workers are about middle of the pack; high proximity to others, but low exposure.) But I think the real payload is the two tables at the end, which show that the bottom 25% of wage earners (and part-time workers) don’t get sick leave, don’t get personal leave, and can’t work from home. They can’t protect themselves. Similarly the other causes cited in the essay also affect the poor much more profoundly, and at a younger age.

That’s a very different story

Indeed. So why isn’t Elon Musk talking about that?

Because the story that data tells makes someone like Elon Musk uncomfortable. He is profiting greatly from the labor – and risk, and death – of people who don’t have his advantages. He is forcing them to work, right now, in violation of California state law. He is personally and directly empowered to change that story, and is choosing not to. The story those statistics tell would require him to look hard in the mirror.

So instead, he points to the other, broader statistics. The ones that – well, that describe something other people need to do. It’s a much more comforting story, for someone like Elon Musk. Because it lets him off the hook.

Don’t be like Elon Musk. Look at the data, even when it makes you uncomfortable.

About dondo