ESSAY: The Black Box, Part 1 of 4: Introduction
Okay. So—I've been working on this post—really, these posts, since it makes the most sense to do it as a series—for quite a while, now, and I'm still... not very satisfied with them, but yesterday I ended up leaving a comment in private that hit a lot of the same bases and two whole people told me it was helpful! Which was very validating. (Thanks, friends.) So I think I'm just going to take a little time today to get the first part as cleaned up and sensical as I can, and then... set it free.
For some reason, lately, no idea why, I've been thinking a lot about data analysis as a practice; and I've been thinking a lot about how to discuss how we, both as individuals and as a society, can use data to drive decisions in ways that are sensible.
I'm leaning on that word, "sensible," really hard, for reasons I'll go into later. But before I get to that, I want to say specifically what I've been thinking about, which is the following question:
What does good, data-driven decision-making look like, when the data—and any conclusions that are drawn from the data—are by necessity low-confidence?
"Low-confidence" is science for: we have some information, and we know that the information isn't complete, but we don't know what's missing, which means we don't know how much we can trust the information that we do have. So: what the heck do we do now???
That's what I want to talk about. Because—well. Because it's April 29, 2020; and that question up there is just—the state of play, on April 29, 2020; and part of what is feeding the churning vortex of anxiety in which we are all living, on April 29, 2020, is that identifying that question and sitting with that question and grappling with that question effectively is, quite frankly, something that most people, through absolutely no fault of their own, have been extraordinarily ill-equipped to do.
And I think, actually, that I might kind of be able to help!
(Yes, this post is about Covid-19; of course this post is about Covid-19. Did anyone think this post was not about Covid-19?)
On that note... I'm not cutting the whole post, because I'm using cuts internally to hide a lot of explanatory bits but still let people read summary bits, if they want to. (If you land here via a link, you can view this post with cuts intact from today's date page.) BUT: I am going to add this link, which you can click on to go down to an anchor I've embedded at the bottom of the post and not behind a cut, in case you are avoiding Covid content—I think you should be able to use that link (once more with feeling) to teleport you down beyond the bad lady talking about diseases and also math. If it doesn't work for some reason, you can also just do a "find in page" on the phrase, "skip this post anchor should drop you here" (no quotation marks) to get to whatever is below this post on your feed. Godspeed, my friends! ♥
Some introductory notes and disclaimers...
So. This is where I'm starting:
- Data and data analysis are tools.
- Lots of people—in fact, I suspect most people—don't really understand what they are, and they don't understand how to use them (for reasons that are unfortunate but also, ftr, totally understandable and in no way reflections on those people's intelligence or worth as human beings).
- I have pretty good reasons to believe that I kind of do understand what they are, and that I do (in general) understand how to use them.
- And I know quite a bit about how you can't use them, because that is actually sort of my specialist subject.
- Specifically: I understand how I can and can't use data and data analysis to construct a mental picture of a problem that I know I do not understand.
- Even more specifically: I have some reason to think that I can explain how a person who is not an expert on Thing X—in this case, Covid-19—can use data and data analysis to build enough of a mental picture of Covid-19 to just... make weathering the outbreak a little bit less deranging.
Essentially—I'm not going to give you a library card catalogue full of carefully organized and analyzed information about Covid-19. I'm not even going to show you how to make one. But I am going to try to show you how to construct a DIY three-file sorter from duct tape and cardboard boxes where you can dump the chewed-on torn-up half-invisible-ink coffee-stained research notes the universe is throwing at you constantly, right now, by the reamful—and I'm going to explain to you why that is, probably, good enough.
The sense of "sensible"-ity.
So. Again: in this series, what I want to talk to you about is using data to make sensible decisions.
What do I mean by "sensible"?
Oh, I'm so glad you asked!!! Because I think that's really the question that we aren't asking right now; and failing to ask it reflects a very common, very understandable, and also very dangerous misunderstanding of what "data" is, and of what "making decisions based on data," full stop, means.
I can help with that! I think!!
I think the place where we need to start is actually earlier than what "sensible" means. I think we need to start with what "data" means. Because if we don't understand what data is, we have absolutely no prayer whatsoever of understanding what we can and cannot do with it.
What data means is: some stuff happened. Some people wrote it down.
I really, really would like everyone reading this to sit with this idea for a second: data means "some stuff happened and some people wrote it down."
I'm going to keep hammering on this because over and over and over and over again, I am seeing people (very smart people, people I respect and like and know to be competent adult humans who reason well) failing to distinguish between the data about Covid-19 and the stuff that comes after the data: the math, and the science, and the policy-making, and the science reporting, and the ethical decisions that go into all of those things. The data and the science are not the same thing. And we need to understand the ways that the problems and limitations of one are and are not connected to the problems and limitations of the other.
So. What data tells you is: some stuff happened and some people wrote it down. But what data doesn't tell you, and why data doesn't tell you that stuff, is also hugely, critically, incontinently important; and this is where people—all people, because scientists are just people and aren't immune to this either—can use data to make bad decisions.
So, fine. We all talk about good decisions and bad decisions colloquially, yes. But this series is about making decisions when you fundamentally can't know what is going to happen until after you've made them. Since that's the kind of decision-making we're going to talk about, I think it helps to have more specific criteria for telling the decisions we're trying to make apart from the decisions we're trying to avoid; and I think those criteria need to be ones we can actually apply at the time we're making the decision. Not after we've already made it and seen what happened. A useful framework for me, personally, is to think about whether or not a decision is sensible:
(I'll be developing this further as the series goes along.)
Proposition 1: a sensible decision is a decision which:
- by some reasonable, justifiable, and consistent metric, minimizes the likelihood of a bad outcome (or maximizes the likelihood of a better one);
- is made with the understanding that the basis for said decision may in fact be flawed; and
- admits mental space for the possibility that the ongoing parts of that decision may need to change as the data evolves, in order to remain consistent with the first two conditions.
Proposition 2 (and—bear with me on this one): you can make sensible decisions based on science even if you don't, actually, understand the science itself.
I'm defining "sensible" the way that I am for a whole bunch of reasons, some of which I've discussed and others of which I've only implied, so let me quickly run down the general version of the list, so we're all on the same page:
This is a very long justify-your-existence sort of an introduction, but it's long for a reason. I want everyone to start by understanding what it is I'm trying to do—and what it is that I'm not going to be addressing at all.
So, in that light, I have some goals. I want to tell you what they are, so you can decide whether or not you're interested in reading the rest of the series.
- I want to explain how science-y people make decisions based on science, even when we don't, actually, understand the science itself;
- I want to give you, the man on the Clapham omnibus (sorry: the ordinary human, or human-impersonating alien, this is a safe space) on the internet, some basic tools for how you can make decisions based on science, without requiring or expecting you to understand the science itself.
I want to do all of those things because:
- I know a lot of people find science, and science-based decision making, incredibly intimidating.
- I know a lot of people are already incredibly anxious, right now, about these very issues.
- I want to tell you right now that both of those feelings are okay, and that I will do my absolute level best to make this series of posts useful to you even if you are holding one or both of those feelings inside of you as you skim it.
- So I'm trying to design the layout of this series to be, as much as possible, something that people can read in parts; or put down and come back to; or read only at a headline level; and still get something useful out of the experience.
- Finally, I want the tools that I give you, and the context in which I'm establishing those tools, to be useful not only in your own individual decision-making, but also when you find yourselves evaluating the decisions your leaders (political, community, religious, professional, whatever) are currently making, which I would gently suggest you probably should be doing. Not just because civic engagement or universal duty or blah blah blah. Like, honestly, right now? Fuck that. Right now I want to do, like, data processing first aid: I want to stop people from bleeding out because there is too much math happening to them. And I hope that a side effect of that is that you become more able, and find it easier, to critically reflect upon what your leaders are doing, because I think that will make you feel less powerless. I think these tools will help you decide whether or not to wear this mask to the grocery store today; but I also think they will help you assess things like:
  - Should my leaders be using science and scientific data to make this law/decree/decision?
  - Are my leaders using science and scientific data to make this law/decree/decision?
  - When they use those things to make these laws/decrees/decisions, are they using that science and scientific data correctly?
  - Are the laws/decrees/decisions that they are making (based on science or otherwise) sensible decisions?
  - And how does their decision-making, good or bad, alter or affect what a sensible decision might look like for you?
So. I think that's probably a reasonable (dare I say—sensible) place to stop. I will rejoin you tomorrow with Part 2 of 4: Everyone's Covid Math is Wrong (and Why That Is, Actually, Kind of Okay).
(skip this post anchor should drop you here! thanks, friends!)