I try to make fun things on the internet. Code, graphics, data, journalism on the WNYC Data News team. Inveterate Californian.
Last weekend at the Mozilla Festival, a group of journalists sat down to solve a murder mystery on the command line.
Each person got a set of folders containing text data files full of information about the mean streets of Terminal City. The files listed who lived there, the vehicles they owned, the clubs they belonged to, the streets they lived on, and so forth. The formats were varied - some of them were tab-separated tables, some were plain text, some had instructional header or footer rows.
More importantly, 99.9% of the text was junk. It was gibberish, or excerpts from Alice in Wonderland, or names of random 2012 olympic athletes. But buried at key points in these large files were actual clues that, when followed, would eventually lead you to the identity of the murderer. With so much nonsense text to sift through, the only way to crack the case in a reasonable amount of time would be to use the command line to quickly search, filter, and inspect the data.
I thought this might be a stickier way to teach the basics of the command line than drily walking through a lot of examples, because it more closely mimics a real world data journalism scenario: you inherit a big dump of messy data without any context. There’s too much data to hold in your head, and you don’t even really know what’s in it, or how the files are structured. You have to probe and get your bearings, and then you have to be careful with your inquiries, spot checking and duplicating results as you go. You can only see one slice of the big picture at a time.
The key thing about this whodunit exercise is that it’s freeform. You don’t have instructions to follow; you have a situation, and it’s up to you to experiment and find a path to the solution, once you figure out what the solution would even look like. There are many different ways you could find the answer. Some might be more efficient but trickier to implement, others might be simple and stepwise but easier to follow and modify. This is an important part of getting comfortable with the command line: understanding that it consists of small pieces that do one thing well, and you can combine them in infinite ways to get what you need.
Why worry about teaching journalists the command line in the first place? I can think of a few reasons why it comes in handy even if you have no plans to become a developer:
As for the mystery, you can give it a try yourself (you only need the file clmystery.zip). This version was kind of a rush job, with not nearly as much hardboiled, Sam Spade flavor as I would have liked, but pretty soon I’ll start working on the next case and hopefully introduce more advanced commands like sed and awk. Get to work, gumshoes!
We need more of these.
Recent Post
Read more