When Data Speaks, It Sometimes Says Unexpected Things

A FiveThirtyEight article was sent to me recently that reminded me of a great lesson in data analysis: let the data disagree with you. On the surface, I’m guessing that sounds either obvious or cryptic. The gist is that data will sometimes point to truths that surprise you, and those truths can contradict all those meetings you’ve been having about that new policy, although I don’t wish that on any fellow data scientist. But it happens, and when it does, it’s important to remember why you’re looking at the data to begin with.

The FiveThirtyEight article, in short, considered the possibility of ties being allowed again in baseball. The operative word here is “again”: while most people assume that baseball has simply never had ties, it most certainly has. They were common. And that is a big takeaway:


It can be really hard to truly let the data speak for itself, though. I say this from experience. I have most certainly built models and, in review, found a strange pattern that required me to start from scratch. Curveballs like that (note that the FiveThirtyEight article is about MLB!) are pretty rare, but there’s something really fascinating about those moments. Our collective memory of trends and patterns is biased by personal experience, limited recall, and countless other psychological phenomena I’m probably unaware of. And this amounts to a really powerful revelation: we really need data if we’re to make genuinely objective assessments within our institutions, organizations, corporations, or wherever else we’re using data. It also helps to have easy-to-use, transparent tools for accessing and preparing that data quickly, so we can continually analyze our results.

I have complete sympathy for anyone who’s ever had to stay late because they found something that invalidates the 15 charts and tables they spent the afternoon generating. It stinks. But I also have complete envy for anyone who gets to dig through a dataset and discover something they had no idea about, and gets to bring that to the table. I certainly hope that new discovery is a good one, but if it’s one you’re not so happy about, well, you caught it. And now your data can lead you forward.

Any fun/not-so-fun stories about the data calling an audible at the 11th hour? Any hidden gems you’ve found in your data during routine analyses? I’d love to hear them.