Data-Driven Decision Making for Baby Names

Jon MacMillan
Senior Data Analyst

Data-driven decisions are made every day. The reviews you sift through on Amazon before making a purchase, the Yelp reviews you read before choosing a new restaurant, and, if you’re like me, the people you test your jokes on to gauge their reactions before (possibly) embarrassing yourself by publishing them in a blog post: these are all data-driven decisions. Data-driven decision making matters, and it matters more as the weight of the decision grows. Currently, my wife and I are facing an enormously heavy decision: what should we name our baby girl, who is due in January?

As an analyst, I approached this task just like any other: I turned to the data. As I do with most new projects, I kicked this one off by going to Kaggle.com in search of the data I was interested in. Within a few minutes of searching, I found a dataset containing all of the US Social Security applications from 1910 to 2014, including state totals for every year.

With the massive amount of data within this dataset, I was positive that I would find some interesting information … and the data did not disappoint.

Data-Driven Decision Making Visuals

My name, ‘Jonathan,’ hit peak popularity in 1988 and has been on the decline since then. I was born right in the thick of it in 1986. Clearly, my parents didn’t look at data trends before picking my name. It’s cool, but also a little weird, to think about how I am a data point in this analysis:

My wife, Ryann, on the other hand, is a trendsetter, which I already knew. ‘Ryann’ hit its peak back in 2006. Just to be clear, when I say Ryann hit its peak in 2006, I’m referring to all female Ryanns in the world, collectively. My wife, Ryann, has not even begun to hit her peak (she might be reading this). My wife was born in 1986, paving the way for the women who will be asked time after time how to pronounce their name correctly. Personally, I like to put a little southern drawl on it and pronounce it Ry-Ann. Based on a sample size of one, Ryanns don’t like this as much as I do.

While it’s interesting to see the history of these names, the reality is that the dataset wasn’t ready to use in visualizations right away. I needed to do some data cleanup and bring in some additional data from other sources so that I could tell a better story. For example, while sifting through the data, I realized that there were 29 male Ryanns in California that were causing some problems in this dataset. Since I wanted to see a total count of each name for each gender, I had to separate the records by gender. Another feature I wanted to work in was a “randomizer” so that with the click of a button, you could stumble upon names and explore the data.
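To make the gender split concrete, here is a minimal Python sketch of the idea, not the actual job I built. The row layout (name, gender, state, year, count) and every count in it are made up for illustration. The point is only that totals are keyed on the name and the gender together, so a handful of male Ryanns never gets folded into the female total.

```python
from collections import defaultdict

# Hypothetical rows: (name, gender, state, year, count).
# All counts here are invented for illustration, not real figures.
rows = [
    ("Ryann", "F", "CA", 2006, 120),
    ("Ryann", "F", "TX", 2006, 80),
    ("Ryann", "M", "CA", 2006, 5),
    ("Jonathan", "M", "CA", 1988, 500),
]


def totals_by_name_and_gender(rows):
    """Sum counts across states and years for each (name, gender) pair."""
    totals = defaultdict(int)
    for name, gender, _state, _year, count in rows:
        totals[(name, gender)] += count
    return dict(totals)


totals = totals_by_name_and_gender(rows)
```

With the totals keyed this way, `totals[("Ryann", "F")]` and `totals[("Ryann", "M")]` are separate entries that can be charted side by side.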

How to Clean Your Data

The next step was pulling in the meaning of the first names, which is easy if you can find data in a clean format. That wasn’t the case this time. Instead, I had to download a PDF, convert the file to Excel and then use Veera Construct’s “stack node” to append all of the worksheets together automatically, and then clean up the data. It’s rare to find perfectly clean data to start with, but Veera Construct makes cleaning up dirty data much more enjoyable and efficient.
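For readers who prefer to see the stacking step as code, the sketch below shows roughly the same idea in plain Python. It is a stand-in, not how Veera Construct works internally. The worksheet layout, the column names "name" and "meaning", and the sample values are all assumptions made up for the example.

```python
def stack_sheets(sheets):
    """Append the rows of every worksheet into one list and tidy them up.

    `sheets` maps a worksheet name to a list of row dicts. The "name" and
    "meaning" keys are assumed column names, not the real file's headers.
    """
    stacked = []
    for sheet_name, rows in sheets.items():
        for row in rows:
            # Trim stray whitespace and normalize capitalization.
            name = (row.get("name") or "").strip().title()
            meaning = (row.get("meaning") or "").strip()
            if not name:
                continue  # skip blank rows left over from the PDF conversion
            stacked.append({"name": name, "meaning": meaning, "sheet": sheet_name})
    return stacked


# Made-up worksheets with placeholder meanings, for illustration only.
sheets = {
    "A": [{"name": " alice ", "meaning": "placeholder"}, {"name": "", "meaning": ""}],
    "B": [{"name": "BELLA", "meaning": " placeholder "}],
}
stacked = stack_sheets(sheets)
```

Keeping the source worksheet on each row makes it easier to trace a bad value back to the page of the PDF it came from.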

Other features I wanted to add were popularity rankings and a random value that would let me search the data dynamically with a slider. I used Veera Construct to quickly create both of these measures. First, I used the “quantile node” to create percentile rankings based on the total count of each name. Then I used the “transform node” to calculate a random value for each name and rank the names by it from 1 to 33,170, which is the number of unique names represented in the data.
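The two measures above can be sketched in a few lines of Python. This is only an approximation under my own assumptions: the percentile definition used here (the percent of names whose total is at or below a given name's total) may not match what the quantile node computes, and the function names and sample totals are hypothetical.

```python
import bisect
import random


def percentile_ranks(totals):
    """Map each name to the percent of names whose total is <= its own."""
    n = len(totals)
    sorted_counts = sorted(totals.values())
    ranks = {}
    for name, count in totals.items():
        at_or_below = bisect.bisect_right(sorted_counts, count)
        ranks[name] = 100.0 * at_or_below / n
    return ranks


def random_ranks(names, seed=None):
    """Give every name a unique random rank from 1 to len(names)."""
    rng = random.Random(seed)
    order = list(names)
    rng.shuffle(order)
    return {name: rank for rank, name in enumerate(order, start=1)}


# Made-up totals for four placeholder names.
sample_totals = {"A": 10, "B": 20, "C": 30, "D": 40}
percentiles = percentile_ranks(sample_totals)
slider_ranks = random_ranks(sample_totals, seed=1)
```

Because every name gets a unique rank, moving a slider to position k always lands on exactly one name, which is what makes the random search work.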

Because I could add all of these steps quickly and easily without writing a single line of code, I was able to manipulate the data to get exactly what I wanted. The image below depicts the actual job that I created in Veera Construct and shows where the data is output directly to the Tableau data extract.

With the finished product, I am able to either search for specific names, or randomly mine the data for names we may be interested in. At a quick glance, I can see where the names are most popular and the breakdown of whether a certain name is typically given to males or females.

If you’re an expecting parent curious about data-driven decision making for your baby’s name, check out my dashboard below. If you have suggestions for additional data to include, let me know.

And if you’re like any of our friends and family and have name suggestions, I’m happy to hear those too.