Update 2018-Apr-23: This dataset is now the BabbyNames sample database, available on GitHub here: https://github.com/LitKnd/BabbyNames
A little while back, I wrote about free sample datasets on the internet.
One of my favorite datasets is the list of names given to babies in the United States each year since 1880. The data is simple enough to be small, and just weird enough to be fascinating. It’s published in 134 .csv files (and counting, since there’s one per year), which takes a little fiddling to get into a database.
So I did the work for you. I imported the files into SQL Server, did a little normalization, and set up a compressed backup file for download.
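If you're curious what that fiddling looks like, here's a rough sketch of the idea. It is not the process used to build this database: it uses Python and SQLite as a stand-in for SQL Server, it assumes each yearly file holds name,sex,count rows, and the table names, column names, and sample rows are all made up for illustration.

```python
import csv
import io
import sqlite3


def create_schema(conn):
    """Create two hypothetical tables: one row per distinct name, one per year/name/sex."""
    conn.executescript(
        """
        CREATE TABLE IF NOT EXISTS names (
            name_id INTEGER PRIMARY KEY,
            name    TEXT NOT NULL UNIQUE
        );
        CREATE TABLE IF NOT EXISTS births (
            year    INTEGER NOT NULL,
            name_id INTEGER NOT NULL REFERENCES names (name_id),
            sex     TEXT NOT NULL,
            births  INTEGER NOT NULL,
            PRIMARY KEY (year, name_id, sex)
        );
        """
    )


def load_year(conn, year, text):
    """Load one year's rows into the tables (assumed layout: name,sex,count)."""
    cur = conn.cursor()
    for row in csv.reader(io.StringIO(text)):
        if not row:
            continue  # skip blank lines
        name, sex, count = row
        # The "little normalization": store each name once and reference it by id.
        cur.execute("INSERT OR IGNORE INTO names (name) VALUES (?)", (name,))
        cur.execute("SELECT name_id FROM names WHERE name = ?", (name,))
        name_id = cur.fetchone()[0]
        cur.execute(
            "INSERT INTO births (year, name_id, sex, births) VALUES (?, ?, ?, ?)",
            (year, name_id, sex, int(count)),
        )
    conn.commit()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    create_schema(conn)
    # Made-up sample rows, not real counts from the dataset.
    load_year(conn, 1880, "Ada,F,10\nLee,M,7\n")
    load_year(conn, 1881, "Ada,F,12\n")
    query = (
        "SELECT n.name, SUM(b.births) FROM births b "
        "JOIN names n ON n.name_id = b.name_id GROUP BY n.name ORDER BY n.name"
    )
    for name, total in conn.execute(query):
        print(name, total)
```

With the real files you would read each one from disk and call load_year once per year, taking the year from the file name.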
Download the Database
The backup file is small by design – this first published version is under 11MB.
The restored database size is 1.5GB (1GB data file, 512MB log file).
Restore It and Start Querying
After you download sqlindexworkbook.zip, take it for a test restore and run some basic queries.
I love this sample dataset for the portability of the backup file, and I have lots of cool scripts in the works that will be coming to you soon on this blog. We’ll expand the data, manipulate it, and work through all sorts of indexing problems.
And I promise to update the sample database with 2015 baby names as soon as they’re published. 🙂
Viva la index!