During some recent downtime, I downloaded the raw pitch-by-pitch data for the 2016 MLB season. The complete dataset contains over 725,000 records. While that is certainly not GB-scale, 700k+ records provide a decent population to work with.
I set a few goals for myself while working with this data:
- Become more fluent in Python, and specifically pandas
- Kick the tires on the new version 5 of the ELK stack
- Explore AWS’s new QuickSight product