Calculating Spinning Baseballs

During some recent downtime, I downloaded the raw pitch-by-pitch data for the 2016 MLB season. The complete dataset contains over 725,000 records. That's certainly not GB-scale, but it's a decent population to work with.

I set a few goals for myself while working with this data:

  • Become more fluent in Python, and specifically pandas
  • Kick the tires of the new version 5 of the ELK stack
  • Explore AWS’s new QuickSight product

Goal the first: Python and pandas

Shifting to a Python and pandas mindset for the data import and cleansing was a good challenge for me. As I worked through the subsequent calculations, I started to think about other uses for the flexibility and power this approach provides. It’s a highly complementary skill set for anyone with a strong SQL background who spends a lot of time on data analysis.
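
For readers coming from SQL, here is a minimal sketch of what that shift can look like. It reads CSV data, drops incomplete rows, and then runs the pandas equivalent of a `GROUP BY`. The column names (`pitcher`, `pitch_type`, `spin_rate`) and the tiny inline dataset are hypothetical placeholders. They are not the actual layout of the 2016 pitch data.

```python
import io

import pandas as pd

# Hypothetical columns and values for illustration only; not the real MLB data.
RAW_CSV = """pitcher,pitch_type,spin_rate
A,FF,2300
A,FF,2350
A,FF,
A,CU,2650
B,FF,2200
B,SL,2400
"""


def load_pitches(text: str) -> pd.DataFrame:
    """Read CSV text and drop rows with no spin reading (a simple cleansing step)."""
    df = pd.read_csv(io.StringIO(text))
    return df.dropna(subset=["spin_rate"])


def avg_spin_by_pitch(df: pd.DataFrame) -> pd.DataFrame:
    """Rough pandas equivalent of:

    SELECT pitcher, pitch_type, AVG(spin_rate) AS avg_spin, COUNT(*) AS pitches
    FROM pitches
    GROUP BY pitcher, pitch_type
    """
    return df.groupby(["pitcher", "pitch_type"], as_index=False).agg(
        avg_spin=("spin_rate", "mean"),
        pitches=("spin_rate", "size"),
    )


pitches = load_pitches(RAW_CSV)
print(avg_spin_by_pitch(pitches))
```

Named aggregations inside `agg(...)` (available since pandas 0.25) combine with `as_index=False` to keep the result a flat table. That reads much like the SQL query in the docstring.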


Goal the second: Elasticsearch, Logstash, & Kibana (ELK)

Starting from scratch with ELK v5 was a lot of fun. After replicating the baseball calculations in a Kibana dashboard, I explored the data further by searching the play-by-play descriptions. Pairing basic Kibana visualizations with ad hoc exploration in Elasticsearch is where the stack really shines for data analysis.
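
As a rough illustration of that kind of search, the sketch below builds an Elasticsearch Query DSL body around a `match_phrase` query. This query matches documents in which the given words appear together and in order. The field name `description` and the example phrase are assumptions for illustration. The actual index mapping may differ.

```python
import json


def phrase_search_body(field: str, phrase: str, size: int = 10) -> dict:
    """Build a Query DSL body that finds documents whose `field`
    contains `phrase` as an exact phrase (words adjacent, in order)."""
    return {
        "size": size,  # maximum number of hits to return
        "query": {"match_phrase": {field: phrase}},
    }


# "description" is a hypothetical field name for the play-by-play text.
body = phrase_search_body("description", "grand slam")
print(json.dumps(body, indent=2))
```

The resulting JSON can be sent to an index's `_search` endpoint; the index name depends on how the data was loaded. You can also type the same phrase, in quotes, into Kibana's search bar.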


Music, Sports, and Microservices

The success of any product comes down to whether the customer uses it and, in nearly all cases, pays for it. I’m going to describe how I have personally moved away from the monolithic ESPN for my sports consumption, moving instead toward sports “microservices.” For the first time ever, I am seriously considering killing my cable/dish subscription with little sense of loss over ESPN. Here are a couple of examples.


Getting familiar with Kibana

We’ve been using Elasticsearch a lot over the past year. It’s a fantastic distributed data store and can do much more than just power search in your application. Elastic’s ELK stack consists of Elasticsearch, Logstash, and Kibana. Elasticsearch and Logstash each merit their own discussion. Right now I want to focus on just Kibana, which is how you explore and visualize data in an Elasticsearch index.


Pasting Data

I’ve found that on many occasions I want to take a quick look at some data without going through the process of extracting a data set or querying a database. Maybe a colleague has sent me a spreadsheet and I just want to visualize it. Maybe I want to check my progress on an analysis and see whether the data are telling me what I think they are.

The ability to paste data directly into JMP or Tableau is probably one of my favorite underrated features. Here’s how I do it:
