Calculating Spinning Baseballs

February 16, 2017 by Scott Curtis

During some recent downtime, I downloaded the raw pitch-by-pitch data for the 2016 MLB season. The complete dataset is over 725,000 records. While certainly not GB-scale, more than 700,000 records provide a decent population to work with.

I set a few goals for myself while working with this data:

Become more fluent in Python, and specifically pandas
Kick the tires of the new version 5 of the ELK stack
Explore AWS’s new Quicksight product

Goal the first: Python and pandas

February 16, 2017 by Scott Curtis

Shifting to a Python and pandas mindset for performing the data import and data cleansing was a good challenge for me. As I worked through the subsequent calculations, I started to think about possible uses for the flexibility and power that this approach provided. It’s a really complementary set of skills for anyone who has a strong SQL background and dedicates lots of time to data analysis.

Goal the second: Elasticsearch, Logstash, & Kibana (ELK)

February 16, 2017 by Scott Curtis

Starting from scratch with ELK v5 was a lot of fun. And after replicating the baseball calculations in a Kibana dashboard, I asked several questions of the data by running searches of the play-by-play descriptions. The combination of basic visualizations with Kibana and data exploration in Elasticsearch is where the solution really shines for data analysis situations.

Goal the third: Amazon Quicksight

February 16, 2017 by Scott Curtis

Amazon announced AWS Quicksight at re:Invent 2015 and then released it publicly more than a full year later. After taking Amazon’s Quicksight for a “quick” spin, I’m very bullish on the future of the product. It’s pretty simple to get up and going, and I have every confidence in AWS’s commitment to continued deployment of new features.

Music, Sports, and Microservices

October 21, 2016 by Scott Curtis

The success of any product comes down to whether the customer uses it and, in nearly all cases, pays for it. I’m going to describe how I have personally moved away from the monolithic ESPN for my sports consumption, moving instead toward sports “microservices.” For the first time ever, I am seriously considering killing my cable/dish subscription with little feeling of loss for ESPN. Here are a couple of examples.

Getting familiar with Kibana

December 29, 2015 by Scott Curtis

We’ve been using Elasticsearch a lot over the past year. It’s a fantastic distributed data store and can do lots more than just power search in your application. Elastic’s ELK stack includes Elasticsearch, Logstash, and Kibana. Elasticsearch and Logstash each merit their own discussion. Right now I want to focus just on Kibana, which is how you can explore and visualize data in an Elasticsearch index.

Pasting Data

February 14, 2015 by Scott Curtis

I’ve found that on many occasions I want to take a quick look at some data without going through the process of extracting a data set or querying a database. Maybe a colleague has sent me a spreadsheet, and I just want to quickly visualize the data set. Maybe I just want to take a quick look at my progress working on an analysis and see if the data are telling me what I think they are.

The ability to paste data directly into JMP or Tableau is probably one of my favorite underestimated features. Here’s how I do it:

Visualizing the Polyglots

April 26, 2014 by Scott Curtis

I've had a lot of conversations lately regarding the adoption of polyglot programming and persistence in applications and database solutions, or, where actual adoption hasn't happened yet, at least a move toward the concepts and approach of a polyglot world. I get excited about our solutions harnessing the best features of available technologies.