In addition to the statistical data with which we’re all generally familiar, a baseball game is packed with data we don’t generally observe. Defensive data, batted ball data, non-pitching velocity data, etc. You may or may not know that this data, all the while, is being collected by radars and cameras all around each Major League ballpark, often for proprietary use by individual teams or by business enterprises with an eye toward licensing whatever information they can cull.

But now, by way of MLB Advanced Media (the media face of the MLB teams), a great deal of that data is going to be available for the public to consume. This is the baseball nerd equivalent of being invited into Willy Wonka’s factory.

You can, and should, read about the plan for the new system here. A snippet:

The goal is to revolutionize the way people evaluate baseball, by presenting for the first time the tools that connect all actions that happen on a field to determine how they work together. This new datastream will enable the industry to understand the whole play on the field — batting, pitching, fielding and baserunning — and enable new metrics for evaluation by clubs, scouts, players and fans.

For instance, on a brilliant, game-saving diving catch by an outfielder, this new system will let us understand what created that outcome. Was it the quickness of his first step, his acceleration? Was it his initial positioning? What if the pitcher had thrown a different pitch? Everything will be connected for the first time, providing a tool for answers to questions like this and more ….

Claudio Silva, PhD and professor of computer science and engineering at NYU Polytechnic School of Engineering, said the biggest challenge was to ensure that the data received reflects actual game play. He said anyone who watches baseball, from club to player to fan, will see a new baseball world that is “completely unexpected.”

“One of the things we had to do to be certain that was the case was to design a whole validation scheme, where we recorded our own video, and designed algorithms that would independently generate some of the metrics to be compared to the data that we were getting out of the vendors,” Silva said. “It’s really very complex algorithms that are going into making this thing work, into the validation process, and actually eventually into all the analysis that people are going to be doing on the metrics.

“One of the goals of what we wanted to achieve was to virtually recreate the game using the geometric data. This actually turns out not to be straightforward. So let’s say you want to compute a player’s speed, you want to compute a ball’s speed. We could actually take the 3D data and match it to a verbal description of the game. This was a very exciting finding for us. You can imagine, it’s kind of like a two-way street. You can use the experts’ opinions to then generate information. You can even imagine other forms of storytelling about a season of a team.”

He called this “a completely new data stream” and added, “To be one of the first few people to have the luxury of looking at the new datastream was a true privilege. I believe that this data is so rich, there are so many interesting things we can do with it, we’re going to be able to comb through this data and find layers and layers of features that we never could see before.”

The system will be operational at Miller Park, Citi Field, and Target Field for 2014, and will come on line at all other parks throughout the year, with a goal of having it working everywhere by Opening Day 2015. The data stream could take a year or so to be fully realized.

The implications of being able to granulize everything that happens in a baseball game like this are enormous, particularly when it comes to defensive metrics and evaluation. Of course, in time, we will almost certainly realize that this system will provide so much more. We’ve seen what having this kind of non-baseball-game-outcome statistical data (anyone got a better way to describe that?) can do when it comes to evaluating pitchers, because we’ve had PitchF/X data available for a while now.
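To make that a little more concrete, here is a minimal sketch of the kind of thing granular tracking data would let anyone compute. Everything in it is an assumption for illustration: the function names, the made-up fielder track, the units (feet), and the sampling rate are hypothetical, and the definition of route efficiency used here (straight-line distance divided by distance actually run) is just one plausible choice, not necessarily how the new system will define it.

```python
import math

def path_length(points):
    """Total distance along consecutive (x, y) positions, in feet."""
    return sum(math.dist(a, b) for a, b in zip(points, points[1:]))

def route_efficiency(points):
    """Straight-line distance divided by distance actually run.

    1.0 means a perfectly direct route (assumed definition, see lead-in).
    """
    run = path_length(points)
    direct = math.dist(points[0], points[-1])
    return direct / run if run > 0 else 1.0

def average_speed(points, sample_hz):
    """Average speed in feet per second for samples taken sample_hz times a second."""
    if len(points) < 2:
        raise ValueError("need at least two position samples")
    elapsed = (len(points) - 1) / sample_hz
    return path_length(points) / elapsed

# Hypothetical fielder track: positions in feet, sampled once per second.
track = [(0.0, 0.0), (3.0, 4.0), (6.0, 8.0), (6.0, 18.0)]

print(f"distance run: {path_length(track):.1f} ft")
print(f"route efficiency: {route_efficiency(track):.3f}")
print(f"average speed: {average_speed(track, sample_hz=1.0):.2f} ft/s")
```

The real data will presumably be three-dimensional, sampled far more often, and noisy enough to need smoothing first, but the basic arithmetic is no more exotic than this.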

And now folks will debate whether we’re doing harm to our favorite game by focusing on data rather than just watching, and enjoying, the play on the field.

(For my part, I like to enjoy the game both ways – in fact, that’s why baseball remains my favorite sport.)

  • Funn Dave

    Yay, a whole new methodology for us to argue about!

  • itzscott

    I liked it better when watching baseball wasn’t like taking a math exam or doing a Soduko puzzle.

    • Edwin

      I don’t think watching baseball will change, just the way that we’re able to evaluate baseball.

      • http://www.friendly-confines.com hansman

        I see no reason to actually watch a game anymore.

    • Drew7

      Did you even look at the linked article?

      How is a graphic, showing the speed of an outfielder’s 1st step, the trajectory of a flyball, and the distance the OF’er had to go to catch it, “like doing a Soduko (lulz…) puzzle”?

      • http://www.friendly-confines.com hansman

        Because it doesn’t quantify belly fire, scrap or want.

        • TWC

          But it does provide plenty of cover for folks so inclined to denigrate things they don’t, can’t, or choose not to understand.

          • Drew7

            Which is an unfortunate defense mechanism.

          • DarthHater

            Not to mention denigrating things they don’t, can’t, or choose not to spell correctly.

            • http://www.friendly-confines.com hansman

              Soduko puzzles tuk away his spelling ability.

    • http://www.michigangoat.blogspot.com MichiganGoat

      Of course because complaining about a spring training lineup is so much more productive.

  • http://www.bleachernation.com Luke

    I would love to get my hands on that data set, and I can’t wait for this (and Pitch f/x) to show up in every minor league stadium.

    • DocPeterWimsey

      Ditto for me. The “efficiency” metric (as the player runs / as the crow flies) is going to be really interesting, and that coupled with the distance of “as the player runs” should really revolutionize range statistics.

      (This is what the UZR folks have been trying to get at for years, but they use sighting data so they cannot measure the “as the player runs” part well.)

  • mjhurdle

    RIP “eye test”.

    You provided generations of fans the perfect way to validate any opinion they had. You were a cherished tool that always seemed to swing the tide of any argument. You were famous for the ability to be wielded by anyone losing an argument. A simple mention of the ‘eye-test’ combined with a derogatory shot at ‘nerds’ or ‘people that spend all their time in basements and have never played baseball in real life’ and the argument was over.
    But not even the rock-solid, un-quantifiable accuracy of the “eye-test” can possibly withstand this latest assault on old-school baseball by these new school Saber-Nerds.

    You will be missed…

    • http://www.michigangoat.blogspot.com MichiganGoat

      Oh the eye test will never go away, it just will become sillier to listen to.

  • Featherstone

    BP has a wonderful article about this new system today.


  • Patrick W.

    As someone who loves data (it is my livelihood) I don’t think it’s fair to just assume people who prefer to watch the game the way we all used to are somehow less smart/evolved/involved as those of us who like the numbers. Baseball is beautiful just in the act. The reason it became so popular is that dance we see with every pitch, the song we hear when leather meets wood, that smell of grass and beer and food and people. I get it. I can see enjoying the game for the aesthetic aspects and not needing the statistics to increase that enjoyment.

    • DarthHater

      Of course it isn’t fair to assume that. But by the same token, people who want to appreciate the beauty of the game the old-fashioned way, don’t need to make smart alecky remarks that the mere existence of statistical analysis somehow turns their own experience of the game into a math exam.

      • Patrick W.

        Maybe it was more of a recognition of how advancements in understanding parallel advances in time and advances in time wend us ever closer to the end of it, and thus a wOBA becomes a stand-in for picking a life – insurance beneficiary; Defensive Runs Saved is a surrogate for reserving a cemetery plot; Route Efficiency the substitute for a date to be carved on the right side of the dash on a tombstone.

  • Darth Ivy

    test test

  • Darth Ivy

    BN poll:

    tube socks and crocs


    dog with head cone thing

    • DarthHater

      dog, definitely

  • Cheese Chad

    I didn’t read any comments before writing this: Will this data show, for instance, the batter was late on a pitch. The ball was thrown at this speed and he swings the bat this speed. The batter would have had to start his swing at this second to catch up to the speed of the pitch. I would find that info to be really intriguing.
