Nerd alert.
Aspiring sabermetricians and humble baseball fans alike have, for years, used Batting Average on Balls in Play (BABIP) to provide additional context for a player’s performance. But baseball analysis is always moving toward the next thing, trying to strip away as much noise and provide as much projection as possible, and so we have increasingly wanted to know what a player’s BABIP should have been.
Allow me some necessary background.
The concept of Expected Batting Average on Balls in Play (xBABIP) seems to be evolving constantly. Jeff Sullivan’s January 2014 RotoGraphs post came with a July update, and that developed into Mike Podhorzer’s “bestest xBABIP equation yet,” which led to a May 2015 formula update from Alex Chamberlain that eventually evolved into the most recent update from July 2016.
That is to say, what we’re getting into today sits on shifting ground, and the formula will undoubtedly be improved in the future. But it’s still very useful and wildly interesting.
Now allow me a second layer of necessary background: to understand xBABIP and its utility, we first have to dig in a little on BABIP.
BABIP is self-explanatory insofar as it is used to measure how often a player gets a hit when a ball is put in play. It is useful, among other things, for analyzing a player’s batting average in a given season or during a given stretch, and providing context for that number (which, in turn, impacts OBP and SLG).
Because the rate at which balls put into play fall in for hits is not entirely within a batter’s control, knowing that he had an enormous BABIP (relative to his career mark, for example) during a given stretch of baseball might suggest he’s had some good luck in where the balls happen to land. (For a more involved explanation of how BABIP is calculated and why we care, see this FanGraphs glossary entry.)
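For the curious, the calculation itself is short enough to sketch in Python. This follows the commonly cited formula, (H - HR) / (AB - K - HR + SF); the function name and the season line below are made up for illustration and are not any real player’s numbers.

```python
def babip(hits, home_runs, at_bats, strikeouts, sac_flies):
    """Batting average on balls in play: (H - HR) / (AB - K - HR + SF)."""
    # Home runs and strikeouts never reach the defense, so they come out;
    # sacrifice flies are balls in play that don't count as at-bats, so they go in.
    balls_in_play = at_bats - strikeouts - home_runs + sac_flies
    if balls_in_play <= 0:
        raise ValueError("no balls in play")
    return (hits - home_runs) / balls_in_play


# Hypothetical season line, for illustration only.
example = babip(hits=150, home_runs=20, at_bats=550, strikeouts=120, sac_flies=5)
print(f"{example:.3f}")  # 130 / 415, prints 0.313
```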
But BABIP is not entirely out of the batter’s control, either. A batter’s speed can impact the number of grounders he beats out, and, more importantly, the quality of his contact when he puts the ball in play can make it more difficult, in the aggregate, for defenses to record outs on his balls in play.
Wouldn’t it be nice if we could figure out how much of a player’s BABIP was earned, and how much of it was luck?
Ta-da! That’s why xBABIP exists! Full circle.
xBABIP is an attempt to measure how much of a player’s BABIP is luck and how much is deserved. We can go back all the way to 2013 and check out the original model for predictive BABIP via The Hardball Times, which aimed to distinguish between BABIP that comes from a player’s skill and BABIP that comes from variance.
Chamberlain’s most recently updated xBABIP formula (which you can read for yourself here) uses Baseball Info Solutions’ batted ball data, and incorporates things like line drive, fly ball, and infield fly percentages, directionality, hard hit rate, and FanGraphs’ player speed metric (Spd). No one will argue that this xBABIP formula is perfect, but it does offer a considered approach to actually quantifying that sense of “ah, that guy’s BABIP was way too high last year, he’s gonna regress.”
Michael, Luis, and I decided to use the formula to calculate the 2016 xBABIP figures for projected contributors to the Cubs in 2017. The following table sorts expected Cubs contributors by the difference between their actual 2016 BABIP and their 2016 xBABIP, with larger negative numbers suggesting bad luck (and thus possible future positive regression):
Player | BABIP | xBABIP | Diff. |
---|---|---|---|
Miguel Montero | .249 | .283 | -.034 |
Jason Heyward | .266 | .288 | -.022 |
Ben Zobrist | .290 | .309 | -.019 |
Addison Russell | .277 | .289 | -.012 |
Albert Almora | .315 | .327 | -.012 |
Tommy La Stella | .319 | .324 | -.005 |
Anthony Rizzo | .309 | .292 | .017 |
Kris Bryant | .332 | .313 | .019 |
Matt Szczur | .305 | .275 | .030 |
Jon Jay (with Padres) | .371 | .336 | .035 |
Willson Contreras | .339 | .301 | .038 |
Javier Baez | .336 | .288 | .048 |
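
If you want to check our math, the Diff. column is simply BABIP minus xBABIP. Here is a minimal Python sketch that recomputes and sorts it from the figures in the table above; the variable and function names are ours, and the values are stored in thousandths to sidestep floating-point rounding.

```python
# (player, BABIP, xBABIP) in thousandths, copied from the table above.
rows = [
    ("Miguel Montero", 249, 283),
    ("Jason Heyward", 266, 288),
    ("Ben Zobrist", 290, 309),
    ("Addison Russell", 277, 289),
    ("Albert Almora", 315, 327),
    ("Tommy La Stella", 319, 324),
    ("Anthony Rizzo", 309, 292),
    ("Kris Bryant", 332, 313),
    ("Matt Szczur", 305, 275),
    ("Jon Jay", 371, 336),
    ("Willson Contreras", 339, 301),
    ("Javier Baez", 336, 288),
]


def diff_table(rows):
    """Return (player, BABIP - xBABIP) pairs, most negative difference first."""
    diffs = [(name, (babip - xbabip) / 1000) for name, babip, xbabip in rows]
    return sorted(diffs, key=lambda pair: pair[1])


for name, diff in diff_table(rows):
    print(f"{name:<20} {diff:+.3f}")
```

Negative values land at the top of the output (BABIP below xBABIP), and positive values at the bottom.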
So, what can we pull from this data grouping? Among other things:
If history is our guide, this won’t be the last xBABIP formula update we’ll see. For now, it’s the one we’ve got, and it reveals some insight into how “deserved” various Cubs batters’ BABIPs were last year. It also offers guidance for watching more closely in 2017, and guarding against the bias that might creep in when it comes to player performance versus player results.
Nerds out.
Luis Medina and Michael Cerami contributed to this post.