A Delightfully Nerdy Undertaking: Using Cubs' 2016 xBABIP to Project 2017 Regression


Aspiring sabermetricians and humble baseball fans alike have, for years, used Batting Average on Balls in Play (BABIP) to provide additional context for a player’s performance. But baseball analysis is always moving toward the next thing, stripping away as much noise and providing as much projection as possible, and so we have increasingly wanted to know what a player’s BABIP should have been.

Allow me some necessary background.

The concept of Expected Batting Average on Balls in Play (xBABIP) seems to be evolving constantly. Jeff Sullivan’s January 2014 RotoGraphs post got a July update, and that work developed into Mike Podhorzer’s “bestest xBABIP equation yet.” That, in turn, led to a May 2015 formula update from Alex Chamberlain, which eventually evolved into his most recent update, from July 2016.

That is to say, what we’re getting into today stands on shifting ground, and the model will undoubtedly improve in the future. But it’s still very useful and wildly interesting.

Now allow me a second layer of necessary background; to understand xBABIP and its utility, we first have to dig in a little on BABIP.

BABIP is self-explanatory insofar as it measures how often a player gets a hit when he puts the ball in play. It is useful, among other things, for analyzing a player’s batting average in a given season or during a given stretch, and for providing context for that number (which, in turn, affects OBP and SLG).

Because the rate at which balls put into play fall in for hits is not entirely within a batter’s control, knowing that he had an enormous BABIP (relative to his career mark, for example) during a given stretch of baseball might suggest he’s had some good luck in where the balls happen to land. (For a more involved explanation of how BABIP is calculated and why we care, see this FanGraphs glossary entry.)
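
For reference, the standard FanGraphs definition removes home runs (which aren’t fielded) and strikeouts (which aren’t put in play) and counts sacrifice flies as balls in play: BABIP = (H - HR) / (AB - K - HR + SF). Here is a minimal Python sketch of that arithmetic; the function name and the sample stat line are hypothetical, for illustration only.

```python
def babip(h: int, hr: int, ab: int, k: int, sf: int) -> float:
    """Batting average on balls in play: (H - HR) / (AB - K - HR + SF)."""
    balls_in_play = ab - k - hr + sf
    if balls_in_play <= 0:
        raise ValueError("no balls in play")
    # Home runs leave the park, so they count neither as hits in play nor as balls in play.
    return (h - hr) / balls_in_play


# Hypothetical stat line, not any real player's 2016 numbers.
print(round(babip(h=150, hr=20, ab=550, k=120, sf=5), 3))
```

With these made-up counts there are 415 balls in play and 130 non-homer hits, for a BABIP of about .313.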

But BABIP is not entirely out of the batter’s control, either. A batter’s speed can impact the number of grounders he beats out, and, more importantly, the quality of his contact when he puts the ball in play can make it more difficult, in the aggregate, for defenses to record outs on his balls in play.

Wouldn’t it be nice if we could figure out how much of a player’s BABIP was earned, and how much of it was luck?

Ta-da! That’s why xBABIP exists! Full circle.

xBABIP is an attempt to measure how much of a player’s BABIP is luck and how much is deserved. We can go all the way back to 2013 and check out the original model for predictive BABIP via The Hardball Times, whose goal was to distinguish between BABIP that comes from a player’s skill and BABIP that comes from variance.

Chamberlain’s most recent xBABIP formula (which you can read for yourself here) uses Baseball Info Solutions’ batted ball data and incorporates things like line drive, fly ball, and infield fly percentages, directionality, hard-hit rate, and FanGraphs’ player speed metric (Spd). No one will argue that this xBABIP formula is perfect, but it does offer a considered approach to actually quantifying that sense of “ah, that guy’s BABIP was way too high last year, he’s gonna regress.”
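
The post doesn’t reproduce Chamberlain’s coefficients, so this sketch doesn’t either. Assuming the formula takes the general shape of a linear regression (an intercept plus a weighted sum of batted-ball rates and Spd), applying it to one player reduces to a dot product. Every feature name and weight below is a hypothetical placeholder; you would swap in the published coefficients from his post before trusting any output.

```python
from typing import Mapping


def linear_xbabip(intercept: float,
                  weights: Mapping[str, float],
                  features: Mapping[str, float]) -> float:
    """Intercept plus a weighted sum of a player's batted-ball inputs.

    ASSUMPTION: the xBABIP model is linear in its inputs. The keys used in
    `weights` are hypothetical names, and every weighted input must be present.
    """
    missing = set(weights) - set(features)
    if missing:
        raise KeyError(f"missing inputs: {sorted(missing)}")
    return intercept + sum(w * features[name] for name, w in weights.items())


# Toy weights for illustration only; these are NOT Chamberlain's coefficients.
toy_weights = {"ld_pct": 0.5, "iffb_pct": -0.4, "hard_pct": 0.2, "spd": 0.005}
player = {"ld_pct": 0.21, "iffb_pct": 0.10, "hard_pct": 0.31, "spd": 4.0}
print(round(linear_xbabip(0.15, toy_weights, player), 3))
```

The sign pattern in the toy weights only mirrors the intuition in the paragraph above (line drives and hard contact help, infield flies hurt); the real model’s magnitudes will differ.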

Michael, Luis, and I decided to use the formula to calculate the 2016 xBABIP figures for projected contributors to the Cubs in 2017. The following table sorts expected Cubs contributors by the difference between their actual 2016 BABIP and their 2016 xBABIP, with larger negative numbers suggesting bad luck (and thus possible future positive regression):

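| Player | 2016 BABIP | 2016 xBABIP | Diff. (BABIP - xBABIP) |
|---|---|---|---|
| Miguel Montero | .249 | .283 | -.034 |
| Jason Heyward | .266 | .288 | -.022 |
| Ben Zobrist | .290 | .309 | -.019 |
| Albert Almora | .315 | .327 | -.012 |
| Tommy La Stella | .319 | .324 | -.005 |
| Anthony Rizzo | .309 | .292 | .017 |
| Kris Bryant | .332 | .313 | .019 |
| Matt Szczur | .305 | .275 | .030 |
| Willson Contreras | .339 | .301 | .038 |
| Javier Baez | .336 | .288 | .048 |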
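
The Diff. column is simply actual BABIP minus xBABIP. This short Python sketch takes the 2016 figures from the table above, recomputes each gap, and reproduces the sort order, with the most negative (unluckiest, likeliest to bounce back) first. The names `CUBS_2016` and `babip_gaps` are just illustrative.

```python
# 2016 (BABIP, xBABIP) for each player, copied from the table above.
CUBS_2016 = {
    "Miguel Montero": (0.249, 0.283),
    "Jason Heyward": (0.266, 0.288),
    "Ben Zobrist": (0.290, 0.309),
    "Albert Almora": (0.315, 0.327),
    "Tommy La Stella": (0.319, 0.324),
    "Anthony Rizzo": (0.309, 0.292),
    "Kris Bryant": (0.332, 0.313),
    "Matt Szczur": (0.305, 0.275),
    "Willson Contreras": (0.339, 0.301),
    "Javier Baez": (0.336, 0.288),
}


def babip_gaps(data: dict[str, tuple[float, float]]) -> list[tuple[str, float]]:
    """Return (player, BABIP - xBABIP) pairs, most negative (unluckiest) first."""
    # Round to three places so floating-point noise doesn't leak into the display.
    gaps = [(name, round(babip - xbabip, 3)) for name, (babip, xbabip) in data.items()]
    return sorted(gaps, key=lambda pair: pair[1])


for name, gap in babip_gaps(CUBS_2016):
    print(f"{name:<18} {gap:+.3f}")
```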

So, what can we pull from this data grouping? Among other things:

• It’s possible that Miguel Montero’s xBABIP could give the Cubs hope for an offensive bounce-back from their No. 2 catcher. His BABIP fell further short of his xBABIP than any other Cub’s in 2016, and (while I know what I’m about to say opens a statistical can of worms, and I’m saying it only for illustrative purposes) if you add those 34 points of BABIP to his 2016 performance, he suddenly has a slugging percentage near .400 (close to his career average) and an OBP near .350 (better than his career average). Is it possible that Montero, who hit .216/.327/.357 (83 wRC+, with otherwise typical and solid peripherals), did not perform badly in his individual plate appearances last year, and instead was just the unfortunate recipient of bad results? This xBABIP and these numbers, together with his usual peripherals, all strongly suggest that the low batting average was mostly bad luck.
• Jason Heyward (-.022) doesn’t lead the way for underperformers, but comes close. We know that quality of contact was a real problem for him in 2016, which is a big part of why his BABIP was so low. On the other hand, this data suggests his results were even worse than you would have expected after considering the poor contact. Even a .288 BABIP would have been 21 points below his career average, so, yes, it’s fair to suggest that last season’s results for Heyward were the product of poor contact *AND* bad luck on top of that. Hopefully his swing changes can lead to some better results.
• Javier Baez is the greatest overperformer by a significant margin, suggesting he was the frequent beneficiary of balls that happened to find holes. That makes him, and especially the quality of his contact, worth watching closely next year.
• Willson Contreras isn’t too far behind on the overperformer scale, and he appears to be dinged in the formula for having a low line drive rate (17.9%, versus 20.7% league average). But it’s worth noting he posted the fourth-highest hard-hit rate of the players listed (32.3%, versus 31.4% league average).
• Ben Zobrist surprisingly makes the list as an underperformer. He puts a lot of balls in play, and his hard-hit rate was the third highest among returning players.
• Both Kris Bryant and Anthony Rizzo show up as moderate overperformers. Was it a fluke for both, or is there something about their contact that generates a better-than-expected BABIP? We’ll see, but this approach suggests each could be in for a slight BABIP regression in 2017. Given their disproportionate importance to the offense, a 15- to 20-point BABIP drop for each player would sting.
• Jon Jay has carried a high BABIP throughout his career, so maybe it shouldn’t be surprising that he significantly beat his xBABIP, even though that xBABIP would have been the best on the team had he been a Cub in 2016. (Get to know the Cubs’ new outfielder here.)
• The lowest xBABIP on the team belonged to Matt Szczur. Perhaps he could borrow the bats he lent to Anthony Rizzo during the postseason for better contact?
• Note: Kyle Schwarber is going to be a major contributor to the 2017 Cubs, but he had essentially no 2016 data and was thus not included.
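
The Montero what-if, and the 15- to 20-point drop floated for Bryant and Rizzo, both boil down to simple arithmetic: shift BABIP by some number of points, convert that into hits gained or lost on the same number of balls in play, and recompute the slash line. The sketch below assumes every added or removed hit is a single and that at-bats, walks, strikeouts, and extra-base hits stay fixed. Those are simplifying assumptions of this example, not of the original analysis, and the stat line is hypothetical rather than Montero’s actual 2016 counting stats.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class StatLine:
    ab: int
    h: int
    doubles: int
    triples: int
    hr: int
    bb: int
    hbp: int
    sf: int
    k: int

    @property
    def balls_in_play(self) -> int:
        return self.ab - self.k - self.hr + self.sf

    def slash(self) -> tuple[float, float, float]:
        """Return (AVG, OBP, SLG)."""
        singles = self.h - self.doubles - self.triples - self.hr
        total_bases = singles + 2 * self.doubles + 3 * self.triples + 4 * self.hr
        avg = self.h / self.ab
        obp = (self.h + self.bb + self.hbp) / (self.ab + self.bb + self.hbp + self.sf)
        slg = total_bases / self.ab
        return avg, obp, slg


def shift_babip(line: StatLine, points: int) -> StatLine:
    """Add (or, if negative, remove) `points` thousandths of BABIP.

    ASSUMPTION: every hit gained or lost is a single; AB, walks, strikeouts,
    and extra-base hits are unchanged.
    """
    extra_hits = round(line.balls_in_play * points / 1000)
    return replace(line, h=line.h + extra_hits)


# Hypothetical stat line for illustration only.
before = StatLine(ab=200, h=44, doubles=8, triples=0, hr=6, bb=30, hbp=2, sf=2, k=50)
after = shift_babip(before, 34)
for label, line in (("before", before), ("after", after)):
    avg, obp, slg = line.slash()
    print(f"{label}: {avg:.3f}/{obp:.3f}/{slg:.3f}")
```

Because the added hits are treated as singles, AVG and SLG rise by the same amount here; if some of those extra hits went for extra bases, the slugging bump would be larger.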

If history is our guide, this won’t be the last xBABIP formula update we’ll see. For now, it’s the one we’ve got, and it offers some insight into how “deserved” various Cubs batters’ BABIPs were last year. It also tells us what to watch more closely in 2017, and helps guard against the bias that can creep in when we conflate player performance with player results.

Nerds out.

Luis Medina and Michael Cerami contributed to this post.