
### Want to back up that argument with plots?!?!?

12 August 2013 - 09:51 PM

Stats arguments always give way to probability arguments.  But, alas!  Where is a Cubs fan to get a rapid-fire assessment of whether a guy is doing differently in one situation or another?!?!?

So, here is R-code for doing just that.  It's not horribly user friendly, but it's not horribly user cruel, either.  You can download R for free at: http://www.r-project.org.  Good R programmers make this stuff really easy to use.  I learned to program using punch cards while listening to music on vinyl and wondering why my polyester shirt was so uncomfortable, so screw that.

Let's take a really simple example: is Anthony Rizzo significantly better/worse in "high leverage" (late and close) situations?  Let's just ask if he makes outs more than expected.  We set aside two vectors: safes (all the times he doesn't "choke" and make an out), and PAs (number of plate appearances).

safes<-vector(length=2)     # you could use "well-hit balls" with ABs or K's or anything else that you suspect might differ
PAs<-vector(length=2)       # you could use ABs instead

player<-"Rizzo"                  # replace this with "Castro," "Aaron" or whomever: however, R was written by Yankees fans, so "Jeter" generates "IS GOD" all the time.
scenario1<-"Late, Close"    # you could change this to RiSP or something else if those are the numbers you use
scenario2<-"Regular"

#Here are some values from a couple of days ago courtesy of Fangraphs.
safes[1]<-15
safes[2]<-145
PAs[1]<-49
PAs[2]<-432

# Now we get the probability of these results given his overall stats, and the probability given his separate stats
onerate<-(safes[1]+safes[2])/(PAs[1]+PAs[2])         # this will be OBP here; it could be BA, HR-rate, K-rate, or whatever
onerlnl<-log(dbinom(safes[1],PAs[1],onerate))+log(dbinom(safes[2],PAs[2],onerate))      # this is the log-probability of results given overall OBP or whatever
tworlnl<-log(dbinom(safes[1],PAs[1],safes[1]/PAs[1]))+log(dbinom(safes[2],PAs[2],safes[2]/PAs[2]))  # this is the log-probability of the results given separate rates
llr<-pchisq(2*(tworlnl-onerlnl),1)     # this is the log-likelihood ratio test: llr is the chance of seeing a difference SMALLER than this one if he is really the same batter in both situations, so 1-llr is the chance of a difference this big or bigger
# Why chi-square?  Twice the gap in log-probability behaves like the squared deviation of a dart thrown at a normal curve, and sums of those have a chi-square distribution.  Degrees of freedom = number of extra rates you fit: two separate rates instead of one overall rate means 1.

# we want to illustrate this, so we'll get "support" bars: if these don't overlap, then maybe something is up
mlr<-vector(length=2)
ubr<-vector(length=2)
lbr<-vector(length=2)
ml<-vector(length=2)
ubl<-vector(length=2)
lbl<-vector(length=2)

for (i in 1:2) {
mlr[i]<-round(safes[i]/PAs[i],3)
r<-mlr[i]
ml[i]<-dbinom(safes[i],PAs[i],mlr[i])
lbr[i]<-mlr[i]
lbl[i]<-ml[i]
while (abs(log(lbl[i])-log(ml[i]))<1) {
lbr[i]<-lbr[i]-0.01
lbl[i]<-dbinom(safes[i],PAs[i],lbr[i])
}
lbr[i]<-lbr[i]+0.01
lbl[i]<-dbinom(safes[i],PAs[i],lbr[i])
ubr[i]<-mlr[i]
ubl[i]<-ml[i]
while (abs(log(ubl[i])-log(ml[i]))<1) {
ubr[i]<-ubr[i]+0.01
ubl[i]<-dbinom(safes[i],PAs[i],ubr[i])
}
ubr[i]<-ubr[i]-0.01
ubl[i]<-dbinom(safes[i],PAs[i],ubr[i])
}

# let's plot this!
players<-round(1/(1-llr),3)
situation<-vector(length=2)
situation[1]<-1
situation[2]<-2
plot(situation,mlr,main=player, sub=players, xlab="One player in", ylab="Safe Rate", xlim=c(0,3),ylim=c(round(min(lbr),2)-0.01,round(max(ubr),2)+0.01),axes=FALSE,type="n") # make plot and label axes, but don't draw
axis(1,at=seq(0,3,by=1),tcl=-0.3,labels=c("",scenario1,scenario2,""))   # x-axis: the two situations
axis(2,at=seq(round(min(lbr),2)-0.01,round(max(ubr),2)+0.01,by=0.005),tcl=-0.2,labels=FALSE)   # y-axis: minor ticks, no labels
axis(2,at=seq(round(min(lbr),2)-0.01,round(max(ubr),2)+0.01,by=0.02),tcl=-0.5,las=1)   # y-axis: major ticks with labels
segments(situation,lbr,situation,ubr)
segments((situation-0.1),lbr,(situation+0.1),lbr)
segments((situation-0.1),ubr,(situation+0.1),ubr)
points(situation,mlr,pch=21,col="#191970",bg="violet")

Run all that through R and you'll get this plot.  The number at the very bottom tells you that one in 1.48 players would show a difference at least this big by chance alone if he were basically the same batter all the time.
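
If you just want the number without the plot, here is a rough sketch of the same test as a function.  The name `lr_test` and its arguments are made up for this example (they aren't in the script above), and it uses the same Fangraphs numbers for Rizzo.

```r
# Hypothetical helper (not part of the original script): likelihood-ratio
# test of whether two situations share one underlying success rate.
lr_test <- function(safes, PAs) {
  pooled <- sum(safes) / sum(PAs)      # one overall rate (OBP here)
  separate <- safes / PAs              # one rate per situation
  lnl_one <- sum(dbinom(safes, PAs, pooled, log = TRUE))
  lnl_two <- sum(dbinom(safes, PAs, separate, log = TRUE))
  stat <- 2 * (lnl_two - lnl_one)      # roughly chi-square with 1 df if nothing is going on
  p_value <- pchisq(stat, df = 1, lower.tail = FALSE)
  list(statistic = stat, p_value = p_value, one_in = 1 / p_value)
}

rizzo <- lr_test(safes = c(15, 145), PAs = c(49, 432))
print(round(rizzo$one_in, 2))          # the "one player in ..." number
```

Asking `pchisq` for the upper tail directly is the same thing as the `1-llr` in the script, just with one less step to get backwards.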

You can replace the situations with whatever you please.
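
If you swap situations a lot, the support-bar search can be wrapped up, too.  This is only a sketch: `support_interval`, `drop` and `step` are names I made up, and it checks a grid of rates all at once instead of stepping down and up by 0.01, so the bars will come out slightly different from the ones in the plot.

```r
# Hypothetical helper: the range of rates whose log-probability is within
# `drop` units of the best-fitting rate (the "support" bar).
support_interval <- function(safe, PA, drop = 1, step = 0.001) {
  rates <- seq(step, 1 - step, by = step)          # candidate rates to try
  lnl <- dbinom(safe, PA, rates, log = TRUE)       # log-probability at each rate
  best <- dbinom(safe, PA, safe / PA, log = TRUE)  # log-probability at the observed rate
  supported <- rates[best - lnl < drop]
  c(lower = min(supported), best = safe / PA, upper = max(supported))
}

late_close <- support_interval(15, 49)
regular <- support_interval(145, 432)
print(round(rbind(late_close, regular), 3))
```

The bar for the situation with fewer plate appearances should come out a lot wider, which is the whole reason small-sample "clutch" numbers are so hard to argue about.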

Enjoy!

### Another way to show just how improbable this season is.

24 May 2013 - 08:33 PM

We had some discussion on the main page about whether the Cubs are doing something really implausible by posting a 0.391 winning percentage.  This follows from a couple of summaries of Cub peripheral statistics suggesting that they should be a 0.530 team or better.

Here is the important number: 0.027.  That's the Cubs' "net" OPS: the offense has an OPS of 0.707 and the Cubs' pitching + fielding has allowed an OPS of 0.680.  (In a brain cramp, I wrote that this was 0.037: whoops, took the kid to the zoo earlier and the neurons still were not working.)

Let's say that the Cubs ended the season with a net OPS of 0.027.  How should they do?  Below are the net OPS of all MLB teams from 1962 - 2012:

That's 1338 teams, showing a very tight correlation between the OPS Garnered minus OPS Allowed and winning percentage.  (Net OPS correlates tightly with runs scored / allowed, and run differential correlates tightly with winning, so this shouldn't be a surprise.)  Basically, for every 0.01 a team increases its Net OPS, you expect 2.14 more wins.

So, that means that the Cubs should be on pace to win 87 of 162 games: and with luck, you can make the playoffs with 87 wins.  However, the Cubs' winning percentage of 0.391 would give them only 63 wins: a whopping 23 below expectations!
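
For anyone who wants to check the arithmetic, here is a quick sketch in R.  The one assumption I'm adding is that a team with a net OPS of exactly 0 is expected to go 81-81; the name `expected_wins` is made up for the example.

```r
# Assumption: net OPS of 0 means a 0.500 team (81 wins in 162 games).
# The 2.14 wins per 0.01 of net OPS is the slope quoted in the post.
expected_wins <- function(net_ops, wins_per_point = 2.14, games = 162) {
  games / 2 + wins_per_point * (net_ops / 0.01)
}

cubs_net_ops <- 0.707 - 0.680          # OPS garnered minus OPS allowed
expected <- expected_wins(cubs_net_ops)
on_pace <- 0.391 * 162                 # wins at the current winning percentage
print(round(c(expected = expected, on_pace = on_pace, shortfall = expected - on_pace), 1))
```

Before rounding, the gap is between 23 and 24 wins, which is why it gets quoted as 23 even though 87 minus 63 is 24.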

But, you say, OPS is only part of winning.  (Or, you say, it's a made-up stat because your Topps cards didn't have it in 1972.)  With a bit of bad managing and non-clutch hitting, pitching and fielding, this can happen.  All we need to do is compare the Cubs to other teams that missed by so much....

.... except that there aren't any.  This shows the difference in actual and expected wins, with "actual" based on winning percentage x 162.  (Sometimes teams play 161 or 163 games, so this standardizes for that.)  Net OPS actually explains 80% of the variation in winning percentage, so we didn't have much room for many teams on pace for 23 over/under expectations.  Indeed, 50% of teams win within 3.5 games (one way or the other) of expectations.  Only one team, the 1994 Padres, had a winning percentage so far off that they would have lost 20 more games than you'd expect given their net OPS: but because 1994 was the strike year, they were only on pace to do so.  (I wish them many more, as I still hate them for defying odds in the other direction 10 years earlier.)  So, the Padres had a winning percentage over 0.100 under expected after 117 games, not after 162 games: and had they regressed to their mean (0.500) the rest of the way, then they would have come in at about 15 or 16 under.
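
Here is a back-of-the-envelope sketch of just how far out a 23-win miss sits.  The big assumption, and it is mine, is that the misses around expectation are roughly bell-shaped; real baseball misses may well have fatter tails, so treat the output as a ballpark and not a measurement.

```r
# Assumption: wins over/under the net-OPS expectation are roughly normal.
# If half of all teams land within 3.5 wins, that fixes the spread.
half_width <- 3.5
sigma <- half_width / qnorm(0.75)      # standard deviation implied by "50% within 3.5"
cubs_miss <- 23                        # wins under expectation the Cubs are on pace for
z <- cubs_miss / sigma                 # how many standard deviations that is
tail_prob <- pnorm(-z)                 # chance of one team missing low by that much or more
expected_teams <- 1338 * tail_prob     # how many of the 1338 teams you'd expect to do it
print(round(c(sigma = sigma, z = z, expected_teams = expected_teams), 3))
```

If the bell-curve assumption is anywhere close, you'd expect far fewer than one of those 1338 teams to miss by that much over a full season.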

The biggest "underachievers" over 162 games are the 1965 Red Sox, who managed 17 fewer wins than expected; two other teams (the 1962 Mets and 1993 Mets) managed 15 fewer wins than expected.

Now, will the Cubs keep this up?  Almost certainly not.  Let's just say that the Cubs keep playing +0.027 OPS ball.  There have been 118 teams in the last 51 years that finished with net OPS between 0.022 and 0.032 (i.e., within 0.005 of the Cubs).  Only 11 of these teams finished with records under 0.500.  Three more finished at 0.500: which is where the Cubs will finish if they "regress" to the expectation for a +0.027 OPS team.

Of course, the other reason why this won't happen is that if the Cubs aren't a 0.500 team in July, then there is going to be a sell-off: and the remaining team won't be a +0.027 OPS team (probably).  As we don't expect the Cubs to crawl back to 0.500 until the very end of the season, this seems assured!

However, when we start asking "why" then we probably should exclude answers that would apply to whole teams over an entire season.  For example, there have been a lot of really bad managers over the last 51 years: but nobody has managed their team to 23 wins under expectations.  There is the "clutch" aspect: and, of course, as "clutch" over any stretch of games fails to predict "clutch" over the next stretch, this suggests that the bad luck (especially when it comes to slugging with men on base) can't continue.  (I mean, it can't, can it?!?!?)

And, of course, we have to wonder if this isn't a small blessing in disguise.  Is this really a +0.027 team?  Are Wood and Feldman really pitching as well as their OPS Permitted suggests?  Is Valbuena going to keep hitting like this?  An 86 or 87 win team is just tantalizing enough to "go for it": but it's probably not going to make it, especially in this year's NL Central.

However, that's food for another discussion.
