
# Want to back up that argument with plots?!?!?

5 replies to this topic

Bleacher Bum

• Members
• 143 posts

Posted 12 August 2013 - 09:51 PM

Stats arguments always give way to probability arguments.  But, alas!  Where is a Cubs fan to get a rapid-fire assessment of whether a guy is doing differently in one situation or another?!?!?

So, here is R-code for doing just that.  It's not horribly user friendly, but it's not horribly user cruel, either.  You can download R for free at: http://www.r-project.org.  Good R programmers make this stuff really easy to use.  I learned to program using punch cards while listening to music on vinyl and wondering why my polyester shirt was so uncomfortable, so screw that.

Let's take a really simple example: is Anthony Rizzo significantly better/worse in "high leverage" (late and close) situations?  Let's just ask if he makes outs more than expected.  We set aside two vectors: safes (all the times he doesn't "choke" and make an out), and PAs (number of plate appearances).

safes<-vector(length=2)     # you could use "well-hit balls" with ABs or K's or anything else that you suspect might differ
PAs<-vector(length=2)       # you could use ABs instead

player<-"Rizzo"                  # replace this with "Castro," "Aaron" or whomever: however, R was written by Yankees fans, so "Jeter" generates "IS GOD" all the time.
scenario1<-"Late, Close"    # you could change this to RiSP or something else if those are the numbers you use
scenario2<-"Regular"

#Here are some values from a couple of days ago courtesy of Fangraphs.
safes[1]<-15
safes[2]<-145
PAs[1]<-49
PAs[2]<-432

# Now we get the probability of these results given his overall stats, and the probability given his separate stats
onerate<-(safes[1]+safes[2])/(PAs[1]+PAs[2])         # this will be OBP here; it could be BA, HR-rate, K-rate, or whatever
onerlnl<-log(dbinom(safes[1],PAs[1],onerate))+log(dbinom(safes[2],PAs[2],onerate))      # this is the log-probability of results given overall OBP or whatever
tworlnl<-log(dbinom(safes[1],PAs[1],safes[1]/PAs[1]))+log(dbinom(safes[2],PAs[2],safes[2]/PAs[2]))  # this is the log-probability of the results given separate rates
llr<-pchisq(2*(tworlnl-onerlnl),1)     # this is the log-likelihood ratio test: twice the gain in log-probability is compared to a chi-square distribution with 1 degree of freedom (two rates instead of one).  pchisq() returns the cumulative probability, so 1-llr is the chance of a difference at least this big by luck alone.  Basically, if you throw darts at a normal curve and take the natural log of the deviations between the peak of the normal curve and where the dart lands (with "random" based on area, not x-axis), then the sum of those has a chi-square distribution with degrees of freedom = darts - 1

# we want to illustrate this, so we'll get "support" bars: if these don't overlap, then maybe something is up
mlr<-vector(length=2)
ubr<-vector(length=2)
lbr<-vector(length=2)
ml<-vector(length=2)
ubl<-vector(length=2)
lbl<-vector(length=2)

for (i in 1:2) {
mlr[i]<-round(safes[i]/PAs[i],3)
r<-mlr[i]
ml[i]<-dbinom(safes[i],PAs[i],mlr[i])
lbr[i]<-mlr[i]
lbl[i]<-ml[i]
while (abs(log(lbl[i])-log(ml[i]))<1) {
lbr[i]<-lbr[i]-0.01
lbl[i]<-dbinom(safes[i],PAs[i],lbr[i])
}
lbr[i]<-lbr[i]+0.01
lbl[i]<-dbinom(safes[i],PAs[i],lbr[i])
ubr[i]<-mlr[i]
ubl[i]<-ml[i]
while (abs(log(ubl[i])-log(ml[i]))<1) {
ubr[i]<-ubr[i]+0.01
ubl[i]<-dbinom(safes[i],PAs[i],ubr[i])
}
ubr[i]<-ubr[i]-0.01
ubl[i]<-dbinom(safes[i],PAs[i],ubr[i])
}

# let's plot this!
players<-round(1/(1-llr),3)     # 1-llr is the p-value, so this is the "one in this many players" number printed under the plot
situation<-vector(length=2)
situation[1]<-1
situation[2]<-2
plot(situation,mlr,main=player, sub=players, xlab="One player in", ylab="Safe Rate", xlim=c(0,3),ylim=c(round(min(lbr),2)-0.01,round(max(ubr),2)+0.01),axes=FALSE,type="n") # make plot and label axes, but don't draw
axis(1,at=seq(0,3,by=1),tcl=-0.3,labels=c("",scenario1,scenario2,""))     # x-axis: label the two situations
axis(2,at=seq(round(min(lbr),2)-0.01,round(max(ubr),2)+0.01,by=0.005),tcl=-0.2,labels=FALSE)     # y-axis: minor ticks, no labels
axis(2,at=seq(round(min(lbr),2)-0.01,round(max(ubr),2)+0.01,by=0.02),tcl=-0.5,las=1)     # y-axis: major ticks with horizontal labels
segments(situation,lbr,situation,ubr)
segments((situation-0.1),lbr,(situation+0.1),lbr)
segments((situation-0.1),ubr,(situation+0.1),ubr)
points(mlr,pch=21,col="#191970",bg="violet")

Run all that through R and you'll get this plot.  The number at the very bottom tells you that one in 1.48 players would show this difference just by chance alone if he was basically the same batter all the time.
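If you just want that bottom number without the plot, the same likelihood-ratio test can be squeezed into one function.  This is only a sketch: the name `lr_one_in` is made up for illustration, and it asks `dbinom()` for log-probabilities directly (`log = TRUE`) instead of wrapping it in `log()`, which should give the same answer.

```r
# Likelihood-ratio test: one shared rate vs. a separate rate per situation.
# lr_one_in is a hypothetical helper name, not something built into R.
lr_one_in <- function(safes, PAs) {
  pooled  <- sum(safes) / sum(PAs)                              # one rate for everything
  lnl_one <- sum(dbinom(safes, PAs, pooled, log = TRUE))        # log-probability, shared rate
  lnl_sep <- sum(dbinom(safes, PAs, safes / PAs, log = TRUE))   # log-probability, separate rates
  stat    <- 2 * (lnl_sep - lnl_one)                            # approximately chi-square
  p       <- pchisq(stat, df = length(safes) - 1, lower.tail = FALSE)
  list(statistic = stat, p.value = p, one.in = 1 / p)
}

# Rizzo's numbers from the post: late-and-close first, regular second
rizzo <- lr_one_in(c(15, 145), c(49, 432))
print(round(rizzo$one.in, 2))   # should land near the 1.48 reported above
```

Swap in any other pair of counts (hits and ABs, K's and PAs, and so on) and the "one in" value is read the same way.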

You can replace the situations with whatever you please.

Enjoy!

Gods don't play dice with the universe, they are the dice of the universe: our job is to figure out how many sides and dice!

### #2 MichiganGoat

MichiganGoat

Give me a BEER

• Moderators
• 3,771 posts
• Location: Grand Rapids, MI

Posted 13 August 2013 - 06:06 AM

Stop speaking machine on here you damn dirty robot this is 'MERICA.

"There are a lot of guys who are respected but not liked" - Ron Santo

### #3 jh03

jh03

Bleacher Bum

• Members
• 255 posts

Posted 13 August 2013 - 07:24 AM

So, do you want to explain the results you found to those of us that are so stupid we can't comprehend that, without a major headache coming along? Especially so early in the morning... Asking for a friend.....

Bleacher Bum

• Members
• 143 posts

Posted 13 August 2013 - 08:39 AM

Sure, the results present a range of plausible "true" rates (in this case, on-base percentage) that predict the outcomes.  The official lingo is "support" bars: all the hypothesized rates within those bars are well-supported, with the single most-likely rates (i.e., the observed!) noted with the dots.  If we are comparing two rates ("clutch" vs. normal, pre-All Star break vs. post-All Star break, pre-injury vs. post-injury, night vs. day, etc.) and the guy really is the same batter in both, then only about one case in 20 should have the support bars fail to overlap.  (In this case, they completely overlap.)
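For anyone who wants the ends of those bars as numbers, here is a rough sketch.  It uses the same rule as the posted loop (keep every rate whose log-probability is within 1 unit of the best one) but lets `uniroot()` find the edges instead of stepping by 0.01.  The name `support_interval` is made up for illustration, and the sketch assumes the player has at least one safe and at least one out.

```r
# Support interval: all rates whose log-probability is within `drop` units
# of the best-supported rate.  support_interval is a hypothetical helper.
support_interval <- function(safe, PA, drop = 1) {
  best <- safe / PA
  # gap() is zero exactly where the log-probability has fallen by `drop`
  gap <- function(rate) {
    dbinom(safe, PA, best, log = TRUE) - dbinom(safe, PA, rate, log = TRUE) - drop
  }
  lower <- uniroot(gap, c(1e-9, best))$root
  upper <- uniroot(gap, c(best, 1 - 1e-9))$root
  c(lower = lower, best = best, upper = upper)
}

late    <- support_interval(15, 49)     # late and close
regular <- support_interval(145, 432)   # everything else
overlap <- late[["lower"]] <= regular[["upper"]] && regular[["lower"]] <= late[["upper"]]
print(round(late, 3))
print(round(regular, 3))
print(overlap)
```

Because the edges are solved for rather than stepped to, they can differ slightly from the bars drawn by the original code.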

As sample size increases, those support bars will narrow.  As it stands, Rizzo has had so few PAs in "high leverage" situations that we would not be surprised to see his results from an OBP god or an NL pitcher!  (Again, he has had only 49 such PAs.  But, let's face it, you usually get zero "high leverage" PAs in a game, since "high leverage" almost always means "late and close," so the sample size never will be high for any player, at least in any given year.)
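To put rough numbers on the narrowing, here is a sketch that holds the safe rate at Rizzo's 15-for-49 and pretends he had 4 and 16 times as many PAs.  The helper name `support_width` and the made-up PA counts are purely for illustration; the bars use the same "within 1 log unit" rule, scanned on a grid of rates 0.001 apart.

```r
# Width of the 1-unit support bar at a fixed safe rate as PAs grow.
# support_width is a hypothetical helper; the larger PA counts are invented.
support_width <- function(PA, rate = 15 / 49, drop = 1) {
  safe <- round(PA * rate)                      # whole number of times safe
  grid <- seq(0.001, 0.999, by = 0.001)         # candidate "true" rates
  lnl  <- dbinom(safe, PA, grid, log = TRUE)    # log-probability at each rate
  supported <- grid[max(lnl) - lnl < drop]      # rates within `drop` of the best
  max(supported) - min(supported)
}

pa_counts <- c(49, 196, 784)
widths <- sapply(pa_counts, support_width)
print(data.frame(PAs = pa_counts, bar_width = widths))
```

The bar should shrink by roughly half each time the PAs are quadrupled, which is why one season of late-and-close PAs tells you so little.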

Instead of a significance value, I have it report how common this should be: in this case, one in every 1.5 players should have this sort of difference just by chance, or 2 in 3.  I.e., it's a dead common deviation.  (It's one in three who will have the "negative" deviation and one in three who will have the "positive" deviation.)  This is important because there are so many players: we expect one in 20 to have deviations significant at p=0.05 (the classic "significant" result!), but each team has 25 guys, and there are 750 players on MLB rosters on any given day, so that's nearly 38 "one in 20" guys!
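The arithmetic behind that, as a quick sketch.  The variable names are mine, and the last line assumes the 25 players on a roster are independent of each other, which is a simplification.

```r
# How many "significant" splits should pure luck hand us?
alpha   <- 0.05                 # the classic one-in-20 cutoff
roster  <- 25                   # players per team
teams   <- 30                   # MLB teams
players <- roster * teams       # 750 players on any given day

expected_flukes <- players * alpha            # 37.5, i.e. nearly 38 guys
p_team_has_one  <- 1 - (1 - alpha)^roster     # chance a given team has at least one
print(expected_flukes)
print(round(p_team_has_one, 3))
```

So most teams should have at least one guy whose split looks "significant" even if nobody on the roster actually changes with the situation.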

Anyway, I know that this often sounds like voodoo to some people, but I figured if I could give people an easy way to see it, then it might clarify when (say) a guy taking 5 walks instead of 10 walks is meaningful.


### #5 Cubbie Blues

Cubbie Blues

The Engineer

• Members
• 3,345 posts
• Location: Bloomington, IN

Posted 14 August 2013 - 06:13 AM

Very cool, thanks Doc.

"It's not the dress that makes you look fat, it's the fat that makes you look fat." - Al Bundy

"Ow" - Dylan Bundy

### #6 Luke

Luke

Bleacher Hero

• Members
• 1,056 posts
• Location: Maryland

Posted 17 August 2013 - 11:01 PM

My Awesome-O-Meter just broke.  Looks like I need a new one with a higher capacity.

In addition to being a very good stats example and R lesson, this is a beautiful example of why all programmers of any language ever should always comment their code in every situation.

