Member Since 27 Dec 2011
Last Active Feb 24 2014 06:39 AM

### #52863 Want to back up that argument with plots?!?!?

Posted by on 12 August 2013 - 09:51 PM

Stats arguments always give way to probability arguments.  But, alas!  Where is a Cubs fan to get rapid-fire assessment of whether a guy is doing differently in one situation or another?!?!?

So, here is R-code for doing just that.  It's not horribly user friendly, but it's not horribly user cruel, either.  You can download R for free at: http://www.r-project.org.  Good R programmers make this stuff really easy to use.  I learned to program using punch cards while listening to music on vinyl and wondering why my polyester shirt was so uncomfortable, so screw that.

Let's take a really simple example: is Anthony Rizzo significantly better/worse in "high leverage" (late and close) situations?  Let's just ask if he makes outs more than expected.  We set aside two vectors: safes (all the times he doesn't "choke" and make an out), and PAs (number of plate appearances).

safes<-vector(length=2)     # you could use "well-hit balls" with ABs or K's or anything else that you suspect might differ
PAs<-vector(length=2)       # you could use ABs instead

player<-"Rizzo"                  # replace this with "Castro," "Aaron" or whomever: however, R was written by Yankees fans, so "Jeter" generates "IS GOD" all the time.
scenario1<-"Late, Close"    # you could change this to RiSP or something else if those are the numbers you use
scenario2<-"Regular"

#Here are some values from a couple of days ago courtesy of Fangraphs.
safes[1]<-15
safes[2]<-145
PAs[1]<-49
PAs[2]<-432

# Now we get the probability of these results given his overall stats, and the probability given his separate stats
onerate<-(safes[1]+safes[2])/(PAs[1]+PAs[2])         # this will be OBP here; it could be BA, HR-rate, K-rate, or whatever
onerlnl<-dbinom(safes[1],PAs[1],onerate,log=TRUE)+dbinom(safes[2],PAs[2],onerate,log=TRUE)      # this is the log-probability of the results given the overall OBP or whatever
tworlnl<-dbinom(safes[1],PAs[1],safes[1]/PAs[1],log=TRUE)+dbinom(safes[2],PAs[2],safes[2]/PAs[2],log=TRUE)  # this is the log-probability of the results given separate rates
llr<-pchisq(2*(tworlnl-onerlnl),1)     # this is the log-likelihood ratio test probability.  Twice the gain in log-likelihood from using two rates instead of one is approximately chi-square distributed, with degrees of freedom = number of extra parameters (here 2 - 1 = 1).  pchisq() returns the cumulative probability, so 1 - llr is the p-value

# we want to illustrate this, so we'll get "support" bars: if these don't overlap, then maybe something is up
mlr<-vector(length=2)
ubr<-vector(length=2)
lbr<-vector(length=2)
ml<-vector(length=2)
ubl<-vector(length=2)
lbl<-vector(length=2)

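for (i in 1:2) {
  mlr[i]<-round(safes[i]/PAs[i],3)          # observed (maximum-likelihood) rate
  ml[i]<-dbinom(safes[i],PAs[i],mlr[i])     # likelihood at that rate
  # step down by 0.01 until the log-likelihood is 1 unit below the peak (never past a rate of 0)
  lbr[i]<-mlr[i]
  lbl[i]<-ml[i]
  while (lbr[i]>=0.01 && abs(log(lbl[i])-log(ml[i]))<1) {
    lbr[i]<-lbr[i]-0.01
    lbl[i]<-dbinom(safes[i],PAs[i],lbr[i])
  }
  lbr[i]<-min(lbr[i]+0.01,mlr[i])           # back up to the last rate inside the support interval
  lbl[i]<-dbinom(safes[i],PAs[i],lbr[i])
  # same thing going up (never past a rate of 1)
  ubr[i]<-mlr[i]
  ubl[i]<-ml[i]
  while (ubr[i]<=0.99 && abs(log(ubl[i])-log(ml[i]))<1) {
    ubr[i]<-ubr[i]+0.01
    ubl[i]<-dbinom(safes[i],PAs[i],ubr[i])
  }
  ubr[i]<-max(ubr[i]-0.01,mlr[i])
  ubl[i]<-dbinom(safes[i],PAs[i],ubr[i])
}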

# let's plot this!
players<-round(1/(1-llr),3)     # 1 / p-value: roughly one player in this many would show a difference this big by chance alone
situation<-vector(length=2)
situation[1]<-1
situation[2]<-2
plot(situation,mlr,main=player, sub=players, xlab="One player in", ylab="Safe Rate", xlim=c(0,3),ylim=c(round(min(lbr),2)-0.01,round(max(ubr),2)+0.01),axes=FALSE,type="n") # make plot and label axes, but don't draw
axis(1,at=seq(0,3,by=1),tcl=-0.3,labels=c("",scenario1,scenario2,""))   # x axis: the two situations
axis(2,at=seq(round(min(lbr),2)-0.01,round(max(ubr),2)+0.01,by=0.005),tcl=-0.2,labels=FALSE)   # y axis: minor ticks
axis(2,at=seq(round(min(lbr),2)-0.01,round(max(ubr),2)+0.01,by=0.02),tcl=-0.5,las=1)           # y axis: labeled major ticks
segments(situation,lbr,situation,ubr)               # vertical support bars
segments((situation-0.1),lbr,(situation+0.1),lbr)   # bottom caps
segments((situation-0.1),ubr,(situation+0.1),ubr)   # top caps
points(situation,mlr,pch=21,col="#191970",bg="violet")   # observed rate in each situation

Run all that through R and you'll get a plot with a support bar for each situation.  The number at the very bottom tells you that roughly one player in 1.48 would show a difference this large just by chance, even if he were basically the same batter in both situations.
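The loop above finds the ends of each support bar by stepping in 0.01 increments, so the bars can be off by up to a hundredth. A sketch that solves for the same 1-log-unit limits directly with R's `uniroot()` (the `support_interval()` name is hypothetical, not part of the original script):

```r
# Sketch: 1-log-unit support interval for a binomial rate, found by root-finding.
# Assumes the same cutoff as the plotting script (log-likelihood 1 unit below the peak).
support_interval <- function(safes, PAs, cutoff = 1) {
  phat <- safes / PAs
  # how far the log-likelihood at rate p sits below the peak, minus the cutoff
  gap <- function(p) {
    dbinom(safes, PAs, phat, log = TRUE) - dbinom(safes, PAs, p, log = TRUE) - cutoff
  }
  eps <- 1e-9
  lower <- if (safes == 0) 0 else uniroot(gap, c(eps, phat))$root
  upper <- if (safes == PAs) 1 else uniroot(gap, c(phat, 1 - eps))$root
  c(lower = lower, estimate = phat, upper = upper)
}

support_interval(15, 49)    # "Late, Close"
support_interval(145, 432)  # "Regular"
```

The bars will be slightly asymmetric around the observed rate, because the binomial likelihood is.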

You can replace the situations with whatever you please.
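If you swap situations a lot, it may be easier to wrap the test in a function. A minimal sketch with a hypothetical `lr_test()` helper: it computes the same pooled-rate vs. separate-rates likelihood ratio as the script, but asks `pchisq()` for the upper tail so the p-value comes out directly.

```r
# Sketch: likelihood-ratio test of "one rate for both situations" vs. "a rate for each".
lr_test <- function(safes, PAs) {
  pooled   <- sum(safes) / sum(PAs)          # one rate for both situations
  separate <- safes / PAs                    # a separate rate for each situation
  ll_one <- sum(dbinom(safes, PAs, pooled, log = TRUE))
  ll_two <- sum(dbinom(safes, PAs, separate, log = TRUE))
  stat <- 2 * (ll_two - ll_one)              # approx. chi-square with 1 df (2 rates vs. 1)
  p <- pchisq(stat, df = 1, lower.tail = FALSE)
  list(statistic = stat, p.value = p, one.in = 1 / p)
}

res <- lr_test(safes = c(15, 145), PAs = c(49, 432))
res$one.in   # about 1.48, the number at the bottom of the plot
```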

Enjoy!

### #46323 Another way to show just how improbable this season is.

Posted by on 25 May 2013 - 07:01 AM

Sure: although I just realized one mistake: I meant to cut or caveat the 1994 and 1995 seasons because of the strike!  I computed the deviations in wins from winning percentage × 162 games.  Sure, a couple of teams play 161 or 163 games some years, but that's not going to shift the differences by more than a small fraction.  However, I completely spaced that those two years had far fewer games.  At any rate, the upshot is that the Padres were only on pace to lose 20 more games than expected.  Regression means that they probably would have ended up about where the 1965 Red Sox did.

### #46317 Another way to show just how improbable this season is.

Posted by on 24 May 2013 - 08:33 PM

We had some discussion on the main page about whether the Cubs are doing something really implausible by posting a 0.391 winning percentage.  This follows from a couple of summaries of Cub peripheral statistics suggesting that they should be a 0.530 team or better.

Here is the important number: 0.027.  That's the Cubs' "net" OPS: the offense has an OPS of 0.707 and the Cubs pitching + fielding has allowed an OPS of 0.680.  (In a brain cramp, I wrote that this was 0.037: whoops, took the kid to the zoo earlier and the neurons still were not working.)

Let's say that the Cubs ended the season with a net OPS of 0.027.  How should they do?  Below are the net OPS of all MLB teams from 1962 - 2012:

That's 1338 teams, showing a very tight correlation between net OPS (OPS Garnered minus OPS Allowed) and winning percentage.  (Net OPS correlates tightly with runs scored / allowed, and run differential correlates tightly with winning, so this shouldn't be a surprise.)  Basically, for every 0.01 a team increases its net OPS, you expect 2.14 more wins.

So, that means that the Cubs should be on pace to win 87 of 162 games: and with luck, you can make the playoffs with 87 wins.  However, the Cubs winning percentage of 0.391 would give them only 63 wins: a whopping 23 below expectations!
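The arithmetic behind those two numbers, as a short R sketch. The 2.14-wins-per-0.010 slope comes from the fit above; treating a net OPS of 0 as a .500 (81-win) team is an assumption, though it reproduces the 87-win figure.

```r
# Sketch of the projection. ASSUMPTION: net OPS of 0 projects to a .500 team (81 wins).
wins_per_001 <- 2.14   # expected extra wins per +0.010 of net OPS (1962-2012 fit)

expected_wins <- function(net_ops) {
  81 + wins_per_001 * (net_ops / 0.01)
}

net_ops <- 0.707 - 0.680           # Cubs OPS minus OPS allowed: +0.027
exp_w   <- expected_wins(net_ops)  # about 86.8, i.e. roughly 87 wins
act_w   <- 0.391 * 162             # about 63.3 wins at the current pace
exp_w - act_w                      # about 23.4 wins below expectation
```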

But, you say, OPS is only part of winning.  (Or, you say, it's a made-up stat because your Topps cards didn't have it in 1972.)  With a bit of bad managing plus some non-clutch hitting, pitching, and fielding, this can happen.  All we need to do is compare the Cubs to other teams that missed by so much....

.... except that there aren't any.  This shows the difference between actual and expected wins, with "actual" based on winning percentage × 162.  (Sometimes teams play 161 or 163 games, so this standardizes for that.)  Net OPS explains 80% of the variation in winning percentage, so there isn't much room for teams to finish 26 games over or under expectations.  Indeed, 50% of teams finish within 3.5 wins (one way or the other) of expectations.  Only one team, the 1994 Padres, had a winning percentage so far off that it projected to 20 more losses than you'd expect given their net OPS: but 1994 was the strike year, so that was only a pace.  (I wish them many more, as I still hate them for defying odds in the other direction 10 years earlier.)  So, the Padres had a record more than 0.100 below expected after 117 games, not after 162 games: and had they regressed to their mean (0.500), they would have come in at about 15 or 16 under.

The biggest "underachievers" over 162 games are the 1965 Red Sox, who managed 17 fewer wins than expected, and two other teams (the 1962 Mets and the 1993 Mets), who each managed 15 fewer wins than expected.

Now, will the Cubs keep this up?  Almost certainly not.  Let's just say that the Cubs keep playing +0.027 OPS ball.  There have been 118 teams in the last 51 years that finished with net OPS between 0.022 and 0.032 (i.e., within 0.005 of the Cubs).  Only 11 of these teams finished with records under 0.500.  Three more finished at 0.500: which is where the Cubs will finish if they "regress" to the expectation for a +0.027 OPS team.

Of course, the other reason why this won't happen is that if the Cubs aren't a 0.500 team in July, then there is going to be a sell-off: and the remaining team won't be a +0.027 OPS team (probably).  As we don't expect the Cubs to crawl back to 0.500 until the very end of the season, this seems assured!

However, when we start asking "why" then we probably should exclude answers that would apply to whole teams over an entire season.  For example, there have been a lot of really bad managers over the last 51 years: but nobody has managed their team to 23 wins under expectations.  There is the "clutch" aspect: and, of course, as "clutch" over any stretch of games fails to predict "clutch" over the next stretch, this suggests that the bad luck (especially when it comes to slugging with men on base) can't continue.  (I mean, it can't, can it?!?!?)

And, of course, we have to wonder if this isn't a small blessing in disguise.  Is this really a +0.027 team?  Are Wood and Feldman really pitching as well as their OPS Permitted suggests?  Is Valbuena going to keep hitting like this?  An 86 or 87 win team is just tantalizing enough to "go for it": but it's probably not going to make it, especially in this year's NL Central.

However, that's food for another discussion.

### #32508 September and the LDS

Posted by on 04 October 2012 - 09:21 AM

By the way, here are the run differentials since 3 September for the 10 teams in question:

Tigers 19

A's 20

Yankees 50

Orioles 38

Rangers -20

Giants 26

Reds -3

Nats 23

Braves 18

Cards 20

The Reds are in pretty bad shape: just as in 2010, they played poorly in September by run differential (75 scored, 78 allowed). For what it's worth, the 78 runs allowed was pretty good: only the Nats allowed fewer. And that tells us a lot about how bad the Reds' hitting is right now. As they are playing the +26 Giants (128 scored, 102 allowed), the Giants should be heavily favored.

The Nats played well, but so did the Braves and Cards. Still, the Nats did a little better, so all else being equal (and it's not: the Braves-Cards winner will be a little depleted), expect the Nats to win.

In the AL, the Yankees had by far the best September. What's a little sad is that the O's had the second-best September of any post-season team: if they get past the Rangers (who were awful), then the Yanks will have the upper hand. Moreover, with this one-game format, it probably is only a little better than a coin flip that the O's reach the LDS! (That sucks, in my unhumble opinion....)

The A's and Tigers basically were the same.

In the AL, the Yanks were on fire this September. (In 2010 and 2011, they had very poor Septembers.)

### #32506 September and the LDS

Posted by on 04 October 2012 - 08:19 AM

Very interesting and illuminating. Am I correct that even the September run differential is not a good predictor of success after the first round of playoffs?

It seems to be a poor predictor so far. However, the sample size also is half the size, so there has not been as big an opportunity for differences to emerge. Moreover, you often are getting LCS between two hot teams: and it's quite possible that after a certain point (say, +30 for September) it's really all the same.

It's possible that what this really reflects is the tendency to cull teams playing poorly, and they do not make the LDS often enough to create the same effect. For example, the three teams with negative Sept. run-differentials who went on to the LDS all lost there: but that's not enough to really influence a pattern with 20+ examples.

Where can I find the run differentials for this September?

The easiest way to get it is to download the standings data for the end of the season from ESPN or MLB, plus the standings as of August 31 (or 4 weeks before the end; it won't affect things much), and just subtract the August 31 RD from the final RD. That is all I did!
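The same subtraction in R, as a sketch. It assumes two standings tables with hypothetical columns `team`, `RS` (runs scored) and `RA` (runs allowed); rename them to match whatever the ESPN or MLB download actually provides, then call `september_rd(final, aug31)`.

```r
# Sketch: September run differential = final RD minus RD as of August 31.
# Column names team / RS / RA are hypothetical placeholders.
september_rd <- function(final, aug31) {
  m <- merge(final, aug31, by = "team", suffixes = c(".final", ".aug31"))
  m$sept_rd <- (m$RS.final - m$RA.final) - (m$RS.aug31 - m$RA.aug31)
  m[order(-m$sept_rd), c("team", "sept_rd")]   # best September first
}
```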

### #27060 Mets v. Cubs - June 25, 2012 (TV: CSN+)

Posted by on 25 June 2012 - 08:51 PM

LaHair looked foolish on the last pitch. Almost too foolish.

LaHair has looked suckish for the month.

It's only because his feelings are hurt by being platooned.

### #25867 Cubs v. White Sox - June 18, 2012 (TV: WCIU)

Posted by on 18 June 2012 - 08:53 PM

Also, with these conditions, it would be really easy for Garza to give up another couple of HR. As it is, he gets a quality start, and if anybody looks, then they'll see that he probably got it in tough conditions given the Cubs' score. If they don't look, then they'll just see the QS. I'd rather have it so that Theo & Jed can say "it was even better than it looked" than "it wasn't as bad as it looked."

### #15667 Breaking down winning in 2012

Posted by on 21 April 2012 - 02:52 PM

Continued!

I.e., Pat Hughes and most of the sports talk radio guys are wrong! The team that leaves the most guys on base still wins 60% of the time. (This is significant at p=0.002.) This is very much an "old school" vs. "new school" dichotomy. The former tends to look at baseball as a Western shootout, where each side has about the same number of bullets and the guys who shoot most accurately win. The latter views baseball as trench warfare, where the side with the most men and the most artillery tends to win. In other words, don't get better shooters, get more guns!

Stolen bases are fun, but they don't associate well with winning. What is interesting is that if you look at either successful steals alone OR caught stealings alone, then you do see significant differences: the team that steals the most AND the team that is caught stealing the most both tend to win! That is, winning teams attempt more steals than losing teams do. The underlying correlation is between attempted steals and getting on base. However, the team with more net steals is only 79-55, so losing teams are nearly as apt to outsteal the winner as vice versa.

(If you restrict this to 1-run games, then the team with the most net steals is 30-23: but the teams with the most HR are 26-14 in the same games, and the team with the most XBH is 31-14.)

The final thing that should stand out is that the Cubs do not stand out. Their games fit in the main distributions pretty well. (Again, the sample size is too small to say that any team deviates yet.)

So, if you are left feeling like the Cubs are always just a key hit or two away from winning, well... no, they rarely are. They are getting outperformed in the same way that losing teams typically are so far in 2012.

And, of course, it leads to the answer to the question: if my team is going to be constantly outslugged and out-OBPed, then how do we win? The answer? Get new players, because Mike Leake isn't going to pitch against you every day......

### #15666 Breaking down winning in 2012

Posted by on 21 April 2012 - 02:52 PM

I told Luke and a few others that I'd try to do this, as time permits, this season. I've been keeping track of the games in terms of differences between winning and losing teams. As of Friday, we had about 200 games, so our sample size already is bigger than we'll get for any one team.

Each of the plots below shows how much better (or worse) the winning team did in some stat in each game. Because this is a Cubs site, I've marked the Cubs games in dark blue. The stats are not independent: I'm showing home runs, extra-base hits (XBH), walks (BB), total bases + walks (TB + BB), left on base (LOB), and net stolen bases (NSB = successful steals - caught stealings). The ∆ (delta) indicates that it's the difference: so, if both teams hit 5 HR, then ∆HR = 0. You can think of this as the difference between how well the offense did and how well the defense (mostly pitching but some fielding) did for the winning team.
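A minimal sketch of the ∆ bookkeeping, assuming per-game vectors holding the winner's and the loser's totals for one stat. The `delta_record()` helper and its inputs are hypothetical, not the spreadsheet behind these plots.

```r
# Sketch: per-game delta (winner minus loser) and the won-lost record
# of whichever team led in that stat. Inputs: one entry per game.
delta_record <- function(winner_stat, loser_stat) {
  delta <- winner_stat - loser_stat   # e.g. both teams hit 5 HR -> delta = 0
  c(wins   = sum(delta > 0),          # team with more of the stat won
    losses = sum(delta < 0),          # team with more of the stat lost
    ties   = sum(delta == 0))         # stat was even; not counted in the record
}
```

A record like "115-31" for HR is just the `wins` and `losses` entries, with tied games left out.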

Two hundred games into the season, the data say that Earl Weaver was right and Gene Mauch was wrong. The team that hits the most HR is 115-31 so far. We cannot say much about what those 31 games have in common, but one trait stands out already: in them, the winning team made up for the homers by still getting more extra-base hits than the opposition. So far, the team with the most extra-base hits is 140-29. It also says that Dale Sveum is at least partially correct: half of outslugging is hitting, so the Cubs' inability to slug has hurt them. (These numbers are highly significant based on any number of tests of the null hypothesis.)

Dusty Baker was wrong: walks don't keep guys from scoring. The team that draws the most walks is 125-57.

When we pool these (as OPS essentially does), then the results are very prominent: the team with the most total bases & walks is 173-19.
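The post doesn't name the significance tests it used. One simple option, assumed here rather than taken from the post, is an exact binomial (sign) test of each record against a coin flip, with tied games dropped:

```r
# Sketch: is each record distinguishable from 50-50? Exact binomial test, ties excluded.
records <- list(HR = c(115, 31), XBH = c(140, 29), BB = c(125, 57), TB_BB = c(173, 19))
pv <- sapply(records, function(r) binom.test(r[1], sum(r), p = 0.5)$p.value)
pv   # all far below 0.001
```

This treats games as independent trials, which is a simplification: the same teams and pitchers show up in many of them.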

### #13761 Brett Jackson Anyone?

Posted by on 12 April 2012 - 09:24 AM

Also, Byrd's "horrible start" really does not mean much. He's 1 for 21. Facing league-average pitching, you expect one 0.280 hitter in 46 to be on a 1 for 21 streak. His strikeout total is just one more than the most probable number given his career rates, so it's hardly out of whack: if he had 1 fewer K, then his K frequency would be below his career average. His walk rate is dead on career levels. Now, he does have no extra-base hits, but we'd expect only 1 or 2 at this point, so 0 is hardly unexpected.

(The "league average pitching" also is important: 21 ABs is a 4-game series, and 0.300 hitters will have a couple of lost 4-game series every year because they are facing above-average pitching.)

The really telling stat on Byrd is that he's gotten 1 single on 15 balls that he has hit. Usually he gets singles 24% of the time when he hits the ball, which means that we expect 3 or 4 singles instead of 1 right now. And, again, a guy who gets singles 24% of the time is going to have 1 for 15 stretches one time in ten.
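The one-in-ten figure is a binomial calculation. A quick R sketch, assuming each ball in play is an independent trial with a 24% chance of a single (a simplification):

```r
# Sketch: chance of at most 1 single in 15 balls in play for a 24% singles hitter.
p_slump <- pbinom(1, size = 15, prob = 0.24)
p_slump      # about 0.094: roughly one time in ten
15 * 0.24    # expected singles: 3.6, i.e. "3 or 4"
```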

Now, does he look bad? Probably: but guys always "look" bad when they don't hit the ball well. The difference is that they "fought it off" if the ball gets past fielders but just "looked bad" if it doesn't.

If Byrd is still hitting like this come May, then worry. But focus on his K, BB & XBH numbers: it's when those change that you really need to worry.

### #10842 Perhaps you already have seen this, but....

Posted by on 05 April 2012 - 10:16 AM

.... it's always amusing.

Bleacher Nation is not affiliated in any way with Major League Baseball or the Chicago National League Ballclub (that's the Cubs).