Maybe "dLOB%" would be more "honest," but is that the stat that the "radio guys" use? (Whoa, too many quotation marks.) No, they simply cite LOB and say that it is a bad thing. Doc clearly demonstrates here that it really isn't all that bad and that, in fact, it can be used as a proxy for how many men you get on base (a good thing). And, if we asked nicely, I'm sure Doc would show us a graph that probably shows that "dLOB%" doesn't vary much between winning and losing teams, at least not as much as some of the other important numbers he listed above (HR, XBH, BB, etc.). That would be my hypothesis, anyway...
Hey, Doc, this is pretty cool and all, but I'm not a fan of that LOB graph. It's misleading to say "The team that gets the most guys on base who fail to score still wins 60% of the time, therefore those radio guys are wrong," because that number isn't reflecting left-on-base numbers so much as total runners reaching base. The team that is leaving more men on base is winning because it is simply getting more men on base.
Also, those "radio guys" always use LOB as a proxy for "squandering chances," and the general consensus is that if you fail to plate guys in a large share of your chances, you're going to lose. The data you've provided don't show that sentiment to be wrong. I think a much more honest graph would be a dLOB% (LOB/Total Baserunners) graph.
Breaking down winning in 2012
Posted 23 April 2012  06:50 AM
Posted 23 April 2012  09:56 AM
Regarding the outliers, it was the A's who somehow beat the Angels despite the Angels getting so many more total bases and bases on balls. As they say, "stuff happens." We'll need at least a season's worth of data, but I can see two viable hypotheses here. There is the "bad teams find a way to lose" hypothesis: i.e., a bad team will throw away a game despite getting so many more hits and walks due to errors, lack of clutch hits, etc. Alternatively, there is the "good teams will have bad luck, too" hypothesis: i.e., it will be the good teams that consistently get more TB+BB than the opposition, and thus have the most opportunities to accidentally lose one of those games.
Finally, Luke and others asked about the proportion of baserunners who fail to score. Winning teams tend to leave more runners on base than do losing teams. However, a team that scores 4 runs on 10 hits/walks was more efficient than a team that scores 2 runs on 6 hits/walks despite leaving 2 more men on base (40% vs. 33%).
However, I think that we need to factor out homers to get at what people really want to know: are winning teams better at getting men on base home than losing teams? In the example above, say that the winning team hit 1 HR and the losing team hit 0. That means that Team A got 3 runs on 9 non-homers & walks, whereas Team B got 2 runs on 6 non-homers/walks. Now, both are 33% again.
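For anyone who wants to check that arithmetic, here is a minimal sketch. The function name `scoring_efficiency` is made up for this example, and it assumes each HR removes exactly one run (the batter's own) and one baserunner, which is how the example above treats it.

```python
def scoring_efficiency(runs, baserunners, home_runs=0):
    """Share of baserunners who score, with each HR removed from both
    the run total and the baserunner total (illustrative helper)."""
    return (runs - home_runs) / (baserunners - home_runs)

# Raw rates from the example: 4 runs on 10 hits/walks vs. 2 runs on 6.
team_a_raw = scoring_efficiency(4, 10)
team_b_raw = scoring_efficiency(2, 6)

# Same two teams once Team A's single HR is factored out.
team_a_adj = scoring_efficiency(4, 10, home_runs=1)
team_b_adj = scoring_efficiency(2, 6, home_runs=0)

print(f"raw:        A={team_a_raw:.0%}  B={team_b_raw:.0%}")
print(f"HR removed: A={team_a_adj:.0%}  B={team_b_adj:.0%}")
```

With the HR removed, both teams come out at 3/9 and 2/6, i.e., the same one-in-three rate.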
What we really find (and I'll try to post the histogram later today) is that winning teams typically do get a larger proportion of their runners home. Does this mean "clutch" hits? Well, it turns out that the single biggest correlate is doubles + triples. That makes sense: a double has a better chance both of scoring and of driving in runners than does a single or a walk. Almost as important is HR. We've taken out the run scored by the HR, but HR necessarily clear the bases, which increases scoring efficiency. It's a big drop in significance after that (from 10^-11 to 10^-7), but singles are the 3rd most important effect on scoring efficiency. So, getting the ("clutch?") singles helps, but far more frequently it is the extra-base hit that matters. (Adding 1 HR or 2B/3B has the same effect on scoring efficiency as does adding almost 3 singles.)
There is some evidence that baserunning matters, too: stolen bases increase scoring efficiency as well, albeit less than just getting another single does.
Well, at least through late April, anyway…..
EDIT: Regarding the "sports radio" guys: One hears all too often on baseball broadcasts & discussions of baseball that it's the team that makes the most of its opportunities that wins. However, what really seems to be happening is that the team that gets (makes or is given) the most opportunities wins. It is the difference between "get him on, get him over, get him in" and "get him on, get him on, keep the line moving, get him on...."
These do dovetail to an extent: the more hits you get, the more difficult it is to avoid getting hits & walks clustered together. Where people err, I think, is in the assumption that all teams are just as good at "setting up" rallies and the good teams add the "clutch" hits to that. Where bad teams often fail is in failing to set up rallies: if you consistently get one or two outs before anybody reaches base, then you are going to get a lower proportion of runners home than when you consistently get guys on with none or one out. And if you are consistently getting singles rather than extra-base hits, then you are going to need more hits to get any one baserunner home.
(Another way to look at this is in the proportion of 2-out runs good offenses and bad offenses score; announcers love the clutch 2-out RBI, but the good offenses get a lot more 0-out RBI than do the bad offenses. Conversely, bad pitchers give up a lot more 0-out RBI than do the good ones, against whom it's often tough to string together 2-3 hits & walks before getting 2 outs.)
Posted 23 April 2012  10:14 AM
Well, it turns out that the single biggest correlate is doubles + triples.
I'm pretty sure when the Cubs' radio guys talk about LOB numbers, they are using them as a proxy for a team failing to get runners off the bases at an equitable rate. Typically, at least in recent seasons, after one of them mentions the LOB number the other will often say something along the lines of "it's tough to get those runners home without some extra base hits."
I wonder if we need a new stat here, maybe something like Stranded Rate (LOB / Total baserunners). That one should correlate strongly with team SLG, and therefore with winning. But if it correlates strongly with SLG, then maybe we don't need the stat.
By the way, what software are you using for this work? I've worked with (but don't own) both Stata and SPSS. I've thought about grabbing R (which is free) for some of the more complex stuff, but I'm not sure I could spend the time learning it just yet.
Posted 23 April 2012  11:13 AM
And, yeah, your stranded rate (LOB/Total Baserunners) is the complement (or close to the complement) of my "Scoring Efficiency" ([Runs - HR] / Total Baserunners), assuming that you exclude HR from baserunners: a runner who doesn't score is nearly always left on base. The data for this year suggest that Keith Moreland is quite right: you need extra base hits to clear the bases. (Zonk is quietly pretty emphatic about this, and he has been ever since he started broadcasting; he was a big doubles guy as a player, so that might be part of it.) However, it also is consistent with the idea that a double scores more easily than a single, which affects either equation. Run-scoring is a 4-wheel drive vehicle, after all.
In the end, this might well be like BABiP: breaking down "success" rates for guys who get walks, singles, doubles and triples independently might be useful. What are the traits of the teams that successfully score guys who get walks or singles, and what are the traits of the teams that successfully prevent those same guys from scoring? Unfortunately, the data that I'm collecting can only hint at that.
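One way to see how the two rates relate: under a simplified accounting, every non-HR baserunner either scores, is left on base, or is erased on the bases (caught stealing, double play, etc.), so the three shares sum to 1. The sketch below uses an invented box-score line; the function name `runner_fates` and all of the numbers are assumptions for illustration only.

```python
def runner_fates(runs, home_runs, lob, erased):
    """Split non-HR baserunners into scored / stranded / erased shares.

    Simplified accounting: total baserunners is taken to be the sum of
    the three fates rather than counted from hits and walks.
    """
    scored = runs - home_runs
    total = scored + lob + erased
    return {
        "scored": scored / total,      # "Scoring Efficiency"
        "stranded": lob / total,       # "Stranded Rate"
        "erased": erased / total,      # runners put out on the bases
    }

# Hypothetical game: 5 runs (1 on a solo HR), 7 LOB, 1 runner erased.
fates = runner_fates(runs=5, home_runs=1, lob=7, erased=1)
for name, share in fates.items():
    print(f"{name:9s}{share:.3f}")
```

When few runners are erased, the stranded rate is close to 1 minus the scoring efficiency, so the two stats carry nearly the same information.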
Posted 23 April 2012  12:06 PM
Posted 24 April 2012  09:08 AM
We all know that the team that scores more runs wins the game; however, in the long haul, teams that score more runs than they allow don't win every game (or even their fair share). For instance, the Astros have outscored their opponents 76-67 this year, and are 6-11 to show for it. 17 games isn't a strong statistical sample, but it isn't weak, either: you'd expect that their "bad luck" will even out if they play at this level, and they'll be a .500 or better team.
There are a few ways to predict how well a team is going to do based on their runs, and I'll share 1 very bad one and two good ones.
1st, we have good old Line Percentage, or Winning Percentage. This is easy; we just take a team's current record and extrapolate it over 162 games. Basically, it assumes that each team will play exactly as their record indicates. This is a very bad indicator, however, and historically has proven to be.. not so good, for a variety of reasons.
2nd, we have Pythagorean win percentage (yes, I know about Pythagenport, but that's even more complicated). Basically, this compares the runs you've scored against the runs you've allowed, and uses a pretty basic formula to give you an expected winning percentage (for those interested, the formula I use is RS^2/(RS^2+RA^2)). What's very neat about this simple formula is that it can tell you which teams seem to be lucky or unlucky (the other side of this coin? which team is gritty or not!). Historically, Pythagorean W-L records have come within 10 games of a team's actual record about 97% of the time, which is why people can say with some justification that luck can give or take around 10 wins per team over the course of a season (i.e., an 81-win team may actually have 71- or 91-win talent). However, it usually is much closer. After about 30-35 games, this becomes a pretty excellent indicator, so we aren't there yet.
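As a quick sketch of the first two methods, here is the RS^2/(RS^2+RA^2) formula applied to the Astros line quoted earlier (76 runs scored, 67 allowed, 6-11 record), next to the straight extrapolation of the record. The function names are made up for this example, and rounding to whole wins over 162 games is my own assumption.

```python
def pythagorean_win_pct(runs_scored, runs_allowed, exponent=2):
    """Expected winning percentage: RS^2 / (RS^2 + RA^2)."""
    rs = runs_scored ** exponent
    ra = runs_allowed ** exponent
    return rs / (rs + ra)

def projected_wins(win_pct, games=162):
    """Extrapolate a winning percentage over a full season."""
    return round(win_pct * games)

astros_actual = 6 / 17                       # current 6-11 record
astros_pyth = pythagorean_win_pct(76, 67)    # from runs scored/allowed

print(f"record-based: {astros_actual:.3f} -> {projected_wins(astros_actual)} wins")
print(f"Pythagorean:  {astros_pyth:.3f} -> {projected_wins(astros_pyth)} wins")
```

The gap between the two projections (roughly 57 wins vs. 91) is the "unlucky so far" signal described above, with the usual caveat that 17 games is a small sample.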
The last predictor (and my favorite) is runs per game regression. Basically, this takes the difference in runs scored and runs allowed per game and asks "how much is 1 run more scored over allowed worth, in terms of winning percentage?" This is a close relative of Pythagorean win percentage, but I like it a lot because the regression makes a lot of sense. The numbers have basically shown that a team that scores exactly as much as it allows will go 81-81. For each run per game they score over their opponents (this year), their winning percentage increases by 11.24 percentage points, or about 18 wins over the course of the year. What's so cool about this is that it tells you that if you can score 1 more run than you allow, on average, you'll win around 100 games (and if it goes the other way, you'll lose 100). For all the stats, that gives you an easy thing to shoot for!
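The regression version can be sketched the same way. This assumes a straight line through .500 with the 11.24-points-per-run slope quoted above; the function name and the use of the Astros' 76-67 run line over 17 games as input are my own choices for illustration.

```python
SLOPE = 0.1124  # win% gained per run/game of differential (this year's fit)

def regression_win_pct(runs_scored, runs_allowed, games_played, slope=SLOPE):
    """Linear estimate: .500 plus slope times run differential per game."""
    diff_per_game = (runs_scored - runs_allowed) / games_played
    return 0.5 + slope * diff_per_game

def projected_wins(win_pct, games=162):
    """Extrapolate a winning percentage over a full season."""
    return round(win_pct * games)

even_team = regression_win_pct(700, 700, 162)      # scores what it allows
plus_one = regression_win_pct(162, 0, 162)         # exactly +1 run per game
astros = regression_win_pct(76, 67, 17)

print(projected_wins(even_team), projected_wins(plus_one), projected_wins(astros))
```

Because the fit is a straight line, it is only sensible for realistic run differentials; a lopsided enough input would push the estimate past 0 or 1.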
What we all care about, though, is how these systems affect the Cubs.
WP% Projection: 41-121
Pyth% Projection: 56-106
Reg% Projection: 55-107
Obviously these are all too low. I'd wait until around 35 games before I put a lot of store into them. Still, it's pretty neat!
Posted 24 April 2012  06:13 PM
For those who haven't seen box plots in a while, the boxes encompass 50% of the games in which a team had (say) 1 HR, 2 HR, etc. The "whiskers" encompass 75% of the data points. These are a little easier to read than scatter plots, where dozens of points are crammed so closely together (and on a line) that you cannot really appreciate their density. The individual dots are the 12.5% on either end of the distribution. I've included traditional linear regression lines, assuming dependency of the "success" frequency (proportion of non-HR hits+BB scoring) on the stat in question.
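For anyone who wants to reproduce those box boundaries, here is a minimal sketch using only the standard library. It assumes the box runs from the 25th to the 75th percentile and the whiskers from the 12.5th to the 87.5th, which is my reading of the description above; the sample values and function names are invented.

```python
from statistics import quantiles

def box_summary(values):
    """Percentile cut points for a box (middle 50%) with whiskers
    spanning the middle 75% of the data."""
    # n=8 gives cuts at 12.5, 25, 37.5, 50, 62.5, 75 and 87.5 percent.
    cuts = quantiles(values, n=8, method="inclusive")
    return {
        "whisker_low": cuts[0],
        "box_low": cuts[1],
        "median": cuts[3],
        "box_high": cuts[5],
        "whisker_high": cuts[6],
    }

def outlier_dots(values):
    """Points beyond the whiskers: the 12.5% tails drawn as dots."""
    s = box_summary(values)
    return [v for v in values if v < s["whisker_low"] or v > s["whisker_high"]]

# Made-up scoring efficiencies for 17 games, evenly spaced from 0 to 1.
sample = [i / 16 for i in range(17)]
summary = box_summary(sample)
dots = outlier_dots(sample)
print(summary)
print(dots)
```

Plotting libraries usually default to a different whisker rule (1.5 times the interquartile range), so the whisker percentiles would need to be set explicitly to match these plots.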
I've ordered them on strength of correlation. What also is telling is the slope: i.e., what added proportion of runners score as you add 1 HR, 1 double or triple, etc. Adding power really helps: and that makes sense, as it clears the bases and (in the case of a double or triple) leaves a guy close to home plate.
Stolen bases seem to help more than singles. However, "correlation" vs. "causation" comes to mind here: as singles increase, stolen bases increase. So, part of the slope for stolen bases is building upon the lower slope of singles. It's too early in the season (in terms of numbers of games) to break these down into the effect of 1, 2, 3, etc., steals when there are 1, 2, 3, etc., singles.
I'll post walks tomorrow. I do not want to apologize for Dusty Baker's silly lines, but I do wonder if this is what he saw. Basically, as you add walks, the proportion of runners that score barely increases. It makes sense that this would have the least effect: the runner gets only to first and other runners advance only one base. (Many singles advance guys 2 bases.) HOWEVER.... each walk is a runner, so even if adding 4 walks barely increases the probability of any one runner scoring, you still have 4 more chances at that same probability. Walks might not clear the bases, and they add more to LOB than anything else, but they are still very, very good for your offense and very, very bad for your pitching.
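To put rough numbers on that last point, here is a tiny sketch. Both the runner counts and the scoring probabilities are invented for illustration and are not taken from the 2012 data.

```python
def expected_runners_scoring(baserunners, score_probability):
    """Expected number of baserunners who come around to score."""
    return baserunners * score_probability

# Hypothetical: 8 runners at a 30% scoring rate, then 4 extra walks that
# nudge the per-runner rate up only slightly, to 31%.
without_walks = expected_runners_scoring(8, 0.30)
with_walks = expected_runners_scoring(12, 0.31)

print(f"without the walks: {without_walks:.2f} expected runs")
print(f"with 4 more walks: {with_walks:.2f} expected runs")
```

The per-runner rate hardly moves, but the expected run total climbs because there are simply more runners.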
EDIT: oh, note that these are individual team performances, not with regard to which team won. Teams with higher scoring efficiency do tend to win, but remember that teams with more HR, doubles+triples, etc., tend to win, so what's the horse and what's the cart (or the extent to which it's both) is not resolved by these data.
Posted 03 May 2012  02:05 PM
Well, the Cubs had ∆TB+BB of 13: and lost. That supplants the ChiSox's ∆TB+BB=11 loss to the A's from a week ago.
For those who are interested, there is absolutely no obvious pattern in the 7 games. The A's have won 2: but they are among the MLB leaders in games where they have given up 5+ more TB+BB, so they've had the greatest opportunity to "get lucky." No team has lost twice this way, and the losers include the Cards and the Rangers, both of whom are good teams. (I'm betting that, in the end, good teams will have more "got unlucky" losses simply because these teams will have many more games with a lot more TB+BB than the other team.)
Posted 09 May 2012  05:07 PM
Posted 09 May 2012  05:09 PM
Posted 11 May 2012  08:38 AM
From a statistical point of view, we call this an outlier! Still, they are amusing....
Over Memorial Day weekend, I'm going to do some slightly more advanced breakdowns of how particular performances correlate not just with winning, but with the winning margin. That is, how much do HR, walks, K's, innings pitched by the starter, etc., contribute to winning by 1, 2, 3, etc. runs? Now, on one hand, a win is a win: but the vast majority of teams that make the postseason routinely win by a comfortable margin. (Conversely, really bad teams get their collective butt kicked a lot.)
This could give us an idea of which teams have been "lucky" or "unlucky" so far this year, as well as which teams are (for good or ill) doing what we expect.
Posted 11 May 2012  08:42 AM
They also reached base via a HBP, but the runner was erased by a double play (so no LOB).
This is more trivia than "how to win," but for people who hate stranded runners, the Orioles put on a clinic last night. They scored 6 runs on 5 hits and 1 walk. Yup: everybody scored. Of course, they hit 5 HR, one of which came after the walk. I've never seen anything like that in a game.
Looking forward to your breakdown of margins!
Posted 11 May 2012  03:34 PM
D'oh! I missed that! Perfection is so elusive.......
