Breaking down winning in 2012


27 replies to this topic

#16 SirCub

SirCub

    Bleacher Hero

  • Members
  • 908 posts
  • LocationCarrboro, NC

Posted 23 April 2012 - 06:50 AM

Hey, Doc, this is pretty cool and all, but I'm not a fan of that LOB graph. It's misleading to say "The team that gets the most guys on base who fail to score still wins 60% of the time, therefore those radio guys are wrong", because that number isn't reflecting left on base numbers, but rather total runners reaching base. The team that is leaving more men on base is winning because they're simply getting more men on base.

Also, those "radio guys" always use LOB as a proxy for "squandering chances", and the general consensus is that if you fail to plate guys in a large amount of your chances, you're going to lose. The data you've provided doesn't show that sentiment to be wrong. I think a much more honest graph would be a dLOB% (LOB/Total Baserunners) graph.

Maybe "dLOB%" would be more "honest," but is that the stat that the "radio guys" use? (whoa, too many quotations) No, they simply cite LOB and say how it is a bad thing, whereas Doc clearly demonstrates here that it really isn't all that bad, and that, in fact, it can be used as a proxy for how many men you get on base (a good thing). And, if we asked nicely, I'm sure Doc would show us a graph that probably shows that "dLOB%" doesn't vary much between winning and losing teams, at least not as much as some of the other important numbers he listed above (HR, XBH, BB, etc.). That would be my hypothesis anyways...

#17 DocPeterWimsey

DocPeterWimsey

    Bleacher Bum

  • Members
  • 145 posts

Posted 23 April 2012 - 09:56 AM

My doctorate is in paleontology. I'm one of the "statheads" there: much like baseball, there was a dichotomy between the "old school" (look at the rocks and fossils really hard to get the answers) and "new school" (crunch a lot of numbers). The new school won a while ago. (U. Chicago was and is a big leader in statistical paleontology.)

Regarding the outliers, it was the A's who somehow beat the Angels despite the Angels getting so many more total bases and base-on-balls. As they say, "stuff happens." We'll need at least a season's worth of data, but I can see two viable hypotheses here. There is the "bad teams find a way to lose" hypothesis: i.e., a bad team will throw away a game despite getting so many more hits and walks due to errors, lack of clutch hits, etc. Alternatively, there is the "good teams will have bad luck, too" hypothesis: i.e., it will be the good teams that consistently get more TB+BB than the opposition, and thus have the most opportunities to accidentally lose one of those games.


Finally, Luke and others asked about the proportion of baserunners who fail to score. Winning teams tend to leave more runners on base than do losing teams. However, a team (call it Team A) that scores 4 runs on 10 hits/walks is more efficient than a team (Team B) that scores 2 runs on 6 hits/walks, despite leaving 2 more men on base (40% vs. 33% of baserunners scoring).

However, I think that we need to factor out homers to get at what people really want to know: are winning teams better at getting men on base home than losing teams? In the example above, say that the winning team (Team A) hit 1 HR and the losing team (Team B) hit 0. That means that Team A got 3 runs on 9 non-homer hits & walks, whereas Team B got 2 runs on 6 non-homer hits & walks. Now, both are 33% again.
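The arithmetic in the two paragraphs above can be sketched in a few lines of Python. This is only an illustration, not Doc's actual code: the function name `scoring_efficiency` and its parameters are made up here, and it assumes each homer is removed from both the runs and the baserunners, as in the worked example.

```python
def scoring_efficiency(runs, baserunners, homers=0):
    """Proportion of non-HR baserunners who score.

    Each homer is removed from both the runs and the baserunners,
    matching the worked example (an assumption about how HR are counted).
    """
    non_hr_runners = baserunners - homers
    if non_hr_runners <= 0:
        raise ValueError("need at least one non-HR baserunner")
    return (runs - homers) / non_hr_runners


# Raw efficiency: 4 runs on 10 hits/walks vs. 2 runs on 6 hits/walks
team_a_raw = scoring_efficiency(4, 10)  # 4 / 10 = 0.40
team_b_raw = scoring_efficiency(2, 6)   # 2 / 6, about 0.33

# Factor out the homer: Team A hit 1 HR, Team B hit 0
team_a_adj = scoring_efficiency(4, 10, homers=1)  # 3 / 9
team_b_adj = scoring_efficiency(2, 6, homers=0)   # 2 / 6
```

With the homer factored out, both teams land on the same one-in-three rate, which is the point of the example.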

What we really find (and I'll try to post the histogram later today) is that winning teams typically do get a larger proportion of their runners home. Does this mean "clutch" hits? Well, it turns out that the single biggest correlate is doubles + triples. That makes sense: a double has a better chance both of scoring and of driving in runners than does a single or a walk. Almost as important is HR. We've taken out the run scored by the HR, but HR necessarily clear the bases, which increases scoring efficiency. It's a big drop in significance after that (from 10^-11 to 10^-7), but singles are the third most important effect on scoring efficiency. So, getting the ("clutch?") singles helps: but far more frequently it is the extra-base hit that matters. (Adding 1 HR or 2B/3B has the same effect on scoring efficiency as does adding almost 3 singles.)

There is some evidence that base-running matters, too: stolen bases increase scoring efficiency, albeit less than just getting another single does.


Well, at least through late April, anyway…..


EDIT Regarding the "sports radio" guys: One hears all too often on baseball broadcasts & discussions of baseball that it's the team that makes the most of its opportunities that wins. However, what really seems to be happening is that the team that gets (makes or is given) the most opportunities wins. It is the difference between "get him on, get him over, get him in" and "get him on, get him on, keep the line moving, get him on...."

These do dovetail to an extent: the more hits you get, the more difficult it is to avoid getting hits & walks clustered together. Where people err, I think, is in the assumption that all teams are just as good at "setting up" rallies and the good teams add the "clutch" hits to that. Where bad teams often fail is in failing to set up rallies: if you consistently get one or two outs before anybody reaches base, then you are going to get a lower proportion of runners home than when you consistently get guys on with none or one out. And if you are consistently getting singles rather than extra base hits, then you are going to need more hits to get any one baserunner home.

(Another way to look at this is in the proportion of 2-out runs good offenses and bad offenses score; announcers love the clutch 2-out RBI, but the good offenses get a lot more 0-out RBI than do the bad offenses: conversely, bad pitchers give up a lot more 0-out RBI than do the good ones, against whom it's often tough to string together 2-3 hits & walks before getting 2 outs.)
Gods don't play dice with the universe, they are the dice of the universe: our job is to figure out how many sides and dice!

#18 Luke

Luke

    Bleacher Hero

  • Members
  • 1,088 posts
  • Twitter:@ltblaize
  • LocationMaryland

Posted 23 April 2012 - 10:14 AM

Well, it turns out that the single biggest correlate is doubles + triples.


I'm pretty sure when the Cubs' radio guys talk about LOB numbers, they are using them as a proxy for a team failing to get runners off the bases at an equitable rate. Typically, at least in recent seasons, after one of them mentions the LOB number the other will often say something along the lines of "it's tough to get those runners home without some extra base hits."

I wonder if we need a new stat here, maybe something like Stranded Rate (LOB / Total baserunners). That one should correlate strongly with team SLG, and therefore with winning. But if it correlates strongly with SLG, then maybe we don't need the stat.

By the way, what software are you using for this work? I've worked with (but don't own) both Stata and SPSS. I've thought about grabbing R (which is free) for some of the more complex stuff, but I'm not sure I could spend the time learning it just yet.

#19 DocPeterWimsey

DocPeterWimsey

    Bleacher Bum

  • Members
  • 145 posts

Posted 23 April 2012 - 11:13 AM

I use R or a very old (but still unparalleled) stats program called StatView. (It was bought out by JMP just to eliminate it.) When I do real number crunching, I tend to use programs I've written in C, although my former students keep telling me to get with the times and rewrite it in R.

And, yeah, your stranded rate (LOB/Total Baserunners) is the complement (or close to the complement) of my "Scoring Efficiency" ([Runs - HR] / Total Baserunners), assuming that you exclude HR from baserunners: the two add up to roughly 1. The data for this year suggest that Keith Moreland is quite right: you need extra base hits to clear the bases. (Zonk is quietly pretty emphatic about this, and he has been ever since he started broadcasting; he was a big doubles guy as a player, so that might be part of it.) However, it also is consistent with the idea that a double scores more easily than a single, which affects either equation. Run-scoring is a 4-wheel drive vehicle, after all.

In the end, this might well be like BABiP: breaking down "success" rates for guys who get walks, singles, doubles and triples independently might be useful. What are the traits of the teams that successfully score guys who get walks or singles, and what are the traits of the teams that successfully prevent those same guys from scoring? Unfortunately, the data that I'm collecting can only hint at that.
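Put numerically, the two rates add up to 1 whenever every non-HR baserunner either scores or is stranded, and fall a little short of 1 when a runner is erased on the bases. Here is a minimal sketch of that relationship; the function names, the `erased` variable and the sample line are all made up for illustration and are not taken from anybody's actual data or code.

```python
def stranded_rate(lob, baserunners, homers=0):
    """LOB divided by baserunners, with HR hitters excluded from baserunners."""
    return lob / (baserunners - homers)


def scoring_efficiency(runs, baserunners, homers=0):
    """(Runs - HR) divided by baserunners, with HR hitters excluded."""
    return (runs - homers) / (baserunners - homers)


# Hypothetical line: 10 hits+walks, 4 runs, 1 HR, nobody erased on the bases
runs, baserunners, homers, erased = 4, 10, 1, 0
lob = baserunners - runs - erased  # 6 left on base
total = (stranded_rate(lob, baserunners, homers)
         + scoring_efficiency(runs, baserunners, homers))
# total is 1 (up to floating-point rounding): 6/9 stranded + 3/9 scored

# Same line, but one runner is erased (say, on a double play)
erased = 1
lob_erased = baserunners - runs - erased  # 5 left on base
total_erased = (stranded_rate(lob_erased, baserunners, homers)
                + scoring_efficiency(runs, baserunners, homers))
# total_erased is 8/9: the erased runner neither scored nor was stranded
```

That gap is why one rate is only "close to" determining the other in real box scores: double plays and caught stealings remove runners without adding to LOB.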

#20 MichiganGoat

MichiganGoat

    Give me a BEER

  • Moderators
  • 3,799 posts
  • Twitter:MichiganGoat
  • Facebook:michigangoat
  • LocationGrand Rapids, MI

Posted 23 April 2012 - 12:06 PM

My brain is growing; this is great.

MichiganGoat on Twitter

"There are a lot of guys who are respected but not liked" - Ron Santo


#21 Myles

Myles

    Bleacher Bum

  • Members
  • 20 posts
  • LocationIndianapolis

Posted 24 April 2012 - 09:08 AM

I've got a companion piece to this that looks at the data a little differently.

We all know that the team that scores more runs wins the game; however, in the long haul, teams that score more runs than they allow don't win every game (or even their fair share). For instance, the Astros have outscored their opponents 76-67 this year, and are 6-11 to show for it. 17 games isn't a strong statistical sample, but it isn't weak, either: you'd expect that their "bad luck" will even out if they play at this level, and they'll be a .500 or better team.

There are a few ways to predict how well a team is going to do based on their runs, and I'll share one very bad one and two good ones.

1st, we have good old Line Percentage, or Winning Percentage. This is easy; we just take a team's current record and extrapolate it over 162 games. Basically, it assumes that each team will play exactly as their record indicates. This is a very bad indicator, however, and historically has proven to be... not so good, for a variety of reasons.

2nd, we have pythagorean win percentage (yes, I know about Pythagenport, but that's even more complicated). Basically, this measures the runs you've scored against the runs you've allowed, and comes up with a pretty basic formula to give you an expected winning percentage (for those interested, the formula I use is RS^2/(RS^2+RA^2)). What's very neat about this simple formula is that it can tell you which teams seem to be lucky or unlucky (the other side of this coin? which team is gritty or not!). Historically, pythagorean W-L records have come within 10 games of a team's actual record about 97% of the time, which is why people can reasonably say that luck can give or take around 10 wins per team over the course of a season (i.e., an 81-win team may actually have 71- or 91-win talent). However, it usually is much closer. After about 30-35 games, this becomes a pretty excellent indicator, so we aren't there yet.
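As a quick sketch of that formula, here it is in Python applied to the Astros line from earlier in this post (76 runs scored, 67 allowed). The function name and the `exponent` parameter are illustrative choices; the exponent of 2 is the one given in the formula above.

```python
def pythagorean_win_pct(runs_scored, runs_allowed, exponent=2):
    """Expected winning percentage: RS^2 / (RS^2 + RA^2)."""
    rs = runs_scored ** exponent
    ra = runs_allowed ** exponent
    return rs / (rs + ra)


# Astros so far: outscored opponents 76-67, but sit at 6-11
astros_pct = pythagorean_win_pct(76, 67)   # 5776 / 10265, about 0.563
astros_wins_162 = round(astros_pct * 162)  # 91 wins over a full season
actual_pct = 6 / 17                        # about 0.353
```

The gap between roughly .563 expected and .353 actual is the "bad luck" being described, with the usual caveat that 17 games is a small sample.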

The last predictor (and my favorite) is runs per game regression. Basically, this takes the difference in runs scored and runs allowed per game and asks "how much is 1 run more scored over allowed worth, in terms of winning percentage?" This is a close relative of pythagorean win percentage, but I like it a lot because the regression makes a lot of sense. The numbers have basically shown that a team that scores exactly as much as it allows will go 81-81. For each run per game they score over their opponents (this year), their winning percentage increases by 11.24%, or about 18 wins over the course of the year. What's so cool about this is that it tells you that if you can score 1 more run than you allow, on average, you'll win around 100 games (and if it goes the other way, you'll lose 100). For all the stats, that gives you an easy thing to shoot for!
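A minimal sketch of that regression as a straight line through .500, using the 11.24% slope quoted above. The function name and parameters are made up for illustration, and a linear model like this is only sensible for run differentials in the normal range.

```python
def regression_win_pct(runs_scored, runs_allowed, games, slope=0.1124):
    """Winning percentage from run differential per game.

    A team that scores exactly what it allows sits at .500; each extra
    run per game adds `slope` (11.24% this year, per the post).
    """
    run_diff_per_game = (runs_scored - runs_allowed) / games
    return 0.5 + slope * run_diff_per_game


# One run per game better than your opponents, over a 162-game season
plus_one_wins = round(regression_win_pct(1, 0, 1) * 162)  # 99, "around 100"

# Astros so far: 76 scored, 67 allowed in 17 games
astros_wins = round(regression_win_pct(76, 67, 17) * 162)  # 91
```

On the Astros' numbers this lands in the same place as the pythagorean estimate, which fits the description of the two methods as close relatives.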

What we all care about, though, is how these systems affect the Cubs.

WP% Projection: 41-121
Pyth% Projection: 56-106
Reg% Projection: 55-107

Obviously these are all too low. I'd wait until around 35 games before I put a lot of store into them. Still, it's pretty neat!

#22 DocPeterWimsey

DocPeterWimsey

    Bleacher Bum

  • Members
  • 145 posts

Posted 24 April 2012 - 06:13 PM

On the topic of scoring efficiency, here is a breakdown of how proportion not left on base corresponds to four stats: doubles+triples, homers, steals and singles.
Posted Image
For those who haven't seen box plots in a while, the boxes encompass 50% of the games in which a team had (say) 1 HR, 2 HR, etc. The "whiskers" encompass 75% of the data points. These are a little easier to read than scatter plots, where dozens of points are crammed so closely together (and on a line) that you cannot really appreciate their density. The individual dots are the 12.5% on either end of the distribution. I've included traditional linear regression lines, assuming dependency of the "success" frequency (proportion of non-HR hits+BB scoring) on the stat in question.
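For anyone who wants to reproduce that kind of box-plot summary, here is a rough sketch using Python's standard library. The cut points follow the description above (box = middle 50%, whiskers = middle 75%, dots = the 12.5% at each end). The sample values are made up purely for illustration and are not the data behind the posted image; the function name is hypothetical too.

```python
from statistics import quantiles


def box_plot_summary(values):
    """Cut points for the plot described above.

    n=8 splits the data into eighths, giving cut points at
    12.5%, 25%, 37.5%, 50%, 62.5%, 75% and 87.5%.
    """
    octiles = quantiles(values, n=8)
    return {
        "whisker_low": octiles[0],   # 12.5%
        "box_low": octiles[1],       # 25%
        "median": octiles[3],        # 50%
        "box_high": octiles[5],      # 75%
        "whisker_high": octiles[6],  # 87.5%
    }


# Made-up scoring efficiencies for games with one HR (illustration only)
sample = [0.10, 0.20, 0.25, 0.30, 0.33, 0.35, 0.40, 0.45, 0.50, 0.60]
summary = box_plot_summary(sample)

# Points outside the whiskers are the ones drawn as individual dots
dots = [v for v in sample
        if v < summary["whisker_low"] or v > summary["whisker_high"]]
```

Different software interpolates percentiles slightly differently, so the exact whisker positions can shift a little from one program to another.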

I've ordered them on strength of correlation. What also is telling is the slope: i.e., what added proportion of runners score as you add 1 HR, 1 double or triple, etc. Adding power really helps: and that makes sense, as it clears the bases and (in the case of a double or triple) leaves a guy close to home plate.

Stolen bases seem to help more than singles. However, "correlation" vs. "causation" comes to mind here: as singles increase, stolen bases increase. So, part of the slope for stolen bases is building upon the lower slope of singles. It's too early in the season (in terms of numbers of games) to break these down into the effect of 1, 2, 3, etc., steals when there are 1, 2, 3, etc., singles.

I'll post walks tomorrow. I do not want to apologize for Dusty Baker's silly lines, but I do wonder if this is what he saw. Basically, as you add walks, the proportion of runners that score barely increases. It makes sense that this would have the least effect: the runner gets only to first and other runners advance only one base. (Many singles advance guys 2 bases.) HOWEVER.... each walk is a runner, so even if adding 4 walks barely increases the probability of any one runner scoring, you still have 4 more chances to score at that same probability. Walks might not clear the bases, and they add more to LOB than anything else, but they are still very, very good for your offense and very, very bad for your pitching.


EDIT: oh, note that these are individual team performances, not with regard to which team won. Teams with higher scoring efficiency do tend to win, but remember that teams with more HR, doubles+triples, etc., tend to win, so what's the horse and what's the cart (or the extent to which it's both) is not resolved by these data.

#23 DocPeterWimsey

DocPeterWimsey

    Bleacher Bum

  • Members
  • 145 posts

Posted 03 May 2012 - 02:05 PM

I know that Cubs fans (really, all fans) feel that their team corners the market on games that they should have won. Well, the Cubs really have been fairly unexceptional this year. However, today's game (we'll call it the Marmol "Get Away Day? I thought it was Give Away Day!" Game) stands out. I noted above that Total Bases + Walks is a great predictor not just of who wins, but by how much. Usually, once you get past 7 or so, the game isn't close. Moreover, when Team A gets at least 7 more TB+BB than Team B, Team A is a whopping 175-7.

Well, the Cubs had ∆TB+BB of 13: and lost. That supplants the ChiSox's ∆TB+BB=11 loss to the A's from a week ago.

For those who are interested, there is absolutely no obvious pattern in the 7 games. The A's have won 2: but they are among the MLB leaders in games where they have given up 5+ more TB+BB, so they've had the greatest opportunity to "get lucky." No team has lost twice this way, and the losers include the Cards and the Rangers, both of whom are good teams. (I'm betting that, in the end, good teams will have more "got unlucky" losses simply because these teams will have many more games with a lot more TB+BB than the other team.)
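For reference, the TB+BB differential is easy to compute from a box score. A small sketch follows; the two batting lines are hypothetical, made up only to show the arithmetic, and the function names are not from Doc's code. The 175-7 record is the one quoted in this post.

```python
def total_bases(singles, doubles, triples, homers):
    """Total bases: 1 per single, 2 per double, 3 per triple, 4 per homer."""
    return singles + 2 * doubles + 3 * triples + 4 * homers


def tb_plus_bb(singles, doubles, triples, homers, walks):
    """Total bases plus walks for one team in one game."""
    return total_bases(singles, doubles, triples, homers) + walks


# Hypothetical batting lines (not a real game)
team_a = tb_plus_bb(singles=8, doubles=3, triples=0, homers=1, walks=5)  # 23
team_b = tb_plus_bb(singles=6, doubles=2, triples=0, homers=1, walks=2)  # 16
delta = team_a - team_b                                                  # 7

# Record quoted above for teams with the big TB+BB edge: 175 wins, 7 losses
win_rate = 175 / (175 + 7)  # about 0.96
```

A winning rate around 96% is what makes a loss with a 13-point edge stand out as an outlier.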

#24 MichiganGoat

MichiganGoat

    Give me a BEER

  • Moderators
  • 3,799 posts
  • Twitter:MichiganGoat
  • Facebook:michigangoat
  • LocationGrand Rapids, MI

Posted 09 May 2012 - 05:07 PM

My head ***BOOM*** Doc, this is amazing stuff.



#25 MichiganGoat

MichiganGoat

    Give me a BEER

  • Moderators
  • 3,799 posts
  • Twitter:MichiganGoat
  • Facebook:michigangoat
  • LocationGrand Rapids, MI

Posted 09 May 2012 - 05:09 PM

I feel rather stoopid right now, but I love to learn. I've been having to do some statistical analysis of student engagement and my head already hurts, but this is a nice distraction and will help my "project" at work.



#26 DocPeterWimsey

DocPeterWimsey

    Bleacher Bum

  • Members
  • 145 posts

Posted 11 May 2012 - 08:38 AM

This is more trivia than "how to win," but for people who hate stranded runners, the Orioles put on a clinic last night. They scored 6 runs on 5 hits and 1 walk. Yup: everybody scored. Of course, they hit 5 HR, one of which came after the walk. I've never seen anything like that when a team scored so many runs! (For all of this power, the Orioles managed only 4 more total bases than the Rangers and only one more extra base hit; so, it turned out to be a 1-run victory for the O's!)

From a statistical point of view, we call this an outlier! B) Still, they are amusing....

Over Memorial Day weekend, I'm going to do some slightly more advanced breakdowns of how particular performances correlate not just with winning, but with the winning margin. That is, how much do HR, walks, K's, innings pitched by the starter, etc., contribute to winning by 1, 2, 3, etc. runs? Now, on one hand, a win is a win: but the vast majority of teams that make the post-season routinely win by a comfortable margin. (Conversely, really bad teams get their collective butt kicked a lot.)

This could give us an idea of which teams have been "lucky" or "unlucky" so far this year, as well as which teams are (for good or ill) doing what we expect.

#27 SirCub

SirCub

    Bleacher Hero

  • Members
  • 908 posts
  • LocationCarrboro, NC

Posted 11 May 2012 - 08:42 AM

This is more trivia than "how to win," but for people who hate stranded runners, the Orioles put on a clinic last night. They scored 6 runs on 5 hits and 1 walk. Yup: everybody scored. Of course, they hit 5 HR, one of which came after the walk. I've never seen anything like that in a game

They also reached base via a HBP, but the runner was erased by a double play (so no LOB).

Looking forward to your breakdown of margins!

#28 DocPeterWimsey

DocPeterWimsey

    Bleacher Bum

  • Members
  • 145 posts

Posted 11 May 2012 - 03:34 PM



D'oh! I missed that! Perfection is so elusive.......





Bleacher Nation is not affiliated in any way with Major League Baseball or the Chicago National League Ballclub (that's the Cubs).