Sunday, November 25, 2012

Filling the Gap Between Runs Scored and RBI

Imagine the following scenario.  Tigers slow-footed catcher Alex Avila leads off an inning with a single and is removed for speedy pinch runner Quintin Berry.  Center fielder Austin Jackson then doubles Berry to third.  Finally, Berry scores on a weak grounder by Andy Dirks.  This sequence goes into the books as a run scored for Berry and an RBI for Dirks, but Avila and Jackson, who contributed important hits, get no credit for the team scoring a run.

To the best of my knowledge, the kind of run participation by Avila and Jackson described above has never been formally tracked like runs scored and RBI.  My goal is to track this run involvement for all players with the help of play-by-play data at Retrosheet.org.  I want to account for every single instance of a player helping to create a run, whether it be a run scored, a run batted in or an indirect contribution, for all games where play-by-play data are available.


Limitations of Runs Scored and RBI 


The above example illustrates that the runs scored and RBI statistics do not always give players the credit they deserve for participation in run scoring, but that is not their only limitation.  Many analysts eschew these metrics because they measure things that are, to some extent, outside the batter's control.  Unless a batter hits a home run or steals home, he needs teammates to help him score runs.  Even a relatively poor baserunner will score a lot of runs if he gets on base frequently and has good hitters behind him.  Who bats behind him in the line-up is as important as baserunning skill in determining how many runs a player will score.


The RBI statistic has similar limitations to runs scored.  Unless he smacks a home run, a player needs teammates on base in order to drive in runs.  If a player has hitters batting in front of him who frequently get on base, then he is more likely to drive in runs than if he has weaker hitters setting him up.   Thus, a player on a good hitting team has more chances to drive in runs than a player on a poor hitting team.

A batter’s position in his line-up also influences his runs scored and RBI totals. For example, a leadoff hitter usually has fewer opportunities to drive in runs than a clean-up hitter, since the generally weaker 7-8-9 hitters bat in front of him.  The RBI leaders at the end of a season are as likely to be the players with the most opportunities as the players most proficient at hitting with men on base.

Many mathematically minded fans would like to see RBI and runs scored become extinct in favor of statistics such as on-base percentage, Weighted On-base Average (wOBA) and Batting Runs, which isolate a player's contribution from those of his teammates.  Despite the shortcomings of runs scored and RBI, however, most traditional fans still like their concreteness.  Players like them too, which is understandable.  A batter does not want to reach base to improve his on-base percentage, but rather to put himself in position to score a run.  Similarly, a batter up with a runner in scoring position is not focused on his slugging average.  He's thinking about driving in the run.

The Origins of Runs and RBI

The runs scored and RBI statistics both have long histories. Shortly after Alexander Cartwright and the New York Knickerbockers established the first set of modern baseball rules, the first box score appeared in the New York Morning News on October 25, 1845.  The only statistics included in this box score were hands out (today simply called “outs”) and runs for batters.  Some of the early baseball writers had ties to cricket, a relative of baseball, and early box scores reflected that association.  Hits that did not result in runs were not included because, in cricket, one either scores a point by reaching the opposite wicket or is out.

The runs batted in statistic was recorded in newspapers in 1879 and 1880 and was an official statistic in the National League in 1891.  However, fans complained that the measure was unfair to leadoff batters and too dependent on opportunity, and it was quickly dropped.  Ernie Lanigan, an important baseball statistician in the early 20th century, personally tracked runs batted in and included the statistic in New York Press box scores starting in 1907.  It became an official statistic again in 1920 under the name “Runs Responsible For”.  The RBI statistic gradually gained acceptance and eventually became even more popular than the runs scored metric.

Runs Assisted 
 
Because of their extensive history and their popularity with fans, media and players, the runs scored and RBI metrics are not going to disappear as some in the sabermetric world would like.  I would argue that they really shouldn't be eliminated altogether even from the sabermetric community.  While they should not be used as overarching player evaluation measures, it is good to know how actual runs were scored along with how they theoretically should have been scored.

If one is going to use actual runs scored in any analysis of players though, it is a good idea to consider the entire run as opposed to the popular practice of just looking at RBI. To that end, I have created the Runs Assisted (or RAS to distinguish it from the pitching metric "Run Average") statistic which gives players credit for contributing to runs without a run scored or RBI.  Here are the ways a batter can get a Run Assisted: 
  • A batter advances a runner on first to either second or third with a single, double, base on balls, hit batsman, error, sacrifice bunt, or another kind of out.  If that runner then scores, the batter who advanced him is given a Run Assisted.  If the run scored on a triple or home run, a Run Assisted would not be credited, because the advancement would be unnecessary in scoring the run.
  • A batter advances a runner on second to third with a single, base on balls, hit batsman, error, sacrifice bunt, or another kind of out.  If that runner then scores, the batter who advanced him is given a Run Assisted.  If the run scored on a double, triple or home run, a Run Assisted would not be credited, because the advancement would be unnecessary in scoring the run.
  • A batter reaches base and is removed for a pinch runner or is replaced by another runner on a force out.  If the new runner then scores, the batter who originally reached base is given a Run Assisted.
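
The three rules above can be sketched in a few lines of Python.  This is only an illustration of the logic, not the code used to process the Retrosheet files: the function names and event labels are hypothetical, the way the runner was advanced (single, walk, out and so on) is assumed to have been checked already, and the inputs are assumed to have been extracted from the play-by-play data beforehand.

```python
# Sketch of the Run Assisted rules. The event labels and function names are
# hypothetical, not Retrosheet event codes.

# Scoring plays on which the runner would have scored from his original base
# anyway, so the earlier advancement earns no credit.
ADVANCEMENT_UNNEEDED = {
    1: {"triple", "home_run"},            # runner started on first
    2: {"double", "triple", "home_run"},  # runner started on second
}


def earns_run_assisted(start_base, end_base, runner_scored, scoring_play):
    """Return True if the batter who advanced the runner gets a Run Assisted."""
    if not runner_scored:
        return False
    # Only advances from first (to second or third) or from second (to third) count.
    if start_base not in ADVANCEMENT_UNNEEDED or not (start_base < end_base <= 3):
        return False
    return scoring_play not in ADVANCEMENT_UNNEEDED[start_base]


def replaced_runner_assist(replacement_scored):
    """Third rule: the original batter is credited if his replacement scores."""
    return bool(replacement_scored)


# Opening example: Jackson doubles Berry from first to third, and Berry
# then scores on a groundout.
print(earns_run_assisted(1, 3, True, "groundout"))
# Avila is replaced by pinch runner Berry, who scores.
print(replaced_runner_assist(True))
```

Both calls print True, matching the opening example in which Jackson and Avila each deserve credit.  Had Berry instead scored from second on a double, earns_run_assisted(1, 2, True, "double") would still return True, since a runner on first is not assumed to score on a double.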

The 2012 American League Runs Assisted Leaders are listed in Table 1 below.  Catcher Joe Mauer of the Twins led the league with 59 Runs Assisted.  Mauer's teammate Ben Revere was one of the more surprising names among the leaders.  Revere's runs scored (70) and RBI (32) would suggest that he was not very involved in his team's run scoring, but his 48 Runs Assisted tell a different story.  My first thought was that he had a lot of sacrifice bunts, but he only had six, so that does not explain it.  In later analyses, I will look more at what kinds of players accumulate a lot of Runs Assisted.

Table 1: AL Runs Assisted Leaders, 2012

Player              Team     PA    R  RBI  RAS
Joe Mauer           MIN     641   81   85   59
Elvis Andrus        TEX     711   85   62   51
Robinson Cano       NYA     697  105   94   50
Paul Konerko        CHA     598   66   75   48
Ben Revere          MIN     553   70   32   48
Billy Butler        KCA     678   72  107   47
Jason Kipnis        CLE     672   86   76   47
Asdrubal Cabrera    CLE     616   70   68   44
Alcides Escobar     KCA     648   68   52   44
Josh Willingham     MIN     615   85  110   42
Michael Brantley    CLE     609   63   60   41
Prince Fielder      DET     690   83  108   40
Miguel Cabrera      DET     697  109  139   39
Torii Hunter        ANA     584   81   92   38
David Murphy        TEX     521   65   61   38

Data Source: Retrosheet.org

 
Runs Participated In

The addition of Runs Assisted allows us to expand the Runs Participated In (RPI) measure.  The current RPI definition is the number of runs to which a player made a direct contribution.  It is calculated by adding runs scored and RBI and then subtracting home runs:

   RPI = RS + RBI - HR

RPI was first introduced as runs produced in the 1950s by Sports Illustrated writer Bob Creamer but was more recently renamed RPI by Tom Tango.  If Boston Red Sox second baseman Dustin Pedroia doubles and then scores on a single by David Ortiz, neither player actually produces the run by himself.  Both participate in creating the run but neither is 100% responsible for producing the run.  Thus, the name “runs participated in” is more appropriate than "runs produced".  Home runs are subtracted in the RPI formula, so that a player does not get credit for two runs (an RBI and a run scored) when he only participated in one team run. 
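
As a quick check on the home run adjustment, here is a minimal Python sketch of the original formula.  The function name is my own, and the second call uses Joe Mauer's 2012 runs scored, RBI and home run totals from the tables in this post.

```python
def runs_participated_in(runs_scored, rbi, home_runs):
    """Original RPI: RS + RBI - HR."""
    return runs_scored + rbi - home_runs


# A solo home run adds one run scored and one RBI, but only one team run.
print(runs_participated_in(1, 1, 1))

# Joe Mauer, 2012: 81 runs scored, 85 RBI, 10 home runs.
print(runs_participated_in(81, 85, 10))
```

The first call prints 1 rather than 2, which is the double counting the subtraction is meant to remove.  The second prints 156.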

Adding Runs Assisted to the RPI formula yields:

   RPI = RS + RBI + RAS - HR 

One might question whether a Run Assisted should count as much as a run scored or an RBI, since it is more likely to also produce an out.  I would guess that a player getting an assist typically contributes less to the run than a player with a run scored or RBI (although the opening example shows that is not always the case).  More complicated statistics involving linear weights are better for answering that question.  By definition, runs scored, RBI and Runs Assisted will count the same in the Runs Participated In measure.
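
The expanded formula is simple enough to verify by hand, but a short Python sketch makes the equal weighting explicit.  The function name and the tuple layout are my own choices for illustration; the numbers are 2012 totals for three of the players listed in this post.

```python
def expanded_rpi(runs_scored, rbi, runs_assisted, home_runs):
    """Expanded RPI: RS + RBI + RAS - HR, with every component weighted equally."""
    return runs_scored + rbi + runs_assisted - home_runs


# (player, R, RBI, RAS, HR) for the 2012 season
players = [
    ("Miguel Cabrera", 109, 139, 39, 44),
    ("Josh Hamilton", 103, 128, 32, 43),
    ("Joe Mauer", 81, 85, 59, 10),
]

for name, r, rbi, ras, hr in players:
    print(name, expanded_rpi(r, rbi, ras, hr))
```

This prints 243 for Cabrera, 220 for Hamilton and 215 for Mauer.  Mauer's 59 Runs Assisted account for the whole difference between his total under the old formula (156) and the new one.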


Also, remember that RPI does not address the biases of runs scored and RBI (and RAS for that matter).  It is still the case that some players have more opportunities to contribute to runs based on their teammates and batting order position.  RPI is not a replacement for something like Batting Runs, but rather a simple alternative for those who prefer to look at actual runs scored.


Keeping the above caveats in mind, the American League RPI Leaders are listed in Table 2 below.  AL MVP winner Miguel Cabrera led the league with 243 RPI, well ahead of Rangers slugger Josh Hamilton at 220.  Mauer, who would have finished 14th by the old definition of RPI, was 5th with 215.   


Table 2: AL Runs Participated In Leaders, 2012

Player              Team     PA    R  RBI  RAS   HR  RPI
Miguel Cabrera      DET     697  109  139   39   44  243
Josh Hamilton       TEX     636  103  128   32   43  220
Robinson Cano       NYA     697  105   94   50   33  216
Mike Trout          ANA     639  129   83   34   30  216
Joe Mauer           MIN     641   81   85   59   10  215
Josh Willingham     MIN     615   85  110   42   35  202
Prince Fielder      DET     690   83  108   40   30  201
Billy Butler        KCA     678   72  107   47   29  197
Albert Pujols       ANA     670   85  105   36   30  196
Curtis Granderson   NYA     684  102  106   31   43  196
Elvis Andrus        TEX     711   85   62   51    3  195
Jason Kipnis        CLE     672   86   76   47   14  195
Torii Hunter        ANA     584   81   92   38   16  195
Adrian Beltre       TEX     654   95  102   34   36  195
Edwin Encarnacion   TOR     644   93  110   32   42  193

Data Source: Retrosheet.org

Now that they have been defined, other analyses can be done with RAS and RPI.  These statistics will probably be more useful over longer careers, where noise created by team environment tends to be minimized.  Thus, I plan to compile career totals for players back to 1950, the first year of complete Retrosheet data.  I'd also like to investigate whether certain types of players accumulate a lot of RAS and whether they do it consistently from year to year.  Correlations between RPI and numbers like Batting Runs would also be interesting.  Finally, I might attempt to create a simple rate statistic which somehow takes opportunities into consideration.  You can expect multiple articles on this topic throughout the winter.

The information used here was obtained free of charge from and is copyrighted by Retrosheet.
Interested parties may contact Retrosheet at "www.retrosheet.org".
