Saturday, January 25, 2014

Filling the Gap Between Runs and RBI



 Red Sox second baseman Dustin Pedroia led the American League in Runs Assisted in 2013
(Photo credit: SIKids.com )

Imagine the following scenario.  Slow-footed Tigers designated hitter Victor Martinez leads off an inning with a single and is removed for speedy pinch runner Rajai Davis.  Right fielder Torii Hunter then doubles Davis to third, and Davis eventually scores on a weak grounder by Andy Dirks.  This sequence goes into the books as a run scored for Davis and an RBI for Dirks, but Martinez and Hunter get no credit for the team scoring a run despite contributing important hits.

To the best of my knowledge, the kind of run participation by Martinez and Hunter described above is not publicly tracked the way runs scored and RBI are.  My goal is to track this run involvement for all players with the help of play-by-play data at Retrosheet.org.  I want to account for every instance of a player helping to create a run, whether it be a run scored, a run batted in or an indirect contribution, for all games where play-by-play data are available.

Limitations of Runs Scored and RBI 
 
The above example illustrates that the runs scored and RBI statistics do not always give players the credit they deserve for participation in run scoring, but that is not their only limitation.  Many analysts eschew these metrics because they measure things that are, to some extent, outside the control of the individual batter.  Unless a batter hits a home run or steals home, he needs teammates to help him score runs.  Even a relatively poor base runner will score a lot of runs if he gets on base frequently and has good hitters behind him.  Who bats behind him in the line-up is as important as base running skill in determining how many runs a player will score.

The RBI statistic has similar limitations to runs scored.  Unless he smacks a home run, a player needs teammates on base in order to drive in runs.  If a player has hitters batting in front of him who frequently get on base, then he is more likely to drive in runs than if he has weaker hitters setting him up.   Thus, a player on a good hitting team has more chances to drive in runs than a player on a poor hitting team.

A batter’s position in his line-up also influences his runs scored and RBI totals.  For example, a lead-off hitter usually has fewer opportunities to drive in runs than a clean-up hitter, since the generally weaker 7-8-9 hitters bat in front of him.  The RBI leaders at the end of a season are as likely to be the players with the most opportunities as the players most proficient at hitting with men on base.

Many mathematically-minded fans would like to see RBI and runs scored become extinct in favor of statistics such as on-base percentage, Weighted On-Base Average (wOBA) and Batting Runs, which isolate a player's contribution from those of his teammates.  Despite the shortcomings of runs scored and RBI, however, most traditional fans still like their concreteness.  Players like them too, which is understandable.  A batter does not want to reach base to improve his on-base percentage, but rather to put himself in position to score a run.  Moreover, a batter up with a runner in scoring position is not focused on his slugging average; he is thinking about driving in the run.

The Origins of Runs and RBI

The runs scored and RBI statistics both have long histories.  Shortly after Alexander Cartwright and the New York Knickerbockers established the first set of modern baseball rules, the first box score appeared in the New York Morning News on October 25, 1845.  The only statistics included in this box score were hands out (today simply called “outs”) and runs for batters.  Some of the early baseball writers had ties to cricket, a relative of baseball, and early box scores reflected that association.  Hits that did not result in runs were not included because, in cricket, one either scores a point by reaching the opposite wicket or is out.

The runs batted in statistic was recorded in newspapers in 1879 and 1880 and was an official statistic in the National League in 1891.  However, fans complained that the measure was unfair to leadoff batters and too dependent on opportunity and it was quickly dropped.  Ernie Lanigan, an important baseball statistician in the early 20th century, personally tracked runs batted in and included the statistic in New York Press box scores starting in 1907.  It became an official statistic again in 1920 under the name, “Runs Responsible For”.  The RBI statistic gradually gained acceptance and eventually became even more popular than the runs scored metric. 

Runs Assisted 

Because of their extensive history and their popularity with fans, media and players, the runs scored and RBI metrics are not going to disappear as some in the sabermetric world would like.  I would argue that they really shouldn't be eliminated altogether even from the sabermetric community.  While they should not be used as overarching player evaluation measures, it is good to know how actual runs were scored along with how they theoretically should have been scored.

If one is going to use actual runs scored in any analysis of players though, it is a good idea to consider the entire run as opposed to the popular practice of just looking at RBI. To that end, I have created the Runs Assisted (or RAS to distinguish it from the pitching metric "Run Average") statistic which gives players credit for contributing to runs without a run scored or RBI.  Here are the ways a batter can get a Run Assisted:  
  • A batter advances a runner to either second or third with a hit, base on balls, hit batsman, error, sacrifice bunt, or another kind of out.  If that runner then scores, either during the same at bat or an ensuing at bat, the batter who advanced him is given a Run Assisted.
  • A batter reaches base and is removed for a pinch runner or is replaced by another runner on a force out.  If the new runner then scores, the batter who originally reached base is given a Run Assisted.
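The two rules above can be sketched in code.  The following Python snippet is only a minimal illustration, not the procedure actually used with the Retrosheet play-by-play files: the event tuples ("reach", "advance", "replace", "score") and the function name are hypothetical simplifications.  It also assumes that a replacement runner inherits any earlier assists and that a player who already gets the run scored or the RBI on a run is not given an assist for it as well.

```python
from collections import Counter


def credit_runs_assisted(events):
    """Tally Runs Assisted from a simplified, hypothetical event list.

    Each event is a tuple (an illustrative format, not Retrosheet's):
      ("reach", batter)                  batter reaches base
      ("advance", batter, runner, base)  batter moves runner to base 2 or 3
      ("replace", old, new)              pinch runner or force out swaps runners
      ("score", batter, runner)          runner scores with batter at the plate
    """
    assisters = {}   # current runner -> players owed an assist if he scores
    ras = Counter()
    for event in events:
        kind = event[0]
        if kind == "reach":
            assisters[event[1]] = set()
        elif kind == "advance":
            _, batter, runner, base = event
            if base in (2, 3):
                assisters[runner].add(batter)
        elif kind == "replace":
            _, old, new = event
            # Assumption: the new runner inherits earlier assists, and the
            # replaced player becomes an assister himself (the RR category).
            assisters[new] = assisters.pop(old) | {old}
        elif kind == "score":
            _, batter, runner = event
            # Assumption: no assist for the player credited with the run
            # scored or the RBI on this run.
            for player in assisters.pop(runner) - {batter, runner}:
                ras[player] += 1
    return ras


# The opening example: Martinez singles, Davis runs for him, Hunter doubles
# Davis to third, and Davis scores on Dirks's grounder.
inning = [
    ("reach", "Martinez"),
    ("replace", "Martinez", "Davis"),
    ("reach", "Hunter"),
    ("advance", "Hunter", "Davis", 3),
    ("score", "Dirks", "Davis"),
]
print(sorted(credit_runs_assisted(inning).items()))
# [('Hunter', 1), ('Martinez', 1)]
```

Martinez and Hunter each receive one Run Assisted, while Davis (run scored) and Dirks (RBI) keep their traditional credit.
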
The 2013 American League Runs Assisted Leaders are listed in Table 1 below.  Red Sox second baseman Dustin Pedroia led the league with 70 Runs Assisted.  Pedroia assisted runs on the following events:
  • 40 hits (H)
  • 11 walks (BB)
  • 1 hit batsman (HBP)
  • 3 times reached on errors (ROE)
  • 0 sacrifice bunts (SH)
  • 14 outs (OUT)
  • 1 time removed from the bases by a force out or pinch runner, with the new runner then scoring (RR)
The leading Tigers were Hunter (60), Victor Martinez (60) and Miguel Cabrera (54).
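
As a quick arithmetic check, a player's RAS total is the sum of the seven event categories listed above.  The short Python sketch below adds up Pedroia's line; the dictionary layout and the function name are illustrative assumptions, and the figures are the ones given above.

```python
# Pedroia's Runs Assisted by event type, as listed above.
pedroia = {"H": 40, "BB": 11, "HBP": 1, "ROE": 3, "SH": 0, "OUT": 14, "RR": 1}


def total_ras(components):
    """Sum the seven Runs Assisted categories into a single RAS total."""
    return sum(components.values())


print(total_ras(pedroia))  # 70
```
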

Table 1: AL Runs Assisted Leaders, 2013
Player           Team    H  BB  HBP  ROE  SH  OUT  RR  RAS
Dustin Pedroia   BOS    40  11    1    3   0   14   1   70
Elvis Andrus     TEX    29   7    3    3   8   12   2   64
Carlos Santana   CLE    31  14    0    3   0    9   5   62
Nick Markakis    BAL    30   9    0    1   0   18   2   60
Torii Hunter     DET    36   2    1    4   3   13   1   60
Victor Martinez  DET    31   9    0    1   0    9  10   60
Evan Longoria    TBA    35  12    0    2   0    8   3   60
Eric Hosmer      KCA    33   9    0    1   0   12   3   58
Ben Zobrist      TBA    26   8    1    2   1   16   4   58
Shane Victorino  BOS    26   1    3    2  10   11   2   55
Miguel Cabrera   DET    17  24    0    1   0    9   3   54
Billy Butler     KCA    23  10    0    1   0    9  11   54
Josh Donaldson   OAK    36   7    1    1   0    6   2   53
Erick Aybar      ANA    26   3    1    2   7   10   3   52
Mike Trout       ANA    21  17    2    3   0    6   2   52
  Data Source: Retrosheet.org 

Runs Participated In


The addition of Runs Assisted allows us to expand the Runs Participated In (RPI) measure.  The current RPI definition is the number of runs to which a player made a direct contribution.  It is calculated by adding runs scored and RBI and then subtracting home runs:

   RPI = RS + RBI - HR

RPI was first introduced as runs produced in the 1950s by Sports Illustrated writer Bob Creamer but was more recently renamed RPI by Tom Tango.  If  Pedroia doubles and then scores on a single by David Ortiz, neither player actually produces the run by himself.  Both participate in creating the run but neither is 100% responsible for producing the run.  Thus, the name “runs participated in” is more appropriate than "runs produced".  Home runs are subtracted in the RPI formula, so that a player does not get credit for two runs (an RBI and a run scored) when he only participated in one team run. 
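
To make the home run adjustment concrete, here is a small Python sketch of the formula above.  The function name rpi_basic is just an illustrative choice, and the second call uses Miguel Cabrera's runs, RBI and home runs from Table 2.

```python
def rpi_basic(runs_scored, rbi, home_runs):
    """Runs Participated In without assists: RS + RBI - HR."""
    return runs_scored + rbi - home_runs


# A solo home run is one run scored and one RBI, but only one team run.
print(rpi_basic(1, 1, 1))        # 1

# Miguel Cabrera's totals from Table 2: 103 R, 137 RBI, 44 HR.
print(rpi_basic(103, 137, 44))   # 196
```
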

Adding Runs Assisted to the RPI formula yields:

   RPI = RS + RBI + RAS - HR 

One might question whether a Run Assisted should count as much as a run scored or an RBI, since it is more likely to also produce an out.  I would guess that a player getting an assist typically contributes less to the run than a player with a run scored or RBI (although the opening example shows that is not always the case).  More complicated statistics involving linear weights are better for answering that question.  By definition, runs scored, RBI and Runs Assisted will count the same in the Runs Participated In measure.
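
The equal weighting can be sketched in a few lines of Python and checked against the top three rows of Table 2 below.  The function and variable names are illustrative assumptions; the input figures come straight from the table.

```python
def rpi(runs_scored, rbi, runs_assisted, home_runs):
    """Expanded Runs Participated In: RS + RBI + RAS - HR, weighted equally."""
    return runs_scored + rbi + runs_assisted - home_runs


# (R, RBI, RAS, HR) taken from Table 2.
leaders = {
    "Miguel Cabrera": (103, 137, 54, 44),
    "Dustin Pedroia": (91, 84, 70, 9),
    "Mike Trout": (109, 97, 52, 27),
}

for player, line in leaders.items():
    print(player, rpi(*line))
# Miguel Cabrera 250
# Dustin Pedroia 236
# Mike Trout 231
```
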


Also, remember that RPI does not address the biases of runs scored and RBI (and RAS for that matter).  It is still the case that some players have more opportunities to contribute to runs based on their teammates and batting order position.  RPI is not a replacement for something like Batting Runs, but rather a simple alternative for those who prefer to look at actual runs scored.


Keeping the above caveats in mind, the American League RPI Leaders are listed in Table 2 below.  AL MVP winner Miguel Cabrera topped the league with 250 RPI (he also led with 243 in 2012), followed by Pedroia (236) and Angels outfielder Mike Trout (231).  Other Tigers among the leaders were Hunter (217) and Prince Fielder (215).


Table 2: AL Runs Participated In Leaders, 2013

Player           Team   PA    R  RBI  RAS  HR  RPI
Miguel Cabrera   DET   651  103  137   54  44  250
Dustin Pedroia   BOS   724   91   84   70   9  236
Mike Trout       ANA   716  109   97   52  27  231
Adam Jones       BAL   689  100  108   51  33  226
Chris Davis      BAL   673  103  138   34  53  222
Elvis Andrus     TEX   698   91   67   64   4  218
Torii Hunter     DET   652   90   84   60  17  217
Prince Fielder   DET   712   82  106   52  25  215
Josh Donaldson   OAK   668   89   93   53  24  211
Robinson Cano    NYA   681   81  107   47  27  208
Evan Longoria    TBA   693   91   88   60  32  207
Eric Hosmer      KCA   680   86   79   58  17  206
David Ortiz      BOS   600   84  103   48  30  205
Jason Kipnis     CLE   658   86   84   50  17  203
Nick Markakis    BAL   700   89   59   60  10  198


The information used here was obtained free of charge from and is copyrighted by Retrosheet.
Interested parties may contact Retrosheet at "www.retrosheet.org".

2 comments:

  1. Do you know of any stat or metric that follows runner advancement and the like, even if no run scores?

    Here is what I am thinking. If there is no one on, there are 4 potential bases that could be gotten. If there is a runner on second, let's say, then there are 6 -- 4 for the batter and 2 for the runner on second. If there is a sac fly and the runner goes to third, there is an advancement of 1 base. If there is a single and the runner scores, there are three bases advanced... if he only goes to third, there are two.

    Or, if there is a runner on first and the batter hits into a fielder's choice, there are no net bases advanced. It seems like if there was an easy way to total these up, you could come up with a rate like unto slugging percentage. Every at bat would have somewhere between 4 and 10 total possible....

    Or is there a better measure that you know of that deals with the idea of "productive outs" and the like? (By the by, I like your formula - simple and quick and informative)

  2. Eric,

    I did some work on runner advancement last year: http://www.detroittigertales.com/2013/01/which-players-were-best-at-advancing.html
    It's not exactly what you are looking for, but it's the closest I know of.

    Lee

