Monday, January 05, 2015

Filling The Gap Between RBI And Runs - 2014

Angels second baseman Howie Kendrick led the AL in Runs Assisted in 2014
(Photo Credit: James Squire/ Getty Images)

Imagine the following scenario. Slow-footed Tigers designated hitter Victor Martinez leads off an inning with a single and is removed for speedy pinch runner Rajai Davis. Third baseman Nick Castellanos doubles Davis to third, and Davis eventually scores on a weak grounder by Alex Avila. This sequence goes into the books as a run scored for Davis and an RBI for Avila, but Martinez and Castellanos get no credit for the run despite contributing important hits.

To the best of my knowledge, the kind of run participation by Martinez and Castellanos described above is not publicly tracked the way runs scored and RBI are. My goal is to track this run involvement for all players with the help of play-by-play data from Retrosheet.org. I want to account for every instance of a player helping to create a run, whether it be a run scored, a run batted in, or an indirect contribution, for all games where play-by-play data are available.

Limitations of Runs Scored and RBI 
  
The above example illustrates that the runs scored and RBI statistics do not always give players the credit they deserve for participation in run scoring, but that is not their only limitation. Many analysts eschew these metrics because they measure things that are, to some extent, out of the control of the individual batter. Unless a batter hits a home run or steals home, he needs teammates to help him score runs. Even a relatively poor base runner will score a lot of runs if he gets on base frequently and has good hitters behind him. Who bats behind him in the line-up is as important as base running skill in determining how many runs a player will score.

The RBI statistic has similar limitations to runs scored.  Unless he smacks a home run, a player needs teammates on base in order to drive in runs.  If a player has hitters batting in front of him who frequently get on base, then he is more likely to drive in runs than if he has weaker hitters setting him up.   Thus, a player on a good hitting team has more chances to drive in runs than a player on a poor hitting team.


A batter’s position in his line-up also influences his runs scored and RBI totals. For example, a lead-off hitter usually has fewer opportunities to drive in runs than a clean-up hitter, since the generally weaker 7-8-9 hitters bat in front of him. The RBI leaders at the end of a season are as likely to be the players with the most opportunities as the players most proficient at hitting with men on base.

Many mathematically-minded fans would like to see RBI and runs scored become extinct in favor of statistics such as on-base percentage, Weighted On-base Average (wOBA) and Batting Runs, which isolate a player's contribution from those of his teammates. Despite the shortcomings of runs scored and RBI, however, most traditional fans still like their concreteness. Players like them too, which is understandable. A batter does not want to reach base to improve his on-base percentage, but rather to put himself in position to score a run. Moreover, a batter up with a runner in scoring position is not focused on his slugging average; he is thinking about driving in the run.

The Origins of Runs and RBI

The runs scored and RBI statistics both have long histories. Shortly after Alexander Cartwright and the New York Knickerbockers established the first set of modern baseball rules, the first box score appeared in the New York Morning News on October 25, 1845. The only statistics included in this box score were hands out (today simply called “outs”) and runs for batters. Some of the early baseball writers had ties to cricket, a relative of baseball, and early box scores reflected that association. Hits that did not result in runs were not included because, in cricket, one either scores a run by reaching the opposite wicket or is out.

The runs batted in statistic was recorded in newspapers in 1879 and 1880 and was an official statistic in the National League in 1891. However, fans complained that the measure was unfair to leadoff batters and too dependent on opportunity, and it was quickly dropped. Ernie Lanigan, an important baseball statistician in the early 20th century, personally tracked runs batted in and included the statistic in New York Press box scores starting in 1907. It became an official statistic again in 1920 under the name “Runs Responsible For”. The RBI statistic gradually gained acceptance and eventually became even more popular than the runs scored metric.

Runs Assisted

Because of their extensive history and their popularity with fans, media and players, the runs scored and RBI metrics are not going to disappear as some in the sabermetric world would like.  I would argue that they really shouldn't be eliminated altogether even from the sabermetric community.  While they should not be used as overarching player evaluation measures, it is good to know how actual runs were scored along with how they theoretically should have been scored.

If one is going to use actual runs scored in any analysis of players though, it is a good idea to consider the entire run as opposed to the popular practice of just looking at RBI. To that end, the Runs Assisted statistic (abbreviated RAS to distinguish it from the pitching metric "Run Average") gives players credit for contributing to a run when they neither score it nor drive it in. Here are the ways a batter can get a Run Assisted:
  • A batter advances a runner to either second or third with a hit, base on balls, hit batsman, error, sacrifice bunt, or another kind of out.  If that runner then scores either during the same at bat or an ensuing at bat, the batter who advanced him is given a Run Assisted.
  • A batter reaches base and is removed for a pinch runner or is replaced by another runner on a force out.  If the new runner then scores, the batter who originally reached base is given a Run Assisted.
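
The two rules above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the program used to produce the tables in this post. The event tuples are a hypothetical simplified format (not Retrosheet's event-file format), and the function name `runs_assisted` is made up for this example. I am also assuming that a replacement runner inherits any assists already attached to the man he replaced, that a player gets at most one assist per run, and that the player who scores or drives in the run gets no assist on it.

```python
from collections import Counter

def runs_assisted(events):
    """Tally Runs Assisted from a simplified, hypothetical event list.

    Event shapes (all assumptions made for this sketch):
      ("reach", batter)                 batter reaches base
      ("replace", old, new)             pinch runner, or batter replaces runner on a force out
      ("advance", batter, runner, base) batter moves runner to the given base
      ("out", runner)                   runner is retired on the bases
      ("score", runner, rbi_batter)     runner scores; rbi_batter may be None
    """
    ras = Counter()
    helpers = {}  # runner on base -> batters who earn an assist if he scores
    for ev in events:
        kind = ev[0]
        if kind == "reach":
            helpers[ev[1]] = set()
        elif kind == "replace":
            _, old, new = ev
            # The new runner inherits earlier helpers, plus the man he replaced.
            helpers[new] = helpers.pop(old) | {old}
        elif kind == "advance":
            _, batter, runner, base = ev
            if base in (2, 3) and batter != runner:
                helpers[runner].add(batter)
        elif kind == "out":
            helpers.pop(ev[1], None)
        elif kind == "score":
            _, runner, rbi_batter = ev
            # No assist for the player who already gets the run or the RBI.
            for player in helpers.pop(runner) - {runner, rbi_batter}:
                ras[player] += 1
    return ras

# The opening scenario: Martinez singles, Davis runs for him,
# Castellanos doubles Davis to third, Avila's grounder scores Davis.
example = [
    ("reach", "Martinez"),
    ("replace", "Martinez", "Davis"),
    ("advance", "Castellanos", "Davis", 3),
    ("reach", "Castellanos"),
    ("score", "Davis", "Avila"),
]
tally = runs_assisted(example)
print(sorted(tally.items()))  # [('Castellanos', 1), ('Martinez', 1)]
```

In this toy version Martinez and Castellanos each get one Run Assisted, while Davis (run scored) and Avila (RBI) get none. Real play-by-play data would need far more bookkeeping than this.
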
The 2014 American League Runs Assisted Leaders are listed in Table 1 below.  Angels second baseman Howie Kendrick led the league with 68 Runs Assisted.  Kendrick assisted runs on the following events:
  • 34 hits (H)
  • 11 walks (BB)
  • 1 hit batsman (HBP)
  • 3 times reached on errors (ROE)
  • 1 sacrifice bunt (SH)
  • 13 outs (OUT)
  • 3 times removed from the bases by a force out or pinch runner, after which the new runner scored (RR)
The leading Tigers were Victor Martinez (55), Ian Kinsler (47), and Torii Hunter (46).

Table 1: AL Runs Assisted Leaders, 2014

Player           Team   H  BB  HBP  ROE  SH  OUT  RR  RAS
Howard Kendrick  ANA   34  11    1    3   1   13   3   68
Erick Aybar      ANA   29   8    2    3   2   20   3   68
Josh Donaldson   OAK   21  12    2    1   0   18   1   56
Victor Martinez  DET   27   4    2    5   0   12   5   55
Dustin Pedroia   BOS   34  10    0    3   0    7   1   55
Jose Bautista    TOR   24  18    0    0   2    8   2   54
Brian Dozier     MIN   25  16    1    2   2    5   1   53
Jose Reyes       TOR   33   3    0    3   2   11   1   53
Melky Cabrera    TOR   29   8    1    3   2    7   2   52
Elvis Andrus     TEX   26   5    0    2   7    9   3   52
Evan Longoria    TBA   26   9    1    2   1    9   2   50
Alexei Ramirez   CHA   30   4    1    0   1   11   3   50
Brandon Moss     OAK   24  12    2    0   0    9   2   49
Ian Kinsler      DET   29   2    0    1   3   12   0   47
Torii Hunter     DET   19   3    3    0   0   20   1   46

The information used here was obtained free of charge from and is copyrighted by Retrosheet.

Runs Participated In 

The addition of Runs Assisted allows us to expand the Runs Participated In (RPI) measure.  The current RPI definition is the number of runs to which a player made a direct contribution.  It is calculated by adding runs scored and RBI and then subtracting home runs:

   RPI = RS + RBI - HR

RPI was first introduced as runs produced in the 1950s by Sports Illustrated writer Bob Creamer but was more recently renamed RPI by Tom Tango.  If  Kendrick doubles and then scores on a single by Erick Aybar, neither player actually produces the run by himself.  Both participate in creating the run but neither is 100% responsible for producing the run.  Thus, the name “runs participated in” is more appropriate than "runs produced".  Home runs are subtracted in the RPI formula, so that a player does not get credit for two runs (an RBI and a run scored) when he only participated in one team run. 

Adding Runs Assisted to the RPI formula yields:

   RPI = RS + RBI + RAS - HR 
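
As a quick arithmetic check, the formula can be written as a small Python function. This is just a sketch; the function and argument names are my own for illustration. Leaving `runs_assisted` at zero gives the original RPI definition.

```python
def runs_participated_in(runs, rbi, home_runs, runs_assisted=0):
    """RPI = RS + RBI + RAS - HR (runs_assisted=0 gives the original RPI)."""
    return runs + rbi + runs_assisted - home_runs

# A solo home run is one team run: 1 run scored + 1 RBI - 1 HR = 1.
solo_homer = runs_participated_in(runs=1, rbi=1, home_runs=1)

# Mike Trout's 2014 line from Table 2: R=115, RBI=111, RAS=44, HR=36.
trout_original = runs_participated_in(115, 111, 36)
trout_expanded = runs_participated_in(115, 111, 36, runs_assisted=44)

print(solo_homer, trout_original, trout_expanded)  # 1 190 234
```

The solo home run case shows why home runs are subtracted: without the subtraction the batter would be credited with two runs for one team run.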

One might question whether a Run Assisted should count as much as a run scored or an RBI, since it is more likely to also produce an out. I would guess that a player getting an assist typically contributes less to the run than a player with a run scored or RBI (although the opening example shows that is not always the case). More complicated statistics involving linear weights are better suited to answering that question. By definition, runs scored, RBI and Runs Assisted all count the same in the Runs Participated In measure.


Also, remember that RPI does not address the biases of runs scored and RBI (and RAS for that matter). It is still the case that some players have more opportunities to contribute to runs based on their teammates and batting order position. RPI is not a replacement for something like Batting Runs, but rather a simple alternative for those who prefer to look at actual runs scored.

Keeping the above caveats in mind, the American League RPI Leaders are listed in Table 2 below. AL MVP winner Mike Trout led the league with 234 RPI, followed by Tigers slugger Miguel Cabrera (230) and Blue Jays outfielder Jose Bautista (223). Other Tigers among the leaders were Kinsler (222) and Victor Martinez (213).

Table 2: AL Runs Participated In Leaders, 2014
Player            Team   PA    R  RBI  RAS  HR  RPI
Mike Trout        ANA   705  115  111   44  36  234
Miguel Cabrera    DET   685  101  109   45  25  230
Jose Bautista     TOR   673  101  103   54  35  223
Ian Kinsler       DET   726  100   92   47  17  222
Howard Kendrick   ANA   674   85   75   68   7  221
Josh Donaldson    OAK   695   93   98   56  29  218
Victor Martinez   DET   641   87  103   55  32  213
Brian Dozier      MIN   707  112   71   53  23  213
Michael Brantley  CLE   676   94   97   42  20  213
Albert Pujols     ANA   695   89  105   42  28  208
Erick Aybar       ANA   642   77   68   68   7  206
Evan Longoria     TBA   700   83   91   50  22  202
Adam Jones        BAL   682   88   96   44  29  199
Alexei Ramirez    CHA   657   82   74   50  15  191
Melky Cabrera     TOR   621   81   73   52  16  190




8 comments:

  1. A nice stat and a fascinating chart, Lee. It occurs to me that it has a lot of the same issues as the old RBI and Runs stat though; you actually allude to this. If you play on a team with lots of good hitters in front of you and behind you, you are going to be involved in a lot of runs scored. Kinsler, it seems to me, did not have a particularly good offensive season, in particular a disappointing second half. He is among the leaders in Runs Assisted in large part because he had two likely future Hall-of-Famers batting behind him.

  2. Love the post! Kinsler seems even more valuable to me now. How did you get the info from Retrosheet to the player? If you do it by hand, it'd take a long time. Do you use a formula or coding of some sort?

  3. I'm not crazy/patient enough to do it by hand! I programmed it in SAS.

  4. What's SAS? Is it free? Because I'd like to do some things of this nature. Hopefully it's easier to use too because I'm not overly coding savvy.

  5. SAS is very expensive. I get it for free because we use it at work. R is free. There is a book, Analyzing Baseball Data with R, by Max Marchi and Jim Albert. If you don't have much experience programming though, starting with Excel is a good idea.

  6. Thanks man! Really love sabermetrics and want to start doing my own stuff. I might check out the book. I'm headed into college next fall and want a job in baseball.

  7. I'm thinking of getting your book too!

  8. Good luck with college. You'll definitely want to learn as much computer programming as you can if you are considering a career in baseball analytics. If you have any other questions, let me know.
