Major League Baseball has traditionally recorded reaching base on error as if it were an out in the official statistics. The argument is that reaching base on an error is not the result of something a batter did well but rather is due to a fielding mistake. Thus, the batter should get no positive credit at all. This practice is so ingrained in the record-keeping process that it is very difficult to even find the Reached on Error (ROE) statistic.
Many have argued that some players have the ability, because of their speed, to force errors and that reaching base on fielder miscues is more like a hit than an out. Therefore, they believe that an error should not count in the calculation of batting average and that it should count as reaching base in on-base percentage. At the very least, they think it should be recorded separately from outs and more commonly reported.
The purpose here is to investigate whether an error is a random event or an event which some players are more likely to create than others. I looked at the Retrosheet play-by-play database from 2000-2008 and found ROE data for all players during that period. I considered all ground balls that did not result in hits as opportunities to reach base on error and counted the ROEs. I did not include balls hit in the air because it would be hard to argue that those errors were forced by the batter's speed. I calculated ROE percentage (ROE%) for each player by dividing ROE by opportunities. The MLB average ROE% was .034 (or 3.4%).
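As a small illustration of the ROE% calculation, here is a minimal Python sketch. The function name `roe_pct`, the `MIN_OPPS` constant, and the three-player sample are assumptions made for the example; the opportunity and ROE counts are copied from the tables in this post, not recomputed from Retrosheet.

```python
def roe_pct(roe: int, opportunities: int) -> float:
    """ROE% = times reached on error / ground-ball opportunities."""
    if opportunities <= 0:
        raise ValueError("opportunities must be positive")
    return roe / opportunities


# (name, opportunities, ROE) copied from the tables in this post.
players = [
    ("Rondell White", 883, 57),
    ("Gary Sheffield", 1017, 47),
    ("Magglio Ordonez", 1165, 33),
]

MIN_OPPS = 500  # hypothetical name for the 500-opportunity cutoff used in the post

# Keep only players who meet the cutoff, then compute each ROE%.
rates = {
    name: roe_pct(roe, opps)
    for name, opps, roe in players
    if opps >= MIN_OPPS
}

for name, rate in rates.items():
    print(f"{name}: {rate:.3f}")
```

Rounded to three decimals, these should agree with the ROE% column in the tables.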
There were 281 players with 500 or more opportunities during that period and their ROE% ranged from .016 (Alex Cintron) to .065 (Rondell White). Statistically, the distribution of ROE% did not look like one that came from a random event. There were many more players with a ROE% well above .034 than would be expected if reaching on error were a random event. The more mathematically inclined can see the math at the end of the post*.
The top ROE% from 2000-2008 are listed in Table 1 below. The first thing you might notice is that the list is not made up of speedsters. There is no Juan Pierre or Ichiro Suzuki or any of the other players who would come to mind when you think of batters who might force fielders to make errors. Rather, it looks to me like a random list of players with no distinguishing quality.
So, while reaching base on errors is probably not a random event, it also doesn't seem to be the result of speed. It could have something to do with the way the ball spins off a player's bat, the ballpark infields, the official scorers, or something else. It's worth further investigation.
Table 2 lists the current Tigers. Gary Sheffield, at 4.6%, is a player who reaches base on error more often than would be expected if it were a random event.
Table 1: ROE% for MLB players 2000-2008
Name | Opps | ROE | ROE% |
Rondell White | 883 | 57 | .065 |
Sammy Sosa | 806 | 49 | .061 |
Gabe Kapler | 578 | 35 | .061 |
Ty Wigginton | 637 | 38 | .060 |
Marlon Byrd | 506 | 30 | .059 |
Jeff Cirillo | 806 | 46 | .057 |
Joe Randa | 945 | 53 | .056 |
Tony Graffanino | 626 | 35 | .056 |
Aaron Boone | 751 | 40 | .053 |
Mike Cameron | 807 | 42 | .052 |
Tim Salmon | 528 | 27 | .051 |
Jeff Bagwell | 745 | 38 | .051 |
Craig Biggio | 1296 | 66 | .051 |
Reggie Sanders | 653 | 33 | .051 |
Jeff Kent | 970 | 49 | .051 |
Benito Santiago | 560 | 28 | .050 |
Table 2: ROE% in 2000-2008 for current Tigers
Name | Opps | ROE | ROE% |
Gary Sheffield | 1017 | 47 | .046 |
Adam Everett | 598 | 22 | .037 |
Carlos Guillen | 997 | 36 | .036 |
Placido Polanco | 1449 | 49 | .034 |
Magglio Ordonez | 1165 | 33 | .028 |
*
Math: There were 281 players with 500 or more opportunities to reach base on an error on a ground ball from 2000 to 2008. The population proportion (p) = .034. To test whether a player's ROE% differed significantly from a chance event, we can use a normal approximation of the binomial. The z-score is z = (ROE% - p)/SE, where SE (standard error) is SQRT(p(1-p)/n) and n is the player's number of opportunities.
For example, Rondell White had 883 opportunities and a ROE% of .065. Thus,
SE = SQRT((.034*.966)/883) = .0061 and z = (.065-.034)/.0061 = 5.08.
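The same calculation can be sketched in a few lines of Python. The function name `roe_z_score` is an assumption made for the example; the formula and the inputs (p = .034, Rondell White's 883 opportunities and .065 ROE%, Gary Sheffield's 1017 opportunities and .046 ROE%) come from this post.

```python
import math


def roe_z_score(roe_pct: float, opportunities: int, p: float = 0.034) -> float:
    """Normal approximation to the binomial: z = (ROE% - p) / SE."""
    # Standard error of a proportion under the league-wide rate p.
    se = math.sqrt(p * (1 - p) / opportunities)
    return (roe_pct - p) / se


z_white = roe_z_score(0.065, 883)       # Rondell White, from Table 1
z_sheffield = roe_z_score(0.046, 1017)  # Gary Sheffield, from Table 2

print(f"Rondell White:  z = {z_white:.2f}")
print(f"Gary Sheffield: z = {z_sheffield:.2f}")
```

Because the inputs are the rounded ROE% values from the tables, the results may differ slightly from z-scores computed on the unrounded rates.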
Z-scores of 1.64 or above suggest that an event may not be random. With 281 players, we would expect about 14 (or 5%) of the players to have z-scores above 1.64 and about 7 (or 2.5%) to have z-scores above 1.96 if reaching on error were purely random. Instead, we have 37 with z-scores above 1.64 and 16 with z-scores above 1.96. This leads me to believe that reaching base on error is not random (but not necessarily a skill either).
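The expected counts can be checked by computing the one-sided tail probability of a standard normal directly rather than quoting it. This is a minimal sketch, assuming the null hypothesis that every player's z-score is a standard normal draw; `upper_tail`, `N_PLAYERS`, and `expected` are names invented for the example, and the observed counts are the ones reported above.

```python
import math


def upper_tail(z: float) -> float:
    """P(Z > z) for a standard normal variable, via the complementary error function."""
    return 0.5 * math.erfc(z / math.sqrt(2))


N_PLAYERS = 281  # players with 500+ opportunities, from the post

# cutoff -> observed number of players above it, as reported in the post
observed = {1.64: 37, 1.96: 16}

# Expected number of players above each cutoff if ROE were pure chance.
expected = {cutoff: N_PLAYERS * upper_tail(cutoff) for cutoff in observed}

for cutoff, count in observed.items():
    print(f"z > {cutoff}: expected {expected[cutoff]:.1f}, observed {count}")
```

Only the upper tail is used because the question is whether more players than expected sit well above the league rate.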
The information used here was obtained free of charge from and is copyrighted by Retrosheet. Interested parties may contact Retrosheet at 20 Sunset Rd., Newark, DE 19711.