Sunday, February 15, 2009

Is reaching base on errors a skill?

Major League Baseball has traditionally recorded reaching base on error as if it were an out in the official statistics. The argument is that reaching base on an error is not the result of something a batter did well but rather is due to a fielding mistake. Thus, the batter should get no positive credit at all. This practice is so ingrained in the record-keeping process that it is very difficult to even find the Reached on Error (ROE) statistic.

Many have argued that some players have the ability, because of their speed, to force errors and that reaching base on fielder miscues is more like a hit than an out. Therefore, they believe that reaching on an error should not count against a batter in the calculation of batting average and that it should count as reaching base in on-base percentage. At the very least, they think it should be recorded separately from outs and more commonly reported.

The purpose here is to investigate whether an error is a random event or an event which some players are more likely to create than others. I looked at the Retrosheet play-by-play database from 2000-2008 and found ROE data for all players during that period. I considered all ground balls that did not result in hits as opportunities to reach base on error and counted the ROEs. I did not include balls hit in the air because it would be hard to argue that those errors were forced by the batter's speed. I calculated ROE percentage (ROE%) for each player by dividing ROE by opportunities. The MLB average ROE% was .034 (or 3.4%).
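The ROE% calculation described above can be sketched in a few lines of Python. This is only an illustration: the function name `roe_pct` is made up for this example, and the counts are copied from the tables in this post rather than pulled from the Retrosheet files.

```python
def roe_pct(roe: int, opportunities: int) -> float:
    """ROE% = times reached on error / ground-ball opportunities."""
    if opportunities <= 0:
        raise ValueError("opportunities must be positive")
    return roe / opportunities


# (ROE, opportunities) pairs copied from the tables in this post
players = {
    "Rondell White": (57, 883),
    "Gary Sheffield": (47, 1017),
    "Placido Polanco": (49, 1449),
}

for name, (roe, opps) in players.items():
    # Three decimals, matching how the post reports ROE%
    print(f"{name}: {roe_pct(roe, opps):.3f}")
```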

There were 281 players with 500 or more opportunities during that period and their ROE% ranged from .016 (Alex Cintron) to .065 (Rondell White). The distribution of ROE% did not look like one that came from a purely random event. Many more players had an ROE% well above .034 than would be expected if reaching on error were random. The more mathematically inclined can see the math at the end of the post*.

The top ROE% from 2000-2008 are listed in Table 1 below. The first thing you might notice is that the list is not made up of speedsters. There is no Juan Pierre or Ichiro Suzuki or any of the other players who come to mind when you think of batters who might force fielders to make errors. Rather, it looks to me like a random list of players with no distinguishing quality.

So, while reaching base on errors is probably not a random event, it also doesn't seem to be the result of speed. It could have something to do with the way the ball spins off a player's bat, the ballpark infields, the official scorers or something else. It's worth further investigation.

Table 2 lists the current Tigers. Gary Sheffield, with his 4.6%, is a player who reaches base on errors more often than would be expected if it were a random event.

Table 1 - ROE% for MLB players 2000-2008

Name               Opps   ROE   ROE%
Rondell White       883    57   .065
Sammy Sosa          806    49   .061
Gabe Kapler         578    35   .061
Ty Wigginton        637    38   .060
Marlon Byrd         506    30   .059
Jeff Cirillo        806    46   .057
Joe Randa           945    53   .056
Tony Graffanino     626    35   .056
Aaron Boone         751    40   .053
Mike Cameron        807    42   .052
Tim Salmon          528    27   .051
Jeff Bagwell        745    38   .051
Craig Biggio       1296    66   .051
Reggie Sanders      653    33   .051
Jeff Kent           970    49   .051
Benito Santiago     560    28   .050



Table 2 - ROE% in 2000-2008 for current Tigers

Name               Opps   ROE   ROE%
Gary Sheffield     1017    47   .046
Adam Everett        598    22   .037
Carlos Guillen      997    36   .036
Placido Polanco    1449    49   .034
Magglio Ordonez    1165    33   .028



*Math:

There were 281 players with 500 or more opportunities to reach base on an error on a ground ball from 2000 to 2008. The population proportion (p) = .034. To test whether a player's ROE% differed significantly from what chance alone would produce, we can use a normal approximation of the binomial. The z-score is z = (ROE% - p)/SE, where SE (standard error) is SQRT(p(1-p)/n) and n is the player's number of opportunities.

For example, Rondell White had 883 opportunities and a ROE% of .065. Thus,
SE = SQRT((.034*.966)/883) = .0061 and z = (.065-.034)/.0061 = 5.08.
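For anyone who wants to reproduce the Rondell White calculation, here is a minimal Python sketch of the same normal approximation. The function name `roe_z_score` is just a label chosen for this example, and the default p of .034 is the MLB average quoted above.

```python
import math


def roe_z_score(roe_pct: float, opportunities: int, p: float = 0.034) -> float:
    """z = (ROE% - p) / SE, with SE = sqrt(p * (1 - p) / n)."""
    se = math.sqrt(p * (1 - p) / opportunities)
    return (roe_pct - p) / se


# Rondell White: 883 opportunities, ROE% of .065
z = roe_z_score(0.065, 883)
print(round(z, 2))  # about 5.08, matching the hand calculation
```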

Z-scores of 1.64 or above suggest that an event may not be random. With 281 players, we would expect about 14 (or 5%) of the players to have z-scores above 1.64 and about 7 (or 2.5%) to have z-scores above 1.96. Instead, we have 37 with z-scores above 1.64 and 16 with z-scores above 1.96. This leads me to believe that reaching base on error is not random (but not necessarily a skill either).
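The expected counts under pure chance can be checked with the upper tail of the standard normal distribution. The sketch below is an illustration only: it assumes a one-sided test, treats the 281 players as independent, and the helper names `upper_tail` and `expected_count` are invented for this example.

```python
import math


def upper_tail(z: float) -> float:
    """P(Z > z) for a standard normal variable, via the complementary error function."""
    return 0.5 * math.erfc(z / math.sqrt(2))


def expected_count(n_players: int, z: float) -> float:
    """Expected number of players above z if every ROE% were due to chance alone."""
    return n_players * upper_tail(z)


for threshold in (1.64, 1.96):
    print(threshold, round(expected_count(281, threshold), 1))
```

Comparing these expected counts with the observed 37 and 16 is what drives the conclusion that the spread in ROE% is wider than chance would produce.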

The information used here was obtained free of charge from and is copyrighted by
Retrosheet. Interested parties may contact Retrosheet at 20 Sunset Rd., Newark, DE
19711.

7 comments:

  1. Lee, Your TigerTalesBlog is always informative with these types of stats. Is it possible to add the Tigers players in this time frame?

  2. I just added the five current Tigers who had enough opportunities to qualify. Sheffield has a high percentage.

  3. Batters who get lots of at bats are skewed toward the high end of ROE%. Speed does not seem to be the cause.

    So, while the effect is real, I'm having trouble with the explanation that it is a hitter skill. So I'm trying to figure how it could have something to do with the fact that these batters get a large number of at bats.

    I was looking for a distribution of errors by inning. I thought that if in general teams made more errors early, or late in a game it might be a place to start to look for an explanation. I couldn't find the data summarized anywhere I could get with a search engine.

  4. Jeff, I'm also struggling to see a hitting skill here. I'm thinking it might be a ballpark effect but I have not looked into it yet. I agree speed is not the reason.

    I'm not sure I understand what you mean about ROE% being skewed towards players with a high number of at bats.

  5. RE Skewing: It might be skewed for all batters. But among those who hit over 500 ground balls as specified in your entry, the tail of the distribution is larger than you would expect if it were a normal distribution. (Basically this is just what you said.)

    I was just looking for something the group has in common to look for an explanation.

  6. It looks like most of these hitters are RH. I've seen data that shows RH hitters are more likely to reach base on infield hits than LH hitters. It's possible the same is true of creating errors, as the SS and 3B have less time to compensate for a fielding mistake and still record an out.

  7. Good suggestion, Nick, and it's something that could be checked, as Retrosheet gives the hit location and the fielder who made the error. It does make sense that players who hit the ball to the left side would reach base on errors more often.
