Saturday, December 29, 2007

Converting Zone Rating to Something Useful

In Part 1 of my 2007 fielding analysis, I ranked the Tigers on range factor and zone rating. Range factor is not so useful now that more sophisticated measures are available. Zone rating is still regarded as one of the better measures and it’s also very accessible throughout the year. In this post, I will explain how Zone Rating can be translated into something more useful than just a percentage. The Table of Contents for the entire fielding series is shown below:

Basic fielding stats
Converting Zone Rating to something useful
Revised Zone Rating
Probabilistic Model of Range
Fielding Bible
Ultimate Zone Rating
Fan Fielding Survey versus range measures
Outfield arms
Ranking the second basemen
Ranking the shortstops
Ranking the third basemen
Ranking the first basemen
Ranking the center fielders
Ranking the right fielders
Ranking the left fielders
What about catchers?

It would be nice if we could change ZR from a percentage to something like plays made above average or runs saved above average. We cannot do this precisely without the actual Balls In Zone data from STATS, but we can get estimates. Chris Dial, who writes for Baseball Think Factory and has access to some of the data used for calculating ZR, has developed a method for obtaining these kinds of estimates.

Dial takes the average number of opportunities at each position and the player's zone rating to calculate the number of plays the player made, then compares that to the number of plays that a player with a league-average ZR would have made. The result is Plays Made Above Average (PMAA). He also estimates the approximate run value of a ball in play at each position. His run value methodology is found here. You will notice that, although it varies by position, an extra play made is worth about 0.8 runs on average. Tom Tango explains it more intuitively on his blog. From those run values, Dial determines the runs a player saves above or below the average player at his position (RSAA).

The table below shows the ZR, PMAA and RSAA for Tigers players in 2007 (those who were with the team in 2007 and those who have been acquired this off-season). PMAA/150 is PMAA prorated to 150 games played, and RSAA/150 is RSAA prorated to 150 games played. These two columns allow us to better compare players with different numbers of games played. The final column shows how each player ranks on RSAA/150. This may be slightly different from the ZR ranks due to the games played adjustment.

I'll use Jacque Jones as an example. Jones played 645 innings and had a Zone Rating of .904 as a center fielder in 2007. Given his playing time, he made 5 more plays than the average center fielder. This comes out to 4 runs saved above the average center fielder. If we assume the same level of performance over 150 games, Jones would have made 11 plays above average and saved 9 runs above average.
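The arithmetic behind these conversions can be sketched in a few lines of Python. This is only a rough illustration, not Dial's actual implementation. The function names are invented for the example. The league-average ZR (.880) and the opportunity count (220) are made-up inputs chosen to land near Jones's row, not real STATS figures. The flat 0.8 runs per play stands in for the position-specific run values, and treating 9 innings as one game for the 150-game proration is an assumption.

```python
def plays_above_average(zr, league_zr, opportunities):
    """Estimated plays made above a league-average fielder (PMAA)."""
    return (zr - league_zr) * opportunities


def runs_saved_above_average(pmaa, runs_per_play=0.8):
    """Convert plays above average to runs (RSAA).

    0.8 is the rough all-position average quoted in the text; the
    real run values vary by position.
    """
    return pmaa * runs_per_play


def per_150_games(value, innings, innings_per_game=9.0):
    """Prorate a counting stat to 150 games, assuming 9 innings = 1 game."""
    games = innings / innings_per_game
    return value * 150.0 / games


# Hypothetical inputs: a .904 ZR against an assumed .880 league average
# over an assumed 220 opportunities in 645 innings.
pmaa = plays_above_average(0.904, 0.880, 220)
rsaa = runs_saved_above_average(pmaa)
print(round(pmaa, 1), round(rsaa, 1))
print(round(per_150_games(pmaa, 645), 1), round(per_150_games(rsaa, 645), 1))
```

Because the table values are rounded, prorating the rounded PMAA and RSAA will not always reproduce the per-150 columns exactly.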

The leading Tigers in runs saved per 150 games in 2007 were Brandon Inge (12), Curtis Granderson (10), Magglio Ordonez (10) and Sean Casey (8). The only Tiger who was below average was Carlos Guillen, who had an RSAA/150 of -7. New acquisition Jacque Jones finished 9 runs above average. However, Miguel Cabrera (-16) and Edgar Renteria (-9) were below average. Based on this statistic, it appears that the Tigers did not help their defense during the off-season.

The Replacement Level Yankee Weblog now includes a zone rating database that has Zone Ratings along with conversions to runs saved above average for all players from 1987-2007. Just go to the left sidebar and you'll see a link to the database.


Table 1: Plays Made and Runs Saved by Tigers fielders - 2007

POS   #  Player     Innings    ZR  PMAA  RSAA  PMAA/150  RSAA/150  RSAA/150 Rank
1B   29  Casey        989.0  .886     7     6        10         8              7
2B   28  Polanco     1209.0  .828     5     4         5         4             11
3B   27  Inge        1310.2  .803    15    12        15        12              4
3B   27  Cabrera     1311.2  .714   -20   -16       -20       -16             25
SS   30  Guillen     1074.0  .807    -7    -5        -9        -7             20
SS   30  Renteria    1019.1  .800    -9    -7       -12        -9             23
LF   27  Monroe       806.2  .882     5     4         9         7             10
CF   27  Granderson  1285.0  .908    12    10        12        10              4
CF   27  Jones        645.0  .904     5     4        11         9              6
RF   28  Ordonez     1221.0  .908    11     9        12        10              2

Thursday, December 27, 2007

Basic Fielding Stats - 2007

Today's article is the first in a series looking at individual fielding statistics for the Tigers and the rest of baseball in 2007. It is more difficult to measure fielding than hitting or even pitching, so I will discuss several different options over the next couple of weeks. I will start with the most basic of fielding stats and work my way up to the more sophisticated modern methods. Finally, I will rank the players at each position by aggregating all the stats. Today I’ll give a brief history of fielding stats and discuss the most frequently used measures. After that, I’ll show how the Tigers rank on these measures. In future posts, I’ll discuss some of the newer fielding measures. The table of contents for the series is listed below:

Basic fielding stats
Converting Zone Rating to something useful
Revised Zone Rating
Probabilistic Model of Range
Fielding Bible
Ultimate Zone Rating
Fan Fielding Survey versus range measures
Outfield arms
Ranking the second basemen
Ranking the shortstops
Ranking the third basemen
Ranking the first basemen
Ranking the center fielders
Ranking the right fielders
Ranking the left fielders
What about catchers?

The most commonly reported fielding measure is fielding percentage (FPCT), which measures how infrequently fielders make errors on balls they reach. It is calculated as (total plays – errors)/total plays. Not making errors is a positive thing so this statistic has some value, but it also has some important flaws. First, errors are subjective and judgment varies from one official scorer to the next.

More importantly, FPCT says nothing about range. Some players get to a lot of balls which other players cannot reach. Measurement of range is an area of sabermetrics which is still developing. The two most accessible range statistics, Range Factor (RF) and Zone Rating (ZR), are discussed below.

Range Factor (RF) was originated by baseball writer Al Wright in the 1870s but it was virtually ignored for over 100 years until Bill James re-introduced it around 1980. Range Factor = (putouts + assists)/games. One of the limitations of RF is that it is a team-dependent statistic. For example, an outfielder playing behind a predominantly ground ball staff will have fewer opportunities than an outfielder on a staff with a lot of fly ball pitchers. Similarly, the range factors of all fielders on a team will be affected if their pitchers strike out a lot of batters.
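For readers who like to see the arithmetic, here is a minimal Python sketch of fielding percentage and range factor as defined above. The function names and the sample numbers are made up for illustration, and I am taking "total plays" to mean total chances (putouts + assists + errors).

```python
def fielding_percentage(putouts, assists, errors):
    """(total plays - errors) / total plays, with total plays = PO + A + E."""
    total_plays = putouts + assists + errors
    return (total_plays - errors) / total_plays


def range_factor(putouts, assists, games):
    """(putouts + assists) / games."""
    return (putouts + assists) / games


# Made-up season line for an outfielder: 400 putouts, 10 assists,
# 5 errors in 140 games.
print(round(fielding_percentage(400, 10, 5), 3))
print(round(range_factor(400, 10, 140), 2))
```

Note that neither function knows how many balls were hit toward the fielder, which is exactly the weakness described above.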

Zone Rating (ZR) was developed by John Dewan when he worked at STATS, Inc. in the early 1990s. STATS divided the baseball field into small areas and assigned these areas to fielders as follows: based on hit location data, if at least half the balls hit into a certain area are converted into outs by all the players at a given position, then that area is considered to be part of the zone for that position. For example, if 1,000 balls are hit into area X and 506 are converted into outs by baseball's shortstops, then area X is considered to be part of the zone for the shortstop position.
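The zone assignment rule can be sketched as follows. This is only my reading of the description, not the actual STATS procedure: the function name, the data layout and area "Y" are invented for the example, and the "at least half" cutoff is an assumption based on the 506-of-1,000 case.

```python
def zone_areas(area_totals, threshold=0.5):
    """Return the areas where at least `threshold` of balls became outs.

    area_totals maps an area label to (balls_hit, outs_made), summed over
    every player at one position.
    """
    return {
        area
        for area, (balls, outs) in area_totals.items()
        if balls > 0 and outs / balls >= threshold
    }


shortstop_totals = {
    "X": (1000, 506),  # the example from the text: 50.6% converted
    "Y": (1000, 310),  # made-up area that falls short of half
}
print(zone_areas(shortstop_totals))
```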

The calculation of ZR for a given player considers three factors: the number of balls hit into his zone while he is in the game (Balls In Zone or BIZ), the number of these balls which he converts into outs (Plays In Zone or PIZ) and the number of plays he makes outside his zone (Outside of Zone or OOZ). The ZR is computed as follows:

ZR = (PIZ + OOZ)/(BIZ + OOZ).

So, ZR can be regarded as the percentage of balls in a player's zone that he converts into outs plus extra credit for successful plays he made outside his zone. Since Zone Rating penalizes a player for errors by considering them to be missed opportunities, it makes FPCT almost obsolete.
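The ZR formula above translates directly into code. The counts below are made up for illustration since the real BIZ, PIZ and OOZ figures are not published here; the function name is also just a placeholder.

```python
def zone_rating(plays_in_zone, plays_out_of_zone, balls_in_zone):
    """ZR = (PIZ + OOZ) / (BIZ + OOZ)."""
    return (plays_in_zone + plays_out_of_zone) / (balls_in_zone + plays_out_of_zone)


# Made-up counts: 300 balls in zone, 255 converted, 20 plays made outside it.
in_zone_only = zone_rating(255, 0, 300)
with_extra_credit = zone_rating(255, 20, 300)
print(round(in_zone_only, 3), round(with_extra_credit, 3))
```

Because the out-of-zone plays are added to both the numerator and the denominator, they raise the rating of any fielder who converts less than 100% of the balls in his zone.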

One of the problems with zone rating is that it depends on the reliability of those collecting the data. They need to judge whether balls were actually hit into the zone and distinguish between fly balls and line drives as there is a different zone for each. Another drawback of zone rating is that it treats all balls in the zone the same way even though it may be more difficult to reach some balls within the zone than others.

Table 1 shows how Tiger fielders (2007 Tigers and recent acquisitions) rank on FPCT, RF and ZR among players with 600 or more innings in 2007. Fielding percentage does not tell us much about fielding performance but has been included in the table to show how deceptive it can be in the evaluation of fielding. The discrepancies between range factor and zone rating, two statistics which are supposed to measure the same thing, tell us they are not completely reliable. Zone rating is considered to be more reliable because it is not as team-dependent. In fact, Range Factor is rarely used by sabermetricians anymore.

Using Curtis Granderson as an example, the table tells us the following: There were 27 MLB center fielders who played 600 or more innings in 2007. Granderson had a .989 FPCT which placed him 14th in baseball. His 3.04 RF means that he made about 3 plays per game. He ranked 2nd in baseball on that statistic. His ZR was .908 which says that he converted 90.8% of the balls in his zone into outs (including extra credit for plays he made outside his zone). He ranked 4th in the majors on that measure.

Among others who did well on Zone Rating was Sean Casey who finished 7th in the majors. Note that range measures may not tell us as much about first basemen as they do for other fielders because they do not address throws taken from infielders and this is obviously an important part of a first baseman's job.

Others who ranked well on ZR were third baseman Brandon Inge (4th in the majors), center fielder Jacque Jones (6th) and right fielder Magglio Ordonez (2nd). The biggest surprise in that group is probably Ordonez. He definitely seemed to improve last year but 2nd is better than I expected. ZR is just one range measure though. We will see how these players rank on other measures when we get to them.

Another interesting case is Placido Polanco, who had a perfect FPCT but finished only 13th in ZR. That is an illustration of one of the problems with fielding percentage: it ignores how much ground a player covers. It also might surprise some people that Edgar Renteria ranked even lower (23rd) in ZR than Carlos Guillen (20th). Again, ZR is just one range measure. I always recommend looking at more than one measure when evaluating fielding.

Note that these statistics don’t really pertain to catchers so that position will have to be addressed at another time.

The statistics for the table below were abstracted from the ESPN database.


Table 1: Tigers Basic Fielding statistics in 2007

POS Player       #   FPCT  FPCT Rank    RF  RF Rank    ZR  ZR Rank
1B  Casey       29   .998          4  9.41       12  .886        7
2B  Polanco     28  1.000          1  5.08       10  .828       13
3B  Cabrera     27   .941         25  2.51       19  .714       25
3B  Inge        27   .959         16  2.86        8  .803        4
SS  Guillen     30   .955         30  4.29       19  .807       20
SS  Renteria    30   .977         11  4.14       24  .800       23
LF  Monroe      27   .983         17  1.92       17  .882       10
CF  Granderson  27   .989         14  3.04        2  .908        4
CF  Jones       27   .981         25  2.83       11  .904        6
RF  Ordonez     28   .996          3  1.95       21  .908        2

Wednesday, December 26, 2007

Wily Mo and Hairston Jr.

No, the Tigers did not just make a trade involving Wily Mo Pena and Jerry Hairston Jr. I'm in the process of collecting fielding data from around the internet - The Hardball Times, Baseball Musings, Tangotiger.net, Fielding Bible and ESPN. I'm almost done merging everything, but Wily Mo Pena and Jerry Hairston Jr. are messing things up because of their names, multiple positions and, in Pena's case, multiple teams. Anybody who has merged databases before probably understands what I mean. Once I'm done, I'll be talking about fielding data for the next couple of weeks (unless major news interrupts).

In the meantime, I hope everybody is enjoying the holidays.

Saturday, December 22, 2007

Run Preventing Events - 2007

Today, I’ll continue with the batted balls versus pitchers theme which I started earlier this week. A plate appearance can result in any of the following events:

  • Strikeout
  • Base on balls
  • Hit batsman
  • Ground ball
  • Line drive
  • Outfield fly
  • Infield fly

Three of those events are generally favorable events for pitchers:

  • Strikeout
  • Ground ball
  • Infield fly

I call these run preventing events (RPE). Of course, a ground ball is not as easy an out as a strikeout or an infield fly and can have a negative result for a pitcher. However, inducing a lot of ground balls will help to prevent runs over the course of a season. On the other hand, it is good for pitchers to avoid, for the most part, the following events:

  • Base on balls
  • Hit batsman
  • Line drive
  • Outfield fly

Last year, I created a statistic I call Run Preventing Event percentage (RPE%), which is calculated as (SO + GB + IFF)/BFP, where BFP is batters faced by the pitcher. Striking out batters and inducing grounders have been shown to be repeatable skills. Getting batters to hit infield flies is not stable from year to year (correlation = .10 between 2005 and 2006). However, infield flies are relatively rare compared to other batted ball types and including them does not change the RPE% substantially in most cases. Plus, I suspect (without statistical evidence) that this is a real ability for some power pitchers.
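As a quick illustration, the RPE% calculation looks like this in Python. The function name and the sample line are made up for the example; they are not taken from any pitcher's actual totals.

```python
def rpe_percentage(strikeouts, ground_balls, infield_flies, batters_faced):
    """RPE% = (SO + GB + IFF) / BFP, expressed as a percentage."""
    rpe = strikeouts + ground_balls + infield_flies
    return 100.0 * rpe / batters_faced


# Made-up pitcher: 150 strikeouts, 270 ground balls and 20 infield flies
# against 800 batters faced.
print(round(rpe_percentage(150, 270, 20, 800), 1))
```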

It turns out that RPE% is fairly stable with a .66 correlation between 2005 and 2006. It can also be considered a fielding independent stat because, although the end result is not independent of fielders, getting a grounder or infield fly to happen in the first place has nothing to do with fielders. It is as stable as, or more stable than, FIP ERA but it is not weighted and thus does not explain as much about runs allowed. Part of the value of RPE% is its simplicity.
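A year-to-year correlation like the ones quoted above is an ordinary Pearson correlation between the same pitchers' values in two seasons. The sketch below shows the calculation on made-up numbers; it is not the data behind the .66 or .10 figures, and the function name is just a placeholder.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation between two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# Made-up RPE% values for the same five pitchers in consecutive seasons.
year_1 = [57.0, 52.5, 60.1, 49.8, 54.3]
year_2 = [55.2, 53.0, 58.4, 51.9, 52.7]
print(round(pearson(year_1, year_2), 2))
```

A value near 1 means pitchers tend to keep their relative standing from one season to the next, while a value near 0 means one season tells us little about the next.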

There were 65 American League starters with 17 or more starts in 2007. Table 1 lists the RPE% rankings for Tigers starters. Table 2 lists the top 20 pitchers in the league. In general, the RPE% seems to be good for identifying effective pitchers but it gives rather unusual results for a couple of Tiger starters: Jeremy Bonderman and Justin Verlander.

From the tables, we can see that Bonderman (RPE%=57.2) ranked very well (9th in the AL) and was the only Tigers pitcher in the top 20 in 2007. As I mentioned in the starting pitcher FIP ERA article, Bonderman does well on fielding independent stats (strikeouts, walks, ground balls) but allows runs in bunches, particularly in the first inning. His problems with runners on base are illustrated by his OPS against splits: .827 with runners on base, .888 with runners in scoring position and .748 with the bases empty.

In contrast to Bonderman, Verlander had an RPE% (52.1) which was close to league average. Verlander had a lower than average ground ball percentage (41%) but had very good results on fly balls and line drives. According to The Hardball Times Baseball Annual 2008, he allowed fewer runs per line drive (.32) than any other American League pitcher. He also finished in the top 5 in runs per outfield fly (.14). This may be an indication that he did not allow well hit balls when runners were on base.

The other three pitchers expected to start for the Tigers in 2008 had RPE percentages of league average or just below: Kenny Rogers (52.0), Dontrelle Willis (51.5) and Nate Robertson (51.1).

The raw data used in calculating RPE% were abstracted from The Hardball Times database.


Table 1: Run Preventing Events for Tigers Starters in 2007

Rank  Name          IP   SO   GB  IF  RPE  RPE%
   9  Bonderman  174.3  145  266  20  431  57.2
  31  Verlander  201.7  183  245  23  451  52.1
  34  Willis     205.3  146  322  17  485  51.5
  41  Robertson  177.7  119  266  14  399  51.1
  50  Durbin     127.7   66  193  17  276  49.2
   .  Jurrjens    30.7   13   37   6   56  45.9
   .  Maroth      78.3   28  120   9  157  45.4
   .  Miller      64.0   56  102   7  165  53.4
   .  Rogers      63.0   36  102   5  143  52.0



Table 2: Top 20 AL Starters by RPE% in 2007

Rank  Name       Team     IP   SO   GB  IF  RPE  RPE%
   1  Hernandez  SEA   190.3  165  357  11  533  66.0
   2  Carmona    CLE   215.0  137  431   8  576  65.5
   3  Burnett    TOR   165.7  176  239  15  430  62.2
   4  Bedard     BAL   182.0  221  216  12  449  61.3
   5  Beckett    BOS   200.7  194  276  29  499  60.7
   6  Wang       NYA   199.3  104  381  12  497  60.4
   7  McGowan    TOR   169.7  144  264  10  418  59.3
   8  Halladay   TOR   225.3  139  391  19  549  59.2
   9  Bonderman  DET   174.3  145  266  20  431  57.2
  10  Sabathia   CLE   241.0  209  324  25  558  57.2
  11  Loe        TEX   136.0   78  269   3  350  56.9
  12  DiNardo    OAK   131.3   59  249   7  315  56.8
  13  Kazmir     TB    206.7  239  238  23  500  56.4
  14  Westbrook  CLE   152.0   93  264   7  364  56.2
  15  Haren      OAK   222.7  192  304  28  524  56.0
  16  Shields    TB    215.0  184  279  25  488  55.8
  17  Tavarez    BOS   134.7   77  252   8  337  55.8
  18  Meche      KC    216.0  156  321  26  503  55.5
  19  Gaudin     OAK   199.3  154  318  18  490  55.3
  20  Blanton    OAK   230.0  140  359  26  525  55.3
