Monday, August 27, 2012

Strikeouts and Z-Scores

(Warning: This post is going to be even more mathy than my usual posts, but Mike Rogers and hopefully others will want to see it.)

In yesterday's post about Max Scherzer, I listed the pitchers who scored highest on the K9+ measure (strikeouts per nine innings relative to league average) in the history of the game. The K9+ statistic is like OPS+ or ERA+, except it's for pitcher strikeouts. It turned out that Dazzy Vance, a Brooklyn Dodger hurler in the 1920s and '30s, dominated the list with five of the top seven K9+ marks ever. Mike Rogers, whom you may know from Bless You Boys, Beyond the Box Score and other places, pointed out in a comment that Bill Petti had done something similar a while ago at FanGraphs.

Mr. Petti used strikeout percentage, or K% (the percentage of batters faced by a pitcher that end in strikeouts), rather than strikeouts per nine innings as his base. He then computed K%+ (strikeout percentage relative to league average) and also found that Vance dominated the leaderboard. Many commenters at FanGraphs suggested he try computing something called a Z-score to compare pitchers from different years. I'm not sure if he or anyone else ever got around to it, but I can't find it, so I'm going to try it here.
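For readers who want to follow along, here is a minimal sketch of the two rate statistics just described. The function names are my own, and the batters-faced total in the usage line is a placeholder chosen for illustration (the tables in this post list innings and strikeouts, not batters faced).

```python
def k_pct(strikeouts, batters_faced):
    """Strikeout percentage: strikeouts per 100 batters faced."""
    if batters_faced <= 0:
        raise ValueError("batters_faced must be positive")
    return 100.0 * strikeouts / batters_faced


def k_pct_plus(pitcher_k_pct, league_k_pct):
    """K%+: pitcher K% relative to league K%, scaled so 100 = league average."""
    if league_k_pct <= 0:
        raise ValueError("league_k_pct must be positive")
    return 100.0 * pitcher_k_pct / league_k_pct


# Illustrative inputs only: 313 strikeouts against an assumed 835 batters faced,
# in a league assumed to strike out 15.0% of batters.
example_k_pct = k_pct(313, 835)
example_k_pct_plus = k_pct_plus(example_k_pct, 15.0)
```

A K%+ of 200 would mean the pitcher struck out batters at twice the league rate.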

In general, when one tries to rank players from different eras relative to league average, the best players from earlier eras tend to fare better than players from more modern eras. That is OK, of course, in cases when they actually were better, but in many cases they were getting an unfair advantage. What seems to be a drop in quality of star players over time is often actually a decrease in the spread of talent. In the early days of the game, there was a relatively small percentage of players who could play the game really well. So, these players were way ahead of the average player.

In more modern times, there is a larger pool of players from which to draw (largely due to the integration of African Americans, Latinos and Asians) and more players have learned how to play the game well. Thus the best players are not so far ahead of the average ones. In other words, the best players are not getting worse; rather, the average player is getting better.

In order to adjust for the tighter distribution of talent over time, we use standard deviations. The standard deviation is a measure of the spread of numbers for a particular statistic. If K% varies a lot from pitcher to pitcher in a given year, there will be a large standard deviation. If all the pitchers have a K% close to average in a given year, there will be a small standard deviation.

A look at the data shows that standard deviations are actually higher in more recent years, but that is because the strikeout percentages are higher.  The coefficients of variation (standard deviation relative to the average) are lower and that will make a difference in our results.
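To make the difference between the two measures of spread concrete, here is a small sketch using Python's standard `statistics` module. The K% values are made up for illustration, and I am assuming an unweighted population standard deviation; the post does not say which variant (population or sample, weighted or not) sits behind the tables.

```python
from statistics import mean, pstdev


def spread_summary(k_pcts):
    """Return (average, standard deviation, coefficient of variation)."""
    avg = mean(k_pcts)
    sd = pstdev(k_pcts)  # population SD; statistics.stdev is the sample version
    return avg, sd, sd / avg


# Hypothetical K% values for six pitchers in each era (not real data).
early_era = [5, 6, 7, 8, 9, 13]
modern_era = [14, 16, 18, 20, 22, 26]

early_avg, early_sd, early_cv = spread_summary(early_era)
modern_avg, modern_sd, modern_cv = spread_summary(modern_era)

# The modern group has the larger standard deviation but the smaller
# coefficient of variation, because its average is so much higher.
print(f"early:  avg={early_avg:.1f} sd={early_sd:.2f} cv={early_cv:.3f}")
print(f"modern: avg={modern_avg:.1f} sd={modern_sd:.2f} cv={modern_cv:.3f}")
```

In this toy example the pattern matches the one described above: the raw spread grows with the strikeout rate, while the spread relative to the average shrinks.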

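The z-score is calculated as (K% - league K%) / (standard deviation of K%). A z-score of 4, for example, means a pitcher's K% was four standard deviations above the league average. This will give us an idea of which players dominated their leagues most after adjusting for spread of talent.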
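The formula can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names and the sample pitchers are hypothetical, and I am assuming a population standard deviation taken over whatever group of qualifying pitchers is passed in.

```python
from statistics import pstdev


def k_pct_z_score(pitcher_k_pct, league_k_pct, sd_k_pct):
    """(K% - league K%) / standard deviation of K%."""
    if sd_k_pct <= 0:
        raise ValueError("sd_k_pct must be positive")
    return (pitcher_k_pct - league_k_pct) / sd_k_pct


def season_z_scores(k_pct_by_pitcher, league_k_pct):
    """Z-score for every pitcher in one season.

    k_pct_by_pitcher maps a pitcher name to his K% (hypothetical input shape).
    The standard deviation is taken over the pitchers supplied.
    """
    sd = pstdev(list(k_pct_by_pitcher.values()))
    return {
        name: k_pct_z_score(value, league_k_pct, sd)
        for name, value in k_pct_by_pitcher.items()
    }


# Made-up season: four pitchers, league K% assumed to be 18.0.
sample_season = {"A": 10.0, "B": 14.0, "C": 18.0, "D": 30.0}
sample_z = season_z_scores(sample_season, league_k_pct=18.0)
```

Passing the league K% in as its own argument keeps the sketch neutral on whether "league K%" means the league's aggregate strikeout rate or the simple average of individual pitchers' rates.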

Table 1 below shows the top Z-scores in the history of the game. Vance is still prominent near the top of the list, but he has more company now, as Pedro Martinez and Randy Johnson have moved up the chart. Those three pitchers hold the top 11 spots on the list.

Table 1: All-time Single-Season Strikeout percentage Z-Score Leaders

| Player | Year | Team | IP | K | K% | K%+ | Z-Score |
|---|---|---|---|---|---|---|---|
| Pedro Martinez | 1999 | BOS | 213.3 | 313 | 37.5 | 248 | 5.52 |
| Dazzy Vance | 1924 | BRO | 308.3 | 262 | 21.5 | 297 | 5.38 |
| Pedro Martinez | 2000 | BOS | 217.0 | 284 | 34.8 | 229 | 5.14 |
| Dazzy Vance | 1925 | BRO | 265.3 | 221 | 20.3 | 282 | 4.83 |
| Randy Johnson | 1995 | SEA | 214.3 | 294 | 33.9 | 230 | 4.64 |
| Dazzy Vance | 1926 | BRO | 169.0 | 140 | 19.6 | 273 | 4.56 |
| Randy Johnson | 1997 | SEA | 213.0 | 291 | 34.2 | 221 | 4.50 |
| Randy Johnson | 2001 | ARI | 249.7 | 372 | 37.4 | 216 | 4.37 |
| Dazzy Vance | 1923 | BRO | 280.3 | 197 | 16.6 | 233 | 4.33 |
| Randy Johnson | 2000 | ARI | 248.7 | 347 | 34.7 | 210 | 4.27 |
| Randy Johnson | 1999 | ARI | 271.7 | 364 | 33.7 | 210 | 4.25 |
| Nolan Ryan | 1987 | HOU | 211.7 | 270 | 30.9 | 206 | 4.15 |
| Dwight Gooden | 1984 | NYN | 218.0 | 276 | 31.4 | 218 | 4.05 |
| Nolan Ryan | 1976 | CAL | 284.3 | 327 | 27.3 | 227 | 4.02 |
| Nolan Ryan | 1978 | CAL | 234.7 | 260 | 25.8 | 224 | 4.01 |
| Johnny Vander Meer | 1941 | CIN | 226.3 | 202 | 21.4 | 230 | 4.01 |
| Nolan Ryan | 1989 | TEX | 239.3 | 301 | 30.5 | 227 | 3.99 |
| Rube Waddell | 1903 | PHA | 324.0 | 302 | 23.1 | 223 | 3.96 |
| Randy Johnson | 1993 | SEA | 255.3 | 308 | 29.5 | 206 | 3.95 |
| Pedro Martinez | 2002 | BOS | 199.3 | 239 | 30.4 | 197 | 3.95 |
Data Source: Baseball-Databank.org
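As a rough check on the earlier point about standard deviations versus coefficients of variation, the league K% and standard deviation implied by a row of Table 1 can be backed out from its K%, K%+ and Z-Score columns. This sketch assumes K%+ = 100 × K% / league K% and the z-score formula given above. Because the table values are rounded, the results are approximations, not the actual figures behind the table.

```python
def implied_spread(k_pct, k_pct_plus, z_score):
    """Back out (league K%, SD, coefficient of variation) from one table row.

    Assumes K%+ = 100 * K% / league K% and z = (K% - league K%) / SD.
    """
    league_k_pct = 100.0 * k_pct / k_pct_plus
    sd = (k_pct - league_k_pct) / z_score
    return league_k_pct, sd, sd / league_k_pct


# (player, year, K%, K%+, Z-Score) copied from Table 1.
rows = [
    ("Rube Waddell", 1903, 23.1, 223, 3.96),
    ("Dazzy Vance", 1924, 21.5, 297, 5.38),
    ("Pedro Martinez", 1999, 37.5, 248, 5.52),
    ("Randy Johnson", 2001, 37.4, 216, 4.37),
]

for player, year, k_pct, k_pct_plus, z in rows:
    league, sd, cv = implied_spread(k_pct, k_pct_plus, z)
    print(f"{year} {player}: league K% ~{league:.1f}, SD ~{sd:.2f}, CV ~{cv:.2f}")
```

For these four rows, the implied standard deviations come out larger for the 1999 and 2001 seasons than for 1903 and 1924, while the implied coefficients of variation come out smaller, which is the pattern described earlier.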

Table 2 shows the Tigers' strikeout percentage Z-score leaders. Hal Newhouser's 1946 season tops the list, followed by Mickey Lolich in 1969. I don't know the exact standard deviation for 2012, but Scherzer has a z-score of about 2.61 if we use standard deviations from recent years.


Table 2: Tigers Single-Season Strikeout percentage Z-Score Leaders 

| Player | Year | IP | K | K% | K%+ | Z-Score |
|---|---|---|---|---|---|---|
| Hal Newhouser | 1946 | 292.7 | 275 | 23.4 | 207 | 3.25 |
| Mickey Lolich | 1969 | 280.7 | 271 | 23.1 | 158 | 2.71 |
| Hal Newhouser | 1945 | 313.3 | 212 | 16.8 | 188 | 2.69 |
| Hal Newhouser | 1943 | 195.7 | 144 | 16.9 | 178 | 2.56 |
| Justin Verlander | 2009 | 240.0 | 269 | 27.4 | 163 | 2.40 |
| Jim Bunning | 1959 | 249.7 | 201 | 19.4 | 153 | 2.33 |
| Syl Johnson | 1923 | 176.3 | 93 | 12.7 | 168 | 2.28 |
| Tommy Bridges | 1943 | 191.7 | 124 | 16.0 | 168 | 2.24 |
| Bobo Newsom | 1941 | 250.3 | 175 | 15.5 | 168 | 2.20 |
| Bobo Newsom | 1939 | 246.0 | 164 | 15.6 | 171 | 2.12 |
Data Source: Baseball-Databank.org

I will try to update this table after Scherzer's season is over.  I also plan to do the same sort of thing with some other pitching and hitting statistics when I have time. 

1 comment:

  1. Well done, Lee. This is exactly what I was hoping Bill would do at FG but he never did. I'm glad you got around to doing it because I don't have a database so collecting all of that data would've sucked. Great work. This is the stuff I love to read the most.
