(Warning: This post is going to be even more mathy than my usual posts, but Mike Rogers and hopefully others will want to see it)
In yesterday's post about Max Scherzer, I listed the pitchers who scored highest on the K9+ measure (strikeout rate per nine innings relative to league average) in the history of the game. The K9+ statistic is like OPS+ or ERA+ except that it measures pitcher strikeouts. It turned out that Dazzy Vance, a Brooklyn Dodger hurler in the 1920s and 30s, dominated the list with five of the top seven K9+ seasons ever. Mike Rogers, whom you may know from Bless You Boys, Beyond the Box Score and other places, pointed out in a comment that Bill Petti had done something similar a while ago at FanGraphs.
Mr. Petti used strikeout percentage or K% (the percentage of batters faced by a pitcher that result in strikeouts) rather than strikeouts per nine innings as his base. He then computed K%+ (strikeout percentage relative to league average) and also found that Vance dominated the leaderboard. Many commenters at FanGraphs suggested he try computing something called a Z-Score to compare pitchers from different years. I'm not sure whether he or anyone else ever got around to it, but I can't find it, so I'm going to try it here.
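For readers who want to follow the arithmetic, here is a minimal sketch of the two base statistics. The function names and the sample numbers are made up for illustration, and the scaling of K%+ (100 times the ratio to league average, as with OPS+ or ERA+) is my assumption about how the index is built.

```python
def k_pct(strikeouts: int, batters_faced: int) -> float:
    """Strikeout percentage: strikeouts per 100 batters faced."""
    return 100.0 * strikeouts / batters_faced


def k_pct_plus(pitcher_k_pct: float, league_k_pct: float) -> float:
    """Index where 100 means league average (assumed OPS+-style scaling)."""
    return 100.0 * pitcher_k_pct / league_k_pct


# Hypothetical pitcher: 200 strikeouts against 800 batters faced,
# in a league that strikes out 15% of the batters it faces.
pitcher = k_pct(200, 800)
index = k_pct_plus(pitcher, 15.0)
print(round(pitcher, 1), round(index))
```

A pitcher with a K%+ of 200 under this scaling strikes out batters at twice the league rate.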
In general, when one tries to rank players from different eras relative to league average, the best players from earlier eras tend to fare better than players from more modern eras. That is OK, of course, in cases when they actually were better, but in many cases they were getting an unfair advantage. What seems to be a drop in quality of star players over time is often actually a decrease in the spread of talent. In the early days of the game, there was a relatively small percentage of players who could play the game really well. So, these players were way ahead of the average player.
In more modern times, there is a larger pool of players from which to draw (largely due to the integration of African Americans, Latinos and Asians) and more players have learned how to play the game well. Thus the best players are not so far ahead of the average ones. In other words, the best players are not getting worse, rather the average player is getting better.
In order to adjust for the tighter distribution of talent over time, we use standard deviations. The standard deviation is a measure of the spread of numbers for a particular statistic. If K% varies a lot from pitcher to pitcher in a given year, there will be a large standard deviation. If all the pitchers have a K% close to average in a given year, there will be a small standard deviation.
A look at the data shows that standard deviations are actually higher in more recent years, but that is because the strikeout percentages are higher. The coefficients of variation (standard deviation relative to the average) are lower and that will make a difference in our results.
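To make the difference between the two measures of spread concrete, here is a small sketch with invented K% values for two eras. These are not real league data, and using the population standard deviation (`pstdev`) rather than the sample version is an assumption on my part.

```python
from statistics import mean, pstdev


def coefficient_of_variation(values):
    """Standard deviation divided by the mean (unitless spread)."""
    return pstdev(values) / mean(values)


# Invented K% values for five pitchers in each era (illustration only).
early_era = [5.0, 6.0, 7.0, 8.0, 14.0]
modern_era = [14.0, 17.0, 20.0, 23.0, 26.0]

for label, values in (("early", early_era), ("modern", modern_era)):
    sd = pstdev(values)
    cv = coefficient_of_variation(values)
    print(f"{label}: mean={mean(values):.1f} sd={sd:.2f} cv={cv:.2f}")
```

In this toy example the modern era has the larger standard deviation but the smaller coefficient of variation, which is the same pattern described above.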
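The z-score is calculated as (K% - league K%) / (standard deviation of K%). In other words, it is the number of standard deviations by which a pitcher's K% exceeds the league average. This will give us an idea of which players dominated their leagues most after adjusting for spread of talent.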
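Here is a minimal sketch of that calculation. The two pitchers and their league figures are hypothetical, chosen so that both have the same K%+ (twice the league rate) but pitch in leagues with different spreads.

```python
def z_score(pitcher_k_pct: float, league_k_pct: float, league_sd: float) -> float:
    """Standard deviations above (or below) the league K%."""
    return (pitcher_k_pct - league_k_pct) / league_sd


# Hypothetical early-era pitcher: 16% K in an 8% league with SD 4.0.
early = z_score(16.0, 8.0, 4.0)

# Hypothetical modern pitcher: 40% K in a 20% league with SD 5.0.
modern = z_score(40.0, 20.0, 5.0)

print(early, modern)
```

Both pitchers strike out batters at double the league rate, yet the modern one ends up with the higher z-score because his league is more tightly bunched relative to its average.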
Table 1 below shows the top Z-scores in the history of the game. Vance is still prominent near the top of the list but he has more company now as Pedro Martinez and Randy Johnson have moved up the chart. Those three pitchers hold the top 11 spots on the list.
Table 1: All-time Single-Season Strikeout Percentage Z-Score Leaders
Player | Year | Team | IP | K | K% | K%+ | Z-Score
Pedro Martinez | 1999 | BOS | 213.3 | 313 | 37.5 | 248 | 5.52
Dazzy Vance | 1924 | BRO | 308.3 | 262 | 21.5 | 297 | 5.38
Pedro Martinez | 2000 | BOS | 217.0 | 284 | 34.8 | 229 | 5.14
Dazzy Vance | 1925 | BRO | 265.3 | 221 | 20.3 | 282 | 4.83
Randy Johnson | 1995 | SEA | 214.3 | 294 | 33.9 | 230 | 4.64
Dazzy Vance | 1926 | BRO | 169.0 | 140 | 19.6 | 273 | 4.56
Randy Johnson | 1997 | SEA | 213.0 | 291 | 34.2 | 221 | 4.50
Randy Johnson | 2001 | ARI | 249.7 | 372 | 37.4 | 216 | 4.37
Dazzy Vance | 1923 | BRO | 280.3 | 197 | 16.6 | 233 | 4.33
Randy Johnson | 2000 | ARI | 248.7 | 347 | 34.7 | 210 | 4.27
Randy Johnson | 1999 | ARI | 271.7 | 364 | 33.7 | 210 | 4.25
Nolan Ryan | 1987 | HOU | 211.7 | 270 | 30.9 | 206 | 4.15
Dwight Gooden | 1984 | NYN | 218.0 | 276 | 31.4 | 218 | 4.05
Nolan Ryan | 1976 | CAL | 284.3 | 327 | 27.3 | 227 | 4.02
Nolan Ryan | 1978 | CAL | 234.7 | 260 | 25.8 | 224 | 4.01
Johnny Vander Meer | 1941 | CIN | 226.3 | 202 | 21.4 | 230 | 4.01
Nolan Ryan | 1989 | TEX | 239.3 | 301 | 30.5 | 227 | 3.99
Rube Waddell | 1903 | PHA | 324.0 | 302 | 23.1 | 223 | 3.96
Randy Johnson | 1993 | SEA | 255.3 | 308 | 29.5 | 206 | 3.95
Pedro Martinez | 2002 | BOS | 199.3 | 239 | 30.4 | 197 | 3.95
Table 2 shows the Tigers Strikeout Percentage Z-Score leaders. Hal Newhouser's 1946 season tops the list followed by Mickey Lolich in 1969. I don't know the exact standard deviation for 2012, but Scherzer has a z-score of about 2.61 if we use standard deviations from recent years.
Table 2: Tigers Single-Season Strikeout Percentage Z-Score Leaders
Player | Year | IP | K | K% | K%+ | Z-Score
Hal Newhouser | 1946 | 292.7 | 275 | 23.4 | 207 | 3.25
Mickey Lolich | 1969 | 280.7 | 271 | 23.1 | 158 | 2.71
Hal Newhouser | 1945 | 313.3 | 212 | 16.8 | 188 | 2.69
Hal Newhouser | 1943 | 195.7 | 144 | 16.9 | 178 | 2.56
Justin Verlander | 2009 | 240.0 | 269 | 27.4 | 163 | 2.40
Jim Bunning | 1959 | 249.7 | 201 | 19.4 | 153 | 2.33
Syl Johnson | 1923 | 176.3 | 93 | 12.7 | 168 | 2.28
Tommy Bridges | 1943 | 191.7 | 124 | 16.0 | 168 | 2.24
Bobo Newsom | 1941 | 250.3 | 175 | 15.5 | 168 | 2.20
Bobo Newsom | 1939 | 246.0 | 164 | 15.6 | 171 | 2.12
I will try to update this table after Scherzer's season is over. I also plan to do the same sort of thing with some other pitching and hitting statistics when I have time.
Well done, Lee. This is exactly what I was hoping Bill would do at FG but he never did. I'm glad you got around to doing it because I don't have a database so collecting all of that data would've sucked. Great work. This is the stuff I love to read the most.