Saturday, October 25, 2008

Hitting for Average

I was over at TigsTown recently and they were discussing which Tigers minor leaguers were best at "hitting for average". They judged the players based on scouting and on statistics. From the scouting perspective, they considered qualities such as consistently making solid contact, hitting all different kinds of pitchers and using the whole field. Statistically, they used items such as batting average, percent of plate appearances that resulted in the batter making contact, batting average on balls in play and strikeouts per at bat. It's premium content so I cannot reveal many of the results, but I'll mention that second baseman Justin Henry fared well on the hitting for average skill on both scouting tools and statistics.

What I wanted to do here is rate the major league Tigers on hitting for average in 2008. I'm not a scout so I just considered statistics and not tools. Since there are more statistics available for major league players, I used a different algorithm than the one used at TigsTown. I started by looking at batting average. The Tigers' 2008 batting averages can be found in Table 1.

Table 1: Tigers Batting Averages in 2008

player              avg
Magglio Ordonez     .317
Placido Polanco     .307
Miguel Cabrera      .292
Carlos Guillen      .286
Curtis Granderson   .280
Edgar Renteria      .270
Marcus Thames       .241
Gary Sheffield      .225
Brandon Inge        .205


The problem with batting average is that what happens after the batter hits the ball is largely out of his control. He can hit line drives that are caught or soft bloopers that escape the grasp of infielders. Oftentimes, these fortunes and misfortunes even out over the course of a season, but sometimes they don't. Thus, a player's batting average is not very repeatable. That is, it varies a lot from season to season (year-to-year correlation = .43).

A statistic which is much more repeatable than batting average is contact percentage (year-to-year correlation = .90). Contact% is the percentage of a batter's swings that result in contact. This stat was taken from Fan Graphs, which has fast developed into one of my favorite sites on the internet. Another contact hitting statistic is strikeouts per at bat, which has a year-to-year correlation of .80. I could have used K/PA instead, but the Fan Graphs database doesn't have all the items needed to calculate plate appearances and merging with my other database would have been more trouble than it's worth right now. The Tigers leaders on contact% and K/AB are presented in Tables 2 and 3 below.
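The repeatability figures quoted above are year-to-year correlations. As a rough sketch of how such a number is computed, here is a plain Pearson correlation in Python. The two lists of averages are made-up values for five hypothetical hitters, included only to show the calculation; they are not real players and not the data behind the .43, .80 or .90 figures.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation between two equal-length lists of numbers."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)

# Hypothetical batting averages for the same five hitters in back-to-back seasons
year1 = [.310, .285, .270, .255, .240]
year2 = [.280, .295, .250, .275, .245]

r = pearson(year1, year2)
print(f"year-to-year correlation: {r:.2f}")
```

A value near 1 means players tend to keep their ranking from one season to the next, while a value near 0 means last year's number says little about this year's.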

Table 2: Contact percentage for Tigers in 2008

player              contact %
Placido Polanco     .927
Carlos Guillen      .862
Edgar Renteria      .856
Magglio Ordonez     .851
Gary Sheffield      .830
Curtis Granderson   .796
Miguel Cabrera      .775
Brandon Inge        .758
Marcus Thames       .743



Table 3: Strikeouts per at bat for Tigers in 2008

player              K/AB
Placido Polanco     .074
Edgar Renteria      .127
Magglio Ordonez     .135
Carlos Guillen      .160
Gary Sheffield      .199
Curtis Granderson   .201
Miguel Cabrera      .205
Brandon Inge        .271
Marcus Thames       .301


It probably comes as no surprise that Placido Polanco led the Tigers in both categories. In fact, he led the American League in both. Contact% and K/AB give us information about ability to make contact, but they tell us nothing about how solid the contact was. Line drive percentage helps us there. You can often tell about a batter's fortunes by looking at line drive percentage. A player with a high line drive percentage relative to his batting average is possibly hitting into a lot of hard outs. Conversely, a batter who has a low line drive rate relative to his batting average is possibly getting a lot of cheap hits. The Tigers' line drive percentages are listed in Table 4.

Table 4: Line drive percentages for Tigers in 2008


player              line drive %
Edgar Renteria      .222
Magglio Ordonez     .204
Carlos Guillen      .202
Miguel Cabrera      .196
Curtis Granderson   .191
Placido Polanco     .187
Marcus Thames       .170
Brandon Inge        .164
Gary Sheffield      .143


It's a little surprising to see Edgar Renteria's high line drive rate. This suggests that he may have been unlucky and that his batting average might rebound in 2009.
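One crude way to spot hitters whose line drive rate and batting average disagree is to compare where each player ranks on the two lists. The sketch below does that with the numbers from Tables 1 and 4. Ranking within just these nine Tigers is my own simplification for illustration, not part of the adjusted average calculation, and the names `ranks` and `gap` are made up for this example.

```python
# 2008 values copied from Table 1 (avg) and Table 4 (line drive %)
AVG = {
    "Magglio Ordonez": .317, "Placido Polanco": .307, "Miguel Cabrera": .292,
    "Carlos Guillen": .286, "Curtis Granderson": .280, "Edgar Renteria": .270,
    "Marcus Thames": .241, "Gary Sheffield": .225, "Brandon Inge": .205,
}
LD = {
    "Edgar Renteria": .222, "Magglio Ordonez": .204, "Carlos Guillen": .202,
    "Miguel Cabrera": .196, "Curtis Granderson": .191, "Placido Polanco": .187,
    "Marcus Thames": .170, "Brandon Inge": .164, "Gary Sheffield": .143,
}

def ranks(stat):
    """Map each player to his rank on a stat (1 = highest value)."""
    ordered = sorted(stat, key=stat.get, reverse=True)
    return {name: position + 1 for position, name in enumerate(ordered)}

avg_rank, ld_rank = ranks(AVG), ranks(LD)

# Positive gap: ranks better on line drives than on average (possibly unlucky).
# Negative gap: ranks better on average than on line drives (possibly fortunate).
gap = {name: avg_rank[name] - ld_rank[name] for name in AVG}

for name, value in sorted(gap.items(), key=lambda item: -item[1]):
    print(f"{name:18s} avg rank {avg_rank[name]}  LD rank {ld_rank[name]}  gap {value:+d}")
```

With only nine players this is a quick sanity check rather than evidence of luck, but it does single out Renteria as the hitter whose line drive rate most outpaces his average.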

I combined the above four items to arrive at one statistic which describes the hitting for average skill. First, I normalized each number so that they all had the same scale: an average of 0 and a standard deviation of 1. Then I assigned weights to each statistic denoting their importance. The most important statistic is batting average (after all, the skill is called hitting for average), so I gave twice as much weight to batting average as to each of the other numbers:

0.4 x BA + 0.2 x contact% - 0.2 x K/AB + 0.2 x LD%

Finally, I reversed the normalization on the result so that we get back to the original batting average scale.
The way it works is like this: Edgar Renteria had only a .270 batting average. However, his contact, strikeout and line drive rates were all very good. So, his adjusted batting average goes up to .284. The results for all Tigers are listed in Table 5.

Table 5: Tigers hitting for average summary in 2008

player              avg    contact %   K/AB   line drive %   adjusted avg
Placido Polanco     .307   .927        .074   .187           .304
Magglio Ordonez     .317   .851        .135   .204           .299
Carlos Guillen      .286   .862        .160   .202           .285
Edgar Renteria      .270   .856        .127   .222           .284
Miguel Cabrera      .292   .775        .205   .196           .275
Curtis Granderson   .280   .796        .201   .191           .271
Gary Sheffield      .225   .830        .199   .143           .245
Marcus Thames       .241   .743        .301   .170           .239
Brandon Inge        .205   .758        .271   .164           .227
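The normalize, weight and rescale steps can be sketched in Python as follows. This is a rough illustration under two assumptions of mine: each statistic is normalized using only the nine Tigers listed here, and the weighted score is mapped back using the mean and standard deviation of these nine batting averages. The post does not say which population was used for normalizing, so the output will not match the adjusted averages in Table 5 exactly. The names `TIGERS`, `WEIGHTS`, `z_scores` and `adjusted_averages` are made up for this example.

```python
from statistics import mean, pstdev

# 2008 values copied from Tables 1-4: (avg, contact %, K/AB, line drive %)
TIGERS = {
    "Placido Polanco":   (.307, .927, .074, .187),
    "Magglio Ordonez":   (.317, .851, .135, .204),
    "Carlos Guillen":    (.286, .862, .160, .202),
    "Edgar Renteria":    (.270, .856, .127, .222),
    "Miguel Cabrera":    (.292, .775, .205, .196),
    "Curtis Granderson": (.280, .796, .201, .191),
    "Gary Sheffield":    (.225, .830, .199, .143),
    "Marcus Thames":     (.241, .743, .301, .170),
    "Brandon Inge":      (.205, .758, .271, .164),
}

# Weights from the formula; K/AB is negative because fewer strikeouts is better
WEIGHTS = (0.4, 0.2, -0.2, 0.2)

def z_scores(values):
    """Rescale values to mean 0 and standard deviation 1."""
    mu, sigma = mean(values), pstdev(values)
    return [(v - mu) / sigma for v in values]

def adjusted_averages(players, weights=WEIGHTS):
    names = list(players)
    # One tuple per statistic, in the same player order as `names`
    columns = list(zip(*(players[name] for name in names)))
    z_columns = [z_scores(column) for column in columns]
    # Assumption: go back to the batting average scale of this same group
    ba_mean, ba_sd = mean(columns[0]), pstdev(columns[0])
    result = {}
    for i, name in enumerate(names):
        score = sum(w * z[i] for w, z in zip(weights, z_columns))
        result[name] = ba_mean + ba_sd * score
    return result

adjusted = adjusted_averages(TIGERS)
for name, value in sorted(adjusted.items(), key=lambda item: -item[1]):
    print(f"{name:18s} {value:.3f}")
```

Even with this simplified normalization, Polanco comes out on top and Inge at the bottom, and Renteria's adjusted figure lands above his actual .270.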


Renteria and Sheffield both had significantly better adjusted batting averages than real batting averages. Magglio Ordonez and Miguel Cabrera had adjusted batting averages which were significantly lower than their real averages. Finally, according to this algorithm, Polanco was the most skilled Tiger at hitting for average in 2008 and Brandon Inge was the worst.

5 comments:

  1. The good news is that if hitting for average correlates with batting average, Sheffield is as likely to hit .270 as .220.

    The bad news is that if hitting for average correlates with batting average, Inge is as likely to bat .200 as .250.

  2. I have just looked at one year of data so far so I'm not sure how predictive adjusted average is. However, those low line drive percentages for Sheffield and Inge tell me that their low batting averages last year were probably not the result of bad luck.

  3. Lee, I don't see the differences between the actual and projected averages as being significant. The most dramatic difference was Inge with a 22 point difference, or roughly 1.1 hits every 50 ABs. It seems that randomness or other factors such as speed and power may be more significant and defeat any predictive value the analysis might hold. I do see its value for projecting minor leaguers' BA. I suspect you would see greater differences in their figures.

  4. The reason there were not big differences between batting average and adjusted batting average is that batting average was part of the calculation. It would be the same for minor leaguers. It might be interesting to try to predict BA from contact rate, line drive rate and K pct.

    I don't think the adjusted BA has any significant predictive value. I think there are three basic skills that contribute to statistics like OPS and RC: hitting for average, power and plate discipline. This was an attempt to isolate the hitting for average skill. I think it might work a little better than batting average. Next, I'm going to look at power which is a lot simpler because isolated power is more repeatable than BA and probably doesn't need to be adjusted.

  5. Great work as always on this stuff, Lee.

    Mike
    www.DailyFungo.com

