Tuesday, October 25, 2005

Fun with Similarity Scores

Similarity Scores were introduced by Bill James about 20 years ago. They are now included in the Baseball-Reference database. You select any player, and he is compared against every player in the history of the game to find those most similar to him statistically. For example, the player most similar to Carlos Pena through the age of 27 was Andre Thornton.

James did not develop a really sophisticated algorithm and did not intend for it to be taken as a serious projection tool. In some cases, it might put a player’s career in perspective or give you some kind of clue as to where he might be headed. In other cases, it’s nothing more than a conversation piece. Below, I ran a few Tigers through the program. Note that a perfect similarity score is 1000, which would indicate that two players were exactly alike.
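For anyone curious how a score like this can be computed, here is a minimal sketch in Python. It assumes the general approach of starting at 1000 and subtracting points for each statistical difference between two players. The stat names and divisors are placeholders I made up for illustration, not the published weights, and the sketch ignores any positional adjustment.

```python
# Illustrative sketch only: the stat names and per-point divisors below are
# placeholder assumptions, not the published Similarity Score weights.
HYPOTHETICAL_DIVISORS = {
    "games": 20,        # lose 1 point per 20 games of difference
    "hits": 15,
    "home_runs": 2,
    "walks": 25,
    "strikeouts": 150,
}


def similarity_score(player_a, player_b, divisors=HYPOTHETICAL_DIVISORS):
    """Start at a perfect 1000 and subtract a penalty for each stat difference."""
    score = 1000.0
    for stat, divisor in divisors.items():
        score -= abs(player_a[stat] - player_b[stat]) / divisor
    return score


def most_similar(target, candidates, top_n=10):
    """Rank candidate career lines by similarity to the target, best first."""
    scored = [
        (name, similarity_score(target, line)) for name, line in candidates.items()
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_n]
```

Two identical stat lines lose no points and score exactly 1000, and every difference pulls the score down from there. Nothing in this sketch adjusts for era or ballpark.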

Carlos Pena

Andre Thornton (956)

Mike Epstein (955)

Bob Robertson (952)

Jay Buhner (948)

Don Mincher (939)

Tino Martinez (938)

Phil Plantier (938)

Nick Esasky (937)

Jay Gibbons (936)

Rob Deer (936)

Not surprisingly, Pena is matched with players who hit a lot of home runs and walked a lot but hit for a low average with a lot of strikeouts. Note that the similarity score algorithm does not adjust for era or for ballpark.

Brandon Inge

Roy Smalley (957)

Daryl Spencer (952)

Eli Marrero (950)

Dale Sveum (947)

Luis Rivera (944)

Andre Rodgers (944)

Frank Duffy (942)

Pat Meares (941)

Alex Cora (936)

Billy Myers (936)

It would be nice if he could be matched to Roy Smalley Jr. rather than Roy Smalley Sr.

Craig Monroe

Harry Anderson (982)

Jeffrey Hammonds (967)

Art Shamsky (964)

Hal McRae (961)

Henry Rodriguez (961)

Johnny Rizzo (960)

Bob Nieman (960)

Jim Greengrass (959)

Jerry Lynch (956)

Larry Sheets (956)

Hopefully, he’ll have a better career than Harry Anderson.

Placido Polanco

Todd Walker (932)

Gil McDougald (925)

Adam Kennedy (923)

Fred Dunlap (919)

Jimmie Dykes (915)

Johnny Ray (914)

Odell Hale (911)

Hubie Brooks (907)

Julio Franco (905)

Tony Bernazard (903)

Todd Walker is not a bad comparison. Luckily, Polanco is better defensively.

Carlos Guillen

Johnny Logan (942)

Julio Lugo (935)

Billy Sullivan (932)

Tom Burns (929)

Sam Wise (928)

Alvin Dark (926)

Ron Belliard (926)

Rich Aurilia (925)

Adam Kennedy (924)

Mike Lansing (924)

Logan is not a bad player with whom to be compared. Guillen needs to stay healthy though.

Magglio Ordonez

Mike Sweeney (963)

Wally Berger (952)

Fred Lynn (941)

Tony Oliva (938)

Dave Parker (936)

Larry Walker (925)

Jim Edmonds (925)

Tim Salmon (919)

Ellis Burks (918)

George Foster (916)

There are some interesting names on that list.

Ivan Rodriguez

Ted Simmons (864)

Yogi Berra (822) *

Gary Carter (812) *

Johnny Bench (791) *

Joe Torre (791)

Cal Ripken (788)

Bill Dickey (778) *

Ryne Sandberg (770) *

Joe Cronin (769) *

Bobby Doerr (768) *

Lots of HOFers (that’s what the * is for) there. Hopefully Pudge has something left.

Dmitri Young

Rondell White (950)

Felipe Alou (948)

Richie Zisk (942)

Joe Adcock (939)

Cecil Cooper (938)

Bobby Higginson (938)

Leon Durham (930)

Wally Joyner (929)

Cliff Floyd (927)

Bill Skowron (920)


In case you are wondering, they didn’t do similarity scores for Chris Shelton and Curtis Granderson because their careers are too short at this point.


  1. The statistic is filled with LIES! Brandon Inge is going to be so much better than Dale Sveum. DON'T TELL ME OTHERWISE I WON'T HEAR OF IT.

  2. I find it interesting that the Ordonez comp contains so many players who missed significant chunks of time due to injury.

  3. a couple of names i'm happy to see:

    -roy smalley (my first glove was a roy smalley jr signature model)

    -johnny ray (we share our birthday, and i'm a closet pirates fan)

  4. Geoff, the similarity score matches actually turned out to be more appropriate than I expected for most players.

    Sam, I'd like to tell you that Inge will be better than Sveum but I've about given up on trying to figure out what Inge will do next.

  5. i actually prefer to evaluate not by the similar player scores, but similar through current age. i also like to use similar by age/year for projection. (particularly for hitters; pitchers are a whole different story)

    call me a dork, but i use sim scores as a part of my evaluations in fantasy trades, and also use them when scouting teams for hidden gems.

    of course, it helps when my team is based around these 2 guys:

    albert pujols: (similar through current age-joe dimaggio)
    miguel cabrera: (STCA-hank aaron)

  6. Zimm, I've got Pujols on one of my fantasy teams. I've had him since before his rookie year. I got lucky on that one but I'll never give him up. There's nothing wrong with using sim scores for your fantasy team. I exhaust all avenues with my teams.


