The History of Sabermetrics and Its Relation to Math

Tom Lewis
History of Math: Math 498

Ever since baseball began to be played in the eastern United States in the mid-1800s, mathematics has had a close relationship with the national pastime. Baseball arguably makes more use of math than any other sport. It has always been intertwined with algebra, geometry, and statistics. For instance, the trajectory of a curveball or knuckleball can be modeled with trigonometric equations. It has also been proved mathematically that a base runner sliding into first base makes absolutely no sense, although evidently many players remain unconvinced. However, most of the math in baseball comes from one area: statistics. Plain and simple, baseball and statistics go together. When I hear the word statistics, I don't think about my STAT 301 class; my mind goes straight to baseball. In baseball there is a stat for everything: runs batted in, home runs, runs allowed, and innings pitched, to name a few of the more obvious ones. There are also plenty of strange statistics, like batting average in day games or runs batted in with two outs and runners on in the seventh inning or later. So there is no doubt that math and statistics play a vital role in the culture of baseball fans. This love of statistics has led to the creation of a mathematical system called sabermetrics, which has been the latest craze among a handful of general managers.

Sabermetrics is defined as the mathematical and statistical analysis of baseball records. The analysis determines, statistically, which players would ultimately be the most valuable to a team. Basically, it is a system that scouts players on paper rather than the old-fashioned way of sending a scout to see the player in person. There are many skeptics, but for the most part the system has proved very successful. Billy Beane, general manager of the Oakland Athletics and one of the first general managers to use this system, has had enormous success. Working with a team that has been in the bottom two-thirds of team payrolls for years, he has led it to a number of playoff appearances since taking the job in 1997. Also since 1997, Beane has achieved an average cost per win (a team's total payroll divided by its wins in a season) of $388,000. Compare that to the New York Yankees' average of $1.23 million per win. There is also Theo Epstein, the general manager of the Boston Red Sox, who is a big believer in sabermetrics. He accomplished something no other Red Sox general manager had done in over eighty years: winning the World Series. The basic aim of sabermetrics is to improve upon the traditionally used statistics (HRs, RBIs, ERA, etc.), and so far the results suggest that the method is working.
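
The cost-per-win figure above is simple arithmetic, and a minimal sketch may make it concrete. The function names and the sample seasons below are hypothetical and are not real team data. The sketch also assumes that the "average" is the mean of the per-season ratios, which the essay does not specify.

```python
def cost_per_win(payroll: float, wins: int) -> float:
    """Total team payroll divided by wins for a single season."""
    if wins <= 0:
        raise ValueError("wins must be positive")
    return payroll / wins


def average_cost_per_win(seasons) -> float:
    """Mean of the per-season cost per win over (payroll, wins) pairs.

    Assumption: the average is taken season by season, not as
    total payroll over total wins.
    """
    costs = [cost_per_win(payroll, wins) for payroll, wins in seasons]
    return sum(costs) / len(costs)


# Hypothetical (payroll in dollars, wins) pairs, NOT real team data.
example_seasons = [(40_000_000, 100), (36_000_000, 90)]
print(f"${average_cost_per_win(example_seasons):,.0f} per win")
```

A lower number means the team is buying its wins more cheaply, which is the comparison being made between Beane's team and the Yankees.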

The pioneer who headed baseball toward the sabermetric revolution was the baseball magazine editor F.C. Lane, in 1916. Back in the early 1900s, the only stat that was ever really used was batting average, so Lane came up with a formula for determining a player's run value. He simply kept track of 1,000 hits and their results in order to assign each type of hit a coefficient to use in an equation that he developed. This simple linear weights equation was: total run value = (.30*1B) + (.60*2B) + (.90*3B) + (1.15*HR), where a higher weight is given to the more productive hit. This equation, although not always accurate because of its somewhat inaccurate coefficients, was the foundation of sabermetrics. Lane's contributions, on a much smaller scale, resemble the work of some of the very first mathematicians in Egypt and Greece that we have studied. Just like Lane's work, their formulas weren't always right on, but they laid the groundwork for future generations to build upon.
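
Lane's equation is a weighted sum, so it can be sketched in a few lines. The weights come from the formula above. The function name and the sample hit totals are hypothetical and are only there to show the arithmetic.

```python
# Lane's 1916 coefficients as quoted in the essay.
LANE_WEIGHTS = {"1B": 0.30, "2B": 0.60, "3B": 0.90, "HR": 1.15}


def linear_weights_runs(hit_counts, weights=LANE_WEIGHTS):
    """Sum each hit type's count times its run weight.

    Hit types missing from hit_counts are treated as zero.
    """
    return sum(weight * hit_counts.get(hit_type, 0)
               for hit_type, weight in weights.items())


# Hypothetical season totals, not a real player's line.
sample_hits = {"1B": 100, "2B": 30, "3B": 5, "HR": 20}
print(f"{linear_weights_runs(sample_hits):.1f} runs")
```

Passing a different `weights` dictionary reuses the same function for any later set of coefficients on the same four hit types.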

After Lane's work in 1916 there was a long dry spell in the advancement of his linear weights formula. In fact, no further work was done in sabermetrics until 1963, when George Lindsey came up with his run expectancy table. Lindsey was a military man who, together with his father, watched over 400 baseball games and made a detailed statistical analysis of all 24 base/out combinations that can occur in baseball (e.g., 0 outs/1 runner on, 2 outs/2 on, 0 outs/3 on, etc.). After hundreds of hours of tedious observation, he came up with the same type of formula as Lane, only with more accurate coefficients (.41, .82, 1.06, 1.42). Peter Palmer, in 1970, developed a far more advanced equation that brought in even more statistics:
Batting Runs = (.46*1B) + (.80*2B) + (1.02*3B) + (1.40*HR) + (.33*(BB+HBP)) + (.30*SB) - (.60*CS) - (.25*(AB-H)) - (.50*OOB)
An even more accurate and sophisticated system was developed much later, in 1993, by Palmer and John Thorn.
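
Palmer's Batting Runs formula quoted above has enough terms that a short sketch helps show how they combine. The coefficients are the ones in the essay. The function name, parameter names, and sample stat line are hypothetical, and reading OOB as "outs on base" is an assumption, since the essay does not expand the abbreviation.

```python
def palmer_batting_runs(singles, doubles, triples, home_runs,
                        walks, hit_by_pitch, stolen_bases,
                        caught_stealing, at_bats, hits, oob):
    """Palmer's 1970 Batting Runs, using the essay's coefficients.

    oob is the essay's OOB term (assumed to mean outs on base).
    """
    return (0.46 * singles
            + 0.80 * doubles
            + 1.02 * triples
            + 1.40 * home_runs
            + 0.33 * (walks + hit_by_pitch)
            + 0.30 * stolen_bases
            - 0.60 * caught_stealing
            - 0.25 * (at_bats - hits)   # penalty for outs made at bat
            - 0.50 * oob)


# Hypothetical stat line, not a real player's season.
sample_runs = palmer_batting_runs(
    singles=100, doubles=30, triples=5, home_runs=20,
    walks=50, hit_by_pitch=5, stolen_bases=10,
    caught_stealing=5, at_bats=500, hits=155, oob=4)
print(f"{sample_runs:.2f} batting runs")
```

Unlike Lane's equation, this version subtracts value for outs, so a hitter with many hits but far more outs can still come out below zero.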

Bill James, who is currently one of the leading sabermetricians, came up with another formula that is widely used as a tool by general managers around the league. In 1982, James argued that a hitter should be evaluated by his ability to generate runs for his team and not just by the traditional production statistics. After studying large quantities of individual teams' data, he came up with a runs created formula that makes no mention of RBIs:
RUNS = ((HITS + WALKS) x (TOTAL BASES)) / (AT-BATS + WALKS)

One of the foundations of sabermetrics is getting players on base, and this formula reflects that belief. Take for example two well-known players and teammates: David Ortiz and Manny Ramirez. Ortiz has 81 hits, 50 walks, 171 total bases, and 293 at-bats, while Ramirez has 79, 61, 159, and 256. Even though Ortiz has gotten a little more attention this year and his basic production stats are better (Ortiz has 25 home runs, 74 RBIs, and 81 hits; Ramirez has 22, 60, and 79), his runs created score is about 5 runs lower than Ramirez's (65.31 to 70.22), making Ramirez the more valuable offensive player. Another often-used measure is on-base plus slugging, or OPS. It is the most widely used because it can be quickly calculated from simple statistics. It is just a hitter's on-base percentage plus his slugging percentage, and it is another of the formulas that form the foundation for sabermetricians.
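
The Ortiz and Ramirez comparison can be checked directly against the runs created formula. The sketch below uses only the hits, walks, total bases, and at-bats quoted in the essay; the function name is hypothetical.

```python
def runs_created(hits, walks, total_bases, at_bats):
    """Basic runs created: (H + BB) * TB / (AB + BB)."""
    return (hits + walks) * total_bases / (at_bats + walks)


# Season-to-date lines as quoted in the essay.
ortiz = runs_created(hits=81, walks=50, total_bases=171, at_bats=293)
ramirez = runs_created(hits=79, walks=61, total_bases=159, at_bats=256)

print(f"Ortiz:   {ortiz:.2f}")
print(f"Ramirez: {ramirez:.2f}")
```

Ramirez comes out ahead mainly because his extra walks raise the on-base part of the formula while he uses fewer at-bats.
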
As with hitting, sabermetricians also came up with a better tool for measuring the ability of pitchers. For decades, and even still today, the stats for judging a quality pitcher have been wins and a low ERA. However, sabermetricians believe that this is flawed thinking. First of all, a pitcher could have a lousy ERA but still rack up wins because of a high-scoring offense. ERA does measure the rate of a pitcher's efficiency, but it does not tell you the actual benefit of that pitcher over an entire season. This led Thorn and Palmer to develop the pitching runs formula, which more accurately measures a pitcher's worth:
PITCHING RUNS = Innings Pitched x (League ERA / 9) - ER
The factor (League ERA/9) is the league-average number of earned runs allowed per inning. Multiplying that value by a pitcher's innings pitched gives the number of earned runs he would have allowed over the season if he were average. Finally, subtract the earned runs (ER) the pitcher actually allowed for the season. According to sabermetrics, if this number is greater than zero, the pitcher is considered above average. For instance, take Jason Marquis of the St. Louis Cardinals and Mark Hendrickson of the Tampa Bay Devil Rays. Marquis has nine wins, and for years baseball analysts have said Marquis is a good pitcher because he has the ability to win, which to sabermetricians (and to me) means nothing. Meanwhile, Hendrickson has only four wins but an ERA under 4.00. According to the pitching runs formula laid out by Thorn and Palmer, Hendrickson is a far more effective pitcher than Marquis. If you put both pitchers in exactly the same situations, Hendrickson would more often be the more effective of the two.
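
The pitching runs calculation can be sketched the same way. The essay does not give innings pitched or earned runs for Marquis or Hendrickson, so the two pitchers below are hypothetical, as are the league ERA and the function name. Innings are assumed to be true decimals (one third of an inning is 1/3, not the box-score ".1").

```python
def pitching_runs(innings_pitched, earned_runs, league_era):
    """Earned runs an average pitcher would allow, minus those allowed.

    Positive means better than league average over those innings.
    """
    expected_runs = innings_pitched * (league_era / 9)
    return expected_runs - earned_runs


# Hypothetical league ERA and pitcher lines, NOT real data.
LEAGUE_ERA = 4.50
above_average = pitching_runs(innings_pitched=180, earned_runs=80,
                              league_era=LEAGUE_ERA)
below_average = pitching_runs(innings_pitched=180, earned_runs=100,
                              league_era=LEAGUE_ERA)

print(f"Pitcher A: {above_average:+.1f}")
print(f"Pitcher B: {below_average:+.1f}")
```

Because the result scales with innings pitched, it rewards a pitcher for staying effective over a full season in a way that ERA alone does not, and wins never enter the calculation.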

Baseball is a game of tradition, and when sabermetrics really caught on in the late 1990s there were plenty of critics, as there still are today. They point to Paul DePodesta, a disciple of Billy Beane, who was fired last year after just two seasons as general manager of the Los Angeles Dodgers, following a series of strange trades and 91 losses in his final year. Critics also point to Beane's failures in the playoffs as evidence that sabermetrics is a failure. Those in support of sabermetrics will of course point to the successes that these so-called "stat geeks" have had. Beane and Epstein are the headliners of this mathematical movement, but this year another Beane disciple is emerging in Toronto, where the Blue Jays are in a serious race for the pennant for the first time in years. Nearly every MLB team now employs some sort of advanced statistician to keep up with all of these numbers and formulas. So whether traditionalists like it or not, sabermetrics appears to be here to stay.

- Agonistes, D. (Oct. 2004), "A Brief History of Run Estimation: Batting Runs", http://danagonistes.blogspot.com/2004/10/brief-history-of-run-estimation.html

- Albert, J. "An Introduction to Sabermetrics", http://www-math.bgsu.edu/~albert/papers/saber.html

- Thorn, J. and Palmer, P. (1993), Total Baseball, New York: Harper Collins

Published by Tom Lewis

I am a senior mathematics major at Western Kentucky University in Bowling Green, KY. I am just about to begin my student teaching semester at WKU. I have a big family all who live in the Nashville, Tennesse...
