I’ll admit it: I wasn’t much of a sports fan when I was growing up. I enjoyed playing games, but watching them just bored me to tears. What can I say? I was a huge nerd and thought I was too smart to be obsessed with stupid sports. Then one day, I wound up dating (and eventually marrying) a huge Yankees fan, so I started learning about baseball for the first time. And you know what? It’s full of math! Learning about all the wacky statistics that go into analyzing the game was what really made it fun for me. (Well, that and ripping on Red Sox fans. :-P ) Before I knew it, I was reading Moneyball, learning about sabermetrics, and becoming a huge nerdy baseball fan.
Before we get into any of the newfangled statistics, let’s look at the most basic numbers: the game summary at the top of the box score. For the purposes of this post, I’ll be using screencaps of the ESPN.com box scores from the first game of this season between the St. Louis Cardinals and the Miami Marlins. Different sources use slight variations; for comparison’s sake, you can check out the same game’s box scores on mlb.com and CBSSports.com.
In the game summary, the top row is always the visiting team, since they bat first, and the bottom row is the home team. The first nine columns show the number of runs scored in each inning (with more columns added as needed if the game is tied after nine and goes into extra innings). If the home team is already ahead after the top half of the 9th inning, the game ends without them going back up to bat, so you’ll see an X in that slot. The R column shows the total number of runs scored; in this case St. Louis won the game by a score of 4–1. H is the number of hits each team got, and E is the number of errors committed by each team’s players in the field.
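If you want to double-check a line score yourself, here’s a minimal Python sketch: the R column is just the sum of the per-inning runs, with an X counting as nothing. The innings below are made up for illustration (they are not the actual Cardinals-Marlins game), and the function name total_runs is my own.

```python
def total_runs(innings):
    """Sum the per-inning runs in a line score; an 'X' (unplayed half-inning) adds nothing."""
    return sum(int(runs) for runs in innings if runs != "X")

# Hypothetical line score, NOT the real game: the home team leads after the
# top of the 9th, so it never bats in the bottom half and gets an X.
visitors = ["0", "1", "0", "0", "2", "0", "0", "0", "0"]
home = ["0", "0", "0", "1", "0", "0", "3", "0", "X"]

print("Visitors R:", total_runs(visitors))  # Visitors R: 3
print("Home R:", total_runs(home))          # Home R: 4
```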
You can get more information about the game by looking at the box scores for each player. Let’s start with the batters.
The box score lists players in batting order, with any substitutions usually footnoted at the bottom. Since this was a National League game, the pitchers have to bat, but in this case the starting pitcher stayed in the game long enough that none of the relief pitchers came up in the batting order, so it’s a bit easier to read. The first column, AB, tells you the number of official at bats: the number of times a player came up to bat, minus any times he walked, was hit by a pitch, or laid down a sacrifice bunt or hit a sacrifice fly to advance runners already on base. R and H show which players scored runs and got hits, respectively. RBI is the number of runs batted in, BB is the number of walks (base on balls), and SO tells you how many times the player struck out. #P is the number of pitches that were thrown to the player; this stat doesn’t appear in all box scores, and some others include an LOB column for the number of runners the batter left stranded on base when he made an out. With AVG we finally get into some mathematical calculations. The batting average is determined by dividing the number of hits by the number of at bats. In the box score above, Rafael Furcal had 3 hits in 5 at bats for an average of .600. OBP and SLG take a bit more explaining, so we’ll come back to them later.
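Here’s the batting average arithmetic as a small Python sketch. The helper names (batting_average, fmt) are just illustrative; fmt mimics the box-score habit of dropping the leading zero, so you get .600 instead of 0.600.

```python
def batting_average(hits: int, at_bats: int) -> float:
    """AVG = H / AB (walks, hit-by-pitches, and sacrifices are already left out of AB)."""
    if at_bats == 0:
        raise ValueError("batting average is undefined with zero at bats")
    return hits / at_bats

def fmt(rate: float) -> str:
    """Format like a box score: three decimals, leading zero dropped (.600)."""
    text = f"{rate:.3f}"
    return text[1:] if text.startswith("0") else text

# Rafael Furcal in this game: 3 hits in 5 at bats.
print(fmt(batting_average(3, 5)))  # .600
```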
Now let’s look at pitching statistics. This time we’ll look at both teams’ pitchers so you can see some of the weirder irregularities that can pop up.
The first player listed is the starting pitcher, with relievers and the closer listed in the order they came into the game. The first column, IP, tells you the number of innings pitched. This statistic doesn’t follow normal math rules, which makes me vaguely twitchy but does actually make sense. In our example, the Cardinals’ Kyle Lohse pitched 7 1/3 innings, seven full innings plus one out in the eighth, before being replaced by Fernando Salas. One third is mathematically .333, but baseball writes it as .1 (or .2 if two outs were recorded), so Lohse’s line reads 7.1. IP can also be 0.0 if the pitcher doesn’t get any outs while he’s on the mound; Ryan Webb of the Marlins pitched to two batters and both got on base, so he isn’t credited with any innings pitched (and this screws up some of the later stats). H, R, BB, and SO still stand for hits, runs, walks, and strikeouts, but here they refer to what the other team’s batters got off the pitcher in question. ER refers to earned runs, which in our example are the same as runs but can be different. If one pitcher loads the bases before being taken out of the game and the next batter hits a grand slam off his replacement, the replacement is charged with only one earned run (for the batter who hit the homer); the other three go to the original pitcher, since he put those runners on base. A pitcher also isn’t charged with earned runs for runs that score as a result of fielding errors (they still show up in the R column as unearned runs), say, an outfielder dropping an easy fly ball that should have ended an inning and instead allows a baserunner to score on that play or on subsequent hits. HR is the number of home runs given up by the pitcher. PC-ST isn’t used in all box scores; it gives the total pitch count thrown by each player and how many of those pitches were strikes (which counts called and swinging strikes, foul balls, and balls put in play, whether or not the pitch was actually in the strike zone). ERA is the earned run average, which calculates how many earned runs would be scored off the pitcher if you extrapolated their performance over a full nine innings.
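Since the IP notation trips up normal math, here’s a small Python sketch (hypothetical helper names, not from any stats library) that turns a box-score IP like 7.1 into outs recorded and then into exact innings. Counting outs avoids the trap of treating 7.1 as the decimal 7.1 when it really means 7 1/3.

```python
from fractions import Fraction

def ip_to_outs(ip: str) -> int:
    """Convert box-score IP notation ("7.1" = 7 innings + 1 out) into total outs recorded."""
    whole, _, partial = ip.partition(".")
    thirds = int(partial) if partial else 0
    if thirds not in (0, 1, 2):
        raise ValueError(f"invalid IP notation: {ip!r}")
    return int(whole) * 3 + thirds

def ip_to_innings(ip: str) -> Fraction:
    """Return the true number of innings as an exact fraction (three outs per inning)."""
    return Fraction(ip_to_outs(ip), 3)

print(ip_to_outs("7.1"), ip_to_innings("7.1"))  # 22 22/3  (Lohse: 7 1/3 innings)
print(ip_to_outs("0.0"), ip_to_innings("0.0"))  # 0 0      (Webb: no outs recorded)
```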
To calculate it, you divide ER by IP and multiply by 9 (for a line like Lohse’s 7.1, that means dividing by 7 1/3 innings, not 7.1); the Marlins’ Josh Johnson gave up 3 earned runs in 6 innings, which works out to an ERA of 4.50 (3 ÷ 6 × 9). Since Ryan Webb was charged with an earned run without technically pitching any innings, his ERA can’t be calculated for this game, because you can’t divide by zero (but that run will still count toward his overall ERA as the season goes on).
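Here’s the ERA formula sketched in Python, using the same IP convention (the digit after the dot counts outs). The era function returns None for a 0.0 IP line like Ryan Webb’s instead of crashing on the division by zero; that’s a choice I made for the example, not an official rule.

```python
from fractions import Fraction
from typing import Optional

def ip_to_innings(ip: str) -> Fraction:
    """Box-score IP ("7.1") -> exact innings (22/3); the digit after the dot counts outs."""
    whole, _, partial = ip.partition(".")
    return Fraction(int(whole) * 3 + (int(partial) if partial else 0), 3)

def era(earned_runs: int, ip: str) -> Optional[float]:
    """ERA = 9 * ER / IP; returns None when no outs were recorded (can't divide by zero)."""
    innings = ip_to_innings(ip)
    if innings == 0:
        return None
    return float(9 * earned_runs / innings)

print(f"{era(3, '6.0'):.2f}")  # 4.50  (Josh Johnson: 3 ER in 6 IP)
print(era(1, "0.0"))           # None  (Ryan Webb: undefined for this game)
```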
Now let’s get back to the stats we skipped before and add some new ones that have been invented in recent decades, many of which aren’t reported in box scores. OBP, on-base percentage, was only officially recognized by MLB in 1984; it adds hits plus walks plus times hit by a pitch, then divides by at bats plus walks, hit-by-pitches, and sacrifice flies (sacrifice bunts are left out). Since Furcal didn’t walk, get hit by a pitch, or hit a sacrifice fly, his AVG and OBP are the same, but Lance Berkman had only one hit and two walks in three at bats, for an AVG of .333 (1/3) and an OBP of .600 (3/5). SLG stands for slugging percentage; it’s actually much older than most of these stats, but sabermetricians like Bill James gave it new prominence. It’s calculated by dividing a player’s total bases from hits (one for a single, two for a double, three for a triple, four for a home run) by the number of at bats. In the example above, Furcal hit two singles and one double for a total of four bases; divided by five at bats, that’s a slugging percentage of .800. Slugging is now considered a better measure of a player’s hitting power than batting average alone, since two players with equal numbers of hits per at bat would have the same average even if one only hit singles and the other hit a lot of doubles, triples, and home runs. OPS (on-base plus slugging) is another relatively new stat, used to evaluate batters over the long term rather than in a single game; it simply adds together the player’s on-base percentage and slugging percentage. For pitchers, the most important new statistic is WHIP (walks plus hits per inning pitched), invented in 1979. It measures how effective a pitcher is by adding up the walks and hits he allows and dividing by his innings pitched (unlike ERA, it doesn’t matter whether any runs scored or whether fielding errors were involved). A low WHIP is better because it means the pitcher kept batters from reaching base on hits and walks.
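And here are OBP, SLG, OPS, and WHIP as a Python sketch. The OBP formula is the standard one, (H + BB + HBP) / (AB + BB + HBP + SF). I’m assuming Berkman and Furcal had no hit-by-pitches or sacrifice flies, which matches their numbers above, and the WHIP pitching line at the end is made up purely for illustration.

```python
from fractions import Fraction

def obp(h: int, bb: int, hbp: int, ab: int, sf: int) -> float:
    """On-base percentage: (H + BB + HBP) / (AB + BB + HBP + SF)."""
    return (h + bb + hbp) / (ab + bb + hbp + sf)

def slg(singles: int, doubles: int, triples: int, hr: int, ab: int) -> float:
    """Slugging percentage: total bases from hits divided by at bats."""
    total_bases = singles + 2 * doubles + 3 * triples + 4 * hr
    return total_bases / ab

def ops(obp_value: float, slg_value: float) -> float:
    """On-base plus slugging: simply the sum of the two."""
    return obp_value + slg_value

def whip(walks: int, hits: int, ip: str) -> float:
    """(BB + H) / IP, converting box-score IP ("7.1" = 7 1/3) first.
    Raises ZeroDivisionError for a 0.0 IP line."""
    whole, _, partial = ip.partition(".")
    innings = Fraction(int(whole) * 3 + (int(partial) if partial else 0), 3)
    return float((walks + hits) / innings)

# Lance Berkman: 1 hit, 2 walks, 3 at bats.
print(f"{obp(h=1, bb=2, hbp=0, ab=3, sf=0):.3f}")  # 0.600
# Rafael Furcal: two singles and a double in 5 at bats, no walks.
furcal_slg = slg(singles=2, doubles=1, triples=0, hr=0, ab=5)
furcal_obp = obp(h=3, bb=0, hbp=0, ab=5, sf=0)
print(f"{furcal_slg:.3f}")                          # 0.800
print(f"{ops(furcal_obp, furcal_slg):.3f}")         # 1.400
# Hypothetical pitching line, not from this game: 2 walks, 5 hits in 6 innings.
print(f"{whip(walks=2, hits=5, ip='6.0'):.3f}")     # 1.167
```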
Many of these new statistics come from an emerging field of study called sabermetrics, named after the Society for American Baseball Research (SABR). The word was coined by Bill James, who was one of the first people to look for alternate ways to value a player’s total contribution to a team, since the more traditional statistics have some glaring defects. At first the baseball establishment mocked his efforts, but in recent years many teams have overhauled their scouting to reflect his insights. For example, RBIs used to be a key metric, since a high RBI total means that player helped the team score a lot of runs, and you can’t win without runs. Newer analysis downplays the RBI somewhat because it depends so heavily on the hitters around you. Let’s face it: if the players who bat before you rarely get on base, it’s much more difficult to rack up RBIs unless you’re hitting home runs at every at bat (and thus batting yourself in). Lead-off hitters are especially punished by this, because they always have at least one at bat where it’s impossible for anyone else to be on base, and in later innings they bat right after the weakest hitters at the bottom of the order. One of the coolest new stats is VORP, value over replacement player, which purports to determine how much better or worse any given player is than a theoretical replacement-level player (the kind of readily available, below-average player a team could easily sign or call up to fill the spot). The calculations are weirdly complex and take into account runs scored (or allowed, in the case of pitchers), the player’s position on the team, what ballpark they play in, and a few other things. For a lot more detail on the wackier stats that have been invented in recent years, check out the links on the Sabermetrics Wikipedia page. (There’s even one called NERD. It seems appropriate.)