Baseball Math: How to Read the Box Score

[E] Hillary | Sports | 12 Comments

thatsovintage

I’ll admit it, I wasn’t much of a sports fan when I was growing up. I enjoyed playing games, but watching them just bored me to tears. What can I say, I was a huge nerd and thought I was too smart to be obsessed with stupid sports. Then one day, I wound up dating (and eventually marrying) a huge Yankees fan, so I started learning about baseball for the first time. And you know what? It’s full of math! Learning about all the wacky statistics that go into analyzing the game was what really made it fun for me. (Well, that and ripping on Red Sox fans. :-P ) Before I knew it I was reading Moneyball and learning about sabermetrics and becoming a huge nerdy baseball fan.

Before we get into any of the newfangled statistics, let’s look at the most basic numbers, the game summary box score. For the purposes of this post, I’ll be using screencaps of the ESPN.com box scores from the first game of this season between the St. Louis Cardinals and the Miami Marlins. Different sources use slight variations; for comparison’s sake, you can check out the same game’s box scores on mlb.com and CBSSports.com.

Game summary from opening night game

In the game summary, the top row is always the visiting team since they bat first, and the bottom row is the home team. The first nine columns show the number of runs scored in each inning (with more columns added as needed if there’s a tie and the game goes into extra innings). If the home team is ahead in the middle of the 9th inning, the game ends without them going back up to bat so you’ll see an X in that slot. The R column shows the number of runs scored; in this case St. Louis won the game by a score of 4 – 1. H is the number of hits each team got, and E is the number of errors committed by players on the field.
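If it helps to see the line score as data, here is a minimal Python sketch of how the R column relates to the inning columns. The inning-by-inning numbers are made up for illustration (they are not from the Cardinals/Marlins game), and `total_runs` is just a hypothetical helper name.

```python
def total_runs(innings):
    """Add up a team's runs by inning, skipping an "X" for a half-inning that was never played."""
    return sum(runs for runs in innings if runs != "X")


# Made-up line score: the home team leads after the top of the 9th,
# so its 9th-inning slot is marked "X" instead of a number.
visitors = [0, 1, 0, 0, 0, 2, 0, 0, 0]
home = [2, 0, 0, 1, 0, 0, 1, 0, "X"]

print(total_runs(visitors))  # 3
print(total_runs(home))      # 4
```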

You can get more information about the game by looking at the box scores for each player. Let’s start with the batters.

box score for St. Louis Cardinals batters during game of 4/4/12

The box score lists players in the order they came up to bat, with any substitutions usually footnoted at the bottom. Since this was a National League game, the pitchers have to bat, but in this case the starting pitcher stayed in the game long enough that none of the relief pitchers came up in the batting order, so it’s a bit easier to read. The first column, AB, tells you the number of at bats: the times a player came up to bat, minus any times they walked, were hit by a pitch, or hit a sacrifice to advance the players already on base. R and H break down which players got runs and hits, respectively. RBI is the number of runs batted in, BB is the number of walks (base on balls), and SO tells you how many times the player struck out. #P is the number of pitches that were thrown to the player; this stat doesn’t appear in all box scores, and some others include an LOB column for the number of runners the batter left on base by not driving them in. With AVG we finally get into some mathematical calculations. The batting average is determined by dividing the number of hits by the number of at bats. In the box score above, Rafael Furcal had 3 hits in 5 at bats for an average of .600. OBP and SLG are newer statistics, so we’ll come back to them later.
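For anyone who likes to see the arithmetic spelled out, here is a small Python sketch of the batting average calculation using Furcal’s line. The function names are my own, and returning `None` for a player with no at bats is an assumption about how you might want to handle that case, not anything official.

```python
from typing import Optional


def batting_average(hits: int, at_bats: int) -> Optional[float]:
    """Hits divided by at bats; None if the player has no official at bats."""
    if at_bats == 0:
        return None
    return hits / at_bats


def format_stat(value: Optional[float]) -> str:
    """Format the way box scores do: three decimals, no leading zero."""
    if value is None:
        return "---"
    text = f"{value:.3f}"
    return text[1:] if text.startswith("0") else text


# Rafael Furcal: 3 hits in 5 at bats
print(format_stat(batting_average(3, 5)))  # .600
```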

Now let’s look at pitching statistics. This time we’ll look at both teams’ pitchers so you can see some of the weirder irregularities that can pop up.

box score for St. Louis Cardinals pitchers during game of 4/4/12

box score for Miami Marlins pitchers during game of 4/4/12

The first player listed is the starting pitcher, with relievers and the closer listed in the order they came into the game. The first column, IP, tells you the number of innings pitched. This statistic doesn’t follow normal math rules, which makes me vaguely twitchy but does actually make more sense. In our example, the Cardinals’ Kyle Lohse pitched 7 1/3 innings: seven full innings plus one out in the eighth before being replaced by Fernando Salas. One third is mathematically .333, but baseball writes it as 0.1 (or 0.2 for two outs). IP can also be 0.0 if the pitcher doesn’t get any outs while he’s on the mound; Ryan Webb of the Marlins pitched to two batters and both got on base, so he isn’t credited with any innings pitched (and this screws up some of the later stats).

H, R, BB, and SO still stand for hits, runs, walks, and strikeouts, but here they count what the other team’s batters got off the pitcher in question. ER refers to earned runs, which in our examples are the same as runs but can be different. If one pitcher loads the bases before being taken out of the game and the next batter hits a grand slam off his replacement, the replacement is only charged with one earned run; the other three go to the original pitcher, since he put those runners on base. The pitcher also isn’t charged with an earned run for runs that score because of fielding errors: say, an outfielder drops an easy fly ball that should have ended the inning, and a baserunner scores on that play or on subsequent hits. HR is the number of home runs given up by the pitcher. PC-ST isn’t used in all box scores; it gives each pitcher’s total pitch count and the number of those pitches that counted as strikes (called strikes, swings and misses, fouls, and balls put in play).

ERA is the earned run average: how many earned runs would be scored off the pitcher if you extrapolated his performance over a full nine innings. To calculate it, you divide ER by IP and multiply by 9; the Marlins’ Josh Johnson gave up 3 earned runs in 6 innings, so that works out to an ERA of 4.50. Since Ryan Webb was charged with an earned run without technically pitching any innings, his ERA can’t be calculated for the game since you can’t divide by zero (but the run will still be included in his overall ERA as the season goes on).
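Here is a rough Python sketch of the two quirks described above: converting the box score’s IP notation into outs, and the divide-by-zero problem with ERA. The function names are hypothetical, and treating IP as a string like "7.1" is my assumption about how you would store it; counting in outs sidesteps the fake decimals entirely.

```python
from typing import Optional


def innings_to_outs(ip: str) -> int:
    """Convert box score notation ("7.1" = 7 innings plus 1 out) into total outs."""
    whole, _, extra = ip.partition(".")
    extra_outs = int(extra) if extra else 0
    if extra_outs not in (0, 1, 2):
        raise ValueError(f"not valid innings-pitched notation: {ip}")
    return int(whole) * 3 + extra_outs


def era(earned_runs: int, ip: str) -> Optional[float]:
    """Earned runs per nine innings; None when no outs were recorded."""
    outs = innings_to_outs(ip)
    if outs == 0:
        return None  # can't divide by zero, as with Ryan Webb's 0.0 IP
    return earned_runs * 27 / outs  # 27 outs in nine innings


print(innings_to_outs("7.1"))  # 22 (Kyle Lohse's 7 1/3 innings)
print(era(3, "6.0"))           # 4.5 (Josh Johnson)
print(era(1, "0.0"))           # None (Ryan Webb)
```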

Now let’s get back to the stats we skipped before and add some new ones that have been invented in recent years, many of which aren’t reported in box scores. OBP, on-base percentage, was only officially recognized by MLB in 1984; it adds hits plus walks plus times hit by a pitch, then divides by at bats plus walks, times hit by a pitch, and sacrifice flies (nearly every time the player comes to the plate). Since Furcal didn’t have any walks, his AVG and OBP are the same; however, Lance Berkman had only one hit but two walks on top of his three at bats, for an AVG of .333 (1/3) and an OBP of .600 (3/5). SLG stands for slugging percentage and was made popular by baseball statistician Bill James. It’s calculated by dividing the total number of bases earned on hits by the number of at bats. In the example above, Furcal hit two singles and one double for a total of four bases; divided by five at bats, that’s a slugging percentage of .800. Slugging is now considered to be a better measure of a player’s strength than just their batting average, since two players with equal numbers of hits per at bat would have the same average even if one only hit singles and the other hit a lot of doubles, triples, and home runs. OPS is another relatively new stat to evaluate batters in the long term rather than in a single game; it simply adds together the player’s on-base percentage and slugging percentage. For pitchers, the most important new statistic is WHIP (walks plus hits per inning pitched), invented in 1979. It measures how effective a pitcher is by adding up the walks and hits he allowed and dividing by his innings pitched (regardless of whether any fielding errors occurred). A low WHIP is better because it means the pitcher kept batters off the bases.
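The formulas in this section can be sketched in Python like so. The function names are mine. I’m assuming Berkman and Furcal had no hit-by-pitches or sacrifice flies in this game (the numbers in the post only work out that way), and the pitching line used for WHIP is made up purely to show the calculation.

```python
def on_base_percentage(hits, walks, hit_by_pitch, at_bats, sac_flies):
    """(H + BB + HBP) / (AB + BB + HBP + SF); None if there is nothing to divide by."""
    chances = at_bats + walks + hit_by_pitch + sac_flies
    if chances == 0:
        return None
    return (hits + walks + hit_by_pitch) / chances


def slugging(singles, doubles, triples, home_runs, at_bats):
    """Total bases divided by at bats; None if the player has no at bats."""
    if at_bats == 0:
        return None
    total_bases = singles + 2 * doubles + 3 * triples + 4 * home_runs
    return total_bases / at_bats


def ops(obp, slg):
    """On-base plus slugging."""
    return obp + slg


def whip(hits, walks, outs_recorded):
    """Walks plus hits per inning pitched, with innings counted in outs."""
    if outs_recorded == 0:
        return None
    return (hits + walks) * 3 / outs_recorded


# Lance Berkman: 1 hit, 2 walks, 3 at bats (HBP and SF assumed to be 0)
berkman_obp = on_base_percentage(1, 2, 0, 3, 0)
# Rafael Furcal: 3 hits (2 singles, 1 double), 0 walks, 5 at bats
furcal_obp = on_base_percentage(3, 0, 0, 5, 0)
furcal_slg = slugging(2, 1, 0, 0, 5)

print(f"{berkman_obp:.3f}")                  # 0.600
print(f"{furcal_slg:.3f}")                   # 0.800
print(f"{ops(furcal_obp, furcal_slg):.3f}")  # 1.400
# Made-up pitching line: 6 hits and 2 walks over 6 innings (18 outs)
print(f"{whip(6, 2, 18):.3f}")               # 1.333
```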

Many of these new statistics are due to an emerging field of study called sabermetrics, named after the Society for American Baseball Research (SABR). The word was coined by Bill James, who was one of the first people to start looking at alternate ways to value a player’s total contribution to a team, since the more traditional statistics have some glaring defects. At first the baseball establishment mocked his efforts, but in recent years many teams have overhauled their scouting to reflect his insights. For example, RBIs used to be a key metric since a high RBI total means that player helped the team score a lot of runs, and you can’t win without runs. Newer analysis downplays the RBI somewhat because it can only tell you so much about any individual player. Let’s face it, if the players who bat before you rarely get on base, it’s much more difficult to rack up RBIs unless you’re hitting home runs at every at bat (and thus batting yourself in). Lead-off hitters are especially punished by this because they always have at least one at bat where it’s impossible for anyone else to be on base, and in later innings they bat after the worst players. One of the coolest new stats is VORP, value over replacement player, which purports to determine how much better or worse any given player is than a theoretical replacement-level player, meaning the kind of fill-in a team could easily pick up. The calculations are weirdly complex and take into account runs scored (or allowed in the case of pitchers), the player’s position on the team, what ballpark they play in, and a few other things. For a lot more detail on the wackier stats that have been invented in recent years, check out the links on the Sabermetrics Wikipedia page. (There’s even one called NERD. It seems appropriate.)

[E] Hillary

Hillary is an avowed nerd and former Mathlete. She once read large swaths of "Why Evolution is True" and a geology book aloud to her infant daughter, in the hopes of a) instilling a love of science in her from a very young age and b) boring her to sleep. After escaping the wilds of Waco, Texas and spending the next decade in NYC, she currently lives in upstate New York, where she misses being able to get decent pizza or Chinese takeout delivered to her house.

12 Comments on “Baseball Math: How to Read the Box Score”

  1. Opifex

    Hooray for baseball and its mad statistics. The Tigers are 4-0. That is all. (But honestly I’m more interested in the Wings playoff game tonight)

    Actually, minor point of order, when speaking of baseball stats aloud, it is worth noting that .300 is said three hundred.

      1. lostinmybox

        And you’ll get my only baseball gif! I’m an Oakland A’s fan, but I do support my friends across the Bay. Especially if Tim Lincecum is involved.

         

  2. raine

    Hmm, I’m sorry, I could not follow this because I was blinded with rage every time I came across the Cardinals logo.

    Awww I kid! Kind of! (I’m a Royals fan, so I know the Cardinals are better than us but I hate them anyway.) I do need to learn how pitching stats work, because I haven’t been able to make that stick in my brain.

    “Baseball is like church. Many attend, but few understand.”

    1. [E] Hillary

      :) I was gonna use a Yankee game but they freaking lost all the games I checked.

      Part of the reason I was so excited to write this was that I couldn’t remember what happened statistically if a pitcher was charged with 0.0 innings. Now I know!

  3. Susan

    I used to take stats for our softball team. Because I was the worst player on the team, so I was always on the bench. It was…not super fun.

    1. [E] Sally J. Freedman

      I totally forgot about taking stats for softball (and my brothers’ baseball teams at times). I’ll admit, I kind of liked it. I don’t think we were this detailed though!
