Goatriders of the Apocalypse

Alfonso Soriano and the Selective Sampling Issue

That's a graph of Soriano's performance in every game last season. I used wOBA, a form of linear weights, but you could have used OPS or Runs plus RBIs if you wanted; the shape of the graph isn't going to change substantially.

Nor is that sort of variance particular to Soriano - I did a graph of A-Rod from last season and you see just as many swings in performance. (I'd post it, but I don't know a good way to post multiple images here.)

Baseball players are streaky; that's a function of a lot of things, and sheer random chance is far from least among them.

Look at it this way - when you flip a coin, you know there is a 50-50 chance of heads or tails. (I rounded. If anyone wants to get pedantic and give me seven or eight significant digits on the bias of the average U.S. quarter, well, you'd be awesome. But I'm continuing anyways.) But if you flip a coin ten times, it's not going to go heads-tails-heads-tails-heads-tails-heads-tails-heads-tails. Over a small sample, you will see random fluctuations in performance around the true talent level.
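If you'd rather see the coin-flip point than take my word for it, here's a quick sketch in Python. It assumes a perfectly fair coin, and the function names, trial count and seed are just ones I made up for illustration. It flips ten coins over and over and counts how often you get the tidy alternating pattern versus a "streak" of three or more in a row.

```python
import random


def longest_streak(flips):
    """Return the length of the longest run of identical outcomes."""
    if not flips:
        return 0
    best = current = 1
    for prev, cur in zip(flips, flips[1:]):
        current = current + 1 if cur == prev else 1
        best = max(best, current)
    return best


def simulate(n_flips=10, n_trials=10_000, seed=0):
    """Estimate how often n_flips fair coin flips alternate perfectly,
    and how often they contain a streak of three or more."""
    rng = random.Random(seed)  # fixed seed so reruns give the same numbers
    alternating = 0
    streaky = 0
    for _ in range(n_trials):
        flips = [rng.choice("HT") for _ in range(n_flips)]
        run = longest_streak(flips)
        if run == 1:   # no two neighbors match: H-T-H-T... or T-H-T-H...
            alternating += 1
        if run >= 3:   # at least one "hot" or "cold" streak of three
            streaky += 1
    return alternating / n_trials, streaky / n_trials


if __name__ == "__main__":
    alt_rate, streak_rate = simulate()
    print(f"perfectly alternating: {alt_rate:.2%}")
    print(f"streak of 3 or more:   {streak_rate:.2%}")
```

The exact odds can be counted by hand: only 2 of the 1,024 possible ten-flip sequences alternate perfectly (about 0.2%), while 846 of them (about 83%) contain a streak of at least three. The streaks are the normal case, not the exception.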

Here's the kicker, folks - even a full baseball season is really a small sample size. It's certainly not as small as a month's worth of baseball, and you can certainly draw more definite conclusions about a player the more playing time he's had. But there's no magical point where anyone's knowledge about a player becomes certain. Your confidence in your measurements simply increases as time goes by.
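To put rough numbers on that, here's a back-of-the-envelope sketch. It swaps in on-base percentage for wOBA, because OBP is a simple success-or-failure rate, and it leans on some loud assumptions: every plate appearance is an independent trial with the same chance of success, the hitter's "true" OBP is a hypothetical .340, and the plate appearance totals are round guesses at a month, a season and four seasons. The function name is mine, and the math is the standard normal approximation for a proportion.

```python
import math


def margin_of_error(true_rate, trials, z=1.96):
    """Approximate 95% margin of error for an observed rate,
    using the normal approximation to the binomial."""
    return z * math.sqrt(true_rate * (1 - true_rate) / trials)


TRUE_OBP = 0.340  # hypothetical true talent level, not a real player's number

# Rough, assumed plate appearance totals for each span of playing time.
SAMPLES = {"one month": 100, "full season": 650, "four seasons": 2600}

if __name__ == "__main__":
    for label, pa in SAMPLES.items():
        moe = margin_of_error(TRUE_OBP, pa)
        print(f"{label:>12} ({pa:>4} PA): {TRUE_OBP:.3f} +/- {moe:.3f}")
```

Under those assumptions, a month of plate appearances pins the hitter down only to within about 93 points of OBP either way, and a full season still leaves about 36 points of wiggle room. The margin keeps shrinking as the sample grows, but it never hits zero.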

This is why trying to assign meaning to a small number of at-bats can lead to making very wrong decisions - anything can happen in a small enough sample. Riding the hot hand, playing matchups, benching a guy on a cold streak - all of these are ways to try to read meaning into the "noise" of a player's statistics. It's called a clustering illusion, the statistical equivalent of a Rorschach test; you're seeing a picture that isn't there.

The fact that Soriano is hitting .175/.230/.298 so far on the season, and the fact that he went 0-for-4 tonight, are just Rorschach blots. You can read whatever you like into them, but what you see says more about your preconceptions about Soriano than about how Soriano will produce the rest of the season.
