The Normality Fallacy
“Bob rarely wins a pot, so he must be a losing player.”
In English, the word “average” can refer to one of three different values: the mean, the median, or the mode. These values are often nearly the same, but they can also differ dramatically. In the above example, the speaker observes that Bob's median per-hand result is at or below zero (“Bob rarely wins a pot”) and concludes that Bob's mean per-hand win rate is also negative (“he must be a losing player”). But what if Bob tends to win large pots when he does win? He might still be a winning player. The normality fallacy is the implicit assumption that the mean, median, and mode are equivalent.
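To make the gap between the three averages concrete, here is a minimal sketch using made-up results for Bob. The ten per-hand values are hypothetical and chosen only for illustration. They are not taken from any real data set.

```python
from statistics import mean, median, mode

# Hypothetical per-hand results for Bob, in big blinds (BB).
# He loses a little on most hands and wins one large pot.
bob_hands = [-1.0] * 6 + [-0.5] * 2 + [0.0] + [12.0]

print("mode:  ", mode(bob_hands))    # most common single result
print("median:", median(bob_hands))  # middle result when sorted
print("mean:  ", mean(bob_hands))    # total winnings divided by hands played
```

In this invented sample the mode and median are both negative while the mean is positive, so “rarely wins a pot” and “winning player” can both be true at once.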
Why is this called the “normality fallacy”? Because in a normal distribution, the mean, median, and mode are all the same.
“Joe has a mean win rate of 0.02 BB/hand with a standard deviation of 1.8 BB/hand. Since 95% of the values are within two standard deviations of the mean, we can expect that 5% of Joe's hands win more than 3.62 BB (0.02 BB + 2 × 1.8 BB) or lose more than 3.58 BB (0.02 BB − 2 × 1.8 BB).”
Many statistical formulas assume a normal distribution. In this example, the speaker uses the rule “95% of values are within two standard deviations of the mean”. While this is true for normal distributions, it is not true for all distributions. The standard deviation measures the dispersion of values around the mean, but it doesn't distinguish between many values moderately far from the mean and a few values very far from the mean. Joe could be a tight player who only gets involved in a few big pots, or he could be a loose player who gets involved in many small pots.
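The following sketch illustrates the point with two invented players who share the same mean and the same standard deviation, yet have very different shares of hands within two standard deviations. The two result lists and the helper name share_within_two_sd are hypothetical, constructed so that both standard deviations come out to 1.8 BB/hand.

```python
from statistics import mean, pstdev

def share_within_two_sd(results):
    """Fraction of results lying within two standard deviations of the mean."""
    m, sd = mean(results), pstdev(results)
    inside = sum(1 for x in results if abs(x - m) <= 2 * sd)
    return inside / len(results)

# Hypothetical loose player: every hand is a small pot, won or lost.
loose = [1.8] * 50 + [-1.8] * 50

# Hypothetical tight player: most hands break even, a few are big pots.
tight = [0.0] * 84 + [4.5] * 8 + [-4.5] * 8

for name, results in (("loose", loose), ("tight", tight)):
    print(name, "mean:", mean(results), "sd:", round(pstdev(results), 3),
          "within 2 sd:", share_within_two_sd(results))
```

Neither player matches the 95% rule: all of the loose player's hands fall inside the two-standard-deviation band, while a noticeably smaller share of the tight player's hands do.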
What if we want to apply a statistical formula but don't know if the distribution is normal? If we have enough data, the Central Limit Theorem comes to the rescue. The theorem states that the sums (or averages) of large enough groups of independent observations approximately follow a normal distribution, even when the individual observations do not. For example, instead of examining Joe's individual hands, we examine groups of 100 hands. The mean is still the same: 2 BB / 100 hands = 0.02 BB/hand. For our data set, we find the standard deviation is 31 BB / 100 hands = 0.31 BB/hand. Assuming normality, this means that when Joe plays 100 hands, 5% of the time he will win more than 64 BB (2 BB + 2 × 31 BB) or lose more than 60 BB (2 BB − 2 × 31 BB). The downside of applying the Central Limit Theorem is a loss of resolution. In the above example, the statistics apply only to sets of 100 hands, rather than to individual hands.
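The effect of grouping hands into blocks of 100 can be sketched with a small simulation. Everything here is an assumption made for illustration: the per-hand distribution in simulate_hand is an invented tight-player model, hands are treated as independent, and the seed and sample size are arbitrary. It does not reproduce Joe's data or the 31 BB figure.

```python
import random
from statistics import mean, pstdev

def share_within_two_sd(values):
    """Fraction of values lying within two standard deviations of the mean."""
    m = mean(values)
    sd = pstdev(values, mu=m)
    return sum(1 for v in values if abs(v - m) <= 2 * sd) / len(values)

def simulate_hand(rng):
    """Hypothetical tight player: 84% of hands break even, 16% are big pots."""
    if rng.random() < 0.84:
        return 0.0
    size = rng.uniform(4.0, 5.0)                     # pot size in BB
    return size if rng.random() < 0.5 else -size     # won or lost equally often

rng = random.Random(2024)
hands = [simulate_hand(rng) for _ in range(200_000)]

# Group consecutive hands into blocks of 100 and total each block.
block_size = 100
blocks = [sum(hands[i:i + block_size]) for i in range(0, len(hands), block_size)]

hand_share = share_within_two_sd(hands)
block_share = share_within_two_sd(blocks)
print("per-hand share within 2 sd: ", round(hand_share, 3))
print("100-hand share within 2 sd: ", round(block_share, 3))
```

Under these assumptions the per-hand share should sit well below 95%, because every big pot falls outside the two-standard-deviation band, while the share for 100-hand blocks should land much closer to the 95% that a normal distribution predicts.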
How many hands is “large enough” for the Central Limit Theorem to kick in? Unfortunately, it varies depending on the player's style and the statistic of interest. In our tests, a few hundred hands is typically sufficient for win rate data to approximate a normal distribution.
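One rough way to judge whether a block size is “large enough” is to watch how the skewness of the block totals shrinks toward zero (the value for a normal distribution) as the blocks grow. The sketch below uses an invented, strongly skewed per-hand distribution and assumes independent hands. The skewness helper, the seed, and the block sizes are illustrative choices, not the tests referred to above.

```python
import random

def skewness(values):
    """Population skewness: third central moment divided by variance ** 1.5."""
    n = len(values)
    m = sum(values) / n
    var = sum((v - m) ** 2 for v in values) / n
    third = sum((v - m) ** 3 for v in values) / n
    return third / var ** 1.5

# Hypothetical player: loses 0.5 BB on 90% of hands, wins 5 BB on 10%.
rng = random.Random(7)
hands = [5.0 if rng.random() < 0.10 else -0.5 for _ in range(300_000)]

skew_by_block = {}
for block_size in (1, 10, 100):
    totals = [sum(hands[i:i + block_size])
              for i in range(0, len(hands), block_size)]
    skew_by_block[block_size] = skewness(totals)
    print(f"block size {block_size:>3}: skewness {skew_by_block[block_size]:.3f}")
```

For independent hands the skewness of a block total falls roughly in proportion to one over the square root of the block size, so a more lopsided playing style needs larger blocks before the normal approximation becomes reasonable.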