The arithmetic mean is expected value in disguise
I realized the arithmetic mean — the one you learn in grade school — is a special case of expected value, restricted to a universe where every outcome is equally likely.
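In the uniform case, if you have values $x_1, x_2, \dots, x_n$, and each one occurs with probability $\frac{1}{n}$:

$$E[X] = \sum_{i=1}^{n} x_i \cdot \frac{1}{n} = \frac{1}{n} \sum_{i=1}^{n} x_i$$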
Which is exactly the standard mean — sum divided by count.
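When outcomes have different probabilities, the constant $\frac{1}{n}$ no longer applies, so it can't be pulled out of the sum. Each value $x_i$ gets its own weight $p_i$, and the formula becomes a true weighted sum:

$$E[X] = \sum_{i=1}^{n} p_i x_i$$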
The center-of-mass analogy makes this concrete. Place different weights at different points on a see-saw. The balance point (the expected value) shifts toward the heavier weights. With uniform weights, it sits at the geometric center (the arithmetic mean). With non-uniform weights, it moves.
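A minimal sketch of the see-saw picture in code, to check that the weighted sum collapses to the ordinary mean under uniform weights and shifts toward the heavy point otherwise. The function name `expected_value` and the example numbers are made up for illustration; `Fraction` is used only so the comparison is exact.

```python
from fractions import Fraction


def expected_value(values, probs):
    """Weighted sum of values, where probs are the weights and must sum to 1."""
    if len(values) != len(probs):
        raise ValueError("values and probs must have the same length")
    if sum(probs) != 1:
        raise ValueError("probabilities must sum to 1")
    return sum(p * x for x, p in zip(values, probs))


# Hypothetical positions on the see-saw.
values = [1, 2, 3, 4]

# Uniform weights: every position carries 1/4 of the mass.
uniform = [Fraction(1, 4)] * 4

# Non-uniform weights: most of the mass sits at position 4.
skewed = [Fraction(1, 10), Fraction(1, 10), Fraction(1, 10), Fraction(7, 10)]

grade_school_mean = Fraction(sum(values), len(values))
ev_uniform = expected_value(values, uniform)
ev_skewed = expected_value(values, skewed)

print(grade_school_mean, ev_uniform, ev_skewed)
```

With uniform weights the two numbers agree; with the skewed weights the balance point moves past the midpoint toward 4.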
Tying it back to entropy
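Now apply this directly to the "surprise = information" idea. The surprise of a single event is $\log \frac{1}{p_i} = -\log p_i$. The probability of that event is $p_i$. The expected value of the surprise across the whole system is the weighted sum:

$$H(X) = \sum_{i=1}^{n} p_i \log \frac{1}{p_i} = -\sum_{i=1}^{n} p_i \log p_i$$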
That's the definition of Shannon entropy. Entropy is literally the expected value (weighted average) of surprise across all possible outcomes.
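To make "same machinery, different operand" concrete, here is a small sketch that computes entropy as nothing more than a probability-weighted sum of surprise. The base-2 logarithm (so the result is in bits) and the two example distributions are my assumptions for illustration; any other base only rescales the result.

```python
import math


def surprise(p):
    """Surprise of an event with probability p, in bits (base-2 log assumed)."""
    return -math.log2(p)


def entropy(probs):
    """Expected surprise: each outcome's surprise weighted by its probability.

    Zero-probability outcomes are skipped, since they contribute nothing.
    """
    return sum(p * surprise(p) for p in probs if p > 0)


# Four equally likely outcomes: every outcome has the same surprise.
uniform = [0.25, 0.25, 0.25, 0.25]

# Four outcomes with unequal probabilities.
skewed = [0.5, 0.25, 0.125, 0.125]

h_uniform = entropy(uniform)
h_skewed = entropy(skewed)

print(h_uniform, h_skewed)
```

In the uniform case every outcome has a surprise of 2 bits, so the weighted average is also 2 bits. In the skewed case the likely outcome is less surprising and dominates the average, which pulls the entropy below 2.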
The thing I didn't appreciate before: entropy isn't a new concept invented for information theory. It's just expected value applied to the surprise function. Same machinery, different operand.
This also explains the Akinator design choice. When I built the solver with uniform priors, the "best question" was the one that split candidates evenly — because under uniform probability, splitting candidates and splitting probability mass are the same thing. When I added non-uniform priors, that equivalence broke, and the solver had to start tracking probability mass directly instead of just counting. Same algorithm, but the math now actually uses the weighted-sum form because the constant shortcut no longer applies.
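A toy sketch of where counting and probability mass come apart. It is not the actual solver: the candidate names, the two questions, and the scoring rule (pick the question whose "yes" side is closest to half of the total) are all assumptions chosen to illustrate the point.

```python
def yes_mass(priors, yes_set):
    """Fraction of total probability mass on candidates that answer 'yes'."""
    total = sum(priors.values())
    return sum(p for name, p in priors.items() if name in yes_set) / total


def best_question(priors, questions):
    """Pick the question whose 'yes' mass is closest to one half."""
    return min(questions, key=lambda q: abs(yes_mass(priors, questions[q]) - 0.5))


# Hypothetical candidates and the set of candidates answering 'yes' to each question.
candidates = ["a", "b", "c", "d"]
questions = {
    "splits_2_vs_2": {"a", "b"},  # even split by count
    "splits_1_vs_3": {"a"},       # uneven split by count
}

# Uniform priors: mass is proportional to count, so the even count split wins.
uniform = {name: 0.25 for name in candidates}

# Non-uniform priors: candidate "a" alone holds half the mass.
skewed = {"a": 0.5, "b": 0.3, "c": 0.1, "d": 0.1}

pick_uniform = best_question(uniform, questions)
pick_skewed = best_question(skewed, questions)

print(pick_uniform, pick_skewed)
```

Under the uniform priors the 2-vs-2 question is the best split. Under the skewed priors the same question puts 80% of the mass on one side, and the 1-vs-3 question becomes the even split, even though it looks lopsided by count.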