Shannon's Entropy Formula
Reference for Shannon entropy H(X) = -Σ p(x) log₂ p(x), measuring information in bits.
Covers data compression, cryptography, and feature selection.
The Formula
H(X) = -Σ p(x) log₂ p(x)

Shannon's entropy measures the average amount of information (in bits) per symbol in a message. Higher entropy means more unpredictability and more bits needed, on average, to encode the data.
Variables
| Symbol | Meaning |
|---|---|
| H | Entropy (measured in bits when using log base 2) |
| p(x) | Probability of each possible symbol or outcome |
| Σ | Sum over all possible symbols |
| log₂ | Logarithm base 2 |
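As a minimal Python sketch of the formula (the function name `shannon_entropy` and the optional `base` argument are illustrative, not part of the original reference):

```python
import math

def shannon_entropy(probs, base=2.0):
    """H = -Σ p(x) log_base p(x) over a probability distribution.

    base=2 gives bits, math.e gives nats, 10 gives hartleys (dits).
    Zero-probability outcomes contribute nothing, since p * log(p) -> 0 as p -> 0.
    """
    return -sum(p * math.log(p, base) for p in probs if p > 0)
```

A distribution that puts all its mass on one outcome returns 0, and a uniform distribution over n outcomes returns log₂ n, the maximum possible entropy.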
Example 1
Find the entropy of a fair coin flip
Two outcomes: Heads (p = 0.5), Tails (p = 0.5)
H = -(0.5 × log₂(0.5) + 0.5 × log₂(0.5))
H = -(0.5 × (-1) + 0.5 × (-1))
H = 1 bit (maximum entropy for two outcomes)
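A quick check of this result in Python (a hypothetical snippet, not part of the original example):

```python
import math

# Fair coin: p(heads) = p(tails) = 0.5
H = -(0.5 * math.log2(0.5) + 0.5 * math.log2(0.5))
print(H)  # 1.0 bit
```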
Example 2
A source emits A (70%), B (20%), C (10%). Find the entropy.
H = -(0.7 × log₂(0.7) + 0.2 × log₂(0.2) + 0.1 × log₂(0.1))
H = -(0.7 × (-0.515) + 0.2 × (-2.322) + 0.1 × (-3.322))
H = -(−0.360 − 0.464 − 0.332)
H ≈ 1.157 bits per symbol
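The same computation as a short Python check (an assumed snippet, using the probabilities given in the example):

```python
import math

# Source probabilities: A = 0.7, B = 0.2, C = 0.1
probs = [0.7, 0.2, 0.1]
H = -sum(p * math.log2(p) for p in probs)
print(round(H, 3))  # 1.157 bits per symbol
```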
When to Use It
Use Shannon's entropy when:
- Measuring the information content of a data source
- Designing efficient data compression algorithms
- Evaluating the randomness or predictability of data
- Building decision trees in machine learning (information gain)
Key Notes
- Formula: H = −Σ p(x) log₂ p(x): Sum over all possible outcomes x. The log base 2 gives entropy in bits. Using natural log gives nats; log base 10 gives hartleys (dits).
- Maximum entropy means maximum uncertainty: Entropy is maximized when all outcomes are equally likely (uniform distribution). A fair coin (H = 1 bit) has more entropy than a biased coin.
- Zero entropy means certainty: If one outcome has probability 1 and all others 0, entropy is 0 bits — there is no uncertainty at all.
- Foundation of data compression: Shannon's source coding theorem shows that no lossless compression scheme can, on average, encode a source in fewer bits per symbol than its entropy rate. Lossless formats such as ZIP approach this limit; lossy formats like MP3 and JPEG first discard information, then entropy-code what remains.
- Used in machine learning: Decision trees use information gain (the reduction in entropy from a split) to choose which feature to split on; a sketch follows this list. Cross-entropy is the standard loss function for classification models.
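A minimal sketch of information gain for a decision-tree split, assuming toy labels and hypothetical helper names (`entropy`, `information_gain`) that are not from the original:

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def information_gain(labels, groups):
    """Entropy of the parent minus the weighted entropy of the child groups."""
    n = len(labels)
    weighted = sum(len(g) / n * entropy(g) for g in groups)
    return entropy(labels) - weighted

# Hypothetical split: 10 labels, and a candidate feature separates them into two groups
parent = ["yes"] * 5 + ["no"] * 5
left = ["yes"] * 4 + ["no"] * 1
right = ["yes"] * 1 + ["no"] * 4
print(information_gain(parent, [left, right]))  # ≈ 0.278 bits
```

The tree picks the split whose child groups reduce entropy the most; here the parent has 1 bit of entropy, the split leaves about 0.722 bits of weighted child entropy, and the gain is roughly 0.278 bits.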