Hamming Distance Formula
Hamming distance counts differing bit positions between two binary strings.
Used in error detection, correction codes, DNA analysis, and information theory.
The Formula
The Hamming distance counts how many bit positions differ between two strings of equal length. It measures how many single-bit errors would transform one string into the other.
Variables
| Symbol | Meaning |
|---|---|
| d(x, y) | Hamming distance between strings x and y |
| x, y | Two binary strings of equal length |
| xᵢ, yᵢ | The bit at position i in strings x and y |
Example 1
Find the Hamming distance between 1011101 and 1001001
Position 1: 1 vs 1 ✓
Position 2: 0 vs 0 ✓
Position 3: 1 vs 0 ✗
Position 4: 1 vs 1 ✓
Position 5: 1 vs 0 ✗
Position 6: 0 vs 0 ✓
Position 7: 1 vs 1 ✓
Hamming distance = 2
Example 2
Find the Hamming distance between "karolin" and "kathrin"
k-k ✓, a-a ✓, r-t ✗, o-h ✗, l-r ✗, i-i ✓, n-n ✓
Hamming distance = 3
When to Use It
Use the Hamming distance when:
- Designing error-detecting and error-correcting codes
- Measuring similarity between binary data or DNA sequences
- Implementing spell checkers or fuzzy matching
- Evaluating the reliability of communication channels
Key Notes
- Minimum Hamming distance d_min determines error capability: to detect up to t errors, need d_min ≥ t + 1; to correct t errors, need d_min ≥ 2t + 1 — this is the Hamming bound
- For binary strings, Hamming distance = number of 1-bits in the XOR of the two strings: d(x,y) = popcount(x XOR y) — this is efficiently computed in hardware in a single instruction on modern CPUs
- Hamming distance only counts substitutions — it requires strings of equal length and does not account for insertions or deletions; for variable-length strings, Levenshtein (edit) distance is used instead
- The formula extends beyond binary: comparing DNA sequences character-by-character uses Hamming distance directly — a distance of 3 means 3 nucleotide substitutions separate the two sequences