P-Value Formula and Interpretation
Reference for calculating and interpreting p-values in hypothesis testing.
Covers null rejection, one-tailed vs two-tailed, and z-test vs t-test.
The Concept
The p-value is the probability of getting results at least as extreme as the observed results, assuming the null hypothesis is true.
For a Z-Test
One-tailed (left): p = P(Z < z)
One-tailed (right): p = P(Z > z)
Two-tailed: p = 2 × P(Z > |z|)
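The three formulas above can be computed with only the standard library, using the identity P(Z > z) = ½·erfc(z/√2) for the standard normal. This is a minimal sketch; the function names `norm_sf` and `p_value` are illustrative, not from any particular library.

```python
from math import erfc, sqrt

def norm_sf(z):
    """Survival function of the standard normal: P(Z > z)."""
    return 0.5 * erfc(z / sqrt(2))

def p_value(z, tail="two"):
    """P-value for a z statistic; tail is 'left', 'right', or 'two'."""
    if tail == "left":
        return 1.0 - norm_sf(z)       # P(Z < z)
    if tail == "right":
        return norm_sf(z)             # P(Z > z)
    return 2.0 * norm_sf(abs(z))      # 2 × P(Z > |z|)
```

In practice, `scipy.stats.norm.sf` gives the same survival function without hand-rolling the erfc identity.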
Decision Rules
| P-Value | Common Interpretation | Decision (at α = 0.05) |
|---|---|---|
| p < 0.001 | Very strong evidence against H₀ | Reject H₀ |
| p < 0.01 | Strong evidence against H₀ | Reject H₀ |
| p < 0.05 | Moderate evidence against H₀ | Reject H₀ |
| 0.05 ≤ p < 0.10 | Weak evidence against H₀ | Fail to reject H₀ |
| p ≥ 0.10 | Little to no evidence against H₀ | Fail to reject H₀ |
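The decision table translates directly into a lookup function. A sketch (the function name `decide` is illustrative):

```python
def decide(p, alpha=0.05):
    """Map a p-value to an evidence label and a reject/fail-to-reject decision."""
    if p < 0.001:
        evidence = "very strong evidence against H0"
    elif p < 0.01:
        evidence = "strong evidence against H0"
    elif p < 0.05:
        evidence = "moderate evidence against H0"
    elif p < 0.10:
        evidence = "weak evidence against H0"
    else:
        evidence = "little to no evidence against H0"
    decision = "Reject H0" if p < alpha else "Fail to reject H0"
    return evidence, decision
```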
Common Misconceptions
- A p-value is NOT the probability that the null hypothesis is true
- A p-value is NOT the probability your result is due to chance
- p < 0.05 does not mean the result is practically important
- p > 0.05 does not mean there is no effect — it means you lack evidence
- A very small p-value with a tiny effect size may not be meaningful
Example
A z-test gives z = 2.15. What is the two-tailed p-value?
P(Z > 2.15) = 0.0158 (from z-table or calculator)
Two-tailed p = 2 × 0.0158 = 0.0316
Since 0.0316 < 0.05, this result is statistically significant at the 5% level.
We reject the null hypothesis.
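The same example, checked numerically rather than from a z-table (using the erfc identity for the normal tail probability):

```python
from math import erfc, sqrt

z = 2.15
p_one = 0.5 * erfc(z / sqrt(2))   # P(Z > 2.15) ≈ 0.0158
p_two = 2 * p_one                 # two-tailed p ≈ 0.0316
print(round(p_one, 4), round(p_two, 4))
```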
Key Notes
- What p-value actually means: The p-value is the probability of obtaining a test result at least as extreme as the observed one, assuming the null hypothesis is true. It is NOT the probability that H₀ is true.
- The 0.05 threshold is a convention: Ronald Fisher suggested 0.05 as a rough guideline, not a law of science. Some fields use 0.01 (stricter) or 0.10 (more lenient). High-energy physics requires p < 0.0000003 (5-sigma) before claiming a discovery.
- Statistical vs practical significance: With a very large sample, even a trivially small and unimportant effect can produce p < 0.05. Always report effect size (Cohen's d, R², etc.) alongside the p-value.
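To make the statistical-vs-practical point concrete, Cohen's d for two groups can be computed with a pooled standard deviation. The numbers in the test are invented for illustration: a 0.2-point difference between group means with SD 2.0 gives d = 0.1, a trivial effect even though huge samples could make it "significant".

```python
from math import sqrt

def cohens_d(mean1, mean2, sd1, sd2, n1, n2):
    """Cohen's d for two independent groups, using a pooled SD."""
    pooled = sqrt(((n1 - 1) * sd1**2 + (n2 - 1) * sd2**2) / (n1 + n2 - 2))
    return (mean1 - mean2) / pooled
```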
- Multiple comparisons inflate false positives: If you run 20 independent tests at p < 0.05, you expect about 1 false positive by chance. Apply Bonferroni correction (divide α by the number of tests) or use FDR methods.
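The Bonferroni correction described above is a one-liner: compare each p-value to α divided by the number of tests. A sketch (the function name is illustrative):

```python
def bonferroni_reject(p_values, alpha=0.05):
    """Reject H0 only where p < alpha / (number of tests)."""
    m = len(p_values)
    return [p < alpha / m for p in p_values]
```

With three tests at α = 0.05, the per-test threshold becomes 0.05 / 3 ≈ 0.0167, so a p-value of 0.04 that would pass alone no longer does.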
- One-tailed vs two-tailed tests: A two-tailed test checks for an effect in either direction (more common). A one-tailed test checks only one direction, and for the same data it gives half the two-tailed p-value when the effect lies in the predicted direction — use it only with strong prior directional justification, decided before seeing the data.
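The halving relationship is easy to verify numerically: for the same z statistic, the right-tailed p is exactly half the two-tailed p (again via the erfc identity for the normal tail).

```python
from math import erfc, sqrt

def sf(z):
    """P(Z > z) for the standard normal."""
    return 0.5 * erfc(z / sqrt(2))

z = 2.15
p_right = sf(z)          # one-tailed (right)
p_two = 2 * sf(abs(z))   # two-tailed
# For the same z, the one-tailed p is exactly half the two-tailed p,
# which is why choosing a one-tailed test after seeing the data inflates significance.
```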