P-Value Formula and Interpretation

Reference for calculating and interpreting p-values in hypothesis testing.
Covers null rejection, one-tailed vs two-tailed, and z-test vs t-test.

The Concept

p-value = P(observing data at least this extreme | null hypothesis is true)

The p-value is the probability of getting results at least as extreme as the observed results, assuming the null hypothesis is true.

For a Z-Test

One-tailed (right): p = P(Z > z)
One-tailed (left): p = P(Z < z)
Two-tailed: p = 2 × P(Z > |z|)
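The three tail formulas can be computed with nothing beyond the Python standard library, since P(Z > z) = ½·erfc(z/√2) for a standard normal Z. This is an illustrative sketch; the function names are ours, not from any library:

```python
import math

def z_tail(z: float) -> float:
    """P(Z > z) for a standard normal Z, via the complementary error function."""
    return 0.5 * math.erfc(z / math.sqrt(2))

def p_value_z(z: float, tail: str = "two") -> float:
    """P-value for a z statistic: tail is 'right', 'left', or 'two'."""
    if tail == "right":
        return z_tail(z)            # P(Z > z)
    if tail == "left":
        return z_tail(-z)           # P(Z < z) = P(Z > -z) by symmetry
    return 2 * z_tail(abs(z))       # two-tailed: 2 × P(Z > |z|)
```

For example, `p_value_z(1.96, "two")` returns approximately 0.05, matching the familiar two-tailed critical value.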

Decision Rules

P-Value            Common Interpretation               Decision (at α = 0.05)
p < 0.001          Very strong evidence against H₀     Reject H₀
p < 0.01           Strong evidence against H₀          Reject H₀
p < 0.05           Moderate evidence against H₀        Reject H₀
0.05 ≤ p < 0.10    Weak evidence against H₀            Fail to reject H₀
p ≥ 0.10           Little to no evidence against H₀    Fail to reject H₀
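The decision rules can be expressed as a small helper. This is a sketch of the table's logic (the function name and wording are ours), with the evidence labels fixed to the conventional cutoffs and the reject/fail decision driven by the chosen α:

```python
def interpret(p: float, alpha: float = 0.05) -> str:
    """Map a p-value to an evidence label and a decision at significance level alpha."""
    decision = "Reject H0" if p < alpha else "Fail to reject H0"
    if p < 0.001:
        strength = "very strong"
    elif p < 0.01:
        strength = "strong"
    elif p < 0.05:
        strength = "moderate"
    elif p < 0.10:
        strength = "weak"
    else:
        strength = "little to no"
    return f"{strength} evidence against H0; {decision}"
```

For instance, `interpret(0.03)` reports moderate evidence and rejects H₀ at α = 0.05.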

Common Misconceptions

  • A p-value is NOT the probability that the null hypothesis is true
  • A p-value is NOT the probability your result is due to chance
  • p < 0.05 does not mean the result is practically important
  • p > 0.05 does not mean there is no effect — it means you lack evidence
  • A very small p-value with a tiny effect size may not be meaningful

Example

A z-test gives z = 2.15. What is the two-tailed p-value?

P(Z > 2.15) = 0.0158 (from z-table or calculator)

Two-tailed p = 2 × 0.0158 = 0.0316

Since 0.0316 < 0.05, this result is statistically significant at the 5% level.

We reject the null hypothesis.
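The worked example above can be reproduced numerically using the same erfc identity for the normal tail probability (a self-contained sketch, not a library call):

```python
import math

z = 2.15
one_tail = 0.5 * math.erfc(z / math.sqrt(2))  # P(Z > 2.15)
p_two = 2 * one_tail                          # two-tailed p-value

print(round(one_tail, 4))  # 0.0158
print(round(p_two, 4))     # 0.0316
print(p_two < 0.05)        # True: significant at the 5% level
```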

Key Notes

  • What p-value actually means: The p-value is the probability of obtaining a test result at least as extreme as the observed one, assuming the null hypothesis is true. It is NOT the probability that H₀ is true.
  • The 0.05 threshold is a convention: Ronald Fisher suggested 0.05 as a rough guideline, not a law of science. Some fields use 0.01 (stricter) or 0.10 (more lenient). High-energy physics requires p < 0.0000003 (5-sigma) before claiming a discovery.
  • Statistical vs practical significance: With a very large sample, even a trivially small and unimportant effect can produce p < 0.05. Always report effect size (Cohen's d, R², etc.) alongside the p-value.
  • Multiple comparisons inflate false positives: If you run 20 independent tests at p < 0.05, you expect about 1 false positive by chance. Apply Bonferroni correction (divide α by the number of tests) or use FDR methods.
  • One-tailed vs two-tailed tests: A two-tailed test checks for an effect in either direction (more common). A one-tailed test only checks one direction and has half the p-value of a two-tailed test for the same data — only use it with strong prior directional justification.
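The Bonferroni correction mentioned in the notes is simple arithmetic: divide the family-wise α by the number of tests to get the per-test threshold. A minimal sketch (the function name is ours):

```python
def bonferroni_alpha(alpha: float, n_tests: int) -> float:
    """Per-test significance threshold under the Bonferroni correction."""
    return alpha / n_tests

# With 20 independent tests and a family-wise alpha of 0.05,
# each individual test must clear p < 0.0025 to be declared significant.
per_test = bonferroni_alpha(0.05, 20)

# Without correction, the expected number of false positives by
# chance alone is n_tests * alpha = 20 * 0.05 = 1.
expected_false_positives = 20 * 0.05
```

Bonferroni is conservative; false discovery rate (FDR) methods such as Benjamini-Hochberg trade some strictness for power when many tests are run.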
