hypothesis testing, sample size, effect size, false negatives, power analysis, and research design

Statistical power

Statistical power is the probability that a study will detect a real effect when that effect truly exists.

Core meaning

Power is the chance of rejecting the null hypothesis when a real effect is present.

Common notation

Power is often written as 1 - beta, where beta is the probability of a false negative.

Design use

Power analysis helps estimate sample size before a study begins.

A statistical power curve showing how power changes with an input parameter. — Statistical power connects study design choices to the chance of detecting a real effect.

What statistical power is

Statistical power is the probability that a test will detect an effect if the effect truly exists. In hypothesis testing, this usually means rejecting the null hypothesis when the alternative hypothesis is true. A study with low power can miss real effects, producing a false negative.

False negatives

Power is closely tied to type II error. A type II error happens when a study fails to detect a real effect. If a study has 80 percent power for a specified effect under a specified design, then about 20 percent of repeated studies run under those assumptions would miss that effect.
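As a rough illustration of the link between power and type II error, the sketch below computes power for a two-sided one-sample z-test and derives the false-negative rate from it. The function name `z_test_power` and the example numbers (an effect of 0.5, a standard deviation of 1, and 32 observations) are assumptions chosen for illustration, and the calculation assumes normally distributed data with a known standard deviation.

```python
from statistics import NormalDist


def z_test_power(effect, sigma, n, alpha=0.05):
    """Power of a two-sided one-sample z-test, assuming sigma is known."""
    z = NormalDist()                     # standard normal distribution
    z_crit = z.inv_cdf(1 - alpha / 2)    # two-sided critical value
    shift = effect * n ** 0.5 / sigma    # mean of the test statistic when the effect is real
    # Probability that the statistic falls beyond either critical value.
    return (1 - z.cdf(z_crit - shift)) + z.cdf(-z_crit - shift)


# Hypothetical design: true effect 0.5, standard deviation 1.0, 32 observations.
power = z_test_power(effect=0.5, sigma=1.0, n=32)
beta = 1 - power                         # type II error rate
print(f"power = {power:.3f}, type II error rate = {beta:.3f}")
```

Under these assumed inputs the power comes out close to 80 percent, so the type II error rate is close to 20 percent. Setting the effect to zero returns the significance level itself, which is the rejection rate when the null hypothesis is true.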

What affects power

Power depends on sample size, effect size, measurement noise, study design, statistical test, significance threshold, and the pattern of missing data. Larger samples, stronger effects, lower variability, and better measurement usually increase power. Stricter significance thresholds can reduce power unless the study is enlarged.
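The direction of several of these influences can be checked with a small calculation. The sketch below uses a normal approximation for a two-group comparison of means with equal group sizes and a known common standard deviation. The function `two_sample_power` and every scenario value are hypothetical, and the sketch does not model study design or missing data.

```python
from statistics import NormalDist


def two_sample_power(delta, sigma, n_per_group, alpha=0.05):
    """Approximate power of a two-sided, two-sample z-test (equal groups, known sigma)."""
    z = NormalDist()
    z_crit = z.inv_cdf(1 - alpha / 2)
    se = sigma * (2 / n_per_group) ** 0.5    # standard error of the difference in means
    shift = delta / se                       # expected test statistic under the alternative
    return (1 - z.cdf(z_crit - shift)) + z.cdf(-z_crit - shift)


# Hypothetical baseline design, then one change at a time.
baseline = dict(delta=2.0, sigma=10.0, n_per_group=100, alpha=0.05)
scenarios = {
    "baseline": baseline,
    "larger sample": {**baseline, "n_per_group": 400},
    "stronger effect": {**baseline, "delta": 4.0},
    "lower variability": {**baseline, "sigma": 5.0},
    "stricter threshold": {**baseline, "alpha": 0.005},
}
results = {name: two_sample_power(**kwargs) for name, kwargs in scenarios.items()}
for name, value in results.items():
    print(f"{name:>18}: power = {value:.2f}")
```

In this setup, quadrupling the sample, doubling the effect, and halving the standard deviation each raise power by the same amount, because each doubles the ratio of the effect to its standard error. Tightening the threshold lowers power when nothing else changes.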

Power analysis

A power analysis is a planning calculation. Researchers specify the design, expected effect size, significance level, desired power, and statistical test, then estimate the sample size needed. If the sample size is fixed, power analysis can instead estimate the smallest effect the study is likely to detect.
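Both directions of this calculation can be sketched with the usual normal-approximation formula for comparing two group means, n per group = 2 * ((z for alpha + z for power) / d)^2, where d is the standardized effect size. The function names below are illustrative assumptions. Dedicated power software based on the t distribution typically returns slightly larger sample sizes than this approximation.

```python
from math import ceil
from statistics import NormalDist

Z = NormalDist()  # standard normal distribution


def n_per_group(d, alpha=0.05, power=0.80):
    """Approximate per-group sample size for a two-sided, two-sample comparison of means."""
    z_alpha = Z.inv_cdf(1 - alpha / 2)
    z_power = Z.inv_cdf(power)
    return ceil(2 * ((z_alpha + z_power) / d) ** 2)


def minimum_detectable_effect(n, alpha=0.05, power=0.80):
    """Smallest standardized effect detectable at the target power with n per group."""
    z_alpha = Z.inv_cdf(1 - alpha / 2)
    z_power = Z.inv_cdf(power)
    return (z_alpha + z_power) * (2 / n) ** 0.5


print(n_per_group(0.5))                         # sample size for a medium-sized effect
print(n_per_group(0.2))                         # sample size for a small effect
print(round(minimum_detectable_effect(63), 3))  # detectable effect when n is fixed at 63
```

Because the required sample size scales with 1 / d^2, halving the expected effect size roughly quadruples the number of participants needed.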

Effect size matters

Power is not a general property of a study by itself; it is power to detect a particular effect under particular assumptions. A study may have high power to detect a large effect and low power to detect a small but meaningful effect. Choosing a realistic and meaningful effect size is one of the hardest parts of planning.

Underpowered research

Underpowered studies can produce inconclusive results and unstable estimates. They may also contribute to publication bias when only the lucky positive findings are published. Low power does not mean a study is useless, but it does mean the study may not answer the question it was intended to answer.
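A small simulation can illustrate why publishing only the lucky positive findings is a problem. The sketch below draws the estimated difference for many hypothetical two-group studies directly from its sampling distribution, assuming a known standard deviation of 1, then looks only at the studies that reached significance. The true effect of 0.2, the group size of 20, and the function name are assumptions made for the example.

```python
import random
from statistics import NormalDist, mean


def simulate_significant_estimates(true_d, n_per_group, n_studies=20000,
                                   alpha=0.05, seed=0):
    """Return (share of significant studies, mean estimate among significant studies)."""
    rng = random.Random(seed)                  # fixed seed for reproducibility
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    se = (2 / n_per_group) ** 0.5              # SE of the difference in means (sigma = 1)
    significant = []
    for _ in range(n_studies):
        estimate = rng.gauss(true_d, se)       # one study's estimated difference
        if abs(estimate / se) > z_crit:        # two-sided z-test
            significant.append(estimate)
    share = len(significant) / n_studies
    avg = mean(significant) if significant else float("nan")
    return share, avg


share, mean_sig = simulate_significant_estimates(true_d=0.2, n_per_group=20)
print(f"share significant: {share:.3f}")
print(f"mean estimate among significant studies: {mean_sig:.2f} (true effect 0.20)")
```

Under these assumptions only about one study in ten reaches significance, and those that do overstate the true effect several times over, because an estimate must be far from zero to cross the threshold when the standard error is large.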

Limits and tradeoffs

More power is not always better. Very large studies can detect effects that are statistically significant but too small to matter in practice. Ethical, financial, logistical, and privacy limits also constrain sample size. Good design balances power with relevance, feasibility, and responsible use of participants or resources.

Why it matters

Statistical power matters because research decisions often depend on whether a study was capable of finding the effect it looked for. Without enough power, a null result may mean either that no meaningful effect exists or that the study was too weak to see it.

Key concepts

Type II error: failing to detect a real effect.
Effect size: the size of the difference, association, or change the study is trying to detect.
Alpha level: the accepted probability of a type I error (a false positive), used as the cutoff for statistical significance.

Power inputs

Sample size, allocation ratio, expected variability, and measurement precision.
The statistical test, significance threshold, and planned analysis model.
The smallest effect size that would be scientifically, clinically, or practically meaningful.

Common misconceptions

Power is not the probability that a reported positive result is true.
A nonsignificant result from a low-power study is not strong evidence of no effect.
A standard target such as 80 percent power is a convention, not a guarantee that a study is well designed.
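The first misconception can be made concrete with a short calculation. The probability that a significant result reflects a real effect depends on power, on alpha, and on how often the tested hypotheses are true to begin with. The sketch below assumes a hypothetical prior of 10 percent true hypotheses and ignores bias and selective reporting. The function name and all inputs are illustrative.

```python
def positive_predictive_value(power, alpha, prior):
    """Share of significant results that reflect a real effect.

    prior is the assumed fraction of tested hypotheses that are actually true.
    """
    true_positives = power * prior            # real effects that are detected
    false_positives = alpha * (1 - prior)     # null effects that reach significance
    return true_positives / (true_positives + false_positives)


# Hypothetical field in which 1 in 10 tested hypotheses is true.
ppv_high_power = positive_predictive_value(power=0.80, alpha=0.05, prior=0.10)
ppv_low_power = positive_predictive_value(power=0.20, alpha=0.05, prior=0.10)
print(f"PPV at 80% power: {ppv_high_power:.2f}")
print(f"PPV at 20% power: {ppv_low_power:.2f}")
```

Even at 80 percent power, the share of positive results that are true in this scenario is well below 80 percent, and it falls further when power is low.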

Open questions

How should researchers choose meaningful effect sizes when prior evidence is weak?
When should precision estimates or decision analysis replace simple power targets?
How can journals and funders discourage underpowered studies without blocking exploratory work?