statistical tests, null hypotheses, p-values, alpha levels, power, and research evidence

Hypothesis testing

Hypothesis testing is a statistical workflow for comparing observed data with a stated baseline claim.

Core purpose
It asks whether observed data are sufficiently inconsistent with a specified null hypothesis.
Main output
Many tests report a test statistic, p-value, and decision about whether to reject the null.
Key caution
A test result depends on design, assumptions, sample size, and the analysis plan.
Hypothesis tests compare a test statistic with a reference distribution and a rejection rule.

What hypothesis testing is

Hypothesis testing is a structured way to compare data with a statistical claim. A researcher states a null hypothesis, chooses a test and significance level, collects or analyzes data, and decides whether the observed result is unusual enough under the null model to reject that baseline claim.

Null and alternative hypotheses

The null hypothesis is the baseline, often no difference, no association, or no effect. The alternative hypothesis describes the kind of departure the researcher is looking for. Clear hypotheses matter because the same dataset can support different tests depending on the question being asked.

Test statistic

A test statistic compresses the data into a number that measures departure from the null expectation. Examples include t statistics, z statistics, chi-square statistics, and F statistics. The test statistic is interpreted using a reference distribution that comes from the chosen model and assumptions.
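
As a minimal sketch of how a test statistic compresses data into one number, the Python snippet below computes Welch's two-sample t statistic using only the standard library. The function name `welch_t` and the sample values are hypothetical, chosen for illustration; Welch's test is one of many possible choices and is assumed here rather than taken from the text above.

```python
from math import sqrt
from statistics import mean, variance


def welch_t(a, b):
    """Welch's t statistic: difference in means divided by its standard error."""
    # variance() is the sample variance (n - 1 in the denominator).
    standard_error = sqrt(variance(a) / len(a) + variance(b) / len(b))
    return (mean(a) - mean(b)) / standard_error


# Hypothetical measurements from two groups (illustrative values only).
treatment = [5.1, 4.9, 5.6, 5.8, 6.0]
control = [4.2, 4.8, 4.5, 5.0, 4.4]

print("t statistic:", welch_t(treatment, control))
```

A value far from zero indicates a large departure from the null expectation of equal means, measured in standard-error units. To reach a decision, the statistic would still need to be compared with a t reference distribution.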

P-values and alpha

A p-value is the probability, computed under the null model, of obtaining a test statistic at least as extreme as the one observed. The alpha level is the cutoff chosen in advance for rejecting the null. A p-value below alpha is often called statistically significant, but that label does not prove the alternative hypothesis, and a p-value is not the probability that the null hypothesis is true.
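
The relationship between a test statistic, a p-value, and alpha can be sketched with a two-sided one-sample z test. This is an illustration under stated assumptions: the population standard deviation is treated as known, the sample mean is treated as approximately normal, and the function name `two_sided_z_test`, the data, and the null mean are all hypothetical.

```python
from math import sqrt
from statistics import NormalDist, mean


def two_sided_z_test(sample, null_mean, known_sd):
    """Return (z, p_value) for a two-sided one-sample z test."""
    z = (mean(sample) - null_mean) / (known_sd / sqrt(len(sample)))
    # Probability, under the null model, of a |z| at least this large.
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value


# Hypothetical data and settings (assumptions for illustration only).
sample = [10.4, 11.2, 10.9, 11.5, 10.7, 11.1, 10.8, 11.3]
alpha = 0.05  # cutoff chosen before looking at the data

z, p_value = two_sided_z_test(sample, null_mean=10.0, known_sd=1.0)
print("z =", z, "p =", p_value)
print("reject null" if p_value < alpha else "fail to reject null")
```

The final line applies the rejection rule. Failing to reject is not evidence that the null is true; it only means the data were not unusual enough under the null model at the chosen alpha.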

Errors and power

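A type I error occurs when a true null hypothesis is rejected; the alpha level caps the rate of this error when the test's assumptions hold. A type II error occurs when a false null hypothesis is not rejected, so a real effect is missed. Statistical power is the probability that a test detects a specified effect when that effect truly exists, which equals one minus the type II error rate. Power depends on sample size, effect size, variability, design, and the chosen alpha level.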
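
One way to make these error rates concrete is simulation: generate many datasets under a known truth and count how often the test rejects. The sketch below assumes normally distributed data with a known standard deviation and a two-sided z test of a zero mean. The function name `rejection_rate`, the effect size of 0.5, the sample sizes, and the number of simulations are hypothetical choices.

```python
import random
from math import sqrt
from statistics import NormalDist, mean


def rejection_rate(effect, n, alpha=0.05, sd=1.0, n_sims=2000, seed=0):
    """Share of simulated samples in which a two-sided z test rejects mean = 0."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    rejections = 0
    for _ in range(n_sims):
        sample = [rng.gauss(effect, sd) for _ in range(n)]
        z = mean(sample) / (sd / sqrt(n))
        if abs(z) > z_crit:
            rejections += 1
    return rejections / n_sims


# With no true effect, every rejection is a type I error (rate near alpha).
print("type I error rate:", rejection_rate(effect=0.0, n=30))
# With a true effect, the rejection rate estimates power.
print("power, n=10:", rejection_rate(effect=0.5, n=10))
print("power, n=30:", rejection_rate(effect=0.5, n=30))
```

Under these assumptions, the estimated power should rise with the sample size while the type I error rate stays close to alpha. Changing `effect`, `sd`, or `alpha` shows the other dependencies listed above.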

Assumptions and design

A hypothesis test is only as useful as its design and assumptions. Random sampling, random assignment, independence, measurement quality, missing data, distributional assumptions, and preregistered analysis choices all affect whether the test answers the intended question.

Misuse and alternatives

Hypothesis testing is often misused when results are reduced to a yes-or-no threshold. Better reporting includes effect sizes, confidence intervals, uncertainty, sensitivity analyses, and transparent handling of exploratory work. In some settings, estimation, prediction, Bayesian analysis, or decision analysis may answer the practical question more directly.
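
As a sketch of reporting beyond a yes-or-no threshold, the snippet below summarizes a two-group comparison with a mean difference, an approximate confidence interval, and a standardized effect size (Cohen's d). The interval uses a normal approximation, which is an assumption made here for simplicity; with small samples a t-based interval is usually more appropriate. The function name `effect_summary` and the data are hypothetical.

```python
from math import sqrt
from statistics import NormalDist, mean, variance


def effect_summary(a, b, confidence=0.95):
    """Mean difference, normal-approximation CI, and Cohen's d for two groups."""
    diff = mean(a) - mean(b)
    standard_error = sqrt(variance(a) / len(a) + variance(b) / len(b))
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    # Pooled standard deviation, used to standardize the difference.
    pooled_sd = sqrt(
        ((len(a) - 1) * variance(a) + (len(b) - 1) * variance(b))
        / (len(a) + len(b) - 2)
    )
    return {
        "difference": diff,
        "ci": (diff - z * standard_error, diff + z * standard_error),
        "cohens_d": diff / pooled_sd,
    }


# Hypothetical measurements from two groups (illustrative values only).
treatment = [5.1, 4.9, 5.6, 5.8, 6.0]
control = [4.2, 4.8, 4.5, 5.0, 4.4]
print(effect_summary(treatment, control))
```

Reporting the interval and effect size alongside any p-value lets readers judge both the magnitude of the difference and how precisely it was estimated.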

Why it matters

Hypothesis tests influence scientific publication, medical claims, product experiments, quality control, and policy analysis. Used carefully, they discipline how evidence is compared with a claim. Used carelessly, they can make fragile or biased results look decisive.