Data, variation, sampling, inference, regression, uncertainty, statistical models, and evidence

Statistics

Statistics turns data into evidence by describing variation, estimating unknown quantities, testing claims, and measuring uncertainty.

Main purpose
Statistics helps describe data, estimate unknowns, compare groups, and judge how much uncertainty remains.
Key ingredient
Sampling matters because data usually represents only part of a larger population or process.
Where it appears
Statistics supports science, medicine, public policy, business, sports, economics, quality control, and machine learning.
Regression is one statistical tool for describing relationships and uncertainty in observed data.

What statistics studies

Statistics is the discipline of learning from data. It asks how data was collected, what patterns appear, how much variation is present, and what conclusions are justified. Unlike raw arithmetic, statistics keeps uncertainty visible, because data is often incomplete, noisy, biased, or drawn from a changing world.

Describing data

Descriptive statistics summarize what has been observed. Measures such as the mean and median describe the center of the data, while the range, variance, and standard deviation describe its spread. Charts such as histograms, scatter plots, and box plots reveal shape, clusters, outliers, and relationships that a single number can hide.
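As a minimal sketch, assuming Python's standard `statistics` module and a small hypothetical dataset, the measures named above can be computed directly. The single large value (48) shows how an outlier pulls the mean away from the median:

```python
import statistics

# Hypothetical sample: daily counts with one unusually large value.
data = [12, 15, 11, 14, 13, 16, 12, 48]

mean = statistics.mean(data)            # center, pulled upward by the outlier
median = statistics.median(data)        # center, resistant to the outlier
data_range = max(data) - min(data)      # spread: largest minus smallest value
variance = statistics.variance(data)    # sample variance (n - 1 denominator)
std_dev = statistics.stdev(data)        # square root of the sample variance

print(f"mean={mean:.3f} median={median} range={data_range}")
print(f"variance={variance:.2f} stdev={std_dev:.2f}")
```

For this data the mean is 17.625 while the median is 13.5, the kind of gap a histogram or box plot would also make visible. `statistics.variance` and `statistics.stdev` use the sample (n - 1) denominator; `statistics.pvariance` and `statistics.pstdev` are the population versions.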

Samples and populations

A population is the larger group or process of interest, while a sample is the data actually observed. A good sample is chosen so that it represents the population, typically through random selection that gives every member a known chance of inclusion, and can therefore support conclusions about it. Poor sampling can make even a precise calculation misleading, because the numbers may reflect selection bias more than the underlying reality.
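The effect of selection bias can be illustrated with a small simulation. This is a sketch under stated assumptions: the population is a hypothetical set of normally distributed values, and the "biased" sample includes only members above a cutoff, mimicking a selection mechanism that depends on the quantity being measured.

```python
import random
import statistics

random.seed(42)  # fixed seed so the illustration is reproducible

# Hypothetical population of 10,000 values (for example, incomes in thousands).
population = [random.gauss(50, 15) for _ in range(10_000)]
true_mean = statistics.fmean(population)

# Simple random sample: every member has the same chance of being selected.
random_sample = random.sample(population, 200)
random_mean = statistics.fmean(random_sample)

# Biased sample: only members above 60 are "reachable", so whether a member
# is selected depends on the very value being measured.
reachable = [x for x in population if x > 60]
biased_sample = random.sample(reachable, 200)
biased_mean = statistics.fmean(biased_sample)

print(f"population mean:    {true_mean:.1f}")
print(f"random sample mean: {random_mean:.1f}")
print(f"biased sample mean: {biased_mean:.1f}")
```

The random sample's mean should land close to the population mean, while the biased sample's mean is pushed well above it. Taking a larger biased sample would not fix this: the estimate would only become more precise around the wrong value.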

Inference

Statistical inference uses sample data to estimate unknown quantities or evaluate claims. Confidence intervals express a range of plausible values for a quantity under a model, while hypothesis tests ask whether the observed data would be surprising if a stated assumption (the null hypothesis) were true. These tools require careful interpretation, especially when many comparisons are made, because the more tests are run, the more likely it is that some result looks surprising by chance alone.
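A minimal sketch of both ideas, using only Python's standard library and hypothetical data. The interval uses a normal approximation (a simplification; with only eight values a t critical value would give a somewhat wider interval), and the hypothesis test is a permutation test, chosen here because it states its null assumption directly: if the group labels do not matter, shuffling them should often produce differences as large as the one observed. The number of comparisons `m` is also hypothetical.

```python
import random
import statistics
from statistics import NormalDist


def mean_confidence_interval(sample, level=0.95):
    """Approximate confidence interval for a population mean.

    Uses a normal (z) critical value, a simplification that is reasonable for
    moderately large samples; small samples usually call for a t critical value.
    """
    n = len(sample)
    center = statistics.fmean(sample)
    std_error = statistics.stdev(sample) / n ** 0.5
    z = NormalDist().inv_cdf(0.5 + level / 2)  # about 1.96 when level is 0.95
    return center - z * std_error, center + z * std_error


def permutation_test(group_a, group_b, n_permutations=5_000, seed=0):
    """Two-sided permutation test for a difference in group means.

    Null hypothesis: the group labels are exchangeable, i.e. both groups come
    from the same distribution, so shuffling the labels should not matter.
    """
    rng = random.Random(seed)
    observed = abs(statistics.fmean(group_a) - statistics.fmean(group_b))
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    extreme = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)
        diff = abs(statistics.fmean(pooled[:n_a]) - statistics.fmean(pooled[n_a:]))
        # Small tolerance so floating-point rounding does not hide exact ties.
        if diff >= observed - 1e-12:
            extreme += 1
    # Counting the observed arrangement itself keeps the p-value above zero.
    return (extreme + 1) / (n_permutations + 1)


# Hypothetical measurements from two groups.
group_a = [5.1, 4.9, 5.6, 5.8, 6.0, 5.4, 5.7, 5.3]
group_b = [4.6, 4.8, 4.4, 5.0, 4.7, 4.5, 4.9, 4.3]

ci = mean_confidence_interval(group_a)
p_value = permutation_test(group_a, group_b)
print(f"95% CI for mean of group A: ({ci[0]:.2f}, {ci[1]:.2f})")
print(f"permutation p-value: {p_value:.4f}")

# With m comparisons, the (conservative) Bonferroni correction compares each
# p-value against alpha / m instead of alpha.
alpha, m = 0.05, 10
print("significant at alpha / m:", p_value < alpha / m)
```

A small p-value says the observed difference would be rare under the null assumption; it does not say how large or how important the difference is.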

Models and assumptions

Statistical models simplify reality so data can be analyzed. A model might assume independence, a particular distribution, a linear relationship, or similar variability across groups. These assumptions are not just technical details; they shape what the results mean and whether the analysis is trustworthy.
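To see why an assumption such as independence matters, the sketch below simulates hypothetical autocorrelated data (an AR(1) process, chosen purely for illustration) and compares the usual standard error of the mean, which assumes independent observations, with how much sample means actually vary across repeated samples.

```python
import random
import statistics


def ar1_series(n, phi, rng):
    """Hypothetical autocorrelated series: x_t = phi * x_(t-1) + Gaussian noise."""
    x, series = 0.0, []
    for _ in range(n):
        x = phi * x + rng.gauss(0.0, 1.0)
        series.append(x)
    return series


rng = random.Random(1)
n, phi, reps = 100, 0.8, 1_000

sample_means, naive_ses = [], []
for _ in range(reps):
    s = ar1_series(n, phi, rng)
    sample_means.append(statistics.fmean(s))
    # Textbook standard error of the mean, valid only for independent observations.
    naive_ses.append(statistics.stdev(s) / n ** 0.5)

actual_spread = statistics.stdev(sample_means)  # how much the mean really varies
assumed_spread = statistics.fmean(naive_ses)    # what the independence assumption implies

print(f"average naive standard error:  {assumed_spread:.3f}")
print(f"actual spread of sample means: {actual_spread:.3f}")
```

With positive autocorrelation (`phi = 0.8` here), the actual spread comes out several times larger than the naive standard error, so intervals built on the independence assumption would be far too narrow.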

Regression and relationships

Regression studies how one variable changes with another while accounting for variation. Linear regression fits a straight-line relationship, but regression can also handle curves, categories, counts, and many predictors. A fitted relationship can be useful for prediction, but it does not automatically prove cause and effect.
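A minimal ordinary least squares sketch for a single predictor, written out by hand so the formulas are visible: the slope is the sum of cross-deviations divided by the sum of squared x deviations, and the intercept makes the line pass through the point of means. The data (hours studied and test scores) are hypothetical. On Python 3.10 and later, `statistics.linear_regression` computes the same slope and intercept.

```python
import statistics


def fit_line(xs, ys):
    """Ordinary least squares fit of y = intercept + slope * x."""
    x_bar, y_bar = statistics.fmean(xs), statistics.fmean(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar  # the fitted line passes through (x_bar, y_bar)
    return intercept, slope


# Hypothetical data: hours studied (x) and test score (y).
hours = [1, 2, 3, 4, 5, 6]
scores = [52, 55, 61, 64, 70, 72]

intercept, slope = fit_line(hours, scores)
residuals = [y - (intercept + slope * x) for x, y in zip(hours, scores)]
# Residual standard error: leftover variation around the line (n - 2 degrees of freedom).
rse = (sum(r * r for r in residuals) / (len(hours) - 2)) ** 0.5
print(f"predicted score = {intercept:.2f} + {slope:.2f} * hours, residual SE = {rse:.2f}")
```

The residual standard error summarizes the variation the line leaves unexplained. Even a close fit here would not by itself show that studying causes higher scores; other factors could drive both.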

Statistics and probability

Probability and statistics work in opposite but connected directions. Probability starts with a model and asks what data might look like. Statistics starts with data and asks what model or explanation is plausible. Modern analysis often combines both, especially in Bayesian methods, simulations, and machine learning.
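The two directions can be seen side by side in a coin-tossing sketch (a hypothetical example using only the standard library). The probability step fixes a model and computes how likely an outcome is; the statistics step fixes the observed outcome and asks which values of the success probability `p` are plausible, once by maximum likelihood and once with a simple Bayesian update under an assumed uniform prior.

```python
from math import comb


def binomial_pmf(k, n, p):
    """Probability of exactly k successes in n independent trials with success chance p."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


# Probability direction: assume a fair coin and ask what 10 tosses might look like.
fair = [binomial_pmf(k, 10, 0.5) for k in range(11)]
print("P(exactly 7 heads | fair coin) =", round(fair[7], 4))

# Statistics direction: observe 7 heads in 10 tosses and ask which p is plausible.
heads, tosses = 7, 10
mle = heads / tosses  # maximum-likelihood estimate

# The same answer found by scanning a grid of candidate p values.
grid = [i / 100 for i in range(101)]
best = max(grid, key=lambda p: binomial_pmf(heads, tosses, p))

# Bayesian posterior mean under a uniform Beta(1, 1) prior: Beta(1 + heads, 1 + tails).
posterior_mean = (heads + 1) / (tosses + 2)
print(f"MLE = {mle:.2f}, grid maximum = {best:.2f}, posterior mean = {posterior_mean:.3f}")
```

Under the uniform prior, the posterior for `p` is Beta(8, 4), whose mean 8/12 ≈ 0.667 is pulled slightly toward 0.5 compared with the maximum-likelihood estimate of 0.7.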

Why it matters

Statistics matters because data does not speak for itself. The same dataset can support strong evidence, weak evidence, or a misleading story depending on how it was collected and analyzed. Statistical thinking helps people judge claims, measure risk, design better studies, and make decisions under uncertainty.