reading genomes, Sanger method, NGS, bioinformatics

DNA sequencing

DNA sequencing is the process of determining the order of nucleotide bases in DNA. It turns genetic material into readable data, making modern genomics, disease research, evolutionary studies, and metagenomics possible.

Basic idea

DNA sequencing reads the order of A, C, G, and T bases in a DNA molecule.

Classic method

Sanger sequencing uses chain-terminating nucleotides to read DNA fragments.

Modern scale

Next-generation sequencing can read millions or billions of DNA fragments in parallel.

Figure: A DNA sequencing chromatogram visualizes base calls as colored signal peaks.

What DNA sequencing is

DNA sequencing determines the order of nucleotide bases in a DNA sample. The resulting sequence is written as a string of A, C, G, and T. That string can represent a small gene, a viral genome, a human chromosome region, or DNA fragments from many organisms in the same sample.
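To make the "string of A, C, G, and T" idea concrete, here is a minimal Python sketch that checks whether a string is a plain DNA sequence and tallies its bases. The function names and the example sequence are made up for illustration. It also assumes a strict four-letter alphabet, whereas real data often include extra symbols such as N for an uncalled base.

```python
from collections import Counter

DNA_ALPHABET = set("ACGT")  # assumption: strict alphabet, no ambiguity codes


def is_valid_dna(seq: str) -> bool:
    """Return True if seq is non-empty and uses only A, C, G, T (case-insensitive)."""
    return bool(seq) and set(seq.upper()) <= DNA_ALPHABET


def base_counts(seq: str) -> dict:
    """Count how often each of the four bases occurs in seq."""
    counts = Counter(seq.upper())
    return {base: counts.get(base, 0) for base in "ACGT"}


example = "ATGCGTAC"  # hypothetical sequence, not from a real organism
print(is_valid_dna(example))   # True
print(base_counts(example))    # {'A': 2, 'C': 2, 'G': 2, 'T': 2}
print(is_valid_dna("ATGXGT"))  # False, because X is not a DNA base
```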

Why the sequence matters

A DNA sequence is not just a label. It can reveal genes, mutations, ancestry, pathogens, inherited variants, and evolutionary relationships. Sequencing also lets researchers compare samples, track changes over time, and connect biological questions to precise molecular evidence.

Sanger sequencing

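Sanger sequencing was a central method for decades and remains useful for targeted reads. It copies a DNA template in reactions that include chain-terminating nucleotides (dideoxynucleotides), so synthesis stops wherever one of them is incorporated. The result is a set of fragments of different lengths, each ending in a known base. By separating those fragments by size and reading the terminal base of each, researchers infer the sequence.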
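The inference step can be illustrated with a toy simulation in Python. This is a sketch under simplifying assumptions, not a model of any real instrument. The function names, the `stop_prob` parameter, and the fixed random seed are invented for the example. The input string is treated as the newly synthesized strand, and every fragment is assumed to be measured perfectly.

```python
import random


def chain_terminated_fragments(synthesized, copies=2000, stop_prob=0.1, seed=0):
    """Simulate many copying reactions that may stop after any base.

    Each stopped copy is recorded as (fragment_length, terminal_base),
    which is roughly what size separation plus a base-specific label provides.
    """
    rng = random.Random(seed)
    fragments = set()
    for _ in range(copies):
        for length, base in enumerate(synthesized, start=1):
            if rng.random() < stop_prob:  # a terminator was incorporated here
                fragments.add((length, base))
                break
    return fragments


def read_sequence(fragments):
    """Order fragments from shortest to longest and read their terminal bases."""
    return "".join(base for _, base in sorted(fragments))


strand = "ATGCCGTAAGCT"  # hypothetical synthesized strand
fragments = chain_terminated_fragments(strand)
print(read_sequence(fragments))
```

With enough copies, every position ends up with at least one fragment, so the ordered terminal bases spell out the strand. With too few copies, some lengths are missing and the read has gaps.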

Next-generation sequencing

Next-generation sequencing, often shortened to NGS, reads many DNA fragments at the same time. Parallel sequencing made it practical to sequence whole genomes, transcriptomes, microbiomes, and large clinical or population studies. The tradeoff is that the data are large and require careful computation.

From sample to data

A sequencing project usually moves from a biological sample through DNA extraction, library preparation, and sequencing to computational analysis. The exact workflow depends on the question: targeted sequencing, whole-genome sequencing, metagenomics, and ancient-DNA studies all place different demands on sample handling and analysis.

Assembly and alignment

Sequencing instruments often produce short reads rather than complete chromosomes. Researchers may align reads to a reference genome or assemble overlapping reads into longer sequences. Both choices can introduce bias, especially when the sample is diverse, damaged, repetitive, or poorly represented in reference databases.
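The two strategies can be sketched in Python on a tiny example. This is a deliberately naive illustration: exact string matching stands in for alignment and a greedy overlap merge stands in for assembly. Real tools use indexed, error-tolerant algorithms. The function names, the `min_overlap` parameter, and the reference and reads are all hypothetical.

```python
def align_exact(read, reference):
    """Return every 0-based position where read matches reference exactly."""
    positions = []
    start = reference.find(read)
    while start != -1:
        positions.append(start)
        start = reference.find(read, start + 1)
    return positions


def overlap(a, b, min_len):
    """Length of the longest suffix of a that is a prefix of b, or 0 if below min_len."""
    for k in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:k]):
            return k
    return 0


def greedy_assemble(reads, min_overlap=3):
    """Repeatedly merge the pair of sequences with the longest overlap."""
    contigs = list(dict.fromkeys(reads))  # drop exact duplicates, keep order
    while True:
        best_k, best_i, best_j = 0, None, None
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i != j:
                    k = overlap(a, b, min_overlap)
                    if k > best_k:
                        best_k, best_i, best_j = k, i, j
        if best_k == 0:
            return contigs  # nothing left to merge
        merged = contigs[best_i] + contigs[best_j][best_k:]
        contigs = [c for idx, c in enumerate(contigs)
                   if idx not in (best_i, best_j)] + [merged]


reference = "ATGCGTACGTTAGC"  # hypothetical reference
reads = ["ATGCGTAC", "GTACGTTA", "GTTAGC"]

print([align_exact(r, reference) for r in reads])  # [[0], [4], [8]]
print(align_exact("GT", reference))                # [4, 8], a short read matches twice
print(greedy_assemble(reads))                      # ['ATGCGTACGTTAGC']
```

The two-position match for the short read "GT" is a small-scale version of the repeat problem. When a read fits more than one place, alignment is ambiguous, and a greedy assembler can merge the wrong pair.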

Accuracy and interpretation

Sequencing errors, contamination, uneven coverage, and analysis settings can affect conclusions. A detected variant still needs interpretation: it may be harmless, important, uncertain, or simply an artifact. Good sequencing work pairs laboratory quality control with transparent bioinformatics.

Why it matters

DNA sequencing changed biology from a field that could infer genetic information indirectly into one that can read it directly. It supports medical genetics, outbreak tracing, conservation, agriculture, forensic science, biotechnology, and the study of microbial communities.

Key concepts

A sequence is an ordered list of DNA bases, usually written with A, C, G, and T.
Coverage describes how many times a base or region is read.
Bioinformatics turns raw instrument output into interpretable biological data.
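
Coverage is easy to compute once reads have positions. The Python sketch below counts how many reads cover each base and takes the average. The function names and the read placements are made up for illustration, and positions are assumed to be 0-based and non-negative.

```python
def per_base_coverage(reference_length, alignments):
    """Count how many reads cover each position.

    alignments is a list of (start, read_length) pairs with 0-based starts.
    """
    depth = [0] * reference_length
    for start, length in alignments:
        for pos in range(start, min(start + length, reference_length)):
            depth[pos] += 1
    return depth


def mean_coverage(depth):
    """Average depth across all positions."""
    return sum(depth) / len(depth)


# Three hypothetical reads placed on a 14-base reference.
alignments = [(0, 8), (4, 8), (8, 6)]
depth = per_base_coverage(14, alignments)
print(depth)  # [1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2, 1, 1]
print(round(mean_coverage(depth), 2))  # 1.57
```

When every read falls entirely inside the reference, the mean equals the total number of read bases divided by the reference length. Here that is 22 / 14. The per-base list also shows that coverage is uneven: the ends are read once and the middle twice.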

Common workflows

Targeted sequencing focuses on one gene, variant, or region.
Whole-genome sequencing attempts to read the full genome of an organism.
Metagenomic sequencing reads DNA from mixed communities.

Common misconceptions

Sequencing a genome does not automatically explain every trait or disease risk.
A larger dataset does not guarantee a better answer if sampling and analysis are weak.
A DNA sequence is evidence, not a complete biological interpretation by itself.

Open questions

How can clinical sequencing make uncertain variants easier to interpret?
Which sequencing methods best capture repetitive or structurally complex genome regions?
How should privacy be protected as more human genomes are sequenced?