ORF, start codons, stop codons, genes, and protein prediction

Open reading frame

An open reading frame is a stretch of DNA or RNA that can be read as codons without an internal stop signal, making it a candidate protein-coding sequence.

Short name: Open reading frame is commonly abbreviated ORF.
Core clue: An ORF is read in one frame and lacks a stop codon within the candidate coding stretch.
Genome use: ORF scanning is a basic computational clue for finding possible protein-coding genes.
Figure: A DNA open reading frame diagram showing how candidate coding sequences are found in reading frames. (Image: Wikimedia Commons)

What an open reading frame is

An open reading frame is a continuous stretch of nucleic acid sequence that can be read in codons without running into a stop codon. In practical gene finding, an ORF often means a candidate coding region that begins with a start codon and ends at a stop codon in the same reading frame.

Reading frames

Because codons are read three bases at a time, a single DNA or RNA sequence can be grouped in three possible frames on one strand. Double-stranded DNA adds the opposite strand, giving six possible reading frames to inspect. Only some frames contain long stretches that look compatible with protein coding.
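
The six frames can be enumerated with a short Python sketch. The function names reverse_complement and six_frames are illustrative choices, not part of any particular tool, and the code assumes an unambiguous DNA sequence containing only A, C, G, and T.

```python
def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence (A, C, G, T only)."""
    complement = {"A": "T", "C": "G", "G": "C", "T": "A"}
    return "".join(complement[base] for base in reversed(seq.upper()))


def six_frames(seq: str) -> dict:
    """Split a DNA sequence into codons for all six reading frames.

    Keys are '+1', '+2', '+3' for the given strand and '-1', '-2', '-3'
    for the reverse complement. Trailing bases that do not fill a
    complete codon are dropped.
    """
    frames = {}
    for sign, strand in (("+", seq.upper()), ("-", reverse_complement(seq))):
        for offset in range(3):
            # Step through the strand three bases at a time from this offset.
            codons = [strand[i:i + 3] for i in range(offset, len(strand) - 2, 3)]
            frames[f"{sign}{offset + 1}"] = codons
    return frames


if __name__ == "__main__":
    for name, codons in six_frames("ATGAAATAG").items():
        print(name, " ".join(codons))
```

For the toy sequence ATGAAATAG, frame +1 reads ATG AAA TAG, while frame -1 reads CTA TTT CAT from the opposite strand.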

Start and stop signals

A protein-coding ORF is usually bounded by a start codon and a stop codon. In DNA, ATG is the most common start codon, corresponding to AUG in mRNA. In the standard genetic code, the three stop codons are TAA, TAG, and TGA in DNA, corresponding to UAA, UAG, and UGA in RNA.
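
As a rough illustration of these boundary rules, the sketch below checks whether a DNA string looks like a complete ORF. It assumes the standard genetic code and treats ATG as the only start codon, which is a simplification; the names to_rna and is_complete_orf are hypothetical helpers introduced for this example.

```python
START_CODON = "ATG"                   # assumption: ATG is the only start codon
STOP_CODONS = {"TAA", "TAG", "TGA"}   # standard genetic code, DNA alphabet


def to_rna(dna: str) -> str:
    """Convert a DNA coding-strand sequence to its mRNA equivalent (T -> U)."""
    return dna.upper().replace("T", "U")


def is_complete_orf(dna: str) -> bool:
    """Return True if dna starts with ATG, ends with a stop codon,
    has a length divisible by three, and has no internal stop codon."""
    dna = dna.upper()
    if len(dna) < 6 or len(dna) % 3 != 0:
        return False
    codons = [dna[i:i + 3] for i in range(0, len(dna), 3)]
    if codons[0] != START_CODON or codons[-1] not in STOP_CODONS:
        return False
    # Any stop codon before the final position breaks the open frame.
    return not any(codon in STOP_CODONS for codon in codons[1:-1])


if __name__ == "__main__":
    print(to_rna("ATGAAATAG"))              # AUGAAAUAG
    print(is_complete_orf("ATGAAATAG"))     # True
    print(is_complete_orf("ATGTAAAAATAG"))  # False: internal TAA
```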

ORF versus gene

An ORF is not automatically a gene. It is a sequence pattern that could encode a protein, and short ORFs can arise by chance in any sequence. Evidence such as transcription, translation, conservation, protein domains, or experimental data may be needed to show that an ORF really functions as a gene. This distinction matters especially in complex genomes and viruses.

How ORF finders work

ORF-finding tools scan nucleotide sequences in possible reading frames and mark stretches between start and stop codons, or sometimes between stop codons depending on the settings. These tools can quickly suggest candidate proteins, but their output depends on genetic code choice, minimum length, and start-codon rules.
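
The scanning idea can be sketched as a minimal start-to-stop ORF finder. This is not how any specific tool is implemented: it assumes the standard genetic code, scans only the given strand (three frames), and does not cover the stop-to-stop mode. The function name find_orfs and the parameters min_codons and start_codons are illustrative, chosen to mirror the minimum-length and start-codon settings mentioned above.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}  # assumption: standard genetic code


def find_orfs(seq, min_codons=2, start_codons=("ATG",)):
    """Return (frame, start, end, dna) tuples for ORFs on the given strand.

    frame is 0, 1, or 2. start and end are 0-based positions with end
    exclusive, and the reported DNA includes the stop codon. In each frame,
    an ORF begins at the first start codon seen after the previous stop.
    min_codons counts coding codons, excluding the stop codon.
    """
    seq = seq.upper()
    orfs = []
    for frame in range(3):
        orf_start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if orf_start is None and codon in start_codons:
                orf_start = i  # open a candidate ORF
            elif orf_start is not None and codon in STOP_CODONS:
                n_codons = (i - orf_start) // 3
                if n_codons >= min_codons:
                    orfs.append((frame, orf_start, i + 3, seq[orf_start:i + 3]))
                orf_start = None  # close it and look for the next start
    return orfs


if __name__ == "__main__":
    for orf in find_orfs("CCATGAAATTTTAGGG"):
        print(orf)
```

Raising min_codons filters out short candidates: with the toy input above, the single three-codon ORF in frame 2 is reported at the default setting but dropped when min_codons is 4. Scanning the reverse complement with the same function would cover the remaining three frames.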

Prokaryotes and eukaryotes

ORF scanning is often simpler in bacteria because protein-coding sequences are usually more compact and introns are rare. Eukaryotic gene prediction is harder because coding exons may be separated by introns, and the transcript can include untranslated regions before and after the translated sequence.

Small, overlapping, and upstream ORFs

Not all ORFs are long, isolated, or obvious. Some genomes contain overlapping ORFs, especially viral genomes. Many eukaryotic mRNAs also contain upstream open reading frames in 5' untranslated regions, and these uORFs can regulate translation of the main coding sequence.
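
A simple way to look for upstream ORFs is to search the 5' untranslated region for AUG codons and follow each one to the first in-frame stop. The sketch below assumes the position of the main start codon (cds_start) is already known, considers only AUG starts, and uses the standard stop codons; find_uorfs is a hypothetical helper. It also does not distinguish a true uORF from an in-frame upstream AUG that would simply extend the main protein.

```python
STOP_CODONS = {"UAA", "UAG", "UGA"}  # assumption: standard genetic code


def find_uorfs(mrna, cds_start):
    """Find AUG-initiated ORFs whose start codon lies in the 5' UTR.

    mrna may use U or T. cds_start is the 0-based index of the first base
    of the main start codon. Returns (start, end, overlaps_cds) tuples,
    where end is exclusive and includes the stop codon, and overlaps_cds
    is True when the uORF extends past the main start codon.
    Upstream AUGs with no in-frame stop are skipped.
    """
    mrna = mrna.upper().replace("T", "U")
    uorfs = []
    # Only consider AUG codons that sit entirely inside the 5' UTR.
    for start in range(0, cds_start - 2):
        if mrna[start:start + 3] != "AUG":
            continue
        for i in range(start + 3, len(mrna) - 2, 3):
            if mrna[i:i + 3] in STOP_CODONS:
                end = i + 3
                uorfs.append((start, end, end > cds_start))
                break
    return uorfs


if __name__ == "__main__":
    # Toy transcript: uORF AUG CCC UAA at 2..10, main ORF AUG AAA UAG at 14..22.
    print(find_uorfs("GGAUGCCCUAAGGGAUGAAAUAG", cds_start=14))
```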

Why it matters

Open reading frames are one of the first handles biologists use when turning raw sequence into biological meaning. They help identify candidate proteins, annotate genomes, compare organisms, interpret mutations, and design experiments that test whether a predicted sequence is truly translated.