Open reading frame
An open reading frame is a stretch of DNA or RNA that can be read as codons without an internal stop signal, making it a candidate protein-coding sequence.
What an open reading frame is
An open reading frame is a continuous stretch of nucleic acid sequence that can be read in codons without running into a stop codon. In practical gene finding, an ORF often means a candidate coding region that begins with a start codon and ends at a stop codon in the same reading frame.
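The stricter, gene-finding sense of the definition can be written as a small check. This sketch assumes the standard genetic code and a DNA-alphabet input, and the function name `is_candidate_orf` is hypothetical. It accepts a sequence only if it starts with ATG, ends with a stop codon, is a whole number of codons long, and has no in-frame stop before the end.

```python
# Stop codons of the standard genetic code (DNA alphabet); an assumption of this sketch.
STOP_CODONS = {"TAA", "TAG", "TGA"}
START_CODON = "ATG"


def is_candidate_orf(seq: str) -> bool:
    """True if seq runs ATG ... stop with no earlier in-frame stop codon."""
    seq = seq.upper()
    # Need at least a start and a stop, and a whole number of codons.
    if len(seq) < 6 or len(seq) % 3 != 0:
        return False
    codons = [seq[i:i + 3] for i in range(0, len(seq), 3)]
    if codons[0] != START_CODON or codons[-1] not in STOP_CODONS:
        return False
    # An internal in-frame stop would end the reading frame early.
    return not any(c in STOP_CODONS for c in codons[:-1])


print(is_candidate_orf("ATGGCCAAATAA"))  # True
print(is_candidate_orf("ATGTAAGCCTAA"))  # False: internal TAA
```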
Reading frames
Because codons are read three bases at a time, a single DNA or RNA strand can be divided into codons in three possible frames, starting at its first, second, or third base. Double-stranded DNA also has a reverse-complementary strand that can be read the same way, giving six possible reading frames to inspect. Usually only some of these frames contain long stretches that look compatible with protein coding.
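The six frames can be generated directly: three offsets on the given strand and three on its reverse complement. The sketch below assumes an uppercase A/C/G/T input. The helper names are hypothetical, and the +1 to -3 labels follow one common convention; tools differ in how they number the minus-strand frames.

```python
# Watson-Crick complement for uppercase DNA bases.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Sequence of the opposite strand, read 5' to 3'."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def codons_in_frame(seq: str, offset: int) -> list[str]:
    """Split seq into complete codons starting at offset 0, 1, or 2."""
    return [seq[i:i + 3] for i in range(offset, len(seq) - 2, 3)]


def six_frames(seq: str) -> dict[str, list[str]]:
    """Codons for frames +1..+3 (given strand) and -1..-3 (reverse complement)."""
    seq = seq.upper()
    rc = reverse_complement(seq)
    frames = {}
    for offset in range(3):
        frames[f"+{offset + 1}"] = codons_in_frame(seq, offset)
        frames[f"-{offset + 1}"] = codons_in_frame(rc, offset)
    return frames


for name, codons in six_frames("ATGAAACCCGGGTAA").items():
    print(name, " ".join(codons))
```

Incomplete codons at the end of a frame are dropped, which is why frames +2 and +3 of a 15-base sequence contain only four codons each.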
Start and stop signals
A protein-coding ORF is usually bounded by a start codon and a stop codon. ATG in DNA, corresponding to AUG in mRNA, is the most common start codon. In the standard genetic code, the stop codons are TAA, TAG, and TGA in DNA, corresponding to UAA, UAG, and UGA in RNA.
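The DNA and RNA spellings differ only in T versus U, so a classifier can accept either alphabet by normalizing U to T first. This is a sketch assuming the standard genetic code and treating only ATG as a start. Alternative start codons are ignored, and an ATG is labeled "start" regardless of its position, even though an internal ATG simply encodes methionine. The function names are hypothetical.

```python
# Standard-genetic-code signals, written in the DNA alphabet (assumption of this sketch).
START = {"ATG"}
STOPS = {"TAA", "TAG", "TGA"}


def dna_to_rna(codon: str) -> str:
    """mRNA codon corresponding to a coding-strand DNA codon (T -> U)."""
    return codon.upper().replace("T", "U")


def codon_role(codon: str) -> str:
    """Classify a DNA or RNA codon as 'start', 'stop', or 'other'."""
    dna = codon.upper().replace("U", "T")  # normalize RNA letters to DNA letters
    if dna in START:
        return "start"
    if dna in STOPS:
        return "stop"
    return "other"


for c in ["ATG", "AUG", "TGA", "UAG", "GCC"]:
    print(c, codon_role(c))
print([dna_to_rna(c) for c in sorted(STOPS)])  # ['UAA', 'UAG', 'UGA']
```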
ORF versus gene
An ORF is not automatically a gene. It is a sequence pattern that could encode a protein, but evidence such as transcription, translation, conservation, protein domains, or experimental data may be needed to show that it really functions as a gene. This distinction matters especially in large, complex genomes, where many short ORFs arise by chance, and in viruses.
How ORF finders work
ORF-finding tools scan a nucleotide sequence in each of its possible reading frames and report stretches that run from a start codon to a stop codon or, depending on the settings, from one stop codon to the next. These tools can quickly suggest candidate proteins, but their output depends on the genetic code selected, the minimum ORF length, and the rules for which codons count as starts.
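A minimal six-frame scanner shows how those settings change the output. This is a sketch, not any particular tool's algorithm. The parameter names `min_codons`, `mode`, and `stops` are hypothetical, the default of 30 codons is an arbitrary illustration, and the stop set defaults to the standard genetic code (pass a different set for another code). In `"atg"` mode an ORF runs from the first ATG after a stop to the next in-frame stop. In `"stop"` mode it runs from just after one stop, or from the frame start, to the next stop. ORFs that reach the end of the sequence without a stop are not reported.

```python
# Six-frame ORF scanner (sketch). Assumes DNA containing only A, C, G, T;
# coordinates are 0-based and end-exclusive.
COMPLEMENT = str.maketrans("ACGT", "TGCA")
STANDARD_STOPS = frozenset({"TAA", "TAG", "TGA"})  # standard genetic code


def reverse_complement(seq: str) -> str:
    return seq.translate(COMPLEMENT)[::-1]


def scan_strand(seq, strand, min_codons, mode, stops):
    """Yield (strand, frame, start, end) for ORFs in the three frames of seq.

    Length is counted in codons excluding the stop codon. Minus-strand
    coordinates are relative to the reverse complement.
    """
    for frame in range(3):
        # In "stop" mode the first candidate opens at the frame start.
        start = None if mode == "atg" else frame
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if codon in stops:
                if start is not None and (i - start) // 3 >= min_codons:
                    yield (strand, frame, start, i + 3)
                # Reset: wait for the next ATG, or open right after this stop.
                start = None if mode == "atg" else i + 3
            elif mode == "atg" and start is None and codon == "ATG":
                start = i


def find_orfs(seq, min_codons=30, mode="atg", stops=STANDARD_STOPS):
    seq = seq.upper()
    hits = list(scan_strand(seq, "+", min_codons, mode, stops))
    hits += scan_strand(reverse_complement(seq), "-", min_codons, mode, stops)
    return hits


print(find_orfs("CCATGAAATTTTAGCC", min_codons=1, mode="atg"))
# [('+', 2, 2, 14)]
print(find_orfs("CCATGAAATTTTAGCC", min_codons=1, mode="stop"))
# [('+', 0, 0, 6), ('+', 2, 2, 14), ('-', 0, 0, 6)]
```

The stop-to-stop setting reports more, shorter fragments, and raising `min_codons` removes them. In this toy sequence, `min_codons=4` leaves no hits at all.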
Prokaryotes and eukaryotes
ORF scanning is often simpler in bacteria because their genomes are gene-dense and their protein-coding genes rarely contain introns, so a coding sequence usually appears as a single uninterrupted ORF. Eukaryotic gene prediction is harder because coding exons may be separated by introns, and the mature transcript can include untranslated regions before and after the translated sequence.
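A toy example shows why introns matter. The sequence and exon coordinates below are invented for illustration, and the "first ATG" rule is a simplifying assumption rather than how real gene predictors choose starts. Real predictors must also infer the exon boundaries, which here are simply given. After splicing, one ORF is flanked by untranslated regions. Scanning the unspliced sequence from the same ATG instead runs into an in-frame stop inside the intron.

```python
STOPS = {"TAA", "TAG", "TGA"}  # standard genetic code


def splice(pre_mrna, exons):
    """Join exon intervals (0-based, end-exclusive) in transcript order."""
    return "".join(pre_mrna[s:e] for s, e in exons)


def first_atg_orf(mrna):
    """(start, end) of the ORF beginning at the first ATG, or None."""
    start = mrna.find("ATG")
    if start == -1:
        return None
    for i in range(start, len(mrna) - 2, 3):
        if mrna[i:i + 3] in STOPS:
            return start, i + 3
    return None  # no in-frame stop before the end


# Hypothetical transcript: exon 1 | intron (GT...AG) | exon 2
pre_mrna = "GCCATGAAA" + "GTAAGTTAACAG" + "CCCGGGTAACTT"
exons = [(0, 9), (21, 33)]

mrna = splice(pre_mrna, exons)
start, end = first_atg_orf(mrna)
print("5' UTR:", mrna[:start])     # GCC
print("CDS:   ", mrna[start:end])  # ATGAAACCCGGGTAA
print("3' UTR:", mrna[end:])       # CTT

# Read without splicing, the same frame hits TAA inside the intron.
s, e = first_atg_orf(pre_mrna)
print("unspliced ORF:", pre_mrna[s:e])  # ATGAAAGTAAGTTAA
```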
Small, overlapping, and upstream ORFs
Not all ORFs are long, isolated, or obvious. Some genomes contain overlapping ORFs, especially viral genomes. Many eukaryotic mRNAs also contain upstream open reading frames (uORFs) in their 5' untranslated regions, and these uORFs can regulate translation of the main coding sequence.
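Given a transcript and the position of its main start codon, uORFs can be listed by scanning the 5' untranslated region for ATGs and following each one to its first in-frame stop. This sketch assumes the standard genetic code and ATG-only starts. The transcript is invented for illustration, and the function name and the "upstream"/"overlapping" labels are hypothetical. A uORF that ends after the main start codon overlaps the main coding sequence in a different or the same frame.

```python
STOPS = {"TAA", "TAG", "TGA"}  # standard genetic code


def find_uorfs(mrna, cds_start):
    """List ATG-initiated ORFs that start upstream of the main start codon.

    Returns (start, end, kind) with 0-based, end-exclusive coordinates;
    kind is "upstream" if the uORF ends by cds_start, else "overlapping".
    ATGs with no in-frame stop before the transcript ends are skipped.
    """
    uorfs = []
    for start in range(cds_start):
        if mrna[start:start + 3] != "ATG":
            continue
        for i in range(start, len(mrna) - 2, 3):
            if mrna[i:i + 3] in STOPS:
                kind = "upstream" if i + 3 <= cds_start else "overlapping"
                uorfs.append((start, i + 3, kind))
                break
    return uorfs


# Hypothetical mRNA: 5' UTR with two uORFs, then the main CDS ATGAATAACTAG.
mrna = "GG" + "ATGCCCTGA" + "C" + "ATGG" + "ATGAATAACTAG" + "CC"
cds_start = 16  # index of the main ATG in this toy transcript

print(find_uorfs(mrna, cds_start))
# [(2, 11, 'upstream'), (12, 24, 'overlapping')]
```

The second uORF starts in the 5' UTR but its stop codon (TAA at index 21) lies inside the main coding sequence, in a different frame.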
Why it matters
Open reading frames are one of the first handles biologists use when turning raw sequence into biological meaning. They help identify candidate proteins, annotate genomes, compare organisms, interpret mutations, and design experiments that test whether a predicted sequence is truly translated.