
Use of ORFs - (Nov/24/2005)

Is there any use for ORFs in eukaryotic genes?
For instance, they're useful in prokaryotes, as those genes tend not to have introns, so you can just look for the longest ORF to find coding regions. Can you do a similar thing for eukaryotic genes, which have introns?
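
For reference, the "longest ORF" approach described above can be sketched in Python roughly as follows. This is only a minimal illustration, not a real gene finder. The function names (`reverse_complement`, `longest_orf`) are made up for this example, and it assumes a plain A/C/G/T DNA string, the standard stop codons, and ATG as the only start codon (bacteria also use alternatives such as GTG and TTG).

```python
# Minimal sketch: longest ATG...stop ORF over all six reading frames.
# Names and assumptions here are illustrative only.

COMPLEMENT = str.maketrans("ACGT", "TGCA")
STOPS = {"TAA", "TAG", "TGA"}


def reverse_complement(seq):
    """Return the reverse complement of a DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def longest_orf(seq):
    """Return (strand, start, end, orf) for the longest ORF found.

    Coordinates are 0-based, end-exclusive, and given on the strand
    that was scanned (for "-" they refer to the reverse complement).
    Returns ("", 0, 0, "") if no complete ORF is present.
    """
    seq = seq.upper()
    best = ("", 0, 0, "")
    for strand, s in (("+", seq), ("-", reverse_complement(seq))):
        for frame in range(3):
            start = None
            # Walk the frame codon by codon.
            for i in range(frame, len(s) - 2, 3):
                codon = s[i:i + 3]
                if start is None and codon == "ATG":
                    start = i  # first in-frame start codon opens the ORF
                elif start is not None and codon in STOPS:
                    orf = s[start:i + 3]
                    if len(orf) > len(best[3]):
                        best = (strand, start, i + 3, orf)
                    start = None  # look for the next ORF in this frame
    return best


if __name__ == "__main__":
    print(longest_orf("CCATGAAATTTGGGTAACC"))
```

On intron-containing genomic DNA the same scan would mostly return fragments, since an intron can interrupt the frame or introduce stop codons.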



QUOTE (jimmy1 @ Nov 24 2005, 04:46 AM)
...they're useful in prokaryotes, as those genes tend not to have introns...

A bit understated perhaps...

Finding eukaryotic genes in chromosomal DNA sequence data is, of course, a bit tougher than finding them in prokaryotic sources, because relying on the presence of long open reading frames is insufficient. There are, however, several DNA features one can look for that indicate the presence of a gene in a eukaryotic DNA sequence:
  • transcription factor binding sites
  • polyadenylation signal
  • TATA-box
  • Kozak signal
  • non-random distribution of bases in coding sequences
  • shifts in GC content (coding regions tend to have higher GC content than non-coding sequences such as introns)
  • 5' splice site (e.g. AG/GUAAGU)
  • 3' splice site (e.g. (C/U)N<10(C/U)AG/G)
  • branch site (e.g. UACUAAC in yeast, 20-50 bp upstream of the 3' splice site)
  • six-frame translation similarity with known proteins (e.g. BLAST)
There are likely more, but I'm a bacterial gene jockey, so I'm not as familiar with the eukaryotic stuff. The easiest and most accurate way of determining a eukaryotic gene sequence is, of course, via cDNA.
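
A couple of the features in the list above (GC content shifts and splice site consensus motifs) can be scanned for with a few lines of Python. The sketch below is only a rough illustration under stated assumptions: the function names, window sizes, and regular expressions are my own choices for this example, the motifs are written in DNA letters (T instead of U), and real splice sites often deviate from the strict consensus, so practical gene finders use probabilistic models rather than exact pattern matches.

```python
import re

# Illustrative consensus patterns, DNA alphabet (assumptions, not a standard):
# donor: intron starts with GT(A/G)AGT; acceptor: a run of at least
# 10 pyrimidines, any base, a pyrimidine, then AG at the intron end.
DONOR = re.compile(r"(?=GT[AG]AGT)")  # lookahead allows overlapping hits
ACCEPTOR = re.compile(r"[CT]{10,}[ACGT][CT]AG")


def gc_windows(seq, size=100, step=50):
    """Return (start, gc_fraction) for each full sliding window."""
    seq = seq.upper()
    out = []
    for start in range(0, len(seq) - size + 1, step):
        window = seq[start:start + size]
        gc = (window.count("G") + window.count("C")) / size
        out.append((start, gc))
    return out


def donor_sites(seq):
    """Return 0-based positions of the first intron base of donor matches."""
    return [m.start() for m in DONOR.finditer(seq.upper())]


def acceptor_sites(seq):
    """Return 0-based positions of the first base after the intron's AG."""
    return [m.end() for m in ACCEPTOR.finditer(seq.upper())]


if __name__ == "__main__":
    example = "ATGGCCAAGGTAAGTCCCC"
    print(donor_sites(example))
    print(gc_windows("GGCCAATT", size=4, step=4))
```

Hits from a scan like this are only candidates; combining them with the other signals in the list, or with cDNA evidence, is what makes a prediction believable.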


A cDNA by itself is not enough to tell whether you have obtained a complete coding sequence. I highly recommend a related paper that covers this topic in detail.

Interpreting cDNA sequences: some insights from studies on translation (Kozak, 1996)