Primer design for environmental samples - Align sequences first? (Jan/17/2006)
OK, this is probably very basic, but I'm new to primer design. I want to design primers to amplify a short (50-150 bp) section of the ammonia monooxygenase (amoA) gene from an environmental sample. Since many different organisms carry this gene, a GenBank search returns over 2000 sequences, and I want the primers to amplify as many of them as possible. My question is: do I download the sequences, align them in a separate program such as MEGA, and then run Primer Express on the consensus sequence? Or do I somehow load them all into Primer Express?
So confused... Any help appreciated!!
I would compare a few of the genes from distantly related organisms (for variety's sake), look for the regions that are conserved across them, and design the primers against those. If the sequences are close but not 100% identical, consider degenerate primers.
I've done what Captain_DNA suggests: download the amino acid sequences and align them. Look for well-conserved stretches of 6 to 8 amino acids (thus 18 to 24 bases) that contain few residues with six-fold codon degeneracy (leucine, arginine or serine).
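To make the "conserved stretch with low degeneracy" idea concrete, here is a minimal Python sketch. It assumes you already have the protein sequences aligned to equal length (gaps written as "-") and that the standard genetic code applies. The function name conserved_windows and the three toy sequences are made up for illustration; they are not real amoA data.

```python
from math import prod

# Number of codons per amino acid in the standard genetic code (61 sense codons).
CODON_COUNT = {
    "A": 4, "R": 6, "N": 2, "D": 2, "C": 2, "Q": 2, "E": 2, "G": 4,
    "H": 2, "I": 3, "L": 6, "K": 2, "M": 1, "F": 2, "P": 4, "S": 6,
    "T": 4, "W": 1, "Y": 2, "V": 4,
}

def conserved_windows(aligned, width=6):
    """Return (start, peptide, degeneracy) for every window of `width`
    residues that is identical in all aligned sequences, sorted so the
    least degenerate candidates come first."""
    length = len(aligned[0])
    if any(len(seq) != length for seq in aligned):
        raise ValueError("sequences must be aligned to the same length")
    hits = []
    for start in range(length - width + 1):
        window = aligned[0][start:start + width]
        # Skip windows containing gaps or non-standard residue symbols.
        if any(aa not in CODON_COUNT for aa in window):
            continue
        if all(seq[start:start + width] == window for seq in aligned):
            degeneracy = prod(CODON_COUNT[aa] for aa in window)
            hits.append((start, window, degeneracy))
    return sorted(hits, key=lambda hit: hit[2])

# Hypothetical toy alignment, for illustration only.
toy_alignment = ["AWMDFKYLRS", "GWMDFKYLRT", "AWMDFKYLRS"]
for start, peptide, degeneracy in conserved_windows(toy_alignment):
    print(start, peptide, degeneracy)
```

In the toy data the window WMDFKY is the best candidate (16-fold), while DFKYLR is much worse (576-fold) because it contains both a leucine and an arginine. With real data you would probably want to tolerate a mismatch or two instead of demanding perfect identity.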
Then you can use a program like backtranseq from EMBOSS to reverse-translate the amino acid sequence (see http://emboss.sourceforge.net/apps/backtranseq.html). Since you're unlikely to have a codon usage table for an unknown environmental sample, a program like rtranslate (see http://arbl.cvmbs.colostate.edu/molkit/rtranslate/) may be more useful to you.
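If you want to see what reverse translation without a codon usage table looks like, here is a rough Python sketch. It is not how backtranseq or rtranslate work internally; it simply assumes the standard genetic code and collapses every codon position to the IUPAC ambiguity code that covers all synonymous codons. The names degenerate_codon and back_translate are hypothetical. One known limitation: for the six-fold amino acids the collapsed codon over-covers (leucine becomes YTN, which also matches the phenylalanine codons TTT and TTC).

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order ("*" = stop).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

# IUPAC nucleotide ambiguity codes and the bases each one stands for.
IUPAC_BASES = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}
CODE_FOR_SET = {frozenset(bases): code for code, bases in IUPAC_BASES.items()}

def degenerate_codon(aa):
    """Collapse all codons for one amino acid into a single IUPAC codon."""
    codons = [codon for codon, residue in CODON_TABLE.items() if residue == aa]
    if not codons or aa == "*":
        raise ValueError(f"not a standard amino acid: {aa!r}")
    return "".join(
        CODE_FOR_SET[frozenset(codon[i] for codon in codons)] for i in range(3)
    )

def back_translate(peptide):
    """Reverse-translate a peptide into one degenerate DNA sequence."""
    return "".join(degenerate_codon(aa) for aa in peptide.upper())

print(back_translate("WMDFKY"))  # TGGATGGAYTTYAARTAY
```

The peptide WMDFKY is a made-up example. Its back-translation contains four two-fold positions (Y, Y, R, Y), so the primer would be 16-fold degenerate.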
Remaining degeneracy can be reduced with so-called "universal bases" such as deoxyinosine or deoxynebularine, which can pair with any of the four bases. There are also degenerate bases, called dP and dK, that form base pairs with purines (A or G) and pyrimidines (T or C), respectively. Your oligo synthesis company should be able to incorporate any of these into your primer at the positions you specify.
Another way to deal with a position where degeneracy is high is simply to specify "N" there. Your primer synthesis company should interpret this to mean "send me a population of primers, some with A, some with T, some with C and some with G in this position". The idea is that one of the primers in the set will match the target.
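To see what such a mixed primer population actually contains, here is a small Python sketch that counts and lists the individual sequences hidden behind a degenerate primer. It assumes the standard IUPAC ambiguity codes; the function names degeneracy and expand are made up for this example, and universal bases such as inosine are not handled.

```python
from itertools import product
from math import prod

# IUPAC nucleotide ambiguity codes and the bases each one stands for.
IUPAC_BASES = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}

def degeneracy(primer):
    """Number of distinct sequences in the pool described by `primer`."""
    return prod(len(IUPAC_BASES[base]) for base in primer.upper())

def expand(primer):
    """List every concrete sequence in the pool (can get large quickly)."""
    choices = (IUPAC_BASES[base] for base in primer.upper())
    return ["".join(bases) for bases in product(*choices)]

print(degeneracy("TGGATGGAYTTYAARTAY"))  # 16
print(expand("GAY"))                     # ['GAC', 'GAT']
print(degeneracy("ACNGGNTAY"))           # 32
```

Keep in mind that, if the pool is roughly equimolar, any one exact sequence makes up only 1/degeneracy of the primer you add, so every extra "N" cuts the effective concentration of the matching primer by about a factor of four.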
Oh, I'd never thought of looking at the translated protein sequence (like I said, I'm new to this). Interesting! I'll try that.
Thanks a mil for all your help!!