The identification of novel members of gene families by PCR using degenerate primers has been considered more of an art than a science, so much so that the methods books I've come across have been too timid to discuss the considerations that go into the design of this experiment, much less give a protocol for its execution. At the risk of leading my readers on wild goose chases, I'm committing my methods to paper. The following is based on my reading of the recent literature (e.g. Buck and Axel, Cell 65: 175-187, 1991; Riddle et al., Cell 75: 1401-1416, 1993; Krauss et al., Cell 75: 1431-1444, 1993), discussions with several other successful practitioners of the art, and my own experience isolating vertebrate homologs of the C. elegans egl-10 gene.
Primer design is the most important factor in the success of the experiment, and deserves careful deliberation. I suggest diagramming out an alignment of the existing members of your gene family, highlighting conserved residues, and labeling each important position in the alignment with the number of codons that encode the amino acid(s) at that position. An example based on the original members of the egl-10 gene family is included below, and cited in the following discussion.
In the early days of degenerate PCR some novel genes were successfully amplified using primer pools that were over 1000-fold degenerate. However, primers of such high degeneracy appear not to have been generally successful (with some exceptions, e.g. Giovane et al., Genes Dev. 8: 1502-1513, 1994), and most recent successes have come using primer pools of 100-fold degeneracy or less. Five methods for reducing the degeneracy of the primers are discussed below:
1) judicious selection of the primer sites
The positioning of the primers is a compromise between placing them at the codons for the most conserved amino acids, and placing them at the codons for less conserved amino acids whose degeneracy may be lower. Consider the case of the 3' primers ("3T and 3A") in the example shown below for the egl-10 gene family. At first it might seem more sensible to place these primers 3 codons to the right, where there is a block of 5 out of 6 absolutely conserved amino acids: SY(P/Q)RFL. Unfortunately, this block of amino acids is encoded by 5184 different DNA sequences. The actual primers used were placed 3 codons to the left. At this position only 3 out of 6 amino acids are absolutely conserved: M(E/K)(K/N)(D/N)SY. However, this block of amino acids is encoded by only 768 different DNA sequences.
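The two degeneracy figures quoted above can be checked with a short calculation. The following is a minimal Python sketch, assuming the standard genetic code; the function and variable names are illustrative and not part of the protocol. It multiplies, position by position, the number of codons that encode the allowed amino acid(s).

```python
from math import prod

# Number of codons per amino acid in the standard genetic code.
CODON_COUNT = {
    "A": 4, "R": 6, "N": 2, "D": 2, "C": 2, "Q": 2, "E": 2, "G": 4,
    "H": 2, "I": 3, "L": 6, "K": 2, "M": 1, "F": 2, "P": 4, "S": 6,
    "T": 4, "W": 1, "Y": 2, "V": 4,
}

def block_degeneracy(positions):
    """Count the DNA sequences that can encode a block of amino acids.

    positions: list of strings, one per codon position, each string listing
    the one-letter amino acids allowed at that position.
    """
    return prod(sum(CODON_COUNT[aa] for aa in allowed) for allowed in positions)

# SY(P/Q)RFL: the more conserved block that was rejected.
conserved_block = ["S", "Y", "PQ", "R", "F", "L"]
# M(E/K)(K/N)(D/N)SY: the less conserved block actually used.
chosen_block = ["M", "EK", "KN", "DN", "S", "Y"]

print(block_degeneracy(conserved_block))  # 5184
print(block_degeneracy(chosen_block))     # 768
```

Running the same calculation over every candidate window in an alignment is a quick way to find stretches where conservation and low degeneracy coincide.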
2) the use of inosine as a "neutral" base
Inosine is a purine (which occurs naturally in tRNAs) that can form base pairs with cytidine, thymidine, and adenosine (although the inosine:adenosine pairing presumably doesn't fit quite correctly in double-stranded DNA, so there may be an energetic penalty to pay when the helix bulges out at this purine:purine pairing). Recently, most people have been using inosine in their primers at positions where any of the four bases might be required. Each use of inosine thus reduces the degeneracy of the primer pool 4-fold. However, you risk the occurrence of I:G mismatches, and therefore must assume that exact base pairing at other positions in the primer will overcome such a problem. Most oligo synthesis facilities will make inosine-containing oligos, no problem. I had excellent luck with inosine-containing primers with the egl-10 gene family, except in the case of the primer "5out", a 20mer containing 5 inosines (including 2 near the 3' end of the primer), which failed to amplify products even from a cloned egl-10 cDNA. So perhaps 5 inosines in a 20mer is too many.
Using inosine in the primers requires that the DNA polymerase used in the PCR reaction be capable of synthesizing DNA over an inosine-containing template. Taq polymerase is capable of doing this, but some others (e.g. Vent) appear not to be able to.
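To make the 4-fold saving per inosine concrete, here is a small Python sketch that computes the degeneracy of a primer pool written in IUPAC codes, with "I" standing for inosine. The two primer sequences are hypothetical examples made up for illustration, not the egl-10 primers, and the function names are assumptions.

```python
# Number of different bases synthesized at a position for each IUPAC code.
IUPAC_FOLD = {
    "A": 1, "C": 1, "G": 1, "T": 1,
    "R": 2, "Y": 2, "S": 2, "W": 2, "K": 2, "M": 2,
    "B": 3, "D": 3, "H": 3, "V": 3,
    "N": 4,
    "I": 1,  # inosine is a single base, so it adds no degeneracy
}

def pool_degeneracy(primer):
    """Return the number of distinct oligos in a degenerate primer pool."""
    fold = 1
    for base in primer.upper():
        fold *= IUPAC_FOLD[base]
    return fold

def inosine_count(primer):
    """Return how many positions in the primer are inosine."""
    return primer.upper().count("I")

# Hypothetical primer, written first with N and then with inosine instead.
with_n = "GGNCCNTAYAARGA"
with_inosine = "GGICCITAYAARGA"

print(pool_degeneracy(with_n))        # 64
print(pool_degeneracy(with_inosine))  # 4
print(inosine_count(with_inosine))    # 2
```

Replacing the two N positions with inosine takes the pool from 64-fold to 4-fold degenerate, at the cost of possible I:G mismatches at those positions.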
3) using multiple separate oligo pools at a single position
In an effort to use primer pools with the lowest possible degeneracy, it is sometimes useful to synthesize primers over a particular stretch of codons as two or more separate pools, each of which will have lower degeneracy than you would get by synthesizing a single pool including all of the same codons. The pools are then used separately to carry out PCR reactions. For example, the primer pools "3T" and "3A" in the egl-10 example below are identical, except at their serine codons. Sadly, serine is encoded by 6 different codons, TC(A/G/C/T) and AG(T/C). Synthesizing a single pool covering all these possibilities might require a high degeneracy and would necessarily include some non-serine codons. By splitting into two pools (one nondegenerate containing an inosine, the other 2-fold degenerate) I was able to keep the degeneracy low, and avoid all non-serine codons. Another example is shown in the case of primers "5inE" and "5inR", which again are identical except at one codon.
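The serine problem can be shown by enumeration. The sketch below, in Python with illustrative names, expands IUPAC codon patterns into explicit codons. It compares a single pool that covers all six serine codons (W = A/T, S = C/G, N = any base) with two separate pools, TCN and AGY.

```python
from itertools import product

# The subset of IUPAC codes needed for this example.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "Y": "CT", "S": "CG", "W": "AT", "N": "ACGT",
}

SERINE_CODONS = {"TCA", "TCC", "TCG", "TCT", "AGC", "AGT"}

def expand(pattern):
    """Return the set of explicit sequences matched by an IUPAC pattern."""
    return {"".join(bases) for bases in product(*(IUPAC[code] for code in pattern))}

# One pool covering every serine codon needs WSN at the serine position.
single_pool = expand("WSN")
# Two separate pools: TCN (or TCI with inosine) and AGY.
split_pools = expand("TCN") | expand("AGY")

print(len(single_pool))                    # 16
print(sorted(single_pool - SERINE_CODONS))  # the 10 non-serine codons
print(split_pools == SERINE_CODONS)        # True
```

The single pool is 16-fold degenerate at this codon and 10 of its 16 codons do not encode serine, whereas the two split pools together cover exactly the six serine codons.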
4) including partial codons at the ends of the primers
The various codons encoding an amino acid or a set of similar amino acids are often identical at their first (and maybe second) positions, but different at their third position. You can take advantage of this by synthesizing only the first or first and second positions of the 3' most codon covered by your primer pools, thus giving you one or two extra positions of exact match base pairs without adding any degeneracy. In the egl-10 example, the primer pools "3T" and "3A" cover a stretch in which the last codon must encode proline or glutamine. The codons for these two amino acids all start with C, but their last two positions are degenerate. Therefore, only the nondegenerate C was included in the primer pools.
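The partial-codon trick amounts to finding the bases shared by every codon allowed at the last position. A minimal Python sketch, with illustrative names, for the proline/glutamine case described above:

```python
PROLINE_CODONS = ["CCA", "CCC", "CCG", "CCT"]
GLUTAMINE_CODONS = ["CAA", "CAG"]

def shared_codon_prefix(codons):
    """Return the leading bases that are identical in every codon given."""
    prefix = ""
    for bases in zip(*codons):
        if len(set(bases)) != 1:
            break
        prefix += bases[0]
    return prefix

# Last codon must encode proline or glutamine: only the first base is shared.
print(shared_codon_prefix(PROLINE_CODONS + GLUTAMINE_CODONS))  # C
# If the last codon were always proline, two bases could be included.
print(shared_codon_prefix(PROLINE_CODONS))                     # CC
```

Whatever prefix comes back can be appended to the primer as exact-match bases without raising the degeneracy of the pool.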
5) use of codon bias
Some organisms have strong biases for using particular codons to encode certain amino acids. In theory you could reduce the degeneracy of a primer pool by only including these most common codons, and taking the risk that the gene(s) you are looking for will follow the organism's general codon bias enough to allow such primer pools to work. I haven't heard of anyone actually using this codon bias strategy in a successful degenerate PCR experiment, but you might try it if you're desperate.
Other considerations in primer design
1) primer length
In the example below the short stretches of sequence similarity among the egl-10 family members forced me to use primers only 19-21 bases long. These are shorter than the primers I have heard of people using in other successful experiments. For example, Linda Buck's primers were 31-33mers.
2) 3' end
People I talked to emphasized the special importance of having an exact match between the primer and template near the 3' end of the primer, although I'm not aware of specific data supporting this idea. For egl-10 I tried to avoid having any inosines near the 3' ends of the primers (except for primer "5out", which in fact failed to give any products), and also anchored the primers when possible with a nondegenerate codon at their 3' ends, so that 100% of the primers in the pool would be able to pair perfectly with the correct template over these last few bases.
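One way to screen candidate primers against this rule of thumb is to count how many bases at the 3' end are exact (no degenerate codes, no inosine). The Python sketch below uses hypothetical primer sequences and an illustrative function name; it is not a test applied in the original experiments.

```python
def exact_3prime_run(primer):
    """Count consecutive nondegenerate bases (A, C, G, T) at the 3' end.

    The primer is written 5' to 3' in IUPAC codes, with I for inosine.
    """
    run = 0
    for base in reversed(primer.upper()):
        if base not in "ACGT":
            break
        run += 1
    return run

# Hypothetical primer ending in a degenerate codon: short exact 3' run.
print(exact_3prime_run("GGICCITAYAARGA"))   # 2
# Hypothetical primer anchored on Met (ATG) and Trp (TGG), both single codons.
print(exact_3prime_run("GGICCITAYATGTGG"))  # 6
```

A longer exact run means every oligo in the pool can pair perfectly with the correct template over its last few bases.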
3) nested primers
If the sequence similarity in your gene family permits, it is a good idea to make nested sets of PCR primers. That way one round of PCR can be performed using the outside primers, and individual products (or the whole mix) can then be reamplified using the inside primers. Products amplified through both rounds are more likely to be the desired new gene family members, and less likely to be spurious products from sequences that happen to contain a couple of primer annealing sites by chance.
Determining optimal reaction conditions
A number of parameters can be varied to optimize reaction conditions for degenerate PCR. These include: primer concentration, magnesium concentration, template concentration, number of cycles of amplification, and the temperatures and times of each step in the amplification cycle. If each of these parameters is to be independently varied, the number of possibilities quickly reaches mind boggling proportions. My philosophy has been to fix almost all these parameters at the standard levels that have been successful for other people, and to vary only the one parameter that I think is the most crucial: the temperature of the annealing step during amplification.
My standard PCR reactions are as follows:
1.5 µl template DNA (2-300 ng)
5 µl 10X PCR buffer (10X buffer=100 mM Tris pH 8.3, 500 mM KCl, 15 mM MgCl2, 0.01% gelatin)
8 µl dNTP mix (1.25 mM each dNTP)
0.2 µl "ampliTaq" polymerase (5 U/µl)
25 µl dH2O
5 µl each primer pool at 20 µM each
total volume 50 µl
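For reference, the final concentrations implied by this recipe can be worked out by simple dilution. The Python sketch below assumes the stated 50 µl total volume (the listed volumes add up to slightly less); the names are illustrative.

```python
TOTAL_UL = 50.0  # stated total reaction volume

def final_conc(stock_conc, volume_ul, total_ul=TOTAL_UL):
    """Dilute a stock: returns the concentration in the same units as stock_conc."""
    return stock_conc * volume_ul / total_ul

tris_mM = final_conc(100.0, 5.0)    # from 10X buffer
kcl_mM = final_conc(500.0, 5.0)     # from 10X buffer
mgcl2_mM = final_conc(15.0, 5.0)    # from 10X buffer
dntp_mM = final_conc(1.25, 8.0)     # each dNTP
primer_uM = final_conc(20.0, 5.0)   # each primer pool
taq_units = 0.2 * 5.0               # 0.2 µl at 5 U/µl

print(tris_mM, kcl_mM, mgcl2_mM)  # 10.0 50.0 1.5
print(dntp_mM, primer_uM)         # 0.2 2.0
print(taq_units)                  # 1.0
```

So each reaction contains 1.5 mM MgCl2, 0.2 mM of each dNTP, 2 µM of each primer pool, and 1 U of polymerase.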
In practice, the reactions are set up by placing the primers and template into a 0.5 ml tube, then adding two drops of mineral oil from a blue tip, and adding on top of the oil 38.5 µl of a premix containing all the other components. In this way, it is easy to set up many different primer/template combinations at once. The tubes are then briefly spun in a microfuge to combine the two aqueous phases, and the tubes are immediately placed in the PCR block preheated to 95°C for a "hot start".
My amplification program:
95°C X 3 min. (hot start)
??°C X 1 min. (this annealing temperature is varied to optimize the amplification)
72°C X 2 min.
94°C X 45 sec.
40 cycles of the above 3 steps
72°C X 5 min.
hold at 4°C
This takes ~4.5 hours to run on an MJ Research machine.
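As a rough sanity check on the run time, the programmed hold times can be added up. This Python sketch uses only the step times listed above; attributing the remaining time to ramping between temperatures is an assumption, not something measured.

```python
HOT_START_MIN = 3.0
# The three steps repeated in each cycle, in minutes.
CYCLE_STEPS_MIN = {"anneal": 1.0, "extend": 2.0, "denature": 0.75}
CYCLES = 40
FINAL_EXTENSION_MIN = 5.0
OBSERVED_RUN_MIN = 4.5 * 60  # the ~4.5 hours quoted above

programmed_min = (
    HOT_START_MIN + CYCLES * sum(CYCLE_STEPS_MIN.values()) + FINAL_EXTENSION_MIN
)
# Time not accounted for by programmed holds (presumably ramping).
unprogrammed_min = OBSERVED_RUN_MIN - programmed_min

print(programmed_min)             # 158.0
print(unprogrammed_min)           # 112.0
print(unprogrammed_min / CYCLES)  # 2.8
```

The holds account for about 2.6 hours, which leaves nearly 3 minutes per cycle that presumably goes to heating and cooling the block, most of it when cooling to a low annealing temperature.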
To test the primers and optimize the conditions, I do a series of amplification runs starting with an annealing temperature of 25°C, and increasing in 5°C increments until amplification fails to occur. Typically for each primer pair being tested, at each temperature, I run 3 reactions containing different templates: 1) a positive control containing 2 ng of a cloned member of the gene family of interest as template; 2) a negative control containing no template (this is very important: you don't want to get fooled by contaminants); 3) an experimental reaction containing a complex template such as genomic DNA or total cDNA. For total C. elegans genomic DNA I've been using 300 ng as a template. Using rat brain cDNA as a template I amplified off of only 2 ng. However, this was only because I didn't have very much cDNA. If possible, it would be better to use ~200 ng of cDNA as a template, as Linda Buck did to amplify the odorant receptors.
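The number of tubes in such a series adds up quickly, so it can help to write out the plan before starting. The Python sketch below lists one reaction per temperature, primer pair, and template. The upper temperature limit of 65 and the primer pairings are placeholders chosen for illustration; in practice the series stops wherever amplification fails.

```python
# The three template types run for each primer pair at each temperature.
TEMPLATES = [
    ("positive control", "2 ng cloned gene family member"),
    ("negative control", "no template"),
    ("experimental", "complex template (genomic DNA or total cDNA)"),
]

def annealing_series(start_c=25, step_c=5, stop_c=65):
    """Annealing temperatures in degrees C; stop_c is a hypothetical upper limit."""
    return list(range(start_c, stop_c + 1, step_c))

def plan_reactions(primer_pairs, temps):
    """Return one (temperature, primer pair, control type, template) tuple per tube."""
    return [
        (temp, pair, name, template)
        for temp in temps
        for pair in primer_pairs
        for name, template in TEMPLATES
    ]

# Placeholder pairings, used only to show the bookkeeping.
pairs = [("5inE", "3T"), ("5inR", "3A")]
temps = annealing_series()
plan = plan_reactions(pairs, temps)

print(temps)      # [25, 30, 35, 40, 45, 50, 55, 60, 65]
print(len(plan))  # 54
print(plan[0])
```

Since each run of the thermal cycler uses a single annealing temperature, the plan groups naturally into one run per temperature, here with 6 tubes per run.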
Choice of template
Genomic DNA has the advantage that all members of your gene family are present in equimolar amounts, and genomic DNA is probably readily available. The obvious disadvantage is that introns may disrupt the primer sites, or may cause the amplification product to be so long that it is not amplified efficiently.
cDNA templates, though harder to obtain, overcome this problem. A big advantage of cDNA is that the desired amplification products should be of a known size, and you can therefore easily pick them out from among spurious products of other sizes. Remember that the "correct" sized band amplified off a cDNA template may be a complex mixture of products from many gene family members, so you may have to analyze many clones generated from such a band to assess its complexity. Linda Buck used random primed cDNA for her template, presumably to avoid biasing the cDNA towards the 3' ends of transcripts. In my case, I knew that the region I was amplifying should be at the extreme 3' end of the coding sequence, so I used oligo-dT primed cDNA.
For the lazy and rich, Clontech sells oligo-dT primed cDNA prepared from various tissues of many species to use as templates for PCR.
Analysis of PCR products
After amplification run 20 µl of each reaction out on an agarose gel. I use 2% 3:1 Nusieve:SeaKem LE agarose (you can buy this premixed from FMC) in 1X TAE. This gel is not very low melting, and thus isn't very suitable for cloning directly from the gel, but it gives very nice resolution. I use the 123 bp ladder from Gibco as a size standard. Obviously you expect to get products off the positive control, and not to get them off the negative control. Using the complex template, you will probably get a smear at the lower annealing temperatures, which will resolve into a small number of bands as the annealing temperature rises. I pick an annealing temperature that gives a modest number of bands, and then clone all these bands and sequence them.
If no products are evident in the experimental samples, a good trick to try is to use 2 µl of the apparently failed reaction as a template, and reamplify under the same conditions. This often gives visible products.
If you want to clone products that are only barely visible, you can get more of them by just reamplifying the original reaction as described above. Another way to amplify individual products separately is to cut the bands out of the gel that was used to analyze the original reactions (actually I take a bore out of the gel with a Pasteur pipette), melt the DNA-containing agarose, and use 2 µl of it as a template to reamplify under the same conditions.
The method described above for reamplifying specific bands can (and should) be used to test amplified products to see if they are single-primer artefacts. Use 2 µl of an agarose gel bore to set up each of three PCR reactions, containing either primer alone or both primers together. Obviously, you're only interested in products that require both primers in order to be amplified.
I clone PCR products by running the PCR reaction out on a low melt agarose gel (2% Nusieve agarose in 1X TAE). I cut the desired band out, melt it at 70°C, mix well by pipetting up/down, and use 5 µl of the melted agarose directly in a ligation reaction with a dT tailed vector.
This vector DNA is prepared as follows: cut 1 µg Bluescript SK with EcoRV in a 20 µl reaction. Add 20 µl 1X PCR buffer, and 2 µl 2 mM dTTP. Add 0.5 µl "ampliTaq" polymerase (2.5 U), and incubate at ~72°C for 20 min. Run the DNA out on a 0.8% Seaplaque agarose gel in 1X TAE, cut out the band, melt at 70°C, mix well, and use 5 µl in a ligation reaction. It turns out that only about 50% of the colonies obtained after transformation of this type of reaction may have inserts; the rest are vector reclosures. However, if blue/white selection is used, virtually all the white colonies have inserts.