sequence retrieval for primer design - (Nov/17/2007 )
I'm trying to design primers for real-time PCR and want them to be specific for each transcript variant of the genes I want to study. So far I don't see a problem with that, but I am unsure which source I should use to get the sequences of the transcript variants.
At first I used only RefSeq from NCBI, but now I have some doubts. Should I also consider Ensembl and/or Vega sequences?
Does anyone have suggestions?
Personally, I find Swiss-Prot more up to date than NCBI, but Swiss-Prot only has protein sequences. Even where the sequence sources differ, you'd be pretty unlucky if a minor difference fell right at the position where you wanted to design your primers. You really have to use NCBI and assume it is correct; it's pretty good and it's the standard reference. You certainly couldn't be blamed if you used their sequences and they turned out to be wrong, but 999 times or more out of 1000 they will be correct, so don't worry too much.
I am not worried about the correctness of the sequences in RefSeq. I am more worried about whether the set of transcript variants is complete, since I am going to design specific primers for each transcript variant!
Searching for human ATP5J yields different numbers of transcript variants in:
Ensembl: 7 translated
Ensembl-Vega: 7
I haven't checked this for ATP5J yet, but for other genes (e.g. ALDOA) the sequences are not identical even when the number of transcript variants is. Some show only minor differences at the beginning or end (e.g. a few A's missing at the end), but others show differences in the middle!
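To make the "differences at the ends" versus "differences in the middle" distinction concrete, here is a minimal Python sketch. It assumes you have already downloaded the two sequences as plain strings; the function name `difference_location` and the toy sequences are made up for illustration and are not real ALDOA or ATP5J records. It uses the standard-library `difflib` module, which is fine for a quick look but is not a substitute for a proper alignment tool on long transcripts.

```python
from difflib import SequenceMatcher


def difference_location(seq_a: str, seq_b: str) -> str:
    """Classify how two sequences differ: 'identical', 'ends only' or 'internal'.

    Hypothetical helper for illustration only.
    """
    a, b = seq_a.upper(), seq_b.upper()
    if a == b:
        return "identical"
    # autojunk=False: do not let difflib treat frequent bases as "junk"
    matcher = SequenceMatcher(None, a, b, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            continue
        touches_start = i1 == 0 and j1 == 0
        touches_end = i2 == len(a) and j2 == len(b)
        if not (touches_start or touches_end):
            # a change with matching sequence on both sides of it
            return "internal"
    return "ends only"


# Toy sequences (placeholders, not real database entries)
with_tail = "ACGTACGTGGAAA"     # same as below plus extra A's at the end
without_tail = "ACGTACGTGG"
full = "AAACCCGGGTTT"
missing_middle = "AAACCCTTT"    # GGG missing in the middle

print(difference_location(with_tail, without_tail))
print(difference_location(full, missing_middle))
```

A pair reported as "internal" is the kind that may deserve its own primer pair, while "ends only" pairs are more likely to be the same variant annotated with slightly different boundaries.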
So, given that the number of transcript variants differs between databases for some genes, a gene that shows the same number of transcript variants in each database, but with different sequences, might actually have a different number of transcript variants than any one database lists. For ALDOA, for example, there would be 4 different transcript variants across RefSeq, Ensembl and Ensembl-Vega (at least when ignoring the minor differences), whereas RefSeq lists just 3.
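The counting described here (pool the variants from all databases, ignore the minor differences, count what is left) could be sketched roughly as follows in Python. Everything in it is an assumption for illustration: the record layout `(database, transcript_id, sequence)`, the helper names, and the toy sequences are invented, and "minor difference" is reduced to letter case plus the number of A's at the 3' end, which is only one of the differences mentioned above.

```python
from collections import defaultdict


def normalize(sequence: str) -> str:
    """Upper-case the sequence and strip trailing A's, so that records
    differing only in the length of the terminal A run compare as equal."""
    return sequence.upper().rstrip("A")


def distinct_variants(records):
    """Group (database, transcript_id, sequence) records by normalized sequence.

    Returns one list of (database, transcript_id) pairs per distinct variant.
    """
    groups = defaultdict(list)
    for database, transcript_id, sequence in records:
        groups[normalize(sequence)].append((database, transcript_id))
    return list(groups.values())


# Toy records (placeholder IDs and sequences, not real entries)
records = [
    ("RefSeq", "toy_r1", "ATGGCCAAGTTGACCAAAA"),
    ("Ensembl", "toy_e1", "ATGGCCAAGTTGACCAA"),   # same as toy_r1, shorter A tail
    ("RefSeq", "toy_r2", "ATGGCCTTGACC"),
    ("Vega", "toy_v2", "atggccttgacc"),           # same as toy_r2, lower case
    ("Ensembl", "toy_e3", "ATGGCCAAGGGGTTGACC"),  # found in one database only
]

variants = distinct_variants(records)
refseq_only = distinct_variants([r for r in records if r[0] == "RefSeq"])
print(len(variants), "distinct variants overall;", len(refseq_only), "in RefSeq alone")
for members in variants:
    print(members)
```

Variants that end up in a group with members from only one database are the ones worth inspecting by hand before deciding whether to design primers for them.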
And, if I want to show expression differences between transcript variants, I would need all of them.
Anyway, I would need a good argument for excluding any of the databases.