blast search - where is the trick? (Jul/30/2007 )

Dear friends,

I'm running some searches for antibiotic resistance genes on the beloved NCBI database.
Obviously, when I BLAST the gene sequence, I get back the complete list of all the plasmids and bacterial genomes containing those genes (hundreds of records).
Is there some trick to search for gene sequences only, or to separate "big heavy" plasmids from shorter sequences such as genes?
What's the best search strategy?

Thanks very much for your suggestions!
ILA

-ila-

hi Ila

What exactly do you need to know about your genes? Even if you get too many hits, you can usually sort them or filter out what you don't need. Which information is of interest to you?

-aimikins-

Hi aimee,

My goal is to browse all the genes that match my gene's sequence in BLAST. I'm not interested in whole contigs, plasmids, genomes, etc. that contain the gene.
In other words, a simple nucleotide-nucleotide BLAST returns hundreds of sequences containing my gene, but those are not what I'm after. I want to compare it only with other genes.
Today I played a bit with search field limits and boolean operators. Something like 1000:5000[SLEN] selects only matching sequences between 1000 and 5000 bp long. This strategy seems to work quite well, or at least to be useful.
I bet another way could be to exclude records containing the words "vector" or "plasmid". Would something like NOT plasmid[Title] NOT vector[Title] be suitable?
Am I correct? I don't know... I hope I'm not excluding too much information or making a mistake in building the search strategy...
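
A minimal sketch of how such a limit string could be put together, in case it helps to check the syntax. The function name and its parameters are made up for illustration. It assumes the length range is written as low:high[SLEN] and that the word exclusions are restricted to the record title with [Title], which is only one possible choice of field. The resulting string is meant to be pasted into the Entrez Query box of the BLAST form.

```python
def build_entrez_query(min_len, max_len, excluded_words=()):
    """Compose an Entrez limit string: a length range plus optional title exclusions.

    Hypothetical helper for illustration; it only builds text and does not
    contact NCBI.
    """
    if min_len > max_len:
        raise ValueError("min_len must not exceed max_len")
    # Range on the sequence-length field, written as low:high[SLEN]
    query = f"{min_len}:{max_len}[SLEN]"
    for word in excluded_words:
        # Assumption: each excluded word is matched against the record title only
        query += f" NOT {word}[Title]"
    return query


query = build_entrez_query(1000, 5000, ["plasmid", "vector"])
print(query)  # 1000:5000[SLEN] NOT plasmid[Title] NOT vector[Title]
```

One caveat with the title exclusion: a record for a single resistance gene can still mention the plasmid it was cloned from in its title, so this filter may drop some gene-sized records as well.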

-ila-

I think you are doing it correctly; you just impose different 'filters' until you get what you want. I generally filter as little as possible and slog through all the hits, to keep from missing something important. But if you have a very specific purpose, it sounds like you're doing it exactly right, especially if you are looking for the entire gene.

If you are looking for conserved nucleotide sequences, your size exclusion might be too stringent. Another possibility is to look for conserved amino acid sequences; it depends on what you will do with the information you obtain. I think your strategy of excluding vector sequences sounds very reasonable.
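
If you would rather run the search unfiltered and narrow it down afterwards, the size cut can also be applied to the results. Below is a rough sketch, assuming the hits were saved as a tab-separated table with the subject length as a last column (for example, BLAST+ run with -outfmt "6 std slen"). The function name and the two sample rows are invented for illustration and are not real hits.

```python
import csv
import io

# Assumed column order: the 12 standard tabular fields followed by slen
COLUMNS = ["qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
           "qstart", "qend", "sstart", "send", "evalue", "bitscore", "slen"]


def filter_hits_by_subject_length(handle, min_len, max_len):
    """Keep only hits whose subject sequence length lies within [min_len, max_len]."""
    reader = csv.DictReader(handle, fieldnames=COLUMNS, delimiter="\t")
    return [row for row in reader if min_len <= int(row["slen"]) <= max_len]


# Invented rows: hitA is a gene-sized record, hitB is a large plasmid
sample = (
    "myGene\thitA\t99.1\t1100\t10\t0\t1\t1100\t1\t1100\t0.0\t2000\t1200\n"
    "myGene\thitB\t98.5\t1100\t16\t0\t1\t1100\t50321\t51420\t0.0\t1950\t94000\n"
)
kept = filter_hits_by_subject_length(io.StringIO(sample), 1000, 5000)
print([row["sseqid"] for row in kept])  # ['hitA']
```

Filtering after the search keeps the full hit list on disk, so the length limits can be loosened later without rerunning BLAST.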

good luck

-aimikins-