Search all of GenBank for a given sequence
Posted 29 May 2009 - 02:28 PM
So I have some sequence "ATTCGTAGCTGATGACGATGACATGGGATTTTGAGGGAAC" and I am curious what known sequences happen to have that very same substring somewhere in them.
This is different from a BLAST-like search where I am aligning a sequence to a known genome I provide. I mean a more general search where I have some sequence I am trying to classify, or even just determine whether it's related to ANYTHING.
I do realize there are alignment tools like Maq, SOAP, and MUMmer for aligning a given sequence against a reference. I'm asking more about searching "all known GenBank DNA sequences" in a database.
Is there such a tool? It would be interesting if I've found some sequence and I'm wondering what it may be related to, and boom, GenBank can say "hey, that's found here in the human genome, in 44 places, and it's in the mouse genome here, and there's a weird cancer variant in dogs that has it here," and so on.
Or is such a tool a bizarre fantasy? GenBank has roughly 100M sequences (about 100G bases!) so maybe searching like that is impractical.
But if it wasn't, would it be useful?
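To make the question concrete, here is a toy sketch in Python of the kind of exact-substring lookup I have in mind. Everything in it is an assumption for illustration: the record names and sequences are made up, the "database" is just an in-memory dict, and a plain linear scan like this would obviously not scale to all of GenBank. It also checks the reverse complement, since a hit could sit on either strand.

```python
# Toy illustration only: the records below are made-up stand-ins for database
# entries, and a linear scan like this would not scale to a real database.
QUERY = "ATTCGTAGCTGATGACGATGACATGGGATTTTGAGGGAAC"

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an uppercase A/C/G/T string."""
    return seq.translate(_COMPLEMENT)[::-1]


def find_all(haystack, needle):
    """Return every 0-based start position of needle in haystack (overlaps allowed)."""
    positions = []
    start = haystack.find(needle)
    while start != -1:
        positions.append(start)
        start = haystack.find(needle, start + 1)
    return positions


def search_records(records, query):
    """Map each record name with a hit to a list of (strand, position) pairs."""
    rc = reverse_complement(query)
    hits = {}
    for name, seq in records.items():
        found = [("+", pos) for pos in find_all(seq, query)]
        if rc != query:  # skip the second scan for palindromic queries
            found += [("-", pos) for pos in find_all(seq, rc)]
        if found:
            hits[name] = found
    return hits


# Hypothetical records: two copies on the forward strand, one copy on the
# reverse strand, and one unrelated sequence.
records = {
    "toy_record_1": "GGGG" + QUERY + "CCCC" + QUERY,
    "toy_record_2": "TTTT" + reverse_complement(QUERY) + "AAAA",
    "toy_record_3": "ACGT" * 20,
}

for name, found in search_records(records, QUERY).items():
    print(name, found)
```

What I'm really asking is whether something does the equivalent of `search_records` across every sequence in the database, and ideally tolerates near matches as well as exact ones.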
Posted 29 May 2009 - 07:48 PM
Posted 29 May 2009 - 10:21 PM
Don't you have to specify what you want to compare against?
Or does the online BLAST server really search ALL of GenBank (or at least the non-redundant parts, I assume)?
Posted 30 May 2009 - 07:44 AM
Posted 30 May 2009 - 07:57 AM
Ha! That shows where I misunderstood. I really thought you had to specify the genomes!
So that's pretty impressive that it can search the whole database!
Thanks for fixing my broken understanding.