homolog database - Want to convert mouse to rat (Feb/09/2009)
Wondering if anyone knows of a database that accepts a list of mouse gene/protein accession numbers and returns the homologous gene/protein in another species (in my case, rat or human).
The NCBI Blast site can do that (see here). You could either use the BLAST Assembled Genomes and BLAST your mouse sequences against the Rat genome, or use the Basic BLAST section, and, in the Choose Search Set section, specify what you want to include in the Organism input box.
I know this works with a file of fasta sequences; I've never tried entering a list of accession numbers, though I'd be surprised if it didn't work with multiple accession numbers, just like it works with a single file containing multiple fasta sequences.
I looked into that, but I'm not too interested in sequence homology; rather, I want to know what the rat homolog of a given mouse gene is. Waiting for BLAST to run takes longer per gene than what I've been doing. Basically, I've been plugging the transcript ID (NM_/XM_ accession numbers) into iHOP, looking up the rat homolog, and then searching my cDNA array dataset. Needless to say, such an iterative process is time-consuming and mind-numbing, which leads me to think that someone smarter than me has encountered this dilemma and formulated an approach with much greater throughput.
Perhaps I should describe what I'm trying to do, which might give you an idea. I did a ChIP-chip experiment with primary mouse cells, and I want to compare that data set with an expression microarray experiment our lab did a while back with rat tissue. The samples are from the same tissues, just in different species.
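Whatever the source of the mouse-to-rat mapping turns out to be, the comparison itself reduces to a table lookup followed by an intersection with the rat array. A minimal sketch in Python (rather than the Perl that comes up later in the thread), assuming the mapping is already available as a two-column table. The function name `map_to_rat` and all identifiers are made-up placeholders, not real accessions.

```python
def map_to_rat(mouse_ids, ortholog_table):
    """Return (mapped, unmapped): mouse->rat pairs, and mouse IDs with no entry."""
    mapped, unmapped = {}, []
    for mouse_id in mouse_ids:
        rat_id = ortholog_table.get(mouse_id)
        if rat_id is None:
            unmapped.append(mouse_id)
        else:
            mapped[mouse_id] = rat_id
    return mapped, unmapped


# Hypothetical placeholder data standing in for the real tables.
ortholog_table = {"mouse_gene_A": "rat_gene_A", "mouse_gene_B": "rat_gene_B"}
chip_hits = ["mouse_gene_A", "mouse_gene_B", "mouse_gene_C"]
rat_array_genes = {"rat_gene_A", "rat_gene_X"}

mapped, unmapped = map_to_rat(chip_hits, ortholog_table)

# Mouse ChIP-chip hits whose rat ortholog is also present on the rat array.
shared = {m: r for m, r in mapped.items() if r in rat_array_genes}
```

Keeping the unmapped IDs in a separate list makes it easy to see which genes still need a manual lookup.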
I'm not familiar with the form your data is in, but it sounds like your process could be automated by a Perl script. The only fly in the ointment is that the iHOP data does not seem to be available except through the web interface. It'd be better if it was available in a downloadable form.
The web is not a problem per se; there is a way to send specific queries to the data (see here), and Perl can handle the web scrapes using a module like WWW::Mechanize, but the iHOP people say:
Please note that iHOP is a freely available tool from the academic domain. Not a company! Thus, it is necessary to limit server load and to give preference to individual users.
Bulk downloads may lead to the banning of IP addresses for specific servers or institutions!
Please contact directly with Robert Hoffmann if there should be a scientific reason for bulk downloads. Thank you for your cooperation!
I don't know whether they would consider several dozen requests a minute originating from the same IP address to be "bulk downloading"....
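If the iHOP maintainers do agree to scripted access, spacing the requests out is easy to build in. The sketch below is in Python rather than Perl and only shows the pacing logic: `fetch` is a hypothetical stand-in for whatever function performs one lookup, and the two-second interval is an arbitrary assumption, not a limit stated by iHOP.

```python
import time


def throttled(items, fetch, min_interval=2.0, sleep=time.sleep, clock=time.monotonic):
    """Call fetch(item) for each item, at least min_interval seconds apart."""
    results = {}
    last_call = None
    for item in items:
        if last_call is not None:
            # Wait out whatever remains of the interval since the previous call.
            wait = min_interval - (clock() - last_call)
            if wait > 0:
                sleep(wait)
        last_call = clock()
        results[item] = fetch(item)
    return results
```

The `sleep` and `clock` parameters exist only so the pacing can be checked without actually waiting; in normal use the defaults are fine.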
jah on Feb 9 2009, 02:46 PM said:
Of course, the rat homolog of a given mouse gene (otherwise known as an ortholog) is defined by its sequence homology to the mouse gene, no?
jah on Feb 9 2009, 02:46 PM said:
This may be true if you use the web interface, but if you create a local BLASTable database of the rat genes (using the NCBI program makeblastdb.exe or the older formatdb.exe) and a local copy of the BLAST program (either blastn.exe, blastp.exe, or the older blastall.exe), you can speed things up considerably. Using a Perl script, I can take ~5,000 genes from one organism and find the closest match to each of them in another organism in minutes...
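A rough sketch of the local-BLAST route, written in Python rather than Perl. It assumes BLAST+ has already been run with tabular output (`-outfmt 6`), whose twelve default columns are query ID, subject ID, percent identity, alignment length, mismatches, gap opens, query start/end, subject start/end, e-value and bit score. The file names in the comments and the example rows are made up for illustration.

```python
import csv
import io

# Assumed preparation steps (file names are hypothetical):
#   makeblastdb -in rat.fa -dbtype nucl -out rat_db
#   blastn -query mouse.fa -db rat_db -outfmt 6 -out hits.tsv


def best_hits(tabular_lines):
    """Return {query_id: (subject_id, bitscore)}, keeping the top-scoring hit per query."""
    best = {}
    for row in csv.reader(tabular_lines, delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue  # skip blank lines and comment lines
        query_id, subject_id, bitscore = row[0], row[1], float(row[11])
        if query_id not in best or bitscore > best[query_id][1]:
            best[query_id] = (subject_id, bitscore)
    return best


# Made-up rows in -outfmt 6 column order, standing in for hits.tsv.
example = io.StringIO(
    "mouse1\tratA\t95.0\t500\t25\t0\t1\t500\t1\t500\t1e-100\t800\n"
    "mouse1\tratB\t88.0\t480\t57\t1\t1\t480\t10\t489\t1e-80\t650\n"
    "mouse2\tratC\t91.0\t300\t27\t0\t1\t300\t1\t300\t1e-60\t420\n"
)
hits = best_hits(example)
```

With a real run, `open("hits.tsv")` would replace the `io.StringIO` object. Note that a one-way best hit is only a heuristic for orthology; checking that the rat gene's best mouse hit points back to the same mouse gene (a reciprocal best hit) is a common extra filter.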
Or learn MySQL and use the ensembl database with your sequence identifiers.
You could do this in pure SQL or dollop some #!/usr/bin/perl on top and use DBI;
oh and you could do rat and human at the same time.
mysql -u anonymous -h ensembldb.ensembl.org