homolog database


5 replies to this topic

#1 jah

Posted 09 February 2009 - 07:25 AM

Does anyone know of a database that accepts a list of mouse gene/protein accession numbers and returns the homologous gene/protein in another species (in my case, rat or human)?

#2 HomeBrew

Posted 09 February 2009 - 09:41 AM

The NCBI BLAST site can do that (see here). You could either use the BLAST Assembled Genomes section and BLAST your mouse sequences against the rat genome, or use the Basic BLAST section and, under Choose Search Set, specify the organisms you want to include in the Organism input box.

I know this works with a file of FASTA sequences; I've never tried entering a list of accession numbers, though I'd be surprised if it didn't work with multiple accession numbers, just as it works with a single file containing multiple FASTA sequences.

#3 jah

Posted 09 February 2009 - 11:46 AM

I looked into that, but I'm not too interested in sequence homology; rather, I want to know what the rat homolog of a given mouse gene is. Waiting for the BLAST to run takes longer (per gene) to churn out than what I've been doing. Basically, I've been plugging the transcript ID (NM_/XM_ accession numbers) into iHOP, looking up the rat homolog, and then searching my cDNA array dataset. Needless to say, such an iterative process is time-consuming and mind-numbing, which leads me to think that someone smarter than me has encountered this dilemma and formulated an approach with much greater throughput.

Perhaps I should describe what I'm trying to do, which might give an idea. I did a ChIP-chip experiment with primary mouse cells, and I want to compare that data set with an expression microarray experiment our lab did a while back with rat tissue. The samples are from the same tissues, just in different species.
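
Once a mouse-to-rat mapping table exists (wherever it comes from), the lookup-then-search loop described above reduces to a table join. The following is a minimal Python sketch under that assumption. The two-column TSV layout, the function names, and every identifier and value are made up for illustration and are not taken from the poster's data.

```python
import csv
import io


def load_ortholog_map(handle):
    """Read a two-column TSV (mouse_id, rat_id) into a dict of sets.

    A set is used because one mouse gene may map to several rat genes.
    """
    mapping = {}
    for mouse_id, rat_id in csv.reader(handle, delimiter="\t"):
        mapping.setdefault(mouse_id, set()).add(rat_id)
    return mapping


def join_by_ortholog(mouse_hits, ortholog_map, rat_expression):
    """Return (mouse_id, rat_id, expression) for each mouse hit
    whose rat ortholog was measured on the expression array."""
    rows = []
    for mouse_id in mouse_hits:
        for rat_id in sorted(ortholog_map.get(mouse_id, ())):
            if rat_id in rat_expression:
                rows.append((mouse_id, rat_id, rat_expression[rat_id]))
    return rows


# Made-up placeholder identifiers and values, for illustration only.
ORTHOLOG_TSV = "mouse_A\trat_A\nmouse_B\trat_B1\nmouse_B\trat_B2\n"
ortholog_map = load_ortholog_map(io.StringIO(ORTHOLOG_TSV))
mouse_hits = ["mouse_A", "mouse_B", "mouse_C"]    # e.g. genes from the ChIP-chip list
rat_expression = {"rat_A": 2.5, "rat_B2": -1.0}   # e.g. values from the rat array

joined = join_by_ortholog(mouse_hits, ortholog_map, rat_expression)
for row in joined:
    print(*row, sep="\t")
```

Mouse genes with no mapping, or whose rat ortholog is absent from the array, simply drop out of the result, so it is worth counting how many are lost at each step.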

#4 HomeBrew

Posted 09 February 2009 - 07:40 PM

I'm not familiar with the form your data is in, but it sounds like your process could be automated by a Perl script. The only fly in the ointment is that the iHOP data does not seem to be available except through the web interface. It'd be better if it was available in a downloadable form.

The web is not a problem per se; there is a way to send specific queries to the data (see here), and Perl can handle the web scrapes using a module like WWW::Mechanize, but the iHOP people say:

Bulk downloads
Please note that iHOP is a freely available tool from the academic domain. Not a company! Thus, it is necessary to limit server load and to give preference to individual users.
Bulk downloads may lead to the banning of IP addresses for specific servers or institutions!
Please contact directly with Robert Hoffmann if there should be a scientific reason for bulk downloads. Thank you for your cooperation!


I don't know whether they would consider several dozen requests a minute originating from the same IP address to be "bulk downloading"....
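
If scripted requests were acceptable to the site operators, one way to keep the load low is to enforce a minimum delay between requests. The sketch below is written in Python rather than Perl and is only an illustration: the `Throttle` class, the `fetch_all` helper, and the stand-in `fake_fetch` function are hypothetical names, no real iHOP URL or API is used, and the interval is an arbitrary guess, not an iHOP policy.

```python
import time


class Throttle:
    """Enforce a minimum interval (in seconds) between successive calls."""

    def __init__(self, min_interval, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = min_interval
        self._clock = clock      # injectable so the timing logic can be tested
        self._sleep = sleep
        self._last = None        # time of the previous call, if any

    def wait(self):
        """Sleep just long enough to respect min_interval, then record the time."""
        now = self._clock()
        if self._last is not None:
            remaining = self.min_interval - (now - self._last)
            if remaining > 0:
                self._sleep(remaining)
                now = self._clock()
        self._last = now


def fetch_all(ids, fetch, throttle):
    """Call fetch(id) for each id, pausing between calls as the throttle requires."""
    results = {}
    for item in ids:
        throttle.wait()
        results[item] = fetch(item)
    return results


def fake_fetch(transcript_id):
    # Stand-in for a real HTTP request; returns a dummy string.
    return f"result for {transcript_id}"


# A tiny interval keeps this demo fast; against a real server, use whole
# seconds (or whatever the operators agree to).
throttle = Throttle(min_interval=0.01)
pages = fetch_all(["id_1", "id_2", "id_3"], fake_fetch, throttle)
print(pages)
```

Asking the maintainers first, as their notice requests, is still the safer route; a throttle only limits the rate and does not make bulk access permitted.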

#5 HomeBrew

Posted 10 February 2009 - 05:34 AM

I looked into that, but I'm not too interested in sequence homology; rather, I want to know what the rat homolog of a given mouse gene is.


Of course, the rat homolog of a given mouse gene (otherwise known as an ortholog) is defined by its sequence homology to the mouse gene, no?

Waiting for the BLAST to run takes longer (per gene) to churn out than what I've been doing.


This may be true if you use the web interface, but if you create a local BLASTable database of the rat genes (using the NCBI program makeblastdb.exe or the older formatdb.exe) and a local copy of the BLAST program (either blastn.exe, blastp.exe, or the older blastall.exe), you can speed things up considerably. Using a Perl script, I can take ~5,000 genes from one organism and find the closest match to each of them in another organism in minutes...
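
As a rough illustration of the "closest match per gene" step, here is a Python sketch (not the Perl script mentioned above) that picks the highest-scoring hit for each query from BLAST+ tabular output. It assumes the search was run with `-outfmt 6` and the default twelve columns. The file names in the comments and all sample rows are invented.

```python
import csv
import io

# Commands that could produce such a table (file names are hypothetical):
#   makeblastdb -in rat_genes.fa -dbtype nucl -out rat_genes
#   blastn -query mouse_genes.fa -db rat_genes -outfmt 6 -out hits.tsv

# Default column order of BLAST+ tabular output (-outfmt 6).
FIELDS = ["qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
          "qstart", "qend", "sstart", "send", "evalue", "bitscore"]


def best_hits(handle):
    """Return {query_id: (subject_id, bitscore)}, keeping the
    highest bit score seen for each query."""
    best = {}
    for row in csv.DictReader(handle, fieldnames=FIELDS, delimiter="\t"):
        score = float(row["bitscore"])
        query = row["qseqid"]
        if query not in best or score > best[query][1]:
            best[query] = (row["sseqid"], score)
    return best


# Invented sample rows standing in for a real hits.tsv.
SAMPLE = (
    "mouse_1\trat_9\t92.5\t400\t30\t0\t1\t400\t1\t400\t1e-150\t540\n"
    "mouse_1\trat_3\t80.0\t200\t40\t0\t1\t200\t5\t204\t1e-40\t170\n"
    "mouse_2\trat_7\t95.0\t300\t15\t0\t1\t300\t1\t300\t1e-140\t500\n"
)

hits = best_hits(io.StringIO(SAMPLE))
for query, (subject, score) in hits.items():
    print(query, subject, score, sep="\t")
```

A best hit is only a candidate ortholog; a common stricter check is to require that the hit is reciprocal, i.e. that the rat gene's best match in mouse is the original query.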

#6 DELETEMYACCOUNTPLEASE

Posted 11 February 2009 - 02:54 PM

Or learn MySQL and use the Ensembl database with your sequence identifiers.

You could do this in pure SQL or dollop some #!/usr/bin/perl on top and use DBI; <_<

oh and you could do rat and human at the same time.


Here's how:
mysql -u anonymous -h ensembldb.ensembl.org
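
To show the shape of such a query without guessing at Ensembl's actual schema, here is a self-contained Python sketch that uses an in-memory SQLite database as a stand-in. The tables `xref` and `ortholog`, their columns, and every identifier are invented for illustration. Against the real server you would use a MySQL client library and Ensembl's documented table names instead.

```python
import sqlite3

# Toy stand-in schema; NOT Ensembl's real tables.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE xref (accession TEXT, gene_id TEXT);
CREATE TABLE ortholog (gene_id TEXT, species TEXT, ortholog_id TEXT);
""")
conn.executemany(
    "INSERT INTO xref VALUES (?, ?)",
    [("ACC_1", "mouse_gene_1"), ("ACC_2", "mouse_gene_2")],
)
conn.executemany(
    "INSERT INTO ortholog VALUES (?, ?, ?)",
    [
        ("mouse_gene_1", "rat", "rat_gene_1"),
        ("mouse_gene_1", "human", "human_gene_1"),
        ("mouse_gene_2", "human", "human_gene_2"),
    ],
)


def orthologs_for(conn, accessions, species=("rat", "human")):
    """Map accessions to orthologs in several species with one join."""
    acc_marks = ",".join("?" * len(accessions))   # one placeholder per value
    sp_marks = ",".join("?" * len(species))
    sql = f"""
        SELECT x.accession, o.species, o.ortholog_id
        FROM xref AS x
        JOIN ortholog AS o ON o.gene_id = x.gene_id
        WHERE x.accession IN ({acc_marks})
          AND o.species IN ({sp_marks})
        ORDER BY x.accession, o.species
    """
    return conn.execute(sql, [*accessions, *species]).fetchall()


rows = orthologs_for(conn, ["ACC_1", "ACC_2"])
for row in rows:
    print(*row, sep="\t")
```

Filtering on both species in one `IN` clause is what lets rat and human be handled in the same pass.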

Edited by perlmunky, 11 February 2009 - 03:06 PM.






©1999-2013 Protocol Online, All rights reserved.