
Output of MUSCLE alignment shortens names; how can I get the long names back again?


2 replies to this topic

#1 PhilS


Posted 17 December 2009 - 11:50 PM

Hello,
I'm aligning a bunch of sequences with MUSCLE by inputting a FASTA file with regular header info such as this:
>gi|280987219|gb|GQ200200.2| Cohaesibacter sp. DQHS-21 16S ribosomal RNA gene, partial sequence

However, the output file from MUSCLE shortens the names to something like this: gi|2809872. This seems to be common in phylogenetics software too (I think the PHYLIP format uses a short name as well). I presume this is so that reference IDs are passed around the program instead of the full name. But what is the easiest way to get the names back again? For example, I'm feeding the MUSCLE alignment file into RAxML, which creates a tree file (I'm not sure what format this is in), and then I want to see the names on the branches, not just an ID.
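The shortened name in this example, gi|2809872, is exactly the first 10 characters of the original header, which is consistent with the 10-character name field of strict PHYLIP format. One common workaround, sketched here under the assumption that you can run a small Python script before alignment, is to give every sequence a short unique ID yourself and keep a lookup table back to the full header. The function name `shorten_fasta_headers` and the `seq0001` ID scheme are hypothetical choices for illustration, not a MUSCLE feature.

```python
def shorten_fasta_headers(fasta_text: str, prefix: str = "seq"):
    """Replace each FASTA header with a short unique ID (hypothetical scheme).

    Returns the renamed FASTA text and a dict mapping short ID -> original header,
    so the full names can be restored later (e.g. on the final tree).
    """
    renamed_lines = []
    id_to_header = {}
    for line in fasta_text.splitlines():
        if line.startswith(">"):
            # seq0001, seq0002, ... stays well under a 10-character name limit
            short_id = f"{prefix}{len(id_to_header) + 1:04d}"
            id_to_header[short_id] = line[1:].strip()
            renamed_lines.append(">" + short_id)
        else:
            # sequence lines are copied through unchanged
            renamed_lines.append(line)
    return "\n".join(renamed_lines) + "\n", id_to_header
```

Save the returned dictionary (for example as a two-column tab-separated file) alongside the alignment; after RAxML finishes, the same table can be used to put the full names back on the tree.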

Any help much appreciated,
Thanks,
Phil

#2 guyleonard


Posted 04 May 2010 - 02:13 AM

Hi,

Unfortunately, you can't do much about this directly with MUSCLE or many other phylogenetic programs; however, you can make it easier to deal with.

Enter REFGEN and TREENAMER, two programs I have written in my spare time (read the paper here). They take the standard headers from NCBI GenBank and the DOE JGI Genome Project, which carry a lot of information that is unhelpful in your resulting trees (and which most phylogeny programs cannot handle, as you have noticed), and create an ID from the accession and species name.
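The idea of building an ID from the accession and species name can be approximated in a few lines of Python. This is a hypothetical sketch, not REFGEN's code or its actual ID format: it assumes headers in the legacy `gi|<number>|gb|<accession>| <description>` layout shown in the first post, and it treats the first two words of the description as the organism name, which is only a rough heuristic (for the example header it yields `Cohaesibacter_sp`).

```python
import re

# Assumed layout: the legacy NCBI "gi|<number>|gb|<accession>| <description>" header.
# Other databases (and JGI) use other layouts that this pattern will not match.
GI_HEADER = re.compile(r"^>?gi\|(\d+)\|gb\|([^|]+)\|\s*(.*)$")


def parse_gi_header(header: str) -> dict:
    """Split a gi|...|gb|...| header into its gi number, accession and description."""
    m = GI_HEADER.match(header.strip())
    if m is None:
        raise ValueError(f"not a gi|...|gb|... header: {header!r}")
    gi, accession, description = m.groups()
    return {"gi": gi, "accession": accession, "description": description}


def make_label(header: str) -> str:
    """Build a tree-friendly label from the accession and the first two words
    of the description (a rough genus + species heuristic)."""
    fields = parse_gi_header(header)
    words = fields["description"].split()[:2]
    # drop punctuation such as the "." in "sp." so the label stays Newick-safe
    organism = "_".join(re.sub(r"[^A-Za-z0-9]", "", w) for w in words)
    return f"{organism}_{fields['accession']}"
```

Because the accession is kept in the label, two sequences from the same species still get distinct names.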

This obviously shortens the header again, but once you have a tree from your analysis, you can use the second tool, TREENAMER, to replace the ID codes with the species name and/or accession, which are the important parts of the header.
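For a tree in Newick format (the format RAxML writes its trees in), the renaming step can be approximated with a regular-expression pass over the tip labels. This is a hypothetical sketch, not TREENAMER's implementation: it assumes unquoted tip labels that follow `(` or `,`, leaves internal-node labels such as bootstrap values (which follow `)`) untouched, and single-quotes replacement names that contain spaces or Newick punctuation. The `id_to_name` dictionary could be the lookup table saved when the short IDs were assigned.

```python
import re


def rename_newick_tips(newick: str, id_to_name: dict) -> str:
    """Replace tip labels in a Newick string using id_to_name.

    Assumes tip labels are unquoted and sit right after '(' or ','.
    Labels missing from id_to_name are left unchanged.
    """

    def quote(name: str) -> str:
        # Newick requires quoting for whitespace and structural characters;
        # an embedded single quote is escaped by doubling it.
        if re.search(r"[\s(),:;\[\]']", name):
            return "'" + name.replace("'", "''") + "'"
        return name

    def replace(match):
        label = match.group(2)
        new = id_to_name.get(label)
        return match.group(1) + (quote(new) if new is not None else label)

    # group 1: the '(' or ',' before a tip; group 2: the tip label itself
    return re.sub(r"([(,])([^(),:;\[\]'\s]+)", replace, newick)
```

Branch lengths after `:` and support values after `)` are never matched, so the tree topology and numbers are preserved.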

Hope that helps...

#3 PhilS


Posted 04 May 2010 - 08:50 PM

GREAT, thanks so much
Phil


©1999-2013 Protocol Online, All rights reserved.