
Protein sequence alignment using suffix tree - (Mar/08/2005)

I downloaded protein files in .faa format from GenBank, and I want to build a suffix tree from one of the protein files for sequence alignment.

But the file contains a series of description lines, each followed by a protein sequence. Do I have to remove the descriptions and combine all the sequences together? Or can I just build the tree from the file as it is? Or should I not combine the protein sequences at all, because they are not supposed to form one continuous sequence?

Thank you very much for your help!


I have been reading up on proteins. Proteins are translated from mRNA, which is made up of the exons of the DNA, so the proteins are not supposed to be contiguous with one another. Am I right to say that?

So if I use a protein file to construct a suffix tree for sequence alignment, can I leave all the >description lines intact?
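
For what it's worth, here is a minimal Python sketch of one common way to prepare such a file. It assumes the file is plain FASTA, where each record is a line starting with ">" followed by one or more sequence lines. The description lines are kept as labels but left out of the indexed text, and each sequence is ended with a "$" separator (an assumption: any character outside the amino acid alphabet would do) so that no match can run across two proteins. The function names and the example records are made up for illustration, and the suffix tree construction itself is not shown.

```python
from bisect import bisect_right

def parse_fasta(text):
    """Return a list of (description, sequence) pairs from FASTA-formatted text."""
    records = []
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new description line closes the previous record, if any.
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        elif header is not None:
            # Sequence lines of one record may be wrapped over several lines.
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records

def concatenate(records, sep="$"):
    """Join the sequences, ending each with `sep`.

    Returns the joined text and the start offset of each record in it.
    """
    starts, parts, pos = [], [], 0
    for _, seq in records:
        starts.append(pos)
        parts.append(seq + sep)
        pos += len(seq) + len(sep)
    return "".join(parts), starts

def locate(position, starts):
    """Map a position in the joined text to (record index, offset in that record)."""
    i = bisect_right(starts, position) - 1
    return i, position - starts[i]

# Hypothetical two-record input, not real GenBank data.
example = """>protein_A hypothetical example record
MKTAYIAK
QRQISFVK
>protein_B hypothetical example record
MKVLAAGI
"""

records = parse_fasta(example)
text, starts = concatenate(records)
hit = locate(text.find("VLAA"), starts)
print(records[hit[0]][0], hit[1])
```

The joined `text` is what would be handed to whatever suffix tree code is used, and `locate` turns a hit position back into the protein it came from. Generalized suffix tree implementations often use a distinct terminator per sequence instead of a single shared one; the single separator here is a simplification.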