Protein sequence alignment using a suffix tree - (Mar/08/2005)
I downloaded protein files in .faa format from GenBank, and I want to build a suffix tree from one of these protein files for sequence alignment.
But the file contains a series of descriptions, each followed by a protein sequence. Do I have to remove the descriptions and combine all the sequences together? Or can I just build the tree from the file as it is? Or can't I combine the protein sequences at all, because they are not supposed to form one continuous sequence?
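As a minimal sketch of the first option (strip the descriptions, then join the sequences), assuming the standard FASTA layout where each `>` line is followed by one or more sequence lines: the hypothetical helpers `parse_fasta` and `concatenate` below keep each header only as a label and join the sequences with a `$` separator. `$` is not an amino-acid letter, so an exact match for a protein query can never run across the boundary between two proteins. A generalized suffix tree usually gives each sequence its own unique terminator instead; a single separator is enough for exact query lookups.

```python
def parse_fasta(text):
    """Split FASTA text into (header, sequence) pairs.

    Lines starting with '>' are descriptions; the sequence lines that
    follow are joined until the next '>' line.
    """
    records = []
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def concatenate(records, sep="$"):
    """Join sequences with a separator that never occurs in amino-acid text,
    and remember where each protein starts so hits can be mapped back."""
    starts, parts, pos = [], [], 0
    for header, seq in records:
        starts.append((pos, header))
        parts.append(seq + sep)
        pos += len(seq) + len(sep)
    return "".join(parts), starts


# Made-up two-protein example, not real GenBank data.
example = """>prot1 hypothetical protein
MKTAYIAK
QRQ
>prot2 another protein
GSHMLE
"""
recs = parse_fasta(example)
text, starts = concatenate(recs)
print(text)    # MKTAYIAKQRQ$GSHMLE$
print(starts)  # [(0, 'prot1 hypothetical protein'), (12, 'prot2 another protein')]
```

The joined `text` is what would go into the suffix tree; `starts` lets a match position be translated back to the protein it came from.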
Thank you very much for your help!
I have been reading up on proteins. Proteins are coded from mRNA, which is made up of the exons of DNA, so proteins are not supposed to be contiguous. Am I right to say that?
So if I use a protein file to construct a suffix tree for sequence alignment, can I leave all the >description lines intact?
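Leaving the >description lines inside the indexed text would make their letters searchable as well, so a word in a header could be matched as if it were sequence. A minimal sketch of the alternative, keeping headers only as labels: it swaps in a naive suffix array (suffix start positions sorted by the suffix text) in place of a suffix tree, since both support the same exact-substring lookup. `build_index` and `find` are hypothetical names, and the naive sort is only practical for small inputs; for a whole proteome a linear-time suffix tree or suffix array construction would be the usual choice.

```python
def build_index(records, sep="$"):
    """Concatenate (header, sequence) records into one text and build a
    naive suffix array over it. Headers are kept only as labels."""
    parts, owner = [], []
    for header, seq in records:
        parts.append(seq + sep)
        # For every character position: (protein header, offset in protein).
        owner.extend((header, k) for k in range(len(seq) + len(sep)))
    text = "".join(parts)
    # Naive construction: sort start positions by the suffix itself.
    # Every suffix is built as a string, so memory grows quadratically.
    sa = sorted(range(len(text)), key=lambda i: text[i:])
    return text, sa, owner


def find(query, text, sa, owner):
    """Return sorted (header, offset) pairs for every exact occurrence of
    query. The query is assumed to contain amino-acid letters only."""
    m = len(query)
    lo, hi = 0, len(sa)
    # Binary search for the first suffix whose first m chars are >= query.
    while lo < hi:
        mid = (lo + hi) // 2
        if text[sa[mid]:sa[mid] + m] < query:
            lo = mid + 1
        else:
            hi = mid
    hits = []
    # All suffixes starting with query form one contiguous run from lo.
    while lo < len(sa) and text[sa[lo]:sa[lo] + m] == query:
        hits.append(owner[sa[lo]])
        lo += 1
    return sorted(hits)


# Made-up example records; in practice these would come from the .faa file.
records = [("prot1", "MKTAYIAKQRQ"), ("prot2", "GSHMLEAK")]
text, sa, owner = build_index(records)
print(find("AK", text, sa, owner))  # [('prot1', 6), ('prot2', 6)]
print(find("WW", text, sa, owner))  # []
```

Each hit is reported as the protein's header plus a 0-based offset within that protein, so the descriptions are still available without being part of the searchable text.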