Protein sequence alignment - (Jul/09/2008 )

Hi, is anyone familiar with interpreting alignment output? I am new to this and not sure whether my sequence alignment is good or poor. Which factors (Score / Expect / Identities / Positives / Gaps / E value) should be considered when judging whether an alignment is good?

Domo arigato.

-mes08-

The quickest solution is to go back to the software and look in the documentation for the fine details (quick, but not necessarily very informative...). Who suggested you do this alignment? I'd ask them for their interpretation.
Having said that, the alignment looks OK for the following reasons:
1. there are no gaps (so no insertions or deletions)
2. 31% identity
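3. 53% positives (identical or similar residues). Note that the positives count in BLAST output already includes the identities, so the overall similarity is 53%, not 31% + 53% = 84%. It still suggests that many charge and hydrophobic interactions are conserved, which is consistent with the proteins sharing a function.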
4. The fact that the same part of the two proteins was selected also helps. That is, you'd be less convinced if the sequences came from opposite ends of the two proteins.
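
For anyone who wants to check the numbers in points 1 to 3 by hand, here is a minimal sketch. It assumes a BLAST-style alignment block in which the middle line shows a letter for each identity, a "+" for each conservative substitution and a space for everything else. The function name and the three 8-residue rows are made up for illustration and are not the alignment discussed in this thread.

```python
def midline_stats(query: str, midline: str, subject: str) -> dict:
    """Count identities, positives and gaps in one BLAST-style alignment block.

    Assumes the usual midline convention: a letter marks an identity,
    '+' marks a conservative substitution, a space marks anything else.
    """
    if not (len(query) == len(midline) == len(subject)):
        raise ValueError("all three alignment rows must have the same length")
    length = len(midline)
    if length == 0:
        raise ValueError("alignment is empty")

    identities = sum(1 for ch in midline if ch.isalpha())
    # Positives include the identities; they are not a separate class.
    positives = identities + midline.count("+")
    # A column is a gap column if either sequence has '-' there.
    gaps = sum(1 for q, s in zip(query, subject) if q == "-" or s == "-")

    return {
        "length": length,
        "identities": identities,
        "positives": positives,
        "gaps": gaps,
        "identity_pct": round(100 * identities / length, 1),
        "positive_pct": round(100 * positives / length, 1),
    }


# Hypothetical toy block (not real data).
stats = midline_stats(
    query="PLDKGGAV",
    midline="PLD+GG V",
    subject="PLDRGGWV",
)
print(stats)
```

Because positives are counted as identities plus "+" columns, the positives percentage can never be lower than the identity percentage, and the two should not be added together.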

How much of your query protein did you input? If the two proteins are truly related, you might expect long stretches of similarity. If you only entered this sequence, it might simply represent a short motif or module that is common to a large number of otherwise-unrelated proteins.

-swanny-

Hi Swanny,

Thanks for your reply. The input length is just 179 aa.

-mes08-

As far as the protein sequence alignment is concerned, you should not only concentrate on the % similarity and gaps but also on the domains of the protein. In your case, it seems two regions are highly conserved: 1) PLD and 2) KGG. Check whether these two regions represent the active site of your protein. If so, you can interpret that your query protein has the same active site as your subject sequence and may have the same binding properties. Hope this information helps you.
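
A minimal sketch of the first step of that check, assuming you have both full sequences as plain strings: locate each conserved motif and compare where it falls in the two proteins. The helper name and the two short sequences are hypothetical; the real sequences are not given in this thread.

```python
def find_motif(sequence: str, motif: str) -> list:
    """Return the 1-based start positions of every occurrence of motif."""
    if not motif:
        raise ValueError("motif must not be empty")
    positions = []
    start = sequence.find(motif)
    while start != -1:
        positions.append(start + 1)  # report residue numbers, not 0-based indices
        start = sequence.find(motif, start + 1)
    return positions


# Hypothetical toy sequences (not real data).
query = "MSAPLDTTKGGLV"
subject = "MTPLDAAAKGGIV"

for motif in ("PLD", "KGG"):
    print(motif, "query:", find_motif(query, motif),
          "subject:", find_motif(subject, motif))
```

Finding the motif in both sequences only shows that it is present. Whether it corresponds to the active site still has to come from the annotation or literature for the subject protein.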

-Biorad-

I would need more evidence than that alignment before believing that the query protein might be a glycerate kinase: 14 identities over a 45 aa stretch in a presumably ~400 amino acid protein is not significant enough for me.
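
The arithmetic behind that can be sketched as follows, using the figures mentioned in this thread (14 identities, a 45 aa aligned stretch, a 179 aa query). The ~400 aa subject length is the presumption stated above, not a measured value, and the helper name is made up.

```python
def percent(part: float, whole: float) -> float:
    """Return part as a percentage of whole."""
    if whole <= 0:
        raise ValueError("whole must be positive")
    return 100.0 * part / whole


IDENTITIES = 14      # identical residues in the aligned stretch
ALIGNED_LEN = 45     # length of the aligned stretch (aa)
QUERY_LEN = 179      # query length reported by mes08 (aa)
SUBJECT_LEN = 400    # assumption: approximate subject length (aa)

print(f"identity within the aligned stretch: {percent(IDENTITIES, ALIGNED_LEN):.1f}%")
print(f"fraction of the query aligned:       {percent(ALIGNED_LEN, QUERY_LEN):.1f}%")
print(f"fraction of the subject aligned:     {percent(ALIGNED_LEN, SUBJECT_LEN):.1f}%")
```

The identity figure looks respectable on its own, but it only describes a small part of either protein, which is why coverage matters as much as percent identity.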

Look, for example, at how your subject sequence (gi|15897575|ref|NP_342180.1 Glycerate kinase, putative [Sulfolobus solfataricus P2]) lines up when BLASTed against GenBank (see here). The majority of alignments are pretty much full length...

-HomeBrew-