
Determining the N- and C-terminal regions in a protein - (Feb/10/2012 )

Dear all,

So, something's been bugging me all night. I'm doing a literature review on the polyester synthase and a group of related proteins. Coming to the molecular aspect, I encounter the terms "N-terminal / C-terminal regions" often.

How do we (or, in the near future, I) determine or decide where the N-terminal or C-terminal region begins and ends?

The first thought that came to mind was a multiple alignment with the same protein in other organisms. But that’s only useful for proteins with a conserved N- or C-terminal region. What's more, this particular group of proteins has a “non-conserved N-terminal region”, while the alpha/beta hydrolase fold domain is conserved in part of the C-terminal region (see below).


[Figure: the regions, based on the protein of the model organism (conserved region in blue)]

Until now, only the protein in the model organism has been well studied and annotated. So another way is to align the amino acid sequence from the model organism with those from other organisms. The position where the C-terminal region begins (aspartate 99) would then serve as a benchmark of sorts for the other query sequences. Still, I feel this isn’t very convincing.

Waiting for some feedback / suggestions / pointers regarding this.
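
To make the "benchmark" idea concrete, here is a minimal sketch of carrying a residue number from the model-organism sequence over to a query sequence through an existing pairwise alignment. The function name `map_position` and the toy sequences are made up for illustration; the gapped strings are assumed to come from whatever alignment tool you already use, and this is not a claim about how the published boundaries were derived.

```python
def map_position(aligned_ref, aligned_query, ref_pos):
    """Map a 1-based residue number in the reference to the query.

    Both arguments are rows of the same pairwise alignment, with '-'
    marking gaps. Returns the 1-based query residue number aligned to
    reference residue `ref_pos`, or None if that residue faces a gap.
    """
    if len(aligned_ref) != len(aligned_query):
        raise ValueError("aligned sequences must have the same length")

    ref_count = 0    # residues seen so far in the reference (gaps skipped)
    query_count = 0  # residues seen so far in the query (gaps skipped)
    for ref_char, query_char in zip(aligned_ref, aligned_query):
        if ref_char != "-":
            ref_count += 1
        if query_char != "-":
            query_count += 1
        if ref_char != "-" and ref_count == ref_pos:
            return query_count if query_char != "-" else None

    raise ValueError("ref_pos is beyond the end of the reference sequence")


# Toy alignment (hypothetical sequences, not the real synthase).
aligned_ref = "MKT--AYDGLLE"
aligned_query = "MRTPSAY-GLIE"

print(map_position(aligned_ref, aligned_query, 4))  # reference residue 4 (A)
print(map_position(aligned_ref, aligned_query, 6))  # reference residue 6 (D) faces a gap
```

With a real alignment, passing 99 as `ref_pos` would give the query residue that lines up with aspartate 99, or `None` if the query has a gap there, in which case the boundary cannot be transferred directly at that position.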

Thank you


The C- or N-terminal regions, if described, would usually correspond to a particular domain. You would have to get the info from crystal structures.

If you wish to compare it across species with little homology, you will need to figure out the structure of each of them to identify similarities to comment on.

Just my 2 cents


Thanks scolix. They've been trying for years to crystallize the protein for X-ray, but nobody has managed it yet, though many studies on the catalytic residues (in the C-terminal region) have been carried out, using lipase as the relative equivalent.

Still, I'd love to know if their highly variable N-terminal region "corresponds to a particular domain"...


N terminal and C terminal just refer to the amino (nitrogen-based) and acid (carbon-based) termini of the protein. Look at how peptides are formed chemically out of amino acids: acidic (COOH) ends bond to amine (NH2) ends, so that one end has a free amine (N terminus) and the other has a free acid (C terminus).

Nothing fancy...


Yes bob1, I know that those are the termini (with a single residue at the start / end), but apparently the authors are referring to N/C-terminal domains of different amino acid lengths: 1-93 in one, 1-104 in another, and 1-98 in the model organism (as mentioned above).

Hmm... I'm curious to know how the regions are defined.


That will be based on the length of the protein and the conserved regions/domains of the protein in different species. For instance, in one protein the domain stretches from amino acids 1-104, and in another the same stretch of amino-acid sequence runs from 1-93. Or it could be that the 93rd (or 104th, etc.) position is halfway through the protein in each case.
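
As a rough illustration of finding where a conserved region starts, the sketch below scores each column of a multiple alignment and reports the first run of well-conserved columns. Everything here is an assumption made for the example: the function names, the toy alignment, the simple "most common residue" score, and the 0.8 threshold and 3-column window. Published domain boundaries are normally based on more careful criteria than this.

```python
from collections import Counter


def column_conservation(msa):
    """Return, per alignment column, the fraction of sequences that share
    the most common residue. Gaps ('-') never count as a match."""
    length = len(msa[0])
    if any(len(seq) != length for seq in msa):
        raise ValueError("all aligned sequences must have the same length")

    scores = []
    for column in zip(*msa):
        residues = [c for c in column if c != "-"]
        if not residues:
            scores.append(0.0)
            continue
        top_count = Counter(residues).most_common(1)[0][1]
        scores.append(top_count / len(column))
    return scores


def first_conserved_column(scores, threshold=0.8, window=3):
    """Return the 1-based column where the first run of `window`
    consecutive columns scoring at least `threshold` begins, or None."""
    for start in range(len(scores) - window + 1):
        if all(s >= threshold for s in scores[start:start + window]):
            return start + 1
    return None


# Toy alignment (hypothetical): variable start, conserved "DGL" core.
msa = [
    "MKTAYDGLLE",
    "MRS-WDGLLE",
    "MPQGFDGLIE",
]

scores = column_conservation(msa)
print(first_conserved_column(scores))
```

The column number it reports is an alignment column, so it still has to be converted back to a residue number in each individual sequence, which is why the same boundary can land on residue 93 in one protein and 104 in another.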


I see. I'm getting it (I think). It's back to the conserved domains across the different species.