
Issues of EST in whole genome gene discovery (Jun/26/2005)

EST is short for expressed sequence tag: a partially sequenced cDNA fragment used to represent an expressed gene. ESTs are powerful for gene discovery on a genomic scale, but the technology has limitations. The obvious advantage of an EST is that it provides direct evidence that a gene is expressed. But there are several drawbacks I can think of:

1. mRNA abundance limitation: it is estimated that genes whose average expression level is below 5% of the mRNA population are less likely to be detected by EST technology. Genes expressed at extremely low levels may therefore not be detected by ESTs at all, so the 27K figure should in this sense be a lower estimate (if we focus only on this issue). Another technique, SAGE (serial analysis of gene expression), is more sensitive: short tags from the 3' end of transcripts are collected and sequenced to represent mRNA expression.

2. developmental stage limitation: some genes in our genome may be expressed only at a certain developmental stage. If the EST libraries cannot cover that stage, for technical or ethical reasons, there is no way these genes can be detected.

3. disease state limitation: along the same lines as #2, but from the opposite point of view, some mRNAs might be the result of a disease state rather than normally expressed genes, or might reflect aberrant expression that is not part of the normal behavior of the human genome. Hence, EST libraries enriched for cancer cell lines (for example) are likely to result in overestimation of the human transcriptome.

4. sequencing error: this could pose extreme difficulties for bioinformatics when mapping these ESTs back to the genome to identify genes, especially for paralogous genes.
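
To put rough numbers on point 1, here is a minimal sketch of the sampling argument. It assumes clones are picked at random from a non-normalized library, so each clone comes from a given transcript with probability equal to that transcript's share of the mRNA pool. The function names and the example fractions are only illustrative, not taken from any real library.

```python
import math

def detection_probability(fraction, n_clones):
    """Chance that at least one of n_clones randomly picked cDNA clones
    comes from a transcript making up `fraction` of the mRNA pool."""
    return 1.0 - (1.0 - fraction) ** n_clones

def clones_needed(fraction, target=0.95):
    """Smallest number of clones giving at least `target` detection probability.
    Assumes 0 < fraction < 1 and 0 < target < 1."""
    return math.ceil(math.log(1.0 - target) / math.log(1.0 - fraction))

# Illustrative abundance fractions (assumed values, not measured data)
for fraction in (1e-3, 1e-4, 1e-5, 1e-6):
    p = detection_probability(fraction, 10_000)
    print(f"fraction {fraction:.0e}: P(seen in 10,000 clones) = {p:.3f}, "
          f"clones for 95% = {clones_needed(fraction):,}")
```

Under this simple model the number of clones needed grows roughly as the inverse of the transcript's fraction, which is why rare transcripts are easy to miss unless the library is very deep or normalized.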

Finally, besides gene discovery, ESTs can be used to study post-transcriptional control on a genomic scale, which has proven very successful, including studies of alternative splicing and alternative polyadenylation. Pioneers of this work include Michael Zhang at CSHL, Chris Burge at MIT, and Chris Lee at UC.

Please add.


Good points.

There are some EST libraries built from disease samples such as prostate cancer. Looking back, I think the EST project was quite successful. SAGE has the problem of misassignment of tags and is thus less reliable than EST, but it is more sensitive than EST for detecting low-abundance transcripts. Since ESTs are single-pass cDNA sequences, coverage is limited, especially in the middle of transcripts. A group from Brazil invented a method of sequencing cDNA from the middle and has deposited some ESTs (called ORESTES, open reading frame expressed sequence tags) in the database.
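
The "limited in the middle" point can be illustrated with a small sketch. It assumes each clone gets one single-pass read from the 5' end and one from the 3' end, both of the same length; the function name and the lengths in the example are hypothetical.

```python
def uncovered_middle(cdna_length, read_length):
    """Return the (start, end) interval, 0-based and half-open, that is not
    reached by one read from each end of the cDNA. Returns None when the two
    reads meet or overlap, i.e. the whole clone is covered."""
    if 2 * read_length >= cdna_length:
        return None
    return (read_length, cdna_length - read_length)

# Example with assumed values: a 2,000 bp cDNA and 500 bp single-pass reads
gap = uncovered_middle(2000, 500)
print(gap)
```

For long transcripts the gap is most of the coding region, which is the part that ORESTES-style reads are meant to fill.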


I'll add some of my opinions on the limitations of ESTs.

Besides what the original poster mentioned, some more limitations are listed below:

1. tissue-specific limitation: some genes are expressed only in a specific tissue. So if no EST library was built from that tissue, it is impossible for these genes to be detected.

2. cell-type limitation: some genes are expressed only in a specific cell type. So the same thing will happen as in issue 1.

3. signal-specific limitation: this is a stricter requirement. Some genes are expressed only when induced by a specific signal, especially in the nervous system.


Great! More insight into analysis of ESTs.

BTW, here is the link to

"The use of Open Reading frame ESTs (ORESTES) for analysis of the honey bee transcriptome"