Is cDNA an ORF? - (Feb/10/2009 )

Hi all,

I have a bunch of cDNAs (100-500bp) from DDRT-PCR. But when I put them into ORF finder (NCBI), usually the program finds ORFs which are only a part (<half of cDNA length) of my original cDNA.

I then used about 5 kb of the DNA region in a supercontig database (because these cDNA fragments have no significant homology to any known sequence) to find ORFs that my cDNA may be sitting in.
But the program also finds small (<50 aa) ORFs that overlap my cDNA fragment, and none that actually CONTAIN my WHOLE cDNA as part of the ORF.

1. Are all cDNAs ORFs?
2. Why are all my cDNAs only partial ORFs? Shouldn't they be within a larger ORF or stand alone as an ORF themselves?

Thanks
Chris

-chrisbelle-

No, mRNAs often contain large untranslated regions (UTRs) which may also be amplified in DDRT-PCR.

Also, ORF search algorithms often have an option to find only ORFs that start with an "ATG" start codon. In DDRT-PCR you are amplifying internal fragments that usually will not contain a start codon.

Finally, if your maximum fragment length is 500 bp then the maximum length of your translated protein sequence is 166aa - not very long.
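
To make the "ATG only" point concrete, here is a minimal sketch in plain Python (not NCBI ORF finder's actual algorithm; the function names and the toy fragment are made up for illustration). It reports the longest stop-free run of codons over all six frames, with and without requiring an ATG to open the run.

```python
STOPS = {"TAA", "TAG", "TGA"}

def revcomp(seq):
    """Reverse complement of an unambiguous DNA string."""
    return seq.upper().translate(str.maketrans("ACGT", "TGCA"))[::-1]

def longest_open_stretch(seq, require_atg):
    """Longest stop-free run (in codons) over all six reading frames.

    With require_atg=True a run only starts counting at an ATG,
    which mimics the 'ATG only' option of ORF-finding tools.
    """
    best = 0
    for strand in (seq.upper(), revcomp(seq)):
        for frame in range(3):
            run = 0
            counting = not require_atg
            for i in range(frame, len(strand) - 2, 3):
                codon = strand[i:i + 3]
                if codon in STOPS:
                    # A stop codon closes the current run.
                    run = 0
                    counting = not require_atg
                    continue
                if codon == "ATG":
                    counting = True
                if counting:
                    run += 1
                    best = max(best, run)
    return best

# Hypothetical internal fragment: five alanine codons, no ATG anywhere.
fragment = "GCTGCTGCTGCTGCT"
without_atg = longest_open_stretch(fragment, require_atg=False)
with_atg = longest_open_stretch(fragment, require_atg=True)

# Upper bound on protein length from a 500 bp fragment.
max_aa = 500 // 3
```

An internal coding fragment like this one is open across its whole length, yet an ATG-only search reports nothing for it.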

Cheers

MP

-microphobe-

In reply to chrisbelle (Feb 10 2009, 03:22 AM):

1. Your cDNA fragments are small, so, as microphobe pointed out, they may not contain the start codon.

2. You enter any DNA sequence and it can give you a coding sequence of some size, but for a full ORF, you need start and stop codons in the sequence.

3. Just because a coding sequence is predicted does not mean it is a real protein. It may just be a nonsense protein arising from the wrong reading frame of the cDNA, or from internal start codons that may not be used in vivo because they are far from the TSS. You need to match any predicted ORF against known protein sequences.

4. Finally, ORF-prediction algorithms run on a genomic contig are not foolproof; they miss some.
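
One quick sanity check along these lines, as a rough sketch only (plain Python, hypothetical function names, toy sequence): list which of the six reading frames of a fragment contain no in-frame stop codon. A fragment lying wholly inside a coding region should have at least one such frame; if none of the six is stop-free, the fragment probably includes UTR or other non-coding sequence.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}

def reverse_complement(seq):
    """Reverse complement of an unambiguous DNA string."""
    return seq.upper().translate(str.maketrans("ACGT", "TGCA"))[::-1]

def stop_free_frames(seq):
    """Return (strand, frame) pairs whose codons include no stop codon."""
    result = []
    strands = (("+", seq.upper()), ("-", reverse_complement(seq)))
    for strand_name, strand in strands:
        for frame in range(3):
            codons = [strand[i:i + 3]
                      for i in range(frame, len(strand) - 2, 3)]
            if not any(codon in STOP_CODONS for codon in codons):
                result.append((strand_name, frame))
    return result

# Toy example: frame 0 on the + strand reads GCT GCT TAA GCT (has a stop).
frames = stop_free_frames("GCTGCTTAAGCT")
```

A stop-free frame is only a candidate, not evidence of a real protein; short sequences are often stop-free in several frames by chance.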

-cellcounter-

Hi,
thanks cellcounter and mp.

Can either of you suggest what I can do in this situation:

I have a DDRT cDNA sequence (let's call it A), minus the DDRT primers. Say it is 400 bp.

This sequence has no significant similarity in BLASTn (all non-redundant) or BLASTp to anything from my organism.

However, there is a 72/76 identity match to organism X's 60S ribosomal protein L40 in BLASTn, and a 19/19 identity match to several organisms' ubiquitin fusion protein, ribosomal protein and hypothetical protein.

There is the supercontig database for my organism, so I BLASTn my sequence there. There is a 100% similarity to a part of a supercontig, so I confirm the sequence is indeed part of my organism.

1. Is there any way I can deduce the possible identity and function of my sequence from the information that I have?
2. Can I submit my sequence as a putative sequence if I don't have the ORF identity?
3. What should I do next if I want to publish my DDRT results? I have already validated the expression of some of them; the problem is how to address those cDNAs that do not have a known identity.

Please save my life. :blink: Thanks
Chris.

-chrisbelle-

In reply to chrisbelle (Feb 11 2009, 02:20 AM):

I suggest the following:

Experimentally:

1. Using the sequence that you have, do 5' and 3' RACE to get a full-length cDNA. (You can cautiously extend your sequence using some of the neighbouring genomic sequence.)

2. Use the sequence to screen a cDNA library for your organism (and the specific tissue you used for DDRT-PCR).

Bioinformatically:

1. Take incrementally bigger pieces of the supercontig (5, 10, 20, 50, 100 kb) and do ORF/gene prediction on each. See if your sequence comes up in any of them. Sometimes a 2 kb ORF may come from a 300 kb genomic region, so you just have to go through this grind.

2. Use any of the amino acid sequences you get to search protein motif prediction databases, to see if there is a functional motif hidden there.

HTH
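
The window-widening step could be scripted roughly as below. This is only a sketch under loud assumptions: the function names, the `min_codons` cutoff and the flank sizes (used here to approximate the 5 to 100 kb windows) are all made up for illustration. It scans only the forward strand for simple ATG-to-stop ORFs and knows nothing about introns, so a spliced gene spread over a large genomic region still needs a proper gene predictor.

```python
STOP = {"TAA", "TAG", "TGA"}

def forward_orfs(seq, min_codons=30):
    """Yield (start, end) of ATG-to-stop ORFs on the forward strand.

    Coordinates are 0-based, end exclusive, and include the stop codon.
    """
    seq = seq.upper()
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and codon in STOP:
                if (i + 3 - start) // 3 >= min_codons:
                    yield start, i + 3
                start = None

def orfs_covering_hit(contig, hit_start, hit_end,
                      flanks=(5000, 10000, 20000, 50000, 100000),
                      min_codons=30):
    """For each flank size, list ORFs that contain the whole BLAST hit.

    hit_start/hit_end are the 0-based contig coordinates of the cDNA hit.
    """
    report = {}
    for flank in flanks:
        lo = max(0, hit_start - flank)
        hi = min(len(contig), hit_end + flank)
        window = contig[lo:hi]
        report[flank] = [
            (s + lo, e + lo)
            for s, e in forward_orfs(window, min_codons)
            if s + lo <= hit_start and hit_end <= e + lo
        ]
    return report

# Toy contig: one ORF at positions 2..38, hypothetical hit at 8..20.
toy_contig = "CC" + "ATG" + "GCT" * 10 + "TAA" + "CC"
toy_report = orfs_covering_hit(toy_contig, 8, 20,
                               flanks=(3, 100), min_codons=5)
```

In the toy case the small window cuts off the start codon and finds nothing, while the larger window recovers the ORF that contains the hit, which is the reason for widening the region step by step.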

-cellcounter-