Top : Forum Archives: : Bioinformatics and Biostatistics

gDNA, mRNA and cDNA sequences of a gene? - (May/29/2005 )

Where and how to get the gDNA, mRNA and cDNA sequences of a gene? Any search tools?



Thanks!

-seasons-

QUOTE (seasons @ May 29 2005, 10:23 AM)
Where and how to get the gDNA, mRNA and cDNA sequences of a gene? Any search tools?


Thanks!


My first suggestion is to use NCBI GENE, http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?db=gene

Enter your favorite gene and click Search to get the matching gene records. For example, enter FOS and you will get a list of FOS genes from different organisms. Find the human one and click on its link. Then scroll down to find the genomic and mRNA sequences for the gene.

For your question: use the genomic sequences in the gene record as gDNA, NM_005252 (RefSeq) as mRNA, and the other mRNA sequences as cDNA sequences.

If you want literature information, scroll up to find the gene structure and literature reports.

Note that for the mRNA sequence you should probably use RefSeq, which has been curated by NCBI staff. The other mRNAs listed are GenBank sequences, most of them from cDNA. If you want more cDNA/EST sequences, search dbEST for a complete list of those related to YFG (your favorite gene).

Good luck!

My second suggestion is to check out http://www.ensembl.org/; that resource is integrated in much the same way.
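
If you would rather script the lookup than click through the pages, NCBI's E-utilities `efetch` endpoint can return a nucleotide record as FASTA or GenBank text. The following is only a minimal sketch: the helper names (`efetch_url`, `fetch_record`, `read_fasta`) are made up for this illustration, and the download needs network access, so it is defined but not called here.

```python
from urllib.parse import urlencode
from urllib.request import urlopen

# Base URL of the NCBI E-utilities efetch service.
EFETCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"


def efetch_url(accession, rettype="fasta"):
    """Build an efetch URL for a nucleotide accession.

    rettype="fasta" asks for the plain sequence, rettype="gb" for the
    full GenBank flat-file record.
    """
    params = {
        "db": "nuccore",
        "id": accession,
        "rettype": rettype,
        "retmode": "text",
    }
    return EFETCH + "?" + urlencode(params)


def fetch_record(accession, rettype="fasta"):
    """Download the record as text (needs network access)."""
    with urlopen(efetch_url(accession, rettype)) as response:
        return response.read().decode("utf-8")


def read_fasta(text):
    """Return (header, sequence) for the first record in FASTA text."""
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                break  # a second record starts here; stop
            header = line[1:]
        elif header is not None:
            chunks.append(line.upper())
    return header, "".join(chunks)


# Example (not run here because it contacts NCBI):
#   header, mrna = read_fasta(fetch_record("NM_005252"))
```

An accession without a version suffix returns the current version of the record, so the sequence you get today may differ from what the 2005 page showed.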

-cyberpostdoc-

Thanks for your great answer!

I still have some questions regarding these search tools. Let us take for example the FOS gene for the human.
I came to this site:
http://www.ncbi.nlm.nih.gov/entrez/query.f...&list_uids=2353

I am now overwhelmed by all the information and sequences on this page. There are also a lot of so-called mRNA and genomic sequences, and I don't know which of them is the mRNA, gDNA or cDNA for the FOS gene.

My questions:

1: Which link should i now click on to get the mRNA of the fos gene?

2. Which link should i now click on to get the cDNA of the fos gene?

3. Which link should i now click on to get the gDNA of the fos gene?

4. The sequence under "translation" is that the amino acid of a gene?

5. What kind of sequence (mRNA, gDNA or cDNA) is under "CDS" of a gene?



Thanks again!

-seasons-

http://www.ncbi.nlm.nih.gov/entrez/query.f...&list_uids=2353

1: Which link should i now click on to get the mRNA of the fos gene?

I believe you need the RefSeq mRNA of the gene. There are two ways to get it from the fos gene page:
First, use the gene structure map, the picture at the very beginning of the fos gene entry that shows the intron, exon, and 5'/3' UTR structure of the gene. On the left-hand side of the picture there is an NM_005252 link. Click on it and a JavaScript popup lists two entries: FASTA and GenBank. FASTA gives you the mRNA sequence; GenBank gives you the GenBank record of the mRNA.
Second, use the links in the "NCBI Reference Sequences (RefSeq)" section: click on NM_005252 and you get the same information.

2. Which link should i now click on to get the cDNA of the fos gene?

First, keep in mind that there is more than one cDNA sequence for this gene. The "Related Sequences" section of the record lists "genomic" and "mRNA" sequences; those "mRNA" sequences are actually cDNA sequences. For a more complete view of all cDNA resources for a gene, follow its UniGene link in the last section, "Additional Links": click on UniGene Hs_25647, and the "mRNA sequences (8)" section lists all the cDNAs associated with this gene, for example the full-length cDNA clone CS0DI066YO13 of Placenta Cot 25-normalized of Homo sapiens (human). This resource is more comprehensive because it provides tissue and disease information.

3. Which link should i now click on to get the gDNA of the fos gene?
In this case it should be the "Genomic V01512" link under the "Related Sequences" section. All genomic sequences listed in that section are gDNA, but if you examine each record carefully you will find that most of them are partial, whereas "Genomic V01512" is complete. A trick for locating the complete gDNA CDS is to look in the "NCBI Reference Sequences (RefSeq)" section, where you can see that RefSeq NM_005252 is built from source sequence V01512, which means that V01512 is probably the best-covered sequence of this gene.

One thing you need to keep in mind is that a GenBank record may come from an individual submission rather than from the whole-genome sequencing project. It might therefore be specific to the tissue type or disease state of that sample (read all the information in the record carefully), and it might contain SNPs compared with the genomic sequence of the gene built from the whole-genome sequencing project. To get the genomic sequence of a gene in the whole-genome context, you do things differently: use the annotations on an NC or NT sequence to retrieve the gene sequence. There is much more to say on that.


4. The sequence under "translation" is that the amino acid of a gene?

I didn't find "translation" on the gene page, so I guess you opened one of the mRNA GenBank records, perhaps the NM_005252 GenBank record. There, yes, "translation" gives you the amino acid sequence.
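
To make the link between a CDS and its "translation" concrete, here is a small sketch that translates a coding sequence with the standard genetic code. It is an illustration only, not how NCBI produces the field; the function name `translate` and the toy sequences are made up, and unknown codons are shown as "X".

```python
from itertools import product

# Standard genetic code, codons ordered with T, C, A, G at each position.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO)
}


def translate(cds):
    """Translate a coding sequence (DNA or RNA letters) up to the first stop."""
    cds = cds.upper().replace("U", "T")
    protein = []
    # Walk the sequence codon by codon, ignoring a trailing partial codon.
    for i in range(0, len(cds) - len(cds) % 3, 3):
        aa = CODON_TABLE.get(cds[i:i + 3], "X")
        if aa == "*":  # stop codon: the protein ends here
            break
        protein.append(aa)
    return "".join(protein)


# Toy example: ATG (Met), GCC (Ala), TAA (stop)
print(translate("ATGGCCTAA"))
```

Running it on the CDS of a real record should reproduce that record's "translation" field, provided the CDS uses the standard code and starts at the annotated start codon.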

5. What kind of sequence (mRNA, gDNA or cDNA) is under "CDS" of a gene?

This is a biology question rather than a bioinformatics question. From a biological point of view, the coding sequence is the portion of a gene that corresponds to amino acids, and since it is part of a gene sequence it should in principle be a DNA sequence. Bioinformatically, however, the CDS, as the joined coding parts of the exons, can only be identified with the help of mRNA and cDNA. Sequence-wise, therefore, the CDS is the same as the mRNA or cDNA from start codon to stop codon, but CDS is actually an annotation term that must be combined with information on the intron and exon structure. You can find this in the GenBank record of V01512.

To read more about CDS click here: http://www.ncbi.nlm.nih.gov/Sitemap/samplerecord.html#CDSB

To read more about genbank record click here: http://www.ncbi.nlm.nih.gov/Sitemap/samplerecord.html

-cyberpostdoc-

Firstly, thank you a lot for taking the time to help me. I am grateful for that!

QUOTE (cyberpostdoc @ Jun 2 2005, 01:16 AM)
1: Which link should i now click on to get the mRNA of the fos gene?

I believe you need the RefSeq mRNA of the gene. There are two ways to get it from the fos gene page:
First, use the gene structure map, the picture at the very beginning of the fos gene entry that shows the intron, exon, and 5'/3' UTR structure of the gene. On the left-hand side of the picture there is an NM_005252 link. Click on it and a JavaScript popup lists two entries: FASTA and GenBank. FASTA gives you the mRNA sequence; GenBank gives you the GenBank record of the mRNA.
Second, use the links in the "NCBI Reference Sequences (RefSeq)" section: click on NM_005252 and you get the same information.


When I look for the mRNA of the human pituitary fos gene, do I need a more advanced search, or does the pituitary fos gene have the same mRNA sequence as the fos gene in other parts of the body, whether intestine or pituitary?


QUOTE (cyberpostdoc @ Jun 2 2005, 01:16 AM)
2. Which link should i now click on to get the cDNA of the fos gene?

First, keep in mind that there is more than one cDNA sequence for this gene. The "Related Sequences" section of the record lists "genomic" and "mRNA" sequences; those "mRNA" sequences are actually cDNA sequences. For a more complete view of all cDNA resources for a gene, follow its UniGene link in the last section, "Additional Links": click on UniGene Hs_25647, and the "mRNA sequences (8)" section lists all the cDNAs associated with this gene, for example the full-length cDNA clone CS0DI066YO13 of Placenta Cot 25-normalized of Homo sapiens (human). This resource is more comprehensive because it provides tissue and disease information.


Why is there more than one cDNA sequence for this gene? For example, I want to use the cDNA of the human pituitary Fos gene to design primers. Which of these 8 cDNA sequences can I use as the template?


QUOTE (cyberpostdoc @ Jun 2 2005, 01:16 AM)
3. Which link should i now click on to get the gDNA of the fos gene?
In this case it should be the "Genomic V01512" link under the "Related Sequences" section. All genomic sequences listed in that section are gDNA, but if you examine each record carefully you will find that most of them are partial, whereas "Genomic V01512" is complete. A trick for locating the complete gDNA CDS is to look in the "NCBI Reference Sequences (RefSeq)" section, where you can see that RefSeq NM_005252 is built from source sequence V01512, which means that V01512 is probably the best-covered sequence of this gene.


So the CDS of the gDNA is the part that the mRNA or cDNA comes from? Can I use BLAST to check which part of the gDNA the mRNA comes from? For example, I am going to find the exon-intron junctions of the pituitary fos gene, and for this I have to compare the gDNA and cDNA sequences. Do you know which of these gDNAs is the gDNA of the pituitary fos gene? Or does the gDNA of the human pituitary fos gene have the same sequence as in any other part of the body?

QUOTE (cyberpostdoc @ Jun 2 2005, 01:16 AM)
One thing you need to keep in mind is that the genbank record could come from an individual submission rather than from the whole genome sequence project.

How can I see this? Are there any special labels for this?


QUOTE (cyberpostdoc @ Jun 2 2005, 01:16 AM)
To get the genomic sequence of a gene in the whole-genome context, you do things differently: use the annotations on an NC or NT sequence to retrieve the gene sequence. There is much more to say on that.

I would like to know, since I have use for it. Would you mind telling me how to get the genomic sequence of the human fos gene, or of another gene, in the whole-genome sequencing context?


QUOTE (cyberpostdoc @ Jun 2 2005, 01:16 AM)
5. What kind of sequence (mRNA, gDNA or cDNA)  is under "CDS" of a gene?

This is a biology question rather than a bioinformatics question. From a biological point of view, the coding sequence is the portion of a gene that corresponds to amino acids, and since it is part of a gene sequence it should in principle be a DNA sequence. Bioinformatically, however, the CDS, as the joined coding parts of the exons, can only be identified with the help of mRNA and cDNA. Sequence-wise, therefore, the CDS is the same as the mRNA or cDNA from start codon to stop codon, but CDS is actually an annotation term that must be combined with information on the intron and exon structure. You can find this in the GenBank record of V01512.


I looked at the CDS in the GenBank record of V01512: http://www.ncbi.nlm.nih.gov/entrez/viewer....nBank&val=29903

CDS: join(289..429,1183..1434,1866..1973,2088..2729)
Are the ranges after "join" the exon sequences that are joined to make the mature mRNA?



Hope for replies!

Thank you a lot!

-seasons-

QUOTE
When I look for the mRNA of the human pituitary fos gene, do I need a more advanced search, or does the pituitary fos gene have the same mRNA sequence as the fos gene in other parts of the body, whether intestine or pituitary?
Every human cell shares the same genomic sequence, so the gene sequence is the same in every part of the body. However, tissue-specific alternative splicing, alternative polyadenylation, and alternative transcription initiation are known mechanisms that contribute to the diversity of mRNAs from the same gene. So the answer is no: you cannot assume that the pituitary fos mRNA has the same sequence as fos mRNA from other parts of the human body. You will need to read the literature, and the gene record we discussed before for its literature links and descriptions, to get to know your gene better.

This applies to all biologists in general: you must be an expert on whatever you are studying, inside out, in every detail. Biologists will always debate with computer scientists and mathematicians, because in biology you cannot assume that YFG has just one form of mRNA, nor can you define a gene's behavior in advance. You have to keep an open mind, form hypotheses, and do experiments or read other people's work to find out.

QUOTE
Why is there more than one cDNA sequence for this gene? For example, I want to use the cDNA of the human pituitary Fos gene to design primers. Which of these 8 cDNA sequences can I use as the template?

As I said before, in eukaryotes, alternative splicing, alternative polyadenylation, and alternative transcription initiation are known mechanisms that contribute to the diversity of mRNAs from the same gene. Different groups might study the same gene in different tissues or disease states, and their cloned cDNAs might therefore have different sequences if there is tissue- or disease-specific regulation.

How you design primers depends on your goal. If you know your mRNA/cDNA isoform, target the exon that is specific to it. If you don't, and want to fish for all isoforms, design primers that target sequence common to all isoforms. That being said, here is what I would do:
1. Find out whether the gene shows evidence of alternative splicing or alternative polyadenylation. You can check off-the-shelf databases such as ASD (alternative splicing database: http://www.ebi.ac.uk/asd/) and PolyA_DB (polyadenylation database: http://polya.umdnj.edu/). Or you can get all cDNAs of the gene, align them to the genomic sequence (using BLAST), and check for alternative splicing or alternative polyadenylation. Then identify the constitutive exons and the alternative exons.

2. Design primers according to your goal, targeting either constitutive exons or alternative exons.
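
As a rough sketch of the last part of step 1: once each cDNA has been aligned to the genomic sequence and you have its exon coordinates, separating constitutive from alternative exons is a set operation. The isoform names and coordinates below are invented for illustration (they are not FOS data), and the sketch assumes exons match exactly by coordinates, which ignores alternative 5'/3' splice sites that shift an exon boundary.

```python
def classify_exons(isoforms):
    """Split exons into constitutive (in every isoform) and alternative.

    `isoforms` maps an isoform name to a list of (start, end) exon
    coordinates on the genomic sequence.
    """
    exon_sets = [set(exons) for exons in isoforms.values()]
    constitutive = set.intersection(*exon_sets)
    alternative = set.union(*exon_sets) - constitutive
    return sorted(constitutive), sorted(alternative)


# Hypothetical exon coordinates from three aligned cDNAs.
isoforms = {
    "isoform_a": [(100, 250), (400, 520), (700, 910)],
    "isoform_b": [(100, 250), (700, 910)],
    "isoform_c": [(100, 250), (400, 520), (600, 650), (700, 910)],
}

constitutive, alternative = classify_exons(isoforms)
print("constitutive:", constitutive)  # targets that catch all isoforms
print("alternative:", alternative)    # targets for isoform-specific primers
```

Primers placed in the constitutive exons should amplify every isoform in this toy set; primers in an alternative exon pick out only the isoforms that contain it.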

Once again, this is YFG; you have to know every detail about it.

QUOTE
So the CDS of the gDNA is the part that the mRNA or cDNA comes from? Can I use BLAST to check which part of the gDNA the mRNA comes from? For example, I am going to find the exon-intron junctions of the pituitary fos gene, and for this I have to compare the gDNA and cDNA sequences. Do you know which of these gDNAs is the gDNA of the pituitary fos gene? Or does the gDNA of the human pituitary fos gene have the same sequence as in any other part of the body?
This should be clear by now with the above two answers.


QUOTE (cyberpostdoc @ Jun 2 2005, 01:16 AM)
One thing you need to keep in mind is that the genbank record could come from an individual submission rather than from the whole genome sequence project.
How can I see this? Are there any special labels for this?

No, you just need to read the GenBank ID carefully to see whether the record is part of the whole-genome sequencing effort or, say, a BAC clone. You will need to read a bioinformatics book for background on the human genome sequencing project.
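
One partial shortcut, sketched below, is the accession format itself: RefSeq accessions carry a two-letter prefix and an underscore (NC_, NT_, NM_ and so on), while accessions without that prefix, such as V01512, are GenBank/EMBL/DDBJ submissions. This only separates RefSeq from submitted records; it does not tell you whether a submitted record is a BAC clone from the genome project, so you still have to read the record. The function name and the wording of the descriptions are my own.

```python
import re

# Common RefSeq accession prefixes and the kind of molecule they denote.
REFSEQ_PREFIXES = {
    "NC": "complete genomic molecule (e.g. a chromosome)",
    "NG": "genomic region",
    "NT": "genomic contig or scaffold",
    "NW": "genomic contig or scaffold (mostly whole-genome shotgun)",
    "NM": "mRNA",
    "NR": "non-coding RNA",
    "NP": "protein",
    "XM": "predicted mRNA",
    "XR": "predicted non-coding RNA",
    "XP": "predicted protein",
}

# Two letters, underscore, digits, optional ".version" suffix.
REFSEQ_PATTERN = re.compile(r"([A-Z]{2})_\d+(?:\.\d+)?")


def describe_accession(accession):
    """Say whether an accession looks like RefSeq, and of which kind."""
    match = REFSEQ_PATTERN.fullmatch(accession.strip().upper())
    if match:
        kind = REFSEQ_PREFIXES.get(match.group(1), "other RefSeq category")
        return "RefSeq: " + kind
    return "not RefSeq format (likely a submitted GenBank/EMBL/DDBJ record)"


for acc in ("NC_000014", "NM_005252", "V01512"):
    print(acc, "->", describe_accession(acc))
```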

QUOTE
I would like to know, since I have use for it. Would you mind telling me how to get the genomic sequence of the human fos gene, or of another gene, in the whole-genome sequencing context?
From the fos gene page, look at the gene picture at the top. Click on the NC_000014 link, which opens the genomic sequence of the gene in a new page. It is a GenBank-format record; it tells you that it displays REGION: 74815284..74818666 of chromosome 14, which is where the gene is located, and the sequence is at the end. You can get the sequence in FASTA format from the drop-down menu at the top of the page.
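
The same region can also be requested from a script. The sketch below builds an E-utilities `efetch` URL with the `seq_start` and `seq_stop` parameters; the helper names are invented for this example and nothing is downloaded. The coordinates are the ones shown in the record at the time of this thread. They belong to that version of NC_000014, so treat them as an assumption and take the current coordinates from the gene page before using them.

```python
from urllib.parse import urlencode

# Base URL of the NCBI E-utilities efetch service.
EFETCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"


def region_url(accession, start, stop, rettype="fasta"):
    """Build an efetch URL for a 1-based, inclusive region of a sequence."""
    params = {
        "db": "nuccore",
        "id": accession,
        "seq_start": start,
        "seq_stop": stop,
        "rettype": rettype,
        "retmode": "text",
    }
    return EFETCH + "?" + urlencode(params)


def region_length(start, stop):
    """Number of bases in an inclusive region."""
    return stop - start + 1


# REGION from the record discussed above (coordinates as of 2005).
START, STOP = 74815284, 74818666
print(region_url("NC_000014", START, STOP))
print(region_length(START, STOP), "bases")
```

The region is 3383 bases long, which covers the whole gene including introns; opening the printed URL in a browser returns that stretch of the chromosome record as FASTA.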

QUOTE
I looked at the CDS in the GenBank record of V01512: http://www.ncbi.nlm.nih.gov/entrez/viewer....nBank&val=29903

CDS: join(289..429,1183..1434,1866..1973,2088..2729)
Are the ranges after "join" the exon sequences that are joined to make the mature mRNA?


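Yes! You are right! See, the GenBank record is straightforward! Strictly speaking, the ranges are the coding parts of the exons; the mature mRNA also includes the untranslated regions (UTRs) at both ends.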
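
To see what the join line encodes, here is a small sketch that parses it into segments, lists the introns lying between them, and splices a sequence accordingly. The function names are my own, the parser only handles a simple forward-strand join like this one (no complement() or partial "<"/">" markers), and the splice is demonstrated on a toy string rather than on the real V01512 sequence.

```python
import re


def parse_join(location):
    """Return 1-based inclusive (start, end) pairs from a simple join(...)."""
    return [(int(a), int(b)) for a, b in re.findall(r"(\d+)\.\.(\d+)", location)]


def introns_between(segments):
    """Intervals lying between consecutive segments."""
    return [
        (prev_end + 1, next_start - 1)
        for (_, prev_end), (next_start, _) in zip(segments, segments[1:])
    ]


def splice(genomic, segments):
    """Concatenate the segments of a genomic sequence (1-based coordinates)."""
    return "".join(genomic[start - 1:end] for start, end in segments)


FOS_CDS = "join(289..429,1183..1434,1866..1973,2088..2729)"
segments = parse_join(FOS_CDS)
lengths = [end - start + 1 for start, end in segments]

print("segment lengths:", lengths)
print("CDS length:", sum(lengths), "bases =", sum(lengths) // 3, "codons")
print("introns:", introns_between(segments))

# Toy demonstration of splicing: keep bases 1-3 and 7-9.
print(splice("AAACCCGGGTTT", [(1, 3), (7, 9)]))
```

The four segments add up to 1143 bases, that is 381 codons including the stop codon. The gaps between them (430..1182, 1435..1865, 1974..2087) are the introns, which gives the exon-intron junctions within the coding region.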

QUOTE
Hope for replies!
Thank you a lot!

You are welcome! Glad I can help!

-cyberpostdoc-