Need help with basic bioinformatics (Jan/11/2012)

Hey,

Just a little background on myself: I'm a new grad student in a microbiology lab. I've begun to realize that bioinformatics is really important, and I admit I'm atrocious at it. I'd like a starting point to improve my knowledge. Mainly, I want to be able to look up DNA and protein sequences, and to compare two sequences (DNA or protein) that I have looked up. Sadly, I don't even know how to do this properly. If anyone can give me some step-by-step instructions, that would be great.

Thanks

-Ryan Ho-

To find the sequences: try a site like NCBI Nucleotide, http://www.ncbi.nlm.nih.gov/nuccore
Type in the name of the gene you want and it will find it. For example, I'll try to find human TRIM5...
That gives me a list of results, which takes me to this page: http://www.ncbi.nlm....ene?term=(trim5)%20AND%20(Homo%20sapiens)%20AND%20alive%20NOT%20newentry&sort=weight
There is loads of information here. To find the sequence information, click "Reference sequences" in the menu at the top right.
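If you ever need to run the same kind of search from a script, NCBI's E-utilities web service accepts the same query text. Below is a minimal sketch using only the Python standard library; the function names are my own (hypothetical), while the esearch.fcgi endpoint and its db, term, retmax and retmode parameters are, as far as I know, part of NCBI's documented E-utilities interface.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Base URL of NCBI's E-utilities web service.
EUTILS = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/"


def esearch_url(term: str, db: str = "gene", retmax: int = 20) -> str:
    """Build an ESearch URL that returns matching record IDs as JSON."""
    params = {"db": db, "term": term, "retmax": retmax, "retmode": "json"}
    return EUTILS + "esearch.fcgi?" + urlencode(params)


def esearch_ids(term: str, db: str = "gene", retmax: int = 20) -> list:
    """Run the search over the network and return the list of ID strings."""
    with urlopen(esearch_url(term, db, retmax)) as resp:
        data = json.load(resp)
    return data["esearchresult"]["idlist"]


# Usage (needs internet access), reusing the TRIM5 query from the search page:
#   ids = esearch_ids("(trim5) AND (Homo sapiens)")
#   print(ids)  # a list of NCBI Gene ID strings
```

Building the URL separately from the network call makes it easy to paste the URL into a browser and check what the search returns before automating anything.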

Click the relevant link under mRNA and protein(s); for TRIM5, for example, I click NM_033034.2. You should be taken to a page with all the sequence information, with the mRNA sequence at the bottom. To see just the coding sequence, click CDS (one of the many little blue feature headings on the left, alongside STS, exon and so on). This highlights the coding sequence (CDS) within the mRNA, and you can also see the protein sequence here.
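To check that the highlighted CDS really gives the protein shown on the page, you can translate it yourself. This is a small sketch, assuming the standard genetic code (NCBI translation table 1) and a CDS that starts exactly at its first codon (normally ATG); the function name `translate` is just an illustrative choice, not an NCBI tool.

```python
# Standard genetic code (NCBI translation table 1). Codons are ordered so
# that each of the three positions cycles through T, C, A, G.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate(cds: str, to_stop: bool = True) -> str:
    """Translate a coding sequence into one-letter amino acid codes.

    Whitespace and line breaks (as in pasted FASTA) are ignored, U is
    treated as T, and codons with ambiguous bases become 'X'. With
    to_stop=True, translation ends at the first stop codon ('*').
    """
    seq = "".join(cds.split()).upper().replace("U", "T")
    protein = []
    # Step through complete codons only; a trailing partial codon is ignored.
    for pos in range(0, len(seq) - len(seq) % 3, 3):
        aa = CODON_TABLE.get(seq[pos:pos + 3], "X")
        if aa == "*" and to_stop:
            break
        protein.append(aa)
    return "".join(protein)


# Usage: translate("ATGGCTTAA") returns "MA" (ATG = Met, GCT = Ala, TAA = stop).
```

If your translation doesn't match the protein on the record, the usual suspect is copying the whole mRNA (including the untranslated region before the start codon) rather than just the CDS.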

To compare: use CLUSTAL at http://www.ebi.ac.uk...w2/toolform.ebi
Your sequences need to be in FASTA format. You can get the FASTA version of the mRNA just by clicking "FASTA" at the very top of the nucleotide page. Alternatively, to get just the coding sequence in FASTA, click CDS (as before); at the bottom right you should see a small "Display: FASTA" link, and you can copy it from there.
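If you end up needing many records, the same FASTA files can be downloaded with E-utilities instead of clicking through pages. A minimal sketch, standard library only: the function names are hypothetical, while the efetch.fcgi endpoint and the nuccore rettype values `fasta` (whole record), `fasta_cds_na` (coding region as nucleotides) and `fasta_cds_aa` (translated protein) are, to my knowledge, part of NCBI's documented interface. NCBI asks users without an API key to stay under three requests per second.

```python
from urllib.parse import urlencode
from urllib.request import urlopen

EFETCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"

# rettype options for db=nuccore used here:
#   "fasta"        -> whole record (for an NM_ accession, the full mRNA)
#   "fasta_cds_na" -> coding region(s) only, as nucleotides
#   "fasta_cds_aa" -> coding region(s) translated to protein
RETTYPES = {"fasta", "fasta_cds_na", "fasta_cds_aa"}


def efetch_url(accession: str, rettype: str = "fasta") -> str:
    """Build an EFetch URL for a nucleotide accession such as NM_033034.2."""
    if rettype not in RETTYPES:
        raise ValueError(f"unsupported rettype: {rettype}")
    params = {"db": "nuccore", "id": accession, "rettype": rettype, "retmode": "text"}
    return EFETCH + "?" + urlencode(params)


def fetch_fasta(accession: str, rettype: str = "fasta") -> str:
    """Download the FASTA text over the network (needs internet access)."""
    with urlopen(efetch_url(accession, rettype)) as resp:
        return resp.read().decode("utf-8")


# Usage:
#   mrna = fetch_fasta("NM_033034.2")                  # full mRNA
#   cds = fetch_fasta("NM_033034.2", "fasta_cds_na")   # coding sequence only
```

The returned text already starts with a `>` title line, so it can be pasted straight into the alignment form.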

Type in the sequences like so:
> SEQUENCE TITLE 1
AAGTCA...etc. sequence
> SEQUENCE TITLE 2
TAGTCA...etc. sequence

Make sure each title line starts with > and that you have set the tool to compare DNA. If you want to compare protein sequences, enter protein sequences instead and select Protein.
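To get a feel for what "comparing" two sequences involves, here is a toy sketch that reads FASTA text like the example above and performs a pairwise global alignment (the Needleman-Wunsch algorithm) with made-up scores: +1 for a match, -1 for a mismatch, -2 per gap. This is not what CLUSTAL does internally (CLUSTAL builds progressive multiple alignments using proper substitution matrices and gap penalties), so treat it purely as an illustration. All function names here are hypothetical.

```python
from typing import Dict, List, Optional, Tuple


def parse_fasta(text: str) -> Dict[str, str]:
    """Parse FASTA text into {title: sequence}; the title is the text after '>'."""
    records: Dict[str, str] = {}
    title: Optional[str] = None
    chunks: List[str] = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if title is not None:
                records[title] = "".join(chunks)
            title, chunks = line[1:].strip(), []
        elif title is not None:  # sequence lines before any title are ignored
            chunks.append(line.upper())
    if title is not None:
        records[title] = "".join(chunks)
    return records


def global_align(a: str, b: str, match: int = 1, mismatch: int = -1,
                 gap: int = -2) -> Tuple[str, str, int]:
    """Needleman-Wunsch global alignment; returns (aligned_a, aligned_b, score)."""
    def pair(x: str, y: str) -> int:
        return match if x == y else mismatch

    n, m = len(a), len(b)
    # score[i][j] = best score for aligning a[:i] with b[:j]
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(score[i - 1][j - 1] + pair(a[i - 1], b[j - 1]),
                              score[i - 1][j] + gap,
                              score[i][j - 1] + gap)
    # Trace back from the bottom-right corner to rebuild one best alignment.
    top: List[str] = []
    bottom: List[str] = []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + pair(a[i - 1], b[j - 1]):
            top.append(a[i - 1]); bottom.append(b[j - 1]); i -= 1; j -= 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            top.append(a[i - 1]); bottom.append("-"); i -= 1
        else:
            top.append("-"); bottom.append(b[j - 1]); j -= 1
    return "".join(reversed(top)), "".join(reversed(bottom)), score[n][m]


def percent_identity(top: str, bottom: str) -> float:
    """Identical, non-gap columns as a percentage of alignment length."""
    matches = sum(1 for x, y in zip(top, bottom) if x == y and x != "-")
    return 100.0 * matches / len(top) if top else 0.0


if __name__ == "__main__":
    example = "> SEQUENCE TITLE 1\nAAGTCA\n> SEQUENCE TITLE 2\nTAGTCA\n"
    (_, seq1), (_, seq2) = parse_fasta(example).items()
    top, bottom, s = global_align(seq1, seq2)
    print(top)
    print(bottom)
    print(f"score={s}, identity={percent_identity(top, bottom):.1f}%")
```

Run on the two short example sequences, it prints the sequences aligned without gaps followed by `score=4, identity=83.3%`. The scoring table grows with the product of the two lengths, so this toy is only practical for short sequences; the web tools are the way to go for real genes.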

I was struggling with this only a few weeks ago, so I hope that helps! If you want more complex or further advice, though, somebody else will have to give you that, as I'm effectively a novice too, apart from knowing how to do this bit!
Best of luck.

EDIT: I'd like to say that you have to be a genius to get your right and left the wrong way around... but I'm not. Fixed!

-seaholme-