
clustering: how to - Help on Perl scripting to apply clustering to gene characteristics (Sep/12/2006)

Dear Friends.

I have a text file with more than 14,000 lines, each containing a gene's RefSeq ID followed by its associated Gene Ontology (GO) IDs.
The format is the following: refseq<tab><GOid1><tab><GOid2><tab>...

I need to cluster the genes by their GOid similarities and build hierarchical trees.
Could someone give me some help? I am using Perl.

Thanks a lot for taking a look.
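
The file layout described above maps naturally onto one set of GO IDs per gene, and a set-overlap measure such as the Jaccard distance is one common way to turn "GO ID similarity" into a number. The following is only a minimal sketch, written in Python rather than Perl to keep it short; the helper names `parse_go_table` and `jaccard_distance` and the sample rows are made up for illustration and are not from the actual data.

```python
# Sketch only: parse_go_table and jaccard_distance are hypothetical helper names.
def parse_go_table(lines):
    """Map each RefSeq ID to the set of GO IDs listed on its line."""
    genes = {}
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if not fields or not fields[0]:
            continue  # skip blank lines
        refseq, go_ids = fields[0], fields[1:]
        genes[refseq] = {g for g in go_ids if g}  # drop empty trailing fields
    return genes


def jaccard_distance(a, b):
    """1 - |a & b| / |a | b|; two empty sets are treated as distance 0."""
    union = a | b
    if not union:
        return 0.0
    return 1.0 - len(a & b) / len(union)


# Made-up example rows in the refseq<tab>GOid<tab>GOid... layout.
sample = [
    "NM_000001\tGO:0005515\tGO:0006355\n",
    "NM_000002\tGO:0005515\tGO:0006355\tGO:0003677\n",
    "NM_000003\tGO:0016020\n",
]
genes = parse_go_table(sample)
print(jaccard_distance(genes["NM_000001"], genes["NM_000002"]))
```

With a real file, the list `sample` would be replaced by an open file handle. Note that this treats GO IDs as plain labels and ignores the parent/child structure of the ontology, which a more careful similarity measure would take into account.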




Can you post a sample of the file we're trying to parse, and what you'd expect the output to look like?


Hi, this may be a stupid question, but why do you cluster on a database ID? I thought most database IDs were automatically generated and thus carried absolutely no significance. Having said that, I am not all that familiar with GO IDs.

Surely clustering is all about the sequence data!!!


pmsalves -- BTW, you might also want to check what we did here to see if any of it is applicable or adaptable to what it is you're trying to do...


There is a lot of software for this kind of job, such as DAVID and SeqExpress. You can find more on GO's home page.
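
For anyone who would rather script it than use an existing tool, here is a rough sketch of the hierarchical tree the original question asks for. It is an assumption-laden illustration, not a tested pipeline: it is in Python rather than Perl, it uses Jaccard distance between GO ID sets with average linkage (one choice among several), and the function name `average_linkage` and the gene and GO labels are placeholders.

```python
from itertools import combinations


def jaccard_distance(a, b):
    """Distance between two sets of GO IDs (0 = identical, 1 = disjoint)."""
    union = a | b
    return 1.0 - len(a & b) / len(union) if union else 0.0


def average_linkage(genes):
    """Naive agglomerative clustering; returns a nested-tuple tree.

    genes: dict mapping gene ID -> set of GO IDs (at least one gene).
    Each merge is recorded as (left_subtree, right_subtree, distance).
    """
    # Each cluster is (tree, member_ids); start with one cluster per gene.
    clusters = [(name, [name]) for name in sorted(genes)]
    while len(clusters) > 1:
        best = None
        for i, j in combinations(range(len(clusters)), 2):
            # Average pairwise distance between members of the two clusters.
            pairs = [(x, y) for x in clusters[i][1] for y in clusters[j][1]]
            d = sum(jaccard_distance(genes[x], genes[y]) for x, y in pairs) / len(pairs)
            if best is None or d < best[0]:
                best = (d, i, j)
        d, i, j = best
        merged = ((clusters[i][0], clusters[j][0], round(d, 3)),
                  clusters[i][1] + clusters[j][1])
        # Remove j first (j > i) so that index i stays valid.
        clusters.pop(j)
        clusters.pop(i)
        clusters.append(merged)
    return clusters[0][0]


# Placeholder genes and GO labels, for illustration only.
genes = {
    "geneA": {"GO:1", "GO:2"},
    "geneB": {"GO:1", "GO:2", "GO:3"},
    "geneC": {"GO:9"},
}
tree = average_linkage(genes)
print(tree)
```

This naive loop recomputes every pairwise distance on each merge, so it is only practical for small inputs. For a file with more than 14,000 genes, a library routine such as SciPy's `scipy.cluster.hierarchy.linkage`, fed a precomputed distance matrix, would be a more realistic choice.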