
Clustering how-to: help with Perl scripting to apply clustering to gene characteristics (Sep/12/2006)

Dear Friends.

I have a text file with more than 14,000 lines, each containing a gene RefSeq ID and its related Gene Ontology (GO) IDs.
The format is the following: refseq<tab><GOid1><tab><GOid2><tab>...

I need to cluster the genes by their GOid similarities and build hierarchical trees.
Could someone give me some help? I am using Perl.

Thanks a lot for taking a look.

Best

Pedro

-pmsalves-
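
For readers following along, here is a minimal sketch of one way the task described above could be approached. It is written in Python rather than Perl, and the same logic should carry over to Perl hashes. It assumes the tab-separated layout given in the post (RefSeq ID first, then GO IDs), uses Jaccard distance between GO ID sets as the similarity measure, and merges clusters by naive average linkage. The distance measure, the linkage rule, the function names, and the sample IDs are all illustrative assumptions, not something the original poster specified.

```python
from itertools import combinations


def parse_go_lines(lines):
    """Map each RefSeq ID to the set of GO IDs listed on its line."""
    genes = {}
    for line in lines:
        fields = [f.strip() for f in line.rstrip("\n").split("\t")]
        if not fields or not fields[0]:
            continue  # skip blank lines
        genes[fields[0]] = {f for f in fields[1:] if f}
    return genes


def jaccard_distance(a, b):
    """1 - |a & b| / |a | b|; two empty sets are treated as fully distant."""
    union = a | b
    if not union:
        return 1.0
    return 1.0 - len(a & b) / len(union)


def cluster(genes):
    """Naive average-linkage agglomerative clustering.

    Returns a nested tuple tree whose leaves are RefSeq IDs,
    or None when there are no genes.
    """
    # Precompute all pairwise distances between genes.
    dist = {
        frozenset((a, b)): jaccard_distance(genes[a], genes[b])
        for a, b in combinations(genes, 2)
    }

    def linkage(m1, m2):
        # Average distance over every cross-cluster pair of genes.
        total = sum(dist[frozenset((x, y))] for x in m1 for y in m2)
        return total / (len(m1) * len(m2))

    # Each cluster is (tree, list of member IDs); start with one per gene.
    clusters = [(name, [name]) for name in sorted(genes)]
    while len(clusters) > 1:
        # Pick the closest pair of clusters and merge them.
        i, j = min(
            combinations(range(len(clusters)), 2),
            key=lambda p: linkage(clusters[p[0]][1], clusters[p[1]][1]),
        )
        (t1, m1), (t2, m2) = clusters[i], clusters[j]
        merged = ((t1, t2), m1 + m2)
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters.append(merged)
    return clusters[0][0] if clusters else None


if __name__ == "__main__":
    # Made-up sample lines in the described format (not real annotations).
    sample = [
        "NM_000001\tGO:0000001\tGO:0000002",
        "NM_000002\tGO:0000001\tGO:0000002\tGO:0000003",
        "NM_000003\tGO:0000009",
    ]
    print(cluster(parse_go_lines(sample)))
```

This naive version is only practical for small inputs. With 14,000 genes there are roughly 98 million gene pairs, so a real run would likely need a dedicated clustering library or a pre-filtering step.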

Can you post a sample of the file we're trying to parse, and what you'd expect the output to look like?

-HomeBrew-

Hi, this may be a stupid question, but why do you cluster on a database ID? I thought most database IDs were automatically generated and thus carried absolutely no significance. Having said that, I am not all that familiar with GO IDs.

Surely clustering is all about the sequence data!!!

-perlmunky-

pmsalves -- BTW, you might also want to check what we did here to see if any of it is applicable or adaptable to what it is you're trying to do...

-HomeBrew-

There are many software tools that do this kind of job, such as DAVID and SeqExpress. You can find more on the GO home page.

-gtj-