Gene set similarity searches in public databases - (May/15/2011 )
I am a newbie at microarray data analysis. I have managed to generate several lists of differentially expressed genes and am now trying to make sense of them. I have used several online GO tools to try to relate the gene lists to diseases or tissue types. These tools are very good at returning information about individual genes, but not about the gene set as a whole. Maybe I am not doing the right thing or not looking in the right place. My goal is a very simple one: to find the closest match to my gene lists among the published gene lists out there. I hope some of you know how to do this and can give me a few pointers.
If I understand you correctly, some of your genes did not get matched by the GO analysis programs, right?
On what platform was your microarray done, and what type of ID (Entrez Gene ID, Ensembl ID, gene symbol, etc.) did you use as input for the GO analysis? Where did you get them? What organism are you working on?
No, my problem is not that some of the genes fail to get matched. The problem is that the results are too general. Maybe it is clearer if I illustrate it with an example. Let's say I am given a biopsy sample taken from a tumorous growth on the skin of a patient. I do a microarray analysis of the biopsy sample and now have a list of genes overexpressed in this biopsy versus normal skin. Is there a way I can interrogate this list of genes against data deposited in public databases? With typical GO analysis, the kind of result I get is, e.g., gene X is associated with glioma, gene Y is associated with Alzheimer's, etc. Although that provides information about single genes, it doesn't tell me what the whole gene signature resembles. Perhaps someone out there has already done a microarray study on a similar kind of tumor and has deposited the data in a database. I want to be able to fish out that data and compare my data with it. The kind of result I hope to see is something like "82% of your genes match those found in dataset ID #######". Is this possible?
OK, I understand you now. If you want to compare your data with other similar, published microarray studies, you can use oncomine.org, which requires registration and is free (with limited access).
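If you have already downloaded some published gene lists, the "82% of your genes match dataset X" style of score asked about above can be sketched by hand. This is only a rough illustration in Python: the function name `overlap_report`, the dataset names, and the gene lists are all made up for the example, and it assumes both lists use the same ID type (gene symbols here).

```python
def overlap_report(query, reference_sets):
    """Rank reference gene sets by how much of the query list they contain.

    query: iterable of gene symbols from your own experiment.
    reference_sets: dict mapping a dataset name to an iterable of gene symbols.
    Returns a list of (name, n_shared, percent_of_query, jaccard) tuples,
    sorted by percent_of_query, highest first.
    """
    query = {g.upper() for g in query}  # normalise case before comparing
    rows = []
    for name, genes in reference_sets.items():
        ref = {g.upper() for g in genes}
        shared = query & ref
        union = query | ref
        pct_of_query = 100.0 * len(shared) / len(query) if query else 0.0
        jaccard = len(shared) / len(union) if union else 0.0
        rows.append((name, len(shared), pct_of_query, jaccard))
    rows.sort(key=lambda r: r[2], reverse=True)
    return rows


# Hypothetical lists, for illustration only (not real study results).
my_genes = ["TP53", "MYC", "KRT5", "EGFR", "CDKN2A"]
published = {
    "dataset_A": ["TP53", "MYC", "KRT5", "EGFR", "BRAF"],
    "dataset_B": ["GFAP", "OLIG2", "EGFR"],
}

for name, n_shared, pct, jac in overlap_report(my_genes, published):
    print(f"{name}: {n_shared} shared genes, {pct:.0f}% of query, Jaccard {jac:.2f}")
```

A raw percentage like this ignores list sizes and chance overlap, so treat it as a first-pass ranking rather than a statistical result.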
For GO analysis, if you found 100 genes overexpressed in your cancer samples and you want to know what functional categories those genes represent, you don't need additional datasets as references. You must already have control data, such as data from normal tissues, because that is how you know the 100 genes are overexpressed. When you apply GO analysis, the software takes these numbers from your data: the total number of genes on the array, the number of overexpressed and downregulated genes, the total number of human genes in the GO databases, and the number of genes in each GO category. If, for example, 10 out of 20k total human genes (0.05%) are in the anti-apoptosis category, and 5 out of your 100 overexpressed genes (5%) are from that category, then anti-apoptotic genes are overrepresented among your overexpressed genes (5% versus 0.05%, a 100-fold enrichment). This is what you will get from GO analysis.
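To put a number on the anti-apoptosis example, here is a minimal sketch of the kind of overrepresentation test GO tools commonly run, assuming a one-sided hypergeometric test. Individual programs may use a different statistic or add corrections, so this is not a description of any particular tool. The function name `hypergeom_enrichment` is made up for the example; the counts are the ones from the post.

```python
from math import comb


def hypergeom_enrichment(N, K, n, k):
    """One-sided hypergeometric overrepresentation test.

    N: total genes in the background (e.g. all annotated genes).
    K: background genes that belong to the category.
    n: genes in your list (e.g. overexpressed genes).
    k: genes in your list that belong to the category.
    Returns (fold_enrichment, p_value), where p_value is the probability of
    seeing k or more category genes in a random list of n genes.
    """
    upper = min(K, n)
    total = comb(N, n)  # number of ways to draw any n genes from N
    # Sum the exact probabilities of drawing i = k, k+1, ..., upper category genes.
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    p_value = tail / total
    fold = (k / n) / (K / N)
    return fold, p_value


# Numbers from the example: 10 anti-apoptosis genes among 20,000 genes,
# and 5 of them among 100 overexpressed genes.
fold, p = hypergeom_enrichment(N=20000, K=10, n=100, k=5)
print(f"fold enrichment: {fold:.0f}x, p-value: {p:.2e}")
```

The choice of `N` matters: using all human genes rather than only the genes actually measured on the array changes both the fold enrichment and the p-value.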
Sounds like the problem is that I didn't do my GO analysis the right way. Do I always have to input the "background" gene list when I do the GO analysis, e.g. the whole Illumina HT-12 probe list in my case? I use DAVID, but it doesn't make it compulsory to input the background list. Can you please recommend some GO programs that you think are good?