How to get suitable data from public microarray raw data - bioinformatics (Apr/18/2005)

Dear all,

May I ask for help from anyone who knows how to "pre-process" data downloaded from public microarray experiments?

I am doing research on mathematical tools applied to the study of genetic regulatory systems. For simulation purposes, I need gene expression microarray data for a specific system, such as human breast tumors. I have already downloaded the raw data from a public dataset on the Stanford University website. The problem is that the downloaded raw data contain quite a few value columns based on different measurements, such as
Log(base2) of R/G Normalized Ratio (Mean),
Log(base2) of R/G Normalized Ratio (Median),
G/R Normalized (Mean),
Channel 2 Normalized (Mean Intensity / Median Background Intensity), etc.
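Several of these columns carry the same information on different scales. As a hedged illustration (the function names and values below are made up for this sketch), the log2 of an R/G ratio and the log2 of the corresponding G/R ratio differ only in sign, so a G/R column can be converted to log2(R/G). On the log2 scale, up- and down-regulation become symmetric: a ratio of 2 maps to +1 and a ratio of 0.5 maps to -1.

```python
import math


def log2_ratio_from_rg(rg_ratio: float) -> float:
    """Return log2 of an R/G ratio (hypothetical helper)."""
    return math.log2(rg_ratio)


def log2_rg_from_gr(gr_ratio: float) -> float:
    """Convert a G/R ratio to log2(R/G), using log2(R/G) = -log2(G/R)."""
    return -math.log2(gr_ratio)


# Made-up example values, not taken from any real dataset.
rg = 4.0        # red channel four times brighter than green
gr = 1.0 / rg   # the same spot expressed as G/R

print(log2_ratio_from_rg(rg))   # 2.0
print(log2_rg_from_gr(gr))      # 2.0
```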

Since I have no experience with microarray experiments, I have no idea which of these values to choose as my final dataset for simulation.
Or do I need to "pre-process" the raw data to get the "right" result? Oh, poor me.

I sincerely hope that those of you who have experience using public raw data can share it with me. Any suggestions and comments would be much appreciated!

Many thanks!!


You should do some background reading on two-channel microarray experiments; a good starting point is

Also read some of the background on normalization techniques.

The basic idea is that each gene is measured in a control condition (normal breast tissue) and in a treated condition (tumor tissue). The ratio of the two measurements is taken, and a gene is generally considered of interest if it is up-regulated at least 2-fold (ratio of 2 or more) or down-regulated to a ratio below 0.5.
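The 2-fold rule of thumb can be sketched as a filter on log2 ratios, where "at least 2-fold up or down" becomes |log2 ratio| >= 1. This is a minimal sketch under stated assumptions: the gene names and ratios are hypothetical, the input is assumed to be tumor/control ratios, the cutoffs are treated as inclusive on both sides, and missing or non-positive values are simply skipped.

```python
import math
from typing import Dict


def two_fold_genes(ratios: Dict[str, float], fold: float = 2.0) -> Dict[str, str]:
    """Label genes whose ratio is >= fold ("up") or <= 1/fold ("down").

    Works on the log2 scale: |log2(ratio)| >= log2(fold).
    """
    threshold = math.log2(fold)
    calls = {}
    for gene, ratio in ratios.items():
        if ratio <= 0 or math.isnan(ratio):
            continue  # skip missing or invalid measurements
        lr = math.log2(ratio)
        if lr >= threshold:
            calls[gene] = "up"
        elif lr <= -threshold:
            calls[gene] = "down"
    return calls


# Hypothetical gene names and tumor/control ratios, for illustration only.
example = {"GENE_A": 3.1, "GENE_B": 0.4, "GENE_C": 1.2, "GENE_D": 0.25}
print(two_fold_genes(example))
# {'GENE_A': 'up', 'GENE_B': 'down', 'GENE_D': 'down'}
```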
There are several ways to normalize the data to account for artifacts and other physical and technical problems with two-channel microarrays; for example, you have to account for local hybridization problems. Explaining this properly would take several pages, even in a superficial manner.
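As one illustrative example, the sketch below uses global median centering, one of the simplest normalization schemes. It is not necessarily what was applied to the downloaded data, and columns already labeled "Normalized" may have had some normalization applied. The scheme assumes most genes on an array are unchanged, so the median log2 ratio should be 0, and subtracting the array median removes an overall dye or intensity bias. It does not correct local hybridization artifacts; spatial or intensity-dependent methods (for example, loess-based normalization) are needed for those. The input values are hypothetical.

```python
import statistics
from typing import List


def median_center(log_ratios: List[float]) -> List[float]:
    """Global median normalization for one array.

    Shifts all log2 ratios so that their median becomes 0.
    NaN entries are ignored when computing the median and stay NaN.
    Raises statistics.StatisticsError if every value is NaN.
    """
    valid = [x for x in log_ratios if x == x]  # NaN != NaN, so this drops NaN
    center = statistics.median(valid)
    return [x - center for x in log_ratios]


# Hypothetical log2(R/G) values from one array with a dye bias of about +0.5.
raw = [0.5, 0.4, 0.6, 2.5, -1.5]
print(median_center(raw))
```

After centering, the median of the returned values is 0, so a gene's log2 ratio is measured relative to the typical gene on the same array.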

This is a good starting place:

This has a lot of links that may be useful: