Statistical analysis of methylation - (May/23/2005 )
To those of you who have been helping me out all along.... guess what! I now have a data set. It's exciting. I wondered if anyone could give me some pointers on how they have analyzed their methylation data to determine significance? i.e. two different samples with slightly different methylation patterns... what is a significant difference between the two and what is noise? Computer programs? Simple chi-squared tests? I'll start combing the literature too, but any help would be appreciated.
Thanks in advance!
That is great news labtechie! Congratulations!!!
Well, statistically speaking there is no real standard measure of significance. Most bisulfite sequencing papers publish the patterns of 10 or more clones for each sample/treatment, and you should be able to see the difference in methylation patterns between the two visually.
There are ways of rearranging your clones so you have the most highly methylated ones at the top and work your way down to the less methylated ones.
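One way to do that rearranging, as a minimal sketch: it assumes each clone has been scored as a string with one character per mapped CpG site ("1" for methylated, "0" for unmethylated). That encoding and the clone data are made up for illustration.

```python
# Each clone is a string over the mapped CpG sites:
# "1" = methylated, "0" = unmethylated (hypothetical example data).
clones = ["0100", "1111", "0000", "1101", "0110"]


def methylated_count(clone):
    """Number of methylated CpG sites in one clone."""
    return clone.count("1")


# Most methylated clones first. sorted() is stable, so clones with the
# same count keep their original relative order.
ordered = sorted(clones, key=methylated_count, reverse=True)

for clone in ordered:
    # "#" marks a methylated site, "-" an unmethylated one.
    print("".join("#" if site == "1" else "-" for site in clone))
```

The printed rows can be used as a guide for ordering the clones in a figure.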
I have consulted a statistician, and they basically said that if there is no precedent for calculating statistical significance, then there is no real need to do it!
Glad to hear that.
As Nick said, there is no standard way to quantitate and present methylation data. Here are some methods I know of or have used, depending on the mapping technique.
If you have done cloning, you could obtain the following data:
1) If you focus on individual CpG sites, you can get the percentage of clones showing methylation at a particular site for each sample and then apply a chi-square test. You can then say at which CpG sites a treatment causes significant methylation (or demethylation) compared to untreated.
2) If you are interested in overall methylation changes, you can get an average level of methylation by dividing the number of methylated sites by the total number of mapped CpG sites across all ten clones, and then apply a t-test or ANOVA.
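The two cloning-based calculations above could be sketched roughly as follows. This is an illustration, not a validated analysis: the clone strings ("1" = methylated, "0" = unmethylated) and the function names are made up, and only the standard library is used. For a 2x2 table the chi-square statistic has 1 degree of freedom, so the p-value can be computed as erfc(sqrt(chi2 / 2)). The t-test/ANOVA step is not shown; only the average methylation level is computed.

```python
import math


def chi_square_2x2(a, b, c, d):
    """Pearson chi-square test (no continuity correction) for the table

                  methylated  unmethylated
        untreated     a            b
        treated       c            d

    Returns (chi2, p). With 1 degree of freedom, p = erfc(sqrt(chi2 / 2)).
    """
    n = a + b + c + d
    row1, row2, col1, col2 = a + b, c + d, a + c, b + d
    if 0 in (row1, row2, col1, col2):
        raise ValueError("a row or column total is zero; test is undefined")
    chi2 = n * (a * d - b * c) ** 2 / (row1 * row2 * col1 * col2)
    return chi2, math.erfc(math.sqrt(chi2 / 2))


def site_counts(clones, site):
    """(methylated, unmethylated) clone counts at one CpG site (0-based)."""
    methylated = sum(1 for clone in clones if clone[site] == "1")
    return methylated, len(clones) - methylated


def overall_level(clones):
    """Methylated sites divided by all mapped CpG sites across all clones."""
    total_sites = sum(len(clone) for clone in clones)
    return sum(clone.count("1") for clone in clones) / total_sites


# Hypothetical data: ten clones per sample, four mapped CpG sites each.
untreated = ["1110"] * 8 + ["0100"] * 2
treated = ["1000"] * 2 + ["0100"] * 8

# 1) Per-site comparison at the first CpG site.
a, b = site_counts(untreated, 0)
c, d = site_counts(treated, 0)
chi2, p = chi_square_2x2(a, b, c, d)
print(f"site 1: chi2 = {chi2:.2f}, p = {p:.4f}")

# 2) Overall methylation level per sample.
print(f"untreated: {overall_level(untreated):.2f}")
print(f"treated:   {overall_level(treated):.2f}")
```

With only ten clones per sample the expected counts in each cell are small, so the chi-square p-value is only an approximation; Fisher's exact test is commonly preferred for tables this small.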
If you sequenced directly without cloning:
1) Qualitative data: yes or no. For all CpG sites mapped, if any CpG is methylated, the sample is counted as methylated. This gives you a percentage of methylated samples for a group of samples, and you can then use a chi-square test.
2) Quantitative data: you can measure the level of methylation (the height of the C peak relative to the T peak; if there is only a C peak, the methylation level is 100%), or the prevalence of methylation (the percentage of methylated CpG sites across the mapped area).
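The quantitative measures above could be computed along these lines. This sketch assumes that "C peak relative to T peak" means C / (C + T), which gives 100% when there is only a C peak. The peak heights, the function names, and the `threshold` cutoff for calling a site methylated are all hypothetical.

```python
def methylation_level(c_peak, t_peak):
    """Percent methylation at one CpG site from peak heights: C / (C + T)."""
    total = c_peak + t_peak
    if total <= 0:
        raise ValueError("no C or T signal at this position")
    return 100.0 * c_peak / total


def prevalence(levels, threshold=0.0):
    """Percent of mapped CpG sites whose level exceeds `threshold`.

    `threshold` is an assumed cutoff for calling a site methylated; the
    default counts any detectable C peak.
    """
    if not levels:
        raise ValueError("no CpG sites were mapped")
    return 100.0 * sum(1 for level in levels if level > threshold) / len(levels)


# Hypothetical (C peak, T peak) heights for four mapped CpG sites.
peaks = [(300, 0), (150, 150), (0, 400), (100, 300)]

levels = [methylation_level(c, t) for c, t in peaks]
print(levels)              # per-site methylation level in percent
print(prevalence(levels))  # percent of sites called methylated
```

In practice a nonzero `threshold` may be needed, since small C peaks can come from background signal or incomplete bisulfite conversion.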
I am not a biostatistician and just want to show how many ways there are to present and calculate methylation data. I am not sure whether these methods are statistically sound.