# ANOVA analysis and normal distribution of data

### #1

Posted 13 April 2009 - 07:25 AM

I read some published papers dealing with similar data. They did not even check for normality before running the ANOVA.

Now I am confused. Can ANOVA be done without checking whether the data are normally distributed? If the data are not normally distributed, what could go wrong with the conclusions? I tried transformations but did not find a good one, and I have not seen any paper transform this kind of data. Are there any other statistical tests I could use?

Thanks for any help!

### #2

Posted 13 April 2009 - 09:01 PM

### #3

Posted 14 April 2009 - 03:28 AM

In my data, there are actually four groups according to the female and male genotypes: WT-WT, WT-knockout, knockout-WT, knockout-knockout. I should check that each group is normally distributed, shouldn't I? What if one group has only a few data points (e.g. 4)? And what if there is an extreme value in the data? Should I discard it without a biological reason?

One more stupid question about the ANOVA analyses in publications: are the authors assumed to have checked normality and variance even though they don't mention it in the paper?

For the probability levels from a parametric test to be valid, the data (strictly, the residuals within each group) should come from a normal distribution. If they do not, you have to use a nonparametric test such as the Mann-Whitney U test (the nonparametric version of the unpaired two-group t test), the Wilcoxon signed-rank test (paired t test), or the Kruskal-Wallis test (the nonparametric equivalent of a one-way ANOVA).
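As a rough illustration of the three tests named above, here is a minimal sketch using SciPy. All the numbers are made up for the example and the variable names are hypothetical; nothing here comes from the original poster's data.

```python
from scipy import stats

# Hypothetical measurements (illustrative values only, not real data).
group_a = [4.1, 5.3, 3.8, 6.0, 4.7, 5.1]
group_b = [6.2, 7.1, 5.9, 6.8, 7.4, 6.5]
group_c = [5.0, 5.6, 4.9, 6.1, 5.4, 5.8]

# Two unpaired groups: Mann-Whitney U in place of the unpaired t test.
u_stat, u_p = stats.mannwhitneyu(group_a, group_b, alternative="two-sided")

# Paired measurements: Wilcoxon signed-rank in place of the paired t test.
before = [10.2, 9.8, 11.5, 10.9, 9.4, 10.1, 11.0, 10.6]
after = [11.0, 10.9, 12.1, 11.2, 10.6, 10.5, 12.4, 11.5]
w_stat, w_p = stats.wilcoxon(before, after)

# Three or more groups: Kruskal-Wallis in place of one-way ANOVA.
h_stat, h_p = stats.kruskal(group_a, group_b, group_c)

print(f"Mann-Whitney U: U = {u_stat:.1f}, p = {u_p:.4f}")
print(f"Wilcoxon:       W = {w_stat:.1f}, p = {w_p:.4f}")
print(f"Kruskal-Wallis: H = {h_stat:.2f}, p = {h_p:.4f}")
```

Which test applies depends on the design (paired or unpaired, two groups or more), not only on whether the data look normal.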

### #4

Posted 27 April 2009 - 10:10 PM

In my data, there are actually four groups according to the female and male genotypes: WT-WT, WT-knockout, knockout-WT, knockout-knockout. I should check that each group is normally distributed, shouldn't I? What if one group has only a few data points (e.g. 4)?

Look for normality as a larger group first (i.e. express each data point as a deviation from its group mean, then combine all points from all groups). If that doesn't work, then treat each group individually.
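The pooled check described here could be sketched as follows. The measurements are invented for illustration, the group labels simply mirror the four genotype combinations in the question, and Shapiro-Wilk is just one possible choice of normality test.

```python
import numpy as np
from scipy import stats

# Hypothetical measurements for the four genotype groups (illustrative only).
groups = {
    "WT-WT": [12.1, 13.4, 11.8, 12.9, 13.0, 12.4],
    "WT-knockout": [10.2, 11.1, 9.8, 10.7, 10.9, 10.4],
    "knockout-WT": [11.5, 12.2, 11.0, 11.9, 12.6, 11.3],
    "knockout-knockout": [8.9, 9.6, 9.1, 10.0],  # the small group (n = 4)
}

# Express each data point as a deviation from its own group mean ...
residuals_by_group = {
    name: np.asarray(values, dtype=float) - np.mean(values)
    for name, values in groups.items()
}

# ... then combine the deviations from all groups into one sample.
pooled = np.concatenate(list(residuals_by_group.values()))

# Shapiro-Wilk normality test on the pooled deviations; a small p-value
# suggests a departure from normality.
w_stat, p_value = stats.shapiro(pooled)
print(f"n = {pooled.size}, W = {w_stat:.3f}, p = {p_value:.3f}")
```

Pooling like this implicitly assumes the groups have similar spread, so it is worth looking at the variances as well before relying on the result.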

And what if there is an extreme value in the data? Should I discard it without a biological reason?

With the exception of experimental errors, I don’t think any data should be ‘discarded’ (which is not to say that analysis can’t be done with some points temporarily missing so long as it is acknowledged). I finally got this through to a PI of mine following a nasty incident when we had to repeat an assay for a commercial client and discovered that the data she had (unbeknownst to me) deleted as outliers turned out to be important.

One more stupid question about the ANOVA analyses in publications: are the authors assumed to have checked normality and variance even though they don't mention it in the paper?

Generally, if someone has gone to the trouble of checking the assumptions in their analysis (and found them to be valid), they will make a note of it in the paper; otherwise it is probably safest to assume they did not.
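For the variance half of that check, one option is Levene's test, sketched below with SciPy. The data are hypothetical, and Levene's test is my choice for the example rather than something the posters specified.

```python
from scipy import stats

# Hypothetical measurements for four groups (illustrative values only).
wt_wt = [12.1, 13.4, 11.8, 12.9, 13.0, 12.4]
wt_ko = [10.2, 11.1, 9.8, 10.7, 10.9, 10.4]
ko_wt = [11.5, 12.2, 11.0, 11.9, 12.6, 11.3]
ko_ko = [8.9, 9.6, 9.1, 10.0]

# Levene's test for equal variances across groups; center="median" is the
# more robust Brown-Forsythe variant and is SciPy's default.
lev_stat, lev_p = stats.levene(wt_wt, wt_ko, ko_wt, ko_ko, center="median")
print(f"Levene: W = {lev_stat:.3f}, p = {lev_p:.3f}")
```

Whatever checks are run, stating the test and its result in the methods section is what lets a reader know the assumptions were actually examined.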

### #5

Posted 28 April 2009 - 05:05 PM

Limited data means that you cannot assume that it is normally distributed. Parametric tests such as ANOVA rely on normal distributions, and a common rule of thumb is a minimum of about 30 samples for them to work. However, as DRT says, you can look for normality as a whole in your data. Papers that haven't checked the normality of their data and then done an ANOVA are doing it wrong, and the reviewers should have picked that up. However, the results generated from the analysis may not be erroneous, because all these sorts of tests are an approximation of the real situation, so the results may be right, but for the wrong reason.

Thank you very much for your advice. I am still confused about a few things.

In my data, there are actually four groups according to the female and male genotypes: WT-WT, WT-knockout, knockout-WT, knockout-knockout. I should check that each group is normally distributed, shouldn't I? What if one group has only a few data points (e.g. 4)? And what if there is an extreme value in the data? Should I discard it without a biological reason?

To me it sounds like you need a Kruskal-Wallis test, possibly followed by a post-hoc test such as Tukey's post hoc if you want to distinguish which two groups are actually significantly different, rather than just saying that one of them is different without knowing which one.

### #6

Posted 29 April 2009 - 06:25 AM

You should also look at what type of data you have: nominal, ordinal or interval. ANOVA is suitable for interval variables.

And if you choose a non-parametric test such as the Kruskal-Wallis test, you should also use a non-parametric post-hoc test. Examples are the Nemenyi test (similar to Tukey, very conservative) and the Steel test.
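A minimal sketch of that workflow with SciPy is below. SciPy does not ship the Nemenyi or Steel tests, so this example swaps in pairwise Mann-Whitney U tests with a Bonferroni correction as the non-parametric post hoc step; that substitution, and all the data values, are assumptions made purely for illustration.

```python
from itertools import combinations
from scipy import stats

# Hypothetical measurements for the four genotype groups (illustrative only).
groups = {
    "WT-WT": [12.1, 13.4, 11.8, 12.9, 13.0, 12.4],
    "WT-knockout": [10.2, 11.1, 9.8, 10.7, 10.9, 10.4],
    "knockout-WT": [11.5, 12.2, 11.0, 11.9, 12.6, 11.3],
    "knockout-knockout": [8.9, 9.6, 9.1, 10.0],
}

# Omnibus test: is at least one group different from the others?
h_stat, h_p = stats.kruskal(*groups.values())
print(f"Kruskal-Wallis: H = {h_stat:.2f}, p = {h_p:.4f}")

# Post hoc step (stand-in for Nemenyi/Steel): every pair of groups is
# compared with Mann-Whitney U, and each p-value is multiplied by the
# number of comparisons (Bonferroni), capped at 1.
pairs = list(combinations(groups, 2))
adjusted = {}
for a, b in pairs:
    _, p = stats.mannwhitneyu(groups[a], groups[b], alternative="two-sided")
    adjusted[(a, b)] = min(1.0, p * len(pairs))

for (a, b), p_adj in adjusted.items():
    print(f"{a} vs {b}: adjusted p = {p_adj:.4f}")
```

Bonferroni is conservative, and with a group of only four points the pairwise tests have little power, so a non-significant pair here should not be read as evidence that the two groups are the same.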
