# Narrative description of skewed data

### #1

Posted 13 February 2009 - 07:35 AM

Thanks.

### #2

Posted 17 February 2009 - 11:48 PM

For example, if the sample is too small you can't do a t-test.

We once used a Wilcoxon test to compare arbitrary samples.

But the best thing to do is to **go to a statistician** for help.

### #3

Posted 18 February 2009 - 06:17 AM

### #4

Posted 18 February 2009 - 06:19 AM

### #5

Posted 18 February 2009 - 08:30 AM

> I am trying to write up some results regarding changes to non-Gaussian distributed data. With normally distributed data it is easy; one can just write the mean +/- SEM. What is the equivalent for non-Gaussian data? I imagine one uses the median along with some measure of the spread (interquartile range? some percentile?). I've looked at a few other papers, but there doesn't seem to be a consensus (some are just wrong; e.g. median +/- SEM). Any ideas?
>
> Thanks.
>
> "Statistics are like a drunk with a lamppost: used more for support than illumination."
>
> Sir Winston Churchill

Normally the median should do the job. I'd use box plots showing the median, lower quartile, upper quartile, smallest and largest observation, and outliers. Sometimes the mean is also included. A box plot combines the most important descriptive statistics and gives an impression of the distribution of the data.
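The box-plot statistics listed in this reply (median, quartiles, extremes, outliers) can be computed with Python's standard library. This is only a sketch: the function name `five_number_summary`, the sample values, the "inclusive" quantile convention and the 1.5 × IQR outlier rule are assumptions made for illustration, not something stated in the thread.

```python
import statistics


def five_number_summary(values):
    """Return the statistics a box plot usually displays."""
    data = sorted(values)
    # Quartiles by linear interpolation between data points ("inclusive" method);
    # other quantile conventions give slightly different cut points.
    q1, median, q3 = statistics.quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    # Assumed outlier rule: points more than 1.5 * IQR beyond the quartiles.
    low_fence = q1 - 1.5 * iqr
    high_fence = q3 + 1.5 * iqr
    outliers = [v for v in data if v < low_fence or v > high_fence]
    return {
        "min": data[0],
        "q1": q1,
        "median": median,
        "q3": q3,
        "max": data[-1],
        "iqr": iqr,
        "outliers": outliers,
    }


# Made-up amplitudes in nA, for illustration only.
amplitudes = [3.0, 3.1, 3.2, 3.4, 3.5, 3.6, 3.9, 4.4, 9.8]
print(five_number_summary(amplitudes))
```

With a single large value in the sample, the median and quartiles barely move while the maximum and the outlier list flag the tail, which is why these statistics suit skewed data.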

**Edited by hobglobin, 18 February 2009 - 08:36 AM.**

One must presume that long and short arguments contribute to the same end. - Epicurus

...except casandra's that did belong to the funniest, most interesting and imaginative (or over-imaginative?) ones, I suppose.

### #6

Posted 18 February 2009 - 09:58 AM

Thanks for your help. For my figures, I think I am going to plot my raw values as a scatter plot - I have an n of about 50 per treatment, it takes up the same amount of space and it shows all the data (which, in most cases, should be encouraged!). My main issue is how to describe the data in the text. For example, with Gaussian-distributed data I would most likely use the mean and the SEM, and therefore the text would include a statement like:

"the mean amplitude of the response changed from 3.2 +/- 0.2 nA in the control to 4.5 +/- 0.3 nA in the presence of the drug"

With my skewed data, using the mean +/- SEM would be inappropriate. Is there some kind of equivalent? I could use the median, the lower quartile (Q1), and the upper quartile (Q3), so the equivalent statement in the results text would read something like:

"the median amplitude of the response changed from 3.4 (Q1 3.0, Q3 3.6) nA in the control to 4.5 (Q1 4.3, Q3 4.8) nA in the presence of the drug"

but this seems a bit clumsy.

Any thoughts?

Thanks so much

### #7

Posted 18 February 2009 - 10:15 AM

You could give the interval in which e.g. 95% of the measurements are included (the 2.5th to 97.5th percentiles); it's used frequently for such skewed data.
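An interval that contains 95% of the measurements is, strictly speaking, a percentile range of the data, whereas a confidence interval describes the uncertainty of an estimate such as the median. The sketch below computes both so the difference is visible. The function names, the simulated log-normal sample, the seed and the choice of a percentile bootstrap are all assumptions made for illustration.

```python
import random
import statistics


def central_range(values):
    """2.5th and 97.5th percentiles: the range holding the middle 95% of the data."""
    # n=40 gives cut points every 2.5%; the first and last are the ones needed.
    cuts = statistics.quantiles(values, n=40, method="inclusive")
    return cuts[0], cuts[-1]


def bootstrap_median_ci(values, n_resamples=2000, seed=0):
    """Percentile-bootstrap 95% confidence interval for the median."""
    rng = random.Random(seed)
    # Resample with replacement and record the median of each resample.
    medians = [
        statistics.median(rng.choices(values, k=len(values)))
        for _ in range(n_resamples)
    ]
    cuts = statistics.quantiles(medians, n=40, method="inclusive")
    return cuts[0], cuts[-1]


# Simulated right-skewed data, roughly n = 50 per treatment as in the thread.
rng = random.Random(1)
sample = [rng.lognormvariate(1.2, 0.5) for _ in range(50)]

print("median:", statistics.median(sample))
print("middle 95% of measurements:", central_range(sample))
print("95% CI of the median:", bootstrap_median_ci(sample))
```

The percentile range describes the spread of individual measurements (comparable in role to a standard deviation), while the bootstrap interval describes how well the median is pinned down (comparable in role to the SEM), so it is worth stating in the text which of the two is being reported.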

### #8

Posted 19 February 2009 - 10:06 PM

Have you tried taking the logarithm of the data? It doesn't always work but if it is able to make the distribution look "normal" your problems are solved; just report the log(data) and you can use all the standard statistical methods.
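As a rough illustration of the log-transform idea, the sketch below summarises the data on the log scale and back-transforms the result, which gives a geometric mean with an asymmetric interval. The function name `log_scale_summary` and the example values are hypothetical, and the approach only applies when every value is strictly positive.

```python
import math
import statistics


def log_scale_summary(values):
    """Summarise strictly positive data on the log scale and back-transform."""
    logs = [math.log(v) for v in values]  # math.log raises ValueError for values <= 0
    mean_log = statistics.mean(logs)
    sem_log = statistics.stdev(logs) / math.sqrt(len(logs))
    return {
        "geometric_mean": math.exp(mean_log),
        # Back-transformed mean -/+ SEM; asymmetric on the original scale.
        "lower": math.exp(mean_log - sem_log),
        "upper": math.exp(mean_log + sem_log),
    }


# Made-up values spanning two orders of magnitude, for illustration only.
print(log_scale_summary([1.0, 10.0, 100.0]))
```

Whether the logged data actually look normal still has to be checked (for example with a histogram or Q-Q plot) before relying on the standard methods.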

Failing that, I would recommend reporting the median, SEM, and the "skewness" for each data set. Skewness is similar to the standard deviation except that instead of squaring the difference between each sample and the mean you cube it. For the statistical tests, I'd go with Molgen's suggestion and look at nonparametric tests, though I suspect that the Mann-Whitney test would be more suitable for your data than the Wilcoxon (signed-rank) test, which is more analogous to a paired t-test (though I'm pushing the limits of my statistical skill here, so it would pay to check with a professional).
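For anyone who wants to try this, here is a minimal standard-library sketch of a moment-based skewness and of the Mann-Whitney U test for two independent samples. Both function names are hypothetical, the skewness uses the simple third-moment form without small-sample correction, and the p-value relies on a normal approximation without tie correction, which is an assumption that is only reasonable for fairly large groups. An established statistics package (for example `scipy.stats.mannwhitneyu`) is preferable for real analyses.

```python
import math
from statistics import NormalDist, mean


def sample_skewness(values):
    """Moment-based skewness: third central moment over (second moment) ** 1.5."""
    n = len(values)
    centre = mean(values)
    m2 = sum((v - centre) ** 2 for v in values) / n
    m3 = sum((v - centre) ** 3 for v in values) / n
    return m3 / m2 ** 1.5


def mann_whitney_u(x, y):
    """Two-sided Mann-Whitney U test (normal approximation, no tie correction)."""
    # Pool both samples, remembering which group each value came from.
    pooled = sorted([(v, 0) for v in x] + [(v, 1) for v in y])
    rank_sum_x = 0.0
    i = 0
    while i < len(pooled):
        j = i
        while j < len(pooled) and pooled[j][0] == pooled[i][0]:
            j += 1
        # Tied values share the average of the ranks i+1 .. j.
        midrank = (i + 1 + j) / 2
        rank_sum_x += midrank * sum(1 for k in range(i, j) if pooled[k][1] == 0)
        i = j
    n1, n2 = len(x), len(y)
    u1 = rank_sum_x - n1 * (n1 + 1) / 2
    u = min(u1, n1 * n2 - u1)
    mu = n1 * n2 / 2
    sigma = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    p_value = min(1.0, 2 * NormalDist().cdf((u - mu) / sigma))
    return u, p_value


# Made-up control and drug amplitudes in nA, for illustration only.
control = [3.0, 3.1, 3.4, 3.5, 3.6, 5.9]
drug = [4.3, 4.4, 4.5, 4.8, 5.1, 8.2]
print("skewness (control):", sample_skewness(control))
print("Mann-Whitney U, p:", mann_whitney_u(control, drug))
```

The Mann-Whitney test fits unpaired groups such as separate control and drug samples; the signed-rank test would only apply if each control value were paired with a drug value from the same unit.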

Good luck