Narrative description of skewed data


7 replies to this topic

#1 who_throws_a_shoe

    member

  • Members
  • 4 posts

Posted 13 February 2009 - 07:35 AM

I am trying to write up some results regarding changes to non-Gaussian distributed data. With normally distributed data it is easy; one can just write the mean +/- SEM. What is the equivalent for non-Gaussian data? I imagine one uses the median along with some measure of the spread (interquartile range? some percentile?). I've looked at a few other papers, but there doesn't seem to be a consensus (some are just wrong, e.g. median +/- SEM). Any ideas?

Thanks.

#2 molgen

    Veteran

  • Active Members
  • 167 posts

Posted 17 February 2009 - 11:48 PM

I think that it depends on the group size (n).
For example, if it's too small you can't do a t-test.
We once did a Wilcoxon test to compare arbitrary samples.
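
Molgen doesn't say which Wilcoxon test was used. For two independent groups (e.g. control vs. drug measured in different cells) the usual choice is the Wilcoxon rank-sum test, which is equivalent to the Mann-Whitney U test. Below is a minimal stdlib-only sketch, assuming independent samples and using the large-sample normal approximation with a tie correction (reasonable for groups of a few dozen, not for very small n). The function names are hypothetical; for real analyses a tested implementation such as `scipy.stats.mannwhitneyu` is preferable.

```python
import math
from statistics import NormalDist


def rank_with_ties(values):
    """Return 1-based ranks; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Advance j to the end of the run of values tied with values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # mean of 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    return ranks


def mann_whitney_u(x, y):
    """Two-sided Mann-Whitney U (Wilcoxon rank-sum) test, normal approximation.

    Returns (U for sample x, two-sided p-value). Includes a tie correction
    in the variance but no continuity correction.
    """
    n1, n2 = len(x), len(y)
    pooled = list(x) + list(y)
    ranks = rank_with_ties(pooled)
    u1 = sum(ranks[:n1]) - n1 * (n1 + 1) / 2  # U statistic for sample x
    mean_u = n1 * n2 / 2
    n = n1 + n2
    # Tie correction: sum of (t^3 - t) over each group of t tied values.
    counts = {}
    for v in pooled:
        counts[v] = counts.get(v, 0) + 1
    tie_term = sum(t ** 3 - t for t in counts.values())
    var_u = n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1)))
    if var_u == 0:  # every observation identical: no evidence of a shift
        return u1, 1.0
    z = (u1 - mean_u) / math.sqrt(var_u)
    p = 2 * (1 - NormalDist().cdf(abs(z)))
    return u1, p
```

Usage would be `u, p = mann_whitney_u(control, drug)`, where `control` and `drug` are hypothetical lists of measured amplitudes.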

But the best thing to do is to go to a statistician for help.

#3 molgen

    Veteran

  • Active Members
  • 167 posts

Posted 18 February 2009 - 06:17 AM

[Image not preserved]

#4 molgen

    Veteran

  • Active Members
  • 167 posts

Posted 18 February 2009 - 06:19 AM

[Image not preserved]

#5 hobglobin

    Growing old is mandatory, growing up is optional...

  • Global Moderators
  • 5,519 posts

Posted 18 February 2009 - 08:30 AM

Normally the median should do the job. I'd use box plots showing the median, lower quartile, upper quartile, smallest and largest observations, and outliers. Sometimes the mean is also included. A box plot combines the most important descriptive statistics and gives an impression of the distribution of the data.
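
The numbers behind such a box plot can be computed directly. The sketch below assumes the common Tukey convention (whiskers reach the most extreme observations within 1.5 × IQR of the quartiles; anything beyond is flagged as an outlier) and the "inclusive" quartile method of Python's `statistics.quantiles` (Python 3.8+). Other software may define quartiles slightly differently, so it is worth stating the method used. The function name `box_plot_stats` is hypothetical.

```python
from statistics import quantiles


def box_plot_stats(data, whisker=1.5):
    """Summary statistics for a Tukey-style box plot.

    Quartiles use statistics.quantiles(method="inclusive"); other quartile
    definitions can give slightly different values.
    """
    values = sorted(data)
    q1, q2, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    low_fence = q1 - whisker * iqr
    high_fence = q3 + whisker * iqr
    inside = [v for v in values if low_fence <= v <= high_fence]
    return {
        "min": values[0],
        "q1": q1,
        "median": q2,
        "q3": q3,
        "max": values[-1],
        # Whiskers end at the most extreme observations inside the fences.
        "whisker_low": inside[0],
        "whisker_high": inside[-1],
        "outliers": [v for v in values if v < low_fence or v > high_fence],
    }
```

For example, `box_plot_stats([1, 2, 3, 4, 5, 6, 7, 8, 9, 100])` gives quartiles 3.25, 5.5 and 7.75 and flags 100 as the only outlier.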


"Statistics are like a drunk with a lamppost: used more for support than illumination."
Sir Winston Churchill
:)

Edited by hobglobin, 18 February 2009 - 08:36 AM.

One must presume that long and short arguments contribute to the same end. - Epicurus
...except casandra's, which are among the funniest, most interesting and imaginative (or over-imaginative?) ones, I suppose.

#6 who_throws_a_shoe

    member

  • Members
  • 4 posts

Posted 18 February 2009 - 09:58 AM

Thanks for your help. For my figures, I think I am going to plot my raw values as a scatter plot: I have an n of about 50 per treatment, it takes up the same amount of space, and it shows all the data (which, in most cases, should be encouraged!). My main issue is how to describe the data in the text. For example, for Gaussian-distributed data I would most likely use the mean and the SEM, so the text would include a statement like:

"the mean amplitude of the response changed from 3.2 +/- 0.2 nA in the control to 4.5 +/- 0.3 nA in the presence of the drug"

For my skewed data, using the mean +/- SEM would be inappropriate. Is there some kind of equivalent? I could use the median, the lower quartile (Q1), and the upper quartile (Q3), so the equivalent statement in the results text would read something like:

"the median amplitude of the response changed from 3.4 (Q1 3.0, Q3 3.6) nA in the control to 4.5 (Q1 4.3, Q3 4.8) nA in the presence of the drug"

but this seems a bit clumsy.
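
A more compact convention, widely used in biomedical journals, is "median (IQR Q1–Q3)", with the format defined once in the methods. The sketch below generates the sentence above in that form. It assumes the "inclusive" quartile method of Python's `statistics.quantiles`, and the function names `median_iqr_text` and `describe_change` are hypothetical.

```python
from statistics import quantiles


def median_iqr_text(data, unit="", digits=1):
    """Format data as 'median (IQR Q1–Q3) unit'."""
    q1, med, q3 = quantiles(data, n=4, method="inclusive")
    text = f"{med:.{digits}f} (IQR {q1:.{digits}f}–{q3:.{digits}f})"
    return f"{text} {unit}" if unit else text


def describe_change(control, drug, unit="nA"):
    """Build the results sentence comparing two groups by median (IQR)."""
    return ("the median amplitude of the response changed from "
            f"{median_iqr_text(control, unit)} in the control to "
            f"{median_iqr_text(drug, unit)} in the presence of the drug")
```

For example, `median_iqr_text([1, 2, 3, 4, 5], "nA")` returns "3.0 (IQR 2.0–4.0) nA".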

Any thoughts?

Thanks so much

#7 hobglobin

    Growing old is mandatory, growing up is optional...

  • Global Moderators
  • 5,519 posts

Posted 18 February 2009 - 10:15 AM

You could give the range within which e.g. 95% of the measurements fall (i.e. from the 2.5th to the 97.5th percentile); it's used frequently for such skewed data.
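
The 2.5th and 97.5th percentiles can be read straight off the sorted data. The sketch below uses linear interpolation between neighbouring order statistics (the same rule as the "inclusive" method of Python's `statistics.quantiles`). Other percentile definitions give slightly different values, and with about 50 points per group the extreme percentiles rest on only one or two observations in each tail, so treat them as rough. The function names are hypothetical.

```python
import math


def percentile(sorted_vals, p):
    """Percentile p (0-100) of pre-sorted data, by linear interpolation."""
    pos = (len(sorted_vals) - 1) * p / 100
    lo = math.floor(pos)
    hi = min(lo + 1, len(sorted_vals) - 1)
    return sorted_vals[lo] + (sorted_vals[hi] - sorted_vals[lo]) * (pos - lo)


def central_range(data, percent=95):
    """Bounds of the central `percent`% of the data, e.g. 2.5th-97.5th for 95."""
    s = sorted(data)
    tail = (100 - percent) / 2
    return percentile(s, tail), percentile(s, 100 - tail)
```

For example, `central_range(range(101))` returns (2.5, 97.5).
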

#8 DRT

    Veteran

  • Active Members
  • 160 posts

Posted 19 February 2009 - 10:06 PM

Have you tried taking the logarithm of the data? It doesn't always work, but if it makes the distribution look "normal" your problems are solved: just report the log(data) and you can use all the standard statistical methods.
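
A minimal sketch of that approach, assuming strictly positive data: compute mean ± SEM on the natural-log scale, then (a common companion step, not part of the post above) back-transform the log mean to get the geometric mean on the original scale. The back-transformed band is asymmetric, so it is usually reported as a range rather than as ±. The function name `log_summary` is hypothetical, and it is still worth checking (e.g. with a histogram of the logged values) that the transform really made the data look normal.

```python
import math
from statistics import mean, stdev


def log_summary(data):
    """Mean +/- SEM on the natural-log scale, plus the geometric mean.

    All values must be strictly positive (log is undefined otherwise).
    """
    logs = [math.log(v) for v in data]
    m = mean(logs)
    sem = stdev(logs) / math.sqrt(len(logs))
    # exp(mean of logs) is the geometric mean on the original scale;
    # exp(m - sem) .. exp(m + sem) is an asymmetric band around it.
    return {
        "log_mean": m,
        "log_sem": sem,
        "geometric_mean": math.exp(m),
        "band": (math.exp(m - sem), math.exp(m + sem)),
    }
```

For example, `log_summary([1, 10, 100])` has a geometric mean of 10, with a band that is symmetric around 10 only on a multiplicative scale.
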
Failing that, I would recommend reporting the median, SEM, and the "skewness" for each data set. Skewness is similar to the standard deviation, except that instead of squaring the difference between each sample and the mean you cube it (and divide the average of those cubes by the cube of the standard deviation). For performing the statistical tests, I'd go with Molgen's suggestion and look at nonparametric tests, though I suspect that the Mann-Whitney test would be more suitable for your data than the Wilcoxon signed-rank test, which is more analogous to a paired t-test (though I'm pushing the limits of my statistical skill here, so it would pay to check with a professional).
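
As a worked version of that description: the moment coefficient of skewness is g1 = m3 / m2^(3/2), where m2 and m3 are the mean squared and mean cubed deviations from the mean. Positive values indicate a long right tail. The sketch also offers the small-sample adjusted form G1 = g1 * sqrt(n(n-1)) / (n-2), which many statistics packages report instead, so check which form a package uses before comparing numbers. The function name `skewness` is hypothetical.

```python
import math


def skewness(data, adjusted=False):
    """Moment coefficient of skewness.

    g1 = m3 / m2**1.5, with m2 and m3 the mean squared and mean cubed
    deviations from the mean. If adjusted=True, return the small-sample
    form G1 = g1 * sqrt(n(n-1)) / (n-2), which needs n >= 3.
    Requires at least two distinct values (otherwise m2 == 0).
    """
    n = len(data)
    mu = sum(data) / n
    m2 = sum((x - mu) ** 2 for x in data) / n
    m3 = sum((x - mu) ** 3 for x in data) / n
    g1 = m3 / m2 ** 1.5
    if adjusted:
        return g1 * math.sqrt(n * (n - 1)) / (n - 2)
    return g1
```

Symmetric data such as `[1, 2, 3]` give 0, while a long right tail such as `[1, 1, 1, 10]` gives a positive value.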

Good luck





©1999-2013 Protocol Online, All rights reserved.