comparing difference in gene expression - (Nov/04/2007 )

Hello all,

I'm studying the temporal expression of some genes in a mouse model: I dissect a specific tissue from newborn, 4-day-old, 6-day-old, etc. animals and use real-time PCR to measure the expression levels.

Earlier, I first normalized the expression level of each gene to that of a housekeeping gene (as usual). Then I normalized each ratio again to the first time point (P0 = newborn), so that P0 expression was always 1.00 and the other time points were multiples of it. This was expected to tell us the fold change in the amount of transcript.

After the second normalization, when I calculate the means, P0 is obviously 1.00 and its standard deviation is 0.00 because all the values are the same. On the other hand, the standard deviations are high for the other time points because the data are spread over a larger range. As a result, I couldn't apply any statistical tests to these data and realized that this approach was not suitable.
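To make the problem concrete, here is a minimal sketch of what the second normalization does. The numbers are made up for illustration (not my real data), and I'm assuming each replicate is divided by the matching P0 replicate. Dividing by the P0 mean instead, shown as an alternative, keeps the spread within P0.

```python
from statistics import mean, stdev

# Hypothetical housekeeping-normalized expression ratios (illustrative only).
p0 = [0.80, 1.00, 1.20]
p4 = [1.90, 2.60, 3.10]

# Second normalization as described: each replicate divided by the
# matching P0 replicate, so every P0 value becomes exactly 1.0.
p0_paired = [x / ref for x, ref in zip(p0, p0)]
p4_paired = [x / ref for x, ref in zip(p4, p0)]

# Alternative (an assumption, not what I did): divide every value by the
# P0 mean, which rescales the data but preserves the variation within P0.
p0_mean = mean(p0)
p0_scaled = [x / p0_mean for x in p0]
p4_scaled = [x / p0_mean for x in p4]

print("paired: SD(P0) =", stdev(p0_paired), " SD(P4) =", stdev(p4_paired))
print("scaled: SD(P0) =", stdev(p0_scaled), " SD(P4) =", stdev(p4_scaled))
```

With the paired version the P0 group has no variance at all, which is why a two-group test against P0 breaks down.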

However, without the second normalization, the results are more meaningful. But I'm still confused about which transformations and tests can be applied after that. Now my data are just the means of expression levels normalized to housekeeping genes. By the way, due to ethics committee limitations, my sample size cannot be more than 3, which means I can only have 3 P0, 3 P4, etc. animals.

Now, my questions are:

  1. Is N = 3 enough to check for a normal distribution, or should I not even attempt a distribution test and simply assume that the data are not normally distributed?
  2. Does transforming the data (log transformation, for example) work for N = 3, or should I just forget it and apply nonparametric tests?


Maybe I can help, but first I have two questions, one of clarification and the other of concern. 1) I assume you are using independent groups for each time point, right?--i.e. your n=3 means 3 mice at p0, 3 different mice at 4 days, etc. Is that correct?
2) Why can you only use n=3 per time point? Ethical reasons? Really? Are you at PETA State University or something? Sorry, I don't mean to be flippant, but n=3 for ETHICAL reasons makes no sense--n=3 will likely yield insufficient data for valid statistical analysis, in which case, ethically, you're on worse ground, 'cause you've just killed 3 mice for NOTHING. For animal studies, the convention is typically at least n=6 per group; for fewer than that, a good animal care and use committee reviewing a submitted animal protocol (and I guarantee ALL NIH study sections reviewing a grant application) would ask whether you had done a statistical power analysis to be sure n=3 is enough to see differences. A low n like that without an accompanying power analysis to justify it would raise red flags for sure. Now, maybe you are limited by the mice available to you, e.g. if they are rare and expensive transgenic animals, in which case you have little choice, but if they are just garden-variety C57s I don't understand the limitation in numbers.
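
As a rough illustration of the kind of power analysis I mean, here is a small simulation sketch. Everything in it is an assumption chosen for illustration, not something from your data: normally distributed values, equal variances, a true difference of 2 standard deviations between groups, and a two-sided pooled t-test at the 5% level.

```python
import random
from statistics import mean, variance

# Two-sided 5% critical values of Student's t for df = 2n - 2
# (df = 4 for n = 3 per group, df = 10 for n = 6 per group).
T_CRIT = {3: 2.776, 6: 2.228}

def simulated_power(n, effect_sd, runs=5000, seed=0):
    """Estimate the power of a pooled two-sample t-test by simulation."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(runs):
        a = [rng.gauss(0.0, 1.0) for _ in range(n)]
        b = [rng.gauss(effect_sd, 1.0) for _ in range(n)]
        # Equal group sizes, so the pooled variance is the plain average.
        pooled_var = (variance(a) + variance(b)) / 2
        t = (mean(b) - mean(a)) / (pooled_var * 2 / n) ** 0.5
        if abs(t) > T_CRIT[n]:
            hits += 1
    return hits / runs

power_n3 = simulated_power(3, effect_sd=2.0)
power_n6 = simulated_power(6, effect_sd=2.0)
print("estimated power, n=3 per group:", power_n3)
print("estimated power, n=6 per group:", power_n6)
```

The point is only the comparison: for the same assumed effect, the estimated power at n=6 comes out clearly higher than at n=3. Plug in an effect size and variability that are realistic for your own assay before drawing conclusions.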



To answer your first question: yes, I use 3 different samples for each time point.

Secondly, I'm not limited by the number of mice at the moment, since they are normal, wild-type animals. But the second part of the project involves mutant mice, so we'll run into the n = 3 issue sooner or later.

My question was actually based on this problem: what can I do with n = 3?
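
On the nonparametric option, one limitation can be checked by simple counting. The sketch below (the function name is just for illustration) enumerates every way the ranks can be split between two groups and reports the smallest two-sided p-value an exact rank-sum (Mann-Whitney type) test can give, assuming no ties.

```python
from itertools import combinations

def min_two_sided_p(n_per_group):
    """Smallest exact two-sided rank-sum p-value for two equal groups, no ties."""
    ranks = range(1, 2 * n_per_group + 1)
    # Rank sum of group 1 for every possible assignment of ranks.
    sums = [sum(c) for c in combinations(ranks, n_per_group)]
    lo, hi = min(sums), max(sums)
    # Only the two most extreme splits count toward the smallest p-value.
    extreme = sum(1 for s in sums if s == lo or s == hi)
    return extreme / len(sums)

print("n=3 per group:", min_two_sided_p(3))
print("n=6 per group:", min_two_sided_p(6))
```

With 3 animals per group there are only 20 possible splits, so even complete separation of the two groups gives p = 2/20 = 0.1, and a two-group rank test can never reach p < 0.05.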