# Which statistical test to use to analyze results from a metagenomic experiment? - (Aug/23/2013)

Dear all,

I need to choose the correct statistical test to analyze (using SPSS) the results of an experiment, and I'm not sure which method is appropriate. I would really appreciate it if someone could give me some advice.

In this case, my variables are a number of different bacteria that have been identified and quantified for each group in the experiment. Because so many bacteria were identified, I have divided the data into separate datasets, one for each phylogenetic level of the detected bacteria.

As for the design, there was a control group (from which samples were taken at the beginning and at the end of the study) and 3 treated groups (with samples taken at the end of treatment). This is the outline of the experiment (please find it attached).

As shown, the arrows mark the comparisons I'm interested in. In the control group (the same animals sampled at two time points, the beginning and the end of the study) the sample size was n=5; in the other groups it was n=6 (total n=28). Could anyone give me some advice, please?

Thank you very much in advance for any help given,

There are a number of ways this could be analysed and you may face criticism for not having the analysis decided upon before commencing the trial.

I would split the formal analysis into two separate tests. First, a simple t-test between your two control time points comparing the number of species found (a paired test, since the same animals were sampled at both time points). Second, a 2-way ANOVA on the number of species in your 4 end-of-study groups (you have two treatments arranged 00, 01, 10, 11).
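Because the two control time points come from the same animals, the first test can be run as a paired t-test. Below is a minimal sketch of that calculation using only the Python standard library. The species counts are invented for illustration, and the helper name `paired_t_statistic` is hypothetical; it returns the t statistic and degrees of freedom only, so the p-value would still have to come from a t table or a statistics package.

```python
import math
import statistics


def paired_t_statistic(before, after):
    """Return (t, degrees_of_freedom) for a paired t-test on matched samples."""
    if len(before) != len(after):
        raise ValueError("paired samples must have the same length")
    # Work on the per-animal differences (end minus start).
    diffs = [a - b for b, a in zip(before, after)]
    n = len(diffs)
    mean_diff = statistics.mean(diffs)
    sd_diff = statistics.stdev(diffs)  # sample standard deviation (n - 1)
    # Note: this divides by zero if every difference is identical.
    t = mean_diff / (sd_diff / math.sqrt(n))
    return t, n - 1


# Invented species counts for the 5 control animals (NOT data from the thread).
control_start = [42, 38, 45, 40, 39]
control_end = [44, 41, 45, 43, 42]

t_stat, df = paired_t_statistic(control_start, control_end)
print(f"t = {t_stat:.3f}, df = {df}")
```

The same numbers entered as two paired columns in SPSS should give a matching t value, which is a quick way to check that the data were laid out correctly.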

Then I would do some 'informal' or exploratory analysis with multivariate clustering on the quantities of each type of bacteria. Basically, have a play with the SPSS options to see what might pop out. It is important to call this hypothesis forming, to keep it distinct from the hypothesis testing above.

Have fun

Thank you very much for the answer, but a question now comes to mind. Is it right to run the 2-way ANOVA one variable at a time when you have a large number of bacteria (that is, a large number of variables, around 100)?

Thanks a lot, really appreciate your help

I was thinking about pooling the counts or using the number of species, rather than running an ANOVA for each species. I suspect that you are going to need multivariate analysis (possibly MANOVA) to encompass all the species.
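To make the two summaries concrete, here is a rough sketch of how each sample could be reduced to a single number before testing: the number of species detected, and the pooled count. The sample names, taxa and counts are invented, and the helper names `richness` and `pooled_count` are hypothetical.

```python
# Invented count table: sample name -> {taxon: count}. Not data from the thread.
samples = {
    "control_1": {"Lactobacillus": 120, "Bacteroides": 0, "Clostridium": 15},
    "treated_1": {"Lactobacillus": 30, "Bacteroides": 200, "Clostridium": 0},
}


def richness(counts):
    """Number of taxa detected (count > 0) in one sample."""
    return sum(1 for c in counts.values() if c > 0)


def pooled_count(counts):
    """Total count across all taxa in one sample."""
    return sum(counts.values())


# One (richness, pooled count) pair per sample.
summary = {name: (richness(c), pooled_count(c)) for name, c in samples.items()}
for name, (n_taxa, total) in summary.items():
    print(f"{name}: {n_taxa} taxa detected, {total} counts in total")
```

Either column of `summary` is then one dependent variable per sample, so a single test replaces the hundred or so per-species tests.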

I see... so if I perform a MANOVA, all my bacteria would be my dependent variables, and my fixed factor would be my experimental groups, is that right?

Thanks a lot for your help

Yes

I expect that many of your bacteria are either absent or very abundant in a given sample (zero-inflated data), so you may need expert help to make sure that this doesn't invalidate the analysis.
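A quick way to see how serious the zero problem is would be to count, for each bacterium, the share of samples in which it was not detected. The sketch below assumes the data are available as one list of counts per taxon; the values are invented, and the 50% cut-off is an arbitrary assumption for illustration, not an established rule.

```python
def zero_fraction(values):
    """Fraction of samples in which the taxon has a count of zero."""
    return sum(1 for v in values if v == 0) / len(values)


# Invented counts across 6 samples per taxon (NOT data from the thread).
taxon_counts = {
    "Lactobacillus": [120, 30, 0, 55, 80, 10],
    "Rare_taxon": [0, 0, 0, 7, 0, 0],
}

zero_report = {taxon: zero_fraction(v) for taxon, v in taxon_counts.items()}

# Arbitrary illustrative threshold: flag taxa absent from at least half the samples.
flagged = [taxon for taxon, frac in zero_report.items() if frac >= 0.5]
print(zero_report)
print("Mostly absent:", flagged)
```

Taxa that end up flagged are the ones most likely to violate the normality assumptions behind ANOVA and MANOVA, so they are worth raising with a statistician.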

Another possibility worth considering is to pool the bacteria into related groups, so that you only have a few ANOVAs to deal with.
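As an illustration of that pooling step, the sketch below sums genus-level counts into a higher taxonomic group through a lookup table. The mapping, the sample counts, the "Unassigned" label and the helper name `pool_by_group` are all assumptions made for the example; in practice the grouping would come from your own taxonomy assignments.

```python
from collections import defaultdict

# Assumed lookup table: genus -> phylum (extend with your own taxonomy).
genus_to_phylum = {
    "Lactobacillus": "Firmicutes",
    "Clostridium": "Firmicutes",
    "Bacteroides": "Bacteroidetes",
}


def pool_by_group(counts, mapping, unknown="Unassigned"):
    """Sum the counts of all taxa that map to the same group."""
    pooled = defaultdict(int)
    for taxon, count in counts.items():
        # Taxa missing from the mapping are collected under `unknown`.
        pooled[mapping.get(taxon, unknown)] += count
    return dict(pooled)


# Invented counts for one sample (NOT data from the thread).
sample = {"Lactobacillus": 120, "Clostridium": 15, "Bacteroides": 40, "Mystery": 3}
pooled = pool_by_group(sample, genus_to_phylum)
print(pooled)
```

With only a handful of groups per sample, each group can then be analysed with its own ANOVA without running a separate test for every bacterium.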