# Two-proportion z test: is it best for my problem, and can I pool data from multi

z-test comparing proportions pooling data

### #1 dalemcameron

Posted 18 December 2014 - 01:14 PM

I am examining populations (cultures) of yeast cells and scoring them for a particular trait. The data are categorical; within a population, a yeast cell either has the trait or it does not. I wish to compare two populations -- two genetically different strains of yeast -- to see whether they differ in the proportion of cells with the trait.

First question: would a two-proportion z test be the way to go here?

Second question: If so, in all of the examples I’ve seen the samples are taken from one “replicate” of each population. Working with yeast, I can set up multiple independent populations (cultures) for each particular strain (in my experiment I sampled four independent cultures for each of the two strains I examined -- scoring several million cells in each sample). Obviously in humans or many other populations such “replicates” are not possible. But for me they are. So my question is this: do you think it is valid to pool the data from all four replicates of each population? Or would it be best to show the calculation from one “representative” population for each?

I am not a statistician nor do I have good tools beyond Excel to do these calculations, so I am looking for a robust yet manageable approach to analyze my data. I can handle a Z test if that is the way to go, but if it's not a valid approach I would love to hear alternatives!

### #2 bob1

Posted 18 December 2014 - 04:33 PM

I think a chi-square test might be better for this sort of analysis. It is used for categorical data like you described, where the only alternatives are yes or no.
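For anyone weighing the two tests against each other, here is a minimal sketch in plain Python (standard library only) that computes both on the same 2x2 table. The counts are made up for illustration and the function names are hypothetical. It assumes the pooled-variance form of the z test and a Pearson chi-square without continuity correction; under those assumptions the chi-square statistic is the square of z, so the two give the same two-sided p-value.

```python
import math

def two_proportion_z(x1, n1, x2, n2):
    """Pooled two-proportion z statistic and two-sided p-value."""
    p1, p2 = x1 / n1, x2 / n2
    p_pool = (x1 + x2) / (n1 + n2)  # common proportion under the null
    se = math.sqrt(p_pool * (1 - p_pool) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    p_value = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal tail area
    return z, p_value

def chi_square_2x2(x1, n1, x2, n2):
    """Pearson chi-square (no continuity correction) for a 2x2 table."""
    observed = [[x1, n1 - x1], [x2, n2 - x2]]
    row_totals = [n1, n2]
    col_totals = [x1 + x2, (n1 + n2) - (x1 + x2)]
    total = n1 + n2
    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / total
            chi2 += (observed[i][j] - expected) ** 2 / expected
    # Survival function of chi-square with 1 degree of freedom
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value

# Hypothetical counts: cells with the trait / cells scored, per strain
z, p_z = two_proportion_z(120, 1000, 90, 1000)
chi2, p_chi2 = chi_square_2x2(120, 1000, 90, 1000)
print(f"z = {z:.3f}, z^2 = {z * z:.3f}, chi2 = {chi2:.3f}")
print(f"p (z test) = {p_z:.4f}, p (chi-square) = {p_chi2:.4f}")
```

The chi-square form becomes the more convenient one once there are more than two groups or more than two categories, which the z test does not handle directly.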

Because you scored a large number of cells in each sample, it shouldn't make much difference to the result whether you pool the samples or analyse each one independently.
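One way to check this on your own data is to put the per-culture proportions next to the pooled ones. The sketch below does that in plain Python; every count is invented for illustration, and the names `strain_a`, `strain_b`, `pool` and `pooled_z` are hypothetical. It assumes the pooled-variance two-proportion z statistic.

```python
import math

# Hypothetical data: (cells with trait, cells scored) for four cultures per strain
strain_a = [(1200, 1_000_000), (1150, 1_000_000), (1300, 1_000_000), (1250, 1_000_000)]
strain_b = [(900, 1_000_000), (950, 1_000_000), (870, 1_000_000), (980, 1_000_000)]

def proportions(cultures):
    """Proportion of cells with the trait in each culture."""
    return [x / n for x, n in cultures]

def pool(cultures):
    """Sum trait counts and total counts across cultures."""
    return sum(x for x, _ in cultures), sum(n for _, n in cultures)

def pooled_z(x1, n1, x2, n2):
    """Two-proportion z statistic using the pooled variance estimate."""
    p_pool = (x1 + x2) / (n1 + n2)
    se = math.sqrt(p_pool * (1 - p_pool) * (1 / n1 + 1 / n2))
    return (x1 / n1 - x2 / n2) / se

props_a, props_b = proportions(strain_a), proportions(strain_b)
xa, na = pool(strain_a)
xb, nb = pool(strain_b)

print("Strain A per culture:", props_a, "pooled:", xa / na)
print("Strain B per culture:", props_b, "pooled:", xb / nb)
print("z on pooled counts:", round(pooled_z(xa, na, xb, nb), 2))
```

If the per-culture proportions within a strain sit close together and the two strains do not overlap, pooling and per-culture analysis should point the same way. If the cultures of one strain scatter widely, that spread is worth reporting alongside any pooled result.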

Excel is not a good tool for statistics. There are free options, the most powerful of which is "R". This is normally a command-line program, but there are a number of graphical user interfaces available (e.g. Mondrian and RStudio). Have a google for free stats programs and see what suits you.