# Which regression model to use to trace the source of infection - (Feb/19/2019)

Dear All,

I have data on the expression of 7 genes (scored as present or absent by PCR) in 100 isolates of E. coli from different sources (e.g. humans, chickens and wild birds), all taken at the same place. I would like to use a model to tell which reservoir (chickens or wild birds) could be the source of human infection. In other words, I need a model that I can run in SPSS or MINITAB to show the relationship between these reservoirs based on the expression profile of the 7 genes.

Could anyone suggest ...

Thanks

I don't think that you need a regression model as such - you need a dendrogram (phylogenetic tree) that shows where the reservoir samples cluster with the human ones. Regression would be used if several genes had to reach a critical level of expression before you were considered to have an infection.
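To make the clustering idea concrete, here is a minimal sketch in Python (not SPSS or MINITAB) of the comparison a dendrogram is built on. It assumes each isolate is a list of seven 0/1 values and uses Jaccard distance, which is one common choice for presence/absence data but not the only one. The isolate names and profiles are invented for illustration, and `jaccard_distance` and `mean_distance_to_humans` are hypothetical helper names.

```python
# Hypothetical presence/absence profiles (1 = gene detected by PCR, 0 = not detected).
# These values are invented for illustration; they are not data from this thread.
isolates = {
    "human_1":    [1, 1, 0, 1, 0, 0, 1],
    "human_2":    [1, 1, 0, 1, 1, 0, 1],
    "chicken_1":  [1, 1, 0, 1, 0, 0, 1],
    "chicken_2":  [1, 0, 0, 1, 1, 0, 1],
    "wildbird_1": [0, 0, 1, 0, 1, 1, 0],
    "wildbird_2": [0, 1, 1, 0, 1, 1, 0],
}


def jaccard_distance(a, b):
    """1 - (genes present in both) / (genes present in at least one)."""
    both = sum(1 for x, y in zip(a, b) if x and y)
    either = sum(1 for x, y in zip(a, b) if x or y)
    # Two profiles with no genes at all are treated as identical.
    return 0.0 if either == 0 else 1.0 - both / either


def mean_distance_to_humans(source_prefix):
    """Average Jaccard distance between human isolates and one reservoir."""
    humans = [p for name, p in isolates.items() if name.startswith("human")]
    others = [p for name, p in isolates.items() if name.startswith(source_prefix)]
    dists = [jaccard_distance(h, o) for h in humans for o in others]
    return sum(dists) / len(dists)


chicken_mean = mean_distance_to_humans("chicken")
wildbird_mean = mean_distance_to_humans("wildbird")
print("mean distance human-chicken:  ", round(chicken_mean, 3))
print("mean distance human-wild bird:", round(wildbird_mean, 3))
```

A smaller mean distance only says that the profiles look more alike; on its own it does not show which direction transmission went.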

I did not get it. I have already generated a dendrogram. However, what do you mean by the model? People advised a regression model. What do you think?

Well, an ordinary linear regression probably won't work. For that you need two variables that are related in some manner (i.e. one depends on the other). Your samples are independent of each other because they come from different environments, so absolute gene expression levels (mRNA copy number) might be correlated with the environment they were sampled from rather than with the infectious process. You might be able to do a basic statistical test like a chi-squared to answer the question "what is the probability of having these two similar expression profiles by random chance?"
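As a rough illustration of the chi-squared idea, the sketch below (Python, standard library only) tests whether the presence of a single gene is associated with the source of the isolate, using a 2x2 table. The counts are made up, `chi_squared_2x2` is a hypothetical helper name, and the test is the plain Pearson statistic without continuity correction. With small expected counts an exact test would probably be the safer choice.

```python
import math


def chi_squared_2x2(a, b, c, d):
    """Pearson chi-squared test (no continuity correction) for the table
    [[a, b], [c, d]]. Returns (statistic, p_value) with 1 degree of freedom.
    Assumes every row and column total is non-zero."""
    n = a + b + c + d
    row1, row2 = a + b, c + d
    col1, col2 = a + c, b + d
    observed = [a, b, c, d]
    expected = [row1 * col1 / n, row1 * col2 / n,
                row2 * col1 / n, row2 * col2 / n]
    stat = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # Upper tail of a chi-squared distribution with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(stat / 2.0))
    return stat, p_value


# Hypothetical counts for one gene (invented, not from this thread):
#               gene present   gene absent
# human              30            10
# wild bird           8            22
stat, p_value = chi_squared_2x2(30, 10, 8, 22)
print("chi-squared =", round(stat, 2), "p =", p_value)
```

A small p-value here would suggest that the gene is distributed differently between the two sources. Repeating the test for all 7 genes would need a correction for multiple comparisons.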

However, there are many other forms of regression. You might need a multivariate regression, but it really depends on what types of data you have and what question you are trying to answer. You may want to look at epidemiological modeling or other forms of statistical testing.

I suggest you talk to a statistician; regression may not be the answer you are looking for!