
Probability Models versus Statistical Models - Two Types of Mathematical Cancer/Tumor Models (Aug/05/2011)

Most readers are probably familiar with Biostatistics, which is essentially a Statistical field built on "Statistical Models". Yet even such models have some important relationships to Probability:

1) Statistical Models in Biostatistics use Probability almost entirely, except in some Nonparametric approximations or models. They focus largely, however, on questions of sampling and experiments, which are by their nature finite or "countably infinite" (enumerable as 1, 2, 3, ...). Thus, discrete models are common in Biostatistics, as exemplified by the Binomial, Geometric, Hypergeometric, Poisson, and Bernoulli distributions.

Probability Models differ from Statistical Models in that the former tend to be much less concerned with the process of sampling or experiments, and more with the Probabilistic, Mathematical, and Physical-Biological aspects of the Model itself. Those aspects are often Continuous rather than Discrete, which roughly means that their population graphs are usually continuously connected curves rather than a bunch of dots or rectangles/boxes. Examples of Probability Model distributions are given below:

2) Probability Model distributions include the Normal, Student's t, and F distributions, the uniform distribution (on the nonzero part of its domain/range), the Gamma distribution with its subtypes the Chi-Square(d) and Exponential distributions, etc. These are often used in Statistical Models as well, just as the discrete types in (1) above are. As mentioned, the interest of Probability Modeling lies mostly in the nature of the distributions from mathematical, physical, and biological viewpoints: their appropriateness, their relationships to other distributions, causation or correlation, and prediction by means other than sampling or experimental methods.
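To make the discrete/continuous contrast in (1) and (2) concrete, here is a minimal Python sketch; the distributions and parameters (Binomial(10, 0.5), standard Normal) are my own illustrative choices, not from the post:

```python
import math

# A Binomial pmf assigns probability to isolated points 0, 1, ..., n
# (a "bunch of dots"), while a Normal pdf is defined at every real x
# (a continuously connected curve).

def binomial_pmf(k, n=10, p=0.5):
    # P(K = k) for K ~ Binomial(n, p); defined only at integer k in 0..n.
    return math.comb(n, k) * p**k * (1 - p)**(n - k)

def normal_pdf(x, mu=0.0, sigma=1.0):
    # Density of Normal(mu, sigma^2); defined for every real x.
    return math.exp(-((x - mu) ** 2) / (2 * sigma**2)) / (sigma * math.sqrt(2 * math.pi))

print([round(binomial_pmf(k), 3) for k in range(11)])       # the dots
print([round(normal_pdf(x / 2), 3) for x in range(-6, 7)])  # samples of a curve
```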

I will try to expand on these two types later. Readers who want a head start on the motivations for this topic (a difference in models that has largely gone unformalized in the past) might explore Bayesian Models as discussed in Probability rather than Statistics textbooks and journal papers/articles.

Osher Doctorow

-OsherDoctorow-

Let A, B be "random events", let P(A) be the probability of A, and let P(AB) be the probability of "A and B", or technically of the intersection of A and B. Then Bayesian Models are based on the notion of Conditional Probability:

1) The Conditional Probability of B given A, or more briefly the Probability of B given A, symbolically P(B|A), is defined by P(B|A) = P(AB)/P(A) if P(A) is not 0, and is undefined if P(A) is 0, where / means "divide by" in the sense of ordinary division of real numbers. The symbol | is merely a symbol for the English word "given" and has no other direct meaning.
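As a minimal sketch of Definition (1) in Python, assuming a fair six-sided die as a toy example (my choice, not from the post):

```python
from fractions import Fraction

# A = "the roll is even", B = "the roll is at least 4".
outcomes = range(1, 7)
A = {w for w in outcomes if w % 2 == 0}         # {2, 4, 6}
B = {w for w in outcomes if w >= 4}             # {4, 5, 6}

def P(event):
    # Each outcome has probability 1/6 under the uniform model.
    return Fraction(len(event), 6)

P_A = P(A)                                      # 1/2
P_AB = P(A & B)                                 # P({4, 6}) = 1/3
P_B_given_A = P_AB / P_A if P_A != 0 else None  # Definition (1): 2/3
print(P_A, P_AB, P_B_given_A)
```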

An interesting second type of Probability Model, other than Bayesian Models, is the Probable Causation/Influence (PI) Model that my late wife Marleen and I worked on and published several peer-reviewed papers on. Here, instead of P(B|A), the symbol used in PI is P(A-->B). But in this case, A-->B actually has a definition itself, namely:

2) A-->B = (AB')' = A' U B, where ' means "not" or "complement of", so that A' means "not A" roughly speaking, or more precisely the complement of A (the part of the universe outside A).

It is relatively easy to show, from the ordinary laws of Probability, that since P((AB')') = 1 - P(AB') and P(AB') = P(A) - P(AB), we have:

3) P(A-->B) = 1 + P(AB) - P(A)

Comparing (1) and (3), we note that (3) replaces division by subtraction on the right-hand side and adds 1. Yet the properties of P(B|A) and P(A-->B), the latter roughly speaking meaning the Probability that A influences or causes B, are extremely different, except for one curious property that readers can prove by algebra:

4) P(B|A) <= P(A-->B) if P(A) is not 0, where <= means "is less than or equal to".
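Here is a small Python check of (2), (3), and (4) on a toy finite universe; the particular events A and B are hypothetical examples of mine:

```python
from fractions import Fraction

# A toy universe of six equally likely outcomes.
U = set(range(1, 7))
A = {2, 4, 6}
B = {4, 5, 6}

def P(event):
    return Fraction(len(event), len(U))

# Definition (2): A-->B = (AB')' = A' U B.
A_implies_B = (U - A) | B

# Equation (3): P(A-->B) = 1 + P(AB) - P(A).
assert P(A_implies_B) == 1 + P(A & B) - P(A)

# Inequality (4): P(B|A) <= P(A-->B) whenever P(A) is not 0.
if P(A) != 0:
    assert P(A & B) / P(A) <= P(A_implies_B)

print(P(A_implies_B))  # 5/6, versus P(B|A) = 2/3
```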

Readers might try to derive the corresponding equations for continuous random variables X and Y, using the clues that A = {w: X(w) <= x}, B = {w: Y(w) <= y} (more precisely these should be labelled A_x and B_y, but this way saves time and space). Note that in this notation, AB = {w: X <= x, Y <= y}, where the comma (,) means "and".

Osher Doctorow

-OsherDoctorow-

From Osher Doctorow

I suggested that readers find formulas or equations for continuous random variables X, Y analogous to those for the events A, B in the previous equations. Here I will give those equations, starting with some preliminary definitions:

1) The symbol FX(x) is used to mean the (marginal) cumulative distribution function (cdf) of X, namely, for a continuous random variable X: FX(x) = P(X <= x), where x is some real number.

2) The symbol F(x, y) is the bivariate cdf of X and Y (see (1) above), that is: F(x, y) = P(X <= x, Y <= y).

Here then are the equations:

3) P(X-->Y) = P(A-->B) for A = {w: X(w) <= x} and B = {w: Y(w) <= y}, and P(A-->B) = 1 + F(x, y) - FX(x).

4) The conditional cdf of Y given X, written F(Y|X)(y|x) or FY|X(y|x), equals F(x, y)/FX(x) if FX(x) is not 0.

Note the perfect analogy between these equations and the earlier equations for P(B|A) versus P(A-->B).
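A sketch of (3) and (4) for continuous variables, assuming X and Y are independent Uniform(0, 1) so the cdfs have simple closed forms (this choice of distribution is mine, for illustration only):

```python
# For X, Y independent Uniform(0, 1): FX(x) = x on [0, 1] and
# F(x, y) = FX(x) * FX(y).

def F_X(t):
    # Marginal cdf of Uniform(0, 1): P(X <= t).
    return min(max(t, 0.0), 1.0)

def F(x, y):
    # Bivariate cdf under independence: P(X <= x, Y <= y).
    return F_X(x) * F_X(y)

def P_X_influences_Y(x, y):
    # Equation (3): P(X-->Y) = 1 + F(x, y) - FX(x).
    return 1 + F(x, y) - F_X(x)

def F_Y_given_X(y, x):
    # Equation (4): F(y|x) = F(x, y)/FX(x), provided FX(x) is not 0.
    return F(x, y) / F_X(x) if F_X(x) != 0 else None

x, y = 0.5, 0.25
print(P_X_influences_Y(x, y))  # 1 + 0.125 - 0.5 = 0.625
print(F_Y_given_X(y, x))       # 0.125 / 0.5 = 0.25, and 0.25 <= 0.625 as in (4)
```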

Osher Doctorow

-OsherDoctorow-

The equations are:

1) P(B|A) = P(AB)/P(A) if P(A) is not 0.
2) P(A-->B) = 1 + P(AB) - P(A) for all random events A, B.
3) P(Y <= y | X <= x) = F(x, y)/FX(x) if FX(x) is not 0, where F(x, y) = P(X <= x, Y <= y) and FX(x) = P(X <= x).
4) P(X-->Y) ("X influences Y") = 1 + F(x, y) - FX(x), where F(x, y) and FX(x) are as defined in (3).

Now how can we illustrate the difference between the two Probability Models, (1) and (2), or in other words (1) and (3) versus (2) and (4)? The first is Conditional Probability, used in Bayesian Statistics (look it up online, for example); the second is Probable Causation/Influence (PI). Since Bayesian Statistics has many examples online and in textbooks and journal papers, I'll focus on PI here.

Let's begin by considering Correlation. We have:

5) In Conditional Probability (e.g., Bayesian Statistics), the Correlation between two random variables X, Y is their Covariance divided by the product of their standard deviations (look up the terms online if you don't know them). There is no "Correlation between random Events A, B".

6) In PI (Probable Causation/Influence), there are Correlations both between two random variables X, Y and between two random events A, B. Let A iff B represent "A influences B and B influences A", where "A influences B" is A' U B, that is to say the complement of A, "and/or" B. Then it is easy to show that A iff B = AB U A'B', and also that P(A iff B) = P(AB) + P(A'B'). We define the PROBABLE CORRELATION of A and B in PI to be the expression P(A iff B). It is then easy to show that the Probable Correlation of random variables X and Y is the expression P(X iff Y) = F(x, y) + R(x, y), where R(x, y) is the Reliability P(X > x, Y > y) for continuous random variables X, Y.
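A small Python check of the identities in (6), reusing the toy die universe from the earlier sketch (again my own hypothetical events, not from the post):

```python
from fractions import Fraction

U = set(range(1, 7))
A = {2, 4, 6}
B = {4, 5, 6}

def P(event):
    return Fraction(len(event), len(U))

# "A influences B" is A' U B; "A iff B" is both directions at once.
A_to_B = (U - A) | B
B_to_A = (U - B) | A
A_iff_B = A_to_B & B_to_A

# Identity from (6): A iff B = AB U A'B' (they match as sets).
assert A_iff_B == (A & B) | ((U - A) & (U - B))

# Probable Correlation: P(A iff B) = P(AB) + P(A'B'), the parts being disjoint.
assert P(A_iff_B) == P(A & B) + P((U - A) & (U - B))
print(P(A_iff_B))  # 2/3
```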

Next, let us consider cancer or tumors in general. Suppose we regard cancer as a case or type of generalized Entropy or Disorder. The more Entropy a system has, the more Disordered it is. Ordinarily, the Entropy of an isolated or closed system increases WITH TIME. But in PI, generalized Entropy or generalized Disorder can be treated as capable of at least "local decrease" WITH TIME. To do this, define time T as:

7) T(A-->B) = P(A-->B) - P(A iff B), where P(A-->B) = 1 + P(AB) - P(A). In words, the Time that it takes for A to Influence B is the probability that A influences B minus the Probable Correlation of A and B. In other words, it is the Probable Influence of A on B that does not occur SIMULTANEOUSLY for A and B.

We can formulate (7) for continuous Random Variables, and obtain:

8) T(X-->Y) = P(X-->Y) - P(X iff Y), where P(X-->Y) = 1 + F(x, y) - FX(x) and P(X iff Y) = F(x, y) + R(x, y).

Readers can now try an exercise: examine what happens if first P(X-->Y) is constant, say = k, in (8), and then if P(X iff Y) is constant, say = k2, in (8). It will easily be discovered that in one of these two cases, the more Disorder (the less Probable Correlation between X and Y), the less time T(X-->Y) takes, while in the other case, the higher P(X-->Y) for constant Disorder, the more time T(X-->Y) takes. These are partially opposite effects on Entropy, at least locally or arguably in open systems.
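As a starting point for the exercise, here is a minimal tabulation of equation (8) under hypothetical values of P(X-->Y) and P(X iff Y) (the numbers k = 0.9 and k2 = 0.3 are mine); it simply prints T(X-->Y) in each case so readers can read off the directions for themselves:

```python
# Equation (8): T(X-->Y) = P(X-->Y) - P(X iff Y).
def T(p_influence, p_iff):
    return p_influence - p_iff

# Case 1: hold P(X-->Y) fixed at k and vary the Probable Correlation.
k = 0.9
for p_iff in (0.8, 0.6, 0.4, 0.2):
    print(f"P(X-->Y) = {k}, P(X iff Y) = {p_iff}: T = {T(k, p_iff):.2f}")

# Case 2: hold P(X iff Y) fixed at k2 and vary the Probable Influence.
k2 = 0.3
for p_influence in (0.4, 0.6, 0.8, 1.0):
    print(f"P(X-->Y) = {p_influence}, P(X iff Y) = {k2}: T = {T(p_influence, k2):.2f}")
```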

The conclusion is that if Entropy can be formulated in this generalized local or open-system form, then it can theoretically be held fixed or even reversed; and if cancer can be formulated as a type of Disorder/Entropy, then the same applies to it.

Osher Doctorow

-OsherDoctorow-

Cancer is not a localised decrease or increase in entropy, any more than other dividing cell types are; therefore your logic does not hold true.

-bob1-