
Probability question - DNA sequence occurrence (Sep/14/2010)

I want to know the equation for deciding whether the occurrence of a nucleotide sequence within a genome/library is significant. Let me set up an example:

We know that a six base pair sequence such as AGAATA is expected to occur once in every 4096 nucleotides < 4^6 = 4096, i.e. a probability of (1/4)^6 = 1/4096 per position >, correct?
Let's say we have a DNA library that is 50,000 base pairs in total (a small library of 50 clones, to keep the numbers small). Should it occur roughly 12 times < 50000/4096 = 12.2 > with 100% probability?

1. What is the probability that it would occur twice? How about 5 times?

2. What is the probability it would occur once in a single clone from this library if the clone is 500 bp long? What about twice?

3. And finally, let's say AGAATA occurs 3 times in one 500 bp clone. Is this significant? What is the equation? Basically, I want to be able to say, "This six base pair sequence has an X percent probability of occurring within the clone, and a Y percent probability of occurring in the whole library." Thanks.

-relapse71-

Wikipedia can help. Try probability and Bayes' Theorem: http://yudkowsky.net/rational/bayes
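
For what it's worth, here is a rough sketch of how the three questions could be put into numbers. It is not a definitive method and rests on assumptions that are not in the original post: all four bases are equally likely and independent, only one strand is searched, the library is treated as one contiguous 50,000 bp sequence, and hits at different positions are treated as independent. Under those assumptions the hit count is approximately binomial, and for a rare motif close to Poisson with mean lambda = (L - k + 1) / 4^k. The function names below are made up for illustration.

```python
from math import comb, exp, factorial

MOTIF = "AGAATA"
K = len(MOTIF)

# Assumption: each base is A, C, G or T with probability 1/4, independently.
P_MATCH = 0.25 ** K  # chance that a given start position matches: 1/4096


def n_windows(length_bp, k=K):
    """Number of start positions where a k-mer fits in a sequence."""
    return max(length_bp - k + 1, 0)


def binom_pmf(x, n, p):
    """P(exactly x hits) among n positions, treating positions as independent."""
    return comb(n, x) * p**x * (1 - p) ** (n - x)


def poisson_pmf(x, lam):
    """Poisson approximation to P(exactly x hits) with mean lam."""
    return exp(-lam) * lam**x / factorial(x)


def poisson_tail(x, lam):
    """P(at least x hits) under the Poisson approximation."""
    return 1.0 - sum(poisson_pmf(i, lam) for i in range(x))


# Expected number of hits (lambda) for the whole library and for one clone.
lam_library = n_windows(50_000) * P_MATCH
lam_clone = n_windows(500) * P_MATCH

# Question 1: exactly 2 or exactly 5 hits in the whole library.
for x in (2, 5):
    print(f"library, exactly {x}: "
          f"binomial={binom_pmf(x, n_windows(50_000), P_MATCH):.3g}, "
          f"poisson={poisson_pmf(x, lam_library):.3g}")

# Question 2: exactly 1 or exactly 2 hits in one 500 bp clone.
for x in (1, 2):
    print(f"clone, exactly {x}: poisson={poisson_pmf(x, lam_clone):.3g}")

# Question 3: 3 or more hits in one 500 bp clone (a one-sided p-value).
print(f"clone, at least 3: {poisson_tail(3, lam_clone):.3g}")
```

With these assumptions lambda is about 12.2 for the library and about 0.12 for a 500 bp clone, so 12 is only the average count, not something that happens with 100% probability. Three or more hits in a single 500 bp clone comes out on the order of 10^-4, but that figure only applies to one motif and one clone chosen in advance; real base composition and looking at many clones or many motifs would change it.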




(Quoting relapse71's question above, posted Tue Sep 14 21:29:21 2010.)

-perlmunky-