
Statistics


3 replies to this topic

#1 relapse71


Posted 14 September 2010 - 02:49 PM

I want to know the equation for deciding whether a nucleotide sequence within a genome/library is significant. Let me set up an example:

We know that a six-base-pair sequence such as AGAATA is expected to occur about once in every 4096 nucleotides [ (1/4)^6 = 1/4096 ], correct?
Let's say we have a DNA library that is 50,000 base pairs in total (a small library of 50 clones, to keep the numbers small). It should then occur roughly 12 times [ 50000/4096 ≈ 12.2 ], with 100% probability?

1. What is the probability that it would occur twice? How about 5 times?

2. What is the probability it would occur once in a single clone from this library if the clone is 500 bp long? What about twice?

3. And finally, let's say AGAATA occurs 3 times in one 500 bp clone. Is this significant? What is the equation? Basically, I want to be able to say, "This six-base-pair sequence has an X-percent probability of occurring within the clone, and a Y-percent probability of occurring in the whole library." Thanks.

#2 HomeBrew


Posted 14 September 2010 - 06:49 PM

Aren't you assuming that all base pairs are equally represented (i.e. that your genome is 25% A, 25% T, 25% C, and 25% G)? If your genome were GC rich, wouldn't you expect this AT-rich sequence to appear less often by chance than it would in an AT-rich genome?
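
To make the base-composition point concrete, here is a minimal Python sketch. It assumes each position is drawn independently with fixed base frequencies, and the GC-rich frequencies are made-up illustrative values, not data from any real genome. The function names are hypothetical. The expected count uses the number of start positions (length - k + 1) on one strand of a single continuous sequence, which is close to, but not identical to, the 50000/4096 shortcut in the first post.

```python
def kmer_probability(kmer, base_freqs):
    """Probability that a given start position matches kmer,
    assuming independent positions with the given base frequencies."""
    p = 1.0
    for base in kmer:
        p *= base_freqs[base]
    return p


def expected_count(kmer, seq_length, base_freqs):
    """Expected number of occurrences on one strand of a sequence
    of seq_length bases (one window per possible start position)."""
    windows = seq_length - len(kmer) + 1
    return windows * kmer_probability(kmer, base_freqs)


# Equal base usage, as assumed in the original question.
uniform = {"A": 0.25, "C": 0.25, "G": 0.25, "T": 0.25}
# Hypothetical GC-rich composition (70% G+C), for illustration only.
gc_rich = {"A": 0.15, "C": 0.35, "G": 0.35, "T": 0.15}

for label, freqs in (("uniform", uniform), ("GC-rich", gc_rich)):
    p = kmer_probability("AGAATA", freqs)
    n = expected_count("AGAATA", 50_000, freqs)
    print(f"{label}: per-position probability {p:.3e}, expected count {n:.2f}")
```

Under these assumptions the AT-rich motif AGAATA has a lower per-position probability in the GC-rich case than in the uniform case, so fewer chance occurrences are expected.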

#3 relapse71


Posted 14 September 2010 - 07:41 PM

HomeBrew said: "Aren't you assuming that all base pairs are equally represented (i.e. that your genome is 25% A, 25% T, 25% C, and 25% G)? If your genome were GC rich, wouldn't you expect this AT-rich sequence to appear less often by chance than it would in an AT-rich genome?"



Let's assume it's 25% each. And I can replace this sequence with any sequence. I'm just looking on how to do the math.
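
One common way to do this math, sketched here under stated assumptions rather than as a definitive answer, is to treat every start position as an independent trial with success probability p = (1/4)^6. The number of occurrences is then binomial, and for small p it is well approximated by a Poisson distribution with mean n*p. The independence assumption is only approximate because overlapping windows share bases, and the sketch counts one strand only and treats the library as a single continuous 50,000 bp sequence. The function names are hypothetical.

```python
from math import comb, exp, factorial


def binom_pmf(k, n, p):
    """P(exactly k occurrences) in n independent windows."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def poisson_pmf(k, lam):
    """Poisson approximation to the same probability, with mean lam = n * p."""
    return exp(-lam) * lam**k / factorial(k)


def prob_at_least(k, n, p):
    """P(k or more occurrences) = 1 - P(fewer than k)."""
    return 1.0 - sum(binom_pmf(i, n, p) for i in range(k))


p = 0.25 ** 6                  # chance a given window reads AGAATA
library_n = 50_000 - 6 + 1     # windows in the whole library (assumed contiguous)
clone_n = 500 - 6 + 1          # windows in one 500 bp clone

# Question 1: exactly 2 or exactly 5 occurrences in the library.
for k in (2, 5):
    print(f"library, exactly {k}: {binom_pmf(k, library_n, p):.4g}")

# Question 2: exactly 1 or exactly 2 occurrences in one clone.
for k in (1, 2):
    print(f"clone, exactly {k}: binomial {binom_pmf(k, clone_n, p):.4g}, "
          f"Poisson {poisson_pmf(k, clone_n * p):.4g}")

# Question 3: 3 or more occurrences in one clone (a one-sided tail probability).
print(f"clone, 3 or more: {prob_at_least(3, clone_n, p):.4g}")
```

The tail probability from `prob_at_least` is what usually gets reported as a p-value for "3 times in one clone". If many clones are screened, the chance that at least one of them shows 3 or more hits is larger than the single-clone value, so some multiple-testing correction would normally be applied.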

#4 HomeBrew


Posted 15 September 2010 - 02:49 AM

I think you want to use a chi-square test for goodness of fit. You would calculate the chi-square statistic by computing (observed frequency - expected frequency)^2 / expected frequency for each category and summing the results.
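
As a rough illustration of that calculation, the sketch below applies it to the "3 times in one 500 bp clone" example using two categories: windows that match AGAATA and windows that do not. The two-category setup, the equal-base-frequency expectation, and the function names are assumptions made for this example. With two categories there is 1 degree of freedom, and for that case the p-value can be written with the complementary error function as erfc(sqrt(statistic / 2)).

```python
from math import erfc, sqrt


def chi_square_statistic(observed, expected):
    """Sum of (observed - expected)^2 / expected over all categories."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


def chi_square_pvalue_df1(stat):
    """Upper-tail p-value for a chi-square statistic with 1 degree of freedom."""
    return erfc(sqrt(stat / 2.0))


windows = 500 - 6 + 1          # possible start positions in a 500 bp clone
p = 0.25 ** 6                  # assumed chance that a window reads AGAATA

observed = [3, windows - 3]                    # matching, non-matching windows
expected = [windows * p, windows * (1 - p)]    # counts expected by chance

stat = chi_square_statistic(observed, expected)
print(f"chi-square statistic: {stat:.2f}")
print(f"approximate p-value (1 df): {chi_square_pvalue_df1(stat):.3g}")
```

A caveat: the chi-square approximation is generally considered unreliable when an expected count is small (a common rule of thumb is below 5), and here the expected number of matches in one clone is well under 1. For counts this small, an exact binomial or Poisson tail probability is usually the safer choice.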





©1999-2013 Protocol Online, All rights reserved.