# Frequency of restriction enzyme sites in a genome - (Mar/18/2011 )

Hi all

I want to check, for one particular enzyme, how many sites there are on average in the human or mouse genome. Is there any tool available for that?

Well, you can usually calculate it yourself: take the length of the genome and divide by the average distance between cut sites. For example, EcoRI cuts at a 6 bp site, so the chance of a site at any given position is 1 in 4^6, and it cuts every 4096 bp on average, as its sequence will occur at random every 4096 bp.

This does mean that in a genome of several billion bp, such as the human genome, it will cut hundreds of thousands of times.
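
The back-of-the-envelope calculation above can be sketched in a few lines of Python. This is only an illustration: the function names are made up for this example, the 3 billion bp genome size is an assumed round figure, and the model treats all four bases as equally likely and independent.

```python
def expected_spacing(site_length_bp):
    """Average distance between sites if all four bases are equally likely."""
    return 4 ** site_length_bp


def expected_sites(genome_length_bp, site_length_bp):
    """Naive estimate: genome length divided by the average spacing."""
    return genome_length_bp / expected_spacing(site_length_bp)


# Assumed round figure for a human-sized genome (not an exact assembly length).
GENOME_LENGTH_BP = 3_000_000_000

print(expected_spacing(6))                         # 4096
print(round(expected_sites(GENOME_LENGTH_BP, 6)))  # 732422
```

Swapping in a different genome length or site length (for example 4 or 8 bp) gives the corresponding naive estimate.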

philman on Fri Mar 18 14:31:55 2011 said:

Well, you can usually calculate it yourself: take the length of the genome and divide by the average distance between cut sites. For example, EcoRI cuts at a 6 bp site, so the chance of a site at any given position is 1 in 4^6, and it cuts every 4096 bp on average, as its sequence will occur at random every 4096 bp.

This does mean that in a genome of several billion bp, such as the human genome, it will cut hundreds of thousands of times.

708,007 times, to be exact (that is 2.9 billion bp / 4096, so still an estimate)

I don't think that's a valid method, because some enzymes cut more frequently than others even if they have the same recognition sequence length. For example, EcoRI cuts 3 times more frequently than MspI.

As a general rule that method is roughly correct, though the real number does vary from enzyme to enzyme. You can use it to estimate the general number of cut sites. If you want to know exactly, then you will have to get the genome sequence and run it through a program like EnzymeX or NEBcutter.
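
For a rough idea of what "running the sequence through a program" amounts to, here is a minimal Python sketch that counts occurrences of a recognition site in a sequence string. It is not how EnzymeX or NEBcutter work internally; `count_sites` and the toy sequence are hypothetical, and the sketch assumes a palindromic site such as GAATTC, so scanning one strand is enough.

```python
def count_sites(sequence, site):
    """Count occurrences of `site` in `sequence`, overlapping ones included."""
    sequence = sequence.upper()
    site = site.upper()
    count = 0
    pos = sequence.find(site)
    while pos != -1:
        count += 1
        # Resume one base further on so overlapping matches are not skipped.
        pos = sequence.find(site, pos + 1)
    return count


# Toy sequence for illustration only; a real genome would be read from a file.
toy_sequence = "ttGAATTCacgtGAATTCggGAATTgaattc"
print(count_sites(toy_sequence, "GAATTC"))  # 3
```

For a whole genome the same loop would be run per chromosome and the counts summed. A non-palindromic site would also need its reverse complement counted.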

You can get a much more accurate estimate if you take into account the probability of GC and AT pairs independently. If the GC content of the organism is (say) 70%, and the recognition site of the enzyme is GAATTC (EcoRI site), then the probability of its presence will be (.35)(.15)(.15)(.15)(.15)(.35) = 6.2e-5, since the probability of G is half of the probability of GC, and the probability of A is half of the probability of AT.

Instead of the naively calculated 4096 bp between sites on average, the 70% GC content version will have an expected distance of 1/6.2e-5 = 16,129 bp.
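
The GC-adjusted estimate can be written as a small Python function. This is a sketch under the same assumptions as the post: each base is drawn independently, G and C each have probability GC/2, and A and T each have probability AT/2. The function name is made up for this example.

```python
def site_probability(site, gc_content):
    """Probability of `site` at a given position, assuming independent bases."""
    p_g_or_c = gc_content / 2        # probability of G, and likewise of C
    p_a_or_t = (1 - gc_content) / 2  # probability of A, and likewise of T
    prob = 1.0
    for base in site.upper():
        if base in "GC":
            prob *= p_g_or_c
        elif base in "AT":
            prob *= p_a_or_t
        else:
            raise ValueError(f"unexpected base: {base!r}")
    return prob


p = site_probability("GAATTC", 0.70)
print(p)      # roughly 6.2e-5
print(1 / p)  # expected spacing in bp
```

With the unrounded probability the spacing comes out near 16,125 bp; the 16,129 figure above comes from rounding the probability to 6.2e-5 first. At 50% GC the function reduces to the naive 1/4096.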

This still won't be exact, since it doesn't account for dinucleotide (digraph) frequencies or for special genome sequences such as repeats.