Restriction sites frequencies in mouse genome - (Aug/28/2006 )

Hello everybody,

I'm a new member here. I'm the person in charge of bioinformatics and bioanalysis in Lyon.

I'm trying to calculate how often each of at least 200 restriction sites occurs in the mouse genome. I've already tried BLAST and BLAST-like tools, without success (because of too many matches, I think). I would rather use an existing solution than develop one myself. Does anyone know of a solution other than loading the mouse genome onto my local server and running a local application on it?
Can someone help me with this? I'd be very grateful.

Thanks a lot for your help,
Regards,
Benoît.

-Benoit Varvenne-

I would have thought that some of this data would be available already, but I was unable to find it. I was also unsuccessful at finding a stand-alone or web-based program that might do this for you on a genome scale.

I think we could write one fairly easily in Perl, if you want to go that route...

-HomeBrew-

Try the NEB (New England Biolabs) catalogue or their website; there is certainly data on such things for the human genome, and I would guess for the humble mouse as well.

-methylnick-

Thanks for your answers. In fact, I think the only solution is to write a script (a small one, at least to begin with).
Moreover, I think we're going to run several additional analyses of the distribution of restriction sites across the genome.
NEB doesn't provide such a service, and none of the other resources are usable on a whole-genome scale.

Regards

-_bioinx6996-

We could write a Perl script to do this. See, for example, the articles here and here. These techniques would allow you to search the genome for restriction enzyme sites without loading the entire genome sequence into memory.

Given an array of restriction enzyme recognition sites (as Perl regular expressions, perhaps, or maybe we could use a Rebase file), it becomes a pattern-matching problem. We could iterate over the array of sites and increment a counter, perhaps in a hash, as each pattern is found. If you need positions as well, we could return the offset of each match with pos()...
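
As a rough illustration of the counting idea above, here is a minimal sketch. It is written in Python rather than the Perl discussed in this thread, a dictionary stands in for the hash, and match.start() stands in for pos(). The function names, the SITES table and the IUPAC lookup are assumptions made for the example, not part of any existing script. It also assumes the whole sequence is already in memory, so it only suits small test sequences.

```python
import re
from collections import Counter

# IUPAC nucleotide codes expanded to regex character classes, so that
# degenerate recognition sites such as GTYRAC can be searched.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]", "B": "[CGT]", "D": "[AGT]",
    "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}

def site_to_regex(site):
    """Compile a recognition site into a regex (hypothetical helper)."""
    body = "".join(IUPAC[base] for base in site.upper())
    # A zero-width lookahead lets overlapping occurrences be reported too.
    return re.compile("(?=" + body + ")")

def count_sites(sequence, sites):
    """Return (counts, positions) for every site in the `sites` dict."""
    seq = sequence.upper()
    counts = Counter()
    positions = {}
    for name, site in sites.items():
        hits = [m.start() for m in site_to_regex(site).finditer(seq)]
        counts[name] = len(hits)   # number of occurrences
        positions[name] = hits     # 0-based offsets of each occurrence
    return counts, positions

# Example table; a real run would load the ~200 sites from a file.
SITES = {"EcoRI": "GAATTC", "HindIII": "AAGCTT", "HincII": "GTYRAC"}

if __name__ == "__main__":
    counts, positions = count_sites("GAATTCAAGCTTGTCGACGAATTC", SITES)
    for name in SITES:
        print(name, counts[name], positions[name])
```

The three example sites are palindromic, so scanning one strand is enough. An asymmetric recognition site would also need its reverse complement searched.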

-HomeBrew-

Hello,

Thanks, these articles are interesting.
Do you think we could do this easily (I mean with little programming time) using the Ensembl genome available on their server, without downloading it locally?

Regards

-_bioinx6996-

It would be infinitely easier if we just had a local copy of the genome as a FASTA file...

Why do you want to avoid this?

-HomeBrew-

I agree with you, but the aim is to perform (later, but soon, I suppose) many other, much more complicated analyses. I mean that they'll probably ask for comparisons between the restriction sites found and the genome properties in those regions, and lots of things like that. I can't do those analyses with a FASTA file alone, because it lacks the features I need.
I don't have to do such things now, but I'm sure they'll ask me later, so ...

-_bioinx6996-

Let me dig through the site a bit and see what's available...

-HomeBrew-

Hi all,

For those interested, I've found a solution to this problem:
- find, with a Perl script, occurrences of the patterns in genome files downloaded locally (thanks, HomeBrew, for your links to an efficient script)
- link the hits to the features available in the Ensembl database using the Ensembl Perl API, which seems powerful (so thanks again, HomeBrew, I've found the way to do so)
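
For readers who want to try the first step, here is a minimal sketch of scanning a local FASTA file line by line, so that the genome is never held in memory at once. It is in Python rather than Perl and is not the script referred to above. The function name and the file name in the usage note are hypothetical, and the sketch assumes plain A/C/G/T recognition sites with no degenerate bases.

```python
import re
from collections import Counter

def count_sites_streaming(fasta_lines, sites):
    """Count site occurrences in an iterable of FASTA lines.

    `sites` maps a name to a plain A/C/G/T recognition sequence.
    Only the last (longest site - 1) bases are carried between lines.
    """
    patterns = {
        name: (re.compile("(?=" + re.escape(site.upper()) + ")"), len(site))
        for name, site in sites.items()
    }
    keep = max(len(site) for site in sites.values()) - 1
    counts = Counter()
    tail = ""  # bases carried over from the lines already read
    for line in fasta_lines:
        line = line.strip()
        if line.startswith(">"):
            tail = ""  # new record: never match across two sequences
            continue
        chunk = tail + line.upper()  # upper() also handles soft-masked bases
        for name, (pattern, length) in patterns.items():
            for match in pattern.finditer(chunk):
                # Matches lying entirely inside the carried-over tail were
                # counted on an earlier line, so count only those reaching
                # into the newly read bases.
                if match.start() + length > len(tail):
                    counts[name] += 1
        tail = chunk[-keep:] if keep else ""
    return counts

if __name__ == "__main__":
    demo = [">chr_demo", "AAGAAT", "TCGGAA", "TTC", ">chr_other", "GAATTC"]
    print(count_sites_streaming(demo, {"EcoRI": "GAATTC", "MspI": "CCGG"}))
```

An open file handle can be passed directly as `fasta_lines`, for example `with open("mouse_chr1.fa") as handle:` followed by `count_sites_streaming(handle, sites)`, where the file name is a placeholder.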

Thanks for your answers, advice and interest.

-_bioinx6996-
