Protocol Online Forum Archives: Bioinformatics and Biostatistics

Deterministic indexing of DNA - What do you want to search for? (Jul/11/2005)

I've figured out a way of building highly optimized inverted (deterministic) indexes on genomic data, but since I'm a programmer and not a bioinformatician, I'm not really sure how to use this system to develop tools that are useful to researchers. My inverted indexes are approximately 4 times the size of the original sequence, which I believe is pretty good.
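The post does not describe how the index is built. As a rough illustration of what an inverted (deterministic) index over DNA can look like, here is a minimal Python sketch, assuming a plain k-mer index: every length-k substring maps to the list of positions where it starts, and an exact-match query looks up the pattern's first k-mer and verifies each candidate. The names `build_kmer_index` and `exact_search` and the parameter `k` are hypothetical and are not part of the poster's system.

```python
from collections import defaultdict


def build_kmer_index(seq: str, k: int) -> dict[str, list[int]]:
    """Hypothetical sketch of an inverted index over DNA (a plain k-mer index,
    not the poster's actual method): map every k-mer (length-k substring) of
    seq to the ascending list of 0-based positions where it starts.
    Each position of seq is recorded exactly once."""
    index = defaultdict(list)
    for i in range(len(seq) - k + 1):
        index[seq[i:i + k]].append(i)
    return dict(index)


def exact_search(seq: str, index: dict[str, list[int]], k: int,
                 pattern: str) -> list[int]:
    """Return every start position of pattern in seq (pattern length >= k).

    Any occurrence of pattern necessarily starts with pattern[:k], so the
    index lookup yields a complete candidate list; verifying each candidate
    against seq removes false positives."""
    if len(pattern) < k:
        raise ValueError("pattern must be at least k bases long")
    candidates = index.get(pattern[:k], [])
    return [p for p in candidates if seq.startswith(pattern, p)]


# Usage on a toy sequence (not real genomic data).
genome = "ACGTACGTTGCAACGTAC"
idx = build_kmer_index(genome, 4)
print(exact_search(genome, idx, 4, "ACGTAC"))  # [0, 12]
```

Because every position is recorded exactly once, the position lists alone take 4 bytes per indexed base in a compact layout using one 32-bit offset per position, or about 4 times a one-byte-per-base sequence file. That is one way a ratio like the one quoted above can arise; the actual layout of the poster's index is not stated.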

I'm not trying to compete with BLAST here, but to offer a different technology that is complementary to statistical indexing systems such as BLAST.

So, if anyone has a sequence retrieval problem that requires high-precision searching, I may be able to help.

If you'd like to play with the system as it stands today, you can do so at www.bioinfosci.com.

Any ideas would be appreciated.

-Patrick_Wenzel-

Are you looking for a primary application for your techniques?

-cyberpostdoc-

I guess I am. I've already implemented an exact-match search and a non-gapped alignment search. Then I read about the problem of finding PCR primers, so I added a search that looks for unique sequences in a genome.
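The unique-sequence search for PCR primers can be sketched in the same spirit. This minimal Python version assumes an uppercase A/C/G/T sequence and a hypothetical primer length `k`. It counts every k-mer and keeps the positions whose k-mer occurs exactly once across both strands (the k-mer itself plus its reverse complement), since a primer can anneal to either strand. The function names are illustrative only and do not describe the poster's implementation.

```python
from collections import Counter

# Maps each base to its Watson-Crick complement (uppercase A/C/G/T assumed).
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(s: str) -> str:
    """Return the reverse complement of a DNA string."""
    return s.translate(_COMPLEMENT)[::-1]


def unique_kmers(seq: str, k: int) -> list[tuple[int, str]]:
    """Hypothetical primer-candidate screen: return (position, k-mer) pairs
    whose k-mer occurs exactly once in seq when both strands are counted.

    A reverse-palindromic k-mer (equal to its own reverse complement) is
    counted twice and therefore never reported."""
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    result = []
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        # Occurrences on the given strand plus those on the opposite strand.
        if counts[kmer] + counts[reverse_complement(kmer)] == 1:
            result.append((i, kmer))
    return result


print(unique_kmers("AAAACCCC", 3))  # [(2, 'AAC'), (3, 'ACC')]
```

This screens only for exact uniqueness; practical primer design also weighs factors such as melting temperature, GC content and near-matches with a few mismatches, none of which this sketch considers.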

I don't understand the issues in bioinformatics well enough to know what to do next.

I got into this because I was told it was impossible to build inverted indexes on genomic data. That wasn't true, of course; the real problem was that the indexes grew so large that they became unmanageable. I think I have come up with a pretty good solution to the indexing problem. Now I just need to find bioinformatics applications where deterministic indexes perform better than statistical indexing systems, i.e. BLAST and its variants.

I'm sure there are such problems; I just don't know what they are.
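One property an exact index can offer that heuristic seeding does not guarantee is completeness. For a non-gapped search allowing at most m mismatches, the pigeonhole principle says that if the pattern is cut into m + 1 disjoint pieces, any true hit must match at least one piece exactly; seeding on every piece and then verifying the candidates therefore cannot miss a hit. The Python sketch below illustrates this standard filter on a hypothetical k-mer index. It is not presented as the poster's algorithm, and the names `build_kmer_index`, `hamming` and `ungapped_search` are illustrative.

```python
from collections import defaultdict


def build_kmer_index(seq: str, k: int) -> dict[str, list[int]]:
    """Map every k-mer of seq to the list of positions where it starts."""
    index = defaultdict(list)
    for i in range(len(seq) - k + 1):
        index[seq[i:i + k]].append(i)
    return dict(index)


def hamming(a: str, b: str) -> int:
    """Number of mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def ungapped_search(seq: str, index: dict[str, list[int]], k: int,
                    pattern: str, max_mismatches: int) -> list[int]:
    """Hypothetical sketch of a pigeonhole filter (not the poster's method).

    Return every start position where pattern aligns to seq without gaps
    and with at most max_mismatches substitutions. The pattern is cut into
    max_mismatches + 1 disjoint pieces covering all of it. A hit with
    <= max_mismatches mismatches leaves at least one piece error-free, so
    that piece's first k bases appear in the index at the right offset;
    seeding on every piece is therefore lossless."""
    m = len(pattern)
    n_pieces = max_mismatches + 1
    piece_len = m // n_pieces  # the last piece also absorbs the remainder
    if piece_len < k:
        raise ValueError("pattern too short for this k and mismatch budget")
    hits = set()
    for p in range(n_pieces):
        offset = p * piece_len
        seed = pattern[offset:offset + k]  # first k bases of piece p
        for pos in index.get(seed, []):
            start = pos - offset  # where the whole pattern would begin
            if (0 <= start <= len(seq) - m
                    and hamming(seq[start:start + m], pattern) <= max_mismatches):
                hits.add(start)
    return sorted(hits)


genome = "ACGTACGTTGCAACGTAC"
idx = build_kmer_index(genome, 2)
print(ungapped_search(genome, idx, 2, "ACGAAC", 1))  # [0, 12]
```

The trade-off is that more allowed mismatches mean shorter pieces and more candidates to verify, but the result never depends on a probabilistic seed choice.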

-Patrick_Wenzel-