DNA database compression - (Oct/14/2007 )

Hi,
Does anyone know how to convert a DNA sequence FASTA file to a binary file? Instead of handling the sequence as a string in which every character takes 1 byte, I want to pack every 4 characters into 1 byte.

-ya7an-

Hi,
I have worked with FASTA files and have never heard of that. Do you know if it's possible? I don't think so, but I'll look into it and let you know if I find anything.

-Cardoso-

QUOTE (Cardoso @ Oct 18 2007, 04:19 PM)
Hi,
I have worked with FASTA files and have never heard of that. Do you know if it's possible? I don't think so, but I'll look into it and let you know if I find anything.


Hi,

Thank you for replying. I found that BLAST uses this technique to minimize memory usage: the BLAST FTP site offers the BLAST databases in two formats, one in FASTA and the other in .nsq, which is binary.
At the moment I am developing a program in C that does this by converting each letter into two bits, for example A to 00, T to 01, C to 10 and G to 11.
I can give you more details about database compression if you want.
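
To make the idea concrete, here is a minimal sketch of the packing step in C. It is only an illustration, not the finished program and not the actual BLAST .nsq layout. It assumes the A=00, T=01, C=10, G=11 mapping mentioned above; the function names (base_to_code, pack_dna, unpack_dna) and the choice to store the first base in the two highest bits of each byte are made up for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Map one base to its 2-bit code, using the mapping from this post
   (A=00, T=01, C=10, G=11). Returns -1 for anything else, e.g. N. */
static int base_to_code(char base)
{
    switch (base) {
    case 'A': case 'a': return 0;
    case 'T': case 't': return 1;
    case 'C': case 'c': return 2;
    case 'G': case 'g': return 3;
    default:            return -1;
    }
}

/* Pack n bases from seq into out, four bases per byte, with the first
   base of each group in the two highest bits (an arbitrary choice).
   out must hold at least (n + 3) / 4 bytes; unused bits stay 0.
   Returns the number of bytes written, or 0 if n is 0 or an
   unsupported character is found. */
size_t pack_dna(const char *seq, size_t n, uint8_t *out)
{
    assert(seq != NULL && out != NULL);
    size_t nbytes = (n + 3) / 4;
    for (size_t i = 0; i < nbytes; i++)
        out[i] = 0;
    for (size_t i = 0; i < n; i++) {
        int code = base_to_code(seq[i]);
        if (code < 0)
            return 0;
        out[i / 4] |= (uint8_t)(code << (6 - 2 * (i % 4)));
    }
    return nbytes;
}

/* Reverse of pack_dna: write n bases plus a terminating '\0' into seq,
   which must hold at least n + 1 chars. */
void unpack_dna(const uint8_t *packed, size_t n, char *seq)
{
    static const char code_to_base[4] = { 'A', 'T', 'C', 'G' };
    assert(packed != NULL && seq != NULL);
    for (size_t i = 0; i < n; i++) {
        unsigned code = (packed[i / 4] >> (6 - 2 * (i % 4))) & 3u;
        seq[i] = code_to_base[code];
    }
    seq[n] = '\0';
}
```

For example, pack_dna("ATCG", 4, buf) writes the single byte 0x1B (binary 00 01 10 11). Two things this sketch leaves out: the sequence length has to be stored separately, because the last byte is padded with zero bits when the length is not a multiple of 4, and the FASTA header lines (those starting with '>') and line breaks have to be stripped before packing. Characters other than A, C, G and T, such as N, do not fit in two bits and would need separate handling.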

-ya7an-