Problem with Solexa read assembly (Feb/20/2011)

Hi,

I am trying to sequence a bacterial genome using Solexa reads. The run gave me 8,053,769 reads.

I used the Geneious software to assemble them. When I ran a de novo assembly, it gave me 381 contigs.

However, I am not sure how to proceed from here.

Since I am an amateur in this field, I am not sure what I need to do next.

Please help.

-roshanbernard-

I'm no specialist in bioinformatics at all, but as far as I know de novo assembly with Illumina reads can be hard.
Whether it succeeds depends very much on the technology you used (paired-end or not) and on the coverage you have. If there is no reference genome, I would prefer to use 454 technology to create a scaffold first. Because Illumina reads are short, you will have a hard time with all the repetitive elements in your genome (tRNAs, IS elements, REP sequences).

I don't know which assembler the Geneious software suite uses, but the choice of assembler also has a great influence on the result :)

Maybe you can get in contact with some bioinformaticians who can help you with this. Finishing a genome is not easy, and it is a waste of time and money if it is not done correctly.
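
To put a rough number on "what coverage do you have": expected coverage is simply (number of reads x read length) / genome size. Here is a minimal Python sketch of that arithmetic. The read count comes from the first post, but the 36 bp read length and the 5 Mb genome size are placeholder assumptions only, since neither was given in this thread, so swap in your real values.

```python
import math


def estimated_coverage(num_reads, read_length, genome_size):
    """Average depth of coverage: total sequenced bases / genome length."""
    return num_reads * read_length / genome_size


def reads_for_target_coverage(target_coverage, read_length, genome_size):
    """Number of reads needed to reach a given average coverage."""
    return math.ceil(target_coverage * genome_size / read_length)


if __name__ == "__main__":
    num_reads = 8_053_769    # from the original post
    read_length = 36         # ASSUMPTION: not stated in the thread
    genome_size = 5_000_000  # ASSUMPTION: placeholder bacterial genome size

    cov = estimated_coverage(num_reads, read_length, genome_size)
    print(f"Estimated coverage: {cov:.1f}x")
```

This only gives the average depth. It says nothing about how evenly the reads are spread, and it does not help with repeats that are longer than the reads.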

If you really have 381 contigs, you will have a lot of gaps to close by Sanger sequencing!
Try to get some help from specialists!

Regards,
p

-pDNA-

I'm not familiar with Illumina either, but we are starting on a 454 Jr. The rep who showed us how to do the data analysis recommended randomly selecting fewer reads (especially when you have as many as 8 million). For one particular sample, she found the number of reads that worked best to give a single contig. Unfortunately, she said there is no magic number; you need to play around with the number of reads you use. I'd try with 1 million first, see if the number of contigs decreases, and go from there. But again, this was advice for 454. I'm not sure it applies to Solexa.
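
If it helps, randomly picking a fixed number of reads can be sketched in a few lines of Python using reservoir sampling, which avoids loading all 8 million reads into memory at once. This is only an illustration, not what Geneious or the 454 software does internally. It assumes plain 4-line FASTQ records (no wrapped sequence lines), and the function names and file names are made up for the example.

```python
import random
from itertools import islice


def read_fastq_records(handle):
    """Yield FASTQ records as 4-line tuples (header, sequence, plus, quality).

    ASSUMPTION: every record is exactly 4 lines (no line-wrapped sequences).
    """
    lines = iter(handle)
    while True:
        record = tuple(islice(lines, 4))
        if not record:
            return
        if len(record) < 4:
            raise ValueError("truncated FASTQ record")
        yield record


def subsample_reads(records, k, seed=0):
    """Reservoir sampling: keep k records chosen uniformly at random."""
    rng = random.Random(seed)
    reservoir = []
    for i, record in enumerate(records):
        if i < k:
            reservoir.append(record)
        else:
            # Replace an existing entry with probability k / (i + 1).
            j = rng.randint(0, i)
            if j < k:
                reservoir[j] = record
    return reservoir


def subsample_file(in_path, out_path, k, seed=0):
    """Write k randomly chosen reads from in_path to out_path."""
    with open(in_path) as fin:
        chosen = subsample_reads(read_fastq_records(fin), k, seed)
    with open(out_path, "w") as fout:
        for record in chosen:
            fout.writelines(record)
```

For example, `subsample_file("reads.fastq", "reads_1M.fastq", 1_000_000)` would write about 1 million reads to a new file (both file names are hypothetical), which you could then assemble and compare against the full run. Fixing the seed makes the selection repeatable, so you can try several read counts and compare the contig numbers.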

-Maddie-

Thank you for the advice. As you suggested, I will try trimming the reads and using different assemblers, and see what the outcome is.

-roshanbernard-