Question on blasting sequence with gaps - (Jun/30/2015 )

Hi, I'm not sure if this is the best place for this question, do point me to the right board if needed. ^^

 

I have some sequences obtained from combining 2 fragments from separate sequencing runs of the same gene. The sequence will contain the primer sequence in the middle, something like:

 

ATCGATCGATCGATCGPPPPPPPATCGATCGATCGATCGATC

 

where P is the primer sequence, which is also the point where the two fragments join seamlessly. We tried designing primers that would produce overlapping fragments, but none of them worked well during sequencing.

 

Normally, we would remove the primer sequence and combine the overlapping fragments before using them in BLAST. If we remove the primers from the above sequence, there will be a gap between the two fragments.

 

I tried inserting NNNN in place of the primer sequence and then running BLAST, and found that the % identity decreases by 1% compared to when the primer sequence is intact.
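For reference, the N substitution described above can be sketched in Python. This is only an illustration: the function name `mask_primer` is made up for this example, and it assumes the primer occurs exactly as written in the joined read (no mismatches, forward orientation). Using one N per primer base keeps the two fragments at their original spacing.

```python
def mask_primer(sequence: str, primer: str, mask_char: str = "N") -> str:
    """Replace the first exact occurrence of `primer` with mask characters.

    Hypothetical helper for illustration; assumes an exact, forward-strand match.
    """
    start = sequence.find(primer)
    if start == -1:
        raise ValueError("primer not found in sequence")
    end = start + len(primer)
    # One mask character per primer base, so the flanking fragments keep their spacing.
    return sequence[:start] + mask_char * len(primer) + sequence[end:]


# Placeholder sequence from the post, with P standing in for the primer bases.
joined = "ATCGATCGATCGATCGPPPPPPPATCGATCGATCGATCGATC"
masked = mask_primer(joined, "PPPPPPP")
print(masked)
```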

 

Is there a way to set BLAST to ignore the primer sequence, so that it does not affect the % identity?

 

I hope I have explained clearly. Any insight will be very much appreciated. Thanks!

-Jing2-

To run a BLAST search that ignores a section of the sequence, put chevrons < > around the unwanted part. Using your example, you would enter:

 

ATCGATCGATCGATCG<PPPPPPP>ATCGATCGATCGATCGATC

 

This may help with the search efficiency of BLAST in some cases. However, in your primer example it will yield a % identity equivalent to replacing the primer sequence with NNNNN. So, out of curiosity, why is % identity important in your case?
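If the aim is a % identity that leaves out the masked primer positions, one option is to recompute it from the aligned query and subject strings after the search. The sketch below is an assumption-laden illustration, not BLAST's own calculation: the function name `identity_excluding_masked` is hypothetical, the two strings are assumed to be the same length (as in a pairwise alignment), and gap columns are counted as mismatches.

```python
def identity_excluding_masked(query_aln: str, subject_aln: str, mask_char: str = "N") -> float:
    """Percent identity over alignment columns where the query is not masked.

    Hypothetical helper; not the formula BLAST itself reports.
    """
    if len(query_aln) != len(subject_aln):
        raise ValueError("aligned strings must have equal length")
    compared = 0
    matches = 0
    for q, s in zip(query_aln.upper(), subject_aln.upper()):
        if q == mask_char:
            continue  # skip columns covered by the masked primer
        compared += 1
        if q == s and q != "-":
            matches += 1
    if compared == 0:
        raise ValueError("no unmasked columns to compare")
    return 100.0 * matches / compared


# Made-up toy alignment: 4 masked columns, one mismatch in the last column.
query = "ATCGNNNNATCG"
subject = "ATCGGGTTATCA"
pct = identity_excluding_masked(query, subject)
print(f"{pct:.1f}% identity over unmasked columns")
```

In this toy case 7 of the 8 unmasked columns match, whereas counting the masked columns as mismatches would give 7 of 12.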

 

Hope this helps.

M :)

-Micro-

Thanks for the suggestion. I will try it out and see how it goes. We are doing 16S sequencing for a certain group of bacteria, and would normally report the % identity and E-value for the results.

 

Any more suggestions would be welcome.

-Jing2-