
Sequence alignment: one large sequence (e.g. 1200 bp) and many small (e.g. 30 bp - (May/25/2011 )

I would like to align a large sequence with many small ones (ASCII schematic below). MegAlign tried to align all the sequences to each other, but I only want the small ones aligned to the large one. I could align them one at a time, but that's a lot messier than it needs to be.

This is the sort of thing I'm trying to achieve:


---------------------------------------------------------------------------------------------------------------------------------------
-------------------
----------------------
----------------------------


If LaserGene's MegAlign can't do it, can some other software?

-seanspotatobusiness-

Sequencher, if you happen to know someone who has it, or you could try the freeware program ClustalX, or alternatively interfaces that use ClustalW, e.g. BioEdit or MEGA. In any of these programs you will need to reduce the "gap opening penalty" and/or the "gap extension penalty" to help it create the large gaps needed for this alignment.

Otherwise, if this doesn't work (and assuming you don't have hundreds of fragments to align), you could select your long sequence and one of the short sequences, align them, and then repeat for each short sequence.
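For anyone who would rather script the one-at-a-time approach than click through it, here is a rough Python sketch. It is not what MegAlign, ClustalW or Sequencher do internally. It is a plain "fitting" (semi-global) alignment in which the whole short sequence must be aligned but the unused ends of the long sequence cost nothing. The function names, the scoring values (match +1, mismatch -1, gap -2) and the example sequences are all made up for illustration.

```python
def fit_align(ref, read, match=1, mismatch=-1, gap=-2):
    """Fit the whole read into the best-scoring window of ref.

    Returns (score, start, aligned_ref, aligned_read); start is the 0-based
    position in ref where the aligned window begins. Scoring values are
    illustrative assumptions, not defaults of any real program.
    """
    n, m = len(ref), len(read)
    # score[i][j]: best score for read[:i] against a window of ref ending at j.
    # Row 0 stays at 0, so skipping the start of ref is free.
    score = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        score[i][0] = i * gap
        for j in range(1, n + 1):
            sub = match if read[i - 1] == ref[j - 1] else mismatch
            score[i][j] = max(score[i - 1][j - 1] + sub,   # align two bases
                              score[i - 1][j] + gap,       # read base vs gap
                              score[i][j - 1] + gap)       # ref base vs gap
    # Skipping the end of ref is free too: take the best cell in the last row.
    end = max(range(n + 1), key=lambda j: score[m][j])
    best = score[m][end]

    # Trace back from (m, end) until the whole read has been consumed.
    i, j = m, end
    a_ref, a_read = [], []
    while i > 0:
        sub = match if j > 0 and read[i - 1] == ref[j - 1] else mismatch
        if j > 0 and score[i][j] == score[i - 1][j - 1] + sub:
            a_ref.append(ref[j - 1]); a_read.append(read[i - 1])
            i -= 1; j -= 1
        elif score[i][j] == score[i - 1][j] + gap:
            a_ref.append("-"); a_read.append(read[i - 1])
            i -= 1
        else:
            a_ref.append(ref[j - 1]); a_read.append("-")
            j -= 1
    return best, j, "".join(reversed(a_ref)), "".join(reversed(a_read))


def layout(ref, reads):
    """Return the long sequence with each short one indented to its position.

    Simplification: a read that carries an insertion relative to ref will
    not line up column-for-column after the inserted base.
    """
    lines = [ref]
    for read in reads:
        _, start, _, aligned_read = fit_align(ref, read)
        lines.append(" " * start + aligned_read)
    return "\n".join(lines)


# Made-up example data: one long sequence and three short ones
# (the last short sequence contains a single mismatch).
REF = "ACGTTGCATGCCGATAGCTAGGCTAACGGT"
READS = ["GCATGCC", "TAGCTAGG", "GGCTTACG"]

if __name__ == "__main__":
    print(layout(REF, READS))
```

Each short sequence is aligned only against the long one, never against the other short ones, which is the behaviour asked for in the original question. The dynamic programming table has (short length + 1) x (long length + 1) cells, so a 1200 bp sequence and a handful of 30 bp fragments is no problem, though for thousands of fragments a dedicated mapping tool would be the better choice.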

Cheers
M :)

-Micro-

This is a typical situation in genome assembly, where you have a reference sequence and many smaller sequences to be aligned against it. Multiple Sequence Alignment (MSA) tools do not behave well in a case like this because they assume that all the sequences are mutually related, which is not the case here. There are many specialist tools for this. To begin with, I would try the est2genome program in the EMBOSS suite. You can find online services providing this tool.

-Hamish McWilliam-

You can also try making what is known as a contig. I think Lasergene has a tool for that.

-bob1-