
Looking for a method to filter out data from related BLAST results


2 replies to this topic

#1 dreugene



  • Members
  • 2 posts

Posted 15 June 2012 - 07:57 AM

I am a new member on this forum and I like what I have seen so far. I am currently running a miRNA search as part of my undergraduate honors thesis. I am analyzing expressed sequence tags (ESTs) from a fern species and searching them (blastn) against the miRNA database on miRBase (both mature and hairpin sequences). To identify which ESTs are non-coding, I am also searching (blastx) against the plant database on UniProt. I have completed the initial searches, which have yielded quite a bit of data from only about 5000 ESTs. I now have to filter out all the protein-coding ESTs (as evidenced by the blastx results) from my blastn results, to end up with only non-protein-coding ESTs and their respective mature/hairpin alignments.

Does anyone have a streamlined method for comparing two datasets? I realize that I can manually go through each blastn hit and check whether the corresponding blastx result is significant, but that would be tedious, time-consuming, and error-prone.

Eager to hear some solutions!

#2 Felipillo



  • Active Members
  • 55 posts

Posted 20 June 2012 - 07:15 PM

Try a BLAST parser; Galaxy has a nice one. What about doing a phylogenetic analysis from your BLAST results, so that you can then compare the trees?
Chance favors the prepared mind
Louis Pasteur.

#3 dreugene



  • Members
  • 2 posts

Posted 20 June 2012 - 07:22 PM


Using the Galaxy (https://main.g2.bx.psu.edu/) tools, I can run "Convert delimiters to TAB" on the columns of interest to remove white space, and then use the "Compare two Datasets" tool (under Join, Subtract and Group) to find common or distinct rows and display only the non-matching alignments. That is an okay method.
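For anyone who would rather script it, the two Galaxy steps described above (normalize the delimiters to tabs, then subtract one dataset from the other on the query ID) can be sketched roughly as below. This is only a minimal sketch and not Galaxy's own implementation: the function names, the made-up example rows, and the assumption that the query ID sits in the first column are all illustrative.

```python
import re

def to_tab_delimited(lines):
    """Collapse runs of whitespace into single tabs and drop blank lines."""
    return [re.sub(r"\s+", "\t", line.strip()) for line in lines if line.strip()]

def subtract_by_first_column(keep_from, remove_ids_from):
    """Return rows of keep_from whose first column is absent from remove_ids_from."""
    ids_to_remove = {row.split("\t")[0] for row in remove_ids_from}
    return [row for row in keep_from if row.split("\t")[0] not in ids_to_remove]

# Hypothetical rows: query ID, subject ID, % identity (assumed layout)
blastn_rows = to_tab_delimited([
    "EST001  miR156a  100.0",
    "EST002  miR166b  95.2",
    "EST003  miR172c  90.5",
])
blastx_rows = to_tab_delimited([
    "EST002  P12345  88.0",
])

# blastn hits whose EST has no blastx hit at all
non_coding = subtract_by_first_column(blastn_rows, blastx_rows)
```

Note that this subtracts every EST that has any blastx hit; filtering the blastx rows by significance first would be a separate step.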

The method I went with is the Excel LOOKUP formula.

For example:

Copy the query ID column from the blastx results with %identity > 85 into column AA of the blastn results sheet. Then create a new column B and enter =LOOKUP(A2,$AA$1:$AA$50000) in B2. Copy the formula down column B, alongside every row of column A, until the last A row. Hope that made sense. Once the formula finishes, copy columns A and B and paste them as values into AB to enable further sorting of the data.

This formula yielded better results than the Galaxy compare tool because everything stays in Excel, with no need to export to delimited .txt files.
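The same filter can also be sketched as a short script. One caveat with the spreadsheet route: Excel's LOOKUP does an approximate match and expects the lookup range to be sorted in ascending order, so an exact membership test may be safer. The sketch below assumes both result files are in tab-delimited BLAST output with the query ID in column 1 and % identity in column 3; the sample rows, function names, and in-memory "files" are hypothetical stand-ins, and the 85 cutoff is the one mentioned in the post.

```python
import csv
import io

IDENTITY_CUTOFF = 85.0  # % identity threshold taken from the post

def coding_query_ids(blastx_handle, cutoff=IDENTITY_CUTOFF):
    """Collect query IDs that have a blastx hit above the identity cutoff."""
    ids = set()
    for row in csv.reader(blastx_handle, delimiter="\t"):
        if row and float(row[2]) > cutoff:
            ids.add(row[0])
    return ids

def non_coding_hits(blastn_handle, coding_ids):
    """Keep blastn rows whose query ID is not in the protein-coding set."""
    return [row for row in csv.reader(blastn_handle, delimiter="\t")
            if row and row[0] not in coding_ids]

# Made-up data standing in for the real result files (assumed column layout)
blastx_text = "EST002\tP12345\t88.0\nEST003\tQ67890\t60.0\n"
blastn_text = ("EST001\tmiR156a\t100.0\n"
               "EST002\tmiR166b\t95.2\n"
               "EST003\tmiR172c\t90.5\n")

coding = coding_query_ids(io.StringIO(blastx_text))
kept = non_coding_hits(io.StringIO(blastn_text), coding)
```

With real data, the `io.StringIO(...)` objects would be replaced by files opened with `open(path, newline="")`. An E-value cutoff may be a better significance test than % identity alone, but that is a choice for the analysis rather than something this sketch decides.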

Edited by dreugene, 20 June 2012 - 07:23 PM.


©1999-2013 Protocol Online, All rights reserved.