Blastpgp query: pssm/scoremat generation - (Sep/13/2006 )

Hi, everybody.....

I have been trying to use blastpgp to generate a profile for the database file that I have.

I am having trouble figuring out the difference between the output of the -C option, which apparently is a "PSSM with parameters" file in ASN.1 format, and something called a "scoremat". If they differ, how is the scoremat in ASN.1 format generated?
Or are they one and the same?

Also, my blastpgp run seems to go on indefinitely with 5 iterations (-j 5). How can I bring down the profile generation time?

Here is the command line that I use:

blastpgp -i <seqfile> -d <dbfile> -j 5 -C <seqfile.pssm> -J T -u 1

Thanks a lot in advance.

-kamesh-

QUOTE (kamesh @ Sep 13 2006, 04:04 AM)

I would do this instead:
blastpgp -i <input.txt> -d <database.txt> -j 3 -h 0.001 -e 10.0 -F T -Q <pssm.txt>

Here the database should be the NR (non-redundant) database. Reduce the iterations: most things should converge after 3. Set the -e expectation to get rid of junk, and it should be quicker... I am in the process of running some 40,000 jobs.

-perlmunky-
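
For anyone scripting many runs like the 40,000 jobs mentioned above, here is a minimal sketch of how the suggested command line could be assembled in Python. The function name and its parameter names are hypothetical; only the flags come from the post (-h is the E-value threshold for including hits in the profile, -F T turns on query filtering). It only builds the argument list and does not run blastpgp.

```python
import shlex


def build_blastpgp_cmd(query, database, pssm_out, iterations=3,
                       inclusion_evalue=0.001, evalue=10.0):
    """Return the blastpgp argument list suggested in the reply above.

    Hypothetical helper: names are illustrative, flags are from the post.
    """
    return [
        "blastpgp",
        "-i", query,                   # query sequence file
        "-d", database,                # database to search
        "-j", str(iterations),         # number of PSI-BLAST iterations
        "-h", str(inclusion_evalue),   # E-value threshold for profile inclusion
        "-e", str(evalue),             # expectation value cutoff
        "-F", "T",                     # filter the query sequence
        "-Q", pssm_out,                # readable (ASCII) PSSM output
    ]


if __name__ == "__main__":
    cmd = build_blastpgp_cmd("input.txt", "database.txt", "pssm.txt")
    # Print a shell-quoted version for inspection instead of executing it.
    print(shlex.join(cmd))
```

Keeping the command as a list (rather than one string) makes it straightforward to hand to subprocess.run later without worrying about quoting of file names.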

QUOTE (perlmunky @ Sep 13 2006, 08:03 AM)

Thanks, perlmunky.

I need the scoremat that I generate with blastpgp for subsequent operations with formatrpsdb and rpsblast.

But is there a difference between a scoremat object and a "PSSM with parameters" file? Which of these should I give to formatrpsdb?

As I understand it, the -C option is what produces the "PSSM with parameters" file in ASN.1 format (the checkpoint file).
The -Q output (readable PSSM) probably cannot be used for the subsequent formatrpsdb/rpsblast steps (is that right?).
The -u 1 option supposedly produces a scoremat object in ASCII.

Any ideas as to which options I should use on my command line to get a scoremat object for the formatrpsdb and rpsblast steps?

-kamesh-

Sadly, I have no idea. Given that RPS-BLAST is designed around the PSI-BLAST data, I would guess that it will take the standard PSSM; the only way to find out is to try.

Sorry, I can't be of any help.

-perlmunky-

Hi folks, here is a more detailed description of my problem.

I have been using BLAST version 2.2.14 on Red Hat Linux. While I have
not had problems with the other BLAST programs, I have run into some
problems with blastpgp.

I have a query file with about 1500 protein sequences in FASTA format. I
ran blastpgp with this query file against a database of about
17,000 protein sequences (FASTA).

The command line that I use (as I said earlier) is:

blastpgp -i <query.file> -d <database.file> -j 5 -C <query.pssm> -u 1 -J T

The search seemed to run indefinitely. When I used the -Q option instead, as perlmunky suggested, it
finished fairly quickly. (I don't understand how this could happen. Could it be a bug in version 2.2.14?)

blastpgp -i <query.file> -d <database.file> -j 5 -Q <query.pssm> -u 1 -J T

1. I have been using the -C and -u options together to generate a "PSSM with
parameters" file in ASN.1 format for the subsequent formatrpsdb/rpsblast steps. Hence
the -Q option is of no use to me, as those steps need the ASN.1 format.

2. What command-line options do you suggest for generating a
"PSSM with parameters" file in ASN.1 format for the subsequent steps while also bringing down the run time?

3. As I understand it, the newer versions of blastpgp can process multiple
queries. Is that right, or is there a limit to the query file size?

Any help on these is welcome.

-string-
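
One way to sidestep the open question of how blastpgp handles a 1500-sequence query file together with -C is to run one query per invocation, so that each sequence gets its own checkpoint file. Below is a minimal Python sketch of that idea. It is an assumption, not something confirmed in this thread, that one file per query is what the later formatrpsdb step wants or that it fixes the run time. The function names and the query_NNNN naming scheme are hypothetical; the blastpgp flags are the ones from the post above.

```python
from pathlib import Path


def split_fasta(text):
    """Split multi-FASTA text into a list of single-record strings."""
    records, current = [], []
    for line in text.splitlines():
        # A new header line closes the record collected so far.
        if line.startswith(">") and current:
            records.append("\n".join(current) + "\n")
            current = []
        if line.strip():  # skip blank lines
            current.append(line)
    if current:
        records.append("\n".join(current) + "\n")
    return records


def write_queries_and_commands(fasta_text, database, out_dir):
    """Write one FASTA file per query and return one blastpgp command each."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    commands = []
    for n, record in enumerate(split_fasta(fasta_text), start=1):
        query = out_dir / f"query_{n:04d}.fasta"  # hypothetical naming scheme
        query.write_text(record)
        checkpoint = query.with_suffix(".pssm")   # one checkpoint per query
        commands.append([
            "blastpgp", "-i", str(query), "-d", database,
            "-j", "5", "-C", str(checkpoint), "-u", "1", "-J", "T",
        ])
    return commands
```

Each returned list can be printed for inspection or passed to subprocess.run. Running the queries separately also makes it easy to see which individual sequence, if any, is the one that never finishes.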