How can I get a list of the AA sequences for a given (long) list of accessions? - (Oct/03/2006 )
I have a list of some 500+ SwissProt Accessions of proteins, all are viral proteins.
I need to generate a list (tab- or comma-separated, Excel, whatever) containing the AA sequence of each protein.
For example, given only "P20509", I want an entry in the list like this:
P20509, MDVRCINWFESHGENRFLYLKSRCRNGET FIRFPHYFYYVVTDEIYQSL [...] PTFYEA
Can I do that somehow with Uniprot? Or which Database can do that? It would be great if I can somehow upload my list of accessions and get the actual sequences in return.
Serious help is greatly appreciated, thanks a lot!
Most of the databases have this as part of the search feature (under advanced search in some cases).
If they don't, you could try the MySQL backend or write a program to do it for you.
I suggest looking at the advanced search features.
You can definitely do this in the BioEdit program (freeware); you can then do some alignments if you like.
Select file, retrieve sequences from GenBank, then input the accession or GI numbers and hey presto, your sequences should show up!
hope this helps
500+ is way too many to do by hand, but we could write a Perl script that would retrieve the sequences for you...
We'd just need to read in the SwissProt Accession numbers, then retrieve the sequence (from, for example, http://www.expasy.org/uniprot/P20509.fas), then reformat them and save them in a file.
Actually this shouldn't be too hard; it's similar to what we did here. Give me about 25 of your accession numbers, one per line, and I'll see what I can whip up for you...
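The three steps described above (read the accessions, fetch each FASTA record, reformat and save) could be sketched roughly as follows. This is in Python rather than the Perl mentioned in the thread, and it is only a sketch under assumptions: the URL template points at the current UniProt REST address, which is not the address quoted above and may change, and all function names are made up for illustration.

```python
import csv
import io
import urllib.request

# Assumption: current UniProt REST address. The URL quoted in the thread
# (http://www.expasy.org/uniprot/P20509.fas) dates from 2006 and may no
# longer resolve.
URL_TEMPLATE = "https://rest.uniprot.org/uniprotkb/{acc}.fasta"


def fasta_to_sequence(fasta_text):
    """Drop the '>' header line(s) and join the sequence lines into one string."""
    lines = fasta_text.splitlines()
    return "".join(l.strip() for l in lines if l and not l.startswith(">"))


def http_fetch(acc):
    """Download one FASTA record as text (needs network access)."""
    url = URL_TEMPLATE.format(acc=acc)
    with urllib.request.urlopen(url, timeout=30) as resp:
        return resp.read().decode("utf-8")


def accessions_to_rows(accessions, fetch=http_fetch):
    """Return (accession, sequence) pairs, skipping blank input lines."""
    rows = []
    for raw in accessions:
        acc = raw.strip()
        if not acc:
            continue
        rows.append((acc, fasta_to_sequence(fetch(acc))))
    return rows


def rows_to_csv(rows):
    """Format the pairs as comma-separated text, one protein per line."""
    buf = io.StringIO()
    csv.writer(buf, lineterminator="\n").writerows(rows)
    return buf.getvalue()
```

With a hypothetical file accessions.txt holding one accession per line, something like `print(rows_to_csv(accessions_to_rows(open("accessions.txt"))))` would print the list. For 500+ accessions it is probably polite to pause briefly between requests.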
Thanks for your answers, especially HomeBrew for his great offer of help. I've actually just solved it: I downloaded the entire UniProt FASTA data and wrote a Perl script that looks up each ID and gives me back the sequence.
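The original Perl script was not posted, so the following is only a guess at the general idea, written in Python: scan the downloaded FASTA file once and keep the records whose accession is in the list. It assumes headers of the form `>sp|P20509|NAME_SPECIES ...` (accession in the second `|`-separated field); the function names and file names are hypothetical.

```python
def parse_accession(header):
    """Extract the accession from a FASTA header line.

    Assumes '>db|ACCESSION|NAME ...'; otherwise falls back to the first
    whitespace-separated token after '>'.
    """
    parts = header[1:].split("|")
    if len(parts) >= 3:
        return parts[1]
    fields = parts[0].split()
    return fields[0] if fields else ""


def filter_fasta(lines, wanted):
    """Yield (accession, sequence) for each record whose accession is in `wanted`."""
    acc, chunks = None, []
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            # A new header closes the previous record.
            if acc in wanted:
                yield acc, "".join(chunks)
            acc, chunks = parse_accession(line), []
        elif line:
            chunks.append(line)
    # Do not forget the last record in the file.
    if acc in wanted:
        yield acc, "".join(chunks)
```

Usage might look like this, with uniprot.fasta and accessions.txt standing in for the real files: build `wanted = {l.strip() for l in open("accessions.txt") if l.strip()}`, then loop over `filter_fasta(open("uniprot.fasta"), wanted)` and print each pair as `acc + "," + seq`. Reading the file line by line keeps memory use low even though the full download is large.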
It was definitely something you did not want to do by hand! Glad you got it worked out, gojira...