The Sequence Manipulation Suite:
About
The Sequence Manipulation Suite is written in JavaScript 1.5, a lightweight, cross-platform, object-oriented scripting language. JavaScript is standardized by the ECMA (European Computer Manufacturers Association) in the ECMA-262 specification. The ECMA-262 standard is also approved by the ISO (International Organization for Standardization) as ISO/IEC 16262. JavaScript 1.5 is fully compatible with ECMA-262, Edition 3.

Sequences submitted to the Sequence Manipulation Suite do not leave your computer and are instead manipulated by your web browser, which executes the JavaScript. The Sequence Manipulation Suite was written by Paul Stothard (University of Alberta, Canada). Send questions and comments to stothard@ualberta.ca.

Here are short descriptions of the programs that make up the Sequence Manipulation Suite:

Format Conversion:
  • Combine FASTA - converts multiple FASTA sequence records into a single sequence. Use Combine FASTA, for example, when you wish to determine the codon usage for a collection of sequences using a program that accepts a single sequence as input.
  • EMBL to FASTA - accepts an EMBL file as input and returns the entire DNA sequence in FASTA format. Use this program when you wish to quickly remove all of the non-DNA sequence information from an EMBL file.
  • EMBL Feature Extractor - accepts an EMBL file as input and reads the sequence feature information described in the feature table. The program extracts or highlights the relevant sequence segments and returns each sequence feature in FASTA format. EMBL Feature Extractor is particularly helpful when you wish to derive the sequence of a cDNA from a genomic sequence that contains many introns.
  • EMBL Trans Extractor - accepts an EMBL file as input and returns each of the protein translations described in the file in FASTA format. EMBL Trans Extractor can be used when you are more interested in the predicted protein translations of a DNA sequence than the DNA sequence itself.
  • Filter DNA - removes non-DNA characters from text. Use this program when you wish to remove digits and blank spaces from a sequence to make it suitable for other applications.
  • Filter Protein - removes non-protein characters from text. Use this program when you wish to remove digits and blank spaces from a sequence to make it suitable for other applications.
  • GenBank to FASTA - accepts a GenBank file as input and returns the entire DNA sequence in FASTA format. Use this program when you wish to quickly remove all of the non-DNA sequence information from a GenBank file.
  • GenBank Feature Extractor - accepts a GenBank file as input and reads the sequence feature information described in the feature table, according to the rules outlined in the GenBank release notes. The program extracts or highlights the relevant sequence segments and returns each sequence feature in FASTA format. GenBank Feature Extractor is particularly helpful when you wish to derive the sequence of a cDNA from a genomic sequence that contains many introns.
  • GenBank Trans Extractor - accepts a GenBank file as input and returns each of the protein translations described in the file in FASTA format. GenBank Trans Extractor should be used when you are more interested in the predicted protein translations of a DNA sequence than the DNA sequence itself.
  • One to Three - converts single-letter translations to three-letter translations.
  • Range Extractor DNA - accepts a DNA sequence along with a set of positions or ranges. The bases corresponding to the positions or ranges are returned, either as a single new sequence, a set of FASTA records, uppercase text, or lowercase text. Use Range Extractor DNA to obtain subsequences using position information.
  • Range Extractor Protein - accepts a protein sequence along with a set of positions or ranges. The residues corresponding to the positions or ranges are returned, either as a single new sequence, a set of FASTA records, uppercase text, or lowercase text. Use Range Extractor Protein to obtain subsequences using position information.
  • Reverse Complement - converts a DNA sequence into its reverse, complement, or reverse-complement counterpart. The entire IUPAC DNA alphabet is supported, and the case of each input sequence character is maintained. You may want to work with the reverse-complement of a sequence if it contains an ORF on the reverse strand.
  • Three to One - converts three-letter translations to single-letter translations. Digits and blank spaces are removed automatically. Non-standard triplets are ignored.
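
The Reverse Complement behaviour described above (full IUPAC DNA alphabet, case of each character maintained) can be sketched as follows. This is an illustrative sketch only, not the suite's actual source: the names IUPAC_COMPLEMENT, complementBase, complement, reverse, and reverseComplement are assumptions made for this example, as is the choice to pass unrecognized characters through unchanged.

```javascript
// IUPAC nucleotide codes mapped to their complements (uppercase keys only;
// lowercase input is handled in complementBase so that case is preserved).
const IUPAC_COMPLEMENT = {
  A: "T", T: "A", U: "A", G: "C", C: "G",
  R: "Y", Y: "R", S: "S", W: "W", K: "M", M: "K",
  B: "V", V: "B", D: "H", H: "D", N: "N",
};

// Complement a single character, keeping its original case.
function complementBase(base) {
  const upper = base.toUpperCase();
  const comp = IUPAC_COMPLEMENT[upper];
  if (comp === undefined) {
    return base; // assumption: leave unknown characters (gaps, digits) untouched
  }
  return base === upper ? comp : comp.toLowerCase();
}

// Complement every character without changing the order.
function complement(seq) {
  return Array.from(seq, complementBase).join("");
}

// Reverse the order of the characters without complementing them.
function reverse(seq) {
  return Array.from(seq).reverse().join("");
}

// Complement, then reverse: the sequence of the opposite strand read 5' to 3'.
function reverseComplement(seq) {
  return reverse(complement(seq));
}

console.log(reverseComplement("ATGCcgtaRYN")); // prints "NRYtacgGCAT"
```

The three functions correspond to the three outputs the program offers (reverse, complement, reverse-complement). Because U is mapped to A, an RNA-style input comes back as DNA in this sketch.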
Sequence Analysis:
  • Codon Plot - accepts a DNA sequence and generates a graphical plot consisting of a horizontal bar for each codon. The length of the bar is proportional to the frequency of the codon in the codon frequency table you enter. Use Codon Plot to find portions of DNA sequence that may be poorly expressed, or to view a graphic representation of a codon usage table (by using a DNA sequence consisting of one of each codon type).
  • Codon Usage - accepts one or more DNA sequences and returns the number and frequency of each codon type. Since the program also compares the frequencies of codons that code for the same amino acid (synonymous codons), you can use it to assess whether a sequence shows a preference for particular synonymous codons.
  • DNA Pattern Find - accepts one or more sequences along with a search pattern and returns the number and positions of sites that match the pattern. The search pattern is written as a JavaScript regular expression, which resembles the regular expressions written in other programming languages, such as Perl.
  • DNA Stats - returns the number of occurrences of each residue in the sequence you enter. Percentage totals are also given for each residue, and for certain groups of residues, allowing you to quickly compare the results obtained for different sequences.
  • Fuzzy Search DNA - accepts a DNA sequence along with a query sequence and returns sites that are identical or similar to the query. You can use this program, for example, to find sequences that can be easily mutated into a useful restriction site.
  • Fuzzy Search Protein - accepts a protein sequence along with a query sequence and returns sites that are identical or similar to the query.
  • Ident and Sim - accepts a group of aligned sequences (in FASTA or GDE format) and calculates the identity and similarity of each sequence pair. Identity and similarity values are often used to assess whether or not two sequences share a common ancestor or function.
  • Multi Rev Trans - accepts a protein alignment and uses a codon usage table to generate a graph that can be used to find regions of minimal degeneracy at the nucleotide level. Each column in the protein sequence alignment is converted into three sets of four horizontal bars. Each set represents one position in the codon, and each of the four bars in the set represents one of the four bases. The length of each bar is proportional to the observed frequency of the base in the codon usage table, for the given residues and codon position.
  • ORF Finder - searches for open reading frames (ORFs) in the DNA sequence you enter. The program returns the range of each ORF, along with its protein translation. ORF Finder uses the standard genetic code and supports the entire IUPAC alphabet. Use ORF Finder to search newly sequenced DNA for potential protein encoding segments.
  • Pairwise Align DNA - accepts two DNA sequences and determines the optimal global alignment. Use Pairwise Align DNA to look for conserved sequence regions.
  • Pairwise Align Protein - accepts two protein sequences and determines the optimal global alignment. Use Pairwise Align Protein to look for conserved sequence regions.
  • PCR Products - accepts one or more DNA sequence templates and two primer sequences. The program searches for perfectly matching primer annealing sites that can generate a PCR product. Any resulting products are sorted by size, and they are given a title specifying their length, their position in the original sequence, and the primers that produced them. You can use linear or circular molecules as the template. Use PCR Products to determine the product sizes you can expect to see when you perform PCR in the lab.
  • Protein Isoelectric Point - calculates the theoretical pI (isoelectric point) for the protein sequence you enter. Use Protein Isoelectric Point when you want to know approximately where on a 2-D gel a particular protein will be found.
  • Protein Molecular Weight - accepts a protein sequence and calculates the molecular weight. You can append copies of commonly used epitopes and fusion proteins using the supplied list. Use Protein Molecular Weight when you wish to predict the location of a protein of interest on a gel in relation to a set of protein standards.
  • Protein Pattern Find - accepts one or more sequences along with a search pattern and returns the number and positions of sites that match the pattern. The search pattern is written as a JavaScript regular expression, which resembles the regular expressions written in other programming languages, such as Perl.
  • Protein Stats - returns the number of occurrences of each residue in the sequence you enter. Percentage totals are also given for each residue, and for certain groups of residues, allowing you to quickly compare the results obtained for different sequences.
  • Restriction Digest - cleaves a DNA sequence in a virtual restriction digest, with one, two, or three restriction enzymes. The resulting fragments are sorted by size, and they are given a title specifying their length, their position in the original sequence, and the enzyme sites that produced them. You can digest linear or circular molecules, and even a mixture of molecules (by entering more than one sequence in FASTA format). Use Restriction Digest to determine the fragment sizes you will see when you perform a digest in the lab.
  • Restriction Summary - accepts a DNA sequence and returns the number and positions of commonly used restriction endonuclease cut sites. Use this program if you wish to quickly determine whether or not an enzyme cuts a particular segment of DNA.
  • Reverse Translate - accepts a protein sequence and uses a codon usage table to generate a graph that can be used to find regions of minimal degeneracy at the nucleotide level. Each residue in the protein sequence is converted into three sets of four horizontal bars. Each set represents one position in the codon, and each of the four bars in the set represents one of the four bases. The length of each bar is proportional to the observed frequency of the base in the codon usage table, for the given residue and codon position.
  • Translate - accepts a DNA sequence and converts it into a protein in the reading frame you specify. Translate uses the standard genetic code and supports the entire IUPAC alphabet.
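
As a rough illustration of how a pattern search such as DNA Pattern Find or Protein Pattern Find can be built on a JavaScript regular expression, consider the sketch below. The function name findPattern, the case-insensitive flag, the 1-based coordinates, and the shape of the returned objects are all assumptions made for this example; the suite's own implementation and output format may differ.

```javascript
// Return every non-overlapping match of `pattern` (a regular expression
// source string) in `sequence`, with 1-based start and end positions.
function findPattern(sequence, pattern) {
  const regex = new RegExp(pattern, "gi"); // g: find all matches, i: ignore case
  const hits = [];
  let match;
  while ((match = regex.exec(sequence)) !== null) {
    if (match[0].length === 0) {
      regex.lastIndex++; // skip empty matches so the loop always advances
      continue;
    }
    hits.push({
      start: match.index + 1,
      end: match.index + match[0].length,
      text: match[0],
    });
  }
  return hits;
}

// Search for EcoRI (GAATTC) or BamHI (GGATCC) recognition sites.
const sites = findPattern("ttGAATTCaaGGATCCttgaattc", "gaattc|ggatcc");
for (const site of sites) {
  console.log(site.start + "-" + site.end + " " + site.text);
}
// prints:
// 3-8 GAATTC
// 11-16 GGATCC
// 19-24 gaattc
```

Note that a global regular expression reports non-overlapping matches only, so overlapping sites would need a different search strategy (for example, restarting the search one position after each match start).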
Sequence Figures:
  • Color Align Conservation - accepts a group of aligned sequences (in FASTA or GDE format) and colors the alignment. The program examines each residue and compares it to the other residues in the same column. Residues that are identical among the sequences are given a black background, and those that are similar among the sequences are given a gray background. The remaining residues receive a white background. You can specify the percentage of residues that must be identical and similar for the coloring to be applied. Use Color Align Conservation to enhance the output of sequence alignment programs.
  • Color Align Properties - accepts a group of aligned sequences (in FASTA or GDE format) and colors the alignment. The program examines each residue and compares it to the other residues in the same column. Residues that are identical or similar among the sequences are given a colored background. The color is chosen according to the biochemical properties of the residue. You can specify the percentage of residues that must be identical and similar for the coloring to be applied. Use Color Align Properties to highlight protein regions with conserved biochemical properties.
  • Group DNA - adjusts the spacing of DNA sequences and adds numbering. You can specify the group size (the number of bases per group), as well as the number of bases per line. The output of this program can serve as a convenient reference, since the numbering and spacing allow you to quickly locate specific bases.
  • Group Protein - adjusts the spacing of protein sequences and adds numbering. You can specify the group size (the number of residues per group), as well as the number of residues per line. The output of this program can serve as a convenient reference, since the numbering and spacing allow you to quickly locate specific residues.
  • Primer Map - accepts a DNA sequence and returns a textual map showing the annealing positions of PCR primers. Restriction endonuclease cut sites and the protein translations of the DNA sequence can also be shown. Use this program to produce a useful reference figure, particularly when you have designed a large number of primers for a particular template. Primer Map uses the standard genetic code and supports the entire IUPAC alphabet.
  • Restriction Map - accepts a DNA sequence and returns a textual map showing the positions of restriction endonuclease cut sites. The translation of the DNA sequence is also given, in the reading frame you specify. Use the output of this program as a reference when planning cloning strategies. Restriction Map uses the standard genetic code and supports the entire IUPAC alphabet.
  • Translation Map - accepts a DNA sequence and returns a textual map displaying protein translations. The reading frame of the translation can be specified (1, 2, 3, or all three), or you can choose to treat uppercase text as the reading frame. Translation Map uses the standard genetic code and supports the entire IUPAC alphabet.
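
The grouping and numbering performed by Group DNA and Group Protein can be approximated with a short function. This is a minimal sketch under stated assumptions: the name groupSequence, the default sizes, and the layout (a right-aligned position label at the start of each line) are choices made for this example and are not taken from the suite itself.

```javascript
// Split `seq` into lines of `perLine` characters, split each line into
// space-separated groups of `groupSize`, and prefix each line with the
// 1-based position of its first character.
function groupSequence(seq, groupSize = 10, perLine = 60) {
  const labelWidth = String(seq.length).length; // pad labels to equal width
  const lines = [];
  for (let i = 0; i < seq.length; i += perLine) {
    const chunk = seq.slice(i, i + perLine);
    const groups = [];
    for (let j = 0; j < chunk.length; j += groupSize) {
      groups.push(chunk.slice(j, j + groupSize));
    }
    const label = String(i + 1).padStart(labelWidth, " ");
    lines.push(label + " " + groups.join(" "));
  }
  return lines.join("\n");
}

console.log(groupSequence("ACGTACGTACGTACGTACGTACGT", 4, 12));
// prints:
//  1 ACGT ACGT ACGT
// 13 ACGT ACGT ACGT
```

The same function works for protein sequences, since it treats the input as plain text and makes no assumption about the alphabet.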
Random Sequences:
  • Random DNA Sequence - generates a random sequence of the length you specify. Random sequences can be used to evaluate the significance of sequence analysis results.
  • Random Protein Sequence - generates a random sequence of the length you specify. Random sequences can be used to evaluate the significance of sequence analysis results.
  • Sample DNA - randomly selects bases from the guide sequence until a sequence of the length you specify is constructed. Each selected base is replaced so that it can be selected again.
  • Sample Protein - randomly selects residues from the guide sequence until a sequence of the length you specify is constructed. Each selected residue is replaced so that it can be selected again.
  • Shuffle DNA - randomly shuffles a DNA sequence. Shuffled sequences can be used to evaluate the significance of sequence analysis results, particularly when sequence composition is an important consideration.
  • Shuffle Protein - randomly shuffles a protein sequence. Shuffled sequences can be used to evaluate the significance of sequence analysis results, particularly when sequence composition is an important consideration.
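
The difference between sampling and shuffling can be made concrete with a small sketch. Sampling draws with replacement, so the result only approximates the composition of the guide sequence; shuffling permutes the input, so the composition is preserved exactly. The function names, the optional random parameter (included so a deterministic generator can be supplied for testing), and the use of the Fisher-Yates algorithm for shuffling are assumptions made for this example, not a description of the suite's source.

```javascript
// Build a sequence of `length` characters by drawing from `guide`
// with replacement: every draw sees the full guide sequence.
function sampleWithReplacement(guide, length, random = Math.random) {
  if (guide.length === 0 && length > 0) {
    throw new Error("guide sequence must not be empty");
  }
  let result = "";
  for (let i = 0; i < length; i++) {
    result += guide.charAt(Math.floor(random() * guide.length));
  }
  return result;
}

// Fisher-Yates shuffle: returns a random permutation of `seq`,
// so base or residue composition is exactly preserved.
function shuffleSequence(seq, random = Math.random) {
  const chars = Array.from(seq);
  for (let i = chars.length - 1; i > 0; i--) {
    const j = Math.floor(random() * (i + 1)); // 0 <= j <= i
    [chars[i], chars[j]] = [chars[j], chars[i]];
  }
  return chars.join("");
}

console.log(sampleWithReplacement("AACGT", 20)); // 20 bases, roughly 40% A
console.log(shuffleSequence("AAAACCGT"));        // same 8 bases, new order
```

Shuffled sequences are the better control when composition matters, which matches the guidance given for Shuffle DNA and Shuffle Protein above.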
2.013-Thu Nov 25 15:21:11 2004