Protocol Online Forum Archives: Bioinformatics and Biostatistics

HELP: convert mouse genomic coordinates into sequence (Aug/12/2008)

I have a series of short genomic regions expressed as genomic coordinates, e.g.

chr1 4482820-4482829
chr1 4567880-4567891
...
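Before any lookup, lines in the `chr1 4482820-4482829` form have to be split into a chromosome name and a start/end pair. A minimal Perl sketch of that step; `parse_locus` is a hypothetical helper, and it assumes one locus per line with inclusive coordinates, which is inferred from the two examples rather than stated in the question:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Parse a line such as "chr1 4482820-4482829" into [chrom, start, end].
# Assumes whitespace after the chromosome name and an inclusive range.
# Returns nothing (undef in scalar context) for lines that do not match.
sub parse_locus {
    my ($line) = @_;
    return unless $line =~ /^\s*(\S+)\s+(\d+)\s*-\s*(\d+)\s*$/;
    my ($chrom, $start, $end) = ($1, $2, $3);
    return if $end < $start;    # reject reversed ranges
    return [$chrom, $start, $end];
}

my @loci = map { parse_locus($_) } (
    'chr1 4482820-4482829',
    'chr1 4567880-4567891',
);

# Print each locus with its width (inclusive, so end - start + 1).
for my $locus (@loci) {
    my ($chrom, $start, $end) = @$locus;
    printf "%s\t%d\t%d\t%d bp\n", $chrom, $start, $end, $end - $start + 1;
}
```

The two example ranges come out at 10 and 12 bp, which matters later if the sequences are meant to go into a logo together.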

How can I convert these loci into sequences in bulk?

After the conversion, I want to use these sequences to generate a consensus sequence with WebLogo.

Thanks a lot

-pretender-

Your first sequence is here. You can retrieve others by putting the coordinates in the range boxes on the web page.

If you have too many to do by hand, I would place your coordinates in a hash of arrays and cycle through them using a different form of the URL (http://www.ncbi.nlm.nih.gov/entrez/viewer.fcgi?&db=nucleotide&list_uids=NT_039169.7&dopt=fasta&sendto=t&from=4482820&to=4482829), replacing the &from= and &to= parameters each time (assuming they're all from chromosome 1). That URL returns the data as plain text, which can be saved to a separate file for each region or appended each time to one big file.
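The viewer.fcgi URL above is the 2008 form. As an assumption about a present-day equivalent, NCBI's E-utilities `efetch` endpoint takes `seq_start` and `seq_stop` where this URL takes `&from=` and `&to=`. The sketch below only builds one request per region from a hash of arrays, as described; the `db=nuccore`, `rettype=fasta` and `retmode=text` values and the `efetch_url` helper name are assumptions of this sketch, not part of the post:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Assumption: NCBI E-utilities efetch as a stand-in for the 2008
# viewer.fcgi URL; seq_start/seq_stop play the role of &from= and &to=.
my $EFETCH = 'https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi';

# Hash of arrays: name => [from, to], positions on contig NT_039169.7
my %regions = (
    seq1 => [4482820, 4482829],
    seq2 => [4567880, 4567891],
);

# Build a FASTA request for one region of one accession.
sub efetch_url {
    my ($acc, $from, $to) = @_;
    return "$EFETCH?db=nuccore&id=$acc&rettype=fasta&retmode=text"
         . "&seq_start=$from&seq_stop=$to";
}

# One URL per region; only the start/stop parameters change.
for my $name (sort keys %regions) {
    my ($from, $to) = @{ $regions{$name} };
    print "$name\t", efetch_url('NT_039169.7', $from, $to), "\n";
}
```

Fetching each URL (for example with LWP::Simple's `get`, as in the script further down) returns FASTA text. NCBI's E-utilities guidelines ask clients without an API key to stay under three requests per second, so a loop over many regions should pause between calls.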

-HomeBrew-

Don't forget to check which assembly and species your coordinates are for; otherwise you might get totally different sequences than you expect!

As a fan of the UCSC browser, I would also like to offer an alternative to the NCBI approach suggested by HomeBrew: here you can find the link in human genome build 36 for your first sequence. Just click on "DNA" in the top menu bar to obtain the DNA sequence.
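The same UCSC sequence can also be requested programmatically through UCSC's REST API (`api.genome.ucsc.edu`), which was added long after this thread. A sketch, assuming the question's ranges are 1-based and inclusive like the browser's position box, and assuming assembly mm9 (the UCSC name for NCBI Build 37, the build of the contig used in the NCBI answer); `ucsc_url` is a hypothetical helper:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Assumptions: UCSC REST API endpoint and assembly mm9 (NCBI Build 37).
# Check which assembly your coordinates actually come from.
my $API = 'https://api.genome.ucsc.edu/getData/sequence';

# The API takes a 0-based start and a 1-based end, so an inclusive
# 1-based range start..end is sent as (start - 1)..end.
sub ucsc_url {
    my ($genome, $chrom, $start, $end) = @_;
    my $zero_based_start = $start - 1;
    return "$API?genome=$genome;chrom=$chrom;"
         . "start=$zero_based_start;end=$end";
}

print ucsc_url('mm9', 'chr1', 4482820, 4482829), "\n";
```

The design point is the off-by-one: `chr1 4482820-4482829` becomes `start=4482819;end=4482829`, still 10 bases. The response is JSON with the sequence in its `dna` field. Note that these are chromosome coordinates, whereas the NCBI URL above addresses positions on the contig NT_039169.7.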

-dpo-

My link is to mouse chromosome 1 -- somehow, when I first read the question, I got the impression we were looking for sequences from a mouse. Was this edited out of the original question, or did I just space out?

Anyway, I would probably use something like this Perl script if I had many sequences to retrieve:

CODE
#!/usr/bin/perl
use strict;
use warnings;
use LWP::Simple;

# seq_name => ['from', 'to'],

my %seqs = (
    seq1 => [4482820, 4482829],
    seq2 => [4567880, 4567891],
    seq3 => [4652940, 4652950],
    seq4 => [4738000, 4738011],
    seq5 => [4823060, 4823069],
);

# NT_039169.7
# DEFINITION Mus musculus chromosome 1 genomic contig, strain C57BL/6J.
# ACCESSION NT_039169
# VERSION NT_039169.7 GI:149233633

my $base_url = 'http://www.ncbi.nlm.nih.gov/entrez/viewer.fcgi?&db=nucleotide&dopt=fasta&sendto=t&list_uids=NT_039169.7';

foreach my $seq (sort keys %seqs) {
    my ($from, $to) = @{ $seqs{$seq} };
    my $link = $base_url . "&from=$from&to=$to";
    my $dna  = get_seq($link);
    next unless defined $dna;    # get_seq has already warned about the failure
    $dna =~ s/\n\n/\n/g;         # remove extra blank lines
    print $dna;
}

# Fetch a page, retrying twice (after 3 s, then 5 s) if NCBI does not answer.
# Returns the page text, or undef after three failed attempts.
sub get_seq {
    my $link = shift;
    my $page = get($link);
    unless (defined $page) {
        warn "Having trouble contacting NCBI. Sleeping once...\n";
        sleep 3;
        $page = get($link);
        unless (defined $page) {
            warn "Still having trouble contacting NCBI. Sleeping twice...\n";
            sleep 5;
            $page = get($link);
        }
    }
    unless (defined $page) {
        warn "Three attempts to retrieve $link from NCBI were unsuccessful...\n";
        return;
    }
    return $page;
}


This returns:

>ref|NT_039169.7|Mm1_39209_37:4482820-4482829 Mus musculus chromosome 1 genomic contig, strain C57BL/6J
TCCCCTGCTG
>ref|NT_039169.7|Mm1_39209_37:4567880-4567891 Mus musculus chromosome 1 genomic contig, strain C57BL/6J
ATTCAGATATCC
>ref|NT_039169.7|Mm1_39209_37:4652940-4652950 Mus musculus chromosome 1 genomic contig, strain C57BL/6J
ATGCATTTTAA
>ref|NT_039169.7|Mm1_39209_37:4738000-4738011 Mus musculus chromosome 1 genomic contig, strain C57BL/6J
TCATCAATTAAC
>ref|NT_039169.7|Mm1_39209_37:4823060-4823069 Mus musculus chromosome 1 genomic contig, strain C57BL/6J
AACTTAATCA
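Since the goal is a WebLogo consensus, note that the five ranges have different widths, so the sequences above are 10 to 12 bp long. WebLogo works from aligned sequences of equal length, so a quick length check on the FASTA output is worthwhile. A minimal Perl sketch; `parse_fasta` is a hypothetical helper and the headers are shortened to the ranges for brevity:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Read FASTA text into an ordered list of [header, sequence] pairs.
# Multi-line sequences are concatenated and upper-cased.
sub parse_fasta {
    my ($text) = @_;
    my @records;
    for my $line (split /\n/, $text) {
        next if $line =~ /^\s*$/;
        if ($line =~ /^>(.*)/) {
            push @records, [$1, ''];
        } elsif (@records) {
            (my $seq = $line) =~ s/\s+//g;
            $records[-1][1] .= uc $seq;
        }
    }
    return @records;
}

# Sequences from the output above (headers shortened to the ranges).
my $fasta = join "\n",
    '>4482820-4482829', 'TCCCCTGCTG',
    '>4567880-4567891', 'ATTCAGATATCC',
    '>4652940-4652950', 'ATGCATTTTAA',
    '>4738000-4738011', 'TCATCAATTAAC',
    '>4823060-4823069', 'AACTTAATCA';

my @records = parse_fasta($fasta);

# Count distinct sequence lengths; a logo needs exactly one.
my %lengths;
++$lengths{ length($_->[1]) } for @records;

for my $r (@records) {
    printf "%s\t%d bp\n", $r->[0], length $r->[1];
}
print scalar(keys %lengths) == 1
    ? "All sequences have the same length.\n"
    : "Lengths differ; trim or align before building a logo.\n";
```

For these five sequences the check reports three distinct lengths (10, 11 and 12 bp), so they would need trimming to a common window, or aligning, before a logo is meaningful.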


I just whipped this up for demo purposes. It would be better to have the script run from the command line, take as input a tab-delimited file containing seq_name, start coordinate, and stop coordinate, and write the results to a file rather than print them to the screen.

I don't have time to do this right now, but if you'd like to pursue it, let me know...
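The command-line version described above (tab-delimited input, output to a file) could look roughly like the following sketch. The script name, argument order, comment-line handling and one-second pause are assumptions; the base URL is the 2008 one from the script above and may need replacing if NCBI no longer serves it. `read_regions` and `region_url` are hypothetical helpers:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical usage: perl fetch_regions.pl regions.tsv out.fa
# regions.tsv: seq_name<TAB>start<TAB>stop, one region per line.
my $BASE = 'http://www.ncbi.nlm.nih.gov/entrez/viewer.fcgi'
         . '?db=nucleotide&dopt=fasta&sendto=t&list_uids=NT_039169.7';

# Read tab-delimited regions; skip blank lines, '#' comments and bad rows.
sub read_regions {
    my ($fh) = @_;
    my @regions;
    while (my $line = <$fh>) {
        chomp $line;
        next if $line =~ /^\s*$/ or $line =~ /^\s*#/;
        my ($name, $start, $stop) = split /\t/, $line;
        unless (defined $stop && $start =~ /^\d+$/ && $stop =~ /^\d+$/) {
            warn "Skipping malformed line $.: $line\n";
            next;
        }
        push @regions, [$name, $start, $stop];
    }
    return @regions;
}

# Same URL pattern as the script above: swap in from/to for each region.
sub region_url {
    my ($start, $stop) = @_;
    return "$BASE&from=$start&to=$stop";
}

sub main {
    my ($in_file, $out_file) = @_;
    require LWP::Simple;    # loaded only when actually fetching
    open my $in,  '<', $in_file  or die "Cannot read $in_file: $!\n";
    open my $out, '>', $out_file or die "Cannot write $out_file: $!\n";
    for my $r (read_regions($in)) {
        my ($name, $start, $stop) = @$r;
        my $fasta = LWP::Simple::get(region_url($start, $stop));
        if (defined $fasta) {
            $fasta =~ s/\n\n/\n/g;    # remove extra blank lines
            print {$out} $fasta;
        } else {
            warn "No sequence returned for $name\n";
        }
        sleep 1;    # pause between requests to be gentle with the server
    }
    close $out or die "Cannot close $out_file: $!\n";
}

main(@ARGV) if @ARGV == 2;
```

With any number of arguments other than two, the script only defines its helpers and exits, which keeps `read_regions` and `region_url` easy to test on their own.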




-HomeBrew-

QUOTE (HomeBrew @ Aug 14 2008, 02:32 PM)
My link is to mouse chromosome 1 -- somehow, when I first read the question, I got the impression we were looking for sequences from a mouse. Was this edited out of the original question, or did I just space out?



My bad, it's in the title of the post... you're absolutely correct, HomeBrew.

-dpo-

QUOTE (dpo @ Aug 14 2008, 09:07 AM)
...it's in the title of the post ...


Wow -- I thought I was going crazy or something. Missed the title this morning (not enough coffee, I guess...). Thanks for reassuring me of my sanity, dpo!

-HomeBrew-

Thanks all, solved it with UCSC.

-pretender-