
Promoter sequence - ensembl vs BLAST (Mar/20/2009 )

Hi,

I'm new to working with promoters. I have to clone the promoters of two gene variants.

I located the 1.5 kb upstream sequence of the two promoters using Ensembl. To confirm the sequence I ran a BLAST search at NCBI, and it gave me the correct chromosome, contig, etc. The results are below. What does "features flanking this part of subject sequence" mean? I'm cloning the promoter of the DKK3 gene, which is located on chromosome 11.

>ref|NT_009237.17|Hs11_9394 Homo sapiens chromosome 11 genomic contig, reference assembly
Length=49571094

Features flanking this part of subject sequence:
789 bp at 5' side: dickkopf homolog 3 precursor
151286 bp at 3' side: microtubule associated monoxygenase, calponin and LIM dom...

Score = 2771 bits (1500), Expect = 0.0
Identities = 1500/1500 (100%), Gaps = 0/1500 (0%)
Strand=Plus/Minus

2) Also, my two variants differ only in their 5'UTR; they encode the same protein. So shall I clone from -1.5 kb to just before the start codon of either variant? In other words, how far downstream should the cloned fragment extend?
variant 1 CDS: 240 bp
variant 2 CDS: 226 bp

-SF_HK-

The promoter sequence given by Ensembl is the promoter for the longer isoform of DKK3; that is why NCBI says the sequence is 789 bp away from the 5' side of the DKK3 gene. Here NCBI means the shorter isoform.

How far downstream the promoter sequence should go depends on which isoform is more important biologically and is the dominant form. If you are not sure, you can include the sequence all the way to the TSS or start codon of the short isoform.

-pcrman-
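
A note on the Strand=Plus/Minus line in the BLAST hit above: it indicates that the query aligned to the reverse complement of the contig as deposited, which would be consistent with DKK3 lying on the minus strand of NT_009237.17. A minimal Python sketch of that relationship follows; the function name `reverse_complement` and the toy sequences are illustrative assumptions, not data from this thread.

```python
# Complement table for unambiguous DNA bases (upper and lower case).
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")

def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]

# Toy example (made-up sequences): a "contig" and a query that was read
# from the contig's minus strand.
contig = "TTTACGGTTCCAAA"
query = reverse_complement(contig[3:11])

# In a Plus/Minus hit the query itself is not found in the contig as written,
# but its reverse complement is.
found_plus = contig.find(query)                       # -1 means "not found"
found_minus = contig.find(reverse_complement(query))  # index on the plus strand

print(query, found_plus, found_minus)
```

So a Plus/Minus hit with 100% identity does not signal a problem with the sequence; it only reflects the orientation in which the contig was deposited.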

pcrman on Mar 20 2009, 11:17 PM said:

The promoter sequence given by Ensembl is the promoter for the longer isoform of DKK3; that is why NCBI says the sequence is 789 bp away from the 5' side of the DKK3 gene. Here NCBI means the shorter isoform.

How far downstream the promoter sequence should go depends on which isoform is more important biologically and is the dominant form. If you are not sure, you can include the sequence all the way to the TSS or start codon of the short isoform.


I'm confused. It says the same thing when I BLAST the 1.5 kb promoter regions of both isoforms. Am I to trust the Ensembl sequence?

-SF_HK-

I don't understand what you meant. Can you post the sequence here?

-pcrman-

Hi,

I'm to clone the promoters of two isoforms, which differ in their 5'UTR. For both isoforms I got the 1.5 kb promoter sequence from Ensembl and then ran a BLAST search with the sequence (NCBI).

Variant 1: 1.5 kb seq from Ensembl

gggcagttcgatatagagagatttttaggttgactctgaaagtcaagacctccagaccgc
atggtagaaggtgtaaggcagaagacaatctcagctggggaaattcctggtctttaagcc
agcaacatgaaggactggaagagcatgatgtgctctgtaaacccgcagcactgcatttcc
tagctcggccccacaatatgcccccacagcaccctccagttcggcattagtttcttccta
atgtccactctgcccgaagtgacaagcgggggcatgtggagactcagctccaggttcctg
gacgggctcagccacccccagaaagctaatgaatgctcaaccagggcttccagatgccca
ggggacagagcaggagatgccggggaatggggctttccttgcagttcaggagggccctgc
cccaggcccagaagtagaagggaaagcggctgttttggcggtaaacagtaatgtggggag
tgctgcagagaaaggcagtcttggggtttcaagctggagagcagtcagctacactcagga
cctctggccatccctgccttcacctgctgtttggcctgatcgtctaacttctctgattct
ccactacccactccttattacgtttttgagacttgtcaaagttttatattagggctaact
gggacgcatacaaatctggtaacttcgccagggcgggaagttaggaaggagcagagctgg
ctgcaggtgtctggtcctgaccactcctctatgccacccttgaggagcttgctgactttc
tcatgacgttctcccattccaggagctgcaagtgcgttatcctggctggagcacggtgtc
aatcacggcagactaaggccagcggtgatggcttgaatgccaggctgggggctgggattt
ttcctgaggatttcacaggacagaggttggcttggaaagaccaaggtgggactgaggaac
attccccctacccccaacctcggtgggctgttgcaagcctggaggccagagaagacgggc
ctgggatgccgcgggcgcaggggcaggcagtgaaggagatggctgccttcggtagagctg
gtcgctgaggcagaagaggagggcgtggggcgtggggcgtgaggtggccggcgccccggc
tggccaatggccgggctgcggcccctccgcggggcggggtgggcctggtgggcgggcggg
gctcggggcgggggcggagagggagcctggtgggcgggcggggcgcgtcttgcgggctcc
ctcgggtaccggcgctgccgcaccccgccgcgctcccgcacccgcggcccgcccaccgcg
ccgctcccgcatctgcacccgcagcccggcggcctcccggcgggagcgagcagatccagt
ccggcccgcagcgcaactcggtccagtcggggtgggtgaggggcggcggcgggggagggg
acgactctgctgagctcagcctctcttggtggatgtggggcggggcgctcgagtaggacc

BLAST results:
>ref|NM_013253.4| Homo sapiens dickkopf homolog 3 (Xenopus laevis) (DKK3), transcript
variant 2, mRNA
Length=2755

This shows that positions 1213-1412 of the 1.5 kb variant 1 seq are identical to the first 200 bp of the variant 2 mRNA.

Query 1213 GGCGGAGAGGGAGCCTGGTGGGCGGGCGGGGCGCGTCTTGCGGGCTCCCTCGGGTACCGG 1272

Sbjct 1 GGCGGAGAGGGAGCCTGGTGGGCGGGCGGGGCGCGTCTTGCGGGCTCCCTCGGGTACCGG 60

Query 1273 CGCTGCCGCACCCCGCCGCGCTCCCGCACCCGCGGCCCGCCCACCGCGCCGCTCCCGCAT 1332

Sbjct 61 CGCTGCCGCACCCCGCCGCGCTCCCGCACCCGCGGCCCGCCCACCGCGCCGCTCCCGCAT 120

Query 1333 CTGCACCCGCAGCCCGGCGGCCTCCCGGCGGGAGCGAGCAGATCCAGTCCGGCCCGCAGC 1392

Sbjct 121 CTGCACCCGCAGCCCGGCGGCCTCCCGGCGGGAGCGAGCAGATCCAGTCCGGCCCGCAGC 180

Query 1393 GCAACTCGGTCCAGTCGGGG 1412

Sbjct 181 GCAACTCGGTCCAGTCGGGG 200


GENE ID: 27122 DKK3 | dickkopf homolog 3 (Xenopus laevis)
(Over 10 PubMed links)


Score = 370 bits (200), Expect = 3e-99
Identities = 200/200 (100%), Gaps = 0/200 (0%)
Strand=Plus/Plus

>ref|NT_009237.17|Hs11_9394 Homo sapiens chromosome 11 genomic contig, reference assembly
Length=49571094

Features flanking this part of subject sequence:
501 bp at 5' side: dickkopf homolog 3 precursor
151574 bp at 3' side: microtubule associated monoxygenase, calponin and LIM dom...


Score = 2771 bits (1500), Expect = 0.0
Identities = 1500/1500 (100%), Gaps = 0/1500 (0%)


Variant 2: 1.5 kb seq from Ensembl

agaccacttatatttgagacctgtagattttcttaccgtttcttctctctccctttcttt
ctttctttctttctttctttctttctttctttctttctttctttctttctttctttcttt
tctttctttctctctttctctttctctctttctttccttcttttctttcctttctttttt
tcgtttgtagtttaacctaataattgaactactgataaattattacatttgggaatacaa
aatgtagactccacacaagaaaacaagcgtccctttgcctgacacttggggcagttcgat
atagagagatttttaggttgactctgaaagtcaagacctccagaccgcatggtagaaggt
gtaaggcagaagacaatctcagctggggaaattcctggtctttaagccagcaacatgaag
gactggaagagcatgatgtgctctgtaaacccgcagcactgcatttcctagctcggcccc
acaatatgcccccacagcaccctccagttcggcattagtttcttcctaatgtccactctg
cccgaagtgacaagcgggggcatgtggagactcagctccaggttcctggacgggctcagc
cacccccagaaagctaatgaatgctcaaccagggcttccagatgcccaggggacagagca
ggagatgccggggaatggggctttccttgcagttcaggagggccctgccccaggcccaga
agtagaagggaaagcggctgttttggcggtaaacagtaatgtggggagtgctgcagagaa
aggcagtcttggggtttcaagctggagagcagtcagctacactcaggacctctggccatc
cctgccttcacctgctgtttggcctgatcgtctaacttctctgattctccactacccact
ccttattacgtttttgagacttgtcaaagttttatattagggctaactgggacgcataca
aatctggtaacttcgccagggcgggaagttaggaaggagcagagctggctgcaggtgtct
ggtcctgaccactcctctatgccacccttgaggagcttgctgactttctcatgacgttct
cccattccaggagctgcaagtgcgttatcctggctggagcacggtgtcaatcacggcaga
ctaaggccagcggtgatggcttgaatgccaggctgggggctgggatttttcctgaggatt
tcacaggacagaggttggcttggaaagaccaaggtgggactgaggaacattccccctacc
cccaacctcggtgggctgttgcaagcctggaggccagagaagacgggcctgggatgccgc
gggcgcaggggcaggcagtgaaggagatggctgccttcggtagagctggtcgctgaggca
gaagaggagggcgtggggcgtggggcgtgaggtggccggcgccccggctggccaatggcc
gggctgcggcccctccgcggggcggggtgggcctggtgggcgggcggggctcggggcggg

BLAST results:


>ref|NT_009237.17|Hs11_9394 Homo sapiens chromosome 11 genomic contig, reference assembly
Length=49571094

Features flanking this part of subject sequence:
789 bp at 5' side: dickkopf homolog 3 precursor
151286 bp at 3' side: microtubule associated monoxygenase, calponin and LIM dom...


Score = 2771 bits (1500), Expect = 0.0
Identities = 1500/1500 (100%), Gaps = 0/1500 (0%)
Strand=Plus/Minus

Query 1 AGACCACTTATATTTGAGACCTGTAGATTTTCTTACCGTTTCTTCTCTCTCCCTTTCTTT 60

Sbjct 10819658 AGACCACTTATATTTGAGACCTGTAGATTTTCTTACCGTTTCTTCTCTCTCCCTTTCTTT 10819599

Query 61 CTTTCTTTCTTTCTTTCTTTCTTTCTTTCTTTCTTTCTTTCTTTCTTTCTTTCTTTCTTT 120

Sbjct 10819598 CTTTCTTTCTTTCTTTCTTTCTTTCTTTCTTTCTTTCTTTCTTTCTTTCTTTCTTTCTTT 10819539

Query 121 TCTTTCTTTCTCTCTTTCTCTTTCTCTCTTTCTTTCCTTCTTTTCTTTCCTTTCTTTTTT 180

Sbjct 10819538 TCTTTCTTTCTCTCTTTCTCTTTCTCTCTTTCTTTCCTTCTTTTCTTTCCTTTCTTTTTT 10819479

Query 181 TCGTTTGTAGTTTAACCTAATAATTGAACTACTGATAAATTATTACATTTGGGAATACAA 240

Sbjct 10819478 TCGTTTGTAGTTTAACCTAATAATTGAACTACTGATAAATTATTACATTTGGGAATACAA 10819419

Query 241 AATGTAGACTCCACACAAGAAAACAAGCGTCCCTTTGCCTGACACTTGGGGCAGTTCGAT 300

Sbjct 10819418 AATGTAGACTCCACACAAGAAAACAAGCGTCCCTTTGCCTGACACTTGGGGCAGTTCGAT 10819359

Query 301 ATAGAGAGATTTTTAGGTTGACTCTGAAAGTCAAGACCTCCAGACCGCATGGTAGAAGGT 360

Sbjct 10819358 ATAGAGAGATTTTTAGGTTGACTCTGAAAGTCAAGACCTCCAGACCGCATGGTAGAAGGT 10819299

Query 361 GTAAGGCAGAAGACAATCTCAGCTGGGGAAATTCCTGGTCTTTAAGCCAGCAACATGAAG 420

Sbjct 10819298 GTAAGGCAGAAGACAATCTCAGCTGGGGAAATTCCTGGTCTTTAAGCCAGCAACATGAAG 10819239

Query 421 GACTGGAAGAGCATGATGTGCTCTGTAAACCCGCAGCACTGCATTTCCTAGCTCGGCCCC 480

Sbjct 10819238 GACTGGAAGAGCATGATGTGCTCTGTAAACCCGCAGCACTGCATTTCCTAGCTCGGCCCC 10819179

Query 481 ACAATATGCCCCCACAGCACCCTCCAGTTCGGCATTAGTTTCTTCCTAATGTCCACTCTG 540

Sbjct 10819178 ACAATATGCCCCCACAGCACCCTCCAGTTCGGCATTAGTTTCTTCCTAATGTCCACTCTG 10819119

Query 541 CCCGAAGTGACAAGCGGGGGCATGTGGAGACTCAGCTCCAGGTTCCTGGACGGGCTCAGC 600

Sbjct 10819118 CCCGAAGTGACAAGCGGGGGCATGTGGAGACTCAGCTCCAGGTTCCTGGACGGGCTCAGC 10819059

Query 601 CACCCCCAGAAAGCTAATGAATGCTCAACCAGGGCTTCCAGATGCCCAGGGGACAGAGCA 660

Sbjct 10819058 CACCCCCAGAAAGCTAATGAATGCTCAACCAGGGCTTCCAGATGCCCAGGGGACAGAGCA 10818999

Query 661 GGAGATGCCGGGGAATGGGGCTTTCCTTGCAGTTCAGGAGGGCCCTGCCCCAGGCCCAGA 720

Sbjct 10818998 GGAGATGCCGGGGAATGGGGCTTTCCTTGCAGTTCAGGAGGGCCCTGCCCCAGGCCCAGA 10818939

Query 721 AGTAGAAGGGAAAGCGGCTGTTTTGGCGGTAAACAGTAATGTGGGGAGTGCTGCAGAGAA 780

Sbjct 10818938 AGTAGAAGGGAAAGCGGCTGTTTTGGCGGTAAACAGTAATGTGGGGAGTGCTGCAGAGAA 10818879

Query 781 AGGCAGTCTTGGGGTTTCAAGCTGGAGAGCAGTCAGCTACACTCAGGACCTCTGGCCATC 840

Sbjct 10818878 AGGCAGTCTTGGGGTTTCAAGCTGGAGAGCAGTCAGCTACACTCAGGACCTCTGGCCATC 10818819

Query 841 CCTGCCTTCACCTGCTGTTTGGCCTGATCGTCTAACTTCTCTGATTCTCCACTACCCACT 900

Sbjct 10818818 CCTGCCTTCACCTGCTGTTTGGCCTGATCGTCTAACTTCTCTGATTCTCCACTACCCACT 10818759

Query 901 CCTTATTACGTTTTTGAGACTTGTCAAAGTTTTATATTAGGGCTAACTGGGACGCATACA 960

Sbjct 10818758 CCTTATTACGTTTTTGAGACTTGTCAAAGTTTTATATTAGGGCTAACTGGGACGCATACA 10818699

Query 961 AATCTGGTAACTTCGCCAGGGCGGGAAGTTAGGAAGGAGCAGAGCTGGCTGCAGGTGTCT 1020

Sbjct 10818698 AATCTGGTAACTTCGCCAGGGCGGGAAGTTAGGAAGGAGCAGAGCTGGCTGCAGGTGTCT 10818639

Query 1021 GGTCCTGACCACTCCTCTATGCCACCCTTGAGGAGCTTGCTGACTTTCTCATGACGTTCT 1080

Sbjct 10818638 GGTCCTGACCACTCCTCTATGCCACCCTTGAGGAGCTTGCTGACTTTCTCATGACGTTCT 10818579

Query 1081 CCCATTCCAGGAGCTGCAAGTGCGTTATCCTGGCTGGAGCACGGTGTCAATCACGGCAGA 1140

Sbjct 10818578 CCCATTCCAGGAGCTGCAAGTGCGTTATCCTGGCTGGAGCACGGTGTCAATCACGGCAGA 10818519

Query 1141 CTAAGGCCAGCGGTGATGGCTTGAATGCCAGGCTGGGGGCTGGGATTTTTCCTGAGGATT 1200

Sbjct 10818518 CTAAGGCCAGCGGTGATGGCTTGAATGCCAGGCTGGGGGCTGGGATTTTTCCTGAGGATT 10818459

Query 1201 TCACAGGACAGAGGTTGGCTTGGAAAGACCAAGGTGGGACTGAGGAACATTCCCCCTACC 1260

Sbjct 10818458 TCACAGGACAGAGGTTGGCTTGGAAAGACCAAGGTGGGACTGAGGAACATTCCCCCTACC 10818399

Query 1261 CCCAACCTCGGTGGGCTGTTGCAAGCCTGGAGGCCAGAGAAGACGGGCCTGGGATGCCGC 1320

Sbjct 10818398 CCCAACCTCGGTGGGCTGTTGCAAGCCTGGAGGCCAGAGAAGACGGGCCTGGGATGCCGC 10818339

Query 1321 GGGCGCAGGGGCAGGCAGTGAAGGAGATGGCTGCCTTCGGTAGAGCTGGTCGCTGAGGCA 1380

Sbjct 10818338 GGGCGCAGGGGCAGGCAGTGAAGGAGATGGCTGCCTTCGGTAGAGCTGGTCGCTGAGGCA 10818279

Query 1381 GAAGAGGAGGGCGTGGGGCGTGGGGCGTGAGGTGGCCGGCGCCCCGGCTGGCCAATGGCC 1440

Sbjct 10818278 GAAGAGGAGGGCGTGGGGCGTGGGGCGTGAGGTGGCCGGCGCCCCGGCTGGCCAATGGCC 10818219

Query 1441 GGGCTGCGGCCCCTCCGCGGGGCGGGGTGGGCCTGGTGGGCGGGCGGGGCTCGGGGCGGG 1500

Sbjct 10818218 GGGCTGCGGCCCCTCCGCGGGGCGGGGTGGGCCTGGTGGGCGGGCGGGGCTCGGGGCGGG 10818159

-SF_HK-
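
One way to see why the two BLAST runs look so alike is to check whether the two 1.5 kb Ensembl fragments overlap. The sketch below searches for the first 20 bases of the variant 1 fragment inside an excerpt of the variant 2 fragment (query positions 241-360, copied from the post above) and compares the result with the difference between the two flanking distances BLAST reported (789 bp and 501 bp). The helper name `offset_of_start` and the choice of a 20-base seed are assumptions made for illustration; a full comparison would use the complete sequences.

```python
def offset_of_start(upstream_seq: str, downstream_seq: str, seed_len: int = 20) -> int:
    """0-based index in upstream_seq where downstream_seq begins, or -1 if absent."""
    seed = downstream_seq[:seed_len].lower()
    return upstream_seq.lower().find(seed)

# Variant 2 fragment, query positions 241-360 (two 60-base lines from the post).
V2_EXCERPT_START = 240  # 0-based offset of this excerpt within the 1.5 kb fragment
v2_excerpt = (
    "aatgtagactccacacaagaaaacaagcgtccctttgcctgacacttggggcagttcgat"
    "atagagagatttttaggttgactctgaaagtcaagacctccagaccgcatggtagaaggt"
)

# Variant 1 fragment, first 60 bases from the post.
v1_start = "gggcagttcgatatagagagatttttaggttgactctgaaagtcaagacctccagaccgc"

# Position of the variant 1 fragment's first base inside the variant 2 fragment.
shift = V2_EXCERPT_START + offset_of_start(v2_excerpt, v1_start)

# Flanking distances to DKK3 reported by BLAST for each fragment.
flank_v2, flank_v1 = 789, 501

print("variant 1 fragment starts", shift, "bp into the variant 2 fragment")
print("difference in flanking distance:", flank_v2 - flank_v1)
```

With the sequences as posted, both numbers come out to 288. That suggests the variant 1 fragment is the variant 2 fragment shifted 288 bp downstream, so the two fragments share most of their sequence and map to the same contig region.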