Promoter sequence - ensembl vs BLAST (Mar/20/2009 )
Hi,
I'm new to working with promoters. I have to clone the promoters of two gene variants.
I located the 1.5 kb upstream sequence of the two promoters using Ensembl. To confirm the sequence I ran a BLAST search at NCBI, and it gave me the correct chromosome contig etc. The results are below. What does "features flanking this part of subject sequence" mean? I'm cloning the promoter of the DKK3 gene, which is located on chromosome 11.
>ref|NT_009237.17|Hs11_9394 Homo sapiens chromosome 11 genomic contig, reference assembly
Length=49571094
Features flanking this part of subject sequence:
789 bp at 5' side: dickkopf homolog 3 precursor
151286 bp at 3' side: microtubule associated monoxygenase, calponin and LIM dom...
Score = 2771 bits (1500), Expect = 0.0
Identities = 1500/1500 (100%), Gaps = 0/1500 (0%)
Strand=Plus/Minus
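A note on "Strand=Plus/Minus": it means the query matches the reverse strand of the contig, so subject coordinates count down as query positions count up. Below is a minimal Python sketch of that relationship. The function names are made up for illustration, and the coordinates in the usage lines are taken from the alignment posted later in this thread.

```python
# Translation table for complementing DNA bases (upper and lower case).
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def minus_strand_subject_pos(subject_start: int, query_pos: int) -> int:
    """Map a 1-based query position to a subject coordinate for a
    gap-free Plus/Minus hit.

    subject_start is the subject coordinate aligned to query position 1.
    """
    return subject_start - (query_pos - 1)


# Query position 1 aligns to contig position 10819658 in the posted hit,
# so the last query base (1500) lands 1499 bases lower on the contig.
first_row_end = minus_strand_subject_pos(10819658, 60)
last_base = minus_strand_subject_pos(10819658, 1500)
```

So a Plus/Minus hit does not mean the Ensembl sequence is wrong; it only says the gene lies on the reverse strand of that contig.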
2) Also, my two variants differ only in their 5'UTR; they encode the same protein. Should I clone from -1.5 kb to just before the start codon of either variant? In other words, how far toward (or past) the start codon should I clone?
variant 1 CDS: 240bp
variant 2 CDS: 226bp
The promoter sequence given by Ensembl is the promoter for the longer isoform of DKK3, which is why NCBI says the sequence is 789 bp from the 5' side of the DKK3 gene. Here NCBI is referring to the shorter isoform.
How far downstream the promoter sequence should extend depends on which isoform is biologically more important and is the dominant form. If you are not sure, you can include the sequence all the way to the TSS or start codon of the short isoform.
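To make the "include everything down to the short isoform" suggestion concrete, here is a rough Python sketch that computes one insert window covering both transcription start sites. The function name, the default of 1500 bp upstream, and the example indices are all hypothetical; it assumes the sequence has already been oriented 5' to 3' along the gene.

```python
def clone_window(tss_positions, upstream=1500, downstream_end=None):
    """Return (start, end) of a promoter insert as a 0-based,
    end-exclusive slice on a gene-oriented sequence.

    tss_positions:  0-based index of the first transcribed base of
                    each isoform (hypothetical values in the example).
    upstream:       bases to include before the most upstream TSS.
    downstream_end: index where the insert should stop, e.g. the A of
                    the start codon; defaults to the most downstream TSS.
    """
    first_tss = min(tss_positions)
    last_tss = max(tss_positions)
    start = max(0, first_tss - upstream)
    end = last_tss if downstream_end is None else downstream_end
    if end <= start:
        raise ValueError("insert window is empty")
    return start, end


# Hypothetical example: two TSSs 288 bp apart on a longer genomic slice.
start, end = clone_window([2000, 2288])
insert_length = end - start
```

Stopping just before the start codon (rather than at the TSS) keeps the whole 5'UTR in the construct, which matters if the UTR itself carries regulatory elements.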
pcrman on Mar 20 2009, 11:17 PM said:
How far downstream the promoter sequence should extend depends on which isoform is biologically more important and is the dominant form. If you are not sure, you can include the sequence all the way to the TSS or start codon of the short isoform.
I'm confused. It says the same thing when I BLAST the 1.5 kb promoter regions of both isoforms. Am I to trust the Ensembl sequence?
I don't understand what you meant. Can you post the sequence here?
Hi,
I'm to clone the promoters of two isoforms, which differ in their 5'UTR. For both isoforms I got the 1.5 kb promoter sequence from Ensembl and then ran a BLAST search with each sequence (NCBI).
Variant 1: 1.5 kb seq from Ensembl
gggcagttcgatatagagagatttttaggttgactctgaaagtcaagacctccagaccgc
atggtagaaggtgtaaggcagaagacaatctcagctggggaaattcctggtctttaagcc
agcaacatgaaggactggaagagcatgatgtgctctgtaaacccgcagcactgcatttcc
tagctcggccccacaatatgcccccacagcaccctccagttcggcattagtttcttccta
atgtccactctgcccgaagtgacaagcgggggcatgtggagactcagctccaggttcctg
gacgggctcagccacccccagaaagctaatgaatgctcaaccagggcttccagatgccca
ggggacagagcaggagatgccggggaatggggctttccttgcagttcaggagggccctgc
cccaggcccagaagtagaagggaaagcggctgttttggcggtaaacagtaatgtggggag
tgctgcagagaaaggcagtcttggggtttcaagctggagagcagtcagctacactcagga
cctctggccatccctgccttcacctgctgtttggcctgatcgtctaacttctctgattct
ccactacccactccttattacgtttttgagacttgtcaaagttttatattagggctaact
gggacgcatacaaatctggtaacttcgccagggcgggaagttaggaaggagcagagctgg
ctgcaggtgtctggtcctgaccactcctctatgccacccttgaggagcttgctgactttc
tcatgacgttctcccattccaggagctgcaagtgcgttatcctggctggagcacggtgtc
aatcacggcagactaaggccagcggtgatggcttgaatgccaggctgggggctgggattt
ttcctgaggatttcacaggacagaggttggcttggaaagaccaaggtgggactgaggaac
attccccctacccccaacctcggtgggctgttgcaagcctggaggccagagaagacgggc
ctgggatgccgcgggcgcaggggcaggcagtgaaggagatggctgccttcggtagagctg
gtcgctgaggcagaagaggagggcgtggggcgtggggcgtgaggtggccggcgccccggc
tggccaatggccgggctgcggcccctccgcggggcggggtgggcctggtgggcgggcggg
gctcggggcgggggcggagagggagcctggtgggcgggcggggcgcgtcttgcgggctcc
ctcgggtaccggcgctgccgcaccccgccgcgctcccgcacccgcggcccgcccaccgcg
ccgctcccgcatctgcacccgcagcccggcggcctcccggcgggagcgagcagatccagt
ccggcccgcagcgcaactcggtccagtcggggtgggtgaggggcggcggcgggggagggg
acgactctgctgagctcagcctctcttggtggatgtggggcggggcgctcgagtaggacc
BLAST results:
>ref|NM_013253.4| Homo sapiens dickkopf homolog 3 (Xenopus laevis) (DKK3), transcript
variant 2, mRNA
Length=2755
This shows that a 200 bp stretch near the 3' end of the 1.5 kb variant 1 seq (query positions 1213-1412) is identical to the first 200 bp of the variant 2 mRNA.
Query 1213 GGCGGAGAGGGAGCCTGGTGGGCGGGCGGGGCGCGTCTTGCGGGCTCCCTCGGGTACCGG 1272
Sbjct 1 GGCGGAGAGGGAGCCTGGTGGGCGGGCGGGGCGCGTCTTGCGGGCTCCCTCGGGTACCGG 60
Query 1273 CGCTGCCGCACCCCGCCGCGCTCCCGCACCCGCGGCCCGCCCACCGCGCCGCTCCCGCAT 1332
Sbjct 61 CGCTGCCGCACCCCGCCGCGCTCCCGCACCCGCGGCCCGCCCACCGCGCCGCTCCCGCAT 120
Query 1333 CTGCACCCGCAGCCCGGCGGCCTCCCGGCGGGAGCGAGCAGATCCAGTCCGGCCCGCAGC 1392
Sbjct 121 CTGCACCCGCAGCCCGGCGGCCTCCCGGCGGGAGCGAGCAGATCCAGTCCGGCCCGCAGC 180
Query 1393 GCAACTCGGTCCAGTCGGGG 1412
Sbjct 181 GCAACTCGGTCCAGTCGGGG 200
GENE ID: 27122 DKK3 | dickkopf homolog 3 (Xenopus laevis)
(Over 10 PubMed links)
Score = 370 bits (200), Expect = 3e-99
Identities = 200/200 (100%), Gaps = 0/200 (0%)
Strand=Plus/Plus
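The coordinates of this hit can be turned into plain numbers with a few lines of Python. This is only a sketch: the function name is made up, and it assumes a gap-free Plus/Plus hit with 1-based inclusive coordinates, as in the alignment above.

```python
def alignment_summary(query_len, q_start, q_end, s_start, s_end):
    """Summarise a gap-free Plus/Plus BLAST hit.

    All coordinates are 1-based and inclusive, as printed by BLAST.
    """
    aligned = q_end - q_start + 1
    if s_end - s_start + 1 != aligned:
        raise ValueError("query and subject spans differ; hit has gaps")
    return {
        "aligned_bp": aligned,
        # Query bases lying 5' of the aligned block.
        "query_upstream_of_hit": q_start - 1,
        # Query bases lying 3' of the aligned block.
        "query_downstream_of_hit": query_len - q_end,
    }


# Values read from the hit above: query 1213-1412 of 1500, subject 1-200.
summary = alignment_summary(1500, 1213, 1412, 1, 200)
```

With these numbers, 1212 bases of the query lie upstream of the aligned block and 88 bases lie downstream of it, so the match to the variant 2 mRNA begins inside the variant 1 "promoter" sequence.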
>ref|NT_009237.17|Hs11_9394 Homo sapiens chromosome 11 genomic contig, reference assembly
Length=49571094
Features flanking this part of subject sequence:
501 bp at 5' side: dickkopf homolog 3 precursor
151574 bp at 3' side: microtubule associated monoxygenase, calponin and LIM dom...
Score = 2771 bits (1500), Expect = 0.0
Identities = 1500/1500 (100%), Gaps = 0/1500 (0%)
Variant 2: 1.5 kb seq from Ensembl
agaccacttatatttgagacctgtagattttcttaccgtttcttctctctccctttcttt
ctttctttctttctttctttctttctttctttctttctttctttctttctttctttcttt
tctttctttctctctttctctttctctctttctttccttcttttctttcctttctttttt
tcgtttgtagtttaacctaataattgaactactgataaattattacatttgggaatacaa
aatgtagactccacacaagaaaacaagcgtccctttgcctgacacttggggcagttcgat
atagagagatttttaggttgactctgaaagtcaagacctccagaccgcatggtagaaggt
gtaaggcagaagacaatctcagctggggaaattcctggtctttaagccagcaacatgaag
gactggaagagcatgatgtgctctgtaaacccgcagcactgcatttcctagctcggcccc
acaatatgcccccacagcaccctccagttcggcattagtttcttcctaatgtccactctg
cccgaagtgacaagcgggggcatgtggagactcagctccaggttcctggacgggctcagc
cacccccagaaagctaatgaatgctcaaccagggcttccagatgcccaggggacagagca
ggagatgccggggaatggggctttccttgcagttcaggagggccctgccccaggcccaga
agtagaagggaaagcggctgttttggcggtaaacagtaatgtggggagtgctgcagagaa
aggcagtcttggggtttcaagctggagagcagtcagctacactcaggacctctggccatc
cctgccttcacctgctgtttggcctgatcgtctaacttctctgattctccactacccact
ccttattacgtttttgagacttgtcaaagttttatattagggctaactgggacgcataca
aatctggtaacttcgccagggcgggaagttaggaaggagcagagctggctgcaggtgtct
ggtcctgaccactcctctatgccacccttgaggagcttgctgactttctcatgacgttct
cccattccaggagctgcaagtgcgttatcctggctggagcacggtgtcaatcacggcaga
ctaaggccagcggtgatggcttgaatgccaggctgggggctgggatttttcctgaggatt
tcacaggacagaggttggcttggaaagaccaaggtgggactgaggaacattccccctacc
cccaacctcggtgggctgttgcaagcctggaggccagagaagacgggcctgggatgccgc
gggcgcaggggcaggcagtgaaggagatggctgccttcggtagagctggtcgctgaggca
gaagaggagggcgtggggcgtggggcgtgaggtggccggcgccccggctggccaatggcc
gggctgcggcccctccgcggggcggggtgggcctggtgggcgggcggggctcggggcggg
BLAST results:
>ref|NT_009237.17|Hs11_9394 Homo sapiens chromosome 11 genomic contig, reference assembly
Length=49571094
Features flanking this part of subject sequence:
789 bp at 5' side: dickkopf homolog 3 precursor
151286 bp at 3' side: microtubule associated monoxygenase, calponin and LIM dom...
Score = 2771 bits (1500), Expect = 0.0
Identities = 1500/1500 (100%), Gaps = 0/1500 (0%)
Strand=Plus/Minus
Query 1 AGACCACTTATATTTGAGACCTGTAGATTTTCTTACCGTTTCTTCTCTCTCCCTTTCTTT 60
Sbjct 10819658 AGACCACTTATATTTGAGACCTGTAGATTTTCTTACCGTTTCTTCTCTCTCCCTTTCTTT 10819599
Query 61 CTTTCTTTCTTTCTTTCTTTCTTTCTTTCTTTCTTTCTTTCTTTCTTTCTTTCTTTCTTT 120
Sbjct 10819598 CTTTCTTTCTTTCTTTCTTTCTTTCTTTCTTTCTTTCTTTCTTTCTTTCTTTCTTTCTTT 10819539
Query 121 TCTTTCTTTCTCTCTTTCTCTTTCTCTCTTTCTTTCCTTCTTTTCTTTCCTTTCTTTTTT 180
Sbjct 10819538 TCTTTCTTTCTCTCTTTCTCTTTCTCTCTTTCTTTCCTTCTTTTCTTTCCTTTCTTTTTT 10819479
Query 181 TCGTTTGTAGTTTAACCTAATAATTGAACTACTGATAAATTATTACATTTGGGAATACAA 240
Sbjct 10819478 TCGTTTGTAGTTTAACCTAATAATTGAACTACTGATAAATTATTACATTTGGGAATACAA 10819419
Query 241 AATGTAGACTCCACACAAGAAAACAAGCGTCCCTTTGCCTGACACTTGGGGCAGTTCGAT 300
Sbjct 10819418 AATGTAGACTCCACACAAGAAAACAAGCGTCCCTTTGCCTGACACTTGGGGCAGTTCGAT 10819359
Query 301 ATAGAGAGATTTTTAGGTTGACTCTGAAAGTCAAGACCTCCAGACCGCATGGTAGAAGGT 360
Sbjct 10819358 ATAGAGAGATTTTTAGGTTGACTCTGAAAGTCAAGACCTCCAGACCGCATGGTAGAAGGT 10819299
Query 361 GTAAGGCAGAAGACAATCTCAGCTGGGGAAATTCCTGGTCTTTAAGCCAGCAACATGAAG 420
Sbjct 10819298 GTAAGGCAGAAGACAATCTCAGCTGGGGAAATTCCTGGTCTTTAAGCCAGCAACATGAAG 10819239
Query 421 GACTGGAAGAGCATGATGTGCTCTGTAAACCCGCAGCACTGCATTTCCTAGCTCGGCCCC 480
Sbjct 10819238 GACTGGAAGAGCATGATGTGCTCTGTAAACCCGCAGCACTGCATTTCCTAGCTCGGCCCC 10819179
Query 481 ACAATATGCCCCCACAGCACCCTCCAGTTCGGCATTAGTTTCTTCCTAATGTCCACTCTG 540
Sbjct 10819178 ACAATATGCCCCCACAGCACCCTCCAGTTCGGCATTAGTTTCTTCCTAATGTCCACTCTG 10819119
Query 541 CCCGAAGTGACAAGCGGGGGCATGTGGAGACTCAGCTCCAGGTTCCTGGACGGGCTCAGC 600
Sbjct 10819118 CCCGAAGTGACAAGCGGGGGCATGTGGAGACTCAGCTCCAGGTTCCTGGACGGGCTCAGC 10819059
Query 601 CACCCCCAGAAAGCTAATGAATGCTCAACCAGGGCTTCCAGATGCCCAGGGGACAGAGCA 660
Sbjct 10819058 CACCCCCAGAAAGCTAATGAATGCTCAACCAGGGCTTCCAGATGCCCAGGGGACAGAGCA 10818999
Query 661 GGAGATGCCGGGGAATGGGGCTTTCCTTGCAGTTCAGGAGGGCCCTGCCCCAGGCCCAGA 720
Sbjct 10818998 GGAGATGCCGGGGAATGGGGCTTTCCTTGCAGTTCAGGAGGGCCCTGCCCCAGGCCCAGA 10818939
Query 721 AGTAGAAGGGAAAGCGGCTGTTTTGGCGGTAAACAGTAATGTGGGGAGTGCTGCAGAGAA 780
Sbjct 10818938 AGTAGAAGGGAAAGCGGCTGTTTTGGCGGTAAACAGTAATGTGGGGAGTGCTGCAGAGAA 10818879
Query 781 AGGCAGTCTTGGGGTTTCAAGCTGGAGAGCAGTCAGCTACACTCAGGACCTCTGGCCATC 840
Sbjct 10818878 AGGCAGTCTTGGGGTTTCAAGCTGGAGAGCAGTCAGCTACACTCAGGACCTCTGGCCATC 10818819
Query 841 CCTGCCTTCACCTGCTGTTTGGCCTGATCGTCTAACTTCTCTGATTCTCCACTACCCACT 900
Sbjct 10818818 CCTGCCTTCACCTGCTGTTTGGCCTGATCGTCTAACTTCTCTGATTCTCCACTACCCACT 10818759
Query 901 CCTTATTACGTTTTTGAGACTTGTCAAAGTTTTATATTAGGGCTAACTGGGACGCATACA 960
Sbjct 10818758 CCTTATTACGTTTTTGAGACTTGTCAAAGTTTTATATTAGGGCTAACTGGGACGCATACA 10818699
Query 961 AATCTGGTAACTTCGCCAGGGCGGGAAGTTAGGAAGGAGCAGAGCTGGCTGCAGGTGTCT 1020
Sbjct 10818698 AATCTGGTAACTTCGCCAGGGCGGGAAGTTAGGAAGGAGCAGAGCTGGCTGCAGGTGTCT 10818639
Query 1021 GGTCCTGACCACTCCTCTATGCCACCCTTGAGGAGCTTGCTGACTTTCTCATGACGTTCT 1080
Sbjct 10818638 GGTCCTGACCACTCCTCTATGCCACCCTTGAGGAGCTTGCTGACTTTCTCATGACGTTCT 10818579
Query 1081 CCCATTCCAGGAGCTGCAAGTGCGTTATCCTGGCTGGAGCACGGTGTCAATCACGGCAGA 1140
Sbjct 10818578 CCCATTCCAGGAGCTGCAAGTGCGTTATCCTGGCTGGAGCACGGTGTCAATCACGGCAGA 10818519
Query 1141 CTAAGGCCAGCGGTGATGGCTTGAATGCCAGGCTGGGGGCTGGGATTTTTCCTGAGGATT 1200
Sbjct 10818518 CTAAGGCCAGCGGTGATGGCTTGAATGCCAGGCTGGGGGCTGGGATTTTTCCTGAGGATT 10818459
Query 1201 TCACAGGACAGAGGTTGGCTTGGAAAGACCAAGGTGGGACTGAGGAACATTCCCCCTACC 1260
Sbjct 10818458 TCACAGGACAGAGGTTGGCTTGGAAAGACCAAGGTGGGACTGAGGAACATTCCCCCTACC 10818399
Query 1261 CCCAACCTCGGTGGGCTGTTGCAAGCCTGGAGGCCAGAGAAGACGGGCCTGGGATGCCGC 1320
Sbjct 10818398 CCCAACCTCGGTGGGCTGTTGCAAGCCTGGAGGCCAGAGAAGACGGGCCTGGGATGCCGC 10818339
Query 1321 GGGCGCAGGGGCAGGCAGTGAAGGAGATGGCTGCCTTCGGTAGAGCTGGTCGCTGAGGCA 1380
Sbjct 10818338 GGGCGCAGGGGCAGGCAGTGAAGGAGATGGCTGCCTTCGGTAGAGCTGGTCGCTGAGGCA 10818279
Query 1381 GAAGAGGAGGGCGTGGGGCGTGGGGCGTGAGGTGGCCGGCGCCCCGGCTGGCCAATGGCC 1440
Sbjct 10818278 GAAGAGGAGGGCGTGGGGCGTGGGGCGTGAGGTGGCCGGCGCCCCGGCTGGCCAATGGCC 10818219
Query 1441 GGGCTGCGGCCCCTCCGCGGGGCGGGGTGGGCCTGGTGGGCGGGCGGGGCTCGGGGCGGG 1500
Sbjct 10818218 GGGCTGCGGCCCCTCCGCGGGGCGGGGTGGGCCTGGTGGGCGGGCGGGGCTCGGGGCGGG 10818159
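The two genomic hits in this post report flanking distances of 789 bp and 501 bp, a difference of 288 bp, which suggests the two 1.5 kb queries are the same region shifted by about that much. One way to check this directly is to look for the offset at which the tail of the upstream sequence matches the head of the downstream one. The Python sketch below uses a hypothetical helper name and short toy sequences rather than the full sequences posted above.

```python
def overlap_offset(upstream_seq, downstream_seq, min_overlap=20):
    """Return the smallest offset k such that upstream_seq[k:] is a
    prefix of downstream_seq, or None if no overlap of at least
    min_overlap bases exists. Comparison is case-insensitive.
    """
    a = upstream_seq.upper()
    b = downstream_seq.upper()
    for k in range(len(a) - min_overlap + 1):
        if b.startswith(a[k:]):
            return k
    return None


# Toy sequences: the second starts 4 bases into the first.
toy_upstream = "TTTTACGTACGGA"
toy_downstream = "acgtacggacccc"
offset = overlap_offset(toy_upstream, toy_downstream, min_overlap=5)
```

Running the same function on the two pasted 1.5 kb sequences (variant 2 as the upstream sequence, variant 1 as the downstream one) should return the shift between them, if they really are overlapping windows of one locus.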