
Searching for gene ontology terms automatically and adding to a text file - (Aug/14/2006 )


Glad it worked for you. There's very little Perl cannot do -- it's the duct tape that holds the Internet together...

QUOTE
Now I can use DAG-Edit to get the top level terms by just loading these terms (through a .obo file) into the interface and looking at a hierarchy of parent terms.


Must you use DAG-Edit, or any other program for that matter? Is there a way we can extend the Perl script to get you the "top level terms" directly, rather than using its output as input to a second program?
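
One possible way to skip DAG-Edit: an .obo file is plain text, so Perl can read the parent links itself. What follows is only a minimal sketch, assuming the usual OBO layout of [Term] stanzas with id:, name:, namespace: and is_a: tags. The stanzas are made-up toy data (the IDs are placeholders, not real GO records), and parse_obo is a hypothetical helper, not part of the script in this thread.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Toy OBO text: the IDs and names are placeholders, not real GO records.
my $obo_text = <<'OBO';
[Term]
id: GO:9000001
name: toy root component
namespace: cellular_component

[Term]
id: GO:9000002
name: toy organelle
namespace: cellular_component
is_a: GO:9000001 ! toy root component

[Term]
id: GO:9000003
name: toy sub-organelle
namespace: cellular_component
is_a: GO:9000002 ! toy organelle
OBO

# Parse [Term] stanzas into a hash: id => { name, namespace, is_a => [parent ids] }.
sub parse_obo {
    my ($text) = @_;
    my (%terms, $current);
    my $in_term = 0;
    for my $line (split /\n/, $text) {
        if ($line =~ /^\[(\w+)\]/) {            # start of a new stanza
            $in_term = ($1 eq 'Term');
            $current = undef;
        } elsif (!$in_term) {
            next;                               # header lines and non-Term stanzas
        } elsif ($line =~ /^id:\s*(\S+)/) {
            $current = $terms{$1} = { name => '', namespace => '', is_a => [] };
        } elsif (defined $current && $line =~ /^(name|namespace):\s*(.+?)\s*$/) {
            $current->{$1} = $2;
        } elsif (defined $current && $line =~ /^is_a:\s*(\S+)/) {
            push @{ $current->{is_a} }, $1;     # keep only the parent id
        }
    }
    return \%terms;
}

my $terms = parse_obo($obo_text);
for my $id (sort keys %$terms) {
    my $t = $terms->{$id};
    print "$id\t$t->{name}\t$t->{namespace}\tparents: @{ $t->{is_a} }\n";
}
```

The namespace: tag already names the top-level category of each term, so for some purposes it may not be necessary to walk the is_a links at all.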

-HomeBrew-

QUOTE (HomeBrew @ Aug 15 2006, 02:49 PM)
Must you use DAG-Edit, or any other program for that matter? Is there a way we can extend the Perl script to get you the "top level terms" directly, rather than using its output as input to a second program?


I think I can only use DAG-Edit because Swissprot only contains the lowest level terms. By clicking on the lowest level terms (i.e. the GO id with 7 digits), Swissprot takes you to EBI where you can see the tree of parent and child terms. I think it's quite complicated to get the script to move between servers.

-sara.pl-

I'm not entirely clear on what it is we're doing (I'm not familiar with GO terms), but if all EBI requires is the GO number to show you the tree, we could capture the GO numbers for each accession (rather than the textual descriptions of each GO number, as we did above), then zip on over to EBI and get whatever data you require based on the GO number(s) we found...
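
A script does not need to "click" anything: it can build the same URL the link points to and request it. Here is a rough sketch of the URL-building half, assuming the EBI address pattern used by the script posted later in this thread (it dates from 2006 and may no longer resolve). ebi_url is a hypothetical helper name.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Build the EBI link for one GO number. The URL pattern is the one used by
# the script later in this thread; treat it as an assumption, not a current API.
sub ebi_url {
    my ($go, $viz) = @_;
    die "Not a GO number: $go\n" unless $go =~ /^GO:\d{7}$/;
    my $url = "http://www.ebi.ac.uk/ego/DisplayGoTerm?id=$go&selected=$go";
    $url .= "&viz=$viz" if defined $viz;    # 'tree' or 'graph'
    return $url;
}

print ebi_url('GO:0005887', 'tree'), "\n";

# To download the page instead of opening it in a browser, LWP::Simple can
# fetch the URL (get returns undef on failure):
#   use LWP::Simple;
#   my $html = get(ebi_url('GO:0005887', 'tree'));
```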

-HomeBrew-

QUOTE (HomeBrew @ Aug 15 2006, 06:31 PM)
I'm not entirely clear on what it is we're doing (I'm not familiar with GO terms), but if all EBI requires is the GO number to show you the tree, we could capture the GO numbers for each accession (rather than the textual descriptions of each GO number, as we did above), then zip on over to EBI and get whatever data you require based on the GO number(s) we found...


OK..... basically I'd like to capture the GO terms at a top level (i.e. parent terms instead of the child terms that are shown on ExPASy).

If we capture the GO number from ExPASy, then how does the program click on the link and transfer us to the EBI server to show the tree of GO terms??? And how would this tree be shown in Notepad??? That's what I'm worried about.....

-sara.pl-

QUOTE (HomeBrew @ Aug 15 2006, 06:31 PM)
(I'm not familiar with GO terms),


OK....Here goes a little story........

All things in biology are associated with each other. Right? Basically, gene ontology shows you how everything, from biological processes to cellular components to molecular functions, is linked through ontologies. There are 3 areas of ontology:

1) Cellular components
2) Biological processes
3) Molecular functions

These three categories are the top of the hierarchy, and every GO term fits within one of them. So a cellular component could be a nucleus (Nucleus = level 1 GO term, which is part of the cellular component category)

Within a nucleus, you can get a nucleolus (Nucleolus = level 2 GO term, which is part of nucleus)

In the nucleolus, you can get chromatin (Chromatin = level 3 GO term, part of nucleolus)

And so on.......

Until you get a tree which connects all these terms together to form a GENE ONTOLOGY.

Each child term has a parent term.....so the chromatin term is a child of the nucleolus term which is a child of the nucleus term which is a child of the final cellular component term.
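
The child-to-parent chain described above can be followed with a simple lookup table. This is just a sketch using the simplified example from this post. Real GO terms can have more than one parent, which a one-parent-per-term hash ignores, and %parent_of and lineage are illustrative names only.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Child => parent, taken from the simplified example in this post.
# One parent per term is an assumption; real GO terms may have several.
my %parent_of = (
    'chromatin' => 'nucleolus',
    'nucleolus' => 'nucleus',
    'nucleus'   => 'cellular component',
);

# Return the chain of terms from the given term up to its top-level category.
sub lineage {
    my ($term) = @_;
    my @chain = ($term);
    my %seen  = ($term => 1);
    while (defined(my $parent = $parent_of{$term})) {
        last if $seen{$parent}++;       # guard against accidental cycles
        push @chain, $parent;
        $term = $parent;
    }
    return @chain;
}

my @chain = lineage('chromatin');
print join(' -> ', @chain), "\n";
print "Top-level term: $chain[-1]\n";
```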

And that's what gene ontology is all about.....yep, it's pretty boring stuff

-sara.pl-

So, accession number P30460 maps to GO:0005887, among others:

P30460 GO:0005887; Cellular component: integral to plasma membrane
GO:0030106; Molecular function: MHC class I receptor activity
GO:0006955; Biological process: immune response
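
Entries in this layout are easy to split into the GO number, the category and the description. A small sketch, assuming every entry looks like "GO:NNNNNNN; Category: description" as in the three lines above; parse_go_line is a hypothetical helper name.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Split one "GO:NNNNNNN; Category: description" entry into its three parts.
# Returns an empty list if the text does not match that layout.
sub parse_go_line {
    my ($line) = @_;
    return $line =~ /(GO:\d{7});\s*([^:]+):\s*(.+?)\s*$/;
}

my @entries = (
    'P30460 GO:0005887; Cellular component: integral to plasma membrane',
    'GO:0030106; Molecular function: MHC class I receptor activity',
    'GO:0006955; Biological process: immune response',
);

for my $entry (@entries) {
    my ($go, $category, $descr) = parse_go_line($entry) or next;
    print "$go | $category | $descr\n";
}
```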

Would you normally look at this further in:

  1. tree mode?
  2. graph mode?
  3. or just the image?
I'm thinking it would be trivial to change the script to have it put out an HTML file rather than just a text file, which you could open with your browser, and thus have clickable links built in for each accession number's corresponding GOs.

-HomeBrew-

QUOTE (HomeBrew @ Aug 15 2006, 11:29 PM)
So, accession number P30460 maps to GO:0005887, among others:

P30460 GO:0005887; Cellular component: integral to plasma membrane
GO:0030106; Molecular function: MHC class I receptor activity
GO:0006955; Biological process: immune response

Would you normally look at this further in:
  1. tree mode?
  2. graph mode?
  3. or just the image?
I'm thinking it would be trivial to change the script to have it put out an HTML file rather than just a text file, which you could open with your browser, and thus have clickable links built in for each accession number's corresponding GOs.


Hi HomeBrew,

Could you try and create the files in all 3 modes? Or would that be too much? If so then I think tree mode would be fine for me to do further analysis.

If you did execute the program, would there be too many files corresponding to each accession number? If you could just post the amended program, then I can test it out in all three modes.

Thank you very much

Sara

-sara.pl-

Sorry, Sara -- I got very busy over the last two days and didn't find time to get to this until now. Anyway, here's an example:

CODE
#!/usr/bin/perl -w
use strict;
use LWP::Simple;

# Three-argument open keeps the mode separate from the file name.
open (ACC, '<', "acc_numbers.txt") or die "Can't open acc_numbers.txt: $!\n";
open (GO, '>', "go.txt") or die "Can't open go.txt: $!\n";
open (HTML, '>', "go.html") or die "Can't open go.html: $!\n";
open (NOGO, '>', "no_go.txt") or die "Can't open no_go.txt: $!\n";

my (%data, $count);
my $tree = '&viz=tree';
my $graph = '&viz=graph';

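while (<ACC>) {
    chomp($_);
    $_ =~ s/\s+//g;           # strip all whitespace, including a stray \r
    next unless length $_;    # skip blank lines in the input file
    my $acc = $_;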
    my $link = 'http://ca.expasy.org/cgi-bin/niceprot.pl/printable?ac=' . $acc;
    my $page = get $link;
    unless (defined $page) {
        warn "Having trouble contacting ExPASy server. Sleeping once...\n";
        sleep 3;
        $page = get $link;
        unless (defined $page) {
            warn "Still having trouble contacting ExPASy server. Sleeping twice...\n";
            sleep 5;
            $page = get $link;
        }
    }
    if (!(defined $page)) {
        # All three attempts failed: stop and report which accession was being fetched.
        die "Three attempts to retrieve $acc from the ExPASy server were unsuccessful.\n";
    } else {
        my ($seg) = ($page =~ /<td>GO<\/td>(.*)<\/i>/ms);
        if ($seg) {
            $seg =~ s/<.*?>//g;
            $seg =~ s/\s*(GO:\d{7};\s+.*)\s+\(.*?\)\./\t$1\n/g;
            my @lines = ($seg =~ /\tGO:.+?\n/g);
            for (@lines) {
                my ($go, $descr) = ($_ =~ /(GO:\d{7});\s+(.*)\n/);
                next unless defined $go;    # skip lines that don't parse cleanly
                $data{$acc}{$go} = $descr;
            }
            $seg =~ s/GO:\d{7};\s*//g;
            print GO $acc, $seg, "\n";
        } else {
            print NOGO "$acc\tGO NOT FOUND\n";
        }
    }
}

print HTML <<BGN;
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
<html>\n<head>\n<title>Sara\'s GO page</title>
<style type='text/css'>\n<!--
body { font-family: arial, verdana, sans-serif; }
.bld { font-weight: bold; }
.lft { text-align: left; }
TD { text-align: center; }
-->\n</style>\n</head>\n<body>
<table width='95%' align='center' cellspacing='2' cellpadding='2' border='0'>
<tr class=bld><td>Accession Number</td><td>GO number</td>
<td class=lft>Description</td><td colspan=2>EBI links</td></tr>
<tr><td> </td></tr>
BGN

foreach my $acc (sort keys %data) {
    print HTML "<tr><td class=bld>$acc</td>\n";
    $count = 0;
    foreach my $go (sort keys %{$data{$acc}}) {
        my $url = "http://www.ebi.ac.uk/ego/DisplayGoTerm?id=$go&selected=$go";
        # Rows after the first need their own <tr>, plus an empty cell under the accession number.
        print HTML "<tr><td></td>" unless $count == 0;
        $count++;
        my ($clean_go) = ($go =~ /GO:(\d{7})/);
        print HTML "<td>$clean_go</td><td class=lft>$data{$acc}{$go}</td>";
        # Quote the href values, since the URLs contain '&' and ':'.
        print HTML "<td><a href='$url$tree' target='_blank'>tree</a></td>";
        print HTML "<td><a href='$url$graph' target='_blank'>graph</a></td></tr>\n";
    }
    print HTML "<tr><td> </td></tr>\n";
}

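print HTML "</table></body></html>\n";

# Close the files explicitly so buffered output is flushed before "Done." is printed.
close(ACC);
close(GO);
close(HTML);
close(NOGO);

print "Done.\n";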

This puts out the same 'go.txt' and 'no_go.txt' files as before, but in addition creates a new file, called 'go.html', which you can open in a web browser. The HTML page contains the same data as go.txt, but provides HTML links to the EBI page for each GO number associated with an accession number.

The HTML is pretty rudimentary; we could get much fancier if need be, but as it is it's clean and functional. The Perl could be a bit more efficient as well, but it too is functional as is.
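
One tidy-up, for instance, would be to fold the three copy-pasted download attempts into a small retry helper. This is only a sketch: fetch_with_retry is a hypothetical name, and the fetcher is passed in as a code reference so the example runs without a network connection. The stand-in fetcher below is fake; in the real script it would wrap LWP::Simple's get.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Call $fetch->($url) up to 1 + @delays times, sleeping between attempts.
# Returns the first defined result, or undef if every attempt fails.
sub fetch_with_retry {
    my ($fetch, $url, @delays) = @_;
    my $attempt = 0;
    while (1) {
        $attempt++;
        my $page = $fetch->($url);
        return $page if defined $page;
        last unless @delays;            # no retries left
        my $wait = shift @delays;
        warn "Attempt $attempt for $url failed. Sleeping $wait s...\n";
        sleep $wait;
    }
    return;
}

# Stand-in fetcher that fails twice and then succeeds, so no network is needed.
my $calls = 0;
my $fake_get = sub { return ++$calls < 3 ? undef : "page for $_[0]" };

# Zero-second delays keep the demonstration quick.
my $page = fetch_with_retry($fake_get, 'http://example.org/P30460', 0, 0);
print defined $page ? "Got '$page' after $calls calls\n" : "Gave up\n";
```

In the script above, the equivalent call would look something like `my $page = fetch_with_retry(sub { get($_[0]) }, $link, 3, 5);`, which keeps the same 3 and 5 second pauses.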

Let me know how you like it...

-HomeBrew-

QUOTE (HomeBrew @ Aug 18 2006, 04:57 PM)
Sorry, Sara -- I got very busy over the last two days and didn't find time to get to this until now.

The HTML is pretty rudimentary; we could get much fancier if need be, but as it is it's clean and functional. The Perl could be a bit more efficient as well, but it too is functional as is.

Let me know how you like it...


Hi HomeBrew,

Please don't apologise.... I can understand that you do have other commitments and I really do appreciate you finding the time to help me with this project. I will try out the code over the weekend and let you know how it works.

Just want to say a BIG THANK YOU for all your help. I don't know what I'd have done without your help or this forum for that matter. Please let me know how I should cite you in my dissertation reference section. Should I just give this website address?

Thank you once again!

Sara

-sara.pl-

Hi HomeBrew,

I've just tested the program and it's perfect!! The HTML makes it easier to view the parent GO terms in both tree and graph format. I've added an attachment of the HTML output as a .doc.

I honestly can't thank you enough....... Even my lecturers and supervisors didn't help me like you did.

THANK YOU VERY MUCH FOR ALL YOUR HELP!! I really appreciate it.

Sara

-sara.pl-
