Jump to content

  • Log in with Facebook Log in with Twitter Log in with Windows Live Log In with Google      Sign In   
  • Create Account

Submit your paper to J Biol Methods today!
Photo
- - - - -

excel


  • Please log in to reply
3 replies to this topic

#1 SF_HK

SF_HK

    Veteran

  • Active Members
  • PipPipPipPipPipPipPipPipPipPip
  • 196 posts
2
Neutral

Posted 07 September 2009 - 01:58 AM

HI

I have an excel file contaning the names of 2000 genes. I have another file containing 100 genes. Is there a short cut in excel that can quickly do a 'find' or 'search' to let me know if those 2000 genes contain the 100 genes?

thanks

#2 HomeBrew

HomeBrew

    Veteran

  • Global Moderators
  • PipPipPipPipPipPipPipPipPipPip
  • 930 posts
16
Good

Posted 07 September 2009 - 05:04 AM

I don't know how to do it in Excel -- at least not quickly -- but if you save both lists as text files containing one gene name per line, you can compare the two lists using Perl:

#!/usr/bin/perl -w
use strict;

# Union: all elements in both lists excluding duplicates
# Intersection:  items that are in both lists
# Difference (symmetric): members of FIRST or SECOND but not of FIRST and SECOND
# Difference (simple): members of FIRST but not SECOND

open (FIRST, "gene_list_100.txt") or die;
open (SECOND, "gene_list_2000.txt") or die;

my @first = (<FIRST>);
chomp (@first);

my @second = (<SECOND>);
chomp (@second);

my @union = my @isect = my @sym_diff = ();
my %union = my %isect = my %count = ();

foreach my $e (@first, @second) {
	$count{$e}++;
}

foreach my $e (keys %count) {
	push(@union, $e);
	if ($count{$e} == 2) {
		push @isect, $e;
	} else {
		push @sym_diff, $e;
	}
}

my %seen;
my @first_only;
@seen{@second} = ();

foreach my $item (@first) {
	push (@first_only, $item) unless exists $seen{$item};
}

@union = sort (@union);
@isect = sort (@isect);
@sym_diff = sort (@sym_diff);
@first_only = sort (@first_only);

print "Union: " . scalar(@union) . " " . join (" ", @union) . "\n";
print "Intersection: " . scalar(@isect) . " " . join (" ", @isect) . "\n";
print "Symmetrical Diff: " . scalar(@sym_diff) . " " . join (" ", @sym_diff) . "\n";
print "Simple Diff: " . scalar(@first_only) . " " . join (" ", @first_only) . "\n";

I guess in Excel you could have both gene lists in a single column, and then sort the column by name. Next to the column, you could enter a formula like:

=IF(A1=A2,"present"," ")

Drag the formula so that it covers the whole column, and count the number of times "present" appears. You could keep track of the two gene lists by changing the font color of the shorter one before sorting them on name.

Perl seems a better solution, though, unless someone better at Excel than me (read: almost anyone) has a better way...

#3 kstevens

kstevens

    member

  • Members
  • Pip
  • 1 posts
0
Neutral

Posted 29 October 2009 - 02:20 PM

How I do it...

Excel 2007 - A little late but place each list of genes in separate columns. Highlight each column and then go to conditional formatting and scroll down to highlight, with color the duplicates. Now go to your list of 2000 genes and sort via cell color. Now you will have the genes that match, highlighted and at the top.

Linux - cat file1 file2 | sort | uniq -d
This concatenates both files, sorts the information, and then prints out the duplicates.

#4 Grake

Grake

    member

  • Active Members
  • Pip
  • 6 posts
0
Neutral

Posted 22 December 2009 - 01:37 PM

If your 2000 genes are all in a same column, you can use the formula "MATCH". It will give you the raw where you can find your gene of interest...




Home - About - Terms of Service - Privacy - Contact Us

©1999-2013 Protocol Online, All rights reserved.