A biostatistics and bioinformatics knowledgebase? An interesting CASE STUDY - (Mar/01/2007)

Here is an example of the current situation:

By data mining (meaning reading a lot of) the articles in the PubMed database, I selectively collected a list of 500 genes that have a common property X (from here on called X-genes).

I then divided these genes into five categories:

1. Found in Condition A (100 genes in this group; call them AX-genes)
2. Found in Condition B (200 genes in this group; call them BX-genes)
3. Found in Condition C (400 genes in this group; call them CX-genes)
4. Found in Condition D (20 genes in this group; call them DX-genes)
5. Found in Condition E (100 genes in this group; call them EX-genes)

As you can see from the numbers, the groups overlap: some X-genes will be in A, B and C; some will only be in A; some will not be in any.

(From here on, call these five categories groups A, B, C, D and E.)

I have now run an experiment and identified a list of X-genes in the right hand (now called RHX-genes).

Here is the question that I would like to discuss with you:

What would be a good statistical analysis that gives a score indicating the degree of association between AX-genes and RHX-genes? This score must be able to indicate that the right hand is most likely to have condition A, among conditions A to E, because the AX-genes vs. RHX-genes score is the highest among these conditions.

What do you guys think?

-faipoon-

The simple probability of AX-genes being a member of type RHX.
I am not sure about your classification - if some genes are not in any of your groups, then why are they there at all? Are they background noise?

-perlmunky-
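
One possible reading of the "simple probability" suggested above is the fraction of a condition's genes that also show up in the RHX list. The sketch below illustrates that reading only; the function name `overlap_fraction` and the toy gene lists are hypothetical and do not come from the thread.

```python
def overlap_fraction(condition_genes, rhx_genes):
    """Fraction of a condition's genes that also appear in the RHX list."""
    condition_genes = set(condition_genes)
    if not condition_genes:
        return 0.0
    shared = condition_genes & set(rhx_genes)
    return len(shared) / len(condition_genes)


# Hypothetical toy data, for illustration only.
groups = {
    "A": {"g1", "g2", "g3", "g4"},
    "B": {"g3", "g4", "g5", "g6", "g7", "g8"},
}
rhx = {"g1", "g2", "g3", "g9"}

# Score every condition and pick the one with the largest overlap fraction.
scores = {name: overlap_fraction(genes, rhx) for name, genes in groups.items()}
best = max(scores, key=scores.get)
```

A raw fraction like this ignores group size and the size of the RHX list, so a small group such as D (20 genes) can score high or low by chance alone.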

The key questions here:
1. You didn't say whether the members of RHX are all from the 500.
2. If they are, then you are looking at a hypergeometric test.
3. For genes in RHX but not in any of your 5 categories, there is no test that can be done, because there is no relation here, unless you introduce gene family, sequence similarity or other factors.

-cyberpostdoc-
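
A minimal sketch of the hypergeometric test mentioned above, assuming the RHX list is first restricted to genes that are among the 500 X-genes. The group sizes are the ones given in the original post; the number of RHX genes (50) and the overlap counts are hypothetical values chosen for illustration, and `hypergeom_upper_tail` is an illustrative helper name.

```python
from math import comb


def hypergeom_upper_tail(N, K, n, k):
    """P(overlap >= k) when n genes are drawn without replacement
    from N genes, K of which belong to the condition group."""
    if not (0 <= K <= N and 0 <= n <= N):
        raise ValueError("need 0 <= K <= N and 0 <= n <= N")
    lo = max(k, 0)
    hi = min(K, n)
    favourable = sum(comb(K, i) * comb(N - K, n - i) for i in range(lo, hi + 1))
    return favourable / comb(N, n)


N = 500  # all X-genes (from the post)
group_sizes = {"A": 100, "B": 200, "C": 400, "D": 20, "E": 100}  # from the post

n_rhx = 50  # hypothetical: RHX genes that are among the 500
overlaps = {"A": 20, "B": 22, "C": 41, "D": 2, "E": 9}  # hypothetical counts

# A smaller p-value means the overlap is harder to explain by chance.
p_values = {
    name: hypergeom_upper_tail(N, group_sizes[name], n_rhx, overlaps[name])
    for name in group_sizes
}
best = min(p_values, key=p_values.get)
```

With these made-up counts, group A has twice the overlap expected by chance (20 observed versus 10 expected), so it gets the smallest p-value. Because the five groups overlap and five tests are run, the p-values are not independent and would normally need a multiple-testing correction before being compared.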

QUOTE (perlmunky @ Mar 1 2007, 03:19 PM)
The simple probability of AX-genes being a member of type RHX.
I am not sure about your classification - if some genes are not in any of your groups, then why are they there at all? Are they background noise?



1. Can you please explain what the simple probability of AX-genes being a member of type RHX is? How would you calculate the probability?
2. Sorry about the confusion: they are not background noise. If a gene is not in A-E, it might be in a condition F that has not yet been identified. Did I clarify the question?

Thank you for your help

-faipoon-

QUOTE (cyberpostdoc @ Mar 1 2007, 08:29 PM)
The key questions here:
1. You didn't say whether the members of RHX are all from the 500.
2. If they are, then you are looking at a hypergeometric test.
3. For genes in RHX but not in any of your 5 categories, there is no test that can be done, because there is no relation here, unless you introduce gene family, sequence similarity or other factors.


1. The RHX genes are "not" all from the 500.
2. Thank you for your input.
3. If I introduce sequence similarity, then what probability test should I use?

Thank you so much for your help

-faipoon-