Problem with Canonical Correspondence Analysis: every environmental variable in one axis


4 replies to this topic

#1 Procyon

Posted 15 June 2014 - 04:22 PM

Hi, I'm trying to do a CCA in PAST3 of two phylotypes across 7 samples to analyze their relationship with 8 to 12 environmental variables. These phylotypes come from a certain molecular marker. One problem is that the designated PCR primers also amplified two types of sequences totally unrelated to my molecular marker, yet they make up about 50% of the total clones. If I use only the correct phylotypes in the CCA, all variables fall on one axis, which makes the CCA useless. If I add the two unrelated "phylotypes", I get a CCA plot that makes sense, but I doubt it is correct and representative of the relationship between the environmental variables and the "correct" marker phylotypes. Can I really use the 2 correct phylotypes of my gene in combination with the 2 "phylotypes" unrelated to my gene?
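A possible explanation for the one-axis result (an observation about the method, not something stated in the thread): in CCA the number of non-zero constrained eigenvalues, i.e. usable axes, cannot exceed the smallest of (number of phylotypes - 1), (number of sites - 1) and the number of environmental variables. Two phylotypes therefore give at most one axis, while four give up to three. The NumPy sketch below computes CCA eigenvalues using the usual weighted-regression-plus-SVD formulation; the function name `cca_eigenvalues` and the random demo data are hypothetical, and this is not a description of how PAST3 works internally.

```python
import numpy as np


def cca_eigenvalues(Y, X, tol=1e-10):
    """Non-zero constrained eigenvalues of a CCA (illustrative sketch).

    Y: sites x phylotypes table of counts.
    X: sites x environmental variables.
    """
    Y = np.asarray(Y, dtype=float)
    X = np.asarray(X, dtype=float)
    P = Y / Y.sum()                      # relative frequencies
    r = P.sum(axis=1)                    # site weights
    c = P.sum(axis=0)                    # phylotype weights
    # Chi-square standardized residuals, as in correspondence analysis
    Qbar = (P - np.outer(r, c)) / np.sqrt(np.outer(r, c))
    # Centre each environmental variable on its site-weighted mean,
    # then weight the rows by sqrt(site weight)
    Xw = np.sqrt(r)[:, None] * (X - r @ X)
    # Keep only the part of Qbar that the environmental variables can explain
    Qhat = Xw @ np.linalg.pinv(Xw) @ Qbar
    eig = np.linalg.svd(Qhat, compute_uv=False) ** 2
    return eig[eig > tol]


rng = np.random.default_rng(0)
X = rng.normal(size=(7, 8))                 # 7 sites, 8 environmental variables
Y = rng.integers(1, 50, size=(7, 4))        # 4 hypothetical "phylotypes"
print(len(cca_eigenvalues(Y[:, :2], X)))    # 2 phylotypes -> 1 axis
print(len(cca_eigenvalues(Y, X)))           # 4 phylotypes -> 3 axes
```

Note also that with 7 sites the centred environmental data can span at most 6 dimensions, so in this sketch any 6 or more linearly independent variables leave `Qbar` unchanged and the "constrained" analysis coincides with a plain correspondence analysis. That is worth keeping in mind with 8 to 12 variables.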

 

If not, I'm looking for alternative software or multivariate analyses. I'm currently running a PCA on the environmental variables plus the Simpson index, so the Simpson index appears as a variable in the biplot, but I have no idea whether I'm doing it right. Please help me.
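If the Simpson index is computed per sample from the clone counts and then added as one more column next to the environmental variables, a minimal sketch could look like this. It uses the common "1 - D" form with D = sum of p_i^2, where p_i is the proportion of clones belonging to phylotype i; other programs (PAST included) may report a different variant, such as D itself or a small-sample correction, so check which one you are using. The function names and the example numbers are hypothetical.

```python
def simpson_1_minus_d(counts):
    """Gini-Simpson diversity, 1 - sum(p_i^2), from the clone counts of one sample."""
    total = sum(counts)
    if total == 0:
        raise ValueError("sample has no clones")
    return 1.0 - sum((n / total) ** 2 for n in counts)


def add_diversity_column(env_rows, count_rows):
    """Append each sample's Simpson index to its row of environmental variables."""
    return [list(env) + [simpson_1_minus_d(c)] for env, c in zip(env_rows, count_rows)]


counts = [[5, 5], [10, 0], [3, 1]]             # hypothetical clones of 2 phylotypes
env = [[7.1, 12.0], [6.8, 15.5], [7.4, 9.8]]   # hypothetical pH, temperature
print(add_diversity_column(env, counts))
# [[7.1, 12.0, 0.5], [6.8, 15.5, 0.0], [7.4, 9.8, 0.375]]
```

Because the Simpson index and the environmental variables are on different scales, a correlation-based (standardized) PCA is the usual choice for such a biplot.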


Edited by Procyon, 15 June 2014 - 05:17 PM.


#2 DRT

Posted 16 June 2014 - 02:00 PM

Apologies if I’m way off the mark here.

With two phylotypes and 12 variables I would have first looked at fitting a binary (logistic) regression to the first two or three principal components of the environmental variables.
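A minimal NumPy sketch of the first step of this suggestion, assuming the environmental variables are standardized (they are in different units) and the principal components are obtained by SVD. The function name `pc_scores` and the demo data are hypothetical. The returned scores are the predictors that would then go into the logistic regression; note that with 7 samples at most 6 components carry any variance.

```python
import numpy as np


def pc_scores(env, n_components=2):
    """Scores on the first principal components of standardized variables.

    env: samples x environmental variables (no missing values).
    Returns (scores, explained_variance_ratio).
    """
    env = np.asarray(env, dtype=float)
    # Standardize: the variables are in different units
    Z = (env - env.mean(axis=0)) / env.std(axis=0, ddof=1)
    U, s, Vt = np.linalg.svd(Z, full_matrices=False)
    scores = U[:, :n_components] * s[:n_components]   # same as Z @ Vt[:k].T
    explained = s ** 2 / np.sum(s ** 2)
    return scores, explained[:n_components]


rng = np.random.default_rng(42)
env = rng.normal(size=(7, 12))          # 7 samples, 12 hypothetical variables
scores, ratio = pc_scores(env, n_components=3)
print(scores.shape, ratio.round(3))
```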



#3 Procyon

Posted 20 June 2014 - 10:41 AM

I'm going to try this, but for now I have the following question: how does a CCA treat missing values? Are they ignored or imputed?

 

How should the amount of missing data affect the decision to include or exclude a given variable?

 

Some of my variables have missing values: the data are available for most of the sampling sites but not all, and I don't know whether I should use those variables. How do I explain a CCA in which a variable's data are unavailable for 2 of 7 sampling points?

 

Is there a way I can calculate the effects of missing values in a CCA or PCA?
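To put numbers on the trade-off behind these questions, here is a small sketch (hypothetical helper name and data, with NaN marking a missing value) that reports the fraction missing per variable and how many sampling sites would survive listwise deletion, both with all variables kept and with each variable dropped in turn.

```python
import numpy as np


def missing_summary(X):
    """Per-variable missing fraction and sites kept under listwise deletion.

    X: sites x variables, with np.nan for missing values.
    """
    X = np.asarray(X, dtype=float)
    missing = np.isnan(X)
    frac_missing = missing.mean(axis=0)
    complete_all = int((~missing.any(axis=1)).sum())
    # Sites kept if variable j were dropped before deleting incomplete rows
    complete_without = [
        int((~np.delete(missing, j, axis=1).any(axis=1)).sum())
        for j in range(X.shape[1])
    ]
    return frac_missing, complete_all, complete_without


nan = np.nan
X = np.array([
    [7.1, 12.0, 0.8],
    [6.8, 15.5, nan],
    [7.4, 9.8, 1.1],
    [7.0, 11.2, nan],
    [6.9, 14.0, 0.9],
    [7.2, 13.1, 1.0],
    [7.3, 10.5, 0.7],
])                                       # 7 sites; third variable missing at 2
frac, kept, kept_without = missing_summary(X)
print(frac.round(2), kept, kept_without)
```

Here the third variable is missing at 2 of 7 sites (about 29%); keeping it leaves 5 complete sites, while dropping it keeps all 7.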

 

What about running the CCA with the two additional "phylotypes" that are unrelated to my gene?


Edited by Procyon, 20 June 2014 - 01:14 PM.


#4 Procyon

Posted 20 June 2014 - 02:00 PM

Quoting DRT: "With two phylotypes and 12 variables I would have first looked at fitting a binary (logistic) regression to the first two or three principal components of the environmental variables."

How do I interpret and explain that analysis?



#5 DRT

Posted 22 June 2014 - 01:00 PM

 

Quoting Procyon: "How do I interpret and explain that analysis?"

 

 

One of the reasons I like logistic regression over the multivariate methods is that interpretation isn't much different from ordinary regression. You end up with a set of parameters, with their standard errors and significance, which translate into the intercept and slopes that most people are familiar with. If you are lucky, two parameters (maybe with an interaction term) will predominate. You can then illustrate the solution as a simple scatter plot with a line (curved if there is an interaction term) through your data at the midpoint of the logistic equation, where the predicted probability is 0.5: on one side of the line a sample is more likely to be one phylotype, on the other side the other, and the further a point lies from the line, the greater that probability.
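A minimal sketch of this kind of fit, assuming a binomial (grouped logistic) model in which each sample contributes k clones of phylotype A out of n clones of A and B combined, with two principal component scores as predictors. It is fitted by Newton-Raphson (equivalent to iteratively reweighted least squares) in NumPy; the function names and simulated data are hypothetical, and in practice a statistics package's GLM routine would do the same job. `midpoint_line` returns the points where the predicted probability is 0.5.

```python
import numpy as np


def fit_logistic(x1, x2, k, n, max_iter=50, tol=1e-10):
    """Binomial logistic regression logit(p) = b0 + b1*x1 + b2*x2 via Newton-Raphson.

    k: clones of phylotype A per sample; n: clones of A and B per sample.
    Returns the coefficients and their standard errors.
    """
    X = np.column_stack([np.ones(len(x1)), x1, x2])
    k = np.asarray(k, dtype=float)
    n = np.asarray(n, dtype=float)
    beta = np.zeros(X.shape[1])
    for _ in range(max_iter):
        p = 1.0 / (1.0 + np.exp(-(X @ beta)))
        grad = X.T @ (k - n * p)                       # score vector
        info = X.T @ (X * (n * p * (1 - p))[:, None])  # Fisher information
        step = np.linalg.solve(info, grad)
        beta = beta + step
        if np.max(np.abs(step)) < tol:
            break
    # Information at the final estimate gives the standard errors
    p = 1.0 / (1.0 + np.exp(-(X @ beta)))
    info = X.T @ (X * (n * p * (1 - p))[:, None])
    se = np.sqrt(np.diag(np.linalg.inv(info)))
    return beta, se


def midpoint_line(beta, x1):
    """x2 values where the predicted probability is 0.5 (b0 + b1*x1 + b2*x2 = 0)."""
    b0, b1, b2 = beta
    return -(b0 + b1 * np.asarray(x1)) / b2


rng = np.random.default_rng(3)
x1, x2 = rng.normal(size=40), rng.normal(size=40)   # simulated PC scores
n = np.full(40, 100)
true_p = 1 / (1 + np.exp(-(0.5 + 1.0 * x1 - 2.0 * x2)))
k = rng.binomial(n, true_p)
beta, se = fit_logistic(x1, x2, k, n)
print(beta.round(2), se.round(2))
```

With only 7 samples, watch for (quasi-)separation, where the estimates run off to very large values. An interaction would be one more column, `x1 * x2`, which is what bends the 0.5 line into a curve.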

 

I can see the CCA coming into its own when you add the other two "phylotypes", but I wouldn't like to comment on the scientific validity of including them. No harm in looking, though; if the data are there, use them.

 

The missing values problem is a tricky one (I assume the values are genuinely missing, not just zeros, i.e. zero-inflated data), and you may need to seek out guidance because the right approach could depend on exactly what is missing. Usually the simplest solution is to delete the 'row', but with only 7 sampling points this will remove a large proportion of your data. If the data around the missing points are normally distributed, I would be tempted to run a series of CCAs, each time inserting an appropriately distributed random number into the missing data points. This would give you a sense of how dependent the results of your analysis are on the values of the missing data.
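The simulation idea above can be sketched as follows, assuming each missing value is drawn from a normal distribution with the mean and standard deviation of that variable's observed values. To keep the block self-contained, the analysis being repeated is a simple PCA summary (share of variance on the first component); in practice you would pass your own CCA or PCA routine as the `analyse` callable. All names and data here are hypothetical.

```python
import numpy as np


def pc1_share(X):
    """Proportion of total variance on the first PC of standardized X."""
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
    s = np.linalg.svd(Z, compute_uv=False)
    return s[0] ** 2 / np.sum(s ** 2)


def imputation_sensitivity(X, analyse, n_runs=200, seed=0):
    """Re-run `analyse` with random normal draws in place of the NaNs.

    Returns mean, standard deviation, minimum and maximum of the result.
    """
    X = np.asarray(X, dtype=float)
    rng = np.random.default_rng(seed)
    rows, cols = np.nonzero(np.isnan(X))
    mu = np.nanmean(X, axis=0)          # observed mean of each variable
    sd = np.nanstd(X, axis=0, ddof=1)   # observed spread of each variable
    results = []
    for _ in range(n_runs):
        Xf = X.copy()
        Xf[rows, cols] = rng.normal(mu[cols], sd[cols])
        results.append(analyse(Xf))
    results = np.array(results)
    return (float(results.mean()), float(results.std(ddof=1)),
            float(results.min()), float(results.max()))


rng = np.random.default_rng(7)
X = rng.normal(size=(7, 5))
X[[1, 3], 2] = np.nan                   # variable 3 missing at 2 of 7 sites
mean, sd, lo, hi = imputation_sensitivity(X, pc1_share, n_runs=200)
print(f"PC1 share: {mean:.3f} +/- {sd:.3f} (range {lo:.3f} to {hi:.3f})")
```

A narrow spread suggests the conclusions barely depend on the missing values; a wide one suggests the variable (or the affected sites) should be treated with caution.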

 

Good luck.







©1999-2013 Protocol Online, All rights reserved.