Bootstrap values -please help - (Dec/20/2007 )

I need to understand the bootstrap values of a phylogenetic tree.
If the bootstrap values at the branch points of a phylogenetic tree are 1000, 786, 502, etc., what do those numbers mean?

So far I know that bootstrap analysis is a method of testing how well a particular data set fits a model. Then what do values such as 1000 and 786 stand for?

please help me.

I have a copy of Bioinformatics: Sequence and Genome Analysis by David W. Mount, but it does not say much about bootstrap values. Is there a suitable online source for understanding them?

any further recommendation will be appreciated.
thank you in advance.

-hasina-

Bootstrapping is a method in which you resample the sites (columns) of an alignment with replacement to build new alignments of the same length, and create a tree from each resampled alignment. Each new tree is compared to the original tree. For every clade in the original tree, a score of 1 is assigned if that clade is present in the new tree; a score of 0 is assigned if it is not. That process constitutes one bootstrap replicate. The scores for each clade are added up over all replicates. The higher the total, the more reliable the branching is at that point. Typically 100 to 1000 bootstrap replicates are used to estimate tree reliability. So if 1000 replicates were run, a value of 786 means that clade appeared in 786 of the 1000 bootstrap trees (78.6%), and a value of 1000 means it appeared in every one.
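To make the scoring concrete, here is a minimal Python sketch of the counting step only. It assumes each tree has already been reduced to a set of clades, with each clade written as a frozenset of taxon names; the function name `clade_support` and the toy taxa A to E are made up for illustration and do not come from any phylogenetics package.

```python
def clade_support(original_clades, replicate_trees):
    """Count how many replicate trees contain each clade of the original tree."""
    support = {clade: 0 for clade in original_clades}
    for tree in replicate_trees:
        for clade in original_clades:
            if clade in tree:  # score 1 if the clade is present, 0 otherwise
                support[clade] += 1
    return support

# Toy data (hypothetical): a tree is a set of clades, a clade is a frozenset of taxa.
original = {frozenset({"A", "B"}), frozenset({"A", "B", "C"})}
replicates = [
    {frozenset({"A", "B"}), frozenset({"A", "B", "C"})},
    {frozenset({"A", "B"}), frozenset({"A", "B", "D"})},
    {frozenset({"A", "C"}), frozenset({"A", "C", "D"})},
    {frozenset({"A", "B"}), frozenset({"D", "E"})},
]

support = clade_support(original, replicates)
for clade, count in support.items():
    print(sorted(clade), f"{count}/{len(replicates)}")
```

In a real analysis the replicate trees would come from a tree-building program run on each resampled alignment; only the bookkeeping is shown here.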

Hope that helps!

-Bunsen Honeydew-

Hi Hasina,

Bootstrapping is a simple way of testing how strongly the data in a multiple sequence alignment (MSA) support each branch of the tree built from it. The basic premise of an MSA is that homologous residues are aligned in every column. But that holds only in the most ideal of cases, and even the best MSA programs cannot guarantee it. So in most MSAs, certain parts of the alignment are very well aligned while a few others are not. One way to test how much the tree depends on particular parts of the alignment is to introduce a bit of noise into the MSA. This is done by "sampling with replacement": keeping the length of the MSA constant, columns are drawn at random, with replacement, until a new MSA of the same length is built, so some columns appear more than once and others are left out. The tree for this new MSA is then calculated. The resampling is repeated many times, each time perturbing a different set of columns, and a tree is calculated for each resampled MSA. The number of bootstraps you need depends partly on the length of the MSA. A single resampled MSA contains, on average, about 63% (roughly 2/3) of the original columns, and over 1000 bootstraps virtually every column gets sampled many times.
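The column resampling described above can be sketched in a few lines of Python. This is only an illustration under assumptions: the alignment is a plain list of equal-length strings, and `bootstrap_alignment` is a made-up helper name, not a function from any library. It also reports what fraction of the original columns ended up in the replicate, which for long alignments tends toward 1 - 1/e (about 0.63).

```python
import random

def bootstrap_alignment(alignment, rng):
    """Build one pseudo-alignment by sampling columns with replacement.

    Returns the resampled sequences and the list of column indices drawn.
    """
    length = len(alignment[0])
    # Draw `length` column indices; the same index may be drawn more than once.
    picks = [rng.randrange(length) for _ in range(length)]
    resampled = ["".join(seq[i] for i in picks) for seq in alignment]
    return resampled, picks

# Toy alignment (hypothetical): three sequences, ten columns.
alignment = ["ACGTACGTAC", "ACGTTCGTAC", "ACGAACGTTC"]
rng = random.Random(0)  # fixed seed so the run is repeatable

replicate, picks = bootstrap_alignment(alignment, rng)
length = len(alignment[0])
unique_fraction = len(set(picks)) / length
expected_fraction = 1 - (1 - 1 / length) ** length  # expected share of distinct columns

print(replicate)
print("distinct columns drawn:", unique_fraction)
print("expected on average:", round(expected_fraction, 3))
```

Each replicate would then be handed to the same tree-building method used for the original MSA.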

Now that trees have been calculated for the different resampled MSAs, a consensus of the trees needs to be generated (usually with the majority-rule option). The bootstrap value can then be thought of as the percentage of bootstrapped MSAs that supported a particular clade in spite of all the noise introduced. Strongly supported groupings do not suffer even when noise is introduced, while weaker associations show up with low bootstrap values and can float around the tree with no consistent placement.
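As a rough sketch of the majority-rule step and the conversion to percentages, assuming the per-clade counts are already known: the counts 1000, 786 and 502 are the ones from the original question, 431 is an extra made-up value to show a clade being dropped, and the clade labels and the function name `majority_rule` are hypothetical. It also assumes 1000 replicates were run.

```python
def majority_rule(clade_counts, n_replicates, threshold=0.5):
    """Keep clades found in more than `threshold` of the replicates.

    Returns a dict mapping each kept clade to its support as a percentage.
    """
    kept = {}
    for clade, count in clade_counts.items():
        if count / n_replicates > threshold:
            kept[clade] = 100.0 * count / n_replicates
    return kept

n_replicates = 1000  # assumed number of bootstrap replicates
clade_counts = {
    "clade_1": 1000,
    "clade_2": 786,
    "clade_3": 502,
    "clade_4": 431,  # below 50%, so it is left out of the consensus
}

consensus = majority_rule(clade_counts, n_replicates)
for clade, percent in consensus.items():
    print(clade, f"{percent:.1f}%")
```

Programs differ in whether they print the raw count (786) or the percentage (78.6), so it is worth checking how many replicates were run before reading the numbers.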

I would recommend that anyone building phylogenetic trees read this wonderful tutorial. It is simple to read, and anyone can get started with building trees very quickly.


Let me know if you still have any queries.

-string-

Thx Bunsen Honeydew and String for providing me the necessary information.
best regards

-hasina-