PhyloPars glossary

Cross-validation bias

This value equals the mean of the differences between the observed feature values and their estimates as calculated by the evolutionary model. It is a measure of the bias one can expect in the predicted missing feature values.

Cross-validation error

This value equals the mean of the absolute differences between the observed feature values and their estimates as calculated by the evolutionary model. It is a measure of the accuracy one can expect in the predicted missing feature values.
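The two cross-validation summaries can be illustrated with a minimal Python sketch. It is not taken from PhyloPars itself: the function name, the example numbers, and the sign convention (observed minus estimated) are assumptions made for illustration.

```python
from statistics import mean


def cross_validation_summary(observed, estimated):
    """Return (bias, error) for paired observed and estimated values."""
    if len(observed) != len(estimated) or not observed:
        raise ValueError("need two non-empty sequences of equal length")
    # Assumed sign convention: observed minus estimated.
    differences = [o - e for o, e in zip(observed, estimated)]
    bias = mean(differences)                   # signed differences may cancel
    error = mean(abs(d) for d in differences)  # absolute differences cannot cancel
    return bias, error


# Hypothetical observed values and model estimates.
bias, error = cross_validation_summary([2.0, 4.0, 6.0], [1.0, 5.0, 4.0])
```

Because positive and negative differences cancel in the bias but not in the error, the absolute value of the bias can never exceed the error.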

Cross-validation error plot

This density plot shows the distribution of the difference between observed feature values and their model estimates.

The blue area denotes the error distribution for the evolutionary model. The red line denotes the equivalent error distribution for a simple mean model, which assumes the best estimate for a feature value is given by the mean of all other observed values. The green line denotes the equivalent error distribution for a simple nearest neighbor model, which assumes the best estimate for a feature value is given by the phylogenetically nearest observed value.

Estimated value

This is the expected feature value for the node under consideration. It equals the mean of the marginal likelihood, which in the context of the evolutionary model equals a normal distribution.

Evolutionary model

This is the model used by PhyloPars to estimate missing feature values. It is based on the work of Felsenstein (2008) and Lynch (1991), and can account for phylogenetic as well as phenotypic variability. Phylogenetic variability of features is assumed to arise because feature values perform a random walk in evolutionary time; changes in the values of different features may be correlated. Phenotypic variability is due to intraspecific variation or measurement error, and is assumed to be described by a normal distribution.

Felsenstein, J. 2008. Comparative methods with sampling error and within-species variation: Contrasts revisited and revised. American Naturalist 171:713-725.

Lynch, M. 1991. Methods for the analysis of comparative data in evolutionary biology. Evolution 45:1065-1080.

Feature matrix

The feature matrix describes all observations for the different species or strains. It must be supplied as a tab-separated text file with one line for each node and one column for each feature. It should use UTF-8 or ASCII encoding. The file must contain both column and row headers; each row or column label can contain any character except a tab. Thus a feature matrix that specifies the values for M features of N nodes contains N+1 rows (the first row for feature labels, subsequent rows for the nodes), and M+1 columns (the first column for node labels, subsequent columns for the features). Logically this means that the first item on the first row will be ignored; a feature matrix file therefore commonly starts with a tab character.

If a feature label ends with an expression between parentheses, this expression is assumed to represent the feature unit. It will be used as such in the presentation of results.
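As an illustration of this convention, the following Python sketch splits a feature label into a name and a unit. The function name, the regular expression, and the example labels are hypothetical; how PhyloPars itself handles nested or multiple parentheses is not documented here.

```python
import re

# Matches a trailing "(...)" at the very end of a label, e.g. "body mass (g)".
_UNIT_PATTERN = re.compile(r"^(.*?)\s*\(([^()]*)\)\s*$")


def split_label(label):
    """Return (name, unit); unit is None when the label has no trailing parentheses."""
    match = _UNIT_PATTERN.match(label)
    if match is None:
        return label, None
    return match.group(1), match.group(2)


name, unit = split_label("body mass (g)")   # hypothetical feature label
plain_name, no_unit = split_label("length")
```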

Feature values are specified as (US ASCII representations of) floating point numbers. If multiple observations are available on a specific feature of a single species, these may be provided as a semicolon-separated list of floating point values. The feature matrix can contain missing values (in fact, one of the main reasons for using the application is to estimate such missing values); these are specified as empty strings. Thus two consecutive tab characters, or a tab character followed by a newline, indicate that the value between those characters is unknown.
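The format described above can be read with a few lines of Python. This is a sketch under stated assumptions, not the parser used by PhyloPars: the function name, the returned data structure, the strict column-count check, and the example species and values are all hypothetical.

```python
def parse_feature_matrix(text):
    """Parse tab-separated text into (feature_labels, {node_label: observations}).

    Each node maps to one list per feature; an empty list marks a missing value.
    """
    lines = [line for line in text.splitlines() if line != ""]
    header = lines[0].split("\t")
    feature_labels = header[1:]  # the first item on the first row is ignored
    nodes = {}
    for line in lines[1:]:
        cells = line.split("\t")
        if len(cells) != len(header):
            # Assumed policy: reject rows whose column count differs from the header.
            raise ValueError(f"row {cells[0]!r} has {len(cells)} columns, expected {len(header)}")
        nodes[cells[0]] = [
            [float(value) for value in cell.split(";")] if cell.strip() else []
            for cell in cells[1:]
        ]
    return feature_labels, nodes


# Hypothetical matrix: 3 nodes, 2 features, two missing values.
example = "\tmass (g)\tlength (cm)\nA\t1.5;1.7\t10\nB\t\t12.5\nC\t3.0\t\n"
labels, nodes = parse_feature_matrix(example)
```

In this example node A has two observations for the first feature, node B lacks a value for the first feature (two consecutive tabs), and node C lacks a value for the second feature (a tab followed by a newline).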

Example of a feature matrix

Mean bias

The mean bias equals the mean of the differences between the observed feature values and their estimates as calculated by the evolutionary model.

Mean error

The mean error equals the mean of the absolute differences between the observed feature values and their estimates as calculated by the evolutionary model.

Mean model

This null model assumes that the expected feature value is the mean of all other observations on that feature. Cross-validation errors for this null model are provided to better assess the cross-validation result of the full optimized evolutionary model.
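A minimal Python sketch of this null model, assuming one observation per node for a single feature; the function name and the example values are hypothetical.

```python
def mean_model_estimates(values):
    """Estimate each value as the mean of all the other values (leave-one-out)."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two observations")
    total = sum(values)
    # Removing one value from the total leaves the sum of the n - 1 others.
    return [(total - value) / (n - 1) for value in values]


estimates = mean_model_estimates([1.0, 2.0, 6.0])
```

Each estimate ignores the phylogeny entirely, which is why this model serves as a baseline for the evolutionary model.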

Nearest neighbor model

This null model assumes that the expected feature value is the phylogenetically nearest observed value. Cross-validation errors for this null model are provided to better assess the cross-validation result of the full optimized evolutionary model.
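A minimal Python sketch of this null model, assuming pairwise phylogenetic distances are already available. The function name, the dictionary-based input format, the tie-breaking behavior (first candidate wins), and the example values are assumptions made for illustration.

```python
def nearest_neighbor_estimate(target, distances, observed):
    """Return the observed value of the node phylogenetically closest to target.

    distances: dict mapping frozenset({a, b}) to the path length between a and b
    observed:  dict mapping node label to its observed feature value
    """
    candidates = [node for node in observed if node != target]
    if not candidates:
        raise ValueError("no other observed nodes")
    # On ties, min() keeps the first candidate in iteration order (an assumption).
    nearest = min(candidates, key=lambda node: distances[frozenset((target, node))])
    return observed[nearest]


# Hypothetical three-species example: A and B are close, C is distant.
distances = {
    frozenset(("A", "B")): 2.0,
    frozenset(("A", "C")): 6.0,
    frozenset(("B", "C")): 6.0,
}
observed = {"A": 1.0, "B": 1.4, "C": 5.0}
estimate_for_a = nearest_neighbor_estimate("A", distances, observed)
```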

Observed values

These are the observed feature values for the node under consideration, as present in the input feature matrix.

Phenotypic standard deviation

This is the square root of the estimated phenotypic variance, i.e., the variance due to intraspecific variation or measurement error.

Phylogenetic standard deviation

This is the square root of the estimated phylogenetic variance, i.e., the increase of feature variance per unit evolutionary time (branch length).

The evolutionary model assumes that each feature value performs a random walk. This implies that the feature variance increases linearly with evolutionary time. The rate of increase is called phylogenetic variance.
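The linear relationship can be made concrete with a short Python sketch. The function names and the numbers are hypothetical; the sketch only restates the random walk assumption above and is not PhyloPars code.

```python
import math


def expected_variance(phylogenetic_variance, branch_length):
    """Variance accumulated by the random walk along a branch of the given length."""
    return phylogenetic_variance * branch_length


def expected_sd(phylogenetic_sd, branch_length):
    """Standard deviation accumulated along the branch; grows with the square root of time."""
    return phylogenetic_sd * math.sqrt(branch_length)


# Hypothetical values: phylogenetic variance 0.25 per unit time, branch length 4.
variance_after_branch = expected_variance(0.25, 4.0)
sd_after_branch = expected_sd(0.5, 4.0)
```

Doubling the branch length doubles the expected variance, but increases the expected standard deviation only by a factor of the square root of two.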

Standard deviation of estimate

This is a measure of the uncertainty associated with the estimated feature value. It equals the standard deviation of the marginal likelihood, which in the context of the evolutionary model equals a normal distribution.

Support plot

This plot shows the support for the different feature values. The support curve (blue area) is the sum of normal densities centered at the original observations, which are indicated by vertical black lines. The weight of each observation accounts for phylogenetic proximity, phylogenetic correlations and phenotypic variability through the evolutionary model. The standard deviation ("bandwidth") of each normal density equals a fraction of the value range spanned by the feature; no additional steps were taken to estimate the optimal bandwidth (cf. estimated kernel densities).
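The construction of such a curve can be sketched in Python as a weighted sum of normal densities. The weights, the bandwidth fraction of 0.1, and the observations below are hypothetical; in PhyloPars the weights follow from the evolutionary model, and the fraction actually used is not stated here.

```python
import math


def normal_pdf(x, mean, sd):
    """Density of a normal distribution with the given mean and standard deviation."""
    return math.exp(-0.5 * ((x - mean) / sd) ** 2) / (sd * math.sqrt(2.0 * math.pi))


def support(x, observations, weights, bandwidth):
    """Weighted sum of normal densities centered at the observations, evaluated at x."""
    return sum(w * normal_pdf(x, obs, bandwidth) for obs, w in zip(observations, weights))


observations = [1.0, 2.0, 4.0]
weights = [0.5, 0.3, 0.2]   # hypothetical; stand-ins for model-derived weights
fraction = 0.1              # hypothetical fraction of the value range
bandwidth = fraction * (max(observations) - min(observations))

# Evaluate the curve on a grid from 0.0 to 5.0 in steps of 0.1.
curve = [support(i / 10.0, observations, weights, bandwidth) for i in range(51)]
```

With these inputs the curve is highest around the observation that carries the largest weight.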

In addition, the marginal likelihood (a normal distribution) is shown as a red line; this might be termed the posterior distribution of the feature value for the node under consideration.

Any available observed feature values are indicated by vertical red lines.

Variance explained by phylogeny

This is the proportion of the total (phylogenetic and phenotypic) variance that is phylogenetic. It indicates how much of the feature variability in observations can be accounted for with the phylogenetic model. A similar statistic was used by Housworth et al. (2004) to distinguish between phylogenetic and residual variability within the Phylogenetic Mixed Model.

It is worth noting that this proportion usually includes a contribution by natural selection (Westoby et al. 1995); further analyses (Desdevises et al. 2003; Cubo et al. 2005) might be used to disentangle phylogenetic and selection components.

Cubo, J., F. Ponton, M. Laurin, E. de Margerie, and J. Castanet. 2005. Phylogenetic signal in bone microstructure of sauropsids. Systematic Biology 54:562-574.
Desdevises, Y., P. Legendre, L. Azouzi, and S. Morand. 2003. Quantifying phylogenetically structured environmental variation. Evolution 57:2647-2652.
Housworth, E. A., E. P. Martins, and M. Lynch. 2004. The phylogenetic mixed model. American Naturalist 163:84-96.
Westoby, M., M. R. Leishman, and J. M. Lord. 1995. On misinterpreting the phylogenetic correction. Journal of Ecology 83:531-534.