Next: Aitchison's solution to the Up: Tectonic discrimination diagrams revisited Previous: Discriminant analysis


The compositional data problem

Discriminant analysis assumes that the elements of X are free to vary independently of each other, apart from the covariance structure of their multivariate normal distribution. However, geochemical data are generally expressed as parts of a whole (percent or ppm) and are therefore not free to vary independently. For example, in a three-component system (A+B+C=100%), an increase in one component (e.g., A) must be compensated by a decrease in the combined total of the other two (B and C). Besides introducing a negative bias into the correlations between components, this constant-sum constraint has several other consequences. One of them is that the arithmetic mean of compositional data has no physical meaning (Figure 3). This is unfortunate, because some popular discrimination diagrams (e.g., Pearce and Cann, 1973) are based on the arithmetic means of multiple samples, and it is these averages that are published in the literature. Therefore, the discriminant analyses discussed in this paper are not based on these historic datasets, but on a newly compiled database of individual analyses.
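The negative bias introduced by the constant-sum constraint can be illustrated with a small simulation. The following is only a minimal sketch and not part of the original analysis: the uniform distribution, the sample size, the random seed and the helper names (pearson, close, column) are all assumptions chosen for illustration. Three independent positive quantities are generated, each sample is rescaled to sum to 100%, and the correlation between the first two components is compared before and after this closure.

```python
import random

def pearson(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5

def close(rows, total=100.0):
    """Rescale every row so that its parts sum to `total` (closure)."""
    return [[total * v / sum(row) for v in row] for row in rows]

def column(rows, j):
    """Extract column j from a list of rows."""
    return [row[j] for row in rows]

# Hypothetical raw data: three independent, positive quantities A, B, C.
rng = random.Random(0)
raw = [[rng.uniform(1.0, 10.0) for _ in range(3)] for _ in range(5000)]

# The same data expressed as percentages of the whole.
closed = close(raw)

raw_r = pearson(column(raw, 0), column(raw, 1))        # A vs. B before closure
closed_r = pearson(column(closed, 0), column(closed, 1))  # A vs. B after closure

print("correlation of A and B before closure:", round(raw_r, 3))
print("correlation of A and B after closure: ", round(closed_r, 3))
```

Under these assumptions the raw quantities are uncorrelated up to sampling noise, whereas the closed components are clearly negatively correlated, even though nothing links A and B other than the requirement that the parts sum to 100%.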

Another statistical issue that deserves mention is spurious correlation. Bivariate plots of the form X vs. X/Y, X vs. Y/X or X/Z vs. Y/Z can show some degree of correlation, even when X, Y and Z are completely independent of each other (Figure 4). This effect was first discussed more than a century ago by Pearson (1897), and was brought to the attention of geologists more than half a century ago by Chayes (1949). Spurious correlation should be borne in mind when interpreting discrimination diagrams such as the Zr/Y-Ti/Y diagram (Pearce and Gale, 1977), the Zr/Y-Zr diagram (Pearce and Norry, 1979), or the Ti/Y-Nb/Y and K2O/Yb-Ta/Yb diagrams (Pearce, 1982). Note that whereas X, Y and Z are completely independent in Figure 4, this is never the case for compositional data, because of the constant-sum constraint described above, which only aggravates the problem of spurious correlation.
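The ratio effect can be reproduced numerically. The sketch below is an illustration under stated assumptions, not a reconstruction of Figure 4: the uniform distribution, the sample size, the seed and the helper name pearson are hypothetical choices. It draws three mutually independent positive variables and computes the correlation for each of the three plot types mentioned above.

```python
import random

def pearson(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5

# Hypothetical data: X, Y and Z are drawn independently of each other.
rng = random.Random(1)
n = 5000
X = [rng.uniform(1.0, 10.0) for _ in range(n)]
Y = [rng.uniform(1.0, 10.0) for _ in range(n)]
Z = [rng.uniform(1.0, 10.0) for _ in range(n)]

# Baseline: the raw variables themselves.
r_xy = pearson(X, Y)

# X/Z vs. Y/Z: the shared denominator Z couples the two ratios.
r_ratio = pearson([x / z for x, z in zip(X, Z)],
                  [y / z for y, z in zip(Y, Z)])

# X vs. X/Y: X appears on both axes.
r_x_xy = pearson(X, [x / y for x, y in zip(X, Y)])

# X vs. Y/X: X appears on one axis and in the denominator of the other.
r_x_yx = pearson(X, [y / x for x, y in zip(X, Y)])

print("X   vs. Y  :", round(r_xy, 3))
print("X/Z vs. Y/Z:", round(r_ratio, 3))
print("X   vs. X/Y:", round(r_x_xy, 3))
print("X   vs. Y/X:", round(r_x_yx, 3))
```

With these assumed inputs, X and Y are uncorrelated up to sampling noise, yet the two plots that share X or Z in the same position show a positive correlation and the X vs. Y/X plot shows a negative one. The strength of the effect depends on the relative spread of the variables, so the exact values should not be read as general.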


Pieter Vermeesch 2005-11-21