One of the assumptions of discriminant analysis is that the elements
of X are statistically independent of each other, apart from
the covariance structure contained in their multivariate normality.
However, geochemical data are generally expressed as parts of a whole
(percent or ppm) and, therefore, are not free to vary independently
of each other. For example, in a three-component system
(A+B+C=100%), increasing one component (e.g., A) forces a decrease in
the sum of the other two components (B+C). The constant-sum constraint has
several consequences, besides introducing a negative bias into
correlations between components. One of these consequences is that
the arithmetic mean of compositional data has no physical meaning
(Figure 3). This is very unfortunate because some
popular discrimination diagrams (e.g., Pearce and Cann, 1973) are
based on the arithmetic means of multiple samples, and it is these
averages that are published in the literature. Therefore, the
discriminant analyses discussed in this paper will not be based on
these historical datasets, but will use a newly compiled database of
individual analyses.
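
The negative bias introduced by the constant-sum constraint can be illustrated with a minimal Python sketch. The setup is our own assumption, not data from this paper: three independent, lognormally distributed variables stand in for A, B and C, and the helpers `pearson`, `close` and `closure_demo` are hypothetical names chosen for illustration. The raw variables are uncorrelated; after each sample is rescaled to 100%, A and B become negatively correlated.

```python
import math
import random


def pearson(xs, ys):
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def close(parts, total=100.0):
    """Rescale positive parts so that they sum to `total` (e.g. 100%)."""
    s = sum(parts)
    return [total * p / s for p in parts]


def closure_demo(n=5000, seed=42):
    """Correlation of components A and B before and after closure.

    A, B and C are drawn independently; the lognormal distribution is an
    arbitrary, illustrative assumption. The raw correlation should be ~0.
    """
    rng = random.Random(seed)
    raw = [[rng.lognormvariate(0.0, 0.5) for _ in range(3)] for _ in range(n)]
    closed = [close(row) for row in raw]
    a_raw, b_raw = [r[0] for r in raw], [r[1] for r in raw]
    a_cl, b_cl = [r[0] for r in closed], [r[1] for r in closed]
    return pearson(a_raw, b_raw), pearson(a_cl, b_cl)


if __name__ == "__main__":
    r_raw, r_closed = closure_demo()
    print(f"raw r(A,B) = {r_raw:+.2f}; closed r(A,B) = {r_closed:+.2f}")
```

The closed correlation comes out close to -0.5, and this is not an accident of the chosen distribution: because A+B+C is fixed, Var(A+B+C) = 0, so the three variances and three covariances must cancel. For three interchangeable components this forces every pairwise correlation to -1/2 (more generally -1/(D-1) for D parts), even though the underlying variables are independent.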

Another statistical issue that deserves to be mentioned is spurious correlation. Bivariate plots of the form X vs. X/Y, X vs. Y/X or X/Z vs. Y/Z can show some degree of correlation, even when X, Y and Z are completely independent of each other (Figure 4). This effect was first discussed more than a century ago by Pearson (1897), and was brought to the attention of geologists more than half a century ago by Chayes (1949). Spurious correlation is an effect that should be borne in mind when interpreting discrimination diagrams like the Zr/Y-Ti/Y diagram (Pearce and Gale, 1977), the Zr/Y-Zr diagram (Pearce and Norry, 1979), or the Ti/Y-Nb/Y and K₂O/Yb-Ta/Yb diagrams (Pearce, 1982). Note that whereas X, Y and Z in Figure 4 are completely independent, this is never the case for compositional data, due to the constant-sum constraint described above. This only aggravates the problem of spurious correlation.
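
Spurious correlation can be reproduced with a similar sketch. Again, the lognormal distribution, the sample size and the hypothetical `ratio_demo` helper are illustrative assumptions, not the settings behind Figure 4. X, Y and Z are drawn independently, so any correlation in the ratio plots arises solely from the variable the two axes share.

```python
import math
import random


def pearson(xs, ys):
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def ratio_demo(n=5000, seed=7):
    """Correlations in ratio plots built from independent X, Y and Z."""
    rng = random.Random(seed)
    # Independent, strictly positive variables; the lognormal choice is an
    # illustrative assumption.
    x = [rng.lognormvariate(0.0, 0.5) for _ in range(n)]
    y = [rng.lognormvariate(0.0, 0.5) for _ in range(n)]
    z = [rng.lognormvariate(0.0, 0.5) for _ in range(n)]
    return {
        "X vs Y": pearson(x, y),
        "X vs X/Y": pearson(x, [xi / yi for xi, yi in zip(x, y)]),
        "X vs Y/X": pearson(x, [yi / xi for xi, yi in zip(x, y)]),
        "X/Z vs Y/Z": pearson([xi / zi for xi, zi in zip(x, z)],
                              [yi / zi for yi, zi in zip(y, z)]),
    }


if __name__ == "__main__":
    for plot, r in ratio_demo().items():
        print(f"{plot:>10}: r = {r:+.2f}")
```

The signs of these correlations do not depend on the lognormal choice. For independent, strictly positive X, Y and Z, Cov(X, X/Y) = Var(X)E[1/Y] > 0 and Cov(X/Z, Y/Z) = E[X]E[Y]Var(1/Z) > 0, so the shared variable alone is enough to produce a positive correlation.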