Some of the discrimination diagrams of the previous section were
extremely good at classifying the training data. However, as briefly
mentioned in Section 5, the resubstitution error is
not the best way to assess performance on future data. Furthermore,
QDA nearly always performed better than LDA, because the former
involves more parameters than the latter. As the number of parameters
in a model increases, its ability to resolve even the smallest
subtleties in the training data improves. In a regression context,
this would correspond to adding terms to a polynomial interpolator
(Figure 36). For a very large number of
parameters (equaling or exceeding the number of datapoints), the curve
will eventually pass through all the points and the ``error'' (e.g.,
squared distance) will become zero. In other words, the high-order
polynomial model has zero bias. However, unbiased models are rarely
the best predictive models, because they suffer from high
variance. High-order polynomial models built on different sets of
training data are likely to look significantly different because of
irreproducible random variations in the sampling or measuring process.
On the other hand, a simple linear model will have low variance, but
can be very biased (e.g., when the true relationship is a higher-order
polynomial). This phenomenon is called the bias-variance tradeoff, and
it affects all data mining methods.
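The polynomial analogy can be made concrete with a short numerical sketch. The following Python fragment (a generic illustration on simulated data, not part of the analysis in this paper) fits polynomials to repeated noisy samples of a sine curve. A polynomial with as many parameters as data points passes through every point (zero resubstitution error), but its fitted curve varies wildly from one training sample to the next, whereas a straight-line fit is stable but biased:

```python
import numpy as np

rng = np.random.default_rng(0)

def true_f(x):
    # The "true model" generating the data (unknown in practice)
    return np.sin(2 * np.pi * x)

x_grid = np.linspace(0, 1, 50)

def fit_on_fresh_sample(degree, n=10):
    """Draw a fresh noisy training set and return the fitted curve on x_grid."""
    x = np.sort(rng.uniform(0, 1, n))
    y = true_f(x) + rng.normal(0, 0.2, n)
    coefs = np.polyfit(x, y, degree)
    return np.polyval(coefs, x_grid)

def curve_variance(degree, reps=200):
    """Average pointwise variance of the fitted curve across resampled training sets."""
    fits = np.array([fit_on_fresh_sample(degree) for _ in range(reps)])
    return fits.var(axis=0).mean()

print(curve_variance(1))   # straight line: low variance, but biased
print(curve_variance(9))   # degree = n - 1: far higher variance

# With as many parameters as data points, the resubstitution error vanishes:
x = np.sort(rng.uniform(0, 1, 10))
y = true_f(x) + rng.normal(0, 0.2, 10)
interp = np.polyfit(x, y, 9)
resid = y - np.polyval(interp, x)
print(np.abs(resid).max())  # essentially zero: the curve interpolates every point
```

The zero residual of the degree-9 fit is exactly the misleadingly optimistic "resubstitution error" discussed above; the variance comparison shows why it says little about performance on future data.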
By assuming equal covariances for the different classes of the
training data, LDA is only a crude approximation of the data space.
Therefore, it is likely to be quite biased in many cases. However,
because of the bias-variance tradeoff, the variance of the LDAs
described in previous sections is low. Therefore, the resubstitution
error might actually be a decent estimator of future performance.
However, things are different for QDA because it estimates the
covariance of each of the classes from the training data, thereby
dramatically increasing the number of parameters in the model.
Although this reduces the bias (i.e., a QDA describes the training
data better than an LDA), it also increases the variance. For
example, some of the intricate structure of Figures
16 or 20 might not be very
stable. Therefore, the resubstitution error is not a good predictor of
future performance. Nor should it be used to compare the
performance of bivariate and ternary discrimination diagrams.
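The only structural difference between the two classifiers is whether the class covariances are pooled, and this single change drives the bias-variance behaviour described above. The sketch below (a generic Gaussian discriminant on simulated data with equal priors, not the actual geochemical training set) makes this explicit:

```python
import numpy as np

rng = np.random.default_rng(1)

def fit_gaussian_classifier(X, y, pooled):
    """Fit class means; covariance pooled across classes (LDA) or per class (QDA)."""
    classes = np.unique(y)
    means = {c: X[y == c].mean(axis=0) for c in classes}
    if pooled:
        # Pooled covariance: one matrix shared by all classes
        S = sum(np.cov(X[y == c].T) * (np.sum(y == c) - 1) for c in classes)
        S /= (len(y) - len(classes))
        covs = {c: S for c in classes}
    else:
        # Per-class covariances: many more parameters to estimate
        covs = {c: np.cov(X[y == c].T) for c in classes}
    return classes, means, covs

def predict(model, X):
    classes, means, covs = model
    scores = []
    for c in classes:
        d = X - means[c]
        Sinv = np.linalg.inv(covs[c])
        maha = np.einsum('ij,jk,ik->i', d, Sinv, d)
        # Log Gaussian density up to a constant (equal priors assumed)
        scores.append(-0.5 * (maha + np.log(np.linalg.det(covs[c]))))
    return classes[np.argmax(scores, axis=0)]

def error_rate(model, X, y):
    return np.mean(predict(model, X) != y)

# Two classes with equal true covariances: a small training set, a large test set
def sample(n):
    X = np.vstack([rng.normal(size=(n, 2)),
                   rng.normal(size=(n, 2)) + np.array([2.0, 2.0])])
    return X, np.array([0] * n + [1] * n)

X_tr, y_tr = sample(20)
X_te, y_te = sample(1000)

lda = fit_gaussian_classifier(X_tr, y_tr, pooled=True)
qda = fit_gaussian_classifier(X_tr, y_tr, pooled=False)
print("LDA resubstitution / test error:",
      error_rate(lda, X_tr, y_tr), error_rate(lda, X_te, y_te))
print("QDA resubstitution / test error:",
      error_rate(qda, X_tr, y_tr), error_rate(qda, X_te, y_te))
```

Because the extra covariance parameters of QDA are estimated from a small training set, QDA typically matches the training data at least as well as LDA, while its advantage on independent test data shrinks or reverses, which is the pattern reported for the discrimination diagrams below.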
The easiest way to obtain a more objective estimate of future performance is to use a second database of test data that was not used for the construction of the discrimination diagrams. Implementing this idea, a database of 182 test samples was compiled from three locations:
All previously discussed discrimination diagrams are represented in the error analysis of Table 5. The left part of the table shows the resubstitution errors, while the right side shows the performance on the test data. Figures 37 - 46 show the test data plotted on the binary and ternary discrimination diagrams. The new decision boundaries are shown in both log-ratio space and conventional compositional data space. As explained in Section 2, the decision boundaries of LDA are linear in log-ratio space. To allow an easy reproduction of these decision boundaries, four ``anchor points'' are provided for each LDA in Figures 21, 22 and 37 - 46 and in Table 6.

Figures 37 - 41 and Table 7 allow a direct comparison of the decision boundaries of Shervais (1982), Pearce and Cann (1976), Meschede (1986) and Wood (1980) with the new decision boundaries constructed using LDA and QDA. Although it is hard to make a definitive comparison given the relatively small size of the effective test dataset, the new decision boundaries seem to perform at least as well as the old ones in all cases. Because the test dataset is much smaller than the training dataset, it is more strongly affected by the missing-data problem. For example, the test data contained no MORBs that had been simultaneously analysed for Th, Ta and Hf.

For all the discrimination diagrams of Table 5, QDA performs better than LDA on the training data. On the other hand, LDA often performs better than QDA on the test data because of its lower variance. For example, LDA misclassified 17 out of 85 test samples using Ti, Zr and Y, whereas QDA misclassified 38 using the same three elements (Table 5). In most cases, however, the difference is less dramatic.
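The link between the two coordinate systems can be sketched with the additive log-ratio (alr) transform, one standard choice for mapping a ternary composition to a plane; whether this matches the exact transform of Section 2 is an assumption, and the compositions and anchor points below are hypothetical. A segment between two anchor points that is straight in log-ratio space maps back to a curved boundary in compositional space:

```python
import numpy as np

def alr(comp):
    """Additive log-ratio transform: (a, b, c) -> (log(a/c), log(b/c))."""
    comp = np.asarray(comp, dtype=float)
    return np.log(comp[..., :-1] / comp[..., -1:])

def alr_inv(z):
    """Map log-ratio coordinates back to a composition summing to 1."""
    z = np.asarray(z, dtype=float)
    parts = np.concatenate([np.exp(z), np.ones(z.shape[:-1] + (1,))], axis=-1)
    return parts / parts.sum(axis=-1, keepdims=True)

# Two hypothetical "anchor points" on an LDA decision boundary
p, q = alr([0.6, 0.3, 0.1]), alr([0.2, 0.5, 0.3])
t = np.linspace(0, 1, 5)[:, None]
line = (1 - t) * p + t * q   # straight segment in log-ratio space ...
print(alr_inv(line))         # ... a curved boundary in ternary space
```

This is why a handful of anchor points plus the knowledge that the boundary is linear in log-ratio space suffice to reconstruct each LDA boundary exactly, while the same boundary drawn on a conventional ternary diagram is curved.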