Some of the discrimination diagrams of the previous section were
extremely good at classifying the training data. However, as briefly
mentioned in Section 5, the resubstitution error is
not the best way to assess performance on future data. Furthermore,
QDA nearly always performed better than LDA, because the former
involves more parameters than the latter. As the number of parameters
in a model increases, its ability to resolve even the smallest
subtleties in the training data improves. In a regression context,
this would correspond to adding terms to a polynomial interpolator
(Figure 36). For a very large number of
parameters (equaling or exceeding the number of datapoints), the curve
will eventually pass through all the points and the ``error'' (e.g.,
squared distance) will become zero. In other words, the high-order
polynomial model has zero bias. However, unbiased models are rarely
the best predictive models, because they suffer from high
variance. High-order polynomial models built on different sets of
training data are likely to look significantly different because of
irreproducible random variations in the sampling or measuring process.
On the other hand, a simple linear model will have low variance, but
can be very biased (e.g., when the true relationship is a higher-order
polynomial). This phenomenon is called the bias-variance tradeoff, and
it affects all data mining methods.
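The polynomial analogy can be made concrete with a short numerical sketch. The following Python fragment (a generic illustration on simulated data, not part of the analysis in this paper) fits polynomials to repeated noisy samples of a sine curve. A polynomial with as many parameters as data points passes through every point (zero resubstitution error), but its fitted curve varies wildly from one training sample to the next, whereas a straight-line fit is stable but biased:

```python
import numpy as np

rng = np.random.default_rng(0)

def true_f(x):
    # The "true model" generating the data (unknown in practice)
    return np.sin(2 * np.pi * x)

x_grid = np.linspace(0, 1, 50)

def fit_on_fresh_sample(degree, n=10):
    """Draw a fresh noisy training set and return the fitted curve on x_grid."""
    x = np.sort(rng.uniform(0, 1, n))
    y = true_f(x) + rng.normal(0, 0.2, n)
    coefs = np.polyfit(x, y, degree)
    return np.polyval(coefs, x_grid)

def curve_variance(degree, reps=200):
    """Average pointwise variance of the fitted curve across resampled training sets."""
    fits = np.array([fit_on_fresh_sample(degree) for _ in range(reps)])
    return fits.var(axis=0).mean()

print(curve_variance(1))   # straight line: low variance, but biased
print(curve_variance(9))   # degree = n - 1: far higher variance

# With as many parameters as data points, the resubstitution error vanishes:
x = np.sort(rng.uniform(0, 1, 10))
y = true_f(x) + rng.normal(0, 0.2, 10)
interp = np.polyfit(x, y, 9)
resid = y - np.polyval(interp, x)
print(np.abs(resid).max())  # essentially zero: the curve interpolates every point
```

The zero residual of the degree-9 fit is exactly the misleadingly optimistic "resubstitution error" discussed above; the variance comparison shows why it says little about performance on future data.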
By assuming equal covariances for the different classes of the
training data, LDA is only a crude approximation of the data space.
Therefore, it is likely to be quite biased in many cases. However,
because of the bias-variance tradeoff, the variance of the LDAs
described in previous sections is low. Therefore, the resubstitution
error might actually be a decent estimator of future performance.
However, things are different for QDA because it estimates the
covariance of each of the classes from the training data, thereby
dramatically increasing the number of parameters in the model.
Although this reduces the bias (i.e., a QDA describes the training
data better than an LDA), it also increases the variance. For
example, some of the intricate structure of Figures
16 or 20 might not be very
stable. Therefore, the resubstitution error is not a good predictor of
future performance. Nor should it be used to compare the
performance of bivariate and ternary discrimination diagrams.
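The only structural difference between the two classifiers is whether the class covariances are pooled, and this single change drives the bias-variance behaviour described above. The sketch below (a generic Gaussian discriminant on simulated data with equal priors, not the actual geochemical training set) makes this explicit:

```python
import numpy as np

rng = np.random.default_rng(1)

def fit_gaussian_classifier(X, y, pooled):
    """Fit class means; covariance pooled across classes (LDA) or per class (QDA)."""
    classes = np.unique(y)
    means = {c: X[y == c].mean(axis=0) for c in classes}
    if pooled:
        # Pooled covariance: one matrix shared by all classes
        S = sum(np.cov(X[y == c].T) * (np.sum(y == c) - 1) for c in classes)
        S /= (len(y) - len(classes))
        covs = {c: S for c in classes}
    else:
        # Per-class covariances: many more parameters to estimate
        covs = {c: np.cov(X[y == c].T) for c in classes}
    return classes, means, covs

def predict(model, X):
    classes, means, covs = model
    scores = []
    for c in classes:
        d = X - means[c]
        Sinv = np.linalg.inv(covs[c])
        maha = np.einsum('ij,jk,ik->i', d, Sinv, d)
        # Log Gaussian density up to a constant (equal priors assumed)
        scores.append(-0.5 * (maha + np.log(np.linalg.det(covs[c]))))
    return classes[np.argmax(scores, axis=0)]

def error_rate(model, X, y):
    return np.mean(predict(model, X) != y)

# Two classes with equal true covariances: a small training set, a large test set
def sample(n):
    X = np.vstack([rng.normal(size=(n, 2)),
                   rng.normal(size=(n, 2)) + np.array([2.0, 2.0])])
    return X, np.array([0] * n + [1] * n)

X_tr, y_tr = sample(20)
X_te, y_te = sample(1000)

lda = fit_gaussian_classifier(X_tr, y_tr, pooled=True)
qda = fit_gaussian_classifier(X_tr, y_tr, pooled=False)
print("LDA resubstitution / test error:",
      error_rate(lda, X_tr, y_tr), error_rate(lda, X_te, y_te))
print("QDA resubstitution / test error:",
      error_rate(qda, X_tr, y_tr), error_rate(qda, X_te, y_te))
```

Because the extra covariance parameters of QDA are estimated from a small training set, QDA typically matches the training data at least as well as LDA, while its advantage on independent test data shrinks or reverses, which is the pattern reported for the discrimination diagrams below.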
The easiest way to obtain a more objective estimate of future performance is to use a second database of test data that was not used for the construction of the discrimination diagrams. Implementing this idea, a database of 182 test samples was compiled from three locations:
All previously discussed discrimination diagrams are represented in the error analysis of Table 5. The left part of the table shows the resubstitution errors, while the right side shows the performance on the test data. Figures 37 - 46 show the test data plotted on the binary and ternary discrimination diagrams. The new decision boundaries are shown in both log-ratio space and conventional compositional data space. As explained in Section 2, the decision boundaries of LDA are linear in log-ratio space. To allow an easy reproduction of these decision boundaries, four ``anchor points'' are provided for each LDA in Figures 21, 22 and 37 - 46 and in Table 6.

Figures 37 - 41 and Table 7 allow a direct comparison of the decision boundaries of Shervais (1982), Pearce and Cann (1976), Meschede (1986) and Wood (1980) with the new decision boundaries constructed using LDA and QDA. Although it is hard to make a definitive comparison given the relatively small size of the effective test dataset, the new decision boundaries seem to perform at least as well as the old ones in all cases. Because the test dataset is much smaller than the training dataset, it is more strongly affected by the missing-data problem. For example, the test data contained no MORBs that had been simultaneously analysed for Th, Ta and Hf.

For all the discrimination diagrams of Table 5, QDA performs better than LDA on the training data. On the other hand, LDA often performs better than QDA on the test data because of its lower variance. For example, LDA misclassified 17 out of 85 test samples using Ti, Zr and Y, whereas QDA misclassified 38 using the same three elements (Table 5). In most cases, however, the difference is less dramatic.
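The link between the two coordinate systems can be sketched with the additive log-ratio (alr) transform, one standard choice for mapping a ternary composition to a plane; whether this matches the exact transform of Section 2 is an assumption, and the compositions and anchor points below are hypothetical. A segment between two anchor points that is straight in log-ratio space maps back to a curved boundary in compositional space:

```python
import numpy as np

def alr(comp):
    """Additive log-ratio transform: (a, b, c) -> (log(a/c), log(b/c))."""
    comp = np.asarray(comp, dtype=float)
    return np.log(comp[..., :-1] / comp[..., -1:])

def alr_inv(z):
    """Map log-ratio coordinates back to a composition summing to 1."""
    z = np.asarray(z, dtype=float)
    parts = np.concatenate([np.exp(z), np.ones(z.shape[:-1] + (1,))], axis=-1)
    return parts / parts.sum(axis=-1, keepdims=True)

# Two hypothetical "anchor points" on an LDA decision boundary
p, q = alr([0.6, 0.3, 0.1]), alr([0.2, 0.5, 0.3])
t = np.linspace(0, 1, 5)[:, None]
line = (1 - t) * p + t * q   # straight segment in log-ratio space ...
print(alr_inv(line))         # ... a curved boundary in ternary space
```

This is why a handful of anchor points plus the knowledge that the boundary is linear in log-ratio space suffice to reconstruct each LDA boundary exactly, while the same boundary drawn on a conventional ternary diagram is curved.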