Building a Tree

At any point in the recursive process, a partition is defined by two quantities: the split variable j (1$\leq$j$\leq$J) and the split point s (-$\infty$$<$s$<$$\infty$). For any given partition R$_m$ of a tree T, there are at most N$\times$J possible binary subpartitions (at most one candidate split point per observation for each of the J variables), which can be searched exhaustively. We choose the split that minimizes the ``node impurity'' Q$_m$(T). Let $\hat{p}_{mk}$ be the proportion of class k observations in node m; then


\begin{equation}
Q_m(T) = \sum_{k=1}^{K} \hat{p}_{mk}(1-\hat{p}_{mk})
\end{equation}
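Equation (1) translates directly into code. The following is a minimal sketch in Python with NumPy; the function name gini_impurity is our own choice for illustration and does not refer to any particular tree-growing library.

\begin{verbatim}
import numpy as np

def gini_impurity(labels):
    """Gini index of diversity Q_m(T) for the class labels in one node m."""
    _, counts = np.unique(labels, return_counts=True)
    p_mk = counts / counts.sum()          # class proportions \hat{p}_{mk}
    return float(np.sum(p_mk * (1.0 - p_mk)))
\end{verbatim}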

This particular form of Q$_m$(T) is called the ``Gini index of diversity'', but alternatives exist. Note that if we simply used the ``misclassification error'' (i.e., the number of misclassified data points resulting from a candidate split) as a measure of node impurity, it would be impossible to partition the data set shown in Figure 1: any initial split would misclassify all the triangles, so every candidate split yields the same misclassification error (23/63 in this case) and offers no basis for choosing between splits. The recursive partitioning process continues until all the end-nodes are ``pure'', i.e. until all the observations within each end-node belong to the same class. The maximum-sized tree thus obtained perfectly describes the training data; in other words, it has zero bias. For the purpose of prediction, however, this tree is not optimal, because it overfits the training data, causing high variance. The tree with optimal predictive power will be smaller than the largest possible tree, and can be found by ``cost-complexity pruning''.
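Continuing the sketch above, the exhaustive search over split variables and split points, followed by recursion down to pure end-nodes, might look as follows. This is again hypothetical Python reusing the gini_impurity function defined earlier; scoring a candidate split by the sample-size-weighted mean impurity of its two child nodes is our assumption, since the text only states that Q$_m$(T) is minimized.

\begin{verbatim}
import numpy as np

def best_split(X, y):
    """Exhaustive search over the (at most) N x J candidate binary splits.

    X is an N x J matrix of predictor variables, y holds the N class labels.
    Returns the split variable j, split point s, and the weighted mean Gini
    impurity of the two resulting child nodes.
    """
    n, J = X.shape
    best_j, best_s, best_q = None, None, np.inf
    for j in range(J):                      # split variable
        for s in np.unique(X[:, j]):        # candidate split points
            left, right = y[X[:, j] <= s], y[X[:, j] > s]
            if len(left) == 0 or len(right) == 0:
                continue                    # not a genuine binary split
            q = (len(left) * gini_impurity(left) +
                 len(right) * gini_impurity(right)) / n
            if q < best_q:
                best_j, best_s, best_q = j, s, q
    return best_j, best_s, best_q

def grow_tree(X, y):
    """Recursive partitioning until every end-node is pure (zero bias)."""
    if gini_impurity(y) == 0.0:             # all observations share one class
        return {'class': y[0]}
    j, s, _ = best_split(X, y)
    if j is None:                           # identical predictors, mixed labels
        values, counts = np.unique(y, return_counts=True)
        return {'class': values[np.argmax(counts)]}
    go_left = X[:, j] <= s
    return {'variable': j, 'split': s,
            'left':  grow_tree(X[go_left], y[go_left]),
            'right': grow_tree(X[~go_left], y[~go_left])}
\end{verbatim}

The tree returned by grow_tree is the maximum-sized tree discussed above; it reproduces the training data exactly and is therefore the starting point for the pruning step described in the next section.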

