
Classification Trees: Introduction

Suppose we have $N$ $J$-dimensional data points $X^n = \{x_1^n,\ldots,x_j^n,\ldots,x_J^n\}$ ($1 \leq n \leq N$), each belonging to one of $K$ classes $Y^n \in \{c_1,\ldots,c_k,\ldots,c_K\}$. For example, $X = \{X^1,\ldots,X^n,\ldots,X^N\}$ might represent $J$ features (e.g., various major and trace element concentrations, isotopic ratios, color, weight, ...) measured in $N$ samples of basalt. The classes $c_1,\ldots,c_K$ might then be tectonic affinities, e.g., ``mid-ocean ridge'', ``ocean island'', ``island arc'', etc.

The basic idea behind classification and regression trees (CART) is to approximate the classification over the feature space by a piecewise constant function, in other words, to partition the feature space into $M$ disjoint regions $\{R_1,\ldots,R_m,\ldots,R_M\}$ and assign a single class to each region. An example of such a partition is given in Figure 1.a. Because it is impossible to consider all possible partitions of the feature space, we restrict ourselves to a small subset of solutions: the recursive binary partitions (Figure 1.b). Surprising as it may seem, given the crudeness of this approximation, trees are among the most powerful and popular data mining techniques in existence. One reason for this is that, besides ensuring computational feasibility, the recursive partitioning technique described next also allows the multi-dimensional decision space to be represented as a two-dimensional tree graph (Figure 1.c). Although building a tree might seem complicated, this complexity need not concern the user: the tree only has to be built once, after which using it for data interpretation is extremely simple.
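As a concrete illustration of how little effort is required on the user's side, the following sketch fits and applies a classification tree with scikit-learn's DecisionTreeClassifier, which implements the CART algorithm. The feature names (TiO2, V) and all numerical values are hypothetical placeholders chosen only to mirror the basalt example above; they are not data from this study.

import numpy as np
from sklearn.tree import DecisionTreeClassifier, export_text

# N samples by J features (hypothetical TiO2 wt% and V ppm values)
X = np.array([[1.2, 250.0],
              [1.5, 310.0],
              [0.8, 350.0],
              [2.8, 180.0],
              [3.1, 200.0],
              [2.5, 150.0]])

# one of K tectonic-affinity classes per sample
y = ["island arc", "island arc", "island arc",
     "ocean island", "ocean island", "ocean island"]

# build the tree once: a recursive binary partition of the feature space
tree = DecisionTreeClassifier(max_depth=2).fit(X, y)

# the fitted partition can be displayed as a two-dimensional tree graph
print(export_text(tree, feature_names=["TiO2", "V"]))

# interpreting a new sample is a simple walk down the tree
print(tree.predict([[2.0, 220.0]]))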

Figure 1: An example of a bivariate ($X_1$,$X_2$) classification tree. Classification trees approximate the data space with a piecewise constant function (a). To ensure computational feasibility, a recursive binary partitioning method is used (b). Such partitions have the added advantage that the results can be visualized as a two-dimensional graph or ``tree'' (c). In this and subsequent trees, left branches mean ``Yes'' and right branches ``No''.


Pieter Vermeesch 2005-12-14