Suppose we have $N$ $J$-dimensional data points
$X=\{x_1,\ldots,x_n,\ldots,x_N\}$ $(1 \leq n \leq N)$ belonging
to one of $K$ classes: $Y_n \in \{c_1,\ldots,c_K\}$. For
example, $X=\{x_1,\ldots,x_n,\ldots,x_N\}$ might represent $J$
features (e.g., various major and trace element concentrations,
isotopic ratios, color, weight, ...) measured in $N$ samples of basalt.
$c_1,\ldots,c_K$ might then be tectonic affinities, e.g., ``mid-ocean
ridge'', ``ocean island'', ``island arc'', and so forth. The basic idea
behind classification and regression trees (CART) is to approximate
the parameter space by a piecewise constant function, in other words,
to partition $X$ into $M$ disjoint regions
$\{R_1,\ldots,R_m,\ldots,R_M\}$. An example of such a partition is given
in Figure 1.a. Because it is impossible to enumerate
all possible partitions of the feature space, we restrict ourselves to
a small subset of possible solutions: the recursive binary partitions
(Figure 1.b). Surprising as it may seem,
considering the crudeness of this approximation, trees are one of the
most powerful and popular data mining techniques in existence. One of
the reasons for this is that, besides ensuring computational
feasibility, the recursive partitioning technique described next also
allows the representation of the multi-dimensional decision space as a
two-dimensional tree graph (Figure 1.c). Although
building a tree might seem complicated, this is of little concern to the
user, because it only has to be done once; after that, using a
classification tree for data interpretation is extremely simple.
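
To make the idea of recursive binary partitioning concrete, the following Python sketch grows a small classification tree by repeatedly splitting on a single feature and threshold, and then classifies a new sample by walking down the tree. It is an illustration only, not the implementation used here: the toy feature values, the class labels (``MORB'', ``OIB'', ``IAB''), and the choice of the Gini impurity as splitting criterion are assumptions made for the example.

```python
# Minimal sketch of CART-style recursive binary partitioning.
# All data, labels, and parameter choices below are illustrative assumptions.
import numpy as np

def gini(labels):
    """Gini impurity of a set of class labels."""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return 1.0 - np.sum(p ** 2)

def best_split(X, y):
    """Exhaustively search for the (feature, threshold) pair that
    minimizes the size-weighted Gini impurity of the two halves."""
    n, J = X.shape
    best = (None, None, np.inf)
    for j in range(J):
        for t in np.unique(X[:, j]):
            left = X[:, j] <= t
            if left.all() or not left.any():
                continue  # skip splits that leave one side empty
            score = (left.sum() * gini(y[left]) +
                     (~left).sum() * gini(y[~left])) / n
            if score < best[2]:
                best = (j, t, score)
    return best[0], best[1]

def grow(X, y, depth=0, max_depth=3):
    """Recursively split until a node is pure or max_depth is reached;
    each leaf stores the majority class of one region R_m."""
    if depth == max_depth or len(np.unique(y)) == 1:
        values, counts = np.unique(y, return_counts=True)
        return values[np.argmax(counts)]
    j, t = best_split(X, y)
    if j is None:  # no admissible split found
        values, counts = np.unique(y, return_counts=True)
        return values[np.argmax(counts)]
    left = X[:, j] <= t
    return (j, t,
            grow(X[left], y[left], depth + 1, max_depth),
            grow(X[~left], y[~left], depth + 1, max_depth))

def classify(tree, x):
    """Walk the binary tree: at each node compare one feature to its threshold."""
    while isinstance(tree, tuple):
        j, t, left, right = tree
        tree = left if x[j] <= t else right
    return tree

# Toy data: two features per sample, three hypothetical classes.
X = np.array([[0.5, 1.0], [0.7, 1.2], [2.0, 0.3],
              [2.2, 0.4], [1.0, 3.0], [1.1, 2.8]])
y = np.array(["MORB", "MORB", "OIB", "OIB", "IAB", "IAB"])
tree = grow(X, y)
print(classify(tree, np.array([2.1, 0.35])))  # prints "OIB"
```

In practice one would rely on an established implementation (for example, rpart in R or scikit-learn's DecisionTreeClassifier in Python) rather than code like this, but the recursion above shows how each binary split carves the feature space into the disjoint regions $R_1,\ldots,R_M$ sketched in Figure 1.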