CART and random forests

Posted: 2016-06-29 , Modified: 2016-06-29

Tags: CART, adaptive basis functions, random forests

See Ch. 16 of Murphy. Presentation

An adaptive basis-function model (ABM) is a model of the form \[ f(x) = w_0+\sum_{m=1}^M w_m \phi_m(x).\] Typically \(\phi_m(x) = \phi(x;v_m)\), where the basis functions are parametrized by \(v_m\) and learned from the data.

CART (Classification and regression trees)

Decision trees recursively partition the input space and define a local model on each region.

For example, if the model is constant on each region, \(f(x) = \sum_{m=1}^M w_m \mathbf{1}(x\in R_m)\), where \(R_m\) is the \(m\)th region and \(w_m\) the constant response on it.

At each node, consider axis-parallel splits: for a continuous feature \(x_j\), splits of the form \(x_j \le t\) vs. \(x_j > t\), with \(t\) ranging over the observed values of \(x_j\); for a categorical feature, splits of the form \(x_j = c\) vs. \(x_j \ne c\).

The cost can be a regression or classification cost; the total cost of a tree is the sum of the costs of its leaves. Standard choices for the cost of a leaf:

  - Regression: squared error \(\sum_i (y_i - \bar y)^2\) over the points in the leaf, where \(\bar y\) is the leaf mean.
  - Classification: with class proportions \(\hat\pi_c\) in the leaf, the misclassification rate \(1 - \max_c \hat\pi_c\), the entropy \(-\sum_c \hat\pi_c \log \hat\pi_c\), or the Gini index \(\sum_c \hat\pi_c (1-\hat\pi_c)\).
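
A minimal sketch of these leaf costs in Python (my own illustrative code, not Murphy's; numpy assumed):

```python
import numpy as np

def regression_cost(y):
    # Squared error of a constant (mean) prediction on the leaf.
    return float(np.sum((y - y.mean()) ** 2)) if len(y) else 0.0

def gini_cost(y):
    # Gini index of the leaf's class proportions, weighted by leaf size
    # so that summing over leaves gives the tree's total cost.
    _, counts = np.unique(y, return_counts=True)
    p = counts / counts.sum()
    return len(y) * float(np.sum(p * (1 - p)))

def entropy_cost(y):
    # Entropy of the leaf's class proportions, weighted by leaf size.
    _, counts = np.unique(y, return_counts=True)
    p = counts / counts.sum()
    return len(y) * float(-np.sum(p * np.log(p)))
```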

Algorithm (a code sketch follows the list):

  1. Start at the root node of a single-node tree, and put all data points at that node.
  2. Find the split (feature and threshold) at the current node that minimizes the cost (equivalently, maximizes the information gain). If splitting is deemed not worthwhile, e.g. it doesn’t decrease the cost by much¹ or the node has reached a specified depth, move on to the next node instead. Otherwise make the split; the data points are now distributed between the two children.
  3. Add both child nodes to the queue.
  4. Continue the algorithm (at step 2) with the next node in the queue.
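
A minimal sketch of this greedy procedure for a constant-leaf regression tree (my own illustrative code; it uses recursion rather than an explicit queue, which yields the same tree):

```python
import numpy as np

def sse(y):
    # Cost of a leaf under a constant model: squared error around the mean.
    return float(np.sum((y - y.mean()) ** 2)) if len(y) else 0.0

def best_split(X, y):
    # Exhaustive search over axis-parallel splits x_j <= t.
    best_cost, best_j, best_t = np.inf, None, None
    for j in range(X.shape[1]):
        for t in np.unique(X[:, j])[:-1]:   # exclude max so both sides are nonempty
            mask = X[:, j] <= t
            cost = sse(y[mask]) + sse(y[~mask])
            if cost < best_cost:
                best_cost, best_j, best_t = cost, j, t
    return best_cost, best_j, best_t

def grow(X, y, depth=0, max_depth=3, min_gain=1e-12):
    if depth < max_depth:
        cost, j, t = best_split(X, y)
        # Split only if one exists and it decreases the cost enough.
        if j is not None and sse(y) - cost >= min_gain:
            mask = X[:, j] <= t
            return {"leaf": False, "j": j, "t": t,
                    "left":  grow(X[mask], y[mask], depth + 1, max_depth),
                    "right": grow(X[~mask], y[~mask], depth + 1, max_depth)}
    return {"leaf": True, "w": float(y.mean())}   # constant local model w_m

def predict(node, x):
    while not node["leaf"]:
        node = node["left"] if x[node["j"]] <= node["t"] else node["right"]
    return node["w"]
```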

Advantages:

  - Easy to interpret.
  - Handle mixed discrete and continuous inputs, and are insensitive to monotone transformations of the inputs.
  - Perform automatic variable selection, are relatively robust to outliers, and scale well to large data sets.

Disadvantages:

  - Do not predict as accurately as other kinds of models.
  - Unstable: small changes to the input data can produce large changes in the learned tree (high variance). Bagging, below, addresses this.

Random forests

Bagging (bootstrap aggregating): Train \(M\) different trees on independently drawn bootstrap samples of the data (sampled with replacement), and average: \(f(x)=\frac{1}{M}\sum_{m=1}^M f_m(x)\).
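
A minimal sketch of bagging, assuming numpy arrays and scikit-learn's DecisionTreeRegressor as the base learner:

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

def bagged_trees(X, y, M=50, seed=0):
    rng = np.random.default_rng(seed)
    n = len(X)
    trees = []
    for _ in range(M):
        idx = rng.integers(0, n, size=n)   # bootstrap: n draws with replacement
        trees.append(DecisionTreeRegressor().fit(X[idx], y[idx]))
    return trees

def bagged_predict(trees, X):
    # f(x) = (1/M) * sum_m f_m(x)
    return np.mean([t.predict(X) for t in trees], axis=0)
```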

(OR: use boosting, which fits trees sequentially on reweighted data, instead of averaging/taking a majority vote over independently trained trees.)

BUT this can result in highly correlated predictors (the trees tend to pick the same strong features), which limits the variance reduction from averaging.

Random forest: Decorrelate base learners by learning trees based on a randomly chosen subset of input variables, as well as a randomly chosen subset of data points (in Breiman's random forests, a fresh random subset of features is considered at each split).

(What are the right parameters? A common heuristic is to consider \(\sqrt d\) of the \(d\) features at each split for classification, and \(d/3\) for regression.)
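
In scikit-learn this is the `max_features` parameter; `"sqrt"` implements the \(\sqrt d\) heuristic. A sketch on synthetic data:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=500, n_features=20, random_state=0)

# Each tree sees a bootstrap sample of the data; each split considers
# only max_features = sqrt(20) ~ 4 randomly chosen features.
rf = RandomForestClassifier(n_estimators=100, max_features="sqrt",
                            random_state=0).fit(X, y)
```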

Bayesian additive regression trees (BART).

? Hierarchical mixtures of experts, which replace the hard splits of a decision tree with soft probabilistic splits.


  1. This is usually too myopic. Instead, use pruning: grow the full tree, evaluate the cross-validated error on subtrees, and pick the minimal tree whose CV error is within 1 standard error of the minimum.
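
In scikit-learn, pruning is exposed via cost-complexity pruning; a sketch of the 1-SE rule (assumes arrays `X`, `y` are already defined):

```python
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

# The candidate subtrees of the full tree are indexed by alpha.
path = DecisionTreeClassifier(random_state=0).cost_complexity_pruning_path(X, y)
scores = [cross_val_score(DecisionTreeClassifier(ccp_alpha=a, random_state=0),
                          X, y, cv=5) for a in path.ccp_alphas]
means = np.array([s.mean() for s in scores])
ses = np.array([s.std(ddof=1) / np.sqrt(len(s)) for s in scores])

best = means.argmax()
# 1-SE rule: the largest alpha (i.e. smallest subtree) whose CV score
# is within one standard error of the best.
alpha = path.ccp_alphas[means >= means[best] - ses[best]].max()
```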