CART and random forests
Posted: 2016-06-29 , Modified: 2016-06-29
See Ch. 16 of Murphy. Presentation
An adaptive basis-function model (ABM) is a model of the form \[ f(x) = w_0+\sumo mM w_m \phi_m(x).\] Typically \(\phi_m(x) = \phi(x;v_m)\), where the parameters \(v_m\) are learned from the data rather than fixed in advance.
Decision trees recursively partition the input space and define a local model on each region.
For example, if the model is constant on each region, \(f(x) = \sumo mM w_m \mathbb{1}(x\in R_m)\), where \(w_m\) is the mean response in region \(R_m\).
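A minimal numerical sketch of this piecewise-constant form in 1-D; the split points here are fixed by hand, not learned (CART would learn them):

```python
import numpy as np

# Piecewise-constant model f(x) = sum_m w_m * 1(x in R_m) on [0, 1].
# The region boundaries below are hypothetical.
rng = np.random.default_rng(0)
splits = np.array([0.3, 0.7])                  # boundaries of R_1, R_2, R_3
x_train = rng.uniform(size=100)
y_train = np.sin(2 * np.pi * x_train)

region = np.digitize(x_train, splits)          # region index of each point
w = np.array([y_train[region == m].mean()      # w_m = mean response in R_m
              for m in range(len(splits) + 1)])

def f(x):
    return w[np.digitize(x, splits)]
```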
At each node, consider these kinds of splits: for a continuous feature \(x_j\), thresholds \(x_j\le t\) with \(t\) ranging over the observed values of \(x_j\); for a categorical feature, tests \(x_j = c_k\) for each possible value \(c_k\).
The cost can be a regression or classification cost; sum the costs over the leaves. Common choices: for regression, the squared error \(\sum_i (y_i-\bar y)^2\) within each leaf; for classification, the misclassification rate, entropy, or Gini index of the leaf's class distribution.
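A hedged sketch of these per-leaf costs (function names are mine):

```python
import numpy as np

def regression_cost(y):
    """Squared error around the leaf mean."""
    return np.sum((y - y.mean()) ** 2)

def gini(y):
    """Gini index: sum_c p_c (1 - p_c) over class frequencies."""
    _, counts = np.unique(y, return_counts=True)
    p = counts / counts.sum()
    return np.sum(p * (1 - p))

def entropy(y):
    """Entropy: -sum_c p_c log p_c."""
    _, counts = np.unique(y, return_counts=True)
    p = counts / counts.sum()
    return -np.sum(p * np.log(p))
```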
Algorithm: grow the tree greedily. At each node, evaluate all candidate splits, keep the one that minimizes the total cost of the two children, and recurse; stop when a node is pure or too small, or when no split reduces the cost enough.
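A from-scratch sketch of this greedy loop for regression with squared-error cost; all names and default parameters here are illustrative, not from a library:

```python
import numpy as np

def sse(y):
    """Squared error around the mean; 0 for an empty node."""
    return ((y - y.mean()) ** 2).sum() if len(y) else 0.0

def grow(X, y, depth=0, max_depth=3, min_size=5):
    node = {"pred": y.mean()}                  # leaf prediction: mean response
    if depth == max_depth or len(y) < 2 * min_size:
        return node
    best = (sse(y), None, None)                # (cost, feature j, threshold t)
    for j in range(X.shape[1]):
        for t in np.unique(X[:, j]):
            left = X[:, j] <= t
            if left.sum() < min_size or (~left).sum() < min_size:
                continue
            cost = sse(y[left]) + sse(y[~left])
            if cost < best[0]:
                best = (cost, j, t)
    if best[1] is None:                        # no admissible split helps
        return node
    _, j, t = best
    left = X[:, j] <= t
    node.update(j=j, t=t,
                left=grow(X[left], y[left], depth + 1, max_depth, min_size),
                right=grow(X[~left], y[~left], depth + 1, max_depth, min_size))
    return node

def predict(node, x):
    while "j" in node:                         # descend until a leaf
        node = node["left"] if x[node["j"]] <= node["t"] else node["right"]
    return node["pred"]
```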
Advantages: easy to interpret; handle mixed discrete and continuous inputs; insensitive to monotone transformations of the inputs; do automatic variable selection; relatively robust to outliers; scale well to large data sets.
Disadvantages: predictive accuracy is often worse than other kinds of models; trees are unstable (high variance): small changes to the data can produce a very different tree.
Bagging (bootstrap aggregating): Train \(M\) different trees on bootstrap samples of the data (drawn with replacement), and average them: \(f(x)=\sumo mM \rc M f_m(x)\).
(OR: use boosting instead of taking majority vote.)
BUT this can result in highly correlated predictors, since the bootstrapped trees tend to pick the same strong splits; averaging correlated predictors gives less variance reduction.
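A sketch of bagging, using scikit-learn's tree as the base learner; the data and names are illustrative:

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(0)
X = rng.uniform(size=(200, 3))
y = np.sin(2 * np.pi * X[:, 0]) + 0.1 * rng.normal(size=200)

M = 25
trees = []
for _ in range(M):
    idx = rng.integers(0, len(X), size=len(X))   # sample with replacement
    trees.append(DecisionTreeRegressor().fit(X[idx], y[idx]))

def f(x):
    # average the M base learners: f(x) = (1/M) sum_m f_m(x)
    return np.mean([t.predict(x.reshape(1, -1))[0] for t in trees])
```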
Random forest: Decorrelate the base learners by growing each tree on a randomly chosen subset of input variables and data points.
(What are the right parameters? Breiman's defaults: sample \(\sqrt d\) of the \(d\) features for classification, \(d/3\) for regression.)
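For instance, with scikit-learn (the dataset and hyperparameters are illustrative):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

# Random forest: each split considers a random subset of sqrt(d) features.
X, y = make_classification(n_samples=500, n_features=20, random_state=0)
rf = RandomForestClassifier(n_estimators=100, max_features="sqrt",
                            random_state=0)
print(cross_val_score(rf, X, y, cv=5).mean())
```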
Bayesian additive regression trees (BART).
? Hierarchical mixtures of experts.
This is usually too myopic. Instead use pruning: grow the full tree, evaluate the cross-validated error on subtrees, and pick the minimal tree whose CV error is within 1 standard error of the minimum.↩
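A sketch of that 1-SE rule using scikit-learn's cost-complexity pruning (the dataset is illustrative, and "error within 1 SE of the minimum" is phrased here as accuracy within 1 SE of the maximum):

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

# Candidate subtrees of the full tree are indexed by ccp_alpha.
X, y = load_breast_cancer(return_X_y=True)
alphas = DecisionTreeClassifier(random_state=0) \
    .cost_complexity_pruning_path(X, y).ccp_alphas

scores = [cross_val_score(DecisionTreeClassifier(ccp_alpha=a, random_state=0),
                          X, y, cv=5) for a in alphas]
means = np.array([s.mean() for s in scores])
ses = np.array([s.std(ddof=1) / np.sqrt(len(s)) for s in scores])

# Largest alpha (smallest tree) whose CV accuracy is within 1 SE of the best.
best = means.argmax()
chosen = alphas[means >= means[best] - ses[best]].max()
print(chosen)
```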