Ensemble Learners

Supervised learning > classification

  • generally: learn simple rules (weak learners) over subsets of the data => combine the simple rules into a more complex, more accurate rule
  • how to choose subsets:
    • bagging: choose uniformly at random, with replacement => combine by averaging (see the sketch below)
    • boosting: choose and focus on the "hardest" examples (as measured by error) => combine by weighted mean
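A minimal sketch of the bagging recipe in Python, assuming a hypothetical train_fn that fits one weak learner on (X, y) and returns a callable predictor:

import numpy as np

def bagged_predict(train_fn, X, y, X_test, n_learners=10, seed=0):
    # Train n_learners models on bootstrap samples; combine by averaging.
    rng = np.random.default_rng(seed)
    n = len(X)
    preds = []
    for _ in range(n_learners):
        idx = rng.choice(n, size=n, replace=True)  # uniform, with replacement
        model = train_fn(X[idx], y[idx])           # fit on the bootstrap subset
        preds.append(model(X_test))
    return np.mean(preds, axis=0)                  # combine by averaging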

Boosting pseudocode

for t = 1 to T  # time step
    construct distribution D_t over the training examples
    find weak classifier h_t(x) with small error
    # where error: e_t = P_{D_t}[h_t(x_i) != y_i]

combine and output H_final
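The notes don't fix a weak-learner family; a common choice (an assumption here, not from the source) is the decision stump, a single-feature threshold rule. A sketch of picking the stump with the smallest weighted error under D_t:

import numpy as np

def best_stump(X, y, D):
    # Exhaustively pick (feature, threshold, sign) minimizing weighted error under D.
    best = (0, 0.0, 1, np.inf)  # feature j, threshold, sign, error
    for j in range(X.shape[1]):
        for thresh in np.unique(X[:, j]):
            for sign in (1, -1):
                pred = sign * np.where(X[:, j] >= thresh, 1, -1)
                err = D[pred != y].sum()  # e_t = P_{D_t}[h_t(x_i) != y_i]
                if err < best[3]:
                    best = (j, thresh, sign, err)
    j, thresh, sign, err = best
    h = lambda Z: sign * np.where(Z[:, j] >= thresh, 1, -1)
    return h, err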

Given

  • binary classification training data \(\{ (x_i, y_i) \}_{i=1}^{n}\) with labels \(y_i \in \{-1, +1\}\)
  • initially \(D_1(i) = 1/n\), the uniform distribution over the examples
  • at each time step thereafter, construct a new distribution:

    \(\displaystyle D_{t+1}(i) = \frac{D_t(i) \cdot e^{-\alpha_t y_i h_t(x_i)}}{Z_t}\)     where \(\alpha_t = \frac{1}{2} \ln \frac{1 - e_t}{e_t}\) and \(Z_t\) is a normalization constant so that \(D_{t+1}\) sums to 1

    i.e. starting with the previous distribution, scale each example's weight up or down based on how the current hypothesis does on it: since \(y_i h_t(x_i) = +1\) when correct and \(-1\) when incorrect, and \(\alpha_t > 0\) whenever \(e_t < 1/2\), correctly classified examples lose weight and the misclassified ("harder") examples gain weight

  • \(H_{\text{final}}(x) = \text{sgn}(\sum_{t} \alpha_t h_t(x))\)
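Putting the pieces together, a runnable sketch of the full loop (reusing the hypothetical best_stump above as the weak learner, and assuming each \(e_t\) stays strictly between 0 and 1/2 so \(\alpha_t\) is finite and positive):

import numpy as np

def adaboost(X, y, T=20):
    # y must be in {-1, +1}; best_stump is the hypothetical weak learner above.
    n = len(y)
    D = np.full(n, 1.0 / n)            # D_1(i) = 1/n
    hs, alphas = [], []
    for t in range(T):
        h, e = best_stump(X, y, D)     # weak classifier with small weighted error e_t
        alpha = 0.5 * np.log((1 - e) / e)
        hs.append(h)
        alphas.append(alpha)
        D = D * np.exp(-alpha * y * h(X))  # shrink correct, grow incorrect weights
        D = D / D.sum()                    # divide by Z_t to renormalize
    def H_final(Z):
        return np.sign(sum(a * h(Z) for a, h in zip(alphas, hs)))
    return H_final

Each h(X) returns ±1 predictions, so y * h(X) is +1 exactly on the correctly classified examples, matching the exponent in the distribution update above.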