Ensemble Learners

Supervised learning > classification

  • generally: use weak learners, each trained over a subset of the data, to learn simple rules => combine the simple rules to create a more complex rule
  • how to choose subsets:
    • bagging: choose each subset uniformly at random, with replacement => combine by averaging (a minimal sketch follows this list)
    • boosting: choose and focus on the "hardest" examples (those with the highest error so far) => combine by weighted mean
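
A minimal bagging sketch, assuming numpy arrays and scikit-learn decision stumps as the weak learners (the function name bagged_predict and its parameters are illustrative, not from the source):

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def bagged_predict(X_train, y_train, X_test, n_learners=25, seed=0):
    """Train weak learners on bootstrap samples; combine their votes by averaging."""
    rng = np.random.default_rng(seed)
    n = len(X_train)
    votes = np.zeros(len(X_test))
    for _ in range(n_learners):
        # choose a subset uniformly at random, with replacement
        idx = rng.integers(0, n, size=n)
        stump = DecisionTreeClassifier(max_depth=1)  # weak learner: a decision stump
        stump.fit(X_train[idx], y_train[idx])
        votes += stump.predict(X_test)
    # combine by averaging, then threshold back to {-1, +1}
    return np.sign(votes / n_learners)
```

Each bootstrap sample leaves out roughly a third of the training points (about 1/e of them), so the learners differ enough for averaging to reduce variance.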

Boosting (pseudocode)

for t = 1 to T  # time step
    construct distribution D_t over the training examples
    find a weak classifier h_t(x) with small error
    # where error is measured under D_t: e_t = P_{D_t}(h_t(x_i) != y_i)

combine the weak classifiers and output H_final
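
As a concrete stand-in for "find weak classifier h_t with small error", here is a sketch of a decision stump selected to minimize the weighted error e_t under D_t (pure numpy; the helper name best_stump and its interface are assumptions for illustration):

```python
import numpy as np

def best_stump(X, y, D):
    """Return the threshold stump (feature, threshold, sign) minimizing the
    weighted error e_t = sum of D[i] over misclassified examples i.
    X: (n, d) array, y: labels in {-1, +1}, D: distribution over the n examples."""
    n, d = X.shape
    best = (0, 0.0, 1, 1.0)  # (feature j, threshold, sign, error)
    for j in range(d):
        for thr in np.unique(X[:, j]):
            for sign in (1, -1):
                pred = sign * np.where(X[:, j] <= thr, 1, -1)
                err = D[pred != y].sum()  # error measured under D, not uniformly
                if err < best[3]:
                    best = (j, thr, sign, err)
    return best
```

A weak learner only has to achieve e_t < 1/2, i.e. do better than chance on the reweighted data.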

Given

  • binary classification training data: $\{(x_i, y_i)\}$, with labels $y_i \in \{-1, +1\}$
  • initially $D_1(i) = 1/n$: a uniform distribution over the examples
  • at each time step thereafter, construct a new distribution:

    $D_{t+1}(i) = \dfrac{D_t(i)\, e^{-\alpha_t y_i h_t(x_i)}}{Z_t}$     where $\alpha_t = \frac{1}{2} \ln \frac{1 - e_t}{e_t}$ and $Z_t$ is the normalization constant that keeps $D_{t+1}$ a distribution

    i.e. starting with the previous distribution, scale each example's weight up or down based on how well the current hypothesis performs on it: $y_i h_t(x_i)$ is $+1$ when $h_t$ is correct and $-1$ when it is wrong, so correctly classified examples get less weight and the misclassified ("harder") examples get more (e.g. $e_t = 0.25$ gives $\alpha_t = \frac{1}{2}\ln 3 \approx 0.55$)

  • $H_{final}(x) = \mathrm{sgn}\left(\sum_t \alpha_t h_t(x)\right)$, a weighted vote of the weak hypotheses (a full loop is sketched below)
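
Putting the update rules together, a minimal boosting loop, assuming the best_stump helper sketched above and numpy arrays (an illustration of the formulas, not a reference implementation):

```python
import numpy as np

def adaboost(X, y, T=20):
    """Run T rounds of boosting over decision stumps; return (alpha_t, stump_t) pairs."""
    n = len(X)
    D = np.full(n, 1.0 / n)                      # D_1(i) = 1/n, uniform
    ensemble = []
    for _ in range(T):
        j, thr, sign, e_t = best_stump(X, y, D)  # weak classifier with small error under D_t
        if e_t >= 0.5:                           # no better than chance: stop boosting
            break
        e_t = max(e_t, 1e-12)                    # guard the log when the stump is perfect
        alpha = 0.5 * np.log((1 - e_t) / e_t)    # alpha_t = 1/2 ln((1 - e_t) / e_t)
        pred = sign * np.where(X[:, j] <= thr, 1, -1)
        D = D * np.exp(-alpha * y * pred)        # D_{t+1}(i) ~ D_t(i) e^{-alpha_t y_i h_t(x_i)}
        D = D / D.sum()                          # dividing by Z_t keeps D a distribution
        ensemble.append((alpha, (j, thr, sign)))
    return ensemble

def H_final(ensemble, X):
    """H_final(x) = sgn(sum_t alpha_t h_t(x)): a weighted vote of the stumps."""
    total = np.zeros(len(X))
    for alpha, (j, thr, sign) in ensemble:
        total += alpha * sign * np.where(X[:, j] <= thr, 1, -1)
    return np.sign(total)
```

Note how the weight update and the final vote use the same $\alpha_t$: a round with low error both reshapes the distribution more aggressively and gets a larger say in $H_{final}$.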