Ensemble Learners
Supervised learning > classification
- generally: learn simple rules over subsets of the data (using weak learners) => combine the simple rules into a more complex rule
- how to choose subsets:
- bagging: choose uniformly at random, with replacement => combine by averaging (sketched below)
- boosting: choose and focus on the "hardest" examples (those with the highest error) => combine by weighted mean
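
As a concrete illustration of bagging, here is a minimal sketch in Python, assuming numpy and scikit-learn are available; the function names (`bagging_fit`, `bagging_predict`) and the choice of depth-1 decision stumps as weak learners are illustrative, not from the notes:

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def bagging_fit(X, y, n_learners=25, seed=0):
    """Train weak learners, each on a bootstrap sample of the data."""
    rng = np.random.default_rng(seed)
    n = len(X)
    learners = []
    for _ in range(n_learners):
        idx = rng.integers(0, n, size=n)             # uniform random, with replacement
        stump = DecisionTreeClassifier(max_depth=1)  # a weak learner
        stump.fit(X[idx], y[idx])
        learners.append(stump)
    return learners

def bagging_predict(learners, X):
    """Combine the weak learners by averaging their predictions."""
    avg = np.mean([l.predict(X) for l in learners], axis=0)
    return np.where(avg >= 0, 1, -1)                 # labels assumed in {-1, +1}
```

With labels in {-1, +1}, averaging the ±1 votes and taking the sign is the same as a majority vote, so "combine by averaging" here is just voting.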
Boosting pseudocode
Given
- binary classification training data $\{(x_i, y_i)\}$, all labels $y_i \in \{-1, +1\}$
- initially, a uniform distribution over the examples: $D_1(i) = 1/n$
- at each time step $t$ thereafter, find a weak hypothesis $h_t$ with small error $\epsilon_t = \Pr_{D_t}[h_t(x_i) \neq y_i]$, then construct a new distribution:
$D_{t+1}(i) = \frac{D_t(i)\, e^{-\alpha_t y_i h_t(x_i)}}{Z_t}$, where $\alpha_t = \frac{1}{2} \ln \frac{1 - \epsilon_t}{\epsilon_t}$ and $Z_t$ is a normalization constant
i.e. starting with the previous distribution, make each example's weight bigger or smaller based on how well the current hypothesis performs => correctly classified examples get less weight and the incorrectly classified ("harder") examples get more weight
- final hypothesis: $H(x) = \mathrm{sgn}\left(\sum_t \alpha_t h_t(x)\right)$, a weighted combination of the weak hypotheses
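
The loop above translates almost directly into code. Below is a minimal sketch, assuming numpy and scikit-learn; the function names (`boost_fit`, `boost_predict`) and the use of depth-1 decision stumps as the weak learner are illustrative choices, not part of the notes:

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def boost_fit(X, y, T=25):
    """Boosting loop; labels y must be in {-1, +1}."""
    n = len(X)
    D = np.full(n, 1.0 / n)                   # D_1: uniform over examples
    hypotheses, alphas = [], []
    for t in range(T):
        # weak learner trained under the current distribution D_t
        h = DecisionTreeClassifier(max_depth=1)
        h.fit(X, y, sample_weight=D)
        pred = h.predict(X)
        eps = D[pred != y].sum()              # weighted error under D_t
        eps = np.clip(eps, 1e-10, 1 - 1e-10)  # guard the log below
        alpha = 0.5 * np.log((1 - eps) / eps)
        # correct examples shrink, incorrect ("harder") ones grow
        D = D * np.exp(-alpha * y * pred)
        D = D / D.sum()                       # divide by Z_t: keep D a distribution
        hypotheses.append(h)
        alphas.append(alpha)
    return hypotheses, alphas

def boost_predict(hypotheses, alphas, X):
    """Final hypothesis: sign of the weighted sum of weak hypotheses."""
    agg = sum(a * h.predict(X) for a, h in zip(alphas, hypotheses))
    return np.where(agg >= 0, 1, -1)
```

Note how the combination rule follows from $\alpha_t$: it grows as $\epsilon_t$ shrinks, so more accurate weak hypotheses get more say in the weighted vote.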