Rule-Based Learning
Supervised learning > classification
- classify records based on a collection of
if...then
rules - \(\text{(Condition)} \rightarrow y\) where
- condition is a conjunction of tests on attributes
- \(y\) is the class label
- a rule covers an instance if the instance satisfies the condition of the rule
- rule coverage - fraction of records that satisfy the antecedent of a rule
- rule accuracy - of the records that satisfy the antecedent, the fraction that also satisfy the consequent
- rules should be
- mutually exclusive: rules are independent of each other and every record is covered by at most one rule
- exhaustive: coverage accounts for every possible combination of attribute values, and each record is covered by at least one rule
- if rules are not mutually exclusive, use: (1) an ordered rule set or (2) a voting scheme
- ordered rule set means ordering rules by rank/priority; also called decision list
- classifier chooses the rule with highest rank, or default class if no rule applies
- if rules are not exhaustive, use a default class for fall-through instances
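A minimal first-match sketch of the ideas above (ordered rule set, default class, and the coverage/accuracy measures). The vertebrate-style attributes and rules are illustrative, not from any real dataset:

```python
# Each rule: (condition, class). A condition is a dict of attribute -> value,
# read as a conjunction of equality tests. First matching rule wins.
rules = [
    ({"gives_birth": "no", "can_fly": "yes"}, "bird"),
    ({"gives_birth": "no", "lives_in_water": "yes"}, "fish"),
    ({"gives_birth": "yes"}, "mammal"),
]
DEFAULT_CLASS = "reptile"  # fall-through class for uncovered records

def covers(condition, record):
    """A rule covers a record iff the record satisfies every test."""
    return all(record.get(attr) == val for attr, val in condition.items())

def classify(record):
    for condition, label in rules:   # ordered rule set (decision list)
        if covers(condition, record):
            return label
    return DEFAULT_CLASS             # no rule applies

def coverage(condition, records):
    """Fraction of records that satisfy the antecedent."""
    return sum(covers(condition, r) for r in records) / len(records)

def accuracy(condition, label, records, labels):
    """Of the records covered by the rule, the fraction whose class matches."""
    covered = [y for r, y in zip(records, labels) if covers(condition, r)]
    return sum(y == label for y in covered) / len(covered) if covered else 0.0
```

For example, `classify({"gives_birth": "no", "can_fly": "yes"})` returns `"bird"`, and a record matching no rule falls through to `"reptile"`.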
Methods for building rule-based classifiers
- direct methods - extract rules directly from data - e.g. RIPPER, CN2, Holte's 1R
- indirect methods - extract rules from classification model (NN, decision tree, etc.) - e.g. C4.5rules
Pseudo algorithm
1. Start from an empty rule
2. Grow the rule using the Learn-One-Rule function (use FOIL's information gain for evaluation)
3. Remove training records covered by the rule
4. Repeat from step 2 until the stopping criterion is met
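A runnable sketch of these steps, assuming records are dicts of categorical attributes and conjuncts are equality tests (the function names here are my own, not from any library):

```python
import math

def foil_gain(p0, n0, p1, n1):
    """FOIL's information gain: p0/n0 positives/negatives covered before
    adding a conjunct, p1/n1 after."""
    if p1 == 0:
        return float("-inf")  # extension covers no positives
    return p1 * (math.log2(p1 / (p1 + n1)) - math.log2(p0 / (p0 + n0)))

def covers(rule, record):
    return all(record.get(a) == v for a, v in rule.items())

def learn_one_rule(records, labels, pos):
    rule = {}  # step 1: start from an empty rule (covers everything)
    while True:
        covered = [(r, y) for r, y in zip(records, labels) if covers(rule, r)]
        p0 = sum(y == pos for _, y in covered)
        n0 = len(covered) - p0
        if n0 == 0:            # rule covers no negatives: stop growing
            break
        best, best_gain = None, 0.0
        # step 2: try every attribute=value conjunct seen in covered records
        for r, _ in covered:
            for a, v in r.items():
                if a in rule:
                    continue
                cand = dict(rule, **{a: v})
                cov = [(rr, yy) for rr, yy in covered if covers(cand, rr)]
                p1 = sum(yy == pos for _, yy in cov)
                g = foil_gain(p0, n0, p1, len(cov) - p1)
                if g > best_gain:
                    best, best_gain = cand, g
        if best is None:       # no conjunct improves the gain
            break
        rule = best
    return rule

def sequential_covering(records, labels, pos):
    rules, rem = [], list(zip(records, labels))
    while any(y == pos for _, y in rem):     # step 4: until positives run out
        rule = learn_one_rule([r for r, _ in rem], [y for _, y in rem], pos)
        if not rule:                         # empty rule learned: give up
            break
        rules.append((rule, pos))
        # step 3: remove all records covered by the new rule
        rem = [(r, y) for r, y in rem if not covers(rule, r)]
    return rules
```

Real learners such as RIPPER also prune each grown rule on a validation split; that step is omitted here for brevity.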
Direct Method: RIPPER
For a 2-class problem, choose one class as positive and the other as negative
- Learn rules for the positive class
- Negative class will be the default class
For a multi-class problem
- order the classes according to increasing class prevalence (fraction of instances belonging to a particular class)
- learn the rule set for the smallest class first, treat the rest as negative class
- repeat with next smallest class as positive class
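The prevalence ordering can be computed directly; a sketch (the helper name `class_order` is made up):

```python
from collections import Counter

def class_order(labels):
    # Classes sorted by increasing prevalence: the rarest class is
    # learned first; the most frequent class ends up as the default.
    counts = Counter(labels)
    return sorted(counts, key=counts.get)
```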
Growing a rule
- start with an empty rule
- add conjuncts as long as they improve FOIL's information gain
- stop when the rule no longer covers negative examples
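FOIL's information gain for extending a rule that covers \(p_0\) positives and \(n_0\) negatives into one covering \(p_1\) positives and \(n_1\) negatives is \(p_1\left(\log_2\frac{p_1}{p_1+n_1}-\log_2\frac{p_0}{p_0+n_0}\right)\). A small sketch:

```python
import math

def foil_gain(p0, n0, p1, n1):
    # Positive when the extended rule covers a higher proportion of
    # positives (higher precision) than the rule before the conjunct.
    if p1 == 0:
        return float("-inf")  # extension covers no positives at all
    return p1 * (math.log2(p1 / (p1 + n1)) - math.log2(p0 / (p0 + n0)))
```

For instance, `foil_gain(100, 400, 30, 10)` is positive (precision rises from 0.2 to 0.75, so the conjunct is kept), while `foil_gain(50, 50, 10, 30)` is `-10.0` (precision falls, so it is rejected).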
Building a rule set
- use the sequential covering algorithm: repeatedly find the best rule that covers the current set of positive examples
- eliminate both positive and negative examples covered by the rule
Evaluation
- Similar to decision trees in properties: equally expressive, easy to interpret, comparable performance
- can handle redundant and irrelevant attributes
- variable interaction can be problematic (e.g. XOR)
- better suited for handling imbalanced classes
- handling missing values is problematic