Rule-Based Learning
Supervised learning > classification
- classify records based on a collection of
if...then
rules - \(\text{(Condition)} \rightarrow y\) where
- condition is a conjunction of tests on attributes
- \(y\) is the class label
- a rule covers an instance if the instance satisfies the condition of the rule
- rule coverage - fraction of records that satisfy the antecedent of a rule
- rule accuracy - of the records that satisfy the antecedent, the fraction that also satisfy the consequent
- rules should be
- mutually exclusive: rules are independent of each other and every record is covered by at most one rule
- exhaustive: coverage accounts for every possible combination of attribute values, and each record is covered by at least one rule
- if rules are not mutually exclusive, use: (1) an ordered rule set or (2) a voting scheme
- ordered rule set means ordering rules by rank/priority; also called decision list
- classifier chooses the rule with highest rank, or default class if no rule applies
- if rules are not exhaustive, use a default class for fall-through instances
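A minimal first-match sketch of the ideas above (ordered rule set, default class, and the coverage/accuracy measures). The vertebrate-style attributes and rules are illustrative, not from any real dataset:

```python
# Each rule: (condition, class). A condition is a dict of attribute -> value,
# read as a conjunction of equality tests. First matching rule wins.
rules = [
    ({"gives_birth": "no", "can_fly": "yes"}, "bird"),
    ({"gives_birth": "no", "lives_in_water": "yes"}, "fish"),
    ({"gives_birth": "yes"}, "mammal"),
]
DEFAULT_CLASS = "reptile"  # fall-through class for uncovered records

def covers(condition, record):
    """A rule covers a record iff the record satisfies every test."""
    return all(record.get(attr) == val for attr, val in condition.items())

def classify(record):
    for condition, label in rules:   # ordered rule set (decision list)
        if covers(condition, record):
            return label
    return DEFAULT_CLASS             # no rule applies

def coverage(condition, records):
    """Fraction of records that satisfy the antecedent."""
    return sum(covers(condition, r) for r in records) / len(records)

def accuracy(condition, label, records, labels):
    """Of the records covered by the rule, the fraction whose class matches."""
    covered = [y for r, y in zip(records, labels) if covers(condition, r)]
    return sum(y == label for y in covered) / len(covered) if covered else 0.0
```

For example, `classify({"gives_birth": "no", "can_fly": "yes"})` returns `"bird"`, and a record matching no rule falls through to `"reptile"`.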
Methods for building rule-based classifiers
- direct methods - extract rules directly from data - e.g. RIPPER, CN2, Holte's 1R
- indirect methods - extract rules from classification model (NN, decision tree, etc.) - e.g. C4.5rules
Pseudo algorithm
1. Start from an empty rule
2. Grow the rule using the Learn-One-Rule function (use FOIL's information gain for evaluation)
3. Remove training records covered by the rule
4. Repeat from step 2 until the stopping criterion is met
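A runnable sketch of these steps, assuming records are dicts of categorical attributes and conjuncts are equality tests (the function names here are my own, not from any library):

```python
import math

def foil_gain(p0, n0, p1, n1):
    """FOIL's information gain: p0/n0 positives/negatives covered before
    adding a conjunct, p1/n1 after."""
    if p1 == 0:
        return float("-inf")  # extension covers no positives
    return p1 * (math.log2(p1 / (p1 + n1)) - math.log2(p0 / (p0 + n0)))

def covers(rule, record):
    return all(record.get(a) == v for a, v in rule.items())

def learn_one_rule(records, labels, pos):
    rule = {}  # step 1: start from an empty rule (covers everything)
    while True:
        covered = [(r, y) for r, y in zip(records, labels) if covers(rule, r)]
        p0 = sum(y == pos for _, y in covered)
        n0 = len(covered) - p0
        if n0 == 0:            # rule covers no negatives: stop growing
            break
        best, best_gain = None, 0.0
        # step 2: try every attribute=value conjunct seen in covered records
        for r, _ in covered:
            for a, v in r.items():
                if a in rule:
                    continue
                cand = dict(rule, **{a: v})
                cov = [(rr, yy) for rr, yy in covered if covers(cand, rr)]
                p1 = sum(yy == pos for _, yy in cov)
                g = foil_gain(p0, n0, p1, len(cov) - p1)
                if g > best_gain:
                    best, best_gain = cand, g
        if best is None:       # no conjunct improves the gain
            break
        rule = best
    return rule

def sequential_covering(records, labels, pos):
    rules, rem = [], list(zip(records, labels))
    while any(y == pos for _, y in rem):     # step 4: until positives run out
        rule = learn_one_rule([r for r, _ in rem], [y for _, y in rem], pos)
        if not rule:                         # empty rule learned: give up
            break
        rules.append((rule, pos))
        # step 3: remove all records covered by the new rule
        rem = [(r, y) for r, y in rem if not covers(rule, r)]
    return rules
```

Real learners such as RIPPER also prune each grown rule on a validation split; that step is omitted here for brevity.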
Direct Method: RIPPER
For a 2-class problem, choose one class as positive and the other as negative
- Learn rules for the positive class
- Negative class will be the default class
For a multi-class problem
- order the classes according to increasing class prevalence (fraction of instances belonging to a particular class)
- learn the rule set for the smallest class first, treat the rest as negative class
- repeat with next smallest class as positive class
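The prevalence ordering can be computed directly; a sketch (the helper name `class_order` is made up):

```python
from collections import Counter

def class_order(labels):
    # Classes sorted by increasing prevalence: the rarest class is
    # learned first; the most frequent class ends up as the default.
    counts = Counter(labels)
    return sorted(counts, key=counts.get)
```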
Growing a rule
- start with an empty rule
- add conjuncts as long as they improve FOIL's information gain
- stop when the rule no longer covers negative examples
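FOIL's information gain for extending a rule that covers \(p_0\) positives and \(n_0\) negatives into one covering \(p_1\) positives and \(n_1\) negatives is \(p_1\left(\log_2\frac{p_1}{p_1+n_1}-\log_2\frac{p_0}{p_0+n_0}\right)\). A small sketch:

```python
import math

def foil_gain(p0, n0, p1, n1):
    # Positive when the extended rule covers a higher proportion of
    # positives (higher precision) than the rule before the conjunct.
    if p1 == 0:
        return float("-inf")  # extension covers no positives at all
    return p1 * (math.log2(p1 / (p1 + n1)) - math.log2(p0 / (p0 + n0)))
```

For instance, `foil_gain(100, 400, 30, 10)` is positive (precision rises from 0.2 to 0.75, so the conjunct is kept), while `foil_gain(50, 50, 10, 30)` is `-10.0` (precision falls, so it is rejected).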
Building a rule set
- use the sequential covering algorithm: repeatedly find the best rule that covers the current set of positive examples
- eliminate both positive and negative examples covered by the rule
Evaluation
- Similar to decision trees in properties: equally expressive, easy to interpret, comparable performance
- can handle redundant and irrelevant attributes
- variable interaction can be problematic (e.g. XOR)
- better suited for handling imbalanced classes
- handling missing values is problematic