
Rule-Based Learning

Supervised learning > classification

  • classify records based on a collection of if...then rules
  • \(\text{(Condition)} \rightarrow y\)   where
    • condition is a conjunction of tests on attributes
    • \(y\) is the class label
  • a rule covers an instance if the instance satisfies the condition of the rule
  • rule coverage - fraction of records that satisfy the antecedent of a rule
  • rule accuracy - fraction of records covered by the rule (satisfying the antecedent) whose class label matches the consequent
  • rules should be
    • mutually exclusive: rules are independent of each other and every record is covered by at most one rule
    • exhaustive: coverage accounts for every possible combination of attribute values, and each record is covered by at least one rule
  • if rules are not mutually exclusive, use: (1) an ordered rule set or (2) a voting scheme
    • ordered rule set means ordering rules by rank/priority; also called decision list
    • classifier chooses the rule with highest rank, or default class if no rule applies
  • if rules are not exhaustive use a default class for fallthrough instances

  • methods for building rule-based classifiers

    • direct methods - extract rules directly from data - e.g. RIPPER, CN2, Holte's 1R
    • indirect methods - extract rules from classification model (NN, decision tree, etc.) - e.g. C4.5rules
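The coverage and accuracy definitions above can be sketched in code. This is a minimal illustration on hypothetical toy records (the attribute names `refund`/`status` and the sample data are invented for the example); a rule is represented as a conjunction of attribute = value tests plus a class label.

```python
# Hypothetical toy records: (attribute dict, class label).
records = [
    ({"refund": "no", "status": "married"}, "no"),
    ({"refund": "no", "status": "single"}, "yes"),
    ({"refund": "yes", "status": "single"}, "no"),
    ({"refund": "no", "status": "married"}, "no"),
]

# Rule: (refund = no) AND (status = married) -> no
rule = ({"refund": "no", "status": "married"}, "no")

def covers(condition, attrs):
    """A rule covers an instance if the instance satisfies every test."""
    return all(attrs.get(a) == v for a, v in condition.items())

def coverage(rule, records):
    """Fraction of records that satisfy the rule's antecedent."""
    cond, _ = rule
    return sum(covers(cond, a) for a, _ in records) / len(records)

def accuracy(rule, records):
    """Of the records covered by the rule, the fraction whose label
    matches the consequent."""
    cond, label = rule
    matched = [(a, y) for a, y in records if covers(cond, a)]
    return sum(y == label for _, y in matched) / len(matched)
```

Here the rule covers 2 of 4 records (coverage 0.5), and both covered records carry the predicted label (accuracy 1.0).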

Pseudo algorithm

  1. Start from an empty rule
  2. Grow a rule using the Learn-One-Rule function (use FOIL's information gain for evaluation)
  3. Remove training records covered by the rule
  4. Repeat from 2 until stop criterion is met
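The evaluation metric named in step 2 can be written out directly. The standard form of FOIL's information gain compares the purity of a rule before and after a conjunct is added, weighted by the number of positives retained; this is a sketch of that formula, not any particular library's implementation.

```python
import math

def foil_gain(p0, n0, p1, n1):
    """FOIL's information gain for extending a rule.

    p0, n0: positive/negative training records covered by the rule
            before adding the new conjunct.
    p1, n1: positive/negative records covered after adding it.
    """
    if p1 == 0:
        return 0.0  # a conjunct covering no positives gains nothing
    return p1 * (math.log2(p1 / (p1 + n1)) - math.log2(p0 / (p0 + n0)))
```

For example, a conjunct that shrinks coverage from 10 positives / 10 negatives to 5 positives / 0 negatives yields a gain of 5 * (log2(1) - log2(0.5)) = 5.0, while a conjunct that leaves the positive fraction unchanged yields 0.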

Direct Method: RIPPER

For a 2-class problem, choose one class as positive and the other as negative

  • Learn rules for the positive class
  • Negative class will be the default class

For a multi-class problem

  • order the classes according to increasing class prevalence (fraction of instances belonging to a particular class)
  • learn the rule set for the smallest class first, treat the rest as negative class
  • repeat with next smallest class as positive class
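The class ordering step can be sketched in a few lines (the labels here are invented toy data): count each class's prevalence and sort ascending, so the rarest class is learned first and the most common class ends up as the default.

```python
from collections import Counter

labels = ["a", "a", "b", "c", "c", "c"]  # hypothetical training labels

counts = Counter(labels)
# Order classes by increasing prevalence: rarest class first.
order = sorted(counts, key=counts.get)
# Learn rules for order[0] first (treating the rest as negative),
# then order[1], ...; the last, most prevalent class is the default.
```

With these labels the order is ["b", "a", "c"]: rules are learned for "b", then "a", and "c" becomes the default class.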

Growing a rule

  • start with empty rule
  • add conjuncts as long as they improve FOIL's information gain
  • stop when the rule covers no negative examples
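The rule-growing loop above can be sketched as a greedy search: at each step, try every candidate attribute = value test, keep the one with the best FOIL gain, and stop once no negatives are covered (or no conjunct helps). Everything here (the dict-based instances, the candidate list) is an illustrative assumption, not RIPPER's actual data structures.

```python
import math

def gain(p0, n0, p1, n1):
    """FOIL's information gain (p/n covered before vs. after a conjunct)."""
    if p1 == 0:
        return 0.0
    return p1 * (math.log2(p1 / (p1 + n1)) - math.log2(p0 / (p0 + n0)))

def covered(rule, instance):
    """A rule (list of (attribute, value) tests) covers an instance
    if every test holds; the empty rule covers everything."""
    return all(instance[a] == v for a, v in rule)

def grow_rule(pos, neg, candidates):
    """Greedily add the test with the best FOIL gain; stop when the
    rule covers no negative examples or no test improves the gain."""
    rule = []
    while any(covered(rule, x) for x in neg):
        p0 = sum(covered(rule, x) for x in pos)
        n0 = sum(covered(rule, x) for x in neg)
        best, best_gain = None, 0.0
        for test in candidates:
            if test in rule:
                continue
            p1 = sum(covered(rule + [test], x) for x in pos)
            n1 = sum(covered(rule + [test], x) for x in neg)
            g = gain(p0, n0, p1, n1)
            if g > best_gain:
                best, best_gain = test, g
        if best is None:  # no conjunct improves the rule
            break
        rule.append(best)
    return rule
```

On a toy dataset where attribute `a` perfectly separates the classes, the loop adds the single test `("a", 1)` and stops, since no negatives remain covered.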

Building a rule set

  • use the sequential covering algorithm: find the best rule that covers the current set of positive examples
  • eliminate both positive and negative examples covered by the rule
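The two bullets above form the outer loop of sequential covering. This sketch stubs out rule learning behind a caller-supplied `learn_one_rule` function (a hypothetical interface assumed for the example, returning a rule plus its coverage predicate); the loop's job is only to accumulate rules and discard covered examples.

```python
def sequential_covering(pos, neg, learn_one_rule):
    """Sequential covering sketch: repeatedly learn the best rule for
    the remaining positives, then remove every example it covers.

    learn_one_rule(pos, neg) is assumed to return (rule, covers),
    where covers(x) -> bool tests whether the rule covers x.
    """
    rules = []
    while pos:
        rule, covers = learn_one_rule(pos, neg)
        if not any(covers(x) for x in pos):
            break  # stopping criterion: the new rule adds no coverage
        rules.append(rule)
        # eliminate both positive and negative examples covered by the rule
        pos = [x for x in pos if not covers(x)]
        neg = [x for x in neg if not covers(x)]
    return rules
```

With a trivial learner that emits an equality test on the first remaining positive, positives [1, 1, 2] produce two rules (match 1, then match 2) and the loop terminates once no positives remain.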

Evaluation

  • Similar to decision trees in properties: equally expressive, easy to interpret, comparable performance
  • can handle redundant and irrelevant attributes
  • variable interaction can be problematic (e.g. XOR)
  • better suited for handling imbalanced classes
  • handling missing values is problematic