• whitepaper

    Using Decision Trees



    This short tutorial illustrates why decision trees can offer a more practical way of capturing knowledge than coding rules in more conventional languages. Note that screen shots are taken from the earlier XpertRule KBS (Knowledge Based System) although the same principles apply to XpertRule Knowledge Builder.

    The Company’s accounts department wants to build a simple decision-making system that captures the logic by which they pass or reject claims for hotel expenses from their employees. These decisions are based on things like the Grade of the employee (Director, Senior Manager or Junior Manager) and the type of Hotel they stayed in (a quality rating of A, B, or C). They want to automate the processing of claims so that, for example, when a claim is sent in by an employee with a Grade of Senior Manager who stayed in a Hotel of type A, the system would Reject the claim.

    Step one might be to assign an expert from the accounts department to handle the project. The department gives some thought to writing the rules in COBOL or C, as the suggested syntax of IF condition AND condition THEN outcome would be quite easy to code in computer languages such as these. Another option considered is to keep the rules in a database. They could use an industry standard database, have an index of rules, sort the rules by topic and easily do “find & replace” changes. Their confidence in the expert being able to author the rules is high. Let’s imagine that this is just what the expert started to do for their expenses claims rule base.

    Here is the expert’s first attempt at hand-crafted rules. Are they ready to be coded by a programmer or put into a database? Are they good quality rules? Do you see anything wrong?

    IF Grade = Director AND Sex = Male THEN Pass
    IF Grade = Director AND Sex = Female THEN Pass
    IF Grade = Senior_Manager AND Hotel = A THEN Reject
    IF Grade = Senior_Manager AND Hotel = B AND Sex = Male THEN Pass
    IF Grade = Junior_Manager AND Hotel = A AND Sex = Female THEN Reject
    IF Grade = Senior_Manager AND Hotel = C THEN Pass
    IF Grade = Senior_Manager AND Hotel = B AND Sex = Male THEN Reject
    IF Grade = Junior_Manager AND Hotel = B THEN Reject
    IF Grade = Junior_Manager AND Hotel = B THEN Reject
    IF Grade = Junior_Manager AND Hotel = C THEN Pass
    IF Grade = Junior_Manager AND Hotel = A THEN Reject

    There are only eleven rules. How long did it take you to find the expert’s errors? It’s not too difficult given some time.
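    The same check can be automated. The following is a minimal Python sketch, not part of XpertRule, that holds the eleven rules as data and groups them by their conditions. The names RULES and find_duplicates_and_clashes are our own, chosen for illustration. Rules with identical conditions are reported either as duplicates (same outcome) or as clashes (different outcomes).

```python
from collections import defaultdict

# The eleven hand-crafted rules, in the expert's order.
# Each rule is (conditions, outcome); an attribute that is absent from
# the conditions means "any value".
RULES = [
    ({"Grade": "Director", "Sex": "Male"}, "Pass"),
    ({"Grade": "Director", "Sex": "Female"}, "Pass"),
    ({"Grade": "Senior_Manager", "Hotel": "A"}, "Reject"),
    ({"Grade": "Senior_Manager", "Hotel": "B", "Sex": "Male"}, "Pass"),
    ({"Grade": "Junior_Manager", "Hotel": "A", "Sex": "Female"}, "Reject"),
    ({"Grade": "Senior_Manager", "Hotel": "C"}, "Pass"),
    ({"Grade": "Senior_Manager", "Hotel": "B", "Sex": "Male"}, "Reject"),
    ({"Grade": "Junior_Manager", "Hotel": "B"}, "Reject"),
    ({"Grade": "Junior_Manager", "Hotel": "B"}, "Reject"),
    ({"Grade": "Junior_Manager", "Hotel": "C"}, "Pass"),
    ({"Grade": "Junior_Manager", "Hotel": "A"}, "Reject"),
]


def find_duplicates_and_clashes(rules):
    """Group rule numbers (1-based) that share exactly the same conditions."""
    by_conditions = defaultdict(list)
    for number, (conditions, outcome) in enumerate(rules, start=1):
        key = frozenset(conditions.items())
        by_conditions[key].append((number, outcome))

    duplicates, clashes = [], []
    for entries in by_conditions.values():
        if len(entries) < 2:
            continue  # these conditions appear only once
        numbers = [number for number, _ in entries]
        outcomes = {outcome for _, outcome in entries}
        if len(outcomes) > 1:
            clashes.append(numbers)     # same conditions, different outcomes
        else:
            duplicates.append(numbers)  # same conditions, same outcome
    return duplicates, clashes


duplicates, clashes = find_duplicates_and_clashes(RULES)
print("Duplicate rules:", duplicates)
print("Clashing rules:", clashes)
```

    Note that this only compares rules whose conditions are written identically. A rule that is merely a special case of another (same outcome, one extra condition) is not flagged by this sketch.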

    Let’s put these rules into a table-style format, as shown below. We can enter “decision tables” just like this into XpertRule KBS. We have put the factors (attributes) as the column headings and can “read off” our eleven rules like this, starting from line 1:

    If Grade = Director (* means just ignore whatever Hotel is) and Sex = Male then Pass.




    We have the same eleven rules, entered in the expert’s original order.

    This could be a table in a database. In a database, even with more rules, sorting and sub-sorting could have revealed that two rules (the highlighted 4 and 7) have the same attribute values but contradictory outcomes. Either one of these two rules is wrong, or there is some other factor (attribute) that the expert needs to take into account by modifying the rule base. But would such sorting or visual searching be practical in real life? A bit of expertise in software might help our expert. Using XpertRule, a few mouse-clicks would automatically generate a decision tree from our table of rules, as shown below:




    We can “read off” the actual rule set by following the tree branching from the top to each Pass/Reject outcome. If you do this you will see that our expert’s knowledge so far only really consists of the six rules shown below.

    IF Grade = Director THEN Pass
    IF Grade = Senior_Manager AND Hotel = A THEN Reject
    IF Grade = Senior_Manager AND Hotel = B THEN (Clash) Pass/Reject
    IF Grade = Senior_Manager AND Hotel = C THEN Pass
    IF Grade = Junior_Manager AND Hotel = A or B THEN Reject
    IF Grade = Junior_Manager AND Hotel = C THEN Pass

    The fifth rule, IF Grade = Junior_Manager AND Hotel = A or B THEN Reject, is just shorthand that the expert could have used anyway to reduce the number of rules in the first place. The main point is not the reduction of rules (although this invariably happens in real life) but that we have been able to validate the QUALITY of the rules. If we scale this up to the real world and have more factors (attributes) and more rules then the problem of validation becomes very real. Bringing in another expert who is better at hand-crafting rules only moves the validation problem: how do you test that expert’s ability?
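    To make the “reading off” step concrete, here is a small Python sketch that arrives at the six rules from the eleven-row table. It is not XpertRule’s tree induction: the branching order (Grade first, then Hotel) is fixed by hand, and Sex is simply pooled on the assumption that it is redundant. The names TABLE, outcomes_for and read_off_rules are our own.

```python
GRADES = ["Director", "Senior_Manager", "Junior_Manager"]
HOTELS = ["A", "B", "C"]
SEXES = ["Male", "Female"]

# Decision table rows: (Grade, Hotel, Sex, Outcome); "*" means "ignore".
TABLE = [
    ("Director", "*", "Male", "Pass"),
    ("Director", "*", "Female", "Pass"),
    ("Senior_Manager", "A", "*", "Reject"),
    ("Senior_Manager", "B", "Male", "Pass"),
    ("Junior_Manager", "A", "Female", "Reject"),
    ("Senior_Manager", "C", "*", "Pass"),
    ("Senior_Manager", "B", "Male", "Reject"),
    ("Junior_Manager", "B", "*", "Reject"),
    ("Junior_Manager", "B", "*", "Reject"),
    ("Junior_Manager", "C", "*", "Pass"),
    ("Junior_Manager", "A", "*", "Reject"),
]


def outcomes_for(grade, hotel, sex):
    """Return the outcomes of every table row that matches the case."""
    case = (grade, hotel, sex)
    return {
        row[3]
        for row in TABLE
        if all(cell in ("*", value) for cell, value in zip(row[:3], case))
    }


def label(found):
    """Name a leaf; more than one outcome at a leaf is a clash."""
    names = "/".join(sorted(found))
    return f"(Clash) {names}" if len(found) > 1 else names


def read_off_rules():
    """Branch on Grade, then on Hotel only where the outcome depends on it."""
    rules = []
    for grade in GRADES:
        hotels_by_outcome = {}
        for hotel in HOTELS:
            found = set()
            for sex in SEXES:  # assumption: Sex is pooled, i.e. treated as redundant
                found |= outcomes_for(grade, hotel, sex)
            hotels_by_outcome.setdefault(label(found), []).append(hotel)
        if len(hotels_by_outcome) == 1:
            (outcome,) = hotels_by_outcome  # Hotel makes no difference here
            rules.append(f"IF Grade = {grade} THEN {outcome}")
        else:
            for outcome, hotels in hotels_by_outcome.items():
                rules.append(
                    f"IF Grade = {grade} AND Hotel = {' or '.join(hotels)} THEN {outcome}"
                )
    return rules


rules = read_off_rules()
for rule in rules:
    print(rule)
```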

    To build a successful, high-quality knowledge base, our domain experts need some help to:

    • Identify gaps in the logic: Lists of rules don’t illustrate these at all well, but decision trees are very good at avoiding gaps in your logic.
    • Identify conflicts: The discovery of the (Clash) by automatic tree building illustrates this. Had the expert built the tree manually, this error would not have been made in the first place.
    • Identify missing factors: This overlaps the previous point. The (Clash) suggests some missing factor to our expert, who might need to make the tree branch further at this point.
    • Identify redundant factors and rules: Our expert has no sex in his rule base! The expert should not be discriminating against people like this of course, but thankfully, sex has been shown to be redundant. We will sack this expert anyway for even thinking such a thing!
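    Two of these checks, gaps and redundant factors, can be sketched in a few lines of Python by enumerating every combination of attribute values. This is an illustration under our own assumptions, not XpertRule’s method: a “gap” is taken to mean a combination that no row of the table matches, and an attribute is called redundant if changing it alone never changes the outcome among the combinations that are covered. The names ATTRIBUTES, find_gaps and is_redundant are hypothetical.

```python
from collections import defaultdict
from itertools import product

ATTRIBUTES = {
    "Grade": ["Director", "Senior_Manager", "Junior_Manager"],
    "Hotel": ["A", "B", "C"],
    "Sex": ["Male", "Female"],
}

# Decision table rows: (Grade, Hotel, Sex, Outcome); "*" means "ignore".
TABLE = [
    ("Director", "*", "Male", "Pass"),
    ("Director", "*", "Female", "Pass"),
    ("Senior_Manager", "A", "*", "Reject"),
    ("Senior_Manager", "B", "Male", "Pass"),
    ("Junior_Manager", "A", "Female", "Reject"),
    ("Senior_Manager", "C", "*", "Pass"),
    ("Senior_Manager", "B", "Male", "Reject"),
    ("Junior_Manager", "B", "*", "Reject"),
    ("Junior_Manager", "B", "*", "Reject"),
    ("Junior_Manager", "C", "*", "Pass"),
    ("Junior_Manager", "A", "*", "Reject"),
]


def outcomes_for(case):
    """Return the outcomes of every table row that matches the case."""
    return {
        row[3]
        for row in TABLE
        if all(cell in ("*", value) for cell, value in zip(row[:3], case))
    }


def find_gaps():
    """Combinations of attribute values that no rule covers."""
    return [case for case in product(*ATTRIBUTES.values()) if not outcomes_for(case)]


def is_redundant(name):
    """True if varying only this attribute never changes the outcome.

    Uncovered combinations are skipped here; find_gaps() reports them.
    """
    index = list(ATTRIBUTES).index(name)
    seen = defaultdict(set)
    for case in product(*ATTRIBUTES.values()):
        found = outcomes_for(case)
        if found:
            rest = case[:index] + case[index + 1:]  # the case without this attribute
            seen[rest].add(frozenset(found))
    return all(len(variants) == 1 for variants in seen.values())


gaps = find_gaps()
redundant = [name for name in ATTRIBUTES if is_redundant(name)]
print("Gaps:", gaps)
print("Redundant attributes:", redundant)
```

    Read strictly, the eleven rules leave one combination uncovered (a female Senior Manager in a type B hotel), because both rules for that Grade and Hotel mention Sex = Male. Exhaustive enumeration like this is only feasible for small rule bases; it is meant to show the idea, not to scale.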

    We mentioned using OR conditions before. When we start using more complex syntax like this in more conventional rule statements, such as within programming languages, these statements will get harder to follow. Individually they may be quite simple to read, but they get much harder to manually validate.

    Here is a hand-crafted decision tree that illustrates the rich variety of expressions that you can use. It’s not the simple If A = B then C logic that we used above, but it’s still just as simple to understand, simply because of the graphical structure.




    In trees like this we can use:

    • Numeric/date compare
    • Compare attributes
    • Group values
    • ‘Otherwise’ splits
    • Multiple (list) outcomes & attributes
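    For readers who think in code, the sketch below shows roughly what each kind of test in the list above looks like when written as nested conditions in Python. It is not the tree from the illustration: every attribute name, threshold and outcome here (nightly_rate, rate_limit, the 90-day limit and so on) is invented purely for this example.

```python
from datetime import date


def assess_claim(claim):
    """Hypothetical expenses tree; returns a list of outcomes."""
    # Date compare: claims submitted too long after checkout are rejected.
    if (claim["submitted"] - claim["checkout"]).days > 90:
        return ["Reject"]
    # Group values: one branch covers two grades.
    if claim["grade"] in ("Director", "Senior_Manager"):
        # Compare attributes: one attribute is tested against another.
        if claim["nightly_rate"] <= claim["rate_limit"]:
            return ["Pass"]
        # Multiple (list) outcomes: a leaf can carry more than one result.
        return ["Reject", "Notify line manager"]
    # Numeric compare: long stays are referred rather than decided.
    if claim["nights"] > 3:
        return ["Refer to accounts"]
    # 'Otherwise' split: type C passes, every other type is rejected.
    if claim["hotel"] == "C":
        return ["Pass"]
    return ["Reject"]


example = {
    "grade": "Senior_Manager",
    "hotel": "A",
    "nights": 2,
    "nightly_rate": 180.0,
    "rate_limit": 150.0,
    "checkout": date(2024, 3, 1),
    "submitted": date(2024, 3, 10),
}
result = assess_claim(example)
print(result)
```

    Even at this size the nesting takes some care to follow, which is the point made above about conventional rule statements.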

    Structuring Knowledge

    In real-world systems with hundreds or thousands of rules the expert needs to be able to break down the knowledge into manageable units. Just as you don’t want a big bag of rules, you also don’t want the mother of all decision trees. The simple task structure below illustrates a typical “backward chain”, where the attribute Hotel is now also a decision-making “task” with its own decision tree (i.e. its own rule set). In order to evaluate the Hotel attribute, the main tree would call the Hotel tree to decide on the value A, B, or C, which is then returned to the higher level decision tree so that it can continue.




    This also allows a variety of techniques for knowledge capture to be used for each individual task. The decision making tasks make the components understandable, both as individual units and to show the hierarchy of the knowledge.
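    The backward chain described above can be sketched in Python as follows. This is a minimal illustration, not how XpertRule implements tasks. The main tree asks for the value of Hotel only when it needs it; if the value is not already known, the Hotel task is run to derive it. The Stars attribute and its thresholds are hypothetical, as are the names hotel_task, claim_task and value_of.

```python
def hotel_task(facts):
    """Sub-tree that derives the Hotel rating (hypothetical thresholds)."""
    stars = facts["Stars"]
    if stars >= 4:
        return "A"
    if stars == 3:
        return "B"
    return "C"


# Attributes that are themselves decision-making tasks.
TASKS = {"Hotel": hotel_task}


def value_of(attribute, facts):
    """Backward chain: if a value is unknown, run the task that derives it."""
    if attribute not in facts:
        task = TASKS.get(attribute)
        if task is None:
            raise LookupError(f"No value and no task for {attribute}")
        facts[attribute] = task(facts)  # remember the derived value
    return facts[attribute]


def claim_task(facts):
    """Main tree: the six rules read off earlier in this tutorial."""
    grade = value_of("Grade", facts)
    if grade == "Director":
        return "Pass"  # the Hotel task is never called on this branch
    hotel = value_of("Hotel", facts)
    if grade == "Senior_Manager":
        return {"A": "Reject", "B": "Clash (unresolved)", "C": "Pass"}[hotel]
    return "Pass" if hotel == "C" else "Reject"


junior = {"Grade": "Junior_Manager", "Stars": 2}
director = {"Grade": "Director"}
print(claim_task(junior), junior["Hotel"])
print(claim_task(director))
```

    Because the Hotel task is only evaluated on demand, a Director’s claim is passed without the hotel ever being rated.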

    A task map enables you to review and navigate around your application, as shown by this illustration of an application for corporate loans. You can simply zoom in and out of the tasks.