NEEMS Lecture: 3. Brief Introduction to Decision Trees

In the previous section we prepared the NEEMS data for training. Here we will look at decision trees: how they are built and how to read them.

The statistical model of choice is a decision tree. Such models have the advantage of being visually inspectable and comprehensible, and are comparatively easy to understand. Based on the information about the type and parent of a task, the decision tree model predicts the most probable next action. In this section, a minimal set of example data is used to train a decision tree. It should give a first grasp of what we are about to do with our NEEMS data later.

The first part of this section showcases what we have done so far on a simple example.

Chapter 3.2 Preview of a Trained Decision Tree shows an example decision tree trained on the example data. It represents the decision about where to put an object. Each node of the tree contains 4 to 5 entries. To elaborate on the entries' meaning, we focus on the root node.

object_milk <= 0.5 names the split condition. Since all our features have been prepared with one-hot encoding, their value will always be either 0 or 1; in more intricate models, features may take continuous floating-point values in between. Whether the object at hand is milk is thus represented as a numerical value 0.0 or 1.0, where 0.0 makes the condition true (the object is not milk) and 1.0 makes it false (the object is milk). Depending on the outcome, the decision tree branches down to the next node: to the left child when the condition is true (not milk), and to the right child when it is false (milk). If the object is not milk (true), the tree descends to the left child node and repeats the same decision process as in the root node, now evaluating whether object_cereal <= 0.5 is true. This process continues until a leaf node is reached.
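The branching described above can be sketched as a minimal tree traversal. The following is a hypothetical, simplified representation (the two conditions are taken from the figure; the deeper splits are omitted and the left subtree's leaves use the root's majority class, cupboard, purely for illustration):

```python
# Hypothetical, simplified tree: inner nodes hold a feature name, a threshold,
# and a branch per outcome; leaf nodes hold the predicted class.
tree = {
    "feature": "object_milk", "threshold": 0.5,
    "true": {  # condition true: the object is not milk -> check the next feature
        "feature": "object_cereal", "threshold": 0.5,
        "true":  {"leaf": "cupboard"},   # further splits omitted for brevity
        "false": {"leaf": "cupboard"},
    },
    "false": {"leaf": "fridge"},         # condition false: the object is milk
}

def predict(node, sample):
    """Walk down the tree, evaluating one condition per node, until a leaf."""
    while "leaf" not in node:
        branch = "true" if sample[node["feature"]] <= node["threshold"] else "false"
        node = node[branch]
    return node["leaf"]

print(predict(tree, {"object_milk": 1.0, "object_cereal": 0.0}))  # fridge
print(predict(tree, {"object_milk": 0.0, "object_cereal": 1.0}))  # cupboard
```

A trained tree simply encodes many such threshold checks; prediction is a walk from the root to one leaf.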

gini = 0.719 gives the Gini impurity value, which is calculated and explained in Section 3.4.

samples = 8 tells the number of samples available at this node. In the root, we have all 8 entries of our example_data at our service.

value = [3, 2, 1, 2] tells the number of samples per goal location, ordered as in the table above: [cupboard, dishwasher, drawer, fridge]. The further down we go along the tree, the more decisions have been made, and the fewer possibilities for a decision are left. So if the first node decides that the object is not milk, the predicted goal location can never be fridge, since no object other than milk goes to the fridge, and therefore the last entry of this array is going to be 0 for all nodes below that branch.

class = cupboard is the output decision of the model. In this example the model tries to find a place to put an object. If the object is milk (right branch from the root node), the decision is always to put it in the fridge. For any node other than a leaf, the class represents the most likely place to put the object up to this point of decision-making.

3.3 Classification and Regression Trees / 3.4 Gini Impurity

Gini impurity, just like the entropy of a model, measures the likelihood that a new sample would be incorrectly labeled if it were labeled randomly according to the distribution of the labels. The higher the Gini impurity, the less likely a correct labeling of a new sample is.

The calculation of the Gini Impurity is shown in the lecture, as well as its implementation. For our example_data this value is at 0.71875, rounded to 0.719 as shown in the decision tree above. We will now find out how to use this calculation for our purposes.
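The Gini impurity of a node can be computed directly from its value array as one minus the sum of the squared class proportions. A minimal sketch, reproducing the root node's value of 0.71875:

```python
def gini_impurity(counts):
    """Gini impurity of a node, given the per-class sample counts."""
    total = sum(counts)
    return 1.0 - sum((c / total) ** 2 for c in counts)

# The root node's value array [cupboard, dishwasher, drawer, fridge]:
print(gini_impurity([3, 2, 1, 2]))  # 0.71875, rounded to 0.719 in the tree
print(gini_impurity([8]))           # 0.0: a pure node has zero impurity
```

A node holding samples of only one class has impurity 0, which is exactly the situation in a good leaf.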

3.5 Cost Function / 3.6. Picking a Threshold / 3.7 Determining the Root node

The goal now is to calculate the impurity depending on which feature of the example_data is chosen as the root node of the decision tree. For this purpose, a cost function is implemented that considers one feature (e.g. 'object_milk') and a threshold that splits the data.

This cost function is applied to every feature, and the features are sorted in ascending order of their cost value. Determining the root node is essential for optimizing the model: a feature with especially high influence on the outcome is considered to carry a lot of information, which makes it reasonable to place it closer to the root of the decision tree.

In the next section we will talk about some more machine learning theory.

ease/machinelearning/decision_trees.1592823358.txt.gz · Last modified: 2020/06/22 10:55 by s_fuyedc
