====== Machine Learning through NEEMS ======

This lecture, held by Sebastian Koralewski, teaches about narrative enabled episode memories (NEEMS).

Nowadays machine learning is applicable to a large variety of cases. This lecture shows how to use recordings of activities performed by a robot in a kitchen environment to predict a likely course of action, based on the observed data and its probability of success, e.g. setting a table for breakfast, or cleaning it up afterwards.
  
===== Setup =====

The platform of choice is Jupyter Notebook, a handy framework that is ready to launch in any Python environment. Within the Jupyter Notebook environment, there are explanations on how to solve the tasks at hand. Most of the implementation utilizes the //Pandas// and //sklearn// packages, which make analyzing and learning from the data easy.

Some example data is provided with this tutorial in CSV format. The files contain records of robot activities, the so-called NEEMS.
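Such CSV files can be loaded directly with //Pandas//. The following is a minimal, self-contained sketch; the column names used here are illustrative stand-ins, since the real header constants live in header_names.py:

<code python>
import io

import pandas as pd

# Miniature stand-in for one of the provided CSV files; the column names
# are placeholders, not the ones defined in header_names.py.
csv_data = io.StringIO(
    "id,type,parent,previous,next,duration\n"
    "1,MovingToLocation,PickingUpAnObject,,LookingForSomething,2.4\n"
    "2,LookingForSomething,PickingUpAnObject,MovingToLocation,,1.1\n"
)

narratives = pd.read_csv(csv_data)  # in the notebook: pd.read_csv on the provided file
print(narratives.head())            # first rows of the recorded actions
</code>

Note that empty fields (e.g. a missing //previous// or //next// action) are read as NaN, which is exactly what section 2.1 deals with later.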
  
Please follow the [[https://github.com/code-iai/ease-fall-school-day-3-exercises|link to the Jupyter Notebook]] and open the README.md. It contains all relevant information on how to set up and start the notebook on your machine. A working Linux, macOS or Windows machine with an internet connection is required.

When the setup is finished, your default browser should open the Notebook. In the browser, the filesystem of the Notebook is shown. Click on Exercises.ipynb to get to the lecture's content. The following figure shows the toolbar of the Notebook.

{{ :ease:navbar.png |}}

From left to right:
  *Save the notebook with all your changes to Exercise.ipynb on your machine.
  *Inserting a cell block won't be necessary for this lecture, since all necessary blocks are already included.
  *Cut, copy and paste can be done either with these buttons or via the usual CTRL shortcuts.
  *Moving cells up and down isn't necessary either and could potentially break the logical structure of the lecture.
  *Running a code block can be done either with this button or with //CTRL-Enter//.
  *Interrupting the kernel is useful when a code block takes too long to execute. You can change your code and re-run the block again. Remember: training a model might take some time, depending on your machine's capacity.
  *Restarting the kernel resets all local variables and definitions from executing the code blocks.
  *Restart & run all code blocks can be helpful when picking up this lecture at a later point, when you have already implemented some of the functionality and don't want to execute every block one by one.
  *Markdown is the default language of choice for writing plain text. You don't need to change this.
  *The full palette of possible commands can be opened with the right-most button. Feel free to investigate further possibilities, but it won't be necessary for completing this lecture.

Hitting the TAB key while coding can be helpful for auto-completion.

Remember to always execute all code blocks from top to bottom, either one by one or via the //restart kernel & re-run// button. Using the latter won't cause much trouble, since later code blocks are simply missing some functionality, which can be added as you progress through the Notebook.
  
===== NEEMS Lecture =====

This lecture is divided into six consecutive sections. Since the Jupyter Notebook is a Python program itself, it is important to do each section in the given order. If you spread the lecture over several sessions, remember to execute the code from previous sections before continuing your work.

  *[[ease:machinelearning:visualizing_the_data|1. Data Analysis - Visualizing the Data]]
  *[[ease:machinelearning:classifier_evaluation|6. Evaluate the Next Action Classifier]]
  
=== Visualizing the Data ===

In the first section, we analyze a log from an actual robot performance. Every action the robot has performed within a chain of tasks is recorded with the help of the knowledge base framework [[http://knowrob.org/|KnowRob]]. The output has already been prepared as a CSV file and can be loaded in this Python context. The data is analyzed with the [[https://pandas.pydata.org/|Pandas]] package and visualized with matplotlib.

{{ :ease:neems_piechart.png |}}

The goal of this whole tutorial is to predict the next robot action, based on the likelihood of one action following another. Every entry in the knowledge log is one recorded action; it has an entry for its duration (time to finish), the hierarchical parent action, as well as the sequentially previous and next action. The parent action can be understood as a higher-order purpose comprising multiple smaller activities, e.g. picking up an object includes moving to the object, perceiving it, and finally picking it up. The slots startTime and endTime are not set to anything meaningful and are independent of the duration slot.

Feel free to look into the narratives' data object. Also, find the constant strings of the table column headers in the file header_names.py, which has been imported in this section's code cells.

==== Solutions ====

When the exercise is opened freshly, remember to execute all the code snippets again, such that the variables and functions are defined and usable later in the lecture. Hit the button in the header of the lecture to execute all code blocks at once. If you get stuck somewhere in the lecture, feel free to ask the tutor or consult the following solutions.

== 1. Data Analysis ==

<code python>#TODO Create a piechart for label 'Next'
narratives[header_names.NEXT].value_counts().plot.pie(figsize=(10,10),autopct='%1.1f%%')
</code>

== 2.1 Filling Empty Cells ==
We are working on the //narratives// variable here. If you want to check your implementation, put the following code below the implementation of the function and execute it.
<code python># Print modified data
fill_empty_cells(narratives)
# Check if the code is working
fill_empty_cells(narratives).isna().any()
# If all entries say 'False', the function works.</code>

This function takes care of null entries in the data and replaces them with predefined values.
<code python>
# Solution
def fill_empty_cells(data):
    filled_data = data.copy()

    filled_data[header_names.PARENT] = filled_data[header_names.PARENT].fillna('NoParent')
    #TODO Fill the rest of the remaining empty cells
    filled_data[header_names.NEXT] = filled_data[header_names.NEXT].fillna('NoNext')
    filled_data[header_names.PREVIOUS] = filled_data[header_names.PREVIOUS].fillna('NoPrevious')
    return filled_data
</code>
== 2.2 Transform Categorical Values to Numeric Values ==
One-hot encoding transforms our data into values of 0 or 1, which makes it easier to work with. When you print out the function's output on the narratives data, scroll to the right to see that the table has expanded.
<code python>
def transform_categorial_to_one_hot_encoded(data):
    encoded_data = data.copy()

    encoded_parent_data = pd.get_dummies(encoded_data[header_names.PARENT], prefix='parent')
    encoded_data = pd.concat([encoded_data, encoded_parent_data], axis=1)

    #TODO Transform the rest of the categorial features into one hot encoded features
    #Hint: NEXT must not be encoded
    encoded_type_data = pd.get_dummies(encoded_data[header_names.TYPE], prefix='type')
    encoded_previous_data = pd.get_dummies(encoded_data[header_names.PREVIOUS], prefix='previous')
    encoded_data = pd.concat([encoded_data,
                              encoded_type_data,
                              encoded_previous_data], axis=1)

    return encoded_data
</code>
== 2.3 Data Cleaning ==
For predicting which action follows another, we don't need any of the initial columns, only the ones generated by one-hot encoding. //Remember to also remove the ID column!//
<code python>
def clean(data):
    cleaned_data = data.copy()

    #TODO Decide which columns are not required to be able to predict the next robot action
    #Hint: The NEXT column IS required.

    cols = [header_names.PARENT,
            header_names.TYPE,
            header_names.START_TIME,
            header_names.END_TIME,
            header_names.DURATION,
            header_names.PREVIOUS,
            header_names.ID]

    for col in cols:
        # axis=1 drops columns; the positional form drop(col, 1) was removed in pandas 2.0
        cleaned_data = cleaned_data.drop(col, axis=1)

    return cleaned_data
</code>
== 2.4 Data Preparation Pipeline ==
Simply apply the three functions above to the //narratives// data.
<code python>
def prepare_data(data):
    prepared_data = data.copy()

    prepared_data = fill_empty_cells(prepared_data)
    #TODO apply all preparation methods on prepare_data
    prepared_data = transform_categorial_to_one_hot_encoded(prepared_data)
    prepared_data = clean(prepared_data)

    return prepared_data
</code>
== 2.5 Prepared Data Evaluation ==

<code python>
#TODO store the prepared narratives in a prepared_narratives variable and evaluate them by printing them
prepared_narratives = prepare_data(narratives)
prepared_narratives
</code>
<code python>
#TODO verify that the prepared narratives do not have any empty cells
prepared_narratives.isna().any()
</code>
== 3. Brief Introduction to Decision Trees ==

The statistical model of choice is a decision tree. Such models have the advantage of being visually inspectable and comprehensible, and are comparatively easy to understand. Based on the information about the //type// and //parent// of a task, the decision tree model predicts the most probable //next// action. In this section, a minimal set of example data is used to train a decision tree. It should give a first grasp, without our NEEMS data, of what we are about to do later.

The first part of this section showcases what we have done so far on a simple example.

The chapter **3.2 Preview of a Trained Decision Tree** shows an example decision tree based on the example data. It resembles the decision about where to put an object. Each node of the tree holds 4 to 5 entries. To elaborate on the entries' meaning, we focus on the root node.

{{ :ease:example_tree.png |}}

//object_milk %%<=%% 0.5// names a condition. Whether the object at hand is milk is represented as a numerical value between 0.0 and 1.0, where a value below or equal to 0.5 means the object is **not** milk, and a value above 0.5 means it **is** milk. Depending on whether the condition holds, the decision tree branches down to the next node: to the left (true) when the object is probably **not** milk, and to the right (false) when it **is** milk. If it is not milk (true), the tree goes down to the left child node and repeats the same decision process as in the root node, now investigating whether the condition //object_cereal %%<=%% 0.5// is true. This process goes on until a leaf node is reached.

//gini = 0.719// gives the Gini impurity value, which is calculated and explained in section 3.4.

//samples = 8// tells the number of samples considered at this node. In the root, we have all 8 entries of our example_data at our service.

//value = [3, 2, 1, 2]// tells the counts of the decidable goal locations over all objects, ordered as in the table above: [cupboard, dishwasher, drawer, fridge]. The further down we go along the tree, the more decisions have been made, and the fewer possibilities for a decision are left. So if the first node decides that the object is **not** milk, the predicted goal location will never be //fridge//, since no object other than milk goes into the fridge, and therefore the last entry of this array is going to be 0 for all nodes further down that branch.

//class = cupboard// is the output decision of the model. In this example the model tries to find a place to put an object. If the object is milk (like in the root node), the decision is always to put it in the fridge (branch right from the root node). For any node other than a leaf, the class represents the most likely place to put the object up until this point of decision-making.

**3.3 Classification and Regression Trees / 3.4 Gini Impurity**

Gini impurity, just like the //entropy// of a model, is a measure of the likelihood that a new random variable would be incorrectly labeled if it were labeled randomly according to the distribution of the labels. The higher the Gini impurity, the less likely the successful labeling of a new variable is.

The calculation of the Gini impurity is shown in the lecture, as well as its implementation. For our example_data this value is 0.71875, rounded to 0.719 as shown in the decision tree above. We will now find out how to use this calculation for our purposes.
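As a hedged sketch (not the lecture's own implementation), the Gini impurity of a label distribution can be computed as 1 minus the sum of the squared class probabilities. The label counts [3, 2, 1, 2] below are the root node's //value// list from above:

<code python>
from collections import Counter

def gini_impurity(labels):
    """1 - sum(p_i^2): probability of mislabeling a randomly drawn sample."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())

# Goal locations of the 8 example objects: [cupboard, dishwasher, drawer, fridge]
labels = ['cupboard'] * 3 + ['dishwasher'] * 2 + ['drawer'] + ['fridge'] * 2
print(gini_impurity(labels))  # 0.71875, rounded to 0.719 in the tree above
</code>

This reproduces the 0.71875 stated above: 1 - ((3/8)² + (2/8)² + (1/8)² + (2/8)²) = 1 - 18/64.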

**3.5 Cost Function / 3.6 Picking a Threshold / 3.7 Determining the Root Node**

The goal now is to calculate the impurity depending on which feature of the example_data is chosen as the root node of the decision tree. For this purpose a cost function is implemented, considering one feature (e.g. 'object_milk') and a threshold separating true from false samples.

This cost function is applied to every feature, and the features are sorted in ascending order by their respective cost value. Determining the root node is essential for optimizing the model. A feature with especially high influence on the model is considered to gain a lot of information, which makes it reasonable to place it closer to the root of the decision tree.
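A minimal sketch of such a cost function, assuming (as is common for CART-style trees, not necessarily the lecture's exact code) the cost of a split is the sample-weighted Gini impurity of its two branches:

<code python>
from collections import Counter

def gini_impurity(labels):
    """1 - sum(p_i^2) over the class probabilities in `labels`."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def split_cost(feature_values, labels, threshold=0.5):
    """Weighted Gini impurity after splitting on `feature <= threshold`."""
    left = [l for f, l in zip(feature_values, labels) if f <= threshold]
    right = [l for f, l in zip(feature_values, labels) if f > threshold]
    n = len(labels)
    return (len(left) / n) * gini_impurity(left) + \
           (len(right) / n) * gini_impurity(right)

# Illustrative data: the two milk objects both go to the fridge, so splitting
# on 'object_milk' yields a pure right branch and a comparatively cheap split.
object_milk = [0, 0, 0, 0, 0, 0, 1, 1]
locations = ['cupboard'] * 3 + ['dishwasher'] * 2 + ['drawer'] + ['fridge'] * 2
print(split_cost(object_milk, locations))  # ~0.458, lower than the root's 0.719
</code>

The feature with the lowest cost over the best threshold is the candidate for the root node.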

**4. Additional Machine Learning Theory**

Cross-validation is a training technique where the training set and the test set are interchanged several times, to potentially exclude valleys of falsely learned feature influence.
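As a hedged illustration on sklearn's bundled iris data (a stand-in, not our NEEMS narratives), //cross_val_score// performs exactly this rotation of train and test folds:

<code python>
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)  # stand-in data, not the NEEMS narratives
scores = cross_val_score(DecisionTreeClassifier(max_depth=3, random_state=0),
                         X, y, cv=5)  # 5 rotations of train/test folds
print(scores.mean())  # average accuracy over the five splits
</code>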

Confusion matrices illustrate how well the model labels new data correctly and incorrectly. The given table takes the diagnosis of a disease as an example.

Accuracy, precision, and recall are measurements of the quality of a model, just like confusion matrices.
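These measures derive directly from the four cells of a binary confusion matrix. The counts below are made up for illustration of the disease-diagnosis example:

<code python>
# Hypothetical confusion-matrix cells for a disease test:
# true positives, false positives, false negatives, true negatives
tp, fp, fn, tn = 30, 10, 5, 55

accuracy  = (tp + tn) / (tp + fp + fn + tn)  # fraction labeled correctly
precision = tp / (tp + fp)                   # how trustworthy a positive label is
recall    = tp / (tp + fn)                   # how many true cases were found
f1        = 2 * precision * recall / (precision + recall)

print(accuracy, precision, recall, round(f1, 3))
</code>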

**5. Train the Next Action Classifier**

First, the prepared_narratives created earlier in this lecture are split into a train_set and a test_set. Then the train_set is split into features and labels, where the features contain the previous and parent actions for each action, and the labels contain the data about which action comes after another (with the //next_// prefix).

Since all the functionality is already implemented, the test_set can be prepared in the same way as the train_set.
<code python>
#TODO Split the test set into features and labels. Their variables should have the prefix test_set
test_set_features, test_set_features_cols, test_set_labels, test_set_labels_cols \
    = split_data_in_features_and_labels(test_set)
test_set_labels.head()
</code>

The parameters we speak of are within the two //range// constructions, namely the 9 for max_depth and the 21 for max_leaf_nodes. Finding parameters that train the model to an F1 value above 0.9 can be done programmatically by increasing both values, since more depth and more leaf nodes result in a more precise model. A maximum depth of 14 and 26 leaf nodes are the lowest values that reach a score just above 0.9.
<code python>
# Upper bounds of range() are exclusive, so these ranges include the
# max_depth of 14 and the 26 leaf nodes mentioned above
parameters = {'max_depth': range(1, 15, 1), 'max_leaf_nodes': range(2, 27, 1)}
</code>
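The search over this parameter grid is presumably done with something like sklearn's //GridSearchCV//. A self-contained sketch on stand-in data (the iris dataset and the scoring name are assumptions, not the notebook's exact code):

<code python>
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)  # stand-in for the prepared narratives
parameters = {'max_depth': range(1, 15, 1), 'max_leaf_nodes': range(2, 27, 1)}

# Tries every combination in the grid with 3-fold cross-validation
search = GridSearchCV(DecisionTreeClassifier(random_state=0), parameters,
                      scoring='f1_macro', cv=3)
search.fit(X, y)
print(search.best_params_)  # the combination that reached the best score
</code>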
Now if the tree model is exported, a .dot file should be generated in the //data// directory, where this exercise is downloaded to. By executing the shell command below the export, a PNG image is generated.
<code bash>
# cd to the data directory, then execute this
dot -Tpng tree.dot -o tree.png
</code>
Open the PNG file to see the decision tree of the model we just trained. With the parameters used above, the generated decision tree looks like this:

{{ :ease:tree.png |}}

Zooming in on the root node, we can see the first decisions being made. The root contains the decision that gains the most information. First the //type// of the current action is determined, then the higher-order //parent// action. Both pieces of information finally lead to a concluded //next// action in one of the leaf nodes.

{{ :ease:tree_root.png |}}

Depending on the highest entry in the //value// list of a node, the most likely //next// action is determined, which is //NoAction// in the root node, since this is the best prediction so far, without having made any decisions.

1.) Say the type of action we currently try to predict something for is //MovingToLocation// (value higher than 0.5). From the root node we go down the right branch (false), since //type_MovingToLocation %%<=%% 0.5// is false.\\
2a.) The next decision is based on whether the //parent// is //MovingToOperate//, and let's say it is **not** (value **below** 0.5). Also notice that the //class// entry of the node has changed to //LookingForSomething//, since this is the most likely //next// action so far. When //parent_MovingToOperate %%<=%% 0.5// evaluates to true, the branching goes on to the left, finally hitting a leaf node.\\
3a.) The model predicts that when the type of action is //MovingToLocation// and its parent is **not** //MovingToOperate//, the most likely //next// action to follow is //LookingForSomething//, since this is the only entry in the //value// list.

2b.) In a different scenario, //parent_MovingToOperate// is **above** 0.5, meaning the condition evaluates to false. We now hit a different leaf node to the right.\\
3b.) Even though the //value// list has multiple positive entries, the highest value belongs to //ClosingADrawer//, which would be the prediction in a scenario where //type// is //MovingToLocation// and //parent// is //MovingToOperate//. You can see that this prediction is not very accurate, since there is at least one other high entry in the //value// list, representing the //OpeningADrawer// action.

If you are interested in specific entries of the //value// list, there will be an illustration in the next chapter, where the values are represented in a confusion matrix. The labels of that confusion matrix are in the same order as the //value// list, such that you can investigate which other classes could have been predicted aside from the highest value.

** 6. Evaluate the Next Action Classifier **

Now it comes to evaluating what the tree model is capable of. The purpose of this model is to predict which action is most likely to happen, depending on the previously performed action. Simply execute the code blocks to see the outcome.

The first block shows a table with information about precision, recall, F1-score, and support of the model. Read more about these terms further above in the lesson.

The second code block is much more interesting. It generates a confusion matrix, showing for each action how often it was predicted successfully. In an optimal model, this matrix would only show entries on a diagonal line from top-left to bottom-right. If the confusion matrix is visualized without labels on the left and bottom side, check the code at the beginning and compare it with the solutions provided in this tutorial. Having removed the //NEXT// column is the most probable mistake.

{{ :ease:generated_cof_matrix.png |}}

Along the rows we see the true classes, i.e. what is expected to be predicted. In the columns are the classes predicted by our decision tree model. Notice that the most prominent and also correct prediction is //NoNext//, whereas the other classes line up pretty well along the optimal diagonal line. Some predictions are either falsely classified or default to //NoNext//.
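A hedged sketch of how such a matrix can be produced with sklearn's //confusion_matrix//; the true and predicted actions below are placeholders, not the model's real output:

<code python>
from sklearn.metrics import confusion_matrix

# Placeholder true/predicted next actions, mimicking the plot above
y_true = ['NoNext', 'MovingToLocation', 'NoNext', 'LookingForSomething']
y_pred = ['NoNext', 'NoNext',           'NoNext', 'LookingForSomething']
labels = ['LookingForSomething', 'MovingToLocation', 'NoNext']

# Rows are true classes, columns are predictions, both ordered by `labels`
cm = confusion_matrix(y_true, y_pred, labels=labels)
print(cm)  # the MovingToLocation sample defaulted to NoNext: cm[1][2] == 1
</code>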

There seems to be some confusion especially between //AcquireGraspOfSomething// and //MovingToLocation//. Print out the narratives for //MovingToLocation// and compare them with the other table.
<code python>
other_narratives = narratives[(narratives.next == 'MovingToLocation')]
other_narratives[[header_names.PARENT, header_names.PREVIOUS, header_names.TYPE]]
</code>
Apparently both actions tend to have the same parent and previous actions, as well as the same type. Wanting to perform a //PickingUpAction// right after finishing //LookingForSomething//, which is a //VisualPerception//, can mean that the robot will now either perform //MovingToLocation// to relocate itself in order to better perceive the object, or it has already perceived the object in the previous action and can now //AcquireGraspOfSomething//.
ease/machinelearning.1592818652.txt.gz · Last modified: 2020/06/22 09:37 by s_fuyedc
