While preparing for my matriculation at Northwestern, I decided to get my hands dirty in some more data analytics. Upon delving deeper into the myriad of techniques, I soon stumbled upon a not-yet industry standard of Data Science/Analytic workflow.
I decided to learn Python and was recently admitted into Northwestern's Predictive Analytics Graduate program. So I figured, meh! Why not learn both at the same time? So my first foray into predictive analytics had to do with a supervised classification model called a decision tree. What is a decision tree? In essence it's a predictive algorithm that just so happens to be (when drawn out / visualized), well... a tree. I first encountered decision trees in the book published by O'Reilly called, Data Science for Business. I used most of Joe McCarthy's primer as the guide to my programming exercises and modified it a bit to better suit my nuances in programming style. It was the first predictive model they described and one of the more interesting ones in my opinion because of its relative simplicity. They cited using a data set of mushrooms samples, courtesy of UCI. The aim of the tree was to predict whether any additional samples based on its attributes, was either poisonous or edible. Which leads me to the question; how did they manage to find out whether the samples from the original data set was poisonous or edible? The sample data set can be found here: Mushrooms.
Attention to detail? Nah, attention to the whole picture.