## Introduction

ID3 is a classification algorithm which, for a given set of attributes and class labels, generates the model/decision tree that categorizes a given input to a specific class label *Ck* from [*C1, C2, …, Ck*]. The algorithm follows a greedy approach by selecting the best attribute at each step: the one that yields maximum information gain (*IG*) or, equivalently, minimum entropy (*H*). The algorithm then splits the data-set (*S*) recursively on the remaining unused attributes until it reaches the stop criteria (no further attributes to split on). The non-terminal nodes in the decision tree represent the selected attributes upon which the splits occur, and the terminal nodes represent the class labels.

## Implementation

In this blog you can find a step-by-step implementation of the ID3 algorithm. The functions used in the implementation are also discussed.

#### Software Used

- Python 2.7
- Spyder IDE

Major steps involved in the implementation are,

- Entropy Calculation
- Attribute Selection
- Split the data-set
- Build Decision Tree

#### Step 1 : Entropy Calculation

The method takes the given data-set as an argument and calculates its entropy:

*H(S) = – Σ p(x) log2 p(x)*, summed over every class *x* in *X*

Where,

*H* –> entropy,

*S* –> data-set,

*X* –> set of class labels,

*p(x)* –> ratio of the number of elements in class *x* to the number of elements in the entire data-set *S*

Information gain is the measure of the difference in entropy before and after the data-set split:

*IG(S, T) = H(S) – Σ p(t) H(t)*, summed over every subset *t* in *T*

Where,

*H(S)* –> entropy of the data-set before the split,

*T* –> set of subsets created from the data-set split,

*p(t)* –> ratio of the number of elements in subset *t* to the number of elements in the entire data-set *S*,

*H(t)* –> entropy of subset *t*
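To make the two formulas concrete, here is a small worked example; the four records and the "wind" attribute below are invented purely for illustration:

```python
from math import log

# Entropy of a list of class labels: H = -sum(p(x) * log2 p(x))
def entropy(labels):
    total = float(len(labels))
    return -sum((labels.count(x) / total) * log(labels.count(x) / total, 2)
                for x in set(labels))

# Four records of the form (wind, play) -- invented for illustration
records = [('weak', 'yes'), ('weak', 'yes'), ('weak', 'no'), ('strong', 'no')]
class_labels = [c for _, c in records]

# Entropy before the split: 2 yes / 2 no gives H(S) = 1.0
base = entropy(class_labels)

# Split on wind, then subtract the weighted sum of subset entropies
gain = base
for value in set(w for w, _ in records):
    subset = [c for w, c in records if w == value]
    gain -= (len(subset) / float(len(records))) * entropy(subset)

print(base)            # 1.0
print(round(gain, 3))  # 0.311
```

The "strong" subset is pure (entropy 0), so almost all of the remaining uncertainty comes from the mixed "weak" subset.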

```python
from math import log

# Entropy calculation method
def calc_entropy(data):
    # Calculate the length of the data-set
    entries = len(data)
    labels = {}
    # Read the class labels from the data-set into the dict object "labels"
    for rec in data:
        label = rec[-1]
        if label not in labels.keys():
            labels[label] = 0
        labels[label] += 1
    # entropy variable is initialized to zero
    entropy = 0.0
    # For every class label (x) calculate the probability p(x)
    for key in labels:
        prob = float(labels[key]) / entries
        # Entropy formula calculation
        entropy -= prob * log(prob, 2)
    # print "Entropy -- ", entropy
    # Return the entropy of the data-set
    return entropy
```
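As a quick sanity check, an evenly split two-class data-set should give the maximum entropy of 1.0 and a pure data-set should give 0.0. The function is repeated here, and the toy records invented, so the snippet runs on its own:

```python
from math import log

# calc_entropy as defined above (the class label is the last field of each record)
def calc_entropy(data):
    entries = len(data)
    labels = {}
    for rec in data:
        labels[rec[-1]] = labels.get(rec[-1], 0) + 1
    entropy = 0.0
    for key in labels:
        prob = float(labels[key]) / entries
        entropy -= prob * log(prob, 2)
    return entropy

# Toy records, invented for illustration
mixed = [['sunny', 'yes'], ['rainy', 'no']]   # 50/50 class split
pure = [['sunny', 'yes'], ['rainy', 'yes']]   # single class

print(calc_entropy(mixed))  # 1.0
print(calc_entropy(pure))   # 0.0
```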

#### Step 2: Attribute Selection

We have to determine the attribute based on which the data-set *(S)* is to be split. The attribute-selection method returns the best attribute with the maximum information gain *IG(S)* which is used in building the decision tree.

```python
# Function to determine the best attribute for the split criteria
def attribute_selection(data):
    # Get the number of features available in the given data-set
    features = len(data[0]) - 1
    # Function call to calculate the base entropy (entropy of the entire data-set)
    baseEntropy = calc_entropy(data)
    # Initialize the info-gain variable to zero
    max_InfoGain = 0.0
    bestAttr = -1
    # Iterate through the features identified
    for i in range(features):
        # Store the values of the feature in a variable
        AttrList = [rec[i] for rec in data]
        # Get the unique values from the feature values
        uniqueVals = set(AttrList)
        # Initialize the attribute entropy to zero
        attrEntropy = 0.0
        # Iterate through the list of unique values and perform the split
        for value in uniqueVals:
            # Function call to split the data-set
            newData = dataset_split(data, i, value)
            # Probability calculation
            prob = len(newData) / float(len(data))
            # Weighted entropy calculation for the attribute
            attrEntropy += prob * calc_entropy(newData)
        # Calculation of information gain
        infoGain = baseEntropy - attrEntropy
        # Identify the attribute with max info-gain
        if infoGain > max_InfoGain:
            max_InfoGain = infoGain
            bestAttr = i
    # Return the attribute identified
    return bestAttr
```
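A simple way to check the selection logic is a toy data-set in which one attribute perfectly separates the classes and the other carries no information; compact stand-ins for `calc_entropy` and `dataset_split` are repeated here (and the records invented) so the snippet runs standalone:

```python
from math import log

# Compact stand-ins for the helper functions defined in this post
def calc_entropy(data):
    total = float(len(data))
    counts = {}
    for rec in data:
        counts[rec[-1]] = counts.get(rec[-1], 0) + 1
    return -sum((c / total) * log(c / total, 2) for c in counts.values())

def dataset_split(data, arc, val):
    return [rec[:arc] + rec[arc+1:] for rec in data if rec[arc] == val]

def attribute_selection(data):
    features = len(data[0]) - 1
    baseEntropy = calc_entropy(data)
    max_InfoGain, bestAttr = 0.0, -1
    for i in range(features):
        attrEntropy = 0.0
        for value in set(rec[i] for rec in data):
            newData = dataset_split(data, i, value)
            attrEntropy += (len(newData) / float(len(data))) * calc_entropy(newData)
        if baseEntropy - attrEntropy > max_InfoGain:
            max_InfoGain, bestAttr = baseEntropy - attrEntropy, i
    return bestAttr

# "outlook" (index 0) perfectly separates the classes; "wind" (index 1) does not
data = [['sunny', 'weak', 'no'],
        ['sunny', 'strong', 'no'],
        ['rain', 'weak', 'yes'],
        ['rain', 'strong', 'yes']]
print(attribute_selection(data))  # 0
```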

#### Step 3: Split the data-set

In this step, considering the attribute selected from the previous step, a split is done on the data-set, taking the axis or arc (attribute index) and the attribute value as input. This method is called from the attribute-selection and tree-building steps for every attribute present in the given data-set, in the order of decreasing information gain, until the algorithm reaches the stop criteria.

```python
# Function to split the data-set based on the attribute that has maximum information gain
# Input values: data-set, attribute index (arc) and attribute value
def dataset_split(data, arc, val):
    # Declare a list variable to store the newly split data-set
    newData = []
    # Iterate through every record in the data-set and split the data-set
    for rec in data:
        if rec[arc] == val:
            # Keep the record, dropping the attribute column that was split on
            reducedSet = list(rec[:arc])
            reducedSet.extend(rec[arc+1:])
            newData.append(reducedSet)
    # Return the new list that holds the data-set split on the selected attribute
    return newData
```
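For example, splitting on attribute index 0 with the value `'sunny'` keeps only the matching records and drops that column; the function is repeated here, and the records invented, so the snippet runs standalone:

```python
# dataset_split as defined above
def dataset_split(data, arc, val):
    newData = []
    for rec in data:
        if rec[arc] == val:
            reducedSet = list(rec[:arc])
            reducedSet.extend(rec[arc+1:])
            newData.append(reducedSet)
    return newData

# Toy records of the form [outlook, wind, play] -- invented for illustration
data = [['sunny', 'weak', 'no'],
        ['rain', 'weak', 'yes'],
        ['sunny', 'strong', 'no']]

# Keep the 'sunny' records and drop the outlook column
print(dataset_split(data, 0, 'sunny'))  # [['weak', 'no'], ['strong', 'no']]
```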

#### Step 4: Build Decision Tree

In this step we will integrate the above process flow into a single function and generate the rules in the decision tree.

```python
# Function to build the decision tree
def decision_tree(data, labels):
    # List variable to store the class labels (terminal nodes of the decision tree)
    classList = [rec[-1] for rec in data]
    # Stop if all records carry the same class label
    if classList.count(classList[0]) == len(classList):
        return classList[0]
    # Stop if no attributes are left to split on: return the majority class
    if len(data[0]) == 1:
        return max(set(classList), key=classList.count)
    # Function call to identify the attribute for the split
    maxGainNode = attribute_selection(data)
    # Variable to store the selected attribute's label
    treeLabel = labels[maxGainNode]
    # Dict object to represent the nodes in the decision tree
    theTree = {treeLabel: {}}
    del(labels[maxGainNode])
    # Get the unique values of the attribute identified
    nodeValues = [rec[maxGainNode] for rec in data]
    uniqueVals = set(nodeValues)
    for value in uniqueVals:
        subLabels = labels[:]
        # Update the non-terminal node values of the decision tree
        theTree[treeLabel][value] = decision_tree(
            dataset_split(data, maxGainNode, value), subLabels)
    # Return the decision tree (dict object)
    return theTree
```

Once the tree is built, it can be used to predict class labels in a classification task. The rules in the decision tree can also be derived and visualized, for example with libraries like Graphviz in Python.
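As a sketch of the prediction step (not covered by the functions above), a small helper can walk the nested dict returned by `decision_tree`; the tree and the `classify` helper below are hand-written for illustration in the same `{attribute: {value: subtree}}` format:

```python
# Walk the nested-dict tree produced by decision_tree to predict a class label.
# featLabels maps attribute names to column indices in the input record.
def classify(tree, featLabels, record):
    # A leaf is any value that is not a dict: it is the class label
    if not isinstance(tree, dict):
        return tree
    attr = next(iter(tree))                  # attribute stored at this node
    value = record[featLabels.index(attr)]   # the record's value for that attribute
    return classify(tree[attr][value], featLabels, record)

# Hand-written example tree, invented for illustration
tree = {'outlook': {'sunny': 'no',
                    'rain': {'wind': {'weak': 'yes', 'strong': 'no'}}}}
featLabels = ['outlook', 'wind']

print(classify(tree, featLabels, ['rain', 'weak']))     # yes
print(classify(tree, featLabels, ['sunny', 'strong']))  # no
```

Note that this sketch raises a `KeyError` if a record contains an attribute value that never appeared in the training split, which is a known limitation of plain ID3.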