ID3 Manual

User Manual:

Open the PDF directly: View PDF .
Page Count: 14

INSTITUTO POLITÉCNICO NACIONAL

ESCUELA SUPERIOR DE CÓMPUTO

Data Mining

Practice no. 3

ID3 Algorithm Implementation

Student: López Ayala Eric Alejandro

Professor: Hernández Contreras Euler

June 18th, 2018

 
Content 
ABSTRACT .......................................................................................................................................................... 3 
INTRODUCTION ................................................................................................................................................. 3 
THEORETICAL FRAMEWORK ......................................................................................................................... 4 
Data mining .................................................................................................................................................... 4 
Decision Trees ............................................................................................................................................... 5 
ID3 Algorithm ................................................................................................................................................. 5 
Algorithm Pseudocode ............................................................................................................................ 6 
Modified ID3 Algorithm ................................................................................................................................ 7 
Algorithm Pseudocode ............................................................................................................................ 7 
DATABASE ENTITY RELATIONAL MODEL ................................................................................................... 8 
DATABASE RELATIONAL MODEL .................................................................................................................. 9 
USER MANUAL ............................................................................................................................................... 10 
REFERENCES .................................................................................................................................................. 14 
 
   

ABSTRACT

In this document, ID3 algorithm of decision tree is implemented in order to help us to make

decisions over a real database implementation. In this work we will make use of a music

application database to extract and process the given information.

Through a web application the user will decide which attributes and class wants to analyze and

generate its respective ID3 decision tree.

A theoretical content will be review, to help us to understand how this algorithm works and that

it is possible to apply it not only for specific purposes but for a much wider real-life application

that require business intelligence and decision making.

INTRODUCTION

In today’s world, the organizations strive for neck to neck competition. To exist in the market,

every organization must take correct and efficient decisions. So, decision making activity is the

most important activity for the businessmen.

They must analyze all the existing data and conditions for decision making. They must extract new

information. Only because of this new information the decision makers can take competitive

decisions.

The technique to extract new knowledge from the existing information is known as data mining,

there are different techniques to mine the data from databases. One important technique is

classification and segmentation, under which decision trees are created to predict the data from

the existing one. Decision trees are created to predict the data from the existing one. Decision

trees are created with the help of different algorithms. One such algorithm, namely, ID3 is used

here.

Spotify is music streaming service which give us access to hundreds of songs and vary content of

music artist from all over the world.

In addition to basic functions such as listening to music, Either way, you can:

• Chooser what you want to Search or Explore new music.

• Receive recommendations in customizable functions.

• Create your own music collections.

• See what are listening your friends or look for releases.

• Add your own radio list, in which there will be music streaming 24/7

Spotify is available in its website, which is compatible with the latest web browsers such as Mozilla

Firefox, Google Chrome, and Safari. Soon Spotify will be available for your tablet and mobile

devices.

THEORETICAL FRAMEWORK

Data mining

Data mining can be as the extraction of hidden predictive information from large databases, is a

powerful new technology with great potential to help companies focus on the most important

information in their data warehouses.

Decision Trees

A decision tree is a powerful and popular tool for classification an d prediction. Decision trees

represent rules, a decision tree is a predictive model that, as its name implies, can be viewed as a

tree. Specifically, each branch of the tree is a classification question and the leaves of the tree are

partitions of the dataset with their classification.

Decision tree is a classifier in the form of a tree structure, where each node is either:

• A leaf node indicates the value of the target: attribute(class) of examples.

• A decision node, specifies some test to be carried out on a single attribute-value, with one

branch and sub-tree for each possible outcome to test.

Decision making tree algorithms transform raw information into ruled based decision trees. The

ID3 algorithm is one of the most common algorithms to make decision trees.

ID3 Algorithm

Most algorithms that have been developed for learning decision trees are variations on a core

algorithm that employs a top-down, greedy search through the space of possible decision trees.

Decision tree programs construct a decision tree from a set of training cases.

ID3 decision tree algorithm creates a hierarchical structure of classification rules “If-Then” looking

for a tree. Much work has been done in the field of decision tree algorithm.

J. Ross Quinlan originally developed ID3 algorithm at the University of Sydney. He first presented

ID3 in 1975 in a book “Machine Learning”, vol 1, no. 1. ID3 is based on the concept Learning System

algorithm. ID3 is an acronym of Interactive Dichotomizer and it is a precursor of C45 algorithm

which have applications for both Machine Learning as well as Processing of the Natural Language.

To understand how this algorithm works, first of all, it is necessary to understand the following

concepts:

• Node: Name or identifier of an attribute.

• Branch: Possible values of an attribute associated to a node.

• Leaf: Set of named and classified examples of a class.

• Attribute: It is a factor that influence the classification or decision.

• Class: Possible values of a solution.

• Example: It is a combination of the given attributes.

• Entropy: It is the measure of randomness that a system has. That is, in a certain

situation, the probability of each of the possible outcomes occurring.

It is defined by the following formula:

Where pp is the proportion of possible examples according to their target

classification.

• Information Gain: Measures how well a given attribute separates the training examples

according to their target classification.

It is defined by the following formula:

Algorithm Pseudocode

-Create a root node for the tree.

-If all examples are positive, return the single-node tree root, with the label = +.

-If all examples are negative, return the single-node tree root, with the label = -.

-If the number of predicting attributes is empty, then return the single node tree Root,

with label = most common value of the target attribute.

-Otherwise Begin:

- A = The attribute that best classifies examples.

- Decision tree attribute for root = A.

- For each possible value, Vp of A.

- Add a new Tree branch below root, corresponding to the test A = vi

- Let examples(Vi), be the subset of examples that have value Vi for A.

- If examples (Vi) is empty, then below this new branch add a leaf

node with label = most common target value in the examples.

- Else below this branch add the subtree ID3 (Examples(Vi)), Target Attribute,

Attributes-{A})

- End

- Return root

Modified ID3 Algorithm

In ID3 Algorithm, every attribute has the binary valued domain (i.e positive and negative). But it is

also possible that we have some specific attributes that have multiple valued domain (i.e high,

medium, low). For such attributes the algorithm can be modified as bellow.

Algorithm Pseudocode

-Create a root node for the tree.

-If all examples are of the same value say high, medium, low. Return the single-node tree

root, with the label = most common value of the target attribute.

-If the number of predicting attributes is empty, then return the single node tree Root,

with label = most common value of the target attribute.

-Otherwise Begin:

- A = The attribute that best classifies examples.

- Decision tree attribute for root = A.

- For each possible value, Vp of A.

- Add a new Tree branch below root, corresponding to the test A = ai

- Let examples(ai), be the subset of examples that have value ai for A.

- If examples (ai) is empty, then below this new branch add a leaf

node with label = most common target value in the examples.

- Else below this branch add the subtree ID3 (Examples(ai)), Target Attribute,

Attributes-{A})

- End

- Return root

For implementing the modified ID3 algorithm, the information Gain and Entropy is calculated as

follows:

DATABASE ENTITY RELATIONAL MODEL

DATABASE RELATIONAL MODEL

The following image shows our database relational model:

The model includes the entities, the attributes of each one and finally their relations

between each other.

USER MANUAL

At first our user has to login into the system with Admin permissions, for this he must input an

email and password matching to an admin role registered in the database.

Once the user has successfully logged in as an Admin, the following view will be display, then the

user must do click in the “ID3 Generator” tab.

Then it must select a database and then one of its tables, for the ID3 generator “spotify” database

must be selected, once the database and table had been selected the table metadata will be display

as follows:

Once the table has appeared the user has the option to add the attribute for the system to analyze

it, then that attribute will be added to the options selected and it will appear as follows:

Then the user must choose which of the attributes added will be our class (decision). Once the user

has clicked one of the radio buttons, summit button will be available to the user, so now we can

generate the ID3 tree decision.

The user must press the summit button in order to send the information to the server and generate

the tree. The once the server has finished processing the information, the following image will be

displayed, along with the graphical representation of the tree.

Finally, the user has the option to generate a more detailed report of the ID3 generation process,

for this it has to press the “Generate Report” button, all generated data during ID3 creation will be

send from the server and it will be displayed on a log.

REFERENCES

Kurt Thearling, “An Introduction to Data Mining”, a paper published in “Data Mining-Vol-1”, ICFAI

[2002].

[2] http://www.bandmservices.com/DecisionTrees/Decision _trees.htm

[3] Quinlan, J.R. [1986], Introduction of decision trees, “Machine Learning”.

[4] H.Hamilton. E. Gurak, L. Findlater W. Olive, “Overview of Decision Trees” as published on the

website http://www.cs.uregina.cd/~dbd/cs831/notes/ml/dtrees/4_ dtrees1.html

[5] http://en.wikipedia.org/wiki.ID3_algorithm#algorithm

[6] Nagabhushana, S., [2006] “Data Warehousing-OLAP and Data Mining”, New Age International

Publishers.

[7] Chaudhari, S., and Dayal, U. [1997] “An Overview of Data Warehousing and OLAP

Technology”,SIGMOD Record, Vol.26, No. 1, March 1997. [8] Breiman,L., Friedman,J.H., Olsen,R.A.,

and Stone,C.J.(1984) “Classification and Regression Trees”,Belmont,C.A. : Wadsworth Statistical

Press.

ID3 Manual

Navigation menu

Versions of this User Manual:

Views

Navigation