
      Understanding the Data Mining Decision Tree

      teskuser
      February 12, 2017
      Data Mining

      A data mining decision tree is a classification model structured like a tree. It consists of a root node, branches, and leaf nodes. Each leaf node represents a class label, while each internal (branch) node represents a test on an attribute. The root is the topmost node and corresponds to the best predictor.
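      The structure described above can be sketched as a small data type. This is only an illustration: the Node class, its field names, and the toy "outlook"/"windy" tree are assumptions made up for the example, not part of any particular library.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class Node:
    """One node of a decision tree (hypothetical layout for illustration)."""
    attribute: Optional[str] = None   # attribute tested at an internal node
    branches: Dict[str, "Node"] = field(default_factory=dict)  # value -> child
    label: Optional[str] = None       # class label, set only on leaf nodes

    def is_leaf(self) -> bool:
        return self.label is not None


def classify(node: Node, record: Dict[str, str]) -> str:
    """Walk from the root to a leaf, following the branch matching the record."""
    while not node.is_leaf():
        node = node.branches[record[node.attribute]]
    return node.label


# Made-up toy tree: the root tests "outlook", one branch then tests "windy".
root = Node(attribute="outlook", branches={
    "sunny": Node(label="play"),
    "rainy": Node(attribute="windy", branches={
        "yes": Node(label="stay in"),
        "no": Node(label="play"),
    }),
})

print(classify(root, {"outlook": "rainy", "windy": "yes"}))  # stay in
```

      Classifying a record is just a walk from the root down to a leaf, which is why a trained tree is cheap to apply.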

      A decision tree is mainly used to divide a dataset into smaller, more manageable subsets, and it can handle large datasets containing categorical or numerical data. In data mining, a decision tree can be described as the use of both computational and mathematical techniques to describe, categorize, and generalize a set of data.

      A decision tree in data mining is mostly used to describe data, although it can also support decision making. The two main types of decision trees used in data mining are classification trees and regression trees.

      • Classification trees – The predicted outcome is the class to which the data belongs.
      • Regression trees – The predicted outcome is a number.
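      One way to see the difference between the two tree types is to look at what a leaf returns. The sketch below assumes the common convention of a majority vote for classification and an average for regression; the function names and sample values are made up for illustration.

```python
from collections import Counter
from statistics import mean


def classification_leaf(labels):
    """Predict the most common class among the training rows reaching the leaf."""
    return Counter(labels).most_common(1)[0][0]


def regression_leaf(values):
    """Predict the average of the numeric targets reaching the leaf."""
    return mean(values)


print(classification_leaf(["spam", "ham", "spam"]))  # spam
print(regression_leaf([10.0, 12.0, 14.0]))           # 12.0
```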

      Decision Tree Algorithm

      Several algorithms are used to build decision trees in data mining, including ID3, C4.5, and CART. Here we will mainly focus on ID3 and C4.5.

      ID3 Algorithm

      Many decision tree algorithms can be traced back to the Iterative Dichotomiser 3, popularly known as ID3. It was developed by computer science researcher J. Ross Quinlan, whose most widely cited description of it was published in 1986. ID3 constructs the tree in a top-down, greedy manner with no backtracking. To choose each split, it relies on entropy and information gain.

      • Entropy
        A decision tree is built from the top down by repeatedly dividing the data into subsets whose members share the same value of an attribute. ID3 uses entropy to measure how homogeneous a sample is. For two classes, entropy ranges from 0 to 1: it is 1 when the sample is divided equally between the classes and 0 when every item belongs to the same class.
      • Information Gain
        The aim when constructing a decision tree is to choose, at each node, the attribute that tells us the most about the class. Information gain is the reduction in entropy obtained by splitting a dataset on an attribute; ID3 splits on the attribute with the highest information gain.
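      Entropy, information gain, and the top-down construction can be put together in a minimal ID3-style sketch. It assumes categorical attributes stored as dictionaries, and the tuple-based tree format and the toy weather rows are made up for illustration; it is not Quinlan's original implementation.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (base 2) of a list of class labels."""
    total = len(labels)
    return sum(-(n / total) * math.log2(n / total)
               for n in Counter(labels).values())


def information_gain(rows, attribute, target):
    """Entropy before the split minus the weighted entropy after it."""
    before = entropy([row[target] for row in rows])
    groups = {}
    for row in rows:
        groups.setdefault(row[attribute], []).append(row[target])
    after = sum(len(g) / len(rows) * entropy(g) for g in groups.values())
    return before - after


def id3(rows, attributes, target):
    """Greedy top-down build with no backtracking.
    A leaf is a label; an internal node is (attribute, {value: subtree})."""
    labels = [row[target] for row in rows]
    if len(set(labels)) == 1 or not attributes:
        return Counter(labels).most_common(1)[0][0]
    best = max(attributes, key=lambda a: information_gain(rows, a, target))
    remaining = [a for a in attributes if a != best]
    branches = {}
    for value in {row[best] for row in rows}:
        subset = [row for row in rows if row[best] == value]
        branches[value] = id3(subset, remaining, target)
    return (best, branches)


# Made-up toy data for illustration only.
rows = [
    {"outlook": "overcast", "windy": "no", "play": "yes"},
    {"outlook": "overcast", "windy": "yes", "play": "yes"},
    {"outlook": "overcast", "windy": "yes", "play": "yes"},
    {"outlook": "rainy", "windy": "no", "play": "yes"},
    {"outlook": "rainy", "windy": "yes", "play": "no"},
]
tree = id3(rows, ["outlook", "windy"], "play")
print(entropy(["yes", "no"]))  # 1.0
print(tree)
```

      On this toy data "outlook" gives the larger information gain, so it becomes the root, and only the "rainy" branch needs a further split on "windy".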

      C4.5 algorithm

      C4.5 is the successor of ID3 and was also developed by J. Ross Quinlan. It also builds the tree top-down with no backtracking, and like ID3 it learns from data that is already classified. The most notable difference between ID3 and C4.5 is that the latter can also handle continuous (numeric) attributes, whereas ID3 works only with categorical ones.

      C4.5 selects splits using the gain ratio, a normalized form of information gain, and it has a single-pass pruning process. It can work with both complete and incomplete datasets, since it has a built-in way of dealing with missing values.
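      The gain ratio divides information gain by the "split information" of the attribute, which penalizes attributes with many distinct values. The sketch below is a simplification under stated assumptions: missing values are represented as None and the affected rows are simply skipped, whereas C4.5's actual treatment of missing values is more involved. The function names and toy rows are made up for illustration.

```python
import math
from collections import Counter


def entropy(items):
    """Shannon entropy (base 2) of a list of values."""
    total = len(items)
    return sum(-(n / total) * math.log2(n / total)
               for n in Counter(items).values())


def gain_ratio(rows, attribute, target):
    """Information gain divided by split information.
    Simplifying assumption: rows whose attribute is None are ignored."""
    known = [row for row in rows if row[attribute] is not None]
    if not known:
        return 0.0
    groups = {}
    for row in known:
        groups.setdefault(row[attribute], []).append(row[target])
    before = entropy([row[target] for row in known])
    after = sum(len(g) / len(known) * entropy(g) for g in groups.values())
    split_info = entropy([row[attribute] for row in known])
    if split_info == 0:
        return 0.0  # attribute has a single value, so it cannot split the data
    return (before - after) / split_info


# Made-up toy data: "id" is unique per row, "outlook" has one missing value.
rows = [
    {"id": 1, "outlook": "sunny", "play": "yes"},
    {"id": 2, "outlook": "sunny", "play": "yes"},
    {"id": 3, "outlook": "rainy", "play": "no"},
    {"id": 4, "outlook": "rainy", "play": "no"},
    {"id": 5, "outlook": None, "play": "yes"},
]
print(gain_ratio(rows, "outlook", "play"))  # 1.0
print(gain_ratio(rows, "id", "play"))       # lower, despite separating every row
```

      A unique identifier separates the classes perfectly, so plain information gain would favor it; the gain ratio scores it below "outlook" because its split information is large.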

      Tree pruning

      Just as real trees are pruned, decision trees should be pruned too. Pruning removes branches that reflect noise or anomalies in the training data, which leaves a smaller, simpler tree that is easier to use.

      There are two types of pruning: pre-pruning and post-pruning. As the names suggest, pre-pruning stops the tree from growing further while it is still under construction, whereas post-pruning removes branches from a fully built tree.
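      Both kinds of pruning can be sketched in a few lines. The pre-pruning check uses hypothetical thresholds (max_depth, min_rows), and the post-pruning function follows the reduced-error pruning idea, which is one common technique and not the single-pass method C4.5 uses. The dictionary tree format, with a "majority" class stored at each internal node, is an assumption made for the example.

```python
def should_stop(labels, depth, max_depth=3, min_rows=5):
    """Pre-pruning: stop growing when the node is pure, too deep, or too small."""
    return len(set(labels)) <= 1 or depth >= max_depth or len(labels) < min_rows


def predict(tree, row):
    """Follow branches until a leaf (a plain label) is reached."""
    while isinstance(tree, dict):
        # Fall back to the node's majority class for unseen attribute values.
        tree = tree["branches"].get(row[tree["attribute"]], tree["majority"])
    return tree


def accuracy(tree, rows, target):
    return sum(predict(tree, r) == r[target] for r in rows) / len(rows)


def prune(tree, validation, target):
    """Post-pruning, bottom-up: collapse a subtree into its majority leaf when
    that does not lower accuracy on the validation rows that reach it."""
    if not isinstance(tree, dict) or not validation:
        return tree
    for value, child in list(tree["branches"].items()):
        subset = [r for r in validation if r[tree["attribute"]] == value]
        tree["branches"][value] = prune(child, subset, target)
    if accuracy(tree["majority"], validation, target) >= accuracy(tree, validation, target):
        return tree["majority"]
    return tree


# Made-up tree and validation rows for illustration only.
tree = {"attribute": "outlook", "majority": "yes", "branches": {
    "sunny": "no",
    "rainy": {"attribute": "windy", "majority": "yes",
              "branches": {"yes": "no", "no": "yes"}},
}}
validation = [
    {"outlook": "sunny", "windy": "no", "play": "no"},
    {"outlook": "rainy", "windy": "yes", "play": "yes"},
    {"outlook": "rainy", "windy": "no", "play": "yes"},
]
pruned = prune(tree, validation, "play")
print(pruned["branches"])  # the "rainy" subtree has collapsed to a leaf
```

      Here the split on "windy" hurts accuracy on the validation rows, so it is replaced by a single leaf, while the split on "outlook" is kept because removing it would make predictions worse.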

      Pros of data mining trees

      There are many benefits to using data mining trees. Below are a few:

      • Decision trees are relatively easy to use and interpret, and they require little data preparation, which simplifies decision making.
      • Decision trees can be used to screen variables or select features. The attributes tested near the top of the tree are often the most important ones, which helps in predictive analysis.