
The best way to analyze a huge amount of corporate data - Decision tree by krishtopa

· @krishtopa ·
![](https://i.imgur.com/5DrMVlu.jpg)

*The rapid development of information technology and progress in methods of data collection, storage, and processing have allowed many organizations to collect massive amounts of data that must be analyzed. The volume of data is so large that experts alone cannot cope with it, which has created a demand for automatic data analysis techniques.*

*The decision tree is one of the main and most popular methods in the field of data analysis and decision-making.*

-------
# What is a decision tree

**Decision tree** is a way of representing rules in a hierarchical, sequential structure in which every object corresponds to a single node that gives the decision.

*By a rule I mean a logical construct of the form "if ... then ...".*

![](https://i.imgur.com/KdEWtSN.jpg)
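To make the "if ... then ..." view concrete, here is a minimal Python sketch. The tree, its attributes (`income`, `collateral`) and the helper names `decide` and `extract_rules` are hypothetical and chosen only for illustration: every root-to-leaf path becomes one rule, and any object follows exactly one path to a single leaf.

```python
# Hypothetical credit-scoring tree; attribute names and values are made up.
# An internal node is (attribute, {value: subtree}); a leaf is a decision.
TREE = ("income", {
    "high": "approve",
    "low": ("collateral", {
        "yes": "approve",
        "no": "reject",
    }),
})

def decide(node, obj):
    """Follow one path from the root; every object ends in exactly one leaf."""
    while isinstance(node, tuple):
        attribute, branches = node
        node = branches[obj[attribute]]
    return node

def extract_rules(node, conditions=()):
    """Turn each root-to-leaf path into an 'if ... then ...' rule."""
    if not isinstance(node, tuple):  # leaf: the decision itself
        return [f"if {' and '.join(conditions)} then {node}"]
    attribute, branches = node
    rules = []
    for value, child in branches.items():
        rules += extract_rules(child, conditions + (f"{attribute} = {value}",))
    return rules

for rule in extract_rules(TREE):
    print(rule)
# if income = high then approve
# if income = low and collateral = yes then approve
# if income = low and collateral = no then reject

print(decide(TREE, {"income": "low", "collateral": "no"}))  # reject
```

The nested-tuple format is only one possible representation; it is used here because it keeps the correspondence between a path and a rule easy to see.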



***Decision trees can be used in many ways, but the problems they solve fall into the following three classes:***

- **Information Description:** Decision trees let you store information about the data in compact form: instead of the data itself, we can store a decision tree that contains an exact description of the objects.

- **Classification:** Decision trees handle classification tasks very well, that is, assigning objects to one of several previously known classes.

- **Regression:** If the target variable takes continuous values, decision trees let us establish how the target variable depends on the independent variables. Numerical prediction problems, for example, belong to this class.
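For the regression case, a minimal sketch (not taken from the original post) is a one-split tree on a single numeric feature whose two leaves predict the mean target value of their training points. The name `fit_stump` and the squared-error split criterion are assumptions; squared error is a common choice for regression trees, but other criteria exist.

```python
def mean(values):
    return sum(values) / len(values)

def sse(values):
    """Sum of squared deviations from the mean (0 for a constant group)."""
    m = mean(values)
    return sum((v - m) ** 2 for v in values)

def fit_stump(xs, ys):
    """One-split regression tree on a single numeric feature.

    Tries a threshold between every pair of neighbouring distinct x values and
    keeps the one with the lowest total squared error; each leaf predicts the
    mean target value of the training points that fall into it.
    """
    pairs = sorted(zip(xs, ys))
    best = None
    for i in range(1, len(pairs)):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # equal x values cannot be separated by a threshold
        threshold = (pairs[i - 1][0] + pairs[i][0]) / 2
        left = [y for _, y in pairs[:i]]
        right = [y for _, y in pairs[i:]]
        cost = sse(left) + sse(right)
        if best is None or cost < best[0]:
            best = (cost, threshold, mean(left), mean(right))
    if best is None:  # all x equal: a single leaf predicting the overall mean
        overall = mean(ys)
        return lambda x: overall
    _, threshold, left_value, right_value = best
    return lambda x: left_value if x <= threshold else right_value

predict = fit_stump([1, 2, 3, 10, 11, 12], [5, 6, 7, 20, 21, 22])
print(predict(2), predict(11))  # 6.0 21.0
```

Here the best threshold falls between 3 and 10, so the tree learns a simple dependence of the target on the input: roughly 6 below the threshold and 21 above it.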

# How to build a decision tree?

Suppose we are given a training set T containing objects, each of which is characterized by m attributes, one of which indicates the class the object belongs to.
Let {C1, C2, ... Ck} be the classes (the values of the class label). Then there are three possible situations:

- set T contains one or more examples, all of the same class Ck. Then the decision tree for T is a leaf that defines the class Ck;
- set T contains no examples at all, i.e. it is empty. Then the tree is again a leaf, and the class associated with the leaf is taken from a set other than T, namely the set associated with the parent node (typically its most frequent class);
- set T contains examples belonging to different classes. In this case T should be divided into subsets. For this purpose, one of the attributes having two or more distinct values O1, O2, ... On is selected, and T is divided into subsets T1, T2, ... Tn, where each subset Ti contains all the examples that have the value Oi for the selected attribute. The procedure continues recursively until every final subset consists of examples of the same class.

![](https://i.imgur.com/WbyxlGX.jpg)
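The three situations above translate almost directly into a recursive function. This is a hedged Python sketch, not the author's method: `build_tree`, the tuple/dict tree format and the `domains` mapping are hypothetical; the attribute is chosen naively (the first one with at least two distinct values) because the choice of attribute is the subject of the division rule; and the empty-set leaf takes the most frequent class of the parent set, which is one common reading of the second case.

```python
from collections import Counter

def majority_class(examples):
    """Most frequent class label in a list of (attributes, label) pairs."""
    return Counter(label for _, label in examples).most_common(1)[0][0]

def build_tree(examples, attributes, domains, parent_examples=None):
    """Recursive partitioning following the three cases in the text.

    examples:   list of (dict attribute -> value, class label), the set T
    attributes: attribute names still available for splitting
    domains:    dict attribute -> every value O1..On that attribute can take
    Returns a class label (a leaf) or (attribute, {value: subtree}).
    """
    # Case 2: T is empty -> leaf whose class comes from the parent's set.
    if not examples:
        return majority_class(parent_examples)
    labels = {label for _, label in examples}
    # Case 1: every example belongs to the same class Ck -> leaf Ck.
    if len(labels) == 1:
        return labels.pop()
    # Case 3: mixed classes -> split on an attribute with >= 2 distinct values.
    candidates = [a for a in attributes
                  if len({x[a] for x, _ in examples}) >= 2]
    if not candidates:
        # Assumed fallback: identical attribute values but different classes.
        return majority_class(examples)
    attr = candidates[0]  # naive choice; the division rule refines this
    rest = [a for a in attributes if a != attr]
    branches = {}
    for value in domains[attr]:
        subset = [(x, y) for x, y in examples if x[attr] == value]  # T_i
        branches[value] = build_tree(subset, rest, domains, examples)
    return (attr, branches)

def classify(tree, x):
    """Walk from the root to a leaf using the object's attribute values."""
    while isinstance(tree, tuple):
        attr, branches = tree
        tree = branches[x[attr]]
    return tree

data = [
    ({"outlook": "sunny", "windy": "no"}, "play"),
    ({"outlook": "sunny", "windy": "yes"}, "stay"),
    ({"outlook": "rain", "windy": "yes"}, "stay"),
    ({"outlook": "rain", "windy": "no"}, "stay"),
]
domains = {"outlook": ["sunny", "rain", "overcast"], "windy": ["no", "yes"]}
tree = build_tree(data, ["outlook", "windy"], domains)
print(tree)
```

Because the split iterates over every value in `domains`, the value `overcast` (absent from the toy data) produces an empty subset, which exercises the second case: its leaf gets the parent's most frequent class, `stay`.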

# Stages of decision tree constructing

**Division rule**

To build the tree, at each internal node we must find a condition (a test) that divides the set associated with that node into subsets. One of the attributes must be selected for this test. The general rule for choosing an attribute can be summarized as follows: the selected attribute should split the set so that each resulting subset consists of objects of the same class, or comes as close to that as possible. In other words, the number of objects of other classes in each subset should be as small as possible.
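One way to turn this rule into a number, sketched under our own assumptions: score each candidate attribute by the size-weighted impurity of the subsets it produces and pick the attribute with the lowest score. `misclassification_rate` measures exactly the share of objects outside their subset's majority class, as the text describes; `gini` is the smoother impurity measure used by CART, shown for comparison (ID3 and C4.5 use entropy-based information gain instead). All function names here are hypothetical.

```python
from collections import Counter

def misclassification_rate(labels):
    """Share of objects outside the subset's majority class."""
    return 1.0 - Counter(labels).most_common(1)[0][1] / len(labels)

def gini(labels):
    """Gini impurity: 0 for a single-class subset, larger when classes mix."""
    n = len(labels)
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())

def split_score(examples, attr, impurity):
    """Size-weighted impurity of the subsets produced by splitting on attr."""
    groups = {}
    for x, y in examples:
        groups.setdefault(x[attr], []).append(y)
    n = len(examples)
    return sum(len(g) / n * impurity(g) for g in groups.values())

def best_attribute(examples, attributes, impurity=gini):
    """Attribute whose split leaves the fewest 'foreign' objects per subset."""
    return min(attributes, key=lambda a: split_score(examples, a, impurity))

examples = [
    ({"outlook": "sunny", "windy": "no"}, "play"),
    ({"outlook": "sunny", "windy": "yes"}, "play"),
    ({"outlook": "rain", "windy": "no"}, "stay"),
    ({"outlook": "rain", "windy": "yes"}, "stay"),
]
print(best_attribute(examples, ["outlook", "windy"]))  # outlook
```

Splitting on `outlook` gives two pure subsets (score 0), while `windy` leaves both subsets half "play" and half "stay" (score 0.5 under either measure), so `outlook` is preferred.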

**Stop rule**

Statistical methods can be used to assess whether further splitting is worthwhile; this is the so-called pre-pruning. Pre-pruning is attractive because it saves training time, but one important caveat applies: this approach tends to build less accurate classification models.

Another stop rule limits the depth of the tree: construction stops if a split would produce a tree deeper than a predetermined value.
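Both kinds of stop rule can be checked before every split. The sketch below uses hypothetical names and a placeholder attribute choice (the first remaining attribute). It stops growing a branch when the node is pure, when no attributes are left, when the node already sits at `max_depth`, or when it holds fewer than `min_size` examples. The depth limit comes from the text; `min_size` is an assumed extra rule of the same pre-pruning kind. A statistical test is not implemented here.

```python
from collections import Counter

def majority(labels):
    """Most frequent class; ties go to the label seen first."""
    return Counter(labels).most_common(1)[0][0]

def build_prepruned(examples, attributes, max_depth, min_size=2, depth=0):
    """Grow a tree but stop early when a stop rule fires (pre-pruning).

    examples: non-empty list of (dict attribute -> value, class label).
    """
    labels = [y for _, y in examples]
    if (len(set(labels)) == 1              # node is already pure
            or not attributes              # nothing left to split on
            or depth >= max_depth          # a split would exceed max_depth
            or len(examples) < min_size):  # too few objects (assumed rule)
        return majority(labels)
    attr, rest = attributes[0], attributes[1:]  # placeholder attribute choice
    groups = {}
    for x, y in examples:
        groups.setdefault(x[attr], []).append((x, y))
    return (attr, {value: build_prepruned(sub, rest, max_depth, min_size, depth + 1)
                   for value, sub in groups.items()})

def depth_of(tree):
    """Number of splits on the longest root-to-leaf path."""
    if not isinstance(tree, tuple):
        return 0
    return 1 + max(depth_of(sub) for sub in tree[1].values())

# XOR-like toy data: the class depends on both attributes together.
xor = [
    ({"a": 0, "b": 0}, "x"),
    ({"a": 0, "b": 1}, "y"),
    ({"a": 1, "b": 0}, "y"),
    ({"a": 1, "b": 1}, "x"),
]
print(build_prepruned(xor, ["a", "b"], max_depth=1))
print(build_prepruned(xor, ["a", "b"], max_depth=2))
```

With `max_depth=1` the tree stops after one split and its leaves are forced to guess, while `max_depth=2` fits the data exactly. This illustrates the caveat above: stopping early saves work but can cost accuracy.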

**Pruning rule** 

Decision tree algorithms very often produce complex trees that are "overfilled with data" and have many nodes and branches. Such "bushy" trees are very difficult to understand.
Pruning is often used to solve this problem.

The recognition accuracy of a decision tree is the ratio of objects classified correctly during training to the total number of objects in the training set, and the error is the number of misclassified objects. Suppose we have a way to estimate the error of a tree, of its branches, and of its leaves. Then we can use the following simple rule:
- build a tree;
- prune, or replace with a leaf, those branches whose removal does not increase the error.

In contrast to construction, which proceeds top-down, pruning is done bottom-up: it moves from the leaves towards the root, turning nodes into leaves or replacing them with a subtree.
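A minimal sketch of bottom-up pruning, assuming each internal node also stores the majority class of the training objects that reached it (a hypothetical tree format). Children are pruned first; then a node is replaced by a leaf if that does not increase the number of misclassified objects on the supplied examples. Passing a held-out set instead of the training set gives the variant known as reduced-error pruning. The option of replacing a node with one of its subtrees, mentioned in the text, is not implemented here.

```python
# Hypothetical tree format: a leaf is a class label; an internal node is
# (attribute, {value: subtree}, majority_label), where majority_label is the
# most frequent training class among the objects that reached the node.

def classify(tree, x):
    while isinstance(tree, tuple):
        attr, branches, fallback = tree
        tree = branches.get(x[attr], fallback)  # unseen value -> node majority
    return tree

def errors(tree, examples):
    """Number of misclassified objects: the 'error' from the text."""
    return sum(classify(tree, x) != y for x, y in examples)

def prune(tree, examples):
    """Bottom-up pruning: replace a subtree by a leaf if error does not grow."""
    if not isinstance(tree, tuple):
        return tree
    attr, branches, majority = tree
    # Prune the children first, each on the examples that reach it.
    new_branches = {
        value: prune(sub, [(x, y) for x, y in examples if x[attr] == value])
        for value, sub in branches.items()
    }
    subtree = (attr, new_branches, majority)
    if errors(majority, examples) <= errors(subtree, examples):
        return majority  # collapsing to a leaf does not increase the error
    return subtree

tree = ("outlook", {
    "sunny": ("windy", {"no": "play", "yes": "play"}, "play"),  # useless split
    "rain": ("windy", {"no": "play", "yes": "stay"}, "stay"),
}, "play")
validation = [
    ({"outlook": "sunny", "windy": "no"}, "play"),
    ({"outlook": "sunny", "windy": "yes"}, "play"),
    ({"outlook": "rain", "windy": "no"}, "play"),
    ({"outlook": "rain", "windy": "yes"}, "stay"),
]
pruned = prune(tree, validation)
print(pruned)
```

The `sunny` branch predicts `play` either way, so it collapses into a single leaf, while the `rain` branch is kept because turning it into a leaf would misclassify one object. One design consequence worth noting: a branch that no evaluation example reaches has zero error either way, so this rule always collapses it.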

# The advantages of using decision trees

- quick learning process;
- intuitive classification model;
- high accuracy of the forecast;
- construction of non-parametric models.

![](https://i.imgur.com/Y0VIZk0.png)

---------- 
*In conclusion, I want to say that decision trees are a wonderful tool for decision support systems and data mining.*

*The structure of many packages for data mining includes methods of construction of decision trees. In areas where the cost of failure is high, they serve as an excellent tool for analyst or supervisor.*

***Decision trees are successfully used to solve practical problems in the following areas:***

- *Banking. The credit rating of bank customers when issuing loans.*
- *Industry. Quality control, non-destructive testing, etc.*
- *Medicine. Diagnosis of various diseases.*



*[Follow me](https://steemit.com/@krishtopa), to learn more about **popular science**, **math** and **technologies***

**With Love,
Kate**

Image credit: [1](http://mir-animasii.ru/oboi/priroda/derevja_vesnoj/53-1-0-4087), [2](http://premiereflooring.com/pages/8/decision-tree-analysis-finance), [3](https://madhureshkumar.wordpress.com/tag/decision-tree/), [4](http://www.sfs.uni-tuebingen.de/~vhenrich/ss13/java/homework/hw7/decisionTrees.html)
properties (23)
authorkrishtopa
permlinkthe-best-way-to-analyze-a-huge-amount-of-corporate-data-decision-tree
categorypopularscience
json_metadata{"tags":["popularscience","science","it","education","technology"],"image":["https://i.imgur.com/5DrMVlu.jpg","https://i.imgur.com/KdEWtSN.jpg","https://i.imgur.com/WbyxlGX.jpg","https://i.imgur.com/Y0VIZk0.png"],"links":["https://en.wikipedia.org/wiki/Decision_tree","https://steemit.com/@krishtopa","http://mir-animasii.ru/oboi/priroda/derevja_vesnoj/53-1-0-4087","http://premiereflooring.com/pages/8/decision-tree-analysis-finance","https://madhureshkumar.wordpress.com/tag/decision-tree/","http://www.sfs.uni-tuebingen.de/~vhenrich/ss13/java/homework/hw7/decisionTrees.html"]}
created2016-10-19 23:27:57
last_update2016-10-19 23:27:57
depth0
children9
last_payout2016-11-20 03:54:48
cashout_time1969-12-31 23:59:59
total_payout_value69.743 HBD
curator_payout_value6.440 HBD
pending_payout_value0.000 HBD
promoted0.000 HBD
body_length6,003
author_reputation48,350,480,258,009
root_title"The best way to analyze a huge amount of corporate data - Decision tree"
beneficiaries[]
max_accepted_payout1,000,000.000 HBD
percent_hbd10,000
post_id1,577,206
net_rshares71,113,759,233,270
author_curate_reward""
@buddy67 ·
thank you for the info - I guess I'm not an engineer of sorts....  :)
upvoted and following you.
properties (22)
authorbuddy67
permlinkre-krishtopa-the-best-way-to-analyze-a-huge-amount-of-corporate-data-decision-tree-20171108t181254735z
categorypopularscience
json_metadata{"tags":["popularscience"],"app":"steemit/0.1"}
created2017-11-08 18:13:09
last_update2017-11-08 18:13:09
depth1
children1
last_payout2017-11-15 18:13:09
cashout_time1969-12-31 23:59:59
total_payout_value0.000 HBD
curator_payout_value0.000 HBD
pending_payout_value0.000 HBD
promoted0.000 HBD
body_length96
author_reputation1,522,668,089,635
root_title"The best way to analyze a huge amount of corporate data - Decision tree"
beneficiaries[]
max_accepted_payout1,000,000.000 HBD
percent_hbd10,000
post_id19,802,303
net_rshares0
@krishtopa ·
thank you)
properties (22)
authorkrishtopa
permlinkre-buddy67-re-krishtopa-the-best-way-to-analyze-a-huge-amount-of-corporate-data-decision-tree-20171209t175002624z
categorypopularscience
json_metadata{"tags":["popularscience"],"app":"steemit/0.1"}
created2017-12-09 17:50:03
last_update2017-12-09 17:50:03
depth2
children0
last_payout2017-12-16 17:50:03
cashout_time1969-12-31 23:59:59
total_payout_value0.000 HBD
curator_payout_value0.000 HBD
pending_payout_value0.000 HBD
promoted0.000 HBD
body_length10
author_reputation48,350,480,258,009
root_title"The best way to analyze a huge amount of corporate data - Decision tree"
beneficiaries[]
max_accepted_payout1,000,000.000 HBD
percent_hbd10,000
post_id22,910,609
net_rshares0
@justtryme90 ·
Nice post Kate :)
πŸ‘  ,
properties (23)
authorjusttryme90
permlinkre-krishtopa-the-best-way-to-analyze-a-huge-amount-of-corporate-data-decision-tree-20161020t030208729z
categorypopularscience
json_metadata{"tags":["popularscience"]}
created2016-10-20 03:02:09
last_update2016-10-20 03:02:09
depth1
children1
last_payout2016-11-20 03:54:48
cashout_time1969-12-31 23:59:59
total_payout_value0.000 HBD
curator_payout_value0.000 HBD
pending_payout_value0.000 HBD
promoted0.000 HBD
body_length17
author_reputation140,118,479,939,905
root_title"The best way to analyze a huge amount of corporate data - Decision tree"
beneficiaries[]
max_accepted_payout1,000,000.000 HBD
percent_hbd10,000
post_id1,578,215
net_rshares65,375,373
author_curate_reward""
@krishtopa ·
thanks for the feedback. Appreciate it
properties (23)
authorkrishtopa
permlinkre-justtryme90-re-krishtopa-the-best-way-to-analyze-a-huge-amount-of-corporate-data-decision-tree-20161021t123149973z
categorypopularscience
json_metadata{"tags":["popularscience"]}
created2016-10-21 12:31:57
last_update2016-10-21 12:31:57
depth2
children0
last_payout2016-11-20 03:54:48
cashout_time1969-12-31 23:59:59
total_payout_value0.000 HBD
curator_payout_value0.000 HBD
pending_payout_value0.000 HBD
promoted0.000 HBD
body_length38
author_reputation48,350,480,258,009
root_title"The best way to analyze a huge amount of corporate data - Decision tree"
beneficiaries[]
max_accepted_payout1,000,000.000 HBD
percent_hbd10,000
post_id1,588,081
net_rshares0
author_curate_reward""
@lemouth ·
I have already used (boosted) decision trees several times in my work, trying to use some observable quantities to extract a signal from some (simulated) data. In my field, we can use BDTs to determine automatically how a signal behaves with respect to those observables, in contrast to the background, and use that information to extract it.
properties (23)
authorlemouth
permlinkre-krishtopa-the-best-way-to-analyze-a-huge-amount-of-corporate-data-decision-tree-20161020t064716441z
categorypopularscience
json_metadata{"tags":["popularscience"]}
created2016-10-20 06:47:18
last_update2016-10-20 06:47:18
depth1
children2
last_payout2016-11-20 03:54:48
cashout_time1969-12-31 23:59:59
total_payout_value0.000 HBD
curator_payout_value0.000 HBD
pending_payout_value0.000 HBD
promoted0.000 HBD
body_length369
author_reputation338,011,164,701,274
root_title"The best way to analyze a huge amount of corporate data - Decision tree"
beneficiaries[]
max_accepted_payout1,000,000.000 HBD
percent_hbd10,000
post_id1,578,961
net_rshares0
author_curate_reward""
@krishtopa ·
wow, sounds interesting. Perhaps you could describe it in more detail in one of your articles
properties (23)
authorkrishtopa
permlinkre-lemouth-re-krishtopa-the-best-way-to-analyze-a-huge-amount-of-corporate-data-decision-tree-20161021t123507179z
categorypopularscience
json_metadata{"tags":["popularscience"]}
created2016-10-21 12:35:15
last_update2016-10-21 12:35:15
depth2
children1
last_payout2016-11-20 03:54:48
cashout_time1969-12-31 23:59:59
total_payout_value0.000 HBD
curator_payout_value0.000 HBD
pending_payout_value0.000 HBD
promoted0.000 HBD
body_length98
author_reputation48,350,480,258,009
root_title"The best way to analyze a huge amount of corporate data - Decision tree"
beneficiaries[]
max_accepted_payout1,000,000.000 HBD
percent_hbd10,000
post_id1,588,105
net_rshares0
author_curate_reward""
@lemouth ·
I will think about it and add it to my list. I am pretty busy these days, however, and do not have much time to write... patience may be required.... especially as I would like to continue writing my lecture notes on quantum mechanics ...
properties (23)
authorlemouth
permlinkre-krishtopa-re-lemouth-re-krishtopa-the-best-way-to-analyze-a-huge-amount-of-corporate-data-decision-tree-20161021t213947272z
categorypopularscience
json_metadata{"tags":["popularscience"]}
created2016-10-21 21:39:51
last_update2016-10-21 21:39:51
depth3
children0
last_payout2016-11-20 03:54:48
cashout_time1969-12-31 23:59:59
total_payout_value0.000 HBD
curator_payout_value0.000 HBD
pending_payout_value0.000 HBD
promoted0.000 HBD
body_length233
author_reputation338,011,164,701,274
root_title"The best way to analyze a huge amount of corporate data - Decision tree"
beneficiaries[]
max_accepted_payout1,000,000.000 HBD
percent_hbd10,000
post_id1,591,515
net_rshares0
author_curate_reward""
@lydon.sipe ·
This is incredibly helpful. Thank you for posting. The pruning process was especially enlightening for me. 
properties (23)
authorlydon.sipe
permlinkre-krishtopa-the-best-way-to-analyze-a-huge-amount-of-corporate-data-decision-tree-20161019t235546232z
categorypopularscience
json_metadata{"tags":"popularscience"}
created2016-10-20 03:55:48
last_update2016-10-20 03:55:48
depth1
children1
last_payout2016-11-20 03:54:48
cashout_time1969-12-31 23:59:59
total_payout_value0.000 HBD
curator_payout_value0.000 HBD
pending_payout_value0.000 HBD
promoted0.000 HBD
body_length107
author_reputation61,285,932,034,541
root_title"The best way to analyze a huge amount of corporate data - Decision tree"
beneficiaries[]
max_accepted_payout1,000,000.000 HBD
percent_hbd10,000
post_id1,578,406
net_rshares0
author_curate_reward""
@krishtopa ·
I'm glad you found this interesting. Thanks
properties (23)
authorkrishtopa
permlinkre-lydonsipe-re-krishtopa-the-best-way-to-analyze-a-huge-amount-of-corporate-data-decision-tree-20161021t123222975z
categorypopularscience
json_metadata{"tags":["popularscience"]}
created2016-10-21 12:32:30
last_update2016-10-21 12:32:30
depth2
children0
last_payout2016-11-20 03:54:48
cashout_time1969-12-31 23:59:59
total_payout_value0.000 HBD
curator_payout_value0.000 HBD
pending_payout_value0.000 HBD
promoted0.000 HBD
body_length43
author_reputation48,350,480,258,009
root_title"The best way to analyze a huge amount of corporate data - Decision tree"
beneficiaries[]
max_accepted_payout1,000,000.000 HBD
percent_hbd10,000
post_id1,588,085
net_rshares0
author_curate_reward""