Simple Explanation of Machine Learning shown with Excel - Part 1 by theexcelclub

Ever wondered how machine learning works?  In this two-part article we are going to explain machine learning with Excel.
<h2>Introduction</h2>
The world is full of data: pictures, music, text, video, spreadsheets.  Machine learning promises to extract meaning from that data.  (Although Excel is not normally used as a tool for ML, in this article we will explain machine learning with Excel.)

Humans learn from experience. If we touch something hot, that burning sensation is stored in memory and we quickly learn not to touch it again.  Machine learning is like human learning in that it learns from the past.  We feed a computer data that represents some past experience, and then, using different statistical methods, it can make predictions.

Machine learning is all around us, from Google Search to the diagnosis of skin cancer.  Few industries have yet to feel its effect.

This includes the banking and finance sector.  One of the core uses of machine learning in banking has been to combat fraud and comply with AML regulations.  Banks can also use machine learning algorithms to analyse an applicant for credit, be that an individual or a business, and make approvals according to a set of pre-defined parameters. Some of these algorithms simply look at a customer's credit score, age and postcode to make a decision in seconds.

To get an idea of how machine learning works, we are going to focus on a bank and how it uses machine learning to approve loans.  We will dive into some theory and set up a simple algorithm to decide whether a loan should be approved or rejected.
<h2>Types of Machine Learning</h2>
There are generally four types of machine learning, based on their purpose:

<strong>Supervised:</strong>  This is a type of learning whereby both inputs and outputs are known. For example, in supervised learning for image recognition, a computer might be fed a series of pictures with labels such as cars, vans and trucks. After being fed enough labeled images, it should be able to classify an unlabeled image as a car, van or truck.

It is called supervised learning because the process of an algorithm learning from the training data set is like a teacher supervising the learning of a student.

Supervised learning is used for both classification and regression problems.

A classification problem is one where the output is a label. Is it a car or a van? Should you approve the loan or reject it?

A regression problem deals with values and quantities.  Given the size of a house, predict its value.

<strong>Unsupervised Learning:</strong>  Unsupervised learning has unlabeled input data only.  The output is not known, and it is often used for exploratory analysis.   One aim is to cluster data, for example grouping customers by purchasing behavior.  Another aim is to find association rules, for example: if you buy product Y, you are also likely to buy product Z.

<strong>Semi-supervised:</strong> Semi-supervised machine learning algorithms are fed a small amount of labeled data and taught how to classify a large batch of unlabeled data.

<strong>Reinforcement Learning:</strong> Here an agent interacts with an environment in which some actions give rewards.  For example, in a computer game, the player would be the agent.  The agent must navigate a maze, and at the end is the key to the next level.  The key is the reward, the incentive.  However, it might take the agent 2-3 attempts, each time learning from its mistakes in the environment, until it finally gets out of the maze.

<h2>What is a Decision Tree?</h2>
A decision tree is a set of questions used to filter information, allowing you to make a more informed decision. It is used in supervised learning to classify data.

Let’s look at our bank example. For a loan to be approved by a bank, the applicant answers a list of questions, and these answers are used to judge whether it is safe to give the loan or not.  To keep things at a very simple level, our sample bank asks three questions:

1. Do you own a home? Yes or No

2. What is your income bracket? Low, Average, Above average

3. What is your credit score? Below average, Average, High

<img src="https://steemitimages.com/640x0/https://cdn.steemitimages.com/DQmZ6nU5eKhKsiesdyC2qRRF2cXyzDT7WcChxkPhQjuB7nS/1.png" alt="explain machine learning with excel" width="640" height="419" /><br/>

This process is the most basic form of decision tree: each question is asked in turn and, at the end, the loan is either approved or rejected.
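As a rough illustration, the question-by-question process can be written as a few nested checks in Python. Python is not used in the article itself, and the order of the questions and the approve/reject outcomes below are made up for this sketch; they are not read from the bank's tree in the image.

```python
def assess_loan(owns_home, income, credit_score):
    """Toy decision tree. The rules are hypothetical, for illustration only."""
    # Question 1: do you own a home?
    if owns_home:
        # Home owners are only asked about their credit score here.
        if credit_score == "below average":
            return "reject"
        return "approve"
    # Question 2 (non home owners): what is your income bracket?
    if income == "low":
        return "reject"
    # Question 3: what is your credit score?
    if credit_score == "high":
        return "approve"
    return "reject"


print(assess_loan(True, "average", "high"))
print(assess_loan(False, "low", "high"))
```

Each `if` is one question in the tree, and each `return` is a leaf where the loan is approved or rejected.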
<h2>Machine Learning and Decision Trees</h2>
Looking at our sample bank's decision tree, how does the bank know which question, or attribute, to start the tree with?  How can the bank ensure that it lends to people who won’t default, and do so with the fewest questions possible?

The bank can use its existing data on loan defaults as training data to teach a machine how to classify applicants and to find the best order for the questions, one that will minimize the number of loan defaults.

The bank will have existing data from loan applications and will be able to combine it into a table showing who defaulted and who didn’t, like the sample shown. This existing data is known as training data because the machine will learn from it.

<img src="https://steemitimages.com/640x0/https://cdn.steemitimages.com/DQmdaEFPhFPUow9PCZyJrZsisDN1kLyyAPcg2G18WvoWc2J/2.png" alt="Explain machine learning with excel" width="472" height="439" /><br/>

Machine learning will use algorithms to establish the best route to take in the decision tree, based on this past experience.
<h2>Entropy and Information Gain</h2>
Entropy is a concept from information theory.  It is a measure of the randomness, or uncertainty, of the information being processed.  In general, the more random the event, the more information it will contain. For a label with two possible outcomes, such as default or no default, entropy ranges from 0 to 1. An entropy of 1 means both outcomes are equally likely, which is maximum randomness.  The closer the entropy is to 0, the less randomness there is and the more predictable the outcome.

The formula for entropy is:

Entropy of the data set (D) = -P1*log2(P1) - P2*log2(P2) - ...

Where P1, P2, ... are the probabilities of each possible outcome.
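For readers who prefer code to formulas, here is a minimal Python sketch of the same formula. Python is not part of the original article, and the function name `entropy` is just an illustrative choice.

```python
import math


def entropy(probabilities):
    """Entropy in bits: the sum of -p * log2(p) over all outcomes.

    Outcomes with probability 0 are skipped, since they contribute nothing.
    """
    return sum(-p * math.log2(p) for p in probabilities if p > 0)


print(entropy([0.5, 0.5]))  # two equally likely outcomes -> 1.0
print(entropy([1.0, 0.0]))  # one certain outcome -> 0.0
```

The two calls show the extremes for a yes/no label: a 50/50 split gives the maximum entropy of 1, and a certain outcome gives 0.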

Looks a little complicated so let’s use it with our bank.

We want to use our training data to predict if an applicant should be approved or rejected for the loan.  The entropy of the data set would therefore be the entropy of the ‘default’ column, as this is the label we want to predict.  If the person is likely to default, then we will not lend the money.

We have 18 observations.  10 default on the loans (yes) and 8 do not default (no).  Based on this we can calculate some probabilities.
<h4>Entropy and Information gain Calculations</h4>
Probability of not defaulting = 8/18 = 0.444

Probability of default = 10/18 =0.556

Now that we have our probabilities, we can plug these into our entropy formula:

Entropy (D) = -0.444*log2(0.444) - 0.556*log2(0.556) = 0.991
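The same arithmetic can be checked with a few lines of Python, working straight from the counts (8 who did not default, 10 who did). This is only a sketch for verification; the helper name `entropy_from_counts` is an assumption of this example, not something from the article or from Excel.

```python
import math


def entropy_from_counts(counts):
    """Entropy in bits, given how many rows fall under each label."""
    total = sum(counts)
    return sum(-(c / total) * math.log2(c / total) for c in counts if c > 0)


# 8 applicants did not default, 10 did
dataset_entropy = entropy_from_counts([8, 10])
print(round(dataset_entropy, 3))  # 0.991
```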

0.991 is rather close to 1.  This suggests there is a lot of randomness in the data. With this much randomness, it would be very hard to decide if a loan application should be approved or rejected.  The data set holds a lot of information about each applicant, such as whether they own a home and their credit score. If we narrow our focus to just one piece of information, can we reduce this randomness?

Let’s calculate the entropy for the ‘home owners’ column.

We have 18 observations, 8 of which are home owners and 10 of which are not.

Looking at just the home owners, 8 in total, there are 4 who default and 4 who do not.

Therefore, the probability of a home owner defaulting is 4/8 = 0.5 and the probability of a home owner not defaulting is also 4/8 = 0.5

Entropy Home Owner = -0.5*log2(0.5) - 0.5*log2(0.5) = 1

Now look at those who do not own homes.  There are 10 in total.

The probability of a non home owner defaulting is 6/10 = 0.60 and the probability of a non home owner not defaulting is 4/10 = 0.40

Entropy Non Home Owner = -0.60*log2(0.60) - 0.40*log2(0.40) = 0.971

As there were two possible answers to the home owner question, we now have two entropy values. To get the entropy for the whole column, we must combine these entropies proportionally, weighting each one by the share of applicants in that group.

Probability of being a home owner * entropy of being a home owner = 8/18 * 1 = 0.444

Probability of not being a home owner * entropy of not being a home owner = 10/18 * 0.971 = 0.539

Entropy = 0.444 + 0.539 = 0.984 (calculated from the unrounded values)

We can see now that the entropy has been reduced from 0.991 to 0.984.  Although this is a small reduction, we can see that by narrowing our focus to one piece of information we have reduced the randomness. By reducing the randomness, we increase the chances of not approving someone who will default.

There is a value attached to this reduction in randomness and this is known as information gain.  Information gain can be calculated by deducting the entropy for the restricted information from the entropy for the data set.

Information gain = 0.991 – 0.984 = 0.007
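The home owner calculation above can be reproduced with a short Python sketch. Python is not used in the original article; the function names `split_entropy` and `information_gain` are illustrative choices, and the counts are the ones quoted in the text.

```python
import math


def entropy_from_counts(counts):
    """Entropy in bits of a group, given the number of rows per label."""
    total = sum(counts)
    return sum(-(c / total) * math.log2(c / total) for c in counts if c > 0)


def split_entropy(groups):
    """Weighted average entropy of the groups produced by one question."""
    total = sum(sum(g) for g in groups)
    return sum(sum(g) / total * entropy_from_counts(g) for g in groups)


def information_gain(parent_counts, groups):
    """Entropy of the whole data set minus the entropy after the split."""
    return entropy_from_counts(parent_counts) - split_entropy(groups)


# Each pair is [defaulted, did not default]
home_owners = [4, 4]
non_home_owners = [6, 4]
groups = [home_owners, non_home_owners]

print(round(split_entropy(groups), 3))              # 0.984
print(round(information_gain([10, 8], groups), 3))  # 0.007
```

Running the same function on the income and credit score columns would give an information gain for each question, which is one way the questions could be compared.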

This information gain calculation comes from Information theory.
<h3>How to Calculate Entropy and Information Gain in Excel (Explain Machine Learning with Excel)</h3>
Now that you have been introduced to some of the probability calculations that can be used in algorithms, let's take a look at how these can be calculated in Excel.  Remember, this sample data is extremely simple, so we can easily use Excel for this demonstration.

Using pivot tables in Excel is a very quick way of calculating probabilities.  All of the above seems complicated, but when you watch this video you will see how quickly you can calculate entropy and information gain.

https://youtu.be/71cL-bap0WE
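A pivot table is essentially a count of rows for each combination of values. As a rough Python equivalent (not the Excel method shown in the video), the sketch below counts rows with `collections.Counter`. The rows are rebuilt from the counts quoted in this article, not copied from the actual training table in the image.

```python
from collections import Counter

# (home owner, default) pairs, rebuilt from the counts quoted in the text
rows = (
    [("yes", "yes")] * 4 + [("yes", "no")] * 4
    + [("no", "yes")] * 6 + [("no", "no")] * 4
)

pivot = Counter(rows)                     # rows per (home owner, default) pair
owners = Counter(home for home, _ in rows)  # row totals per home owner value

for (home, default), n in sorted(pivot.items()):
    share = n / owners[home]
    print(f"home owner={home}, default={default}: {n} of {owners[home]} = {share:.2f}")
```

The shares printed here are the same probabilities (0.5, 0.5, 0.6 and 0.4) that feed into the entropy formula.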
<h2>To be Continued...</h2>
In part 2 we will look at iterations and put all of this together to show how machine learning works.

<a href="http://theexcelclub.com/simple-explanation-of-machine-learning-shown-with-excel-part-2/">You can access part 2 here</a>

<strong>Do you have any questions or comments on the above?  If so please do use the comments section below, I would love to hear from you ( and I would love to reward your interaction with cryptocurrency)</strong>

<center><hr/><em>Cross posted from my blog with <a href='https://wordpress.org/plugins/steempress/'>SteemPress</a> : https://theexcelclub.com/simple-explanation-of-machine-learning-shown-with-excel-part-1/ </em><hr/></center>
Posted by @theexcelclub on 2019-01-08 (last updated 2020-09-14).
@coop78 ·
Entropy and information gain...new concepts for me.  Looking forward to Part 2.
@eastmael ·
Just visited the excel club site from your newsletter to see how you integrate steem to your blog. Sleek integration via SteemPress.

Machine Learning has been one of the areas I haven't explored yet. Seeing this blog already informs me that it can be applied in Excel. Now I have an idea. Hope to be able to participate in one of your exercises for excel functions new to me that I can apply at work.
@theexcelclub ·
Hi @eastmael, the integration is way cool and I am hoping it will add a lot of value to this blog.

Machine learning is best not applied with Excel.  It's a very interesting subject that can be explained with Excel, but when you get to larger data sets that require a number of iterations on calculations, workbooks can easily crash.

I'm sure over the course of the next few weeks you will find some interesting tutorials that you will benefit from.  Thanks for stopping by.
@gadrian ·
That's a great site you've got here Paula!

First time I see the Steempress comments in action too. Well done for integrating it, I love it!
@theexcelclub ·
thank you @gadrian
@newageinv ·
This is great @paulag!  The insights that can potentially be driven from this over large sets of data are interesting.  I am looking through the site more and more and seeing when I can gather time to sit and acquire some.  Great content!

@paulag ·
thank you so much @newageinv, glad you like the content
@sparkesy43 ·
You were placed either directly above or below me in @paulag's Redfish Power Up League this week, so I've come across to put a little vote on your latest post.
@paulag ·
Hi @sparkesy43, well I never, would you believe this is my brand account.  Thank you so much for stopping by to show support.
@sparkesy43 ·
I did spot that, but if I am going to do this each week, then I need to do it regardless of who is above or below.  The only thing that can halt me would be if someone has no posts or comments less than 7 days old.
