How did I learn Machine Learning : part 3 - Implement a simple neural network from scratch II
Hi,
<div class="text-justify">
In my previous <a href="https://steemit.com/technology/@boostyslf/how-did-i-learn-machine-learning-part-3-implement-a-simple-neural-network-from-scratch-i">post</a>, I discussed how a simple neural network works and the notation we are going to use in this series of blogs. Today I am going to show you how to implement a simple neural network from scratch. I expect you to have a basic understanding of calculus, as we are going to work with some mathematical equations. If you don't have that background, please keep reading and trying out the examples anyway; by the end of this series you will have a solid understanding of the fundamentals, and I am trying to explain everything as simply as possible. Also, I am not going to use any programming frameworks (such as TensorFlow, Keras, etc.), because I want to show you how to implement the machine learning machinery without them, so that you can deeply understand how these things work. If you do not understand any notation, derivation or expression, please refer to the previous blog or leave a comment. Let's start ...

In this neural network, we are trying to recognize cats vs. non-cats. For that, we are given a training dataset of 209 images and a test dataset of 50 images. Each image is 64\*64 pixels, and since the images are colored there are three channels. We feed the images to the neural network one by one. There are several ways of feeding an image to a neural network; in this example, each image is flattened into a single vector, so the number of neurons in the input layer is <b>12288</b> (64 pixels \* 64 pixels \* 3 channels). <div class="pull-left">https://cdn.steemitimages.com/DQmX6wFyxKLHrLXHxQtGxu2BeFgi5kjAU2f18cfqbbMF7SB/nn.png</div>
This network has no hidden layers, so the output layer is the first and only layer. In the output layer we expect a value between 0 and 1; if that value is greater than 0.5, the image is classified as a cat. Therefore we have only one neuron in the output layer. Please refer to the attached image to get an idea of the network architecture.
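Before the equations, it helps to see this preprocessing in code. Below is a minimal sketch in plain NumPy (no ML framework) of how the images can be flattened into columns; the array names and the random placeholder data are my assumptions, standing in for the real dataset.

```python
import numpy as np

# Hypothetical placeholder arrays standing in for the real cat dataset.
train_x_orig = np.random.randint(0, 256, size=(209, 64, 64, 3))
test_x_orig = np.random.randint(0, 256, size=(50, 64, 64, 3))

# Flatten every 64 x 64 x 3 image into a single column of 12288 values,
# giving one column per image.
train_x_flat = train_x_orig.reshape(train_x_orig.shape[0], -1).T  # (12288, 209)
test_x_flat = test_x_orig.reshape(test_x_orig.shape[0], -1).T     # (12288, 50)

# Scale pixel values from [0, 255] to [0, 1] to keep the sigmoid well behaved.
train_x = train_x_flat / 255.0
test_x = test_x_flat / 255.0
```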

When we implement the neural network, we need to write equations for the forward propagation, the cost, the backward propagation, and the weight and bias updates. The forward propagation equations are given below.

eqn (1) : <code><b>z<sub>1</sub><sup>(i)</sup> = w<sub>1</sub>x<sub>1</sub><sup>(i)</sup> + w<sub>2</sub>x<sub>2</sub><sup>(i)</sup> + ... + w<sub>12288</sub>x<sub>12288</sub><sup>(i)</sup> + b</b></code>
eqn (2) : <code><b>ŷ<sup>(i)</sup> = a<sub>1</sub><sup>(i)</sup> = g(z<sub>1</sub><sup>(i)</sup>)</b></code>
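In code, eqn (1) for all m examples collapses into a single matrix product. Here is a minimal NumPy sketch, assuming the flattened images are stacked as the columns of a matrix <code>X</code>:

```python
import numpy as np

def sigmoid(z):
    """The activation g(z) = 1 / (1 + e^(-z)) used in eqn (2)."""
    return 1.0 / (1.0 + np.exp(-z))

def forward(w, b, X):
    """Vectorized forward pass: eqn (1) and eqn (2) for all m examples at once.

    w : (12288, 1) column vector of weights w_1 ... w_12288
    b : scalar bias
    X : (12288, m) matrix whose i-th column is the flattened image x^(i)
    Returns A, a (1, m) row vector of activations a_1^(i).
    """
    Z = np.dot(w.T, X) + b  # eqn (1) for every example i at once
    A = sigmoid(Z)          # eqn (2)
    return A
```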

Now we should calculate the cost per example (L) and the cost for the whole dataset (J).

eqn (3) : <code><b>L<sup>(i)</sup> = L(a<sub>1</sub><sup>(i)</sup>, y<sup>(i)</sup>) = -y<sup>(i)</sup>log(a<sub>1</sub><sup>(i)</sup>) - (1 - y<sup>(i)</sup>)log(1 - a<sub>1</sub><sup>(i)</sup>)</b></code>
eqn (4) : <code><b>J = (Σ<sub>i=1</sub><sup>m</sup>L<sup>(i)</sup>)/m</b></code>
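A matching sketch of eqns (3) and (4), assuming the activations and labels are kept as (1, m) arrays as in the previous snippet:

```python
import numpy as np

def compute_cost(A, Y):
    """Cross-entropy cost: eqn (3) per example, averaged over m as in eqn (4).

    A : (1, m) activations a_1^(i)
    Y : (1, m) labels y^(i), each 0 (non-cat) or 1 (cat)
    """
    m = Y.shape[1]
    losses = -Y * np.log(A) - (1 - Y) * np.log(1 - A)  # eqn (3) for each i
    return np.sum(losses) / m                          # eqn (4)
```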

Now we should calculate the derivative of the cost with respect to each weight and the bias (∂J/∂w<sub>1</sub>, ∂J/∂w<sub>2</sub>, ..., ∂J/∂b) so that the cost is reduced in each iteration. Let's calculate ∂J/∂w<sub>1</sub>, which I will write as <b>∂w<sub>1</sub></b> for short. By the <a href="https://en.wikipedia.org/wiki/Chain_rule">chain rule</a>, we can calculate ∂w<sub>1</sub> as follows.

eqn (5.1) : <code>∂w<sub>1</sub> = ∂J/∂w<sub>1</sub> = (Σ<sub>i=1</sub><sup>m</sup><b>∂L<sup>(i)</sup>/∂w<sub>1</sub></b>)/m</code>
eqn (5.2) : <code><b>∂L<sup>(i)</sup>/∂w<sub>1</sub></b> = ∂L<sup>(i)</sup>/∂a<sub>1</sub><sup>(i)</sup>\*<b>∂a<sub>1</sub><sup>(i)</sup>/∂w<sub>1</sub> </b></code>
eqn (5.3) : <code><b>∂a<sub>1</sub><sup>(i)</sup>/∂w<sub>1</sub></b> = ∂a<sub>1</sub><sup>(i)</sup>/∂z<sub>1</sub><sup>(i)</sup>\*<b>∂z<sub>1</sub><sup>(i)</sup>/∂w<sub>1</sub></b></code>

Using eqn (5.2) and eqn (5.3), 
eqn (5.4) : <code>∂L<sup>(i)</sup>/∂w<sub>1</sub> = ∂L<sup>(i)</sup>/∂a<sub>1</sub><sup>(i)</sup>\*∂a<sub>1</sub><sup>(i)</sup>/∂z<sub>1</sub><sup>(i)</sup>\*∂z<sub>1</sub><sup>(i)</sup>/∂w<sub>1</sub></code>

By differentiating eqn (3) w.r.t. a<sub>1</sub><sup>(i)</sup>,
eqn (5.5) : <code>∂L<sup>(i)</sup>/∂a<sub>1</sub><sup>(i)</sup> = - y<sup>(i)</sup>/a<sub>1</sub><sup>(i)</sup> + (1-y<sup>(i)</sup>)/(1-a<sub>1</sub><sup>(i)</sup>)</code>

Let's assume the activation function <b>g()</b> is the sigmoid function. Then ∂a<sub>1</sub><sup>(i)</sup>/∂z<sub>1</sub><sup>(i)</sup> equals the <a href="https://towardsdatascience.com/derivative-of-the-sigmoid-function-536880cf918e">derivative of the sigmoid function</a>.
eqn (5.6) : <code>∂a<sub>1</sub><sup>(i)</sup>/∂z<sub>1</sub><sup>(i)</sup> = a<sub>1</sub><sup>(i)</sup>(1 - a<sub>1</sub><sup>(i)</sup>)</code>
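If you want to convince yourself of eqn (5.6) without doing the calculus, a quick finite-difference check works. This is just a sanity-check sketch; the test point z = 0.7 is arbitrary:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Sanity-check eqn (5.6): the finite-difference slope of the sigmoid
# should match a * (1 - a) at the same point.
z, eps = 0.7, 1e-6
a = sigmoid(z)
numeric = (sigmoid(z + eps) - sigmoid(z - eps)) / (2 * eps)
analytic = a * (1 - a)
print(numeric, analytic)  # the two values agree to high precision
```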

By differentiating eqn (1) w.r.t. w<sub>1</sub>,
eqn (5.7) : <code>∂z<sub>1</sub><sup>(i)</sup>/∂w<sub>1</sub> = x<sub>1</sub><sup>(i)</sup></code>

By substituting eqn (5.5), eqn (5.6) and eqn (5.7) into eqn (5.4) and simplifying the expression, 
eqn (5.8) : <code>∂L<sup>(i)</sup>/∂w<sub>1</sub> = (a<sub>1</sub><sup>(i)</sup> - y<sup>(i)</sup>)x<sub>1</sub><sup>(i)</sup></code>

By substituting eqn (5.8) into eqn (5.1),  
eqn (6) : <code><b>∂w<sub>1</sub> = (Σ<sub>i=1</sub><sup>m</sup>(a<sub>1</sub><sup>(i)</sup> - y<sup>(i)</sup>)x<sub>1</sub><sup>(i)</sup>)/m</b></code>

Following eqn (6), we can write the derivatives for all the weights.

<code>∂w<sub>2</sub> = (Σ<sub>i=1</sub><sup>m</sup>(a<sub>1</sub><sup>(i)</sup> - y<sup>(i)</sup>)x<sub>2</sub><sup>(i)</sup>)/m</code> ...
<code>∂w<sub>12288</sub> = (Σ<sub>i=1</sub><sup>m</sup>(a<sub>1</sub><sup>(i)</sup> - y<sup>(i)</sup>)x<sub>12288</sub><sup>(i)</sup>)/m</code>


Since ∂z<sub>1</sub><sup>(i)</sup>/∂b = 1, 
eqn (7) : <code><b>∂b = ∂J/∂b = (Σ<sub>i=1</sub><sup>m</sup>(a<sub>1</sub><sup>(i)</sup> - y<sup>(i)</sup>))/m</b></code>
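In code, the 12288 per-weight sums of eqn (6) and the bias sum of eqn (7) reduce to one matrix product and one plain sum. A minimal sketch, assuming the same array layout as the earlier snippets:

```python
import numpy as np

def backward(A, X, Y):
    """Vectorized gradients: eqn (6) for every weight and eqn (7) for the bias.

    A : (1, m) activations, X : (12288, m) inputs, Y : (1, m) labels
    Returns dw of shape (12288, 1) and the scalar db.
    """
    m = X.shape[1]
    dZ = A - Y                # (a_1^(i) - y^(i)) for every example i
    dw = np.dot(X, dZ.T) / m  # row j holds eqn (6) for weight w_j
    db = np.sum(dZ) / m       # eqn (7)
    return dw, db
```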

Now we have to update the weights and bias as shown below, where α is the learning rate.

eqn (8) : <code><b>w<sub>1</sub> = w<sub>1</sub> - α\*∂w<sub>1</sub></b></code> ... <code><b>w<sub>12288</sub> = w<sub>12288</sub> - α\*∂w<sub>12288</sub></b></code>
eqn (9) : <code><b>b = b - α\*∂b</b></code>
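The update step is then one line per parameter; a minimal sketch:

```python
def update(w, b, dw, db, alpha):
    """One gradient-descent step: eqn (8) for every weight, eqn (9) for the bias."""
    w = w - alpha * dw  # updates all 12288 weights at once
    b = b - alpha * db
    return w, b
```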

Finally, we have derived all the equations we need to implement our neural network. In the next blog post, I will explain how to implement the neural network using these equations. Please feel free to raise any concerns or suggestions about this blog post. Let's meet in the next post.
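As a small preview of the next post, the snippets above already chain into a complete gradient-descent loop. This sketch reuses the <code>forward</code>, <code>compute_cost</code>, <code>backward</code> and <code>update</code> helpers defined earlier, with random placeholder data; the learning rate and iteration count are arbitrary choices, not tuned values.

```python
import numpy as np

np.random.seed(0)
X = np.random.rand(12288, 209)                    # placeholder for flattened, scaled images
Y = (np.random.rand(1, 209) > 0.5).astype(float)  # placeholder 0/1 labels

w = np.zeros((12288, 1))  # weights initialized to zero
b = 0.0                   # bias initialized to zero
alpha = 0.005             # learning rate

for it in range(1000):
    A = forward(w, b, X)                # eqns (1)-(2)
    cost = compute_cost(A, Y)           # eqns (3)-(4)
    dw, db = backward(A, X, Y)          # eqns (6)-(7)
    w, b = update(w, b, dw, db, alpha)  # eqns (8)-(9)
    if it % 100 == 0:
        print(f"iteration {it}: cost = {cost:.4f}")
```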
</div>