How did I learn Machine Learning : part 3 - Implement a simple neural network from scratch II
Hi,
<div class="text-justify">
In my previous <a href="https://steemit.com/technology/@boostyslf/how-did-i-learn-machine-learning-part-3-implement-a-simple-neural-network-from-scratch-i">post</a>, I discussed how a simple neural network works and the notation we are going to use in this series of blogs. Today I am going to show you how to implement a simple neural network from scratch. I expect you to have a basic understanding of calculus, as we are going to work with some mathematical equations. If you don't have that background, please keep reading and trying out the examples anyway; by the end of this series you will have a solid understanding of the fundamentals, and I am trying to explain everything as simply as possible. Also, I am not going to use any programming frameworks (such as TensorFlow, Keras, etc.), because I want to show you how to implement the machine learning machinery without them, so that you can deeply understand how these things work. If you do not understand any notation, derivation or expression, please refer to the previous blog or leave a comment. Let's start ...

In this neural network, we are trying to recognize cats vs. non-cats. For that, we are given a training dataset of 209 images and a test dataset of 50 images. Each image is 64\*64 pixels, and since the images are colored there are three channels. We feed the images to the neural network one by one. There are several ways of feeding an image to a neural network; in this example, each image is flattened into a single vector, so the number of neurons in the input layer is <b>12288</b> (64 pixels \* 64 pixels \* 3 channels). <div class="pull-left">https://cdn.steemitimages.com/DQmX6wFyxKLHrLXHxQtGxu2BeFgi5kjAU2f18cfqbbMF7SB/nn.png</div>
This network has no hidden layers, so the output layer is the first and only layer. In the output layer we expect a value between 0 and 1; if that value is greater than 0.5, the image is classified as a cat. Therefore we have only one neuron in the output layer. Please refer to the attached image to get an idea of the network architecture.
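Before the equations, it helps to see this preprocessing in code. Below is a minimal sketch in plain NumPy (no ML framework) of how the images can be flattened into columns; the array names and the random placeholder data are my assumptions, standing in for the real dataset.

```python
import numpy as np

# Hypothetical placeholder arrays standing in for the real cat dataset.
train_x_orig = np.random.randint(0, 256, size=(209, 64, 64, 3))
test_x_orig = np.random.randint(0, 256, size=(50, 64, 64, 3))

# Flatten every 64 x 64 x 3 image into a single column of 12288 values,
# giving one column per image.
train_x_flat = train_x_orig.reshape(train_x_orig.shape[0], -1).T  # (12288, 209)
test_x_flat = test_x_orig.reshape(test_x_orig.shape[0], -1).T     # (12288, 50)

# Scale pixel values from [0, 255] to [0, 1] to keep the sigmoid well behaved.
train_x = train_x_flat / 255.0
test_x = test_x_flat / 255.0
```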

When we implement the neural network, we need to write equations for the forward propagation, the cost, the backward propagation, and the weight and bias updates. The forward propagation equations are given below.

eqn (1) : <code><b>z<sub>1</sub><sup>(i)</sup> = w<sub>1</sub>x<sub>1</sub><sup>(i)</sup> + w<sub>2</sub>x<sub>2</sub><sup>(i)</sup> + ... + w<sub>12288</sub>x<sub>12288</sub><sup>(i)</sup> + b</b></code>
eqn (2) : <code><b>ŷ<sup>(i)</sup> = a<sub>1</sub><sup>(i)</sup> = g(z<sub>1</sub><sup>(i)</sup>)</b></code>
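In code, eqn (1) for all m examples collapses into a single matrix product. Here is a minimal NumPy sketch, assuming the flattened images are stacked as the columns of a matrix <code>X</code>:

```python
import numpy as np

def sigmoid(z):
    """The activation g(z) = 1 / (1 + e^(-z)) used in eqn (2)."""
    return 1.0 / (1.0 + np.exp(-z))

def forward(w, b, X):
    """Vectorized forward pass: eqn (1) and eqn (2) for all m examples at once.

    w : (12288, 1) column vector of weights w_1 ... w_12288
    b : scalar bias
    X : (12288, m) matrix whose i-th column is the flattened image x^(i)
    Returns A, a (1, m) row vector of activations a_1^(i).
    """
    Z = np.dot(w.T, X) + b  # eqn (1) for every example i at once
    A = sigmoid(Z)          # eqn (2)
    return A
```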

Now we should calculate the cost per example (L) and the cost for the whole dataset (J).

eqn (3) : <code><b>L<sup>(i)</sup> = L(a<sub>1</sub><sup>(i)</sup>, y<sup>(i)</sup>) = -y<sup>(i)</sup>log(a<sub>1</sub><sup>(i)</sup>) - (1 - y<sup>(i)</sup>)log(1 - a<sub>1</sub><sup>(i)</sup>)</b></code>
eqn (4) : <code><b>J = (Σ<sub>i=1</sub><sup>m</sup>L<sup>(i)</sup>)/m</b></code>
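A matching sketch of eqns (3) and (4), assuming the activations and labels are kept as (1, m) arrays as in the previous snippet:

```python
import numpy as np

def compute_cost(A, Y):
    """Cross-entropy cost: eqn (3) per example, averaged over m as in eqn (4).

    A : (1, m) activations a_1^(i)
    Y : (1, m) labels y^(i), each 0 (non-cat) or 1 (cat)
    """
    m = Y.shape[1]
    losses = -Y * np.log(A) - (1 - Y) * np.log(1 - A)  # eqn (3) for each i
    return np.sum(losses) / m                          # eqn (4)
```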

Now we should calculate the derivative of the cost with respect to each weight and the bias (∂J/∂w<sub>1</sub>, ∂J/∂w<sub>2</sub>, ..., ∂J/∂b) so that the cost is reduced in each iteration. Let's calculate ∂J/∂w<sub>1</sub>, which I will write as <b>∂w<sub>1</sub></b> for short. By the <a href="https://en.wikipedia.org/wiki/Chain_rule">chain rule</a>, we can calculate ∂w<sub>1</sub> as follows.

eqn (5.1) : <code>∂w<sub>1</sub> = ∂J/∂w<sub>1</sub> = (Σ<sub>i=1</sub><sup>m</sup><b>∂L<sup>(i)</sup>/∂w<sub>1</sub></b>)/m</code>
eqn (5.2) : <code><b>∂L<sup>(i)</sup>/∂w<sub>1</sub></b> = ∂L<sup>(i)</sup>/∂a<sub>1</sub><sup>(i)</sup>\*<b>∂a<sub>1</sub><sup>(i)</sup>/∂w<sub>1</sub> </b></code>
eqn (5.3) : <code><b>∂a<sub>1</sub><sup>(i)</sup>/∂w<sub>1</sub></b> = ∂a<sub>1</sub><sup>(i)</sup>/∂z<sub>1</sub><sup>(i)</sup>\*<b>∂z<sub>1</sub><sup>(i)</sup>/∂w<sub>1</sub></b></code>

Using eqn (5.2) and eqn (5.3), 
eqn (5.4) : <code>∂L<sup>(i)</sup>/∂w<sub>1</sub> = ∂L<sup>(i)</sup>/∂a<sub>1</sub><sup>(i)</sup>\*∂a<sub>1</sub><sup>(i)</sup>/∂z<sub>1</sub><sup>(i)</sup>\*∂z<sub>1</sub><sup>(i)</sup>/∂w<sub>1</sub></code>

By differentiating eqn (3) w.r.t. a<sub>1</sub><sup>(i)</sup>,
eqn (5.5) : <code>∂L<sup>(i)</sup>/∂a<sub>1</sub><sup>(i)</sup> = - y<sup>(i)</sup>/a<sub>1</sub><sup>(i)</sup> + (1-y<sup>(i)</sup>)/(1-a<sub>1</sub><sup>(i)</sup>)</code>

Let's assume the activation function <b>g()</b> is the sigmoid function. Then ∂a<sub>1</sub><sup>(i)</sup>/∂z<sub>1</sub><sup>(i)</sup> equals the <a href="https://towardsdatascience.com/derivative-of-the-sigmoid-function-536880cf918e">derivative of the sigmoid function</a>.
eqn (5.6) : <code>∂a<sub>1</sub><sup>(i)</sup>/∂z<sub>1</sub><sup>(i)</sup> = a<sub>1</sub><sup>(i)</sup>(1 - a<sub>1</sub><sup>(i)</sup>)</code>
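If you want to convince yourself of eqn (5.6) without doing the calculus, a quick finite-difference check works. This is just a sanity-check sketch; the test point z = 0.7 is arbitrary:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Sanity-check eqn (5.6): the finite-difference slope of the sigmoid
# should match a * (1 - a) at the same point.
z, eps = 0.7, 1e-6
a = sigmoid(z)
numeric = (sigmoid(z + eps) - sigmoid(z - eps)) / (2 * eps)
analytic = a * (1 - a)
print(numeric, analytic)  # the two values agree to high precision
```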

By differentiating eqn (1) w.r.t. w<sub>1</sub>,
eqn (5.7) : <code>∂z<sub>1</sub><sup>(i)</sup>/∂w<sub>1</sub> = x<sub>1</sub><sup>(i)</sup></code>

By substituting eqn (5.5), eqn (5.6) and eqn (5.7) into eqn (5.4) and simplifying the expression, 
eqn (5.8) : <code>∂L<sup>(i)</sup>/∂w<sub>1</sub> = (a<sub>1</sub><sup>(i)</sup> - y<sup>(i)</sup>)x<sub>1</sub><sup>(i)</sup></code>

By substituting eqn (5.8) into eqn (5.1),  
eqn (6) : <code><b>∂w<sub>1</sub> = (Σ<sub>i=1</sub><sup>m</sup>(a<sub>1</sub><sup>(i)</sup> - y<sup>(i)</sup>)x<sub>1</sub><sup>(i)</sup>)/m</b></code>

Following eqn (6), we can write the derivatives for all the weights.

<code>∂w<sub>2</sub> = (Σ<sub>i=1</sub><sup>m</sup>(a<sub>1</sub><sup>(i)</sup> - y<sup>(i)</sup>)x<sub>2</sub><sup>(i)</sup>)/m</code> ...
<code>∂w<sub>12288</sub> = (Σ<sub>i=1</sub><sup>m</sup>(a<sub>1</sub><sup>(i)</sup> - y<sup>(i)</sup>)x<sub>12288</sub><sup>(i)</sup>)/m</code>


Since ∂z<sub>1</sub><sup>(i)</sup>/∂b = 1, 
eqn (7) : <code><b>∂b = ∂J/∂b = (Σ<sub>i=1</sub><sup>m</sup>(a<sub>1</sub><sup>(i)</sup> - y<sup>(i)</sup>))/m</b></code>
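In code, the 12288 per-weight sums of eqn (6) and the bias sum of eqn (7) reduce to one matrix product and one plain sum. A minimal sketch, assuming the same array layout as the earlier snippets:

```python
import numpy as np

def backward(A, X, Y):
    """Vectorized gradients: eqn (6) for every weight and eqn (7) for the bias.

    A : (1, m) activations, X : (12288, m) inputs, Y : (1, m) labels
    Returns dw of shape (12288, 1) and the scalar db.
    """
    m = X.shape[1]
    dZ = A - Y                # (a_1^(i) - y^(i)) for every example i
    dw = np.dot(X, dZ.T) / m  # row j holds eqn (6) for weight w_j
    db = np.sum(dZ) / m       # eqn (7)
    return dw, db
```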

Now we have to update the weights and bias as shown below, where α is the learning rate.

eqn (8) : <code><b>w<sub>1</sub> = w<sub>1</sub> - α\*∂w<sub>1</sub></b></code> ... <code><b>w<sub>12288</sub> = w<sub>12288</sub> - α\*∂w<sub>12288</sub></b></code>
eqn (9) : <code><b>b = b - α\*∂b</b></code>
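The update step is then one line per parameter; a minimal sketch:

```python
def update(w, b, dw, db, alpha):
    """One gradient-descent step: eqn (8) for every weight, eqn (9) for the bias."""
    w = w - alpha * dw  # updates all 12288 weights at once
    b = b - alpha * db
    return w, b
```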

Finally, we have derived all the equations we need to implement our neural network. In the next blog post, I will explain how to implement the neural network using these equations. Please feel free to raise any concerns or suggestions about this blog post. Let's meet in the next post.
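As a small preview of the next post, the snippets above already chain into a complete gradient-descent loop. This sketch reuses the <code>forward</code>, <code>compute_cost</code>, <code>backward</code> and <code>update</code> helpers defined earlier, with random placeholder data; the learning rate and iteration count are arbitrary choices, not tuned values.

```python
import numpy as np

np.random.seed(0)
X = np.random.rand(12288, 209)                    # placeholder for flattened, scaled images
Y = (np.random.rand(1, 209) > 0.5).astype(float)  # placeholder 0/1 labels

w = np.zeros((12288, 1))  # weights initialized to zero
b = 0.0                   # bias initialized to zero
alpha = 0.005             # learning rate

for it in range(1000):
    A = forward(w, b, X)                # eqns (1)-(2)
    cost = compute_cost(A, Y)           # eqns (3)-(4)
    dw, db = backward(A, X, Y)          # eqns (6)-(7)
    w, b = update(w, b, dw, db, alpha)  # eqns (8)-(9)
    if it % 100 == 0:
        print(f"iteration {it}: cost = {cost:.4f}")
```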
</div>