
Predicting Hive - An Intro to Regression - 2 by medro-martin

Alright!
So, we saw how to perform **Linear Regression** and **Polynomial Regression** (using a quadratic polynomial) in the first part of this two-part series!

>**NOTE:** If you haven't seen the previous post, you can read it here: [Predicting Hive - An Intro to Regression - 1](https://peakd.com/@medro-martin/...).

> **NOTE:** Obtaining a better prediction only means that we have higher chances of being closer to the actual value when the event happens in reality.

## 1) Polynomial Regression (order 2) - Revisited
In order to use a quadratic equation, the matrix equation that needs to be solved is as follows:

![](https://files.peakd.com/file/peakd-hive/medro-martin/N0YirkmC-image.png)
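In case the image doesn't render, the same system (the standard least-squares normal equations for a quadratic `y = a0 + a1*x + a2*x^2`, where `N` is the number of data points) written out in LaTeX is:

```
\begin{bmatrix}
N & \sum x_i & \sum x_i^2 \\
\sum x_i & \sum x_i^2 & \sum x_i^3 \\
\sum x_i^2 & \sum x_i^3 & \sum x_i^4
\end{bmatrix}
\begin{bmatrix} a_0 \\ a_1 \\ a_2 \end{bmatrix}
=
\begin{bmatrix} \sum y_i \\ \sum x_i y_i \\ \sum x_i^2 y_i \end{bmatrix}
```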

The code that we wrote for this kind of regression was:
```
import os
import numpy as np
import matplotlib.pyplot as plt

# Function to sum the elements of an array
# (note: this shadows Python's built-in sum)
def sum(a):
    total = 0
    for i in range(len(a)):
        total += a[i]
    return total

fileData = open("<location>/data.csv", "r")
line = fileData.readlines()
fileData.close()

print(*line)

x = np.zeros(len(line))
y = np.zeros(len(line))

for i in range(len(line)):
    x[i], y[i]= line[i].split(",")

sumX = sum(x)
sumX2 = sum(pow(x,2))
sumX3 = sum(pow(x,3))
sumX4 = sum(pow(x,4))
sumY = sum(y)
sumXY = sum(x*y)
sumX2Y = sum(pow(x,2)*y)

print(x,y)
print("sumX, sumX2, sumX3, sumX4, sumY, sumXY, sumX2Y", sumX, sumX2, sumX3, sumX4, sumY, sumXY, sumX2Y)

n = 3

data_m = np.zeros((n,n+1))

#Explicitly Defining the Augmented Matrix
data_m[0,0] = len(x) # number of data points, i.e. the sum of x^0
data_m[0,1] = sumX
data_m[0,2] = sumX2
data_m[0,3] = sumY
data_m[1,0] = sumX
data_m[1,1] = sumX2
data_m[1,2] = sumX3
data_m[1,3] = sumXY
data_m[2,0] = sumX2
data_m[2,1] = sumX3
data_m[2,2] = sumX4
data_m[2,3] = sumX2Y

print("Initial augmented matrix = \n", data_m)

# Elimination
for j in range(1,n):
    #print("LOOP-j:", j)
    for k in range(j,n):
        #print("     LOOP-k:", k)
        factor = data_m[k,j-1] / data_m[j-1,j-1]
        for i in range(n+1):
            #print("         LOOP-i:", i, "| ", data_m[k,i])
            data_m[k,i] = data_m[k,i] - factor*data_m[j-1,i]
            #print("-->",data_m[k,i])

print("Matrix after elimination = \n", data_m)

# Back Substitution
solution = np.zeros(n)

for j in range(n-1, -1, -1):
    subtractant = 0
    for i in range(n-1,-1,-1):
        subtractant = subtractant + solution[i] * data_m[j,i] 
    solution[j] = (data_m[j,n] - subtractant)/data_m[j,j]
print("Solution matrix:\n", solution)

y2 = solution[0] + solution[1]*x + solution[2]*pow(x,2)

ax = plt.subplot()
ax.plot(28*x,y, "ob", linestyle="solid")
ax.plot(28*x, y2, "ob", linestyle="solid", color="g")
ax.plot(350,(solution[0] + solution[1]*12.5 + solution[2]*pow(12.5,2)), 'ro')
plt.grid(True)
plt.title("Hive Price Chart")
ax.set_xlabel("Time (days)")
ax.set_ylabel("Price in USD")

plt.show()
```
...and the output we got looked something like this:

|**Fit**|**Comments**|
|---|---|
|![](https://images.hive.blog/p/MG5aEqKFcQi6ksuzVh6JJptBJCL6eFwx2gvRnpcRhTRKgmTKc5WMfwDzRmfFiVA4JeJuTt9JwqPfGcaadwxKy5JaHz9tgwDin?format=match&mode=fit)|The green curve is the one we have fitted, the blue one is the actual data, and the red dot is our prediction for June 1.|
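As a quick sanity check (not part of the original script), NumPy's built-in least-squares routine should reproduce the same quadratic coefficients. The data below is made up, since the real `data.csv` isn't shown:

```
import numpy as np

# Hypothetical sample data standing in for data.csv
x = np.arange(13, dtype=float)
y = 0.05 + 0.01 * x + 0.002 * x**2

# np.polyfit returns coefficients highest power first,
# i.e. [a2, a1, a0] for a quadratic
a2, a1, a0 = np.polyfit(x, y, 2)
print(a0, a1, a2)
```

If the hand-rolled elimination is correct, its solution matrix should match these coefficients (up to round-off).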

Now, we'll try to improve our curve-fitting by using a higher-order polynomial in the regression, so that we can get a more accurate prediction.

## 2) Why do we need higher order polynomials?
Just ponder upon the following statements:
- A unique curve that'll always pass through **two** given points is a **straight line**.
- A unique curve that'll pass through **three** given points is a **quadratic polynomial**.
-    ...and so on...
- **A unique curve that'll pass through n given points is a polynomial of order (n-1).**

That's the reason behind our craving for an n-th order polynomial.
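That claim is easy to verify numerically: a polynomial of order (n-1) fitted to n points reproduces them exactly (up to floating-point round-off). A small sketch with made-up points:

```
import numpy as np

# Five arbitrary points -> a unique order-4 polynomial through them
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = np.array([0.2, 0.9, 0.3, 1.5, 0.7])

coeffs = np.polyfit(x, y, len(x) - 1)   # order n-1 for n points
fitted = np.polyval(coeffs, x)

print(np.max(np.abs(fitted - y)))       # essentially zero
```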

If we take the matrix equation for the previous quadratic case and carefully look at it, we'll find a pattern...
(Just have a look again!)
![](https://images.hive.blog/p/MG5aEqKFcQi6ksuzVh6JJptBJCL6eFwx2gvRnpcRhTRKgmTKc5WMfwDzRmfFiVA4JeJuTt9JvWEMQZHS2iWnz66ZrAbsxrSxN?format=match&mode=fit)

> **PATTERN:** 
>- In the first matrix, we can clearly see how the powers of `xi` keep increasing as we move down and to the right. This matrix is actually symmetric about its diagonal. 
>- Next, in the matrix **P** (i.e. the right-most one), we again have powers of `xi` progressively increasing as we move down.

So, building upon the pattern, we can easily say that the matrix equation to be solved for n<sup>th</sup> order polynomial will be:

![image.png](https://files.peakd.com/file/peakd-hive/medro-martin/xuz8HuJ2-image.png)
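Written out in LaTeX (with `N` the number of data points), the system for an order-m polynomial `y = a0 + a1*x + ... + am*x^m` is:

```
\begin{bmatrix}
N & \sum x_i & \cdots & \sum x_i^m \\
\sum x_i & \sum x_i^2 & \cdots & \sum x_i^{m+1} \\
\vdots & \vdots & \ddots & \vdots \\
\sum x_i^m & \sum x_i^{m+1} & \cdots & \sum x_i^{2m}
\end{bmatrix}
\begin{bmatrix} a_0 \\ a_1 \\ \vdots \\ a_m \end{bmatrix}
=
\begin{bmatrix} \sum y_i \\ \sum x_i y_i \\ \vdots \\ \sum x_i^m y_i \end{bmatrix}
```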

Now, based upon our knowledge, we'll modify the code for the quadratic case and generalise it.

**Full Code:**
```
import os
import numpy as np
import matplotlib.pyplot as plt

# Function to sum the elements of an array
# (note: this shadows Python's built-in sum)
def sum(a):
    total = 0
    for i in range(len(a)):
        total += a[i]
    return total

fileData = open("<location>/data.csv", "r")
line = fileData.readlines()
fileData.close()

print("Data received:\n",*line)

x = np.zeros(len(line))
y = np.zeros(len(line))

for i in range(len(line)):
    x[i], y[i]= line[i].split(",")

print("x-matrix:\n",x,"\n y-matrix:\n",y)

# Defining order for the polynomial to be used in regression
order = input("Please enter the order of the polynomial you wish to use for interpolation: ")

if (order == "default"):
    n = len(x)
else:
    n = int(order) + 1

print(n, type(n))
data_m = np.zeros((n,n+1))

#Generalising the Augmented Matrix Definition
# Defining the matrix A
for j in range(0,n): #Row counter
    for i in range(0,n): #Column counter
        # sum(pow(x,0)) = len(x), so the [0,0] entry is the number of data points
        data_m[j,i] = sum(pow(x,(i+j)))

# Defining the matrix B
for j in range(0,n):
    data_m[j,n] = sum(y*pow(x,j))

print("Initial augmented matrix = \n", data_m)

# Elimination
for j in range(1,n):
    #print("LOOP-j:", j)
    for k in range(j,n):
        #print("     LOOP-k:", k)
        factor = data_m[k,j-1] / data_m[j-1,j-1]
        for i in range(n+1):
            #print("         LOOP-i:", i, "| ", data_m[k,i])
            data_m[k,i] = data_m[k,i] - factor*data_m[j-1,i]
            #print("-->",data_m[k,i])

print("Matrix after elimination = \n", data_m)

# Back Substitution
solution = np.zeros(n)

for j in range(n-1, -1, -1):
    subtractant = 0
    for i in range(n-1,-1,-1):
        subtractant = subtractant + solution[i] * data_m[j,i] 
    solution[j] = (data_m[j,n] - subtractant)/data_m[j,j]
print("Solution matrix:\n", solution)

y2 = np.zeros(len(x))

for j in range(0,n):
    y2 = y2 + solution[j]*pow(x,j)

print(y2)

ax = plt.subplot()
ax.plot(28*x,y, "ob", linestyle="solid")
ax.plot(28*x, y2, "ob", linestyle="solid", color="g")
plt.grid(True)
plt.title("Hive Price Chart")
ax.set_xlabel("Time (days)")
ax.set_ylabel("Price in USD")

plt.show()
```
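The hand-rolled Gaussian elimination above can be cross-checked against `np.linalg.solve` applied to the same normal-equations matrix. A sketch with made-up data (the real file isn't shown), fitting a cubic:

```
import numpy as np

# Hypothetical data generated from a known cubic
x = np.linspace(0.0, 5.0, 12)
y = 0.1 + 0.3 * x - 0.05 * x**2 + 0.01 * x**3
n = 4  # cubic -> 4 unknowns

# Assemble A and B exactly as in the generalised loops above
A = np.array([[np.sum(x ** (i + j)) for i in range(n)] for j in range(n)])
B = np.array([np.sum(y * x ** j) for j in range(n)])

solution = np.linalg.solve(A, B)
print(solution)   # should recover [0.1, 0.3, -0.05, 0.01]
```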
> **Additional Features we have added:** 
>- The user can now decide the order of the polynomial to fit. Selecting `default` tells the program to use an order of (number of data points - 1), i.e. the polynomial that passes through every point.

One thing we haven't added yet is the prediction (extrapolation) part. We need to extrapolate the fitted curve to `day = 350` to get the Hive price on June 1, 2020.

For this, we'll just make the following changes near the end of the code, just above `plt.show()`:
```
.
.
.
for j in range(0,n):
    y2 = y2 + solution[j]*pow(x,j)

def predict(x):
    prediction = 0
    for j in range(0,n):
        prediction += solution[j]*pow(x,j)
    return prediction

print(y2)

ax = plt.subplot()
ax.plot(28*x,y, "ob", linestyle="solid")
ax.plot(28*x, y2, "ob", linestyle="solid", color="g")
ax.plot(350,predict(12.5), 'ro')
plt.grid(True)
plt.title("Hive Price Chart")
ax.set_xlabel("Time (days)")
ax.set_ylabel("Price in USD")
.
.
.
```

> **NOTE:** As already mentioned in the previous post, we are using `12.5` and not `350` for our prediction because our step size is 28. (12.5 * 28 = 350).
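The day-to-index conversion can be wrapped in a tiny helper (names hypothetical) so the 28-day step stays in one place:

```
STEP_DAYS = 28  # spacing between data points, in days

def day_to_index(day):
    """Convert an absolute day number into the x-units used by the fit."""
    return day / STEP_DAYS

print(day_to_index(350))  # -> 12.5
```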

Now, our coding part is complete!...and we are **ready to test and predict**!!

|**Order of Polynomial** | **Fit** | **Prediction**|
|---|---|---|
|1 (linear)|![image.png](https://files.peakd.com/file/peakd-hive/medro-martin/HrCCCOAA-image.png)|**0.043 $** (ERROR: Here, one thing is very clear. This fit shouldn't start from 0!)|
|2 (quadratic)|![image.png](https://files.peakd.com/file/peakd-hive/medro-martin/z830TTZv-image.png)|**0.214 $**|
|3 (cubic)|![image.png](https://files.peakd.com/file/peakd-hive/medro-martin/OtIernQr-image.png)|**0.58 $**|
|4|![image.png](https://files.peakd.com/file/peakd-hive/medro-martin/GWDEeYsn-image.png)|**0.28 $**|
|10|![image.png](https://files.peakd.com/file/peakd-hive/medro-martin/aWSkUlgI-image.png)|**-1.103 $** (The ERROR is very much clear here!!)|
|default (order = 12)|![image.png](https://files.peakd.com/file/peakd-hive/medro-martin/HPNrYxem-image.png)|**-0.55 $** (Haha!!)|

Ok, so we have seen above that polynomial interpolation seems to fit the data well, but we have only checked at intervals of `x = 28`. Let's evaluate the fitted function at a higher resolution, so that we can see what happens in between those points.

> **Plus, we already know one problem with this technique: the curves always pass near 0 before rising up to the desired level.**

In the table above, the curve is clearly visible up to order 4, but for higher orders it is hard to see because of the low resolution used to plot it. So, let's use a higher resolution and see how the curve really behaves; this will also show why we are getting such odd, erroneous predictions.

|**Order of Polynomial**|**Fit**|**Comments**|
|---|---|---|
|5|![image.png](https://files.peakd.com/file/peakd-hive/medro-martin/vXmaPC1p-image.png)|We can see the curve varies wildly between data points.|
|8|![image.png](https://files.peakd.com/file/peakd-hive/medro-martin/RREaEoPs-image.png)||
|10|![image.png](https://files.peakd.com/file/peakd-hive/medro-martin/1quVNbkd-image.png)|This choice of order performs the worst (for our case).|
|14|![image.png](https://files.peakd.com/file/peakd-hive/medro-martin/hU7QW90H-image.png)||

**So, as you can see, using higher-order polynomials to fit the curve is not much of a boon: the extra degrees of freedom allow the curve to vary wildly between and beyond the given data points.**

**CONCLUSION:** Using higher-order polynomials may be good for fitting the data, but it is definitely not a good idea for extrapolation and prediction.
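The instability is easy to reproduce with synthetic data (the numbers below are made up, not the actual Hive prices): the order-(n-1) fit passes through every data point, yet once we step outside the data range it diverges wildly from a low-order fit.

```
import numpy as np

# Ten made-up, mildly noisy-looking price points
x = np.arange(10, dtype=float)
y = np.array([0.20, 0.22, 0.19, 0.25, 0.24, 0.28, 0.26, 0.30, 0.29, 0.33])

quad = np.polyfit(x, y, 2)             # low-order fit
interp = np.polyfit(x, y, len(x) - 1)  # order 9: passes through all points

# On the data itself, the high-order fit is (numerically) exact...
max_resid = np.max(np.abs(np.polyval(interp, x) - y))

# ...but when extrapolating to x = 12.5, the two fits disagree badly
p_quad = np.polyval(quad, 12.5)
p_interp = np.polyval(interp, 12.5)
print(max_resid, p_quad, p_interp)
```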

**IN THE NEXT ARTICLE:**
We'll see how to find the optimum polynomial order for our fit: too low is bad because the curve doesn't fit the data properly and hence captures too little information; too high is also bad because it lets the curve vary wildly. **Optimisation is the key!!**

---
Hope you learned something new from this article!
Thanks for your time.

Best,
M. Medro

---

#### Credits
All media used in this article have been created by me.