Gradient Descent on the Loss Function

Gradient descent is a method for finding the extrema of a function. Applied to the loss function of a regression line, it finds the parameter values that minimize the loss. For example:
First-order regression line: $(y=ax+b)$, find the values of a and b
Second-order regression line: $(y=ax^2+bx+c)$, find the values of a, b and c
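
Each step of gradient descent moves every parameter a small distance against the gradient of the loss. For a parameter $(w)$ with learning rate $(lr)$ and loss $(L)$, the update rule used by every script below is $(w \leftarrow w - lr \cdot \frac{\partial L}{\partial w})$.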

This article uses animations to illustrate the changes at each step.

Using NumPy

Suppose the single-variable 6th-degree equation is $(y=0.001x^6-0.1x^5-0.68x^4+10000x^2+2)$.

Computing the first-order regression line

The first-order regression line has the form $(y=ax+b)$. The result computed with NumPy is
$(a*x + b=-258285.54*x + 8838090.57)$

import numpy as np
import pylab as plt
x=np.linspace(0,100, 10000000)
y=0.001*(x**6)-0.1*(x**5)-0.68*(x**4)+10000*(x**2)+2
plt.plot(x,y)

# fit a degree-1 polynomial (the least-squares regression line)
args=np.polyfit(x, y, 1)
f=np.poly1d(args)
plt.title(f'Regression: a*x+b\na={args[0]:.2f}, b={args[1]:.2f}')
plt.plot(x, f(x), 'g')
plt.show()

The result of running the code above is shown below.

Computing the fourth-order regression line

When args=np.polyfit(x, y, 4) is used to fit a fourth-order regression line, the curve comes closer to the actual values. The printed args are:

[ 1.52290916e+01 -2.62626265e+03 1.31212114e+05 -1.92640644e+06
6.85425254e+06]

Computing the sixth-order regression line

When args=np.polyfit(x, y, 6) is used to fit a sixth-order regression line, the curve matches the actual values, and the resulting args are almost exactly the coefficients of the original equation. In other words, the regression recovers the original equation from the sampled points.

[ 1.00000000e-03 -1.00000000e-01 -6.80000000e-01 -2.15229667e-11
1.00000000e+04 4.93033525e-10 2.00000004e+00]
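
To check the claim that these coefficients match the original equation, a short sketch (reusing the x and y arrays from the script above) compares them directly:

args = np.polyfit(x, y, 6)
true_coeffs = [0.001, -0.1, -0.68, 0, 10000, 0, 2]  # coefficients of the original equation, highest degree first
print(np.allclose(args, true_coeffs, atol=1e-6))    # expected to print True, given the args shown above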

Computing the eighth-order regression line

When args=np.polyfit(x, y, 8) is used to fit an eighth-order regression line, the curve still matches the actual values, but args now contains two extra coefficients that are essentially zero. Precision does not improve; we just spend time estimating two useless values. This is called overfitting.

The printed parameters are:

[ 3.51955078e-20 -1.32903585e-17 1.00000000e-03 -1.00000000e-01
-6.80000000e-01 -1.64237572e-10 1.00000000e+04 -6.90394067e-09
1.99999997e+00]

Regression line or PCA

The loss function is defined as the sum of the squared differences between the actual values and the predicted values. If the predictions come from a regression line, we are computing the regression line; if they come from PCA, we are computing the PCA line.

Regression and PCA are not the only ways of minimizing the gap between actual and predicted values; there are others, and more theory will come later.
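
Written out, with $(y_i)$ the actual value and $(\tilde{y_i})$ the predicted value, the loss is $(L=\sum_{i=1}^{n}(\tilde{y_i}-y_i)^2)$; every gradient-descent example below minimizes this quantity.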

Single-variable first-order gradient descent

Partial derivatives of the loss function: the first-order regression line is $(y=ax+b)$.
The partial derivative with respect to a is $(2\sum_{i=1}^{n}(\tilde{y_i}-y_i)x_i^1=2\sum_{i=1}^{n}(\tilde{y_i}-y_i)x_i)$
The partial derivative with respect to b is $(2\sum_{i=1}^{n}(\tilde{y_i}-y_i)x_i^0=2\sum_{i=1}^{n}(\tilde{y_i}-y_i))$
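
Both formulas follow from applying the chain rule to the loss $(L=\sum_{i=1}^{n}(\tilde{y_i}-y_i)^2)$ with $(\tilde{y_i}=ax_i+b)$, for example $(\frac{\partial L}{\partial a}=\sum_{i=1}^{n}2(\tilde{y_i}-y_i)\frac{\partial \tilde{y_i}}{\partial a}=2\sum_{i=1}^{n}(\tilde{y_i}-y_i)x_i)$. In the code below, the constant factor 2 is simply absorbed into the learning rate.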

To save the animation as a GIF, you first need to install the celluloid package:

pip install celluloid

import pylab as plt
import numpy as np
from celluloid import Camera
fig=plt.figure(figsize=(9,6))
ax=plt.axes()
np.random.seed(1)
plt.rcParams['font.sans-serif'] = ['Microsoft JhengHei']

#fix UserWarning: Glyph 8722 (\N{MINUS SIGN}) missing from current font.
plt.rcParams['axes.unicode_minus']=False

n=20
x=np.linspace(-10,10 ,n)
y=0.5*x+3 + np.random.randint(-5,5, n)

#camera=Camera(fig)
epochs=200
a=0
b=0
lr=2.5e-3
# first-order regression line: r = a*x + b
r = a * x + b
diff=100000
pre_loss=100000
i=1
#for i in range(epochs):
while diff>0.001:
    ax.clear()
    ax.set_xlim(-15, 15)
    ax.set_ylim(-8, 15)
    ax.plot([-15,15], [0,0], c='k', linewidth=0.5)
    ax.plot([0,0], [-8,15], c='k', linewidth=0.5)
    ax.scatter(x, y, c='g')
    f=np.poly1d(np.polyfit(x, y, 1))
    ax.plot(x, f(x), c="b", linewidth=5)

    #gradient step (the factor 2 from the derivative is absorbed into the learning rate)
    da = ((r - y) * (x ** 1)).sum()
    db = ((r - y)).sum()

    a = a - da * lr
    b = b - db * lr

    #compute the loss with the updated parameters
    r = a * x + b
    loss=np.square((y-r)).sum()
    diff=pre_loss-loss
    pre_loss=loss
    print(f'epochs : {i}, loss : {loss}')
    ax.plot(x, r, c='r', linewidth=1)
    i += 1
    txt = f'Gradient-descent regression y={a:.7f}x+{b:.7f}'
    ax.text(-6, -7, txt, color='red')
    plt.pause(0.01)
#     camera.snap()
# animation=camera.animate()
# animation.save("loss_1.gif", writer='PillowWriter', fps=10)
plt.show()

TensorFlow version

import pylab as plt
import numpy as np
import tensorflow as tf
from celluloid import Camera
fig=plt.figure(figsize=(9,6))
ax=plt.axes()
np.random.seed(1)
plt.rcParams['font.sans-serif'] = ['Microsoft JhengHei']

#fix UserWarning: Glyph 8722 (\N{MINUS SIGN}) missing from current font.
plt.rcParams['axes.unicode_minus']=False

n=20
x=np.linspace(-10,10 ,n)
y=0.5*x+3 + np.random.randint(-5,5, n)

#camera=Camera(fig)
a=tf.Variable(0.)# must be a float (tf.float32)
b=tf.Variable(0.)
lr=9e-4
# first-order regression line: r = a*x + b
#r = a * x + b
diff=1
pre_loss=1_000_000_000

epoch=1
while diff>0.001:
    dr=[]
    params=[a,b]
    for i in range(2):
        with tf.GradientTape() as tape:
            r = tf.reduce_sum(tf.pow(y - (a * x + b), 2))
        dr.append(tape.gradient(r, params[i]))
    ax.clear()
    ax.set_xlim(-15, 15)
    ax.set_ylim(-8, 15)
    ax.plot([-15,15], [0,0], c='k', linewidth=0.5)
    ax.plot([0,0], [-8,15], c='k', linewidth=0.5)
    ax.scatter(x, y, c='g')
    f = np.poly1d(np.polyfit(x, y, 1))
    ax.plot(x, f(x), c="b", linewidth=5)

    a = tf.Variable(a- dr[0] * lr)
    b = tf.Variable(b- dr[1] * lr)
    #compute the loss with the updated parameters
    loss = tf.reduce_sum(tf.square(y - (a * x + b))).numpy()
    diff=pre_loss-loss
    pre_loss=loss
    print(f'epochs : {epoch}, loss : {loss}, diff:{diff}')
    epoch += 1
    ax.plot(x, a * x + b, c='r', linewidth=1)
    txt = f'Gradient-descent regression y={a.numpy():.7f}x+{b.numpy():.7f}'
    ax.text(-6, -7, txt, color='red')
    plt.pause(0.01)
#     camera.snap()
# animation=camera.animate()
# animation.save("loss_1.gif", writer='PillowWriter', fps=10)
plt.show()
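
A side note on the gradient computation: opening a separate GradientTape per parameter works, but a single tape can return the gradients with respect to all variables at once, and tf.Variable supports in-place updates. A minimal sketch of one update step, reusing x, y, a, b and lr from the script above:

with tf.GradientTape() as tape:
    loss = tf.reduce_sum(tf.square(y - (a * x + b)))
da, db = tape.gradient(loss, [a, b])   # both gradients from one tape
a.assign_sub(lr * da)                  # update the Variable in place
b.assign_sub(lr * db)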

Single-variable fourth-order gradient descent

Partial derivatives of the loss function: the fourth-order regression line is $(y=ax^4+bx^3+cx^2+dx+e)$.
The partial derivative with respect to a is $(2\sum_{i=1}^{n}(\tilde{y_i}-y_i)x_i^4)$
The partial derivative with respect to b is $(2\sum_{i=1}^{n}(\tilde{y_i}-y_i)x_i^3)$
The partial derivative with respect to c is $(2\sum_{i=1}^{n}(\tilde{y_i}-y_i)x_i^2)$
The partial derivative with respect to d is $(2\sum_{i=1}^{n}(\tilde{y_i}-y_i)x_i)$
The partial derivative with respect to e is $(2\sum_{i=1}^{n}(\tilde{y_i}-y_i)x_i^0=2\sum_{i=1}^{n}(\tilde{y_i}-y_i))$
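
The pattern is the same for every coefficient: for the weight attached to $(x^k)$, the partial derivative of the loss is $(2\sum_{i=1}^{n}(\tilde{y_i}-y_i)x_i^k)$.
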
The gradient-descent code is as follows.

import pylab as plt
import numpy as np
from celluloid import Camera
fig=plt.figure(figsize=(9,6))
ax=plt.axes()
np.random.seed(1)
plt.rcParams['font.sans-serif'] = ['Microsoft JhengHei']

#fix UserWarning: Glyph 8722 (\N{MINUS SIGN}) missing from current font.
plt.rcParams['axes.unicode_minus']=False

n=20
x=np.linspace(-10,10 ,n)
y=0.5*x+3 + np.random.randint(-5,5, n)

#camera=Camera(fig)
epochs=500
a=0
b=0
c=0
d=0
e=0
lr=2.5e-3
# fourth-order regression line: r = a*x^4 + b*x^3 + c*x^2 + d*x + e
r = a * (x ** 4) + b * (x ** 3) + c * (x ** 2) + d * x + e
diff=100000
pre_loss=100000
#for i in range(epochs):
i=1
while diff>0.0001:
    ax.clear()
    ax.set_xlim(-15, 15)
    ax.set_ylim(-8, 15)
    ax.plot([-15,15], [0,0], c='k', linewidth=0.5)
    ax.plot([0,0], [-8,15], c='k', linewidth=0.5)
    ax.scatter(x, y, c='g')
    f=np.poly1d(np.polyfit(x, y, 4))
    ax.plot(x, f(x), c="b", linewidth=5)

    #gradient step (the factor 2 is absorbed into the learning rate)
    da = ((r - y) * (x ** 4)).sum()
    db = ((r - y) * (x ** 3)).sum()
    dc = ((r - y) * (x ** 2)).sum()
    dd = ((r - y) * x).sum()
    de = ((r - y)).sum()

    # each coefficient gets its own scale factor, because gradients of high powers of x are much larger
    a = a - da * lr / 1e6
    b = b - db * lr / 1e4
    c = c - dc * lr / 1e3
    d = d - dd * lr / 1e1
    e = e - de * lr / 1e0

    r = a * (x ** 4) + b * (x ** 3)+ c * (x ** 2) +d * x + e
    ax.plot(x, r, c='r', linewidth=1)

    #compute the loss
    loss=np.square(r - y).sum()
    diff=pre_loss-loss
    pre_loss=loss
    print(f'epochs : {i} : loss : {loss}')
    i+=1
    txt = f'Gradient-descent regression y={a:.7f}x^4+{b:.7f}x^3+{c:.7f}x^2+{d:.7f}x+{e:.7f}'
    ax.text(-12, -7, txt, color='red')
    #camera.snap()
    plt.pause(0.01)

# animation = camera.animate()
# animation.save('reg_ani.gif', writer='PillowWriter', fps=10)
plt.show()

Result:
.....................
epochs : 969 : loss : 119.32315850329866
epochs : 970 : loss : 119.3230572088199
epochs : 971 : loss : 119.32295658324597
epochs : 972 : loss : 119.32285662215968

TensorFlow version

import pylab as plt
import numpy as np
import tensorflow as tf
from celluloid import Camera
fig=plt.figure(figsize=(9,6))
ax=plt.axes()
np.random.seed(1)
plt.rcParams['font.sans-serif'] = ['Microsoft JhengHei']

#fix UserWarning: Glyph 8722 (\N{MINUS SIGN}) missing from current font.
plt.rcParams['axes.unicode_minus']=False

n=20
x=np.linspace(-10,10 ,n)
y=0.5*x+3 + np.random.randint(-5,5, n)

#camera=Camera(fig)
a=tf.Variable(0.)# must be a float (tf.float32)
b=tf.Variable(0.)
c=tf.Variable(0.)
d=tf.Variable(0.)
e=tf.Variable(0.)
lr=9e-4
# fourth-order regression line
#r = a * x**4 + b * x**3 +c * x**2 +d * x +e
diff=1
pre_loss=1e9
epoch=1
while diff>1e-6:
    dr=[]
    params=[a,b,c,d,e]
    for i in range(5):
        with tf.GradientTape() as tape:
            r = tf.reduce_sum(tf.pow(y - ( a * x**4 + b * x**3 +c * x**2 +d * x +e), 2))
        dr.append(tape.gradient(r, params[i]))
    ax.clear()
    ax.set_xlim(-15, 15)
    ax.set_ylim(-8, 15)
    ax.plot([-15,15], [0,0], c='k', linewidth=0.5)
    ax.plot([0,0], [-8,15], c='k', linewidth=0.5)
    ax.scatter(x, y, c='g')
    f = np.poly1d(np.polyfit(x, y, 4))
    ax.plot(x, f(x), c="b", linewidth=5)

    a = tf.Variable(a - dr[0] * lr/1e6)
    b = tf.Variable(b - dr[1] * lr/1e5)
    c = tf.Variable(c - dr[2] * lr/1e3)
    d = tf.Variable(d - dr[3] * lr/1e1)
    e = tf.Variable(e - dr[4] * lr)
    #compute the loss with the updated parameters (scaled by 1e6, as used in the stopping test)
    loss = tf.reduce_sum(tf.square(y - (a * x**4 + b * x**3 + c * x**2 + d * x + e))).numpy()/1e6
    diff=pre_loss-loss
    pre_loss=loss
    print(f'epochs : {epoch}, loss : {loss}, diff:{diff}')
    epoch += 1
    ax.plot(x, a * x**4 + b * x**3 + c * x**2 + d * x + e, c='r', linewidth=1)
    txt = f'Gradient-descent regression y={a.numpy():.7f}x^4+{b.numpy():.7f}x^3+{c.numpy():.7f}x^2+{d.numpy():.7f}x+{e.numpy():.7f}'
    ax.text(-12, -7, txt, color='red')
    plt.pause(0.01)
#     camera.snap()
# animation=camera.animate()
# animation.save("loss_1.gif", writer='PillowWriter', fps=10)
plt.show()

Revised version

This version builds the loss expression as a string from the parameter list, so the same loop works for a polynomial of any degree.

import pylab as plt
import numpy as np
import tensorflow as tf
fig=plt.figure(figsize=(9,6))
ax=plt.axes()
np.random.seed(1)
#plt.rcParams['font.sans-serif'] = ['Microsoft JhengHei']

#fix UserWarning: Glyph 8722 (\N{MINUS SIGN}) missing from current font.
#plt.rcParams['axes.unicode_minus']=False

n=20
x=np.linspace(-10,10 ,n)
y=0.5*x+3 + np.random.randint(-5,5, n)
a=tf.Variable(0.)
b=tf.Variable(0.)
c=tf.Variable(0.)
d=tf.Variable(0.)
e=tf.Variable(0.)
lr=9e-4
diff=1
pre_loss=1e9
p = ["a", "b", "c", "d", "e"]
l=len(p)
str_f=""
for i in range(l):
    str_f+=f'{p[i]}*x**{l-1-i}+'
str_f=str_f[:-1]
str_loss=f"tf.reduce_sum(tf.square(y-({str_f})))"

print(str_loss)
epoch=0
while diff>1e-6:
    ax.clear()
    ax.set_xlim(-15, 15)
    ax.set_ylim(-8, 15)
    ax.plot([-15,15], [0,0], c='k', linewidth=0.5)
    ax.plot([0,0], [-8,15], c='k', linewidth=0.5)
    ax.scatter(x, y, c='g')
    f = np.poly1d(np.polyfit(x, y, 4))
    ax.plot(x, f(x), c="b", linewidth=5)
    dr=[]
    params=[a,b,c,d,e]
    for i in range(len(params)):
        with tf.GradientTape() as tape:
            r=eval(str_loss)
        dr.append(tape.gradient(r, params[i]))
    a = tf.Variable(a - dr[0] * lr/1e6)
    b = tf.Variable(b - dr[1] * lr/1e5)
    c = tf.Variable(c - dr[2] * lr/1e3)
    d = tf.Variable(d - dr[3] * lr/1e1)
    e = tf.Variable(e - dr[4] * lr)
    #compute the loss with the updated parameters (scaled by 1e6, as used in the stopping test)
    loss = eval(str_loss).numpy()/1e6
    diff=pre_loss-loss
    pre_loss=loss
    print(f'epochs : {epoch}, loss : {loss}, diff:{diff}')
    epoch += 1
    ax.plot(x,  eval(str_f), c='r', linewidth=1)
    print(f'epoch:{epoch}')

    plt.pause(0.1)
plt.show()
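
The string-plus-eval approach works, but the same loop can be written without eval by keeping the coefficients in a list and evaluating the polynomial with ordinary TensorFlow ops. A minimal sketch of one update step under that design, reusing x, y and lr from the script above (the per-coefficient scale factors are the same illustrative ones used there):

coeffs = [tf.Variable(0.) for _ in range(5)]   # a, b, c, d, e
scales = [1e6, 1e5, 1e3, 1e1, 1e0]             # per-coefficient scaling, as in the loop above

def poly(coeffs, x):
    # a*x**4 + b*x**3 + c*x**2 + d*x + e
    deg = len(coeffs) - 1
    return sum(c * x ** (deg - i) for i, c in enumerate(coeffs))

with tf.GradientTape() as tape:
    loss = tf.reduce_sum(tf.square(y - poly(coeffs, x)))
grads = tape.gradient(loss, coeffs)            # one tape, all five gradients
for c, g, s in zip(coeffs, grads, scales):
    c.assign_sub(lr * g / s)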

Two-variable loss function

The two-variable equation is $(z=\frac{0.0006x^6-0.005y^6+0.5x^5-0.1y^5+0.005x^4+0.003y^4}{10000000})$

The two-variable first-order regression plane is $(z=ax+by+c)$.

In the regression plane shown below, a = 0.9, b = 1.8, c = -200. How were these values obtained? They were simply made up; in the end they still have to be found with gradient descent.

import plotly
import plotly.graph_objs as go
import numpy as np
import pandas as pd
display=pd.options.display
display.max_columns=None
display.max_rows=None
display.width=None
display.max_colwidth=None
n=100
z=np.zeros([n, n])
for x in range(n):
    for y in range(n):
        z[x, y] = (0.0006 * x ** 6 - 0.005 * y ** 6 + 0.5 * x ** 5 - 0.1 * y ** 5 + 0.005 * x ** 4 + 0.003 * y ** 4) / 10000000
#z must be a two-dimensional array or list
trace1=go.Surface(z=z)

a=0.9
b=1.8
c=-200
for x in range(n):
    for y in range(n):
        z[x, y] = a*x+b*y+c
trace2=go.Surface(z=z, coloraxis='coloraxis')
data=[trace1, trace2]
layout=go.Layout(title=f'Regression : a*x+b*y+c', autosize=True,margin=dict(l=0, r=0, b=0, t=40))
fig=go.Figure(data=data, layout=layout)
fig.show()
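
The regression plane itself can be found with exactly the same gradient-descent recipe as before: the partial derivatives of the loss with respect to a, b and c are $(2\sum_{i=1}^{n}(\tilde{z_i}-z_i)x_i)$, $(2\sum_{i=1}^{n}(\tilde{z_i}-z_i)y_i)$ and $(2\sum_{i=1}^{n}(\tilde{z_i}-z_i))$. A minimal NumPy sketch that fits the surface sampled in the code above (the learning rate and iteration count here are illustrative choices, not tuned values):

import numpy as np

n = 100
xs, ys = np.meshgrid(np.arange(n), np.arange(n), indexing='ij')
zs = (0.0006*xs**6 - 0.005*ys**6 + 0.5*xs**5 - 0.1*ys**5 + 0.005*xs**4 + 0.003*ys**4) / 10000000

a = b = c = 0.0
lr = 2e-8                           # small because x and y run up to 100
for _ in range(200000):
    pred = a*xs + b*ys + c
    da = ((pred - zs) * xs).sum()   # factor 2 absorbed into lr, as in the earlier scripts
    db = ((pred - zs) * ys).sum()
    dc = (pred - zs).sum()
    a, b, c = a - lr*da, b - lr*db, c - lr*dc
print(a, b, c)                      # approximate least-squares plane through the sampled surface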

Terrain map

A terrain map can also be represented by a two-variable nth-order equation, but that equation is extremely ugly and complicated. Instead, we use the sampled points to produce a regression plane and then estimate the elevation at any (x, y) from that plane. The code to display the plot is as follows.

import plotly
import plotly.graph_objs as go
import numpy as np
import pandas as pd

df = pd.read_csv(
    'https://raw.githubusercontent.com/plotly/datasets/master/api_docs/mt_bruno_elevation.csv',
    dtype='float')

trace1=go.Surface(z=df.values)
n=df.shape[0]
z=np.zeros([n, n])
a=0.9
b=1.8
c=50
for x in range(n):
    for y in range(n):
        z[x, y] = a*x+b*y+c
trace2=go.Surface(z=z, coloraxis='coloraxis')
data=[trace1, trace2]
layout=go.Layout(title=f'Regression : a*x+b*y+c', autosize=True,margin=dict(l=0, r=0, b=0, t=40))
fig=go.Figure(data=data, layout=layout)
fig.show()

Multivariate regression lines

Strictly speaking there is no such term as a "regression plane"; they are all referred to as multivariate regression lines. The formula of a multivariate regression line is

$(f(x_1, x_2, x_3, x_4, \dots)=ax_1+bx_2+cx_3+dx_4+\dots)$

todo

Future work

1. How should the initial values of a, b, c, ... be chosen to speed up convergence?
2. How can gradient descent be accelerated?
3. How should the learning rate be chosen?
4. How many epochs should training run?

 
