Scalar

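A scalar is a single number that is not wrapped in an array. It is also called a 0-dimensional tensor. Compare the outputs below.

Anything printed without enclosing "[]" is a scalar.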

import torch
print('0-D             : scalar    ', torch.tensor(5.))
print('1-D, 0 elements : not scalar', torch.tensor([]))
print('1-D, 1 element  : not scalar', torch.tensor([5.]))
print('1-D, 2 elements : not scalar', torch.tensor([5., 6.]))

Result:
0-D             : scalar     tensor(5.)
1-D, 0 elements : not scalar tensor([])
1-D, 1 element  : not scalar tensor([5.])
1-D, 2 elements : not scalar tensor([5., 6.])
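
As a quick check of the distinction above, the number of dimensions can be read with dim() and the shape with shape. A 0-dimensional tensor has an empty shape, and item() extracts its Python number. This is a minimal sketch; the variable names are illustrative.

```python
import torch

scalar = torch.tensor(5.)      # 0-D: no brackets
vector = torch.tensor([5.])    # 1-D with one element

print(scalar.dim(), scalar.shape)  # 0 torch.Size([])
print(vector.dim(), vector.shape)  # 1 torch.Size([1])
print(scalar.item())               # 5.0
```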

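Simple differentiation

Suppose $(f(x)=0.6x^2+5)$. Its derivative is $(f'(x)=0.6 \cdot 2x = 1.2x)$. At x = 2, $(f'(2)=0.6 \cdot 2 \cdot 2 = 2.4)$, so the slope at x = 2 is 2.4.

Because we differentiate with respect to x, x is declared as a tensor with requires_grad=True, which tells torch to track the operations applied to it. Calling y.backward() computes the derivative, and x.grad then holds the slope.

Note that x must have a floating-point dtype (hence 2. rather than 2). Integer tensors cannot require gradients.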

import torch
x=torch.tensor(2., requires_grad = True)
y=0.6 * x.square() + 5
y.backward()
print(x.grad)
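
The floating-point requirement can be seen directly. The sketch below, written for illustration, tries to track an integer tensor and catches the error PyTorch raises, then repeats the computation with a float tensor.

```python
import torch

# An integer tensor cannot require gradients.
try:
    torch.tensor(2, requires_grad=True)
    int_ok = True
except RuntimeError as err:
    int_ok = False
    print("integer tensor rejected:", err)

# The same value as a float works.
x = torch.tensor(2., requires_grad=True)
y = 0.6 * x.square() + 5
y.backward()
print(x.grad)  # slope at x = 2, analytically 1.2 * 2 = 2.4
```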

Differentiating at several x positions

To get the slope of the formula above at x = 1, 2, 3, and 4, declare the tensor as follows.

x = torch.tensor([1., 2., 3., 4.], requires_grad = True)

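For $(f(x)=0.6x^2+5)$, f'(x) now yields four values, [1.2000, 2.4000, 3.6000, 4.8000]. Because y is no longer a scalar, y.backward() needs a gradient argument with the same shape as y, holding one weight per output. The call looks like this (the snippet uses the simpler $(y=x^2)$, whose slopes are [2., 4., 6., 8.]).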

y = x.square()
y.backward(gradient=torch.ones([x.shape[0]]))

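The four weights must be 1. backward() multiplies each output's derivative by its weight, so torch.ones() leaves the slopes unchanged, while torch.zeros() would make every gradient 0.

Also note that x.shape is a torch.Size object (here torch.Size([4])), not a plain number, so x.shape[0] is used to get the length.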

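If the result is not a scalar and y.backward() is called without the gradient argument, PyTorch raises an error saying that only scalar outputs are allowed:

RuntimeError: grad can be implicitly created only for scalar outputs

Some online tutorials use from torch.autograd import Variable and wrap non-scalar values in Variable. That API is deprecated; do not use it.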

The complete code follows.

import torch
x = torch.tensor([1.,2.,3.,4.], requires_grad=True)
y = x.square()
y.backward(torch.ones(x.shape[0]))
print(x.grad)
Result:
tensor([2., 4., 6., 8.])
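
The gradient argument acts as a weight applied to each element of y before the derivatives are accumulated into x.grad. A small sketch (the helper slopes and the variable names are illustrative, not part of the original code) shows that weights of 1 give the same result as differentiating y.sum(), and that other weights scale the slopes.

```python
import torch

def slopes(weights):
    # Differentiate y = x^2 at four points with the given per-output weights.
    x = torch.tensor([1., 2., 3., 4.], requires_grad=True)
    y = x.square()
    y.backward(gradient=weights)
    return x.grad

ones = slopes(torch.ones(4))
halves = slopes(torch.full((4,), 0.5))
zeros = slopes(torch.zeros(4))

# Summing y first turns it into a scalar, so no gradient argument is needed.
x = torch.tensor([1., 2., 3., 4.], requires_grad=True)
x.square().sum().backward()

print(ones)    # 2x
print(halves)  # 0.5 * 2x
print(zeros)   # all zeros
print(x.grad)  # same as `ones`
```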

Differentiating a $(\sum)$

Suppose $(f(x)=\sum_{i=1}^{4}(a_ix^2+b_i))$, where a is the vector [0.1, 0.2, 0.3, 0.4] and b is the vector [1, 2, 3, 4].

$(f'(x)=\sum_{i=1}^{4}2a_ix = 2(0.1+0.2+0.3+0.4)x=2x)$, so at x = 2 the derivative is 2*2 = 4. The code is as follows.

import torch
a=torch.tensor([0.1, 0.2, 0.3, 0.4])
b=torch.tensor([1., 2., 3., 4.])

x=torch.tensor([2.], requires_grad=True)
y=(a*x.square()+b).sum()
y.backward()
print(x.grad)

Result:
tensor([4.])

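Partial derivatives

Suppose the formula is $(f(x,y)=\sum_{i=1}^{4}(a_ix^2+b_iy^2))$, where a is the vector [0.1, 0.2, 0.3, 0.4] and b is the vector [1, 2, 3, 4].

The partial derivative with respect to x is $(\frac{\partial f(x, y)}{\partial x}=\sum_{i=1}^{4}2a_ix)$

The partial derivative with respect to y is $(\frac{\partial f(x, y)}{\partial y}=\sum_{i=1}^{4}2b_iy)$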

import torch
a=torch.tensor([0.1, 0.2, 0.3, 0.4])
b=torch.tensor([1., 2., 3., 4.])

x=torch.tensor(2., requires_grad=True)
y=torch.tensor(3., requires_grad=True)
f=(a*x.square()+b*y.square()).sum()
f.backward()
print("partial derivative w.r.t. x:", x.grad)
print("partial derivative w.r.t. y:", y.grad)
Result:
partial derivative w.r.t. x: tensor(4.)
partial derivative w.r.t. y: tensor(60.)
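
To compare the autograd results with the closed-form partial derivatives, the sketch below uses torch.autograd.grad, which returns the gradients as a tuple instead of storing them in .grad. The expected values are computed from $(2x\sum a_i)$ and $(2y\sum b_i)$; the names df_dx, df_dy, expected_dx, and expected_dy are illustrative.

```python
import torch

a = torch.tensor([0.1, 0.2, 0.3, 0.4])
b = torch.tensor([1., 2., 3., 4.])

x = torch.tensor(2., requires_grad=True)
y = torch.tensor(3., requires_grad=True)
f = (a * x.square() + b * y.square()).sum()

# Returns (df/dx, df/dy) without touching x.grad or y.grad.
df_dx, df_dy = torch.autograd.grad(f, (x, y))

expected_dx = 2 * a.sum() * x.detach()  # 2 * 1.0 * 2 = 4
expected_dy = 2 * b.sum() * y.detach()  # 2 * 10 * 3 = 60
print(df_dx, expected_dx)
print(df_dy, expected_dy)
```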

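Gradient descent

Suppose we use gradient descent to find the minimum of $(y=x^2)$, starting from x = -5. As explained in the introduction to basic gradient descent, each step updates x by $(x_{t+1}=x_{t} - f'(x_t) \cdot lr)$

First define the tensor x. Be sure to write -5. with the decimal point, because only floating-point tensors can be differentiated.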

x=torch.tensor([-5.], requires_grad=True)

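Define the formula $(y = x^2)$. It must be re-evaluated every time x changes, so it goes inside the loop, followed by y.backward() to differentiate.

Because y has a single element, y.backward() needs no gradient argument.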

for i in range(epochs): 
    y=x.square() 
    y.backward()

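Next, update x. Every operation on x is tracked by autograd, so a manual update must go inside a with torch.no_grad() block, which tells PyTorch not to track the computations in the block.

Without the block, the only option is to rebuild the tensor with x = torch.tensor([x - x.grad * lr], requires_grad = True), which hurts performance.

Inside the block, write the update in place as x -= .... Writing x = x - x.grad * lr rebinds x to a new tensor that does not require gradients and has no .grad, so the steps that follow fail.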

with torch.no_grad():
    x -= x.grad * lr  # do not write x = x - x.grad * lr
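
The difference between the two forms can be observed outside the loop. This illustrative sketch performs one update in place and one by assignment to a new name (w and w_new are hypothetical names), then inspects the resulting tensors.

```python
import torch

lr = 0.01

# In-place update: x stays the same leaf tensor and keeps tracking.
x = torch.tensor([-5.], requires_grad=True)
x.square().backward()
with torch.no_grad():
    x -= x.grad * lr
print(x.requires_grad, x.grad)  # True, and .grad is still available

# Assignment: the result is a new, untracked tensor.
w = torch.tensor([-5.], requires_grad=True)
w.square().backward()
with torch.no_grad():
    w_new = w - w.grad * lr
print(w_new.requires_grad, w_new.grad)  # False None
```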

Finally, always clear x.grad. Otherwise the gradients from successive backward() calls are accumulated.

    x.grad.zero_()
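
The accumulation is easy to reproduce. In this sketch, backward() is called twice on the same formula without clearing, and the second call adds to the first. The names first, second, and third are illustrative.

```python
import torch

x = torch.tensor(2., requires_grad=True)

x.square().backward()
first = x.grad.clone()   # 2 * 2 = 4

x.square().backward()    # no zero_() in between
second = x.grad.clone()  # 4 + 4 = 8

x.grad.zero_()
x.square().backward()
third = x.grad.clone()   # back to 4 after clearing

print(first, second, third)
```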

The complete code follows.

import torch
epochs=1000
lr=0.01
x=torch.tensor([-5.], requires_grad=True)
for i in range(epochs):
    y=x.square()
    y.backward()
    with torch.no_grad():  # autograd tracks every operation on x, so update x manually under no_grad
        x -= x.grad * lr  # do not write x = x - x.grad * lr
    x.grad.zero_()  # always clear x.grad, otherwise gradients accumulate
    print(x)
Result:
........
tensor([-9.1231e-09], requires_grad=True)
tensor([-8.9406e-09], requires_grad=True)
tensor([-8.7618e-09], requires_grad=True)
tensor([-8.5866e-09], requires_grad=True)
tensor([-8.4148e-09], requires_grad=True)
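
The final value can be cross-checked by hand. Since $(f'(x)=2x)$, each step gives $(x_{t+1}=x_t - 2x_t \cdot lr=(1-2lr)x_t)$, so after t steps $(x_t=(1-2lr)^t x_0)$. The sketch below compares this closed form with the loop. The name closed_form and the tolerance are assumptions, the latter chosen to absorb float32 rounding.

```python
import math
import torch

epochs = 1000
lr = 0.01

x = torch.tensor([-5.], requires_grad=True)
for _ in range(epochs):
    y = x.square()
    y.backward()
    with torch.no_grad():
        x -= x.grad * lr
    x.grad.zero_()

# x_t = (1 - 2*lr)^t * x_0, evaluated in Python floats
closed_form = -5.0 * (1 - 2 * lr) ** epochs
print(x.item(), closed_form)
print(math.isclose(x.item(), closed_form, rel_tol=1e-3))
```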

detach

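A torch tensor sometimes has to be converted to NumPy format for computation or plotting, which can be done with .numpy().

However, a tensor declared with requires_grad = True is being tracked by autograd and cannot be converted directly. Call detach() first to obtain a tensor that is separated from the tracking.

In the code below, x is declared as tracked, so y, which is computed from x, is tracked as well. Both must be detached before converting to NumPy, while z needs no detach.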

import torch
x=torch.tensor([1.,2.,3.,4.], requires_grad=True)
y=x.square()
z=torch.tensor([1.,2.,3.,4.])
print(x.detach().numpy())
print(y.detach().numpy())
print(z.numpy())

Result:
[1. 2. 3. 4.]
[ 1.  4.  9. 16.]
[1. 2. 3. 4.]
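
What happens without detach() can be checked in a few lines. The sketch below, an illustration with hypothetical names, catches the error raised by calling numpy() on a tracked tensor and confirms that the detached tensor no longer requires gradients.

```python
import torch

x = torch.tensor([1., 2., 3., 4.], requires_grad=True)

try:
    x.numpy()
    direct_ok = True
except RuntimeError as err:
    direct_ok = False
    print("numpy() refused:", err)

d = x.detach()
print(d.requires_grad)  # False
print(x.requires_grad)  # True: x itself is unchanged
print(d.numpy())
```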

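The torch.no_grad block

Inside a torch.no_grad block, tensors are treated as detached, so numpy() can be called directly. Outside the block, detach() is required.

The two print calls in the code below show the usage inside and outside the block.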

import torch
epochs=1000
lr=0.01
x=torch.tensor([-5.], requires_grad=True)
for i in range(epochs):
    y=torch.square(x)
    y.backward()
    with torch.no_grad():
        x -= x.grad * lr  # do not write x = x - x.grad * lr
        x.grad.zero_()  # always clear x.grad, otherwise gradients accumulate
        print('inside block : ', x.numpy()[0])
    print('outside block : ', x.detach().numpy()[0])
Result:
inside block :  -8.7618e-09
outside block :  -8.7618e-09
inside block :  -8.586564e-09
outside block :  -8.586564e-09
inside block :  -8.414833e-09
outside block :  -8.414833e-09
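
The effect of the block on tracking can be inspected with torch.is_grad_enabled() and with the requires_grad flag of a result computed inside it. A minimal sketch with illustrative names:

```python
import torch

x = torch.tensor([-5.], requires_grad=True)

with torch.no_grad():
    inside_enabled = torch.is_grad_enabled()
    y_inside = x.square()   # not tracked

outside_enabled = torch.is_grad_enabled()
y_outside = x.square()      # tracked again

print(inside_enabled, y_inside.requires_grad)    # False False
print(outside_enabled, y_outside.requires_grad)  # True True
```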

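GPU computation

When the computation runs on the GPU, a variable in GPU memory must first be copied to host RAM with .cpu() and then converted with numpy(). The two print calls in the code below show the usage inside and outside the block.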

import torch
epochs=1000
lr=0.01
#x=torch.tensor([-5.], requires_grad=True, device="cuda")
x=torch.tensor([-5.], requires_grad=True)
for i in range(epochs):
    y=torch.square(x)
    y.backward()
    with torch.no_grad():
        x -= x.grad * lr  # do not write x = x - x.grad * lr
        x.grad.zero_()  # always clear x.grad, otherwise gradients accumulate
        print('inside block : ', x.cpu().numpy()[0])
    print('outside block : ', x.cpu().detach().numpy()[0])
Result:
inside block :  -8.7618e-09
outside block :  -8.7618e-09
inside block :  -8.586564e-09
outside block :  -8.586564e-09
inside block :  -8.414833e-09
outside block :  -8.414833e-09
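
To run the same code on machines with or without a GPU, the device can be selected at run time. This is a sketch under the assumption that falling back to the CPU is acceptable; the device and result variables are not part of the original code.

```python
import torch

# Use the GPU when one is available, otherwise fall back to the CPU.
device = "cuda" if torch.cuda.is_available() else "cpu"

epochs = 1000
lr = 0.01
x = torch.tensor([-5.], requires_grad=True, device=device)
for _ in range(epochs):
    y = torch.square(x)
    y.backward()
    with torch.no_grad():
        x -= x.grad * lr
    x.grad.zero_()

# detach() drops tracking; cpu() moves the data to host RAM
# (it returns the same tensor when x is already on the CPU).
result = x.detach().cpu().numpy()[0]
print(device, result)
```

Calling detach() before cpu() works the same way on both devices, so the conversion line does not need to change when the device does.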

PyTorch Differentiation