Activation Functions: ReLU, GELU, Mish, SiLU, Swish, Tanh, Sigmoid
1.ReLU (Rectified Linear Unit)
ReLU(x)=max(0, x)
from torch import nn
import torch
import matplotlib
matplotlib.use('agg')
import matplotlib.pyplot as plt
func = nn.ReLU()
x = torch.arange(start=-2, end=2, step=0.01)
y = func(x)
plt.plot(x.numpy(), y.numpy())
plt.title("relu")
plt.savefig("relu.png")
2.Sigmoid
Sigmoid(x)=\frac{1}{1+e^{-x}}
func = nn.Sigmoid()
x = torch.arange(start=-10, end=10, step=0.01)
y = func(x)
plt.figure()  # new figure, so the previous curve is not drawn into this plot
plt.plot(x.numpy(), y.numpy())
plt.title("sigmoid")
plt.savefig("sigmoid.png")
3.Tanh
sinh(x)=\frac{e^x-e^{-x}}{2}
cosh(x)=\frac{e^x+e^{-x}}{2}
tanh(x)=\frac{sinh(x)}{cosh(x)}
func = nn.Tanh()
x = torch.arange(start=-10, end=10, step=0.01)
y = func(x)
plt.figure()
plt.plot(x.numpy(), y.numpy())
plt.title("tanh")
plt.savefig("tanh.png")
4.SiLU (Sigmoid Linear Unit) or Swish
The SiLU activation function was introduced in "Gaussian Error Linear Units (GELUs)" (Hendrycks et al., 2016) and "Sigmoid-Weighted Linear Units for Neural Network Function Approximation in Reinforcement Learning" (Elfwing et al., 2017), and was independently discovered (and called Swish) in "Searching for Activation Functions" (Ramachandran et al., 2017).
silu(x)=x \cdot sigmoid(x)=\frac{x}{1+e^{-x}}
func = nn.Sigmoid()
x = torch.arange(start=-10, end=10, step=0.01)
y = func(x) * x  # silu(x) = x * sigmoid(x)
plt.figure()
plt.plot(x.numpy(), y.numpy())
plt.title("silu")
plt.savefig("silu.png")
5.GELU (Gaussian Error Linear Units)
Gelu(x)=xP(X \le x)=x\Phi(x)=x \cdot \frac{1}{2}[1+erf(\frac{x}{\sqrt{2}})]
where Φ(x) is the cumulative distribution function of the standard Gaussian distribution, and the error function is
erf(x)=\frac{2}{\sqrt{\pi}} \int_{0}^{x} e^{-t^2}\,dt
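The identity Φ(x) = ½[1 + erf(x/√2)] can be verified numerically against the standard normal CDF from torch.distributions, for example:
import math
import torch

x = torch.arange(start=-10, end=10, step=0.01)
phi_erf = 0.5 * (1 + torch.erf(x / math.sqrt(2)))       # 0.5 * [1 + erf(x / sqrt(2))]
phi_cdf = torch.distributions.Normal(0.0, 1.0).cdf(x)   # standard normal CDF Phi(x)
print(torch.allclose(phi_erf, phi_cdf, atol=1e-6))      # expected: True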
When high precision is not required, GELU can be approximated by
Gelu(x) \approx 0.5x(1+tanh[\sqrt{\frac{2}{\pi}}(x+0.044715x^3)])
or
Gelu(x) \approx x \cdot sigmoid(1.702x)
func = torch.nn.functional.gelu
x = torch.arange(start=-10, end=10, step=0.01)
y = func(x)
plt.figure()
plt.plot(x.numpy(), y.numpy())
plt.title("gelu")
plt.savefig("gelu.png")
6.Mish (A Self Regularized Non-Monotonic Activation Function)
mish(x)=x \cdot tanh(softplus(x))=x \cdot tanh(\ln(1+e^x))
def mish(x):
    # mish(x) = x * tanh(softplus(x))
    return x * torch.tanh(torch.nn.functional.softplus(x))

x = torch.arange(start=-10, end=10, step=0.01)
y = mish(x)
plt.figure()
plt.plot(x.numpy(), y.numpy())
plt.title("mish")
plt.savefig("mish.png")
Finally, a comparison plot of the ReLU, Swish, and Mish curves.
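Such a plot can be produced with something along these lines (a minimal sketch; the output filename is just an example):
import torch
import matplotlib
matplotlib.use('agg')
import matplotlib.pyplot as plt

x = torch.arange(start=-5, end=5, step=0.01)
relu = torch.relu(x)
swish = x * torch.sigmoid(x)                              # Swish / SiLU
mish = x * torch.tanh(torch.nn.functional.softplus(x))    # Mish

plt.figure()
plt.plot(x.numpy(), relu.numpy(), label="relu")
plt.plot(x.numpy(), swish.numpy(), label="swish")
plt.plot(x.numpy(), mish.numpy(), label="mish")
plt.legend()
plt.title("relu vs swish vs mish")
plt.savefig("comparison.png")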