1. ReLU (Rectified Linear Unit)
$$\mathrm{ReLU}(x)=\max(0,x)$$
import matplotlib
matplotlib.use('agg')  # non-interactive backend: render to files, no display needed
import matplotlib.pyplot as plt
import torch
from torch import nn

func = nn.ReLU()
x = torch.arange(start=-2, end=2, step=0.01)  # sample points in [-2, 2)
y = func(x)
plt.plot(x.numpy(), y.numpy())
plt.title("relu")
plt.savefig("relu.png")
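As a quick check on the definition above, the following sketch compares `nn.ReLU` with two other ways of writing max(0, x) and uses autograd to read off the slope on each side of zero. The sample points and variable names are illustrative choices, not part of the original post.

```python
import torch
from torch import nn

# A few illustrative sample points on both sides of zero.
x = torch.tensor([-2.0, -0.5, 0.5, 2.0], requires_grad=True)

# Three ways to compute max(0, x); they should agree element-wise.
y_module = nn.ReLU()(x)
y_clamp = torch.clamp(x, min=0)
y_max = torch.maximum(x, torch.zeros_like(x))
print(torch.equal(y_module, y_clamp), torch.equal(y_module, y_max))

# Summing and backpropagating leaves d ReLU / dx for each sample in x.grad:
# the slope is 0 where x < 0 and 1 where x > 0.
y_module.sum().backward()
relu_grad = x.grad.clone()
print(relu_grad)
```

The point x = 0 is left out on purpose, since the derivative there is a matter of convention.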


2. Sigmoid
$$\mathrm{Sigmoid}(x)=\frac{1}{1+e^{-x}}$$

func = nn.Sigmoid()
x = torch.arange(start=-10, end=10, step=0.01)
y = func(x)
plt.figure()  # start a new figure so the previous curve is not drawn again
plt.plot(x.numpy(), y.numpy())
plt.title("sigmoid")
plt.savefig("sigmoid.png")
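To connect the formula with the library call, here is a minimal sketch that transcribes 1 / (1 + e^{-x}) directly and compares it with `torch.sigmoid`. The variable names are assumptions made for illustration.

```python
import torch

x = torch.arange(start=-10, end=10, step=0.01)

# Direct transcription of 1 / (1 + e^{-x}).
manual = 1.0 / (1.0 + torch.exp(-x))
builtin = torch.sigmoid(x)
max_diff = (manual - builtin).abs().max().item()
print("max |manual - builtin| =", max_diff)

# The output is squashed into (0, 1) and passes through 0.5 at x = 0.
mid = torch.sigmoid(torch.tensor(0.0)).item()
print("sigmoid(0) =", mid)
print("range on this grid:", builtin.min().item(), builtin.max().item())
```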


3. Tanh
$$\sinh(x)=\frac{e^x-e^{-x}}{2}$$
$$\cosh(x)=\frac{e^x+e^{-x}}{2}$$
$$\tanh(x)=\frac{\sinh(x)}{\cosh(x)}=\frac{e^x-e^{-x}}{e^x+e^{-x}}$$

func = nn.Tanh()
x = torch.arange(start=-10, end=10, step=0.01)
y = func(x)
plt.figure()  # start a new figure so the previous curve is not drawn again
plt.plot(x.numpy(), y.numpy())
plt.title("tanh")
plt.savefig("tanh.png")
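The definition of tanh as sinh / cosh can be checked numerically. The sketch below does that on the plotted range and then shows why the ratio is not a good way to compute tanh for large inputs in float32; the value 100.0 is just an illustrative large input.

```python
import torch

x = torch.arange(start=-10, end=10, step=0.01)

# tanh as the ratio sinh / cosh, following the definitions above.
ratio = torch.sinh(x) / torch.cosh(x)
builtin = torch.tanh(x)
max_diff = (ratio - builtin).abs().max().item()
print("max |ratio - builtin| =", max_diff)

# For large |x| in float32, sinh and cosh both overflow to inf, so the
# ratio becomes inf / inf = nan, while torch.tanh still saturates at 1.
big = torch.tensor(100.0)
naive_big = torch.sinh(big) / torch.cosh(big)
safe_big = torch.tanh(big)
print(naive_big, safe_big)
```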


4. SiLU (Sigmoid Linear Unit) or Swish

The SiLU activation function was introduced in "Gaussian Error Linear Units (GELUs)" (Hendrycks et al., 2016) and "Sigmoid-Weighted Linear Units for Neural Network Function Approximation in Reinforcement Learning" (Elfwing et al., 2017), and was independently discovered (and called swish) in "Searching for Activation Functions" (Ramachandran et al., 2017).

$$\mathrm{SiLU}(x)=x\cdot\mathrm{sigmoid}(x)=\frac{x}{1+e^{-x}}$$

func = nn.Sigmoid()
x = torch.arange(start=-10, end=10, step=0.01)
y = func(x) * x
plt.figure()  # start a new figure so the previous curve is not drawn again
plt.plot(x.numpy(), y.numpy())
plt.title("silu")
plt.savefig("silu.png")
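The plot above builds SiLU by hand from `nn.Sigmoid`. As a hedged sketch, assuming a PyTorch version that ships `nn.SiLU` (1.7 or newer), the built-in module can be compared with the hand-written form, and the small negative dip that distinguishes SiLU from ReLU can be located on the grid.

```python
import torch
from torch import nn

x = torch.arange(start=-10, end=10, step=0.01)

manual = x * torch.sigmoid(x)   # x * sigmoid(x), as in the formula
builtin = nn.SiLU()(x)          # built-in module (assumes PyTorch >= 1.7)
max_diff = (manual - builtin).abs().max().item()
print("max |manual - builtin| =", max_diff)

# Unlike ReLU, the curve dips slightly below zero before rising again.
min_value, min_index = manual.min(dim=0)
print("minimum", min_value.item(), "at x =", x[min_index].item())
```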


5. GELU (Gaussian Error Linear Units)

$$\mathrm{GELU}(x)=x\,P(X\le x)=x\,\Phi(x)=x\cdot\frac{1}{2}\left[1+\mathrm{erf}\left(\frac{x}{\sqrt{2}}\right)\right]$$
where $\Phi(x)$ is the cumulative distribution function of the standard Gaussian distribution, and the error function is
$$\mathrm{erf}(x)=\frac{2}{\sqrt{\pi}}\int_{0}^{x}e^{-t^2}\,dt$$
When high precision is not required, GELU can be approximated by
$$\mathrm{GELU}(x)\approx 0.5x\left(1+\tanh\left[\sqrt{\frac{2}{\pi}}\left(x+0.044715x^3\right)\right]\right)$$
or
$$\mathrm{GELU}(x)\approx x\cdot\mathrm{sigmoid}(1.702x)$$

func = torch.nn.functional.gelu
x = torch.arange(start=-10, end=10, step=0.01)
y = func(x)
plt.figure()  # start a new figure so the previous curve is not drawn again
plt.plot(x.numpy(), y.numpy())
plt.title("gelu")
plt.savefig("gelu.png")
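To get a feel for how close the two approximations are, the sketch below evaluates the erf-based form and both approximations on the same grid and reports the largest absolute error of each. It is a rough numerical comparison in float32 under these assumptions, not a statement about how PyTorch implements GELU internally.

```python
import math
import torch
import torch.nn.functional as F

x = torch.arange(start=-10, end=10, step=0.01)

# Exact form: x * Phi(x), with Phi written via the error function.
exact = 0.5 * x * (1.0 + torch.erf(x / math.sqrt(2.0)))

# Tanh-based approximation.
inner = math.sqrt(2.0 / math.pi) * (x + 0.044715 * x**3)
tanh_approx = 0.5 * x * (1.0 + torch.tanh(inner))

# Sigmoid-based approximation.
sigmoid_approx = x * torch.sigmoid(1.702 * x)

builtin_diff = (exact - F.gelu(x)).abs().max().item()
tanh_err = (exact - tanh_approx).abs().max().item()
sigmoid_err = (exact - sigmoid_approx).abs().max().item()
print("erf form vs F.gelu:      ", builtin_diff)
print("tanh approximation error:", tanh_err)
print("sigmoid approx. error:   ", sigmoid_err)
```

On this grid the tanh form tracks the exact curve more closely than the sigmoid form, which is the cheaper of the two to evaluate.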


6. Mish (A Self Regularized Non-Monotonic Activation Function)

$$\mathrm{Mish}(x)=x\cdot\tanh(\mathrm{softplus}(x))=x\cdot\tanh(\ln(1+e^x))$$

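def mish(x):
    # softplus(x) = ln(1 + e^x)
    return x * torch.tanh(torch.nn.functional.softplus(x))

x = torch.arange(start=-10, end=10, step=0.01)
y = mish(x)
plt.figure()  # start a new figure so the previous curve is not drawn again
plt.plot(x.numpy(), y.numpy())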
plt.title("mish")
plt.savefig("mish.png")
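The `mish` function above works on tensors directly. To use it as a layer inside a model, one option is to wrap it in a small module, sketched below. `MishLayer` and the layer sizes are hypothetical names and values chosen for illustration; recent PyTorch releases also provide a built-in `nn.Mish`, which would normally be preferred when available.

```python
import torch
from torch import nn
import torch.nn.functional as F


class MishLayer(nn.Module):
    """Hypothetical wrapper so mish can be used like nn.ReLU() in a model."""

    def forward(self, x):
        return x * torch.tanh(F.softplus(x))


# A tiny illustrative network; the layer sizes are arbitrary.
model = nn.Sequential(nn.Linear(4, 8), MishLayer(), nn.Linear(8, 1))
out = model(torch.randn(3, 4))
print(out.shape)

# Cross-check against a direct transcription of x * tanh(ln(1 + e^x)).
x = torch.arange(start=-10, end=10, step=0.01)
direct = x * torch.tanh(torch.log1p(torch.exp(x)))
max_diff = (MishLayer()(x) - direct).abs().max().item()
print("max |layer - direct| =", max_diff)
```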


Finally, here is a comparison of the ReLU, Swish, and Mish curves.
[Figure: ReLU, Swish, and Mish curves plotted together]
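The original post does not include the code for this comparison. A minimal sketch that produces such a figure, in the same style as the earlier snippets, might look like this; the plotting range and the output filename `relu_swish_mish.png` are assumptions.

```python
import matplotlib
matplotlib.use('agg')
import matplotlib.pyplot as plt
import torch
import torch.nn.functional as F

x = torch.arange(start=-5, end=5, step=0.01)  # plotting range is an arbitrary choice

# One entry per curve, keyed by the label shown in the legend.
curves = {
    "relu": torch.relu(x),
    "swish": x * torch.sigmoid(x),
    "mish": x * torch.tanh(F.softplus(x)),
}

plt.figure()
for name, y in curves.items():
    plt.plot(x.numpy(), y.numpy(), label=name)
plt.legend()
plt.title("relu vs swish vs mish")
plt.savefig("relu_swish_mish.png")  # hypothetical output filename
plt.close()
```

Drawing all three on one set of axes makes the difference for negative inputs visible: ReLU is flat at zero there, while Swish and Mish dip below zero before approaching it.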
