A Closer Look at K.gradients() and tf.stop_gradient() in TensorFlow

码农的科研笔记 · Published 2019-05-19 18:26:26

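Last week in the lab I was working through an unfamiliar codebase and came across the line below. I was not yet comfortable with stop_gradient() in TensorFlow, so I spent the weekend going back over it and writing up this summary.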

    y = xx + K.stop_gradient(rounded - xx)

This call ultimately ends up in tensorflow.python.ops.gen_array_ops.stop_gradient(input, name=None). Why the line is written this way is explained at the end of the post.

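【What stop_gradient() means】

stop_gradient is used to shape the gradient of a loss function: the wrapped tensor passes through unchanged in the forward pass, but it is treated as a constant when gradients are computed.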

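【Understanding tf.gradients()】

In TensorFlow we only need to define our own function; TensorFlow provides a powerful way to compute its gradient automatically: tf.gradients(). Its TensorFlow 1.x signature is: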

tf.gradients(
    ys,
    xs,
    grad_ys=None,
    name='gradients',
    colocate_gradients_with_ops=False,
    gate_gradients=False,
    aggregation_method=None,
    stop_gradients=None,
    unconnected_gradients=tf.UnconnectedGradients.NONE
)

gradients() adds ops to the graph to output the derivatives of ys with respect to xs. It returns a list of Tensor of length len(xs) where each tensor is the sum(dy/dx) for y in ys.

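tf.gradients() differentiates ys with respect to xs.
ys and xs can each be a Tensor or a list of Tensors.
The return value is a list whose length equals len(xs).

For example, suppose the return value is [grad1, grad2, grad3] with ys=[y1, y2] and xs=[x1, x2, x3]. The computation is then: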

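$\mathrm{grad1} = \frac{\partial y_{1}}{\partial x_{1}} + \frac{\partial y_{2}}{\partial x_{1}}$, $\mathrm{grad2} = \frac{\partial y_{1}}{\partial x_{2}} + \frac{\partial y_{2}}{\partial x_{2}}$, $\mathrm{grad3} = \frac{\partial y_{1}}{\partial x_{3}} + \frac{\partial y_{2}}{\partial x_{3}}$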
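To make the summation concrete, here is a small pure-Python sketch (no TensorFlow involved) that approximates each partial derivative with central differences and then sums over ys. The functions y1 and y2 and the helper summed_grads are made up for illustration; they are not part of any TensorFlow API.

```python
# Hypothetical example functions, chosen only for illustration.
def y1(x1, x2, x3):
    return x1 * x2          # dy1/dx1 = x2, dy1/dx2 = x1, dy1/dx3 = 0


def y2(x1, x2, x3):
    return x1 + x3 ** 2     # dy2/dx1 = 1, dy2/dx2 = 0, dy2/dx3 = 2 * x3


def summed_grads(ys, xs, eps=1e-6):
    """Return one value per x: the sum over all y of dy/dx (central differences)."""
    grads = []
    for i in range(len(xs)):
        hi = list(xs)
        lo = list(xs)
        hi[i] += eps
        lo[i] -= eps
        grads.append(sum((y(*hi) - y(*lo)) / (2 * eps) for y in ys))
    return grads


xs = [2.0, 3.0, 4.0]
grad1, grad2, grad3 = summed_grads([y1, y2], xs)
# Analytically: grad1 = x2 + 1 = 4, grad2 = x1 + 0 = 2, grad3 = 0 + 2 * x3 = 8
print(round(grad1, 4), round(grad2, 4), round(grad3, 4))
```

The list has one entry per x, not one per (y, x) pair, which is the same shape tf.gradients() returns.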

import numpy as np
import tensorflow as tf

# Build the dataset: noisy samples around y = 3x + 2
x_pure = np.random.randint(-10, 100, 32)
x_train = x_pure + np.random.randn(32) / 32
y_train = 3 * x_pure + 2 + np.random.randn(32) / 32

x_input = tf.placeholder(tf.float32, name='x_input')
y_input = tf.placeholder(tf.float32, name='y_input')
w = tf.Variable(2.0, name='weight')
b = tf.Variable(1.0, name='biases')
y = tf.add(tf.multiply(x_input, w), b)

loss_op = tf.reduce_sum(tf.pow(y_input - y, 2)) / (2 * 32)
train_op = tf.train.GradientDescentOptimizer(0.01).minimize(loss_op)
gradients_node = tf.gradients(loss_op, w)

sess = tf.Session()
init = tf.global_variables_initializer()
sess.run(init)

for i in range(20):
    # Each step feeds a single (x, y) sample, so this is per-sample gradient descent
    _, gradients, loss = sess.run([train_op, gradients_node, loss_op], feed_dict={x_input: x_train[i], y_input: y_train[i]})
    print("step: {} \t loss: {} \t gradients: {}".format(i, loss, gradients))
sess.close()

Computing the gradients and applying the update by hand

import numpy as np
import tensorflow as tf

# Build the dataset: noisy samples around y = 3x + 2
x_pure = np.random.randint(-10, 100, 32)
x_train = x_pure + np.random.randn(32) / 32
y_train = 3 * x_pure + 2 + np.random.randn(32) / 32

x_input = tf.placeholder(tf.float32, name='x_input')
y_input = tf.placeholder(tf.float32, name='y_input')
w = tf.Variable(2.0, name='weight')
b = tf.Variable(1.0, name='biases')
y = tf.add(tf.multiply(x_input, w), b)

loss_op = tf.reduce_sum(tf.pow(y_input - y, 2)) / (2 * 32)
# train_op = tf.train.GradientDescentOptimizer(0.01).minimize(loss_op)

# Custom weight update: plain gradient descent with learning rate 0.01
grad_w, grad_b = tf.gradients(loss_op, [w, b])
new_w = w.assign(w - 0.01 * grad_w)
new_b = b.assign(b - 0.01 * grad_b)

init = tf.global_variables_initializer()
sess = tf.Session()
sess.run(init)

for i in range(20):
    # new_w and new_b return the updated values, not gradients, so fetch grad_w explicitly
    grad, _, _, loss = sess.run([grad_w, new_w, new_b, loss_op], feed_dict={x_input: x_train[i], y_input: y_train[i]})
    print("step: {} \t loss: {} \t gradient w.r.t. w: {}".format(i, loss, grad))
sess.close()
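As a sanity check on what tf.gradients() computes here, the gradients of this loss can be derived by hand. With one sample per step, loss = (y - (w*x + b))^2 / 64, so d(loss)/dw = -(y - (w*x + b)) * x / 32 and d(loss)/db = -(y - (w*x + b)) / 32. The sketch below applies one such update in plain Python; the function name sgd_step and the sample values are made up for illustration.

```python
def sgd_step(w, b, x, y, lr=0.01):
    """One gradient-descent step on loss = (y - (w*x + b))**2 / (2 * 32)."""
    residual = y - (w * x + b)
    grad_w = -residual * x / 32.0   # d(loss)/dw
    grad_b = -residual / 32.0       # d(loss)/db
    return w - lr * grad_w, b - lr * grad_b, grad_w, grad_b


# Same initial w and b as the TensorFlow code; (x, y) is an arbitrary point on y = 3x + 2.
w, b = 2.0, 1.0
x, y = 4.0, 14.0
new_w, new_b, grad_w, grad_b = sgd_step(w, b, x, y)
print(grad_w, grad_b, new_w, new_b)
```

Both gradients are negative for this sample, so the step moves w and b upward, toward the true values 3 and 2.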

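【Understanding tf.stop_gradient()】

tf.gradients() also takes a stop_gradients argument. It is a list of tensors in the TensorFlow graph, and every tensor in that list is treated as a constant: backpropagation does not continue through it to the ops that produced it. A gradient is still returned for a listed tensor if it also appears in xs; it simply no longer includes contributions that would have to pass through another listed tensor. In the example below, c depends on a both directly and through b, so the plain call returns the total derivatives [3.0, 1.0]; with stop_gradients=[a, b] the path through b is cut and the result is [1.0, 1.0].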

import numpy as np
import tensorflow as tf

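a = tf.constant(0.)
b = 2 * a
c = a + b
# Total derivatives: dc/da = 1 + 2 = 3 (the path through b counts too), dc/db = 1
g = tf.gradients(c, [a, b])
# Treat a and b as constants: backprop stops at them, leaving only the direct partials
g_stopped = tf.gradients(c, [a, b], stop_gradients=[a, b])

with tf.Session() as sess:
    print(sess.run(g))          # [3.0, 1.0]
    print(sess.run(g_stopped))  # [1.0, 1.0]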

Here is another example, this time using stop_gradient():

import tensorflow as tf

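# Experiment 1
w1 = tf.Variable(2.0)
w2 = tf.Variable(2.0)
a = tf.multiply(w1, 3.0)
a_stopped = tf.stop_gradient(a)

# b = w1 * 3.0 * w2 in the forward pass, but a_stopped is a constant for backprop
b = tf.multiply(a_stopped, w2)
gradients = tf.gradients(b, xs=[w1, w2])
print(gradients)
# Prints [None, <tf.Tensor 'gradients/Mul_1_grad/Reshape_1:0' shape=() dtype=float32>]
# The entry for w1 is None because the only path from b to w1 goes through stop_gradient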

# Experiment 2
a = tf.Variable(1.0)
b = tf.Variable(1.0)
c = tf.add(a, b)
c_stopped = tf.stop_gradient(c)
d = tf.add(a, b)
e = tf.add(c_stopped, d)
gradients = tf.gradients(e, xs=[a, b])
with tf.Session() as sess:
    tf.global_variables_initializer().run()
    print(sess.run(gradients))

# The path through c is blocked, but the gradient still flows back through d, so the output is [1.0, 1.0]
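
Both results can be checked by hand. Write stop_gradient as $\mathrm{sg}(\cdot)$, an operator that is the identity in the forward pass and contributes zero derivative in the backward pass (the $\mathrm{sg}$ notation is introduced here only for illustration).

In the first experiment:

$b = \mathrm{sg}(3 w_{1}) \cdot w_{2}$, so $\frac{\partial b}{\partial w_{2}} = 3 w_{1} = 6$, and there is no differentiable path from $b$ to $w_{1}$.

That missing path is why tf.gradients() reports None for w1 rather than a numeric 0.

In the second experiment:

$e = \mathrm{sg}(a + b) + (a + b)$, so $\frac{\partial e}{\partial a} = 0 + 1 = 1$ and $\frac{\partial e}{\partial b} = 0 + 1 = 1$.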

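【Answer】

Back to the question raised at the start: why is that line of code written the way it is? In general form it looks like this: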

t = g(x)
y = t + tf.stop_gradient(f(x) - t)

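Here the function we want in the forward pass is f(x), while the function we want gradients to flow through in the backward pass is g(x). In the forward pass tf.stop_gradient() is just the identity, so +t and -t cancel and only f(x) is left. In the backward pass tf.stop_gradient() makes the gradient of f(x) - t zero, so only the gradient of t = g(x) propagates. In the line from the beginning of this post, f(x) is the rounded value and g(x) is xx itself: the forward pass outputs rounded, while the gradient passes straight through to xx as if the rounding were the identity. This pattern is commonly called a straight-through estimator.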
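The forward/backward split can be reproduced without TensorFlow. The sketch below is a pure-Python illustration using forward-mode dual numbers, whereas TensorFlow uses reverse-mode backpropagation; the derivative values agree for this scalar case. The Dual class and the stop_gradient, f and g helpers are hypothetical stand-ins written for this example, not TensorFlow APIs.

```python
class Dual:
    """A value together with its derivative w.r.t. the input x."""

    def __init__(self, value, grad=0.0):
        self.value = value
        self.grad = grad

    def __add__(self, other):
        return Dual(self.value + other.value, self.grad + other.grad)

    def __sub__(self, other):
        return Dual(self.value - other.value, self.grad - other.grad)


def stop_gradient(d):
    """Keep the value, but treat the result as a constant (derivative 0)."""
    return Dual(d.value, 0.0)


def f(x):
    """Forward function: rounding, whose derivative is 0 almost everywhere."""
    return Dual(float(round(x.value)), 0.0)


def g(x):
    """Backward function: identity, whose derivative is 1."""
    return x


x = Dual(2.75, 1.0)                  # seed dx/dx = 1
t = g(x)
y = t + stop_gradient(f(x) - t)      # the trick from this post
y_plain = t + (f(x) - t)             # same value, but no stop_gradient

print(y.value, y.grad)               # forward value of f, gradient of g
print(y_plain.value, y_plain.grad)   # forward value of f, gradient of f
```

Both expressions evaluate to the rounded value 3.0, but only the stop_gradient version carries a derivative of 1; without it the +t and -t derivatives cancel and the result has the zero gradient of rounding.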
