Learning TensorFlow (4): Optimizers
Since we have to learn these APIs anyway, we might as well learn how to use them straight from examples, which also gives us a chance to review some basic machine learning concepts. As before, though, we first go through the classes and their commonly used functions, and then work through examples.
This post is about the various optimizers and how to use them. Most machine learning tasks come down to minimizing a loss; once the loss is defined, the rest of the work can be handed over to the optimizer.
https://www.tensorflow.org/versions/r0.11/api_docs/python/train.html
We will cover a few of the most commonly used ones here; for the others, see the documentation.
Classes:
Optimizer
GradientDescentOptimizer
AdagradOptimizer
AdagradDAOptimizer
MomentumOptimizer
AdamOptimizer
FtrlOptimizer
RMSPropOptimizer
2. Commonly used optimizer classes
Ⅰ.class tf.train.Optimizer
Base class for optimizers. This class defines the API for adding the ops used to train a model. You will almost never use this class directly; instead you will use one of its subclasses such as GradientDescentOptimizer, AdagradOptimizer, MomentumOptimizer, and so on.
Here is the general usage flow (you can compare it with the linear regression example given later to deepen your understanding):
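A minimal sketch of this flow, assuming the TF 1.x graph-style API; the toy loss (w - 3)^2, the variable name, and the learning rate are illustrative placeholders:

```python
import tensorflow as tf

# A toy scalar loss with a single trainable variable, used only for illustration.
w = tf.Variable(0.0, name="w")
cost = tf.square(w - 3.0)

# 1. Create an optimizer with the desired hyperparameters.
opt = tf.train.GradientDescentOptimizer(learning_rate=0.1)

# 2. Ask it to add the ops that compute gradients and update the variables.
#    minimize() does both steps at once and returns the update operation.
opt_op = opt.minimize(cost, var_list=[w])
```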
In the training part, you then only need to run that update operation (here, the opt_op returned by minimize()).
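For example, continuing the sketch above (note that on the r0.11 API the initializer op is tf.initialize_all_variables() rather than tf.global_variables_initializer()):

```python
with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    for step in range(100):
        # Each run of opt_op performs one gradient-descent update of w.
        sess.run(opt_op)
    print(sess.run(w))  # should be close to 3.0
```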
That is the most basic usage framework; the idea behind it is actually very simple.
Processing gradients before applying them.
Calling minimize() takes care of both computing the gradients and applying them to the variables. If you want to process the gradients before applying them you can instead use the optimizer in three steps:
Compute the gradients with compute_gradients().
Process the gradients as you wish.
Apply the processed gradients with apply_gradients().
Example:
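A sketch of these three steps, assuming the TF 1.x API; gradient clipping is used here as one possible way of "processing" the gradients, and the variable and loss are placeholders:

```python
import tensorflow as tf

w = tf.Variable([1.0, 2.0], name="w")
loss = tf.reduce_sum(tf.square(w))

opt = tf.train.GradientDescentOptimizer(learning_rate=0.1)

# 1. Compute the gradients: a list of (gradient, variable) pairs.
grads_and_vars = opt.compute_gradients(loss, var_list=[w])

# 2. Process the gradients as you wish, e.g. clip them to [-1, 1].
capped_grads_and_vars = [(tf.clip_by_value(g, -1.0, 1.0), v)
                         for g, v in grads_and_vars]

# 3. Apply the processed gradients; this returns the update operation.
train_op = opt.apply_gradients(capped_grads_and_vars)
```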
Key functions:
tf.train.Optimizer.__init__
(use_locking, name)
Purpose: Create a new optimizer. This must be called by the constructors of its subclasses; in practice you will never call it directly yourself.
Args:
use_locking: Bool. If True apply use locks to prevent concurrent updates to variables.
name: A non-empty string. The name to use for accumulators created for the optimizer.
tf.train.Optimizer.minimize
(loss, global_step=None, var_list=None, gate_gradients=1, aggregation_method=None, colocate_gradients_with_ops=False, name=None, grad_loss=None)
Purpose: Add operations to minimize loss by updating the list of variables. This method simply combines calls to compute_gradients() and apply_gradients(). If you want to process the gradients before they are applied, call compute_gradients() and then apply_gradients() explicitly yourself instead of using this function.
Args:
loss: A Tensor containing the value to minimize.
global_step: Optional Variable to increment by one after the variables have been updated.
var_list: Optional list of Variable objects to update to minimize loss. Defaults to the list of variables collected in the graph under the key GraphKeys.TRAINABLE_VARIABLES.
gate_gradients: How to gate the computation of gradients. Can be GATE_NONE, GATE_OP, or GATE_GRAPH.
aggregation_method: Specifies the method used to combine gradient terms. Valid values are defined in the class AggregationMethod.
colocate_gradients_with_ops: If True, try colocating gradients with the corresponding op.
name: Optional name for the returned operation.
grad_loss: Optional. A Tensor holding the gradient computed for loss.
Returns:
An Operation that updates the variables in var_list. If global_step was not None, that operation also increments global_step.
Raises:
ValueError: If some of the variables are not Variable objects.
tf.train.Optimizer.compute_gradients
(loss, var_list=None, gate_gradients=1, aggregation_method=None, colocate_gradients_with_ops=False, grad_loss=None)
Purpose: Compute the gradients of loss with respect to each variable in var_list. This is the first part of minimize(). It returns a list of (gradient, variable) pairs, where the gradient can be None if there is no gradient for the given variable.
Args:
loss: A Tensor containing the value to minimize.
var_list: Optional list of Variable objects to update to minimize loss. Defaults to the list of variables collected in the graph under the key GraphKeys.TRAINABLE_VARIABLES.
gate_gradients: How to gate the computation of gradients. Can be GATE_NONE, GATE_OP, or GATE_GRAPH.
aggregation_method: Specifies the method used to combine gradient terms. Valid values are defined in the class AggregationMethod.
colocate_gradients_with_ops: If True, try colocating gradients with the corresponding op.
grad_loss: Optional. A Tensor holding the gradient computed for loss.
tf.train.Optimizer.apply_gradients(grads_and_vars, global_step=None, name=None)
Apply gradients to variables. This is the second part of minimize(). It returns an Operation that applies gradients.
Args:
grads_and_vars: List of (gradient, variable) pairs as returned by compute_gradients().
global_step: Optional Variable to increment by one after the variables have been updated.
name: Optional name for the returned operation. Default to the name passed to the Optimizer constructor.
Returns:
An Operation that applies the specified gradients. If global_step was not None, that operation also increments global_step.
Gating Gradients
Both minimize() and compute_gradients() accept a gate_gradients argument that controls the degree of parallelism during the application of the gradients.
The possible values are: GATE_NONE, GATE_OP, and GATE_GRAPH.
GATE_NONE: Compute and apply gradients in parallel. This provides the maximum parallelism in execution, at the cost of some non-reproducibility in the results. For example the two gradients of matmul depend on the input values: With GATE_NONE one of the gradients could be applied to one of the inputs before the other gradient is computed resulting in non-reproducible results.
GATE_OP: For each Op, make sure all gradients are computed before they are used. This prevents race conditions for Ops that generate gradients for multiple inputs where the gradients depend on the inputs.
GATE_GRAPH: Make sure all gradients for all variables are computed before any one of them is used. This provides the least parallelism but can be useful if you want to process all gradients before applying any of them.
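For example, to request the most conservative gating you can pass the class constant explicitly (a one-line sketch reusing an `opt` and `loss` defined as in the earlier sketches):

```python
# GATE_GRAPH: wait until all gradients for all variables have been computed
# before applying any of them.
train_op = opt.minimize(loss, gate_gradients=tf.train.Optimizer.GATE_GRAPH)
```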
Slots
Some optimizer subclasses, such as MomentumOptimizer and AdagradOptimizer allocate and manage additional variables associated with the variables to train. These are called Slots. Slots have names and you can ask the optimizer for the names of the slots that it uses. Once you have a slot name you can ask the optimizer for the variable it created to hold the slot value.
This can be useful if you want to log or debug a training algorithm, report stats about the slots, etc.
tf.train.Optimizer.get_slot_names()
Return a list of the names of slots created by the Optimizer.
See get_slot().
Returns:
A list of strings.
tf.train.Optimizer.get_slot(var, name)
Return a slot named name created for var by the Optimizer.
Some Optimizer subclasses use additional variables. For example Momentum and Adagrad use variables to accumulate updates. This method gives access to these Variable objects if for some reason you need them.
Use get_slot_names() to get the list of slot names created by the Optimizer.
Args:
var: A variable passed to minimize() or apply_gradients().
name: A string.
Returns:
The Variable for the slot if it was created, None otherwise.
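For example, MomentumOptimizer keeps one slot named "momentum" per trainable variable; a sketch of inspecting it (the variable and loss are placeholders):

```python
import tensorflow as tf

w = tf.Variable([1.0, 2.0], name="w")
loss = tf.reduce_sum(tf.square(w))

opt = tf.train.MomentumOptimizer(learning_rate=0.1, momentum=0.9)
train_op = opt.minimize(loss)      # slot variables are created here

print(opt.get_slot_names())        # e.g. ['momentum']
momentum_acc = opt.get_slot(w, "momentum")
print(momentum_acc)                # the accumulator Variable for w (or None)
```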
Other Methods
tf.train.Optimizer.get_name()
Ⅱ.class tf.train.GradientDescentOptimizer
Optimizer that implements the gradient descent algorithm.
Constructor:
tf.train.GradientDescentOptimizer.__init__
(learning_rate, use_locking=False, name='GradientDescent')
Purpose: Construct a new optimizer that applies the gradient descent algorithm.
Args:
learning_rate: A Tensor or a floating point value. The learning rate to use.
use_locking: If True use locks for update operations.
name: Optional name for the operations. Defaults to "GradientDescent".
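A typical construction looks like the following sketch (the learning rate is arbitrary and `loss` is assumed to be a scalar loss Tensor defined elsewhere):

```python
optimizer = tf.train.GradientDescentOptimizer(learning_rate=0.01)
train_op = optimizer.minimize(loss)   # `loss`: your scalar loss Tensor
```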
Ⅲ.class tf.train.AdadeltaOptimizer
Optimizer that implements the Adadelta algorithm, which can be seen as an improved version of the Adagrad algorithm described below.
Constructor:
tf.train.AdadeltaOptimizer.__init__(learning_rate=0.001, rho=0.95, epsilon=1e-08, use_locking=False, name='Adadelta')
Purpose: Construct a new Adadelta optimizer.
Args:
learning_rate: A Tensor or a floating point value. The learning rate.
rho: A Tensor or a floating point value. The decay rate.
epsilon: A Tensor or a floating point value. A constant epsilon used to better conditioning the grad update.
use_locking: If True use locks for update operations.
name: Optional name prefix for the operations created when applying gradients. Defaults to "Adadelta".
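Usage follows the same pattern as the other optimizers; a sketch with the constructor defaults written out (assuming a scalar `loss` Tensor defined elsewhere):

```python
optimizer = tf.train.AdadeltaOptimizer(learning_rate=0.001, rho=0.95, epsilon=1e-08)
train_op = optimizer.minimize(loss)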
IV.class tf.train.AdagradOptimizer
Optimizer that implements the Adagrad algorithm.
See this paper.
tf.train.AdagradOptimizer.__init__
(learning_rate, initial_accumulator_value=0.1, use_locking=False, name='Adagrad')
Construct a new Adagrad optimizer.
Args:
learning_rate: A Tensor or a floating point value. The learning rate.
initial_accumulator_value: A floating point value. Starting value for the accumulators, must be positive.
use_locking: If True use locks for update operations.
name: Optional name prefix for the operations created when applying gradients. Defaults to "Adagrad".
Raises:
ValueError: If the initial_accumulator_value is invalid.
The Optimizer base class provides methods to compute gradients for a loss and apply gradients to variables. A collection of subclasses implement classic optimization algorithms such as GradientDescent and Adagrad.
You never instantiate the Optimizer class itself, but instead instantiate one of the subclasses.
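A usage sketch for Adagrad (the learning rate is an arbitrary choice, initial_accumulator_value shows its default, and `loss` is assumed to be defined elsewhere):

```python
optimizer = tf.train.AdagradOptimizer(learning_rate=0.01, initial_accumulator_value=0.1)
train_op = optimizer.minimize(loss)
```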
Ⅴ.class tf.train.MomentumOptimizer
Optimizer that implements the Momentum algorithm.
tf.train.MomentumOptimizer.__init__
(learning_rate, momentum, use_locking=False, name='Momentum', use_nesterov=False)
Construct a new Momentum optimizer.
Args:
learning_rate: A Tensor or a floating point value. The learning rate.
momentum: A Tensor or a floating point value. The momentum.
use_locking: If True use locks for update operations.
name: Optional name prefix for the operations created when applying gradients. Defaults to "Momentum".
use_nesterov: If True use Nesterov Momentum.
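A usage sketch (0.9 is a common but arbitrary momentum value; `loss` is assumed to be defined elsewhere):

```python
optimizer = tf.train.MomentumOptimizer(learning_rate=0.01, momentum=0.9)
train_op = optimizer.minimize(loss)
```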
Ⅵ.class tf.train.AdamOptimizer
Optimizer that implements the Adam algorithm.
Constructor:
tf.train.AdamOptimizer.__init__
(learning_rate=0.001, beta1=0.9, beta2=0.999, epsilon=1e-08, use_locking=False, name='Adam')
Construct a new Adam optimizer.
Initialization:
m_0 <- 0 (Initialize initial 1st moment vector)
v_0 <- 0 (Initialize initial 2nd moment vector)
t <- 0 (Initialize timestep)
The update rule for variable with gradient g uses an optimization described at the end of section 2 of the paper:
t <- t + 1
lr_t <- learning_rate * sqrt(1 - beta2^t) / (1 - beta1^t)
m_t <- beta1 * m_{t-1} + (1 - beta1) * g
v_t <- beta2 * v_{t-1} + (1 - beta2) * g * g
variable <- variable - lr_t * m_t / (sqrt(v_t) + epsilon)
The default value of 1e-8 for epsilon might not be a good default in general. For example, when training an Inception network on ImageNet a current good choice is 1.0 or 0.1.
Note that in the dense implementation of this algorithm, m_t, v_t and the variable are updated even if g is zero; in the sparse implementation, they are not updated in iterations where g is zero.
Args:
learning_rate: A Tensor or a floating point value. The learning rate.
beta1: A float value or a constant float tensor. The exponential decay rate for the 1st moment estimates.
beta2: A float value or a constant float tensor. The exponential decay rate for the 2nd moment estimates.
epsilon: A small constant for numerical stability.
use_locking: If True use locks for update operations.
name: Optional name for the operations created when applying gradients. Defaults to "Adam".
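A usage sketch with the constructor defaults written out (assuming a scalar `loss` Tensor defined elsewhere; as noted above, a larger epsilon may work better for some models):

```python
optimizer = tf.train.AdamOptimizer(learning_rate=0.001, beta1=0.9,
                                   beta2=0.999, epsilon=1e-08)
train_op = optimizer.minimize(loss)
```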
3. Functions
4. Examples
I. Linear regression
If you are not yet familiar with the theory behind linear regression, see
http://blog.csdn.net/xierhacker/article/details/53257748
http://blog.csdn.net/xierhacker/article/details/53261008
If you already know it, skip ahead.
Straight to the code:
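Below is a minimal, self-contained sketch of a linear regression trained with GradientDescentOptimizer; the synthetic data (y ≈ 3x + 2), the learning rate, and the variable names are illustrative choices, not necessarily those used in the posts linked above:

```python
import numpy as np
import tensorflow as tf

# Synthetic data: y = 3x + 2 plus a little Gaussian noise.
train_X = np.linspace(-1.0, 1.0, 100).astype(np.float32)
train_Y = (3.0 * train_X + 2.0 + 0.3 * np.random.randn(100)).astype(np.float32)

# Placeholders for the inputs and targets.
X = tf.placeholder(tf.float32, shape=[None])
Y = tf.placeholder(tf.float32, shape=[None])

# Model parameters.
w = tf.Variable(0.0, name="weight")
b = tf.Variable(0.0, name="bias")

# Linear model and mean squared error loss.
pred = w * X + b
loss = tf.reduce_mean(tf.square(pred - Y))

# Hand the loss over to an optimizer; minimize() returns the training op.
optimizer = tf.train.GradientDescentOptimizer(learning_rate=0.1)
train_op = optimizer.minimize(loss)

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    for epoch in range(200):
        _, l = sess.run([train_op, loss], feed_dict={X: train_X, Y: train_Y})
        if epoch % 20 == 0:
            print("epoch %d, loss %.4f" % (epoch, l))
    print("w = %.3f, b = %.3f" % (sess.run(w), sess.run(b)))
```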
Result: