I. Saving a Model:

  • 1. Saving model parameters

  • 2. Saving the entire model

    • Saving with a callback

    • Saving manually

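1. Callback: tf.keras.callbacks.ModelCheckpoint

Saves the model during training in the form of checkpoints. A checkpoint is a binary file that holds the values of the weights, the bias terms, and the other variables tracked by the model; its path conventionally carries the .ckpt extension.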

keras.callbacks.ModelCheckpoint(filepath, monitor='val_loss', verbose=0, save_best_only=False, save_weights_only=False, mode='auto', period=1)

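  • filepath: string, the path where the model file is saved.
  • monitor: the quantity to monitor.
  • verbose: verbosity mode, 0 or 1.
  • save_best_only: if save_best_only=True, the latest best model according to the monitored quantity will not be overwritten.
  • mode: one of {auto, min, max}. If save_best_only=True, the decision to overwrite the saved file depends on whether the monitored quantity should be maximized or minimized. For val_acc this should be max, for val_loss it should be min, and so on. In auto mode, the direction is inferred automatically from the name of the monitored quantity.
  • save_weights_only: if True, only the model's weights are saved (model.save_weights(filepath)); otherwise the full model is saved (model.save(filepath)).
  • period: the interval between checkpoints, in epochs.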
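How mode and save_best_only interact can be sketched in plain Python. The helper names resolve_mode and is_improvement are hypothetical, and the 'auto' rule used here (names containing "acc" are maximized, everything else is minimized) is a simplifying assumption rather than the exact Keras logic:

```python
def resolve_mode(monitor, mode="auto"):
    """Return 'min' or 'max' for a monitored quantity (simplified 'auto' rule)."""
    if mode in ("min", "max"):
        return mode
    # Assumption: accuracy-like metrics should grow, everything else should shrink.
    return "max" if "acc" in monitor else "min"


def is_improvement(current, best, mode):
    """True if `current` beats the best value seen so far."""
    if best is None:  # nothing has been saved yet
        return True
    return current > best if mode == "max" else current < best


print(resolve_mode("val_loss"))       # min
print(resolve_mode("val_accuracy"))   # max
print(is_improvement(0.42, 0.45, resolve_mode("val_loss")))  # True, the loss went down
```

With save_best_only=True, a file would only be written when a check like is_improvement returns True; with save_best_only=False, every checkpoint is written regardless.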
import tensorflow as tf
(train_images, train_labels), (test_images, test_labels) = tf.keras.datasets.mnist.load_data()

train_labels = train_labels[:1000]
test_labels = test_labels[:1000]

train_images = train_images[:1000].reshape(-1, 28 * 28) / 255.0
test_images = test_images[:1000].reshape(-1, 28 * 28) / 255.0
# Define a simple sequential model
def create_model():
    model = tf.keras.models.Sequential([
        tf.keras.layers.Dense(128, activation='relu', input_shape=(784,)),
        tf.keras.layers.Dropout(0.2),
        tf.keras.layers.Dense(10, activation='softmax')
    ])

    model.compile(optimizer='adam',
                  loss='sparse_categorical_crossentropy',
                  metrics=['accuracy'])

    return model

Saving model parameters

model1 = create_model()
checkpoint_path1 = "training_1/cp.ckpt"
cp_callback = tf.keras.callbacks.ModelCheckpoint(checkpoint_path1, save_weights_only=True,verbose=0)
model1.fit(train_images,train_labels,epochs=10,validation_data=(test_images,test_labels),callbacks = [cp_callback],verbose=2)
Train on 1000 samples, validate on 1000 samples
Epoch 1/10
1000/1000 - 0s - loss: 1.5260 - accuracy: 0.5590 - val_loss: 1.0336 - val_accuracy: 0.7450
Epoch 2/10
1000/1000 - 0s - loss: 0.6987 - accuracy: 0.8240 - val_loss: 0.6933 - val_accuracy: 0.8130
Epoch 3/10
1000/1000 - 0s - loss: 0.4780 - accuracy: 0.8760 - val_loss: 0.5774 - val_accuracy: 0.8360
Epoch 4/10
1000/1000 - 0s - loss: 0.3682 - accuracy: 0.9010 - val_loss: 0.5263 - val_accuracy: 0.8450
Epoch 5/10
1000/1000 - 0s - loss: 0.3029 - accuracy: 0.9250 - val_loss: 0.4847 - val_accuracy: 0.8420
Epoch 6/10
1000/1000 - 0s - loss: 0.2572 - accuracy: 0.9340 - val_loss: 0.4661 - val_accuracy: 0.8560
Epoch 7/10
1000/1000 - 0s - loss: 0.2252 - accuracy: 0.9490 - val_loss: 0.4509 - val_accuracy: 0.8540
Epoch 8/10
1000/1000 - 0s - loss: 0.1855 - accuracy: 0.9600 - val_loss: 0.4275 - val_accuracy: 0.8570
Epoch 9/10
1000/1000 - 0s - loss: 0.1605 - accuracy: 0.9670 - val_loss: 0.4292 - val_accuracy: 0.8590
Epoch 10/10
1000/1000 - 0s - loss: 0.1421 - accuracy: 0.9710 - val_loss: 0.4227 - val_accuracy: 0.8650
<tensorflow.python.keras.callbacks.History at 0x25617446d68>

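Checkpoint callback options:

  • The callback provides several options for giving checkpoints unique names and adjusting the checkpoint frequency.

  • Train a new model, saving a uniquely named checkpoint once every two epochs: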

model2 = create_model()
checkpoint_path2 = "training_2/cp-{epoch:04d}.ckpt"
cp_callback = tf.keras.callbacks.ModelCheckpoint(checkpoint_path2,period=2,monitor="val_accuracy",
                                                 save_best_only=True,mode="max",save_weights_only=True,verbose=0)
model2.fit(train_images,train_labels,epochs=10,validation_data=(test_images,test_labels),callbacks = [cp_callback],verbose=2)
WARNING:tensorflow:`period` argument is deprecated. Please use `save_freq` to specify the frequency in number of samples seen.
Train on 1000 samples, validate on 1000 samples
Epoch 1/10
1000/1000 - 0s - loss: 1.6386 - accuracy: 0.5140 - val_loss: 1.1132 - val_accuracy: 0.7000
Epoch 2/10
1000/1000 - 0s - loss: 0.7385 - accuracy: 0.8020 - val_loss: 0.7291 - val_accuracy: 0.7950
Epoch 3/10
1000/1000 - 0s - loss: 0.4973 - accuracy: 0.8720 - val_loss: 0.5980 - val_accuracy: 0.8280
Epoch 4/10
1000/1000 - 0s - loss: 0.3961 - accuracy: 0.8930 - val_loss: 0.5367 - val_accuracy: 0.8410
Epoch 5/10
1000/1000 - 0s - loss: 0.3182 - accuracy: 0.9170 - val_loss: 0.4925 - val_accuracy: 0.8560
Epoch 6/10
1000/1000 - 0s - loss: 0.2772 - accuracy: 0.9250 - val_loss: 0.5132 - val_accuracy: 0.8410
Epoch 7/10
1000/1000 - 0s - loss: 0.2298 - accuracy: 0.9470 - val_loss: 0.4731 - val_accuracy: 0.8530
Epoch 8/10
1000/1000 - 0s - loss: 0.2083 - accuracy: 0.9480 - val_loss: 0.4472 - val_accuracy: 0.8590
Epoch 9/10
1000/1000 - 0s - loss: 0.1766 - accuracy: 0.9670 - val_loss: 0.4370 - val_accuracy: 0.8610
Epoch 10/10
1000/1000 - 0s - loss: 0.1465 - accuracy: 0.9660 - val_loss: 0.4363 - val_accuracy: 0.8660
<tensorflow.python.keras.callbacks.History at 0x25617a7def0>
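Which checkpoint files this run leaves behind can be worked out from the log above. The sketch below is a simplified simulation, not the callback's own code: it assumes the callback checks every period-th epoch and writes a file only on a strict improvement of val_accuracy, and it fills the {epoch:04d} placeholder with the 1-based epoch number.

```python
# val_accuracy per epoch, copied from the training log above
val_accuracy = [0.7000, 0.7950, 0.8280, 0.8410, 0.8560,
                0.8410, 0.8530, 0.8590, 0.8610, 0.8660]

filepath = "training_2/cp-{epoch:04d}.ckpt"
period = 2

best = None
saved = []
for epoch, acc in enumerate(val_accuracy, start=1):
    if epoch % period != 0:         # assumption: only every `period`-th epoch is checked
        continue
    if best is None or acc > best:  # mode="max", strict improvement required
        best = acc
        saved.append(filepath.format(epoch=epoch))

for name in saved:
    print(name)
# training_2/cp-0002.ckpt
# training_2/cp-0004.ckpt
# training_2/cp-0008.ckpt
# training_2/cp-0010.ckpt
```

Epoch 6 is skipped because its val_accuracy (0.8410) only ties the value already recorded at epoch 4.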

Saving the entire model

model3 = create_model()
checkpoint_path3 = "training_3"
# checkpoint_path3 = "training_3.h5"  # alternative: save to a single HDF5 file
cp_callback = tf.keras.callbacks.ModelCheckpoint(checkpoint_path3, save_weights_only=False,verbose=0)
model3.fit(train_images,train_labels,epochs=10,validation_data=(test_images,test_labels),callbacks = [cp_callback],verbose=2)
Train on 1000 samples, validate on 1000 samples
Epoch 1/10
INFO:tensorflow:Assets written to: training_3\assets
1000/1000 - 1s - loss: 1.5397 - accuracy: 0.5510 - val_loss: 1.0164 - val_accuracy: 0.7510
Epoch 2/10
INFO:tensorflow:Assets written to: training_3\assets
1000/1000 - 1s - loss: 0.6896 - accuracy: 0.8190 - val_loss: 0.6884 - val_accuracy: 0.8020
Epoch 3/10
INFO:tensorflow:Assets written to: training_3\assets
1000/1000 - 1s - loss: 0.4903 - accuracy: 0.8640 - val_loss: 0.5970 - val_accuracy: 0.8200
Epoch 4/10
INFO:tensorflow:Assets written to: training_3\assets
1000/1000 - 1s - loss: 0.3805 - accuracy: 0.9050 - val_loss: 0.5356 - val_accuracy: 0.8390
Epoch 5/10
INFO:tensorflow:Assets written to: training_3\assets
1000/1000 - 0s - loss: 0.3025 - accuracy: 0.9240 - val_loss: 0.4885 - val_accuracy: 0.8510
Epoch 6/10
INFO:tensorflow:Assets written to: training_3\assets
1000/1000 - 1s - loss: 0.2783 - accuracy: 0.9250 - val_loss: 0.4767 - val_accuracy: 0.8500
Epoch 7/10
INFO:tensorflow:Assets written to: training_3\assets
1000/1000 - 1s - loss: 0.2241 - accuracy: 0.9400 - val_loss: 0.4600 - val_accuracy: 0.8490
Epoch 8/10
INFO:tensorflow:Assets written to: training_3\assets
1000/1000 - 0s - loss: 0.2078 - accuracy: 0.9500 - val_loss: 0.4576 - val_accuracy: 0.8490
Epoch 9/10
INFO:tensorflow:Assets written to: training_3\assets
1000/1000 - 0s - loss: 0.1631 - accuracy: 0.9670 - val_loss: 0.4456 - val_accuracy: 0.8630
Epoch 10/10
INFO:tensorflow:Assets written to: training_3\assets
1000/1000 - 0s - loss: 0.1491 - accuracy: 0.9700 - val_loss: 0.4264 - val_accuracy: 0.8570
<tensorflow.python.keras.callbacks.History at 0x2561908add8>

2. Manual saving

model4 = create_model()
model4.fit(train_images,train_labels,epochs=10,validation_data=(test_images,test_labels),verbose=2)
Train on 1000 samples, validate on 1000 samples
Epoch 1/10
1000/1000 - 0s - loss: 1.5830 - accuracy: 0.5460 - val_loss: 1.0903 - val_accuracy: 0.6990
Epoch 2/10
1000/1000 - 0s - loss: 0.7136 - accuracy: 0.8010 - val_loss: 0.7605 - val_accuracy: 0.7720
Epoch 3/10
1000/1000 - 0s - loss: 0.4949 - accuracy: 0.8690 - val_loss: 0.5975 - val_accuracy: 0.8190
Epoch 4/10
1000/1000 - 0s - loss: 0.3896 - accuracy: 0.8920 - val_loss: 0.5583 - val_accuracy: 0.8350
Epoch 5/10
1000/1000 - 0s - loss: 0.3083 - accuracy: 0.9200 - val_loss: 0.5231 - val_accuracy: 0.8460
Epoch 6/10
1000/1000 - 0s - loss: 0.2686 - accuracy: 0.9330 - val_loss: 0.4792 - val_accuracy: 0.8460
Epoch 7/10
1000/1000 - 0s - loss: 0.2288 - accuracy: 0.9370 - val_loss: 0.4640 - val_accuracy: 0.8560
Epoch 8/10
1000/1000 - 0s - loss: 0.1974 - accuracy: 0.9530 - val_loss: 0.4606 - val_accuracy: 0.8570
Epoch 9/10
1000/1000 - 0s - loss: 0.1637 - accuracy: 0.9680 - val_loss: 0.4609 - val_accuracy: 0.8480
Epoch 10/10
1000/1000 - 0s - loss: 0.1481 - accuracy: 0.9710 - val_loss: 0.4347 - val_accuracy: 0.8660
<tensorflow.python.keras.callbacks.History at 0x25619f2b5c0>

Saving the weights

model4.save_weights('./checkpoints/my_checkpoint')

Saving the entire model

model5 = create_model()
model5.fit(train_images,train_labels,epochs=10,validation_data=(test_images,test_labels),verbose=2)
Train on 1000 samples, validate on 1000 samples
Epoch 1/10
1000/1000 - 0s - loss: 1.5685 - accuracy: 0.5620 - val_loss: 1.0435 - val_accuracy: 0.7190
Epoch 2/10
1000/1000 - 0s - loss: 0.6908 - accuracy: 0.8280 - val_loss: 0.7122 - val_accuracy: 0.7970
Epoch 3/10
1000/1000 - 0s - loss: 0.4719 - accuracy: 0.8750 - val_loss: 0.6066 - val_accuracy: 0.8010
Epoch 4/10
1000/1000 - 0s - loss: 0.3819 - accuracy: 0.8930 - val_loss: 0.5493 - val_accuracy: 0.8250
Epoch 5/10
1000/1000 - 0s - loss: 0.3040 - accuracy: 0.9230 - val_loss: 0.5098 - val_accuracy: 0.8350
Epoch 6/10
1000/1000 - 0s - loss: 0.2652 - accuracy: 0.9350 - val_loss: 0.4669 - val_accuracy: 0.8480
Epoch 7/10
1000/1000 - 0s - loss: 0.2081 - accuracy: 0.9490 - val_loss: 0.4626 - val_accuracy: 0.8440
Epoch 8/10
1000/1000 - 0s - loss: 0.1879 - accuracy: 0.9610 - val_loss: 0.4384 - val_accuracy: 0.8530
Epoch 9/10
1000/1000 - 0s - loss: 0.1588 - accuracy: 0.9650 - val_loss: 0.4378 - val_accuracy: 0.8510
Epoch 10/10
1000/1000 - 0s - loss: 0.1538 - accuracy: 0.9660 - val_loss: 0.4495 - val_accuracy: 0.8540
<tensorflow.python.keras.callbacks.History at 0x2561cd88e10>
model5.save('model5')
INFO:tensorflow:Assets written to: model5\assets

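When deploying a model, the first step is usually to export the entire trained model as a set of files in a standard format, which can then be deployed on different platforms. The model can be run again without the source code that built it, which makes this format well suited to sharing and deployment. TensorFlow Serving (server-side deployment), TensorFlow Lite (mobile deployment), and TensorFlow.js all use this format.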
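Before handing an exported directory such as model5 to a deployment tool, it can help to confirm that it has the expected layout. The check below is a minimal sketch that assumes the TensorFlow 2.x SavedModel layout (a saved_model.pb file next to variables/ and assets/ folders); the helper name looks_like_saved_model is hypothetical, and the demonstration runs on an empty throwaway directory rather than a real export.

```python
import os
import tempfile


def looks_like_saved_model(path):
    """Heuristic: a SavedModel directory holds saved_model.pb and a variables/ folder."""
    return (os.path.isfile(os.path.join(path, "saved_model.pb"))
            and os.path.isdir(os.path.join(path, "variables")))


# Demonstration on a temporary directory that mimics the layout
with tempfile.TemporaryDirectory() as tmp:
    export_dir = os.path.join(tmp, "model5")
    os.makedirs(os.path.join(export_dir, "variables"))
    os.makedirs(os.path.join(export_dir, "assets"))
    before = looks_like_saved_model(export_dir)   # saved_model.pb still missing
    open(os.path.join(export_dir, "saved_model.pb"), "wb").close()
    after = looks_like_saved_model(export_dir)

print(before, after)  # False True
```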

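Saving the model as an HDF5 file

  • HDF (Hierarchical Data Format) is a file format, together with its supporting libraries, designed for storing and processing large volumes of scientific data.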
  • https://support.hdfgroup.org/HDF5/
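An HDF5 file can be recognized without any extra library by its 8-byte signature. The sketch below checks only the start of the file (the format also allows the signature at later offsets, which this ignores); is_hdf5 is a hypothetical helper, and the two temporary files stand in for a real file such as my_model.h5.

```python
import os
import tempfile

HDF5_SIGNATURE = b"\x89HDF\r\n\x1a\n"  # the 8-byte HDF5 format signature


def is_hdf5(path):
    """Return True if the file begins with the HDF5 signature."""
    with open(path, "rb") as f:
        return f.read(8) == HDF5_SIGNATURE


with tempfile.TemporaryDirectory() as tmp:
    fake_h5 = os.path.join(tmp, "fake_model.h5")
    with open(fake_h5, "wb") as f:
        f.write(HDF5_SIGNATURE + b"\x00" * 16)   # signature followed by padding
    text_file = os.path.join(tmp, "notes.txt")
    with open(text_file, "wb") as f:
        f.write(b"not an HDF5 file")
    h5_result = is_hdf5(fake_h5)
    text_result = is_hdf5(text_file)

print(h5_result, text_result)  # True False
```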
# Create a new model instance
model6 = create_model()

# Train the model
model6.fit(train_images, train_labels, epochs=5)

# Save the entire model to an HDF5 file
model6.save('my_model.h5')
Train on 1000 samples
Epoch 1/5
1000/1000 [==============================] - 0s 254us/sample - loss: 1.6071 - accuracy: 0.5330
Epoch 2/5
1000/1000 [==============================] - 0s 56us/sample - loss: 0.7051 - accuracy: 0.8150
Epoch 3/5
1000/1000 [==============================] - 0s 59us/sample - loss: 0.4864 - accuracy: 0.8680
Epoch 4/5
1000/1000 [==============================] - 0s 56us/sample - loss: 0.3656 - accuracy: 0.9130
Epoch 5/5
1000/1000 [==============================] - 0s 59us/sample - loss: 0.3108 - accuracy: 0.9150

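II. Restoring a Model

Restoring model parameters

Create a new, untrained model. When restoring only the weights, you need a model with the same architecture as the original. Because the architecture is identical, the weights can be shared even though this is a different instance of the model. Now rebuild a fresh, untrained model and evaluate it on the test set. An untrained model performs at chance level (about 10% accuracy):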

# Create a basic model instance
model7 = create_model()
# Evaluate the model
loss, acc = model7.evaluate(test_images,  test_labels, verbose=2)
print("Untrained model, accuracy: {:5.2f}%".format(100*acc))
1000/1 - 0s - loss: 2.3194 - accuracy: 0.0990
Untrained model, accuracy:  9.90%
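The figure of roughly 10% is what uniform guessing over ten classes would give. A quick standard-library check, using random guesses as a stand-in for an untrained network (the sample count and seed are arbitrary choices):

```python
import random

random.seed(0)
num_classes = 10
num_samples = 10_000

# Random "true" labels and independent random guesses
labels = [random.randrange(num_classes) for _ in range(num_samples)]
guesses = [random.randrange(num_classes) for _ in range(num_samples)]

accuracy = sum(g == y for g, y in zip(guesses, labels)) / num_samples
print("Random-guess accuracy: {:5.2f}%".format(100 * accuracy))
```

The printed value should land close to 1/10, in line with the untrained model's 9.90%.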

Then load the weights from the checkpoint and re-evaluate:

# Load the weights
model7.load_weights(checkpoint_path1)

# Re-evaluate the model
loss,acc = model7.evaluate(test_images,  test_labels, verbose=2)
print("Restored model, accuracy: {:5.2f}%".format(100*acc))
1000/1 - 0s - loss: 0.5116 - accuracy: 0.8650
Restored model, accuracy: 86.50%
import os
checkpoint_dir = os.path.dirname(checkpoint_path2)
latest = tf.train.latest_checkpoint(checkpoint_dir)
latest
'training_2\\cp-0010.ckpt'
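model8 = create_model()
# Load the weights from the most recently saved checkpoint
model8.load_weights(latest)

# Re-evaluate the restored model (the original listing evaluated model3 here by mistake;
# the output recorded below comes from that model3 run, so model8's numbers will differ)
loss, acc = model8.evaluate(test_images, test_labels, verbose=2)
print("Restored model, accuracy: {:5.2f}%".format(100*acc))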
1000/1 - 0s - loss: 0.4467 - accuracy: 0.8570
Restored model, accuracy: 85.70%
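The directory lookup used above can be illustrated without TensorFlow. As far as I know, tf.train.latest_checkpoint consults the checkpoint state file that is written next to the checkpoints; the sketch below is only a filename-based stand-in that picks the highest epoch number. The directory listing and the helper newest_prefix are hypothetical.

```python
import os
import re

checkpoint_path2 = "training_2/cp-{epoch:04d}.ckpt"
checkpoint_dir = os.path.dirname(checkpoint_path2)   # "training_2"

# Hypothetical directory listing for a TensorFlow-format checkpoint directory
files = ["checkpoint",
         "cp-0002.ckpt.index", "cp-0002.ckpt.data-00000-of-00001",
         "cp-0010.ckpt.index", "cp-0010.ckpt.data-00000-of-00001",
         "cp-0004.ckpt.index", "cp-0004.ckpt.data-00000-of-00001"]


def newest_prefix(names):
    """Return the checkpoint prefix with the highest epoch number, or None."""
    epochs = {}
    for name in names:
        match = re.match(r"(cp-(\d+)\.ckpt)\.index$", name)
        if match:
            epochs[int(match.group(2))] = match.group(1)
    return epochs[max(epochs)] if epochs else None


latest_prefix = newest_prefix(files)
print(checkpoint_dir, latest_prefix)  # training_2 cp-0010.ckpt
```

Note that the value passed to load_weights is the prefix (cp-0010.ckpt), not one of the .index or .data files.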

Restoring from a specific file name

# Load one particular checkpoint by name
model9 = create_model()
model9.load_weights('training_2/cp-0002.ckpt')

# Re-evaluate the model
loss,acc = model9.evaluate(test_images,  test_labels, verbose=2)
print("Restored model, accuracy: {:5.2f}%".format(100*acc))
1000/1 - 0s - loss: 0.8354 - accuracy: 0.7950
Restored model, accuracy: 79.50%

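Restoring the entire model

Restoring a model from an HDF5 file

# Recreate the exact same model, including its weights and optimizer
new_model = tf.keras.models.load_model('my_model.h5')
# Display the network architecture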
new_model.summary()
Model: "sequential_14"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
dense_28 (Dense)             (None, 128)               100480    
_________________________________________________________________
dropout_14 (Dropout)         (None, 128)               0         
_________________________________________________________________
dense_29 (Dense)             (None, 10)                1290      
=================================================================
Total params: 101,770
Trainable params: 101,770
Non-trainable params: 0
_________________________________________________________________
loss, acc = new_model.evaluate(test_images,  test_labels, verbose=2)
print("Restored model, accuracy: {:5.2f}%".format(100*acc))
1000/1 - 0s - loss: 0.5647 - accuracy: 0.8430
Restored model, accuracy: 84.30%
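The parameter counts in the summary above can be verified by hand: a Dense layer has one weight per input-unit pair plus one bias per unit, and Dropout adds nothing. A short check (dense_params is a hypothetical helper name):

```python
def dense_params(inputs, units):
    """Weights (inputs x units) plus one bias per unit."""
    return inputs * units + units


hidden = dense_params(784, 128)   # first Dense layer: 784 inputs, 128 units
dropout = 0                       # Dropout has no trainable parameters
output = dense_params(128, 10)    # second Dense layer: 128 inputs, 10 units

print(hidden, output, hidden + dropout + output)  # 100480 1290 101770
```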

Restoring the entire model saved by the callback

new_model = tf.keras.models.load_model('training_3')
# Display the network architecture
new_model.summary()
Model: "sequential_9"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
dense_18 (Dense)             (None, 128)               100480    
_________________________________________________________________
dropout_9 (Dropout)          (None, 128)               0         
_________________________________________________________________
dense_19 (Dense)             (None, 10)                1290      
=================================================================
Total params: 101,770
Trainable params: 101,770
Non-trainable params: 0
_________________________________________________________________