1. Paper Download Link

https://arxiv.org/abs/1512.00567

2. Architecture Table

[Figure: InceptionV3 network architecture table]


3. The Three Improved Inception Modules

The original InceptionV1 module:
[Figure: the original InceptionV1 module]
For the filter counts and the computation-cost analysis of the modules below, see this InceptionV1 article:
https://mydreamambitious.blog.csdn.net/article/details/124237000

(1) Improved Inception module 1:

[Figure: improved Inception module 1, with the 5x5 convolution factorized into two 3x3 convolutions]

Note: the 5x5 convolution above can be replaced by two stacked 3x3 convolutions, which substantially reduces the parameter count.
Why can a 5x5 convolution be replaced by two 3x3 convolutions? Because the stacked pair has the same receptive field while using fewer parameters. By the same reasoning, a 7x7 convolution can be replaced by three 3x3 convolutions.

Parameter and computation comparison. Suppose the input feature map has spatial size WxH with C channels, and the output also has C channels.
(1) A single 5x5 convolution:
Parameters: (5x5xC)xC = 25C^2
Computation: (WxHxC)x(5x5xC) = 25WHC^2
(2) Two 3x3 convolutions replacing the 5x5:
Parameters: 2x(3x3xC)xC = 18C^2
Computation: 2x(WxHxC)x(3x3xC) = 18WHC^2
Compared with the direct 5x5 convolution, both parameters and computation drop by 28%.
The same comparison for a 7x7 convolution, under the same assumptions:
(3) A single 7x7 convolution:
Parameters: (7x7xC)xC = 49C^2
Computation: (WxHxC)x(7x7xC) = 49WHC^2
(4) Three 3x3 convolutions replacing the 7x7:
Parameters: 3x(3x3xC)xC = 27C^2
Computation: 3x(WxHxC)x(3x3xC) = 27WHC^2
Compared with the direct 7x7 convolution, parameters and computation drop by nearly half (27/49 of the original).
Note: in a convolutional neural network, the receptive field of a unit in a layer's output feature map is the region of the input image that maps to (i.e., influences) that unit.
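
To sanity-check these numbers, here is a minimal sketch (assuming TensorFlow 2.x; the channel count C = 64 and the 32x32 input size are illustrative) that counts the parameters of one 5x5 convolution versus two stacked 3x3 convolutions:

import tensorflow as tf
from tensorflow.keras import layers

C = 64  # illustrative channel count

# One 5x5 convolution, C -> C channels (bias disabled to match the formulas above).
five = tf.keras.Sequential([layers.Conv2D(C, 5, padding='same', use_bias=False)])
five.build((None, 32, 32, C))

# Two stacked 3x3 convolutions, C -> C -> C channels.
two_threes = tf.keras.Sequential([
    layers.Conv2D(C, 3, padding='same', use_bias=False),
    layers.Conv2D(C, 3, padding='same', use_bias=False),
])
two_threes.build((None, 32, 32, C))

print(five.count_params())        # 25*C^2 = 102400
print(two_threes.count_params())  # 18*C^2 = 73728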

(2) Improved Inception module 2:

[Figure: improved Inception module 2, factorizing an nxn convolution into 1xn and nx1 convolutions]
Replace the 5x5 convolution with two 1x3 + 3x1 pairs, and replace a 3x3 convolution with a single 1x3 + 3x1 pair.
(5) Using two 3x3 convolutions in place of the 5x5:
Parameters: 2x(3x3xC)xC = 18C^2
Computation: 2x(WxHxC)x(3x3xC) = 18WHC^2
(6) As shown in the figure above, using two 1x3 + 3x1 pairs in place of the 5x5:
Parameters: (1x3xC)xC + (3x1xC)xC + (1x3xC)xC + (3x1xC)xC = 12C^2
Computation: 2x[(WxHxC)x(1x3xC) + (WxHxC)x(3x1xC)] = 12WHC^2
(7) Using one 1x3 + 3x1 pair in place of a 3x3:
Parameters: (1x3xC)xC + (3x1xC)xC = 6C^2
Computation: (WxHxC)x(1x3xC) + (WxHxC)x(3x1xC) = 6WHC^2
Compared with replacing the 5x5 by two 3x3 convolutions, the asymmetric factorization cuts parameters and computation by a further third (12C^2 vs. 18C^2).
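
The same kind of check for the asymmetric factorization, again as a minimal sketch (assuming TensorFlow 2.x; C = 64 is illustrative): a 1x3 followed by a 3x1 convolution has 6C^2 parameters, versus 9C^2 for a single 3x3.

import tensorflow as tf
from tensorflow.keras import layers

C = 64  # illustrative channel count

# 1x3 followed by 3x1, C -> C -> C channels: (3 + 3)*C^2 parameters.
asym = tf.keras.Sequential([
    layers.Conv2D(C, (1, 3), padding='same', use_bias=False),
    layers.Conv2D(C, (3, 1), padding='same', use_bias=False),
])
asym.build((None, 32, 32, C))

print(asym.count_params())  # 6*C^2 = 24576, vs 9*C^2 = 36864 for one 3x3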

(3) Improved Inception module 3:

[Figure: improved Inception module 3, with parallel 1x3 and 3x1 convolutions expanding the filter bank]
This structure mainly serves to expand the number of channels (a wider filter bank), so it is placed after all the other Inception modules, on the coarsest grid; a minimal sketch follows.
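
A minimal sketch of this expansion (assuming TensorFlow 2.x; the 8x8x384 input and the filter count of 384 are illustrative): the 1x3 and 3x1 convolutions are applied in parallel to the same input and their outputs concatenated, widening the block's output instead of deepening the branch.

import tensorflow as tf
from tensorflow.keras import layers

inp = layers.Input(shape=[8, 8, 384])
a = layers.Conv2D(384, (1, 3), padding='same', activation='relu')(inp)  # 8x8x384
b = layers.Conv2D(384, (3, 1), padding='same', activation='relu')(inp)  # 8x8x384
out = layers.Concatenate(axis=-1)([a, b])                               # 8x8x768
model = tf.keras.Model(inp, out)
print(model.output_shape)  # (None, 8, 8, 768)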


4. Shrinking the Feature Map While Increasing Channels

[Figure: two naive orderings for halving the grid: pooling then channel expansion, or channel expansion then pooling]
Option 1, pooling first and then expanding the channels: the pooling step discards a great deal of information, so the features that later layers can extract from the resulting maps are poorer; this creates a representational bottleneck and violates principle one.
Option 2, expanding the channels first and then pooling: the expansion convolution now runs at full resolution, so the computational cost is several times higher, which is unfavorable for training.

After the improvement:
[Figure: efficient grid-size reduction with parallel stride-2 convolution and pooling branches]
Improved scheme: expand the channel count and downsample in parallel, which avoids the representational bottleneck while keeping the computation efficient; a minimal sketch follows.
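
A minimal sketch of this reduction block (assuming TensorFlow 2.x; the 35x35x288 input and the 384 convolution filters are illustrative): a stride-2 convolution branch and a stride-2 pooling branch run in parallel and are concatenated, so the grid is halved and the channel count grows in a single step.

import tensorflow as tf
from tensorflow.keras import layers

inp = layers.Input(shape=[35, 35, 288])
conv_branch = layers.Conv2D(384, 3, strides=2, padding='valid')(inp)  # 17x17x384
pool_branch = layers.MaxPool2D(3, strides=2)(inp)                     # 17x17x288
out = layers.Concatenate(axis=-1)([conv_branch, pool_branch])         # 17x17x672
model = tf.keras.Model(inp, out)
print(model.output_shape)  # (None, 17, 17, 672)

The reduction blocks in the full implementation of section 7 additionally include a deeper 1x1 -> 3x3 -> stride-2 3x3 convolution branch, following the same idea.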


5. Four Design Principles

(1) Principle one:

Avoid representational bottlenecks, i.e., do not reduce dimensions or compress features too aggressively, especially in the shallow layers of the network: excessive early dimensionality reduction discards too much information. Dimensionality reduction also loses correlation information between channels, leaving only a dense embedding of the information.
[Figure: illustration of principle one (avoiding representational bottlenecks)]

(2) Principle two:

The more mutually independent features the representation has, the faster the network converges, because the input information is disentangled more thoroughly.

(3) Principle three:

A 1x1 convolution can be used for dimensionality reduction before the large 3x3 and 5x5 convolutions. Large convolutions aggregate spatial information over a large receptive field; applying a 1x1 convolution first not only lowers the computation and parameter count, but also loses very little information, because adjacent units are strongly correlated. A minimal sketch follows.
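
A minimal sketch of this reduce-then-convolve pattern (assuming TensorFlow 2.x; the 256 -> 64 channel counts are illustrative): squeezing 256 channels down to 64 with a 1x1 convolution before a 3x3 convolution cuts the parameters from 3x3x256x256 = 589,824 to 256x64 + 3x3x64x256 = 163,840.

import tensorflow as tf
from tensorflow.keras import layers

reduced = tf.keras.Sequential([
    layers.Conv2D(64, 1, padding='same', use_bias=False),   # 1x1 reduce: 256 -> 64
    layers.Conv2D(256, 3, padding='same', use_bias=False),  # 3x3 on the reduced map
])
reduced.build((None, 35, 35, 256))

print(reduced.count_params())  # 16384 + 147456 = 163840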

(4) Principle four:

Balance the width and depth of the network. Increasing both together improves performance as well as computational efficiency. Unlike VGG16, where most of the parameters are concentrated in the fully connected layers (which hurts both performance and efficiency), Inception distributes its parameters evenly across the layers, keeping width and depth balanced, so both the final efficiency and accuracy improve.


6. Summary

(1) A key reason for GoogLeNet's success is its heavy use of 1x1 convolutions for dimensionality reduction, which lowers computation and parameter count (a 1x1 convolution can be viewed as a special case of convolution factorization that improves computational efficiency).
(2) Convolutions over adjacent receptive fields are highly correlated, and 1x1 convolutions help preserve these correlations between neighboring units while reducing dimensions.
(3) InceptionV1 had two auxiliary classifiers, and near the end of training the model with auxiliary heads reached slightly higher accuracy. InceptionV3 drops the shallower one: the auxiliary classifiers were found not to make the model converge faster, and removing the shallow auxiliary head has no noticeable effect.


7. Network Implementation

import tensorflow as tf
from tensorflow.keras import layers
from tensorflow.keras.models import Model

# Let TensorFlow allocate GPU memory on demand (TF2 idiom; the original
# TF1-style ConfigProto settings, including the 50% per-process memory cap,
# only take effect inside a tf.compat.v1.Session).
for gpu in tf.config.list_physical_devices('GPU'):
    tf.config.experimental.set_memory_growth(gpu, True)


# Filter counts for the three type-1 blocks (35x35 grid):
# [1x1, 5x5-reduce, 5x5, double-3x3-reduce, double-3x3, pool-proj]
inceptionV3_One={'1a':[64,48,64,96,96,32],
                 '2a':[64,48,64,96,96,64],
                 '3a':[64,48,64,96,96,64]
}


# Filter counts for the four type-2 blocks (17x17 grid, 1x7/7x1 factorization):
inceptionV3_Two={'1b':[192,128,128,192,128,128,128,128,192,192],
                 '2b':[192,160,160,192,160,160,160,160,192,192],
                 '3b':[192,160,160,192,160,160,160,160,192,192],
                 '4b':[192,192,192,192,192,192,192,192,192,192]
}

# Filter counts for the two type-3 blocks (8x8 grid, expanded filter bank):
inceptionV3_Three={
                '1c':[320,384,384,384,448,384,384,384,192],
                '2c':[320,384,384,384,448,384,384,384,192]
}

def InceptionV3(inceptionV3_One,inceptionV3_Two,inceptionV3_Three):
    keys_one=list(inceptionV3_One.keys())
    keys_two=list(inceptionV3_Two.keys())
    keys_three=list(inceptionV3_Three.keys())

    input=layers.Input(shape=[299,299,3])

    # Stem: initial convolution and pooling layers (299x299x3 -> 35x35x192)
    conv1_one = layers.Conv2D(32, kernel_size=[3, 3], strides=[2, 2], padding='valid')(input)
    conv1_batch=layers.BatchNormalization()(conv1_one)
    conv1relu=layers.Activation('relu')(conv1_batch)
    conv2_one = layers.Conv2D(32, kernel_size=[3, 3], strides=[1,1],padding='valid')(conv1relu)
    conv2_batch=layers.BatchNormalization()(conv2_one)
    conv2relu=layers.Activation('relu')(conv2_batch)
    conv3_padded = layers.Conv2D(64, kernel_size=[3, 3], strides=[1,1],padding='same')(conv2relu)
    conv3_batch=layers.BatchNormalization()(conv3_padded)
    con3relu=layers.Activation('relu')(conv3_batch)
    pool1_one = layers.MaxPool2D(pool_size=[3, 3], strides=[2, 2])(con3relu)
    conv4_one = layers.Conv2D(80, kernel_size=[3,3], strides=[1,1], padding='valid')(pool1_one)
    conv4_batch=layers.BatchNormalization()(conv4_one)
    conv4relu=layers.Activation('relu')(conv4_batch)
    conv5_one = layers.Conv2D(192, kernel_size=[3, 3], strides=[2,2], padding='valid')(conv4relu)
    conv5_batch = layers.BatchNormalization()(conv5_one)
    x=layers.Activation('relu')(conv5_batch)

    """
        filter11:1x1的卷积核个数
        filter13:3x3卷积之前的1x1卷积核个数
        filter33:3x3卷积个数
        filter15:使用3x3卷积代替5x5卷积之前的1x1卷积核个数
        filter55:使用3x3卷积代替5x5卷积个数
        filtermax:最大池化之后的1x1卷积核个数
    """
    for i in range(3):
        conv11 = layers.Conv2D((int)(inceptionV3_One[keys_one[i]][0]), kernel_size=[1, 1], strides=[1, 1], padding='same')(x)
        batchnormaliztion11 = layers.BatchNormalization()(conv11)
        conv11relu = layers.Activation('relu')(batchnormaliztion11)

        conv13 = layers.Conv2D((int)(inceptionV3_One[keys_one[i]][1]), kernel_size=[1, 1], strides=[1, 1], padding='same')(x)
        batchnormaliztion13 = layers.BatchNormalization()(conv13)
        conv13relu = layers.Activation('relu')(batchnormaliztion13)
        conv33 = layers.Conv2D((int)(inceptionV3_One[keys_one[i]][2]), kernel_size=[5, 5], strides=[1, 1], padding='same')(conv13relu)
        batchnormaliztion33 = layers.BatchNormalization()(conv33)
        conv33relu = layers.Activation('relu')(batchnormaliztion33)

        conv1533 = layers.Conv2D((int)(inceptionV3_One[keys_one[i]][3]), kernel_size=[1, 1], strides=[1, 1], padding='same')(x)
        batchnormaliztion1533 = layers.BatchNormalization()(conv1533)
        conv1522relu = layers.Activation('relu')(batchnormaliztion1533)
        conv5533first = layers.Conv2D((int)(inceptionV3_One[keys_one[i]][4]), kernel_size=[3, 3], strides=[1, 1], padding='same')(conv1522relu)
        batchnormaliztion5533first = layers.BatchNormalization()(conv5533first)
        conv5533firstrelu = layers.Activation('relu')(batchnormaliztion5533first)
        conv5533last = layers.Conv2D((int)(inceptionV3_One[keys_one[i]][4]), kernel_size=[3, 3], strides=[1, 1], padding='same')(conv5533firstrelu)
        batchnormaliztion5533last = layers.BatchNormalization()(conv5533last)
        conv5533lastrelu = layers.Activation('relu')(batchnormaliztion5533last)

        # Pooling branch: 3x3 average pooling followed by a 1x1 projection
        avgpool = layers.AveragePooling2D(pool_size=[3, 3], strides=[1, 1], padding='same')(x)
        avgconv11 = layers.Conv2D((int)(inceptionV3_One[keys_one[i]][5]), kernel_size=[1, 1], strides=[1, 1], padding='same')(avgpool)
        batchnormaliztionpool = layers.BatchNormalization()(avgconv11)
        convmaxrelu = layers.Activation('relu')(batchnormaliztionpool)

        x=tf.concat([
            conv11relu,conv33relu,conv5533lastrelu,convmaxrelu
        ],axis=3)

    # Grid-size reduction: 35x35 -> 17x17 (parallel stride-2 conv and pooling branches)
    conv1_two = layers.Conv2D(384, kernel_size=[3, 3], strides=[2, 2], padding='valid')(x)
    conv1batch=layers.BatchNormalization()(conv1_two)
    conv1_tworelu=layers.Activation('relu')(conv1batch)

    conv2_two = layers.Conv2D(64, kernel_size=[1, 1], strides=[1, 1], padding='same')(x)
    conv2batch=layers.BatchNormalization()(conv2_two)
    conv2_tworelu=layers.Activation('relu')(conv2batch)
    conv3_two = layers.Conv2D( 96, kernel_size=[3, 3], strides=[1,1], padding='same')(conv2_tworelu)
    conv3batch=layers.BatchNormalization()(conv3_two)
    conv3_tworelu=layers.Activation('relu')(conv3batch)
    conv4_two = layers.Conv2D( 96, kernel_size=[3, 3], strides=[2, 2], padding='valid')(conv3_tworelu)
    conv4batch=layers.BatchNormalization()(conv4_two)
    conv4_tworelu=layers.Activation('relu')(conv4batch)

    maxpool = layers.MaxPool2D(pool_size=[3, 3], strides=[2, 2])(x)
    x=tf.concat([
        conv1_tworelu,conv4_tworelu,maxpool
    ],axis=3)
    """
        filter11:1x1的卷积核个数
        filter13:使用1x3,3x1卷积代替3x3卷积之前的1x1卷积核个数
        filter33:使用1x3,3x1卷积代替3x3卷积的个数
        filter15:使用1x3,3x1,1x3,3x1卷积卷积代替5x5卷积之前的1x1卷积核个数
        filter55:使用1x3,3x1,1x3,3x1卷积代替5x5卷积个数
        filtermax:最大池化之后的1x1卷积核个数
    """
    for i in range(4):
        conv11 = layers.Conv2D((int)(inceptionV3_Two[keys_two[i]][0]), kernel_size=[1, 1], strides=[1, 1], padding='same')(x)
        batchnormaliztion11 = layers.BatchNormalization()(conv11)
        conv11relu=layers.Activation('relu')(batchnormaliztion11)

        conv13 = layers.Conv2D((int)(inceptionV3_Two[keys_two[i]][1]), kernel_size=[1, 1], strides=[1, 1], padding='same')(x)
        batchnormaliztion13 = layers.BatchNormalization()(conv13)
        conv13relu=layers.Activation('relu')(batchnormaliztion13)
        conv3313 = layers.Conv2D((int)(inceptionV3_Two[keys_two[i]][2]), kernel_size=[1, 7], strides=[1, 1], padding='same')(conv13relu)
        batchnormaliztion3313 = layers.BatchNormalization()(conv3313)
        conv3313relu=layers.Activation('relu')(batchnormaliztion3313)
        conv3331 = layers.Conv2D((int)(inceptionV3_Two[keys_two[i]][3]), kernel_size=[7, 1], strides=[1, 1], padding='same')(conv3313relu)
        batchnormaliztion3331 = layers.BatchNormalization()(conv3331)
        conv3331relu=layers.Activation('relu')(batchnormaliztion3331)

        conv15 = layers.Conv2D((int)(inceptionV3_Two[keys_two[i]][4]), kernel_size=[1, 1], strides=[1, 1], padding='same')(x)
        batchnormaliztion15 = layers.BatchNormalization()(conv15)
        conv15relu=layers.Activation('relu')(batchnormaliztion15)
        conv1513first = layers.Conv2D((int)(inceptionV3_Two[keys_two[i]][5]), kernel_size=[1, 7], strides=[1, 1], padding='same')(conv15relu)
        batchnormaliztion1513first = layers.BatchNormalization()(conv1513first)
        conv1513firstrelu=layers.Activation('relu')(batchnormaliztion1513first)
        conv1531second = layers.Conv2D((int)(inceptionV3_Two[keys_two[i]][6]), kernel_size=[7, 1], strides=[1, 1], padding='same')(conv1513firstrelu)
        batchnormaliztion1531second = layers.BatchNormalization()(conv1531second)
        conv1531secondrelu=layers.Activation('relu')(batchnormaliztion1531second)
        conv1513third = layers.Conv2D((int)(inceptionV3_Two[keys_two[i]][7]), kernel_size=[1, 7], strides=[1, 1], padding='same')(conv1531secondrelu)
        batchnormaliztion1513third = layers.BatchNormalization()(conv1513third)
        conv1513thirdrelu=layers.Activation('relu')(batchnormaliztion1513third)
        conv1531last = layers.Conv2D((int)(inceptionV3_Two[keys_two[i]][8]), kernel_size=[7, 1], strides=[1, 1], padding='same')(conv1513thirdrelu)
        batchnormaliztion1531last = layers.BatchNormalization()(conv1531last)
        conv1531lastrelu=layers.Activation('relu')(batchnormaliztion1531last)

        # Pooling branch: 3x3 average pooling followed by a 1x1 projection
        avgpool = layers.AveragePooling2D(pool_size=[3, 3], strides=[1, 1], padding='same')(x)
        avgconv11 = layers.Conv2D((int)(inceptionV3_Two[keys_two[i]][9]), kernel_size=[1, 1], strides=[1, 1], padding='same')(avgpool)
        maxconv11batch = layers.BatchNormalization()(avgconv11)
        maxconv11relu = layers.Activation('relu')(maxconv11batch)

        x=tf.concat([
            conv11relu,conv3331relu,conv1531lastrelu,maxconv11relu
        ],axis=3)

    # Grid-size reduction: 17x17 -> 8x8 (parallel stride-2 conv and pooling branches)
    conv11_three=layers.Conv2D(192, kernel_size=[1, 1], strides=[1, 1], padding='same')(x)
    conv11batch=layers.BatchNormalization()(conv11_three)
    conv11relu=layers.Activation('relu')(conv11batch)
    conv33_three=layers.Conv2D(320, kernel_size=[3, 3], strides=[2, 2], padding='valid')(conv11relu)
    conv33batch=layers.BatchNormalization()(conv33_three)
    conv33relu=layers.Activation('relu')(conv33batch)

    conv7711_three=layers.Conv2D(192, kernel_size=[1, 1], strides=[1, 1], padding='same')(x)
    conv77batch=layers.BatchNormalization()(conv7711_three)
    conv77relu=layers.Activation('relu')(conv77batch)
    conv7717_three=layers.Conv2D(192, kernel_size=[1, 7], strides=[1, 1], padding='same')(conv77relu)
    conv7717batch=layers.BatchNormalization()(conv7717_three)
    conv7717relu=layers.Activation('relu')(conv7717batch)
    conv7771_three=layers.Conv2D(192, kernel_size=[7, 1], strides=[1, 1], padding='same')(conv7717relu)
    conv7771batch=layers.BatchNormalization()(conv7771_three)
    conv7771relu=layers.Activation('relu')(conv7771batch)
    conv33_three=layers.Conv2D(192, kernel_size=[3, 3], strides=[2, 2], padding='valid')(conv7771relu)
    conv3377batch=layers.BatchNormalization()(conv33_three)
    conv3377relu=layers.Activation('relu')(conv3377batch)

    convmax_three=layers.MaxPool2D(pool_size=[3, 3], strides=[2, 2])(x)
    x=tf.concat([
        conv33relu,conv3377relu,convmax_three
    ],axis=3)
    """
        filter11:1x1的卷积核个数
        filter13:使用1x3,3x1卷积代替3x3卷积之前的1x1卷积核个数
        filter33:使用1x3,3x1卷积代替3x3卷积的个数
        filter15:使用3x3卷积代替5x5卷积之前的1x1卷积核个数
        filter55:使用3x3卷积代替5x5卷积个数
        filtermax:最大池化之后的1x1卷积核个数
        """
    for i in range(2):
        conv11 = layers.Conv2D((int)(inceptionV3_Three[keys_three[i]][0]), kernel_size=[1, 1], strides=[1, 1], padding='same')(x)
        batchnormaliztion11 = layers.BatchNormalization()(conv11)
        conv11relu=layers.Activation('relu')(batchnormaliztion11)

        conv13 = layers.Conv2D((int)(inceptionV3_Three[keys_three[i]][1]), kernel_size=[1, 1], strides=[1, 1], padding='same')(x)
        batchnormaliztion13 = layers.BatchNormalization()(conv13)
        conv13relu=layers.Activation('relu')(batchnormaliztion13)
        # Branch 2: parallel 1x3 and 3x1 convs on the same input, outputs concatenated
        conv33left = layers.Conv2D((int)(inceptionV3_Three[keys_three[i]][2]), kernel_size=[1, 3], strides=[1, 1], padding='same')(conv13relu)
        batchnormaliztion33left = layers.BatchNormalization()(conv33left)
        conv33leftrelu=layers.Activation('relu')(batchnormaliztion33left)
        conv33right = layers.Conv2D((int)(inceptionV3_Three[keys_three[i]][3]), kernel_size=[3, 1], strides=[1, 1], padding='same')(conv13relu)
        batchnormaliztion33right = layers.BatchNormalization()(conv33right)
        conv33rightrelu=layers.Activation('relu')(batchnormaliztion33right)
        conv33rightleft=tf.concat([
            conv33leftrelu,conv33rightrelu
        ],axis=3)

        conv15 = layers.Conv2D((int)(inceptionV3_Three[keys_three[i]][4]), kernel_size=[1, 1], strides=[1, 1], padding='same')(x)
        batchnormaliztion15 = layers.BatchNormalization()(conv15)
        conv15relu=layers.Activation('relu')(batchnormaliztion15)
        conv1533 = layers.Conv2D((int)(inceptionV3_Three[keys_three[i]][5]), kernel_size=[3, 3], strides=[1, 1], padding='same')(conv15relu)
        batchnormaliztion1533 = layers.BatchNormalization()(conv1533)
        conv1533relu=layers.Activation('relu')(batchnormaliztion1533)
        # Branch 3: after the 3x3, parallel 1x3 and 3x1 convs, outputs concatenated
        conv1533left = layers.Conv2D((int)(inceptionV3_Three[keys_three[i]][6]), kernel_size=[1, 3], strides=[1, 1], padding='same')(conv1533relu)
        batchnormaliztion1533left = layers.BatchNormalization()(conv1533left)
        conv1533leftrelu=layers.Activation('relu')(batchnormaliztion1533left)
        conv1533right = layers.Conv2D((int)(inceptionV3_Three[keys_three[i]][7]), kernel_size=[3, 1], strides=[1, 1], padding='same')(conv1533relu)
        batchnormaliztion1533right = layers.BatchNormalization()(conv1533right)
        conv1533rightrelu=layers.Activation('relu')(batchnormaliztion1533right)
        conv1533leftright=tf.concat([
            conv1533leftrelu,conv1533rightrelu
        ],axis=3)

        # Pooling branch: 3x3 average pooling followed by a 1x1 projection
        avgpool = layers.AveragePooling2D(pool_size=[3, 3], strides=[1, 1],padding='same')(x)
        avgconv11 = layers.Conv2D((int)(inceptionV3_Three[keys_three[i]][8]), kernel_size=[1, 1], strides=[1, 1], padding='same')(avgpool)
        batchnormaliztionpool = layers.BatchNormalization()(avgconv11)
        maxrelu = layers.Activation('relu')(batchnormaliztionpool)

        x=tf.concat([
            conv11relu,conv33rightleft,conv1533leftright,maxrelu
        ],axis=3)

    # Classification head
    x=layers.GlobalAveragePooling2D()(x)
    x=layers.Dense(1000)(x)
    softmax=layers.Activation('softmax')(x)
    model_inceptionV3=Model(inputs=input,outputs=softmax,name='InceptionV3')
    return model_inceptionV3

model_inceptionV3=InceptionV3(inceptionV3_One,inceptionV3_Two,inceptionV3_Three)
model_inceptionV3.summary()

Model summary output:
[Figure: model.summary() output (middle portion not captured)]
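
As a quick smoke test (a sketch only; the weights are randomly initialized, so the predictions are meaningless), you can run a forward pass on a dummy batch and check the output shape:

import numpy as np

dummy = np.random.rand(1, 299, 299, 3).astype('float32')
probs = model_inceptionV3.predict(dummy)
print(probs.shape)  # (1, 1000)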
