Building YoloV3 object detection with EfficientNet
Bubbliiiing
What is the EfficientNet model
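In 2019 Google released EfficientNet, which sharply reduces the parameter count compared with earlier networks while also improving prediction accuracy. That is impressive enough that I wanted to follow along and build on it.
EfficientNet lives up to its name: the network is very efficient. What does "efficient" mean here? Consider how convolutional neural networks have developed:
From the early VGG16 to today's Xception, people gradually found that improving a network's performance is not only about stacking more layers. The more important points are:
1. The network must be trainable and must converge.
2. The parameter count should be small, so that training is easier and inference is faster.
3. The network structure should be innovative, so that it learns more important features.
EfficientNet does well on all of these: it uses fewer parameters (which affects training and speed) to reach very high recognition accuracy (by learning the more important features).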
Source code download
https://github.com/bubbliiiing/efficientnet-yolo3-keras
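How the EfficientNet model is implemented
1. Characteristics of the EfficientNet model
EfficientNet has a distinctive design that draws on other successful neural networks. The classic ideas are:
1. Use residual connections to increase the depth of the network, and extract features with a deeper network.
2. Change the number of feature channels extracted by each layer, so that more features are obtained; this increases the width.
3. Increase the resolution of the input image, so that the network can learn and express richer information; this helps accuracy.
EfficientNet combines these three ideas by scaling a baseline model along all of them at once. (MobileNet scales its model with a width multiplier α: different values of α give models of different accuracy, and α=1 is the baseline model. ResNet also has a baseline model, and its variants are obtained by changing the depth of the network.) Adjusting depth, width and input resolution together yields a well-balanced network design.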
The results of EfficientNet are as follows:
[Figure: EfficientNet results]
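EfficientNet uses one fixed set of scaling coefficients to scale network depth, width and resolution uniformly.
Suppose we want to use 2^N times the computational resources. We can simply scale the network depth by α^N, the width by β^N and the image size by γ^N, where α, β and γ are constant coefficients determined by a small grid search on the original small model.
The figure shows EfficientNet's design idea: the network is expanded along all three dimensions at the same time.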
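As a rough numeric illustration of the scaling rule above, the sketch below computes the depth, width and resolution multipliers for a given N. The constants α = 1.2, β = 1.1 and γ = 1.15 are the grid-search values reported in the EfficientNet paper, not something stated in this article, and `compound_scale` is a hypothetical helper name. The cost estimate assumes that FLOPs grow linearly with depth and quadratically with width and resolution.

```python
# Grid-search constants from the EfficientNet paper (assumption: not given in this article).
ALPHA, BETA, GAMMA = 1.2, 1.1, 1.15


def compound_scale(n):
    """Return (depth, width, resolution, relative_flops) multipliers for exponent n."""
    depth = ALPHA ** n
    width = BETA ** n
    resolution = GAMMA ** n
    # FLOPs scale roughly with depth * width^2 * resolution^2.
    relative_flops = depth * width ** 2 * resolution ** 2
    return depth, width, resolution, relative_flops


for n in range(4):
    d, w, r, f = compound_scale(n)
    print(f"N={n}: depth x{d:.2f}, width x{w:.2f}, resolution x{r:.2f}, FLOPs x{f:.2f}")
```

Because α·β²·γ² is close to 2, each increment of N roughly doubles the compute budget. The per-model coefficients used for B1 to B7 in the listing further below are fixed published values and do not have to match these powers exactly.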
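2. Structure of the EfficientNet network
EfficientNet (B0) consists of a Stem + 16 Blocks + Conv2D + GlobalAveragePooling2D + Dense. Its core is the 16 Blocks; the rest of the structure differs little from an ordinary convolutional neural network.
The figure below shows the structure of EfficientNet-B0, the design baseline of EfficientNet:
The first part is the Stem, which performs the initial feature extraction. It is simply a convolution + normalization + activation function.
The second part is the 16 Blocks, the feature extraction structure specific to efficientnet. Efficient feature extraction happens as the Blocks are stacked.
The third part is Conv2D + GlobalAveragePooling2D + Dense. This is efficientnet's classification head, and it is not used when building efficientnet-yolov3.
The Blocks of efficientnet are organized into 7 groups, corresponding to Block1-Block7 in the figure above. The parameters of each group are: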
DEFAULT_BLOCKS_ARGS = [
    {'kernel_size': 3, 'repeats': 1, 'filters_in': 32, 'filters_out': 16,
     'expand_ratio': 1, 'id_skip': True, 'strides': 1, 'se_ratio': 0.25},
    {'kernel_size': 3, 'repeats': 2, 'filters_in': 16, 'filters_out': 24,
     'expand_ratio': 6, 'id_skip': True, 'strides': 2, 'se_ratio': 0.25},
    {'kernel_size': 5, 'repeats': 2, 'filters_in': 24, 'filters_out': 40,
     'expand_ratio': 6, 'id_skip': True, 'strides': 2, 'se_ratio': 0.25},
    {'kernel_size': 3, 'repeats': 3, 'filters_in': 40, 'filters_out': 80,
     'expand_ratio': 6, 'id_skip': True, 'strides': 2, 'se_ratio': 0.25},
    {'kernel_size': 5, 'repeats': 3, 'filters_in': 80, 'filters_out': 112,
     'expand_ratio': 6, 'id_skip': True, 'strides': 1, 'se_ratio': 0.25},
    {'kernel_size': 5, 'repeats': 4, 'filters_in': 112, 'filters_out': 192,
     'expand_ratio': 6, 'id_skip': True, 'strides': 2, 'se_ratio': 0.25},
    {'kernel_size': 3, 'repeats': 1, 'filters_in': 192, 'filters_out': 320,
     'expand_ratio': 6, 'id_skip': True, 'strides': 1, 'se_ratio': 0.25}
]
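To make the table above easier to read, here is a small sketch that walks the seven groups and reports how many Blocks there are in total and what feature-map shape each group produces. It assumes a 416x416 input and a stride-2 Stem (as in the full listing later in this article); `trace_stages` and the `STAGES` tuple layout are hypothetical names used only for this illustration.

```python
# (repeats, strides, filters_out) for the seven groups, copied from DEFAULT_BLOCKS_ARGS.
STAGES = [
    (1, 1, 16), (2, 2, 24), (2, 2, 40), (3, 2, 80),
    (3, 1, 112), (4, 2, 192), (1, 1, 320),
]


def trace_stages(input_size=416):
    """Return (total_blocks, [(height_width, channels), ...]) for the B0 baseline."""
    size = (input_size + 1) // 2  # the Stem convolution uses stride 2
    shapes = []
    total_blocks = 0
    for repeats, stride, filters_out in STAGES:
        # Only the first Block of a group applies the stride; the rest use stride 1.
        size = (size + stride - 1) // stride
        total_blocks += repeats
        shapes.append((size, filters_out))
    return total_blocks, shapes


total, shapes = trace_stages(416)
print("total blocks:", total)
for index, (size, channels) in enumerate(shapes, start=1):
    print(f"Block{index}: {size},{size},{channels}")
```

The repeats add up to the 16 Blocks mentioned earlier, and the outputs of groups 3, 5 and 7 are the three resolutions (52, 26 and 13 for a 416 input) that a YoloV3 head expects.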
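The general structure of a Block is shown below. Its overall design is an inverted residual structure that combines depthwise separable convolution with an attention mechanism. Each Block can be divided into two parts:
The left side is the main branch. It first raises the channel count with a 1x1 convolution, then uses a 3x3 or 5x5 depthwise convolution to extract features across spatial positions. After feature extraction a channel attention mechanism is applied, and finally a 1x1 convolution lowers the channel count.
The right side is the residual edge, which passes the input through unchanged.
The Block is implemented as follows: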
#-------------------------------------------------#
#   efficient_block
#   Excerpt: tf, layers, CONV_KERNEL_INITIALIZER and
#   correct_pad come from the full listing in the
#   next section.
#-------------------------------------------------#
def block(inputs, activation_fn=tf.nn.swish, drop_rate=0., name='',
          filters_in=32, filters_out=16, kernel_size=3, strides=1,
          expand_ratio=1, se_ratio=0., id_skip=True):
    filters = filters_in * expand_ratio
    #-------------------------------------------------#
    #   Inverted residual
    #   part1: raise the channel count with a 1x1 convolution
    #-------------------------------------------------#
    if expand_ratio != 1:
        x = layers.Conv2D(filters, 1,
                          padding='same',
                          use_bias=False,
                          kernel_initializer=CONV_KERNEL_INITIALIZER,
                          name=name + 'expand_conv')(inputs)
        x = layers.BatchNormalization(axis=3, name=name + 'expand_bn')(x)
        x = layers.Activation(activation_fn, name=name + 'expand_activation')(x)
    else:
        x = inputs

    #------------------------------------------------------#
    #   part2: depthwise convolution (kernel_size x kernel_size)
    #   applied to each channel separately.
    #   With stride 2 it also halves the height and width,
    #   so the input is padded explicitly first.
    #------------------------------------------------------#
    if strides == 2:
        x = layers.ZeroPadding2D(padding=correct_pad(x, kernel_size),
                                 name=name + 'dwconv_pad')(x)
        conv_pad = 'valid'
    else:
        conv_pad = 'same'

    x = layers.DepthwiseConv2D(kernel_size,
                               strides=strides,
                               padding=conv_pad,
                               use_bias=False,
                               depthwise_initializer=CONV_KERNEL_INITIALIZER,
                               name=name + 'dwconv')(x)
    x = layers.BatchNormalization(axis=3, name=name + 'bn')(x)
    x = layers.Activation(activation_fn, name=name + 'activation')(x)

    #------------------------------------------------------#
    #   Apply channel attention (squeeze-and-excitation)
    #   to the result of the depthwise convolution
    #------------------------------------------------------#
    if 0 < se_ratio <= 1:
        filters_se = max(1, int(filters_in * se_ratio))
        se = layers.GlobalAveragePooling2D(name=name + 'se_squeeze')(x)
        se = layers.Reshape((1, 1, filters), name=name + 'se_reshape')(se)
        #------------------------------------------------------#
        #   Reduce the channels, expand them again, then use
        #   sigmoid to map the weights into the range 0-1
        #------------------------------------------------------#
        se = layers.Conv2D(filters_se, 1,
                           padding='same',
                           activation=activation_fn,
                           kernel_initializer=CONV_KERNEL_INITIALIZER,
                           name=name + 'se_reduce')(se)
        se = layers.Conv2D(filters, 1,
                           padding='same',
                           activation='sigmoid',
                           kernel_initializer=CONV_KERNEL_INITIALIZER,
                           name=name + 'se_expand')(se)
        x = layers.multiply([x, se], name=name + 'se_excite')

    #------------------------------------------------------#
    #   part3: lower the channel count with a 1x1 convolution
    #------------------------------------------------------#
    x = layers.Conv2D(filters_out, 1,
                      padding='same',
                      use_bias=False,
                      kernel_initializer=CONV_KERNEL_INITIALIZER,
                      name=name + 'project_conv')(x)
    x = layers.BatchNormalization(axis=3, name=name + 'project_bn')(x)

    #------------------------------------------------------#
    #   part4: add the residual edge when the shapes allow it
    #   (stride 1 and equal input/output channel counts)
    #------------------------------------------------------#
    if (id_skip is True and strides == 1 and filters_in == filters_out):
        if drop_rate > 0:
            x = layers.Dropout(drop_rate,
                               noise_shape=(None, 1, 1, 1),
                               name=name + 'drop')(x)
        x = layers.add([x, inputs], name=name + 'add')

    return x
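The channel attention step inside the Block can be hard to picture from layer calls alone, so here is a minimal pure-Python sketch of the same squeeze, reduce, expand and rescale sequence on a tiny feature map. It is an illustration under simplifying assumptions, not the Keras implementation: the 1x1 convolutions are written as plain matrix products, biases are left out, and `squeeze_excite` plus the example weights are made up for this sketch.

```python
import math


def swish(v):
    return v / (1.0 + math.exp(-v))


def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))


def squeeze_excite(feature_map, w_reduce, w_expand):
    """feature_map: H x W x C nested lists; w_reduce: C x C_se; w_expand: C_se x C."""
    height, width = len(feature_map), len(feature_map[0])
    channels = len(feature_map[0][0])
    # Squeeze: global average pooling leaves one value per channel.
    squeezed = [
        sum(feature_map[i][j][c] for i in range(height) for j in range(width))
        / (height * width)
        for c in range(channels)
    ]
    # Reduce: project to fewer channels, then apply swish.
    reduced = [
        swish(sum(squeezed[c] * w_reduce[c][k] for c in range(channels)))
        for k in range(len(w_reduce[0]))
    ]
    # Expand: project back to C channels; sigmoid keeps each gate in (0, 1).
    gates = [
        sigmoid(sum(reduced[k] * w_expand[k][c] for k in range(len(reduced))))
        for c in range(channels)
    ]
    # Excite: rescale every spatial position channel by channel.
    scaled = [
        [[feature_map[i][j][c] * gates[c] for c in range(channels)] for j in range(width)]
        for i in range(height)
    ]
    return scaled, gates


# A 2x2 feature map with 2 channels and made-up weights (C_se = 1).
fm = [[[1.0, 4.0], [2.0, 3.0]], [[3.0, 2.0], [4.0, 1.0]]]
scaled, gates = squeeze_excite(fm, w_reduce=[[0.5], [-0.25]], w_expand=[[1.0, -1.0]])
print("gates:", gates)
```

Each gate is a single number per channel, so the attention changes how strongly a channel contributes but leaves the spatial layout of the feature map untouched.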
Building EfficientNet in code
1. Building the model code
The EfficientNet implementation is given below. This is the version of EfficientNet used with YoloV3, provided for reference:
import math
from copy import deepcopy

import tensorflow as tf
from keras import backend, layers

#-------------------------------------------------#
#   Seven large groups of blocks in total, each
#   with its own parameters
#-------------------------------------------------#
DEFAULT_BLOCKS_ARGS = [
    {'kernel_size': 3, 'repeats': 1, 'filters_in': 32, 'filters_out': 16,
     'expand_ratio': 1, 'id_skip': True, 'strides': 1, 'se_ratio': 0.25},
    {'kernel_size': 3, 'repeats': 2, 'filters_in': 16, 'filters_out': 24,
     'expand_ratio': 6, 'id_skip': True, 'strides': 2, 'se_ratio': 0.25},
    {'kernel_size': 5, 'repeats': 2, 'filters_in': 24, 'filters_out': 40,
     'expand_ratio': 6, 'id_skip': True, 'strides': 2, 'se_ratio': 0.25},
    {'kernel_size': 3, 'repeats': 3, 'filters_in': 40, 'filters_out': 80,
     'expand_ratio': 6, 'id_skip': True, 'strides': 2, 'se_ratio': 0.25},
    {'kernel_size': 5, 'repeats': 3, 'filters_in': 80, 'filters_out': 112,
     'expand_ratio': 6, 'id_skip': True, 'strides': 1, 'se_ratio': 0.25},
    {'kernel_size': 5, 'repeats': 4, 'filters_in': 112, 'filters_out': 192,
     'expand_ratio': 6, 'id_skip': True, 'strides': 2, 'se_ratio': 0.25},
    {'kernel_size': 3, 'repeats': 1, 'filters_in': 192, 'filters_out': 320,
     'expand_ratio': 6, 'id_skip': True, 'strides': 1, 'se_ratio': 0.25}
]

#-------------------------------------------------#
#   Kernel initializer
#-------------------------------------------------#
CONV_KERNEL_INITIALIZER = {
    'class_name': 'VarianceScaling',
    'config': {
        'scale': 2.0,
        'mode': 'fan_out',
        'distribution': 'normal'
    }
}

#-------------------------------------------------#
#   Computes the zero padding needed before a
#   stride-2 convolution
#-------------------------------------------------#
def correct_pad(inputs, kernel_size):
    img_dim = 1
    input_size = backend.int_shape(inputs)[img_dim:(img_dim + 2)]

    if isinstance(kernel_size, int):
        kernel_size = (kernel_size, kernel_size)

    if input_size[0] is None:
        adjust = (1, 1)
    else:
        adjust = (1 - input_size[0] % 2, 1 - input_size[1] % 2)

    correct = (kernel_size[0] // 2, kernel_size[1] // 2)

    return ((correct[0] - adjust[0], correct[0]),
            (correct[1] - adjust[1], correct[1]))

#-------------------------------------------------#
#   Scales the filter count by the width coefficient
#   and rounds it to a multiple of divisor (8)
#-------------------------------------------------#
def round_filters(filters, divisor, width_coefficient):
    filters *= width_coefficient
    new_filters = max(divisor, int(filters + divisor / 2) // divisor * divisor)
    # Make sure rounding down does not remove more than 10%.
    if new_filters < 0.9 * filters:
        new_filters += divisor
    return int(new_filters)

#-------------------------------------------------#
#   Computes how many times a block is repeated
#-------------------------------------------------#
def round_repeats(repeats, depth_coefficient):
    return int(math.ceil(depth_coefficient * repeats))

#-------------------------------------------------#
#   efficient_block
#-------------------------------------------------#
def block(inputs, activation_fn=tf.nn.swish, drop_rate=0., name='',
          filters_in=32, filters_out=16, kernel_size=3, strides=1,
          expand_ratio=1, se_ratio=0., id_skip=True):
    filters = filters_in * expand_ratio
    #-------------------------------------------------#
    #   Inverted residual
    #   part1: raise the channel count with a 1x1 convolution
    #-------------------------------------------------#
    if expand_ratio != 1:
        x = layers.Conv2D(filters, 1,
                          padding='same',
                          use_bias=False,
                          kernel_initializer=CONV_KERNEL_INITIALIZER,
                          name=name + 'expand_conv')(inputs)
        x = layers.BatchNormalization(axis=3, name=name + 'expand_bn')(x)
        x = layers.Activation(activation_fn, name=name + 'expand_activation')(x)
    else:
        x = inputs

    #------------------------------------------------------#
    #   part2: depthwise convolution (kernel_size x kernel_size)
    #   applied to each channel separately.
    #   With stride 2 it also halves the height and width,
    #   so the input is padded explicitly first.
    #------------------------------------------------------#
    if strides == 2:
        x = layers.ZeroPadding2D(padding=correct_pad(x, kernel_size),
                                 name=name + 'dwconv_pad')(x)
        conv_pad = 'valid'
    else:
        conv_pad = 'same'

    x = layers.DepthwiseConv2D(kernel_size,
                               strides=strides,
                               padding=conv_pad,
                               use_bias=False,
                               depthwise_initializer=CONV_KERNEL_INITIALIZER,
                               name=name + 'dwconv')(x)
    x = layers.BatchNormalization(axis=3, name=name + 'bn')(x)
    x = layers.Activation(activation_fn, name=name + 'activation')(x)

    #------------------------------------------------------#
    #   Apply channel attention (squeeze-and-excitation)
    #   to the result of the depthwise convolution
    #------------------------------------------------------#
    if 0 < se_ratio <= 1:
        filters_se = max(1, int(filters_in * se_ratio))
        se = layers.GlobalAveragePooling2D(name=name + 'se_squeeze')(x)
        se = layers.Reshape((1, 1, filters), name=name + 'se_reshape')(se)
        #------------------------------------------------------#
        #   Reduce the channels, expand them again, then use
        #   sigmoid to map the weights into the range 0-1
        #------------------------------------------------------#
        se = layers.Conv2D(filters_se, 1,
                           padding='same',
                           activation=activation_fn,
                           kernel_initializer=CONV_KERNEL_INITIALIZER,
                           name=name + 'se_reduce')(se)
        se = layers.Conv2D(filters, 1,
                           padding='same',
                           activation='sigmoid',
                           kernel_initializer=CONV_KERNEL_INITIALIZER,
                           name=name + 'se_expand')(se)
        x = layers.multiply([x, se], name=name + 'se_excite')

    #------------------------------------------------------#
    #   part3: lower the channel count with a 1x1 convolution
    #------------------------------------------------------#
    x = layers.Conv2D(filters_out, 1,
                      padding='same',
                      use_bias=False,
                      kernel_initializer=CONV_KERNEL_INITIALIZER,
                      name=name + 'project_conv')(x)
    x = layers.BatchNormalization(axis=3, name=name + 'project_bn')(x)

    #------------------------------------------------------#
    #   part4: add the residual edge when the shapes allow it
    #   (stride 1 and equal input/output channel counts)
    #------------------------------------------------------#
    if (id_skip is True and strides == 1 and filters_in == filters_out):
        if drop_rate > 0:
            x = layers.Dropout(drop_rate,
                               noise_shape=(None, 1, 1, 1),
                               name=name + 'drop')(x)
        x = layers.add([x, inputs], name=name + 'add')

    return x

def EfficientNet(width_coefficient,
                 depth_coefficient,
                 drop_connect_rate=0.2,
                 depth_divisor=8,
                 activation_fn=tf.nn.swish,
                 blocks_args=DEFAULT_BLOCKS_ARGS,
                 inputs=None,
                 **kwargs):
    img_input = inputs

    #-------------------------------------------------#
    #   Build the stem
    #   416,416,3 -> 208,208,32
    #-------------------------------------------------#
    x = img_input
    x = layers.ZeroPadding2D(padding=correct_pad(x, 3),
                             name='stem_conv_pad')(x)
    x = layers.Conv2D(round_filters(32, depth_divisor, width_coefficient), 3,
                      strides=2,
                      padding='valid',
                      use_bias=False,
                      kernel_initializer=CONV_KERNEL_INITIALIZER,
                      name='stem_conv')(x)
    x = layers.BatchNormalization(axis=3, name='stem_bn')(x)
    x = layers.Activation(activation_fn, name='stem_activation')(x)

    #-------------------------------------------------#
    #   Deep-copy the arguments so the defaults are
    #   not modified in place
    #-------------------------------------------------#
    blocks_args = deepcopy(blocks_args)

    #-------------------------------------------------#
    #   Total number of efficient_blocks
    #-------------------------------------------------#
    b = 0
    blocks = float(sum(args['repeats'] for args in blocks_args))

    feats = []
    filters_outs = []
    #------------------------------------------------------------------------------#
    #   Loop over the block arguments: 7 large groups in total.
    #   Each group repeats the small efficient_block. For B0 with a 416x416
    #   input, the shape changes group by group as follows:
    #   208,208,32 -> 208,208,16 -> 104,104,24 -> 52,52,40
    #   -> 26,26,80 -> 26,26,112 -> 13,13,192 -> 13,13,320
    #   The outputs of all 7 groups are collected in feats. The three effective
    #   feature layers used by yolo_body are groups 3, 5 and 7:
    #   52,52,40、26,26,112、13,13,320
    #------------------------------------------------------------------------------#
    for (i, args) in enumerate(blocks_args):
        assert args['repeats'] > 0
        args['filters_in'] = round_filters(args['filters_in'], depth_divisor, width_coefficient)
        args['filters_out'] = round_filters(args['filters_out'], depth_divisor, width_coefficient)

        for j in range(round_repeats(args.pop('repeats'), depth_coefficient)):
            # Only the first block of a group changes stride and channel count.
            if j > 0:
                args['strides'] = 1
                args['filters_in'] = args['filters_out']
            x = block(x, activation_fn, drop_connect_rate * b / blocks,
                      name='block{}{}_'.format(i + 1, chr(j + 97)), **args)
            b += 1
        feats.append(x)
        if i == 2 or i == 4 or i == 6:
            filters_outs.append(args['filters_out'])
    return feats, filters_outs

def EfficientNetB0(inputs=None, **kwargs):
    return EfficientNet(1.0, 1.0, inputs=inputs, **kwargs)

def EfficientNetB1(inputs=None, **kwargs):
    return EfficientNet(1.0, 1.1, inputs=inputs, **kwargs)

def EfficientNetB2(inputs=None, **kwargs):
    return EfficientNet(1.1, 1.2, inputs=inputs, **kwargs)

def EfficientNetB3(inputs=None, **kwargs):
    return EfficientNet(1.2, 1.4, inputs=inputs, **kwargs)

def EfficientNetB4(inputs=None, **kwargs):
    return EfficientNet(1.4, 1.8, inputs=inputs, **kwargs)

def EfficientNetB5(inputs=None, **kwargs):
    return EfficientNet(1.6, 2.2, inputs=inputs, **kwargs)

def EfficientNetB6(inputs=None, **kwargs):
    return EfficientNet(1.8, 2.6, inputs=inputs, **kwargs)

def EfficientNetB7(inputs=None, **kwargs):
    return EfficientNet(2.0, 3.1, inputs=inputs, **kwargs)
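To see what the width and depth coefficients do in practice, the sketch below reuses `round_filters` and `round_repeats` exactly as defined in the listing above and applies them to the seven groups. The `scaled_config` helper and the two `BASE_*` lists are hypothetical names introduced for this illustration; the coefficients (1.0, 1.0) and (1.1, 1.2) are the B0 and B2 values from the listing.

```python
import math


def round_filters(filters, divisor, width_coefficient):
    filters *= width_coefficient
    new_filters = max(divisor, int(filters + divisor / 2) // divisor * divisor)
    # Make sure rounding down does not remove more than 10%.
    if new_filters < 0.9 * filters:
        new_filters += divisor
    return int(new_filters)


def round_repeats(repeats, depth_coefficient):
    return int(math.ceil(depth_coefficient * repeats))


# filters_out and repeats of the seven groups, copied from DEFAULT_BLOCKS_ARGS.
BASE_FILTERS_OUT = [16, 24, 40, 80, 112, 192, 320]
BASE_REPEATS = [1, 2, 2, 3, 3, 4, 1]


def scaled_config(width_coefficient, depth_coefficient, divisor=8):
    """Return (filters_out per group, repeats per group) after scaling."""
    filters = [round_filters(f, divisor, width_coefficient) for f in BASE_FILTERS_OUT]
    repeats = [round_repeats(r, depth_coefficient) for r in BASE_REPEATS]
    return filters, repeats


b0_filters, b0_repeats = scaled_config(1.0, 1.0)
b2_filters, b2_repeats = scaled_config(1.1, 1.2)
print("B0:", b0_filters, b0_repeats, "blocks:", sum(b0_repeats))
print("B2:", b2_filters, b2_repeats, "blocks:", sum(b2_repeats))
```

Rounding every channel count to a multiple of 8 keeps the scaled widths hardware friendly, and because repeats are rounded up, even a depth coefficient of 1.1 or 1.2 adds a Block to the groups that have only one.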
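2. Application to YoloV3
For yolov3, we need the three effective feature layers produced by the backbone feature extraction network in order to build the enhanced feature pyramid.
The code above gives us three effective feature layers, and we use them in place of the effective feature layers of darknet53, the original yolov3 backbone.
To further reduce the parameter count, we also reduce the number of channels of the ordinary convolutions used in yolov3. In the code below these channel counts are taken from the backbone's output widths (filters_outs).
The final code that builds EfficientNet-YoloV3 is as follows: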
from functools import wraps

from keras.initializers import random_normal
from keras.layers import (BatchNormalization, Concatenate, Conv2D, Input,
                          Lambda, LeakyReLU, UpSampling2D)
from keras.models import Model
from keras.regularizers import l2
from utils.utils import compose

from nets.efficientnet import (EfficientNetB0, EfficientNetB1, EfficientNetB2,
                               EfficientNetB3, EfficientNetB4, EfficientNetB5,
                               EfficientNetB6, EfficientNetB7)
from nets.yolo_training import yolo_loss

Efficient = [EfficientNetB0, EfficientNetB1, EfficientNetB2, EfficientNetB3,
             EfficientNetB4, EfficientNetB5, EfficientNetB6, EfficientNetB7]

#------------------------------------------------------#
#   Single convolution: DarknetConv2D
#   With stride 2 the padding mode is set to 'valid',
#   otherwise to 'same'.
#   Note: as written, an L2 kernel regularizer (5e-4)
#   is applied to every convolution.
#------------------------------------------------------#
@wraps(Conv2D)
def DarknetConv2D(*args, **kwargs):
    darknet_conv_kwargs = {'kernel_initializer': random_normal(stddev=0.02),
                           'kernel_regularizer': l2(5e-4)}
    darknet_conv_kwargs['padding'] = 'valid' if kwargs.get('strides') == (2, 2) else 'same'
    darknet_conv_kwargs.update(kwargs)
    return Conv2D(*args, **darknet_conv_kwargs)

#---------------------------------------------------#
#   Convolution block -> convolution + normalization + activation
#   DarknetConv2D + BatchNormalization + LeakyReLU
#---------------------------------------------------#
def DarknetConv2D_BN_Leaky(*args, **kwargs):
    no_bias_kwargs = {'use_bias': False}
    no_bias_kwargs.update(kwargs)
    return compose(
        DarknetConv2D(*args, **no_bias_kwargs),
        BatchNormalization(),
        LeakyReLU(alpha=0.1))

#---------------------------------------------------#
#   Feature layer -> final output
#---------------------------------------------------#
def make_five_conv(x, num_filters):
    x = DarknetConv2D_BN_Leaky(num_filters, (1, 1))(x)
    x = DarknetConv2D_BN_Leaky(num_filters * 2, (3, 3))(x)
    x = DarknetConv2D_BN_Leaky(num_filters, (1, 1))(x)
    x = DarknetConv2D_BN_Leaky(num_filters * 2, (3, 3))(x)
    x = DarknetConv2D_BN_Leaky(num_filters, (1, 1))(x)
    return x

def make_yolo_head(x, num_filters, out_filters):
    y = DarknetConv2D_BN_Leaky(num_filters * 2, (3, 3))(x)
    y = DarknetConv2D(out_filters, (1, 1))(y)
    return y

#---------------------------------------------------#
#   Build the FPN and obtain the predictions
#---------------------------------------------------#
def yolo_body(input_shape, anchors_mask, num_classes, phi=0):
    inputs = Input(input_shape)
    #---------------------------------------------------#
    #   Build the EfficientNet backbone (it replaces darknet53)
    #   and take three effective feature layers. For phi = 0
    #   (EfficientNet-B0) and a 416x416 input their shapes are:
    #   feat1: 52,52,40
    #   feat2: 26,26,112
    #   feat3: 13,13,320
    #---------------------------------------------------#
    feats, filters_outs = Efficient[phi](inputs=inputs)
    feat1 = feats[2]
    feat2 = feats[4]
    feat3 = feats[6]

    #---------------------------------------------------#
    #   First feature layer
    #   y1 = (batch_size, 13, 13, 3 * (num_classes + 5))
    #---------------------------------------------------#
    # 13,13,320 -> 13,13,640 -> 13,13,320 -> 13,13,640 -> 13,13,320
    x = make_five_conv(feat3, int(filters_outs[2]))
    P5 = make_yolo_head(x, int(filters_outs[2]), len(anchors_mask[0]) * (num_classes + 5))

    # 13,13,320 -> 13,13,112 -> 26,26,112
    x = compose(DarknetConv2D_BN_Leaky(int(filters_outs[1]), (1, 1)), UpSampling2D(2))(x)

    # 26,26,112 + 26,26,112 -> 26,26,224
    x = Concatenate()([x, feat2])
    #---------------------------------------------------#
    #   Second feature layer
    #   y2 = (batch_size, 26, 26, 3 * (num_classes + 5))
    #---------------------------------------------------#
    # 26,26,224 -> 26,26,112 -> 26,26,224 -> 26,26,112 -> 26,26,224 -> 26,26,112
    x = make_five_conv(x, int(filters_outs[1]))
    P4 = make_yolo_head(x, int(filters_outs[1]), len(anchors_mask[1]) * (num_classes + 5))

    # 26,26,112 -> 26,26,40 -> 52,52,40
    x = compose(DarknetConv2D_BN_Leaky(int(filters_outs[0]), (1, 1)), UpSampling2D(2))(x)
    # 52,52,40 + 52,52,40 -> 52,52,80
    x = Concatenate()([x, feat1])
    #---------------------------------------------------#
    #   Third feature layer
    #   y3 = (batch_size, 52, 52, 3 * (num_classes + 5))
    #---------------------------------------------------#
    # 52,52,80 -> 52,52,40 -> 52,52,80 -> 52,52,40 -> 52,52,80 -> 52,52,40
    x = make_five_conv(x, int(filters_outs[0]))
    P3 = make_yolo_head(x, int(filters_outs[0]), len(anchors_mask[2]) * (num_classes + 5))
    return Model(inputs, [P5, P4, P3])
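The channel arithmetic in yolo_body can be checked without building a Keras model. The sketch below is a minimal illustration that assumes the B0 backbone widths (filters_outs = [40, 112, 320]), three anchors per scale and 80 classes; `head_channels` and `neck_channels` are hypothetical helpers, not functions from the repository.

```python
def head_channels(num_anchors, num_classes):
    """Output channels of one yolo head: box (4) + objectness (1) + classes, per anchor."""
    return num_anchors * (num_classes + 5)


def neck_channels(filters_outs):
    """Channel counts after each Concatenate in yolo_body for the given backbone widths."""
    c3, c4, c5 = filters_outs
    # The 13x13 branch ends make_five_conv with c5 channels, is reduced to c4 by a
    # 1x1 convolution, upsampled, then concatenated with feat2 (also c4 channels).
    concat_26 = c4 + c4
    # The 26x26 branch is reduced to c3, upsampled, then concatenated with feat1 (c3).
    concat_52 = c3 + c3
    return {"13x13": c5, "26x26_concat": concat_26, "52x52_concat": concat_52}


filters_outs_b0 = [40, 112, 320]  # assumption: EfficientNet-B0 backbone
print(head_channels(3, 80))
print(neck_channels(filters_outs_b0))
```

With darknet53 the corresponding concatenations carry 768 and 384 channels, so taking the widths from the backbone is where most of the head's parameter savings come from.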