
Transfer learning

A way to solve a complex problem even when only a small amount of data is available
An already trained model is trained further on the small dataset, so it can still reach strong performance

The early layers can only detect simple features, but as the layers are combined the network gradually detects more and more complex features

So toward the later layers, the features being detected become increasingly tailored to the data the model was trained on

The more similar two datasets are, the more similar the features extracted by the early layers

For example, to transfer a model trained on cats to a dog classifier, the early layers are left as they are and only the later layers are retrained on the new 'dog' data to build the new model

When doing transfer learning on similar data, the convolutional layers that extract the features are kept as-is,

and only the fully connected part is replaced and retrained

Two approaches to transfer learning

1. Feature extraction

- Reuse only the convolutional layers of a model that was trained on similar images

- The fully connected layers are replaced with a new head fitted to the data being learned

- Example: a dog and cat classification model

2. Fine tuning

- Keep a model trained on generic images as-is and only slightly adjust the fully connected part for the new task (see the sketch after this list)

- Example: reusing a model trained on the ImageNet dataset
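
A minimal sketch of the two approaches in Keras (the VGG16 backbone and the 5-class head here are illustrative assumptions, not part of the original example):

import tensorflow as tf

base = tf.keras.applications.VGG16(include_top=False, input_shape=(224, 224, 3))

# 1. Feature extraction: freeze the whole convolutional base and train only a new head
base.trainable = False
feature_extractor = tf.keras.models.Sequential([
    base,
    tf.keras.layers.GlobalAveragePooling2D(),
    tf.keras.layers.Dense(5),
])

# 2. Fine tuning: also unfreeze (part of) the base and train it together with the head,
#    typically with a small learning rate; how many layers to unfreeze is a hyperparameter
base.trainable = True
for layer in base.layers[:-4]:
  layer.trainable = False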

Fine tuning

import tensorflow as tf 
import numpy as np

vgg = tf.keras.applications.VGG16(include_top=True)  # include_top=True keeps the original 1000-class fully connected classifier

tf.keras.utils.plot_model(vgg, rankdir='BT')  # draw the layer graph bottom-to-top

vgg.summary()
Model: "vgg16"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
input_3 (InputLayer)         [(None, 224, 224, 3)]     0         
_________________________________________________________________
block1_conv1 (Conv2D)        (None, 224, 224, 64)      1792      
_________________________________________________________________
block1_conv2 (Conv2D)        (None, 224, 224, 64)      36928     
_________________________________________________________________
block1_pool (MaxPooling2D)   (None, 112, 112, 64)      0         
_________________________________________________________________
block2_conv1 (Conv2D)        (None, 112, 112, 128)     73856     
_________________________________________________________________
block2_conv2 (Conv2D)        (None, 112, 112, 128)     147584    
_________________________________________________________________
block2_pool (MaxPooling2D)   (None, 56, 56, 128)       0         
_________________________________________________________________
block3_conv1 (Conv2D)        (None, 56, 56, 256)       295168    
_________________________________________________________________
block3_conv2 (Conv2D)        (None, 56, 56, 256)       590080    
_________________________________________________________________
block3_conv3 (Conv2D)        (None, 56, 56, 256)       590080    
_________________________________________________________________
block3_pool (MaxPooling2D)   (None, 28, 28, 256)       0         
_________________________________________________________________
block4_conv1 (Conv2D)        (None, 28, 28, 512)       1180160   
_________________________________________________________________
block4_conv2 (Conv2D)        (None, 28, 28, 512)       2359808   
_________________________________________________________________
block4_conv3 (Conv2D)        (None, 28, 28, 512)       2359808   
_________________________________________________________________
block4_pool (MaxPooling2D)   (None, 14, 14, 512)       0         
_________________________________________________________________
block5_conv1 (Conv2D)        (None, 14, 14, 512)       2359808   
_________________________________________________________________
block5_conv2 (Conv2D)        (None, 14, 14, 512)       2359808   
_________________________________________________________________
block5_conv3 (Conv2D)        (None, 14, 14, 512)       2359808   
_________________________________________________________________
block5_pool (MaxPooling2D)   (None, 7, 7, 512)         0         
_________________________________________________________________
flatten (Flatten)            (None, 25088)             0         
_________________________________________________________________
fc1 (Dense)                  (None, 4096)              102764544 
_________________________________________________________________
fc2 (Dense)                  (None, 4096)              16781312  
_________________________________________________________________
predictions (Dense)          (None, 1000)              4097000   
=================================================================
Total params: 138,357,544
Trainable params: 138,357,544
Non-trainable params: 0
_________________________________________________________________
vgg.input, vgg.output, vgg.layers
# (<KerasTensor: shape=(None, 224, 224, 3) dtype=float32 (created by layer 'input_3')>,
 <KerasTensor: shape=(None, 1000) dtype=float32 (created by layer 'predictions')>,
 [<keras.engine.input_layer.InputLayer at 0x7f580e71d090>,
  <keras.layers.convolutional.Conv2D at 0x7f580f887390>,
  <keras.layers.convolutional.Conv2D at 0x7f580e71f110>,
  <keras.layers.pooling.MaxPooling2D at 0x7f580e78f710>,
  <keras.layers.convolutional.Conv2D at 0x7f580f830d10>,
  <keras.layers.convolutional.Conv2D at 0x7f580e788110>,
  <keras.layers.pooling.MaxPooling2D at 0x7f580e7938d0>,
  <keras.layers.convolutional.Conv2D at 0x7f580f821ed0>,
  <keras.layers.convolutional.Conv2D at 0x7f580e771b50>,
  <keras.layers.convolutional.Conv2D at 0x7f580e77a110>,
  <keras.layers.pooling.MaxPooling2D at 0x7f580e76be10>,
  <keras.layers.convolutional.Conv2D at 0x7f580f82ec10>,
  <keras.layers.convolutional.Conv2D at 0x7f580e75a310>,
  <keras.layers.convolutional.Conv2D at 0x7f580e76bbd0>,
  <keras.layers.pooling.MaxPooling2D at 0x7f580e79bcd0>,
  <keras.layers.convolutional.Conv2D at 0x7f580e7cbf50>,
  <keras.layers.convolutional.Conv2D at 0x7f580e79ca90>,
  <keras.layers.convolutional.Conv2D at 0x7f580f8a3250>,
  <keras.layers.pooling.MaxPooling2D at 0x7f5880044250>,
  <keras.layers.core.Flatten at 0x7f5880040b90>,
  <keras.layers.core.Dense at 0x7f5880040890>,
  <keras.layers.core.Dense at 0x7f580e7d8c10>,
  <keras.layers.core.Dense at 0x7f588002ee50>])
#vgg.layers.pop() # drop the very last layer # so it can be replaced with one defined by hand
# <keras.layers.core.Dense at 0x7f580f8308d0>
vgg.layers.append(tf.keras.layers.Dense(5)) # append a new layer after the last one

vgg.layers
# [<keras.engine.input_layer.InputLayer at 0x7f5880019110>,
#  <keras.layers.convolutional.Conv2D at 0x7f588003ed90>,
#  <keras.layers.convolutional.Conv2D at 0x7f580f898810>,
#  <keras.layers.pooling.MaxPooling2D at 0x7f580f8ce050>,
#  <keras.layers.convolutional.Conv2D at 0x7f580e7df8d0>,
#  <keras.layers.convolutional.Conv2D at 0x7f580f8c8ad0>,
#  <keras.layers.pooling.MaxPooling2D at 0x7f588003ef10>,
#  <keras.layers.convolutional.Conv2D at 0x7f580e7e7450>,
#  <keras.layers.convolutional.Conv2D at 0x7f588005f410>,
#  <keras.layers.convolutional.Conv2D at 0x7f58d9d08790>,
#  <keras.layers.pooling.MaxPooling2D at 0x7f580e80b0d0>,
#  <keras.layers.convolutional.Conv2D at 0x7f580e7e7750>,
#  <keras.layers.convolutional.Conv2D at 0x7f580f843ed0>,
#  <keras.layers.convolutional.Conv2D at 0x7f580f841350>,
#  <keras.layers.pooling.MaxPooling2D at 0x7f580f83f050>,
#  <keras.layers.convolutional.Conv2D at 0x7f580f83dfd0>,
#  <keras.layers.convolutional.Conv2D at 0x7f580f8381d0>,
#  <keras.layers.convolutional.Conv2D at 0x7f580f83db10>,
#  <keras.layers.pooling.MaxPooling2D at 0x7f580f8349d0>,
#  <keras.layers.core.Flatten at 0x7f580f82e950>,
#  <keras.layers.core.Dense at 0x7f580f82e850>,
#  <keras.layers.core.Dense at 0x7f580f82e190>,
#  <keras.layers.core.Dense at 0x7f580f8308d0>]
mylayer = vgg.layers

vgg.layers[1].trainable
# True

vgg.trainable = False # freeze the weights: do not train them
mylayer = []
for i in vgg.layers[:-1]:   # every layer except the original 1000-class classifier
  i.trainable = False 
  mylayer.append(i)
mylayer.append(tf.keras.layers.Dense(5))  # new 5-class head

model = tf.keras.models.Sequential(mylayer)

model.summary()
Model: "sequential"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
block1_conv1 (Conv2D)        (None, 224, 224, 64)      1792      
_________________________________________________________________
block1_conv2 (Conv2D)        (None, 224, 224, 64)      36928     
_________________________________________________________________
block1_pool (MaxPooling2D)   (None, 112, 112, 64)      0         
_________________________________________________________________
block2_conv1 (Conv2D)        (None, 112, 112, 128)     73856     
_________________________________________________________________
block2_conv2 (Conv2D)        (None, 112, 112, 128)     147584    
_________________________________________________________________
block2_pool (MaxPooling2D)   (None, 56, 56, 128)       0         
_________________________________________________________________
block3_conv1 (Conv2D)        (None, 56, 56, 256)       295168    
_________________________________________________________________
block3_conv2 (Conv2D)        (None, 56, 56, 256)       590080    
_________________________________________________________________
block3_conv3 (Conv2D)        (None, 56, 56, 256)       590080    
_________________________________________________________________
block3_pool (MaxPooling2D)   (None, 28, 28, 256)       0         
_________________________________________________________________
block4_conv1 (Conv2D)        (None, 28, 28, 512)       1180160   
_________________________________________________________________
block4_conv2 (Conv2D)        (None, 28, 28, 512)       2359808   
_________________________________________________________________
block4_conv3 (Conv2D)        (None, 28, 28, 512)       2359808   
_________________________________________________________________
block4_pool (MaxPooling2D)   (None, 14, 14, 512)       0         
_________________________________________________________________
block5_conv1 (Conv2D)        (None, 14, 14, 512)       2359808   
_________________________________________________________________
block5_conv2 (Conv2D)        (None, 14, 14, 512)       2359808   
_________________________________________________________________
block5_conv3 (Conv2D)        (None, 14, 14, 512)       2359808   
_________________________________________________________________
block5_pool (MaxPooling2D)   (None, 7, 7, 512)         0         
_________________________________________________________________
flatten (Flatten)            (None, 25088)             0         
_________________________________________________________________
fc1 (Dense)                  (None, 4096)              102764544 
_________________________________________________________________
fc2 (Dense)                  (None, 4096)              16781312  
_________________________________________________________________
dense_2 (Dense)              (None, 5)                 20485     
=================================================================
Total params: 134,281,029
Trainable params: 20,485
Non-trainable params: 134,260,544
_________________________________________________________________

The accuracy reached the desired level to some degree, but actually using these models in practice was not easy

Higher accuracy meant more memory use and more computation, which made them a poor fit for real deployment

So from around 2017 onward, models began to focus on being lightweight
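
For instance, a lightweight backbone such as MobileNetV2 can be dropped in instead of VGG16; the sketch below assumes the same hypothetical 5-class task:

mobilenet = tf.keras.applications.MobileNetV2(include_top=False, input_shape=(224, 224, 3))
mobilenet.trainable = False
small_model = tf.keras.models.Sequential([
    mobilenet,
    tf.keras.layers.GlobalAveragePooling2D(),
    tf.keras.layers.Dense(5),
])
small_model.summary()  # roughly 2.3M frozen parameters, compared with ~14.7M for the VGG16 convolutional base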

 

google = tf.keras.applications.InceptionV3()
tf.keras.utils.plot_model(google) 
# GoogLeNet/Inception is a complicated, branching architecture, so it is an awkward model to reuse for transfer learning this way /
# it is not easy to fine-tune
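
Because of those branches, the layer list cannot simply be poured into a Sequential model; the Functional API can instead wrap the whole network as one frozen block. A sketch (the 299x299 input and the 5-class head are assumptions):

inception = tf.keras.applications.InceptionV3(include_top=False, input_shape=(299, 299, 3))
inception.trainable = False

inputs = tf.keras.Input(shape=(299, 299, 3))
x = inception(inputs, training=False)            # keep the BatchNormalization layers in inference mode
x = tf.keras.layers.GlobalAveragePooling2D()(x)
outputs = tf.keras.layers.Dense(5)(x)
inception_transfer = tf.keras.Model(inputs, outputs)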

 

Sequential approach

vgg = tf.keras.applications.VGG16(include_top=True)
# from a fine-tuning point of view / # the input has to be 224x224 data 

vgg.trainable = False # do not train any of the layers 

mylayer = vgg.layers

mylayer.pop()
# <keras.layers.core.Dense at 0x7f580e1e7f90>
model = tf.keras.models.Sequential(mylayer + [tf.keras.layers.Dense(5)])
model.summary() # because Flatten is used, the input size has to match exactly 
Model: "sequential_1"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
block1_conv1 (Conv2D)        (None, 224, 224, 64)      1792      
_________________________________________________________________
block1_conv2 (Conv2D)        (None, 224, 224, 64)      36928     
_________________________________________________________________
block1_pool (MaxPooling2D)   (None, 112, 112, 64)      0         
_________________________________________________________________
block2_conv1 (Conv2D)        (None, 112, 112, 128)     73856     
_________________________________________________________________
block2_conv2 (Conv2D)        (None, 112, 112, 128)     147584    
_________________________________________________________________
block2_pool (MaxPooling2D)   (None, 56, 56, 128)       0         
_________________________________________________________________
block3_conv1 (Conv2D)        (None, 56, 56, 256)       295168    
_________________________________________________________________
block3_conv2 (Conv2D)        (None, 56, 56, 256)       590080    
_________________________________________________________________
block3_conv3 (Conv2D)        (None, 56, 56, 256)       590080    
_________________________________________________________________
block3_pool (MaxPooling2D)   (None, 28, 28, 256)       0         
_________________________________________________________________
block4_conv1 (Conv2D)        (None, 28, 28, 512)       1180160   
_________________________________________________________________
block4_conv2 (Conv2D)        (None, 28, 28, 512)       2359808   
_________________________________________________________________
block4_conv3 (Conv2D)        (None, 28, 28, 512)       2359808   
_________________________________________________________________
block4_pool (MaxPooling2D)   (None, 14, 14, 512)       0         
_________________________________________________________________
block5_conv1 (Conv2D)        (None, 14, 14, 512)       2359808   
_________________________________________________________________
block5_conv2 (Conv2D)        (None, 14, 14, 512)       2359808   
_________________________________________________________________
block5_conv3 (Conv2D)        (None, 14, 14, 512)       2359808   
_________________________________________________________________
block5_pool (MaxPooling2D)   (None, 7, 7, 512)         0         
_________________________________________________________________
flatten (Flatten)            (None, 25088)             0         
_________________________________________________________________
fc1 (Dense)                  (None, 4096)              102764544 
_________________________________________________________________
fc2 (Dense)                  (None, 4096)              16781312  
_________________________________________________________________
dense_3 (Dense)              (None, 5)                 20485     
=================================================================
Total params: 134,281,029
Trainable params: 20,485
Non-trainable params: 134,260,544
_________________________________________________________________

Model (Functional API) approach

input_ = vgg.input
x = vgg.layers[1](input_)   # chain the pretrained layers one by one with the Functional API
x = vgg.layers[2](x)
x = vgg.layers[3](x) 
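
The chain above only wires up the first few layers; carried through the rest of the network and given a new head, it would look roughly like this (a sketch; the 5-class head is an assumption):

for layer in vgg.layers[4:-1]:   # remaining pretrained layers, excluding the original 1000-class classifier
  layer.trainable = False
  x = layer(x)
output_ = tf.keras.layers.Dense(5)(x)
functional_model = tf.keras.Model(input_, output_)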

import numpy as np

im = tf.keras.preprocessing.image.load_img('people.jpg')
im2 = np.resize(np.array(im), (224,224,3)) # match the expected input size (np.resize repeats/truncates the data rather than interpolating)

im2.shape
# (224, 224, 3)
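
Since np.resize only repeats or truncates the raw pixel buffer, for a real application the image would normally be resized properly, e.g. at load time (a sketch; im_proper is a name introduced here):

im_proper = tf.keras.preprocessing.image.load_img('people.jpg', target_size=(224, 224))
im_proper = np.array(im_proper)   # shape (224, 224, 3), actually interpolated to the new size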

model(im2[np.newaxis])
# <tf.Tensor: shape=(1, 5), dtype=float32, numpy=
# array([[ 0.60317594, -0.7978455 ,  2.3856583 , -0.96538734, -0.4498136 ]],
#       dtype=float32)>

Feature extraction

With feature extraction there is no FC part in the pretrained network, so the extracted features have to be reshaped into a single 1-D vector per sample before they can be handed to a new FC head

Which of the two options below to use should be treated as a hyperparameter; pick whichever fits the situation (both are compared in the sketch after the list)

1. Flatten

2. Global average pooling
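
A quick comparison of the two options (a sketch with a random feature map standing in for the convolutional output): Flatten turns a (7, 7, 512) map into a 25088-dimensional vector and therefore needs a fixed input size, while global average pooling always yields one value per channel.

features = tf.random.normal((1, 7, 7, 512))                 # stand-in for the VGG16 block5_pool output
tf.keras.layers.Flatten()(features).shape                   # (1, 25088) - depends on the spatial size
tf.keras.layers.GlobalAveragePooling2D()(features).shape    # (1, 512) - independent of the spatial size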

 

vgg2 = tf.keras.applications.VGG16(include_top=False) 
# with include_top=False the classification layers are not loaded, which makes the network ideal for feature extraction

vgg2.summary() # no fixed input size has to be matched 
Model: "vgg16"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
input_8 (InputLayer)         [(None, None, None, 3)]   0         
_________________________________________________________________
block1_conv1 (Conv2D)        (None, None, None, 64)    1792      
_________________________________________________________________
block1_conv2 (Conv2D)        (None, None, None, 64)    36928     
_________________________________________________________________
block1_pool (MaxPooling2D)   (None, None, None, 64)    0         
_________________________________________________________________
block2_conv1 (Conv2D)        (None, None, None, 128)   73856     
_________________________________________________________________
block2_conv2 (Conv2D)        (None, None, None, 128)   147584    
_________________________________________________________________
block2_pool (MaxPooling2D)   (None, None, None, 128)   0         
_________________________________________________________________
block3_conv1 (Conv2D)        (None, None, None, 256)   295168    
_________________________________________________________________
block3_conv2 (Conv2D)        (None, None, None, 256)   590080    
_________________________________________________________________
block3_conv3 (Conv2D)        (None, None, None, 256)   590080    
_________________________________________________________________
block3_pool (MaxPooling2D)   (None, None, None, 256)   0         
_________________________________________________________________
block4_conv1 (Conv2D)        (None, None, None, 512)   1180160   
_________________________________________________________________
block4_conv2 (Conv2D)        (None, None, None, 512)   2359808   
_________________________________________________________________
block4_conv3 (Conv2D)        (None, None, None, 512)   2359808   
_________________________________________________________________
block4_pool (MaxPooling2D)   (None, None, None, 512)   0         
_________________________________________________________________
block5_conv1 (Conv2D)        (None, None, None, 512)   2359808   
_________________________________________________________________
block5_conv2 (Conv2D)        (None, None, None, 512)   2359808   
_________________________________________________________________
block5_conv3 (Conv2D)        (None, None, None, 512)   2359808   
_________________________________________________________________
block5_pool (MaxPooling2D)   (None, None, None, 512)   0         
=================================================================
Total params: 14,714,688
Trainable params: 14,714,688
Non-trainable params: 0
_________________________________________________________________
vgg2.trainable = False 
vgg2.layers[0] = tf.keras.layers.InputLayer((224,224,3))
layers = vgg2.layers+[tf.keras.layers.GlobalAvgPool2D()] # unlike Flatten, this gives a model that works regardless of the input size 
layers = layers + [tf.keras.layers.Dense(5)]
model2 = tf.keras.models.Sequential(layers) 

model2(np.array(im)[np.newaxis])
# <tf.Tensor: shape=(1, 5), dtype=float32, numpy=
# array([[-1.9419088,  6.326975 , -2.2549076, -3.0776834,  1.4872327]],
#       dtype=float32)>
model2(im2[np.newaxis])
# <tf.Tensor: shape=(1, 5), dtype=float32, numpy=
# array([[ 2.4478257 ,  1.1684433 , -8.583718  , -0.92841846,  6.4567146 ]],
#       dtype=float32)>
model2.summary()

Model: "sequential_2"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
block1_conv1 (Conv2D)        (None, None, None, 64)    1792      
_________________________________________________________________
block1_conv2 (Conv2D)        (None, None, None, 64)    36928     
_________________________________________________________________
block1_pool (MaxPooling2D)   (None, None, None, 64)    0         
_________________________________________________________________
block2_conv1 (Conv2D)        (None, None, None, 128)   73856     
_________________________________________________________________
block2_conv2 (Conv2D)        (None, None, None, 128)   147584    
_________________________________________________________________
block2_pool (MaxPooling2D)   (None, None, None, 128)   0         
_________________________________________________________________
block3_conv1 (Conv2D)        (None, None, None, 256)   295168    
_________________________________________________________________
block3_conv2 (Conv2D)        (None, None, None, 256)   590080    
_________________________________________________________________
block3_conv3 (Conv2D)        (None, None, None, 256)   590080    
_________________________________________________________________
block3_pool (MaxPooling2D)   (None, None, None, 256)   0         
_________________________________________________________________
block4_conv1 (Conv2D)        (None, None, None, 512)   1180160   
_________________________________________________________________
block4_conv2 (Conv2D)        (None, None, None, 512)   2359808   
_________________________________________________________________
block4_conv3 (Conv2D)        (None, None, None, 512)   2359808   
_________________________________________________________________
block4_pool (MaxPooling2D)   (None, None, None, 512)   0         
_________________________________________________________________
block5_conv1 (Conv2D)        (None, None, None, 512)   2359808   
_________________________________________________________________
block5_conv2 (Conv2D)        (None, None, None, 512)   2359808   
_________________________________________________________________
block5_conv3 (Conv2D)        (None, None, None, 512)   2359808   
_________________________________________________________________
block5_pool (MaxPooling2D)   (None, None, None, 512)   0         
_________________________________________________________________
global_average_pooling2d (Gl (None, 512)               0         
_________________________________________________________________
dense_4 (Dense)              (None, 5)                 2565      
=================================================================
Total params: 14,717,253
Trainable params: 2,565
Non-trainable params: 14,714,688
_________________________________________________________________

 

vgg2 = tf.keras.applications.VGG16(include_top=False, input_shape=(224,224,3)) 
vgg2.trainable = False 
layers = vgg2.layers+[tf.keras.layers.GlobalAvgPool2D()] 
layers = layers + [tf.keras.layers.Dense(5)]
model2 = tf.keras.models.Sequential(layers)
model2.summary()
Model: "sequential_3"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
block1_conv1 (Conv2D)        (None, 224, 224, 64)      1792      
_________________________________________________________________
block1_conv2 (Conv2D)        (None, 224, 224, 64)      36928     
_________________________________________________________________
block1_pool (MaxPooling2D)   (None, 112, 112, 64)      0         
_________________________________________________________________
block2_conv1 (Conv2D)        (None, 112, 112, 128)     73856     
_________________________________________________________________
block2_conv2 (Conv2D)        (None, 112, 112, 128)     147584    
_________________________________________________________________
block2_pool (MaxPooling2D)   (None, 56, 56, 128)       0         
_________________________________________________________________
block3_conv1 (Conv2D)        (None, 56, 56, 256)       295168    
_________________________________________________________________
block3_conv2 (Conv2D)        (None, 56, 56, 256)       590080    
_________________________________________________________________
block3_conv3 (Conv2D)        (None, 56, 56, 256)       590080    
_________________________________________________________________
block3_pool (MaxPooling2D)   (None, 28, 28, 256)       0         
_________________________________________________________________
block4_conv1 (Conv2D)        (None, 28, 28, 512)       1180160   
_________________________________________________________________
block4_conv2 (Conv2D)        (None, 28, 28, 512)       2359808   
_________________________________________________________________
block4_conv3 (Conv2D)        (None, 28, 28, 512)       2359808   
_________________________________________________________________
block4_pool (MaxPooling2D)   (None, 14, 14, 512)       0         
_________________________________________________________________
block5_conv1 (Conv2D)        (None, 14, 14, 512)       2359808   
_________________________________________________________________
block5_conv2 (Conv2D)        (None, 14, 14, 512)       2359808   
_________________________________________________________________
block5_conv3 (Conv2D)        (None, 14, 14, 512)       2359808   
_________________________________________________________________
block5_pool (MaxPooling2D)   (None, 7, 7, 512)         0         
_________________________________________________________________
global_average_pooling2d_1 ( (None, 512)               0         
_________________________________________________________________
dense_5 (Dense)              (None, 5)                 2565      
=================================================================
Total params: 14,717,253
Trainable params: 2,565
Non-trainable params: 14,714,688
_________________________________________________________________

Transfer learning + preprocessing layer

im = tf.keras.preprocessing.image.load_img('people.jpg')
im2 = np.resize(np.array(im), (224,224,3)) 

# VGG preprocessing subtracts the per-channel mean from each channel (zero centering)
tf.keras.applications.vgg16.preprocess_input(im2) 

array([[[ 73.061    ,  72.221    ,  77.32     ],
        [ 75.061    ,  74.221    ,  79.32     ],
        [ 76.061    ,  75.221    ,  80.32     ],
        ...,
        [  4.060997 ,  26.221    ,  45.32     ],
        [  7.060997 ,  29.221    ,  48.32     ],
        [ 12.060997 ,  34.221    ,  53.32     ]],

       [[ 15.060997 ,  29.221    ,  43.32     ],
        [ 10.060997 ,  28.221    ,  44.32     ],
        [  7.060997 ,  26.221    ,  48.32     ],
        ...,
        [ 67.061    ,  84.221    , 102.32     ],
        [106.061    , 123.221    , 131.32     ],
        [ 56.060997 ,  75.221    ,  91.32     ]],

       [[ 41.060997 ,  61.221    ,  87.32     ],
        [ 43.060997 ,  63.221    ,  89.32     ],
        [ 45.060997 ,  65.221    ,  91.32     ],
        ...,
        [ 29.060997 ,  28.221    ,  33.32     ],
        [ 29.060997 ,  34.221    ,  44.32     ],
        [ 38.060997 ,  49.221    ,  64.32     ]],

       ...,

       [[  5.060997 ,  22.221    ,  40.32     ],
        [  6.060997 ,  23.221    ,  41.32     ],
        [  6.060997 ,  23.221    ,  41.32     ],
        ...,
        [-43.939003 , -49.779    , -53.68     ],
        [-37.939003 , -46.779    , -52.68     ],
        [-47.939003 , -58.779    , -64.68     ]],

       [[-46.939003 , -61.779    , -68.68     ],
        [-38.939003 , -53.779    , -60.68     ],
        [-39.939003 , -54.779    , -61.68     ],
        ...,
        [ 31.060997 ,  46.221    ,  63.32     ],
        [ 30.060997 ,  45.221    ,  62.32     ],
        [ 32.060997 ,  45.221    ,  62.32     ]],

       [[ 31.060997 ,  45.221    ,  64.32     ],
        [ 31.060997 ,  45.221    ,  64.32     ],
        [ 29.060997 ,  44.221    ,  61.32     ],
        ...,
        [-20.939003 , -25.779    ,  -3.6800003],
        [-11.939003 , -18.779    ,   3.3199997],
        [ -3.939003 , -10.778999 ,  11.32     ]]], dtype=float32)
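
For reference, what preprocess_input does here is roughly the following (a sketch, assuming the default 'caffe' mode of the VGG16 preprocessing): flip the channels from RGB to BGR and subtract the per-channel ImageNet mean.

bgr = im2[..., ::-1].astype(np.float32)                                 # RGB -> BGR
manual = bgr - np.array([103.939, 116.779, 123.68], dtype=np.float32)   # subtract the per-channel mean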
pretrain_model = tf.keras.applications.VGG16(include_top=False, input_shape=(224,224,3))
pretrain_model.trainable = False
model = tf.keras.models.Sequential([
    tf.keras.layers.Lambda(lambda data: tf.keras.applications.vgg16.preprocess_input(
        tf.cast(data, tf.float32)), input_shape=(224,224,3) # a Lambda layer so the preprocessing runs on batches of n images at a time  
    ), 
    pretrain_model,
    tf.keras.layers.GlobalAvgPool2D(),
    tf.keras.layers.Dense(5)
]) # baking the preprocessing into the model prevents the mistake of forgetting to apply it
model.summary()
Model: "sequential_6"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
lambda_2 (Lambda)            (None, 224, 224, 3)       0         
_________________________________________________________________
vgg16 (Functional)           (None, 7, 7, 512)         14714688  
_________________________________________________________________
global_average_pooling2d_2 ( (None, 512)               0         
_________________________________________________________________
dense_8 (Dense)              (None, 5)                 2565      
=================================================================
Total params: 14,717,253
Trainable params: 2,565
Non-trainable params: 14,714,688
_________________________________________________________________
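
Since the preprocessing is now part of the model, raw images can be fed straight in for training; a sketch of how training might be set up (train_ds is a hypothetical tf.data.Dataset of (image, label) pairs, not part of the original post):

model.compile(optimizer='adam',
              loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),  # Dense(5) outputs logits
              metrics=['accuracy'])
# model.fit(train_ds, epochs=5)   # train_ds would be supplied by the user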

Zero centered


 

x = tf.keras.layers.Dense(5)

x.weights, x.trainable_weights, x.non_trainable_weights
# ([], [], [])   # nothing yet: the layer has not been built, so it has no weights

x(tf.constant([[1,2,],[3,4,]]))
# <tf.Tensor: shape=(2, 5), dtype=float32, numpy=
# array([[-0.12723184,  1.5356824 , -0.45605123, -1.7108853 , -2.0955188 ],
#        [ 0.3099841 ,  2.8317063 , -0.42237318, -3.7084074 , -4.6590133 ]],
#       dtype=float32)>
x.weights, x.trainable_weights, x.non_trainable_weights

# ([<tf.Variable 'dense_14/kernel:0' shape=(2, 5) dtype=float32, numpy=
#   array([[ 0.56444776, -0.23965877,  0.4897293 , -0.28663665, -0.46797565],
#          [-0.3458398 ,  0.88767064, -0.47289026, -0.71212435, -0.81377155]],
#         dtype=float32)>,
#   <tf.Variable 'dense_14/bias:0' shape=(5,) dtype=float32, numpy=array([0., 0., 0., 0., 0.], dtype=float32)>],
#  [<tf.Variable 'dense_14/kernel:0' shape=(2, 5) dtype=float32, numpy=
#   array([[ 0.56444776, -0.23965877,  0.4897293 , -0.28663665, -0.46797565],
#          [-0.3458398 ,  0.88767064, -0.47289026, -0.71212435, -0.81377155]],
#         dtype=float32)>,
#   <tf.Variable 'dense_14/bias:0' shape=(5,) dtype=float32, numpy=array([0., 0., 0., 0., 0.], dtype=float32)>],
#  [])
y = tf.keras.layers.BatchNormalization()

y.built
# False

y.build((None,4))

y.built
# True
y.weights 
# [<tf.Variable 'gamma:0' shape=(4,) dtype=float32, numpy=array([1., 1., 1., 1.], dtype=float32)>,
#  <tf.Variable 'beta:0' shape=(4,) dtype=float32, numpy=array([0., 0., 0., 0.], dtype=float32)>,
#  <tf.Variable 'moving_mean:0' shape=(4,) dtype=float32, numpy=array([0., 0., 0., 0.], dtype=float32)>,
#  <tf.Variable 'moving_variance:0' shape=(4,) dtype=float32, numpy=array([1., 1., 1., 1.], dtype=float32)>]
# the moving mean and variance are not learned through training, so they are not trainable weights
y.trainable_weights  
# [<tf.Variable 'gamma:0' shape=(4,) dtype=float32, numpy=array([1., 1., 1., 1.], dtype=float32)>,
#  <tf.Variable 'beta:0' shape=(4,) dtype=float32, numpy=array([0., 0., 0., 0.], dtype=float32)>]

y.trainable = False
y.trainable_weights
# []
y.non_trainable_weights
# [<tf.Variable 'gamma:0' shape=(4,) dtype=float32, numpy=array([1., 1., 1., 1.], dtype=float32)>,
#  <tf.Variable 'beta:0' shape=(4,) dtype=float32, numpy=array([0., 0., 0., 0.], dtype=float32)>,
#  <tf.Variable 'moving_mean:0' shape=(4,) dtype=float32, numpy=array([0., 0., 0., 0.], dtype=float32)>,
#  <tf.Variable 'moving_variance:0' shape=(4,) dtype=float32, numpy=array([1., 1., 1., 1.], dtype=float32)>]
model(im2[tf.newaxis], training=True)
# <tf.Tensor: shape=(1, 5), dtype=float32, numpy=
# array([[-1.6044631,  4.722974 ,  2.0047941, -2.9608102,  1.3918904]],
#       dtype=float32)>

When layer.trainable = False is set on a batch normalization layer, it stops updating its statistics and simply uses the stored moving averages

Therefore, when a model that contains batch normalization layers is unfrozen for fine tuning,

the base model should still be called with training=False so that those layers keep using their moving statistics
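
In code, that pattern looks roughly like this (a sketch following the common Keras recipe; MobileNetV2 is used here only because it actually contains BatchNormalization layers, and the 5-class head is an assumption):

base = tf.keras.applications.MobileNetV2(include_top=False, input_shape=(224, 224, 3))
base.trainable = False

inputs = tf.keras.Input(shape=(224, 224, 3))
x = base(inputs, training=False)   # BatchNormalization layers stay in inference mode even during fine tuning
x = tf.keras.layers.GlobalAveragePooling2D()(x)
outputs = tf.keras.layers.Dense(5)(x)
fine_tune_model = tf.keras.Model(inputs, outputs)

# later, to fine-tune, unfreeze the base; training=False above still keeps the
# moving statistics fixed
base.trainable = True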

 

 

 
