
After YOLO v2, RetinaNet (2017.08) showed very high prediction accuracy among one-stage detectors, largely because it adopted FPN.

YOLO v3 likewise adopted FPN to improve accuracy and established itself as a real-time detector.

- The backbone produces feature maps at roughly the original resolution, half of that, and half again.

- The topmost (smallest) feature map is abstract but the most fully learned, so object detection is normally performed on it. Detecting only at the top level, however, tends to catch only large objects, which is why SSD also pulls predictions from the lower feature maps.

- FPN goes further: after the conv operations, the upper feature map is 2x upsampled (since the sizes differ), merged with the lower feature map, and prediction is performed on the merged map.

- The prediction then reflects both the abstract (semantic) and the detailed (fine-grained) features, as sketched below.
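A minimal sketch of this upsample-and-merge step (shapes and layer choices here are illustrative, not the exact YOLO v3 configuration):

```python
import tensorflow as tf

# Minimal sketch of the FPN-style top-down merge described above.
p_coarse = tf.random.normal((1, 13, 13, 256))   # topmost, most abstract feature map
c_fine   = tf.random.normal((1, 26, 26, 256))   # lower-level, more detailed feature map

up = tf.keras.layers.UpSampling2D(size=2)(p_coarse)            # 13x13 -> 26x26
merged = tf.keras.layers.Concatenate()([up, c_fine])           # YOLO v3 concatenates (original FPN adds)
pred_in = tf.keras.layers.Conv2D(256, 3, padding='same', activation='relu')(merged)
print(pred_in.shape)  # (1, 26, 26, 256) -- prediction runs on the merged map
```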

 

YOLO version comparison

- Input image size: v1 448x448 / v2 416x416 / v3 416x416
- Feature extractor: v1 Inception variant / v2 DarkNet-19 / v3 DarkNet-53 (influenced by ResNet)
- Anchor boxes per grid cell: v1 none (2 bbox predictions per cell, fixed sizes) / v2 5 / v3 3 per output feature map, 9 in total across different sizes and scales
- Anchor box selection: v1 - / v2 K-means clustering / v3 K-means clustering
- Output feature map size (excluding depth): v1 7x7 / v2 13x13 / v3 13x13, 26x26, 52x52 (3 feature maps)
- Feature map scaling technique: v1 - / v2 - / v3 FPN (Feature Pyramid Network)

- FPN (Feature Pyramid Network)

- Improved backbone: DarkNet-53 (53 weight layers)

- Feature maps obtained by repeatedly doubling the 13x13 map (13x13, 26x26, 52x52)

- 9 anchor boxes

- Multi-label prediction: a sigmoid-based logistic classifier (instead of softmax) predicts multiple labels per object, as in the sketch below.
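A tiny illustration of why independent sigmoids (rather than softmax) enable multi-label prediction (the class labels in the comment are made up for illustration):

```python
import tensorflow as tf

# Softmax forces the class scores to compete (they sum to 1); independent sigmoids let one box
# carry several labels at once, which is what YOLO v3's logistic classifiers allow.
logits = tf.constant([[3.0, 2.5, -2.0]])   # e.g. scores for "person", "woman", "car"
print(tf.nn.softmax(logits).numpy())       # single-label style: probabilities sum to 1
print(tf.sigmoid(logits).numpy())          # multi-label style: each probability is independent
```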

 

 

Model Architecture

Upsampled feature map + lower-level feature map => predict

 

 

YOLO v3 network structure

- Outputs: 13x13, 26x26, 52x52

- Light green (in the figure): upsampled feature maps

Output feature map

25 + 25 + 25 => depth of 75 (3 anchors x 25 values each; checked below)

13x13x75

26x26x75

52x52x75
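A quick check of where the depth of 75 comes from (assuming the 20 Pascal VOC classes used throughout these notes):

```python
# Output-depth sanity check for Pascal VOC (20 classes):
anchors_per_scale = 3
values_per_anchor = 4 + 1 + 20                  # box coords + objectness + class scores
print(anchors_per_scale * values_per_anchor)    # 75 -> 13x13x75, 26x26x75, 52x52x75
```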

 

DarkNet-53 characteristics (53 layers)

Training

- Data Augmentation

- batch normalization

 



 

YOLO v2 detection speed and performance

- Together with SSD, it is dominant in terms of speed.

- Among the YOLO variants, Tiny YOLO is even faster.

 

YOLO v2 features

- Batch Normalization 

   * conv - batch normalization - activation (ReLU)

- High-resolution classifier: the classifier part of the network is fine-tuned at a higher resolution (448x448).

- The classification layer is changed from a fully connected dense layer to a fully convolutional one, so the network can be trained on images of different sizes.

- Object detection is performed with 5 anchor boxes per grid cell on a 13x13 feature map.

   * Anchor box sizes and ratios are set with K-means clustering.

- Direct location prediction keeps the predicted bbox center (x, y) from leaving its cell.

- The DarkNet-19 classification model is adopted => better prediction accuracy and faster execution.

 

 

YOLO v2: detecting multiple objects per cell with anchor boxes

- As in SSD, multiple anchors per cell allow each cell to detect several objects.

- K-means clustering groups the dataset's box sizes and aspect ratios into 5 clusters, from which the anchor boxes are computed.

 

Output feature map

 

- Depth of 125: 5 anchor boxes, 25 values each.

- YOLO v1: per cell, 2 bboxes x (4 coordinates + 1 confidence) = 10 values, plus 20 Pascal VOC class probabilities (depth 30).

- YOLO v2: 25 values per bbox (4 coordinates, 1 confidence score, 20 class scores), repeated for the 5 anchors (see the check below).
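A quick check of the YOLO v2 output depth:

```python
# 5 anchors x (4 box coords + 1 confidence + 20 class scores)
print(5 * (4 + 1 + 20))   # 125 -> output feature map is 13x13x125
```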

 

Direct Location Prediction

 

(pw, ph): anchor box size

(tx, ty, tw, th): offset values predicted by the model

(bx, by, bw, bh): center coordinates and size of the predicted bounding box

* The center coordinates are squashed with a sigmoid (1/(1+e^-x)) into the 0~1 range so they cannot drift outside the cell; a decoding sketch follows.
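A sketch of decoding one YOLO v2 prediction: the cell offset (cx, cy) and anchor prior (pw, ph) are givens, the network outputs (tx, ty, tw, th); the values below are made up for illustration.

```python
import numpy as np

def decode(tx, ty, tw, th, cx, cy, pw, ph):
    sigmoid = lambda v: 1.0 / (1.0 + np.exp(-v))
    bx = cx + sigmoid(tx)      # sigmoid keeps the center inside its cell
    by = cy + sigmoid(ty)
    bw = pw * np.exp(tw)       # width/height rescale the anchor prior
    bh = ph * np.exp(th)
    return bx, by, bw, bh

print(decode(0.4, -0.2, 0.3, 0.1, cx=6, cy=4, pw=3.0, ph=2.0))
```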

 

- The loss has a form similar to the YOLO v1 loss.

 

Fine-grained features via the passthrough module

- To detect smaller objects, the 26x26x512 feature map is reshaped to 13x13x2048 (keeping its information) and concatenated with the 13x13x1024 map to build the final feature map (see the sketch below).

=> the spatial size shrinks to 1/4 (each 2x2 block moves into the channel dimension)

- It is fed in as a merge module so small objects can still be found.

- SSD, in contrast, predicts from each feature map separately, combines the results, and filters them with NMS.
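A sketch of the passthrough (reorg) idea with tf.nn.space_to_depth; the layer sizes follow the description above:

```python
import tensorflow as tf

# Each 2x2 spatial block of the 26x26x512 map is stacked into the channels (-> 13x13x2048),
# then concatenated with the 13x13x1024 map so fine-grained detail survives into the prediction.
fine   = tf.random.normal((1, 26, 26, 512))
coarse = tf.random.normal((1, 13, 13, 1024))

reorg  = tf.nn.space_to_depth(fine, block_size=2)   # (1, 13, 13, 2048)
merged = tf.concat([reorg, coarse], axis=-1)        # (1, 13, 13, 3072)
print(reorg.shape, merged.shape)
```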

 

Multi-scale training

- Because the classification layer is built as a convolution layer, the input image size can be changed dynamically.

- During training, the input image size is changed dynamically every 10 batches, from 320 up to 608 (in multiples of 32), as sketched below.
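A small sketch of that multi-scale schedule (the exact sampling in the original implementation may differ):

```python
import numpy as np

# Every 10 batches, draw a new input size from {320, 352, ..., 608}
# (multiples of 32, the network's total stride).
sizes = np.arange(320, 609, 32)
for start in range(0, 50, 10):
    size = int(np.random.choice(sizes))
    print(f"batches {start}-{start + 9}: input resized to {size}x{size}")
```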

 

 

DarkNet-19 backbone

- The fully connected layer in the classification head is removed and a conv layer is used instead.

  * VGG-16: 30.69 BFLOPs, top-5 accuracy: 90%

    -> a popular architecture because its 3x3 convolutions keep it simple

  * YOLO v1: 8.52 BFLOPs, top-5 accuracy: 88%

  * YOLO v2 (DarkNet-19): 5.58 BFLOPs, top-5 accuracy: 91.2%

 

Performance improvements

 

 


YOLO v1

- YOLO v1 divides the input image into an SxS grid, and each grid cell performs detection for one object.

- Each grid cell predicts the object's bounding box from 2 candidate bounding boxes.

 

YOLO v1 network and prediction values

- An Inception-style network with 1x1 convolutions is used.

- There is no separate pretrained backbone; the state of the art in 2015 mostly used VGG.

- The 3-D feature map produced by the 2-D convolutions is flattened into dense layers.

- That is reshaped back to 7x7 for detection.

=> 7x7x30: each cell carries 30 values (see the breakdown below).
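A small sketch of how the 7x7x30 tensor breaks down, following the per-cell list below (the exact ordering of the 30 values inside the original implementation may differ):

```python
import numpy as np

# Per cell: 2 boxes x (x, y, w, h, confidence) + 20 class probabilities = 30 values.
out = np.zeros((7, 7, 30))
boxes   = out[..., :10].reshape(7, 7, 2, 5)   # 2 boxes, 5 values each
classes = out[..., 10:]                       # 20 class probabilities
print(boxes.shape, classes.shape)             # (7, 7, 2, 5) (7, 7, 20)
```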

 

For each grid cell the following is computed:

a. The coordinates of 2 candidate bounding boxes and a confidence score for each box

- x, y, w, h: the normalized bbox center coordinates, width, and height

- confidence score = P(object) * IoU

 

b. Class probabilities: probabilities for the 20 Pascal VOC classes

YOLO v1 loss

BBox center (x, y) loss

- Based on the squared error between the predicted (x, y) and the ground-truth (x, y).

- Of the 2 bboxes in every cell (98 bboxes in total), only the bbox responsible for the prediction contributes to the loss.

- Among the 98 bboxes, the indicator is 1 for the bbox responsible for the object and 0 for the rest.

  (only the responsible bbox is counted; the others are zeroed out)

BBox width (w) / height (h) loss

- Based on the squared error between the predicted and ground-truth width/height, but the square root is taken so that the relatively larger errors on big objects do not dominate.

- The square root softens the penalty when a large box is badly mispredicted.

 

λ_coord => weight the coordinate terms by 5

λ_noobj => weight the no-object term by 0.5

 

Object confidence loss => a distinctive loss term

- Based on the error between the predicted object confidence score and the IoU with the ground truth.

- Confidence loss of the bbox responsible for the object + confidence loss of the bboxes that should contain no object.

 

Classification loss

- Squared error of the predicted class probabilities, computed only for cells responsible for an object. The full loss is written out below.
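Putting the terms above together, the YOLO v1 loss (with λ_coord = 5, λ_noobj = 0.5, S = 7, B = 2) is:

$$
\begin{aligned}
L ={}& \lambda_{coord}\sum_{i=0}^{S^2}\sum_{j=0}^{B}\mathbb{1}_{ij}^{obj}\left[(x_i-\hat{x}_i)^2+(y_i-\hat{y}_i)^2\right]
 + \lambda_{coord}\sum_{i=0}^{S^2}\sum_{j=0}^{B}\mathbb{1}_{ij}^{obj}\left[(\sqrt{w_i}-\sqrt{\hat{w}_i})^2+(\sqrt{h_i}-\sqrt{\hat{h}_i})^2\right] \\
&+ \sum_{i=0}^{S^2}\sum_{j=0}^{B}\mathbb{1}_{ij}^{obj}(C_i-\hat{C}_i)^2
 + \lambda_{noobj}\sum_{i=0}^{S^2}\sum_{j=0}^{B}\mathbb{1}_{ij}^{noobj}(C_i-\hat{C}_i)^2
 + \sum_{i=0}^{S^2}\mathbb{1}_{i}^{obj}\sum_{c\in\text{classes}}(p_i(c)-\hat{p}_i(c))^2
\end{aligned}
$$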

 

 

NMS

One-stage detectors generally predict many boxes and then filter them with NMS.

Two-stage detectors propose regions first and then confirm them.

 

NMS is performed per class:

1. Remove all boxes below a certain confidence threshold.

2. Sort the bboxes by confidence in descending order.

3. Remove every bbox whose IoU with the highest-confidence bbox exceeds the IoU threshold.

4. Repeat step 3 for the remaining bboxes (see the sketch below).
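A minimal NumPy sketch of these per-class NMS steps (boxes as [x1, y1, x2, y2]; thresholds are illustrative):

```python
import numpy as np

def iou(a, b):
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, x2 - x1) * max(0, y2 - y1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)

def nms(boxes, scores, conf_thresh=0.5, iou_thresh=0.5):
    idxs = np.argsort(scores)[::-1]                 # 2. sort by confidence (descending)
    idxs = idxs[scores[idxs] > conf_thresh]         # 1. drop boxes below the confidence threshold
    keep = []
    while len(idxs) > 0:
        best, rest = idxs[0], idxs[1:]
        keep.append(int(best))
        overlaps = np.array([iou(boxes[best], boxes[r]) for r in rest])
        idxs = rest[overlaps <= iou_thresh]         # 3-4. remove heavy overlaps, then repeat
    return keep

boxes = np.array([[0, 0, 10, 10], [1, 1, 10, 10], [20, 20, 30, 30]], dtype=float)
scores = np.array([0.9, 0.8, 0.7])
print(nms(boxes, scores))  # -> [0, 2]
```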

 

 

Issues

Detection is fast, but detection accuracy is lower.

Performance on small objects is especially poor.

=> Because each cell is responsible for only one object, if two objects fall into the same cell one of them is not recognized at all.

=> A structural limitation.

 


The byword for real-time object detection

You Only Look Once => one-stage detection

Non-FPN

1. YOLO v1 || 2015.05

- 150 FPS

- Fast, but low accuracy

2. SSD || 2015.12

- Improved accuracy and speed

3. YOLO v2 || 2016.12

- Accuracy and speed comparable to SSD

- Improved both speed and accuracy, but weaker than SSD on small objects

 

FPN

4. RetinaNet || 2017.08

- Slower, but more accurate

- Better than YOLO v3 on small objects

5. YOLO v3 || 2018.04

- Large accuracy improvement

6. EfficientDet || 2019.11

- D0: slightly better than YOLO v3

 

7. YOLO v4 || 2020.04

- Improved both accuracy and speed

 

Darknet-based YOLO

=> a C-based deep learning framework

=> with a CUDA-based interface

Developed by the original author up to YOLO v3

 

 

import tensorflow as tf

img = tf.keras.utils.get_file('zebra.jpg','https://i.imgur.com/XjeiRMV.jpg')

import cv2
import matplotlib.pyplot as plt

im = cv2.imread(img)
im = cv2.cvtColor(im, cv2.COLOR_BGR2RGB)
im_ = im.copy()

rec1 = cv2.rectangle(im_, (120,25),(200,165), color=(255,0,0), thickness=2)
rec2 = cv2.rectangle(im_, (300,50),(480,320), color=(255,0,0), thickness=2)

plt.imshow(im_)

im_.shape
# (333, 500, 3)

h = w = 800
im_r = cv2.resize(im, (h,w))
im_r_ = im_r.copy()
import numpy as np
x = np.array([120, 25, 200, 165])
y = np.array([300, 50, 480,320])
x[0] = int(x[0]*(w/im.shape[1]))
x[1] = int(x[1]*(h/im.shape[0]))
x[2] = int(x[2]*(w/im.shape[1]))
x[3] = int(x[3]*(h/im.shape[0]))
y[0] = int(y[0]*(w/im.shape[1]))
y[1] = int(y[1]*(h/im.shape[0]))
y[2] = int(y[2]*(w/im.shape[1]))
y[3] = int(y[3]*(h/im.shape[0]))
rec1 = cv2.rectangle(im_r_, (x[0],x[1]),(x[2],x[3]), color=(255,0,0), thickness=2)
rec2 = cv2.rectangle(im_r_, (y[0],y[1]),(y[2],y[3]), color=(255,0,0), thickness=2)
from skimage.util import view_as_blocks, view_as_windows

plt.figure(figsize=(8,8))
plt.imshow(im_r_)

vgg = tf.keras.applications.VGG16(include_top=False)

for j,i in enumerate(vgg.layers):
    output = tf.keras.models.Model(vgg.input, i.output)
    print(output(im_r_[tf.newaxis]).shape,j)
    
(1, 800, 800, 3) 0
(1, 800, 800, 64) 1
(1, 800, 800, 64) 2
(1, 400, 400, 64) 3
(1, 400, 400, 128) 4
(1, 400, 400, 128) 5
(1, 200, 200, 128) 6
(1, 200, 200, 256) 7
(1, 200, 200, 256) 8
(1, 200, 200, 256) 9
(1, 100, 100, 256) 10
(1, 100, 100, 512) 11
(1, 100, 100, 512) 12
(1, 100, 100, 512) 13
(1, 50, 50, 512) 14
(1, 50, 50, 512) 15
(1, 50, 50, 512) 16
(1, 50, 50, 512) 17
(1, 25, 25, 512) 18
backbone = tf.keras.models.Model(vgg.input, vgg.layers[17].output)

backbone(im_r_[tf.newaxis]).shape
# TensorShape([1, 50, 50, 512])

plt.imshow(backbone(im_r_[tf.newaxis])[0,...,4])

vgg.summary()
Model: "vgg16"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
input_3 (InputLayer)         [(None, None, None, 3)]   0         
_________________________________________________________________
block1_conv1 (Conv2D)        (None, None, None, 64)    1792      
_________________________________________________________________
block1_conv2 (Conv2D)        (None, None, None, 64)    36928     
_________________________________________________________________
block1_pool (MaxPooling2D)   (None, None, None, 64)    0         
_________________________________________________________________
block2_conv1 (Conv2D)        (None, None, None, 128)   73856     
_________________________________________________________________
block2_conv2 (Conv2D)        (None, None, None, 128)   147584    
_________________________________________________________________
block2_pool (MaxPooling2D)   (None, None, None, 128)   0         
_________________________________________________________________
block3_conv1 (Conv2D)        (None, None, None, 256)   295168    
_________________________________________________________________
block3_conv2 (Conv2D)        (None, None, None, 256)   590080    
_________________________________________________________________
block3_conv3 (Conv2D)        (None, None, None, 256)   590080    
_________________________________________________________________
block3_pool (MaxPooling2D)   (None, None, None, 256)   0         
_________________________________________________________________
block4_conv1 (Conv2D)        (None, None, None, 512)   1180160   
_________________________________________________________________
block4_conv2 (Conv2D)        (None, None, None, 512)   2359808   
_________________________________________________________________
block4_conv3 (Conv2D)        (None, None, None, 512)   2359808   
_________________________________________________________________
block4_pool (MaxPooling2D)   (None, None, None, 512)   0         
_________________________________________________________________
block5_conv1 (Conv2D)        (None, None, None, 512)   2359808   
_________________________________________________________________
block5_conv2 (Conv2D)        (None, None, None, 512)   2359808   
_________________________________________________________________
block5_conv3 (Conv2D)        (None, None, None, 512)   2359808   
_________________________________________________________________
block5_pool (MaxPooling2D)   (None, None, None, 512)   0         
=================================================================
Total params: 14,714,688
Trainable params: 14,714,688
Non-trainable params: 0
_________________________________________________________________
vgg(im_r_[tf.newaxis])


<tf.Tensor: shape=(1, 25, 25, 512), dtype=float32, numpy=
array([[[[ 0.        ,  0.        ,  0.        , ...,  0.        ,
           0.        ,  0.        ],
         [ 0.        ,  0.        ,  0.        , ...,  0.        ,
           0.        ,  0.        ],
         [ 0.        ,  0.        ,  0.142358  , ...,  0.        ,
           0.        ,  0.        ],
         ...,
         [ 0.        ,  0.        ,  1.3040222 , ...,  0.        ,
           2.3414693 ,  0.        ],
         [ 0.        ,  0.        ,  0.        , ...,  0.        ,
           3.648667  ,  0.        ],
         [ 0.        ,  0.        ,  2.5827253 , ...,  0.        ,
           1.2787921 ,  0.        ]],

        [[ 0.        ,  0.        ,  0.        , ...,  0.        ,
           0.        ,  0.        ],
         [ 0.        ,  0.        ,  0.        , ...,  0.        ,
           0.        ,  0.        ],
         [ 0.        ,  0.        ,  0.        , ...,  0.        ,
           0.        ,  0.        ],
         ...,
         [ 0.        ,  0.        ,  0.        , ...,  0.        ,
           0.        ,  0.        ],
         [ 0.        ,  0.        ,  0.        , ...,  0.        ,
           0.        ,  0.        ],
         [ 0.        ,  0.        ,  0.        , ...,  0.        ,
           0.        ,  0.        ]],

        [[ 0.        ,  0.        ,  0.        , ...,  0.        ,
           0.        ,  0.        ],
         [ 0.        ,  0.        ,  0.        , ...,  0.        ,
           0.        ,  0.        ],
         [ 0.        ,  0.        ,  0.        , ...,  0.        ,
           0.        ,  0.        ],
         ...,
         [ 0.        ,  0.        , 25.991032  , ...,  0.        ,
           4.2155175 ,  0.        ],
         [ 0.        ,  0.        ,  9.656704  , ...,  0.        ,
           6.1238546 ,  0.        ],
         [ 0.        ,  0.        ,  0.        , ...,  0.        ,
           0.        ,  0.        ]],

        ...,

        [[ 0.        ,  0.        ,  0.        , ...,  0.        ,
           2.6076157 ,  0.24721637],
         [ 0.        ,  0.        ,  0.        , ...,  0.        ,
           0.        ,  1.4595927 ],
         [ 0.        ,  0.        ,  0.        , ...,  0.        ,
           0.        ,  0.13756028],
         ...,
         [ 0.        ,  0.        ,  0.        , ...,  0.        ,
           0.        ,  0.        ],
         [ 0.        ,  0.        ,  0.        , ...,  0.        ,
           0.        ,  0.        ],
         [ 0.        ,  0.        ,  0.        , ...,  0.        ,
           0.        ,  0.        ]],

        [[15.054876  ,  0.        ,  0.        , ...,  0.        ,
           2.182668  ,  0.        ],
         [11.117934  ,  0.        ,  0.        , ...,  0.        ,
           0.        ,  0.        ],
         [ 0.        ,  0.        ,  0.        , ...,  0.        ,
           0.        ,  0.        ],
         ...,
         [ 0.        ,  0.        ,  0.        , ...,  0.        ,
           0.        ,  0.        ],
         [ 0.        ,  0.        ,  0.        , ...,  0.        ,
           0.        ,  0.        ],
         [ 0.        ,  0.        ,  0.        , ...,  0.        ,
           0.        ,  0.        ]],

        [[ 0.        ,  0.        ,  0.        , ...,  0.        ,
           0.        ,  0.        ],
         [ 0.        ,  0.        ,  0.        , ...,  0.        ,
           0.29810184,  0.        ],
         [ 0.        ,  0.        ,  0.        , ...,  0.        ,
           1.1234534 ,  0.        ],
         ...,
         [ 0.        ,  0.        ,  0.        , ...,  0.        ,
           0.        ,  0.        ],
         [ 0.        ,  0.        ,  0.        , ...,  0.        ,
           0.        ,  0.        ],
         [ 0.        ,  0.        ,  0.        , ...,  0.        ,
           0.        ,  0.        ]]]], dtype=float32)>
im_r = cv2.resize(im, (h,w))
im_r_ = im_r.copy()

x = np.arange(8,800,16)
y = np.arange(8,800,16)

cl = np.array(np.meshgrid(x,y)).T.reshape(-1,2)

for i in range(2500):
    cv2.circle(im_r_, (cl[i,0], cl[i,1]),1, (255,0,0), thickness=2)
    
plt.figure(figsize=(10,10))
plt.imshow(im_r_)

50*50*9
# 22500

ratio = [0.5, 1, 2]
scale = [8,16,32]

al = np.zeros((22500,4))
count = 0
for i in cl:
    cx, cy = i[0],i[1]
    for r in ratio:
        for s in scale:
            h = pow(pow(s,2)/r,0.5)
            w = h*r
            h *= 16
            w *= 16
            xmin = cx-0.5*w
            ymin = cy-0.5*h
            xmax = cx+0.5*w
            ymax = cy+0.5*h
            al[count] = [xmin, ymin,xmax,ymax]
            count += 1
al.shape
# (22500, 4)

point = 570
im_r_
array([[[ 87,  51,  37],
        [ 91,  52,  40],
        [ 97,  54,  45],
        ...,
        [ 75,  48,  37],
        [ 67,  45,  35],
        [ 61,  43,  33]],

       [[ 87,  51,  37],
        [ 91,  53,  40],
        [ 97,  54,  45],
        ...,
        [ 74,  47,  36],
        [ 66,  44,  34],
        [ 60,  42,  32]],

       [[ 86,  52,  38],
        [ 90,  53,  41],
        [ 95,  54,  44],
        ...,
        [ 70,  43,  32],
        [ 62,  40,  30],
        [ 56,  38,  29]],

       ...,

       [[153,  93,  65],
        [106,  63,  43],
        [ 49,  28,  16],
        ...,
        [ 85,  50,  28],
        [129,  84,  55],
        [166, 113,  78]],

       [[106,  59,  47],
        [ 95,  50,  37],
        [ 82,  41,  25],
        ...,
        [108,  75,  51],
        [115,  75,  51],
        [123,  78,  53]],

       [[ 92,  49,  42],
        [ 91,  46,  35],
        [ 91,  45,  28],
        ...,
        [114,  83,  58],
        [110,  73,  50],
        [110,  68,  46]]], dtype=uint8)
# img_ = np.copy(im_r)
# for i in range(point,point+9):
#     x_min = int(al[i][0])
#     y_min = int(al[i][1])
#     x_max = int(al[i][2])
#     y_max = int(al[i][3])
#     cv2.rectangle(img_, (x_min,y_min),(x_max,y_max), (0,255,0), thickness=4)
# for i in range(2500):
#     cv2.circle(img_, (cl[i,0], cl[i,1]),1, (0,0,255), thickness=2)    

x = np.array([120, 25, 200, 165])
y = np.array([300, 50, 480,320])

x[0] = int(x[0]*1.6)
x[1] = int(x[1]*2.4)
x[2] = int(x[2]*1.6)
x[3] = int(x[3]*2.4)
y[0] = int(y[0]*1.6)
y[1] = int(y[1]*2.4)
y[2] = int(y[2]*1.6)
y[3] = int(y[3]*2.4)

rec1 = cv2.rectangle(im_r_, (x[0],x[1]),(x[2],x[3]), color=(255,0,0), thickness=5)
rec2 = cv2.rectangle(im_r_, (y[0],y[1]),(y[2],y[3]), color=(255,0,0), thickness=5)    

plt.imshow(im_r_)

Of the 22,500 anchors, exclude those that extend outside the 0~800 image boundary:

np.where((al[:,0] >=0) & (al[:,1] >=0) &  (al[:,2] <= 800 ) &  (al[:,3] <= 800 ))
# (array([ 1404,  1413,  1422, ..., 21069, 21078, 21087], dtype=int64),)

is_al = al[np.where((al[:,0] >=0) & (al[:,1] >=0) &  (al[:,2] <= 800 ) &  (al[:,3] <= 800 ))]

len(is_al) # anchor 
# 8940
def iou(box1,box2):
    x1 = max(box1[0], box2[0])
    y1 = max(box1[1], box2[1])
    
    x2 = min(box1[2], box2[2])
    y2 = min(box1[3], box2[3])
    
    if (x1 < x2 and  y1 < y2):
        w_o = x2 - x1
        h_o = y2 - y1
        area = w_o*h_o
    else:
        return 0
    
    area_b1 = (box1[2]-box1[0])*(box1[3]-box1[1])
    area_b2 = (box2[2]-box2[0])*(box2[3]-box2[1])
    union = area_b1 + area_b2 - area
    
    return area/union

object 1 = x

x = np.array([300, 50, 480,320])
x[0] = int(x[0]*1.6)
x[1] = int(x[1]*2.4)
x[2] = int(x[2]*1.6)
x[3] = int(x[3]*2.4)

object 2 = y

y = np.array([120, 25, 200, 165])
y[0] = int(y[0]*1.6)
y[1] = int(y[1]*2.4)
y[2] = int(y[2]*1.6)
y[3] = int(y[3]*2.4)


objects = [x,y]

result = np.zeros((8940,len(objects)))
for t,g in enumerate(objects):
    for i,j in enumerate(is_al):
        result[i][t] = iou(j,g)
        

result
array([[0.        , 0.        ],
       [0.        , 0.        ],
       [0.        , 0.        ],
       ...,
       [0.06581804, 0.        ],
       [0.06484636, 0.        ],
       [0.05869298, 0.        ]])
anchor_id = np.where((al[:,0] >=0) & (al[:,1] >=0) &  (al[:,2] <= 800 ) &  (al[:,3] <= 800 ))
anchor_id[0]
# array([ 1404,  1413,  1422, ..., 21069, 21078, 21087], dtype=int64)

pandas (2-D tabular data):

import pandas as pd

data = pd.DataFrame(data=[anchor_id[0], result[:,0], result[:,1]]).T

data.rename(columns={0:'anchor_id', 1:'o1_iou',2:'o2_iou'}, inplace=True)

data.anchor_id = data.anchor_id.astype('int')
data
	anchor_id	o1_iou	o2_iou
0	1404	0.000000	0.0
1	1413	0.000000	0.0
2	1422	0.000000	0.0
3	1431	0.000000	0.0
4	1440	0.000000	0.0
...	...	...	...
8935	21051	0.065818	0.0
8936	21060	0.065818	0.0
8937	21069	0.065818	0.0
8938	21078	0.064846	0.0
8939	21087	0.058693	0.0
data['o1_iou_objectness'] = data.apply(lambda x: 1 if x['o1_iou'] > 0.7 else -1, axis=1)

data[data['o1_iou_objectness'] == 1]
	anchor_id	o1_iou	o2_iou	o1_iou_objectness
7540	16877	0.711914	0.0	1
7547	16886	0.711914	0.0	1
7768	17327	0.711914	0.0	1
7775	17336	0.711914	0.0	1
data.o2_iou.argmax()
# 1785


data.loc[data.o2_iou.argmax()]
anchor_id            6418.00000
o1_iou                  0.00000
o2_iou                  0.65625
o1_iou_objectness      -1.00000
Name: 1785, dtype: float64

top  # presumably the coordinates of the 4 anchors with o1_iou > 0.7 (the selection code is not shown)
array([[418.98066402,  45.96132803, 781.01933598, 770.03867197],
       [418.98066402,  61.96132803, 781.01933598, 786.03867197],
       [434.98066402,  45.96132803, 797.01933598, 770.03867197],
       [434.98066402,  61.96132803, 797.01933598, 786.03867197]])
img_ = np.copy(im_r)

for i,j in enumerate(top):
    x_min = int(top[i][0])
    y_min = int(top[i][1])
    x_max = int(top[i][2])
    y_max = int(top[i][3])
    cv2.rectangle(img_, (x_min,y_min),(x_max,y_max), (0,255,0), thickness=1)

# x = np.array([120, 25, 200, 165])
# y = np.array([300, 50, 480,320])

# x[0] = int(x[0]*1.6)
# x[1] = int(x[1]*2.4)
# x[2] = int(x[2]*1.6)
# x[3] = int(x[3]*2.4)
# y[0] = int(y[0]*1.6)
# y[1] = int(y[1]*2.4)
# y[2] = int(y[2]*1.6)
# y[3] = int(y[3]*2.4)

# # rec1 = cv2.rectangle(im_r_, (x[0],x[1]),(x[2],x[3]), color=(255,0,0), thickness=5)
# # rec2 = cv2.rectangle(im_r_, (y[0],y[1]),(y[2],y[3]), color=(255,0,0), thickness=5)    
plt.figure(figsize=(10,10))
plt.imshow(img_)

 

 

 

 


Faster R-CNN

Faster R-CNN removes the selective search used in Fast R-CNN and instead finds approximate object locations with a Region Proposal Network (RPN).

Because everything runs inside the CNN from start to finish, it is much faster than Fast R-CNN.

Region Proposal

Ways to find locations likely to contain objects:

1. Selective search

- Regions are merged by comparing their similarity in color, texture, size, and so on.

Candidate regions are produced according to a threshold.

(The selective search paper generated about 2,000 candidate regions.)

2. Edge boxes

- Edge groups are represented using gradient magnitude and gradient orientation,

and bounding box scores are computed from them.

3. Region proposal network

Region proposal network

A network that extracts region proposals from the original image.
The RPN is a fully convolutional network that simultaneously predicts object bounds and objectness scores at each position; a sketch of the head follows.
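A minimal Keras sketch of the RPN head on top of the backbone feature map (k = 9 anchors per position; layer sizes follow the paper's VGG setting, names are illustrative):

```python
import tensorflow as tf

# A 3x3 conv followed by two sibling 1x1 convs: one for objectness scores (2k channels)
# and one for box regression offsets (4k channels).
k = 9
feature_map = tf.keras.Input((50, 50, 512))
x = tf.keras.layers.Conv2D(512, 3, padding='same', activation='relu')(feature_map)
objectness = tf.keras.layers.Conv2D(2 * k, 1)(x)   # object / not-object per anchor
box_deltas = tf.keras.layers.Conv2D(4 * k, 1)(x)   # (dx, dy, dw, dh) per anchor
rpn = tf.keras.models.Model(feature_map, [objectness, box_deltas])
rpn.summary()
```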

import tensorflow as tf 
import cv2
import matplotlib.pyplot as plt
import numpy as np

img = tf.keras.utils.get_file('zebra.jpg', 'https://i.imgur.com/XjeiRMV.jpg')
im = cv2.imread(img)
im = cv2.cvtColor(im, cv2.COLOR_BGR2RGB)
plt.imshow(im)

im_ = im.copy()

rec1 = cv2.rectangle(im_, (120,25),(200,165), color=(255,0,0), thickness=2)
rec2 = cv2.rectangle(im_, (300,50),(480,320), color=(255,0,0), thickness=2)

plt.imshow(im_)

h = w = 800

plt.imshow(cv2.resize(im, (w,h), interpolation=cv2.INTER_NEAREST))

im_.shape
# (333, 500, 3)

800/333, 800/500
# (2.4024024024024024, 1.6)
im_r = cv2.resize(im, (h,w))
im_r_ = im_r.copy()

x = np.array([120,25,200,165])
y = np.array([300,50,480,320])
# mapping 
x[0] = int(x[0]*(w/im.shape[1])) # 비율 곱 
x[1] = int(x[1]*(h/im.shape[0]))
x[2] = int(x[2]*(w/im.shape[1]))
x[3] = int(x[3]*(h/im.shape[0]))

y[0] = int(y[0]*(w/im.shape[1]))
y[1] = int(y[1]*(h/im.shape[0]))
y[2] = int(y[2]*(w/im.shape[1]))
y[3] = int(y[3]*(h/im.shape[0]))

rec1 = cv2.rectangle(im_r_, (x[0],x[1]),(x[2],x[3]), color=(255,0,0), thickness=2)
rec2 = cv2.rectangle(im_r_, (y[0],y[1]),(y[2],y[3]), color=(255,0,0), thickness=2)

plt.imshow(im_r_)

vgg = tf.keras.applications.VGG16(include_top=False)

vgg.summary()
Model: "vgg16"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
input_1 (InputLayer)         [(None, None, None, 3)]   0         
_________________________________________________________________
block1_conv1 (Conv2D)        (None, None, None, 64)    1792      
_________________________________________________________________
block1_conv2 (Conv2D)        (None, None, None, 64)    36928     
_________________________________________________________________
block1_pool (MaxPooling2D)   (None, None, None, 64)    0         
_________________________________________________________________
block2_conv1 (Conv2D)        (None, None, None, 128)   73856     
_________________________________________________________________
block2_conv2 (Conv2D)        (None, None, None, 128)   147584    
_________________________________________________________________
block2_pool (MaxPooling2D)   (None, None, None, 128)   0         
_________________________________________________________________
block3_conv1 (Conv2D)        (None, None, None, 256)   295168    
_________________________________________________________________
block3_conv2 (Conv2D)        (None, None, None, 256)   590080    
_________________________________________________________________
block3_conv3 (Conv2D)        (None, None, None, 256)   590080    
_________________________________________________________________
block3_pool (MaxPooling2D)   (None, None, None, 256)   0         
_________________________________________________________________
block4_conv1 (Conv2D)        (None, None, None, 512)   1180160   
_________________________________________________________________
block4_conv2 (Conv2D)        (None, None, None, 512)   2359808   
_________________________________________________________________
block4_conv3 (Conv2D)        (None, None, None, 512)   2359808   
_________________________________________________________________
block4_pool (MaxPooling2D)   (None, None, None, 512)   0         
_________________________________________________________________
block5_conv1 (Conv2D)        (None, None, None, 512)   2359808   
_________________________________________________________________
block5_conv2 (Conv2D)        (None, None, None, 512)   2359808   
_________________________________________________________________
block5_conv3 (Conv2D)        (None, None, None, 512)   2359808   
_________________________________________________________________
block5_pool (MaxPooling2D)   (None, None, None, 512)   0         
=================================================================
Total params: 14,714,688
Trainable params: 14,714,688
Non-trainable params: 0
_________________________________________________________________
# An 800x800 input ends up as a 25x25 feature map after passing through all of VGG16's conv/pooling layers
for i in vgg.layers:
  output = tf.keras.models.Model(vgg.input, i.output)
  print(output(im_r_[tf.newaxis]).shape)
  
(1, 800, 800, 3)
(1, 800, 800, 64)
(1, 800, 800, 64)
(1, 400, 400, 64)
(1, 400, 400, 128)
(1, 400, 400, 128)
(1, 200, 200, 128)
(1, 200, 200, 256)
(1, 200, 200, 256)
(1, 200, 200, 256)
(1, 100, 100, 256)
(1, 100, 100, 512)
(1, 100, 100, 512)
(1, 100, 100, 512)
(1, 50, 50, 512)
(1, 50, 50, 512)
(1, 50, 50, 512)
(1, 50, 50, 512)
(1, 25, 25, 512)
# The image-to-feature-map scaling ratio depends on how many layers are used
backbone = tf.keras.models.Model(vgg.input, vgg.layers[17].output) # the Faster R-CNN paper uses the backbone up to the 17th layer (50x50 output, stride 16)

backbone(im_r_[tf.newaxis]).shape
# TensorShape([1, 50, 50, 512])

from skimage.util import view_as_blocks, view_as_windows  # view_as_windows: overlapping crops, view_as_blocks: non-overlapping crops

 

# 800x800 => 50x50: each 16x16 patch of the image is represented by one feature-map cell

x = np.arange(8,800,16)
y = np.arange(8,800,16)
cl = np.array(np.meshgrid(x,y)).T.reshape(-1,2) # array of all anchor center points

cl.shape
# (2500, 2)
im_r = cv2.resize(im, (h,w))
im_r_ = im_r.copy()

for i in range(2500):
  cv2.circle(im_r_, (cl[i,0],cl[i,1]), 1, (255,0,0), thickness=2)
  

plt.figure(figsize=(10,10))
plt.imshow(im_r_) # the points marking the centers of the 50x50 grid

ratio = [0.5,1,2]
scale = [8,16,32]
al = np.zeros((22500,4)) # 50x50x9 anchor boxes will be generated
count = 0

Anchors of the same scale differ only in shape; their areas should stay roughly equal.

(The example figure above shows anchor boxes built for an 800x600 image.)

For an 800x800 image there are 800/16 * 800/16 = 2,500 anchor centers,

and with 9 anchor boxes per center, 2,500 * 9 = 22,500 anchor boxes are generated in total.

 

for i in cl:
  cx, cy = i[0], i[1]
  for r in ratio:
    for s in scale:
      h = pow(pow(s,2)/r,0.5) # h = sqrt(s^2/r) and w = h*r, so h*w = s^2: the anchor area stays constant across aspect ratios
      w = h*r
      h *= 16 # scale to image pixels: one feature-map cell spans 16 pixels
      w *= 16 
      xmin = cx-0.5*w
      ymin = cy-0.5*h
      xmax = cx+0.5*w
      ymax = cy+0.5*h
      al[count] = [xmin,ymin,xmax,ymax]
      count += 1
al
# array([[ -37.254834  ,  -82.50966799,   53.254834  ,   98.50966799],
       [ -82.50966799, -173.01933598,   98.50966799,  189.01933598],
       [-173.01933598, -354.03867197,  189.01933598,  370.03867197],
       ...,
       [ 701.49033201,  746.745166  ,  882.50966799,  837.254834  ],
       [ 610.98066402,  701.49033201,  973.01933598,  882.50966799],
       [ 429.96132803,  610.98066402, 1154.03867197,  973.01933598]])
img_ = np.copy(im_r)

point = 11465
for i in range(point,point+9):
  x_min = int(al[i][0])
  y_min = int(al[i][1])
  x_max = int(al[i][2])
  y_max = int(al[i][3])
  cv2.rectangle(img_, (x_min,y_min),(x_max,y_max),(0,255,0),thickness=4)

for i in range(2500):
  cv2.circle(img_, (cl[i,0],cl[i,1]), 1, (255,0,0), thickness=2)

x = np.array([120,25,200,165])
y = np.array([300,50,480,320])

x[0] = int(x[0]*(w/im.shape[1])) 
x[1] = int(x[1]*(h/im.shape[0]))
x[2] = int(x[2]*(w/im.shape[1]))
x[3] = int(x[3]*(h/im.shape[0]))

y[0] = int(y[0]*(w/im.shape[1]))
y[1] = int(y[1]*(h/im.shape[0]))
y[2] = int(y[2]*(w/im.shape[1]))
y[3] = int(y[3]*(h/im.shape[0]))

rec1 = cv2.rectangle(img_, (x[0],x[1]),(x[2],x[3]), color=(255,0,0), thickness=3)
rec2 = cv2.rectangle(img_, (y[0],y[1]),(y[2],y[3]), color=(255,0,0), thickness=3)

plt.imshow(img_)

For each ground-truth box, compute the overlapping area with the 22,500 anchor boxes.

From the resulting IoU values, find the anchor boxes that overlap the most.

The paper treats anchors with IoU above 0.7 as positives and those with IoU below 0.3 as negatives (no object), as in the sketch below.
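A small sketch of that labeling rule (anchors in between the two thresholds are ignored when training the RPN):

```python
# IoU > 0.7 -> positive (1), IoU < 0.3 -> negative (0), otherwise ignored (-1).
def label_anchor(max_iou, pos_thresh=0.7, neg_thresh=0.3):
    if max_iou > pos_thresh:
        return 1
    if max_iou < neg_thresh:
        return 0
    return -1

print([label_anchor(v) for v in (0.85, 0.5, 0.1)])   # [1, -1, 0]
```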

 

 

 


Fast R-CNN

Fast R-CNN structure: Feature Extractor -> RoI Pooling -> Classifier & Regressor. (RoI = Region of Interest)

Feature extractor

As in R-CNN and SPPNet, this stage extracts the image's feature map with a CNN.

Recall that R-CNN first extracted roughly 2,000 RoIs per image with selective search and pushed every one of them through the CNN, which required an enormous amount of processing time.

Moreover, the extracted RoIs overlap heavily, so the same image regions pass through the CNN many times, which is inefficient.

 

Fast R-CNN, by contrast, does not crop the image when applying selective search; it extracts only the coordinate and size information (r, c, h, w).

Since this is tiny compared to the images themselves, no extra storage is needed.

Then only the single image goes through the CNN to produce a shared feature map, and each RoI's coordinates are simply rescaled by the ratio by which the model shrank the image.

This is called RoI projection.

RoI pooling

The single image's feature map and the RoI coordinates pass through a layer that converts every RoI to the same fixed size.

This is the RoI pooling layer; it can be thought of as a single-level spatial pyramid pooling layer.

Stacking several pyramid levels turned out not to give much additional benefit, so the full spatial pyramid is not used (a rough sketch follows).
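A rough sketch of the single-level RoI pooling idea; the exact operation divides the RoI into a fixed grid of bins and max-pools each bin, which is approximated here with a resize followed by max-pooling (sizes are illustrative):

```python
import tensorflow as tf

def roi_pool(feature_map, roi, output_size=7):
    # feature_map: (H, W, C); roi: (y1, x1, y2, x2) in feature-map coordinates
    y1, x1, y2, x2 = roi
    region = feature_map[y1:y2, x1:x2, :]
    region = tf.image.resize(region[tf.newaxis], (output_size * 2, output_size * 2))
    return tf.nn.max_pool2d(region, ksize=2, strides=2, padding='VALID')[0]

fm = tf.random.normal((50, 50, 512))
print(roi_pool(fm, (10, 5, 30, 25)).shape)   # (7, 7, 512) regardless of the RoI's size
```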

Classifier & Regressor

The classifier handles classification, deciding what the object is,

and the regressor handles localization, marking the object's region.

R-CNN trained these two separately, but Fast R-CNN devised a multi-task loss function that makes end-to-end training possible.

As usual when a new loss is added to an existing one, the two are connected by addition and the new loss is given a weight λ to control its influence.

Lcls is a softmax-based loss for classification (no separate SVM is trained), and Lloc is an L1 loss for localization.

λ is fixed at 1, and [u ≥ 1] means: if the classification result is background (u = 0), Lloc is switched off; otherwise it is kept.

This is because bounding boxes are meant for objects, not for the background.

L1 is used instead of the L2 loss of R-CNN and SPPNet because it is less sensitive to outliers, which makes fine-tuning easier.
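Written out, the multi-task loss described above is:

$$
L(p, u, t^u, v) = L_{cls}(p, u) + \lambda\,[u \ge 1]\, L_{loc}(t^u, v), \qquad L_{cls}(p, u) = -\log p_u
$$

where $L_{loc}$ is the smooth L1 loss between the predicted box offsets $t^u$ for class $u$ and the regression targets $v$.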

import pandas as pd 
import tensorflow as tf 


air = pd.read_csv('dataset/annotations/airplane.csv', header=None, names=['filename','x1','y1','x2','y2','class'])
face = pd.read_csv('dataset/annotations/face.csv', header=None, names=['filename','x1','y1','x2','y2','class'])
motorcycle = pd.read_csv('dataset/annotations/motorcycle.csv', header=None, names=['filename','x1','y1','x2','y2','class'])
air.filename = air.filename.map(lambda x : 'airplane/'+x)
face.filename = face.filename.map(lambda x : 'face/'+x)
motorcycle.filename = motorcycle.filename.map(lambda x : 'motorcycle/'+x)
data = pd.concat([air, face, motorcycle], ignore_index=True)
data = pd.concat([data, pd.get_dummies(data['class'])], axis=1)
data.drop(columns='class',inplace=True)

data.columns[1:]
# Index(['x1', 'y1', 'x2', 'y2', 'airplane', 'face', 'motorcycle'], dtype='object')
dig = tf.keras.preprocessing.image.ImageDataGenerator() # ImageDataGenerator is used because it can read images straight from a dataframe
dig = dig.flow_from_dataframe(data, 'dataset/images/', class_mode='raw', y_col=data.columns[1:], target_size=(224,224))
# Found 2033 validated image filenames.
def flow(x):
  while True:
    (X,y) = next(x)
    yield X, (y[:,:4],y[:,4:]) # X unchanged; y split into 4 box values and 3 class values
    
dfg = tf.data.Dataset.from_generator(lambda : flow(dig), output_shapes=(((None,224,224,3)), ((None,4), (None,3))),
                                                  output_types=((tf.float32), (tf.float32, tf.float32)))

next(iter(dfg))[1][0] # y values (the 4 box coordinates)

<tf.Tensor: shape=(32, 4), dtype=float32, numpy=
array([[113.,   9., 286., 239.],
       [ 61.,  47., 210., 141.],
       [ 41.,  22., 225., 134.],
       [ 39.,  32., 226., 149.],
       [ 53.,  29., 349., 135.],
       [ 49.,  28., 346., 116.],
       [ 53.,  63., 335., 138.],
       [ 32.,  16., 230., 113.],
       [121.,  34., 314., 302.],
       [ 32.,  25., 228., 164.],
       [ 37.,  22., 233., 140.],
       [ 34.,  42., 230., 133.],
       [ 35.,  16., 230., 112.],
       [ 75.,  25., 259., 283.],
       [ 46.,  37., 342., 140.],
       [180.,  23., 361., 276.],
       [ 38.,  25., 227., 128.],
       [ 38.,  25., 220., 155.],
       [ 44.,  41., 221., 154.],
       [ 42.,  52., 226., 162.],
       [ 61.,  24., 346., 148.],
       [ 62.,  16., 270., 277.],
       [ 31.,  19., 233., 135.],
       [ 49.,  30., 349., 137.],
       [ 65.,  42., 345., 158.],
       [162.,  47., 414., 302.],
       [ 54.,  30., 345., 130.],
       [ 48.,  31., 350., 112.],
       [ 38.,  24., 224., 127.],
       [ 45.,  24., 237., 142.],
       [ 34.,  19., 230., 127.],
       [ 42.,  36., 231., 150.]], dtype=float32)>
next(iter(dfg))[1][1] # y values (the 3 one-hot class values)

<tf.Tensor: shape=(32, 3), dtype=float32, numpy=
array([[0., 0., 1.],
       [0., 0., 1.],
       [1., 0., 0.],
       [0., 1., 0.],
       [0., 1., 0.],
       [0., 1., 0.],
       [1., 0., 0.],
       [0., 0., 1.],
       [1., 0., 0.],
       [0., 1., 0.],
       [0., 0., 1.],
       [1., 0., 0.],
       [0., 0., 1.],
       [1., 0., 0.],
       [0., 0., 1.],
       [0., 0., 1.],
       [1., 0., 0.],
       [0., 0., 1.],
       [0., 1., 0.],
       [1., 0., 0.],
       [1., 0., 0.],
       [0., 1., 0.],
       [0., 1., 0.],
       [0., 0., 1.],
       [1., 0., 0.],
       [0., 0., 1.],
       [0., 1., 0.],
       [0., 1., 0.],
       [1., 0., 0.],
       [0., 0., 1.],
       [0., 0., 1.],
       [1., 0., 0.]], dtype=float32)>
vgg = tf.keras.applications.VGG16(include_top=False, input_shape=(224,224,3))
vgg.trainable = False 

input_ = tf.keras.Input((224,224,3))
preprocess = tf.keras.layers.Lambda(lambda x: tf.keras.applications.vgg16.preprocess_input(x))(input_)
x = vgg(preprocess)
x = tf.keras.layers.Flatten()(x)
x = tf.keras.layers.Dense(256, activation='relu')(x)
x = tf.keras.layers.Dense(128, activation='relu')(x)

box = tf.keras.layers.Dense(64, activation='relu')(x)
box = tf.keras.layers.Dense(4, name='box',activation='relu')(box)

target = tf.keras.layers.Dense(64, activation='relu')(x)
target = tf.keras.layers.Dense(3, name='target',activation='softmax')(target)
model=tf.keras.models.Model(input_, [box,target])

loss = {
    'box': tf.keras.losses.MeanAbsoluteError(),
    'target': tf.keras.losses.CategoricalCrossentropy()
}

model.compile(loss=loss)
model.fit(dfg, epochs=20, steps_per_epoch=10) # 10 weight updates per epoch
Epoch 1/20
10/10 [==============================] - 199s 16s/step - loss: 90.5833 - box_loss: 80.8374 - target_loss: 9.7459
Epoch 2/20
10/10 [==============================] - 158s 16s/step - loss: 38.9525 - box_loss: 38.9085 - target_loss: 0.0441
Epoch 3/20
10/10 [==============================] - 156s 16s/step - loss: 32.4575 - box_loss: 32.4469 - target_loss: 0.0106
Epoch 4/20
10/10 [==============================] - 155s 16s/step - loss: 28.6964 - box_loss: 28.6949 - target_loss: 0.0014
Epoch 5/20
10/10 [==============================] - 153s 15s/step - loss: 28.4143 - box_loss: 28.4143 - target_loss: 6.0349e-08
Epoch 6/20
10/10 [==============================] - 158s 16s/step - loss: 27.1531 - box_loss: 27.1531 - target_loss: 2.0451e-07
Epoch 7/20
10/10 [==============================] - 62s 5s/step - loss: 22.2554 - box_loss: 22.2548 - target_loss: 6.0339e-04
Epoch 8/20
10/10 [==============================] - 3s 350ms/step - loss: 27.1756 - box_loss: 27.1756 - target_loss: 1.8626e-09
Epoch 9/20
10/10 [==============================] - 3s 344ms/step - loss: 22.8763 - box_loss: 22.8763 - target_loss: 0.0000e+00
Epoch 10/20
10/10 [==============================] - 3s 344ms/step - loss: 23.6116 - box_loss: 23.5324 - target_loss: 0.0792
Epoch 11/20
10/10 [==============================] - 3s 351ms/step - loss: 23.6213 - box_loss: 23.6213 - target_loss: 6.5937e-08
Epoch 12/20
10/10 [==============================] - 3s 349ms/step - loss: 20.9325 - box_loss: 20.9325 - target_loss: 9.6111e-08
Epoch 13/20
10/10 [==============================] - 3s 342ms/step - loss: 19.7643 - box_loss: 19.7643 - target_loss: 6.2536e-09
Epoch 14/20
10/10 [==============================] - 3s 346ms/step - loss: 21.9087 - box_loss: 21.9071 - target_loss: 0.0016
Epoch 15/20
10/10 [==============================] - 3s 348ms/step - loss: 19.8910 - box_loss: 19.8445 - target_loss: 0.0465
Epoch 16/20
10/10 [==============================] - 3s 341ms/step - loss: 18.0794 - box_loss: 18.0794 - target_loss: 2.2352e-09
Epoch 17/20
10/10 [==============================] - 3s 349ms/step - loss: 19.9182 - box_loss: 19.9182 - target_loss: 0.0000e+00
Epoch 18/20
10/10 [==============================] - 3s 337ms/step - loss: 18.0254 - box_loss: 18.0253 - target_loss: 8.1338e-05
Epoch 19/20
10/10 [==============================] - 3s 339ms/step - loss: 19.9694 - box_loss: 19.9694 - target_loss: 1.5977e-06
Epoch 20/20
10/10 [==============================] - 3s 331ms/step - loss: 17.7530 - box_loss: 17.7530 - target_loss: 3.9085e-10
model.summary()
#Model: "model"
__________________________________________________________________________________________________
Layer (type)                    Output Shape         Param #     Connected to                     
==================================================================================================
input_2 (InputLayer)            [(None, 224, 224, 3) 0                                            
__________________________________________________________________________________________________
lambda (Lambda)                 (None, 224, 224, 3)  0           input_2[0][0]                    
__________________________________________________________________________________________________
vgg16 (Functional)              (None, 7, 7, 512)    14714688    lambda[0][0]                     
__________________________________________________________________________________________________
flatten (Flatten)               (None, 25088)        0           vgg16[0][0]                      
__________________________________________________________________________________________________
dense (Dense)                   (None, 256)          6422784     flatten[0][0]                    
__________________________________________________________________________________________________
dense_1 (Dense)                 (None, 128)          32896       dense[0][0]                      
__________________________________________________________________________________________________
dense_2 (Dense)                 (None, 64)           8256        dense_1[0][0]                    
__________________________________________________________________________________________________
dense_3 (Dense)                 (None, 64)           8256        dense_1[0][0]                    
__________________________________________________________________________________________________
box (Dense)                     (None, 4)            260         dense_2[0][0]                    
__________________________________________________________________________________________________
target (Dense)                  (None, 3)            195         dense_3[0][0]                    
==================================================================================================
Total params: 21,187,335
Trainable params: 6,472,647
Non-trainable params: 14,714,688
__________________________________________________________________________________________________
tf.keras.utils.plot_model(model, rankdir='BT')

history = model.fit(dfg, epochs=20, steps_per_epoch=20)

Epoch 1/20
20/20 [==============================] - 7s 352ms/step - loss: 17.1495 - box_loss: 17.1495 - target_loss: 8.9183e-06
Epoch 2/20
20/20 [==============================] - 7s 349ms/step - loss: 16.8312 - box_loss: 16.8311 - target_loss: 1.7778e-05
Epoch 3/20
20/20 [==============================] - 7s 338ms/step - loss: 16.1163 - box_loss: 16.1148 - target_loss: 0.0015
Epoch 4/20
20/20 [==============================] - 7s 346ms/step - loss: 16.0480 - box_loss: 16.0480 - target_loss: 0.0000e+00
Epoch 5/20
20/20 [==============================] - 7s 349ms/step - loss: 16.7372 - box_loss: 16.7371 - target_loss: 1.0930e-04
Epoch 6/20
20/20 [==============================] - 7s 344ms/step - loss: 15.5043 - box_loss: 15.4865 - target_loss: 0.0177
Epoch 7/20
20/20 [==============================] - 7s 347ms/step - loss: 14.6377 - box_loss: 14.6377 - target_loss: 9.2198e-08
Epoch 8/20
20/20 [==============================] - 7s 343ms/step - loss: 14.4987 - box_loss: 14.4818 - target_loss: 0.0169
Epoch 9/20
20/20 [==============================] - 7s 344ms/step - loss: 14.9790 - box_loss: 14.9560 - target_loss: 0.0229
Epoch 10/20
20/20 [==============================] - 7s 340ms/step - loss: 13.8011 - box_loss: 13.8011 - target_loss: 0.0000e+00
Epoch 11/20
20/20 [==============================] - 7s 349ms/step - loss: 14.2557 - box_loss: 14.2554 - target_loss: 2.6677e-04
Epoch 12/20
20/20 [==============================] - 7s 338ms/step - loss: 12.9601 - box_loss: 12.9229 - target_loss: 0.0371
Epoch 13/20
20/20 [==============================] - 7s 343ms/step - loss: 13.7806 - box_loss: 13.7806 - target_loss: 0.0000e+00
Epoch 14/20
20/20 [==============================] - 7s 337ms/step - loss: 12.5579 - box_loss: 12.5379 - target_loss: 0.0200
Epoch 15/20
20/20 [==============================] - 7s 343ms/step - loss: 14.1479 - box_loss: 14.1479 - target_loss: 2.3356e-07
Epoch 16/20
20/20 [==============================] - 7s 342ms/step - loss: 12.5556 - box_loss: 12.5556 - target_loss: 3.8147e-10
Epoch 17/20
20/20 [==============================] - 7s 344ms/step - loss: 12.5946 - box_loss: 12.5944 - target_loss: 2.0212e-04
Epoch 18/20
20/20 [==============================] - 7s 340ms/step - loss: 11.9439 - box_loss: 11.9369 - target_loss: 0.0070
Epoch 19/20
20/20 [==============================] - 7s 337ms/step - loss: 12.3697 - box_loss: 12.3697 - target_loss: 3.1069e-06
Epoch 20/20
20/20 [==============================] - 7s 341ms/step - loss: 11.9605 - box_loss: 11.9605 - target_loss: 4.5287e-05
pd.DataFrame(history.history).plot.line()

im = tf.keras.preprocessing.image.load_img('dataset/images/airplane/image_0001.jpg')

import numpy as np
import matplotlib.pyplot as plt
import matplotlib.patches as mpt

bim = np.array(im.resize((224,224,)))[tf.newaxis]

model(bim)
# [<tf.Tensor: shape=(1, 4), dtype=float32, numpy=array([[ 42.0325  ,  28.545605, 291.0303  , 110.72619 ]], dtype=float32)>,
#  <tf.Tensor: shape=(1, 3), dtype=float32, numpy=array([[1.0000000e+00, 1.5445283e-31, 0.0000000e+00]], dtype=float32)>]
fig, ax = plt.subplots(1,1)
ax.imshow(im)
pt = mpt.Rectangle((42.0325, 28.545605),291.0303-42.0325,110.72619-28.545605,fill=False)
ax.add_patch(pt)

 

 

!pip install -U tensorflow-hub

Three ways to use a pretrained model:

1. tf.keras.applications

2. tensorflow hub

3. model garden

 

import tensorflow_hub as hub # package for using publicly released models

model = tf.keras.models.Sequential([
    hub.KerasLayer('https://tfhub.dev/google/tf2-preview/mobilenet_v2/classification/4', trainable=True, 
                   input_shape=(224,224,3))
])
# If the URL contains 'classification', it is a classification-only model and can (usually) be dropped into a model without building it first; the trailing number is the version

model.summary()
Model: "sequential"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
keras_layer_1 (KerasLayer)   (None, 1001)              3540265   
=================================================================
Total params: 3,540,265
Trainable params: 3,506,153
Non-trainable params: 34,112
_________________________________________________________________
model = tf.keras.models.Sequential([
    hub.KerasLayer('https://tfhub.dev/google/tf2-preview/mobilenet_v2/feature_vector/4', trainable=True,
                   input_shape=(224,224,3))
])

model.summary()

Model: "sequential_2"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
keras_layer_3 (KerasLayer)   (None, 1280)              2257984   
=================================================================
Total params: 2,257,984
Trainable params: 2,223,872
Non-trainable params: 34,112
_________________________________________________________________
mv2 = tf.keras.applications.MobileNetV2(include_top=True)

mv2.summary()
Model: "mobilenetv2_1.00_224"
__________________________________________________________________________________________________
Layer (type)                    Output Shape         Param #     Connected to                     
==================================================================================================
input_3 (InputLayer)            [(None, 224, 224, 3) 0                                            
__________________________________________________________________________________________________
Conv1 (Conv2D)                  (None, 112, 112, 32) 864         input_3[0][0]                    
__________________________________________________________________________________________________
bn_Conv1 (BatchNormalization)   (None, 112, 112, 32) 128         Conv1[0][0]                      
__________________________________________________________________________________________________
Conv1_relu (ReLU)               (None, 112, 112, 32) 0           bn_Conv1[0][0]                   
__________________________________________________________________________________________________
expanded_conv_depthwise (Depthw (None, 112, 112, 32) 288         Conv1_relu[0][0]                 
__________________________________________________________________________________________________
expanded_conv_depthwise_BN (Bat (None, 112, 112, 32) 128         expanded_conv_depthwise[0][0]    
__________________________________________________________________________________________________
expanded_conv_depthwise_relu (R (None, 112, 112, 32) 0           expanded_conv_depthwise_BN[0][0] 
__________________________________________________________________________________________________
expanded_conv_project (Conv2D)  (None, 112, 112, 16) 512         expanded_conv_depthwise_relu[0][0
__________________________________________________________________________________________________
expanded_conv_project_BN (Batch (None, 112, 112, 16) 64          expanded_conv_project[0][0]      
__________________________________________________________________________________________________
block_1_expand (Conv2D)         (None, 112, 112, 96) 1536        expanded_conv_project_BN[0][0]   
__________________________________________________________________________________________________
block_1_expand_BN (BatchNormali (None, 112, 112, 96) 384         block_1_expand[0][0]             
__________________________________________________________________________________________________
block_1_expand_relu (ReLU)      (None, 112, 112, 96) 0           block_1_expand_BN[0][0]          
__________________________________________________________________________________________________
block_1_pad (ZeroPadding2D)     (None, 113, 113, 96) 0           block_1_expand_relu[0][0]        
__________________________________________________________________________________________________
block_1_depthwise (DepthwiseCon (None, 56, 56, 96)   864         block_1_pad[0][0]                
__________________________________________________________________________________________________
block_1_depthwise_BN (BatchNorm (None, 56, 56, 96)   384         block_1_depthwise[0][0]          
__________________________________________________________________________________________________
block_1_depthwise_relu (ReLU)   (None, 56, 56, 96)   0           block_1_depthwise_BN[0][0]       
__________________________________________________________________________________________________
block_1_project (Conv2D)        (None, 56, 56, 24)   2304        block_1_depthwise_relu[0][0]     
__________________________________________________________________________________________________
block_1_project_BN (BatchNormal (None, 56, 56, 24)   96          block_1_project[0][0]            
__________________________________________________________________________________________________
block_2_expand (Conv2D)         (None, 56, 56, 144)  3456        block_1_project_BN[0][0]         
__________________________________________________________________________________________________
block_2_expand_BN (BatchNormali (None, 56, 56, 144)  576         block_2_expand[0][0]             
__________________________________________________________________________________________________
block_2_expand_relu (ReLU)      (None, 56, 56, 144)  0           block_2_expand_BN[0][0]          
__________________________________________________________________________________________________
block_2_depthwise (DepthwiseCon (None, 56, 56, 144)  1296        block_2_expand_relu[0][0]        
__________________________________________________________________________________________________
block_2_depthwise_BN (BatchNorm (None, 56, 56, 144)  576         block_2_depthwise[0][0]          
__________________________________________________________________________________________________
block_2_depthwise_relu (ReLU)   (None, 56, 56, 144)  0           block_2_depthwise_BN[0][0]       
__________________________________________________________________________________________________
block_2_project (Conv2D)        (None, 56, 56, 24)   3456        block_2_depthwise_relu[0][0]     
__________________________________________________________________________________________________
block_2_project_BN (BatchNormal (None, 56, 56, 24)   96          block_2_project[0][0]            
__________________________________________________________________________________________________
block_2_add (Add)               (None, 56, 56, 24)   0           block_1_project_BN[0][0]         
                                                                 block_2_project_BN[0][0]         
__________________________________________________________________________________________________
block_3_expand (Conv2D)         (None, 56, 56, 144)  3456        block_2_add[0][0]                
__________________________________________________________________________________________________
block_3_expand_BN (BatchNormali (None, 56, 56, 144)  576         block_3_expand[0][0]             
__________________________________________________________________________________________________
block_3_expand_relu (ReLU)      (None, 56, 56, 144)  0           block_3_expand_BN[0][0]          
__________________________________________________________________________________________________
block_3_pad (ZeroPadding2D)     (None, 57, 57, 144)  0           block_3_expand_relu[0][0]        
__________________________________________________________________________________________________
block_3_depthwise (DepthwiseCon (None, 28, 28, 144)  1296        block_3_pad[0][0]                
__________________________________________________________________________________________________
block_3_depthwise_BN (BatchNorm (None, 28, 28, 144)  576         block_3_depthwise[0][0]          
__________________________________________________________________________________________________
block_3_depthwise_relu (ReLU)   (None, 28, 28, 144)  0           block_3_depthwise_BN[0][0]       
__________________________________________________________________________________________________
block_3_project (Conv2D)        (None, 28, 28, 32)   4608        block_3_depthwise_relu[0][0]     
__________________________________________________________________________________________________
block_3_project_BN (BatchNormal (None, 28, 28, 32)   128         block_3_project[0][0]            
__________________________________________________________________________________________________
block_4_expand (Conv2D)         (None, 28, 28, 192)  6144        block_3_project_BN[0][0]         
__________________________________________________________________________________________________
block_4_expand_BN (BatchNormali (None, 28, 28, 192)  768         block_4_expand[0][0]             
__________________________________________________________________________________________________
block_4_expand_relu (ReLU)      (None, 28, 28, 192)  0           block_4_expand_BN[0][0]          
__________________________________________________________________________________________________
block_4_depthwise (DepthwiseCon (None, 28, 28, 192)  1728        block_4_expand_relu[0][0]        
__________________________________________________________________________________________________
block_4_depthwise_BN (BatchNorm (None, 28, 28, 192)  768         block_4_depthwise[0][0]          
__________________________________________________________________________________________________
block_4_depthwise_relu (ReLU)   (None, 28, 28, 192)  0           block_4_depthwise_BN[0][0]       
__________________________________________________________________________________________________
block_4_project (Conv2D)        (None, 28, 28, 32)   6144        block_4_depthwise_relu[0][0]     
__________________________________________________________________________________________________
block_4_project_BN (BatchNormal (None, 28, 28, 32)   128         block_4_project[0][0]            
__________________________________________________________________________________________________
block_4_add (Add)               (None, 28, 28, 32)   0           block_3_project_BN[0][0]         
                                                                 block_4_project_BN[0][0]         
__________________________________________________________________________________________________
block_5_expand (Conv2D)         (None, 28, 28, 192)  6144        block_4_add[0][0]                
__________________________________________________________________________________________________
block_5_expand_BN (BatchNormali (None, 28, 28, 192)  768         block_5_expand[0][0]             
__________________________________________________________________________________________________
block_5_expand_relu (ReLU)      (None, 28, 28, 192)  0           block_5_expand_BN[0][0]          
__________________________________________________________________________________________________
block_5_depthwise (DepthwiseCon (None, 28, 28, 192)  1728        block_5_expand_relu[0][0]        
__________________________________________________________________________________________________
block_5_depthwise_BN (BatchNorm (None, 28, 28, 192)  768         block_5_depthwise[0][0]          
__________________________________________________________________________________________________
block_5_depthwise_relu (ReLU)   (None, 28, 28, 192)  0           block_5_depthwise_BN[0][0]       
__________________________________________________________________________________________________
block_5_project (Conv2D)        (None, 28, 28, 32)   6144        block_5_depthwise_relu[0][0]     
__________________________________________________________________________________________________
block_5_project_BN (BatchNormal (None, 28, 28, 32)   128         block_5_project[0][0]            
__________________________________________________________________________________________________
block_5_add (Add)               (None, 28, 28, 32)   0           block_4_add[0][0]                
                                                                 block_5_project_BN[0][0]         
__________________________________________________________________________________________________
block_6_expand (Conv2D)         (None, 28, 28, 192)  6144        block_5_add[0][0]                
__________________________________________________________________________________________________
block_6_expand_BN (BatchNormali (None, 28, 28, 192)  768         block_6_expand[0][0]             
__________________________________________________________________________________________________
block_6_expand_relu (ReLU)      (None, 28, 28, 192)  0           block_6_expand_BN[0][0]          
__________________________________________________________________________________________________
block_6_pad (ZeroPadding2D)     (None, 29, 29, 192)  0           block_6_expand_relu[0][0]        
__________________________________________________________________________________________________
block_6_depthwise (DepthwiseCon (None, 14, 14, 192)  1728        block_6_pad[0][0]                
__________________________________________________________________________________________________
block_6_depthwise_BN (BatchNorm (None, 14, 14, 192)  768         block_6_depthwise[0][0]          
__________________________________________________________________________________________________
block_6_depthwise_relu (ReLU)   (None, 14, 14, 192)  0           block_6_depthwise_BN[0][0]       
__________________________________________________________________________________________________
block_6_project (Conv2D)        (None, 14, 14, 64)   12288       block_6_depthwise_relu[0][0]     
__________________________________________________________________________________________________
block_6_project_BN (BatchNormal (None, 14, 14, 64)   256         block_6_project[0][0]            
__________________________________________________________________________________________________
block_7_expand (Conv2D)         (None, 14, 14, 384)  24576       block_6_project_BN[0][0]         
__________________________________________________________________________________________________
block_7_expand_BN (BatchNormali (None, 14, 14, 384)  1536        block_7_expand[0][0]             
__________________________________________________________________________________________________
block_7_expand_relu (ReLU)      (None, 14, 14, 384)  0           block_7_expand_BN[0][0]          
__________________________________________________________________________________________________
block_7_depthwise (DepthwiseCon (None, 14, 14, 384)  3456        block_7_expand_relu[0][0]        
__________________________________________________________________________________________________
block_7_depthwise_BN (BatchNorm (None, 14, 14, 384)  1536        block_7_depthwise[0][0]          
__________________________________________________________________________________________________
block_7_depthwise_relu (ReLU)   (None, 14, 14, 384)  0           block_7_depthwise_BN[0][0]       
__________________________________________________________________________________________________
block_7_project (Conv2D)        (None, 14, 14, 64)   24576       block_7_depthwise_relu[0][0]     
__________________________________________________________________________________________________
block_7_project_BN (BatchNormal (None, 14, 14, 64)   256         block_7_project[0][0]            
__________________________________________________________________________________________________
block_7_add (Add)               (None, 14, 14, 64)   0           block_6_project_BN[0][0]         
                                                                 block_7_project_BN[0][0]         
__________________________________________________________________________________________________
block_8_expand (Conv2D)         (None, 14, 14, 384)  24576       block_7_add[0][0]                
__________________________________________________________________________________________________
block_8_expand_BN (BatchNormali (None, 14, 14, 384)  1536        block_8_expand[0][0]             
__________________________________________________________________________________________________
block_8_expand_relu (ReLU)      (None, 14, 14, 384)  0           block_8_expand_BN[0][0]          
__________________________________________________________________________________________________
block_8_depthwise (DepthwiseCon (None, 14, 14, 384)  3456        block_8_expand_relu[0][0]        
__________________________________________________________________________________________________
block_8_depthwise_BN (BatchNorm (None, 14, 14, 384)  1536        block_8_depthwise[0][0]          
__________________________________________________________________________________________________
block_8_depthwise_relu (ReLU)   (None, 14, 14, 384)  0           block_8_depthwise_BN[0][0]       
__________________________________________________________________________________________________
block_8_project (Conv2D)        (None, 14, 14, 64)   24576       block_8_depthwise_relu[0][0]     
__________________________________________________________________________________________________
block_8_project_BN (BatchNormal (None, 14, 14, 64)   256         block_8_project[0][0]            
__________________________________________________________________________________________________
block_8_add (Add)               (None, 14, 14, 64)   0           block_7_add[0][0]                
                                                                 block_8_project_BN[0][0]         
__________________________________________________________________________________________________
block_9_expand (Conv2D)         (None, 14, 14, 384)  24576       block_8_add[0][0]                
__________________________________________________________________________________________________
block_9_expand_BN (BatchNormali (None, 14, 14, 384)  1536        block_9_expand[0][0]             
__________________________________________________________________________________________________
block_9_expand_relu (ReLU)      (None, 14, 14, 384)  0           block_9_expand_BN[0][0]          
__________________________________________________________________________________________________
block_9_depthwise (DepthwiseCon (None, 14, 14, 384)  3456        block_9_expand_relu[0][0]        
__________________________________________________________________________________________________
block_9_depthwise_BN (BatchNorm (None, 14, 14, 384)  1536        block_9_depthwise[0][0]          
__________________________________________________________________________________________________
block_9_depthwise_relu (ReLU)   (None, 14, 14, 384)  0           block_9_depthwise_BN[0][0]       
__________________________________________________________________________________________________
block_9_project (Conv2D)        (None, 14, 14, 64)   24576       block_9_depthwise_relu[0][0]     
__________________________________________________________________________________________________
block_9_project_BN (BatchNormal (None, 14, 14, 64)   256         block_9_project[0][0]            
__________________________________________________________________________________________________
block_9_add (Add)               (None, 14, 14, 64)   0           block_8_add[0][0]                
                                                                 block_9_project_BN[0][0]         
__________________________________________________________________________________________________
block_10_expand (Conv2D)        (None, 14, 14, 384)  24576       block_9_add[0][0]                
__________________________________________________________________________________________________
block_10_expand_BN (BatchNormal (None, 14, 14, 384)  1536        block_10_expand[0][0]            
__________________________________________________________________________________________________
block_10_expand_relu (ReLU)     (None, 14, 14, 384)  0           block_10_expand_BN[0][0]         
__________________________________________________________________________________________________
block_10_depthwise (DepthwiseCo (None, 14, 14, 384)  3456        block_10_expand_relu[0][0]       
__________________________________________________________________________________________________
block_10_depthwise_BN (BatchNor (None, 14, 14, 384)  1536        block_10_depthwise[0][0]         
__________________________________________________________________________________________________
block_10_depthwise_relu (ReLU)  (None, 14, 14, 384)  0           block_10_depthwise_BN[0][0]      
__________________________________________________________________________________________________
block_10_project (Conv2D)       (None, 14, 14, 96)   36864       block_10_depthwise_relu[0][0]    
__________________________________________________________________________________________________
block_10_project_BN (BatchNorma (None, 14, 14, 96)   384         block_10_project[0][0]           
__________________________________________________________________________________________________
block_11_expand (Conv2D)        (None, 14, 14, 576)  55296       block_10_project_BN[0][0]        
__________________________________________________________________________________________________
block_11_expand_BN (BatchNormal (None, 14, 14, 576)  2304        block_11_expand[0][0]            
__________________________________________________________________________________________________
block_11_expand_relu (ReLU)     (None, 14, 14, 576)  0           block_11_expand_BN[0][0]         
__________________________________________________________________________________________________
block_11_depthwise (DepthwiseCo (None, 14, 14, 576)  5184        block_11_expand_relu[0][0]       
__________________________________________________________________________________________________
block_11_depthwise_BN (BatchNor (None, 14, 14, 576)  2304        block_11_depthwise[0][0]         
__________________________________________________________________________________________________
block_11_depthwise_relu (ReLU)  (None, 14, 14, 576)  0           block_11_depthwise_BN[0][0]      
__________________________________________________________________________________________________
block_11_project (Conv2D)       (None, 14, 14, 96)   55296       block_11_depthwise_relu[0][0]    
__________________________________________________________________________________________________
block_11_project_BN (BatchNorma (None, 14, 14, 96)   384         block_11_project[0][0]           
__________________________________________________________________________________________________
block_11_add (Add)              (None, 14, 14, 96)   0           block_10_project_BN[0][0]        
                                                                 block_11_project_BN[0][0]        
__________________________________________________________________________________________________
block_12_expand (Conv2D)        (None, 14, 14, 576)  55296       block_11_add[0][0]               
__________________________________________________________________________________________________
block_12_expand_BN (BatchNormal (None, 14, 14, 576)  2304        block_12_expand[0][0]            
__________________________________________________________________________________________________
block_12_expand_relu (ReLU)     (None, 14, 14, 576)  0           block_12_expand_BN[0][0]         
__________________________________________________________________________________________________
block_12_depthwise (DepthwiseCo (None, 14, 14, 576)  5184        block_12_expand_relu[0][0]       
__________________________________________________________________________________________________
block_12_depthwise_BN (BatchNor (None, 14, 14, 576)  2304        block_12_depthwise[0][0]         
__________________________________________________________________________________________________
block_12_depthwise_relu (ReLU)  (None, 14, 14, 576)  0           block_12_depthwise_BN[0][0]      
__________________________________________________________________________________________________
block_12_project (Conv2D)       (None, 14, 14, 96)   55296       block_12_depthwise_relu[0][0]    
__________________________________________________________________________________________________
block_12_project_BN (BatchNorma (None, 14, 14, 96)   384         block_12_project[0][0]           
__________________________________________________________________________________________________
block_12_add (Add)              (None, 14, 14, 96)   0           block_11_add[0][0]               
                                                                 block_12_project_BN[0][0]        
__________________________________________________________________________________________________
block_13_expand (Conv2D)        (None, 14, 14, 576)  55296       block_12_add[0][0]               
__________________________________________________________________________________________________
block_13_expand_BN (BatchNormal (None, 14, 14, 576)  2304        block_13_expand[0][0]            
__________________________________________________________________________________________________
block_13_expand_relu (ReLU)     (None, 14, 14, 576)  0           block_13_expand_BN[0][0]         
__________________________________________________________________________________________________
block_13_pad (ZeroPadding2D)    (None, 15, 15, 576)  0           block_13_expand_relu[0][0]       
__________________________________________________________________________________________________
block_13_depthwise (DepthwiseCo (None, 7, 7, 576)    5184        block_13_pad[0][0]               
__________________________________________________________________________________________________
block_13_depthwise_BN (BatchNor (None, 7, 7, 576)    2304        block_13_depthwise[0][0]         
__________________________________________________________________________________________________
block_13_depthwise_relu (ReLU)  (None, 7, 7, 576)    0           block_13_depthwise_BN[0][0]      
__________________________________________________________________________________________________
block_13_project (Conv2D)       (None, 7, 7, 160)    92160       block_13_depthwise_relu[0][0]    
__________________________________________________________________________________________________
block_13_project_BN (BatchNorma (None, 7, 7, 160)    640         block_13_project[0][0]           
__________________________________________________________________________________________________
block_14_expand (Conv2D)        (None, 7, 7, 960)    153600      block_13_project_BN[0][0]        
__________________________________________________________________________________________________
block_14_expand_BN (BatchNormal (None, 7, 7, 960)    3840        block_14_expand[0][0]            
__________________________________________________________________________________________________
block_14_expand_relu (ReLU)     (None, 7, 7, 960)    0           block_14_expand_BN[0][0]         
__________________________________________________________________________________________________
block_14_depthwise (DepthwiseCo (None, 7, 7, 960)    8640        block_14_expand_relu[0][0]       
__________________________________________________________________________________________________
block_14_depthwise_BN (BatchNor (None, 7, 7, 960)    3840        block_14_depthwise[0][0]         
__________________________________________________________________________________________________
block_14_depthwise_relu (ReLU)  (None, 7, 7, 960)    0           block_14_depthwise_BN[0][0]      
__________________________________________________________________________________________________
block_14_project (Conv2D)       (None, 7, 7, 160)    153600      block_14_depthwise_relu[0][0]    
__________________________________________________________________________________________________
block_14_project_BN (BatchNorma (None, 7, 7, 160)    640         block_14_project[0][0]           
__________________________________________________________________________________________________
block_14_add (Add)              (None, 7, 7, 160)    0           block_13_project_BN[0][0]        
                                                                 block_14_project_BN[0][0]        
__________________________________________________________________________________________________
block_15_expand (Conv2D)        (None, 7, 7, 960)    153600      block_14_add[0][0]               
__________________________________________________________________________________________________
block_15_expand_BN (BatchNormal (None, 7, 7, 960)    3840        block_15_expand[0][0]            
__________________________________________________________________________________________________
block_15_expand_relu (ReLU)     (None, 7, 7, 960)    0           block_15_expand_BN[0][0]         
__________________________________________________________________________________________________
block_15_depthwise (DepthwiseCo (None, 7, 7, 960)    8640        block_15_expand_relu[0][0]       
__________________________________________________________________________________________________
block_15_depthwise_BN (BatchNor (None, 7, 7, 960)    3840        block_15_depthwise[0][0]         
__________________________________________________________________________________________________
block_15_depthwise_relu (ReLU)  (None, 7, 7, 960)    0           block_15_depthwise_BN[0][0]      
__________________________________________________________________________________________________
block_15_project (Conv2D)       (None, 7, 7, 160)    153600      block_15_depthwise_relu[0][0]    
__________________________________________________________________________________________________
block_15_project_BN (BatchNorma (None, 7, 7, 160)    640         block_15_project[0][0]           
__________________________________________________________________________________________________
block_15_add (Add)              (None, 7, 7, 160)    0           block_14_add[0][0]               
                                                                 block_15_project_BN[0][0]        
__________________________________________________________________________________________________
block_16_expand (Conv2D)        (None, 7, 7, 960)    153600      block_15_add[0][0]               
__________________________________________________________________________________________________
block_16_expand_BN (BatchNormal (None, 7, 7, 960)    3840        block_16_expand[0][0]            
__________________________________________________________________________________________________
block_16_expand_relu (ReLU)     (None, 7, 7, 960)    0           block_16_expand_BN[0][0]         
__________________________________________________________________________________________________
block_16_depthwise (DepthwiseCo (None, 7, 7, 960)    8640        block_16_expand_relu[0][0]       
__________________________________________________________________________________________________
block_16_depthwise_BN (BatchNor (None, 7, 7, 960)    3840        block_16_depthwise[0][0]         
__________________________________________________________________________________________________
block_16_depthwise_relu (ReLU)  (None, 7, 7, 960)    0           block_16_depthwise_BN[0][0]      
__________________________________________________________________________________________________
block_16_project (Conv2D)       (None, 7, 7, 320)    307200      block_16_depthwise_relu[0][0]    
__________________________________________________________________________________________________
block_16_project_BN (BatchNorma (None, 7, 7, 320)    1280        block_16_project[0][0]           
__________________________________________________________________________________________________
Conv_1 (Conv2D)                 (None, 7, 7, 1280)   409600      block_16_project_BN[0][0]        
__________________________________________________________________________________________________
Conv_1_bn (BatchNormalization)  (None, 7, 7, 1280)   5120        Conv_1[0][0]                     
__________________________________________________________________________________________________
out_relu (ReLU)                 (None, 7, 7, 1280)   0           Conv_1_bn[0][0]                  
__________________________________________________________________________________________________
global_average_pooling2d (Globa (None, 1280)         0           out_relu[0][0]                   
__________________________________________________________________________________________________
predictions (Dense)             (None, 1000)         1281000     global_average_pooling2d[0][0]   
==================================================================================================
Total params: 3,538,984
Trainable params: 3,504,872
Non-trainable params: 34,112
__________________________________________________________________________________________________

Localization

The task of drawing a rectangle (bounding box) around the region judged to contain an object.
The location is found through training.

The training data consists of each image's target (label) and a rectangle (x, y coordinates, width, height).

import matplotlib.pyplot as plt
import matplotlib.patches as mpt
import pandas as pd
import tensorflow as tf
import numpy as np
from PIL import Image 

# each annotation CSV row: filename, x1, y1, x2, y2, class name
air = pd.read_csv('dataset/annotations/airplane.csv', header=None)
face = pd.read_csv('dataset/annotations/face.csv', header=None)
motorcycle = pd.read_csv('dataset/annotations/motorcycle.csv', header=None)
air.rename(columns={1:'x1',2:'y1',3:'x2',4:'y2',0:'filename',5:'target'}, inplace=True)
face.rename(columns={1:'x1',2:'y1',3:'x2',4:'y2',0:'filename',5:'target'}, inplace=True)
motorcycle.rename(columns={1:'x1',2:'y1',3:'x2',4:'y2',0:'filename',5:'target'}, inplace=True)

# prepend the image directory so each filename is a usable relative path
air.filename = air.filename.map(lambda x: 'dataset/images/airplane/'+x)
face.filename = face.filename.map(lambda x: 'dataset/images/face/'+x)
motorcycle.filename = motorcycle.filename.map(lambda x: 'dataset/images/motorcycle/'+x)

# stack the three classes into a single DataFrame
data = pd.concat([air, face, motorcycle], axis=0, ignore_index=True)

air.head()

	filename	x1	y1	x2	y2	target
0	image_0001.jpg	49	30	349	137	airplane
1	image_0002.jpg	59	35	342	153	airplane
2	image_0003.jpg	47	36	331	135	airplane
3	image_0004.jpg	47	24	342	141	airplane
4	image_0005.jpg	48	18	339	146	airplane
im = plt.imread('dataset/images/airplane/image_0001.jpg')

fig, ax = plt.subplots(1,1)
ax.imshow(im)
pt = mpt.Rectangle((49,30), 349-49, 137-30, fill=False)  # anchor (x1, y1), width = x2-x1, height = y2-y1
ax.add_patch(pt)

data.target.value_counts() # imbalanced data

# airplane      800
# motorcycle    798
# face          435
# Name: target, dtype: int64
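The three classes are imbalanced (face has roughly half as many samples as the other two). If a classifier is later trained on this data, one common mitigation is to weight each class inversely to its frequency; a minimal sketch under that assumption (the mapping and variable names below are illustrative, not part of the original notebook):

# hypothetical sketch: class weights inversely proportional to class frequency
counts = data.target.value_counts()                  # airplane 800, motorcycle 798, face 435
name_to_idx = {'airplane': 0, 'face': 1, 'motorcycle': 2}
class_weight = {name_to_idx[c]: counts.sum() / (len(counts) * n) for c, n in counts.items()}
# Keras expects integer class indices as keys, e.g. model.fit(..., class_weight=class_weight)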
im = plt.imread(data.loc[0, 'filename'])

def show_images(i):
  im = plt.imread(data.loc[i, 'filename'])
  fig, ax = plt.subplots(1,1)
  ax.imshow(im)
  pt = mpt.Rectangle((data.loc[i,'x1'], data.loc[i,'y1']),
                     data.loc[i,'x2']-data.loc[i,'x1'],
                     data.loc[i,'y2']-data.loc[i,'y1'], fill=False)
  ax.add_patch(pt)

show_images(832)  # draw row 832's image with its bounding box

Four ways of one-hot encoding

1. scikit-learn - OneHotEncoder

2. pandas - get_dummies

3. data['name'] = (data.target=='name')*1

4. tf.keras.utils.to_categorical

# note: to_categorical (method 4) only works on targets that are already label-encoded as integers

from sklearn.preprocessing import OneHotEncoder

ohe = OneHotEncoder()
ohe.fit_transform(data[['target']]).toarray()

array([[1., 0., 0.],
       [1., 0., 0.],
       [1., 0., 0.],
       ...,
       [0., 0., 1.],
       [0., 0., 1.],
       [0., 0., 1.]])
pd.get_dummies(data.target)

	airplane	face	motorcycle
0	1	0	0
1	1	0	0
2	1	0	0
3	1	0	0
4	1	0	0
...	...	...	...
2028	0	0	1
2029	0	0	1
2030	0	0	1
2031	0	0	1
2032	0	0	1
data['airplane'] = (data.target=='airplane')*1
data['face'] = (data.target=='face')*1
data['motorcycle'] = (data.target=='motorcycle')*1

data.tail().filename
# 2028    dataset/images/motorcycle/image_0794.jpg
# 2029    dataset/images/motorcycle/image_0795.jpg
# 2030    dataset/images/motorcycle/image_0796.jpg
# 2031    dataset/images/motorcycle/image_0797.jpg
# 2032    dataset/images/motorcycle/image_0798.jpg
# Name: filename, dtype: object
data['label'] = data.target.map({'airplane':0, 'face':1, 'motorcycle':2})  # label-encode first
tf.keras.utils.to_categorical(data.label)
# array([[1., 0., 0.],
#        [1., 0., 0.],
#        [1., 0., 0.],
#        ...,
#        [0., 0., 1.],
#        [0., 0., 1.],
#        [0., 0., 1.]], dtype=float32)


Label encoding

1. map

2. scikit-learn - LabelEncoder

data['label'] = data.target.map({'airplane':0, 'face':1, 'motorcycle':2})

from sklearn.preprocessing import LabelEncoder
le = LabelEncoder()
le.fit_transform(data.target)
# array([0, 0, 0, ..., 2, 2, 2])

data.drop(columns=['target','label'], inplace=True)  # keep only filename, box coordinates and the one-hot columns

data.values
array([['dataset/images/airplane/image_0001.jpg', 49, 30, ..., 1, 0, 0],
       ['dataset/images/airplane/image_0002.jpg', 59, 35, ..., 1, 0, 0],
       ['dataset/images/airplane/image_0003.jpg', 47, 36, ..., 1, 0, 0],
       ...,
       ['dataset/images/motorcycle/image_0796.jpg', 47, 40, ..., 0, 0, 1],
       ['dataset/images/motorcycle/image_0797.jpg', 48, 54, ..., 0, 0, 1],
       ['dataset/images/motorcycle/image_0798.jpg', 42, 33, ..., 0, 0, 1]],
      dtype=object)

Loading images in bulk

1. tf.keras.preprocessing.image_dataset_from_directory => loads them as a tf.data.Dataset (a minimal sketch follows this list)

2. tf.keras.preprocessing.image.ImageDataGenerator => loads them as numpy arrays / augmentation and saving files to disk can be used as options
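Method 1 is not used later in this notebook; a minimal sketch of it, assuming the images sit in one sub-folder per class under dataset/images (labels are then inferred from the folder names, so this form suits classification rather than box regression):

# sketch of method 1 (assumed layout: dataset/images/airplane, dataset/images/face, dataset/images/motorcycle)
train_ds = tf.keras.preprocessing.image_dataset_from_directory(
    'dataset/images',
    label_mode='categorical',   # one-hot labels inferred from the folder names
    image_size=(64, 128),       # resized on load
    batch_size=32)
# train_ds is a tf.data.Dataset yielding (image_batch, label_batch) tuples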

 

data.head()

filename	x1	y1	x2	y2	airplane	face	motorcycle
0	dataset/images/airplane/image_0001.jpg	49	30	349	137	1	0	0
1	dataset/images/airplane/image_0002.jpg	59	35	342	153	1	0	0
2	dataset/images/airplane/image_0003.jpg	47	36	331	135	1	0	0
3	dataset/images/airplane/image_0004.jpg	47	24	342	141	1	0	0
4	dataset/images/airplane/image_0005.jpg	48	18	339	146	1	0	0
idg = tf.keras.preprocessing.image.ImageDataGenerator(rescale=1/255)  # normalize pixel values to [0, 1] while loading
dg = idg.flow_from_dataframe(data, class_mode='raw', y_col=['x1','x2','y1','y2'], target_size=(64,128))  # forces a resize / lets you pick exactly which columns come back as the labels
# i.e. the data can be loaded in a form that is convenient for localization

ims = next(dg)  # one batch: ims[0] = images, ims[1] = box labels in (x1, x2, y1, y2) order

ims[1][0], ims[1][1]
# (array([ 82, 366,  56, 171]), array([ 34, 230,  19, 127]))

ims[0][0].shape
# (256, 256, 3)   # note: this output reflects the default target_size; with target_size=(64,128) it would be (64, 128, 3)
train = tf.data.Dataset.from_generator(lambda: dg, output_types=(tf.float32, tf.float32))  # wrap the generator so it can be consumed as a tf.data.Dataset

t = iter(train.take(1))

next(t)
# (<tf.Tensor: shape=(32, 256, 256, 3), dtype=float32, numpy=
 array([[[[1.        , 1.        , 1.        ],
          [1.        , 1.        , 1.        ],
          [1.        , 1.        , 1.        ],
          ...,
         [[0.9960785 , 0.98823535, 0.9921569 ],
          [1.        , 0.9960785 , 1.        ],
          [1.        , 0.9960785 , 0.9921569 ],
          ...,
          [1.        , 1.        , 1.        ],
          [1.        , 1.        , 1.        ],
          [1.        , 1.        , 1.        ]]]], dtype=float32)>,
          <tf.Tensor: shape=(32, 4), dtype=float32, numpy=
 array([[ 45., 345.,  31., 165.],
        [ 38., 238.,  24., 143.],
        [ 48., 351.,  29., 113.],
        [ 49., 209.,  51., 127.],
        [176., 379.,  15., 275.],
        [ 35., 234.,  25., 144.],
        [ 52., 346.,  27., 112.],
        [ 51., 350.,  33., 123.],
        [ 50., 350.,  29., 112.],
        [ 56., 353.,  33., 138.],
        [ 32., 241.,  20., 283.],
        [ 48., 348.,  26.,  92.],
        [ 35., 229.,  22., 151.],
        [ 36., 226.,  23., 129.],
        [ 36., 228.,  60., 174.],
        [ 32., 230.,  21., 118.],
        [ 47., 232.,  26., 136.],
        [ 51., 345.,  29., 132.],
        [171., 379.,  26., 278.],
        [ 59., 350.,  32., 126.],
        [ 36., 227.,  35., 147.],
        [ 24., 349.,  28., 124.],
        [ 29., 228.,  24., 136.],
        [ 32., 229.,  21., 150.],
        [ 35., 227.,  21., 143.],
        [154., 342.,  38., 285.],
        [122., 320.,  16., 253.],
        [ 35., 232.,  15., 132.],
        [ 50., 347.,  27., 119.],
        [ 37., 227.,  32., 164.],
        [ 75., 358.,  42., 153.],
        [ 39., 231.,  27., 140.]], dtype=float32)>)
for i in train.take(1):
  print(i[1])
  
tf.Tensor(
[[ 46. 352.  29. 113.]
 [ 47. 217.  40. 141.]
 [ 34. 228.  18. 121.]
 [266. 462.  54. 324.]
 [207. 391.  20. 282.]
 [ 61. 346.  24. 148.]
 [ 70. 347.  86. 167.]
 [ 33. 228.  17. 127.]
 [ 80. 280.  39. 314.]
 [ 48. 345.  27. 108.]
 [ 35. 229.  20. 126.]
 [ 54. 346.  29. 120.]
 [ 35. 232.  23. 149.]
 [ 61. 350.  27. 113.]
 [ 35. 216.  23. 134.]
 [ 48. 221.  43. 138.]
 [155. 348.  18. 263.]
 [ 52. 349.  28. 116.]
 [160. 375.  28. 304.]
 [ 51. 345.  33. 138.]
 [ 49. 344.  23. 122.]
 [ 55. 345.  22. 140.]
 [ 49. 349.  33. 107.]
 [ 34. 226.  22. 133.]
 [ 62. 353.  67. 127.]
 [ 49. 225.  52. 150.]
 [ 32. 233.  18. 137.]
 [118. 297.  13. 258.]
 [ 43. 344.  31. 117.]
 [ 57. 344.  24. 105.]
 [247. 432.  19. 286.]
 [ 55. 352.  31. 140.]], shape=(32, 4), dtype=float32)
vgg = tf.keras.applications.VGG16(include_top=False)
vgg.trainable = False  # freeze the backbone; only the new Dense head is trained

input_ = tf.keras.Input((64,128,3))
x = tf.keras.layers.Lambda(lambda x: tf.keras.applications.vgg16.preprocess_input(x))(input_)
x = vgg(x)
x = tf.keras.layers.Flatten()(x)
x = tf.keras.layers.Dense(4)(x)  # regress the 4 box values
model = tf.keras.models.Model(input_, x)

model.compile(loss=tf.keras.losses.MeanAbsoluteError(), metrics=['mae'])
# Huber loss behaves like L1 for large errors and like L2 for small errors
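For reference, a sketch of compiling the same model with Huber loss instead of MAE (the delta value below is only an illustrative default):

# Huber: quadratic (L2-like) for errors smaller than delta, linear (L1-like) above it,
# so a few badly predicted boxes do not dominate the gradient
model.compile(loss=tf.keras.losses.Huber(delta=1.0), metrics=['mae'])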

model.summary()
Model: "model_3"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
input_8 (InputLayer)         [(None, 64, 128, 3)]      0         
_________________________________________________________________
lambda_3 (Lambda)            (None, 64, 128, 3)        0         
_________________________________________________________________
vgg16 (Functional)           (None, None, None, 512)   14714688  
_________________________________________________________________
flatten_3 (Flatten)          (None, 4096)              0         
_________________________________________________________________
dense_3 (Dense)              (None, 4)                 16388     
=================================================================
Total params: 14,731,076
Trainable params: 16,388
Non-trainable params: 14,714,688
_________________________________________________________________
model.fit(train, epochs=1, steps_per_epoch=50)
# 50/50 [==============================] - 933s 19s/step - loss: 108.1627 - mae: 108.1627 

im = Image.open('dataset/images/airplane/image_0001.jpg')
x = np.array(im.resize((128,64)))  # PIL resize takes (width, height), giving a (64, 128, 3) array

model(x[tf.newaxis])  # add a batch dimension before calling the model
# <tf.Tensor: shape=(1, 4), dtype=float32, numpy=
# array([[-17.944298 ,  15.799198 ,  19.511517 ,   1.9699388]],
#       dtype=float32)>

plt.imshow(x)

Fast R-CNN

A model that handles localization + classification in a single pass.

The localization problem can be solved with a CNN, but because machine learning pipelines were built ad hoc,

the catastrophic forgetting problem and the multi-loss problem meant localization + classification could not be trained at the same time.

Since localization had always been solved as a regression task, training with multiple losses caused one of the two tasks to perform very poorly,

and it was not expected that classification + localization could be solved with a single CNN.

However, because a CNN carries both location information and feature information, it turned out that increasing the number of outputs to two and optimizing multiple losses simultaneously still trains well.
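A minimal Keras sketch of that idea: one shared backbone, two output heads, and two losses optimized at once (the layer sizes, names and loss weights are illustrative; this shows the multi-output principle, not Fast R-CNN itself):

# sketch: shared CNN backbone + classification head + bounding-box regression head
backbone = tf.keras.applications.VGG16(include_top=False)
backbone.trainable = False

inp = tf.keras.Input((64, 128, 3))
feat = tf.keras.layers.Flatten()(backbone(inp))
cls_out = tf.keras.layers.Dense(3, activation='softmax', name='cls')(feat)  # airplane / face / motorcycle
box_out = tf.keras.layers.Dense(4, name='box')(feat)                        # x1, x2, y1, y2

multi_model = tf.keras.Model(inp, [cls_out, box_out])
multi_model.compile(loss={'cls': 'categorical_crossentropy', 'box': 'mae'},
                    loss_weights={'cls': 1.0, 'box': 1.0})
# fit would then take y as {'cls': one_hot_labels, 'box': box_coordinates}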

 


TensorFlow Hub

TensorFlow Hub lets you easily download pretrained TensorFlow models, run them as regular models, and fine-tune them.

If git is where you share code and import code files,

TensorFlow Hub is where you pull in models.
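A minimal sketch of pulling a detector from tfhub.dev and running inference (the model handle and the output keys follow the TF2 object-detection SavedModel convention; treat both as assumptions to check against the model's page):

import tensorflow as tf
import tensorflow_hub as hub

# assumed handle - any TF2 object-detection SavedModel on tfhub.dev is used the same way
detector = hub.load('https://tfhub.dev/tensorflow/ssd_mobilenet_v2/2')

image = tf.io.decode_jpeg(tf.io.read_file('sample.jpg'))   # uint8 tensor of shape (H, W, 3)
result = detector(image[tf.newaxis, ...])                  # batch of one image
boxes, scores, classes = (result['detection_boxes'],
                          result['detection_scores'],
                          result['detection_classes'])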

 

TensorFlow Hub tutorial

https://www.tensorflow.org/hub/tutorials/tf2_object_detection?hl=ko 

 


 

Object detection models on TensorFlow Hub

https://tfhub.dev/s?module-type=image-object-detection 

 

# a detector has a "neck" only when it includes an FPN

 

 
