Day 17 - Paper Class (CNN)

2021. 10. 4. 20:30


Convolutional Neural Network

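The bar chart above shows the test-set error rates of various classification methods.

[deslant] denotes a classifier trained on a deslanted version of the dataset, i.e., images whose slant has been corrected.

[dist] denotes a model whose training data was augmented with artificially distorted samples.

The convolutional networks (the LeNet family) performed best, both against the other algorithms and against the authors' own earlier models.

(Boosted LeNet-4 had the lowest error of all.)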

Bagging

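Bagging draws random samples many times, trains a separate model on each sample, and averages their results (one of the ensemble techniques).

Bagging samples with replacement (bootstrap sampling), drawing from the full data to train each model.

Because it averages over several models, it reduces variance and helps avoid overfitting.

It is good at producing a model that generalizes well.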

ex) random forest
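The bagging recipe above (bootstrap samples, one model per sample, then combine) can be sketched as follows. This is a minimal illustration assuming scikit-learn decision trees on the iris data. For classification it combines models with a majority vote instead of an average. The helper name `bagging_predict` is hypothetical. A random forest additionally samples features at each split, which this sketch does not do.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier


def bagging_predict(X_train, y_train, X_test, n_models=25, seed=0):
    """Hypothetical helper: bootstrap-aggregated decision trees with a majority vote."""
    rng = np.random.default_rng(seed)
    n = len(X_train)
    all_preds = []
    for m in range(n_models):
        # Bootstrap sample: draw n indices *with replacement*
        idx = rng.integers(0, n, size=n)
        tree = DecisionTreeClassifier(random_state=m).fit(X_train[idx], y_train[idx])
        all_preds.append(tree.predict(X_test))
    votes = np.stack(all_preds)  # shape (n_models, n_test)
    # Majority vote per test sample (the classification analogue of averaging)
    return np.array([np.bincount(col).argmax() for col in votes.T])


X, y = load_iris(return_X_y=True)
perm = np.random.default_rng(42).permutation(len(X))
train, test = perm[:100], perm[100:]
pred = bagging_predict(X[train], y[train], X[test])
accuracy = (pred == y[test]).mean()
print(f"bagged accuracy: {accuracy:.3f}")
```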

Boosting

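As presented in class, boosting, like bagging, trains on parts of the full data drawn with replacement.

The difference is that boosting trains models one after another and gives extra weight to the samples that were misclassified, so the ensemble gets stronger step by step.

After each round of training, the sample weights are redistributed according to the results.

Misclassified samples get higher weights and correctly classified ones get lower weights, so the next model concentrates on the mistakes.

Compared with bagging, boosting handles harder problems and tends to reach higher performance, but it is sensitive to outliers.

ex) XGBoost, AdaBoost, Gradient Boosting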
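The weight redistribution described above can be seen in one AdaBoost-style update. This is a minimal sketch assuming binary labels in {-1, +1}. The function name `adaboost_reweight` is hypothetical, and a full AdaBoost loop would also fit a new weak learner on the reweighted data each round.

```python
import numpy as np


def adaboost_reweight(weights, y_true, y_pred):
    """Hypothetical helper: one AdaBoost reweighting step for labels in {-1, +1}."""
    miss = y_true != y_pred
    err = weights[miss].sum() / weights.sum()   # weighted error of this round's model
    alpha = 0.5 * np.log((1 - err) / err)       # how much this model's vote counts
    # y_true * y_pred is -1 for mistakes, so their weight is multiplied by exp(+alpha)
    new_w = weights * np.exp(-alpha * y_true * y_pred)
    return new_w / new_w.sum(), alpha


w = np.full(5, 0.2)
y_true = np.array([1, 1, -1, -1, 1])
y_pred = np.array([1, 1, -1, 1, 1])   # only sample 3 is misclassified
new_w, alpha = adaboost_reweight(w, y_true, y_pred)
print(new_w)   # the misclassified sample now holds weight 0.5, the others 0.125 each
print(alpha)   # ln(2), about 0.693
```

With a weighted error of 0.2, the single mistake ends up carrying half of the total weight, so the next model is pushed hard toward getting it right.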

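Two views of convolution

1. Cutting the image into patches in advance, then applying the filter to each patch

2. Sliding the filter and cutting as it moves (top-left -> bottom-right)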
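Both views produce the same numbers. The sketch below makes this concrete with NumPy. The helper names are hypothetical. Like CNN layers, both helpers compute cross-correlation (no kernel flip).

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def by_sliding(image, kernel):
    """View 2 (hypothetical helper): slide the kernel from top-left to bottom-right."""
    kh, kw = kernel.shape
    out = np.zeros((image.shape[0] - kh + 1, image.shape[1] - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = (image[i:i + kh, j:j + kw] * kernel).sum()
    return out


def by_precut_patches(image, kernel):
    """View 1 (hypothetical helper): cut every patch up front, then apply the kernel to all."""
    patches = sliding_window_view(image, kernel.shape)   # (out_h, out_w, kh, kw)
    return np.einsum('ijkl,kl->ij', patches, kernel)


image = np.arange(25, dtype=float).reshape(5, 5)
kernel = np.array([[1., 0.], [0., -1.]])
print(np.allclose(by_sliding(image, kernel), by_precut_patches(image, kernel)))  # True
```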

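Why CNNs work better than traditional NNs on image data

When a traditional NN flattens 2D image data into 1D, two problems arise

1. The number of columns (input features) becomes very large

2. Locality is lost

A CNN solves both problems

1. A convolution filter works on the data in its original 2D shape instead of flattening it, and each filter has only a few weights, so the number of parameters stays small

2. A convolution filter turns the data into a map of whether a feature is present at each location, so locality is preserved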
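Point 1 can be checked with simple arithmetic. The 784 -> 128 Dense count matches the Keras summary later in these notes. The 32-filter 3x3 Conv2D is an assumed comparison layer, and both helper names are hypothetical.

```python
def dense_params(n_inputs, n_units):
    # one weight per (input, unit) pair, plus one bias per unit
    return n_inputs * n_units + n_units


def conv2d_params(kernel_h, kernel_w, in_channels, filters):
    # independent of the image size: the same small kernel is reused everywhere
    return kernel_h * kernel_w * in_channels * filters + filters


flattened = 28 * 28                      # a 28x28 MNIST image flattened to 784 columns
print(dense_params(flattened, 128))      # 100480
print(conv2d_params(3, 3, 1, 32))        # 320
```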

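A traditional NN can tell that the right photo shows a person and the left one does not,

but if the face is much larger or smaller, rotated, or in a different position, it may fail

to recognize the person even though one is there

A CNN, on the other hand, still recognizes the person in the right photo when the face is in a different position, and it copes better with differences in size or orientation

However, as in the left photo, when a person's features are present but scattered into separate parts, a CNN may still classify the image as a person, because it does not judge the overall arrangement of the parts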

Locally connected neural network

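A locally connected neural network addresses a weakness of the traditional NN: because the traditional NN is sensitive to where features are located,

it fails to recognize the many variations of an image

(e.g., images that differ in size or are distorted)

Each node is connected only to nearby nodes

This lowers the computational cost, and because each node sees only a local neighborhood, the network captures local, relative-position information

However, since it does not use shared weights, it cannot exploit the fact that the same filter response means the same feature wherever it occurs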

Convolutional neural network (shared-weight local)

A CNN is a locally connected neural network that also uses shared weights

Because the weights are shared, it can exploit the fact that the same value means the same feature,

and because it is locally connected, it can decide whether a feature is present by looking at a local region

(Locality is preserved)
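The difference between the two layer types comes down to whether one kernel is reused at every position. The NumPy sketch below shows this, with biases ignored. A locally connected layer whose per-position kernels all happen to be equal computes exactly what a shared-weight convolution does, but in general it must learn one kernel per output position. Helper names are hypothetical.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def locally_connected(image, kernels):
    """kernels: (out_h, out_w, kh, kw), a separate kernel for every output position."""
    patches = sliding_window_view(image, kernels.shape[2:])
    return np.einsum('ijkl,ijkl->ij', patches, kernels)


def shared_weight_conv(image, kernel):
    """One (kh, kw) kernel reused at every position (CNN-style weight sharing)."""
    patches = sliding_window_view(image, kernel.shape)
    return np.einsum('ijkl,kl->ij', patches, kernel)


rng = np.random.default_rng(0)
image = rng.normal(size=(8, 8))
kernel = rng.normal(size=(3, 3))

# Tie all 6x6 per-position kernels to the same values
tied = np.broadcast_to(kernel, (6, 6, 3, 3))
print(np.allclose(locally_connected(image, tied), shared_weight_conv(image, kernel)))  # True
print(tied.size, kernel.size)   # 324 weights to learn vs 9 weights
```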

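Assumptions of CNNs

1. Stationarity of statistics

- Stationarity

- For images, stationarity means assuming that the statistics of one part of the image are the same as those of other parts

- A feature can appear anywhere in the image, so feature parameters learned in one region can be reused

to extract the same feature at other locations

2. Locality of pixel dependencies

- An image is made up of small features, so dependencies between pixels are confined to the small regions where those features lie

- The features that make up an image are formed by nearby pixels in a local region, not by the whole image,

so only nearby pixels depend on each other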
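The stationarity assumption is what lets a single filter detect the same feature wherever it occurs. The sketch below demonstrates this with a toy image containing the same 3x3 diagonal stroke at two places. The same filter responds with the same value at both locations. The image and filter are made up for illustration.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def correlate(image, kernel):
    patches = sliding_window_view(image, kernel.shape)
    return np.einsum('ijkl,kl->ij', patches, kernel)


stroke = np.eye(3)             # a small diagonal "stroke", used as pattern and as filter
image = np.zeros((12, 12))
image[1:4, 1:4] = stroke       # the stroke near the top-left
image[7:10, 6:9] = stroke      # the same stroke somewhere else

response = correlate(image, stroke)
print(response[1, 1], response[7, 6])   # 3.0 3.0: same feature, same response
print(response.max())                   # 3.0
```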

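In the figure above, the left and right cases can be judged to be the same, but

can the middle one also be judged the same?

After a single convolutional layer they may be judged different, but

after several layers, all three cases can be recognized as having the same characteristic

=> This is one benefit of stacking more layers: each deeper unit sees a larger region of the input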
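How much of the input a deeper unit "sees" (its receptive field) grows with each stacked layer. The formula below is the standard one for stacked convolutions. The helper name is hypothetical.

```python
def receptive_field(kernel_sizes, strides=None):
    """Receptive field (in input pixels) of one output unit after stacked conv layers."""
    strides = strides or [1] * len(kernel_sizes)
    rf, jump = 1, 1              # jump = input-pixel distance between neighbouring units
    for k, s in zip(kernel_sizes, strides):
        rf += (k - 1) * jump
        jump *= s
    return rf


print(receptive_field([3]))              # 3
print(receptive_field([3, 3]))           # 5
print(receptive_field([3, 3, 3]))        # 7
print(receptive_field([3, 3], [2, 1]))   # 7: a stride-2 layer makes the field grow faster
```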


import tensorflow as tf 
from sklearn.datasets import load_digits 
import matplotlib.pyplot as plt

tf.keras.layers.Dense 
tf.keras.layers.LocallyConnected2D # does not share weights; it sits between Dense and Conv2D, so it can work well when neither clearly fits
tf.keras.layers.Conv2D
tf.keras.layers.MaxPool2D # alias: tf.keras.layers.MaxPooling2D
tf.keras.layers.AvgPool2D # alias: tf.keras.layers.AveragePooling2D

tf.keras.layers.Conv2D is tf.keras.layers.Convolution2D # Conv2D is a short alias of Convolution2D
# True

(X_train, y_train), (X_test, y_test) = tf.keras.datasets.mnist.load_data()
data = load_digits()

padding

import numpy as np 
from scipy.signal import convolve, convolve2d # scipy.signal versions (a separate scipy.ndimage.convolve import would just be shadowed)

a = np.array([-1,0,1])
b = np.arange(5)

convolve(a, b, 'valid') 
# array([-2, -2, -2])

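How the convolution is computed

Flip the kernel: -1, 0, 1 => 1, 0, -1 (a 1D kernel is simply reversed; a 2D kernel is flipped vertically and horizontally)

Slide the flipped kernel over 0 1 2 3 4 without padding, multiply element-wise and sum:

[0 1 2] x [1 0 -1] => (0x1) + (1x0) + (2x-1) = -2

[1 2 3] x [1 0 -1] => (1x1) + (2x0) + (3x-1) = -2

[2 3 4] x [1 0 -1] => (2x1) + (3x0) + (4x-1) = -2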

a = np.array([-1,0,1])
b = np.arange(4)
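convolve(a, b, 'same') # zero padding is used; 'same' returns an output the size of the FIRST argument (here a, length 3)
# array([-1, -2, -2])

How the convolution is computed

Flip the kernel: -1, 0, 1 => 1, 0, -1

Pad 0 1 2 3 with one zero on the left (0 0 1 2 3), then slide, multiply element-wise and sum:

[0 0 1] x [1 0 -1] => (0x1) + (0x0) + (1x-1) = -1

[0 1 2] x [1 0 -1] => (0x1) + (1x0) + (2x-1) = -2

[1 2 3] x [1 0 -1] => (1x1) + (2x0) + (3x-1) = -2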

a = np.array([-1,0,1])
b = np.arange(4)

convolve(a, b, 'full') 
# array([ 0, -1, -2, -2,  2,  3])

How the convolution is computed

Flip the kernel: -1, 0, 1 => 1, 0, -1

Add zeros on both sides so the kernel is applied wherever it overlaps the input by at least one element: 0 0 0 1 2 3 0 0

[0 0 0] x [1 0 -1] => (0x1) + (0x0) + (0x-1) = 0

[0 0 1] x [1 0 -1] => (0x1) + (0x0) + (1x-1) = -1

[0 1 2] x [1 0 -1] => (0x1) + (1x0) + (2x-1) = -2

[1 2 3] x [1 0 -1] => (1x1) + (2x0) + (3x-1) = -2

[2 3 0] x [1 0 -1] => (2x1) + (3x0) + (0x-1) = 2

[3 0 0] x [1 0 -1] => (3x1) + (0x0) + (0x-1) = 3
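The three walkthroughs above follow the same recipe: flip, zero-pad, slide, multiply-add. The NumPy sketch below implements that recipe. It reproduces the 'full' and 'valid' outputs shown above, and it treats `b` as the signal and `a` as the kernel. The name `convolve1d` is a hypothetical helper. 'same' is left out because its cropping depends on which argument comes first in scipy.

```python
import numpy as np


def convolve1d(signal, kernel, mode="full"):
    """Hypothetical helper: flip the kernel, zero-pad, then slide and multiply-add."""
    flipped = kernel[::-1]
    k = len(kernel)
    padded = np.pad(signal, k - 1)             # full mode: any overlap counts
    full = np.array([(padded[i:i + k] * flipped).sum()
                     for i in range(len(signal) + k - 1)])
    if mode == "full":
        return full
    if mode == "valid":                        # keep only windows fully inside the signal
        return full[k - 1:len(signal)]
    raise ValueError("only 'full' and 'valid' are sketched here")


a = np.array([-1, 0, 1])
print(convolve1d(np.arange(4), a))            # [ 0 -1 -2 -2  2  3]
print(convolve1d(np.arange(5), a, "valid"))   # [-2 -2 -2]
```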

a = np.array([[-1,0],[1,0]])
b = np.arange(9).reshape(3,3)

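convolve2d(a,b,'full') # with zero padding every input value is multiplied by every kernel value exactly once, so features near the border are detected as fairly as those in the center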

# array([[ 0, -1, -2,  0],
#        [-3, -3, -3,  0],
#        [-3, -3, -3,  0],
#        [ 6,  7,  8,  0]])

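How the convolution is computed

Flip the kernel vertically and horizontally:

-1 0  =>  0  1

 1 0       0 -1

Pad the input with a ring of zeros, then slide the flipped kernel from top-left to bottom-right:

0 0 0 0 0

0 0 1 2 0

0 3 4 5 0

0 6 7 8 0

0 0 0 0 0

e.g., the first window [0 0 / 0 0] gives 0, and the next window [0 0 / 0 1] gives (0x0) + (0x1) + (0x0) + (1x-1) = -1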

convolve2d(a,b,'valid')
# array([[-3, -3],
#        [-3, -3]])

convolve2d(a,b,'same') # 'same' matches the shape of the FIRST argument (a, 2x2), which is why it equals 'valid' here
# array([[-3, -3],
#        [-3, -3]])
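The 2D walkthrough can be written out directly: flip the kernel both ways, pad with zeros, and slide. This NumPy sketch reproduces the `convolve2d(a, b, 'full')` result above. The argument order (kernel, image) mirrors that call, and `conv2d_full` is a hypothetical helper name.

```python
import numpy as np


def conv2d_full(kernel, image):
    """Hypothetical helper: 'full' 2D convolution via flip + zero padding + sliding."""
    kh, kw = kernel.shape
    flipped = kernel[::-1, ::-1]                                   # vertical + horizontal flip
    padded = np.pad(image, ((kh - 1, kh - 1), (kw - 1, kw - 1)))   # zeros around the border
    out_h, out_w = image.shape[0] + kh - 1, image.shape[1] + kw - 1
    out = np.zeros((out_h, out_w), dtype=np.result_type(kernel, image))
    for i in range(out_h):
        for j in range(out_w):
            out[i, j] = (padded[i:i + kh, j:j + kw] * flipped).sum()
    return out


a = np.array([[-1, 0], [1, 0]])
b = np.arange(9).reshape(3, 3)
full = conv2d_full(a, b)
print(full)
print(full[1:3, 1:3])   # the 'valid' part: [[-3 -3] [-3 -3]]
```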

Invariance vs Equivariance

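Invariance: the output does not change when the input is transformed

- A CNN is (approximately) invariant to translation, mainly thanks to pooling

- If the same feature is present, it predicts the same value regardless of where the feature is

- A CNN is not invariant to rotation, size, viewpoint, or illumination

(A CNN may fail on images that are rotated, seen from a different viewpoint, scaled, or differently lit)

Equivariance: the output changes in step with the input

- If the input shifts, the output shifts by the same amount

- The convolution operation itself is translation-equivariant

For a CNN model to work well in general, data augmentation (e.g., rotation, scaling, brightness changes) is needed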
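The difference between equivariance and invariance can be checked numerically. The NumPy sketch below uses a toy image, kernel, and pattern, all made up for illustration. Moving the pattern moves the convolution output by the same amount (equivariance). A global max over the output does not move at all (invariance).

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def correlate(image, kernel):
    patches = sliding_window_view(image, kernel.shape)
    return np.einsum('ijkl,kl->ij', patches, kernel)


kernel = np.array([[1, 2, 1], [0, 0, 0], [-1, -2, -1]])
pattern = np.array([[5, 1], [2, 7]])

img_a = np.zeros((10, 10), dtype=int)
img_a[2:4, 2:4] = pattern
img_b = np.zeros((10, 10), dtype=int)
img_b[5:7, 4:6] = pattern             # same pattern moved down 3 and right 2

out_a, out_b = correlate(img_a, kernel), correlate(img_b, kernel)

# Equivariance: the feature map moves by the same (3, 2) offset
print(np.array_equal(np.roll(out_a, (3, 2), axis=(0, 1)), out_b))   # True
# Invariance after a global max: the summary value is unchanged
print(out_a.max() == out_b.max())                                    # True
```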

Pooling

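Pooling reduces each region to a representative value, so a slightly shifted (or slightly rotated) image can end up with the same values

Shrinking the data can lose information, but because pooling adds a degree of invariance, performance can actually improve (not always)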
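The "same representative value" effect is easy to see with 2x2 max pooling. The NumPy sketch below assumes even image sizes and a hypothetical helper name. A one-pixel shift inside the same pooling window leaves the output unchanged. A shift into a different window does not, so the invariance is only local.

```python
import numpy as np


def max_pool2x2(x):
    """Hypothetical helper: 2x2 max pooling with stride 2 (assumes even height and width)."""
    h, w = x.shape
    return x.reshape(h // 2, 2, w // 2, 2).max(axis=(1, 3))


a = np.zeros((4, 4)); a[0, 0] = 9    # bright pixel at (0, 0)
b = np.zeros((4, 4)); b[1, 1] = 9    # shifted by one, still inside the same 2x2 window
c = np.zeros((4, 4)); c[2, 2] = 9    # shifted into a different window

print(np.array_equal(max_pool2x2(a), max_pool2x2(b)))   # True: the small shift is absorbed
print(np.array_equal(max_pool2x2(a), max_pool2x2(c)))   # False: the invariance is only local
```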

Striving for Simplicity: The All Convolutional Net

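Using a larger stride gives an effect similar to pooling without a pooling layer

conv + pooling => replaced by a single conv with a larger stride

Because the strided conv is evaluated at fewer positions, it needs less computation than conv followed by pooling, so training can be faster while performance is maintained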
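The saving can be estimated with the usual output-size formula. This is a rough sketch: the 28x28 input and the 32 -> 64 channel counts are assumed values, and only the multiply-adds of the convolution itself are counted. Both options end at 14x14, but the strided conv is evaluated at a quarter of the positions.

```python
def out_size(n, k, stride=1, pad=0):
    """Output length along one axis: floor((n + 2*pad - k) / stride) + 1."""
    return (n + 2 * pad - k) // stride + 1


n, k, c_in, c_out = 28, 3, 32, 64         # assumed sizes for illustration

# Option A: 3x3 conv (stride 1, 'same' padding) followed by 2x2 max pooling
conv_a = out_size(n, k, stride=1, pad=1)           # 28
pooled = out_size(conv_a, 2, stride=2)             # 14
macs_a = conv_a * conv_a * k * k * c_in * c_out    # conv evaluated at every position

# Option B: a single 3x3 conv with stride 2 ('same' padding)
conv_b = out_size(n, k, stride=2, pad=1)           # 14
macs_b = conv_b * conv_b * k * k * c_in * c_out

print(pooled, conv_b)     # 14 14
print(macs_a // macs_b)   # 4
```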

Implementing LeNet-5 from the paper

input_ = tf.keras.Input(shape=(32,32,1))
x = tf.keras.layers.Conv2D(6, 5)(input_) # 6 filters of size 5x5 / stride is omitted, so it defaults to 1 / padding: 'valid'
x = tf.keras.layers.Activation('tanh')(x) # ReLU was not in use at the time, so tanh is used
model = tf.keras.models.Model(input_, x)

model.summary()
# Model: "model_1"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
input_2 (InputLayer)         [(None, 32, 32, 1)]       0         
_________________________________________________________________
conv2d_1 (Conv2D)            (None, 28, 28, 6)         156       
_________________________________________________________________
activation_1 (Activation)    (None, 28, 28, 6)         0         
=================================================================
Total params: 156
Trainable params: 156
Non-trainable params: 0
_________________________________________________________________
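Continuing past C1, the layer-by-layer shapes and parameter counts of a LeNet-5-style stack can be worked out by hand. This is a simplified sketch with several assumptions. C3 is fully connected to all S2 maps, whereas the paper uses a partial connection table with 1,516 parameters. The subsampling layers are counted without trainable parameters, whereas the paper's have a coefficient and a bias per map. The output is a plain dense layer, not the paper's RBF units. The helper names are hypothetical.

```python
def conv(shape, filters, k):
    h, w, c = shape
    params = k * k * c * filters + filters
    return (h - k + 1, w - k + 1, filters), params   # 'valid' padding, stride 1


def pool(shape):
    h, w, c = shape
    return (h // 2, w // 2, c), 0                    # 2x2 subsampling, no parameters here


def dense(n_in, units):
    return units, n_in * units + units


shape = (32, 32, 1)
shape, p_c1 = conv(shape, 6, 5)     # C1 -> (28, 28, 6)
shape, _ = pool(shape)              # S2 -> (14, 14, 6)
shape, p_c3 = conv(shape, 16, 5)    # C3 -> (10, 10, 16), fully connected in this sketch
shape, _ = pool(shape)              # S4 -> (5, 5, 16)
shape, p_c5 = conv(shape, 120, 5)   # C5 -> (1, 1, 120)
n_f6, p_f6 = dense(120, 84)         # F6
n_out, p_out = dense(84, 10)        # output layer (dense here, RBF in the paper)
print(shape, p_c1, p_c3, p_c5, p_f6, p_out)   # (1, 1, 120) 156 2416 48120 10164 850
```

The C1 count of 156 matches the Keras summary above.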

Attribution-NonCommercial

Day 16 - Paper Class (CNN)

2021. 10. 4. 20:16


Convolutional Neural Network (합성곱 신경망)

Conv2D

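- a 3x3 filter that extracts features

- 32 filters

- similar to a layer with 32 perceptrons in a neural network

- the values of each 3x3 filter are initialized randomly

- convolution is a linear operation, so on its own it cannot form diverse (non-linear) combinations

- Hyperparameters: number of filters, filter shape, padding, strides

- a 3D operation (height x width x channels)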

Activation

- ReLU: outputs 0 for values below 0 and passes values above 0 unchanged

- non-linear functions are used to build diverse combinations

Conv2D

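- passes the data through filters once more

- learns features by applying filters, starting from the original data

- transforms the data so that its features stand out as clearly as possible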

Activation

- for a single-channel input each filter has 3x3 = 9 weights, so 32 filters have 288 weights in total (plus 32 biases)

MaxPooling2D

- max pooling shrinks the data to reduce computational complexity

- the digit data has now been interpreted from 32 different viewpoints (one per filter)

Dropout

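- using dropout makes the network extract more meaningful features

- randomly dropping nodes during training prevents the parameters from co-adapting

※ co-adaptation: if, during training, two or more nodes in the same layer end up with the same inputs and connection weights,

they keep doing the same job however long training continues, which creates useless redundancy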
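Randomly dropping nodes can be sketched in a few lines of NumPy. This is "inverted" dropout: the surviving units are scaled by 1 / (1 - rate) during training, so nothing needs to change at inference time. Keras' Dropout layer also rescales during training in this way. The function name is hypothetical.

```python
import numpy as np


def dropout(x, rate, rng, training=True):
    """Hypothetical helper: zero each unit with probability `rate`, rescale the survivors."""
    if not training or rate == 0:
        return x                                  # nothing is dropped at inference time
    keep = rng.random(x.shape) >= rate            # True for the units that survive
    return np.where(keep, x / (1 - rate), 0.0)    # keeps the expected value unchanged


rng = np.random.default_rng(0)
x = np.ones(100_000)
y = dropout(x, 0.5, rng)
print((y == 0).mean())   # about 0.5 of the units are dropped
print(y.mean())          # about 1.0, the same expectation as x
```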

Flatten

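- converts the transformed data, in which the features stand out, into 1D so that a dense layer can learn from it

In the end, the network learns the 32 filters that best capture the features of the digits 0-9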

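One weakness of CNNs is that they may recognize a photo as a face when all the facial features are present but scattered apart

3D operations

For a color image, a CNN treats the R, G, B channels separately and performs a 3D operation

Even for a grayscale image, a CNN adds a channel dimension so it can perform the same 3D operation (the data must have a channel axis)

The reason a CNN transforms the image is to turn it into data that captures the distinguishing features well

Each filter has one kernel slice per channel: the element-wise products over all channels are summed into a single feature map for the input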
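The channel handling described above can be checked with NumPy. In this sketch a random 6x6 "RGB" input and a random 3x3x3 kernel stand in for real data, and the helper name is hypothetical. One multi-channel filter produces a single feature map. That map equals the sum of the three per-channel 2D correlations.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def conv_multichannel(image, kernel):
    """Hypothetical helper: image (H, W, C), kernel (kh, kw, C) -> one (H-kh+1, W-kw+1) map."""
    patches = sliding_window_view(image, kernel.shape)[:, :, 0]   # (out_h, out_w, kh, kw, C)
    # multiply element-wise, then sum over height, width AND channels
    return np.einsum('ijklc,klc->ij', patches, kernel)


rng = np.random.default_rng(0)
rgb = rng.normal(size=(6, 6, 3))
kernel = rng.normal(size=(3, 3, 3))

per_channel = sum(
    np.einsum('ijkl,kl->ij', sliding_window_view(rgb[..., c], (3, 3)), kernel[..., c])
    for c in range(3)
)
out = conv_multichannel(rgb, kernel)
print(out.shape)                        # (4, 4): one map per filter, not one per channel
print(np.allclose(out, per_channel))    # True
```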

Reference

Basic Convnet for MNIST

!pip install mglearn

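The importance of initial values (input scaling)

import mglearn
# with dtype uint8 the maximum is 255 and the minimum is 0
# after MinMaxScaler every feature lies between 0 and 1, so the data is spread over a square
# since no direction has a much larger scale than the others, faster training and more accurate results can be expected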
mglearn.plot_scaling.plot_scaling()

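Why are the convolution results summed (across channels)?

One might worry that combining the separate channels by simply adding their element-wise results would make the features impossible to tell apart

Of course, depthwise convolution does process each channel independently

However, summing tends to perform better, and in practice the convolution result rarely distorts the meaning of the original image

Why does that rarely happen?

Because the weights and biases are learned, training shapes them so that each channel contributes in its own way

(the kernels are trained to capture the features well)

Why is summing good?

It follows the traditional NN, where each neuron sums its weighted inputs, so stacked layers can learn hierarchical features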
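The contrast with depthwise convolution can be made concrete in NumPy. The random data and the helper name are illustrative. A depthwise convolution keeps one output per channel. A standard convolution with the same kernel is exactly that result summed over the channel axis.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def depthwise_conv(image, kernel):
    """Hypothetical helper: image (H, W, C), kernel (kh, kw, C) -> (out_h, out_w, C)."""
    patches = sliding_window_view(image, kernel.shape)[:, :, 0]   # (out_h, out_w, kh, kw, C)
    return np.einsum('ijklc,klc->ijc', patches, kernel)          # channels stay separate


rng = np.random.default_rng(1)
rgb = rng.normal(size=(6, 6, 3))
kernel = rng.normal(size=(3, 3, 3))

dw = depthwise_conv(rgb, kernel)
print(dw.shape)   # (4, 4, 3): one output per channel

patches = sliding_window_view(rgb, kernel.shape)[:, :, 0]
standard = np.einsum('ijklc,klc->ij', patches, kernel)    # standard conv: summed over channels
print(np.allclose(dw.sum(axis=-1), standard))             # True
```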

Checking intermediate results and filter images

import tensorflow as tf 
import numpy as np

(X_train, y_train), (X_test, y_test) = tf.keras.datasets.mnist.load_data()

from sklearn.datasets import load_digits
import matplotlib.pyplot as plt

data = load_digits()

data.data[0]
array([ 0.,  0.,  5., 13.,  9.,  1.,  0.,  0.,  0.,  0., 13., 15., 10.,
       15.,  5.,  0.,  0.,  3., 15.,  2.,  0., 11.,  8.,  0.,  0.,  4.,
       12.,  0.,  0.,  8.,  8.,  0.,  0.,  5.,  8.,  0.,  0.,  9.,  8.,
        0.,  0.,  4., 11.,  0.,  1., 12.,  7.,  0.,  0.,  2., 14.,  5.,
       10., 12.,  0.,  0.,  0.,  0.,  6., 13., 10.,  0.,  0.,  0.])
       
data.images[0]
array([[ 0.,  0.,  5., 13.,  9.,  1.,  0.,  0.],
       [ 0.,  0., 13., 15., 10., 15.,  5.,  0.],
       [ 0.,  3., 15.,  2.,  0., 11.,  8.,  0.],
       [ 0.,  4., 12.,  0.,  0.,  8.,  8.,  0.],
       [ 0.,  5.,  8.,  0.,  0.,  9.,  8.,  0.],
       [ 0.,  4., 11.,  0.,  1., 12.,  7.,  0.],
       [ 0.,  2., 14.,  5., 10., 12.,  0.,  0.],
       [ 0.,  0.,  6., 13., 10.,  0.,  0.,  0.]])

plt.imshow(data.images[0], cmap='gray')

data.images.shape
# (1797, 8, 8)

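image = data.images.reshape(1797,8,8,1) # CNN operations need each sample to be 3D (height, width, channel), so a channel axis is added

layer1 = tf.keras.layers.Conv2D(2, (3,3)) # number of filters, filter shape ((3,3) can be shortened to 3)
layer1.built # the weights are not created until the layer is first called (lazy initialization); conceptually the convolution can be computed im2col-style
# False

layer1(image[0]) # fails: layers work on batches, so even a single sample must be passed as a 4D tensor (batch, height, width, channel)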
# ValueError: Input 0 of layer conv2d_1 is incompatible with the layer: : expected min_ndim=4, found ndim=3. Full shape received: (8, 8, 1)

layer1(image[0][tf.newaxis]) 
<tf.Tensor: shape=(1, 6, 6, 2), dtype=float32, numpy=
array([[[[-10.622592  ,   0.6645769 ],
		 ...
         [ -3.2092843 ,  -0.36533844]]]], dtype=float32)>

layer1.weights # the kernel is initialized with Glorot (Xavier) uniform and the bias with zeros before the computation
# [<tf.Variable 'conv2d_1/kernel:0' shape=(3, 3, 1, 2) dtype=float32, numpy=
#  array([[[[ 0.06324342,  0.15853754]],
#          [[-0.14388585, -0.19692683]],
#          [[-0.40798104,  0.04143384]]],
#         [[[-0.24675035, -0.07410842]],
#          [[-0.3730538 ,  0.22583339]],
#          [[-0.22161803,  0.13686094]]],
#         [[[ 0.11666891,  0.40331647]],
#          [[-0.17990309,  0.3350769 ]],
#          [[-0.34412956, -0.15513435]]]], dtype=float32)>,
#  <tf.Variable 'conv2d_1/bias:0' shape=(2,) dtype=float32, numpy=array([0., 0.], dtype=float32)>]

len(layer1.weights)
# 2

layer1.weights[0] # the filters / kernel: read shape (3,3,1,2) as two kernels of shape (3,3,1)
# <tf.Variable 'conv2d_1/kernel:0' shape=(3, 3, 1, 2) dtype=float32, numpy=
# array([[[[ 0.06324342,  0.15853754]],
#         [[-0.14388585, -0.19692683]],
#         [[-0.40798104,  0.04143384]]],
#        [[[-0.24675035, -0.07410842]],
#         [[-0.3730538 ,  0.22583339]],
#         [[-0.22161803,  0.13686094]]],
#        [[[ 0.11666891,  0.40331647]],
#         [[-0.17990309,  0.3350769 ]],
#         [[-0.34412956, -0.15513435]]]], dtype=float32)>

layer1.weights[0][...,0] # first filter
<tf.Tensor: shape=(3, 3, 1), dtype=float32, numpy=
array([[[ 0.06324342],
        [-0.14388585],
        [-0.40798104]],

       [[-0.24675035],
        [-0.3730538 ],
        [-0.22161803]],

       [[ 0.11666891],
        [-0.17990309],
        [-0.34412956]]], dtype=float32)>
        
layer1.weights[0][...,1] # second filter
<tf.Tensor: shape=(3, 3, 1), dtype=float32, numpy=
array([[[ 0.15853754],
        [-0.19692683],
        [ 0.04143384]],

       [[-0.07410842],
        [ 0.22583339],
        [ 0.13686094]],

       [[ 0.40331647],
        [ 0.3350769 ],
        [-0.15513435]]], dtype=float32)>
        
np.squeeze(layer1.weights[0][...,1])
array([[ 0.15853754, -0.19692683,  0.04143384],
       [-0.07410842,  0.22583339,  0.13686094],
       [ 0.40331647,  0.3350769 , -0.15513435]], dtype=float32)
       
tf.reshape((np.squeeze(layer1.weights[0][...,1])),(3,3))
<tf.Tensor: shape=(3, 3), dtype=float32, numpy=
array([[ 0.15853754, -0.19692683,  0.04143384],
       [-0.07410842,  0.22583339,  0.13686094],
       [ 0.40331647,  0.3350769 , -0.15513435]], dtype=float32)>

plt.imshow(np.squeeze(layer1.weights[0][...,1]), cmap='gray')

plt.imshow(np.squeeze(layer1.weights[0][...,0]), cmap='gray')

layer1.weights[1] # bias
# <tf.Variable 'conv2d_1/bias:0' shape=(2,) dtype=float32, numpy=array([0., 0.], dtype=float32)>

layer1_result = layer1(image[0][tf.newaxis])
layer1_result.shape
# TensorShape([1, 6, 6, 2])

layer1_result[0,...,0]
# <tf.Tensor: shape=(6, 6), dtype=float32, numpy=
array([[-10.622592 , -17.233952 , -14.855642 , -15.18894  , -13.4780655,
         -5.6591477],
       [-14.596352 , -16.461687 ,  -8.46327  , -12.294258 , -13.634556 ,
         -5.975335 ],
       [-14.3555565,  -9.104046 ,  -1.3667734,  -9.231415 , -13.976131 ,
         -5.8030467],
       [-13.614567 ,  -7.204097 ,  -0.2758534,  -9.567868 , -13.996439 ,
         -5.7096176],
       [-13.090911 ,  -9.931416 ,  -5.137371 , -12.04954  , -11.825691 ,
         -4.75425  ],
       [-10.976872 , -13.707218 , -12.3282795, -12.9457   , -10.296714 ,
         -3.2092843]], dtype=float32)>
         
plt.imshow(layer1_result[0,...,0], cmap='gray') # 8x8 image with a 3x3 filter: 8-3+1 = 6, so the output is 6x6
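The idea behind the im2col remark above can be shown in NumPy. This is a sketch of the technique, not of TensorFlow's actual internals, which depend on the backend. Every 3x3 window is unrolled into a row, the kernel into a matrix with one column per filter, and a single matrix multiply produces all feature maps. The kernel layout (kh, kw, C, F) mirrors the Keras kernel shape (3, 3, 1, 2) printed above. Like Keras Conv2D it computes cross-correlation, and the bias is omitted. The random inputs and the helper name are stand-ins.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def im2col_conv(image, kernels):
    """Hypothetical helper: image (H, W, C), kernels (kh, kw, C, F) -> (out_h, out_w, F)."""
    kh, kw, c, f = kernels.shape
    patches = sliding_window_view(image, (kh, kw, c))[:, :, 0]   # (out_h, out_w, kh, kw, C)
    out_h, out_w = patches.shape[:2]
    cols = patches.reshape(out_h * out_w, kh * kw * c)          # one row per window
    weights = kernels.reshape(kh * kw * c, f)                    # one column per filter
    return (cols @ weights).reshape(out_h, out_w, f)             # a single matrix multiply


rng = np.random.default_rng(0)
image = rng.integers(0, 17, size=(8, 8, 1)).astype(float)   # stand-in for an 8x8 digit
kernels = rng.normal(size=(3, 3, 1, 2))                      # 2 filters, like layer1 above
out = im2col_conv(image, kernels)
print(out.shape)   # (6, 6, 2)

loop = np.zeros((6, 6, 2))                                  # same thing with explicit loops
for i in range(6):
    for j in range(6):
        for k in range(2):
            loop[i, j, k] = (image[i:i + 3, j:j + 3, :] * kernels[..., k]).sum()
print(np.allclose(out, loop))   # True
```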

Data after passing through the convolution layer

(X_train, y_train), (X_test, y_test) = tf.keras.datasets.mnist.load_data()
X_train = X_train.reshape(-1,28,28,1).astype('float32')
input_ = tf.keras.Input(shape=(28,28,1))
x = tf.keras.layers.Conv2D(10,3)(input_)
model = tf.keras.models.Model(input_,x)

out = model(X_train[:1]) # run only the first image instead of all 60,000

plt.imshow(out[0][...,0], cmap='gray') # feature map of the 1st filter

plt.imshow(out[0][...,1], cmap='gray') # 2nd filter

plt.imshow(out[0][...,2], cmap='gray') # 3rd filter

Data after the convolution layer -> ReLU

(X_train, y_train), (X_test, y_test) = tf.keras.datasets.mnist.load_data()
X_train = X_train.reshape(-1,28,28,1).astype('float32')
input_ = tf.keras.Input(shape=(28,28,1))
x = tf.keras.layers.Conv2D(10,3)(input_)
x = tf.keras.layers.ReLU()(x)
model = tf.keras.models.Model(input_,x)

out = model(X_train[:1]) # run only the first image

plt.imshow(out[0][...,0], cmap='gray')

plt.imshow(out[0][...,0], cmap='binary') # same map with an inverted colormap

plt.imshow(out[0][...,1], cmap='gray')

plt.imshow(out[0][...,1], cmap='binary')

plt.imshow(out[0][...,2], cmap='gray')

plt.imshow(out[0][...,2], cmap='binary')

LeNet-5

Gradient-Based Learning Applied to Document Recognition
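Yann LeCun, Yoshua Bengio, et al.

The paper in which a convolutional neural network first achieved commercial success

The feature extraction module was originally a means of improving the performance of the trainable classifier module

Later, the feature extraction module and the trainable classifier module were combined and trained end-to-end

The input was made 32x32, slightly larger than the digits, so that each digit sits in the center

=> This imposes a constraint

It is an assumption added to improve performance

The larger the image, the more computation is required and the more data is needed

But subsampling shrinks the data, so information is lost

This is where a trade-off arises

With a lot of data, skipping pooling tends to perform better; with little data, pooling tends to work better

Because subsampling reduces the computational cost, the number of features (channels) should be increased (image pyramid)

The more data there is, the more it pays to have many columns (they can carry richer nuance)

The data is transformed into a form that indicates whether each feature is present

Subsampling

Reducing the data by keeping only part of it, e.g., pooling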

Day 15 - CNN

2021. 10. 4. 20:00


Block operations and window operations

from skimage.util import view_as_blocks, view_as_windows
import matplotlib.pyplot as plt
import skimage
from skimage.data import camera
import numpy as np

Block

Splits the image into non-overlapping blocks of equal size and operates on each block

camera = camera()
plt.imshow(camera, cmap='gray')

view_as_blocks(camera, (4,4)).shape # non-overlapping 4x4 blocks: 512/4 = 128 blocks along each axis
# (128, 128, 4, 4)

camera.shape
# (512, 512)

block = view_as_blocks(camera, (4,4))
block[0,0].mean() # the 4x4 block can be summarized by a single value
# 157.125

block_flatten = block.reshape(block.shape[0],block.shape[1],-1)
block_flatten.shape
# (128, 128, 16)

block_flatten[0,0]
# array([156, 157, 160, 159, 156, 157, 159, 158, 158, 157, 156, 156, 160,
       157, 154, 154], dtype=uint8)

block_flatten.mean(axis=2) # averaging over the last axis gives a 128x128 result
array([[157.125 , 156.625 , 157.5625, ..., 154.75  , 152.4375, 152.1875],
       [156.875 , 156.125 , 157.625 , ..., 154.875 , 152.6875, 152.875 ],
       [154.375 , 156.6875, 156.8125, ..., 148.9375, 150.125 , 151.4375],
       ...,
       [115.5625, 104.5   , 131.    , ..., 102.3125,  91.5   , 101.3125],
       [120.5   , 109.25  , 121.125 , ..., 113.    , 126.5   , 120.75  ],
       [124.0625, 138.5625, 138.75  , ..., 129.4375, 128.375 , 118.9375]])
       
block_flatten.mean(axis=2).shape
# (128, 128)

plt.imshow(block_flatten.mean(axis=2), cmap='gray')

plt.imshow(block_flatten.max(axis=2), cmap='gray')

plt.imshow(block_flatten.min(axis=2), cmap='gray')
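The block-mean image above is what average pooling with a 4x4 window and stride 4 does. The NumPy-only sketch below gets the same kind of result with a single reshape, assuming the image size divides evenly by the block size. The helper name is hypothetical. For the camera image, `block_reduce_mean(camera, 4)` should match `block_flatten.mean(axis=2)`.

```python
import numpy as np


def block_reduce_mean(image, size):
    """Hypothetical helper: average each non-overlapping size x size block."""
    h, w = image.shape
    return image.reshape(h // size, size, w // size, size).mean(axis=(1, 3))


small = np.arange(16).reshape(4, 4)
print(block_reduce_mean(small, 2))   # [[ 2.5  4.5] [10.5 12.5]]
print(block_reduce_mean(np.zeros((512, 512)), 4).shape)   # (128, 128)
```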

Window

Slides from top-left to bottom-right over overlapping windows of equal size

A technique that transforms the image while keeping its detail

view_as_windows(camera, (4,4)).shape
# (509, 509, 4, 4)

512-4+1
# 509

view_as_windows(camera, (4,4),2).shape
# (255, 255, 4, 4)

(512 - 4) // 2 + 1 # (input size - window size) // step + 1
# 255

window = view_as_windows(camera, (4,4))
window_flatten = window.reshape(window.shape[0],window.shape[1],-1)

plt.imshow(window_flatten.mean(axis=2), cmap='gray')

plt.imshow(window_flatten.min(axis=2), cmap='gray')

plt.imshow(window_flatten.max(axis=2), cmap='gray')

Correlation, Convolution

1. Correlation

- slides the filter over the data without flipping it, multiplying element-wise and summing

2. Convolution

- flips the filter vertically and horizontally, then performs correlation

def corr(im, kernel, stride=1): # im: 2D array
  h, w = im.shape
  kh, kw = kernel.shape
  out_h = (h - kh) // stride + 1
  out_w = (w - kw) // stride + 1
  output = np.zeros((out_h, out_w))
  for i in range(out_h):
    for j in range(out_w):
      r, c = i * stride, j * stride # top-left corner of the current window
      output[i, j] = (kernel * im[r:r+kh, c:c+kw]).sum()
  return output

def filter_by_conv(im, kernel, stride=1): # im: 2D array
  kernel = np.flipud(np.fliplr(kernel)) # flip the kernel vertically and horizontally
  return corr(im, kernel, stride) # convolution = correlation with the flipped kernel

im1 = np.arange(36).reshape(6,6)
kernel = np.array([[1,2],[2,1]])
filter_by_conv(im1, kernel) # this kernel is unchanged by the flip, so corr(im1, kernel) gives the same result
# array([[ 21.,  27.,  33.,  39.,  45.],
#        [ 57.,  63.,  69.,  75.,  81.],
#        [ 93.,  99., 105., 111., 117.],
#        [129., 135., 141., 147., 153.],
#        [165., 171., 177., 183., 189.]])

kernel = np.array([[1,0,-1],[1,0,-1],[1,0,-1]]) # vertical-edge kernel; flipping it negates it, so convolution gives the negative of correlation
plt.imshow(corr(camera,kernel), cmap='gray')

plt.imshow(filter_by_conv(camera, kernel), cmap='gray')

kernel = np.array([[1,2,1],[2,4,2],[1,2,1]])/16 # Gaussian-shaped kernel; it is symmetric, so correlation and convolution coincide

plt.imshow(corr(camera,kernel), cmap='gray') # Gaussian blur

plt.imshow(filter_by_conv(camera, kernel), cmap='gray')
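Whether correlation and convolution differ depends only on the kernel's symmetry. The sketch below checks this for the two kernels used above. It uses compact NumPy versions of `corr` and `conv` (stride 1 only) and a random image in place of the camera image. Flipping the edge kernel negates it, so convolution is the negative of correlation. The Gaussian kernel is symmetric, so the two coincide.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def corr(im, kernel):
    return np.einsum('ijkl,kl->ij', sliding_window_view(im, kernel.shape), kernel)


def conv(im, kernel):
    return corr(im, kernel[::-1, ::-1])   # convolution = correlation with the flipped kernel


rng = np.random.default_rng(0)
im = rng.random((16, 16))
edge = np.array([[1, 0, -1], [1, 0, -1], [1, 0, -1]])
gauss = np.array([[1, 2, 1], [2, 4, 2], [1, 2, 1]]) / 16

print(np.allclose(conv(im, edge), -corr(im, edge)))    # True: the flip negates this kernel
print(np.allclose(conv(im, gauss), corr(im, gauss)))   # True: symmetric kernel, no difference
```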

Preprocessing perspective

from sklearn.datasets import load_breast_cancer, load_wine, load_iris
import seaborn as sns

cancer = load_breast_cancer(as_frame=True)
wine = load_wine(as_frame=True)
iris = load_iris(as_frame=True)

print(cancer.DESCR)

Feature extraction means arranging the data in the form that best reveals its characteristics

cancer.frame # the values are already expressed as features that describe each sample well (feature extraction)
	mean radius	mean texture	mean perimeter	mean area	mean smoothness	mean compactness	mean concavity	mean concave points	mean symmetry	mean fractal dimension	radius error	texture error	perimeter error	area error	smoothness error	compactness error	concavity error	concave points error	symmetry error	fractal dimension error	worst radius	worst texture	worst perimeter	worst area	worst smoothness	worst compactness	worst concavity	worst concave points	worst symmetry	worst fractal dimension	target
0	17.99	10.38	122.80	1001.0	0.11840	0.27760	0.30010	0.14710	0.2419	0.07871	1.0950	0.9053	8.589	153.40	0.006399	0.04904	0.05373	0.01587	0.03003	0.006193	25.380	17.33	184.60	2019.0	0.16220	0.66560	0.7119	0.2654	0.4601	0.11890	0
1	20.57	17.77	132.90	1326.0	0.08474	0.07864	0.08690	0.07017	0.1812	0.05667	0.5435	0.7339	3.398	74.08	0.005225	0.01308	0.01860	0.01340	0.01389	0.003532	24.990	23.41	158.80	1956.0	0.12380	0.18660	0.2416	0.1860	0.2750	0.08902	0
2	19.69	21.25	130.00	1203.0	0.10960	0.15990	0.19740	0.12790	0.2069	0.05999	0.7456	0.7869	4.585	94.03	0.006150	0.04006	0.03832	0.02058	0.02250	0.004571	23.570	25.53	152.50	1709.0	0.14440	0.42450	0.4504	0.2430	0.3613	0.08758	0
3	11.42	20.38	77.58	386.1	0.14250	0.28390	0.24140	0.10520	0.2597	0.09744	0.4956	1.1560	3.445	27.23	0.009110	0.07458	0.05661	0.01867	0.05963	0.009208	14.910	26.50	98.87	567.7	0.20980	0.86630	0.6869	0.2575	0.6638	0.17300	0
4	20.29	14.34	135.10	1297.0	0.10030	0.13280	0.19800	0.10430	0.1809	0.05883	0.7572	0.7813	5.438	94.44	0.011490	0.02461	0.05688	0.01885	0.01756	0.005115	22.540	16.67	152.20	1575.0	0.13740	0.20500	0.4000	0.1625	0.2364	0.07678	0
...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...
564	21.56	22.39	142.00	1479.0	0.11100	0.11590	0.24390	0.13890	0.1726	0.05623	1.1760	1.2560	7.673	158.70	0.010300	0.02891	0.05198	0.02454	0.01114	0.004239	25.450	26.40	166.10	2027.0	0.14100	0.21130	0.4107	0.2216	0.2060	0.07115	0
565	20.13	28.25	131.20	1261.0	0.09780	0.10340	0.14400	0.09791	0.1752	0.05533	0.7655	2.4630	5.203	99.04	0.005769	0.02423	0.03950	0.01678	0.01898	0.002498	23.690	38.25	155.00	1731.0	0.11660	0.19220	0.3215	0.1628	0.2572	0.06637	0
566	16.60	28.08	108.30	858.1	0.08455	0.10230	0.09251	0.05302	0.1590	0.05648	0.4564	1.0750	3.425	48.55	0.005903	0.03731	0.04730	0.01557	0.01318	0.003892	18.980	34.12	126.70	1124.0	0.11390	0.30940	0.3403	0.1418	0.2218	0.07820	0
567	20.60	29.33	140.10	1265.0	0.11780	0.27700	0.35140	0.15200	0.2397	0.07016	0.7260	1.5950	5.772	86.22	0.006522	0.06158	0.07117	0.01664	0.02324	0.006185	25.740	39.42	184.60	1821.0	0.16500	0.86810	0.9387	0.2650	0.4087	0.12400	0
568	7.76	24.54	47.92	181.0	0.05263	0.04362	0.00000	0.00000	0.1587	0.05884	0.3857	1.4280	2.548	19.15	0.007189	0.00466	0.00000	0.00000	0.02676	0.002783	9.456	30.37	59.16	268.6	0.08996	0.06444	0.0000	0.0000	0.2871	0.07039	1
569 rows × 31 columns

iris.frame # the raw data already shows clear characteristics, so no preprocessing is needed
	sepal length (cm)	sepal width (cm)	petal length (cm)	petal width (cm)	target
0	5.1	3.5	1.4	0.2	0
1	4.9	3.0	1.4	0.2	0
2	4.7	3.2	1.3	0.2	0
3	4.6	3.1	1.5	0.2	0
4	5.0	3.6	1.4	0.2	0
...	...	...	...	...	...
145	6.7	3.0	5.2	2.3	2
146	6.3	2.5	5.0	1.9	2
147	6.5	3.0	5.2	2.0	2
148	6.2	3.4	5.4	2.3	2
149	5.9	3.0	5.1	1.8	2
150 rows × 5 columns

sns.pairplot(wine.frame, hue='target')
# the raw data needs to be transformed into a form that best fits its characteristics => convert it to featured data
# min-max or standard scaling is also used to keep any feature from having a disproportionately large influence
wine.frame.boxplot(figsize=(20,8))

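Problems with image data

1. Converting 2D data into 1D produces too many columns

2. Image data, being unstructured, departs from the i.i.d. assumption => getting good performance with traditional machine learning requires many extra assumptions

3. If the image itself does not show clear characteristics, it must be converted into featured data through feature extraction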

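Which filter would improve performance?

filter

- a filter can transform the data so that its features stand out most (it reframes the problem as whether a feature is present)

- filters make the characteristics of an image more distinct

- a filter shrinks the data while keeping its features, so it can reduce dimensionality

- a filter is similar to a feature cross

Feature cross

A method that combines features to reduce dimensionality without losing their original meaning

Filters are a very good approach because they compensate for the problems of image data,

but choosing which filter to use is a hard problem

This leads to the question: could we find the right filter through learning? That is how the problem gets solved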

from sklearn.preprocessing import StandardScaler
ss = StandardScaler()
wine.frame.iloc[:,:-1] = ss.fit_transform(wine.frame.iloc[:,:-1])
wine.frame.boxplot(figsize=(20,8)) # the data has been transformed into a different form

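Convolutional Neural Network

- 1989 - 1991 LeNet-1, 1998 LeNet-5 (MNIST)

Characteristics of the convolution filter

1. Same operation as a neural network (multiply + sum)

2. Because it operates on windows, the number of weights to learn drops sharply compared with a traditional fully connected layer (fewer parameters)

=> helps avoid the curse of dimensionality

3. stationarity: if a filter gives value A in one region of the image and A again in another region, the two regions share the same characteristic

=> if the values at different positions are similar, those parts have similar characteristics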

Day 14 - Deep Learning (TensorFlow) (3)

2021. 10. 4. 19:37


TensorFlow

import tensorflow as tf 
(X_train, y_train), (X_test,y_test) = tf.keras.datasets.fashion_mnist.load_data()
X_train.dtype
# dtype('uint8')

# Normalization (min-max) => scale so the minimum is 0 and the maximum is 1; training becomes more efficient, so it is faster and more accurate
X_train = X_train / 255
X_test = X_test / 255


# (preprocessing can also be done inside the model)

# Sequential version
model = tf.keras.Sequential([
  tf.keras.layers.Flatten(input_shape=(28,28)),
  tf.keras.layers.Dense(128, activation='relu'),
  tf.keras.layers.Dense(10, activation='softmax')
])

# Model (functional API) version
inputs_ = tf.keras.layers.Input(shape=(28,28))
x = tf.keras.layers.Flatten()(inputs_)
x = tf.keras.layers.Dense(128)(x)
x = tf.keras.layers.ReLU()(x)
outputs = tf.keras.layers.Dense(10, activation='softmax')(x)

model = tf.keras.models.Model(inputs_, outputs)

model.summary()
Model: "model_4"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
input_5 (InputLayer)         [(None, 28, 28)]          0         
_________________________________________________________________
flatten_6 (Flatten)          (None, 784)               0         
_________________________________________________________________
dense_12 (Dense)             (None, 128)               100480    
_________________________________________________________________
re_lu_4 (ReLU)               (None, 128)               0         
_________________________________________________________________
dense_13 (Dense)             (None, 10)                1290      
=================================================================
Total params: 101,770
Trainable params: 101,770
Non-trainable params: 0
_________________________________________________________________

model.trainable_weights # the trainable weights that training will update
[<tf.Variable 'dense_2/kernel:0' shape=(784, 128) dtype=float32, numpy=
 array([[ 0.02516936,  0.06438791,  0.018306  , ..., -0.01758076,
          0.06290304,  0.05140124],
        [-0.05912167, -0.01394046,  0.00835894, ...,  0.07631402,
         -0.02987464,  0.03332863],
        [-0.0592133 ,  0.03133401,  0.07544664, ...,  0.0805388 ,
         -0.07592296, -0.03545989],
        ...,
        [ 0.04126004,  0.02557782, -0.07374297, ..., -0.01776882,
          0.05950726,  0.03152514],
        [ 0.04481585, -0.01106533,  0.0462807 , ...,  0.02908169,
          0.02692767, -0.03776729],
        [ 0.00387245, -0.05717902,  0.07691073, ..., -0.03013743,
          0.04765027, -0.012512  ]], dtype=float32)>,
 <tf.Variable 'dense_2/bias:0' shape=(128,) dtype=float32, numpy=
 array([0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
        0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
        0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
        0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
        0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
        0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
        0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
        0., 0., 0., 0., 0., 0., 0., 0., 0.], dtype=float32)>,
 <tf.Variable 'dense_3/kernel:0' shape=(128, 10) dtype=float32, numpy=
 array([[-0.15315153,  0.0456792 , -0.01593776, ...,  0.10890259,
          0.05543895,  0.05593623],
        [ 0.13440786,  0.08750169,  0.10087   , ..., -0.15075403,
         -0.13427767,  0.15088643],
        [ 0.00122444,  0.02775083, -0.05616018, ..., -0.01628929,
         -0.00568283, -0.05492131],
        ...,
        [ 0.07487939, -0.07071173,  0.1954471 , ..., -0.01074487,
          0.10662811, -0.13067412],
        [ 0.13852535,  0.161729  ,  0.18972401, ..., -0.1629928 ,
          0.04696299,  0.08850865],
        [-0.13064459, -0.07557841, -0.06935515, ...,  0.19506954,
          0.18600823,  0.00797179]], dtype=float32)>,
 <tf.Variable 'dense_3/bias:0' shape=(10,) dtype=float32, numpy=array([0., 0., 0., 0., 0., 0., 0., 0., 0., 0.], dtype=float32)>]

Loss function

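During training, the model learns in the direction that minimizes the difference between the actual and predicted values

The loss function is the measure used to judge how close the predictions are to the actual values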

tf.keras.losses.binary_crossentropy # function form: arguments cannot be preset (use functools.partial or a wrapper to fix them); convenient to use as is
tf.keras.losses.BinaryCrossentropy() # class form: each instance can be configured with different arguments and the class can be subclassed; use it to change arguments or customize behavior
tf.keras.losses.categorical_crossentropy
tf.keras.losses.CategoricalCrossentropy
tf.keras.losses.SparseCategoricalCrossentropy # accepts integer labels directly, so no one-hot encoding is needed
tf.keras.losses.sparse_categorical_crossentropy # function form of the above; the class form lets you set options such as from_logits without functools.partial

from functools import partial
def x(a,b):
  return a+b
x2 = partial(x, b=1) # creates a new function with b fixed to 1; the original x is unchanged
x2(3)
# 4

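# compile sets up training so that, internally, it runs as a computational graph and the loss function can be differentiated automatically and efficiently
# the string 'categorical_crossentropy' is a shorthand for tf.keras.losses.categorical_crossentropy, but a shorthand does not let you change parameters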
model.compile(loss=tf.keras.losses.categorical_crossentropy)

Three parameters needed for compile (training setup)

1. loss function

2. optimizer => the method that updates the weights using the automatically computed gradients

3. metrics => the evaluation criteria

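# the raw model output before it is turned into the desired form (e.g., probabilities) is called the logits; from_logits=True tells the loss to apply softmax itself
# note: this model already ends with softmax, so from_logits=False (as in the compile further below) is the matching setting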
model.compile(loss=tf.keras.losses.CategoricalCrossentropy(from_logits=True), optimizer='adam')

'adam' # string shorthand: parameters cannot be changed
tf.keras.optimizers.Adam() # parameters such as learning_rate can be changed

Softmax

softmax can be numerically unstable

def softmax(logits): 
  exp = tf.exp(logits)
  return exp / tf.reduce_sum(exp)
  
softmax([10000.,0.]) # softmax exponentiates, so a large input easily overflows (inf / inf = nan)
# <tf.Tensor: shape=(2,), dtype=float32, numpy=array([nan,  0.], dtype=float32)>

softmax([1.,3.,2.])
# <tf.Tensor: shape=(3,), dtype=float32, numpy=array([0.09003057, 0.66524094, 0.24472848], dtype=float32)>

def softmax2(logits): # subtracting the max prevents overflow for large inputs, but problems remain
  exp = tf.exp(logits-tf.reduce_max(logits))
  return exp / tf.reduce_sum(exp)
  
softmax2([100000., 0.])
# <tf.Tensor: shape=(2,), dtype=float32, numpy=array([1., 0.], dtype=float32)>

softmax2([100.,3.,0.]) # the smaller probabilities underflow to 0 in float32 (their true values are tiny but nonzero), so taking their log later gives -inf
# <tf.Tensor: shape=(3,), dtype=float32, numpy=array([1., 0., 0.], dtype=float32)>

softmax2([100.,300.,200.])
# <tf.Tensor: shape=(3,), dtype=float32, numpy=array([0., 1., 0.], dtype=float32)>

softmax2([0.00011,0.000012,0.000014]) # 너무 작은 값이 들어가도 언더플로우가 발생하여 모두 동일한 비율로 들어가게 된다 
# <tf.Tensor: shape=(3,), dtype=float32, numpy=array([0.3333549 , 0.33332223, 0.3333229 ], dtype=float32)>
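When those near-zero probabilities are fed into a log, as crossentropy does, log(0) = -inf. A common remedy is to compute log-softmax directly with the log-sum-exp trick, which is roughly why passing logits with from_logits=True is preferred. A NumPy sketch; both function names are illustrative, not TensorFlow's:

```python
import numpy as np


def naive_log_softmax(logits):
    x = np.asarray(logits, dtype=np.float32)
    e = np.exp(x - x.max())
    with np.errstate(divide='ignore'):  # log(0) -> -inf without a warning
        return np.log(e / e.sum())


def log_softmax(logits):
    """log(softmax(x)) computed as x - logsumexp(x), which stays finite."""
    x = np.asarray(logits, dtype=np.float32)
    m = x.max()
    # log-sum-exp trick: log(sum(exp(x))) = m + log(sum(exp(x - m)))
    lse = m + np.log(np.exp(x - m).sum())
    return x - lse


print(naive_log_softmax([100., 300., 200.]))  # first entry is -inf
print(log_softmax([100., 300., 200.]))        # finite: -200, 0, -100
```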

Configuring and running training (compile & fit)

model.compile(loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=False), 
              optimizer='adam', metrics=['acc'])
              
model.fit(X_train,y_train, epochs=2) # training only needs the data and the number of epochs / defaults: epochs=1, batch_size=32
Epoch 1/2
1875/1875 [==============================] - 5s 2ms/step - loss: 0.5010 - acc: 0.8243
Epoch 2/2
1875/1875 [==============================] - 4s 2ms/step - loss: 0.3774 - acc: 0.8643
<keras.callbacks.History at 0x7efc4ebf33d0>

X_train.shape # 60,000 samples
# (60000, 28, 28)

60000/1875 # 60,000 samples / 1875 steps per epoch = batch size
# 32.0

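# returns [loss, metrics]
model.train_on_batch(X_train, y_train) # runs one gradient update on the batch it is given (here, all 60,000 samples at once); fit repeats such updates once per batch, e.g. 60,000 samples with batch_size=100 means 600 updates per epoch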
# [0.38287022709846497, 0.8614166378974915]
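The step count in the progress bar is the number of batches per epoch. A small sketch of that arithmetic; the helper name is made up, and it assumes the validation fraction is removed before batching and that a final partial batch still counts as a step:

```python
import math


def steps_per_epoch(num_samples, batch_size=32, validation_split=0.0):
    """Number of batches (gradient updates) per epoch, counting a final partial batch."""
    # hold out the validation fraction first (assumed to mirror validation_split)
    num_train = num_samples - int(num_samples * validation_split)
    return math.ceil(num_train / batch_size)


print(steps_per_epoch(60000))                        # 1875
print(steps_per_epoch(60000, batch_size=100))        # 600
print(steps_per_epoch(60000, validation_split=0.3))  # 1313
```

The last value matches the 1313/1313 steps shown for the validation_split=0.3 run further down (42,000 training samples / 32 = 1312.5, rounded up).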

GridSearchCV

One of the methods for hyperparameter tuning.

from sklearn.neighbors import KNeighborsClassifier
from sklearn.model_selection import GridSearchCV

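knn = KNeighborsClassifier(n_neighbors=5) # a single model with a fixed hyperparameter

# tries every value in the user-specified grid, like a loop over the candidates, scoring each with cross-validation
grid = GridSearchCV(KNeighborsClassifier(), {'n_neighbors':[5,6,7,8,9,10]})
# grid.fit(X, y) runs the search; grid.best_params_ and grid.best_score_ then hold the best result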
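Conceptually, grid search is a loop over every combination of candidate values that keeps the best score. A pure-Python sketch; `grid_search` and `toy_score` are hypothetical stand-ins (GridSearchCV itself scores each candidate with cross-validation on real data):

```python
from itertools import product


def grid_search(param_grid, score_fn):
    """Try every combination in param_grid and return (best_params, best_score)."""
    names = list(param_grid)
    best_params, best_score = None, float('-inf')
    # product(...) enumerates the Cartesian product of all candidate values
    for values in product(*(param_grid[n] for n in names)):
        params = dict(zip(names, values))
        score = score_fn(**params)
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score


# hypothetical score function standing in for a cross-validated accuracy
def toy_score(n_neighbors):
    return -abs(n_neighbors - 7)


print(grid_search({'n_neighbors': [5, 6, 7, 8, 9, 10]}, toy_score))  # ({'n_neighbors': 7}, 0)
```

With several parameters the number of fits grows multiplicatively, which is why grids are usually kept small.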

Callback

A callback is a hook that Keras calls at specific points during training (for example, at the end of every epoch), so you can check the results at that moment and act on them.

class MyCallback(tf.keras.callbacks.Callback):
  def on_epoch_end(self, epoch, logs=None): # the signature must match the base class method
    print('epoch', epoch+1, 'end') # on_epoch_end runs after each epoch has finished
    
dir(tf.keras.callbacks.Callback)

dir(tf.keras.callbacks)
['BaseLogger',
 'CSVLogger',
 'Callback',
 'CallbackList',
 'EarlyStopping',
 'History',
 'LambdaCallback',
 'LearningRateScheduler',
 'ModelCheckpoint',
 'ProgbarLogger',
 'ReduceLROnPlateau',
 'RemoteMonitor',
 'TensorBoard',
 'TerminateOnNaN',
 '__builtins__',
 '__cached__',
 '__doc__',
 '__file__',
 '__loader__',
 '__name__',
 '__package__',
 '__path__',
 '__spec__',
 '_sys',
 'experimental']

h = model.fit(X_train,y_train, steps_per_epoch=100, epochs=2, callbacks=[MyCallback()])
Epoch 1/2
100/100 [==============================] - 0s 4ms/step - loss: 0.2951 - acc: 0.8926
epoch 1 end
Epoch 2/2
100/100 [==============================] - 0s 4ms/step - loss: 0.2919 - acc: 0.8947
epoch 2 end

inputs_ = tf.keras.layers.Input(shape=(28,28))
x = tf.keras.layers.Flatten()(inputs_)
x = tf.keras.layers.Dense(2)(x)
x = tf.keras.layers.ReLU()(x)
outputs = tf.keras.layers.Dense(10, activation='softmax')(x)

model = tf.keras.models.Model(inputs_, outputs)

model.compile(loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=False), 
              optimizer='adam', metrics=['accuracy'])

model.weights[0] # before training
<tf.Variable 'dense_14/kernel:0' shape=(784, 2) dtype=float32, numpy=
array([[ 0.0621696 ,  0.03114827],
       [ 0.05718201, -0.06315091],
       [ 0.01374986, -0.00775776],
       ...,
       [ 0.00518525,  0.02914727],
       [ 0.04352903, -0.05101464],
       [ 0.03367535,  0.01672252]], dtype=float32)>
       
model.fit(X_train, y_train)
1875/1875 [==============================] - 5s 2ms/step - loss: 1.3585 - accuracy: 0.4977
<keras.callbacks.History at 0x7efc51d21ed0>

model.weights[0] # after training (1 epoch)
<tf.Variable 'dense_14/kernel:0' shape=(784, 2) dtype=float32, numpy=
array([[ 0.01506082,  0.07337335],
       [ 0.02667496, -0.08142766],
       [-0.03281154, -0.00934248],
       ...,
       [ 0.02592397,  0.03641215],
       [ 0.06711672, -0.02997859],
       [ 0.01211459,  0.00046465]], dtype=float32)>
       
model.fit(X_train, y_train, epochs=3) # the model is mutable and keeps its trained weights, so together with the earlier epoch this is 4 epochs of training in total
Epoch 1/3
1875/1875 [==============================] - 5s 2ms/step - loss: 0.9381 - accuracy: 0.6460
Epoch 2/3
1875/1875 [==============================] - 4s 2ms/step - loss: 0.8567 - accuracy: 0.6885
Epoch 3/3
1875/1875 [==============================] - 4s 2ms/step - loss: 0.8133 - accuracy: 0.7038
<keras.callbacks.History at 0x7efc51d5e650>

model.weights[0] # after training (4 epochs in total)
<tf.Variable 'dense_14/kernel:0' shape=(784, 2) dtype=float32, numpy=
array([[-0.04761839,  0.18095993],
       [-0.0655762 , -0.07643786],
       [-0.17626587,  0.11628444],
       ...,
       [ 0.15831317,  0.12745775],
       [ 0.0916123 ,  0.08825199],
       [ 0.03401611,  0.01437407]], dtype=float32)>
       
model.fit(X_train, y_train, initial_epoch=3, epochs=5) # resumes the epoch count at initial_epoch: runs epochs 4 and 5 (the weights are not reset)
Epoch 4/5
1875/1875 [==============================] - 4s 2ms/step - loss: 0.7889 - accuracy: 0.7098
Epoch 5/5
1875/1875 [==============================] - 4s 2ms/step - loss: 0.7734 - accuracy: 0.7145
<keras.callbacks.History at 0x7efc51d21f90>

model.fit(X_train, y_train, epochs=2, verbose=2)  # verbose=2 prints one line per epoch instead of a progress bar (0 = silent, 1 = progress bar)
Epoch 1/2
1875/1875 - 3s - loss: 0.9043
Epoch 2/2
1875/1875 - 3s - loss: 0.8777
<keras.callbacks.History at 0x7efc50cd7210>

model.fit(X_train, y_train, epochs=2, verbose=3) # values other than 0, 1 and 2 are not documented; here only the epoch headers were printed
Epoch 1/2
Epoch 2/2
<keras.callbacks.History at 0x7efc50ca5610>

h = model.fit(X_train, y_train, initial_epoch=5, epochs=20, validation_split=0.3) # fit always returns a History callback object / validation_split=0.3 holds out the last 30% of the data for validation
Epoch 6/20
1313/1313 [==============================] - 6s 4ms/step - loss: 0.7630 - accuracy: 0.7176 - val_loss: 0.7644 - val_accuracy: 0.7236
Epoch 7/20
1313/1313 [==============================] - 6s 4ms/step - loss: 0.7571 - accuracy: 0.7197 - val_loss: 0.7743 - val_accuracy: 0.7108
Epoch 8/20
1313/1313 [==============================] - 6s 4ms/step - loss: 0.7529 - accuracy: 0.7206 - val_loss: 0.7593 - val_accuracy: 0.7146
Epoch 9/20
1313/1313 [==============================] - 5s 4ms/step - loss: 0.7485 - accuracy: 0.7202 - val_loss: 0.7564 - val_accuracy: 0.7232
Epoch 10/20
1313/1313 [==============================] - 5s 4ms/step - loss: 0.7454 - accuracy: 0.7230 - val_loss: 0.7620 - val_accuracy: 0.7138
Epoch 11/20
1313/1313 [==============================] - 5s 4ms/step - loss: 0.7426 - accuracy: 0.7242 - val_loss: 0.7688 - val_accuracy: 0.7273
Epoch 12/20
1313/1313 [==============================] - 5s 4ms/step - loss: 0.7410 - accuracy: 0.7253 - val_loss: 0.7562 - val_accuracy: 0.7230
Epoch 13/20
1313/1313 [==============================] - 5s 4ms/step - loss: 0.7383 - accuracy: 0.7252 - val_loss: 0.7649 - val_accuracy: 0.7212
Epoch 14/20
1313/1313 [==============================] - 5s 4ms/step - loss: 0.7361 - accuracy: 0.7273 - val_loss: 0.7529 - val_accuracy: 0.7227
Epoch 15/20
1313/1313 [==============================] - 5s 4ms/step - loss: 0.7342 - accuracy: 0.7278 - val_loss: 0.7500 - val_accuracy: 0.7280
Epoch 16/20
1313/1313 [==============================] - 5s 4ms/step - loss: 0.7330 - accuracy: 0.7300 - val_loss: 0.7539 - val_accuracy: 0.7308
Epoch 17/20
1313/1313 [==============================] - 5s 4ms/step - loss: 0.7321 - accuracy: 0.7296 - val_loss: 0.7486 - val_accuracy: 0.7286
Epoch 18/20
1313/1313 [==============================] - 6s 4ms/step - loss: 0.7303 - accuracy: 0.7314 - val_loss: 0.7518 - val_accuracy: 0.7164
Epoch 19/20
1313/1313 [==============================] - 6s 4ms/step - loss: 0.7288 - accuracy: 0.7309 - val_loss: 0.7553 - val_accuracy: 0.7300
Epoch 20/20
1313/1313 [==============================] - 5s 4ms/step - loss: 0.7272 - accuracy: 0.7303 - val_loss: 0.7629 - val_accuracy: 0.7312

'history' in dir(h)
# True

h.history
{'accuracy': [0.7176190614700317,
  0.7197142839431763,
  ...
  'loss': [0.7630206346511841,
  0.7571397423744202,
  ...
  'val_accuracy': [0.7235555648803711,
  0.710777759552002,
  ...
  'val_loss': [0.764350175857544,
  0.7743284106254578,
  0.7628713250160217]}

import pandas as pd
pd.DataFrame(h.history)
	loss	accuracy	val_loss	val_accuracy
0	0.763021	0.717619	0.764350	0.723556
1	0.757140	0.719714	0.774328	0.710778
2	0.752870	0.720595	0.759297	0.714556
3	0.748499	0.720167	0.756395	0.723222
4	0.745416	0.723024	0.761951	0.713833
5	0.742576	0.724190	0.768764	0.727278
6	0.740972	0.725310	0.756249	0.723000
7	0.738272	0.725167	0.764873	0.721167
8	0.736076	0.727286	0.752897	0.722722
9	0.734219	0.727786	0.749965	0.728000
10	0.732950	0.730024	0.753874	0.730778
11	0.732133	0.729571	0.748595	0.728611
12	0.730341	0.731429	0.751841	0.716389
13	0.728781	0.730857	0.755321	0.730000
14	0.727199	0.730286	0.762871	0.731222


pd.DataFrame(h.history).plot.line()
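The val_loss column shows validation loss bottoming out at epoch 17 and then drifting up. tf.keras.callbacks.EarlyStopping (listed in dir(tf.keras.callbacks) above) automates stopping in this situation. The pure-Python sketch below is a simplified patience rule of my own, replayed on the val_loss values from the table; the real callback has more options, such as min_delta and restore_best_weights.

```python
def early_stopping_epoch(val_losses, patience):
    """Return (stop_index, best_index) for a simplified patience-based rule.

    Tracks the lowest loss seen so far and stops once it has not improved
    for `patience` consecutive epochs. Indices are 0-based positions in val_losses.
    """
    best, best_idx, wait = float('inf'), 0, 0
    for i, loss in enumerate(val_losses):
        if loss < best:
            best, best_idx, wait = loss, i, 0
        else:
            wait += 1
            if wait >= patience:
                return i, best_idx
    return len(val_losses) - 1, best_idx  # rule never triggered: ran to the end


# val_loss column from the table above (index 0 = epoch 6 of the run)
val_loss = [0.764350, 0.774328, 0.759297, 0.756395, 0.761951, 0.768764, 0.756249,
            0.764873, 0.752897, 0.749965, 0.753874, 0.748595, 0.751841, 0.755321, 0.762871]

print(early_stopping_epoch(val_loss, patience=2))  # (5, 3)
print(early_stopping_epoch(val_loss, patience=3))  # (14, 11)
```

With patience=2 training would stop at index 5 (epoch 11), keeping the best weights from index 3 (epoch 9); with patience=3 it runs to the end and the best epoch is 17 (index 11). A small patience reacts to noise in val_loss, so it is usually set a little higher.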

Attribution-NonCommercial


4-1~2. Major PyTorch-based object detection / segmentation packages

2021. 9. 28. 10:07


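Packaging detection algorithms into frameworks is the trend.

Code-based; supports relatively few algorithms.

Config-based, led by Facebook Research.

Detectron (v1) lost ground to MMDetection; Detectron2 is much faster than v1.

Config-based, led by OpenMMLab (which originated from the Multimedia Laboratory of the Chinese University of Hong Kong).

Develops quickly and supports many algorithms.

A very high-quality package, though somewhat difficult to learn.

It has also produced strong results in Kaggle competitions.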

MMDetection

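- Won the 2018 MS COCO Challenge, then expanded its modules to support many algorithms

- Strong implementation performance, efficient modular design, and a simple config-based pipeline that runs from data through model training and evaluation

- You configure models in config files rather than writing code

- Implemented in PyTorch

- Most backbones are ResNet variants

- There is no MobileNet backbone, which pairs well with SSD.

YOLO v3 is also included.

EfficientDet support is planned.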

MMDetection model architecture

- Backbone: feature extractor (image => feature map)

- Neck: connects the backbone and the heads, refining the feature maps so the heads can interpret and process them better

- DenseHead: the part that predicts object locations and classes from the feature map

- ROIExtractor: the part that extracts ROI features from the feature map

- ROIHead (BBoxHead/MaskHead): the part that predicts object locations and classes based on the ROI features

1) Dataset

model training, model evaluation

annotation, image

3) Model

faster RCNN, SSD, YOLO, RetinaNet

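Choose the optimizer, number of epochs, etc.

=> Reflecting each algorithm's particular characteristics makes the config large

- but with configs, switching between many algorithms is easy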
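The appeal of the config-based approach can be illustrated with a toy registry: a model is described as nested dicts, and each `type` string is looked up and instantiated. This is a hypothetical pure-Python sketch of the idea. The class names and the `register`/`build` helpers are made up for illustration and are not MMDetection's actual API.

```python
# Hypothetical registry mapping a 'type' string to a class (illustration only).
REGISTRY = {}


def register(cls):
    REGISTRY[cls.__name__] = cls
    return cls


def build(cfg):
    """Instantiate an object from a dict config whose 'type' names a registered class."""
    cfg = dict(cfg)  # copy so the caller's config is not modified
    cls = REGISTRY[cfg.pop('type')]
    return cls(**cfg)


@register
class ResNet:
    def __init__(self, depth):
        self.depth = depth


@register
class FPN:
    def __init__(self, out_channels):
        self.out_channels = out_channels


@register
class Detector:
    def __init__(self, backbone, neck):
        # sub-configs are built recursively, so the whole model comes from one dict
        self.backbone = build(backbone)
        self.neck = build(neck)


model_cfg = dict(type='Detector',
                 backbone=dict(type='ResNet', depth=50),
                 neck=dict(type='FPN', out_channels=256))
detector = build(model_cfg)
print(type(detector.backbone).__name__, detector.backbone.depth, detector.neck.out_channels)  # ResNet 50 256
```

Switching to, say, a deeper backbone only means editing the dict, not the code, which is the convenience the notes describe; the cost is that configs grow as each algorithm adds its own options.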

# MMDetection training pipeline

- Training runs as a fixed loop over iterations that you don't modify midway; instead, you attach callbacks

- Hooks (callbacks) let you customize the various settings needed for training

- Most of this is set in the configuration
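The hook idea can be sketched in a few lines: the runner owns a fixed loop and only calls out to registered hooks at set points. This is a hypothetical pure-Python illustration; `Runner`, `Hook`, `LoggerHook` and the method names are chosen for the example and are not the MMDetection/MMCV classes.

```python
class Hook:
    """Base hook: subclasses override only the callback points they need."""

    def before_epoch(self, runner):
        pass

    def after_iter(self, runner):
        pass

    def after_epoch(self, runner):
        pass


class Runner:
    """A fixed training loop that calls registered hooks at set points."""

    def __init__(self, hooks, num_epochs, iters_per_epoch):
        self.hooks = hooks
        self.num_epochs = num_epochs
        self.iters_per_epoch = iters_per_epoch
        self.epoch = 0
        self.iter = 0

    def _call(self, name):
        for hook in self.hooks:
            getattr(hook, name)(self)

    def run(self):
        for epoch in range(self.num_epochs):
            self.epoch = epoch
            self._call('before_epoch')
            for it in range(self.iters_per_epoch):
                self.iter = it
                # a real runner would do the forward/backward pass and optimizer step here
                self._call('after_iter')
            self._call('after_epoch')


class LoggerHook(Hook):
    """Records (epoch, iteration) every `interval` iterations."""

    def __init__(self, interval):
        self.interval = interval
        self.log = []

    def after_iter(self, runner):
        if (runner.iter + 1) % self.interval == 0:
            self.log.append((runner.epoch, runner.iter + 1))


logger = LoggerHook(interval=2)
Runner([logger], num_epochs=2, iters_per_epoch=4).run()
print(logger.log)  # [(0, 2), (0, 4), (1, 2), (1, 4)]
```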


Day 13 - Deep Learning (Tensorflow) (2)

2021. 9. 28. 09:53


Tensorflow

Different ways to define a model

1. Sequential

import tensorflow as tf
import numpy as np
import matplotlib.pyplot as plt

model = tf.keras.models.Sequential()
model.add(tf.keras.layers.Dense(2))
model.add(tf.keras.layers.Dense(2))

x = tf.constant([[1,2]])
y = tf.constant([[1,2],[2,3]])

model(x) 
# <tf.Tensor: shape=(1, 2), dtype=float32, numpy=array([[0.3294789, 0.589738 ]], dtype=float32)>

model.built
# True

model.summary()

# Model: "sequential"
# _________________________________________________________________
# Layer (type)                 Output Shape              Param #   
# =================================================================
# dense (Dense)                (1, 2)                    6         
# _________________________________________________________________
# dense_1 (Dense)              (1, 2)                    6         
# =================================================================
# Total params: 12
# Trainable params: 12
# Non-trainable params: 0
# _________________________________________________________________

model(y)
# <tf.Tensor: shape=(2, 2), dtype=float32, numpy=
# array([[0.3294789 , 0.589738  ],
#        [0.2781678 , 0.97357416]], dtype=float32)>

model.summary()
# Model: "sequential"
# _________________________________________________________________
# Layer (type)                 Output Shape              Param #   
# =================================================================
# dense (Dense)                (None, 2)                 6         
# _________________________________________________________________
# dense_1 (Dense)              (None, 2)                 6         
# =================================================================
# Total params: 12
# Trainable params: 12
# Non-trainable params: 0
# _________________________________________________________________

model = tf.keras.models.Sequential()
model.add(tf.keras.layers.InputLayer(input_shape=(2,))) # each sample is a 1-D vector of two values (any number of samples is accepted)
model.add(tf.keras.layers.Dense(2))
model.add(tf.keras.layers.Dense(2))

model.summary()
# Model: "sequential_1"
# _________________________________________________________________
# Layer (type)                 Output Shape              Param #   
# =================================================================
# dense_2 (Dense)              (None, 2)                 6         
# _________________________________________________________________
# dense_3 (Dense)              (None, 2)                 6         
# =================================================================
# Total params: 12
# Trainable params: 12
# Non-trainable params: 0
# _________________________________________________________________

model = tf.keras.models.Sequential()
model.add(tf.keras.layers.Dense(2, input_shape=(2,))) # shorthand: give input_shape to the first layer instead of adding an InputLayer
model.add(tf.keras.layers.Dense(2))

model.summary()
# Model: "sequential_3"
# _________________________________________________________________
# Layer (type)                 Output Shape              Param #   
# =================================================================
# dense_5 (Dense)              (None, 2)                 6         
# _________________________________________________________________
# dense_6 (Dense)              (None, 2)                 6         
# =================================================================
# Total params: 12
# Trainable params: 12
# Non-trainable params: 0
# _________________________________________________________________

model = tf.keras.models.Sequential([
  tf.keras.layers.Dense(2, input_shape=[2]), # input_shape can be a list as well as a tuple like (2,)
  tf.keras.layers.Dense(2)
])

model.summary()
# Model: "sequential_4"
# _________________________________________________________________
# Layer (type)                 Output Shape              Param #   
# =================================================================
# dense_7 (Dense)              (None, 2)                 6         
# _________________________________________________________________
# dense_8 (Dense)              (None, 2)                 6         
# =================================================================
# Total params: 12
# Trainable params: 12
# Non-trainable params: 0
# _________________________________________________________________

2. Model (functional API)

inputs = tf.keras.Input(shape=(2,)) # returns a (symbolic) tensor
x = tf.keras.layers.Dense(2)(inputs)
outputs = tf.keras.layers.Dense(2)(x)

model = tf.keras.models.Model(inputs, outputs) # pass both the inputs and the outputs; the inputs must be tensors

model.summary() # the InputLayer is listed explicitly
# Model: "model"
# _________________________________________________________________
# Layer (type)                 Output Shape              Param #   
# =================================================================
# input_4 (InputLayer)         [(None, 2)]               0         
# _________________________________________________________________
# dense_15 (Dense)             (None, 2)                 6         
# _________________________________________________________________
# dense_16 (Dense)             (None, 2)                 6         
# =================================================================
# Total params: 12
# Trainable params: 12
# Non-trainable params: 0
# _________________________________________________________________

inputs = tf.keras.layers.InputLayer(input_shape=(2,)) # this is a layer, not a tensor
x = tf.keras.layers.Dense(2)(inputs) # a layer must be called on a tensor, so this line fails
outputs = tf.keras.layers.Dense(2)(x)
# TypeError: Inputs to a layer should be tensors. Got: <keras.engine.input_layer.InputLayer object at 0x7f09f68a56d0>

# Model is used instead of Sequential because the inputs are declared explicitly, which makes more complex models possible

inputs1 = tf.keras.Input(shape=(2,))
inputs2 = tf.keras.Input(shape=(2,))
x1 = tf.keras.layers.Dense(2)(inputs1)
x2 = tf.keras.layers.Dense(2)(inputs2) 
outputs1 = tf.keras.layers.Dense(2)(x1)
outputs2 = tf.keras.layers.Dense(2)(x2)

model = tf.keras.models.Model([inputs1, inputs2], [outputs1,outputs2])

model.summary() # an InputLayer row or a "Connected to" column means the model was built with Model (functional API)

# Model: "model_1"
# __________________________________________________________________________________________________
# Layer (type)                    Output Shape         Param #     Connected to                     
# ==================================================================================================
# input_8 (InputLayer)            [(None, 2)]          0                                            
# __________________________________________________________________________________________________
# input_9 (InputLayer)            [(None, 2)]          0                                            
# __________________________________________________________________________________________________
# dense_19 (Dense)                (None, 2)            6           input_8[0][0]                    
# __________________________________________________________________________________________________
# dense_20 (Dense)                (None, 2)            6           input_9[0][0]                    
# __________________________________________________________________________________________________
# dense_21 (Dense)                (None, 2)            6           dense_19[0][0]                   
# __________________________________________________________________________________________________
# dense_22 (Dense)                (None, 2)            6           dense_20[0][0]                   
# ==================================================================================================
# Total params: 24
# Trainable params: 24
# Non-trainable params: 0
# __________________________________________________________________________________________________

Unlike Sequential, Model can have two or more input and output layers.

tf.keras.utils.plot_model(model)

tf.keras.utils.plot_model(model, show_shapes=True)

inputs1 = tf.keras.Input(shape=(2,))
inputs2 = tf.keras.Input(shape=(2,))
x1 = tf.keras.layers.Dense(2)(inputs1)
x2 = tf.keras.layers.Dense(2)(inputs2) 
outputs1 = tf.keras.layers.Add()([x1,x2])
outputs2 = tf.keras.layers.Subtract()([x1,x2])

model = tf.keras.models.Model([inputs1, inputs2], [outputs1,outputs2])

tf.keras.utils.plot_model(model, show_shapes=True)

Using Model and Sequential together

se = tf.keras.models.Sequential([
  tf.keras.layers.Dense(2),
  tf.keras.layers.Dense(2)                                 
])

inputs_ = tf.keras.Input(shape=(3,))
outputs = se(inputs_)
model = tf.keras.models.Model(inputs_, outputs)

model
# <keras.engine.functional.Functional at 0x7f73508eef10>

model.summary() # a Sequential is itself a model, so this nests one model inside another; wrapping a block of layers in a Sequential like this to simplify a model is common
# Model: "model_2"
# _________________________________________________________________
# Layer (type)                 Output Shape              Param #   
# =================================================================
# input_6 (InputLayer)         [(None, 3)]               0         
# _________________________________________________________________
# sequential (Sequential)      (None, 2)                 14        
# =================================================================
# Total params: 14
# Trainable params: 14
# Non-trainable params: 0
# _________________________________________________________________

Checking whether the data is balanced by visualizing it

(X_train, y_train), (X_test, y_test) = tf.keras.datasets.mnist.load_data()

X_train.shape
# (60000, 28, 28)

X_train
# array([[[0, 0, 0, ..., 0, 0, 0],
#         [0, 0, 0, ..., 0, 0, 0],
#         [0, 0, 0, ..., 0, 0, 0],
#         ...
#         [0, 0, 0, ..., 0, 0, 0],
#         [0, 0, 0, ..., 0, 0, 0],
#         [0, 0, 0, ..., 0, 0, 0]]], dtype=uint8)

np.unique(y_train, return_counts=True)
# (array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9], dtype=uint8),
# array([5923, 6742, 5958, 6131, 5842, 5421, 5918, 6265, 5851, 5949]))

plt.hist(y_train) # balanced data: every class has between about 5,400 and 6,700 samples
# (array([5923., 6742., 5958., 6131., 5842., 5421., 5918., 6265., 5851.,
#         5949.]),
#  array([0. , 0.9, 1.8, 2.7, 3.6, 4.5, 5.4, 6.3, 7.2, 8.1, 9. ]),
#  <a list of 10 Patch objects>)

MLP

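# numerical stability: every input should produce a finite, well-behaved output, and scaling the inputs helps

# Normalization 
X_train = X_train / 255 # coercion: dividing the uint8 array by 255 produces floats in [0, 1]
X_test = X_test / 255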

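Decide whether preprocessing is done inside or outside the model.

Inside model

- Portable and convenient

- Can run on the GPU

Outside model

- No constraints on the data format, so it is more general

- CPU-based by default, but a CPU + GPU combination can be used

TensorFlow takes a 3-D input, i.e. a batch of grayscale images, as (Batch (size), H, W).

For a 4-D input (color images) it takes (B, H, W, C), or (B, C, H, W) (Theano's default).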
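Converting between these layouts is just an axis permutation. A NumPy sketch with a hypothetical all-zeros batch; the grayscale case also shows how an explicit channel axis of size 1 is added, the same shape as the X_train.reshape(60000, 28, 28, 1) used below:

```python
import numpy as np

# a hypothetical batch of 8 RGB images, 32x32, in channels-last (B, H, W, C) layout
nhwc = np.zeros((8, 32, 32, 3), dtype=np.float32)

# move the channel axis to position 1 -> channels-first (B, C, H, W)
nchw = np.transpose(nhwc, (0, 3, 1, 2))
# and back again
back = np.transpose(nchw, (0, 2, 3, 1))

# a grayscale batch (B, H, W) gets an explicit channel axis of size 1
gray = np.zeros((8, 28, 28), dtype=np.uint8)
gray_nhwc = gray[..., np.newaxis]

print(nchw.shape, back.shape, gray_nhwc.shape)  # (8, 3, 32, 32) (8, 32, 32, 3) (8, 28, 28, 1)
```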

Flatten

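tf.keras.layers.Flatten()(X_train) # calling the layer directly on a NumPy array outside a model runs it eagerly; it keeps the batch axis and flattens the rest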
# <tf.Tensor: shape=(60000, 784), dtype=float32, numpy=
# array([[0., 0., 0., ..., 0., 0., 0.],
#        [0., 0., 0., ..., 0., 0., 0.],
#        [0., 0., 0., ..., 0., 0., 0.],
#        ...,
#        [0., 0., 0., ..., 0., 0., 0.],
#        [0., 0., 0., ..., 0., 0., 0.],
#        [0., 0., 0., ..., 0., 0., 0.]], dtype=float32)>
       
X_train_bw = X_train.reshape(60000, 28, 28, 1)
tf.keras.layers.Flatten()(X_train_bw) 
# <tf.Tensor: shape=(60000, 784), dtype=float32, numpy=
# array([[0., 0., 0., ..., 0., 0., 0.],
#        [0., 0., 0., ..., 0., 0., 0.],
#        [0., 0., 0., ..., 0., 0., 0.],
#        ...,
#        [0., 0., 0., ..., 0., 0., 0.],
#        [0., 0., 0., ..., 0., 0., 0.],
#        [0., 0., 0., ..., 0., 0., 0.]], dtype=float32)>

X_train_bw.flatten().shape
# (47040000,)
# a plain NumPy flatten also flattens the batch axis (60000 * 784 = 47,040,000)

NumPy's flatten collapses everything into a single dimension.

'__call__' in dir(tf.keras.layers.Flatten()) # a layer instance is callable, so it can be used like a function
# True
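What Flatten does per sample can be reproduced in NumPy with a reshape that keeps the first axis. A sketch on a small stand-in batch (the data here is just arange, not MNIST):

```python
import numpy as np

x = np.arange(2 * 28 * 28).reshape(2, 28, 28, 1)  # stand-in for a batch of 2 images

all_flat = x.flatten()                  # everything, including the batch axis, becomes 1-D
per_sample = x.reshape(x.shape[0], -1)  # keep the batch axis, flatten the rest (like Flatten)

print(all_flat.shape, per_sample.shape)  # (1568,) (2, 784)
```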

Different ways to build an MLP

input_ = tf.keras.Input(shape=(28,28)) 
x = tf.keras.layers.Flatten()(input_) # each sample is now 1-D, so fully connected layers can be applied
x = tf.keras.layers.Dense(128, activation='relu')(x) # Dense works on the last axis, so flatten first to get one vector per sample

x
# <KerasTensor: shape=(None, 128) dtype=float32 (created by layer 'dense_6')>

input_ = tf.keras.Input(shape=(28,28))
x = tf.keras.layers.Dense(128, activation='relu')(input_) # feeding 2-D samples to Dense applies it to the last axis only, which is not the intended output

x
# <KerasTensor: shape=(None, 28, 128) dtype=float32 (created by layer 'dense_7')>
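The (None, 28, 128) shape follows from how Dense handles inputs of rank greater than 2: it takes the dot product along the last input axis only. A NumPy sketch of the same shapes, with random stand-in weights and an assumed batch of 4:

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.random((4, 28, 28))  # 4 samples of shape 28x28, not flattened
W = rng.random((28, 128))    # the kernel matches the LAST input axis (28)
b = np.zeros(128)

y = x @ W + b                # matmul over the last axis only
print(y.shape)               # (4, 28, 128): one 128-vector per image row

x_flat = x.reshape(4, -1)    # flatten first: (4, 784)
W_flat = rng.random((784, 128))
y_flat = x_flat @ W_flat + b
print(y_flat.shape)          # (4, 128): one 128-vector per image
```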

Option1

input_ = tf.keras.Input(shape=(28,28)) 
x = tf.keras.layers.Flatten()(input_) 
x = tf.keras.layers.Dense(2)(x)
x = tf.keras.layers.Activation('relu')(x) # or x = tf.keras.layers.ReLU()(x)
# keeping the activation as a separate layer allows Batch Normalization to be inserted between Dense and the activation

Option2

input_ = tf.keras.Input(shape=(28,28)) 
x = tf.keras.layers.Flatten()(input_) 
x = tf.keras.layers.Dense(128, activation='relu')(x)

input_ = tf.keras.Input(shape=(28,28)) 
x = tf.keras.layers.Flatten()(input_) 
x = tf.keras.layers.Dense(128)(x)
x = tf.keras.layers.ReLU()(x)
x = tf.keras.layers.Dropout(0.2)(x)
output = tf.keras.layers.Dense(10, activation='softmax')(x) 
# no Batch Normalization goes after the last layer, so by convention the shorthand activation= is used there

model = tf.keras.models.Model(input_, output)

model.summary() # layers with 0 params are simple operations with nothing to learn

# Model: "model_3"
# _________________________________________________________________
# Layer (type)                 Output Shape              Param #   
# =================================================================
# input_9 (InputLayer)         [(None, 28, 28)]          0         
# _________________________________________________________________
# flatten_1 (Flatten)          (None, 784)               0         
# _________________________________________________________________
# dense_8 (Dense)              (None, 128)               100480    
# _________________________________________________________________
# re_lu (ReLU)                 (None, 128)               0         
# _________________________________________________________________
# dropout (Dropout)            (None, 128)               0         
# _________________________________________________________________
# dense_9 (Dense)              (None, 10)                1290      
# =================================================================
# Total params: 101,770
# Trainable params: 101,770
# Non-trainable params: 0
# _________________________________________________________________

How to count the parameters of a neural network

inputs (28 x 28 = 784) x dense layer units (128) + dense layer biases (128) = 100,480
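The same count as a tiny helper (the name is illustrative), checked against the Param # column of the summary above:

```python
def dense_params(n_inputs, n_units, use_bias=True):
    """Parameters of a fully connected layer: one weight per input-unit pair, plus one bias per unit."""
    return n_inputs * n_units + (n_units if use_bias else 0)


hidden = dense_params(28 * 28, 128)  # 784 * 128 + 128
output = dense_params(128, 10)       # 128 * 10 + 10
print(hidden, output, hidden + output)  # 100480 1290 101770
```

The same formula explains the 6 parameters of each Dense(2) layer on a 2-value input earlier (2 * 2 + 2).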

model.weights[0] 
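# TensorFlow calls the weight matrix the kernel, and "weights" refers to kernel + bias together
# by default the kernel is initialized with Glorot (Xavier) uniform, which helps training converge quickly and reliably, and the bias starts at zero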

# <tf.Variable 'dense_7/kernel:0' shape=(784, 128) dtype=float32, numpy=
# array([[-0.00481777, -0.05054575, -0.066313  , ...,  0.01311962,
#          0.04805059,  0.00298249],
#        [ 0.07520033, -0.01051669,  0.00903524, ..., -0.07472851,
#          0.01202653, -0.00115251],
#        [-0.01647546, -0.03879095, -0.03718614, ..., -0.05494369,
#         -0.05540787,  0.05530912],
#        ...,
#        [-0.00182921,  0.07649232, -0.07937535, ...,  0.05838447,
#          0.05726648, -0.03632762],
#        [-0.00923116,  0.01400795, -0.05262435, ..., -0.07948828,
#          0.01813819, -0.02076239],
#        [-0.01461188,  0.07456786,  0.0081539 , ...,  0.01914987,
#          0.07928162, -0.03385995]], dtype=float32)>

model.weights[1]
# <tf.Variable 'dense_7/bias:0' shape=(128,) dtype=float32, numpy=
# array([0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
#        0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
#        0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
#        0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
#        0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
#        0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
#        0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
#        0., 0., 0., 0., 0., 0., 0., 0., 0.], dtype=float32)>

model.weights[2]
# <tf.Variable 'dense_8/kernel:0' shape=(128, 10) dtype=float32, numpy=
# array([[-0.01635328,  0.00763369,  0.13723023, ..., -0.18139066,
#          0.09355612,  0.14300655],
#        [-0.06704196, -0.12000238,  0.20172156, ...,  0.0606968 ,
#         -0.02551591, -0.10963563],
#        [ 0.00977565, -0.11188473,  0.15327264, ...,  0.12097086,
#          0.00371699,  0.11089064],
#        ...,
#        [ 0.15975083, -0.12796631, -0.12143513, ...,  0.00048973,
#          0.08025642,  0.09352569],
#        [-0.10482513, -0.00614406,  0.16832988, ...,  0.1809843 ,
#          0.16601638, -0.18317826],
#        [ 0.10724227,  0.1758986 ,  0.03089926, ...,  0.09623767,
#          0.0754603 , -0.14214656]], dtype=float32)>

model.weights[3]
# <tf.Variable 'dense_8/bias:0' shape=(10,) dtype=float32, numpy=array([0., 0., 0., 0., 0., 0., 0., 0., 0., 0.], dtype=float32)>
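The printed kernels stay within about ±0.081 for the 784x128 layer and about ±0.21 for the 128x10 layer, which is what Glorot uniform predicts: values are drawn from U(-limit, limit) with limit = sqrt(6 / (fan_in + fan_out)). A NumPy sketch of that rule (the helper is illustrative, not Keras's initializer):

```python
import numpy as np


def glorot_uniform(fan_in, fan_out, rng):
    # Glorot/Xavier uniform: U(-limit, limit) with limit = sqrt(6 / (fan_in + fan_out))
    limit = np.sqrt(6.0 / (fan_in + fan_out))
    return rng.uniform(-limit, limit, size=(fan_in, fan_out)), limit


rng = np.random.default_rng(42)
k1, limit1 = glorot_uniform(784, 128, rng)
k2, limit2 = glorot_uniform(128, 10, rng)
print(round(limit1, 4), round(limit2, 4))  # 0.0811 0.2085
```

Scaling the range by fan_in + fan_out keeps the variance of activations and gradients roughly constant across layers, which is why wider layers get smaller initial weights.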

model.trainable_weights == model.trainable_variables
# True

model.trainable_weights is model.trainable_variables
# False

model.trainable_weights
# [<tf.Variable 'dense_8/kernel:0' shape=(784, 128) dtype=float32, numpy=
#  array([[-0.05531485, -0.07744511, -0.0746323 , ..., -0.06485563,
#          -0.05913215, -0.03761471],
#         [ 0.00124295,  0.07567873,  0.00529402, ...,  0.03713652,
#          -0.03533567,  0.02690652],
#         [ 0.06762456, -0.02606952, -0.03866001, ...,  0.0321365 ,
#          -0.05224103, -0.03288466],
#         ...,
#         [ 0.05756753,  0.03487769, -0.04956114, ...,  0.07006838,
#          -0.04104863, -0.08020123],
#         [ 0.03739367,  0.05591244,  0.0753384 , ...,  0.0743489 ,
#           0.00566504, -0.05400074],
#         [-0.00660357, -0.06026491, -0.07941656, ..., -0.05506966,
#          -0.06525376, -0.05522396]], dtype=float32)>,
#  <tf.Variable 'dense_8/bias:0' shape=(128,) dtype=float32, numpy=
#  array([0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
#         0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
#         0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
#         0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
#         0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
#         0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
#         0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
#         0., 0., 0., 0., 0., 0., 0., 0., 0.], dtype=float32)>,
#  <tf.Variable 'dense_9/kernel:0' shape=(128, 10) dtype=float32, numpy=
#  array([[ 0.09544928,  0.08058016, -0.19936405, ...,  0.06292586,
#           0.05765642, -0.18936837],
#         [-0.03744939, -0.09664733,  0.01312402, ...,  0.10993271,
#           0.14227311,  0.10375817],
#         [-0.11351802,  0.05995773,  0.19267906, ...,  0.02832732,
#           0.1751137 ,  0.08727919],
#         ...,
#         [ 0.18473722,  0.0153936 ,  0.17687447, ..., -0.07043667,
#           0.07194577, -0.2060329 ],
#         [ 0.04917909, -0.19079669,  0.03048648, ..., -0.04494686,
#           0.19417565, -0.18756153],
#         [ 0.04652865,  0.02577874, -0.12224188, ..., -0.19071367,
#           0.04850109,  0.04275919]], dtype=float32)>,
#  <tf.Variable 'dense_9/bias:0' shape=(10,) dtype=float32, numpy=array([0., 0., 0., 0., 0., 0., 0., 0., 0., 0.], dtype=float32)>]

Learning

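Two ways to train

1. compile => the built-in training loop, which converts the model into a computational graph; TensorFlow handles the training of the model you built

2. Implementing the training loop yourself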


3-13. Modern object detection model architectures

2021. 9. 27. 17:31


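neck => introduced for FPN

- in the backbone the spatial size shrinks while the depth grows, so deeper layers keep increasingly abstract information

The neck reuses feature maps of multiple sizes to extract richer features.

Feature Pyramid Network

Information is passed between levels through mappings => single shot detection

Uses multiple feature maps to detect small objects better

Effectively combines the abstract information of the upper feature maps with the information in the lower feature maps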
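The top-down merge at the heart of FPN can be sketched with plain arrays: project each backbone map to a common channel width with a 1x1 convolution (a per-pixel matmul), then upsample the coarser, more abstract level and add it to the finer one. A NumPy sketch under stated assumptions: random weights, made-up map sizes, and no 3x3 smoothing convolutions after the merge, so it shows only the data flow, not a trained FPN.

```python
import numpy as np

rng = np.random.default_rng(0)


def conv1x1(x, w):
    # a 1x1 convolution is a per-pixel matmul over channels: (H, W, C_in) @ (C_in, C_out)
    return x @ w


def upsample2x(x):
    # nearest-neighbor upsampling by repeating rows and columns
    return x.repeat(2, axis=0).repeat(2, axis=1)


# hypothetical backbone outputs (H, W, C): spatial size halves while depth grows
c3 = rng.random((32, 32, 256))
c4 = rng.random((16, 16, 512))
c5 = rng.random((8, 8, 1024))

d = 64  # common channel width for all pyramid levels (chosen for the example)
lat = {name: rng.random((c.shape[-1], d)) for name, c in [('c3', c3), ('c4', c4), ('c5', c5)]}

# top-down pathway: start from the most abstract map, upsample, add the lateral projection
p5 = conv1x1(c5, lat['c5'])
p4 = conv1x1(c4, lat['c4']) + upsample2x(p5)
p3 = conv1x1(c3, lat['c3']) + upsample2x(p4)

print(p5.shape, p4.shape, p3.shape)  # (8, 8, 64) (16, 16, 64) (32, 32, 64)
```

The fine level p3 now carries both its own high-resolution detail and the abstract information from c5, which is what helps with small objects.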


3-12. Video inference

2021. 9. 27. 16:57


Running Video Object Detection

Viewing the original video

!wget -O ./data/Jonh_Wick_small.mp4 https://github.com/chulminkw/DLCV/blob/master/data/video/John_Wick_small.mp4?raw=true

Setting up VideoCapture and VideoWriter

Use VideoCapture to capture the video frame by frame
Use the VideoCapture properties to get the frame size and FPS of the video
Set the encoding codec for VideoWriter and configure it for writing the output video

import cv2

video_input_path = '/content/data/Jonh_Wick_small.mp4'

cap = cv2.VideoCapture(video_input_path)
frame_cnt = int(cap.get(cv2.CAP_PROP_FRAME_COUNT))
print('Total frames:', frame_cnt)

# Total frames: 58

video_input_path = '/content/data/Jonh_Wick_small.mp4'
video_output_path = './data/John_Wick_small_cv01.mp4'

cap = cv2.VideoCapture(video_input_path)

codec = cv2.VideoWriter_fourcc(*'XVID')

vid_size = (round(cap.get(cv2.CAP_PROP_FRAME_WIDTH)),round(cap.get(cv2.CAP_PROP_FRAME_HEIGHT))) 
vid_fps = cap.get(cv2.CAP_PROP_FPS)
    
vid_writer = cv2.VideoWriter(video_output_path, codec, vid_fps, vid_size) 

frame_cnt = int(cap.get(cv2.CAP_PROP_FRAME_COUNT))
print('Total frames:', frame_cnt)

# Total frames: 58

Iterate over all frames and run object detection on each one; each frame is handled just like single-image object detection.

import time

# assumes cv_net (the cv2.dnn network) and labels_to_names_0 (class id -> class name dict)
# were created in the earlier image object detection setup

# colors for the bounding box border and the caption text (BGR)
green_color=(0, 255, 0)
red_color=(0, 0, 255)

while True:

    hasFrame, img_frame = cap.read()
    if not hasFrame:
        print('No more frames to process.')
        break

    rows = img_frame.shape[0]
    cols = img_frame.shape[1]
    # convert the BGR frame to RGB (swapRB=True) and feed it to the network as a blob
    cv_net.setInput(cv2.dnn.blobFromImage(img_frame,  swapRB=True, crop=False))
    
    start= time.time()
    # run object detection; the results are returned in cv_out
    cv_out = cv_net.forward()
    # iterate over the detected objects and extract their information
    for detection in cv_out[0,0,:,:]:
        score = float(detection[2])
        class_id = int(detection[1])
        # keep only detections with a score above 0.5
        if score > 0.5:
            # coordinates are predicted relative to the scaled input, so convert them back to the original frame size
            left = detection[3] * cols
            top = detection[4] * rows
            right = detection[5] * cols
            bottom = detection[6] * rows
            # map class_id to a class name using the labels_to_names_0 dict
            caption = "{}: {:.4f}".format(labels_to_names_0[class_id], score)
            #print(class_id, caption)
            # cv2.rectangle() draws a rectangle on img_frame; the position arguments must be integers
            cv2.rectangle(img_frame, (int(left), int(top)), (int(right), int(bottom)), color=green_color, thickness=2)
            cv2.putText(img_frame, caption, (int(left), int(top - 5)), cv2.FONT_HERSHEY_SIMPLEX, 0.5, red_color, 1)
    print('Detection time:', round(time.time()-start, 2),'sec')
    vid_writer.write(img_frame)
# end of while loop

vid_writer.release()
cap.release()

# Detection time: 8.54 sec
# Detection time: 8.34 sec
...
# Detection time: 8.38 sec
# No more frames to process.

Creating a dedicated video detection function.

def do_detected_video(cv_net, input_path, output_path, score_threshold, is_print):

    cap = cv2.VideoCapture(input_path)

    codec = cv2.VideoWriter_fourcc(*'XVID')

    vid_size = (round(cap.get(cv2.CAP_PROP_FRAME_WIDTH)), round(cap.get(cv2.CAP_PROP_FRAME_HEIGHT)))
    vid_fps = cap.get(cv2.CAP_PROP_FPS)

    vid_writer = cv2.VideoWriter(output_path, codec, vid_fps, vid_size)

    frame_cnt = int(cap.get(cv2.CAP_PROP_FRAME_COUNT))
    print('Total number of frames:', frame_cnt)

    while True:
        hasFrame, img_frame = cap.read()
        if not hasFrame:
            print('No more frames to process.')
            break

        # get_detected_img() is the single-image detection function from the 3-10~11 post.
        # use_copied_array=False draws directly on the frame; is_print only controls the timing message
        img_frame = get_detected_img(cv_net, img_frame, score_threshold=score_threshold, use_copied_array=False, is_print=is_print)

        vid_writer.write(img_frame)
    # end of while loop

    vid_writer.release()
    cap.release()

do_detected_video(cv_net, '/content/data/Jonh_Wick_small.mp4', './data/John_Wick_small_02.mp4', 0.2, False)

# Total number of frames: 58
# person: 0.9495
# person: 0.2871
# bicycle: 0.3498
# car: 0.9882
# car: 0.9622
# ...
# car: 0.4122
# horse: 0.8085
# tie: 0.3411
# No more frames to process.
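The `while True` / `cap.read()` pattern in `do_detected_video` can also be factored into a generator, so the detection code only ever sees frames. This is a minimal sketch, assuming any object whose `read()` returns `(has_frame, frame)` the way `cv2.VideoCapture.read()` does; `iter_frames` and `FakeCapture` are hypothetical names used only for illustration.

```python
from typing import Any, Iterator

def iter_frames(cap: Any) -> Iterator[Any]:
    """Yield frames from a VideoCapture-like object until read() fails."""
    while True:
        has_frame, frame = cap.read()
        if not has_frame:
            break
        yield frame

class FakeCapture:
    """Stand-in for cv2.VideoCapture that returns a fixed list of frames."""

    def __init__(self, frames):
        self._frames = list(frames)

    def read(self):
        if self._frames:
            return True, self._frames.pop(0)
        return False, None

frames = list(iter_frames(FakeCapture(["f0", "f1", "f2"])))
print(frames)  # ['f0', 'f1', 'f2']

# With real OpenCV objects (not executed here):
# for img_frame in iter_frames(cap):
#     vid_writer.write(get_detected_img(cv_net, img_frame, score_threshold=0.2,
#                                       use_copied_array=False, is_print=False))
```

Keeping the loop termination in one place means the detection step can be swapped or tested without touching the capture logic.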


3-10~11. Faster RCNN Object Detection

2021. 9. 27. 16:53


Object detection based on Faster R-CNN with the OpenCV DNN package

Load a model file pretrained in TensorFlow into OpenCV and run object detection on images and videos.

Download and view the input image

!mkdir /content/data
!wget -O ./data/beatles01.jpg https://raw.githubusercontent.com/chulminkw/DLCV/master/data/image/beatles01.jpg

import cv2
import matplotlib.pyplot as plt
%matplotlib inline

img = cv2.imread('./data/beatles01.jpg')
img_rgb = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)

print('image shape:', img.shape)
plt.figure(figsize=(12, 12))
plt.imshow(img_rgb)

Download the pretrained TensorFlow inference model (frozen graph) and its config file, then use them to create the inference model in OpenCV

The download URLs are listed at https://github.com/opencv/opencv/wiki/TensorFlow-Object-Detection-API.
Download the pretrained model from http://download.tensorflow.org/models/object_detection/faster_rcnn_resnet50_coco_2018_01_28.tar.gz and extract it
Download the config file for the pretrained model from https://github.com/opencv/opencv_extra/blob/master/testdata/dnn/faster_rcnn_resnet50_coco_2018_01_28.pbtxt
Load the inference model in DNN, passing the downloaded model file and config file as arguments.

!mkdir ./pretrained
!wget -O ./pretrained/faster_rcnn_resnet50_coco_2018_01_28.tar.gz http://download.tensorflow.org/models/object_detection/faster_rcnn_resnet50_coco_2018_01_28.tar.gz
!wget -O ./pretrained/config_graph.pbtxt https://raw.githubusercontent.com/opencv/opencv_extra/master/testdata/dnn/faster_rcnn_resnet50_coco_2018_01_28.pbtxt

!tar -xvf ./pretrained/faster*.tar.gz -C ./pretrained

!pwd
!ls -lia ./pretrained/faster_rcnn_resnet50_coco_2018_01_28
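Outside a notebook, the `!wget` / `!tar` shell commands above can be replaced with the Python standard library. A minimal sketch, assuming the archive has already been downloaded (for example with `urllib.request.urlretrieve(url, path)`); `extract_archive` is a hypothetical helper, and the demo builds a tiny tar file locally instead of downloading the real model.

```python
import os
import tarfile
import tempfile

def extract_archive(archive_path: str, dest_dir: str) -> list:
    """Extract a .tar/.tar.gz archive into dest_dir and return its member names.

    Only extract archives from trusted sources such as the official model URL.
    """
    os.makedirs(dest_dir, exist_ok=True)
    with tarfile.open(archive_path, "r:*") as tar:  # "r:*" auto-detects gzip
        names = tar.getnames()
        tar.extractall(dest_dir)
    return names

# Demo: build a small .tar.gz locally, then extract it
with tempfile.TemporaryDirectory() as tmp:
    src = os.path.join(tmp, "model_dir")
    os.makedirs(src)
    with open(os.path.join(src, "frozen_inference_graph.pb"), "wb") as f:
        f.write(b"dummy")
    archive = os.path.join(tmp, "model.tar.gz")
    with tarfile.open(archive, "w:gz") as tar:
        tar.add(src, arcname="model_dir")
    names = extract_archive(archive, os.path.join(tmp, "pretrained"))
    extracted_ok = os.path.exists(
        os.path.join(tmp, "pretrained", "model_dir", "frozen_inference_graph.pb"))

print(sorted(names))
print(extracted_ok)
```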

Load the TensorFlow inference model with readNetFromTensorflow() from dnn

cv_net = cv2.dnn.readNetFromTensorflow('./pretrained/faster_rcnn_resnet50_coco_2018_01_28/frozen_inference_graph.pb', 
                                     './pretrained/config_graph.pbtxt')

Mapping from class id to class name for the COCO dataset.

# For OpenCV YOLO (80 COCO classes, 0-indexed)
labels_to_names_seq = {0:'person',1:'bicycle',2:'car',3:'motorbike',4:'aeroplane',5:'bus',6:'train',7:'truck',8:'boat',9:'traffic light',10:'fire hydrant',
                        11:'stop sign',12:'parking meter',13:'bench',14:'bird',15:'cat',16:'dog',17:'horse',18:'sheep',19:'cow',20:'elephant',
                        21:'bear',22:'zebra',23:'giraffe',24:'backpack',25:'umbrella',26:'handbag',27:'tie',28:'suitcase',29:'frisbee',30:'skis',
                        31:'snowboard',32:'sports ball',33:'kite',34:'baseball bat',35:'baseball glove',36:'skateboard',37:'surfboard',38:'tennis racket',39:'bottle',40:'wine glass',
                        41:'cup',42:'fork',43:'knife',44:'spoon',45:'bowl',46:'banana',47:'apple',48:'sandwich',49:'orange',50:'broccoli',
                        51:'carrot',52:'hot dog',53:'pizza',54:'donut',55:'cake',56:'chair',57:'sofa',58:'pottedplant',59:'bed',60:'diningtable',
                        61:'toilet',62:'tvmonitor',63:'laptop',64:'mouse',65:'remote',66:'keyboard',67:'cell phone',68:'microwave',69:'oven',70:'toaster',
                        71:'sink',72:'refrigerator',73:'book',74:'clock',75:'vase',76:'scissors',77:'teddy bear',78:'hair drier',79:'toothbrush' }

# For OpenCV TensorFlow Faster-RCNN (ids 0~90, 0-indexed)
labels_to_names_0 = {0:'person',1:'bicycle',2:'car',3:'motorcycle',4:'airplane',5:'bus',6:'train',7:'truck',8:'boat',9:'traffic light',
                    10:'fire hydrant',11:'street sign',12:'stop sign',13:'parking meter',14:'bench',15:'bird',16:'cat',17:'dog',18:'horse',19:'sheep',
                    20:'cow',21:'elephant',22:'bear',23:'zebra',24:'giraffe',25:'hat',26:'backpack',27:'umbrella',28:'shoe',29:'eye glasses',
                    30:'handbag',31:'tie',32:'suitcase',33:'frisbee',34:'skis',35:'snowboard',36:'sports ball',37:'kite',38:'baseball bat',39:'baseball glove',
                    40:'skateboard',41:'surfboard',42:'tennis racket',43:'bottle',44:'plate',45:'wine glass',46:'cup',47:'fork',48:'knife',49:'spoon',
                    50:'bowl',51:'banana',52:'apple',53:'sandwich',54:'orange',55:'broccoli',56:'carrot',57:'hot dog',58:'pizza',59:'donut',
                    60:'cake',61:'chair',62:'couch',63:'potted plant',64:'bed',65:'mirror',66:'dining table',67:'window',68:'desk',69:'toilet',
                    70:'door',71:'tv',72:'laptop',73:'mouse',74:'remote',75:'keyboard',76:'cell phone',77:'microwave',78:'oven',79:'toaster',
                    80:'sink',81:'refrigerator',82:'blender',83:'book',84:'clock',85:'vase',86:'scissors',87:'teddy bear',88:'hair drier',89:'toothbrush',
                    90:'hair brush'}

# Same names as labels_to_names_0, shifted to be 1-indexed
labels_to_names = {1:'person',2:'bicycle',3:'car',4:'motorcycle',5:'airplane',6:'bus',7:'train',8:'truck',9:'boat',10:'traffic light',
                    11:'fire hydrant',12:'street sign',13:'stop sign',14:'parking meter',15:'bench',16:'bird',17:'cat',18:'dog',19:'horse',20:'sheep',
                    21:'cow',22:'elephant',23:'bear',24:'zebra',25:'giraffe',26:'hat',27:'backpack',28:'umbrella',29:'shoe',30:'eye glasses',
                    31:'handbag',32:'tie',33:'suitcase',34:'frisbee',35:'skis',36:'snowboard',37:'sports ball',38:'kite',39:'baseball bat',40:'baseball glove',
                    41:'skateboard',42:'surfboard',43:'tennis racket',44:'bottle',45:'plate',46:'wine glass',47:'cup',48:'fork',49:'knife',50:'spoon',
                    51:'bowl',52:'banana',53:'apple',54:'sandwich',55:'orange',56:'broccoli',57:'carrot',58:'hot dog',59:'pizza',60:'donut',
                    61:'cake',62:'chair',63:'couch',64:'potted plant',65:'bed',66:'mirror',67:'dining table',68:'window',69:'desk',70:'toilet',
                    71:'door',72:'tv',73:'laptop',74:'mouse',75:'remote',76:'keyboard',77:'cell phone',78:'microwave',79:'oven',80:'toaster',
                    81:'sink',82:'refrigerator',83:'blender',84:'book',85:'clock',86:'vase',87:'scissors',88:'teddy bear',89:'hair drier',90:'toothbrush',
                    91:'hair brush'}
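`labels_to_names_0` (0-indexed) and `labels_to_names` (1-indexed) hold the same COCO names shifted by one, so a class_id from the TensorFlow Faster-RCNN output can be looked up either as `labels_to_names_0[class_id]` or as `labels_to_names[class_id + 1]`. A small sketch of that relationship, using a hypothetical three-entry excerpt rather than the full dictionaries; `shift_label_map` is an illustrative helper name.

```python
def shift_label_map(label_map: dict, offset: int = 1) -> dict:
    """Return a copy of a class-id -> name map with every id shifted by offset."""
    return {class_id + offset: name for class_id, name in label_map.items()}

# Hypothetical excerpt of the 0-indexed map used for OpenCV TF Faster-RCNN
labels_0_excerpt = {0: 'person', 1: 'bicycle', 2: 'car'}
labels_1_excerpt = shift_label_map(labels_0_excerpt)

class_id = 2  # e.g. int(detection[1]) taken from cv_out
print(labels_0_excerpt[class_id], labels_1_excerpt[class_id + 1])  # car car
```

Applied to the full dictionaries above, `shift_label_map(labels_to_names_0) == labels_to_names` should hold, which is a quick way to check that the two tables stay in sync.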

Preprocess the image, feed it to the network, run object detection, and visualize the results on the image

img.shape
# (633, 806, 3)

# The original image is resized when it is fed into the Faster RCNN based network.
# Bounding boxes are predicted relative to the scaled image, so the original image shape is needed to map them back
rows = img.shape[0]
cols = img.shape[1]
# cv2.rectangle() draws directly on the image array passed in, so make a separate copy for drawing
draw_img = img.copy()

# Convert the original BGR image array to RGB and feed it in. The final classification layer of the TensorFlow Faster RCNN is not a Dense layer, so the input size does not need to be fixed.
cv_net.setInput(cv2.dnn.blobFromImage(img, swapRB=True, crop=False))

# Run object detection; the results are returned in cv_out
cv_out = cv_net.forward()
print(cv_out.shape)

# Colors for the bounding box border and the caption text (BGR)
green_color=(0, 255, 0)
red_color=(0, 0, 255)

# Iterate over the detected objects and extract their information
for detection in cv_out[0,0,:,:]:
    score = float(detection[2])
    class_id = int(detection[1])
    # Keep only detected objects with a score above 0.5
    if score > 0.5:
        # Boxes are predicted as ratios of the scaled input, so convert them back to the original image size
        left = detection[3] * cols
        top = detection[4] * rows
        right = detection[5] * cols
        bottom = detection[6] * rows
        # Convert class_id to a class name with the labels_to_names_0 dictionary
        caption = "{}: {:.4f}".format(labels_to_names_0[class_id], score)
        print(caption)
        # cv2.rectangle() draws the rectangle on draw_img; coordinates must be integers
        cv2.rectangle(draw_img, (int(left), int(top)), (int(right), int(bottom)), color=green_color, thickness=2)
        cv2.putText(draw_img, caption, (int(left), int(top - 5)), cv2.FONT_HERSHEY_SIMPLEX, 0.4, red_color, 1)

img_rgb = cv2.cvtColor(draw_img, cv2.COLOR_BGR2RGB)

plt.figure(figsize=(12, 12))
plt.imshow(img_rgb)

# (1, 1, 100, 7)
# person: 0.9998
# person: 0.9996
# person: 0.9993
# person: 0.9970
# person: 0.8995
# car: 0.8922
# car: 0.7602
# car: 0.7415
# car: 0.6930
# car: 0.6918
# car: 0.6896
# car: 0.6717
# car: 0.6521
# car: 0.5730
# car: 0.5679
# car: 0.5261
# car: 0.5012
# <matplotlib.image.AxesImage at 0x7f943493e810>

cv_out
# Each row: [image_id (0), class_id, class_confidence, left, top, right, bottom]; the 4 box coordinates are normalized to 0~1
# array([[[[0.00000000e+00, 0.00000000e+00, 9.99780715e-01,
#           2.80248284e-01, 4.11070347e-01, 4.66062069e-01,
#           8.59829903e-01],
#           ...
#           [0.00000000e+00, 8.60000000e+01, 1.52409787e-03,
#           6.01132810e-01, 7.01487005e-01, 7.45222032e-01,
#           8.93119752e-01]]]], dtype=float32)
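Each row of `cv_out[0, 0]` holds `[image_id, class_id, score, left, top, right, bottom]` with the box normalized to 0~1, which is why the loop multiplies the coordinates by `cols` and `rows`. A NumPy sketch that performs the same thresholding and scaling in a vectorized way; `parse_detections` is a hypothetical helper, and the two-row array is made-up sample data shaped like the (1, 1, N, 7) output, not real model output.

```python
import numpy as np

def parse_detections(cv_out, img_h, img_w, score_threshold=0.5):
    """Convert a (1, 1, N, 7) detection array into pixel boxes.

    Returns a list of (class_id, score, (left, top, right, bottom)).
    """
    dets = cv_out[0, 0]                          # (N, 7)
    dets = dets[dets[:, 2] > score_threshold]    # same strict '>' test as the loop
    scale = np.array([img_w, img_h, img_w, img_h])
    boxes = (dets[:, 3:7] * scale).astype(int)   # truncate like int() in the loop
    return [(int(d[1]), float(d[2]), tuple(int(v) for v in b))
            for d, b in zip(dets, boxes)]

# Made-up sample: one confident 'person' (class_id 0) and one low-score row
sample = np.array([[[[0, 0, 0.9, 0.25, 0.40, 0.50, 0.80],
                     [0, 86, 0.0015, 0.60, 0.70, 0.75, 0.90]]]], dtype=np.float32)
print(parse_detections(sample, img_h=633, img_w=806))
```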

Wrapping single-image object detection in a function

import time

def get_detected_img(cv_net, img_array, score_threshold, use_copied_array=True, is_print=True):
    
    rows = img_array.shape[0]
    cols = img_array.shape[1]
    
    draw_img = None
    if use_copied_array:
        draw_img = img_array.copy()
    else:
        draw_img = img_array
    
    cv_net.setInput(cv2.dnn.blobFromImage(img_array, swapRB=True, crop=False))
    
    start = time.time()
    cv_out = cv_net.forward()
    
    green_color=(0, 255, 0)
    red_color=(0, 0, 255)

    # Iterate over the detected objects and extract their information
    for detection in cv_out[0,0,:,:]:
        score = float(detection[2])
        class_id = int(detection[1])
        # Keep only detected objects whose score is above the score_threshold argument
        if score > score_threshold:
            # Boxes are predicted as ratios of the scaled input, so convert them back to the original image size
            left = detection[3] * cols
            top = detection[4] * rows
            right = detection[5] * cols
            bottom = detection[6] * rows
            # labels_to_names_0 is 0-indexed, so class_id maps to it directly (with the 1-indexed labels_to_names, use class_id + 1)
            caption = "{}: {:.4f}".format(labels_to_names_0[class_id], score)
            # Captions are printed regardless of is_print; is_print only controls the timing message below
            print(caption)
            # cv2.rectangle() draws the rectangle on draw_img; coordinates must be integers
            cv2.rectangle(draw_img, (int(left), int(top)), (int(right), int(bottom)), color=green_color, thickness=2)
            cv2.putText(draw_img, caption, (int(left), int(top - 5)), cv2.FONT_HERSHEY_SIMPLEX, 0.4, red_color, 1)
    if is_print:
        print('Detection time:', round(time.time() - start, 2), 'sec')

    return draw_img

# Load the image
img = cv2.imread('./data/beatles01.jpg')
print('image shape:', img.shape)

# Load the TensorFlow inference model
cv_net = cv2.dnn.readNetFromTensorflow('./pretrained/faster_rcnn_resnet50_coco_2018_01_28/frozen_inference_graph.pb',
                                     './pretrained/config_graph.pbtxt')
# Run object detection and visualize the result
draw_img = get_detected_img(cv_net, img, score_threshold=0.5, use_copied_array=True, is_print=True)

img_rgb = cv2.cvtColor(draw_img, cv2.COLOR_BGR2RGB)

plt.figure(figsize=(12, 12))
plt.imshow(img_rgb)

# image shape: (633, 806, 3)
# person: 0.9998
# person: 0.9996
# person: 0.9993
# person: 0.9970
# person: 0.8995
# car: 0.8922
# car: 0.7602
# car: 0.7415
# car: 0.6930
# car: 0.6918
# car: 0.6896
# car: 0.6717
# car: 0.6521
# car: 0.5730
# car: 0.5679
# car: 0.5261
# car: 0.5012
# Detection time: 8.62 sec
# <matplotlib.image.AxesImage at 0x7f94346af150>

# Test with another image
!wget -O ./data/baseball01.jpg https://raw.githubusercontent.com/chulminkw/DLCV/master/data/image/baseball01.jpg

img = cv2.imread('./data/baseball01.jpg')
print('image shape:', img.shape)

# Load the TensorFlow inference model
cv_net = cv2.dnn.readNetFromTensorflow('./pretrained/faster_rcnn_resnet50_coco_2018_01_28/frozen_inference_graph.pb',
                                     './pretrained/config_graph.pbtxt')
# Run object detection and visualize the result
draw_img = get_detected_img(cv_net, img, score_threshold=0.5, use_copied_array=True, is_print=True)

img_rgb = cv2.cvtColor(draw_img, cv2.COLOR_BGR2RGB)

plt.figure(figsize=(12, 12))
plt.imshow(img_rgb)

# image shape: (476, 735, 3)
# person: 0.9998
# person: 0.9997
# person: 0.9977
# sports ball: 0.8867
# baseball bat: 0.8420
# baseball glove: 0.9815
# Detection time: 7.56 sec
# <matplotlib.image.AxesImage at 0x7f9434623f50>


3-9. openCV의 DNN으로 Object Detection 구현 개요

2021. 9. 27. 12:29


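# Pros and cons of OpenCV DNN

1) Pros

- Inference can be implemented easily without a deep learning development framework

- Deep learning can easily be combined with the many computer vision processing APIs that OpenCV supports

2) Cons

- Weak GPU support

- The DNN module did not support NVIDIA GPUs in the past; NVIDIA GPU support was announced in October 2019 (the CUDA backend was developed as a Google Summer of Code project). Setting up and installing the environment is still difficult, but this is expected to improve gradually

- OpenCV provides no way to train models; it can only run inference

- CPU-based inference has become faster, but without easy NVIDIA GPU support, inference is much slower than in other deep learning frameworks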

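# Integration with other deep learning frameworks

- OpenCV does not create deep learning weight models itself; it converts and loads models created in other frameworks

- The dnn package provides readNetFromXXX() APIs for loading other frameworks' models saved as files; they take a weight model file and a config file, but the argument order differs between functions

- The weight model file is the other framework's model file; the config file is the other framework's model config file, re-converted for the DNN package

| Framework | Loading the other framework's model | Notes |
| --- | --- | --- |
| TensorFlow | cvNet = cv2.dnn.readNetFromTensorflow(model, config) | Many models available |
| Darknet | cvNet = cv2.dnn.readNetFromDarknet(config (.cfg), model (.weights)) | Only YOLO can be loaded |
| Torch | cvNet = cv2.dnn.readNetFromTorch(model) | Torch7 model file, no config file |
| Caffe | cvNet = cv2.dnn.readNetFromCaffe(config (.prototxt), model (.caffemodel)) | |

* For TensorFlow, the weights model is a model pretrained with the TensorFlow API (frozen graph), downloaded as a file

* The config is downloaded separately to run that pretrained model (frozen graph)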
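The readNetFromXXX functions do not all take their files in (weights, config) order: Darknet and Caffe take the config file first, and Torch takes only the model file. A small dispatch table can keep call sites uniform. This is a sketch under stated assumptions: `LOADERS` and `loader_call` are hypothetical helpers that only build the function name and argument tuple, so the order can be checked without OpenCV or model files, and the file names in the demo are placeholders.

```python
# framework -> (cv2.dnn function name, argument order as a tuple of roles)
LOADERS = {
    "tensorflow": ("readNetFromTensorflow", ("weights", "config")),  # .pb, .pbtxt
    "darknet": ("readNetFromDarknet", ("config", "weights")),        # .cfg, .weights
    "caffe": ("readNetFromCaffe", ("config", "weights")),            # .prototxt, .caffemodel
    "torch": ("readNetFromTorch", ("weights",)),                     # Torch7 model file only
}

def loader_call(framework, weights, config=None):
    """Return (function_name, args) for loading a model with cv2.dnn."""
    name, roles = LOADERS[framework.lower()]
    files = {"weights": weights, "config": config}
    args = tuple(files[role] for role in roles)
    if any(a is None for a in args):
        raise ValueError(f"{framework} needs: {', '.join(roles)}")
    return name, args

name, args = loader_call("darknet", "yolov3.weights", "yolov3.cfg")
print(name, args)

# With OpenCV installed (not run here):
#   import cv2
#   net = getattr(cv2.dnn, name)(*args)
```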

TensorFlow models supported by OpenCV

1) Focused on inference speed

- MobileNet-SSD v1

2) Focused on inference accuracy

- inception-SSD v2

- Faster RCNN inception v2

- Faster RCNN ResNet-50

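# Inference procedure with OpenCV DNN

import cv2

# Load the weight model file and the config file to create the inference network model
cvNet = cv2.dnn.readNetFromTensorflow('frozen_inference_graph.pb', 'graph.pbtxt')

img = cv2.imread('img.jpg')
rows, cols, channels = img.shape

# Preprocess the input image and feed it to the network
cvNet.setInput(cv2.dnn.blobFromImage(img, size=(300, 300), swapRB=True, crop=False))

# Extract the detection output from the inference network
networkOutput = cvNet.forward()

# Visualize object detection on the original image based on the extracted output
for detection in networkOutput[0, 0]:
    # detection = [image_id, class_id, score, left, top, right, bottom]; box values are 0~1 ratios
    score = float(detection[2])
    if score > 0.5:
        left, top = int(detection[3] * cols), int(detection[4] * rows)
        right, bottom = int(detection[5] * cols), int(detection[6] * rows)
        # Draw the bounding box on the original image
        cv2.rectangle(img, (left, top), (right, bottom), (0, 255, 0), thickness=2)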

# OpenCV blobFromImage()

Preprocesses an image so it can be fed into the network

- Resizes the image to a fixed size

- Scales the pixel values

- Provides options to convert BGR to RGB and to crop the image
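The preprocessing steps listed above can be approximated in NumPy to see what the network actually receives: an optional R/B swap, mean subtraction, scaling, and a reorder from (H, W, C) to a (1, C, H, W) batch. This is a rough sketch, not OpenCV's implementation: it assumes the image is already at the target size, so the resize and crop steps are skipped, and `blob_from_image_sketch` is a hypothetical name.

```python
import numpy as np

def blob_from_image_sketch(img, scalefactor=1.0, mean=(0.0, 0.0, 0.0), swap_rb=False):
    """Approximate cv2.dnn.blobFromImage for an image already at the target size.

    img: uint8 array of shape (H, W, 3) in BGR order.
    Returns a float32 array of shape (1, 3, H, W).
    """
    x = img.astype(np.float32)
    if swap_rb:
        x = x[:, :, ::-1]                      # BGR -> RGB
    x = (x - np.asarray(mean, dtype=np.float32)) * scalefactor
    return x.transpose(2, 0, 1)[np.newaxis]    # HWC -> NCHW with a batch axis

bgr = np.zeros((2, 3, 3), dtype=np.uint8)
bgr[..., 0] = 255                              # pure blue in BGR order
blob = blob_from_image_sketch(bgr, scalefactor=1 / 255.0, swap_rb=True)
print(blob.shape)                              # (1, 3, 2, 3)
```

After the swap, the blue values end up in the last channel of the blob. With OpenCV installed, `cv2.dnn.blobFromImage(bgr, 1 / 255.0, (3, 2), swapRB=True, crop=False)` (size given as width, height) should produce matching values.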

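# How to handle BGR ordering

OpenCV loads images in BGR order, while the TensorFlow-trained models expect RGB, so the channels must be swapped in one of two ways:

1) cv2.cvtColor(img, cv2.COLOR_BGR2RGB)

2) cvNet.setInput(cv2.dnn.blobFromImage(img, size=(300, 300), swapRB=True, crop=False))

With swapRB=True, blobFromImage converts the image to RGB while creating the blob that is fed into the network. Use only one of the two methods: applying both swaps the channels back to BGR.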
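Both options perform the same channel reversal: for a 3-channel image, converting BGR to RGB simply reverses the last axis. A NumPy sketch showing the reversal and why applying it twice undoes the conversion; the helper name `bgr_to_rgb` is hypothetical.

```python
import numpy as np

def bgr_to_rgb(img):
    """Reverse the channel axis of an (H, W, 3) image: BGR <-> RGB."""
    return img[..., ::-1]

pixel = np.array([[[10, 20, 30]]], dtype=np.uint8)  # B=10, G=20, R=30
rgb = bgr_to_rgb(pixel)
print(rgb[0, 0].tolist())              # [30, 20, 10]
print(bgr_to_rgb(rgb)[0, 0].tolist())  # [10, 20, 30]: swapping twice restores BGR
```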

# Video object detection using video stream capture

Uses OpenCV's VideoCapture() API to capture the video stream frame by frame and runs object detection on each captured image

input_video = cv2.VideoCapture(input_file_path)
while True:
    has_frame, frame = input_video.read()
    if not has_frame:
        break
    # run object detection on each frame
