

Ultralytics' YOLO v3 implementation is the predecessor of what became known as YOLO v5.

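Many new techniques appeared between v3 and v4.

Prediction accuracy improved, and prediction time was greatly reduced.

Edge AI integrated into custom iOS and Android apps for realtime 30FPS video inference

=> achieves decent FPS even on mobile devices

=> YOLO's long-standing weakness was poor performance on CPU, which it overcame by applying deep spot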

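# Features

Robust implementation: aimed at enterprise solutions

Various convenience features

- Visualization utilities for loss, weights, etc. during training

- Convenient generation and visualization of evaluation results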

Attribution-NonCommercial

Other posts in the 'Computer_Science > Computer Vision Guide' category

7-10. Precautions when training Object Detection models on a GPU	2021.10.27
7-8. OpenCV DNN based yolo v3 inference	2021.10.27
7-7. opencv dnn yolo object detection	2021.10.27
7-6~7. YOLO V3	2021.10.25
7-4~5. YOLO V2	2021.10.25


Image object detection with OpenCV and Darknet YOLO

Object detection with YOLO and tiny-YOLO

Download and view the image used as input

!mkdir /content/data
!wget -O ./data/beatles01.jpg https://raw.githubusercontent.com/chulminkw/DLCV/master/data/image/beatles01.jpg

Download the COCO-trained inference model and its config file from the Darknet YOLO site, then use them to create an inference model in OpenCV.

The download URLs are listed at https://pjreddie.com/darknet/yolo/.
The pretrained weights can be downloaded with wget https://pjreddie.com/media/files/yolov3.weights
The config file for the pretrained model can be downloaded from https://raw.githubusercontent.com/pjreddie/darknet/master/cfg/yolov3.cfg
wget https://github.com/pjreddie/darknet/blob/master/cfg/yolov3.cfg?raw=true -O ./yolov3.cfg
Note: in readNetFromDarknet(config file, weights file), the config file argument comes before the weights file argument.
The pretrained tiny YOLO weights can be downloaded with wget https://pjreddie.com/media/files/yolov3-tiny.weights
Its config file can be downloaded with wget https://github.com/pjreddie/darknet/blob/master/cfg/yolov3-tiny.cfg?raw=true -O ./yolov3-tiny.cfg

### Download the yolo weight and config files pretrained on the COCO dataset and save them under /content/pretrained.
!mkdir ./pretrained
!echo "##### downloading pretrained yolo/tiny-yolo weight file and config file"
!wget -O /content/pretrained/yolov3.weights https://pjreddie.com/media/files/yolov3.weights
!wget -O /content/pretrained/yolov3.cfg https://github.com/pjreddie/darknet/blob/master/cfg/yolov3.cfg?raw=true 

!wget -O /content/pretrained/yolov3-tiny.weights https://pjreddie.com/media/files/yolov3-tiny.weights
!wget -O /content/pretrained/yolov3-tiny.cfg https://github.com/pjreddie/darknet/blob/master/cfg/yolov3-tiny.cfg?raw=true

!ls /content/pretrained

Load the YOLO inference network model with readNetFromDarknet(config file, weights file)

import os
import cv2

weights_path = '/content/pretrained/yolov3.weights'
config_path =  '/content/pretrained/yolov3.cfg'
# The config file argument comes first.
cv_net_yolo = cv2.dnn.readNetFromDarknet(config_path, weights_path)

Mapping between COCO class ids and class names

labels_to_names_seq = {0:'person',1:'bicycle',2:'car',3:'motorbike',4:'aeroplane',5:'bus',6:'train',7:'truck',8:'boat',9:'traffic light',10:'fire hydrant',
                        11:'stop sign',12:'parking meter',13:'bench',14:'bird',15:'cat',16:'dog',17:'horse',18:'sheep',19:'cow',20:'elephant',
                        21:'bear',22:'zebra',23:'giraffe',24:'backpack',25:'umbrella',26:'handbag',27:'tie',28:'suitcase',29:'frisbee',30:'skis',
                        31:'snowboard',32:'sports ball',33:'kite',34:'baseball bat',35:'baseball glove',36:'skateboard',37:'surfboard',38:'tennis racket',39:'bottle',40:'wine glass',
                        41:'cup',42:'fork',43:'knife',44:'spoon',45:'bowl',46:'banana',47:'apple',48:'sandwich',49:'orange',50:'broccoli',
                        51:'carrot',52:'hot dog',53:'pizza',54:'donut',55:'cake',56:'chair',57:'sofa',58:'pottedplant',59:'bed',60:'diningtable',
                        61:'toilet',62:'tvmonitor',63:'laptop',64:'mouse',65:'remote',66:'keyboard',67:'cell phone',68:'microwave',69:'oven',70:'toaster',
                        71:'sink',72:'refrigerator',73:'book',74:'clock',75:'vase',76:'scissors',77:'teddy bear',78:'hair drier',79:'toothbrush' }

Extract the result data from the 3 output layers (one per scale)

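import numpy as np

layer_names = cv_net_yolo.getLayerNames()
print('### yolo v3 layer name:', layer_names)
print('final output layer id:', cv_net_yolo.getUnconnectedOutLayers())
# getUnconnectedOutLayers() returns 1-based layer ids with shape (N, 1) in older OpenCV and (N,) in newer versions, so flatten before indexing
print('final output layer name:', [layer_names[i - 1] for i in np.array(cv_net_yolo.getUnconnectedOutLayers()).flatten()])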

### yolo v3 layer name: ['conv_0', 'bn_0', 'relu_0', 'conv_1', 'bn_1', 'relu_1', 'conv_2', 'bn_2', 'relu_2', 'conv_3', 'bn_3', 'relu_3', 'shortcut_4', 'conv_5', 'bn_5', 'relu_5', 'conv_6', 'bn_6', 'relu_6', 'conv_7', 'bn_7', 'relu_7', 'shortcut_8', 'conv_9', 'bn_9', 'relu_9', 'conv_10', 'bn_10', 'relu_10', 'shortcut_11', 'conv_12', 'bn_12', 'relu_12', 'conv_13', 'bn_13', 'relu_13', 'conv_14', 'bn_14', 'relu_14', 'shortcut_15', 'conv_16', 'bn_16', 'relu_16', 'conv_17', 'bn_17', 'relu_17', 'shortcut_18', 'conv_19', 'bn_19', 'relu_19', 'conv_20', 'bn_20', 'relu_20', 'shortcut_21', 'conv_22', 'bn_22', 'relu_22', 'conv_23', 'bn_23', 'relu_23', 'shortcut_24', 'conv_25', 'bn_25', 'relu_25', 'conv_26', 'bn_26', 'relu_26', 'shortcut_27', 'conv_28', 'bn_28', 'relu_28', 'conv_29', 'bn_29', 'relu_29', 'shortcut_30', 'conv_31', 'bn_31', 'relu_31', 'conv_32', 'bn_32', 'relu_32', 'shortcut_33', 'conv_34', 'bn_34', 'relu_34', 'conv_35', 'bn_35', 'relu_35', 'shortcut_36', 'conv_37', 'bn_37', 'relu_37', 'conv_38', 'bn_38', 'relu_38', 'conv_39', 'bn_39', 'relu_39', 'shortcut_40', 'conv_41', 'bn_41', 'relu_41', 'conv_42', 'bn_42', 'relu_42', 'shortcut_43', 'conv_44', 'bn_44', 'relu_44', 'conv_45', 'bn_45', 'relu_45', 'shortcut_46', 'conv_47', 'bn_47', 'relu_47', 'conv_48', 'bn_48', 'relu_48', 'shortcut_49', 'conv_50', 'bn_50', 'relu_50', 'conv_51', 'bn_51', 'relu_51', 'shortcut_52', 'conv_53', 'bn_53', 'relu_53', 'conv_54', 'bn_54', 'relu_54', 'shortcut_55', 'conv_56', 'bn_56', 'relu_56', 'conv_57', 'bn_57', 'relu_57', 'shortcut_58', 'conv_59', 'bn_59', 'relu_59', 'conv_60', 'bn_60', 'relu_60', 'shortcut_61', 'conv_62', 'bn_62', 'relu_62', 'conv_63', 'bn_63', 'relu_63', 'conv_64', 'bn_64', 'relu_64', 'shortcut_65', 'conv_66', 'bn_66', 'relu_66', 'conv_67', 'bn_67', 'relu_67', 'shortcut_68', 'conv_69', 'bn_69', 'relu_69', 'conv_70', 'bn_70', 'relu_70', 'shortcut_71', 'conv_72', 'bn_72', 'relu_72', 'conv_73', 'bn_73', 'relu_73', 'shortcut_74', 'conv_75', 'bn_75', 'relu_75', 'conv_76', 
'bn_76', 'relu_76', 'conv_77', 'bn_77', 'relu_77', 'conv_78', 'bn_78', 'relu_78', 'conv_79', 'bn_79', 'relu_79', 'conv_80', 'bn_80', 'relu_80', 'conv_81', 'permute_82', 'yolo_82', 'identity_83', 'conv_84', 'bn_84', 'relu_84', 'upsample_85', 'concat_86', 'conv_87', 'bn_87', 'relu_87', 'conv_88', 'bn_88', 'relu_88', 'conv_89', 'bn_89', 'relu_89', 'conv_90', 'bn_90', 'relu_90', 'conv_91', 'bn_91', 'relu_91', 'conv_92', 'bn_92', 'relu_92', 'conv_93', 'permute_94', 'yolo_94', 'identity_95', 'conv_96', 'bn_96', 'relu_96', 'upsample_97', 'concat_98', 'conv_99', 'bn_99', 'relu_99', 'conv_100', 'bn_100', 'relu_100', 'conv_101', 'bn_101', 'relu_101', 'conv_102', 'bn_102', 'relu_102', 'conv_103', 'bn_103', 'relu_103', 'conv_104', 'bn_104', 'relu_104', 'conv_105', 'permute_106', 'yolo_106']
final output layer id: [[200]
 [227]
 [254]]
final output layer name: ['yolo_82', 'yolo_94', 'yolo_106']

# From all Darknet layers, keep only the output layers that detect on the 13x13, 26x26 and 52x52 grids
layer_names = cv_net_yolo.getLayerNames()
outlayer_names = list(cv_net_yolo.getUnconnectedOutLayersNames())
print('output_layer name:', outlayer_names)

img = cv2.imread('./data/beatles01.jpg')
img_rgb = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)

# The loaded model is YOLOv3 416 x 416. Resize the original image to (416, 416), scale pixels to 0~1, and convert BGR to RGB when building the input blob
cv_net_yolo.setInput(cv2.dnn.blobFromImage(img, scalefactor=1/255.0, size=(416, 416), swapRB=True, crop=False))

# Run object detection; the results are returned in cv_outs
cv_outs = cv_net_yolo.forward(outlayer_names)
print('cv_outs type:', type(cv_outs), 'number of elements in cv_outs:', len(cv_outs))
print(cv_outs[0].shape, cv_outs[1].shape, cv_outs[2].shape)
print(cv_outs)

output_layer name: ['yolo_82', 'yolo_94', 'yolo_106']
cv_outs type: <class 'list'> number of elements in cv_outs: 3
(507, 85) (2028, 85) (8112, 85)
[array([[0.03803749, 0.0470234 , 0.3876816 , ..., 0.        , 0.        ,
        0.        ],
       [0.04705836, 0.03385845, 0.2689603 , ..., 0.        , 0.        ,
        0.        ],
       [0.04941482, 0.03791986, 0.7151826 , ..., 0.        , 0.        ,
        0.        ],
       ...,
       [0.9585798 , 0.9460585 , 0.35046625, ..., 0.        , 0.        ,
        0.        ],
       [0.96015006, 0.9630715 , 0.29724196, ..., 0.        , 0.        ,
        0.        ],
       [0.9663636 , 0.9657401 , 0.79356086, ..., 0.        , 0.        ,
        0.        ]], dtype=float32), array([[0.01637367, 0.02457962, 0.04684627, ..., 0.        , 0.        ,
        0.        ],
       [0.01678773, 0.01458679, 0.46203217, ..., 0.        , 0.        ,
        0.        ],
       [0.02219823, 0.01376948, 0.0662718 , ..., 0.        , 0.        ,
        0.        ],
       ...,
       [0.97421783, 0.97686917, 0.04557502, ..., 0.        , 0.        ,
        0.        ],
       [0.98114103, 0.9762939 , 0.33147967, ..., 0.        , 0.        ,
        0.        ],
       [0.97884774, 0.98335934, 0.07896643, ..., 0.        , 0.        ,
        0.        ]], dtype=float32), array([[0.00859342, 0.00442324, 0.01781066, ..., 0.        , 0.        ,
        0.        ],
       [0.010101  , 0.01088366, 0.01980249, ..., 0.        , 0.        ,
        0.        ],
       [0.01071996, 0.00756924, 0.20484295, ..., 0.        , 0.        ,
        0.        ],
       ...,
       [0.9901033 , 0.9906244 , 0.01741469, ..., 0.        , 0.        ,
        0.        ],
       [0.9907341 , 0.9876037 , 0.01802968, ..., 0.        , 0.        ,
        0.        ],
       [0.98756605, 0.99131656, 0.17707303, ..., 0.        , 0.        ,
        0.        ]], dtype=float32)]

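Collect the object detection info from all 3 scale output layers.

Convert each box from its center, width and height into its top-left corner plus width and height, the [left, top, width, height] format that cv2.dnn.NMSBoxes expects.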

import numpy as np

# The original image is resized to (416, 416) before being fed to the network.
# The predicted boxes are relative to that resized input, so the original image shape is needed to scale them back
rows = img.shape[0]
cols = img.shape[1]

conf_threshold = 0.5
nms_threshold = 0.4

# Colors for the bounding box outline and the caption text (BGR)
green_color=(0, 255, 0)
red_color=(0, 0, 255)

class_ids = []
confidences = []
boxes = []

# Extract the detection info of the objects detected by each of the 3 output layers
for ix, output in enumerate(cv_outs):
    print('output shape:', output.shape)
    # Iterate over every anchor prediction in the feature map (13x13x3, 26x26x3, 52x52x3 rows) and extract detected objects
    for jx, detection in enumerate(output):
        # The class scores are the values from index 5 onward of the detection array
        class_scores = detection[5:]
        # The highest value in class_scores is the class confidence, and its index is the class id
        class_id = np.argmax(class_scores)
        confidence = class_scores[class_id]

        # Skip detections whose confidence does not exceed conf_threshold
        if confidence > conf_threshold:
            print('ix:', ix, 'jx:', jx, 'class_id', class_id, 'confidence:', confidence)
            # A detection holds the object's normalized center, width and height, not scaled corner coordinates
            # Scale it to the original image and compute the top-left corner
            center_x = int(detection[0] * cols)
            center_y = int(detection[1] * rows)
            width = int(detection[2] * cols)
            height = int(detection[3] * rows)
            left = int(center_x - width / 2)
            top = int(center_y - height / 2)
            # Collect class id, confidence and box coordinates of the objects detected by all 3 output layers
            class_ids.append(class_id)
            confidences.append(float(confidence))
            boxes.append([left, top, width, height])
            


output shape: (507, 85)
ix: 0 jx: 319 class_id 0 confidence: 0.9317017
ix: 0 jx: 328 class_id 0 confidence: 0.96232384
ix: 0 jx: 334 class_id 0 confidence: 0.9984486
ix: 0 jx: 343 class_id 0 confidence: 0.9978433
output shape: (2028, 85)
ix: 1 jx: 831 class_id 2 confidence: 0.8169964
ix: 1 jx: 955 class_id 2 confidence: 0.8472691
ix: 1 jx: 1262 class_id 0 confidence: 0.9877816
ix: 1 jx: 1280 class_id 0 confidence: 0.99840033
ix: 1 jx: 1295 class_id 0 confidence: 0.6916561
ix: 1 jx: 1313 class_id 0 confidence: 0.9205806
output shape: (8112, 85)
ix: 2 jx: 2883 class_id 2 confidence: 0.9077368
ix: 2 jx: 2886 class_id 2 confidence: 0.63324535
ix: 2 jx: 3048 class_id 2 confidence: 0.9412014
ix: 2 jx: 3051 class_id 2 confidence: 0.615405
ix: 2 jx: 3184 class_id 2 confidence: 0.95041
ix: 2 jx: 3214 class_id 2 confidence: 0.9064125
ix: 2 jx: 3373 class_id 2 confidence: 0.68998003
ix: 2 jx: 3394 class_id 0 confidence: 0.76407045
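The per-row loop above can also be written with NumPy array operations. This is a sketch that assumes every output array uses the 85-column layout described in these notes (4 box values, 1 objectness score, 80 class scores); `extract_detections` is a hypothetical helper name, not part of OpenCV.

```python
import numpy as np

def extract_detections(outputs, img_w, img_h, conf_threshold=0.5):
    """Vectorized version of the per-row loop (hypothetical helper).

    outputs: list/tuple of (N, 85) arrays; columns 0-3 = normalized (cx, cy, w, h),
             column 4 = objectness, columns 5: = class scores.
    Returns (boxes, confidences, class_ids) with boxes as [left, top, width, height] ints.
    """
    dets = np.concatenate(outputs, axis=0)               # stack the 3 scales into one array
    class_scores = dets[:, 5:]
    class_ids = np.argmax(class_scores, axis=1)          # best class per row
    confidences = class_scores[np.arange(len(dets)), class_ids]
    keep = confidences > conf_threshold                  # same threshold test as the loop
    dets, class_ids, confidences = dets[keep], class_ids[keep], confidences[keep]

    # Scale normalized center/size back to the original image; astype(int) truncates like int()
    cx = (dets[:, 0] * img_w).astype(int)
    cy = (dets[:, 1] * img_h).astype(int)
    w = (dets[:, 2] * img_w).astype(int)
    h = (dets[:, 3] * img_h).astype(int)
    left = (cx - w / 2).astype(int)
    top = (cy - h / 2).astype(int)
    boxes = np.stack([left, top, w, h], axis=1).tolist()
    return boxes, confidences.astype(float).tolist(), class_ids.tolist()
```

With the notebook's variables it would be called as `boxes, confidences, class_ids = extract_detections(cv_outs, cols, rows, conf_threshold)`, and the three lists can be passed to `cv2.dnn.NMSBoxes` unchanged.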

Use NMS to remove overlapping bounding boxes among the objects detected by the output layers.

conf_threshold = 0.5
nms_threshold = 0.4
idxs = cv2.dnn.NMSBoxes(boxes, confidences, conf_threshold, nms_threshold)

idxs
array([[ 2],
       [ 7],
       [ 3],
       [ 6],
       [14],
       [12],
       [10],
       [15],
       [ 5],
       [ 4],
       [17],
       [16],
       [11],
       [13]], dtype=int32)

idxs.flatten()

array([ 2,  7,  3,  6, 14, 12, 10, 15,  5,  4, 17, 16, 11, 13],
      dtype=int32)

Use the idxs that survived NMS to pull the matching object info from boxes, class_ids and confidences, then visualize it.

import matplotlib.pyplot as plt

# cv2.rectangle() draws directly onto the image array passed in, so make a separate copy for drawing.
draw_img = img.copy()

# Use the idxs that survived NMS to pull the matching object info from boxes, class_ids and confidences, then visualize it.
if len(idxs) > 0:
    for i in idxs.flatten():
        box = boxes[i]
        left = box[0]
        top = box[1]
        width = box[2]
        height = box[3]
        # Convert class_id to a class name with labels_to_names_seq. Darknet COCO class ids are 0-based here, so no +1 offset is needed.
        caption = "{}: {:.4f}".format(labels_to_names_seq[class_ids[i]], confidences[i])
        # cv2.rectangle() draws the box on draw_img; coordinates must be integers.
        cv2.rectangle(draw_img, (int(left), int(top)), (int(left+width), int(top+height)), color=green_color, thickness=2)
        cv2.putText(draw_img, caption, (int(left), int(top - 5)), cv2.FONT_HERSHEY_SIMPLEX, 0.5, red_color, 1)
        print(caption)

img_rgb = cv2.cvtColor(draw_img, cv2.COLOR_BGR2RGB)
plt.figure(figsize=(12, 12))
plt.imshow(img_rgb)
person: 0.9984
person: 0.9984
person: 0.9978
person: 0.9878
car: 0.9504
car: 0.9412
car: 0.9077
car: 0.9064
car: 0.8473
car: 0.8170
person: 0.7641
car: 0.6900
car: 0.6332
car: 0.6154

Create a get_detected_img() function that runs YOLO detection on a single image.

import time
import numpy as np
import cv2

def get_detected_img(cv_net, img_array, conf_threshold, nms_threshold, is_print=True):
    
    # The original image is resized to (416, 416) before being fed to the network.
    # The predicted boxes are relative to that resized input, so the original image shape is needed to scale them back
    rows = img_array.shape[0]
    cols = img_array.shape[1]
    
    draw_img = img_array.copy()
    
    # Names of the output layers that detect on the 13x13, 26x26 and 52x52 grids
    outlayer_names = list(cv_net.getUnconnectedOutLayersNames())
    
    # The loaded model is YOLOv3 416 x 416: resize to (416, 416), scale pixels to 0~1 and convert BGR to RGB
    cv_net.setInput(cv2.dnn.blobFromImage(img_array, scalefactor=1/255.0, size=(416, 416), swapRB=True, crop=False))
    start = time.time()
    # Run object detection once; the results are returned in cv_outs
    cv_outs = cv_net.forward(outlayer_names)
    # Colors for the bounding box outline and the caption text (BGR)
    green_color=(0, 255, 0)
    red_color=(0, 0, 255)

    class_ids = []
    confidences = []
    boxes = []

    # Extract the detection info from each of the 3 output layers
    for ix, output in enumerate(cv_outs):
        # Iterate over the detections of this output layer
        for jx, detection in enumerate(output):
            scores = detection[5:]
            class_id = np.argmax(scores)
            confidence = scores[class_id]
            # Skip detections whose confidence does not exceed conf_threshold
            if confidence > conf_threshold:
                #print('ix:', ix, 'jx:', jx, 'class_id', class_id, 'confidence:', confidence)
                # A detection holds the object's normalized center, width and height, not scaled corner coordinates
                # Scale it to the original image and compute the top-left corner
                center_x = int(detection[0] * cols)
                center_y = int(detection[1] * rows)
                width = int(detection[2] * cols)
                height = int(detection[3] * rows)
                left = int(center_x - width / 2)
                top = int(center_y - height / 2)
                # Collect class id, confidence and box coordinates from all 3 output layers
                class_ids.append(class_id)
                confidences.append(float(confidence))
                boxes.append([left, top, width, height])
    
    # Keep only the boxes that survive NMS, then draw them with their class names and confidences
    idxs = cv2.dnn.NMSBoxes(boxes, confidences, conf_threshold, nms_threshold)
    if len(idxs) > 0:
        for i in idxs.flatten():
            box = boxes[i]
            left = box[0]
            top = box[1]
            width = box[2]
            height = box[3]
            # Convert class_id to a class name with labels_to_names_seq (Darknet COCO ids are 0-based, so no +1 offset)
            caption = "{}: {:.4f}".format(labels_to_names_seq[class_ids[i]], confidences[i])
            # cv2.rectangle() draws the box on draw_img; coordinates must be integers.
            cv2.rectangle(draw_img, (int(left), int(top)), (int(left+width), int(top+height)), color=green_color, thickness=2)
            cv2.putText(draw_img, caption, (int(left), int(top - 5)), cv2.FONT_HERSHEY_SIMPLEX, 0.5, red_color, 1)

    if is_print:
        print('Detection time:', round(time.time() - start, 2), "sec")
    return draw_img

import cv2
import matplotlib.pyplot as plt
%matplotlib inline
import numpy as np
import time
import os

# Load the image
img = cv2.imread('/content/data/beatles01.jpg')

weights_path = '/content/pretrained/yolov3.weights'
config_path =  '/content/pretrained/yolov3.cfg'

# Load the pretrained Darknet YOLO model
cv_net_yolo = cv2.dnn.readNetFromDarknet(config_path, weights_path)

conf_threshold = 0.5
nms_threshold = 0.4
# Run object detection and visualize the result
draw_img = get_detected_img(cv_net_yolo, img, conf_threshold=conf_threshold, nms_threshold=nms_threshold, is_print=True)

img_rgb = cv2.cvtColor(draw_img, cv2.COLOR_BGR2RGB)

plt.figure(figsize=(12, 12))
plt.imshow(img_rgb)

Detection time: 4.08 sec

Object detection with tiny YOLO.

import cv2
import matplotlib.pyplot as plt
%matplotlib inline
import numpy as np
import time
import os

# Load the image
img = cv2.imread('/content/data/beatles01.jpg')

weights_path = '/content/pretrained/yolov3-tiny.weights'
config_path =  '/content/pretrained/yolov3-tiny.cfg'

# Load the pretrained Darknet tiny YOLO model
cv_net_yolo_tiny = cv2.dnn.readNetFromDarknet(config_path, weights_path)

conf_threshold = 0.2
nms_threshold = 0.4
# Run object detection and visualize the result
draw_img = get_detected_img(cv_net_yolo_tiny, img, conf_threshold=conf_threshold, nms_threshold=nms_threshold, is_print=True)

img_rgb = cv2.cvtColor(draw_img, cv2.COLOR_BGR2RGB)

plt.figure(figsize=(12, 12))
plt.imshow(img_rgb)

Detection time: 0.48 sec


Points to note when implementing YOLO inference with OpenCV DNN

- OpenCV YOLO inference code differs from the usual OpenCV DNN inference code

- Object detection info is extracted directly from the 3 output feature maps

How to load the pretrained inference model

- The weight model file and the config file can be downloaded from the Darknet site

- Load the pretrained inference model with cv2.dnn.readNetFromDarknet(config file, weight model file)

- In readNetFromDarknet(config file, weight model file), the config file argument comes before the weight model file argument

Output layers: layer 82, layer 94 and layer 106 (yolo_82, yolo_94, yolo_106)

The user must extract the detection results from the 3 output layers, one per scale

The user must filter the final results with NMS

Bounding box info is extracted directly from an 85-value vector

- Extracting bbox info from a model pretrained on the COCO dataset

* Each bounding box is described by 85 values: 4 coordinates, 1 object score, and 80 class scores

* The class id and class score are the index and value of the highest entry in the 80-element class-score vector

Transforming the extracted coordinates

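$$
\begin{aligned}
b_x &= \sigma(t_x) + c_x \\
b_y &= \sigma(t_y) + c_y \\
b_w &= p_w e^{t_w} \\
b_h &= p_h e^{t_h}
\end{aligned}
$$

where $\sigma$ is the sigmoid function, $(c_x, c_y)$ is the offset of the grid cell's top-left corner, $(p_w, p_h)$ is the anchor box size, and $(t_x, t_y, t_w, t_h)$ are the offsets predicted by the model.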

* The coordinates extracted with OpenCV YOLO are the detected object's center, width and height, so they must be converted to top-left / bottom-right coordinates.

OpenCV YOLO inference implementation steps


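After YOLO v2, RetinaNet (2017.08) achieved very high accuracy among one-stage detectors, largely because it adopted FPN.

YOLO v3 therefore also adopted FPN to raise its accuracy, establishing itself as a real-time detector.

- The backbone produces a feature map (fm) at roughly the original scale, then one at half that scale, then one at half again

- The top-level fm (the last "half again" one) is abstract but the most thoroughly learned, so it is the fm on which object detection is usually performed. However, detecting only at the top level tends to find only large objects, which is why SSD also makes predictions from lower-level fms.

- FPN goes further: because the levels differ in size, it upsamples the higher-level fm by 2x after the convolutions, merges it with the lower-level fm, and predicts from the merged fm.

- Predictions can then reflect both abstract and detailed features.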
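The upsample-and-merge step can be sketched with NumPy. This is a simplification: the convolutions applied before and after merging are omitted, nearest-neighbor upsampling is assumed, and the helper names and shapes are illustrative. The original FPN adds the two maps element-wise, while YOLO v3 concatenates them along the channel axis (the `upsample_85` and `concat_86` layers in the layer list printed in the OpenCV inference post).

```python
import numpy as np

def upsample2x(fm):
    """Nearest-neighbor 2x upsampling of an (H, W, C) feature map."""
    return fm.repeat(2, axis=0).repeat(2, axis=1)

def merge_levels(top_fm, lower_fm, mode="concat"):
    """Upsample the higher-level (smaller) map and merge it with the lower-level map."""
    up = upsample2x(top_fm)
    if up.shape[:2] != lower_fm.shape[:2]:
        raise ValueError("spatial sizes must match after upsampling")
    if mode == "add":
        return up + lower_fm                         # FPN-style element-wise sum (same depth required)
    return np.concatenate([up, lower_fm], axis=-1)   # YOLO v3-style channel concatenation

rng = np.random.default_rng(0)
top = rng.random((13, 13, 256))     # abstract, low-resolution map (illustrative shape)
lower = rng.random((26, 26, 512))   # detailed, higher-resolution map (illustrative shape)
merged = merge_levels(top, lower)
print(merged.shape)                 # (26, 26, 768)
```

Concatenation keeps both sets of channels side by side and leaves the mixing to the following convolutions, whereas addition requires both maps to have the same depth.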

Comparison of YOLO versions

| Item | v1 | v2 | v3 |
|---|---|---|---|
| Input image size | 448x448 | 416x416 | 416x416 |
| Feature extractor | Inception variant | DarkNet19 | DarkNet53 (influenced by ResNet) |
| Anchor boxes per grid cell | 2 => 2 box predictions per cell (v1 uses no learned anchor priors) | 5 | 3 per output feature map, with different sizes and scales, 9 in total |
| Anchor box selection | - | K-means clustering | K-means clustering |
| Output feature map size (excluding depth) | 7 x 7 | 13 x 13 | 3 feature maps: 13 x 13, 26 x 26, 52 x 52 |
| Feature map scaling technique | - | - | FPN (Feature Pyramid Network) |

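- FPN

- Stronger backbone: Darknet-53 (53 layers with learnable weights)

- Feature maps at 13x13 and then 2x larger at each step (26x26, 52x52)

- 9 anchor boxes

- Multi-label prediction: each object's labels are predicted with independent sigmoid-based logistic classifiers instead of softmax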
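The difference between softmax and independent sigmoids shows up on a toy example. The logits and label names below are made up for illustration: with softmax the class scores compete and at most one can exceed 0.5, whereas sigmoid scores each label on its own, so overlapping labels can all be predicted.

```python
import numpy as np

def softmax(logits):
    e = np.exp(logits - np.max(logits))   # subtract the max for numerical stability
    return e / e.sum()

def sigmoid(logits):
    return 1.0 / (1.0 + np.exp(-logits))

# Made-up logits for three labels that can co-occur, e.g. "person", "woman", "dog"
logits = np.array([3.0, 2.5, -2.0])
soft = softmax(logits)    # scores compete and sum to 1
sig = sigmoid(logits)     # each label is scored independently
print(soft.round(3), sig.round(3))
print("softmax labels > 0.5:", np.where(soft > 0.5)[0])   # only index 0
print("sigmoid labels > 0.5:", np.where(sig > 0.5)[0])    # indices 0 and 1
```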

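Model Architecture

upsampled feature map + feature map => predict

YOLO v3 network structure

- output : 13x13, 26x26, 52x52

- Light green (in the architecture diagram): upsampled feature maps

Output Feature map

25 + 25 + 25 => depth 75: 3 anchors x (4 box coordinates + 1 objectness score + 20 class scores) for the 20 PASCAL VOC classes. With COCO's 80 classes, the depth is 3 x 85 = 255.

13x13x75

26x26x75

52x52x75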
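The depth arithmetic can be checked with two small helpers (hypothetical names). They also explain why the OpenCV output arrays had 507, 2028 and 8112 rows: one row per anchor, with 3 anchors per cell on the 13x13, 26x26 and 52x52 grids.

```python
def yolo_output_depth(num_anchors, num_classes):
    # Each anchor predicts 4 box values + 1 objectness score + one score per class
    return num_anchors * (4 + 1 + num_classes)

def num_predictions(grid_sizes, anchors_per_cell=3):
    # One prediction row per anchor per grid cell, summed over all output scales
    return sum(anchors_per_cell * g * g for g in grid_sizes)

print(yolo_output_depth(3, 20))    # 75  (YOLO v3, PASCAL VOC)
print(yolo_output_depth(3, 80))    # 255 (YOLO v3, COCO)
print(yolo_output_depth(5, 20))    # 125 (YOLO v2, PASCAL VOC)
print([3 * g * g for g in (13, 26, 52)], num_predictions((13, 26, 52)))   # [507, 2028, 8112] 10647
```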

Darknet-53 characteristics (53 convolutional layers)

Training

- Data Augmentation

- batch normalization


YOLO v2 detection speed and accuracy

- Along with SSD, it is overwhelmingly fast.

- Among YOLO models, tiny YOLO is even faster.

YOLO v2 features

- Batch Normalization

* conv -> batch normalization -> activation (ReLU)
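The conv -> batch normalization -> activation ordering can be sketched in NumPy, with the convolution replaced by random stand-in activations for brevity. This shows training-mode batch normalization (statistics of the current batch); at inference time frameworks use running averages instead, which the sketch omits, and the function names are illustrative.

```python
import numpy as np

def batch_norm(x, gamma, beta, eps=1e-5):
    """Training-mode batch norm over an (N, C) batch: per-channel normalize, then scale and shift."""
    mean = x.mean(axis=0)                    # per-channel mean over the batch
    var = x.var(axis=0)                      # per-channel variance over the batch
    x_hat = (x - mean) / np.sqrt(var + eps)
    return gamma * x_hat + beta              # learnable scale (gamma) and shift (beta)

def relu(x):
    return np.maximum(x, 0.0)

rng = np.random.default_rng(0)
conv_out = rng.normal(loc=3.0, scale=2.0, size=(64, 8))   # stand-in for conv outputs
gamma, beta = np.ones(8), np.zeros(8)
bn_out = batch_norm(conv_out, gamma, beta)   # about zero mean, unit std per channel
act_out = relu(bn_out)
print(bn_out.mean(axis=0).round(6), bn_out.std(axis=0).round(3))
```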

- High Resolution classifier: the classifier part of the network is fine-tuned at a higher resolution (448 x 448)

- The classification layer is changed from a fully connected dense layer to fully convolutional layers, so the network can be trained with images of different sizes

- Object detection with 5 anchor boxes per grid cell on a 13 x 13 feature map

* Anchor box sizes and aspect ratios are set by k-means clustering

- Direct location prediction keeps the predicted bbox center (x, y) from leaving its cell

- Adopts the Darknet-19 classification model => better accuracy and speed

Detecting multiple objects per cell with YOLO v2 anchor boxes

- As in SSD, multiple anchors per cell let each cell detect multiple objects

- Anchor boxes are computed by k-means clustering the dataset's bounding boxes by size and aspect ratio into 5 clusters
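The anchor clustering can be sketched as k-means over ground-truth (width, height) pairs with the distance 1 - IoU used in the YOLO v2 paper, so large boxes do not dominate the way they would with Euclidean distance. The data is synthetic, the centroids are updated with the mean, and the function names are hypothetical.

```python
import numpy as np

def iou_wh(boxes, centroids):
    """IoU between (w, h) pairs, as if every box shared the same center."""
    inter = (np.minimum(boxes[:, None, 0], centroids[None, :, 0])
             * np.minimum(boxes[:, None, 1], centroids[None, :, 1]))
    area_b = boxes[:, 0] * boxes[:, 1]
    area_c = centroids[:, 0] * centroids[:, 1]
    return inter / (area_b[:, None] + area_c[None, :] - inter)

def kmeans_anchors(boxes, k=5, iters=100, seed=0):
    """k-means on (w, h) with distance 1 - IoU; returns k anchors sorted by area."""
    rng = np.random.default_rng(seed)
    centroids = boxes[rng.choice(len(boxes), k, replace=False)]
    for _ in range(iters):
        # the smallest distance 1 - IoU is the largest IoU
        assign = np.argmax(iou_wh(boxes, centroids), axis=1)
        new = np.array([boxes[assign == j].mean(axis=0) if np.any(assign == j) else centroids[j]
                        for j in range(k)])
        if np.allclose(new, centroids):
            break
        centroids = new
    return centroids[np.argsort(centroids[:, 0] * centroids[:, 1])]

rng = np.random.default_rng(1)
small = rng.normal([10, 10], 1, size=(50, 2))     # synthetic small boxes (w, h)
large = rng.normal([100, 200], 5, size=(50, 2))   # synthetic large, tall boxes
anchors = kmeans_anchors(np.vstack([small, large]), k=2)
print(anchors.round(1))
```

With a real dataset, `boxes` would hold the width and height of every ground-truth box (normalized or in grid units) and `k=5` as in YOLO v2.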

Output feature map

- Depth 125: 5 anchor boxes x 25 values each

- YOLO v1: per cell, 2 bboxes x (4 bbox coordinates + 1 confidence) = 10 values, plus 20 PASCAL VOC class probabilities shared by the cell (30 in total)

- YOLO v2: 25 values per bbox => 4 bbox coordinates, 1 confidence score, 20 class scores; 5 such groups, one per anchor

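Direct Location Prediction

(pw, ph): anchor box size

(tx, ty, tw, th): offsets predicted by the model

(bx, by, bw, bh): center coordinates and size of the predicted bounding box

* To keep the center coordinates from straying outside the cell, tx and ty are squashed into 0~1 with the sigmoid σ(x) = 1/(1+e^(-x)) and added to the cell's top-left offset (cx, cy)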
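The decoding can be sketched as follows. It assumes the cell offset and anchor size are given in grid-cell units and normalizes by the grid size at the end (YOLO v3 implementations often express anchors in input pixels instead); `decode_box` is a hypothetical helper.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def decode_box(t, cell_xy, anchor_wh, grid_size):
    """Turn raw offsets (tx, ty, tw, th) into a normalized (bx, by, bw, bh) box.

    cell_xy:   (cx, cy) column/row index of the grid cell, i.e. its top-left corner in grid units
    anchor_wh: (pw, ph) anchor width/height in grid-cell units (an assumption of this sketch)
    grid_size: number of cells per side, e.g. 13
    """
    tx, ty, tw, th = t
    cx, cy = cell_xy
    pw, ph = anchor_wh
    bx = sigmoid(tx) + cx      # sigmoid output lies in (0, 1), so the center stays inside cell (cx, cy)
    by = sigmoid(ty) + cy
    bw = pw * np.exp(tw)       # exp keeps width/height positive and scales the anchor
    bh = ph * np.exp(th)
    return np.array([bx, by, bw, bh]) / grid_size   # normalize to 0~1 of the image

print(decode_box((0.0, 0.0, 0.0, 0.0), (6, 6), (2.0, 3.0), 13))
```

With zero offsets the box sits at the middle of cell (6, 6), which is the center of a 13x13 grid, and keeps the anchor's size.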

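- The loss function is similar to the YOLO v1 loss

Fine-grained features via the passthrough module

- To detect smaller objects, the 26x26x512 feature map is reshaped to 13x13x2048 while keeping all of its features, then concatenated with the 13x13x1024 feature map

=> the spatial size shrinks to 1/4 (each 2x2 block is stacked into the channels, so the depth grows 4x)

- Feeding it through this merge module helps find small objects

- By contrast, SSD takes predictions from each feature map separately, combines them, and filters them with NMS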
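The passthrough reshape can be sketched as a space-to-depth operation: each 2x2 spatial block of the 26x26 map is moved into the channel axis, giving 13x13x(512x4) = 13x13x2048 without discarding any value, which can then be concatenated with the 13x13x1024 map. The channel ordering here is one possible convention (Darknet's reorg layer may order values differently), and the helper name is hypothetical.

```python
import numpy as np

def space_to_depth(fm, block=2):
    """Move each block x block spatial patch of an (H, W, C) map into the channel axis."""
    h, w, c = fm.shape
    if h % block or w % block:
        raise ValueError("height and width must be divisible by block")
    fm = fm.reshape(h // block, block, w // block, block, c)
    fm = fm.transpose(0, 2, 1, 3, 4)            # (H/b, W/b, b, b, C): group each patch together
    return fm.reshape(h // block, w // block, block * block * c)

rng = np.random.default_rng(0)
fine = rng.random((26, 26, 512), dtype=np.float32)     # stand-in for the 26x26x512 map
coarse = rng.random((13, 13, 1024), dtype=np.float32)  # stand-in for the 13x13x1024 map
reorg = space_to_depth(fine)                            # (13, 13, 2048)
merged = np.concatenate([reorg, coarse], axis=-1)       # (13, 13, 3072)
print(reorg.shape, merged.shape)
```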

Multi-scale training

- Because the classification layer is made of convolution layers, the input image size can be changed dynamically

- During training, the input image size is changed every 10 batches to a value between 320 and 608 (a multiple of 32)
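The size schedule can be sketched like this; choosing the new size at random among the allowed values, and the helper name, are assumptions for illustration. Sizes are multiples of 32 because the network downsamples its input by a factor of 32 (416 -> 13).

```python
import random

# Allowed input sizes: multiples of 32 from 320 to 608
SIZES = list(range(320, 608 + 1, 32))

def input_size_schedule(num_batches, every=10, seed=0):
    """Return the input size used for each batch, re-drawn every `every` batches."""
    rng = random.Random(seed)
    schedule = []
    size = None
    for batch_idx in range(num_batches):
        if batch_idx % every == 0:
            size = rng.choice(SIZES)    # pick a new resolution
        schedule.append(size)
    return schedule

sched = input_size_schedule(35)
print(SIZES)
print(sched[::10], [s // 32 for s in sched[::10]])   # output grid size = input size / 32
```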

Darknet 19 backbone

- The fully connected layers of the classification part are removed and conv layers are used instead

* VGG-16: 30.69 BFLOPs, top-5 accuracy: 90%

-> a popular architecture because it simply stacks 3x3 convolutions

* YOLO v1: 8.52 BFLOPs, top-5 accuracy: 88%

* YOLO v2 Darknet-19: 5.58 BFLOPs, top-5 accuracy: 91.2%

Improved performance


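yolo v1

- YOLO v1 divides the input image into an SxS grid, and each grid cell performs detection for one object

- Each grid cell predicts the object's bounding box from 2 bounding box candidates

YOLO v1 network and prediction values

- Influenced by Inception, using 1x1 convolutions

- No standard backbone; in 2015, state-of-the-art models mostly used VGG

- The 3D feature map produced by the 2D convolutions is flattened into dense (fully connected) layers

- The result is reshaped to 7x7 for detection

=> 7x7x30: 30 values per cell

Each grid cell computes the following:

a. the coordinates of 2 bounding box candidates and a confidence score for each box

- x, y, w, h: normalized bbox center coordinates and width / height

- confidence score = probability that an object is present * IoU

b. class probabilities: the probabilities of the 20 PASCAL VOC classes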
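The 30 values per cell can be unpacked as below. The ordering inside the vector ([x, y, w, h, confidence] for each of the 2 boxes, followed by the 20 class probabilities) is an assumption for illustration, since implementations lay out the tensor differently. At test time YOLO v1 multiplies each box's confidence by the cell's class probabilities to get class-specific confidence scores.

```python
import numpy as np

S, B, C = 7, 2, 20          # grid size, boxes per cell, PASCAL VOC classes
assert B * 5 + C == 30      # 30 values per cell

def split_cell(vec):
    """Split one cell's 30 values, assuming layout [x, y, w, h, conf] * B followed by C class probs."""
    boxes = vec[:B * 5].reshape(B, 5)
    coords = boxes[:, :4]           # (B, 4) normalized x, y, w, h
    confidences = boxes[:, 4]       # (B,)  Pr(object) * IoU
    class_probs = vec[B * 5:]       # (C,)  Pr(class | object), shared by the cell's boxes
    return coords, confidences, class_probs

pred = np.random.default_rng(0).random((S, S, B * 5 + C))   # stand-in for the 7x7x30 output
coords, confidences, class_probs = split_cell(pred[3, 4])
# Class-specific confidence for every (box, class) pair
class_scores = confidences[:, None] * class_probs[None, :]  # shape (B, C)
print(class_scores.shape)
```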

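YOLO V1 LOSS

BBox center (x, y) coordinate loss

- Based on the squared error between the predicted x, y and the ground-truth x, y

- Of the 98 bboxes (2 bboxes x 49 cells), the loss is computed only for the bboxes responsible for predicting an object

- Among the 98 bboxes, the indicator is 1 for a bbox responsible for an object and 0 for the rest

(only the responsible bboxes are counted; the others are zeroed out)

BBox width w, height h loss

- Based on the squared error between the predicted and ground-truth width and height, but square roots are taken so that errors on large objects do not dominate

- With the square root, the same absolute error on a large box is penalized less than on a small box

λ_coord: the coordinate loss terms are multiplied by a weight of 5

λ_noobj: the confidence loss for bboxes without objects is multiplied by 0.5

Object confidence loss => a distinctive loss term

- Based on the error between the predicted object confidence score and the IoU with the ground truth

- Confidence loss of the bboxes responsible for an object + confidence loss of the bboxes that should contain no object

Classification loss

- Squared error of the predicted class probabilities, computed only for cells that contain an object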
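The loss terms above can be put together in a simplified NumPy sketch for one image. It assumes the responsibility masks and the IoU targets are already computed (choosing the responsible box by highest IoU is not shown); the function name and argument layout are hypothetical.

```python
import numpy as np

LAMBDA_COORD = 5.0   # weight for the coordinate terms
LAMBDA_NOOBJ = 0.5   # weight for the no-object confidence term

def yolo_v1_loss(pred_boxes, true_boxes, pred_conf, true_iou, resp_mask,
                 pred_cls, true_cls, obj_cell_mask):
    """Sum-squared YOLO v1 loss for one image.

    pred_boxes, true_boxes: (S, S, B, 4) normalized x, y, w, h (w, h >= 0)
    pred_conf, true_iou:    (S, S, B) predicted confidence / IoU with the ground truth
    resp_mask:              (S, S, B) 1 where the box is responsible for an object, else 0
    pred_cls, true_cls:     (S, S, C) class probabilities
    obj_cell_mask:          (S, S) 1 where an object's center falls in the cell
    """
    noobj_mask = 1.0 - resp_mask
    m = resp_mask[..., None]
    xy_loss = np.sum(m * (pred_boxes[..., :2] - true_boxes[..., :2]) ** 2)
    # Square roots soften the penalty for errors on large boxes
    wh_loss = np.sum(m * (np.sqrt(pred_boxes[..., 2:]) - np.sqrt(true_boxes[..., 2:])) ** 2)
    obj_conf_loss = np.sum(resp_mask * (pred_conf - true_iou) ** 2)
    noobj_conf_loss = np.sum(noobj_mask * pred_conf ** 2)   # target confidence is 0
    cls_loss = np.sum(obj_cell_mask[..., None] * (pred_cls - true_cls) ** 2)
    return (LAMBDA_COORD * (xy_loss + wh_loss) + obj_conf_loss
            + LAMBDA_NOOBJ * noobj_conf_loss + cls_loss)
```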

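NMS

One-stage detectors usually predict many boxes and then filter them with NMS

Two-stage detectors first propose candidates and then confirm them

NMS is performed separately for each class:

1. Remove all bboxes whose confidence is below a given threshold

2. Sort the bboxes by confidence in descending order

3. Remove every bbox whose IoU with the highest-confidence bbox exceeds the IoU threshold

4. Repeat step 3 with the next-highest remaining bbox until none are left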
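The four steps can be sketched in NumPy as below, with boxes given as [x1, y1, x2, y2] (an assumption; the OpenCV inference code uses [left, top, width, height]). Note that `cv2.dnn.NMSBoxes`, as called in that code, takes no class ids and therefore suppresses overlapping boxes regardless of class; this sketch runs NMS per class as described here, and its function names are hypothetical.

```python
import numpy as np

def iou(box, boxes):
    """IoU between one [x1, y1, x2, y2] box and an (N, 4) array of boxes."""
    x1 = np.maximum(box[0], boxes[:, 0])
    y1 = np.maximum(box[1], boxes[:, 1])
    x2 = np.minimum(box[2], boxes[:, 2])
    y2 = np.minimum(box[3], boxes[:, 3])
    inter = np.clip(x2 - x1, 0, None) * np.clip(y2 - y1, 0, None)
    area = (box[2] - box[0]) * (box[3] - box[1])
    areas = (boxes[:, 2] - boxes[:, 0]) * (boxes[:, 3] - boxes[:, 1])
    return inter / (area + areas - inter)

def nms_per_class(boxes, scores, class_ids, conf_threshold=0.5, iou_threshold=0.4):
    boxes = np.asarray(boxes, dtype=float)
    scores = np.asarray(scores, dtype=float)
    class_ids = np.asarray(class_ids)
    keep = []
    for c in np.unique(class_ids):
        # Step 1: drop low-confidence boxes of this class
        idx = np.where((class_ids == c) & (scores > conf_threshold))[0]
        # Step 2: sort by confidence, highest first
        idx = idx[np.argsort(-scores[idx])]
        while idx.size > 0:
            best, rest = idx[0], idx[1:]
            keep.append(int(best))
            # Step 3: remove boxes that overlap the best box more than the threshold
            idx = rest[iou(boxes[best], boxes[rest]) <= iou_threshold]
            # Step 4: the loop continues with the next-highest remaining box
    return sorted(keep, key=lambda i: -scores[i])
```

For example, two heavily overlapping boxes of the same class collapse into the higher-scoring one, while an identical box of a different class survives.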

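Issues

Detection is fast, but detection accuracy is low

Accuracy on small objects is especially poor

=> Because each cell is responsible for only one object, when two objects fall into the same cell, only one of them can be detected

=> a structural limitation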


Synonymous with real-time object detection

You Only Look Once => one-stage detection

non-FPN

1. yolo v1 || 2015. 05

- 150 FPS

- Fast, but low accuracy

2. SSD || 2015. 12

- Improved accuracy and speed

3. yolo v2 || 2016. 12

- Accuracy and speed on par with SSD

- Improved both speed and accuracy, but weaker than SSD on small objects

FPN

4. retinaNet || 2017. 08

- Slower, but highly accurate

- Better than YOLO v3 on small objects

5. yolo v3 || 2018. 04

- Greatly improved accuracy

6. EfficientDet || 2019. 11

- D0: slightly better than YOLO v3

7. yolo v4 || 2020. 04

- Improved both accuracy and speed

YOLO is built on Darknet

=> a deep learning framework written in C

=> with a CUDA-based GPU interface


TensorFlow Hub makes it easy to download pretrained TensorFlow models, use them as models, and fine-tune them.

If git is for sharing code and importing code files,

TensorFlow Hub is for importing models.

TensorFlow Hub tutorial

https://www.tensorflow.org/hub/tutorials/tf2_object_detection?hl=ko


Object detection models

https://tfhub.dev/s?module-type=image-object-detection


# Having an FPN is what gives a model a neck
