
OR gate with TensorFlow

import tensorflow as tf
import numpy as np
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense
from tensorflow.keras.optimizers import SGD
from tensorflow.keras.losses import mse
tf.random.set_seed(777)
# data
data = np.array([[0,0],[1,0],[0,1],[1,1]])
# labels
label = np.array([[0],[1],[1],[1]])
model = Sequential()
model.add(Dense(1, input_shape = (2,), activation = 'linear')) # a single perceptron
# model configuration
# GD: gradient descent; SGD: stochastic gradient descent (mini-batches)
# loss: the loss (cost) function
model.compile(optimizer = SGD(), loss = mse, metrics = ['acc'])
# with epochs = 100 training stops before it has converged, so 200 is used
model.fit(data, label, epochs = 200)
# the learned weights and bias
model.get_weights()
# in classical machine learning you tune the values yourself; in deep learning you hand over data and labels and the model finds them on its own
model.predict(data)
model.evaluate(data, label) # evaluation: loss and accuracy

Epoch 1/200
1/1 [==============================] - 1s 1s/step - loss: 1.4290 - acc: 0.5000
Epoch 2/200
1/1 [==============================] - 0s 3ms/step - loss: 1.3602 - acc: 0.5000
Epoch 3/200
1/1 [==============================] - 0s 2ms/step - loss: 1.2956 - acc: 0.5000
Epoch 4/200
1/1 [==============================] - 0s 2ms/step - loss: 1.2349 - acc: 0.5000
Epoch 5/200
1/1 [==============================] - 0s 2ms/step - loss: 1.1779 - acc: 0.5000
Epoch 6/200
1/1 [==============================] - 0s 3ms/step - loss: 1.1242 - acc: 0.5000
Epoch 7/200
1/1 [==============================] - 0s 2ms/step - loss: 1.0738 - acc: 0.5000
Epoch 8/200
1/1 [==============================] - 0s 3ms/step - loss: 1.0264 - acc: 0.5000
Epoch 9/200
1/1 [==============================] - 0s 3ms/step - loss: 0.9819 - acc: 0.5000
Epoch 10/200
1/1 [==============================] - 0s 2ms/step - loss: 0.9399 - acc: 0.5000
Epoch 11/200
1/1 [==============================] - 0s 3ms/step - loss: 0.9005 - acc: 0.5000
Epoch 12/200
1/1 [==============================] - 0s 2ms/step - loss: 0.8634 - acc: 0.5000
Epoch 13/200
1/1 [==============================] - 0s 2ms/step - loss: 0.8284 - acc: 0.5000
Epoch 14/200
1/1 [==============================] - 0s 3ms/step - loss: 0.7955 - acc: 0.5000
Epoch 15/200
1/1 [==============================] - 0s 2ms/step - loss: 0.7646 - acc: 0.5000
Epoch 16/200
1/1 [==============================] - 0s 2ms/step - loss: 0.7354 - acc: 0.5000
Epoch 17/200
1/1 [==============================] - 0s 1000us/step - loss: 0.7079 - acc: 0.5000
Epoch 18/200
1/1 [==============================] - 0s 2ms/step - loss: 0.6820 - acc: 0.5000
Epoch 19/200
1/1 [==============================] - 0s 2ms/step - loss: 0.6576 - acc: 0.5000
Epoch 20/200
1/1 [==============================] - 0s 3ms/step - loss: 0.6346 - acc: 0.5000
Epoch 21/200
1/1 [==============================] - 0s 2ms/step - loss: 0.6129 - acc: 0.5000
Epoch 22/200
1/1 [==============================] - 0s 2ms/step - loss: 0.5925 - acc: 0.5000
Epoch 23/200
1/1 [==============================] - 0s 2ms/step - loss: 0.5732 - acc: 0.5000
Epoch 24/200
1/1 [==============================] - 0s 2ms/step - loss: 0.5549 - acc: 0.5000
Epoch 25/200
1/1 [==============================] - 0s 2ms/step - loss: 0.5377 - acc: 0.5000
Epoch 26/200
1/1 [==============================] - 0s 2ms/step - loss: 0.5215 - acc: 0.5000
Epoch 27/200
1/1 [==============================] - 0s 2ms/step - loss: 0.5061 - acc: 0.5000
Epoch 28/200
1/1 [==============================] - 0s 3ms/step - loss: 0.4916 - acc: 0.5000
Epoch 29/200
1/1 [==============================] - 0s 2ms/step - loss: 0.4778 - acc: 0.5000
Epoch 30/200
1/1 [==============================] - 0s 3ms/step - loss: 0.4648 - acc: 0.5000
Epoch 31/200
1/1 [==============================] - 0s 2ms/step - loss: 0.4525 - acc: 0.7500
Epoch 32/200
1/1 [==============================] - 0s 2ms/step - loss: 0.4409 - acc: 0.7500
Epoch 33/200
1/1 [==============================] - 0s 2ms/step - loss: 0.4298 - acc: 0.7500
Epoch 34/200
1/1 [==============================] - 0s 2ms/step - loss: 0.4193 - acc: 0.7500
Epoch 35/200
1/1 [==============================] - 0s 2ms/step - loss: 0.4094 - acc: 0.7500
Epoch 36/200
1/1 [==============================] - 0s 3ms/step - loss: 0.4000 - acc: 0.7500
Epoch 37/200
1/1 [==============================] - 0s 2ms/step - loss: 0.3911 - acc: 0.7500
Epoch 38/200
1/1 [==============================] - 0s 3ms/step - loss: 0.3826 - acc: 0.7500
Epoch 39/200
1/1 [==============================] - 0s 2ms/step - loss: 0.3745 - acc: 0.7500
Epoch 40/200
1/1 [==============================] - 0s 2ms/step - loss: 0.3668 - acc: 0.7500
Epoch 41/200
1/1 [==============================] - 0s 2ms/step - loss: 0.3595 - acc: 0.7500
Epoch 42/200
1/1 [==============================] - 0s 2ms/step - loss: 0.3525 - acc: 0.7500
Epoch 43/200
1/1 [==============================] - 0s 3ms/step - loss: 0.3459 - acc: 0.7500
Epoch 44/200
1/1 [==============================] - 0s 2ms/step - loss: 0.3396 - acc: 0.7500
Epoch 45/200
1/1 [==============================] - 0s 3ms/step - loss: 0.3336 - acc: 0.7500
Epoch 46/200
1/1 [==============================] - 0s 2ms/step - loss: 0.3278 - acc: 0.7500
Epoch 47/200
1/1 [==============================] - 0s 2ms/step - loss: 0.3223 - acc: 0.7500
Epoch 48/200
1/1 [==============================] - 0s 2ms/step - loss: 0.3170 - acc: 0.7500
Epoch 49/200
1/1 [==============================] - 0s 2ms/step - loss: 0.3120 - acc: 0.7500
Epoch 50/200
1/1 [==============================] - 0s 2ms/step - loss: 0.3072 - acc: 0.7500
Epoch 51/200
1/1 [==============================] - 0s 2ms/step - loss: 0.3026 - acc: 0.7500
Epoch 52/200
1/1 [==============================] - 0s 2ms/step - loss: 0.2982 - acc: 0.7500
Epoch 53/200
1/1 [==============================] - 0s 2ms/step - loss: 0.2939 - acc: 0.7500
Epoch 54/200
1/1 [==============================] - 0s 2ms/step - loss: 0.2898 - acc: 0.7500
Epoch 55/200
1/1 [==============================] - 0s 2ms/step - loss: 0.2859 - acc: 0.7500
Epoch 56/200
1/1 [==============================] - 0s 1ms/step - loss: 0.2822 - acc: 0.7500
Epoch 57/200
1/1 [==============================] - 0s 2ms/step - loss: 0.2785 - acc: 0.7500
Epoch 58/200
1/1 [==============================] - 0s 2ms/step - loss: 0.2750 - acc: 0.7500
Epoch 59/200
1/1 [==============================] - 0s 2ms/step - loss: 0.2717 - acc: 0.7500
Epoch 60/200
1/1 [==============================] - 0s 2ms/step - loss: 0.2684 - acc: 0.7500
Epoch 61/200
1/1 [==============================] - 0s 3ms/step - loss: 0.2653 - acc: 0.7500
Epoch 62/200
1/1 [==============================] - 0s 2ms/step - loss: 0.2623 - acc: 0.7500
Epoch 63/200
1/1 [==============================] - 0s 2ms/step - loss: 0.2594 - acc: 0.7500
Epoch 64/200
1/1 [==============================] - 0s 1ms/step - loss: 0.2565 - acc: 0.7500
Epoch 65/200
1/1 [==============================] - 0s 2ms/step - loss: 0.2538 - acc: 0.7500
Epoch 66/200
1/1 [==============================] - 0s 3ms/step - loss: 0.2511 - acc: 0.7500
Epoch 67/200
1/1 [==============================] - 0s 2ms/step - loss: 0.2486 - acc: 0.7500
Epoch 68/200
1/1 [==============================] - 0s 2ms/step - loss: 0.2461 - acc: 0.7500
Epoch 69/200
1/1 [==============================] - 0s 2ms/step - loss: 0.2436 - acc: 0.7500
Epoch 70/200
1/1 [==============================] - 0s 2ms/step - loss: 0.2413 - acc: 0.7500
Epoch 71/200
1/1 [==============================] - 0s 2ms/step - loss: 0.2390 - acc: 0.7500
Epoch 72/200
1/1 [==============================] - 0s 2ms/step - loss: 0.2368 - acc: 0.7500
Epoch 73/200
1/1 [==============================] - 0s 3ms/step - loss: 0.2346 - acc: 0.7500
Epoch 74/200
1/1 [==============================] - 0s 2ms/step - loss: 0.2325 - acc: 0.7500
Epoch 75/200
1/1 [==============================] - 0s 2ms/step - loss: 0.2304 - acc: 0.7500
Epoch 76/200
1/1 [==============================] - 0s 3ms/step - loss: 0.2284 - acc: 0.7500
Epoch 77/200
1/1 [==============================] - 0s 2ms/step - loss: 0.2264 - acc: 0.7500
Epoch 78/200
1/1 [==============================] - 0s 2ms/step - loss: 0.2245 - acc: 0.7500
Epoch 79/200
1/1 [==============================] - 0s 1ms/step - loss: 0.2226 - acc: 0.7500
Epoch 80/200
1/1 [==============================] - 0s 2ms/step - loss: 0.2208 - acc: 0.7500
Epoch 81/200
1/1 [==============================] - 0s 2ms/step - loss: 0.2190 - acc: 0.7500
Epoch 82/200
1/1 [==============================] - 0s 2ms/step - loss: 0.2173 - acc: 0.7500
Epoch 83/200
1/1 [==============================] - 0s 2ms/step - loss: 0.2155 - acc: 0.7500
Epoch 84/200
1/1 [==============================] - 0s 2ms/step - loss: 0.2139 - acc: 0.7500
Epoch 85/200
1/1 [==============================] - 0s 2ms/step - loss: 0.2122 - acc: 0.7500
Epoch 86/200
1/1 [==============================] - 0s 3ms/step - loss: 0.2106 - acc: 0.7500
Epoch 87/200
1/1 [==============================] - 0s 2ms/step - loss: 0.2090 - acc: 0.7500
Epoch 88/200
1/1 [==============================] - 0s 2ms/step - loss: 0.2074 - acc: 0.7500
Epoch 89/200
1/1 [==============================] - 0s 2ms/step - loss: 0.2059 - acc: 0.7500
Epoch 90/200
1/1 [==============================] - 0s 2ms/step - loss: 0.2044 - acc: 0.7500
Epoch 91/200
1/1 [==============================] - 0s 2ms/step - loss: 0.2029 - acc: 0.7500
Epoch 92/200
1/1 [==============================] - 0s 2ms/step - loss: 0.2015 - acc: 0.7500
Epoch 93/200
1/1 [==============================] - 0s 2ms/step - loss: 0.2000 - acc: 0.7500
Epoch 94/200
1/1 [==============================] - 0s 2ms/step - loss: 0.1986 - acc: 0.7500
Epoch 95/200
1/1 [==============================] - 0s 2ms/step - loss: 0.1973 - acc: 0.7500
Epoch 96/200
1/1 [==============================] - 0s 3ms/step - loss: 0.1959 - acc: 0.7500
Epoch 97/200
1/1 [==============================] - 0s 2ms/step - loss: 0.1946 - acc: 0.7500
Epoch 98/200
1/1 [==============================] - 0s 1ms/step - loss: 0.1932 - acc: 0.7500
Epoch 99/200
1/1 [==============================] - 0s 1ms/step - loss: 0.1919 - acc: 0.7500
Epoch 100/200
1/1 [==============================] - 0s 2ms/step - loss: 0.1907 - acc: 0.7500
Epoch 101/200
1/1 [==============================] - 0s 2ms/step - loss: 0.1894 - acc: 0.7500
Epoch 102/200
1/1 [==============================] - 0s 2ms/step - loss: 0.1881 - acc: 0.7500
Epoch 103/200
1/1 [==============================] - 0s 2ms/step - loss: 0.1869 - acc: 0.7500
Epoch 104/200
1/1 [==============================] - 0s 2ms/step - loss: 0.1857 - acc: 0.7500
Epoch 105/200
1/1 [==============================] - 0s 2ms/step - loss: 0.1845 - acc: 0.7500
Epoch 106/200
1/1 [==============================] - 0s 2ms/step - loss: 0.1833 - acc: 0.7500
Epoch 107/200
1/1 [==============================] - 0s 2ms/step - loss: 0.1822 - acc: 0.7500
Epoch 108/200
1/1 [==============================] - 0s 2ms/step - loss: 0.1810 - acc: 0.7500
Epoch 109/200
1/1 [==============================] - 0s 2ms/step - loss: 0.1799 - acc: 0.7500
Epoch 110/200
1/1 [==============================] - 0s 3ms/step - loss: 0.1787 - acc: 0.7500
Epoch 111/200
1/1 [==============================] - 0s 2ms/step - loss: 0.1776 - acc: 0.7500
Epoch 112/200
1/1 [==============================] - 0s 2ms/step - loss: 0.1765 - acc: 0.7500
Epoch 113/200
1/1 [==============================] - 0s 2ms/step - loss: 0.1755 - acc: 0.7500
Epoch 114/200
1/1 [==============================] - 0s 2ms/step - loss: 0.1744 - acc: 0.7500
Epoch 115/200
1/1 [==============================] - 0s 2ms/step - loss: 0.1733 - acc: 0.7500
Epoch 116/200
1/1 [==============================] - 0s 2ms/step - loss: 0.1723 - acc: 0.7500
Epoch 117/200
1/1 [==============================] - 0s 2ms/step - loss: 0.1712 - acc: 0.7500
Epoch 118/200
1/1 [==============================] - 0s 1ms/step - loss: 0.1702 - acc: 0.7500
Epoch 119/200
1/1 [==============================] - 0s 1ms/step - loss: 0.1692 - acc: 0.7500
Epoch 120/200
1/1 [==============================] - 0s 2ms/step - loss: 0.1682 - acc: 0.7500
Epoch 121/200
1/1 [==============================] - 0s 2ms/step - loss: 0.1672 - acc: 0.7500
Epoch 122/200
1/1 [==============================] - 0s 2ms/step - loss: 0.1663 - acc: 0.7500
Epoch 123/200
1/1 [==============================] - 0s 2ms/step - loss: 0.1653 - acc: 0.7500
Epoch 124/200
1/1 [==============================] - 0s 2ms/step - loss: 0.1643 - acc: 0.7500
Epoch 125/200
1/1 [==============================] - 0s 2ms/step - loss: 0.1634 - acc: 0.7500
Epoch 126/200
1/1 [==============================] - 0s 2ms/step - loss: 0.1625 - acc: 0.7500
Epoch 127/200
1/1 [==============================] - 0s 2ms/step - loss: 0.1615 - acc: 0.7500
Epoch 128/200
1/1 [==============================] - 0s 2ms/step - loss: 0.1606 - acc: 0.7500
Epoch 129/200
1/1 [==============================] - 0s 2ms/step - loss: 0.1597 - acc: 0.7500
Epoch 130/200
1/1 [==============================] - 0s 2ms/step - loss: 0.1588 - acc: 0.7500
Epoch 131/200
1/1 [==============================] - 0s 2ms/step - loss: 0.1579 - acc: 0.7500
Epoch 132/200
1/1 [==============================] - 0s 2ms/step - loss: 0.1570 - acc: 0.7500
Epoch 133/200
1/1 [==============================] - 0s 2ms/step - loss: 0.1562 - acc: 0.7500
Epoch 134/200
1/1 [==============================] - 0s 2ms/step - loss: 0.1553 - acc: 0.7500
Epoch 135/200
1/1 [==============================] - 0s 2ms/step - loss: 0.1545 - acc: 0.7500
Epoch 136/200
1/1 [==============================] - 0s 2ms/step - loss: 0.1536 - acc: 0.7500
Epoch 137/200
1/1 [==============================] - 0s 2ms/step - loss: 0.1528 - acc: 0.7500
Epoch 138/200
1/1 [==============================] - 0s 2ms/step - loss: 0.1520 - acc: 0.7500
Epoch 139/200
1/1 [==============================] - 0s 2ms/step - loss: 0.1511 - acc: 0.7500
Epoch 140/200
1/1 [==============================] - 0s 2ms/step - loss: 0.1503 - acc: 0.7500
Epoch 141/200
1/1 [==============================] - 0s 2ms/step - loss: 0.1495 - acc: 0.7500
Epoch 142/200
1/1 [==============================] - 0s 3ms/step - loss: 0.1487 - acc: 0.7500
Epoch 143/200
1/1 [==============================] - 0s 2ms/step - loss: 0.1480 - acc: 0.7500
Epoch 144/200
1/1 [==============================] - 0s 2ms/step - loss: 0.1472 - acc: 0.7500
Epoch 145/200
1/1 [==============================] - 0s 2ms/step - loss: 0.1464 - acc: 0.7500
Epoch 146/200
1/1 [==============================] - 0s 3ms/step - loss: 0.1456 - acc: 0.7500
Epoch 147/200
1/1 [==============================] - 0s 3ms/step - loss: 0.1449 - acc: 0.7500
Epoch 148/200
1/1 [==============================] - 0s 2ms/step - loss: 0.1441 - acc: 0.7500
Epoch 149/200
1/1 [==============================] - 0s 2ms/step - loss: 0.1434 - acc: 0.7500
Epoch 150/200
1/1 [==============================] - 0s 2ms/step - loss: 0.1427 - acc: 0.7500
Epoch 151/200
1/1 [==============================] - 0s 2ms/step - loss: 0.1419 - acc: 0.7500
Epoch 152/200
1/1 [==============================] - 0s 2ms/step - loss: 0.1412 - acc: 0.7500
Epoch 153/200
1/1 [==============================] - 0s 2ms/step - loss: 0.1405 - acc: 0.7500
Epoch 154/200
1/1 [==============================] - 0s 2ms/step - loss: 0.1398 - acc: 0.7500
Epoch 155/200
1/1 [==============================] - 0s 2ms/step - loss: 0.1391 - acc: 0.7500
Epoch 156/200
1/1 [==============================] - 0s 2ms/step - loss: 0.1384 - acc: 0.7500
Epoch 157/200
1/1 [==============================] - 0s 2ms/step - loss: 0.1377 - acc: 0.7500
Epoch 158/200
1/1 [==============================] - 0s 2ms/step - loss: 0.1370 - acc: 0.7500
Epoch 159/200
1/1 [==============================] - 0s 2ms/step - loss: 0.1364 - acc: 0.7500
Epoch 160/200
1/1 [==============================] - 0s 2ms/step - loss: 0.1357 - acc: 0.7500
Epoch 161/200
1/1 [==============================] - 0s 2ms/step - loss: 0.1350 - acc: 0.7500
Epoch 162/200
1/1 [==============================] - 0s 2ms/step - loss: 0.1344 - acc: 0.7500
Epoch 163/200
1/1 [==============================] - 0s 2ms/step - loss: 0.1337 - acc: 0.7500
Epoch 164/200
1/1 [==============================] - 0s 1ms/step - loss: 0.1331 - acc: 0.7500
Epoch 165/200
1/1 [==============================] - 0s 2ms/step - loss: 0.1325 - acc: 0.7500
Epoch 166/200
1/1 [==============================] - 0s 2ms/step - loss: 0.1318 - acc: 0.7500
Epoch 167/200
1/1 [==============================] - 0s 2ms/step - loss: 0.1312 - acc: 0.7500
Epoch 168/200
1/1 [==============================] - 0s 2ms/step - loss: 0.1306 - acc: 0.7500
Epoch 169/200
1/1 [==============================] - 0s 2ms/step - loss: 0.1300 - acc: 0.7500
Epoch 170/200
1/1 [==============================] - 0s 2ms/step - loss: 0.1294 - acc: 0.7500
Epoch 171/200
1/1 [==============================] - 0s 3ms/step - loss: 0.1288 - acc: 0.7500
Epoch 172/200
1/1 [==============================] - 0s 2ms/step - loss: 0.1282 - acc: 0.7500
Epoch 173/200
1/1 [==============================] - 0s 2ms/step - loss: 0.1276 - acc: 0.7500
Epoch 174/200
1/1 [==============================] - 0s 2ms/step - loss: 0.1270 - acc: 0.7500
Epoch 175/200
1/1 [==============================] - 0s 2ms/step - loss: 0.1264 - acc: 0.7500
Epoch 176/200
1/1 [==============================] - 0s 2ms/step - loss: 0.1258 - acc: 0.7500
Epoch 177/200
1/1 [==============================] - 0s 2ms/step - loss: 0.1253 - acc: 0.7500
Epoch 178/200
1/1 [==============================] - 0s 3ms/step - loss: 0.1247 - acc: 0.7500
Epoch 179/200
1/1 [==============================] - 0s 2ms/step - loss: 0.1242 - acc: 0.7500
Epoch 180/200
1/1 [==============================] - 0s 2ms/step - loss: 0.1236 - acc: 0.7500
Epoch 181/200
1/1 [==============================] - 0s 2ms/step - loss: 0.1231 - acc: 0.7500
Epoch 182/200
1/1 [==============================] - 0s 3ms/step - loss: 0.1225 - acc: 0.7500
Epoch 183/200
1/1 [==============================] - 0s 3ms/step - loss: 0.1220 - acc: 0.7500
Epoch 184/200
1/1 [==============================] - 0s 2ms/step - loss: 0.1214 - acc: 0.7500
Epoch 185/200
1/1 [==============================] - 0s 2ms/step - loss: 0.1209 - acc: 0.7500
Epoch 186/200
1/1 [==============================] - 0s 2ms/step - loss: 0.1204 - acc: 0.7500
Epoch 187/200
1/1 [==============================] - 0s 2ms/step - loss: 0.1199 - acc: 0.7500
Epoch 188/200
1/1 [==============================] - 0s 2ms/step - loss: 0.1194 - acc: 0.7500
Epoch 189/200
1/1 [==============================] - 0s 2ms/step - loss: 0.1189 - acc: 0.7500
Epoch 190/200
1/1 [==============================] - 0s 1ms/step - loss: 0.1184 - acc: 0.7500
Epoch 191/200
1/1 [==============================] - 0s 2ms/step - loss: 0.1179 - acc: 0.7500
Epoch 192/200
1/1 [==============================] - 0s 3ms/step - loss: 0.1174 - acc: 1.0000
Epoch 193/200
1/1 [==============================] - 0s 1ms/step - loss: 0.1169 - acc: 1.0000
Epoch 194/200
1/1 [==============================] - 0s 1ms/step - loss: 0.1164 - acc: 1.0000
Epoch 195/200
1/1 [==============================] - 0s 1ms/step - loss: 0.1159 - acc: 1.0000
Epoch 196/200
1/1 [==============================] - 0s 1ms/step - loss: 0.1154 - acc: 1.0000
Epoch 197/200
1/1 [==============================] - 0s 2ms/step - loss: 0.1150 - acc: 1.0000
Epoch 198/200
1/1 [==============================] - 0s 2ms/step - loss: 0.1145 - acc: 1.0000
Epoch 199/200
1/1 [==============================] - 0s 2ms/step - loss: 0.1140 - acc: 1.0000
Epoch 200/200
1/1 [==============================] - 0s 2ms/step - loss: 0.1136 - acc: 1.0000
1/1 [==============================] - 0s 78ms/step - loss: 0.1131 - acc: 1.0000


[0.11312820017337799, 1.0]
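The output layer is linear, so model.predict returns real numbers rather than hard 0/1 values; the accuracy metric effectively thresholds them at 0.5. A minimal sketch (my addition, not part of the original post) of recovering the gate outputs by hand:

import numpy as np

pred = model.predict(data)          # real-valued outputs of the linear unit
gate = (pred > 0.5).astype(int)     # threshold at 0.5, as the accuracy metric does
print(gate.flatten())               # expected to match the OR labels [0 1 1 1]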

 

AND gate with TensorFlow

import tensorflow as tf
import numpy as np
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense
from tensorflow.keras.optimizers import SGD
from tensorflow.keras.losses import mse
tf.random.set_seed(777)
# data
data = np.array([[0,0],[1,0],[0,1],[1,1]])
# labels
label = np.array([[0],[0],[0],[1]]) # AND gate labels
model = Sequential()

model.add(Dense(1, input_shape = (2,), activation = 'linear')) # a single perceptron
model.compile(optimizer = SGD(), loss = mse, metrics = ['acc'])
model.fit(data, label, epochs = 2000, verbose = 0)
model.get_weights()
model.predict(data).flatten()
model.evaluate(data, label) # evaluation: loss and accuracy

1/1 [==============================] - 0s 53ms/step - loss: 0.0625 - acc: 1.0000
[0.06250044703483582, 1.0]

 

from tensorflow import keras
import numpy
x = numpy.array([0,1,2,3,4])
y = x*2+1 # [1,3,5,7,9]
model = keras.models.Sequential()
# build a single layer
model.add(keras.layers.Dense(1, input_shape=(1,)))
# loss function: mse; the activation defaults to linear
model.compile('SGD','mse')
# verbose = 0 hides the training log
model.fit(x[:2], y[:2], epochs=1000, verbose=0)
model.get_weights()
# [array([[1.9739313]], dtype=float32), array([1.0161117], dtype=float32)]  # kernel (weight) and bias
# how close are they to the true slope 2 and intercept 1?

# predict the remaining values
model.predict(x[2:])

array([[5.000806],
       [7.001389],
       [9.001972]], dtype=float32)

 

model.predict([5])

# array([[11.002556]], dtype=float32)
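The prediction for a new input can be reproduced by hand from the learned parameters. A small sketch (my addition), assuming the model trained above:

w, b = model.get_weights()      # kernel and bias of the single Dense unit
print(w[0][0] * 5 + b[0])       # the same linear computation the layer performs for x = 5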

 

# vector dot product (matrix multiplication)
import tensorflow as tf
from tensorflow import keras
import numpy
# random generator: a 10 x 5 matrix of floats drawn from a uniform distribution
x = tf.random.uniform((10,5)) # (10, 5)
w = tf.random.uniform((5,3)) # (5, 3): the inner dimensions must match
# matrix product
d = tf.matmul(x, w)
print(d.shape)

# (10, 3)
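This matrix product is essentially what a Dense layer computes internally (matmul of the input with the kernel, plus a bias and an activation). A hedged sketch reusing the x defined above:

dense = tf.keras.layers.Dense(3, use_bias=False)  # kernel will have shape (5, 3)
print(dense(x).shape)                             # (10, 3), the same shape as tf.matmul(x, w)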

 

x

<tf.Tensor: shape=(10, 5), dtype=float32, numpy=
array([[0.1357429 , 0.07509017, 0.2639438 , 0.47604764, 0.39591897],
       [0.14548802, 0.17393434, 0.00936472, 0.8090905 , 0.617025  ],
       [0.8713819 , 0.558359  , 0.17226672, 0.50340676, 0.18701088],
       [0.9073597 , 0.717615  , 0.38108468, 0.8958354 , 0.59624827],
       [0.77847326, 0.4488796 , 0.14225698, 0.8686327 , 0.03972971],
       [0.3629743 , 0.55276537, 0.3255931 , 0.5238236 , 0.05080891],
       [0.01347697, 0.3558432 , 0.77311885, 0.48737752, 0.5625943 ],
       [0.02250803, 0.8551339 , 0.36489332, 0.5632981 , 0.09144831],
       [0.25097954, 0.5333061 , 0.426386  , 0.19805324, 0.28281295],
       [0.99601805, 0.4646746 , 0.0783782 , 0.66289246, 0.17973018]],
      dtype=float32)>

 

# cannot be solved with a linear model
# XOR gate with TensorFlow
import tensorflow as tf
import numpy as np
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense
from tensorflow.keras.optimizers import SGD
from tensorflow.keras.losses import mse
tf.random.set_seed(777)
# data
data = np.array([[0,0],[1,0],[0,1],[1,1]])
# labels
label = np.array([[0],[1],[1],[0]])
model = Sequential()

model.add(Dense(1, input_shape = (2,), activation = 'linear')) # a single perceptron
model.compile(optimizer = SGD(), loss = mse, metrics = ['acc'])
model.fit(data, label, epochs = 2000, verbose = 0)
model.get_weights()
model.predict(data).flatten()
model.evaluate(data, label) # evaluation: loss and accuracy
# 1/1 [==============================] - 0s 52ms/step - loss: 0.2500 - acc: 0.5000
# loss 0.25 and accuracy 0.5 => a single linear unit fails on XOR

1/1 [==============================] - 0s 52ms/step - loss: 0.2500 - acc: 0.5000
[0.25, 0.5]

 

XOR gate with TensorFlow (two-layer network)

import tensorflow as tf
import numpy as np
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense
from tensorflow.keras.optimizers import RMSprop, SGD
from tensorflow.keras.losses import mse
tf.random.set_seed(777)
# data
data = np.array([[0,0],[1,0],[0,1],[1,1]])
# labels
label = np.array([[0],[1],[1],[0]])
model = Sequential()

# two layers
# hidden layer with 32 units, relu activation
# relu: negative inputs become 0, positive inputs pass through unchanged
model.add(Dense(32, input_shape = (2,), activation = 'relu')) # hidden layer
# sigmoid returns values between 0 and 1
model.add(Dense(1, activation = 'sigmoid')) # output layer

# optimizer: the strategy for moving toward the loss minimum
# RMSprop: refines Adagrad by using a decaying average of past gradients to set the step size
# Adagrad: large steps at first, smaller steps for parameters that have already been updated a lot;
#          after many updates the learning rate can shrink too far

# model.compile(optimizer = RMSprop(), loss = mse, metrics = ['acc'])
model.compile(optimizer = SGD(), loss = mse, metrics = ['acc'])
model.fit(data, label, epochs = 100)
model.get_weights()
predict = model.predict(data).flatten()
model.evaluate(data, label) # evaluation: loss and accuracy
print(predict)
# 1/1 [==============================] - 0s 56ms/step - loss: 0.2106 - acc: 1.0000
# [0.48657197 0.54643464 0.55219495 0.44657207]

Epoch 1/100
1/1 [==============================] - 0s 163ms/step - loss: 0.2646 - acc: 0.5000
Epoch 2/100
1/1 [==============================] - 0s 2ms/step - loss: 0.2644 - acc: 0.2500
Epoch 3/100
1/1 [==============================] - 0s 2ms/step - loss: 0.2643 - acc: 0.2500
Epoch 4/100
1/1 [==============================] - 0s 2ms/step - loss: 0.2642 - acc: 0.2500
Epoch 5/100
1/1 [==============================] - 0s 4ms/step - loss: 0.2640 - acc: 0.2500
Epoch 6/100
1/1 [==============================] - 0s 2ms/step - loss: 0.2639 - acc: 0.2500
Epoch 7/100
1/1 [==============================] - 0s 2ms/step - loss: 0.2638 - acc: 0.2500
Epoch 8/100
1/1 [==============================] - 0s 2ms/step - loss: 0.2637 - acc: 0.2500
Epoch 9/100
1/1 [==============================] - 0s 2ms/step - loss: 0.2635 - acc: 0.2500
Epoch 10/100
1/1 [==============================] - 0s 2ms/step - loss: 0.2634 - acc: 0.2500
Epoch 11/100
1/1 [==============================] - 0s 2ms/step - loss: 0.2633 - acc: 0.2500
Epoch 12/100
1/1 [==============================] - 0s 3ms/step - loss: 0.2632 - acc: 0.2500
Epoch 13/100
1/1 [==============================] - 0s 3ms/step - loss: 0.2630 - acc: 0.2500
Epoch 14/100
1/1 [==============================] - 0s 2ms/step - loss: 0.2629 - acc: 0.2500
Epoch 15/100
1/1 [==============================] - 0s 2ms/step - loss: 0.2628 - acc: 0.2500
Epoch 16/100
1/1 [==============================] - 0s 3ms/step - loss: 0.2627 - acc: 0.2500
Epoch 17/100
1/1 [==============================] - 0s 2ms/step - loss: 0.2625 - acc: 0.2500
Epoch 18/100
1/1 [==============================] - 0s 3ms/step - loss: 0.2624 - acc: 0.2500
Epoch 19/100
1/1 [==============================] - 0s 4ms/step - loss: 0.2623 - acc: 0.2500
Epoch 20/100
1/1 [==============================] - 0s 3ms/step - loss: 0.2622 - acc: 0.2500
Epoch 21/100
1/1 [==============================] - 0s 2ms/step - loss: 0.2620 - acc: 0.2500
Epoch 22/100
1/1 [==============================] - 0s 2ms/step - loss: 0.2619 - acc: 0.2500
Epoch 23/100
1/1 [==============================] - 0s 2ms/step - loss: 0.2618 - acc: 0.2500
Epoch 24/100
1/1 [==============================] - 0s 2ms/step - loss: 0.2617 - acc: 0.2500
Epoch 25/100
1/1 [==============================] - 0s 2ms/step - loss: 0.2616 - acc: 0.2500
Epoch 26/100
1/1 [==============================] - 0s 3ms/step - loss: 0.2614 - acc: 0.2500
Epoch 27/100
1/1 [==============================] - 0s 2ms/step - loss: 0.2613 - acc: 0.2500
Epoch 28/100
1/1 [==============================] - 0s 3ms/step - loss: 0.2612 - acc: 0.2500
Epoch 29/100
1/1 [==============================] - 0s 2ms/step - loss: 0.2611 - acc: 0.2500
Epoch 30/100
1/1 [==============================] - 0s 2ms/step - loss: 0.2610 - acc: 0.2500
Epoch 31/100
1/1 [==============================] - 0s 2ms/step - loss: 0.2608 - acc: 0.2500
Epoch 32/100
1/1 [==============================] - 0s 2ms/step - loss: 0.2607 - acc: 0.2500
Epoch 33/100
1/1 [==============================] - 0s 2ms/step - loss: 0.2606 - acc: 0.2500
Epoch 34/100
1/1 [==============================] - 0s 2ms/step - loss: 0.2605 - acc: 0.2500
Epoch 35/100
1/1 [==============================] - 0s 3ms/step - loss: 0.2604 - acc: 0.2500
Epoch 36/100
1/1 [==============================] - 0s 2ms/step - loss: 0.2602 - acc: 0.2500
Epoch 37/100
1/1 [==============================] - 0s 2ms/step - loss: 0.2601 - acc: 0.2500
Epoch 38/100
1/1 [==============================] - 0s 2ms/step - loss: 0.2600 - acc: 0.2500
Epoch 39/100
1/1 [==============================] - 0s 3ms/step - loss: 0.2599 - acc: 0.2500
Epoch 40/100
1/1 [==============================] - 0s 2ms/step - loss: 0.2598 - acc: 0.2500
Epoch 41/100
1/1 [==============================] - 0s 2ms/step - loss: 0.2597 - acc: 0.2500
Epoch 42/100
1/1 [==============================] - 0s 3ms/step - loss: 0.2596 - acc: 0.2500
Epoch 43/100
1/1 [==============================] - 0s 2ms/step - loss: 0.2594 - acc: 0.2500
Epoch 44/100
1/1 [==============================] - 0s 2ms/step - loss: 0.2593 - acc: 0.2500
Epoch 45/100
1/1 [==============================] - 0s 3ms/step - loss: 0.2592 - acc: 0.2500
Epoch 46/100
1/1 [==============================] - 0s 2ms/step - loss: 0.2591 - acc: 0.2500
Epoch 47/100
1/1 [==============================] - 0s 3ms/step - loss: 0.2590 - acc: 0.2500
Epoch 48/100
1/1 [==============================] - 0s 2ms/step - loss: 0.2589 - acc: 0.2500
Epoch 49/100
1/1 [==============================] - 0s 2ms/step - loss: 0.2588 - acc: 0.2500
Epoch 50/100
1/1 [==============================] - 0s 2ms/step - loss: 0.2586 - acc: 0.2500
Epoch 51/100
1/1 [==============================] - 0s 2ms/step - loss: 0.2585 - acc: 0.2500
Epoch 52/100
1/1 [==============================] - 0s 2ms/step - loss: 0.2584 - acc: 0.2500
Epoch 53/100
1/1 [==============================] - 0s 3ms/step - loss: 0.2583 - acc: 0.2500
Epoch 54/100
1/1 [==============================] - 0s 2ms/step - loss: 0.2582 - acc: 0.2500
Epoch 55/100
1/1 [==============================] - 0s 3ms/step - loss: 0.2581 - acc: 0.2500
Epoch 56/100
1/1 [==============================] - 0s 2ms/step - loss: 0.2580 - acc: 0.2500
Epoch 57/100
1/1 [==============================] - 0s 5ms/step - loss: 0.2579 - acc: 0.2500
Epoch 58/100
1/1 [==============================] - 0s 2ms/step - loss: 0.2578 - acc: 0.2500
Epoch 59/100
1/1 [==============================] - 0s 2ms/step - loss: 0.2576 - acc: 0.2500
Epoch 60/100
1/1 [==============================] - 0s 3ms/step - loss: 0.2575 - acc: 0.2500
Epoch 61/100
1/1 [==============================] - 0s 2ms/step - loss: 0.2574 - acc: 0.2500
Epoch 62/100
1/1 [==============================] - 0s 3ms/step - loss: 0.2573 - acc: 0.2500
Epoch 63/100
1/1 [==============================] - 0s 3ms/step - loss: 0.2572 - acc: 0.2500
Epoch 64/100
1/1 [==============================] - 0s 2ms/step - loss: 0.2571 - acc: 0.2500
Epoch 65/100
1/1 [==============================] - 0s 3ms/step - loss: 0.2570 - acc: 0.2500
Epoch 66/100
1/1 [==============================] - 0s 2ms/step - loss: 0.2569 - acc: 0.2500
Epoch 67/100
1/1 [==============================] - 0s 2ms/step - loss: 0.2568 - acc: 0.2500
Epoch 68/100
1/1 [==============================] - 0s 2ms/step - loss: 0.2567 - acc: 0.2500
Epoch 69/100
1/1 [==============================] - 0s 2ms/step - loss: 0.2566 - acc: 0.2500
Epoch 70/100
1/1 [==============================] - 0s 3ms/step - loss: 0.2565 - acc: 0.2500
Epoch 71/100
1/1 [==============================] - 0s 2ms/step - loss: 0.2563 - acc: 0.2500
Epoch 72/100
1/1 [==============================] - 0s 2ms/step - loss: 0.2562 - acc: 0.2500
Epoch 73/100
1/1 [==============================] - 0s 2ms/step - loss: 0.2561 - acc: 0.2500
Epoch 74/100
1/1 [==============================] - 0s 3ms/step - loss: 0.2560 - acc: 0.2500
Epoch 75/100
1/1 [==============================] - 0s 2ms/step - loss: 0.2559 - acc: 0.2500
Epoch 76/100
1/1 [==============================] - 0s 2ms/step - loss: 0.2558 - acc: 0.2500
Epoch 77/100
1/1 [==============================] - 0s 3ms/step - loss: 0.2557 - acc: 0.2500
Epoch 78/100
1/1 [==============================] - 0s 3ms/step - loss: 0.2556 - acc: 0.2500
Epoch 79/100
1/1 [==============================] - 0s 2ms/step - loss: 0.2555 - acc: 0.2500
Epoch 80/100
1/1 [==============================] - 0s 2ms/step - loss: 0.2554 - acc: 0.2500
Epoch 81/100
1/1 [==============================] - 0s 3ms/step - loss: 0.2553 - acc: 0.2500
Epoch 82/100
1/1 [==============================] - 0s 2ms/step - loss: 0.2552 - acc: 0.2500
Epoch 83/100
1/1 [==============================] - 0s 3ms/step - loss: 0.2551 - acc: 0.2500
Epoch 84/100
1/1 [==============================] - 0s 3ms/step - loss: 0.2550 - acc: 0.2500
Epoch 85/100
1/1 [==============================] - 0s 2ms/step - loss: 0.2549 - acc: 0.2500
Epoch 86/100
1/1 [==============================] - 0s 2ms/step - loss: 0.2548 - acc: 0.2500
Epoch 87/100
1/1 [==============================] - 0s 2ms/step - loss: 0.2547 - acc: 0.2500
Epoch 88/100
1/1 [==============================] - 0s 3ms/step - loss: 0.2546 - acc: 0.2500
Epoch 89/100
1/1 [==============================] - 0s 2ms/step - loss: 0.2545 - acc: 0.2500
Epoch 90/100
1/1 [==============================] - 0s 3ms/step - loss: 0.2544 - acc: 0.2500
Epoch 91/100
1/1 [==============================] - 0s 2ms/step - loss: 0.2543 - acc: 0.2500
Epoch 92/100
1/1 [==============================] - 0s 2ms/step - loss: 0.2542 - acc: 0.2500
Epoch 93/100
1/1 [==============================] - 0s 2ms/step - loss: 0.2541 - acc: 0.2500
Epoch 94/100
1/1 [==============================] - 0s 2ms/step - loss: 0.2540 - acc: 0.2500
Epoch 95/100
1/1 [==============================] - 0s 2ms/step - loss: 0.2539 - acc: 0.2500
Epoch 96/100
1/1 [==============================] - 0s 3ms/step - loss: 0.2538 - acc: 0.2500
Epoch 97/100
1/1 [==============================] - 0s 2ms/step - loss: 0.2537 - acc: 0.2500
Epoch 98/100
1/1 [==============================] - 0s 2ms/step - loss: 0.2536 - acc: 0.2500
Epoch 99/100
1/1 [==============================] - 0s 2ms/step - loss: 0.2535 - acc: 0.2500
Epoch 100/100
1/1 [==============================] - 0s 3ms/step - loss: 0.2534 - acc: 0.2500
WARNING:tensorflow:8 out of the last 9 calls to <function Model.make_predict_function.<locals>.predict_function at 0x0000026102109700> triggered tf.function retracing. Tracing is expensive and the excessive number of tracings could be due to (1) creating @tf.function repeatedly in a loop, (2) passing tensors with different shapes, (3) passing Python objects instead of tensors. For (1), please define your @tf.function outside of the loop. For (2), @tf.function has experimental_relax_shapes=True option that relaxes argument shapes that can avoid unnecessary retracing. For (3), please refer to https://www.tensorflow.org/guide/function#controlling_retracing and https://www.tensorflow.org/api_docs/python/tf/function for  more details.
WARNING:tensorflow:7 out of the last 7 calls to <function Model.make_test_function.<locals>.test_function at 0x0000026102109B80> triggered tf.function retracing. Tracing is expensive and the excessive number of tracings could be due to (1) creating @tf.function repeatedly in a loop, (2) passing tensors with different shapes, (3) passing Python objects instead of tensors. For (1), please define your @tf.function outside of the loop. For (2), @tf.function has experimental_relax_shapes=True option that relaxes argument shapes that can avoid unnecessary retracing. For (3), please refer to https://www.tensorflow.org/guide/function#controlling_retracing and https://www.tensorflow.org/api_docs/python/tf/function for  more details.
1/1 [==============================] - 0s 56ms/step - loss: 0.2533 - acc: 0.2500
[0.50530165 0.44862053 0.49225846 0.442587  ]

 

import tensorflow as tf
import numpy as np
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense
from tensorflow.keras.optimizers import RMSprop, SGD
from tensorflow.keras.losses import mse
tf.random.set_seed(777)
# data
data = np.array([[0,0],[1,0],[0,1],[1,1]])
# labels
label = np.array([[0],[1],[1],[0]])
model = Sequential()

# two layers
# hidden layer with 32 units, relu activation
# Dense stacks the layers
# relu: negative inputs become 0, positive inputs pass through unchanged
model.add(Dense(32, input_shape = (2,), activation = 'relu')) # hidden layer
# sigmoid returns values between 0 and 1
model.add(Dense(1, activation = 'sigmoid')) # output layer

# optimizer: the strategy for moving toward the loss minimum
# RMSprop: refines Adagrad by using a decaying average of past gradients to set the step size
# Adagrad: large steps at first, smaller steps for parameters that have already been updated a lot;
#          after many updates the learning rate can shrink too far

# model.compile(optimizer = RMSprop(), loss = mse, metrics = ['acc'])
model.compile(optimizer = SGD(), loss = mse, metrics = ['acc'])
model.fit(data, label, epochs = 100)
model.get_weights()
predict = model.predict(data).flatten()
model.evaluate(data, label) # evaluation: loss and accuracy
print(predict)
# 1/1 [==============================] - 0s 249ms/step - loss: 0.2533 - acc: 0.2500
# [0.50530165 0.44862053 0.49225846 0.442587  ]

Epoch 1/100
1/1 [==============================] - 0s 161ms/step - loss: 0.2646 - acc: 0.5000
Epoch 2/100
1/1 [==============================] - 0s 2ms/step - loss: 0.2644 - acc: 0.2500
Epoch 3/100
1/1 [==============================] - 0s 2ms/step - loss: 0.2643 - acc: 0.2500
Epoch 4/100
1/1 [==============================] - 0s 2ms/step - loss: 0.2642 - acc: 0.2500
Epoch 5/100
1/1 [==============================] - 0s 3ms/step - loss: 0.2640 - acc: 0.2500
Epoch 6/100
1/1 [==============================] - 0s 3ms/step - loss: 0.2639 - acc: 0.2500
Epoch 7/100
1/1 [==============================] - 0s 2ms/step - loss: 0.2638 - acc: 0.2500
Epoch 8/100
1/1 [==============================] - 0s 4ms/step - loss: 0.2637 - acc: 0.2500
Epoch 9/100
1/1 [==============================] - 0s 2ms/step - loss: 0.2635 - acc: 0.2500
Epoch 10/100
1/1 [==============================] - 0s 3ms/step - loss: 0.2634 - acc: 0.2500
Epoch 11/100
1/1 [==============================] - 0s 2ms/step - loss: 0.2633 - acc: 0.2500
Epoch 12/100
1/1 [==============================] - 0s 3ms/step - loss: 0.2632 - acc: 0.2500
Epoch 13/100
1/1 [==============================] - 0s 2ms/step - loss: 0.2630 - acc: 0.2500
Epoch 14/100
1/1 [==============================] - 0s 2ms/step - loss: 0.2629 - acc: 0.2500
Epoch 15/100
1/1 [==============================] - 0s 2ms/step - loss: 0.2628 - acc: 0.2500
Epoch 16/100
1/1 [==============================] - 0s 3ms/step - loss: 0.2627 - acc: 0.2500
Epoch 17/100
1/1 [==============================] - 0s 4ms/step - loss: 0.2625 - acc: 0.2500
Epoch 18/100
1/1 [==============================] - 0s 2ms/step - loss: 0.2624 - acc: 0.2500
Epoch 19/100
1/1 [==============================] - 0s 2ms/step - loss: 0.2623 - acc: 0.2500
Epoch 20/100
1/1 [==============================] - 0s 2ms/step - loss: 0.2622 - acc: 0.2500
Epoch 21/100
1/1 [==============================] - 0s 3ms/step - loss: 0.2620 - acc: 0.2500
Epoch 22/100
1/1 [==============================] - 0s 3ms/step - loss: 0.2619 - acc: 0.2500
Epoch 23/100
1/1 [==============================] - 0s 2ms/step - loss: 0.2618 - acc: 0.2500
Epoch 24/100
1/1 [==============================] - 0s 3ms/step - loss: 0.2617 - acc: 0.2500
Epoch 25/100
1/1 [==============================] - 0s 2ms/step - loss: 0.2616 - acc: 0.2500
Epoch 26/100
1/1 [==============================] - 0s 4ms/step - loss: 0.2614 - acc: 0.2500
Epoch 27/100
1/1 [==============================] - 0s 2ms/step - loss: 0.2613 - acc: 0.2500
Epoch 28/100
1/1 [==============================] - 0s 2ms/step - loss: 0.2612 - acc: 0.2500
Epoch 29/100
1/1 [==============================] - 0s 2ms/step - loss: 0.2611 - acc: 0.2500
Epoch 30/100
1/1 [==============================] - 0s 2ms/step - loss: 0.2610 - acc: 0.2500
Epoch 31/100
1/1 [==============================] - 0s 2ms/step - loss: 0.2608 - acc: 0.2500
Epoch 32/100
1/1 [==============================] - 0s 3ms/step - loss: 0.2607 - acc: 0.2500
Epoch 33/100
1/1 [==============================] - 0s 2ms/step - loss: 0.2606 - acc: 0.2500
Epoch 34/100
1/1 [==============================] - 0s 3ms/step - loss: 0.2605 - acc: 0.2500
Epoch 35/100
1/1 [==============================] - 0s 8ms/step - loss: 0.2604 - acc: 0.2500
Epoch 36/100
1/1 [==============================] - 0s 3ms/step - loss: 0.2602 - acc: 0.2500
Epoch 37/100
1/1 [==============================] - 0s 2ms/step - loss: 0.2601 - acc: 0.2500
Epoch 38/100
1/1 [==============================] - 0s 4ms/step - loss: 0.2600 - acc: 0.2500
Epoch 39/100
1/1 [==============================] - 0s 2ms/step - loss: 0.2599 - acc: 0.2500
Epoch 40/100
1/1 [==============================] - 0s 2ms/step - loss: 0.2598 - acc: 0.2500
Epoch 41/100
1/1 [==============================] - 0s 2ms/step - loss: 0.2597 - acc: 0.2500
Epoch 42/100
1/1 [==============================] - 0s 2ms/step - loss: 0.2596 - acc: 0.2500
Epoch 43/100
1/1 [==============================] - 0s 3ms/step - loss: 0.2594 - acc: 0.2500
Epoch 44/100
1/1 [==============================] - 0s 2ms/step - loss: 0.2593 - acc: 0.2500
Epoch 45/100
1/1 [==============================] - 0s 2ms/step - loss: 0.2592 - acc: 0.2500
Epoch 46/100
1/1 [==============================] - 0s 2ms/step - loss: 0.2591 - acc: 0.2500
Epoch 47/100
1/1 [==============================] - 0s 2ms/step - loss: 0.2590 - acc: 0.2500
Epoch 48/100
1/1 [==============================] - 0s 2ms/step - loss: 0.2589 - acc: 0.2500
Epoch 49/100
1/1 [==============================] - 0s 4ms/step - loss: 0.2588 - acc: 0.2500
Epoch 50/100
1/1 [==============================] - 0s 2ms/step - loss: 0.2586 - acc: 0.2500
Epoch 51/100
1/1 [==============================] - 0s 2ms/step - loss: 0.2585 - acc: 0.2500
Epoch 52/100
1/1 [==============================] - 0s 2ms/step - loss: 0.2584 - acc: 0.2500
Epoch 53/100
1/1 [==============================] - 0s 2ms/step - loss: 0.2583 - acc: 0.2500
Epoch 54/100
1/1 [==============================] - 0s 2ms/step - loss: 0.2582 - acc: 0.2500
Epoch 55/100
1/1 [==============================] - 0s 2ms/step - loss: 0.2581 - acc: 0.2500
Epoch 56/100
1/1 [==============================] - 0s 4ms/step - loss: 0.2580 - acc: 0.2500
Epoch 57/100
1/1 [==============================] - 0s 3ms/step - loss: 0.2579 - acc: 0.2500
Epoch 58/100
1/1 [==============================] - 0s 2ms/step - loss: 0.2578 - acc: 0.2500
Epoch 59/100
1/1 [==============================] - 0s 3ms/step - loss: 0.2576 - acc: 0.2500
Epoch 60/100
1/1 [==============================] - 0s 2ms/step - loss: 0.2575 - acc: 0.2500
Epoch 61/100
1/1 [==============================] - 0s 3ms/step - loss: 0.2574 - acc: 0.2500
Epoch 62/100
1/1 [==============================] - 0s 3ms/step - loss: 0.2573 - acc: 0.2500
Epoch 63/100
1/1 [==============================] - 0s 3ms/step - loss: 0.2572 - acc: 0.2500
Epoch 64/100
1/1 [==============================] - 0s 2ms/step - loss: 0.2571 - acc: 0.2500
Epoch 65/100
1/1 [==============================] - 0s 4ms/step - loss: 0.2570 - acc: 0.2500
Epoch 66/100
1/1 [==============================] - 0s 2ms/step - loss: 0.2569 - acc: 0.2500
Epoch 67/100
1/1 [==============================] - 0s 3ms/step - loss: 0.2568 - acc: 0.2500
Epoch 68/100
1/1 [==============================] - 0s 2ms/step - loss: 0.2567 - acc: 0.2500
Epoch 69/100
1/1 [==============================] - 0s 4ms/step - loss: 0.2566 - acc: 0.2500
Epoch 70/100
1/1 [==============================] - 0s 3ms/step - loss: 0.2565 - acc: 0.2500
Epoch 71/100
1/1 [==============================] - 0s 2ms/step - loss: 0.2563 - acc: 0.2500
Epoch 72/100
1/1 [==============================] - 0s 3ms/step - loss: 0.2562 - acc: 0.2500
Epoch 73/100
1/1 [==============================] - 0s 4ms/step - loss: 0.2561 - acc: 0.2500
Epoch 74/100
1/1 [==============================] - 0s 2ms/step - loss: 0.2560 - acc: 0.2500
Epoch 75/100
1/1 [==============================] - 0s 3ms/step - loss: 0.2559 - acc: 0.2500
Epoch 76/100
1/1 [==============================] - 0s 2ms/step - loss: 0.2558 - acc: 0.2500
Epoch 77/100
1/1 [==============================] - 0s 4ms/step - loss: 0.2557 - acc: 0.2500
Epoch 78/100
1/1 [==============================] - 0s 2ms/step - loss: 0.2556 - acc: 0.2500
Epoch 79/100
1/1 [==============================] - 0s 2ms/step - loss: 0.2555 - acc: 0.2500
Epoch 80/100
1/1 [==============================] - 0s 2ms/step - loss: 0.2554 - acc: 0.2500
Epoch 81/100
1/1 [==============================] - 0s 3ms/step - loss: 0.2553 - acc: 0.2500
Epoch 82/100
1/1 [==============================] - 0s 3ms/step - loss: 0.2552 - acc: 0.2500
Epoch 83/100
1/1 [==============================] - 0s 2ms/step - loss: 0.2551 - acc: 0.2500
Epoch 84/100
1/1 [==============================] - 0s 3ms/step - loss: 0.2550 - acc: 0.2500
Epoch 85/100
1/1 [==============================] - 0s 2ms/step - loss: 0.2549 - acc: 0.2500
Epoch 86/100
1/1 [==============================] - 0s 2ms/step - loss: 0.2548 - acc: 0.2500
Epoch 87/100
1/1 [==============================] - 0s 2ms/step - loss: 0.2547 - acc: 0.2500
Epoch 88/100
1/1 [==============================] - 0s 2ms/step - loss: 0.2546 - acc: 0.2500
Epoch 89/100
1/1 [==============================] - 0s 3ms/step - loss: 0.2545 - acc: 0.2500
Epoch 90/100
1/1 [==============================] - 0s 2ms/step - loss: 0.2544 - acc: 0.2500
Epoch 91/100
1/1 [==============================] - 0s 3ms/step - loss: 0.2543 - acc: 0.2500
Epoch 92/100
1/1 [==============================] - 0s 2ms/step - loss: 0.2542 - acc: 0.2500
Epoch 93/100
1/1 [==============================] - 0s 2ms/step - loss: 0.2541 - acc: 0.2500
Epoch 94/100
1/1 [==============================] - 0s 2ms/step - loss: 0.2540 - acc: 0.2500
Epoch 95/100
1/1 [==============================] - 0s 2ms/step - loss: 0.2539 - acc: 0.2500
Epoch 96/100
1/1 [==============================] - 0s 3ms/step - loss: 0.2538 - acc: 0.2500
Epoch 97/100
1/1 [==============================] - 0s 2ms/step - loss: 0.2537 - acc: 0.2500
Epoch 98/100
1/1 [==============================] - 0s 2ms/step - loss: 0.2536 - acc: 0.2500
Epoch 99/100
1/1 [==============================] - 0s 2ms/step - loss: 0.2535 - acc: 0.2500
Epoch 100/100
1/1 [==============================] - 0s 2ms/step - loss: 0.2534 - acc: 0.2500
WARNING:tensorflow:9 out of the last 10 calls to <function Model.make_predict_function.<locals>.predict_function at 0x000002610365D0D0> triggered tf.function retracing. Tracing is expensive and the excessive number of tracings could be due to (1) creating @tf.function repeatedly in a loop, (2) passing tensors with different shapes, (3) passing Python objects instead of tensors. For (1), please define your @tf.function outside of the loop. For (2), @tf.function has experimental_relax_shapes=True option that relaxes argument shapes that can avoid unnecessary retracing. For (3), please refer to https://www.tensorflow.org/guide/function#controlling_retracing and https://www.tensorflow.org/api_docs/python/tf/function for  more details.
WARNING:tensorflow:8 out of the last 8 calls to <function Model.make_test_function.<locals>.test_function at 0x000002610365DEE0> triggered tf.function retracing. Tracing is expensive and the excessive number of tracings could be due to (1) creating @tf.function repeatedly in a loop, (2) passing tensors with different shapes, (3) passing Python objects instead of tensors. For (1), please define your @tf.function outside of the loop. For (2), @tf.function has experimental_relax_shapes=True option that relaxes argument shapes that can avoid unnecessary retracing. For (3), please refer to https://www.tensorflow.org/guide/function#controlling_retracing and https://www.tensorflow.org/api_docs/python/tf/function for  more details.
1/1 [==============================] - 0s 249ms/step - loss: 0.2533 - acc: 0.2500
[0.50530165 0.44862053 0.49225846 0.442587  ]
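After only 100 epochs of plain SGD the two-layer network has barely moved from its initial weights, which is why the accuracy is stuck at 0.25. A hedged sketch (not run in the original post) along the lines of the commented-out optimizer, with more epochs, which typically lets the hidden layer separate XOR:

model = Sequential()
model.add(Dense(32, input_shape = (2,), activation = 'relu'))
model.add(Dense(1, activation = 'sigmoid'))
model.compile(optimizer = RMSprop(), loss = mse, metrics = ['acc'])
model.fit(data, label, epochs = 1000, verbose = 0)  # many more updates than the SGD run above
print(model.predict(data).flatten())                # values should drift toward 0, 1, 1, 0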

 

 

 


# The perceptron: 1957

The classic AND gate:
x1 | x2 | y
0  | 0  | 0
0  | 1  | 0
1  | 0  | 0
1  | 1  | 1

A perceptron computes a weighted sum of its inputs plus a bias: y = b + x1*w1 + x2*w2
x1, x2 : inputs
w1, w2 : weights
b : bias (threshold)

If the weighted sum stays at or below the threshold the output is 0, otherwise 1:
y = 0 : if x1*w1 + x2*w2 <= b
y = 1 : if x1*w1 + x2*w2 > b
Choosing suitable weights and a threshold reproduces the AND table.

 

# AND gate

import numpy as np
def AND (x1, x2) :
    x = np.array([x1, x2])
    w = np.array([0.5, 0.5])
    b = -0.5 # even -0.8 would make little difference here
    tmp = np.sum(w*x) + b
    if tmp <= 0 :
        return 0
    else : 
        return 1
# run the perceptron over every input combination
for xs in [(0,0),(1,0),(0,1),(1,1)] :
    y = AND(xs[0], xs[1])
    print(str(xs) + "=>" + str(y))
# learning is the process of finding weights and a bias like these automatically,
# i.e. the weights and bias that are optimal for the whole dataset (a sketch follows the output below)

(0, 0)=>0
(1, 0)=>0
(0, 1)=>0
(1, 1)=>1
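As the comments above note, in practice the weights and bias are not hand-picked. A minimal sketch (my addition, not from the original post) of the classic perceptron learning rule finding them from the labeled AND examples:

import numpy as np

X = np.array([[0,0],[1,0],[0,1],[1,1]])
y = np.array([0, 0, 0, 1])                 # AND labels
w = np.zeros(2)
b = 0.0
lr = 0.1
for _ in range(20):                        # a few passes are enough for AND
    for xi, yi in zip(X, y):
        pred = int(np.sum(w * xi) + b > 0)
        w += lr * (yi - pred) * xi
        b += lr * (yi - pred)
print(w, b)                                      # learned weights and bias
print([int(np.sum(w * xi) + b > 0) for xi in X]) # reproduces the AND column [0, 0, 0, 1]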

 

# OR gate

import numpy as np
def OR (x1, x2) :
    x = np.array([x1, x2])
    w = np.array([0.5, 0.5])
    b = -0.2 # only the bias differs from the AND gate
    tmp = np.sum(w*x) + b
    if tmp <= 0 :
        return 0
    else : 
        return 1
# run the perceptron over every input combination
for xs in [(0,0),(1,0),(0,1),(1,1)] :
    y = OR(xs[0], xs[1])
    print(str(xs) + "=>" + str(y))
# only the bias changed to -0.2, and the same structure now implements OR

(0, 0)=>0
(1, 0)=>1
(0, 1)=>1
(1, 1)=>1

 

# NAND gate

import numpy as np
def NAND (x1, x2) :
    x = np.array([x1, x2])
    w = np.array([-0.5, -0.5])
    b = 0.8 # compared with AND, the weights are negated and the bias is positive
    tmp = np.sum(w*x) + b
    if tmp <= 0 :
        return 0
    else : 
        return 1
# run the perceptron over every input combination
for xs in [(0,0),(1,0),(0,1),(1,1)] :
    y = NAND(xs[0], xs[1])
    print(str(xs) + "=>" + str(y))
    

(0, 0)=>1
(1, 0)=>1
(0, 1)=>1
(1, 1)=>0

 

# XOR gate

# XOR cannot be built from a single perceptron; it requires a multi-layer network
import numpy as np
def XOR (x1, x2) :
    s1 = NAND(x1, x2)
    s2 = OR(x1, x2)
    y = AND(s1, s2) # two-layer structure
    return y
# run over every input combination
for xs in [(0,0),(1,0),(0,1),(1,1)] :
    y = XOR(xs[0], xs[1])
    print(str(xs) + "=>" + str(y))


(0, 0)=>0
(1, 0)=>1
(0, 1)=>1
(1, 1)=>0

# 1 when the inputs differ, 0 when they are the same
# the two 1-outputs, (1,0) and (0,1), sit on a diagonal, so no single straight line separates them
# this is why XOR requires a multi-layer perceptron

 

# activation functions: transform the output non-linearly

# step function: returns 0 or 1

import numpy as np
import matplotlib.pyplot as plt
def step_function (x):
    return np.array(x > 0, dtype = int)
    # 1 where x is greater than 0, otherwise 0
x = np.arange(-5.0, 5.0, 0.1)
y = step_function(x)
plt.plot(x, y)
plt.ylim(-0.1, 1.1)
plt.show()

# sigmoid function: maps any input into the range 0 to 1

def sigmoid(x) :
    return 1 / (1 + np.exp(-x)) # used for classification
x = np.arange(-5.0, 5.0, 0.1)
y = sigmoid(x)
plt.plot(x,y)
plt.ylim(-0.1, 1.1)
plt.show()

# ReLU: 0 for negative inputs, the input itself otherwise

def relu(x) :
    return np.maximum(0,x) # often used in hidden layers and for regression
x = np.arange(-5.0, 5.0, 0.1)
y = relu(x)
plt.plot(x, y)
plt.show()

# cost function (loss function): how do we move toward the optimal parameters?
# by stepping toward the point where the derivative (gradient) of the loss vanishes, i.e. its minimum: gradient descent (a small sketch follows)
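A tiny gradient-descent sketch (my illustration, reusing the y = 2x + 1 data from earlier): step against the derivative of the MSE until the parameters settle near the minimum.

import numpy as np

x = np.array([0., 1., 2., 3., 4.])
y = 2 * x + 1
w, b, lr = 0.0, 0.0, 0.05
for _ in range(2000):
    pred = w * x + b
    grad_w = np.mean(2 * (pred - y) * x)   # dMSE/dw
    grad_b = np.mean(2 * (pred - y))       # dMSE/db
    w -= lr * grad_w
    b -= lr * grad_b
print(w, b)                                # approaches the true slope 2 and intercept 1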

 

Gradient-descent variants (sketched below):

batch GD: every sample is used for each update

stochastic GD: one sample is used per update

mini-batch GD: small groups of samples are used per update
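In Keras the three variants differ only in the batch_size passed to fit; a hedged sketch, assuming a compiled model and the data/label arrays used above (each call shown independently, in practice you would pick one):

model.fit(data, label, epochs = 200, batch_size = len(data))  # batch GD: every sample in each update
model.fit(data, label, epochs = 200, batch_size = 1)          # stochastic GD: one sample per update
model.fit(data, label, epochs = 200, batch_size = 2)          # mini-batch GD: small groups per update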

 

# Backpropagation
Rather than working out each gradient by hand from the formulas, an algorithm does it:
the prediction is compared with the target through the loss function, the error is propagated backwards to compute the gradients, the weights are updated, and the cycle repeats.
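A small numeric check (my addition) of the idea: the analytic gradient obtained through the chain rule, which is what backpropagation computes layer by layer, matches a numerical estimate of how the loss changes with a weight.

import numpy as np

x, t = 1.5, 1.0                    # one input and its target
w = 0.3
loss = lambda w: (w * x - t) ** 2  # squared error of a one-weight "network"

analytic = 2 * (w * x - t) * x     # chain rule: dL/dw = dL/dy * dy/dw
eps = 1e-6
numeric = (loss(w + eps) - loss(w - eps)) / (2 * eps)
print(analytic, numeric)           # the two values agree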

 

import numpy as np
import tensorflow as tf
print(tf.__version__)

2.4.1

 

 

a = tf.constant(2) # declare a scalar as a tensor
b = tf.constant([1, 2]) # declare a vector as a tensor
c = tf.constant([[1, 2],[3, 4]]) # declare a matrix as a tensor
# rank: returns the number of dimensions of a tensor
print(tf.rank(a))
print(tf.rank(b))
print(tf.rank(c))

tf.Tensor(0, shape=(), dtype=int32)
tf.Tensor(1, shape=(), dtype=int32)
tf.Tensor(2, shape=(), dtype=int32)

 

# addition: add
a = tf.constant(3)
b = tf.constant(2)
print(tf.add(a,b))

# tf.Tensor(5, shape=(), dtype=int32)


# subtraction: subtract
print(tf.subtract(a,b))
# multiplication
print(tf.multiply(a,b))
# division
print(tf.divide(a,b))
# .numpy() extracts just the resulting value
print(tf.divide(a,b).numpy())
print(tf.multiply(a,b).numpy())

tf.Tensor(1, shape=(), dtype=int32)
tf.Tensor(6, shape=(), dtype=int32)
tf.Tensor(1.5, shape=(), dtype=float64)
1.5
6

 

c_square = np.square(tf.add(a,b).numpy(), dtype=np.float32)
c_square

# 25.0

 

c_tensor = tf.convert_to_tensor(c_square)
c_tensor

# <tf.Tensor: shape=(), dtype=float32, numpy=25.0>

 

@tf.function # decorator: compiles the Python function into a TensorFlow graph function
def square_pos1(x) :
    if x > 0 :
        x = x*x
    else :
        x = x*-1
    return x
print(square_pos1(tf.constant(2)))
print(square_pos1.__class__)

# tf.Tensor(4, shape=(), dtype=int32)
# <class 'tensorflow.python.eager.def_function.Function'>

 

def square_pos2(x) :
    if x > 0 :
        x = x*x
    else :
        x = x*-1
    return x
print(square_pos2(tf.constant(2)))
print(square_pos2.__class__)

# tf.Tensor(4, shape=(), dtype=int32)
# <class 'function'>

 

# OR gate with TensorFlow
import tensorflow as tf
import numpy as np
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense
from tensorflow.keras.optimizers import SGD
from tensorflow.keras.losses import mse
tf.random.set_seed(777)
# data
data = np.array([[0,0],[1,0],[0,1],[1,1]])
# labels
label = np.array([[0],[1],[1],[1]])
model = Sequential()
model.add(Dense(1, input_shape = (2,), activation = 'linear')) # a single perceptron
model.compile(optimizer = SGD(), loss = mse, metrics = ['acc'])
# with epochs = 100 training stops before it has converged, so 200 is used
model.fit(data, label, epochs = 200)
# the learned weights and bias
model.get_weights()
# in classical machine learning you tune the values yourself; in deep learning you hand over data and labels and the model finds them on its own

Epoch 1/200
1/1 [==============================] - 1s 700ms/step - loss: 1.4290 - acc: 0.5000
Epoch 2/200
1/1 [==============================] - 0s 2ms/step - loss: 1.3602 - acc: 0.5000
Epoch 3/200
1/1 [==============================] - 0s 2ms/step - loss: 1.2956 - acc: 0.5000
Epoch 4/200
1/1 [==============================] - 0s 2ms/step - loss: 1.2349 - acc: 0.5000
Epoch 5/200
1/1 [==============================] - 0s 2ms/step - loss: 1.1779 - acc: 0.5000
Epoch 6/200
1/1 [==============================] - 0s 2ms/step - loss: 1.1242 - acc: 0.5000
Epoch 7/200
1/1 [==============================] - 0s 2ms/step - loss: 1.0738 - acc: 0.5000
Epoch 8/200
1/1 [==============================] - 0s 2ms/step - loss: 1.0264 - acc: 0.5000
Epoch 9/200
1/1 [==============================] - 0s 2ms/step - loss: 0.9819 - acc: 0.5000
Epoch 10/200
1/1 [==============================] - 0s 2ms/step - loss: 0.9399 - acc: 0.5000
Epoch 11/200
1/1 [==============================] - 0s 3ms/step - loss: 0.9005 - acc: 0.5000
Epoch 12/200
1/1 [==============================] - 0s 2ms/step - loss: 0.8634 - acc: 0.5000
Epoch 13/200
1/1 [==============================] - 0s 4ms/step - loss: 0.8284 - acc: 0.5000
Epoch 14/200
1/1 [==============================] - 0s 2ms/step - loss: 0.7955 - acc: 0.5000
Epoch 15/200
1/1 [==============================] - 0s 3ms/step - loss: 0.7646 - acc: 0.5000
Epoch 16/200
1/1 [==============================] - 0s 2ms/step - loss: 0.7354 - acc: 0.5000
Epoch 17/200
1/1 [==============================] - 0s 2ms/step - loss: 0.7079 - acc: 0.5000
...
Epoch 30/200
1/1 [==============================] - 0s 2ms/step - loss: 0.4648 - acc: 0.5000
Epoch 31/200
1/1 [==============================] - 0s 2ms/step - loss: 0.4525 - acc: 0.7500
...
Epoch 191/200
1/1 [==============================] - 0s 2ms/step - loss: 0.1179 - acc: 0.7500
Epoch 192/200
1/1 [==============================] - 0s 2ms/step - loss: 0.1174 - acc: 1.0000
...
Epoch 200/200
1/1 [==============================] - 0s 3ms/step - loss: 0.1136 - acc: 1.0000
[array([[0.5995085 ],
        [0.06513146]], dtype=float32),
 array([0.4472612], dtype=float32)]
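The weights above are enough to check by hand that the perceptron has learned OR. Below is a small sketch (not part of the original notebook) that plugs the printed kernel and bias into the linear formula; thresholding the output at 0.5 is my own choice for reading the regression output as a class.

import numpy as np

# weights copied from the get_weights() output above
W = np.array([[0.5995085], [0.06513146]])   # kernel: one weight per input
b = np.array([0.4472612])                   # bias

data = np.array([[0, 0], [1, 0], [0, 1], [1, 1]])
linear_out = data @ W + b                      # what model.predict(data) returns
print(linear_out.ravel())                      # raw linear outputs
print((linear_out > 0.5).astype(int).ravel())  # -> [0 1 1 1], the OR truth table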

 

 


academy1.csv

import pandas as pd
data = pd.read_csv('academy1.csv')
data.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 32 entries, 0 to 31
Data columns (total 3 columns):
 #   Column  Non-Null Count  Dtype
---  ------  --------------  -----
 0   학번      32 non-null     int64
 1   국어점수    32 non-null     int64
 2   영어점수    32 non-null     int64
dtypes: int64(3)
memory usage: 896.0 bytes

 

from sklearn.cluster import KMeans
km = KMeans(n_clusters = 3)
km.fit(data.iloc[:,1:])
# fit_predict() refits the model and returns one cluster label per row,
# so the fit() call above is technically redundant
aaa = km.fit_predict(data.iloc[:,1:])

 

import mglearn
import matplotlib.pyplot as plt
from matplotlib import rc
rc('font', family = 'Malgun Gothic')
mglearn.plots.plot_kmeans_algorithm()
plt.show()

 

mglearn.plots.plot_kmeans_boundaries()
plt.show()

 

mglearn.discrete_scatter(data.iloc[:,1], data.iloc[:,2], km.labels_)
plt.legend(["클러스터 0","클러스터 1","클러스터 2"], loc='best')
plt.xlabel("국어점수")
plt.ylabel("영어점수")
plt.show()

# Which cluster does a student with a Korean score of 100 and an English score of 80 belong to?
km.predict([[100, 80]])
# cluster 0

# array([0])
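To read what "cluster 0" actually means here, the fitted centroids can be printed; a one-line addition, not in the original post.

# one row per cluster, in label order, over the two score columns (국어점수, 영어점수)
print(km.cluster_centers_)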

 

academy2.csv

data = pd.read_csv('academy2.csv')
data.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 18 entries, 0 to 17
Data columns (total 6 columns):
 #   Column  Non-Null Count  Dtype
---  ------  --------------  -----
 0   학번      18 non-null     int64
 1   국어점수    18 non-null     int64
 2   영어점수    18 non-null     int64
 3   수학점수    18 non-null     int64
 4   과학점수    18 non-null     int64
 5   학업성취도   18 non-null     int64
dtypes: int64(6)
memory usage: 992.0 bytes

 

km = KMeans(n_clusters = 3)
km.fit(data.iloc[:,1:])
aaa = km.fit_predict(data.iloc[:,1:])
mglearn.plots.plot_kmeans_algorithm()
plt.show()

 

mglearn.plots.plot_kmeans_boundaries()
plt.show()

 

mglearn.discrete_scatter(data.iloc[:,1], data.iloc[:,2], km.labels_)
plt.legend(["클러스터 0","클러스터 1","클러스터 2"], loc='best')
plt.xlabel("국어점수")
plt.ylabel("영어점수")
plt.show()

 

km.predict([[100, 80, 70, 70, 70]])
# array([0])

km.labels_
# array([0, 0, 2, 0, 0, 2, 0, 0, 2, 1, 0, 2, 1, 0, 2, 1, 0, 2])

 

for no, cla in enumerate(km.labels_) :
    print(data.iloc[no].tolist(), cla)
    
[1, 90, 80, 80, 80, 80] 0
[2, 90, 75, 75, 75, 75] 0
[3, 65, 90, 90, 90, 90] 2
[4, 90, 80, 80, 80, 80] 0
[5, 90, 75, 75, 75, 75] 0
[6, 65, 90, 90, 90, 90] 2
[7, 90, 80, 80, 80, 80] 0
[8, 90, 75, 75, 75, 75] 0
[9, 65, 90, 60, 88, 80] 2
[10, 90, 80, 60, 30, 40] 1
[11, 90, 75, 85, 60, 70] 0
[12, 65, 90, 60, 88, 80] 2
[13, 90, 30, 40, 30, 40] 1
[14, 90, 60, 70, 60, 70] 0
[15, 65, 88, 80, 88, 80] 2
[16, 90, 30, 40, 30, 40] 1
[17, 90, 60, 70, 60, 70] 0
[18, 65, 88, 80, 88, 80] 2

 

from sklearn.cluster import DBSCAN
model = DBSCAN()
model.fit(data.iloc[:,1:])
clusters = model.fit_predict(data.iloc[:,1:])
mglearn.discrete_scatter(data.iloc[:,1], data.iloc[:,2], model.labels_)
plt.legend(['클러스터 0','클러스터 1','클러스터 2'], loc = 'best')
plt.xlabel('국어점수')
plt.ylabel('영어점수')
plt.show()
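DBSCAN decides the number of clusters itself and marks outliers with the label -1, so it is worth checking what it actually found on this data; a small inspection snippet, not in the original post.

import numpy as np
# label -1 means "noise" (points that belong to no dense region)
labels, counts = np.unique(model.labels_, return_counts=True)
print(dict(zip(labels.tolist(), counts.tolist())))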

# KMeans
# performs well and is widely used for clustering
# starts from random centres and refines them step by step
# works best when clusters are roughly spherical
# the number of clusters must be set manually (one common way to choose it is sketched below)

# DBSCAN
# groups points by density rather than distance to a centre
# finds the number of clusters automatically (outliers get the label -1)
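Since KMeans needs n_clusters up front, a common heuristic is the elbow method: fit several values of k and look for the point where the inertia (within-cluster sum of squares) stops dropping quickly. A minimal sketch, assuming the academy1.csv data used above:

import pandas as pd
import matplotlib.pyplot as plt
from sklearn.cluster import KMeans

data = pd.read_csv('academy1.csv')
X = data.iloc[:, 1:]               # score columns only (학번 excluded)

inertias = []
for k in range(1, 7):
    km = KMeans(n_clusters=k, random_state=0).fit(X)
    inertias.append(km.inertia_)   # sum of squared distances to the nearest centre

plt.plot(range(1, 7), inertias, marker='o')
plt.xlabel('n_clusters')
plt.ylabel('inertia')
plt.show()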

 

from sklearn import datasets
iris = datasets.load_iris()
type(iris)

# sklearn.utils.Bunch

 

iris.target

array([0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
       0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
       0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
       1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
       1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2,
       2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2,
       2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2])

 

iris.data
array([[5.1, 3.5, 1.4, 0.2],
       [4.9, 3. , 1.4, 0.2],
       [4.7, 3.2, 1.3, 0.2],
       ...,
       [6.5, 3. , 5.2, 2. ],
       [6.2, 3.4, 5.4, 2.3],
       [5.9, 3. , 5.1, 1.8]])

 

import pandas as pd
labels = pd.DataFrame(iris.target)
labels.columns = ['labels']
labels.head()
# note: this attaches the species labels to `data`; it assumes `data` already holds
# the iris features built below (data = pd.DataFrame(iris.data)), so run that cell first,
# otherwise data['labels'] ends up on the wrong DataFrame
data['labels'] = labels['labels']

 

labels['labels'].unique()

array([0, 1, 2])

 

labels['labels'].value_counts()

2    50
1    50
0    50
Name: labels, dtype: int64

 

data = pd.DataFrame(iris.data)
data.columns = ['Sepal length', 'Sepal width','Petal length','Petal width']
data.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 150 entries, 0 to 149
Data columns (total 4 columns):
 #   Column        Non-Null Count  Dtype  
---  ------        --------------  -----  
 0   Sepal length  150 non-null    float64
 1   Sepal width   150 non-null    float64
 2   Petal length  150 non-null    float64
 3   Petal width   150 non-null    float64
dtypes: float64(4)
memory usage: 4.8 KB

 

 

feature = data[['Sepal length','Sepal width']]
feature.head()

	Sepal length	Sepal width
0	5.1	3.5
1	4.9	3.0
2	4.7	3.2
3	4.6	3.1
4	5.0	3.6

 

from sklearn.cluster import KMeans
import matplotlib.pyplot as plt
import seaborn as sns
mo = KMeans(n_clusters=3, algorithm = 'auto')   # 'auto' is the default; newer scikit-learn versions call it 'lloyd'
mo.fit(feature)
predict = pd.DataFrame(mo.predict(feature))
predict.columns = ['predict']
r = pd.concat([feature, predict], axis = 1)
r.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 150 entries, 0 to 149
Data columns (total 3 columns):
 #   Column        Non-Null Count  Dtype  
---  ------        --------------  -----  
 0   Sepal length  150 non-null    float64
 1   Sepal width   150 non-null    float64
 2   predict       150 non-null    int32  
dtypes: float64(2), int32(1)
memory usage: 3.1 KB

 

# scatter plot of the predicted clusters
plt.scatter(r['Sepal length'], r['Sepal width'], c = r['predict'], alpha=0.5)

 

 

# scatter plot of the actual species labels
plt.scatter(data['Sepal length'], data['Sepal width'], c =data['labels'], alpha=0.5)

 

from sklearn.metrics import confusion_matrix, accuracy_score
print(accuracy_score(data['labels'].values, r['predict'].values))
print(confusion_matrix(data['labels'].values, r['predict'].values))

0.08
[[ 0  0 50]
 [38 12  0]
 [15 35  0]]
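The 0.08 accuracy is misleading rather than a sign that the clustering failed: KMeans hands out arbitrary cluster IDs, so the cluster numbered 0 need not be species 0, and accuracy_score just compares the raw numbers. A fairer comparison relabels each cluster with the majority species inside it first; a small sketch (my own addition, reusing r, data and accuracy_score from the cells above):

import numpy as np

mapped = np.zeros_like(r['predict'].values)
for c in np.unique(r['predict'].values):
    mask = r['predict'].values == c
    # majority true label within this cluster
    mapped[mask] = np.bincount(data['labels'].values[mask]).argmax()

print(accuracy_score(data['labels'].values, mapped))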

 


review_data.csv

import pandas as pd
df = pd.read_csv('review_data.csv')
df

	score	review	y
0	1	예약할 때는 룸을 주기로 하고 홀을 주고, 덥고, 직원들이 정신이 없어 그 가격에 ...	0
1	5	점심식사 잘했던곳.후식커피한잔 하기도 좋고 주차가능합니다. 음식 맛있고 직원분 친절...	1
2	5	新鮮でおいしいです。	1
3	4	녹는다 녹아	1
4	4	NaN	1
...	...	...	...
75	2	이렇게 대기가 긴 맛집인줄 모르고 갔다가 엄청 기다림 예써라는 어플로 대기 하던데 ...	0
76	1	단짠의 정석. 진짜 정석으로 달고 짬. 질리는 맛. 사장님이랑 와이프로 추정되는 ...	0
77	4	만족스러움! 맛있어용	1
78	1	곱창은 없고 대창만 들어있어서 느끼한데 양념은 너무 매워서 위에 탈이나 고생했습니다ㅠㅠ	0
79	5	대창덮밥도 맛있고 곱도리탕도 맛나요 완전 소주각입니다. 자리가 쫍아서 테이블마다 ...	1
80 rows × 3 columns

 

import re
def text_cleaning(text) :
    # keep only Hangul characters (including single jamo) and spaces
    hangul = re.compile('[^ ㄱ-ㅣ가-힣]+')
    result = hangul.sub('', text)
    return result
text_cleaning("abc가나다123 라마사아 123")

'가나다 라마사아 '

 

df['ko_text'] = df['review'].apply(lambda x : text_cleaning(str(x))) # str() guards against NaN reviews
df['ko_text']

0     예약할 때는 룸을 주기로 하고 홀을 주고 덥고 직원들이 정신이 없어 그 가격에 내가...
1     점심식사 잘했던곳후식커피한잔 하기도 좋고 주차가능합니다 음식 맛있고 직원분 친절하여...
2                                                      
3                                                녹는다 녹아
4                                                      
                            ...                        
75    이렇게 대기가 긴 맛집인줄 모르고 갔다가 엄청 기다림 예써라는 어플로 대기 하던데 ...
76    단짠의 정석 진짜 정석으로 달고 짬 질리는 맛  사장님이랑 와이프로 추정되는 서빙해...
77                                           만족스러움 맛있어용
78    곱창은 없고 대창만 들어있어서 느끼한데 양념은 너무 매워서 위에 탈이나 고생했습니다ㅠㅠ 
79    대창덮밥도 맛있고 곱도리탕도 맛나요 완전 소주각입니다  자리가 쫍아서 테이블마다 가...
Name: ko_text, Length: 80, dtype: object

 

df['review'].head()

0    예약할 때는 룸을 주기로 하고 홀을 주고, 덥고, 직원들이 정신이 없어 그 가격에 ...
1    점심식사 잘했던곳.후식커피한잔 하기도 좋고 주차가능합니다. 음식 맛있고 직원분 친절...
2                                           新鮮でおいしいです。
3                                               녹는다 녹아
4                                                  NaN
Name: review, dtype: object

 

df1 = df.loc[df['ko_text'].apply(lambda x : len(x)) > 0]
df1.isnull().value_counts()

score  review  y      ko_text
False  False   False  False      65
dtype: int64
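df1 above selects the 65 reviews that still contain Hangul after cleaning, but the later cells keep working on the original df with all 80 rows, so the empty strings stay in the corpus. If you actually want to drop them, one option (my own addition):

# keep only reviews whose cleaned text is non-empty, and renumber the rows
df = df[df['ko_text'].str.len() > 0].reset_index(drop=True)
print(df.shape)   # expected (65, 4) with this data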

 

del df['review']
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 80 entries, 0 to 79
Data columns (total 3 columns):
 #   Column   Non-Null Count  Dtype 
---  ------   --------------  ----- 
 0   score    80 non-null     int64 
 1   y        80 non-null     int64 
 2   ko_text  80 non-null     object
dtypes: int64(2), object(1)
memory usage: 2.0+ KB

 

from konlpy.tag import Okt

# extract morphemes (POS-tagged tokens) from a text
def get_pos(x) :
    tagger = Okt()
    pos = tagger.pos(x)
    # word : morpheme extracted by konlpy
    # tag  : its part-of-speech label
    pos = ['{0}/{1}'.format(word, tag) for word, tag in pos]
    return pos

result = get_pos(df['ko_text'].values[0])
print(result)

['예약/Noun', '할/Verb', '때/Noun', '는/Josa', '룸/Noun', '을/Josa', '주기/Noun', '로/Josa', '하고/Verb', '홀/Noun', '을/Josa', '주고/Verb', '덥고/Adjective', '직원/Noun', '들/Suffix', '이/Josa', '정신/Noun', '이/Josa', '없어/Adjective', '그/Noun', '가격/Noun', '에/Josa', '내/Noun', '가/Josa', '직접/Noun', '구워/Verb', '먹고/Verb', '갈비살/Noun', '등심/Noun', '은/Josa', '질/Noun', '기고/Noun', '냉면/Noun', '은/Josa', '맛/Noun', '이/Josa', '없고/Adjective', '장어/Noun', '양념/Noun', '들/Suffix', '도/Josa', '제/Noun', '때/Noun', '안/Noun', '가져다/Verb', '주고/Verb', '회식/Noun', '으로/Josa', '한/Determiner', '시간/Noun', '만에/Josa', '만원/Noun', '을/Josa', '썼는데/Verb', '이런/Adjective', '경험/Noun', '처음/Noun', '입니다/Adjective']
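get_pos() builds a new Okt() tagger on every call, which is noticeably slow once it runs for every review inside the vectorizer below. A small variant that reuses one tagger (get_pos_fast is a name I made up, not from the original post):

tagger = Okt()   # build the tagger once

def get_pos_fast(x):
    # same output format as get_pos(): "word/POS" tokens
    return ['{0}/{1}'.format(word, tag) for word, tag in tagger.pos(x)]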

 

from sklearn.feature_extraction.text import CountVectorizer
# build a bag-of-words index over the corpus,
# tokenising each review into word/POS tokens with get_pos()
index_vectorizer = CountVectorizer(tokenizer = lambda x : get_pos(x))
x = index_vectorizer.fit_transform(df['ko_text'].tolist())
x.shape

# (80, 779)
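So every review becomes a row of token counts over a 779-token vocabulary. A toy illustration of what index_vectorizer builds, on two made-up sentences (not from the review data):

toy = ['맛있어요 또 갈게요', '너무 비싸요']
toy_vec = CountVectorizer(tokenizer=lambda s: get_pos(s))
m = toy_vec.fit_transform(toy)
print(toy_vec.vocabulary_)   # token -> column index
print(m.toarray())           # one row of counts per sentence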

 

from sklearn.feature_extraction.text import TfidfTransformer
# re-weight the morpheme counts with TF-IDF
tfidf_vectorizer =  TfidfTransformer()
x = tfidf_vectorizer.fit_transform(x)
print(x.shape)

# (80, 779)

 

# positive / negative review classification
# split the dataset into train / test
from sklearn.model_selection import train_test_split
y = df['y']
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size = 0.30)

 

from sklearn.linear_model import LogisticRegression
lr = LogisticRegression(random_state = 0)
lr.fit(x_train, y_train)
y_pred = lr.predict(x_test)

 

x_train.shape
# (56, 779)

len(lr.coef_[0])
# 779

 

import matplotlib.pyplot as plt
plt.rcParams['figure.figsize'] = [10, 8]
plt.bar(range(len(lr.coef_[0])), lr.coef_[0])

 

# sort lr.coef_[0] in descending order
# largest positive coefficients : tokens that push a review towards the positive class
sorted(((value, index) for index, value in enumerate(lr.coef_[0])), reverse=True)[:5]

[(0.31921037916122147, 269),
 (0.31181674718077157, 266),
 (0.31181674718077157, 2),
 (0.22722099938708767, 778),
 (0.22499528665817484, 719)]

 

# most negative coefficients : the 5 tokens that push hardest towards the negative class
sorted(((value, index) for index, value in enumerate(lr.coef_[0])), reverse=True)[-5:]

[(-0.3303223649310512, 736),
 (-0.35074686047120107, 374),
 (-0.35074686047120107, 80),
 (-0.3756982096297823, 538),
 (-0.3907079128326151, 147)]

 

coef_pos_index = sorted(((value, index) for index, value in enumerate(lr.coef_[0])), reverse=True)
invert_index_vectorizer = { v : k for k, v in index_vectorizer.vocabulary_.items()}
cnt = 0
for k, v in index_vectorizer.vocabulary_.items() :
    print(k, v)
    cnt += 1
    if cnt >= 10 :
        break
# index_vectorizer.vocabulary_ : token (word/POS) -> feature index
# invert_index_vectorizer      : feature index -> token (word/POS)

예약/Noun 504
할/Verb 743
때/Noun 224
는/Josa 162
룸/Noun 236
을/Josa 538
주기/Noun 631
로/Josa 235
하고/Verb 721
홀/Noun 769

 

# map each regression coefficient back to its morpheme via the inverted vocabulary
for coef in coef_pos_index[:20] :
    print(invert_index_vectorizer[coef[1]], coef[0]) # token and its weight
#   coef[1] is the feature index, coef[0] the coefficient

맛있어요/Adjective 0.31921037916122147
맛있댜/Noun 0.31181674718077157
ㅈㅁㅌㅌㄱㄹ/KoreanParticle 0.31181674718077157
흠/Noun 0.22722099938708767
하/Suffix 0.22499528665817484
비싸다으/Adjective 0.22048773641905486
맛잇으느/Noun 0.22048773641905486
녹아/Verb 0.22048773641905486
녹는다/Verb 0.22048773641905486
탕/Noun 0.2164660184839546
도리/Noun 0.2164660184839546
아이스크림/Noun 0.21489782468607826
후식/Noun 0.2005066237398065
매번/Noun 0.19501461266793108
맛있어용/Adjective 0.19425368225749348
만족스러/Adjective 0.19425368225749348
삼겹/Noun 0.19379228613545138
떡/Noun 0.19269205269234155
맛있네요/Adjective 0.19081949184719
닭갈비/Noun 0.1884669938903386

 

for coef in coef_pos_index[-20:] :
    print(invert_index_vectorizer[coef[1]], coef[0])
    
할말은/Verb -0.24837542600818435
않습니다/Verb -0.24837542600818435
많지만/Adjective -0.24837542600818435
그냥/Noun -0.2558545368223156
내/Noun -0.2666707017208134
먹기/Noun -0.2691943240304982
불친절해요/Adjective -0.282816560343047
해줌/Verb -0.2946023898159785
편하게/Adjective -0.2946023898159785
ㅜㅜ/KoreanParticle -0.29915590661697383
요/Josa -0.30167857441633256
평범함/Adjective -0.32572092898704597
무질/Noun -0.3292976179312531
너/Modifier -0.3292976179312531
겨/Noun -0.3292976179312531
하지/Verb -0.3303223649310512
비싸긴한데/Adjective -0.35074686047120107
괜찮아요/Adjective -0.35074686047120107
을/Josa -0.3756982096297823
너무/Adverb -0.3907079128326151

 

# nouns only: the most positive tokens (the other end of coef_pos_index holds the most negative)
noun_list=[]
for coef in coef_pos_index :
    category = invert_index_vectorizer[coef[1]].split("/")[1] # keep the POS tag after the '/'
    if category == 'Noun' :
        noun_list.append((invert_index_vectorizer[coef[1]], coef[0]))
noun_list[:10]

[('맛있댜/Noun', 0.31181674718077157),
 ('흠/Noun', 0.22722099938708767),
 ('맛잇으느/Noun', 0.22048773641905486),
 ('탕/Noun', 0.2164660184839546),
 ('도리/Noun', 0.2164660184839546),
 ('아이스크림/Noun', 0.21489782468607826),
 ('후식/Noun', 0.2005066237398065),
 ('매번/Noun', 0.19501461266793108),
 ('삼겹/Noun', 0.19379228613545138),
 ('떡/Noun', 0.19269205269234155)]

 

# adjectives only: the most positive tokens
adjective_list=[]
for coef in coef_pos_index :
    category = invert_index_vectorizer[coef[1]].split("/")[1] # keep the POS tag after the '/'
    if category == 'Adjective' :
        adjective_list.append((invert_index_vectorizer[coef[1]], coef[0]))
adjective_list[:10]      

[('맛있어요/Adjective', 0.31921037916122147),
 ('비싸다으/Adjective', 0.22048773641905486),
 ('맛있어용/Adjective', 0.19425368225749348),
 ('만족스러/Adjective', 0.19425368225749348),
 ('맛있네요/Adjective', 0.19081949184719),
 ('맛있고/Adjective', 0.16450683695840304),
 ('맛있게/Adjective', 0.16330050009866345),
 ('좋음/Adjective', 0.1431229621617376),
 ('정갈하게/Adjective', 0.1431229621617376),
 ('비싸지만/Adjective', 0.1431229621617376)]

 


Sentiment analysis

import pandas as pd
df = pd.read_csv('review_data.csv')
df

	score	review	y
0	1	예약할 때는 룸을 주기로 하고 홀을 주고, 덥고, 직원들이 정신이 없어 그 가격에 ...	0
1	5	점심식사 잘했던곳.후식커피한잔 하기도 좋고 주차가능합니다. 음식 맛있고 직원분 친절...	1
2	5	新鮮でおいしいです。	1
3	4	녹는다 녹아	1
4	4	NaN	1
...	...	...	...
75	2	이렇게 대기가 긴 맛집인줄 모르고 갔다가 엄청 기다림 예써라는 어플로 대기 하던데 ...	0
76	1	단짠의 정석. 진짜 정석으로 달고 짬. 질리는 맛. 사장님이랑 와이프로 추정되는 ...	0
77	4	만족스러움! 맛있어용	1
78	1	곱창은 없고 대창만 들어있어서 느끼한데 양념은 너무 매워서 위에 탈이나 고생했습니다ㅠㅠ	0
79	5	대창덮밥도 맛있고 곱도리탕도 맛나요 완전 소주각입니다. 자리가 쫍아서 테이블마다 ...	1
80 rows × 3 columns

 

import re
def text_cleaning(text) :
    hangul = re.compile('[^ ㄱ-ㅣ가-힣]+')
    result = hangul.sub('', text)
    return result
text_cleaning("abc가나다123 라마사아 123")

'가나다 라마사아 '

 

df['ko_text'] = df['review'].apply(lambda x : text_cleaning(str(x))) # str() guards against NaN reviews
df['ko_text']

0     예약할 때는 룸을 주기로 하고 홀을 주고 덥고 직원들이 정신이 없어 그 가격에 내가...
1     점심식사 잘했던곳후식커피한잔 하기도 좋고 주차가능합니다 음식 맛있고 직원분 친절하여...
2                                                      
3                                                녹는다 녹아
4                                                      
                            ...                        
75    이렇게 대기가 긴 맛집인줄 모르고 갔다가 엄청 기다림 예써라는 어플로 대기 하던데 ...
76    단짠의 정석 진짜 정석으로 달고 짬 질리는 맛  사장님이랑 와이프로 추정되는 서빙해...
77                                           만족스러움 맛있어용
78    곱창은 없고 대창만 들어있어서 느끼한데 양념은 너무 매워서 위에 탈이나 고생했습니다ㅠㅠ 
79    대창덮밥도 맛있고 곱도리탕도 맛나요 완전 소주각입니다  자리가 쫍아서 테이블마다 가...
Name: ko_text, Length: 80, dtype: object

 

df['review'].head()

0    예약할 때는 룸을 주기로 하고 홀을 주고, 덥고, 직원들이 정신이 없어 그 가격에 ...
1    점심식사 잘했던곳.후식커피한잔 하기도 좋고 주차가능합니다. 음식 맛있고 직원분 친절...
2                                           新鮮でおいしいです。
3                                               녹는다 녹아
4                                                  NaN
Name: review, dtype: object

 

df1 = df.loc[df['ko_text'].apply(lambda x : len(x)) > 0]
df1.isnull().value_counts()

score  review  y      ko_text
False  False   False  False      65
dtype: int64

 

del df['review']
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 80 entries, 0 to 79
Data columns (total 3 columns):
 #   Column   Non-Null Count  Dtype 
---  ------   --------------  ----- 
 0   score    80 non-null     int64 
 1   y        80 non-null     int64 
 2   ko_text  80 non-null     object
dtypes: int64(2), object(1)
memory usage: 2.0+ KB

 

from konlpy.tag import Okt

 

# extract morphemes (POS-tagged tokens) from a text
def get_pos(x) :
    tagger = Okt()
    pos = tagger.pos(x)
    # word : morpheme extracted by konlpy
    # tag  : its part-of-speech label
    pos = ['{0}/{1}'.format(word, tag) for word, tag in pos]
    return pos

result = get_pos(df['ko_text'].values[0])
print(result)

['예약/Noun', '할/Verb', '때/Noun', '는/Josa', '룸/Noun', '을/Josa', '주기/Noun', '로/Josa', '하고/Verb', '홀/Noun', '을/Josa', '주고/Verb', '덥고/Adjective', '직원/Noun', '들/Suffix', '이/Josa', '정신/Noun', '이/Josa', '없어/Adjective', '그/Noun', '가격/Noun', '에/Josa', '내/Noun', '가/Josa', '직접/Noun', '구워/Verb', '먹고/Verb', '갈비살/Noun', '등심/Noun', '은/Josa', '질/Noun', '기고/Noun', '냉면/Noun', '은/Josa', '맛/Noun', '이/Josa', '없고/Adjective', '장어/Noun', '양념/Noun', '들/Suffix', '도/Josa', '제/Noun', '때/Noun', '안/Noun', '가져다/Verb', '주고/Verb', '회식/Noun', '으로/Josa', '한/Determiner', '시간/Noun', '만에/Josa', '만원/Noun', '을/Josa', '썼는데/Verb', '이런/Adjective', '경험/Noun', '처음/Noun', '입니다/Adjective']

 

from sklearn.feature_extraction.text import CountVectorizer
# build a bag-of-words index over the corpus,
# tokenising each review into word/POS tokens with get_pos()
index_vectorizer = CountVectorizer(tokenizer = lambda x : get_pos(x))
x = index_vectorizer.fit_transform(df['ko_text'].tolist())
x.shape

# (80, 779)

 

for a in x[:10] :     # each a is one review's sparse count vector (each prints with row index 0)
    print(a)
    
(0, 504)	1
  (0, 743)	1
  (0, 224)	2
  (0, 162)	1
  (0, 236)	1
  (0, 538)	3
  (0, 631)	1
  (0, 235)	1
  (0, 721)	1
  (0, 769)	1
  ...

 

print(str(index_vectorizer.vocabulary_)[:60]+"..")

{'예약/Noun': 504, '할/Verb': 743, '때/Noun': 224, '는/Josa': 162..

# TF-IDF transformation

# TF  : term frequency. If "맛집" appears 3 times in one review, its TF there is 3.
# IDF : inverse document frequency. The more documents a token appears in,
#       the smaller its IDF (e.g. a token found in 10 documents gets roughly 1/10).
# TF-IDF : a token that is rare across the corpus but frequent in the current document
#          gets a high weight, i.e. it is treated as characteristic of that document.
# (a small toy example follows below)
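A toy numeric example of the idea, using a made-up three-review corpus: a token that appears in every review (맛집) gets a low IDF and is down-weighted, while a token unique to one review (주차) keeps a relatively high weight there.

from sklearn.feature_extraction.text import CountVectorizer, TfidfTransformer

toy = ['맛집 맛집 좋아요', '맛집 별로', '맛집 주차 편해요']
cv = CountVectorizer()
counts = cv.fit_transform(toy)                        # raw term counts
weights = TfidfTransformer().fit_transform(counts)    # TF-IDF re-weighting
print(cv.vocabulary_)                                 # token -> column index
print(weights.toarray().round(3))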

 

from sklearn.feature_extraction.text import TfidfTransformer
tfidf_vectorizer =  TfidfTransformer()
x = tfidf_vectorizer.fit_transform(x)
print(x.shape)
print(x[0])

(80, 779)
  (0, 772)	0.13918867813287145
  (0, 769)	0.13918867813287145
  (0, 743)	0.13918867813287145
  (0, 738)	0.12718431152605908
  (0, 721)	0.12718431152605908
  (0, 672)	0.10666271126619092
  (0, 653)	0.12718431152605908
  (0, 651)	0.12718431152605908
  (0, 650)	0.09465834465937853
  (0, 631)	0.13918867813287145
  (0, 629)	0.2783773562657429
  (0, 610)	0.12718431152605908
  (0, 609)	0.13918867813287145
  (0, 588)	0.13918867813287145
  (0, 573)	0.11206059819730396
  (0, 551)	0.11866707787300333
  (0, 546)	0.22748699966260583
  (0, 538)	0.31998813379857277
  (0, 537)	0.17228222201264556
  (0, 536)	0.09814547761313519
  (0, 504)	0.12718431152605908
  (0, 491)	0.07253600802895468
  (0, 485)	0.12718431152605908
  (0, 481)	0.11866707787300333
  (0, 468)	0.11866707787300333
  (0, 453)	0.11206059819730396
  (0, 439)	0.13918867813287145
  (0, 417)	0.11866707787300333
  (0, 281)	0.11206059819730396
  (0, 258)	0.07762387735326703
  (0, 251)	0.11866707787300333
  (0, 250)	0.13918867813287145
  (0, 236)	0.13918867813287145
  (0, 235)	0.10209886289470016
  (0, 224)	0.19629095522627038
  (0, 222)	0.12718431152605908
  (0, 210)	0.21332542253238185
  (0, 192)	0.07101739767756768
  (0, 189)	0.13918867813287145
  (0, 162)	0.08377133371130897
  (0, 145)	0.13918867813287145
  (0, 141)	0.12718431152605908
  (0, 107)	0.12718431152605908
  (0, 97)	0.11206059819730396
  (0, 87)	0.12718431152605908
  (0, 61)	0.13918867813287145
  (0, 34)	0.13918867813287145
  (0, 29)	0.13918867813287145
  (0, 18)	0.11866707787300333
  (0, 13)	0.08377133371130897

 

# positive / negative review classification
# split the dataset into train / test
from sklearn.model_selection import train_test_split
y = df['y']
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size = 0.3)

x_train.shape
# (56, 779)

 

from sklearn.linear_model import LogisticRegression
lr = LogisticRegression(random_state = 0)
lr.fit(x_train, y_train)
y_pred = lr.predict(x_test)

 

from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score
print("accuracy :%.2f" %accuracy_score(y_test, y_pred)) # (TP+TN) / (TP+TN+FP+FN)
print("precision_score :%.2f" %precision_score(y_test, y_pred))
print("recall_score :%.2f" %recall_score(y_test, y_pred))
print("f1_score :%.2f" %f1_score(y_test, y_pred))

accuracy :0.58
precision_score :0.57
recall_score :1.00
f1_score :0.72

# accuracy = (TP+TN) / (TP+TN+FP+FN)
# caution: with imbalanced data, predicting everything as the majority class can already score ~90%

# precision = TP / (TP+FP)
# of the samples predicted positive, how many are actually positive?

# recall = TP / (TP+FN)
# of the actual positives, how many did the model catch?

# f1 = the harmonic mean of precision and recall

 

from sklearn.metrics import confusion_matrix
confmat = confusion_matrix(y_test, y_pred)
print(confmat)

[[ 1 10]
 [ 0 13]]

# layout of the confusion matrix above (sklearn convention):
#              predicted 0   predicted 1
# actual 0   [     TN      ,     FP     ]
# actual 1   [     FN      ,     TP     ]

# worked example with generic counts (not the matrix above): TP=54, TN=8, FP=31, FN=1
# accuracy  = (54+8) / (54+8+31+1) = 62/94 ≈ 0.660
# precision = 54 / (54+31)         = 54/85 ≈ 0.635
# recall    = 54 / (54+1)          = 54/55 ≈ 0.982
# F1        = 2*(0.635*0.982) / (0.635+0.982) ≈ 0.771

# specificity : of the actual negatives, the share predicted negative
# specificity = TN / (TN+FP)
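The same numbers can be pulled straight out of the sklearn confusion matrix, which is laid out as [[TN, FP], [FN, TP]]; a small sketch using y_test and y_pred from above:

from sklearn.metrics import confusion_matrix

tn, fp, fn, tp = confusion_matrix(y_test, y_pred).ravel()
accuracy    = (tp + tn) / (tp + tn + fp + fn)
precision   = tp / (tp + fp)
recall      = tp / (tp + fn)
specificity = tn / (tn + fp)
f1          = 2 * precision * recall / (precision + recall)
print(accuracy, precision, recall, specificity, f1)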

 

# ROC curve
# y-axis : TPR (true positive rate), the share of actual positives correctly flagged
# x-axis : FPR (false positive rate), the share of actual negatives wrongly flagged (1 - specificity)
# AUC (area under the curve) : the closer to 1, the better the classifier separates the classes

 

# roc
from sklearn.metrics import roc_curve, roc_auc_score
import matplotlib.pyplot as plt
# y_pred : hard 0/1 predictions
# y_pred_probability : predicted probability of the positive class (needed for the ROC curve)
y_pred_probability = lr.predict_proba(x_test)[:,1]
false_positive_rate, true_positive_rate, thresholds = \
            roc_curve(y_test, y_pred_probability)
roc_auc = roc_auc_score(y_test, y_pred_probability)
print('AUC : %.3f' % roc_auc)
plt.rcParams['figure.figsize'] = [5, 4]
plt.plot(false_positive_rate, true_positive_rate, \
         label='ROC  Curve(area = %0.3f)' % roc_auc, 
         color = 'red', linewidth=4.0)
plt.plot([0,1], [0,1], 'k--')
plt.xlim([0.0, 1.0])
plt.ylim([0.0, 1.0])
plt.xlabel('False Positive Rate')
plt.ylabel('True Positive Rate')
plt.title('ROC curve of Logistic regression')
plt.legend(loc='lower right')

 

 

 


Map crawling

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

from selenium import webdriver
from bs4 import BeautifulSoup
import re
import time

path = "C:/R/chromedriver"
source_url = "https://map.kakao.com/"
driver = webdriver.Chrome(path)
driver.get(source_url) 
# search box (find_element_by_xpath is the Selenium 3 API; Selenium 4 uses find_element(By.XPATH, ...))
searchbox = driver.find_element_by_xpath("//*[@id='search.keyword.query']") 
# //* : match any element anywhere in the document, [@...] : filter by attribute value
searchbox.send_keys("강남역 고기집")
searchbutton = driver.find_element_by_xpath("//*[@id='search.keyword.submit']")

driver.execute_script("arguments[0].click();", searchbutton)
time.sleep(1)

html = driver.page_source

 

soup = BeautifulSoup(html, "html.parser")
# collect each place's "moreview" link (its detail page URL)
moreviews = soup.find_all(name = "a", attrs = {"class":"moreview"})
page_urls = []
for moreview in moreviews :
    page_url = moreview.get("href")
    print(page_url)
    page_urls.append(page_url)
driver.close()


https://place.map.kakao.com/85570955
https://place.map.kakao.com/1503746075
https://place.map.kakao.com/95713992
https://place.map.kakao.com/741391811
https://place.map.kakao.com/2011092566
https://place.map.kakao.com/13573220
https://place.map.kakao.com/2062959414
https://place.map.kakao.com/1648266796
https://place.map.kakao.com/168079537
https://place.map.kakao.com/263830255
https://place.map.kakao.com/27238067
https://place.map.kakao.com/26431943
https://place.map.kakao.com/1780387311
https://place.map.kakao.com/1907052666
https://place.map.kakao.com/1052874675
https://place.map.kakao.com/1576421052

 

# for p in page_urls :
#     print(p)
columns = ['score','review']
df = pd.DataFrame(columns = columns)
driver = webdriver.Chrome(path)
for page in page_urls :
    driver.get(page)
    time.sleep(1.5)
    html = driver.page_source
    soup = BeautifulSoup(html, "html.parser")
    # review block of the place detail page
    contents_div = soup.find(name = "div", attrs={"class":"evaluation_review"})
    # star ratings
    rates = contents_div.find_all(name="em", attrs={"class":"num_rate"})
    # review texts
    reviews = contents_div.find_all(name = "p", attrs={"class":"txt_comment"})
    print(rates)   # rates is a bs4 ResultSet, so print the list itself
    for rate, review in zip(rates, reviews) :
        row = [rate.text[0], review.find(name="span").text]
        series = pd.Series(row, index=df.columns)
        df = df.append(series, ignore_index=True)   # deprecated in newer pandas; see the sketch below
        
    # walk through review pages 2-5, if present
    for button_num in range(2, 6) :
        try :
            another_reviews = driver.find_element_by_xpath\
                ("//a[@data-page='"+str(button_num)+"']")
            another_reviews.click()
            time.sleep(1.5)
            html = driver.page_source
            soup = BeautifulSoup(html, 'html.parser')
            
            # the class name must be spelled exactly "evaluation_review";
            # otherwise contents_div is None and the except below ends paging early
            contents_div = soup.find\
                (name="div", attrs={"class":"evaluation_review"})
            rates = contents_div.find_all\
                (name = "em", attrs = {"class":"num_rate"})
            reviews = contents_div.find_all\
                (name = "p", attrs = {"class":"txt_comment"})
            
            for rate, review in zip(rates, reviews) :
                row = [rate.text[0], review.find(name="span").text]
                series = pd.Series(row, index=df.columns)
                df = df.append(series, ignore_index=True)
        except :
            break
driver.close()



[<em class="num_rate">1<span class="screen_out">점</span></em>, <em class="num_rate">5<span class="screen_out">점</span></em>, <em class="num_rate">5<span class="screen_out">점</span></em>, <em class="num_rate">4<span class="screen_out">점</span></em>, <em class="num_rate">4<span class="screen_out">점</span></em>]
[<p class="txt_comment "><span>예약할 때는 룸을 주기로 하고 홀을 주고, 덥고, 직원들이 정신이 없어 그 가격에 내가 직접 구워먹고 갈비살, 등심은 질기고 냉면은 맛이 없고 장어 양념들도 제 때 안 가져다 주고 회식으로 한시간만에 120만원을 썼는데 이런 경험 처음입니다.</span><button class="btn_fold" type="button">더보기</button></p>, <p class="txt_comment "><span>점심식사 잘했던곳.후식커피한잔 하기도 좋고 주차가능합니다. 음식 맛있고 직원분 친절하여 절로 미소가 지어졌어요. </span><button class="btn_fold" type="button">더보기</button></p>, <p class="txt_comment "><span>新鮮でおいしいです。</span><button class="btn_fold" type="button">더보기</button></p>, <p class="txt_comment "><span>녹는다 녹아</span><button class="btn_fold" type="button">더보기</button></p>, <p class="txt_comment "><span></span><button class="btn_fold" type="button">더보기</button></p>]
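df.append() in the loop above is deprecated and removed in pandas 2.0. A sketch of the same row collection that gathers plain dicts and builds the DataFrame once at the end (same column names as above):

rows = []
for rate, review in zip(rates, reviews):
    rows.append({'score': rate.text[0], 'review': review.find(name='span').text})
df = pd.DataFrame(rows, columns=['score', 'review'])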

 

df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 80 entries, 0 to 79
Data columns (total 2 columns):
 #   Column  Non-Null Count  Dtype 
---  ------  --------------  ----- 
 0   score   80 non-null     object
 1   review  80 non-null     object
dtypes: object(2)
memory usage: 1.4+ KB

 

df.head()

	score	review
0	1	예약할 때는 룸을 주기로 하고 홀을 주고, 덥고, 직원들이 정신이 없어 그 가격에 ...
1	5	점심식사 잘했던곳.후식커피한잔 하기도 좋고 주차가능합니다. 음식 맛있고 직원분 친절...
2	5	新鮮でおいしいです。
3	4	녹는다 녹아
4	4

 

# positive / negative labelling: score above 3 -> 1 (positive), otherwise 0 (negative)
df['y'] = df['score'].apply(lambda x : 1 if float(x) > 3 else 0)
df

	score	review	y
0	1	예약할 때는 룸을 주기로 하고 홀을 주고, 덥고, 직원들이 정신이 없어 그 가격에 ...	0
1	5	점심식사 잘했던곳.후식커피한잔 하기도 좋고 주차가능합니다. 음식 맛있고 직원분 친절...	1
2	5	新鮮でおいしいです。	1
3	4	녹는다 녹아	1
4	4		1
...	...	...	...
75	2	이렇게 대기가 긴 맛집인줄 모르고 갔다가 엄청 기다림 예써라는 어플로 대기 하던데 ...	0
76	1	단짠의 정석. 진짜 정석으로 달고 짬. 질리는 맛. 사장님이랑 와이프로 추정되는 ...	0
77	4	만족스러움! 맛있어용	1
78	1	곱창은 없고 대창만 들어있어서 느끼한데 양념은 너무 매워서 위에 탈이나 고생했습니다ㅠㅠ	0
79	5	대창덮밥도 맛있고 곱도리탕도 맛나요 완전 소주각입니다. 자리가 쫍아서 테이블마다 ...	1
80 rows × 3 columns

 

df.y.value_counts()

1    44
0    36
Name: y, dtype: int64

 

df.to_csv('review_data.csv', index=False)

review_data.csv

감성분석

import pandas as pd
df = pd.read_csv('review_data.csv')
df

	score	review	y
0	1	예약할 때는 룸을 주기로 하고 홀을 주고, 덥고, 직원들이 정신이 없어 그 가격에 ...	0
1	5	점심식사 잘했던곳.후식커피한잔 하기도 좋고 주차가능합니다. 음식 맛있고 직원분 친절...	1
2	5	新鮮でおいしいです。	1
3	4	녹는다 녹아	1
4	4	NaN	1
...	...	...	...
75	2	이렇게 대기가 긴 맛집인줄 모르고 갔다가 엄청 기다림 예써라는 어플로 대기 하던데 ...	0
76	1	단짠의 정석. 진짜 정석으로 달고 짬. 질리는 맛. 사장님이랑 와이프로 추정되는 ...	0
77	4	만족스러움! 맛있어용	1
78	1	곱창은 없고 대창만 들어있어서 느끼한데 양념은 너무 매워서 위에 탈이나 고생했습니다ㅠㅠ	0
79	5	대창덮밥도 맛있고 곱도리탕도 맛나요 완전 소주각입니다. 자리가 쫍아서 테이블마다 ...	1
80 rows × 3 columns

 

import re
def text_cleaning(text) :
    hangul = re.compile('[^ ㄱ-ㅣ가-힣]+')
    result = hangul.sub('', text)
    return result
text_cleaning("abc가나다123 라마사아 123")

'가나다 라마사아 '

 

df['ko_text'] = df['review'].apply(lambda x : text_cleaning(str(x))) # null 값
df['ko_text']

0     예약할 때는 룸을 주기로 하고 홀을 주고 덥고 직원들이 정신이 없어 그 가격에 내가...
1     점심식사 잘했던곳후식커피한잔 하기도 좋고 주차가능합니다 음식 맛있고 직원분 친절하여...
2                                                      
3                                                녹는다 녹아
4                                                      
                            ...                        
75    이렇게 대기가 긴 맛집인줄 모르고 갔다가 엄청 기다림 예써라는 어플로 대기 하던데 ...
76    단짠의 정석 진짜 정석으로 달고 짬 질리는 맛  사장님이랑 와이프로 추정되는 서빙해...
77                                           만족스러움 맛있어용
78    곱창은 없고 대창만 들어있어서 느끼한데 양념은 너무 매워서 위에 탈이나 고생했습니다ㅠㅠ 
79    대창덮밥도 맛있고 곱도리탕도 맛나요 완전 소주각입니다  자리가 쫍아서 테이블마다 가...
Name: ko_text, Length: 80, dtype: object

 

df['review'].head()

0    예약할 때는 룸을 주기로 하고 홀을 주고, 덥고, 직원들이 정신이 없어 그 가격에 ...
1    점심식사 잘했던곳.후식커피한잔 하기도 좋고 주차가능합니다. 음식 맛있고 직원분 친절...
2                                           新鮮でおいしいです。
3                                               녹는다 녹아
4                                                  NaN
Name: review, dtype: object

 

df1 = df.loc[df['ko_text'].apply(lambda x : len(x)) > 0]
df1.isnull().value_counts()

score  review  y      ko_text
False  False   False  False      65
dtype: int64

 

del df['review']
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 80 entries, 0 to 79
Data columns (total 3 columns):
 #   Column   Non-Null Count  Dtype 
---  ------   --------------  ----- 
 0   score    80 non-null     int64 
 1   y        80 non-null     int64 
 2   ko_text  80 non-null     object
dtypes: int64(2), object(1)
memory usage: 2.0+ KB

 

from konlpy.tag import Okt

 

# 텍스트 데이터 형태소 추출
def get_pos(x) :
    tagger = Okt()
    pos = tagger.pos(x)
    # word : konlpy 모듈 형태소 분석단어
    # tag : 형태소 분석된 품사
    pos = ['{0}/{1}'.format(word, tag) for word, tag in pos]
    return pos

result = get_pos(df['ko_text'].values[0])
print(result)

['예약/Noun', '할/Verb', '때/Noun', '는/Josa', '룸/Noun', '을/Josa', '주기/Noun', '로/Josa', '하고/Verb', '홀/Noun', '을/Josa', '주고/Verb', '덥고/Adjective', '직원/Noun', '들/Suffix', '이/Josa', '정신/Noun', '이/Josa', '없어/Adjective', '그/Noun', '가격/Noun', '에/Josa', '내/Noun', '가/Josa', '직접/Noun', '구워/Verb', '먹고/Verb', '갈비살/Noun', '등심/Noun', '은/Josa', '질/Noun', '기고/Noun', '냉면/Noun', '은/Josa', '맛/Noun', '이/Josa', '없고/Adjective', '장어/Noun', '양념/Noun', '들/Suffix', '도/Josa', '제/Noun', '때/Noun', '안/Noun', '가져다/Verb', '주고/Verb', '회식/Noun', '으로/Josa', '한/Determiner', '시간/Noun', '만에/Josa', '만원/Noun', '을/Josa', '썼는데/Verb', '이런/Adjective', '경험/Noun', '처음/Noun', '입니다/Adjective']

 

from sklearn.feature_extraction.text import CountVectorizer
                     #글뭉치(corpus) 인덱스로 생성
index_vectorizer = CountVectorizer(tokenizer = lambda x : get_pos(x))
# 
 # 형태소분석하고 단어품사로 분리
x = index_vectorizer.fit_transform(df['ko_text'].tolist())
x.shape

# (80, 779)

 

for a in x[:10] :
    print(a)
    
(0, 504)	1
  (0, 743)	1
  (0, 224)	2
  (0, 162)	1
  (0, 236)	1
  (0, 538)	3
  (0, 631)	1
  (0, 235)	1
  (0, 721)	1
  (0, 769)	1
  (0, 629)	2
  (0, 189)	1
  (0, 650)	1
  (0, 210)	2
  (0, 546)	3
  (0, 609)	1
  (0, 485)	1
  (0, 97)	1
  (0, 18)	1
  (0, 491)	1
  (0, 141)	1
  (0, 13)	1
  (0, 651)	1
  (0, 87)	1
  (0, 281)	1
  (0, 34)	1
  (0, 222)	1
  (0, 537)	2
  (0, 653)	1
  (0, 107)	1
  (0, 145)	1
  (0, 258)	1
  (0, 481)	1
  (0, 588)	1
  (0, 468)	1
  (0, 192)	1
  (0, 610)	1
  (0, 453)	1
  (0, 29)	1
  (0, 772)	1
  (0, 536)	1
  (0, 738)	1
  (0, 417)	1
  (0, 250)	1
  (0, 251)	1
  (0, 439)	1
  (0, 551)	1
  (0, 61)	1
  (0, 672)	1
  (0, 573)	1
  (0, 650)	1
  (0, 13)	1
  (0, 604)	1
  (0, 585)	1
  (0, 761)	1
  (0, 79)	1
  (0, 776)	1
  (0, 691)	1
  (0, 723)	1
  (0, 618)	1
  (0, 635)	1
  (0, 22)	1
  (0, 540)	1
  (0, 261)	1
  (0, 363)	1
  (0, 689)	1
  (0, 600)	1
  (0, 321)	1
  (0, 648)	1

  (0, 154)	1
  (0, 155)	1


  (0, 162)	1
  (0, 546)	1
  (0, 491)	1
  (0, 192)	1
  (0, 251)	1
  (0, 672)	1
  (0, 529)	1
  (0, 238)	1
  (0, 451)	1
  (0, 454)	2
  (0, 443)	1
  (0, 444)	1
  (0, 506)	1
  (0, 247)	1
  (0, 516)	1
  (0, 397)	1
  (0, 49)	1
  (0, 129)	1
  (0, 641)	1
  (0, 333)	1
  (0, 66)	1
  (0, 511)	1
  (0, 116)	1
  (0, 54)	2
  (0, 318)	1
  (0, 643)	1
  (0, 509)	1
  (0, 460)	1
  (0, 547)	1
  (0, 58)	1
  (0, 409)	1
  (0, 569)	1
  (0, 71)	1
  (0, 446)	1
  (0, 301)	1
  (0, 265)	1
  (0, 649)	1
  (0, 191)	1
  (0, 168)	1
  (0, 510)	1
  (0, 48)	1
  (0, 660)	1
  (0, 389)	1
  (0, 657)	1
  (0, 186)	1
  (0, 132)	1
  (0, 538)	2
  (0, 454)	1
  (0, 66)	1
  (0, 450)	1
  (0, 300)	1
  (0, 246)	1
  (0, 527)	1
  (0, 477)	1
  (0, 237)	1
  (0, 285)	2
  (0, 62)	1
  (0, 88)	1
  (0, 337)	1
  (0, 159)	1
  (0, 314)	1
  (0, 352)	1
  (0, 652)	1
  (0, 373)	1
  (0, 437)	1
  (0, 142)	1
  (0, 663)	1
  (0, 637)	1
  (0, 382)	1
  (0, 504)	1
  (0, 546)	2
  (0, 491)	2
  (0, 536)	1
  (0, 49)	1
  (0, 531)	1
  (0, 178)	1
  (0, 599)	1
  (0, 326)	1
  (0, 628)	1
  (0, 297)	1
  (0, 577)	1
  (0, 68)	1
  (0, 457)	1
  (0, 483)	1
  (0, 746)	1
  (0, 669)	1
  (0, 597)	1
  (0, 690)	1
  (0, 494)	1
  (0, 463)	1
  (0, 632)	1
  (0, 239)	1
  (0, 165)	1
  (0, 695)	1
  (0, 213)	1
  (0, 367)	1
  (0, 296)	1
  (0, 298)	1
  (0, 475)	1
  (0, 727)	1
  (0, 713)	1
  (0, 399)	1
  (0, 702)	1
  (0, 412)	1
  (0, 182)	1
  (0, 567)	1
  (0, 255)	1
  (0, 358)	1
  (0, 346)	1
  (0, 18)	1
  (0, 13)	1
  (0, 573)	1
  (0, 397)	1
  (0, 129)	1
  (0, 182)	1
  (0, 428)	1
  (0, 774)	1
  (0, 542)	1
  (0, 147)	1
  (0, 339)	1

 

print(str(index_vectorizer.vocabulary_)[:60]+"..")

{'예약/Noun': 504, '할/Verb': 743, '때/Noun': 224, '는/Josa': 162..

# TF-IDF 변환

# TF : 1개 텍스트에 맛집 3번 있으면 3
# IDF : INVERSE역산 DF
#       모든 데이터에서 맛집단어가 10번이 존재, 0.1값
# TF - IDF 전체문서에서 나타나지 않지만 현재문서에서 많이 나타나면
#             그 단어가 현재문서에서 중요한 단어로 판단

 

from sklearn.feature_extraction.text import TfidfTransformer
tfidf_vectorizer =  TfidfTransformer()
x = tfidf_vectorizer.fit_transform(x)
print(x.shape)
print(x[0])

(80, 779)
  (0, 772)	0.13918867813287145
  (0, 769)	0.13918867813287145
  (0, 743)	0.13918867813287145
  (0, 738)	0.12718431152605908
  (0, 721)	0.12718431152605908
  (0, 672)	0.10666271126619092
  (0, 653)	0.12718431152605908
  (0, 651)	0.12718431152605908
  (0, 650)	0.09465834465937853
  (0, 631)	0.13918867813287145
  (0, 629)	0.2783773562657429
  (0, 610)	0.12718431152605908
  (0, 609)	0.13918867813287145
  (0, 588)	0.13918867813287145
  (0, 573)	0.11206059819730396
  (0, 551)	0.11866707787300333
  (0, 546)	0.22748699966260583
  (0, 538)	0.31998813379857277
  (0, 537)	0.17228222201264556
  (0, 536)	0.09814547761313519
  (0, 504)	0.12718431152605908
  (0, 491)	0.07253600802895468
  (0, 485)	0.12718431152605908
  (0, 481)	0.11866707787300333
  (0, 468)	0.11866707787300333
  (0, 453)	0.11206059819730396
  (0, 439)	0.13918867813287145
  (0, 417)	0.11866707787300333
  (0, 281)	0.11206059819730396
  (0, 258)	0.07762387735326703
  (0, 251)	0.11866707787300333
  (0, 250)	0.13918867813287145
  (0, 236)	0.13918867813287145
  (0, 235)	0.10209886289470016
  (0, 224)	0.19629095522627038
  (0, 222)	0.12718431152605908
  (0, 210)	0.21332542253238185
  (0, 192)	0.07101739767756768
  (0, 189)	0.13918867813287145
  (0, 162)	0.08377133371130897
  (0, 145)	0.13918867813287145
  (0, 141)	0.12718431152605908
  (0, 107)	0.12718431152605908
  (0, 97)	0.11206059819730396
  (0, 87)	0.12718431152605908
  (0, 61)	0.13918867813287145
  (0, 34)	0.13918867813287145
  (0, 29)	0.13918867813287145
  (0, 18)	0.11866707787300333
  (0, 13)	0.08377133371130897

 

# 긍부정 리뷰분류
# 데이터셋 분리
from sklearn.model_selection import train_test_split
y = df['y']
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size = 0.3)

x_train.shape
# (56, 779)

 

from sklearn.linear_model import LogisticRegression
lr = LogisticRegression(random_state = 0)
lr.fit(x_train, y_train)
y_pred = lr.predict(x_test)

 

from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score
print("accuracy :%.2f" %accuracy_score(y_test, y_pred)) # (TP+TN) / TP+TN+FP+FN
print("precision_score :%.2f" %precision_score(y_test, y_pred))
print("recall_score :%.2f" %recall_score(y_test, y_pred))
print("f1_score :%.2f" %f1_score(y_test, y_pred))

accuracy :0.58
precision_score :0.57
recall_score :1.00
f1_score :0.72

# accuracy = (TP+TN) / (TP+TN+FP+FN)
# print("accuracy :%.2f" %accuracy_score(y_test, y_pred)) 
=> accuracy alone can mislead: with imbalanced classes, predicting everything as TRUE can already look like ~90%.

# precision = TP / (TP+FP)
# print("precision_score :%.2f" %precision_score(y_test, y_pred))
=> how precise the positive predictions are: of everything predicted TRUE, how much is actually TRUE.

# recall = TP / (TP+FN)
# print("recall_score :%.2f" %recall_score(y_test, y_pred))
=> of everything that is actually TRUE, how much the model catches.

# f1 = 2 * (precision * recall) / (precision + recall)
# print("f1_score :%.2f" %f1_score(y_test, y_pred))

 

from sklearn.metrics import confusion_matrix
confmat = confusion_matrix(y_test, y_pred)
print(confmat)

[[ 1 10]
 [ 0 13]]
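To sanity-check the scores above, the same four metrics can be recomputed directly from this confusion matrix (scikit-learn orders it as rows = actual 0/1, columns = predicted 0/1):

tn, fp, fn, tp = confmat.ravel()                               # here: tn=1, fp=10, fn=0, tp=13
print('accuracy  : %.2f' % ((tp + tn) / (tp + tn + fp + fn)))  # (1+13)/24 = 0.58
print('precision : %.2f' % (tp / (tp + fp)))                   # 13/23     = 0.57
print('recall    : %.2f' % (tp / (tp + fn)))                   # 13/13     = 1.00
print('f1        : %.2f' % (2 * tp / (2 * tp + fp + fn)))      # 26/36     = 0.72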

Layout of the confusion matrix (rows = actual class, columns = predicted class):

          pred 0   pred 1
actual 0  [ TN ]   [ FP ]
actual 1  [ FN ]   [ TP ]

TP : actually positive, predicted positive (true positive)
TN : actually negative, predicted negative (true negative)
FP : actually negative, predicted positive (false positive)
FN : actually positive, predicted negative (false negative)

Worked example with TP = 54, TN = 8, FP = 31, FN = 1:
accuracy  = (54 + 8) / (54 + 8 + 31 + 1) = 62 / 94 ≈ 0.659
precision = 54 / (54 + 31) = 54 / 85 ≈ 0.635
recall    = 54 / (54 + 1)  = 54 / 55 ≈ 0.982
F1-score  = 2 * (0.635 * 0.982) / (0.635 + 0.982) ≈ 0.771
    
# specificity : among the actually-negative cases, the proportion the model correctly predicts as negative
# specificity = TN / (TN + FP)
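Applied to the confusion matrix printed above (a quick check; confmat from the earlier cell is assumed to still be in scope):

tn, fp, fn, tp = confmat.ravel()
print('specificity : %.2f' % (tn / (tn + fp)))   # 1 / (1 + 10) = 0.09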

 

ROC curve
y axis : TPR (true positive rate), the proportion of actual positives predicted as positive
x axis : FPR (false positive rate), the proportion of actual negatives predicted as positive = 1 - specificity
AUC (area under the curve) : area under the ROC curve; the closer to 1, the better the classifier
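A rough sketch of what roc_curve computes: sweep a decision threshold over the predicted probabilities and record one (FPR, TPR) point per threshold. This is only illustrative; the scikit-learn call in the next cell is what is actually used, and y_pred_probability is defined there.

import numpy as np

def manual_roc_points(y_true, y_score, thresholds):
    y_true = np.asarray(y_true)
    points = []
    for t in thresholds:
        y_hat = (y_score >= t).astype(int)               # predict positive above the threshold
        tp = np.sum((y_true == 1) & (y_hat == 1))
        fn = np.sum((y_true == 1) & (y_hat == 0))
        fp = np.sum((y_true == 0) & (y_hat == 1))
        tn = np.sum((y_true == 0) & (y_hat == 0))
        points.append((fp / (fp + tn), tp / (tp + fn)))  # one (FPR, TPR) point per threshold
    return points

# e.g. manual_roc_points(y_test, y_pred_probability, [0.2, 0.5, 0.8])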

 

# roc
from sklearn.metrics import roc_curve, roc_auc_score
import matplotlib.pyplot as plt
# y_pred : predicted class labels
# y_pred_probability : predicted probability of the positive class (column 1 of predict_proba)
y_pred_probability = lr.predict_proba(x_test)[:,1]
false_positive_rate, true_positive_rate, thresholds = \
            roc_curve(y_test, y_pred_probability)
roc_auc = roc_auc_score(y_test, y_pred_probability)
print('AUC : %.3f' % roc_auc)
plt.rcParams['figure.figsize'] = [5, 4]
plt.plot(false_positive_rate, true_positive_rate, \
         label='ROC  Curve(area = %0.3f)' % roc_auc, 
         color = 'red', linewidth=4.0)
plt.plot([0,1], [0,1], 'k--')
plt.xlim([0.0, 1.0])
plt.ylim([0.0, 1.0])
plt.xlabel('False Positive Rate')
plt.ylabel('True Positive Rate')
plt.title('ROC curve of Logistic regression')
plt.legend(loc='lower right')

 

 

 

# white wine analysis
url = "https://archive.ics.uci.edu/ml/machine-learning-databases/wine-quality/winequality-white.csv"
savefile = "winequality-white.csv"
from urllib.request import urlretrieve
urlretrieve(url, savefile)

# ('winequality-white.csv', <http.client.HTTPMessage at 0x214ffabed90>)


 

import pandas as pd               # pandas / seaborn / matplotlib are used below but were not imported in this post
import matplotlib.pyplot as plt
import seaborn as sns
df = pd.read_csv('winequality-white.csv', sep=';', encoding = 'utf-8')
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 4898 entries, 0 to 4897
Data columns (total 12 columns):
 #   Column                Non-Null Count  Dtype  
---  ------                --------------  -----  
 0   fixed acidity         4898 non-null   float64
 1   volatile acidity      4898 non-null   float64
 2   citric acid           4898 non-null   float64
 3   residual sugar        4898 non-null   float64
 4   chlorides             4898 non-null   float64
 5   free sulfur dioxide   4898 non-null   float64
 6   total sulfur dioxide  4898 non-null   float64
 7   density               4898 non-null   float64
 8   pH                    4898 non-null   float64
 9   sulphates             4898 non-null   float64
 10  alcohol               4898 non-null   float64
 11  quality               4898 non-null   int64  
dtypes: float64(11), int64(1)
memory usage: 459.3 KB

 

df.describe()

	fixed acidity	volatile acidity	citric acid	residual sugar	chlorides	free sulfur dioxide	total sulfur dioxide	density	pH	sulphates	alcohol	quality
count	4898.000000	4898.000000	4898.000000	4898.000000	4898.000000	4898.000000	4898.000000	4898.000000	4898.000000	4898.000000	4898.000000	4898.000000
mean	6.854788	0.278241	0.334192	6.391415	0.045772	35.308085	138.360657	0.994027	3.188267	0.489847	10.514267	5.877909
std	0.843868	0.100795	0.121020	5.072058	0.021848	17.007137	42.498065	0.002991	0.151001	0.114126	1.230621	0.885639
min	3.800000	0.080000	0.000000	0.600000	0.009000	2.000000	9.000000	0.987110	2.720000	0.220000	8.000000	3.000000
25%	6.300000	0.210000	0.270000	1.700000	0.036000	23.000000	108.000000	0.991723	3.090000	0.410000	9.500000	5.000000
50%	6.800000	0.260000	0.320000	5.200000	0.043000	34.000000	134.000000	0.993740	3.180000	0.470000	10.400000	6.000000
75%	7.300000	0.320000	0.390000	9.900000	0.050000	46.000000	167.000000	0.996100	3.280000	0.550000	11.400000	6.000000
max	14.200000	1.100000	1.660000	65.800000	0.346000	289.000000	440.000000	1.038980	3.820000	1.080000	14.200000	9.000000

 

sns.countplot(x=df['quality'])   # count of wines per quality grade (in newer seaborn the first positional argument is data, so pass x= explicitly)

plt.hist(df['quality'])

(array([  20.,  163.,    0., 1457.,    0., 2198.,  880.,    0.,  175.,
           5.]),
 array([3. , 3.6, 4.2, 4.8, 5.4, 6. , 6.6, 7.2, 7.8, 8.4, 9. ]),
 <BarContainer object of 10 artists>)

# df['quality'].value_counts()          # sorted by count, descending
df.groupby('quality')['quality'].count() # keeps the quality order

quality
3      20
4     163
5    1457
6    2198
7     880
8     175
9       5
Name: quality, dtype: int64

 

 

plt.plot(df.groupby('quality')['quality'].count())

 

 

# GradientBoostingClassifier: predict quality from the 11 chemical features
x = df.drop('quality', axis = 1)
y = df['quality']

 

from sklearn.model_selection import train_test_split
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size = 0.2, random_state=10)

 

from sklearn.ensemble import GradientBoostingClassifier
clf = GradientBoostingClassifier()
clf.fit(x_train, y_train)
y_pred = clf.predict(x_test)
y_pred[:10]

# array([6, 5, 4, 5, 6, 6, 6, 6, 5, 6], dtype=int64)

 

# evaluation
from sklearn.metrics import confusion_matrix
confmat = confusion_matrix(y_true = y_test, y_pred = y_pred)
print(confmat)

[[  0   1   0   0   1   0   0]
 [  0   2  18   9   0   0   0]
 [  0   6 159 109   5   1   0]
 [  1   6  73 337  33   0   0]
 [  0   0   6 104  68   2   0]
 [  0   0   0  19   8  10   1]
 [  0   0   0   1   0   0   0]]

 

from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score
print('정확도(accuracy) : %.2f'% accuracy_score(y_test, y_pred))
# for a multi-class target (quality 3~9), precision/recall/F1 need an average=
# option (e.g. average='macro'), so only accuracy is printed here
# print('정밀도(precision) : %.3f'% precision_score(y_test, y_pred, average='macro'))
# print('재현율(recall) : %.3f'% recall_score(y_test, y_pred, average='macro'))
# print('F1-score : %.3f'% f1_score(y_test, y_pred, average='macro'))
# f1 = 2*(precision*recall)/(precision+recall)

정확도(accuracy) : 0.59

 

# y ranges from 3 to 9
# regroup it into 3 classes (<=4, 5~7, >=8)
df.groupby('quality')['quality'].count()

quality
3      20
4     163
5    1457
6    2198
7     880
8     175
9       5
Name: quality, dtype: int64

 

y = df['quality']
newlist = []
for v in list(y) :
    if v <= 4 :
        newlist += [0]
    elif v <= 7 :
        newlist += [1]
    else :
        newlist += [2]
y = newlist
y[:10]

[1, 1, 1, 1, 1, 1, 1, 1, 1, 1]
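An equivalent, more compact sketch of the same 3-way binning with pandas (quality <= 4 -> 0, 5~7 -> 1, 8~9 -> 2); it uses the original df['quality'] column, not the list built above:

y_binned = pd.cut(df['quality'], bins=[2, 4, 7, 9], labels=[0, 1, 2]).astype(int)
y_binned[:10].tolist()   # [1, 1, 1, 1, 1, 1, 1, 1, 1, 1], same as above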

 

from sklearn.model_selection import train_test_split
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size = 0.2, random_state=10)
from sklearn.ensemble import GradientBoostingClassifier
clf = GradientBoostingClassifier()
clf.fit(x_train, y_train)
y_pred = clf.predict(x_test)
y_pred[:10]

array([1, 1, 0, 1, 1, 1, 1, 1, 1, 1])

 

from sklearn.metrics import confusion_matrix
confmat = confusion_matrix(y_true = y_test, y_pred = y_pred)
print(confmat)
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score
print('정확도(accuracy) : %.2f'% accuracy_score(y_test, y_pred))

[[  3  28   0]
 [  8 899   3]
 [  0  30   9]]
정확도(accuracy) : 0.93

 

 

 

 



import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

df_train = pd.read_csv('titanic_train.csv')
df_test = pd.read_csv('titanic_test.csv')
df_train.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 916 entries, 0 to 915
Data columns (total 13 columns):
 #   Column     Non-Null Count  Dtype  
---  ------     --------------  -----  
 0   pclass     916 non-null    int64  
 1   survived   916 non-null    int64  
 2   name       916 non-null    object 
 3   sex        916 non-null    object 
 4   age        741 non-null    float64
 5   sibsp      916 non-null    int64  
 6   parch      916 non-null    int64  
 7   ticket     916 non-null    object 
 8   fare       916 non-null    float64
 9   cabin      214 non-null    object 
 10  embarked   914 non-null    object 
 11  body       85 non-null     float64
 12  home.dest  527 non-null    object 
dtypes: float64(3), int64(4), object(6)
memory usage: 93.2+ KB

 

df_train = df_train.drop(['ticket','body','home.dest'], axis=1)
df_test = df_test.drop(['ticket','body','home.dest'], axis=1)

# fill missing ages with the training-set mean (age_mean was not defined before this
# cell in the original notes, so it is computed here)
age_mean = df_train['age'].mean()
df_train['age'] = df_train['age'].fillna(age_mean)
df_test['age'] = df_test['age'].fillna(age_mean)

em_mode = df_train['embarked'].value_counts().index[0]   # most frequent embarkation port
df_train['embarked'] = df_train['embarked'].fillna(em_mode)
df_test['embarked'] = df_test['embarked'].fillna(em_mode)

 

# stack train and test so the encoding / cleanup below is applied to both consistently
# (DataFrame.append was removed in pandas 2.x; pd.concat([df_train, df_test]) is the modern equivalent)
whole_df = df_train.append(df_test)
train_idx_num = len(df_train)
whole_df['cabin'].value_counts()

C23 C25 C27        6
G6                 5
B57 B59 B63 B66    5
D                  4
F2                 4
                  ..
A20                1
C128               1
D6                 1
C49                1
A10                1
Name: cabin, Length: 186, dtype: int64

 

whole_df['cabin'].isnull().value_counts()

True     1014
False     295
Name: cabin, dtype: int64

 

whole_df['cabin'] = whole_df['cabin'].fillna('X')
whole_df['cabin'].value_counts()

X                  1014
C23 C25 C27           6
G6                    5
B57 B59 B63 B66       5
F2                    4
                   ... 
A9                    1
E52                   1
C95                   1
C99                   1
A10                   1
Name: cabin, Length: 187, dtype: int64

 

whole_df['cabin'].unique()

array(['X', 'E36', 'C68', 'E24', 'C22 C26', 'D38', 'B50', 'A24', 'C111',
       'F', 'C6', 'C87', 'E8', 'B45', 'C93', 'D28', 'D36', 'C125', 'B35',
       'T', 'B73', 'B57 B59 B63 B66', 'A26', 'A18', 'B96 B98', 'G6',
       'C78', 'C101', 'D9', 'D33', 'C128', 'E50', 'B26', 'B69', 'E121',
       'C123', 'B94', 'A34', 'D', 'C39', 'D43', 'E31', 'B5', 'D17', 'F33',
       'E44', 'D7', 'A21', 'D34', 'A29', 'D35', 'A11', 'B51 B53 B55',
       'D46', 'E60', 'C30', 'D26', 'E68', 'A9', 'B71', 'D37', 'F2',
       'C55 C57', 'C89', 'C124', 'C23 C25 C27', 'C126', 'E49', 'F E46',
       'E46', 'D19', 'B58 B60', 'C82', 'B52 B54 B56', 'C92', 'E45',
       'F G73', 'C65', 'E25', 'B3', 'D40', 'C91', 'B102', 'B61', 'F G63',
       'A20', 'B36', 'C7', 'B77', 'D20', 'C148', 'C105', 'E38', 'B86',
       'C132', 'C86', 'A14', 'C54', 'A5', 'B49', 'B28', 'B24', 'C2', 'F4',
       'A6', 'C83', 'B42', 'A36', 'C52', 'D56', 'C116', 'B19', 'E77',
       'F E57', 'E101', 'B18', 'C95', 'D15', 'E33', 'B30', 'D21', 'E10',
       'C130', 'D6', 'C51', 'D30', 'E67', 'C110', 'C103', 'C90', 'C118',
       'C97', 'D47', 'E34', 'B4', 'D50', 'C62 C64', 'E17', 'B41', 'C49',
       'C85', 'B20', 'C28', 'E63', 'C99', 'D49', 'A10', 'A16', 'B37',
       'C80', 'B78', 'E12', 'C104', 'A31', 'D11', 'D48', 'D10 D12', 'B38',
       'D45', 'C50', 'C31', 'B82 B84', 'A32', 'C53', 'B10', 'C70', 'A23',
       'C106', 'C46', 'E58', 'B11', 'F E69', 'B80', 'E39 E41', 'D22',
       'E40', 'A19', 'C32', 'B79', 'C45', 'B22', 'B39', 'C47', 'B101',
       'A7', 'E52', 'F38'], dtype=object)

 

# keep only the deck letter (first character) of each cabin value
whole_df['cabin'] = [ ca[0] for ca in  whole_df['cabin'].values ]
# equivalent alternatives:
# whole_df['cabin'] = whole_df['cabin'].apply(lambda x : x[0])
# whole_df['cabin'] = whole_df['cabin'].str[0]

whole_df['cabin'].value_counts()
X    1014
C      94
B      65
D      46
E      41
A      22
F      21
G       5
T       1
Name: cabin, dtype: int64

 

# merge the rare decks (G: 5 rows, T: 1 row) into the unknown bucket 'X'
whole_df['cabin'] = whole_df['cabin'].replace('G', 'X')
whole_df['cabin'] = whole_df['cabin'].replace('T', 'X')

whole_df['cabin'].value_counts()

X    1020
C      94
B      65
D      46
E      41
A      22
F      21
Name: cabin, dtype: int64

 

sns.countplot(x='cabin', hue='survived', data = whole_df)

whole_df['name']

0                 Mellinger, Miss. Madeleine Violet
1                                 Wells, Miss. Joan
2                    Duran y More, Miss. Florentina
3                                Scanlan, Mr. James
4                      Bradley, Miss. Bridget Delia
                           ...                     
388               Karlsson, Mr. Julius Konrad Eugen
389    Ware, Mrs. John James (Florence Louise Long)
390                            O'Keefe, Mr. Patrick
391                                Tobin, Mr. Roger
392                            Daniels, Miss. Sarah
Name: name, Length: 1309, dtype: object

 

# extract the title (the word between ", " and ".") from each name,
# e.g. "Mellinger, Miss. Madeleine Violet" -> "Miss"
n_grade = whole_df['name'].apply(lambda  x : x.split(", ")[1].split(".")[0])
n_grade = n_grade.unique().tolist()
n_grade
# nana = [ na[na.find(',')+2 : na.find('.')] for na in whole_df['name'].values]
# nana

['Miss',
 'Mr',
 'Master',
 'Mrs',
 'Dr',
 'Mlle',
 'Col',
 'Rev',
 'Ms',
 'Mme',
 'Sir',
 'the Countess',
 'Dona',
 'Jonkheer',
 'Lady',
 'Major',
 'Don',
 'Capt']

 

# social status groups defined by title
grade_dict = {
    'A' : ['Rev', 'Col', 'Major', 'Dr', 'Capt', 'Sir'],  # honorific / professional titles
    'B' : ['Ms', 'Mme', 'Mrs', 'Dona'],                   # adult women
    'C' : ['Jonkheer', 'the Countess'],                   # nobility
    'D' : ['Mr', 'Don'],                                  # men
    'E' : ['Master'],                                     # young men / boys
    'F' : ['Miss', 'Mlle', 'Lady']                        # young women
}

 

print(grade_dict.values())
print(grade_dict['A'])

dict_values([['Rev', 'Col', 'Major', 'Dr', 'Capt', 'Sir'], ['Ms', 'Mme', 'Mrs', 'Dona'], ['Jonkheer', 'the Countess'], ['Mr', 'Don'], ['Master'], ['Miss', 'Mlle', 'Lady']])
['Rev', 'Col', 'Major', 'Dr', 'Capt', 'Sir']

 

def give_grade(x) : # map a raw name string to its social-status grade
    g = x.split(", ")[1].split(".")[0]
    for k, v in grade_dict.items() :
        for title in v :
            if g == title :
                return k
    return 'G'
whole_df['name'] = whole_df['name'].apply(lambda x : give_grade(x))
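For reference, the same mapping can be written as a single lookup table; this is only a sketch (the cell above has already converted the column, so the commented line would not be re-run as-is), and title_to_grade is a hypothetical helper name.

# flatten grade_dict once: each title maps directly to its grade, unknown titles fall back to 'G'
title_to_grade = {title: grade for grade, titles in grade_dict.items() for title in titles}
# whole_df['name'] = whole_df['name'].apply(
#     lambda x: title_to_grade.get(x.split(", ")[1].split(".")[0], 'G'))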

 

whole_df['name'].value_counts()

D    758
F    263
B    201
E     61
A     24
C      2
Name: name, dtype: int64

 

sns.countplot(x=whole_df['name'], hue=whole_df['survived'])

# one-hot encode the categorical columns
whole_df_encoded = pd.get_dummies(whole_df)
whole_df_encoded.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 1309 entries, 0 to 392
Data columns (total 24 columns):
 #   Column      Non-Null Count  Dtype  
---  ------      --------------  -----  
 0   pclass      1309 non-null   int64  
 1   survived    1309 non-null   int64  
 2   age         1309 non-null   float64
 3   sibsp       1309 non-null   int64  
 4   parch       1309 non-null   int64  
 5   fare        1309 non-null   float64
 6   name_A      1309 non-null   uint8  
 7   name_B      1309 non-null   uint8  
 8   name_C      1309 non-null   uint8  
 9   name_D      1309 non-null   uint8  
 10  name_E      1309 non-null   uint8  
 11  name_F      1309 non-null   uint8  
 12  sex_female  1309 non-null   uint8  
 13  sex_male    1309 non-null   uint8  
 14  cabin_A     1309 non-null   uint8  
 15  cabin_B     1309 non-null   uint8  
 16  cabin_C     1309 non-null   uint8  
 17  cabin_D     1309 non-null   uint8  
 18  cabin_E     1309 non-null   uint8  
 19  cabin_F     1309 non-null   uint8  
 20  cabin_X     1309 non-null   uint8  
 21  embarked_C  1309 non-null   uint8  
 22  embarked_Q  1309 non-null   uint8  
 23  embarked_S  1309 non-null   uint8  
dtypes: float64(2), int64(4), uint8(18)
memory usage: 134.6 KB

 

# independent variables (x) and labels (y), split back out of the encoded frame
# (note: strictly this should be [:train_idx_num] / [train_idx_num:]; the +1 used
#  here shifts one test row into the training set, which is why the shapes below
#  show 917 / 392 instead of 916 / 393)
x_train = whole_df_encoded[:train_idx_num+1]
x_train = x_train.loc[:,x_train.columns != 'survived'].values
y_train = whole_df_encoded[:train_idx_num+1]['survived']

x_test = whole_df_encoded[train_idx_num+1:]
x_test = x_test.loc[:,x_test.columns != 'survived'].values
y_test = whole_df_encoded[train_idx_num+1:]['survived']

 

x_train.shape

# (917, 23)

 

# (scratch cell, kept commented out: rebuilding the arrays from the raw frames like this
#  would break, since df_train/df_test still contain text columns here and x_test must
#  come from df_test, not df_train; the encoded split above is the one actually used)
# y_train = df_train['survived'].values
# x_train = df_train.loc[:,df_train.columns != 'survived'].values
# y_test = df_test['survived'].values
# x_test = df_test.loc[:,df_test.columns != 'survived'].values

 

# scratch: the same title -> grade mapping done with replace() on a copy of df_train
ttt = df_train.copy()
ttt['name'] = ttt['name'].apply(lambda x : x.split(', ')[1].split('.')[0])
for i in ttt['name'] :
    if i in grade_dict['A'] :
        ttt.name.replace(i, 'A', inplace = True)
    elif i in grade_dict['B'] :
        ttt.name.replace(i, 'B', inplace = True)
    elif i in grade_dict['C'] :
        ttt.name.replace(i, 'C', inplace = True)
    elif i in grade_dict['D'] :
        ttt.name.replace(i, 'D', inplace = True)
    elif i in grade_dict['E'] :
        ttt.name.replace(i, 'E', inplace = True)
    elif i in grade_dict['F'] :
        ttt.name.replace(i, 'F', inplace = True)
    else :
        ttt.name.replace(i, 'G', inplace = True)

 

y_test

1      1
2      0
3      0
4      0
5      1
      ..
388    0
389    1
390    1
391    0
392    1
Name: survived, Length: 392, dtype: int64

 

from sklearn.linear_model import LogisticRegression
lr = LogisticRegression(random_state=0)
lr.fit(x_train, y_train)
y_pred = lr.predict(x_test)

 

# evaluation
from sklearn.metrics import confusion_matrix
confmat = confusion_matrix(y_true = y_test, y_pred = y_pred)
print(confmat)

[[208  37]
 [ 42 105]]

 

from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score
print('정확도(accuracy) : %.2f'% accuracy_score(y_test, y_pred))
print('정밀도(precision) : %.3f'% precision_score(y_test, y_pred))
print('재현율(recall) : %.3f'% recall_score(y_test, y_pred))
print('F1-score : %.3f'% f1_score(y_test, y_pred))
# f1 = 2*(precision*recall)/(precision+recall)

정확도(accuracy) : 0.80
정밀도(precision) : 0.739
재현율(recall) : 0.714
F1-score : 0.727

 



# classification
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

df_train = pd.read_csv('titanic_train.csv')
df_test = pd.read_csv('titanic_test.csv')
df_train.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 916 entries, 0 to 915
Data columns (total 13 columns):
 #   Column     Non-Null Count  Dtype  
---  ------     --------------  -----  
 0   pclass     916 non-null    int64  
 1   survived   916 non-null    int64  
 2   name       916 non-null    object 
 3   sex        916 non-null    object 
 4   age        741 non-null    float64
 5   sibsp      916 non-null    int64  
 6   parch      916 non-null    int64  
 7   ticket     916 non-null    object 
 8   fare       916 non-null    float64
 9   cabin      214 non-null    object 
 10  embarked   914 non-null    object 
 11  body       85 non-null     float64
 12  home.dest  527 non-null    object 
dtypes: float64(3), int64(4), object(6)
memory usage: 93.2+ KB

 

df_train['survived'].value_counts()

0    563
1    353
Name: survived, dtype: int64

 

df_train['survived'].value_counts().plot.bar()

df_train[['pclass','survived']].value_counts().sort_index().plot.bar()

 

ax = sns.countplot(x='pclass', hue = 'survived', data = df_train)

df_train[['sex','survived']].value_counts().sort_index().plot.bar()

 

ax = sns.countplot(x='sex', hue = 'survived', data = df_train)

 

# classification
# handling missing values
- deletion : easy to do, but important information may be thrown away
# replacement
- fill with the mean, median, or mode (a short sketch of both options follows below)
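A minimal sketch of both options on this dataset (illustrative only; the cells below use the mean-fill approach):

# option 1: drop rows with missing values (simple, but loses the 175 rows with missing age here)
dropped = df_train.dropna(subset=['age'])

# option 2: replace missing values with a representative statistic
mean_fill   = df_train['age'].fillna(df_train['age'].mean())               # mean
median_fill = df_train['age'].fillna(df_train['age'].median())             # median
mode_fill   = df_train['embarked'].fillna(df_train['embarked'].mode()[0])  # mode, for a categorical column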

 

df_train.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 916 entries, 0 to 915
Data columns (total 13 columns):
 #   Column     Non-Null Count  Dtype  
---  ------     --------------  -----  
 0   pclass     916 non-null    int64  
 1   survived   916 non-null    int64  
 2   name       916 non-null    object 
 3   sex        916 non-null    object 
 4   age        741 non-null    float64
 5   sibsp      916 non-null    int64  
 6   parch      916 non-null    int64  
 7   ticket     916 non-null    object 
 8   fare       916 non-null    float64
 9   cabin      214 non-null    object 
 10  embarked   914 non-null    object 
 11  body       85 non-null     float64
 12  home.dest  527 non-null    object 
dtypes: float64(3), int64(4), object(6)
memory usage: 93.2+ KB

 

age_mean = df_train['age'].mean()
age_mean

30.23144399460189

 

df_train['age'] = df_train['age'].fillna(age_mean)
df_test['age'] = df_test['age'].fillna(age_mean)
# age_mean = df_train['age'].mean(skipna = False)
df_train['age']

0      13.000000
1       4.000000
2      30.000000
3      30.231444
4      22.000000
         ...    
911     0.170000
912    30.231444
913    30.231444
914    20.000000
915    32.000000
Name: age, Length: 916, dtype: float64

 

df_train['embarked'].isnull().value_counts()

False    914
True       2
Name: embarked, dtype: int64

 

replace_embarked = df_train['embarked'].value_counts().index[0]

df_train['embarked'] = df_train['embarked'].fillna(replace_embarked)
df_test['embarked'] = df_test['embarked'].fillna(replace_embarked)

df_train['embarked']

0      S
1      S
2      C
3      Q
4      Q
      ..
911    S
912    S
913    Q
914    S
915    Q
Name: embarked, Length: 916, dtype: object

 

df_train = df_train.drop(['name','ticket','body','cabin','home.dest'], axis=1)
df_test = df_test.drop(['name','ticket','body','cabin','home.dest'], axis=1)

df_test.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 393 entries, 0 to 392
Data columns (total 8 columns):
 #   Column    Non-Null Count  Dtype  
---  ------    --------------  -----  
 0   pclass    393 non-null    int64  
 1   survived  393 non-null    int64  
 2   sex       393 non-null    object 
 3   age       393 non-null    float64
 4   sibsp     393 non-null    int64  
 5   parch     393 non-null    int64  
 6   fare      393 non-null    float64
 7   embarked  393 non-null    object 
dtypes: float64(2), int64(4), object(2)
memory usage: 24.7+ KB

 

whole_df = df_train.append(df_test)
whole_df.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 1309 entries, 0 to 392
Data columns (total 8 columns):
 #   Column    Non-Null Count  Dtype  
---  ------    --------------  -----  
 0   pclass    1309 non-null   int64  
 1   survived  1309 non-null   int64  
 2   sex       1309 non-null   object 
 3   age       1309 non-null   float64
 4   sibsp     1309 non-null   int64  
 5   parch     1309 non-null   int64  
 6   fare      1309 non-null   float64
 7   embarked  1309 non-null   object 
dtypes: float64(2), int64(4), object(2)
memory usage: 92.0+ KB

 

train_num = len(df_train)

whole_df_encoded = pd.get_dummies(whole_df)
whole_df_encoded.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 1309 entries, 0 to 392
Data columns (total 11 columns):
 #   Column      Non-Null Count  Dtype  
---  ------      --------------  -----  
 0   pclass      1309 non-null   int64  
 1   survived    1309 non-null   int64  
 2   age         1309 non-null   float64
 3   sibsp       1309 non-null   int64  
 4   parch       1309 non-null   int64  
 5   fare        1309 non-null   float64
 6   sex_female  1309 non-null   uint8  
 7   sex_male    1309 non-null   uint8  
 8   embarked_C  1309 non-null   uint8  
 9   embarked_Q  1309 non-null   uint8  
 10  embarked_S  1309 non-null   uint8  
dtypes: float64(2), int64(4), uint8(5)
memory usage: 78.0 KB

 

df_train = whole_df_encoded[:train_num]
df_test = whole_df_encoded[train_num:]
df_train.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 916 entries, 0 to 915
Data columns (total 11 columns):
 #   Column      Non-Null Count  Dtype  
---  ------      --------------  -----  
 0   pclass      916 non-null    int64  
 1   survived    916 non-null    int64  
 2   age         916 non-null    float64
 3   sibsp       916 non-null    int64  
 4   parch       916 non-null    int64  
 5   fare        916 non-null    float64
 6   sex_female  916 non-null    uint8  
 7   sex_male    916 non-null    uint8  
 8   embarked_C  916 non-null    uint8  
 9   embarked_Q  916 non-null    uint8  
 10  embarked_S  916 non-null    uint8  
dtypes: float64(2), int64(4), uint8(5)
memory usage: 54.6 KB

 

df_test.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 393 entries, 0 to 392
Data columns (total 11 columns):
 #   Column      Non-Null Count  Dtype  
---  ------      --------------  -----  
 0   pclass      393 non-null    int64  
 1   survived    393 non-null    int64  
 2   age         393 non-null    float64
 3   sibsp       393 non-null    int64  
 4   parch       393 non-null    int64  
 5   fare        393 non-null    float64
 6   sex_female  393 non-null    uint8  
 7   sex_male    393 non-null    uint8  
 8   embarked_C  393 non-null    uint8  
 9   embarked_Q  393 non-null    uint8  
 10  embarked_S  393 non-null    uint8  
dtypes: float64(2), int64(4), uint8(5)
memory usage: 23.4 KB

 

y_train = df_train['survived'].values
x_train = df_train.loc[:,df_train.columns != 'survived'].values
y_test = df_test['survived'].values
x_test = df_test.loc[:,df_test.columns != 'survived'].values   # was df_train in the original notes, which mismatches y_test

 

from sklearn.linear_model import LogisticRegression
lr = LogisticRegression(random_state=0)
lr.fit(x_train, y_train)

 

y_pred = lr.predict(x_test)

 

# evaluation
from sklearn.metrics import confusion_matrix
confmat = confusion_matrix(y_true = y_test, y_pred = y_pred)
print(confmat)

 

from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score
print('정확도(accuracy) : %.2f'% accuracy_score(y_test, y_pred))
print('정밀도(precision) : %.3f'% precision_score(y_test, y_pred))
print('재현율(recall) : %.3f'% recall_score(y_test, y_pred))
print('F1-score : %.3f'% f1_score(y_test, y_pred))
# f1 = 2*(precision*recall)/(precision+recall)

 

 
