728x90
반응형
- Truncated SVD 를 이용한 행렬 분해
import numpy as np
from scipy.sparse.linalg import svds
from scipy.linalg import svd
# 원본 행렬을 출력하고, SVD를 적용할 경우 U, Sigma, Vt 의 차원 확인
np.random.seed(121)
matrix = np.random.random((6, 6))
print('원본 행렬:\n',matrix)
U, Sigma, Vt = svd(matrix, full_matrices=False)
print('\n분해 행렬 차원:',U.shape, Sigma.shape, Vt.shape)
print('\nSigma값 행렬:', Sigma)
# Truncated SVD로 Sigma 행렬의 특이값을 4개로 하여 Truncated SVD 수행.
num_components = 5
U_tr, Sigma_tr, Vt_tr = svds(matrix, k=num_components)
print('\nTruncated SVD 분해 행렬 차원:',U_tr.shape, Sigma_tr.shape, Vt_tr.shape)
print('\nTruncated SVD Sigma값 행렬:', Sigma_tr)
matrix_tr = np.dot(np.dot(U_tr,np.diag(Sigma_tr)), Vt_tr) # output of TruncatedSVD
print('\nTruncated SVD로 분해 후 복원 행렬:\n', matrix_tr)
원본 행렬:
[[0.11133083 0.21076757 0.23296249 0.15194456 0.83017814 0.40791941]
[0.5557906 0.74552394 0.24849976 0.9686594 0.95268418 0.48984885]
[0.01829731 0.85760612 0.40493829 0.62247394 0.29537149 0.92958852]
[0.4056155 0.56730065 0.24575605 0.22573721 0.03827786 0.58098021]
[0.82925331 0.77326256 0.94693849 0.73632338 0.67328275 0.74517176]
[0.51161442 0.46920965 0.6439515 0.82081228 0.14548493 0.01806415]]
분해 행렬 차원: (6, 6) (6,) (6, 6)
Sigma값 행렬: [3.2535007 0.88116505 0.83865238 0.55463089 0.35834824 0.0349925 ]
Truncated SVD 분해 행렬 차원: (6, 5) (5,) (5, 6)
Truncated SVD Sigma값 행렬: [0.35834824 0.55463089 0.83865238 0.88116505 3.2535007 ]
Truncated SVD로 분해 후 복원 행렬:
[[0.11368271 0.19721195 0.23106956 0.15961551 0.82758207 0.41695496]
[0.55500167 0.75007112 0.24913473 0.96608621 0.95355502 0.48681791]
[0.01789183 0.85994318 0.40526464 0.62115143 0.29581906 0.92803075]
[0.40782587 0.55456069 0.24397702 0.23294659 0.035838 0.58947208]
[0.82711496 0.78558742 0.94865955 0.7293489 0.67564311 0.73695659]
[0.5136488 0.45748403 0.64231412 0.82744766 0.14323933 0.0258799 ]]
사이킷런 TruncatedSVD 클래스를 이용한 변환
from sklearn.decomposition import TruncatedSVD, PCA
from sklearn.datasets import load_iris
import matplotlib.pyplot as plt
%matplotlib inline
iris = load_iris()
iris_ftrs = iris.data
# 2개의 주요 component로 TruncatedSVD 변환
tsvd = TruncatedSVD(n_components=2)
tsvd.fit(iris_ftrs)
iris_tsvd = tsvd.transform(iris_ftrs)
# Scatter plot 2차원으로 TruncatedSVD 변환 된 데이터 표현. 품종은 색깔로 구분
plt.scatter(x=iris_tsvd[:,0], y= iris_tsvd[:,1], c= iris.target)
plt.xlabel('TruncatedSVD Component 1')
plt.ylabel('TruncatedSVD Component 2')
# Text(0,0.5,'TruncatedSVD Component 2')
from sklearn.preprocessing import StandardScaler
# iris 데이터를 StandardScaler로 변환
scaler = StandardScaler()
iris_scaled = scaler.fit_transform(iris_ftrs)
# 스케일링된 데이터를 기반으로 TruncatedSVD 변환 수행
tsvd = TruncatedSVD(n_components=2)
tsvd.fit(iris_scaled)
iris_tsvd = tsvd.transform(iris_scaled)
# 스케일링된 데이터를 기반으로 PCA 변환 수행
pca = PCA(n_components=2)
pca.fit(iris_scaled)
iris_pca = pca.transform(iris_scaled)
# TruncatedSVD 변환 데이터를 왼쪽에, PCA변환 데이터를 오른쪽에 표현
fig, (ax1, ax2) = plt.subplots(figsize=(9,4), ncols=2)
ax1.scatter(x=iris_tsvd[:,0], y= iris_tsvd[:,1], c= iris.target)
ax2.scatter(x=iris_pca[:,0], y= iris_pca[:,1], c= iris.target)
ax1.set_title('Truncated SVD Transformed')
ax2.set_title('PCA Transformed')
# Text(0.5,1,'PCA Transformed')
반응형
'Data_Science > ML_Perfect_Guide' 카테고리의 다른 글
7-1. Kmeans (0) | 2021.12.29 |
---|---|
6-6. NMF (Non Negative Matrix Factorization) (0) | 2021.12.29 |
6-4. SVD (singular value decomposition) (0) | 2021.12.29 |
6-3. LDA(Linear Discriminant Analysis) (0) | 2021.12.29 |
6-2. 신용카드 데이터 PCA (0) | 2021.12.29 |