Data_Science/Data_Analysis_Py

10. folium 2 2021.10.26
9. tips || '21.06.28. 2021.10.26
8. iris || '21.06.28. 2021.10.26
7. folium || '21.06.24. 2021.10.26
6. Titanic || '21.06.24. 2021.10.26
5. auto-mpg 분석 || '21.06.24. 2021.10.26
4. 시도별 전출입 인구수 분석 ( 2 || '21.06.24. 2021.10.26
3. 시도별 전출입 인구수 분석 ( 1 || 2021.06.23 2021.10.26
2. auto-mpg 데이터 분석 || 2021.06.24 2021.10.19
1. 남북한발전전력량 분석 || 2021.06.24 2021.10.19

10. folium 2

2021. 10. 26. 23:05


import folium
seoul_map = folium.Map(location=[37.55, 126.98], zoom_start=12)
seoul_map.save('seoul.html')


# tiles: sets the map style
# e.g. openstreetmap, cartodbdark_matter, cartodbpositron, stamenterrain
seoul_map2 = folium.Map(location=[37.55, 126.98], zoom_start=12, tiles = "openstreetmap")
seoul_map2.save('seoul2.html')


import pandas as pd
import folium
df = pd.read_excel('서울지역 대학교 위치.xlsx', index_col=0,engine='openpyxl')
print(df.head())
                     위도          경도
KAIST 서울캠퍼스   37.592573  127.046737
KC대학교         37.548345  126.854797
가톨릭대학교(성신교정)  37.585922  127.004328
가톨릭대학교(성의교정)  37.499623  127.006065
감리교신학대학교      37.567645  126.961610


print(df.index)
Index(['KAIST 서울캠퍼스 ', 'KC대학교', '가톨릭대학교(성신교정)', '가톨릭대학교(성의교정)', '감리교신학대학교',
       '건국대학교', '경기대학교 서울캠퍼스 ', '경희대학교 서울캠퍼스 ', '고려대학교', '광운대학교', '국민대학교',
       '덕성여자대학교', '동국대학교', '동덕여자대학교', '명지대학교 서울캠퍼스 ', '삼육대학교', '상명대학교 서울캠퍼스 ',
       '서강대학교', '서경대학교', '서울과학기술대학교', '서울교육대학교', '서울기독대학교', '서울대학교', '서울시립대학교',
       '서울여자대학교', '서울한영대학교', '성공회대학교', '성균관대학교 서울캠퍼스  ', '성신여자대학교', '세종대학교',
       '숙명여자대학교', '숭실대학교', '연세대학교', '육군사관학교', '이화여자대학교', '장로회신학대학교',
       '중앙대학교 서울캠퍼스 ', '총신대학교', '추계예술대학교', '한국방송통신대학교', '한국성서대학교', '한국예술종합학교',
       '한국외국어대학교', '한국체육대학교', '한성대학교', '한양대학교', '홍익대학교'],
      dtype='object')

seoul_map = folium.Map(location=[37.55, 126.98], zoom_start=12, tiles = "openstreetmap")
for name, lat, lng in zip(df.index, df.위도, df.경도) :
    # Marker: the object placed on the map; popup: text shown when the marker is clicked
    # tooltip: text shown while the cursor hovers over the marker
    folium.Marker([lat, lng], popup=name, tooltip=name).add_to(seoul_map)
seoul_map.save('seoul_colleges.html')

seoul_map3 = folium.Map(location=[37.55, 126.98], zoom_start=12, tiles = "openstreetmap")
for name, lat, lng in zip(df.index, df.위도, df.경도) :
    folium.CircleMarker([lat, lng], # latitude, longitude
                        radius = 10, # circle radius in pixels
                        color = 'brown', # outline color
                        fill= True, # fill the circle
                        fill_color= 'coral', # fill color
                        fill_opacity=0.7, # fill opacity
                        popup=name
                       ).add_to(seoul_map3)
seoul_map3.save('seoul_colleges.html')

# Icon markers
seoul_map4 = folium.Map(location=[37.55, 126.98], zoom_start=12, tiles = "openstreetmap")
for name, lat, lng in zip(df.index, df.위도, df.경도) :
    folium.Marker([lat, lng], # latitude, longitude
                  popup=name,
                  # icon names: home, flag, bookmark, star
                  icon = folium.Icon(color = 'blue', icon='star')
                 ).add_to(seoul_map4)
seoul_map4.save('seoul_colleges.html')


library.csv

import pandas as pd
import folium
from folium import Marker
library = pd.read_csv('library.csv')
lib_map = folium.Map(location=[37.55, 126.98], zoom_start=12)

print(library.head())
   고유번호   구명   법정동명  산지여부  주지번 부지번                 새주소명                 시설명  \
0    21  구로구   구로3동     1  777   1  구로구 디지털로 27다길 65 2층             꿈마을 도서관   
1    22  용산구    후암동     1   30  84        용산구 후암동 30-84              남산 도서관   
2    23   중구    신당동     1  844                중구 다산로 32  남산타운 문화체육센터 어린이도서관   
3    24  노원구  상계10동     1  686               노원구 온곡길 21            노원 정보도서관   
4    25  노원구   중계3동     1  508             노원구 중계3동 508            노원 평생학습관   

         운영기관 설립주체    시설구분         개관일      면적                       홈페이지주소  \
0  구로구 시설관리공단        구립도서관  2007-04-05   476.0    lib.guro.go.kr/dreamtown/   
1                   교육청도서관  1922-10-05     0.0  lib.sen.go.kr/lib_index.jsp   
2      시설관리공단        구립도서관  2010-04-01   273.8        www.e-junggulib.or.kr   
3   노원 교육복지재단        구립도서관  2006-02-15  6526.0              www.nowonlib.kr   
4                   교육청도서관  1990-05-08     0.0  lib.sen.go.kr/lib_index.jsp 

            연락처 생성일          경도         위도  
0      830-5807      126.890115  37.487220  
1                    126.981375  37.552664  
2  02-2280-8520      127.009297  37.549020  
3   02-950-0029      127.064177  37.660927  
4                    127.067120  37.640120  

print(library.index)
RangeIndex(start=0, stop=123, step=1)

color='blue'

# color='blue'
for name, lat, lng, kbn in zip(library['시설명'],library['위도'],library['경도'],library['시설구분']) :
    if kbn == '구립도서관' :
        color ='green'
    else :
        color = 'blue'
    Marker(location = [lat, lng], 
           popup = kbn,
           tooltip=name, 
           icon = folium.Icon(color=color,icon='bookmark')
          ).add_to(lib_map)
lib_map.save('library.html')
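
The if/else above chooses a marker color from the facility type (시설구분). If more than two types need their own color, a lookup table may be easier to extend than a chain of conditions. A minimal sketch: the names `TYPE_COLORS` and `marker_color` are hypothetical, and the two facility types are the ones visible in the sample output above.

```python
# Hypothetical lookup table: facility type -> color name passed to folium.Icon.
TYPE_COLORS = {
    "구립도서관": "green",   # district-run library
    "교육청도서관": "blue",  # education-office library
}


def marker_color(facility_type, default="blue"):
    """Return the marker color for a facility type, or the default if unlisted."""
    return TYPE_COLORS.get(facility_type, default)


print(marker_color("구립도서관"))
print(marker_color("작은도서관"))  # not in the table, so the default is used
```

Inside the loop, `color = marker_color(kbn)` would then replace the if/else.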


MarkerCluster

# MarkerCluster plugin: groups nearby markers into clusters
from folium.plugins import MarkerCluster
lib_map = folium.Map(location=[37.55, 126.98], zoom_start=12)

# add points to the map
mc = MarkerCluster()
# iterrows() yields one (index, row) pair per record
# _ receives the index; it is unused here but still has to be unpacked
for _, row in library.iterrows(): 
    mc.add_child(
        Marker(location = [row['위도'], row['경도']],
              popup = row['시설구분'],
              tooltip = row['시설명']
              )
    )
lib_map.add_child(mc)
lib_map.save('library2.html')


# Draw a map from the Gyeonggi-do population data and boundary data
import pandas as pd
import folium
import json
file_path = './경기도인구데이터.xlsx'
df = pd.read_excel(file_path, index_col = '구분', engine = 'openpyxl')
df.columns = df.columns.map(str)
geo_path = './경기도행정구역경계.json'

try :
    with open(geo_path, encoding = 'utf-8') as f : # try plain UTF-8 first
        geo_data = json.load(f)
except ValueError : # raised, for example, when the file starts with a BOM
    with open(geo_path, encoding = 'utf-8-sig') as f : # retry, skipping the BOM
        geo_data = json.load(f)
print(type(geo_data)) # <class 'dict'>
g_map = folium.Map(location=[37.5502, 126.982], zoom_start=9)
year = '2017'
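
The two-step load above exists because some JSON files begin with a UTF-8 byte order mark (BOM). Since the `utf-8-sig` codec also reads files that have no BOM, a single attempt is probably enough. A minimal sketch: the helper name `load_geojson` and the throwaway demo file are assumptions made for illustration.

```python
import json
import os
import tempfile


def load_geojson(path):
    """Load a JSON file whether or not it starts with a UTF-8 BOM."""
    # 'utf-8-sig' drops a leading BOM if present and otherwise acts like 'utf-8'.
    with open(path, encoding="utf-8-sig") as f:
        return json.load(f)


# Demonstration with a temporary file that is written with a BOM.
with tempfile.TemporaryDirectory() as tmp:
    demo_path = os.path.join(tmp, "demo.json")
    with open(demo_path, "w", encoding="utf-8-sig") as f:
        json.dump({"type": "FeatureCollection", "features": []}, f)
    demo = load_geojson(demo_path)

print(type(demo))
```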

# Choropleth class: shades each region according to its value class
# fill_color options: BuGn, PuRd, BuPu, GnBu, OrRd, PuBu, PuBuGn, ...
folium.Choropleth(geo_data = geo_data, # region boundaries
                 data = df[year], # values to display
                 columns = [df.index, df[year]],
                 fill_color = 'YlOrRd', # fill color palette
                  fill_opacity=0.7, # fill opacity
                  line_opacity=0.3,  # boundary line opacity
                 threshold_scale = [10000,100000,300000,500000,700000], # class boundaries (named bins in newer folium releases)
                 key_on ='feature.properties.name', # match regions by name
                 ).add_to(g_map)
g_map.save('./gyonggi_population_'+year+'.html')
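
`key_on='feature.properties.name'` names the path inside each GeoJSON feature whose value is matched against the keys of the data (here, the DataFrame index). Names that differ between the two files will not line up, so it can help to compare them before drawing. A rough sketch: `feature_keys` is a hypothetical helper, and the miniature GeoJSON and key list are made up.

```python
def feature_keys(geo_data, key_on):
    """Collect the value found at a 'feature.a.b' style path in every feature."""
    path = key_on.split(".")[1:]  # drop the leading 'feature'
    keys = []
    for feature in geo_data["features"]:
        value = feature
        for part in path:
            value = value[part]
        keys.append(value)
    return keys


# Made-up miniature GeoJSON and data keys, for illustration only.
sample_geo = {"features": [
    {"id": "A", "properties": {"name": "수원시"}},
    {"id": "B", "properties": {"name": "성남시"}},
]}
data_keys = ["수원시", "용인시"]

geo_names = feature_keys(sample_geo, "feature.properties.name")
unmatched = sorted(set(data_keys) - set(geo_names))
print(unmatched)  # data keys with no matching region
```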


us-states.json

0.08MB

# json
import pandas as pd
import folium
import json
file_path = './US_Unemployment_Oct2012.csv'
df = pd.read_csv(file_path)
df.columns = df.columns.map(str)
geo_path = './us-states.json'

try :
    with open(geo_path, encoding = 'utf-8') as f : # try plain UTF-8 first
        geo_data = json.load(f)
except ValueError : # raised, for example, when the file starts with a BOM
    with open(geo_path, encoding = 'utf-8-sig') as f : # retry, skipping the BOM
        geo_data = json.load(f)
print(type(geo_data)) # <class 'dict'>
g_map = folium.Map(location=[37, -100], zoom_start=3, tiles="stamentoner")

# Choropleth class: shades each state according to its value class
# fill_color options: BuGn, PuRd, BuPu, GnBu, OrRd, PuBu, PuBuGn, ...
folium.Choropleth(geo_data = geo_data, # state boundaries
                 data = df, # values to display
                 columns = ['State','Unemployment'], # key column, value column
                 fill_color = 'YlGn', # fill color palette
                  fill_opacity=0.7, # fill opacity
                  line_opacity=0.3,  # boundary line opacity
                 threshold_scale = [0, 2, 4, 6, 8, 10, 12], # class boundaries (named bins in newer folium releases)
                  legend_name='Unemployment Rate (%)',
                 key_on ='feature.id', # match states by feature id
                 ).add_to(g_map)
g_map.save('./US_Unemployment_Oct_'+'11'+'.html')
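
The `threshold_scale` list gives the class boundaries that decide which color a state receives. The sketch below shows one way a value could be assigned to a class using the standard-library `bisect` module. It illustrates the idea only and is not folium's internal code; `class_index` is a hypothetical helper and the sample rates are made up.

```python
from bisect import bisect_right

edges = [0, 2, 4, 6, 8, 10, 12]  # same boundaries as threshold_scale above


def class_index(value, edges):
    """Return the 0-based class whose range [edges[i], edges[i+1]) holds value."""
    if not edges[0] <= value <= edges[-1]:
        raise ValueError(f"{value} lies outside {edges[0]}..{edges[-1]}")
    # A value equal to the top edge is folded into the last class.
    return min(bisect_right(edges, value) - 1, len(edges) - 2)


for rate in (3.2, 7.5, 12):  # made-up unemployment rates
    print(rate, "->", class_index(rate, edges))
```

Seven boundaries produce six classes, so the indices run from 0 to 5.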

US_Unemployment_Oct2012.csv

0.00MB

import pandas as pd
import folium
import json
file_path='US_Unemployment_Oct2012.csv'
df=pd.read_csv(file_path)
df.head()
df.columns=df.columns.map(str)
geo_path='us-states.json'

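try:
    with open(geo_path,encoding='utf-8') as f:      # try plain UTF-8 first
        geo_data=json.load(f)
except ValueError:                                  # e.g. the file starts with a BOM
    with open(geo_path,encoding='utf-8-sig') as f:  # retry, skipping the BOM
        geo_data=json.load(f)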

print(type(geo_data))

g_map = folium.Map(location=[37,-102],zoom_start=3)
folium.Choropleth(geo_data=geo_data, # state boundaries
                  data=df,
                  columns=['State','Unemployment'], # key column, value column
                  fill_color='YlGn',fill_opacity=0.7,line_opacity=0.3,
                  legend_name="Unemployment Rate (%)",
                  key_on='feature.id',
                  ).add_to(g_map)
g_map.save('US_Unemployment.html')


crime_in_Seoul_final.csv

0.01MB

skorea_municipalities_geo_simple.json

0.01MB

import pandas as pd
import folium
import json
file_path='crime_in_Seoul_final.csv'
df=pd.read_csv(file_path,index_col='구별')
df.head()
df.columns=df.columns.map(str)
geo_path='skorea_municipalities_geo_simple.json'

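try:
    with open(geo_path,encoding='utf-8') as f:      # try plain UTF-8 first
        geo_data=json.load(f)
except ValueError:                                  # e.g. the file starts with a BOM
    with open(geo_path,encoding='utf-8-sig') as f:  # retry, skipping the BOM
        geo_data=json.load(f)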

print(type(geo_data))
g_map = folium.Map(location=[37.5502,126.978],zoom_start=10)
typ='강간'
folium.Choropleth(geo_data=geo_data,
data=df[typ], 
columns=[df.index,df[typ]],
fill_color='YlOrRd',fill_opacity=0.7,line_opacity=0.3,
key_on='feature.properties.name',
).add_to(g_map)
g_map.save('crime_Seoul3.html')


저작자표시 비영리


9. tips || '21.06.28.

2021. 10. 26. 00:34


import seaborn as sns
import matplotlib.pyplot as plt
print(sns.get_dataset_names())
tips = sns.load_dataset('tips')
print(tips.head())

['anagrams', 'anscombe', 'attention', 'brain_networks', 'car_crashes', 'diamonds', 'dots', 'exercise', 'flights', 'fmri', 'gammas', 'geyser', 'iris', 'mpg', 'penguins', 'planets', 'tips', 'titanic']
   total_bill   tip     sex smoker  day    time  size
0       16.99  1.01  Female     No  Sun  Dinner     2
1       10.34  1.66    Male     No  Sun  Dinner     3
2       21.01  3.50    Male     No  Sun  Dinner     3
3       23.68  3.31    Male     No  Sun  Dinner     2
4       24.59  3.61  Female     No  Sun  Dinner     4

sns.set_style('darkgrid')
# available styles: darkgrid, whitegrid, dark, white, ticks
# scatter plot with a fitted linear regression line
sns.regplot(x='total_bill',
           y='tip',
           data = tips)
plt.title('Total bill vs. tip')
plt.xlabel('Total bill')
plt.ylabel('Tip')
plt.show()
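
By default `regplot` overlays a fitted linear regression line on the scatter plot. The arithmetic behind such a line can be sketched with ordinary least squares on the five rows printed by `tips.head()` above. The helper `least_squares` is hypothetical, and because only five rows are used, the coefficients will differ from those for the full dataset.

```python
def least_squares(xs, ys):
    """Return (slope, intercept) of the least-squares line y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# The five rows shown in the tips.head() output.
total_bill = [16.99, 10.34, 21.01, 23.68, 24.59]
tip = [1.01, 1.66, 3.50, 3.31, 3.61]

slope, intercept = least_squares(total_bill, tip)
print(f"tip = {slope:.3f} * total_bill + {intercept:.3f}")
```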

sns.barplot(x='time', y='tip', data=tips)


8. iris || '21.06.28.

2021. 10. 26. 00:33


import seaborn as sns
import matplotlib.pyplot as plt
print(sns.get_dataset_names())
iris = sns.load_dataset('iris')
print(iris.head())

['anagrams', 'anscombe', 'attention', 'brain_networks', 'car_crashes', 'diamonds', 'dots', 'exercise', 'flights', 'fmri', 'gammas', 'geyser', 'iris', 'mpg', 'penguins', 'planets', 'tips', 'titanic']
   sepal_length  sepal_width  petal_length  petal_width species
0           5.1          3.5           1.4          0.2  setosa
1           4.9          3.0           1.4          0.2  setosa
2           4.7          3.2           1.3          0.2  setosa
3           4.6          3.1           1.5          0.2  setosa
4           5.0          3.6           1.4          0.2  setosa

iris_pair = iris[['sepal_length','sepal_width','petal_length','petal_width', 'species']]
print(iris_pair)
# pairwise scatter plots of the four measurements, colored by species
g = sns.pairplot(iris_pair, hue = 'species')

     sepal_length  sepal_width  petal_length  petal_width    species
0             5.1          3.5           1.4          0.2     setosa
1             4.9          3.0           1.4          0.2     setosa
2             4.7          3.2           1.3          0.2     setosa
3             4.6          3.1           1.5          0.2     setosa
4             5.0          3.6           1.4          0.2     setosa
..            ...          ...           ...          ...        ...
145           6.7          3.0           5.2          2.3  virginica
146           6.3          2.5           5.0          1.9  virginica
147           6.5          3.0           5.2          2.0  virginica
148           6.2          3.4           5.4          2.3  virginica
149           5.9          3.0           5.1          1.8  virginica

[150 rows x 5 columns]

# box plots of each measurement by species, on a 2x2 grid
fig = plt.figure(figsize = (10,5))
ax1 = fig.add_subplot(2,2,1) # top left
ax2 = fig.add_subplot(2,2,2) # top right
ax3 = fig.add_subplot(2,2,3) # bottom left
ax4 = fig.add_subplot(2,2,4) # bottom right

sns.boxplot(x='species', y= 'sepal_length', data=iris, ax=ax1)
sns.boxplot(x='species', y= 'sepal_width', data=iris, ax=ax2)
sns.boxplot(x='species', y= 'petal_length', data=iris, ax=ax3)
sns.boxplot(x='species', y= 'petal_width', data=iris, ax=ax4)

plt.show()


7. folium || '21.06.24.

2021. 10. 26. 00:31


# Drawing a map
# uses the folium module
import folium  # pip install folium
seoul_map = folium.Map(location=[37.55, 126.98], zoom_start=12)
seoul_map.save('seoul.html')


pip install folium


6. Titanic || '21.06.24.

2021. 10. 26. 00:29


seaborn

A high-level visualization library that extends matplotlib's features and styles; it also ships with sample datasets.

import seaborn as sns
import matplotlib.pyplot as plt
print(sns.get_dataset_names())
titanic = sns.load_dataset('titanic')
# ['anagrams', 'anscombe', 'attention', 'brain_networks', 'car_crashes', 'diamonds', 'dots', 'exercise', 'flights', 'fmri', 'gammas', 'geyser', 'iris', 'mpg', 'penguins', 'planets', 'tips', 'titanic']

print(titanic.head())

   survived  pclass     sex   age  sibsp  parch     fare embarked  class  \
0         0       3    male  22.0      1      0   7.2500        S  Third   
1         1       1  female  38.0      1      0  71.2833        C  First   
2         1       3  female  26.0      0      0   7.9250        S  Third   
3         1       1  female  35.0      1      0  53.1000        S  First   
4         0       3    male  35.0      0      0   8.0500        S  Third   

     who  adult_male deck  embark_town alive  alone  
0    man        True  NaN  Southampton    no  False  
1  woman       False    C    Cherbourg   yes  False  
2  woman       False  NaN  Southampton   yes   True  
3  woman       False    C  Southampton   yes  False  
4    man        True  NaN  Southampton    no   True

print(titanic.info())
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 891 entries, 0 to 890
Data columns (total 15 columns):
 #   Column       Non-Null Count  Dtype   
---  ------       --------------  -----   
 0   survived     891 non-null    int64   
 1   pclass       891 non-null    int64   
 2   sex          891 non-null    object  
 3   age          714 non-null    float64 
 4   sibsp        891 non-null    int64   
 5   parch        891 non-null    int64   
 6   fare         891 non-null    float64 
 7   embarked     889 non-null    object  
 8   class        891 non-null    category
 9   who          891 non-null    object  
 10  adult_male   891 non-null    bool    
 11  deck         203 non-null    category
 12  embark_town  889 non-null    object  
 13  alive        891 non-null    object  
 14  alone        891 non-null    bool    
dtypes: bool(2), category(2), float64(2), int64(4), object(5)
memory usage: 80.6+ KB
None

sns.set_style('darkgrid')
# available styles: darkgrid, whitegrid, dark, white, ticks
fig = plt.figure(figsize = (15, 5)) # figure size
ax1 = fig.add_subplot(1,2,1) # left
ax2 = fig.add_subplot(1,2,2) # right
# scatter plots: with a regression line (ax1) and without one (ax2)
sns.regplot(x='age',
           y='fare',
           data = titanic,
           ax = ax1)

sns.regplot(x='age',
           y='fare',
           data = titanic,
           ax = ax2,
           fit_reg=False)

plt.show()

Histograms

distplot: histogram and density curve together
kdeplot: density curve only
histplot: histogram bars only

fig = plt.figure(figsize = (15, 5)) # figure size
ax1 = fig.add_subplot(1,3,1) # left
ax2 = fig.add_subplot(1,3,2) # middle
ax3 = fig.add_subplot(1,3,3) # right
# distplot: histogram and KDE combined (deprecated since seaborn 0.11)
sns.distplot(titanic['fare'], ax=ax1)
# kdeplot: kernel density estimate
sns.kdeplot(x='fare', data =titanic, ax=ax2)
# sns.distplot(titanic['fare'], hist = False, ax=ax1)
# histplot: histogram
sns.histplot(x='fare', data =titanic, ax=ax3)
# sns.distplot(titanic['fare'], kde = False, ax=ax1)

ax1.set_title('titanic fare - distplot')
ax2.set_title('titanic fare - kdeplot')
ax3.set_title('titanic fare - histplot')
plt.show()

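Heatmap

sns.set_style('darkgrid')
# pivot table: rearrange two categorical variables into rows and columns
table = titanic.pivot_table(index = ['sex'], columns = ['class'], aggfunc='size')
# index: sex, columns: passenger class, aggfunc='size': number of rows per cell
# the resulting table holds one count for each combination of categories
# draw the heatmap
sns.heatmap(table,                  # DataFrame
           annot = True, fmt = 'd', # show cell values, integer format
           cmap='YlGnBu',           # colormap
           linewidth=5,             # width of the lines between cells
           cbar=True)              # show the color bar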

plt.show()
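
`aggfunc='size'` counts the rows that fall into each (sex, class) combination. The same bookkeeping can be sketched with `collections.Counter`. The passenger records below are made up and are not the Titanic data, and cells with no rows are simply shown as 0 here.

```python
from collections import Counter

# Made-up passenger records as (sex, class) pairs.
rows = [
    ("male", "Third"), ("female", "First"), ("female", "Third"),
    ("female", "First"), ("male", "Third"), ("male", "Second"),
]

counts = Counter(rows)  # number of rows per (sex, class) pair

sexes = sorted({sex for sex, _ in rows})
classes = ["First", "Second", "Third"]
table = {sex: [counts[(sex, cls)] for cls in classes] for sex in sexes}

print(classes)
for sex, row in table.items():
    print(sex, row)
```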

# scatter plots for a categorical variable
sns.set_style('whitegrid')
fig = plt.figure(figsize = (15,5))
ax1 = fig.add_subplot(1,2,1) # left
ax2 = fig.add_subplot(1,2,2) # right
# stripplot: distribution per category; points may overlap
sns.stripplot(x = 'class',
            y = 'age',
            data = titanic,
            ax = ax1)
# swarmplot: distribution per category; points are pushed sideways so they do not overlap
sns.swarmplot(x = 'class',
            y = 'age',
            data = titanic,
            ax = ax2)
# subplot titles
ax1.set_title('Strip Plot')
ax2.set_title('Swarm Plot')
plt.show()

# scatter plots for a categorical variable, split by sex
sns.set_style('whitegrid')
fig = plt.figure(figsize = (15,5))
ax1 = fig.add_subplot(1,2,1) # left
ax2 = fig.add_subplot(1,2,2) # right
# stripplot: distribution per category; points may overlap
sns.stripplot(x = 'class',
            y = 'age',
            data = titanic,
            hue = 'sex', # grouping column: same position, distinguished by color
            ax = ax1)
# swarmplot: distribution per category; points are pushed sideways so they do not overlap
sns.swarmplot(x = 'class',
            y = 'age',
            data = titanic,
            hue = 'sex', 
            ax = ax2)
# subplot titles
ax1.set_title('Strip Plot')
ax2.set_title('Swarm Plot')
ax1.legend(loc = 'upper right')
ax2.legend(loc = 'upper right')
plt.show()

# bar plots
fig = plt.figure(figsize = (15,5))
ax1 = fig.add_subplot(1,3,1) # left
ax2 = fig.add_subplot(1,3,2) # middle
ax3 = fig.add_subplot(1,3,3) # right

sns.barplot(x='sex', y='survived', data=titanic, ax=ax1)
sns.barplot(x='sex', y='survived', hue = 'class', data=titanic, ax=ax2)
sns.barplot(x='sex', y='survived', hue = 'class', dodge = False, data=titanic, ax=ax3)

# subplot titles
ax1.set_title('titanic survived  sex')
ax2.set_title('titanic survived  sex/class')
ax3.set_title('titanic survived  sex/class(stacked)')
plt.show()

# countplot: bar plot of the number of rows in each category
fig = plt.figure(figsize = (15,5))
ax1 = fig.add_subplot(1,3,1) # left
ax2 = fig.add_subplot(1,3,2) # middle
ax3 = fig.add_subplot(1,3,3) # right

sns.countplot(x='class', palette='Set1', data=titanic, ax=ax1)
sns.countplot(x='class', hue = 'who', palette='Set1', data=titanic, ax=ax2)
sns.countplot(x='class', hue = 'who', palette='Set1', dodge = False, data=titanic, ax=ax3)
# subplot titles
ax1.set_title('titanic survived')
ax2.set_title('titanic survived - who')
ax3.set_title('titanic survived - who(stacked)')
plt.show()

# box plots (top row) and violin plots (bottom row)
fig = plt.figure(figsize = (15,5))
ax1 = fig.add_subplot(2,2,1) # top left
ax2 = fig.add_subplot(2,2,2) # top right
ax3 = fig.add_subplot(2,2,3) # bottom left
ax4 = fig.add_subplot(2,2,4) # bottom right

sns.boxplot(x='alive', y= 'age', data=titanic, ax=ax1)
sns.boxplot(x='alive', y= 'age', hue = 'sex', data=titanic, ax=ax2)
sns.violinplot(x='alive', y='age', data=titanic, ax=ax3)
sns.violinplot(x='alive', y='age', hue = 'sex', data=titanic, ax=ax4)

# legend placement
ax2.legend(loc='upper center')
ax4.legend(loc='upper center')
plt.show()

# joint plot: scatter
j1 = sns.jointplot(x='fare', y= 'age', data=titanic)
# joint plot: regression line
j2 = sns.jointplot(x='fare', y= 'age', kind = 'reg', data=titanic)
# joint plot: hexagonal bins
j3 = sns.jointplot(x='fare', y='age', kind = 'hex', data=titanic)
# joint plot: kernel density
j4 = sns.jointplot(x='fare', y='age', kind = 'kde', data=titanic)

# figure titles
j1.fig.suptitle('titanic fare - scatter',size = 15)
j2.fig.suptitle('titanic fare - reg',size = 15)
j3.fig.suptitle('titanic fare - hex',size = 15)
j4.fig.suptitle('titanic fare - kde',size = 15)
plt.show()

# split the figure into a grid of subplots by condition
# who : man, woman, child
# survived : 0 1
g = sns.FacetGrid(data = titanic, col='who', row='survived') # one panel per (survived, who) combination
# apply a plot to every panel
g = g.map(plt.hist, 'age')
plt.show()

# pairplot: distributions of pairs of variables
# draws a scatter plot for each pair; the diagonal shows histograms
# pairplot
titanic_pair = titanic[['age', 'pclass', 'fare']]
print(titanic_pair)
# draw the grid of pairwise plots
g = sns.pairplot(titanic_pair)

      age  pclass     fare
0    22.0       3   7.2500
1    38.0       1  71.2833
2    26.0       3   7.9250
3    35.0       1  53.1000
4    35.0       3   8.0500
..    ...     ...      ...
886  27.0       2  13.0000
887  19.0       1  30.0000
888   NaN       3  23.4500
889  26.0       1  30.0000
890  32.0       3   7.7500

[891 rows x 3 columns]


5. auto-mpg 분석 || '21.06.24.

2021. 10. 26. 00:26


import pandas as pd
import matplotlib.pyplot as plt
plt.style.use('default')

auto-mpg.csv

0.02MB

scatter plot

df = pd.read_csv('auto-mpg.csv', header=None)
df.columns = ['mpg', 'cylinders', 'desplacement', 'horsepower', 'weight', 'acceleration', 'model year', 'origin', 'name']
df.plot(kind = 'scatter', x='weight', y='mpg', c ='coral', s=10, figsize = (10, 5))
plt.title('Scatter Plot - mpg vs. weight')
plt.show()

bubble chart: s sets the marker size, alpha sets the transparency

df = pd.read_csv('auto-mpg.csv', header=None)
df.columns = ['mpg', 'cylinders', 'desplacement', 'horsepower', 'weight', 'acceleration', 'model year', 'origin', 'name']
cylineder_size = df.cylinders / df.cylinders.max() * 300
df.plot(kind = 'scatter', x='weight', y='mpg', c ='coral', s=cylineder_size, figsize = (10, 5), alpha = 0.3)
plt.title('Scatter Plot - mpg vs. weight')
plt.show()
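
The bubble sizes come from dividing each cylinder count by the largest count and multiplying by 300, so the biggest engines get a marker size of 300 and the rest scale proportionally. A small sketch of that arithmetic with illustrative values (not read from auto-mpg.csv):

```python
cylinders = [8, 4, 6, 4, 8]  # illustrative values only

max_cyl = max(cylinders)
sizes = [c / max_cyl * 300 for c in cylinders]
print(sizes)  # [300.0, 150.0, 225.0, 150.0, 300.0]
```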

Setting colors

cylineder_size = df.cylinders / df.cylinders.max() * 300
df.plot(kind = 'scatter', x='weight', y='mpg', marker ='+', s=50, c=cylineder_size, cmap='viridis', figsize = (10, 5), alpha = 0.3)
# cmap: the colormap used to turn the values passed as c into colors
plt.title('Scatter Plot - mpg vs. weight - cylinder')
# plt.savefig('scatter.png') # save the current figure as an image file
plt.savefig('scatter_transparent.png', transparent = True) # transparent=True saves it with a transparent background
plt.show()

pie graph

df['count'] = 1
print(df.head())
df_origin = df.groupby('origin').sum(numeric_only=True) # per-origin sums of the numeric columns
print(df_origin.head())
# df_origin['count']: number of cars per origin
df_origin.index = ['USA','EU','JAPAN']
# replace the origin codes (1, 2, 3) with region names
# '%1.1f%%': one decimal place (%1.1f) followed by a literal percent sign (%%)
df_origin['count'].plot(kind='pie', figsize = (7, 5), autopct='%1.1f%%', # percentage labels
                        startangle = 10, # angle at which the first wedge starts
                        colors=['chocolate','bisque','cadetblue']
                       )
plt.title('model origin', size = 20)
plt.axis('equal') # equal aspect ratio so the pie is drawn as a circle
plt.legend(labels = df_origin.index, loc='upper right')
plt.show()


    mpg  cylinders  desplacement horsepower  weight  acceleration  model year  \
0  18.0          8         307.0      130.0  3504.0          12.0          70   
1  15.0          8         350.0      165.0  3693.0          11.5          70   
2  18.0          8         318.0      150.0  3436.0          11.0          70   
3  16.0          8         304.0      150.0  3433.0          12.0          70   
4  17.0          8         302.0      140.0  3449.0          10.5          70   

   origin                       name  count  
0       1  chevrolet chevelle malibu      1  
1       1          buick skylark 320      1  
2       1         plymouth satellite      1  
3       1              amc rebel sst      1  
4       1                ford torino      1  
           mpg  cylinders  desplacement    weight  acceleration  model year  \
origin                                                                        
1       5000.8       1556       61229.5  837121.0        3743.4       18827   
2       1952.4        291        7640.0  169631.0        1175.1        5307   
3       2405.6        324        8114.0  175477.0        1277.6        6118   

        count  
origin         
1         249  
2          70  
3          79
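
The `autopct='%1.1f%%'` format is applied to each wedge's share of the total. Using the `count` column printed above (249, 70, 79), the labels on the pie can be reproduced by hand:

```python
counts = {"USA": 249, "EU": 70, "JAPAN": 79}  # 'count' column from the output above

total = sum(counts.values())
labels = {name: "%1.1f%%" % (n / total * 100) for name, n in counts.items()}
print(labels)  # {'USA': '62.6%', 'EU': '17.6%', 'JAPAN': '19.8%'}
```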

import pandas as pd
import matplotlib.pyplot as plt
from matplotlib import font_manager, rc
rc('font', family = 'Malgun Gothic')
plt.style.use('seaborn-poster')
plt.rcParams['axes.unicode_minus'] = False

df = pd.read_csv('auto-mpg.csv', header=None)
df.columns = ['mpg', 'cylinders', 'desplacement', 'horsepower', 'weight', 'acceleration', 'model year', 'origin', 'name']

fig = plt.figure(figsize = (15, 5)) # figure size
ax1 = fig.add_subplot(1,2,1) # left
ax2 = fig.add_subplot(1,2,2) # right

ax1.boxplot(x=[df[df['origin']==1]['mpg'],
              df[df['origin']==2]['mpg'],
              df[df['origin']==3]['mpg']],
            labels = ['USA','EU', 'JAPAN'])
# horizontal
ax2.boxplot(x=[df[df['origin']==1]['mpg'],
              df[df['origin']==2]['mpg'],
              df[df['origin']==3]['mpg']],
            labels = ['USA','EU', 'JAPAN'], vert = False)
ax1.set_title('제조국가별 연비 분포(수직 박스 플롯)')
ax2.set_title('제조국가별 연비 분포(수평 박스 플롯)')
plt.show()


4. 시도별 전출입 인구수 분석 ( 2 || '21.06.24.

2021. 10. 26. 00:14


import pandas as pd
import matplotlib.pyplot as plt
from matplotlib import font_manager, rc
rc('font', family='Malgun Gothic') # font that can render Korean labels
df = pd.read_excel('./시도별 전출입 인구수.xlsx', engine='openpyxl', header=0)
df = df.ffill() # fill missing cells with the value above (fillna(method='ffill') is deprecated)
mask = (df['전출지별'] == '서울특별시') & (df['전입지별'] != '서울특별시')
df_seoul = df[mask]
df_seoul = df_seoul.drop(['전출지별'], axis = 1)
df_seoul.rename({'전입지별' : '전입지'}, axis = 1, inplace = True)
df_seoul.set_index('전입지', inplace = True)

print(df_seoul.head())
         1970     1971     1972     1973     1974     1975     1976     1977  \
전입지                                                                             
전국     1448985  1419016  1210559  1647268  1819660  2937093  2495620  2678007   
부산광역시    11568    11130    11768    16307    22220    27515    23732    27213   
대구광역시        -        -        -        -        -        -        -        -   
인천광역시        -        -        -        -        -        -        -        -   
광주광역시        -        -        -        -        -        -        -        -   

          1978     1979  ...     2008     2009     2010     2011     2012  \
전입지                      ...                                                
전국     3028911  2441242  ...  2083352  1925452  1848038  1834806  1658928   
부산광역시    29856    28542  ...    17353    17738    17418    18816    16135   
대구광역시        -        -  ...     9720    10464    10277    10397    10135   
인천광역시        -        -  ...    50493    45392    46082    51641    49640   
광주광역시        -        -  ...    10846    11725    11095    10587    10154   

          2013     2014     2015     2016     2017  
전입지                                                 
전국     1620640  1661425  1726687  1655859  1571423  
부산광역시    16153    17320    17009    15062    14484  
대구광역시    10631    10062    10191     9623     8891  
인천광역시    47424    43212    44915    43745    40485  
광주광역시     9129     9759     9216     8354     7932

print(df_seoul.info())
<class 'pandas.core.frame.DataFrame'>
Index: 17 entries, 전국 to 제주특별자치도
Data columns (total 48 columns):
 #   Column  Non-Null Count  Dtype 
---  ------  --------------  ----- 
 0   1970    17 non-null     object
 1   1971    17 non-null     object
 2   1972    17 non-null     object
 3   1973    17 non-null     object
 4   1974    17 non-null     object
 5   1975    17 non-null     object
 6   1976    17 non-null     object
 7   1977    17 non-null     object
 8   1978    17 non-null     object
 9   1979    17 non-null     object
 10  1980    17 non-null     object
 11  1981    17 non-null     object
 12  1982    17 non-null     object
 13  1983    17 non-null     object
 14  1984    17 non-null     object
 15  1985    17 non-null     object
 16  1986    17 non-null     object
 17  1987    17 non-null     object
 18  1988    17 non-null     object
 19  1989    17 non-null     object
 20  1990    17 non-null     object
 21  1991    17 non-null     object
 22  1992    17 non-null     object
 23  1993    17 non-null     object
 24  1994    17 non-null     object
 25  1995    17 non-null     object
 26  1996    17 non-null     object
 27  1997    17 non-null     object
 28  1998    17 non-null     object
 29  1999    17 non-null     object
 30  2000    17 non-null     object
 31  2001    17 non-null     object
 32  2002    17 non-null     object
 33  2003    17 non-null     object
 34  2004    17 non-null     object
 35  2005    17 non-null     object
 36  2006    17 non-null     object
 37  2007    17 non-null     object
 38  2008    17 non-null     object
 39  2009    17 non-null     object
 40  2010    17 non-null     object
 41  2011    17 non-null     object
 42  2012    17 non-null     object
 43  2013    17 non-null     object
 44  2014    17 non-null     object
 45  2015    17 non-null     object
 46  2016    17 non-null     object
 47  2017    17 non-null     object
dtypes: object(48)
memory usage: 6.5+ KB
None
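
`info()` reports every year column as `object`, which is consistent with the `-` placeholders visible in the `head()` output: a column that mixes numbers and strings is not stored as an integer column. If numeric columns are wanted, one option is `pd.to_numeric` with `errors='coerce'`, which turns unparseable entries into NaN. A minimal sketch on a made-up miniature frame (only the two 부산광역시 figures are copied from the output above):

```python
import pandas as pd

# Miniature stand-in for df_seoul; '-' marks years with no data.
raw = pd.DataFrame(
    {"1970": [11568, "-"], "1971": [11130, "-"]},
    index=["부산광역시", "대구광역시"],
)
print(raw.dtypes)  # object columns, because numbers and '-' are mixed

# Convert column by column; anything that is not a number becomes NaN.
numeric = raw.apply(pd.to_numeric, errors="coerce")
print(numeric)
print(numeric.dtypes)
```

Whether NaN or 0 is the better replacement depends on the analysis; NaN keeps "no data" distinct from "zero people moved".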

col_years = list(map(str, range(1970, 1980))) # list of year labels as strings
sr1 = df_seoul.loc[['충청남도', '경상북도', '강원도'], col_years]
print(sr1)

       1970   1971   1972   1973   1974   1975   1976   1977   1978   1979
전입지                                                                       
충청남도  15954  18943  23406  27139  25509  51205  41447  43993  48091  45388
경상북도  11868  16459  22073  27531  26902  46177  40376  41155  42940  43565
강원도    9352  12885  13561  16481  15479  27837  25927  25415  26700  27599

plt.style.use('ggplot') # print(plt.style.available) lists the available styles
fig = plt.figure(figsize=(20,5)) # figure size
ax = fig.add_subplot(1,1,1) # 1 row, 1 column, first subplot

ax.plot(col_years, sr1.loc['충청남도', :], marker = 'o',\
         markerfacecolor = 'green', markersize = 10, color = 'olive',\
        linewidth = 2, label = '서울 -> 충남') # line plot
ax.plot(col_years, sr1.loc['경상북도', :], marker = 'o',\
         markerfacecolor = 'blue', markersize = 10, color = 'skyblue',\
        linewidth = 2, label = '서울 -> 경북') # line plot
ax.plot(col_years, sr1.loc['강원도', :], marker = 'o',\
         markerfacecolor = 'red', markersize = 10, color = 'magenta',\
        linewidth = 2, label = '서울 -> 강원') # line plot

ax.legend(loc='best', fontsize = 20) # 'best' picks the least obstructed spot; here the upper right is covered by the lines, so the legend goes to the upper left
ax.set_title('서울 -> 충남, 경북, 강원 인구이동', size = 20) # chart title
ax.set_ylabel('이동 인구수', size = 12) # y-axis label
ax.set_xlabel('기간', size = 12) # x-axis label
ax.set_xticklabels(col_years, rotation = 90)
ax.tick_params(axis = "x", labelsize = 10)
ax.tick_params(axis = "y", labelsize = 10)
plt.show()

col_years = list(map(str, range(2000, 2018))) # list of year labels as strings
sr2 = df_seoul.loc[['경기도','부산광역시'], col_years]
print(sr2)

         2000    2001    2002    2003    2004    2005    2006    2007    2008  \
전입지                                                                             
경기도    435573  499575  516765  457656  400206  414621  449632  431637  412408   
부산광역시   15968   16128   16732   16368   15559   15915   17079   17182   17353   

         2009    2010    2011    2012    2013    2014    2015    2016    2017  
전입지                                                                            
경기도    398282  410735  373771  354135  340801  332785  359337  370760  342433  
부산광역시   17738   17418   18816   16135   16153   17320   17009   15062   14484

plt.style.use('ggplot') # print(plt.style.available) lists the available styles
fig = plt.figure(figsize=(20,5)) # figure size
ax = fig.add_subplot(1,1,1) # 1 row, 1 column, 1st subplot

ax.plot(col_years, sr2.loc['경기도', :], marker = 'o',
        markerfacecolor = 'green', markersize = 10, color = 'olive',
        linewidth = 2, label = '서울 -> 경기') # line plot
ax.plot(col_years, sr2.loc['부산광역시', :], marker = 'o',
        markerfacecolor = 'blue', markersize = 10, color = 'skyblue',
        linewidth = 2, label = '서울 -> 부산') # line plot


ax.legend(loc='best', fontsize = 20) # 'best' lets matplotlib pick the least obstructed spot
ax.set_title('서울 -> 경기, 부산 인구이동', size = 20) # chart title
ax.set_ylabel('이동 인구수', size = 12) # y-axis label
ax.set_xlabel('기간', size = 12) # x-axis label
ax.set_xticklabels(col_years, rotation = 90)
ax.tick_params(axis = "x", labelsize = 10) # font size of the x-axis tick labels
ax.tick_params(axis = "y", labelsize = 10)
plt.show()

# Draw several charts on one figure
col_years = list(map(str, range(1970, 2018))) # list of year strings
sr3 = df_seoul.loc[['충청남도','경상북도', '강원도', '전라남도'], col_years]
print(sr3)
         1970   1971   1972   1973   1974   1975   1976   1977   1978   1979  \
전입지                                                                          
충청남도  15954  18943  23406  27139  25509  51205  41447  43993  48091  45388   
경상북도  11868  16459  22073  27531  26902  46177  40376  41155  42940  43565   
강원도    9352  12885  13561  16481  15479  27837  25927  25415  26700  27599   
전라남도  10513  16755  20157  22160  21314  46610  46251  43430  44624  47934   

      ...   2008   2009   2010   2011   2012   2013   2014   2015   2016  \
전입지   ...                                                                  
충청남도  ...  27458  24889  24522  24723  22269  21486  21473  22299  21741   
경상북도  ...  15425  16569  16042  15818  15191  14420  14456  15113  14236   
강원도   ...  23668  23331  22736  23624  22332  20601  21173  22659  21590   
전라남도  ...  16601  17468  16429  15974  14765  14187  14591  14598  13065   

       2017  
전입지          
충청남도  21020  
경상북도  12464  
강원도   21016  
전라남도  12426  

[4 rows x 48 columns]

plt.style.use('ggplot') # print(plt.style.available) lists the available styles
fig = plt.figure(figsize=(20,10)) # figure size
ax1 = fig.add_subplot(2,2,1) # 2x2 grid, 1st subplot (top left)
ax2 = fig.add_subplot(2,2,2) # 2x2 grid, 2nd subplot (top right)
ax3 = fig.add_subplot(2,2,3) # 2x2 grid, 3rd subplot (bottom left)
ax4 = fig.add_subplot(2,2,4) # 2x2 grid, 4th subplot (bottom right)
ax1.plot(col_years, sr3.loc['충청남도', :], marker = 'o',
         markerfacecolor = 'green', markersize = 10, color = 'olive',
         linewidth = 2, label = '서울 -> 충남') # line plot
ax2.plot(col_years, sr3.loc['경상북도', :], marker = 'o',
         markerfacecolor = 'blue', markersize = 10, color = 'skyblue',
         linewidth = 2, label = '서울 -> 경북') # line plot
ax3.plot(col_years, sr3.loc['강원도', :], marker = 'o',
         markerfacecolor = 'red', markersize = 10, color = 'red',
         linewidth = 2, label = '서울 -> 강원') # line plot
ax4.plot(col_years, sr3.loc['전라남도', :], marker = 'o',
         markerfacecolor = 'yellow', markersize = 10, color = 'yellow',
         linewidth = 2, label = '서울 -> 전남') # line plot

ax1.legend(loc='best', fontsize = 20)
ax2.legend(loc='best', fontsize = 20)
ax3.legend(loc='best', fontsize = 20)
ax4.legend(loc='best', fontsize = 20)

ax1.set_title('서울 ->충남 인구이동', size = 20) # chart title
ax2.set_title('서울 ->경북 인구이동', size = 20) # chart title
ax3.set_title('서울 ->강원 인구이동', size = 20) # chart title
ax4.set_title('서울 ->전남 인구이동', size = 20) # chart title

ax1.set_ylabel('이동 인구수', size = 12) # y-axis label
ax1.set_xlabel('기간', size = 12) # x-axis label
ax2.set_ylabel('이동 인구수', size = 12) # y-axis label
ax2.set_xlabel('기간', size = 12) # x-axis label
ax3.set_ylabel('이동 인구수', size = 12) # y-axis label
ax3.set_xlabel('기간', size = 12) # x-axis label
ax4.set_ylabel('이동 인구수', size = 12) # y-axis label
ax4.set_xlabel('기간', size = 12) # x-axis label

ax1.set_xticklabels(col_years, rotation = 90)
ax2.set_xticklabels(col_years, rotation = 90)
ax3.set_xticklabels(col_years, rotation = 90)
ax4.set_xticklabels(col_years, rotation = 90)

ax1.tick_params(axis = "x", labelsize = 10) # font size of the x-axis tick labels
ax1.tick_params(axis = "y", labelsize = 10)
ax2.tick_params(axis = "x", labelsize = 10) # font size of the x-axis tick labels
ax2.tick_params(axis = "y", labelsize = 10)
ax3.tick_params(axis = "x", labelsize = 10) # font size of the x-axis tick labels
ax3.tick_params(axis = "y", labelsize = 10)
ax4.tick_params(axis = "x", labelsize = 10) # font size of the x-axis tick labels
ax4.tick_params(axis = "y", labelsize = 10)

plt.show()
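
The four subplots above repeat the same calls with different row labels and colors. A minimal sketch of the same layout written as a loop over plt.subplots(2, 2); the frame demo, its labels A to D, and the numbers are made-up stand-ins for sr3, not the real data.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd

# Made-up stand-in for sr3: rows are destinations, columns are year strings
demo = pd.DataFrame(
    [[1, 2, 3], [2, 3, 4], [3, 2, 1], [4, 4, 4]],
    index=["A", "B", "C", "D"],
    columns=["2000", "2001", "2002"],
)
colors = ["olive", "skyblue", "red", "gold"]

fig, axes = plt.subplots(2, 2, figsize=(10, 6))  # 2x2 grid of axes
for ax, region, color in zip(axes.flat, demo.index, colors):
    ax.plot(list(demo.columns), demo.loc[region], marker="o",
            color=color, label=region)
    ax.set_title(region)
    ax.legend(loc="best")
    ax.tick_params(axis="x", labelrotation=90, labelsize=10)
fig.tight_layout()

titles = [ax.get_title() for ax in axes.flat]
print(titles)
```

Adding a fifth region then only means changing the grid shape and the lists, not copying another block of calls.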

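# Area chart: fills the space between each line and the x-axis with color
sr4 = sr3.T

plt.style.use('ggplot')
# After transposing, the years are the index
sr4.index = sr4.index.map(int)
# Draw the area chart. stacked=True piles the series on top of one another (like a stack); stacked=False overlays them
sr4.plot(kind = 'area', stacked = True, alpha= 0.2, figsize = (20,10)) # alpha: transparency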
plt.title('서울 -> 타도시 인구이동')
plt.ylabel('인구이동수',size = 20)
plt.xlabel('기간',size = 20)
plt.legend(loc='best')
plt.show()
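
With stacked=True each band is drawn on top of the columns before it, so the upper edge of a band is the running total across columns. A small sketch with made-up numbers (the frame demo is hypothetical, not the migration data) that computes those upper edges directly:

```python
import pandas as pd

# Made-up yearly counts for two destinations, illustration only
demo = pd.DataFrame({"A": [10, 20, 30], "B": [5, 5, 10]},
                    index=[2000, 2001, 2002])

# Running total across columns = upper edge of each stacked band
upper_edges = demo.cumsum(axis=1)
print(upper_edges)
```

This is why the top of a stacked area chart shows the combined total, while stacked=False shows each series at its own height.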

# Bar chart
plt.style.use('ggplot')
# Draw a grouped bar chart, one color per destination
sr4.plot(kind = 'bar', figsize = (20,10), width = 0.7, color=['orange','green','skyblue','blue']) # width: bar width
plt.title('서울 -> 타도시 인구이동', size= 30)
plt.ylabel('인구이동수',size = 20)
plt.xlabel('기간',size = 20)
plt.ylim(5000,60000)
plt.legend(loc='best')
plt.show()

# Select the years 2000-2016 for the bar chart
col_years = list(map(str, range(2000, 2017))) # list of year strings
sr5 = df_seoul.loc[['충청남도','경상북도', '강원도', '전라남도'], col_years]
print(sr5['2000'])


전입지
충청남도    23083
경상북도    14576
강원도     22832
전라남도    22969
Name: 2000, dtype: object

# Bar chart
plt.style.use('ggplot')
# Draw a grouped bar chart
sr5.plot(kind = 'bar', figsize = (20,10), width = 0.7, color=['orange','green','skyblue','blue']) # width: bar width
plt.title('서울 -> 타도시 인구이동', size= 30)
plt.ylabel('인구이동수',size = 20)
plt.xlabel('기간',size = 20)
plt.ylim(5000,60000)
plt.legend(loc='best')
plt.show()

# Horizontal bar chart
# print(sr3.sum(axis=1))
sr3['합계'] = sr3.sum(axis=1)
print(sr3['합계'])
# Sort by total in ascending order
sr3_tot = sr3[['합계']].sort_values(by='합계', ascending=True)
print(sr3_tot)

전입지
충청남도    6117092.0
경상북도    4208700.0
강원도     4585100.0
전라남도    5526628.0
Name: 합계, dtype: float64
             합계
전입지            
경상북도  4208700.0
강원도   4585100.0
전라남도  5526628.0
충청남도  6117092.0

# Horizontal bars
plt.style.use('ggplot')
sr3_tot.plot(kind = 'barh', figsize = (10,5), width = 0.5, color='cornflowerblue') # width: bar thickness
plt.title('서울 -> 타도시 인구이동', size= 30)
plt.ylabel('전입지',size = 20)
plt.xlabel('이동인구 수',size = 20)
plt.legend(loc='best')
plt.show()
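
A horizontal bar chart draws the first row at the bottom, so sorting in ascending order puts the largest total at the top of the chart. A minimal sketch of the sort step with made-up totals; the frame totals and the labels A, B, C are placeholders.

```python
import pandas as pd

# Made-up totals, illustration only
totals = pd.DataFrame({"합계": [30, 10, 20]}, index=["A", "B", "C"])

# Ascending sort: smallest first, so barh would draw the largest bar on top
ordered = totals.sort_values(by="합계", ascending=True)
print(ordered.index.tolist())
```

Passing ascending=False instead would put the largest bar at the bottom.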

# Totals for 2010-2017
col_years = list(map(str, range(2010, 2018))) # list of year strings
sr6 = df_seoul.loc[['충청남도','경상북도', '강원도', '전라남도'], col_years]
sr6['합계'] = sr6.sum(axis=1)
# print(sr6[['합계']]) returns a DataFrame (a list of columns, here just one)
# print(sr6['합계']) returns a Series (a single column)
# Sort by total in ascending order
sr6_tot = sr6[['합계']].sort_values(by='합계', ascending=True)
print(sr6_tot)


전입지
충청남도    179533.0
경상북도    117740.0
강원도     175731.0
전라남도    116035.0
Name: 합계, dtype: float64
            합계
전입지           
전라남도  116035.0
경상북도  117740.0
강원도   175731.0
충청남도  179533.0

Horizontal bar chart

# Horizontal bars
plt.style.use('ggplot')
sr6_tot.plot(kind = 'barh', figsize = (10,5), width = 0.5, color='cornflowerblue') # width: bar thickness
plt.title('서울 -> 타도시 인구이동', size= 30)
plt.ylabel('전입지',size = 20)
plt.xlabel('이동인구 수',size = 20)
plt.legend(loc='best')
plt.show()

Plotting list data as a bar chart

# Plot list data as bars
import matplotlib.pyplot as plt

plt.style.use('ggplot')

subjects = ['Oracle', 'R', 'Python','Sklearn', 'Tensorflow']
scores = [65,90,85,60,95]

fig = plt.figure()
ax1 = fig.add_subplot(1,1,1)
ax1.bar(range(len(subjects)), scores, align = 'center', color = 'darkblue') # bar heights (numeric data)

ax1.xaxis.set_ticks_position('bottom')
ax1.yaxis.set_ticks_position('left') # side on which the axis ticks are drawn
plt.xticks(range(len(subjects)), subjects, rotation = 0, fontsize = 'small') # category names as tick labels

plt.xlabel('subject') # axis labels
plt.ylabel('score')
plt.title('Class Score') # title

plt.savefig('bar_plot.png', dpi=400, bbox_inches='tight') # save to a file
plt.show()

Combined bar and line chart

# Combined bar and line chart
import pandas as pd
import matplotlib.pyplot as plt
plt.style.use('ggplot')
plt.rcParams['axes.unicode_minus'] = False # render the minus sign correctly

# Load the Excel file into a DataFrame
df = pd.read_excel('남북한발전전력량.xlsx', engine='openpyxl')
df = df.loc[5:9]
df.drop('전력량 (억㎾h)', axis='columns', inplace = True)
df.set_index('발전 전력별', inplace=True)
df = df.T
print(df.head())

발전 전력별   합계   수력   화력 원자력
1990    277  156  121   -
1991    263  150  113   -
1992    247  142  105   -
1993    221  133   88   -
1994    231  138   93   -

Computing the year-over-year rate of change

# Compute the year-over-year rate of change
df = df.rename(columns={'합계' : '총발전량'})
# shift(1) moves every value down one row, so each row gets the previous year's value; the first row is NaN because nothing precedes it
df['총발전량 - 1년'] = df['총발전량'].shift(1)
df['증감율'] = ((df['총발전량'] / df['총발전량 - 1년'])- 1) *100
# Draw a chart with two y-axes
ax1 = df[['수력', '화력']].plot(kind='bar', figsize = (20,10), width=0.7, stacked=True)

ax2 = ax1.twinx() # second y-axis that shares the same x-axis
ax2.plot(df.index, df.증감율, ls='--', marker='o', markersize=20, color='green', label='전년대비 증감율(%)')
# ls='--': dashed line, marker='o': circle markers, color='green'
ax1.set_ylim(0, 500)
ax2.set_ylim(-50, 50)
ax1.set_xlabel('연도', size=20)
ax1.set_ylabel('발전량 (억㎾h)')
ax2.set_ylabel('전년 대비 증감율(%)')
plt.title('북한 전력 발전량 (1990 ~ 2016)', size = 30)
ax1.legend(loc='upper left')

plt.show()
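
The shift-and-divide formula above is the same calculation that pandas exposes as pct_change. A minimal sketch comparing the two on the first five totals printed earlier (277, 263, 247, 221, 231); the variable names total, manual, and builtin are made up for this illustration.

```python
import pandas as pd

# First five yearly totals from the printed table above
total = pd.Series([277, 263, 247, 221, 231],
                  index=["1990", "1991", "1992", "1993", "1994"])

# Manual version: divide by the previous year's value, subtract 1, scale to percent
manual = (total / total.shift(1) - 1) * 100

# Built-in version; fill_method=None leaves missing values untouched
builtin = total.pct_change(fill_method=None) * 100

print(manual.round(2))
print(builtin.round(2))
```

Both series start with NaN for 1990 because there is no earlier year to compare against.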

Histogram

# Histogram
import pandas as pd
import matplotlib.pyplot as plt
plt.style.use('classic')
df= pd.read_csv('auto-mpg.csv', header = None)
df.columns = ['mpg', 'cylinders', 'displacement', 'horsepower', 'weight', 'acceleration', 'model year', 'origin', 'name']
df['mpg'].plot(kind='hist', bins=100, color='coral', figsize = (10,5))
plt.title('Histogram')
plt.xlabel('mpg')
plt.show()

plot(kind='hist') histogram
plot(kind='area') area chart
plot(kind='bar') vertical bar chart
plot(kind='barh') horizontal bar chart
bins = 10 splits the value range into 10 intervals
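
To see what bins does, here is a small sketch with made-up values using numpy.histogram, which applies the same equal-width rule: bins=4 splits the range from the minimum to the maximum into four intervals and counts the values in each. The array values is illustrative only, not the mpg column.

```python
import numpy as np

# Made-up values, illustration only
values = np.array([10, 12, 15, 18, 21, 24, 30, 33, 40, 46])

# 4 equal-width bins between min (10) and max (46): each bin is 9 wide
counts, edges = np.histogram(values, bins=4)
print(edges)   # bin boundaries
print(counts)  # number of values per bin
```

A larger bins value gives narrower intervals and a more detailed, but noisier, histogram.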

# 1 
import pandas as pd
import matplotlib.pyplot as plt
from matplotlib import font_manager, rc
rc('font',family='Malgun Gothic')
df = pd.read_excel('시도별 전출입 인구수.xlsx',engine='openpyxl')
df = df.fillna(method='ffill')
mask = (df['전출지별'] == '서울특별시') & (df['전입지별'] == '전국')
df_seoul_out = df[mask]
df_seoul_out = df_seoul_out.drop(["전출지별"],axis=1)
df_seoul_out.rename(columns={'전입지별':'전입지'},inplace=True)
df_seoul_out.set_index('전입지',inplace=True)

mask2 = (df['전출지별'] == '전국') & (df['전입지별'] == '서울특별시')
df_seoul_in = df[mask2]
df_seoul_in = df_seoul_in.drop(["전입지별"],axis=1)
df_seoul_in.rename(columns={'전출지별':'전출지'},inplace=True)
df_seoul_in.set_index('전출지',inplace=True)

print(df_seoul_in.head())

        1970     1971     1972     1973     1974     1975     1976     1977  \
전출지                                                                           
전국   1742813  1671705  1349333  1831858  2050392  3396662  2756510  2893403   

        1978     1979  ...     2008     2009     2010     2011     2012  \
전출지                    ...                                                
전국   3307439  2589667  ...  2025358  1873188  1733015  1721748  1555281   

        2013     2014     2015     2016     2017  
전출지                                               
전국   1520090  1573594  1589431  1515602  1472937  

[1 rows x 48 columns]

print(df_seoul_out.head())

        1970     1971     1972     1973     1974     1975     1976     1977  \
전입지                                                                           
전국   1448985  1419016  1210559  1647268  1819660  2937093  2495620  2678007   

        1978     1979  ...     2008     2009     2010     2011     2012  \
전입지                    ...                                                
전국   3028911  2441242  ...  2083352  1925452  1848038  1834806  1658928   

        2013     2014     2015     2016     2017  
전입지                                               
전국   1620640  1661425  1726687  1655859  1571423  

[1 rows x 48 columns]

plt.style.use('ggplot')
fig = plt.figure(figsize=(20,5))
ax = fig.add_subplot(1,1,1) 
ax.plot(df_seoul_in.loc['전국',:],marker='o',markerfacecolor='green',markersize=5,color='olive', \
linewidth=2, label='서울 전입자')
ax.plot(df_seoul_out.loc['전국',:],marker='o',markerfacecolor='darkblue',markersize=5,color='blue', \
linewidth=2, label='서울 전출자')
ax.legend(loc='best')
ax.set_title('연도별 서울 전입/전출',size = 20)
ax.set_xlabel('기간',size=12)
ax.set_ylabel('이동 인구수',size=12)
ax.tick_params(axis='x',labelsize=10) 
ax.tick_params(axis='y',labelsize=10)

# 2 
import pandas as pd
import matplotlib.pyplot as plt

from matplotlib import font_manager, rc
font_name = font_manager.FontProperties(fname='c:/Windows/Fonts/malgun.ttf').get_name()
rc('font',family=font_name)

df = pd.read_excel('시도별 전출입 인구수.xlsx', engine='openpyxl')
df = df.fillna(method='ffill')
mask = ((df['전출지별'] == '서울특별시') & (df['전입지별'] == '전국')) |\
((df['전출지별']=='전국') & (df['전입지별'] == '서울특별시'))
df_seoul = df[mask]
df_seoul = df_seoul.drop('전출지별',axis=1)
df_seoul.rename(columns={'전입지별':'전입지'}, inplace =True)
df_seoul.set_index('전입지',inplace=True)
df_seoul = df_seoul.T
df_seoul.rename(columns={'서울특별시':'전입'},inplace=True)
df_seoul.rename(columns={'전국':'전출'}, inplace=True)
df_seoul['증감율'] =((df_seoul['전입'] / df_seoul['전출']) - 1) * 100

plt.style.use('ggplot')
ax1 = df_seoul[['전입','전출']].plot(kind='bar',figsize=(20,10), width=0.8, color=['pink','skyblue'])
ax2 = ax1.twinx()
ax2.plot(df_seoul.index, df_seoul.증감율, ls='--', marker='o', markersize=5,
color='yellowgreen', label='증감율(%)')
ax1.set_ylim(0,4000000)
ax2.set_ylim(-50,50)
ax1.set_xlabel('년도',size=20)
ax1.set_ylabel('이동 인구 수')
ax2.set_ylabel('증감율(%)')
plt.title('서울의 전입 전출 정보',size=30)
ax1.legend(loc='upper left',fontsize=15)
ax2.legend(loc='upper right', fontsize=15)
plt.show()

# 3
import pandas as pd
import matplotlib.pyplot as plt
from matplotlib import font_manager, rc

font_name = font_manager.FontProperties(fname='c:/Windows/Fonts/malgun.ttf').get_name()
rc('font',family=font_name)

plt.style.use('ggplot')
plt.rcParams['axes.unicode_minus'] = False

df = pd.read_excel('남북한발전전력량.xlsx', engine='openpyxl')
df = df.loc[0:4]
df.drop('전력량 (억㎾h)',axis='columns', inplace=True)
df.set_index('발전 전력별', inplace=True)
df = df.T
print(df.head())
print(df.tail())
df = df.rename(columns={'합계':'총발전량'})
df['총발전량 - 1년'] = df['총발전량'].shift(1)
df['증감율'] = ((df['총발전량'] / df['총발전량 - 1년']) - 1) * 100

발전 전력별    합계  수력    화력  원자력 신재생
1990    1077  64   484  529   -
1991    1186  51   573  563   -
1992    1310  49   696  565   -
1993    1444  60   803  581   -
1994    1650  41  1022  587   -
발전 전력별    합계  수력    화력   원자력  신재생
2012    5096  77  3430  1503   86
2013    5171  84  3581  1388  118
2014    5220  78  3427  1564  151
2015    5281  58  3402  1648  173
2016    5404  66  3523  1620  195

ax1 = df[['수력','화력','원자력']].plot(kind='bar', figsize=(20,10), width = 0.7, stacked=True)
ax2 = ax1.twinx()
ax2.plot(df.index, df.증감율, ls='--', marker='o', markersize=10,
color='green', label='전년대비 증감율(%)')
ax1.set_ylim(0,7000)
ax2.set_ylim(-50,50)
ax1.set_xlabel('년도',size=20)
ax1.set_ylabel('발전량(억 kWh)')
ax2.set_ylabel('전년 대비 증감율(%)')
plt.title('남한 전력 발전량(1990 ~ 2016)', size= 30)
ax1.legend(loc='upper left')
plt.show()


3. Analysis of migration between provinces ( 1 || 2021.06.23

2021. 10. 26. 00:05

시도별 전출입 인구수.xlsx

0.10MB

import pandas as pd

df = pd.read_excel('./시도별 전출입 인구수.xlsx', engine='openpyxl')
print(df.head())


   전출지별   전입지별      1970      1971      1972      1973      1974      1975  \
0  전출지별   전입지별  이동자수 (명)  이동자수 (명)  이동자수 (명)  이동자수 (명)  이동자수 (명)  이동자수 (명)   
1    전국     전국   4046536   4210164   3687938   4860418   5297969   9011440   
2   NaN  서울특별시   1742813   1671705   1349333   1831858   2050392   3396662   
3   NaN  부산광역시    448577    389797    362202    482061    680984    805979   
4   NaN  대구광역시         -         -         -         -         -         -   

       1976      1977  ...      2008      2009      2010      2011      2012  \
0  이동자수 (명)  이동자수 (명)  ...  이동자수 (명)  이동자수 (명)  이동자수 (명)  이동자수 (명)  이동자수 (명)   
1   6773250   7397623  ...   8808256   8487275   8226594   8127195   7506691   
2   2756510   2893403  ...   2025358   1873188   1733015   1721748   1555281   
3    724664    785117  ...    514502    519310    519334    508043    461042   
4         -         -  ...    409938    398626    370817    370563    348642   

       2013      2014      2015      2016      2017  
0  이동자수 (명)  이동자수 (명)  이동자수 (명)  이동자수 (명)  이동자수 (명)  
1   7411784   7629098   7755286   7378430   7154226  
2   1520090   1573594   1589431   1515602   1472937  
3    478451    485710    507031    459015    439073  
4    351873    350213    351424    328228    321182  

[5 rows x 50 columns]

# Fill missing values by carrying the previous row's value forward
df =df.fillna(method = 'ffill')
print(df.head())

   전출지별   전입지별      1970      1971      1972      1973      1974      1975  \
0  전출지별   전입지별  이동자수 (명)  이동자수 (명)  이동자수 (명)  이동자수 (명)  이동자수 (명)  이동자수 (명)   
1    전국     전국   4046536   4210164   3687938   4860418   5297969   9011440   
2    전국  서울특별시   1742813   1671705   1349333   1831858   2050392   3396662   
3    전국  부산광역시    448577    389797    362202    482061    680984    805979   
4    전국  대구광역시         -         -         -         -         -         -   

       1976      1977  ...      2008      2009      2010      2011      2012  \
0  이동자수 (명)  이동자수 (명)  ...  이동자수 (명)  이동자수 (명)  이동자수 (명)  이동자수 (명)  이동자수 (명)   
1   6773250   7397623  ...   8808256   8487275   8226594   8127195   7506691   
2   2756510   2893403  ...   2025358   1873188   1733015   1721748   1555281   
3    724664    785117  ...    514502    519310    519334    508043    461042   
4         -         -  ...    409938    398626    370817    370563    348642   

       2013      2014      2015      2016      2017  
0  이동자수 (명)  이동자수 (명)  이동자수 (명)  이동자수 (명)  이동자수 (명)  
1   7411784   7629098   7755286   7378430   7154226  
2   1520090   1573594   1589431   1515602   1472937  
3    478451    485710    507031    459015    439073  
4    351873    350213    351424    328228    321182  

[5 rows x 50 columns]
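
Forward fill copies the last non-missing value down into the NaN cells below it, which is how the blank origin cells above become '전국'. A minimal sketch on a made-up five-row frame (demo is a placeholder, not the Excel data), using DataFrame.ffill(), the method form of the same operation:

```python
import numpy as np
import pandas as pd

# Made-up frame shaped like the first two columns of the sheet
demo = pd.DataFrame({
    "전출지별": ["전국", np.nan, np.nan, "서울특별시", np.nan],
    "전입지별": ["전국", "서울특별시", "부산광역시", "전국", "부산광역시"],
})

# Each NaN takes the closest non-missing value above it
filled = demo.ffill()
print(filled)
```

The fill only looks upward, so a NaN in the very first row would stay missing.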

# Keep only the rows for moves from Seoul to other regions
mask = (df['전출지별'] == '서울특별시') & (df['전입지별'] != '서울특별시')
print(mask)


0      False
1      False
2      False
3      False
4      False
       ...  
320    False
321    False
322    False
323    False
324    False
Length: 325, dtype: bool

print(mask.value_counts())

False    308
True      17
dtype: int64
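
A boolean mask built with & is True only where both conditions hold, and indexing the frame with it keeps just those rows. A minimal sketch on a made-up four-row frame; demo and its rows are placeholders for illustration.

```python
import pandas as pd

# Made-up origin/destination pairs, illustration only
demo = pd.DataFrame({
    "전출지별": ["전국", "서울특별시", "서울특별시", "부산광역시"],
    "전입지별": ["서울특별시", "서울특별시", "경기도", "서울특별시"],
})

# True where the origin is Seoul AND the destination is not Seoul
mask = (demo["전출지별"] == "서울특별시") & (demo["전입지별"] != "서울특별시")
selected = demo[mask]

print(mask.sum())  # number of True values, same as value_counts()[True]
print(selected)
```

Each condition needs its own parentheses because & binds more tightly than == and !=.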

# Select only the rows where mask is True and store them in df_seoul
df_seoul = df[mask]
print(df_seoul)


     전출지별     전입지별     1970     1971     1972     1973     1974     1975  \
19  서울특별시       전국  1448985  1419016  1210559  1647268  1819660  2937093   
21  서울특별시    부산광역시    11568    11130    11768    16307    22220    27515   
22  서울특별시    대구광역시        -        -        -        -        -        -   
23  서울특별시    인천광역시        -        -        -        -        -        -   
24  서울특별시    광주광역시        -        -        -        -        -        -   
25  서울특별시    대전광역시        -        -        -        -        -        -   
26  서울특별시    울산광역시        -        -        -        -        -        -   
27  서울특별시  세종특별자치시        -        -        -        -        -        -   
28  서울특별시      경기도   130149   150313    93333   143234   149045   253705   
29  서울특별시      강원도     9352    12885    13561    16481    15479    27837   
30  서울특별시     충청북도     6700     9457    10853    12617    11786    21073   
31  서울특별시     충청남도    15954    18943    23406    27139    25509    51205   
32  서울특별시     전라북도    10814    13192    16583    18642    16647    34411   
33  서울특별시     전라남도    10513    16755    20157    22160    21314    46610   
34  서울특별시     경상북도    11868    16459    22073    27531    26902    46177   
35  서울특별시     경상남도     8409    10001    11263    15193    16771    23150   
36  서울특별시  제주특별자치도     1039     1325     1617     2456     2261     3440   

       1976     1977  ...     2008     2009     2010     2011     2012  \
19  2495620  2678007  ...  2083352  1925452  1848038  1834806  1658928   
21    23732    27213  ...    17353    17738    17418    18816    16135   
22        -        -  ...     9720    10464    10277    10397    10135   
23        -        -  ...    50493    45392    46082    51641    49640   
24        -        -  ...    10846    11725    11095    10587    10154   
25        -        -  ...    13515    13632    13819    13900    14080   
26        -        -  ...     5057     4845     4742     5188     5691   
27        -        -  ...        -        -        -        -     2998   
28   202276   207722  ...   412408   398282   410735   373771   354135   
29    25927    25415  ...    23668    23331    22736    23624    22332   
30    18029    17478  ...    15294    15295    15461    15318    14555   
31    41447    43993  ...    27458    24889    24522    24723    22269   
32    29835    28444  ...    18390    18332    17569    17755    16120   
33    46251    43430  ...    16601    17468    16429    15974    14765   
34    40376    41155  ...    15425    16569    16042    15818    15191   
35    22400    27393  ...    15438    15303    15689    16039    14474   
36     3623     3551  ...     5473     5332     5714     6133     6954   

       2013     2014     2015     2016     2017  
19  1620640  1661425  1726687  1655859  1571423  
21    16153    17320    17009    15062    14484  
22    10631    10062    10191     9623     8891  
23    47424    43212    44915    43745    40485  
24     9129     9759     9216     8354     7932  
25    13440    13403    13453    12619    11815  
26     5542     6047     5950     5102     4260  
27     2851     6481     7550     5943     5813  
28   340801   332785   359337   370760   342433  
29    20601    21173    22659    21590    21016  
30    13783    14244    14379    14087    13302  
31    21486    21473    22299    21741    21020  
32    14909    14566    14835    13835    13179  
33    14187    14591    14598    13065    12426  
34    14420    14456    15113    14236    12464  
35    14447    14799    15220    13717    12692  
36     7828     9031    10434    10465    10404  

[17 rows x 50 columns]

# Every value in the 전출지별 column of df_seoul is 서울특별시,
# so drop the 전출지별 column

df_seoul = df_seoul.drop('전출지별',axis=1)
print(df_seoul)


       전입지별     1970     1971     1972     1973     1974     1975     1976  \
19       전국  1448985  1419016  1210559  1647268  1819660  2937093  2495620   
21    부산광역시    11568    11130    11768    16307    22220    27515    23732   
22    대구광역시        -        -        -        -        -        -        -   
23    인천광역시        -        -        -        -        -        -        -   
24    광주광역시        -        -        -        -        -        -        -   
25    대전광역시        -        -        -        -        -        -        -   
26    울산광역시        -        -        -        -        -        -        -   
27  세종특별자치시        -        -        -        -        -        -        -   
28      경기도   130149   150313    93333   143234   149045   253705   202276   
29      강원도     9352    12885    13561    16481    15479    27837    25927   
30     충청북도     6700     9457    10853    12617    11786    21073    18029   
31     충청남도    15954    18943    23406    27139    25509    51205    41447   
32     전라북도    10814    13192    16583    18642    16647    34411    29835   
33     전라남도    10513    16755    20157    22160    21314    46610    46251   
34     경상북도    11868    16459    22073    27531    26902    46177    40376   
35     경상남도     8409    10001    11263    15193    16771    23150    22400   
36  제주특별자치도     1039     1325     1617     2456     2261     3440     3623   

       1977     1978  ...     2008     2009     2010     2011     2012  \
19  2678007  3028911  ...  2083352  1925452  1848038  1834806  1658928   
21    27213    29856  ...    17353    17738    17418    18816    16135   
22        -        -  ...     9720    10464    10277    10397    10135   
23        -        -  ...    50493    45392    46082    51641    49640   
24        -        -  ...    10846    11725    11095    10587    10154   
25        -        -  ...    13515    13632    13819    13900    14080   
26        -        -  ...     5057     4845     4742     5188     5691   
27        -        -  ...        -        -        -        -     2998   
28   207722   237684  ...   412408   398282   410735   373771   354135   
29    25415    26700  ...    23668    23331    22736    23624    22332   
30    17478    18420  ...    15294    15295    15461    15318    14555   
31    43993    48091  ...    27458    24889    24522    24723    22269   
32    28444    29676  ...    18390    18332    17569    17755    16120   
33    43430    44624  ...    16601    17468    16429    15974    14765   
34    41155    42940  ...    15425    16569    16042    15818    15191   
35    27393    28697  ...    15438    15303    15689    16039    14474   
36     3551     3937  ...     5473     5332     5714     6133     6954   

       2013     2014     2015     2016     2017  
19  1620640  1661425  1726687  1655859  1571423  
21    16153    17320    17009    15062    14484  
22    10631    10062    10191     9623     8891  
23    47424    43212    44915    43745    40485  
24     9129     9759     9216     8354     7932  
25    13440    13403    13453    12619    11815  
26     5542     6047     5950     5102     4260  
27     2851     6481     7550     5943     5813  
28   340801   332785   359337   370760   342433  
29    20601    21173    22659    21590    21016  
30    13783    14244    14379    14087    13302  
31    21486    21473    22299    21741    21020  
32    14909    14566    14835    13835    13179  
33    14187    14591    14598    13065    12426  
34    14420    14456    15113    14236    12464  
35    14447    14799    15220    13717    12692  
36     7828     9031    10434    10465    10404  

[17 rows x 49 columns]

df_seoul.rename(columns={'전입지별':'전입지'}, inplace = True)
print(df_seoul)



        전입지     1970     1971     1972     1973     1974     1975     1976  \
19       전국  1448985  1419016  1210559  1647268  1819660  2937093  2495620   
21    부산광역시    11568    11130    11768    16307    22220    27515    23732   
22    대구광역시        -        -        -        -        -        -        -   
23    인천광역시        -        -        -        -        -        -        -   
24    광주광역시        -        -        -        -        -        -        -   
25    대전광역시        -        -        -        -        -        -        -   
26    울산광역시        -        -        -        -        -        -        -   
27  세종특별자치시        -        -        -        -        -        -        -   
28      경기도   130149   150313    93333   143234   149045   253705   202276   
29      강원도     9352    12885    13561    16481    15479    27837    25927   
30     충청북도     6700     9457    10853    12617    11786    21073    18029   
31     충청남도    15954    18943    23406    27139    25509    51205    41447   
32     전라북도    10814    13192    16583    18642    16647    34411    29835   
33     전라남도    10513    16755    20157    22160    21314    46610    46251   
34     경상북도    11868    16459    22073    27531    26902    46177    40376   
35     경상남도     8409    10001    11263    15193    16771    23150    22400   
36  제주특별자치도     1039     1325     1617     2456     2261     3440     3623   

       1977     1978  ...     2008     2009     2010     2011     2012  \
19  2678007  3028911  ...  2083352  1925452  1848038  1834806  1658928   
21    27213    29856  ...    17353    17738    17418    18816    16135   
22        -        -  ...     9720    10464    10277    10397    10135   
23        -        -  ...    50493    45392    46082    51641    49640   
24        -        -  ...    10846    11725    11095    10587    10154   
25        -        -  ...    13515    13632    13819    13900    14080   
26        -        -  ...     5057     4845     4742     5188     5691   
27        -        -  ...        -        -        -        -     2998   
28   207722   237684  ...   412408   398282   410735   373771   354135   
29    25415    26700  ...    23668    23331    22736    23624    22332   
30    17478    18420  ...    15294    15295    15461    15318    14555   
31    43993    48091  ...    27458    24889    24522    24723    22269   
32    28444    29676  ...    18390    18332    17569    17755    16120   
33    43430    44624  ...    16601    17468    16429    15974    14765   
34    41155    42940  ...    15425    16569    16042    15818    15191   
35    27393    28697  ...    15438    15303    15689    16039    14474   
36     3551     3937  ...     5473     5332     5714     6133     6954   

       2013     2014     2015     2016     2017  
19  1620640  1661425  1726687  1655859  1571423  
21    16153    17320    17009    15062    14484  
22    10631    10062    10191     9623     8891  
23    47424    43212    44915    43745    40485  
24     9129     9759     9216     8354     7932  
25    13440    13403    13453    12619    11815  
26     5542     6047     5950     5102     4260  
27     2851     6481     7550     5943     5813  
28   340801   332785   359337   370760   342433  
29    20601    21173    22659    21590    21016  
30    13783    14244    14379    14087    13302  
31    21486    21473    22299    21741    21020  
32    14909    14566    14835    13835    13179  
33    14187    14591    14598    13065    12426  
34    14420    14456    15113    14236    12464  
35    14447    14799    15220    13717    12692  
36     7828     9031    10434    10465    10404  

[17 rows x 49 columns]

# Use the 전입지 column as the index
df_seoul.set_index('전입지', inplace = True)
print(df_seoul.head())




         전입지     1970     1971     1972     1973     1974     1975     1976  \
전입지                                                                           
전국        전국  1448985  1419016  1210559  1647268  1819660  2937093  2495620   
부산광역시  부산광역시    11568    11130    11768    16307    22220    27515    23732   
대구광역시  대구광역시        -        -        -        -        -        -        -   
인천광역시  인천광역시        -        -        -        -        -        -        -   
광주광역시  광주광역시        -        -        -        -        -        -        -   

          1977     1978  ...     2008     2009     2010     2011     2012  \
전입지                      ...                                                
전국     2678007  3028911  ...  2083352  1925452  1848038  1834806  1658928   
부산광역시    27213    29856  ...    17353    17738    17418    18816    16135   
대구광역시        -        -  ...     9720    10464    10277    10397    10135   
인천광역시        -        -  ...    50493    45392    46082    51641    49640   
광주광역시        -        -  ...    10846    11725    11095    10587    10154   

          2013     2014     2015     2016     2017  
전입지                                                 
전국     1620640  1661425  1726687  1655859  1571423  
부산광역시    16153    17320    17009    15062    14484  
대구광역시    10631    10062    10191     9623     8891  
인천광역시    47424    43212    44915    43745    40485  
광주광역시     9129     9759     9216     8354     7932  

[5 rows x 49 columns]

sr1 = pd.Series(df_seoul.loc['경기도'])
sr1

전입지        경기도
1970    130149
1971    150313
1972     93333
1973    143234
1974    149045
1975    253705
1976    202276
1977    207722
1978    237684
1979    278411
1980    297539
1981    252073
1982    320174
1983    400875
1984    352238
1985    390265
1986    412535
1987    405220
1988    415174
1989    412933
1990    473889
1991    384714
1992    428344
1993    502584
1994    542204
1995    599411
1996    520566
1997    495454
1998    407050
1999    471841
2000    435573
2001    499575
2002    516765
2003    457656
2004    400206
2005    414621
2006    449632
2007    431637
2008    412408
2009    398282
2010    410735
2011    373771
2012    354135
2013    340801
2014    332785
2015    359337
2016    370760
2017    342433
Name: 경기도, dtype: object
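
The Series printed above has dtype object: its first entry is the label 경기도 carried over from the 전입지 column, and other rows of the table use '-' where no count exists. A minimal sketch of one way to get a purely numeric series before plotting, using a small hand-made sample (the names `raw` and `clean` are illustrative and do not appear in the post):

```python
import pandas as pd

# Hand-made sample mimicking sr1: a label entry, some counts, and a '-' placeholder.
raw = pd.Series(['경기도', 130149, 150313, '-', 143234],
                index=['전입지', '1970', '1971', '1972', '1973'], name='경기도')

# Drop the label entry, coerce anything non-numeric to NaN, drop those rows, cast to int.
clean = pd.to_numeric(raw.drop('전입지'), errors='coerce').dropna().astype(int)
print(clean)
```

Coercing with `errors='coerce'` rather than calling `astype(int)` directly avoids a ValueError when a placeholder such as '-' is still present.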

sr1.plot()

import matplotlib.pyplot as plt
from matplotlib import font_manager, rc
font_name = font_manager.FontProperties(fname="C:/Windows/Fonts/Malgun.ttf").get_name()
print(font_name)
print(sr1.values)
sr1 = sr1.astype(int)

# Malgun Gothic
# [130149 150313 93333 143234 149045 253705 202276 207722 237684 278411
#  297539 252073 320174 400875 352238 390265 412535 405220 415174 412933
#  473889 384714 428344 502584 542204 599411 520566 495454 407050 471841
#  435573 499575 516765 457656 400206 414621 449632 431637 412408 398282
#  410735 373771 354135 340801 332785 359337 370760 342433]
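
The font path above is specific to Windows. As a rough sketch, the font family could instead be looked up among the fonts matplotlib has already discovered. The helper name `pick_korean_font` and the candidate list are assumptions made for illustration; which fonts are installed depends on the machine.

```python
from matplotlib import font_manager, rc

def pick_korean_font(candidates=('Malgun Gothic', 'AppleGothic', 'NanumGothic')):
    """Return the first candidate font family that matplotlib knows about, or None."""
    installed = {f.name for f in font_manager.fontManager.ttflist}
    for name in candidates:
        if name in installed:
            return name
    return None

font_name = pick_korean_font()
if font_name is not None:
    rc('font', family=font_name)      # use the Korean-capable font for all text
    rc('axes', unicode_minus=False)   # draw minus signs with the ASCII hyphen
print(font_name)
```

If none of the candidates is installed, the sketch leaves the default font untouched and prints None.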

# If Korean text renders as broken glyphs, register a Korean font
font_name = font_manager.FontProperties(fname="C:/Windows/Fonts/Malgun.ttf").get_name()
print(font_name)
rc('font', family=font_name) # set the default font

plt.style.use('ggplot') # apply the style before drawing; print(plt.style.available) lists the options
plt.figure(figsize=(14,5)) # create the figure first so the size applies to this plot

plt.plot(sr1.index, sr1.values, marker = 'o', markersize = 10) # line plot
plt.title('서울->경기인구이동') # chart title
plt.ylabel('이동 인구수') # y-axis label
plt.xlabel('기간')         # x-axis label

plt.legend(labels=['서울 -> 경기'], loc='best', fontsize = 15) # show the legend
plt.xticks(rotation='vertical', size = 10) # size: font size of the x tick labels

plt.ylim(50000, 800000) # y-axis range (min, max)

plt.annotate('', # empty text: draw only the arrow
            xy = (20,620000), # arrow head (end point)
            xytext = (2, 290000), # arrow tail (start point)
            xycoords = 'data', # coordinate system
            arrowprops = dict(arrowstyle = '->', color = 'skyblue', lw = 5), # arrow style; lw is the line width
            ) 
# Text annotations
plt.annotate('인구이동증가(1970-1995)', # annotation text
            xy = (10,550000), # text position
            rotation = 25, # rotation angle in degrees
            va = 'baseline', # vertical alignment
            ha = 'center', # horizontal alignment
            fontsize = 15, # text size
            ) 
plt.annotate('인구이동감소(1995-2017)', # annotation text; the counts fall after the 1995 peak
            xy = (40,560000), # text position
            rotation = -11, # rotation angle in degrees
            va = 'baseline',
            ha = 'center',
            fontsize = 15, 
            ) 
# Malgun Gothic
# Text(40, 560000, '인구이동감소(1995-2017)')
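
The annotation coordinates above are typed in by hand. Because the x values are strings, matplotlib places them at integer positions 0, 1, 2, and so on, so an x of 20 means the 21st label. A small sketch of how the peak could be located programmatically instead, using a handful of the counts printed earlier (the names `sample`, `peak_year`, and `peak_pos` are illustrative):

```python
import pandas as pd

# A few of the Seoul -> Gyeonggi counts from the output shown earlier.
sample = pd.Series([130149, 297539, 473889, 599411, 435573, 414621, 410735, 342433],
                   index=['1970', '1980', '1990', '1995', '2000', '2005', '2010', '2017'])

peak_year = sample.idxmax()                  # index label of the largest value
peak_pos = sample.index.get_loc(peak_year)   # integer x position on a categorical axis
print(peak_year, peak_pos, sample[peak_year])
```

The pair `(peak_pos, sample[peak_year])` could then be passed as `xy` to `plt.annotate` so the arrow head follows the data.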

# List the styles registered in matplotlib
print(plt.style.available)

# ['Solarize_Light2', '_classic_test_patch', 'bmh', 'classic', 'dark_background', 'fast', 'fivethirtyeight', 'ggplot', 'grayscale', 'seaborn', 'seaborn-bright', 'seaborn-colorblind', 'seaborn-dark', 'seaborn-dark-palette', 'seaborn-darkgrid', 'seaborn-deep', 'seaborn-muted', 'seaborn-notebook', 'seaborn-paper', 'seaborn-pastel', 'seaborn-poster', 'seaborn-talk', 'seaborn-ticks', 'seaborn-white', 'seaborn-whitegrid', 'tableau-colorblind10']

fig = plt.figure(figsize=(10,10))
ax1 = fig.add_subplot(2,1,1) # 2 rows x 1 column, first panel
ax2 = fig.add_subplot(2,1,2) # 2 rows x 1 column, second panel

ax1.plot(sr1, 'o', markersize =10)
# markerfacecolor: marker fill color, color: line color, linewidth: line thickness
ax2.plot(sr1, marker = 'o', markersize = 10, markerfacecolor = 'green', color = 'olive', linewidth=2, label = '서울 -> 경기')
ax2.legend(loc = 'best')
ax1.set_ylim(50000, 800000)
ax2.set_ylim(50000, 800000)
ax1.set_xticklabels(sr1.index, rotation=75) # rotate the x tick labels by 75 degrees
ax2.set_xticklabels(sr1.index, rotation=75)

[Text(0, 0, '1970'),
 Text(1, 0, '1971'),
 Text(2, 0, '1972'),
 Text(3, 0, '1973'),
 Text(4, 0, '1974'),
 Text(5, 0, '1975'),
 Text(6, 0, '1976'),
 Text(7, 0, '1977'),
 Text(8, 0, '1978'),
 Text(9, 0, '1979'),
 Text(10, 0, '1980'),
 Text(11, 0, '1981'),
 Text(12, 0, '1982'),
 Text(13, 0, '1983'),
 Text(14, 0, '1984'),
 Text(15, 0, '1985'),
 Text(16, 0, '1986'),
 Text(17, 0, '1987'),
 Text(18, 0, '1988'),
 Text(19, 0, '1989'),
 Text(20, 0, '1990'),
 Text(21, 0, '1991'),
 Text(22, 0, '1992'),
 Text(23, 0, '1993'),
 Text(24, 0, '1994'),
 Text(25, 0, '1995'),
 Text(26, 0, '1996'),
 Text(27, 0, '1997'),
 Text(28, 0, '1998'),
 Text(29, 0, '1999'),
 Text(30, 0, '2000'),
 Text(31, 0, '2001'),
 Text(32, 0, '2002'),
 Text(33, 0, '2003'),
 Text(34, 0, '2004'),
 Text(35, 0, '2005'),
 Text(36, 0, '2006'),
 Text(37, 0, '2007'),
 Text(38, 0, '2008'),
 Text(39, 0, '2009'),
 Text(40, 0, '2010'),
 Text(41, 0, '2011'),
 Text(42, 0, '2012'),
 Text(43, 0, '2013'),
 Text(44, 0, '2014'),
 Text(45, 0, '2015'),
 Text(46, 0, '2016'),
 Text(47, 0, '2017')]
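
The long list of Text objects above is simply the return value of `set_xticklabels`. As an alternative sketch, `tick_params` can rotate the existing labels without replacing them and returns nothing. The four data points are taken from the start of the series, and the English legend label is a stand-in chosen so that the sketch runs without a Korean font.

```python
import matplotlib
matplotlib.use('Agg')  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt

years = ['1970', '1971', '1972', '1973']
counts = [130149, 150313, 93333, 143234]

fig, (ax1, ax2) = plt.subplots(2, 1, figsize=(10, 10))  # 2 rows x 1 column
ax1.plot(years, counts, 'o', markersize=10)
ax2.plot(years, counts, marker='o', markersize=10, label='Seoul -> Gyeonggi')
ax2.legend(loc='best')
for ax in (ax1, ax2):
    ax.set_ylim(50000, 800000)
    ax.tick_params(axis='x', labelrotation=75)  # rotate labels in place
```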


2. auto-mpg Data Analysis || 2021.06.24

2021. 10. 19. 23:30


import pandas as pd

df = pd.read_csv('auto-mpg.csv', header=None)
df.columns = ['mpg','cylinders','displacement','horsepower',
              'weight','acceleration','modelyear','origin','name']
print(df.head())

    mpg  cylinders  displacement horsepower  weight  acceleration  modelyear  \
0  18.0          8         307.0      130.0  3504.0          12.0         70   
1  15.0          8         350.0      165.0  3693.0          11.5         70   
2  18.0          8         318.0      150.0  3436.0          11.0         70   
3  16.0          8         304.0      150.0  3433.0          12.0         70   
4  17.0          8         302.0      140.0  3449.0          10.5         70   

   origin                       name  
0       1  chevrolet chevelle malibu  
1       1          buick skylark 320  
2       1         plymouth satellite  
3       1              amc rebel sst  
4       1                ford torino

# horsepower and name are non-numeric; numeric_only=True skips them (required in pandas 2.0+)
print(df.corr(numeric_only=True))

                   mpg  cylinders  displacement    weight  acceleration  \
mpg           1.000000  -0.775396     -0.804203 -0.831741      0.420289   
cylinders    -0.775396   1.000000      0.950721  0.896017     -0.505419   
displacement -0.804203   0.950721      1.000000  0.932824     -0.543684   
weight       -0.831741   0.896017      0.932824  1.000000     -0.417457   
acceleration  0.420289  -0.505419     -0.543684 -0.417457      1.000000   
modelyear     0.579267  -0.348746     -0.370164 -0.306564      0.288137   
origin        0.563450  -0.562543     -0.609409 -0.581024      0.205873   

              modelyear    origin  
mpg            0.579267  0.563450  
cylinders     -0.348746 -0.562543  
displacement  -0.370164 -0.609409  
weight        -0.306564 -0.581024  
acceleration   0.288137  0.205873  
modelyear      1.000000  0.180662  
origin         0.180662  1.000000

print(df[['mpg','weight']].corr())

             mpg    weight
mpg     1.000000 -0.831741
weight -0.831741  1.000000
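
The horsepower column is missing from the full correlation matrix above, which suggests pandas read it as text rather than numbers. A hedged sketch of how such a column could be converted, assuming missing readings are marked with '?' (an assumption about the file; the mini-frame `sample` and its values are made up for illustration):

```python
import pandas as pd

# Hypothetical mini-frame; '?' stands in for a missing horsepower reading.
sample = pd.DataFrame({
    'mpg': [18.0, 15.0, 25.0, 21.0],
    'horsepower': ['130.0', '165.0', '?', '90.0'],
})

# Convert text to floats; anything unparseable becomes NaN.
sample['horsepower'] = pd.to_numeric(sample['horsepower'], errors='coerce')
print(sample.dtypes)
print(sample[['mpg', 'horsepower']].corr())
```

Once the column is numeric, it takes part in `corr()` like the others, with rows containing NaN skipped pairwise.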

df.plot(x='weight', y='mpg', kind='scatter')

df.plot(x='mpg', y='weight', kind='scatter')

df[['mpg', 'cylinders']].plot(kind='box')


1. North and South Korea Power Generation Analysis || 2021.06.24

2021. 10. 19. 23:27


# Import the library and load the data
import pandas as pd
df = pd.read_excel('./남북한발전전력량.xlsx', engine='openpyxl')
print(df.head())
#   전력량 (억㎾h) 발전 전력별  1990  1991  1992  1993  1994  1995  1996  1997  ...  2007  \
# 0        남한     합계  1077  1186  1310  1444  1650  1847  2055  2244  ...  4031   
# 1       NaN     수력    64    51    49    60    41    55    52    54  ...    50   
# 2       NaN     화력   484   573   696   803  1022  1122  1264  1420  ...  2551   
# 3       NaN    원자력   529   563   565   581   587   670   739   771  ...  1429   
# 4       NaN    신재생     -     -     -     -     -     -     -     -  ...     -   
# 
#    2008  2009  2010  2011  2012  2013  2014  2015  2016  
# 0  4224  4336  4747  4969  5096  5171  5220  5281  5404  
# 1    56    56    65    78    77    84    78    58    66  
# 2  2658  2802  3196  3343  3430  3581  3427  3402  3523  
# 3  1510  1478  1486  1547  1503  1388  1564  1648  1620  
# 4     -     -     -     -    86   118   151   173   195  
# 
# [5 rows x 29 columns]

# Keep only rows 0 and 5, and only the columns from index 2 onward (the years from 1990)
ndf = df.iloc[[0,5],2:]
print(ndf)

   1990  1991  1992  1993  1994  1995  1996  1997  1998  1999  ...  2007  \
0  1077  1186  1310  1444  1650  1847  2055  2244  2153  2393  ...  4031   
5   277   263   247   221   231   230   213   193   170   186  ...   236   

   2008  2009  2010  2011  2012  2013  2014  2015  2016  
0  4224  4336  4747  4969  5096  5171  5220  5281  5404  
5   255   235   237   211   215   221   216   190   239  

[2 rows x 27 columns]

# Rename the index
ndf.index=['South','North']
print(ndf)

       1990  1991  1992  1993  1994  1995  1996  1997  1998  1999  ...  2007  \
South  1077  1186  1310  1444  1650  1847  2055  2244  2153  2393  ...  4031   
North   277   263   247   221   231   230   213   193   170   186  ...   236   

       2008  2009  2010  2011  2012  2013  2014  2015  2016  
South  4224  4336  4747  4969  5096  5171  5220  5281  5404  
North   255   235   237   211   215   221   216   190   239  

[2 rows x 27 columns]

# Convert the column names to integers
ndf.columns = ndf.columns.map(int)
print(ndf.head())

       1990  1991  1992  1993  1994  1995  1996  1997  1998  1999  ...  2007  \
South  1077  1186  1310  1444  1650  1847  2055  2244  2153  2393  ...  4031   
North   277   263   247   221   231   230   213   193   170   186  ...   236   

       2008  2009  2010  2011  2012  2013  2014  2015  2016  
South  4224  4336  4747  4969  5096  5171  5220  5281  5404  
North   255   235   237   211   215   221   216   190   239  

[2 rows x 27 columns]

# Line plot
# plot() draws one line per column (here, one per year), so the frame needs to be transposed
ndf.plot()

ndf2 = ndf.T
print(ndf2.corr())
ndf2.plot()
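
To see why the transpose matters, here is a small sketch using the first three years of the table above (the names `ndf_small` and `ndf2_small` are illustrative). After `.T`, each region is a column, so `plot()` and `corr()` treat South and North as the two variables.

```python
import pandas as pd

# First three years of the South/North rows shown above.
ndf_small = pd.DataFrame([[1077, 1186, 1310], [277, 263, 247]],
                         index=['South', 'North'], columns=[1990, 1991, 1992])

ndf2_small = ndf_small.T          # years become the index, regions become columns
print(ndf2_small)
print(ndf2_small.corr())          # correlation between the South and North series
```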

# Bar chart
ndf2.plot(kind='bar')

# Histogram
ndf2.plot(kind='hist')
