
Data file: seoul_5.csv (1.13 MB)

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
df = pd.read_csv('seoul_5.csv', encoding = 'cp949')
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 40221 entries, 0 to 40220
Data columns (total 5 columns):
 #   Column   Non-Null Count  Dtype  
---  ------   --------------  -----  
 0   날짜       40221 non-null  object 
 1   지점       40221 non-null  int64  
 2   평균기온(℃)  39465 non-null  float64
 3   최저기온(℃)  39464 non-null  float64
 4   최고기온(℃)  39463 non-null  float64
dtypes: float64(3), int64(1), object(1)
memory usage: 1.5+ MB
df.describe()

	지점	평균기온(℃)	최저기온(℃)	최고기온(℃)
count	40221.0	39465.000000	39464.000000	39463.000000
mean	108.0	11.704019	7.406393	16.716083
std	0.0	10.668056	10.891154	10.998383
min	108.0	-19.200000	-23.100000	-16.300000
25%	108.0	2.600000	-1.500000	7.200000
50%	108.0	12.900000	8.000000	18.600000
75%	108.0	21.200000	17.000000	26.200000
max	108.0	33.700000	30.300000	39.600000
df

	날짜	지점	평균기온(℃)	최저기온(℃)	최고기온(℃)
0	1907-10-01	108	13.5	7.9	20.7
1	1907-10-02	108	16.2	7.9	22.0
2	1907-10-03	108	16.2	13.1	21.3
3	1907-10-04	108	16.5	11.2	22.0
4	1907-10-05	108	17.6	10.9	25.4
...	...	...	...	...	...
40216	2019-01-13	108	1.2	-3.0	7.6
40217	2019-01-14	108	1.4	-2.4	5.3
40218	2019-01-15	108	-1.7	-7.2	2.6
40219	2019-01-16	108	-5.2	-10.1	-1.1
40220	2019-01-17	108	-0.3	-3.2	4.0
# To do: remove missing values
# Convert 날짜 (date) to datetime, then split out the month and day
df['날짜'] = pd.to_datetime(df['날짜'])
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 40221 entries, 0 to 40220
Data columns (total 5 columns):
 #   Column   Non-Null Count  Dtype         
---  ------   --------------  -----         
 0   날짜       40221 non-null  datetime64[ns]
 1   지점       40221 non-null  int64         
 2   평균기온(℃)  39465 non-null  float64       
 3   최저기온(℃)  39464 non-null  float64       
 4   최고기온(℃)  39463 non-null  float64       
dtypes: datetime64[ns](1), float64(3), int64(1)
memory usage: 1.5 MB
df['Year'] = df['날짜'].dt.year
df['Month'] = df['날짜'].dt.month
df['Day'] = df['날짜'].dt.day
df.head()


	날짜	지점	평균기온(℃)	최저기온(℃)	최고기온(℃)	Year	Month	Day
0	1907-10-01	108	13.5	7.9	20.7	1907	10	1
1	1907-10-02	108	16.2	7.9	22.0	1907	10	2
2	1907-10-03	108	16.2	13.1	21.3	1907	10	3
3	1907-10-04	108	16.5	11.2	22.0	1907	10	4
4	1907-10-05	108	17.6	10.9	25.4	1907	10	5
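
The conversion above assumes every value in 날짜 parses cleanly. As a minimal sketch, assuming the dates follow the `YYYY-MM-DD` layout shown in the output, an explicit `format` plus `errors='coerce'` makes malformed values visible as `NaT` instead of raising. The `sample` frame and its values are hypothetical and are not taken from seoul_5.csv.

```python
import pandas as pd

# Hypothetical stand-in for the 날짜 column; the last value is deliberately malformed
sample = pd.DataFrame({'날짜': ['1907-10-01', '2019-01-17', 'not a date']})

# errors='coerce' converts unparseable strings to NaT instead of raising an error
sample['날짜'] = pd.to_datetime(sample['날짜'], format='%Y-%m-%d', errors='coerce')

# The .dt accessor exposes the date parts; rows holding NaT yield NaN here
sample['Year'] = sample['날짜'].dt.year
sample['Month'] = sample['날짜'].dt.month
sample['Day'] = sample['날짜'].dt.day

print(sample)
print('unparseable rows:', sample['날짜'].isna().sum())
```

Counting the `NaT` rows right after parsing is a cheap check before any filtering that depends on Month or Day.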
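# Rename the columns to drop the (℃) unit suffix.
# 날짜 = date, 지점 = station, 평균기온 = mean temp, 최저기온 = daily low, 최고기온 = daily high
df.columns = ['날짜','지점','평균기온','최저기온','최고기온','Year','Month','Day']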
df.head()

	날짜	지점	평균기온	최저기온	최고기온	Year	Month	Day
0	1907-10-01	108	13.5	7.9	20.7	1907	10	1
1	1907-10-02	108	16.2	7.9	22.0	1907	10	2
2	1907-10-03	108	16.2	13.1	21.3	1907	10	3
3	1907-10-04	108	16.5	11.2	22.0	1907	10	4
4	1907-10-05	108	17.6	10.9	25.4	1907	10	5
# Keep only the February 14 row of every year
df_0214 = df[(df['Month'] == 2) & (df['Day'] == 14)]
df_0214

	날짜	지점	평균기온	최저기온	최고기온	Year	Month	Day
136	1908-02-14	108	-3.3	-7.5	2.3	1908	2	14
502	1909-02-14	108	2.6	-4.5	8.8	1909	2	14
867	1910-02-14	108	-3.1	-10.1	2.8	1910	2	14
1232	1911-02-14	108	0.8	0.0	3.5	1911	2	14
1597	1912-02-14	108	6.3	0.9	11.2	1912	2	14
...	...	...	...	...	...	...	...	...
38422	2014-02-14	108	2.7	-0.7	7.6	2014	2	14
38787	2015-02-14	108	2.0	-3.1	6.6	2015	2	14
39152	2016-02-14	108	-2.6	-6.8	5.8	2016	2	14
39518	2017-02-14	108	0.3	-4.0	6.5	2017	2	14
39883	2018-02-14	108	3.5	-0.7	8.7	2018	2	14
110 rows × 8 columns
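
The same selection can be written without the helper Month and Day columns by reading the parts straight from the datetime column. This is only a sketch on a hypothetical `demo` frame built with `pd.date_range`; the values in its 최고기온 column are placeholders, not real temperatures.

```python
import pandas as pd

# Hypothetical three-year daily frame standing in for the real data
dates = pd.date_range('2016-01-01', '2018-12-31', freq='D')
demo = pd.DataFrame({'날짜': dates, '최고기온': range(len(dates))})

# Boolean mask: True only where the date is February 14
mask = (demo['날짜'].dt.month == 2) & (demo['날짜'].dt.day == 14)
demo_0214 = demo[mask]

print(demo_0214)
```

Each comparison needs its own parentheses because `&` binds more tightly than `==` in Python.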
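df_year = df_0214['Year']
df_high = df_0214['최고기온']
df_low = df_0214['최저기온']
plt.style.use('ggplot')
fig = plt.figure(figsize=(20,5))
ax = fig.add_subplot(1,1,1)

# Plot the February 14 daily high and low for each year
ax.plot(df_year, df_high, label='최고기온')  # daily high
ax.plot(df_year, df_low, label='최저기온')   # daily low
ax.set_title('2월 14일 기온 변화')  # "Temperature change on February 14"
# Rotate the existing tick labels instead of overwriting them with unrelated values
ax.tick_params(axis='x', rotation=90)
ax.set_xlabel('year')
ax.set_ylabel('temperature')
ax.legend()
plt.show()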

plt.style.use('ggplot')
plt.hist(df['평균기온'], bins = 100, color = 'r')
plt.show()

df.isnull().sum()

날짜         0
지점         0
평균기온     756
최저기온     757
최고기온     758
Year       0
Month      0
Day        0
dtype: int64
# Drop every row that has at least one missing value.
# how and thresh must not both be passed: recent pandas versions raise a TypeError.
df = df.dropna(axis=0, how='any')
df.isnull().sum()

날짜       0
지점       0
평균기온     0
최저기온     0
최고기온     0
Year     0
Month    0
Day      0
dtype: int64
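
Dropping on `how='any'` removes a row as soon as one temperature is missing, which is why the three different null counts above collapse to a single row count. The sketch below contrasts `how='any'` with `how='all'` restricted to the temperature columns through `subset`. The `demo` frame is hypothetical and only mimics the column names.

```python
import numpy as np
import pandas as pd

# Hypothetical frame: each of rows 1 to 3 is missing a different temperature
demo = pd.DataFrame({
    '평균기온': [13.5, np.nan, 16.2, 16.5],
    '최저기온': [7.9, 7.9, np.nan, 11.2],
    '최고기온': [20.7, 22.0, 21.3, np.nan],
})
temp_cols = ['평균기온', '최저기온', '최고기온']

# how='any': drop a row if any listed column is missing
cleaned = demo.dropna(subset=temp_cols, how='any')

# how='all': drop a row only if every listed column is missing
only_all_missing = demo.dropna(subset=temp_cols, how='all')

print(len(demo), len(cleaned), len(only_all_missing))
```

If only one column is needed for a plot, passing just that column in `subset` keeps more rows than dropping on all three.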
df.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 39463 entries, 0 to 40220
Data columns (total 8 columns):
 #   Column  Non-Null Count  Dtype         
---  ------  --------------  -----         
 0   날짜      39463 non-null  datetime64[ns]
 1   지점      39463 non-null  int64         
 2   평균기온    39463 non-null  float64       
 3   최저기온    39463 non-null  float64       
 4   최고기온    39463 non-null  float64       
 5   Year    39463 non-null  int64         
 6   Month   39463 non-null  int64         
 7   Day     39463 non-null  int64         
dtypes: datetime64[ns](1), float64(3), int64(4)
memory usage: 2.7 MB
plt.style.use('ggplot')
plt.boxplot(df['평균기온'])
plt.title("1907년부터 2019년까지 서울의 평균기온")  # "Mean temperature in Seoul, 1907 to 2019"

# Text(0.5, 1.0, '1907년부터 2019년까지 서울의 평균기온')
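
To read the box plot, it helps to work out where the whiskers can end. The sketch below assumes matplotlib's default whisker rule of 1.5 times the interquartile range and reuses the 평균기온 quartiles, minimum, and maximum from the earlier `df.describe()` output. Those figures were computed before the rows with missing values were dropped, so the exact values after `dropna` may differ slightly.

```python
# Values of 평균기온 copied from the df.describe() output (computed before dropna)
q1, q3 = 2.6, 21.2          # 25% and 75% quartiles
t_min, t_max = -19.2, 33.7  # observed minimum and maximum

iqr = q3 - q1                  # interquartile range: the height of the box
lower_fence = q1 - 1.5 * iqr   # points below this would be drawn as outliers
upper_fence = q3 + 1.5 * iqr   # points above this would be drawn as outliers

has_outliers = (t_min < lower_fence) or (t_max > upper_fence)
print(round(iqr, 1), round(lower_fence, 1), round(upper_fence, 1), has_outliers)
```

Under these assumptions both extremes fall inside the fences, so the whiskers should reach the minimum and maximum with no separate outlier markers.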

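The Korean characters in the plot title come out broken with the default font. Setting a font that supports Korean fixes this.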
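
A minimal sketch of that font setup follows. The candidate font names are assumptions about what may be installed (Malgun Gothic on Windows, AppleGothic on macOS, NanumGothic or Noto Sans CJK KR on Linux), and `pick_korean_font` is a hypothetical helper, not part of matplotlib.

```python
import matplotlib.pyplot as plt
from matplotlib import font_manager

# Assumed candidate fonts; which of them exist depends on the operating system
CANDIDATES = ['Malgun Gothic', 'AppleGothic', 'NanumGothic', 'Noto Sans CJK KR']

def pick_korean_font(candidates=CANDIDATES):
    """Return the first candidate font known to matplotlib, or None if none is found."""
    installed = {font.name for font in font_manager.fontManager.ttflist}
    for name in candidates:
        if name in installed:
            return name
    return None

font_name = pick_korean_font()
if font_name is not None:
    # Use the Korean-capable font for all text drawn afterwards
    plt.rcParams['font.family'] = font_name

# Draw minus signs as ASCII hyphens, since some Korean fonts lack the Unicode minus glyph
plt.rcParams['axes.unicode_minus'] = False

print('using font:', font_name)
```

If `font_name` prints as `None`, none of the assumed fonts is installed and one has to be added to the system before Korean titles render correctly.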

 

 
