728x90
반응형
멕시코식당 주문 CHIPOTLE
1. 탐색 : 기본정보 살펴보기
- 데이터셋의 기본 정보
- 데이터셋의 행과열, 데이터
- 데이터셋의 수치적 특징 파악
- tsv 셀구분 \t tab 구분 sep = \t설정
import pandas as pd
df = pd.read_csv('chipotle.tsv', sep ='\t')
df.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 4622 entries, 0 to 4621
Data columns (total 5 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 order_id 4622 non-null int64
1 quantity 4622 non-null int64
2 item_name 4622 non-null object
3 choice_description 3376 non-null object
4 item_price 4622 non-null object
dtypes: int64(2), object(3)
memory usage: 180.7+ KB
df.shape
# (4622, 5)
df.head()
order_id quantity item_name choice_description item_price
0 1 1 Chips and Fresh Tomato Salsa NaN $2.39
1 1 1 Izze [Clementine] $3.39
2 1 1 Nantucket Nectar [Apple] $3.39
3 1 1 Chips and Tomatillo-Green Chili Salsa NaN $2.39
4 2 2 Chicken Bowl [Tomatillo-Red Chili Salsa (Hot), [Black Beans... $16.98
df.columns
Index(['order_id', 'quantity', 'item_name', 'choice_description',
'item_price'],
dtype='object')
df.index
# RangeIndex(start=0, stop=4622, step=1)
수치적 특징 파악
df.describe()
order_id quantity
count 4622.000000 4622.000000
mean 927.254868 1.075725
std 528.890796 0.410186
min 1.000000 1.000000
25% 477.250000 1.000000
50% 926.000000 1.000000
75% 1393.000000 1.000000
max 1834.000000 15.000000
# df['order_id'] = str(df['order_id'])
df['order_id'] = df['order_id'].astype(str)
df.describe()
quantity
count 4622.000000
mean 1.075725
std 0.410186
min 1.000000
25% 1.000000
50% 1.000000
75% 1.000000
max 15.000000
# item_name 종류 출력하기
df['item_name'].unique()
print(len(df['item_name'].unique()))
# 50
탐색과 시각화
# item별 주문 수량 조회 가장많이 주문 top10개 출력
df['item_name'].value_counts()[:10]
Chicken Bowl 726
Chicken Burrito 553
Chips and Guacamole 479
Steak Burrito 368
Canned Soft Drink 301
Chips 211
Steak Bowl 211
Bottled Water 162
Chicken Soft Tacos 115
Chips and Fresh Tomato Salsa 110
Name: item_name, dtype: int64
# 아이템당 주문 갯수 조회
print(df.groupby('item_name')['order_id'].count())
item_name
6 Pack Soft Drink 54
Barbacoa Bowl 66
Barbacoa Burrito 91
Barbacoa Crispy Tacos 11
Barbacoa Salad Bowl 10
Barbacoa Soft Tacos 25
Bottled Water 162
Bowl 2
Burrito 6
Canned Soda 104
Canned Soft Drink 301
Carnitas Bowl 68
Carnitas Burrito 59
Carnitas Crispy Tacos 7
Carnitas Salad 1
Carnitas Salad Bowl 6
Carnitas Soft Tacos 40
Chicken Bowl 726
Chicken Burrito 553
Chicken Crispy Tacos 47
Chicken Salad 9
Chicken Salad Bowl 110
Chicken Soft Tacos 115
Chips 211
Chips and Fresh Tomato Salsa 110
Chips and Guacamole 479
Chips and Mild Fresh Tomato Salsa 1
Chips and Roasted Chili Corn Salsa 22
Chips and Roasted Chili-Corn Salsa 18
Chips and Tomatillo Green Chili Salsa 43
Chips and Tomatillo Red Chili Salsa 48
Chips and Tomatillo-Green Chili Salsa 31
Chips and Tomatillo-Red Chili Salsa 20
Crispy Tacos 2
Izze 20
Nantucket Nectar 27
Salad 2
Side of Chips 101
Steak Bowl 211
Steak Burrito 368
Steak Crispy Tacos 35
Steak Salad 4
Steak Salad Bowl 29
Steak Soft Tacos 55
Veggie Bowl 85
Veggie Burrito 95
Veggie Crispy Tacos 1
Veggie Salad 6
Veggie Salad Bowl 18
Veggie Soft Tacos 7
Name: order_id, dtype: int64
print(df['order_id'].unique())
print(len(df['order_id'].unique()))
# ['1' '2' '3' ... '1832' '1833' '1834']
# 1834
print(df['order_id'].tail())
# ['1' '2' '3' ... '1832' '1833' '1834']
# 1834
print(df['order_id'].tail())
4617 1833
4618 1833
4619 1834
4620 1834
4621 1834
Name: order_id, dtype: object
# 간단한 시각화
# item당 주문 개수를 시각화로 출력하기
import matplotlib.pyplot as plt
order_count =df.groupby('item_name')['order_id'].count()
order_count = order_count.sort_values(ascending=False)
print(order_count)
item_name
Chicken Bowl 726
Chicken Burrito 553
Chips and Guacamole 479
Steak Burrito 368
Canned Soft Drink 301
Steak Bowl 211
Chips 211
Bottled Water 162
Chicken Soft Tacos 115
Chicken Salad Bowl 110
Chips and Fresh Tomato Salsa 110
Canned Soda 104
Side of Chips 101
Veggie Burrito 95
Barbacoa Burrito 91
Veggie Bowl 85
Carnitas Bowl 68
Barbacoa Bowl 66
Carnitas Burrito 59
Steak Soft Tacos 55
6 Pack Soft Drink 54
Chips and Tomatillo Red Chili Salsa 48
Chicken Crispy Tacos 47
Chips and Tomatillo Green Chili Salsa 43
Carnitas Soft Tacos 40
Steak Crispy Tacos 35
Chips and Tomatillo-Green Chili Salsa 31
Steak Salad Bowl 29
Nantucket Nectar 27
Barbacoa Soft Tacos 25
Chips and Roasted Chili Corn Salsa 22
Chips and Tomatillo-Red Chili Salsa 20
Izze 20
Chips and Roasted Chili-Corn Salsa 18
Veggie Salad Bowl 18
Barbacoa Crispy Tacos 11
Barbacoa Salad Bowl 10
Chicken Salad 9
Veggie Soft Tacos 7
Carnitas Crispy Tacos 7
Burrito 6
Carnitas Salad Bowl 6
Veggie Salad 6
Steak Salad 4
Salad 2
Bowl 2
Crispy Tacos 2
Chips and Mild Fresh Tomato Salsa 1
Veggie Crispy Tacos 1
Carnitas Salad 1
Name: order_id, dtype: int64
item_name_list = order_count.index.tolist()
order_cnt = order_count.values.tolist()
plt.style.use('ggplot')
fig = plt.figure(figsize = (20,5))
ax = fig.add_subplot(1,1,1)
ax.bar(item_name_list, order_cnt, align = 'center')
plt.ylabel('ordered_item_count')
plt.xlabel('item Name')
ax.set_xticklabels(item_name_list, rotation = 45)
plt.title('Distribution of all ordered item')
plt.show()
item_qty = df.groupby('item_name')['quantity'].sum()
item_qty = item_qty.sort_values(ascending=False)[:10]
item_name_list = item_qty.index.tolist()
sell_cnt = item_qty.values.tolist()
plt.style.use('ggplot')
fig = plt.figure(figsize = (20,5))
ax = fig.add_subplot(1,1,1)
ax.bar(item_name_list, sell_cnt, align = 'center')
plt.ylabel('selled_item_count')
plt.xlabel('item Name')
ax.set_xticklabels(item_name_list, rotation = 45)
plt.title('Distribution of all sell item')
plt.show()
df['item_price']
0 $2.39
1 $3.39
2 $3.39
3 $2.39
4 $16.98
...
4617 $11.75
4618 $11.75
4619 $11.25
4620 $8.75
4621 $8.75
Name: item_price, Length: 4622, dtype: object
# 숫자형으로 바꾸려면
df['item_price'] = df['item_price'].apply(lambda x : float(x[1:]))
df['item_price'
0 2.39
1 3.39
2 3.39
3 2.39
4 16.98
Name: item_price, dtype: float64
df['item_price'].describe()
count 4622.000000
mean 7.464336
std 4.245557
min 1.090000
25% 3.390000
50% 8.750000
75% 9.250000
max 44.250000
Name: item_price, dtype: float64
# 주문당 평균 계산금액
df[['order_id','item_price']].head()
order_id item_price
0 1 2.39
1 1 3.39
2 1 3.39
3 1 2.39
4 2 16.98
df.groupby('order_id')['item_price'].sum().mean()
# 18.811428571428717
df.groupby('order_id')['item_price'].sum().describe()
count 1834.000000
mean 18.811429
std 11.652512
min 10.080000
25% 12.572500
50% 16.200000
75% 21.960000
max 205.250000
Name: item_price, dtype: float64
order_group_tot = df.groupby('order_id').sum()
order_group_tot
quantity item_price
order_id
1 4 11.56
10 2 13.20
100 2 10.08
1000 2 20.50
1001 2 10.08
... ... ...
995 3 24.95
996 4 43.00
997 2 22.50
998 2 10.88
999 5 29.25
results = order_group_tot[order_group_tot.item_price >= 50]
results
quantity item_price
order_id
1006 8 71.40
1051 7 59.35
1360 6 52.18
1443 35 160.74
1449 11 95.39
1454 10 85.24
1483 14 139.00
1559 16 82.44
1586 6 51.20
1660 19 70.25
1764 7 55.55
1786 20 114.30
1825 6 66.50
195 8 81.00
205 12 109.90
343 7 54.05
488 6 50.90
491 10 102.00
511 17 104.59
561 10 52.45
577 11 71.15
691 11 118.25
759 18 86.30
818 9 70.85
848 7 55.45
916 8 52.45
926 23 205.25
953 8 81.14
print(results.index.values)
['1006' '1051' '1360' '1443' '1449' '1454' '1483' '1559' '1586' '1660'
'1764' '1786' '1825' '195' '205' '343' '488' '491' '511' '561' '577'
'691' '759' '818' '848' '916' '926' '953']
df[df['order_id'] == '1006']
order_id quantity item_name choice_description item_price
2529 1006 1 Chicken Soft Tacos [Fresh Tomato Salsa, [Sour Cream, Lettuce]] 8.75
2530 1006 1 Veggie Bowl [Fresh Tomato Salsa, [Rice, Black Beans, Chees... 11.25
2531 1006 1 Steak Bowl [Fresh Tomato Salsa, [Rice, Cheese, Sour Cream... 9.25
2532 1006 1 Chicken Burrito [Fresh Tomato Salsa, [Rice, Cheese, Sour Cream... 11.25
2533 1006 1 Chicken Bowl [Fresh Tomato Salsa, [Fajita Vegetables, Rice,... 8.75
2534 1006 1 Chicken Burrito [Fresh Tomato Salsa, [Rice, Black Beans, Chees... 8.75
2535 1006 1 Chicken Burrito [Fresh Tomato Salsa, [Rice, Black Beans, Chees... 11.25
2536 1006 1 Chips NaN 2.15
df.loc[df['order_id'].isin(results.index.values)]
order_id quantity item_name choice_description item_price
448 195 1 Chicken Burrito [Fresh Tomato Salsa, [Rice, Cheese, Sour Cream... 8.75
449 195 1 Chicken Bowl [Fresh Tomato Salsa, [Rice, Black Beans, Chees... 8.75
450 195 1 Barbacoa Burrito [Fresh Tomato Salsa, [Fajita Vegetables, Rice,... 11.75
451 195 1 Chicken Burrito [Fresh Tomato Salsa, [Fajita Vegetables, Rice,... 8.75
452 195 1 Chicken Burrito [Fresh Tomato Salsa, [Rice, Black Beans, Chees... 11.25
... ... ... ... ... ...
4590 1825 1 Chicken Bowl [Roasted Chili Corn Salsa, [Rice, Black Beans,... 11.25
4591 1825 1 Chicken Bowl [Tomatillo Red Chili Salsa, [Rice, Black Beans... 8.75
4592 1825 1 Barbacoa Burrito [Tomatillo Red Chili Salsa, [Rice, Fajita Vege... 11.75
4593 1825 1 Carnitas Bowl [Roasted Chili Corn Salsa, [Rice, Sour Cream, ... 11.75
4594 1825 1 Barbacoa Bowl [Roasted Chili Corn Salsa, [Pinto Beans, Sour ... 11.75
- 6명이 해야할 것,
- 마스크 살수있는 것
- 세부적으로, 화장실 위치,
- 내가 정말 필요할까?,
- 출산율이 떨어지니깐 각지역별 출산율 > 각정보가지고 분석
- 왜 특이점이 생겼는가? 하나가 아니라 관련된 것 모두 가져오기
- 얼굴인식, 딥러닝
# 각 단가 구하기
item_two_price = df[df.quantity == 2]
item_two_price[['quantity','item_name','item_price']]
quantity item_name item_price
4 2 Chicken Bowl 16.98
18 2 Canned Soda 2.18
51 2 Canned Soda 2.18
135 2 Chicken Salad Bowl 22.50
148 2 Steak Burrito 17.98
... ... ... ...
4435 2 Chicken Bowl 17.50
4499 2 Canned Soft Drink 2.50
4560 2 Canned Soft Drink 2.50
4561 2 Chicken Salad Bowl 17.50
4582 2 Bottled Water 3.00
price_one_item = df.groupby('item_name').min()
price_one_item = price_one_item['item_price']
price_one_item
item_name
6 Pack Soft Drink 6.49
Barbacoa Bowl 8.69
Barbacoa Burrito 8.69
Barbacoa Crispy Tacos 8.99
Barbacoa Salad Bowl 9.39
Barbacoa Soft Tacos 8.99
Bottled Water 1.09
Bowl 7.40
Burrito 7.40
Canned Soda 1.09
Canned Soft Drink 1.25
Carnitas Bowl 8.99
Carnitas Burrito 8.69
Carnitas Crispy Tacos 8.99
Carnitas Salad 8.99
Carnitas Salad Bowl 9.39
Carnitas Soft Tacos 8.99
Chicken Bowl 8.19
Chicken Burrito 8.19
Chicken Crispy Tacos 8.49
Chicken Salad 8.19
Chicken Salad Bowl 8.75
Chicken Soft Tacos 8.49
Chips 1.99
Chips and Fresh Tomato Salsa 2.29
Chips and Guacamole 3.89
Chips and Mild Fresh Tomato Salsa 3.00
Chips and Roasted Chili Corn Salsa 2.95
Chips and Roasted Chili-Corn Salsa 2.39
Chips and Tomatillo Green Chili Salsa 2.95
Chips and Tomatillo Red Chili Salsa 2.95
Chips and Tomatillo-Green Chili Salsa 2.39
Chips and Tomatillo-Red Chili Salsa 2.39
Crispy Tacos 7.40
Izze 3.39
Nantucket Nectar 3.39
Salad 7.40
Side of Chips 1.69
Steak Bowl 8.69
Steak Burrito 8.69
Steak Crispy Tacos 8.69
Steak Salad 8.69
Steak Salad Bowl 9.39
Steak Soft Tacos 8.99
Veggie Bowl 8.49
Veggie Burrito 8.49
Veggie Crispy Tacos 8.49
Veggie Salad 8.49
Veggie Salad Bowl 8.75
Veggie Soft Tacos 8.49
Name: item_price, dtype: float64
plt.hist(price_one_item)
plt.ylabel('counts')
plt.title('Histogram of item price')
plt.show()
price_top10 = price_one_item.sort_values(ascending = False)
price_top10
item_name
Carnitas Salad Bowl 9.39
Steak Salad Bowl 9.39
Barbacoa Salad Bowl 9.39
Barbacoa Soft Tacos 8.99
Carnitas Salad 8.99
Carnitas Crispy Tacos 8.99
Steak Soft Tacos 8.99
Carnitas Bowl 8.99
Carnitas Soft Tacos 8.99
Barbacoa Crispy Tacos 8.99
Veggie Salad Bowl 8.75
Chicken Salad Bowl 8.75
Carnitas Burrito 8.69
Barbacoa Burrito 8.69
Barbacoa Bowl 8.69
Steak Bowl 8.69
Steak Burrito 8.69
Steak Salad 8.69
Steak Crispy Tacos 8.69
Veggie Soft Tacos 8.49
Chicken Crispy Tacos 8.49
Veggie Bowl 8.49
Veggie Burrito 8.49
Veggie Crispy Tacos 8.49
Veggie Salad 8.49
Chicken Soft Tacos 8.49
Chicken Salad 8.19
Chicken Burrito 8.19
Chicken Bowl 8.19
Burrito 7.40
Bowl 7.40
Salad 7.40
Crispy Tacos 7.40
6 Pack Soft Drink 6.49
Chips and Guacamole 3.89
Izze 3.39
Nantucket Nectar 3.39
Chips and Mild Fresh Tomato Salsa 3.00
Chips and Roasted Chili Corn Salsa 2.95
Chips and Tomatillo Green Chili Salsa 2.95
Chips and Tomatillo Red Chili Salsa 2.95
Chips and Roasted Chili-Corn Salsa 2.39
Chips and Tomatillo-Green Chili Salsa 2.39
Chips and Tomatillo-Red Chili Salsa 2.39
Chips and Fresh Tomato Salsa 2.29
Chips 1.99
Side of Chips 1.69
Canned Soft Drink 1.25
Canned Soda 1.09
Bottled Water 1.09
Name: item_price, dtype: float64
df.groupby('order_id').sum()["item_price"].sort_values(ascending = False)[:5]
order_id
926 205.25
1443 160.74
1483 139.00
691 118.25
1786 114.30
Name: item_price, dtype: float64
result1 = df.groupby('order_id').sum()
result2 = result1['item_price']
df.groupby('order_id').sum().sort_values(by = 'item_price', ascending = False)[:5]
result1 = df.groupby('order_id').sum()
result2 = result1.sort_values(by="item_price",ascending = False)
df.groupby('order_id').sum().sort_values(by = 'quantity', ascending = False)[:10]
quantity item_price
order_id
1443 35 160.74
926 23 205.25
1786 20 114.30
1660 19 70.25
759 18 86.30
511 17 104.59
1559 16 82.44
1483 14 139.00
205 12 109.90
691 11 118.25
df_salad = df[df['item_name'] == 'Veggie Salad Bowl']
df_salad['item_name']
df_salad.groupby('order_id').sum()
print(len(df_salad.groupby('order_id').sum()))
# 18
df_salad.groupby('order_id').sum().shape[0]
# 18
df_salad = df_salad.drop_duplicates(['item_name', 'order_id'])
print(len(df_salad))
# 18
반응형
'Data_Science > Data_Analysis_Py' 카테고리의 다른 글
18. 세계음주 데이터 분석 (0) | 2021.11.03 |
---|---|
17. 서울 기온 분석 (0) | 2021.11.02 |
15. 스크래핑 (0) | 2021.10.28 |
14. Stockprice (2 (0) | 2021.10.26 |
13. Stockprice (0) | 2021.10.26 |