Python - Pandas Practice 2
Read the winemag-data.csv file into a DataFrame named reviews.
import pandas as pd
reviews = pd.read_csv("winemag-data.csv", index_col=0)
Set the title column as the index.
# reviews.set_index('title', inplace=True)
reviews = reviews.set_index('title')
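The commented-out line and the active line are two routes to the same result: by default `set_index` returns a new DataFrame and leaves the original alone, while `inplace=True` modifies the existing object and returns `None`. A minimal sketch on a made-up two-row frame (`toy` and its values are assumptions for illustration, not rows from the wine data):

```python
import pandas as pd

# Hypothetical toy data, not taken from winemag-data.csv
toy = pd.DataFrame({"title": ["Wine A", "Wine B"], "points": [87, 90]})

indexed = toy.set_index("title")   # returns a new DataFrame
print(toy.columns.tolist())        # the original still has 'title' as a column
print(indexed.index.name)          # in the new frame, 'title' is the index
```

Reassigning the result, as the tutorial does, keeps the code chainable and avoids accidentally working with a `None` return value.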
First, check whether any values are missing.
reviews.isna().sum()
country 63
description 0
designation 37465
points 0
price 8996
province 63
region_1 21247
region_2 79460
taster_name 26244
taster_twitter_handle 31213
variety 1
winery 0
dtype: int64
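The count works in two steps: `isna()` produces a frame of booleans, and `sum()` adds them up column by column with `True` counted as 1. A small sketch with invented values (the `toy` frame is an assumption, used only to make the steps visible):

```python
import numpy as np
import pandas as pd

# Hypothetical toy data with a few missing entries
toy = pd.DataFrame({
    "country": ["US", None, "Italy"],
    "price": [15.0, np.nan, np.nan],
})

mask = toy.isna()       # True wherever a value is missing
missing = mask.sum()    # per-column count of True values
print(missing)
```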
Next, build a dataset that excludes the rows with no price.
data_set = reviews.dropna(subset=['price'])
data_set.head()
title | country | description | designation | points | price | province | region_1 | region_2 | taster_name | taster_twitter_handle | variety | winery |
---|---|---|---|---|---|---|---|---|---|---|---|---|
Quinta dos Avidagos 2011 Avidagos Red (Douro) | Portugal | This is ripe and fruity, a wine that is smooth... | Avidagos | 87 | 15.0 | Douro | NaN | NaN | Roger Voss | @vossroger | Portuguese Red | Quinta dos Avidagos |
Rainstorm 2013 Pinot Gris (Willamette Valley) | US | Tart and snappy, the flavors of lime flesh and... | NaN | 87 | 14.0 | Oregon | Willamette Valley | Willamette Valley | Paul Gregutt | @paulgwine | Pinot Gris | Rainstorm |
St. Julian 2013 Reserve Late Harvest Riesling (Lake Michigan Shore) | US | Pineapple rind, lemon pith and orange blossom ... | Reserve Late Harvest | 87 | 13.0 | Michigan | Lake Michigan Shore | NaN | Alexander Peartree | NaN | Riesling | St. Julian |
Sweet Cheeks 2012 Vintner's Reserve Wild Child Block Pinot Noir (Willamette Valley) | US | Much like the regular bottling from 2012, this... | Vintner's Reserve Wild Child Block | 87 | 65.0 | Oregon | Willamette Valley | Willamette Valley | Paul Gregutt | @paulgwine | Pinot Noir | Sweet Cheeks |
Tandem 2011 Ars In Vitro Tempranillo-Merlot (Navarra) | Spain | Blackberry and raspberry aromas show a typical... | Ars In Vitro | 87 | 15.0 | Northern Spain | Navarra | NaN | Michael Schachner | @wineschach | Tempranillo-Merlot | Tandem |
reviews.shape
(129971, 12)
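Note that `reviews.shape` still reports all 129971 rows: `dropna` returned a new frame (`data_set`) and did not change `reviews`. Given the 8996 missing prices counted earlier, `data_set` should hold 129971 - 8996 = 120975 rows. The sketch below shows the same behavior on invented data (the `toy` frame and wine names are assumptions); only rows missing a value in the `subset` column are dropped, even if other columns contain NaN.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data: one row lacks a price, two rows lack region_2
toy = pd.DataFrame(
    {"price": [15.0, np.nan, 13.0], "region_2": [np.nan, "Sonoma", np.nan]},
    index=["Wine A", "Wine B", "Wine C"],
)

priced = toy.dropna(subset=["price"])  # drop rows only where price is missing
print(toy.shape)     # original is unchanged
print(priced.shape)  # one row fewer
```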
Add a new column named critic to reviews and fill it with the value everyone.
reviews['critic'] = 'everyone'
reviews.head(2)
title | country | description | designation | points | price | province | region_1 | region_2 | taster_name | taster_twitter_handle | variety | winery | critic |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Nicosia 2013 Vulkà Bianco (Etna) | Italy | Aromas include tropical fruit, broom, brimston... | Vulkà Bianco | 87 | NaN | Sicily & Sardinia | Etna | NaN | Kerin O’Keefe | @kerinokeefe | White Blend | Nicosia | everyone |
Quinta dos Avidagos 2011 Avidagos Red (Douro) | Portugal | This is ripe and fruity, a wine that is smooth... | Avidagos | 87 | 15.0 | Douro | NaN | NaN | Roger Voss | @vossroger | Portuguese Red | Quinta dos Avidagos | everyone |
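Assigning a single string to a new column name works because pandas broadcasts the scalar to every row. A minimal sketch on invented data (the `toy` frame is an assumption for illustration):

```python
import pandas as pd

# Hypothetical toy data
toy = pd.DataFrame({"points": [87, 90, 92]}, index=["Wine A", "Wine B", "Wine C"])

toy["critic"] = "everyone"   # the scalar is repeated for all three rows
print(toy)
```

Because `data_set` was created earlier as a separate frame, it does not pick up the new column; only `reviews` does.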
The points column in reviews is numeric. Check its basic summary statistics (mean, max, min, and so on).
reviews["points"].describe()
count 129971.000000
mean 88.447138
std 3.039730
min 80.000000
25% 86.000000
50% 88.000000
75% 91.000000
max 100.000000
Name: points, dtype: float64
Get the data for the wines that scored 100 points.
wine_data = reviews.loc[reviews['points'] == 100]
wine_data.head()
title | country | description | designation | points | price | province | region_1 | region_2 | taster_name | taster_twitter_handle | variety | winery | critic |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Chambers Rosewood Vineyards NV Rare Muscat (Rutherglen) | Australia | This wine contains some material over 100 year... | Rare | 100 | 350.0 | Victoria | Rutherglen | NaN | Joe Czerwinski | @JoeCz | Muscat | Chambers Rosewood Vineyards | everyone |
Avignonesi 1995 Occhio di Pernice (Vin Santo di Montepulciano) | Italy | Thick as molasses and dark as caramelized brow... | Occhio di Pernice | 100 | 210.0 | Tuscany | Vin Santo di Montepulciano | NaN | NaN | NaN | Prugnolo Gentile | Avignonesi | everyone |
Krug 2002 Brut (Champagne) | France | This is a fabulous wine from the greatest Cham... | Brut | 100 | 259.0 | Champagne | Champagne | NaN | Roger Voss | @vossroger | Champagne Blend | Krug | everyone |
Tenuta dell'Ornellaia 2007 Masseto Merlot (Toscana) | Italy | A perfect wine from a classic vintage, the 200... | Masseto | 100 | 460.0 | Tuscany | Toscana | NaN | NaN | NaN | Merlot | Tenuta dell'Ornellaia | everyone |
Casa Ferreirinha 2008 Barca-Velha Red (Douro) | Portugal | This is the latest release of what has long be... | Barca-Velha | 100 | 450.0 | Douro | NaN | NaN | Roger Voss | @vossroger | Portuguese Red | Casa Ferreirinha | everyone |
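The filter is a two-step operation: the comparison builds a boolean Series aligned with the index, and `.loc` keeps the rows where it is `True`. A sketch on invented data (the `toy` frame and the `is_perfect` name are assumptions):

```python
import pandas as pd

# Hypothetical toy data
toy = pd.DataFrame({"points": [100, 87, 100]}, index=["Wine A", "Wine B", "Wine C"])

is_perfect = toy["points"] == 100   # boolean mask, one entry per row
perfect = toy.loc[is_perfect]       # keep only rows where the mask is True
print(is_perfect.tolist())
print(perfect.index.tolist())
```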
The taster_name column holds the reviewers' names. How many distinct people wrote reviews?
# The 'unique' row of describe() is the number of distinct names.
reviews['taster_name'].describe()
count 103727
unique 19
top Roger Voss
freq 25514
Name: taster_name, dtype: object
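If only the number of distinct names is needed, `Series.nunique()` returns it directly; like the `unique` row of `describe()`, it ignores missing values by default. A small sketch on an invented Series (the `tasters` sample is an assumption, not the real column):

```python
import numpy as np
import pandas as pd

# Hypothetical sample with one missing name and one repeated name
tasters = pd.Series(["Roger Voss", "Paul Gregutt", np.nan, "Roger Voss"])

print(tasters.nunique())               # 2: NaN is not counted
print(tasters.nunique(dropna=False))   # 3: NaN counted as its own value
```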
Compute the mean of the review points.
reviews["points"].mean()
88.44713820775404
List all of the tasters' names.
reviews["taster_name"].unique()
array(['Kerin O’Keefe', 'Roger Voss', 'Paul Gregutt',
'Alexander Peartree', 'Michael Schachner', 'Anna Lee C. Iijima',
'Virginie Boone', 'Matt Kettmann', nan, 'Sean P. Sullivan',
'Jim Gordon', 'Joe Czerwinski', 'Anne Krebiehl\xa0MW',
'Lauren Buzzeo', 'Mike DeSimone', 'Jeff Jenssen',
'Susan Kostrzewa', 'Carrie Dykes', 'Fiona Adams',
'Christina Pickard'], dtype=object)
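Two details in this output are easy to miss. The array has 20 entries, not 19, because `unique()` keeps `nan` as a value. And `'Anne Krebiehl\xa0MW'` contains a non-breaking space (`\xa0`) rather than a regular space, which matters when matching the name as a string. A hedged sketch of one way to handle both, on an invented Series (the `tasters` sample is an assumption):

```python
import numpy as np
import pandas as pd

# Hypothetical sample mirroring the two quirks seen in the real column
tasters = pd.Series(["Roger Voss", np.nan, "Anne Krebiehl\xa0MW", "Roger Voss"])

# Drop missing values before listing the distinct names
names = tasters.dropna().unique()

# Optionally normalize the non-breaking space to a regular space first
cleaned = tasters.str.replace("\xa0", " ", regex=False).dropna().unique()
print(list(names))
print(list(cleaned))
```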
Check how many wines each taster reviewed (taster name, count).
reviews['taster_name'].value_counts()
Roger Voss 25514
Michael Schachner 15134
Kerin O’Keefe 10776
Virginie Boone 9537
Paul Gregutt 9532
Matt Kettmann 6332
Joe Czerwinski 5147
Sean P. Sullivan 4966
Anna Lee C. Iijima 4415
Jim Gordon 4177
Anne Krebiehl MW 3685
Lauren Buzzeo 1835
Susan Kostrzewa 1085
Mike DeSimone 514
Jeff Jenssen 491
Alexander Peartree 415
Carrie Dykes 139
Fiona Adams 27
Christina Pickard 6
Name: taster_name, dtype: int64
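These 19 counts add up to 103727, the non-null count reported by `describe()` earlier, because `value_counts()` skips missing values by default. Passing `dropna=False` adds a row for them, which on the real data should show the 26244 reviews with no taster name. A sketch on an invented Series (the `tasters` sample is an assumption):

```python
import numpy as np
import pandas as pd

# Hypothetical sample: 2 + 1 named reviews and 3 with no taster
tasters = pd.Series(
    ["Roger Voss", "Paul Gregutt", np.nan, "Roger Voss", np.nan, np.nan]
)

counts = tasters.value_counts()                          # NaN excluded
counts_with_missing = tasters.value_counts(dropna=False) # NaN gets its own row
print(counts)
print(counts_with_missing)
```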
Compute the mean of the review points, then select only the rows whose points are above the mean (that is, the well-rated wines).
good_wine = reviews['points'] > reviews['points'].mean()
reviews.loc[good_wine].head()
title | country | description | designation | points | price | province | region_1 | region_2 | taster_name | taster_twitter_handle | variety | winery | critic |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Dopff & Irion 2004 Schoenenbourg Grand Cru Vendanges Tardives Riesling (Alsace) | France | Medium-gold in color. Complex and inviting nos... | Schoenenbourg Grand Cru Vendanges Tardives | 92 | 80.0 | Alsace | Alsace | NaN | NaN | NaN | Riesling | Dopff & Irion | everyone |
Ceretto 2003 Bricco Rocche Prapó (Barolo) | Italy | Slightly backward, particularly given the vint... | Bricco Rocche Prapó | 92 | 70.0 | Piedmont | Barolo | NaN | NaN | NaN | Nebbiolo | Ceretto | everyone |
Matrix 2007 Stuhlmuller Vineyard Chardonnay (Alexander Valley) | US | The vineyard is one of the better Chardonnay s... | Stuhlmuller Vineyard | 92 | 36.0 | California | Alexander Valley | Sonoma | NaN | NaN | Chardonnay | Matrix | everyone |
Mauritson 2007 Rockpile Cemetary Vineyard Zinfandel (Rockpile) | US | Defines Rockpile Zinfandel in intensity of fru... | Rockpile Cemetary Vineyard | 92 | 39.0 | California | Rockpile | Sonoma | NaN | NaN | Zinfandel | Mauritson | everyone |
Henry's Drive Vignerons 2006 Parson's Flat Shiraz-Cabernet Sauvignon (Padthaway) | Australia | The blend is roughly two-thirds Shiraz and one... | Parson's Flat | 92 | 40.0 | South Australia | Padthaway | NaN | Joe Czerwinski | @JoeCz | Shiraz-Cabernet Sauvignon | Henry's Drive Vignerons | everyone |
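Here `good_wine` is the boolean mask, not the filtered data, and the comparison is strict: with a mean of about 88.45, a wine needs 89 points or more to qualify if scores are whole numbers, as the samples suggest. A sketch of the same pattern on invented data, storing the mean first so it is computed once (the `toy` frame and the `avg` and `above_avg` names are assumptions):

```python
import pandas as pd

# Hypothetical toy data
toy = pd.DataFrame(
    {"points": [80, 88, 92, 100]},
    index=["Wine A", "Wine B", "Wine C", "Wine D"],
)

avg = toy["points"].mean()              # 90.0 for this toy data
above_avg = toy[toy["points"] > avg]    # strictly greater than the mean
print(avg)
print(above_avg.index.tolist())
```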