
Read the winemag-data.csv file into reviews.

import pandas as pd
# Use the first CSV column as the row index.
reviews = pd.read_csv("winemag-data.csv", index_col=0)

Set the index to the title column.

# In-place alternative: reviews.set_index('title', inplace=True)
reviews = reviews.set_index('title')

First, check whether any values are missing.

country                     63
description                  0
designation              37465
points                       0
price                     8996
province                    63
region_1                 21247
region_2                 79460
taster_name              26244
taster_twitter_handle    31213
variety                      1
winery                       0
dtype: int64
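
Per-column counts like the ones above are what `isna().sum()` produces. A minimal sketch on a tiny made-up frame (the `toy` table and its values are assumptions for illustration, not the real winemag data):

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the reviews table; the values are made up.
toy = pd.DataFrame({
    "country": ["Italy", None, "US"],
    "points": [87, 90, 85],
    "price": [np.nan, 15.0, np.nan],
})

# isna() marks every missing cell as True; sum() counts the Trues per column.
missing = toy.isna().sum()
print(missing)
```

On the real data the same call would be `reviews.isna().sum()`.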

Then drop the rows that have no price and keep the result as the dataset.

data_set = reviews.dropna(subset=['price'])
| title | country | description | designation | points | price | province | region_1 | region_2 | taster_name | taster_twitter_handle | variety | winery |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Quinta dos Avidagos 2011 Avidagos Red (Douro) | Portugal | This is ripe and fruity, a wine that is smooth... | Avidagos | 87 | 15.0 | Douro | NaN | NaN | Roger Voss | @vossroger | Portuguese Red | Quinta dos Avidagos |
| Rainstorm 2013 Pinot Gris (Willamette Valley) | US | Tart and snappy, the flavors of lime flesh and... | NaN | 87 | 14.0 | Oregon | Willamette Valley | Willamette Valley | Paul Gregutt | @paulgwine | Pinot Gris | Rainstorm |
| St. Julian 2013 Reserve Late Harvest Riesling (Lake Michigan Shore) | US | Pineapple rind, lemon pith and orange blossom ... | Reserve Late Harvest | 87 | 13.0 | Michigan | Lake Michigan Shore | NaN | Alexander Peartree | NaN | Riesling | St. Julian |
| Sweet Cheeks 2012 Vintner's Reserve Wild Child Block Pinot Noir (Willamette Valley) | US | Much like the regular bottling from 2012, this... | Vintner's Reserve Wild Child Block | 87 | 65.0 | Oregon | Willamette Valley | Willamette Valley | Paul Gregutt | @paulgwine | Pinot Noir | Sweet Cheeks |
| Tandem 2011 Ars In Vitro Tempranillo-Merlot (Navarra) | Spain | Blackberry and raspberry aromas show a typical... | Ars In Vitro | 87 | 15.0 | Northern Spain | Navarra | NaN | Michael Schachner | @wineschach | Tempranillo-Merlot | Tandem |
(129971, 12)
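
To see what `dropna(subset=...)` does to the row count, here is a small sketch on made-up data (the `toy` frame and the `priced` name are hypothetical). It also shows that the call returns a new frame and leaves the original untouched:

```python
import numpy as np
import pandas as pd

# Hypothetical three-row table; "wine B" has a price but no country.
toy = pd.DataFrame(
    {"country": ["Italy", None, "US"], "price": [np.nan, 15.0, 14.0]},
    index=["wine A", "wine B", "wine C"],
)

# Only rows whose price is missing are dropped; gaps in other columns are kept.
priced = toy.dropna(subset=["price"])

print(toy.shape)     # the original frame is unchanged
print(priced.shape)  # one row fewer
```

On the real data, the missing-value counts above imply that 129971 - 8996 = 120975 rows remain after the drop, so checking `data_set.shape` rather than `reviews.shape` is the way to confirm it.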

Add a new column named critic to reviews and fill it with the value everyone.

reviews['critic'] = 'everyone'
| title | country | description | designation | points | price | province | region_1 | region_2 | taster_name | taster_twitter_handle | variety | winery | critic |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Nicosia 2013 Vulkà Bianco (Etna) | Italy | Aromas include tropical fruit, broom, brimston... | Vulkà Bianco | 87 | NaN | Sicily & Sardinia | Etna | NaN | Kerin O’Keefe | @kerinokeefe | White Blend | Nicosia | everyone |
| Quinta dos Avidagos 2011 Avidagos Red (Douro) | Portugal | This is ripe and fruity, a wine that is smooth... | Avidagos | 87 | 15.0 | Douro | NaN | NaN | Roger Voss | @vossroger | Portuguese Red | Quinta dos Avidagos | everyone |

The points column in reviews is numeric. Check its basic summary statistics (mean, maximum, minimum, and so on).

count    129971.000000
mean         88.447138
std           3.039730
min          80.000000
25%          86.000000
50%          88.000000
75%          91.000000
max         100.000000
Name: points, dtype: float64
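
A summary in this shape comes from `Series.describe()`. A minimal sketch on five made-up scores (the `points` values here are assumptions, not taken from the dataset):

```python
import pandas as pd

# Hypothetical scores, chosen so the statistics are easy to verify by hand.
points = pd.Series([80, 86, 88, 91, 100], name="points")

# describe() returns count, mean, std, min, quartiles and max in one Series.
stats = points.describe()
print(stats)
```

On the real data the call would be `reviews['points'].describe()`.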

Get the data for the wines that scored 100 points.

wine_data = reviews.loc[reviews['points'] == 100,]
| title | country | description | designation | points | price | province | region_1 | region_2 | taster_name | taster_twitter_handle | variety | winery | critic |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Chambers Rosewood Vineyards NV Rare Muscat (Rutherglen) | Australia | This wine contains some material over 100 year... | Rare | 100 | 350.0 | Victoria | Rutherglen | NaN | Joe Czerwinski | @JoeCz | Muscat | Chambers Rosewood Vineyards | everyone |
| Avignonesi 1995 Occhio di Pernice (Vin Santo di Montepulciano) | Italy | Thick as molasses and dark as caramelized brow... | Occhio di Pernice | 100 | 210.0 | Tuscany | Vin Santo di Montepulciano | NaN | NaN | NaN | Prugnolo Gentile | Avignonesi | everyone |
| Krug 2002 Brut (Champagne) | France | This is a fabulous wine from the greatest Cham... | Brut | 100 | 259.0 | Champagne | Champagne | NaN | Roger Voss | @vossroger | Champagne Blend | Krug | everyone |
| Tenuta dell'Ornellaia 2007 Masseto Merlot (Toscana) | Italy | A perfect wine from a classic vintage, the 200... | Masseto | 100 | 460.0 | Tuscany | Toscana | NaN | NaN | NaN | Merlot | Tenuta dell'Ornellaia | everyone |
| Casa Ferreirinha 2008 Barca-Velha Red (Douro) | Portugal | This is the latest release of what has long be... | Barca-Velha | 100 | 450.0 | Douro | NaN | NaN | Roger Voss | @vossroger | Portuguese Red | Casa Ferreirinha | everyone |

The taster_name column holds the reviewers' names. How many people wrote reviews?

# Number of unique values.
count         103727
unique            19
top       Roger Voss
freq           25514
Name: taster_name, dtype: object
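
The `unique` row above is the answer: 19 tasters. When only that number is needed, `nunique()` returns it directly. A small sketch on made-up names (the `tasters` Series is a hypothetical stand-in for `reviews['taster_name']`):

```python
import numpy as np
import pandas as pd

# Hypothetical taster column with one missing entry.
tasters = pd.Series(
    ["Roger Voss", "Paul Gregutt", np.nan, "Roger Voss"],
    name="taster_name",
)

# nunique() counts distinct values and skips NaN by default.
n_tasters = tasters.nunique()

# describe() on a text column reports count, unique, top and freq.
summary = tasters.describe()

print(n_tasters)
print(summary)
```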

Compute the mean of the review points.
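
No code or output is shown for this step, so the following is only a sketch of one way to do it, using `Series.mean()` on a made-up column (the `toy` frame is an assumption for illustration):

```python
import pandas as pd

# Hypothetical scores; the real column is reviews['points'].
toy = pd.DataFrame({"points": [87, 90, 93]})

# mean() returns the arithmetic mean of the column as a float.
mean_points = toy["points"].mean()
print(mean_points)
```

On the real data, `reviews['points'].mean()` should agree with the `mean` row of the summary statistics shown earlier (88.447138).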


List all of the tasters' names.

array(['Kerin O’Keefe', 'Roger Voss', 'Paul Gregutt',
       'Alexander Peartree', 'Michael Schachner', 'Anna Lee C. Iijima',
       'Virginie Boone', 'Matt Kettmann', nan, 'Sean P. Sullivan',
       'Jim Gordon', 'Joe Czerwinski', 'Anne Krebiehl\xa0MW',
       'Lauren Buzzeo', 'Mike DeSimone', 'Jeff Jenssen',
       'Susan Kostrzewa', 'Carrie Dykes', 'Fiona Adams',
       'Christina Pickard'], dtype=object)
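
The array above has 20 entries because `unique()` keeps `nan` as a value, whereas the earlier count of 19 ignores it. A minimal sketch of the difference on made-up names (the `tasters` Series is hypothetical):

```python
import numpy as np
import pandas as pd

# Hypothetical taster column with one missing entry.
tasters = pd.Series(["Roger Voss", np.nan, "Paul Gregutt", "Roger Voss"])

# unique() lists values in order of first appearance and includes NaN.
names = tasters.unique()

# Dropping NaN first leaves only real names.
names_no_nan = tasters.dropna().unique()

print(len(names), len(names_no_nan))
```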

Check how many wines each taster reviewed (taster name, count).

Roger Voss            25514
Michael Schachner     15134
Kerin O’Keefe         10776
Virginie Boone         9537
Paul Gregutt           9532
Matt Kettmann          6332
Joe Czerwinski         5147
Sean P. Sullivan       4966
Anna Lee C. Iijima     4415
Jim Gordon             4177
Anne Krebiehl MW       3685
Lauren Buzzeo          1835
Susan Kostrzewa        1085
Mike DeSimone           514
Jeff Jenssen            491
Alexander Peartree      415
Carrie Dykes            139
Fiona Adams              27
Christina Pickard         6
Name: taster_name, dtype: int64
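
A name-to-count listing sorted from most to fewest reviews is what `value_counts()` returns. A small sketch on made-up names (the `tasters` Series is a hypothetical stand-in for `reviews['taster_name']`):

```python
import numpy as np
import pandas as pd

# Hypothetical taster column; one review has no taster recorded.
tasters = pd.Series([
    "Roger Voss", "Paul Gregutt", np.nan, "Roger Voss",
    "Roger Voss", "Paul Gregutt", "Jim Gordon",
])

# value_counts() tallies each name, sorts descending, and skips NaN by default.
counts = tasters.value_counts()
print(counts)
```

Because missing names are skipped, the counts add up to the number of reviews that have a taster, not to the total number of rows.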

Compute the mean of the review points, then select only the rows whose points are above the mean (that is, the well-rated wines).

good_wine = reviews['points'] > reviews['points'].mean()
reviews.loc[good_wine].head()
| title | country | description | designation | points | price | province | region_1 | region_2 | taster_name | taster_twitter_handle | variety | winery | critic |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Dopff & Irion 2004 Schoenenbourg Grand Cru Vendanges Tardives Riesling (Alsace) | France | Medium-gold in color. Complex and inviting nos... | Schoenenbourg Grand Cru Vendanges Tardives | 92 | 80.0 | Alsace | Alsace | NaN | NaN | NaN | Riesling | Dopff & Irion | everyone |
| Ceretto 2003 Bricco Rocche Prapó (Barolo) | Italy | Slightly backward, particularly given the vint... | Bricco Rocche Prapó | 92 | 70.0 | Piedmont | Barolo | NaN | NaN | NaN | Nebbiolo | Ceretto | everyone |
| Matrix 2007 Stuhlmuller Vineyard Chardonnay (Alexander Valley) | US | The vineyard is one of the better Chardonnay s... | Stuhlmuller Vineyard | 92 | 36.0 | California | Alexander Valley | Sonoma | NaN | NaN | Chardonnay | Matrix | everyone |
| Mauritson 2007 Rockpile Cemetary Vineyard Zinfandel (Rockpile) | US | Defines Rockpile Zinfandel in intensity of fru... | Rockpile Cemetary Vineyard | 92 | 39.0 | California | Rockpile | Sonoma | NaN | NaN | Zinfandel | Mauritson | everyone |
| Henry's Drive Vignerons 2006 Parson's Flat Shiraz-Cabernet Sauvignon (Padthaway) | Australia | The blend is roughly two-thirds Shiraz and one... | Parson's Flat | 92 | 40.0 | South Australia | Padthaway | NaN | Joe Czerwinski | @JoeCz | Shiraz-Cabernet Sauvignon | Henry's Drive Vignerons | everyone |
