Python - Pandas Practice 1
Read the file winemag-data_first150k.csv into reviews.
import pandas as pd
# read_csv already returns a DataFrame, so no extra pd.DataFrame() wrap is needed
reviews = pd.read_csv('winemag-data_first150k.csv', index_col=0)
reviews.head()
| | country | description | designation | points | price | province | region_1 | region_2 | variety | winery |
---|---|---|---|---|---|---|---|---|---|---|
0 | US | This tremendous 100% varietal wine hails from ... | Martha's Vineyard | 96 | 235.0 | California | Napa Valley | Napa | Cabernet Sauvignon | Heitz |
1 | Spain | Ripe aromas of fig, blackberry and cassis are ... | Carodorum Selección Especial Reserva | 96 | 110.0 | Northern Spain | Toro | NaN | Tinta de Toro | Bodega Carmen Rodríguez |
2 | US | Mac Watson honors the memory of a wine once ma... | Special Selected Late Harvest | 96 | 90.0 | California | Knights Valley | Sonoma | Sauvignon Blanc | Macauley |
3 | US | This spent 20 months in 30% new French oak, an... | Reserve | 96 | 65.0 | Oregon | Willamette Valley | Willamette Valley | Pinot Noir | Ponzi |
4 | France | This is the top wine from La Bégude, named aft... | La Brûlade | 95 | 66.0 | Provence | Bandol | NaN | Provence red blend | Domaine de la Bégude |
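To see what `index_col=0` changes, here is a minimal sketch that reads a tiny in-memory CSV instead of the real file. The three rows are made up for illustration, and the assumption is that the real file also keeps its row labels in an unnamed first column.

```python
import io
import pandas as pd

# Hypothetical three-row CSV standing in for the real file.
# Assumption: the first column is unnamed and holds the row labels.
csv_text = ",country,points\n0,US,96\n1,Spain,96\n2,France,95\n"

with_index = pd.read_csv(io.StringIO(csv_text), index_col=0)
without_index = pd.read_csv(io.StringIO(csv_text))

print(type(with_index).__name__)    # DataFrame
print(list(with_index.columns))     # ['country', 'points']
print(list(without_index.columns))  # ['Unnamed: 0', 'country', 'points']
```

Without `index_col=0` the label column shows up as an extra data column named `Unnamed: 0`.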
Rename the description column of reviews to desc.
reviews = reviews.rename(columns={'description': 'desc'})
reviews.head()
| | country | desc | designation | points | price | province | region_1 | region_2 | variety | winery |
---|---|---|---|---|---|---|---|---|---|---|
0 | US | This tremendous 100% varietal wine hails from ... | Martha's Vineyard | 96 | 235.0 | California | Napa Valley | Napa | Cabernet Sauvignon | Heitz |
1 | Spain | Ripe aromas of fig, blackberry and cassis are ... | Carodorum Selección Especial Reserva | 96 | 110.0 | Northern Spain | Toro | NaN | Tinta de Toro | Bodega Carmen Rodríguez |
2 | US | Mac Watson honors the memory of a wine once ma... | Special Selected Late Harvest | 96 | 90.0 | California | Knights Valley | Sonoma | Sauvignon Blanc | Macauley |
3 | US | This spent 20 months in 30% new French oak, an... | Reserve | 96 | 65.0 | Oregon | Willamette Valley | Willamette Valley | Pinot Noir | Ponzi |
4 | France | This is the top wine from La Bégude, named aft... | La Brûlade | 95 | 66.0 | Provence | Bandol | NaN | Provence red blend | Domaine de la Bégude |
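The result of `rename` is assigned back to `reviews` because, by default, `rename` returns a new DataFrame and leaves the original untouched. A small sketch on a made-up two-row frame (the name `toy` and its values are illustrative only):

```python
import pandas as pd

# Toy frame with made-up values, only to illustrate rename().
toy = pd.DataFrame({"country": ["US", "Spain"], "description": ["a", "b"]})

renamed = toy.rename(columns={"description": "desc"})

print(list(toy.columns))      # ['country', 'description'] (unchanged)
print(list(renamed.columns))  # ['country', 'desc']
```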
Store the first value of the description column (now named desc) in a variable called first_description.
first_description = reviews['desc'][0]
first_description
'This tremendous 100% varietal wine hails from Oakville and was aged over three years in oak. Juicy red-cherry fruit and a compelling hint of caramel greet the palate, framed by elegant, fine tannins and a subtle minty tone in the background. Balanced and rewarding from start to finish, it has years ahead of it to develop further nuance. Enjoy 2022–2030.'
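`reviews['desc'][0]` works here because the row labels are the integers 0, 1, 2, ..., so the label 0 is also the first row. When the labels differ from the positions, the two ideas need separate accessors. A hedged sketch on a made-up frame whose labels are 10, 20, 30:

```python
import pandas as pd

# Made-up frame whose labels deliberately do not start at 0.
toy = pd.DataFrame({"desc": ["first", "second", "third"]}, index=[10, 20, 30])

by_position = toy["desc"].iloc[0]  # first row, whatever its label is
by_label = toy.loc[10, "desc"]     # the row whose label is 10

print(by_position)  # first
print(by_label)     # first
```

On this toy frame `toy["desc"][0]` would look for a label 0 and fail, which is why `.iloc[0]` is the safer way to ask for "the first value".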
Store the first review (row) in a variable called first_row.
first_row = reviews.iloc[0]
first_row
country US
desc This tremendous 100% varietal wine hails from ...
designation Martha's Vineyard
points 96
price 235.0
province California
region_1 Napa Valley
region_2 Napa
variety Cabernet Sauvignon
winery Heitz
Name: 0, dtype: object
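The output above is a Series, not a DataFrame: selecting a single row with `iloc` turns the column names into the index of the result and uses the row label as its name. A minimal sketch with made-up values:

```python
import pandas as pd

# Made-up two-row frame for illustration.
toy = pd.DataFrame({"country": ["US", "Spain"], "points": [96, 95]})

row = toy.iloc[0]

print(type(row).__name__)  # Series
print(list(row.index))     # ['country', 'points']
print(row.name)            # 0
print(row["country"])      # US
```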
Store the first through tenth values of the description column of reviews in a variable called first_descriptions.
first_descriptions = reviews['desc'][:10]
first_descriptions
0 This tremendous 100% varietal wine hails from ...
1 Ripe aromas of fig, blackberry and cassis are ...
2 Mac Watson honors the memory of a wine once ma...
3 This spent 20 months in 30% new French oak, an...
4 This is the top wine from La Bégude, named aft...
5 Deep, dense and pure from the opening bell, th...
6 Slightly gritty black-fruit aromas include a s...
7 Lush cedary black-fruit aromas are luxe and of...
8 This re-named vineyard was formerly bottled as...
9 The producer sources from two blocks of the vi...
Name: desc, dtype: object
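The slice `[:10]` returns ten rows because a positional slice excludes its end point. A label-based slice with `.loc` includes it, so the equivalent would stop at 9. A short sketch on a made-up Series with the default 0..11 labels:

```python
import pandas as pd

# Made-up Series with 12 items and default labels 0..11.
s = pd.Series(list("abcdefghijkl"))

positional = s.iloc[:10]  # positions 0..9, end point excluded
label_based = s.loc[:9]   # labels 0..9, end point included

print(len(positional), len(label_based))  # 10 10
print(positional.equals(label_based))     # True
```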
Store the rows of reviews whose index is 1, 2, 3, 5, or 8 in a variable called sample_reviews.
sample_reviews = reviews.iloc[[1, 2, 3, 5, 8]]
sample_reviews
| | country | desc | designation | points | price | province | region_1 | region_2 | variety | winery |
---|---|---|---|---|---|---|---|---|---|---|
1 | Spain | Ripe aromas of fig, blackberry and cassis are ... | Carodorum Selección Especial Reserva | 96 | 110.0 | Northern Spain | Toro | NaN | Tinta de Toro | Bodega Carmen Rodríguez |
2 | US | Mac Watson honors the memory of a wine once ma... | Special Selected Late Harvest | 96 | 90.0 | California | Knights Valley | Sonoma | Sauvignon Blanc | Macauley |
3 | US | This spent 20 months in 30% new French oak, an... | Reserve | 96 | 65.0 | Oregon | Willamette Valley | Willamette Valley | Pinot Noir | Ponzi |
5 | Spain | Deep, dense and pure from the opening bell, th... | Numanthia | 95 | 73.0 | Northern Spain | Toro | NaN | Tinta de Toro | Numanthia |
8 | US | This re-named vineyard was formerly bottled as... | Silice | 95 | 65.0 | Oregon | Chehalem Mountains | Willamette Valley | Pinot Noir | Bergström |
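`iloc` selects by position, while the task is phrased in terms of index labels. The two agree here only because the labels run 0, 1, 2, ... in order. A sketch on a made-up frame with reversed labels shows how they can differ:

```python
import pandas as pd

# Made-up frame whose labels run 3, 2, 1, 0 from top to bottom.
toy = pd.DataFrame({"points": [90, 91, 92, 93]}, index=[3, 2, 1, 0])

by_position = toy.iloc[[1, 2]]  # second and third rows
by_label = toy.loc[[1, 2]]      # rows labelled 1 and 2, in that order

print(by_position["points"].tolist())  # [91, 92]
print(by_label["points"].tolist())     # [92, 91]
```

If the intent is "rows with these labels", `reviews.loc[[1, 2, 3, 5, 8]]` states it directly.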
Store a DataFrame that satisfies the following condition in a variable called df: take the rows whose index is 0, 1, 10, or 100, and keep only the columns country, province, region_1, and region_2.
sample = reviews.iloc[[0,1,10,100], ]
df = sample[["country","province","region_1","region_2"]]
df
| | country | province | region_1 | region_2 |
---|---|---|---|---|
0 | US | California | Napa Valley | Napa |
1 | Spain | Northern Spain | Toro | NaN |
10 | Italy | Northeastern Italy | Collio | NaN |
100 | US | California | South Coast | South Coast |
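The two-step selection above can also be written as a single `.loc` call that takes the row labels and the column names together. A minimal sketch on a made-up frame (the values are illustrative, not taken from the dataset):

```python
import pandas as pd

# Made-up three-row frame for illustration.
toy = pd.DataFrame({
    "country": ["US", "Spain", "Italy"],
    "province": ["California", "Northern Spain", "Tuscany"],
    "points": [96, 96, 90],
})

# Rows by label, columns by name, in one step.
subset = toy.loc[[0, 2], ["country", "province"]]

print(subset.shape)          # (2, 2)
print(list(subset.columns))  # ['country', 'province']
```

Applied to the exercise, this would read `reviews.loc[[0, 1, 10, 100], ["country", "province", "region_1", "region_2"]]`.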
Create a DataFrame called italian_wines containing the wines made in Italy.
is_Italy = reviews['country'] == 'Italy'
italian_wines = reviews[is_Italy]
italian_wines.head()
| | country | desc | designation | points | price | province | region_1 | region_2 | variety | winery |
---|---|---|---|---|---|---|---|---|---|---|
10 | Italy | Elegance, complexity and structure come togeth... | Ronco della Chiesa | 95 | 80.0 | Northeastern Italy | Collio | NaN | Friulano | Borgo del Tiglio |
32 | Italy | Underbrush, scorched earth, menthol and plum s... | Vigna Piaggia | 90 | NaN | Tuscany | Brunello di Montalcino | NaN | Sangiovese | Abbadia Ardenga |
35 | Italy | Forest floor, tilled soil, mature berry and a ... | Riserva | 90 | 135.0 | Tuscany | Brunello di Montalcino | NaN | Sangiovese | Carillon |
37 | Italy | Aromas of forest floor, violet, red berry and ... | NaN | 90 | 29.0 | Tuscany | Vino Nobile di Montepulciano | NaN | Sangiovese | Avignonesi |
38 | Italy | This has a charming nose that boasts rose, vio... | NaN | 90 | 23.0 | Tuscany | Chianti Classico | NaN | Sangiovese | Casina di Cornia |
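The comparison `reviews['country'] == 'Italy'` produces a boolean Series with one True/False per row, and indexing with it keeps only the True rows. The original row labels survive, which is why the output above starts at 10 rather than 0. A sketch on made-up data:

```python
import pandas as pd

# Made-up four-row frame for illustration.
toy = pd.DataFrame({"country": ["US", "Italy", "Spain", "Italy"],
                    "points": [96, 95, 96, 90]})

is_italy = toy["country"] == "Italy"  # boolean mask, one value per row
italian = toy[is_italy]               # keeps rows where the mask is True

print(is_italy.tolist())       # [False, True, False, True]
print(italian.index.tolist())  # [1, 3]
```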
Store a DataFrame of the wines that scored at least 95 points and were made in Australia or New Zealand in a variable called top_oceania_wines.
is_point = reviews['points'] >= 95
is_New = reviews['country'] == "New Zealand"
is_Aus = reviews['country'] == "Australia"
# Parentheses are required: & binds more tightly than |
condition = is_point & (is_Aus | is_New)
top_oceania_wines = reviews.loc[condition]
# Alternative for the country test: reviews['country'].isin(["Australia", "New Zealand"])
top_oceania_wines.head()
| | country | desc | designation | points | price | province | region_1 | region_2 | variety | winery |
---|---|---|---|---|---|---|---|---|---|---|
2148 | Australia | Full-bodied and plush yet vibrant and imbued w... | The Factor | 98 | 125.0 | South Australia | Barossa Valley | NaN | Shiraz | Torbreck |
2458 | Australia | This is a top example of the classic Australia... | The Peake | 96 | 150.0 | South Australia | McLaren Vale | NaN | Cabernet-Shiraz | Hickinbotham |
3033 | Australia | This Cabernet equivalent to Grange has explode... | Bin 707 | 95 | 500.0 | South Australia | South Australia | NaN | Cabernet Sauvignon | Penfolds |
3044 | Australia | From vines planted in 1912, this has been an i... | Mount Edelstone Vineyard | 95 | 200.0 | South Australia | Eden Valley | NaN | Shiraz | Henschke |
3047 | Australia | This is a throwback to those brash, flavor-exu... | One | 95 | 95.0 | South Australia | Langhorne Creek | NaN | Red Blend | Heartland |
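The `isin` alternative mentioned in the code comment can replace the two country comparisons. The sketch below, on made-up data, builds the filter both ways and checks that they select the same rows:

```python
import pandas as pd

# Made-up four-row frame for illustration.
toy = pd.DataFrame({
    "country": ["Australia", "New Zealand", "US", "Australia"],
    "points": [98, 94, 97, 95],
})

high_score = toy["points"] >= 95

# Version 1: two equality tests joined with | (note the parentheses).
explicit = toy.loc[high_score & ((toy["country"] == "Australia")
                                 | (toy["country"] == "New Zealand"))]

# Version 2: a single membership test with isin().
with_isin = toy.loc[high_score & toy["country"].isin(["Australia", "New Zealand"])]

print(with_isin.index.tolist())    # [0, 3]
print(explicit.equals(with_isin))  # True
```

`isin` scales better when the list of countries grows, since only the list changes.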