3 분 소요

Working with Time Series

Dates and Times in Python

Native Python dates and times: datetime and dateutil

from datetime import datetime
someday = datetime(2021,11,11)
someday.isoformat()
'2021-11-11T00:00:00'
someday.strftime('%A')
'Thursday'
someday.strftime('%Y/%m/%d')
'2021/11/11'
str_date = '2021-11-16'
from dateutil import parser
my_date = parser.parse(str_date)
my_date.strftime('%A')
'Tuesday'
my_date.weekday()
1

strftime section

datetime documentation

dateutil’s online documentation

Typed arrays of times: NumPy’s datetime64

기존의 파이썬 datetime 을 보강하기 위해, date 의 array 도 처리할 수 있게 numpy 에서 64-bit 로 처리하도록 라이브러리를 강화했음.

import numpy as np
any_date = np.array('2020-11-19',dtype=np.datetime64)
any_date
array('2020-11-19', dtype='datetime64[D]')
any_date + 10
numpy.datetime64('2020-11-29')
# 넘파이의 datetime64는 날짜 연산 + - 가 된다.
any_date - 35
numpy.datetime64('2020-10-15')
# 파이썬에서 기본적으로 제공하는 datetime은 연산이 안된다.
# someday + 10 = error
np.arange(10) + any_date
array(['2020-11-19', '2020-11-20', '2020-11-21', '2020-11-22',
       '2020-11-23', '2020-11-24', '2020-11-25', '2020-11-26',
       '2020-11-27', '2020-11-28'], dtype='datetime64[D]')
any_date + np.arange(0,70,7)
array(['2020-11-19', '2020-11-26', '2020-12-03', '2020-12-10',
       '2020-12-17', '2020-12-24', '2020-12-31', '2021-01-07',
       '2021-01-14', '2021-01-21'], dtype='datetime64[D]')
Code Meaning Time span (relative) Time span (absolute)
Y Year ± 9.2e18 years [9.2e18 BC, 9.2e18 AD]
M Month ± 7.6e17 years [7.6e17 BC, 7.6e17 AD]
W Week ± 1.7e17 years [1.7e17 BC, 1.7e17 AD]
D Day ± 2.5e16 years [2.5e16 BC, 2.5e16 AD]
h Hour ± 1.0e15 years [1.0e15 BC, 1.0e15 AD]
m Minute ± 1.7e13 years [1.7e13 BC, 1.7e13 AD]
s Second ± 2.9e12 years [ 2.9e9 BC, 2.9e9 AD]
ms Millisecond ± 2.9e9 years [ 2.9e6 BC, 2.9e6 AD]
us Microsecond ± 2.9e6 years [290301 BC, 294241 AD]
ns Nanosecond ± 292 years [ 1678 AD, 2262 AD]
ps Picosecond ± 106 days [ 1969 AD, 1970 AD]
fs Femtosecond ± 2.6 hours [ 1969 AD, 1970 AD]
as Attosecond ± 9.2 seconds [ 1969 AD, 1970 AD]

Dates and times in pandas: best of both worlds

import pandas as pd
dates = ['2021-01-04','2021-01-07','2021-01-08','2021-01-22']
dates2 = ['2021/01/04','2021/01/07','2021/01/08','2021/01/22']
dates3 = ['2021,01,04','2021,01,07','2021,01,08','2021,01,22']
pd.to_datetime(dates)
DatetimeIndex(['2021-01-04', '2021-01-07', '2021-01-08', '2021-01-22'], dtype='datetime64[ns]', freq=None)
pd.to_datetime(dates2)
DatetimeIndex(['2021-01-04', '2021-01-07', '2021-01-08', '2021-01-22'], dtype='datetime64[ns]', freq=None)
pd.to_datetime(dates3, format = '%Y,%m,%d')
DatetimeIndex(['2021-01-04', '2021-01-07', '2021-01-08', '2021-01-22'], dtype='datetime64[ns]', freq=None)
pd.to_timedelta(np.arange(10),'D')
TimedeltaIndex(['0 days', '1 days', '2 days', '3 days', '4 days', '5 days',
                '6 days', '7 days', '8 days', '9 days'],
               dtype='timedelta64[ns]', freq=None)
# 1일 단위
any_date + pd.to_timedelta(np.arange(10),'D')
DatetimeIndex(['2020-11-19', '2020-11-20', '2020-11-21', '2020-11-22',
               '2020-11-23', '2020-11-24', '2020-11-25', '2020-11-26',
               '2020-11-27', '2020-11-28'],
              dtype='datetime64[ns]', freq=None)
# 1주일 단위
any_date + pd.to_timedelta(np.arange(10),'W')
DatetimeIndex(['2020-11-19', '2020-11-26', '2020-12-03', '2020-12-10',
               '2020-12-17', '2020-12-24', '2020-12-31', '2021-01-07',
               '2021-01-14', '2021-01-21'],
              dtype='datetime64[ns]', freq=None)
# 1시간 단위
any_date + pd.to_timedelta(np.arange(10),'h')
DatetimeIndex(['2020-11-19 00:00:00', '2020-11-19 01:00:00',
               '2020-11-19 02:00:00', '2020-11-19 03:00:00',
               '2020-11-19 04:00:00', '2020-11-19 05:00:00',
               '2020-11-19 06:00:00', '2020-11-19 07:00:00',
               '2020-11-19 08:00:00', '2020-11-19 09:00:00'],
              dtype='datetime64[ns]', freq=None)

Pandas Time Series: Indexing by Time

dates
['2021-01-04', '2021-01-07', '2021-01-08', '2021-01-22']
date_index = pd.DatetimeIndex(dates)
date_index
DatetimeIndex(['2021-01-04', '2021-01-07', '2021-01-08', '2021-01-22'], dtype='datetime64[ns]', freq=None)
df = pd.Series(data=[6,2,1,8],index = date_index)
df
2021-01-04    6
2021-01-07    2
2021-01-08    1
2021-01-22    8
dtype: int64

Regular sequences: pd.date_range()

# 시작일과 종료일을 셋팅하면 알아서 날짜를 채워준다.
pd.date_range('2021-11-12','2021-12-29')
DatetimeIndex(['2021-11-12', '2021-11-13', '2021-11-14', '2021-11-15',
               '2021-11-16', '2021-11-17', '2021-11-18', '2021-11-19',
               '2021-11-20', '2021-11-21', '2021-11-22', '2021-11-23',
               '2021-11-24', '2021-11-25', '2021-11-26', '2021-11-27',
               '2021-11-28', '2021-11-29', '2021-11-30', '2021-12-01',
               '2021-12-02', '2021-12-03', '2021-12-04', '2021-12-05',
               '2021-12-06', '2021-12-07', '2021-12-08', '2021-12-09',
               '2021-12-10', '2021-12-11', '2021-12-12', '2021-12-13',
               '2021-12-14', '2021-12-15', '2021-12-16', '2021-12-17',
               '2021-12-18', '2021-12-19', '2021-12-20', '2021-12-21',
               '2021-12-22', '2021-12-23', '2021-12-24', '2021-12-25',
               '2021-12-26', '2021-12-27', '2021-12-28', '2021-12-29'],
              dtype='datetime64[ns]', freq='D')
# 시작일, 종료일 , 주기! 도 셋팅할 수 있다.
# 주 단위로
pd.date_range('2021-11-12','2021-12-29',freq ='W')
DatetimeIndex(['2021-11-14', '2021-11-21', '2021-11-28', '2021-12-05',
               '2021-12-12', '2021-12-19', '2021-12-26'],
              dtype='datetime64[ns]', freq='W-SUN')
# 매주 수요일 단위로
pd.date_range('2021-11-12','2021-12-29',freq = 'W-Wed')
DatetimeIndex(['2021-11-17', '2021-11-24', '2021-12-01', '2021-12-08',
               '2021-12-15', '2021-12-22', '2021-12-29'],
              dtype='datetime64[ns]', freq='W-WED')
# 시작일만 정하고, 갯수표시, 주기표시
pd.date_range('2021-11-07',periods=10, freq='M')
DatetimeIndex(['2021-11-30', '2021-12-31', '2022-01-31', '2022-02-28',
               '2022-03-31', '2022-04-30', '2022-05-31', '2022-06-30',
               '2022-07-31', '2022-08-31'],
              dtype='datetime64[ns]', freq='M')
# 월초로 변경
pd.date_range('2021-11-07',periods=10, freq='MS')
DatetimeIndex(['2021-12-01', '2022-01-01', '2022-02-01', '2022-03-01',
               '2022-04-01', '2022-05-01', '2022-06-01', '2022-07-01',
               '2022-08-01', '2022-09-01'],
              dtype='datetime64[ns]', freq='MS')
# 연도를 기준으로 변경
pd.date_range('2021-11-07',periods=10, freq='Y')
DatetimeIndex(['2021-12-31', '2022-12-31', '2023-12-31', '2024-12-31',
               '2025-12-31', '2026-12-31', '2027-12-31', '2028-12-31',
               '2029-12-31', '2030-12-31'],
              dtype='datetime64[ns]', freq='A-DEC')
# 연초로 변경
pd.date_range('2021-11-07',periods=10, freq='YS')
DatetimeIndex(['2022-01-01', '2023-01-01', '2024-01-01', '2025-01-01',
               '2026-01-01', '2027-01-01', '2028-01-01', '2029-01-01',
               '2030-01-01', '2031-01-01'],
              dtype='datetime64[ns]', freq='AS-JAN')
# 공휴일을 제외한 영업일(비즈니스 데이)로 변경
pd.date_range('2021-11-07',periods=10, freq='B')
DatetimeIndex(['2021-11-08', '2021-11-09', '2021-11-10', '2021-11-11',
               '2021-11-12', '2021-11-15', '2021-11-16', '2021-11-17',
               '2021-11-18', '2021-11-19'],
              dtype='datetime64[ns]', freq='B')

Frequencies and Offsets

Code Description Code Description
D Calendar day B Business day
W Weekly    
M Month end BM Business month end
Q Quarter end BQ Business quarter end
A Year end BA Business year end
H Hours BH Business hours
T Minutes    
S Seconds    
L Milliseonds    
U Microseconds    
N nanoseconds    

댓글남기기