gtime.feature_generation.Calendar

class gtime.feature_generation.Calendar(region: str = 'america', country: str = 'Brazil', start_date: str = '01/01/2018', end_date: str = '01/01/2020', kernel: Union[List[T], numpy.ndarray] = None, reindex_method: str = 'pad')

Create a feature based on the national holidays of a specific country.

The interface is modeled on that of ‘workalendar’. To see which regions and countries are available, check the ‘workalendar’ documentation.

Parameters:
region : str, optional, default: 'america'

The region in which the country is located.

country : str, optional, default: 'Brazil'

The name of the country from which to retrieve the holidays. The country must be located in the given region. For certain countries, workalendar provides additional ‘subregions’. To use a subregion instead of the whole country, pass the name of the subregion in place of the country name (e.g. ‘Vaud’ instead of ‘Switzerland’ for the canton of Vaud, which is part of Switzerland).

start_date : str, optional, default: '01/01/2018'

The date starting from which to retrieve the holidays.

end_date : str, optional, default: '01/01/2020'

The date until which to retrieve the holidays.

kernel : array-like, optional, default: None

The kernel to use when creating the feature. The holiday feature is computed over a rolling window of the same size as the kernel: within each window, the dot product is taken between the kernel and the holiday indicator column (1 if the corresponding day is a holiday, 0 otherwise), and the result is divided by the number of holidays in the window.

reindex_method : str, optional, default: 'pad'

Used only if a time series is passed to the transform method. It is the method used to reindex the holiday events with the index of that time series, and it must be one of the reindex methods supported by pandas. Please refer to the pandas documentation for further details.

Examples

>>> import pandas as pd
>>> from gtime.feature_generation import Calendar
>>> X = pd.DataFrame(range(0, 10), index=pd.period_range(start='2019-04-18',
...                  end='2019-04-27', freq='d'))
>>> cal_feature = Calendar(region="europe", country="Italy", kernel=[2, 1])
>>> cal_feature.fit_transform(X)
            status__Calendar
2019-04-18               0.0
2019-04-19               0.0
2019-04-20               0.0
2019-04-21               1.0
2019-04-22               2.0
2019-04-23               0.0
2019-04-24               1.0
2019-04-25               2.0
2019-04-26               0.0
2019-04-27               0.0
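
The kernel computation described under Parameters can be approximated with plain pandas. The following is a minimal sketch, not the library's implementation: the helper name `holiday_feature`, the orientation of the kernel (first weight applied to the oldest day in the window), and the choice of 0.0 for windows without holidays are all assumptions made for illustration. The indicator column is hand-written rather than retrieved from workalendar.

```python
import numpy as np
import pandas as pd


def holiday_feature(is_holiday: pd.Series, kernel) -> pd.Series:
    """Hypothetical helper: rolling dot product with the kernel,
    divided by the number of holidays in each window."""
    kernel = np.asarray(kernel, dtype=float)

    def apply_kernel(window: np.ndarray) -> float:
        n_holidays = window.sum()
        if n_holidays == 0:
            # Assumption: a window without holidays yields 0.0.
            return 0.0
        return float(np.dot(kernel, window) / n_holidays)

    # The window has the same size as the kernel; raw=True passes
    # each window to apply_kernel as a NumPy array in time order.
    return is_holiday.rolling(len(kernel)).apply(apply_kernel, raw=True)


# Hand-written indicator column: 1.0 marks a holiday, 0.0 a regular day.
index = pd.date_range(start="2019-04-18", end="2019-04-27", freq="D")
is_holiday = pd.Series(
    [0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 1.0, 0.0, 0.0], index=index
)

feature = holiday_feature(is_holiday, kernel=[2, 1])
print(feature)
```

With the kernel [2, 1], a holiday contributes 1.0 on the day itself and 2.0 on the following day. The first row is NaN in this sketch because the rolling window is not yet full.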

Methods

fit(self, X[, y]) Fit the estimator.
fit_transform(self, X[, y]) Fit to data, then transform it.
get_feature_names(self) Return feature names for output features.
get_params(self[, deep]) Get parameters for this estimator.
set_params(self, **params) Set the parameters of this estimator.
transform(self[, time_series]) Generate a DataFrame containing the events associated with the holidays of the selected country.

__init__(self, region: str = 'america', country: str = 'Brazil', start_date: str = '01/01/2018', end_date: str = '01/01/2020', kernel: Union[List, numpy.ndarray] = None, reindex_method: str = 'pad')

Initialize self. See help(type(self)) for accurate signature.

fit(self, X: pandas.core.frame.DataFrame, y=None)

Fit the estimator. This method exists only for compatibility with the sklearn API.

Parameters:
X : pd.DataFrame, shape (n_samples, n_features)

Input data.

y : None

A transformer does not need a target, but the pipeline API requires this parameter.

Returns:
self : object

Returns self.

fit_transform(self, X, y=None, **fit_params)

Fit to data, then transform it.

Fits transformer to X and y with optional parameters fit_params and returns a transformed version of X.

Parameters:
X : numpy array of shape [n_samples, n_features]

Training set.

y : numpy array of shape [n_samples]

Target values.

**fit_params : dict

Additional fit parameters.

Returns:
X_new : numpy array of shape [n_samples, n_features_new]

Transformed array.

get_feature_names(self)

Return feature names for output features.

Returns:
output_feature_names : ndarray, shape (n_output_features,)

Array of feature names.

get_params(self, deep=True)

Get parameters for this estimator.

Parameters:
deep : bool, default=True

If True, will return the parameters for this estimator and contained subobjects that are estimators.

Returns:
params : mapping of string to any

Parameter names mapped to their values.

set_params(self, **params)

Set the parameters of this estimator.

The method works on simple estimators as well as on nested objects (such as pipelines). The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object.

Parameters:
**params : dict

Estimator parameters.

Returns:
self : object

Estimator instance.
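
The `<component>__<parameter>` form can be illustrated with a generic scikit-learn pipeline. This is a sketch of the naming convention only: the step name `scaler` and the use of `StandardScaler` are arbitrary choices for the example and have nothing to do with Calendar itself.

```python
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# A one-step pipeline; "scaler" is the component name chosen here.
pipe = Pipeline([("scaler", StandardScaler())])

# Update the with_mean parameter of the nested "scaler" component.
pipe.set_params(scaler__with_mean=False)

# The nested parameter is exposed under the same composite name.
print(pipe.get_params()["scaler__with_mean"])
```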

transform(self, time_series: Union[pandas.core.frame.DataFrame, NoneType] = None) → pandas.core.frame.DataFrame

Generate a DataFrame containing the events associated with the holidays of the selected country.

Parameters:
time_series : pd.DataFrame, shape (n_samples, 1), optional, default: None

If provided, both start_date and end_date are overwritten with the start and end dates of the index of time_series. The output DataFrame is also re-indexed with the index of time_series, using the chosen reindex_method.

Returns:
events : pd.DataFrame, shape (length, 1)

A DataFrame containing the events.
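
The re-indexing step relies on the reindex methods that pandas provides. As a rough sketch of what `reindex_method='pad'` means, the snippet below reindexes a hand-written daily events frame onto a finer index using `DataFrame.reindex` directly. The frame contents and the column name `status` are made up for illustration, and the snippet does not call gtime.

```python
import pandas as pd

# Hand-written daily events: 1.0 marks a holiday.
events = pd.DataFrame(
    {"status": [0.0, 1.0, 0.0]},
    index=pd.date_range("2019-04-21", periods=3, freq="D"),
)

# A finer target index, standing in for the index of time_series.
target_index = pd.to_datetime(
    [
        "2019-04-21 00:00",
        "2019-04-21 12:00",
        "2019-04-22 00:00",
        "2019-04-22 12:00",
        "2019-04-23 00:00",
    ]
)

# 'pad' propagates the last known daily value forward onto new timestamps.
reindexed = events.reindex(target_index, method="pad")
print(reindexed)
```

Other values accepted by pandas, such as 'backfill' or 'nearest', fill the new timestamps from the next or the closest daily value instead.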