Luminaire Outlier Detection Models: Structural Modeling
- exception luminaire.model.lad_structural.LADStructuralError(message)
Exception class for Luminaire structural anomaly detection model.
- class luminaire.model.lad_structural.LADStructuralHyperParams(include_holidays_exog=True, p=2, q=2, is_log_transformed=True, max_ft_freq=3)
Hyperparameter class for Luminaire structural anomaly detection model.
- Parameters:
include_holidays_exog (bool) – Whether to include holidays as exogenous variables in the regression. Holidays are defined in LADHolidays.
p (int) – Order for the AR component of the model.
q (int) – Order for the MA component of the model.
is_log_transformed (bool) – A flag to specify whether to take a log transform of the input data. If the data contain negatives, is_log_transformed is ignored even though it is set to True.
max_ft_freq (int) – The maximum frequency order for the Fourier transformation.
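The parameters above correspond to the keys of the hyper_params dictionary that the model class accepts. A minimal sketch of assembling such a dictionary with the documented defaults follows. The helper name make_structural_hyper_params and its range checks (non-negative p and q, max_ft_freq of at least 1) are assumptions for illustration and are not part of Luminaire.

```python
def make_structural_hyper_params(include_holidays_exog=True, p=2, q=2,
                                 is_log_transformed=True, max_ft_freq=3):
    """Build a hyper_params dict using the documented default values.

    Hypothetical helper: the range checks below are assumptions made for
    this sketch, not validation rules taken from Luminaire.
    """
    if p < 0 or q < 0:
        raise ValueError("p and q are model orders and must be non-negative")
    if max_ft_freq < 1:
        raise ValueError("max_ft_freq must be at least 1")
    return {
        "include_holidays_exog": bool(include_holidays_exog),
        "p": int(p),
        "q": int(q),
        "is_log_transformed": bool(is_log_transformed),
        "max_ft_freq": int(max_ft_freq),
    }


# Override only the values that differ from the defaults.
hyper = make_structural_hyper_params(p=5, q=1, max_ft_freq=2)
print(hyper)
```

The resulting dictionary has the same shape as the hyper_params argument of LADStructuralModel.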
- class luminaire.model.lad_structural.LADStructuralModel(hyper_params: {'include_holidays_exog': True, 'p': 2, 'q': 2, 'is_log_transformed': True, 'max_ft_freq': 3}, freq, min_ts_length=None, max_ts_length=None, min_ts_mean=None, min_ts_mean_window=None, **kwargs)
A LAD structural time series model.
- Parameters:
hyper_params (dict) – Hyperparameters for Luminaire structural modeling. See luminaire.model.lad_structural.LADStructuralHyperParams for detailed information.
freq (str) – The frequency of the time series, given as a pandas offset alias such as ‘D’, ‘H’, or ‘W’. Luminaire currently supports the following pandas frequency types: ‘H’, ‘D’, ‘W’, ‘W-SUN’, ‘W-MON’, ‘W-TUE’, ‘W-WED’, ‘W-THU’, ‘W-FRI’, ‘W-SAT’.
min_ts_length (int) – The minimum required length of the time series for training.
max_ts_length (int) – The maximum length of the time series used for training. The input time series will be truncated if its length is greater than this value.
min_ts_mean (float) – Minimum average value required in the most recent window of the time series. This optional parameter can be used to avoid over-alerting on noisy, low-volume time series.
min_ts_mean_window (int) – Size of the most recent window over which the average compared against min_ts_mean is calculated.
Note
This class should be used to manually configure the structural model. Exact configuration parameters can be found in luminaire.model.lad_structural.LADStructuralHyperParams. Optimal configuration can be obtained by using Luminaire hyperparameter optimization.
>>> from luminaire.model.lad_structural import LADStructuralModel
>>> hyper = {"include_holidays_exog": 0, "is_log_transformed": 1, "max_ft_freq": 2, "p": 5, "q": 1}
>>> lad_struct_model = LADStructuralModel(hyper_params=hyper, freq='D')
>>> lad_struct_model
<luminaire.model.lad_structural.LADStructuralModel object at 0x103efe320>
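The max_ts_length, min_ts_mean and min_ts_mean_window parameters describe checks applied to the training series. The sketch below shows one plausible reading of them on a plain list of values ordered oldest first. It assumes that truncation keeps the most recent observations and that the volume check compares the mean of the last window against min_ts_mean. Neither detail is stated in this reference, and the function name prepare_series is hypothetical.

```python
from statistics import mean


def prepare_series(values, max_ts_length=None, min_ts_mean=None,
                   min_ts_mean_window=None):
    """Illustrative pre-checks on a list of observations (oldest first).

    Assumptions for this sketch: truncation keeps the most recent points,
    and the low-volume check uses the mean of the last window.
    """
    # Truncate series that exceed the maximum training length.
    if max_ts_length is not None and len(values) > max_ts_length:
        values = values[-max_ts_length:]

    # Compare the recent average against the minimum required mean.
    enough_volume = True
    if min_ts_mean is not None and min_ts_mean_window:
        recent = values[-min_ts_mean_window:]
        enough_volume = mean(recent) >= min_ts_mean
    return values, enough_volume


series = [1, 2, 3, 4, 5, 6]
kept, ok = prepare_series(series, max_ts_length=4,
                          min_ts_mean=5, min_ts_mean_window=2)
print(kept, ok)
```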
- score(observed_value, pred_date, **kwargs)
This function scores a value observed at a data date given a trained LAD structural model object.
- Parameters:
observed_value (float) – Observed time series value on the prediction date.
pred_date (str) – Prediction date. Needs to be in yyyy-mm-dd or yyyy-mm-dd hh:mm:ss format.
- Returns:
Anomaly flag, anomaly probability, prediction and other related metrics.
- Return type:
dict
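Because pred_date must follow one of two string formats, it can help to validate it before scoring. The following is a minimal sketch using only the standard library. The helper parse_pred_date is hypothetical and says nothing about how Luminaire itself parses the argument.

```python
from datetime import datetime

# The two formats named in the parameter description above.
ACCEPTED_FORMATS = ("%Y-%m-%d %H:%M:%S", "%Y-%m-%d")


def parse_pred_date(pred_date):
    """Return a datetime if pred_date matches an accepted format."""
    for fmt in ACCEPTED_FORMATS:
        try:
            return datetime.strptime(pred_date, fmt)
        except ValueError:
            continue  # try the next accepted format
    raise ValueError(
        f"pred_date {pred_date!r} is not in yyyy-mm-dd or "
        "yyyy-mm-dd hh:mm:ss format"
    )


print(parse_pred_date("2020-06-08"))
print(parse_pred_date("2020-06-08 13:30:00"))
```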
>>> model
<luminaire.model.lad_structural.LADStructuralModel object at 0x11c1c3550>
>>> model._params['training_end_date']  # Last data date for training time series
'2020-06-07 00:00:00'
>>> model.score(2000, '2020-06-08')
{'Success': True, 'IsLogTransformed': 0, 'AdjustedActual': 2000, 'Prediction': 1943.20426163425, 'StdErr': 93.084646777553, 'CILower': 1785.519523590432, 'CIUpper': 2100.88899967807, 'ConfLevel': 90.0, 'ExogenousHolidays': 0, 'IsAnomaly': False, 'IsAnomalyExtreme': False, 'AnomalyProbability': 0.42671448831719605, 'DownAnomalyProbability': 0.286642755841402, 'UpAnomalyProbability': 0.713357244158598, 'ModelFreshness': 0.1}
>>> model.score(2500, '2020-06-09')
{'Success': True, 'IsLogTransformed': 0, 'AdjustedActual': 2500, 'Prediction': 2028.989933854948, 'StdErr': 93.6623172459385, 'CILower': 1861.009403637476, 'CIUpper': 2186.97046407242, 'ConfLevel': 90.0, 'ExogenousHolidays': 0, 'IsAnomaly': True, 'IsAnomalyExtreme': True, 'AnomalyProbability': 0.9999987021695071, 'DownAnomalyProbability': 6.489152464261849e-07, 'UpAnomalyProbability': 0.9999993510847536, 'ModelFreshness': 0.2}
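The probability fields in the two sample results above appear to be related. In both, the up and down probabilities sum to one, and AnomalyProbability equals twice the larger of the two minus one. The sketch below checks that pattern on values copied from the samples. This is an observation about these sample numbers only, not a documented definition, and check_probabilities is a hypothetical helper.

```python
import math

# Probability fields copied from the two sample score() results.
sample_results = [
    {"AnomalyProbability": 0.42671448831719605,
     "DownAnomalyProbability": 0.286642755841402,
     "UpAnomalyProbability": 0.713357244158598},
    {"AnomalyProbability": 0.9999987021695071,
     "DownAnomalyProbability": 6.489152464261849e-07,
     "UpAnomalyProbability": 0.9999993510847536},
]


def check_probabilities(result, tol=1e-9):
    """Return (sums_to_one, matches_two_sided) for one score result."""
    up = result["UpAnomalyProbability"]
    down = result["DownAnomalyProbability"]
    sums_to_one = math.isclose(up + down, 1.0, abs_tol=tol)
    # Pattern seen in the samples: 2 * max(up, down) - 1.
    two_sided = 2 * max(up, down) - 1
    matches = math.isclose(result["AnomalyProbability"], two_sided, abs_tol=tol)
    return sums_to_one, matches


checks = [check_probabilities(r) for r in sample_results]
print(checks)
```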
- train(data, optimize=False, validate=False, **kwargs)
This function trains a structural LAD model for a given time series.
- Parameters:
data (pandas.DataFrame) – Input time series data
optimize (bool) – Flag indicating whether training is being called from hyperparameter optimization.
validate (bool) – Flag indicating whether to run model validation after training.
- Returns:
Success flag, the model date, and the trained LAD structural model object.
- Return type:
tuple[bool, str, LADStructuralModel object]
>>> from luminaire.exploration.data_exploration import DataExploration
>>> from luminaire.model.lad_structural import LADStructuralModel
>>> data
               raw  interpolated
2020-01-01  1326.0        1326.0
2020-01-02  1552.0        1552.0
2020-01-03  1432.0        1432.0
2020-01-04  1470.0        1470.0
2020-01-05  1565.0        1565.0
...            ...           ...
2020-06-03  1934.0        1934.0
2020-06-04  1873.0        1873.0
2020-06-05  1674.0        1674.0
2020-06-06  1747.0        1747.0
2020-06-07  1782.0        1782.0
>>> hyper = {"include_holidays_exog": 0, "is_log_transformed": 0, "max_ft_freq": 2, "p": 5, "q": 1}
>>> de_obj = DataExploration(freq='D', is_log_transformed=0)
>>> data, pre_prc = de_obj.profile(data)
>>> pre_prc
{'success': True, 'trend_change_list': ['2020-04-01 00:00:00'], 'change_point_list': ['2020-03-16 00:00:00'], 'is_log_transformed': 0, 'min_ts_mean': None, 'ts_start': '2020-01-01 00:00:00', 'ts_end': '2020-06-07 00:00:00'}
>>> lad_struct_obj = LADStructuralModel(hyper_params=hyper, freq='D')
>>> model = lad_struct_obj.train(data=data, **pre_prc)
>>> model
(True, '2020-06-07 00:00:00', <luminaire.model.lad_structural.LADStructuralModel object at 0x126edf588>)
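Since train returns a three-element tuple rather than the model alone, the trained object has to be unpacked before score can be called on it. A minimal sketch follows, using a stand-in tuple in place of a real training call. The helper unpack_train_result is hypothetical, and what the tuple contains when training fails is not described here, so the sketch only checks the flag.

```python
def unpack_train_result(result):
    """Split the (success, model_date, model_object) tuple from train()."""
    success, model_date, model_object = result
    if not success:
        # Assumption: a falsy flag means the model should not be used.
        raise RuntimeError("training did not succeed")
    return model_date, model_object


# Stand-in for the tuple shown above; with Luminaire installed this would
# come from lad_struct_obj.train(data=data, **pre_prc).
example_result = (True, "2020-06-07 00:00:00", object())

model_date, trained_model = unpack_train_result(example_result)
print(model_date)
```

With a real result, trained_model is the object on which score is called.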
Luminaire Outlier Detection Models: Factoring holidays as exogenous
- class luminaire.model.model_utils.LADHolidays(name=None, holiday_rules=None)
A class that generates holiday calendars to be used as external features in the batch outlier detection model. By default, a list of common US holidays is included:
Memorial Day, plus the weekend leading into it
Veterans Day, plus the weekend leading into it
Labor Day
President’s Day
Martin Luther King Jr. Day
Valentine’s Day
Mother’s Day
Father’s Day
Independence Day (actual and observed)
Halloween
Super Bowl
Easter
Thanksgiving, plus the following weekend
Christmas Eve, Christmas Day, and all dates up to New Year’s Day (actual and observed)
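Several entries in the list above extend a holiday with adjacent days. As an illustration of what such a rule covers, the sketch below computes Memorial Day (the last Monday of May) together with the Saturday and Sunday before it. It reads "the weekend leading into it" as those two preceding days. This is a standard-library illustration only and does not show how LADHolidays encodes its rules.

```python
from datetime import date, timedelta


def memorial_day(year):
    """Return the last Monday of May for the given year."""
    day = date(year, 5, 31)
    while day.weekday() != 0:  # Monday is 0
        day -= timedelta(days=1)
    return day


def memorial_day_window(year):
    """Memorial Day plus the preceding Saturday and Sunday (assumed reading)."""
    monday = memorial_day(year)
    return [monday - timedelta(days=2), monday - timedelta(days=1), monday]


print(memorial_day_window(2020))
```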
Luminaire Outlier Detection Models: Kalman Filter
- class luminaire.model.lad_filtering.LADFilteringHyperParams(is_log_transformed=True)
Hyperparameter class for Luminaire filtering anomaly detection model.
- Parameters:
is_log_transformed (bool) – A flag to specify whether to take a log transform of the input data. If the data contain negatives, is_log_transformed is ignored even though it is set to True.
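The rule that is_log_transformed is ignored when the data contain negatives can be sketched as a guarded transform. The example below assumes a log(1 + x) transform so that zeros are tolerated. The exact transform Luminaire applies is not stated here, and maybe_log_transform is a hypothetical name.

```python
import math


def maybe_log_transform(values, is_log_transformed=True):
    """Return (values, applied): log-transform only if no value is negative.

    Assumption for this sketch: log1p is used as the transform.
    """
    if is_log_transformed and all(v >= 0 for v in values):
        return [math.log1p(v) for v in values], True
    # Flag is off, or negatives are present: leave the data unchanged.
    return list(values), False


print(maybe_log_transform([0.0, 9.0, 99.0]))
print(maybe_log_transform([5.0, -1.0, 3.0]))
```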
- class luminaire.model.lad_filtering.LADFilteringModel(hyper_params: {'is_log_transformed': True}, freq, min_ts_length=None, max_ts_length=None, **kwargs)
A Markovian state space model. This model detects anomalies based on the residual process obtained through Kalman-filter-based model estimation.
- Parameters:
hyper_params (dict) – Hyperparameters for Luminaire filtering modeling. See luminaire.model.lad_filtering.LADFilteringHyperParams for detailed information.
freq (str) – The frequency of the time series, given as a pandas offset alias such as ‘D’, ‘H’, or ‘W’. Luminaire currently supports the following pandas frequency types: ‘H’, ‘D’, ‘W’, ‘W-SUN’, ‘W-MON’, ‘W-TUE’, ‘W-WED’, ‘W-THU’, ‘W-FRI’, ‘W-SAT’.
min_ts_length (int) – The minimum required length of the time series for training.
max_ts_length (int) – The maximum length of the time series used for training. The input time series will be truncated if its length is greater than this value.
Note
This class should be used to manually configure the filtering model. Exact configuration parameters can be found in luminaire.model.lad_filtering.LADFilteringHyperParams. Optimal configuration can be obtained by using Luminaire hyperparameter optimization.
>>> from luminaire.model.lad_filtering import LADFilteringModel
>>> hyper = {"is_log_transformed": 1}
>>> lad_filtering_model = LADFilteringModel(hyper_params=hyper, freq='D')
>>> lad_filtering_model
<luminaire.model.lad_filtering.LADFilteringModel object at 0x103efe320>
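To make the idea of detecting anomalies from Kalman filter residuals concrete, here is a generic sketch of a one-dimensional local-level (random walk plus noise) filter that returns standardized one-step-ahead residuals. The state-space form, the variances process_var and obs_var, and the sample series are all assumptions chosen for illustration. Luminaire's actual model specification is not described in this reference.

```python
def local_level_residuals(values, process_var=1.0, obs_var=1.0):
    """Standardized one-step-ahead residuals from a local-level Kalman filter."""
    # Initialize the level on the first observation.
    level, level_var = values[0], obs_var
    residuals = []
    for y in values[1:]:
        # Predict: the level follows a random walk, so its mean carries
        # over and its variance grows by the process variance.
        pred_var = level_var + process_var
        innovation = y - level
        innovation_var = pred_var + obs_var
        residuals.append(innovation / innovation_var ** 0.5)
        # Update: move the level toward the observation by the Kalman gain.
        gain = pred_var / innovation_var
        level = level + gain * innovation
        level_var = (1.0 - gain) * pred_var
    return residuals


series = [10.0, 10.5, 9.8, 10.2, 25.0, 10.1]
residuals = local_level_residuals(series)
# Index of the largest absolute residual (offset by one from the series).
largest = max(range(len(residuals)), key=lambda i: abs(residuals[i]))
print(largest, residuals[largest])
```

A large standardized residual marks an observation that the filter did not expect. Note that the point following a spike can also produce a large residual, because the estimated level has been pulled toward the spike.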
- score(observed_value, pred_date, synthetic_actual=None, **kwargs)
This function scores a value observed at a data date given a trained LAD filtering model object.
- Parameters:
observed_value (float) – Observed time series value on the prediction date.
pred_date (str) – Prediction date. Needs to be in yyyy-mm-dd or yyyy-mm-dd hh:mm:ss format.
synthetic_actual (float) – Synthetic time series value. This is an artificial value used to optimize classification accuracy in Luminaire hyperparameter optimization.
- Returns:
Model results and LAD filtering model object
- Return type:
tuple[dict, LADFilteringModel object]
>>> model
<luminaire.model.lad_filtering.LADFilteringModel object at 0x11f0b2b38>
>>> model._params['training_end_date']
'2020-06-07 00:00:00'
>>> model.score(2000, '2020-06-08')
({'Success': True, 'AdjustedActual': 0.10110881711268949, 'ConfLevel': 90.0, 'Prediction': 1934.153554885343, 'PredStdErr': 212.4399633739204, 'IsAnomaly': False, 'IsAnomalyExtreme': False, 'AnomalyProbability': 0.4244056403219776, 'DownAnomalyProbability': 0.2877971798390112, 'UpAnomalyProbability': 0.7122028201609888, 'NonStationarityDiffOrder': 2, 'ModelFreshness': 0.1}, <luminaire.model.lad_filtering.LADFilteringModel object at 0x11f3c0860>)
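Unlike the structural model, the filtering model's score returns a model object alongside the result dictionary. The sketch below shows one way to score a sequence of observations while carrying that returned object forward. It assumes the returned model is meant to replace the previous one on the next call, which this reference suggests but does not state outright. score_stream and fake_score are hypothetical names, and fake_score is a stand-in rather than Luminaire behavior.

```python
def score_stream(model, observations, score_fn):
    """Score (value, pred_date) pairs, carrying the returned model forward."""
    results = []
    for value, pred_date in observations:
        # Assumption: the model returned by one call is used for the next.
        result, model = score_fn(model, value, pred_date)
        results.append(result)
    return results, model


def fake_score(model, value, pred_date):
    """Stand-in scorer, NOT Luminaire: the 'model' is a count of scored points."""
    return {"Success": True, "Value": value, "PredDate": pred_date}, model + 1


observations = [(2000, "2020-06-08"), (2500, "2020-06-09")]
results, final_model = score_stream(0, observations, fake_score)
print(len(results), final_model)
```

With a trained LADFilteringModel, score_fn could be written as lambda m, v, d: m.score(v, d).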
- train(data, **kwargs)
This function trains a filtering LAD model for a given time series.
- Parameters:
data (pandas.DataFrame) – Input time series data
- Returns:
The success flag, the model date, and the trained LAD filtering model object.
- Return type:
tuple[bool, str, LADFilteringModel object]
>>> from luminaire.exploration.data_exploration import DataExploration
>>> from luminaire.model.lad_filtering import LADFilteringModel
>>> data
               raw  interpolated
2020-01-01  1326.0        1326.0
2020-01-02  1552.0        1552.0
2020-01-03  1432.0        1432.0
2020-01-04  1470.0        1470.0
2020-01-05  1565.0        1565.0
...            ...           ...
2020-06-03  1934.0        1934.0
2020-06-04  1873.0        1873.0
2020-06-05  1674.0        1674.0
2020-06-06  1747.0        1747.0
2020-06-07  1782.0        1782.0
>>> hyper = {"is_log_transformed": 1}
>>> de_obj = DataExploration(freq='D', is_log_transformed=1, fill_rate=0.95)
>>> data, pre_prc = de_obj.profile(data)
>>> pre_prc
{'success': True, 'trend_change_list': ['2020-04-01 00:00:00'], 'change_point_list': ['2020-03-16 00:00:00'], 'is_log_transformed': 1, 'min_ts_mean': None, 'ts_start': '2020-01-01 00:00:00', 'ts_end': '2020-06-07 00:00:00'}
>>> lad_filter_obj = LADFilteringModel(hyper_params=hyper, freq='D')
>>> model = lad_filter_obj.train(data=data, **pre_prc)
>>> model
(True, '2020-06-07 00:00:00', <luminaire.model.lad_filtering.LADFilteringModel object at 0x11b6c4f60>)