Luminaire Outlier Detection Models: Structural Modeling

exception luminaire.model.lad_structural.LADStructuralError(message)[source]

Exception class for Luminaire structural anomaly detection model.

class luminaire.model.lad_structural.LADStructuralHyperParams(include_holidays_exog=True, p=2, q=2, is_log_transformed=True, max_ft_freq=3)[source]

Hyperparameter class for Luminaire structural anomaly detection model.

Parameters:
  • include_holidays_exog (bool) – Whether to include holidays as exogenous variables in the regression. Holidays are defined in LADHolidays.

  • p (int) – Order for the AR component of the model.

  • q (int) – Order for the MA component of the model.

  • is_log_transformed (bool) – A flag to specify whether to take a log transform of the input data. If the data contain negative values, is_log_transformed is ignored even if it is set to True.

  • max_ft_freq (int) – The maximum frequency order for the Fourier transformation.
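As a rough illustration of how these settings fit together, the sketch below merges user overrides into the defaults listed in the signature above and applies the documented rule that the log transform is skipped when the data contain negatives. The helper names resolve_hyper_params and effective_log_transform are hypothetical and are not part of Luminaire.

```python
# Defaults copied from the LADStructuralHyperParams signature above.
DEFAULTS = {
    "include_holidays_exog": True,
    "p": 2,
    "q": 2,
    "is_log_transformed": True,
    "max_ft_freq": 3,
}


def resolve_hyper_params(overrides=None):
    """Hypothetical helper: merge overrides into the documented defaults."""
    params = dict(DEFAULTS)
    for key, value in (overrides or {}).items():
        if key not in DEFAULTS:
            raise KeyError(f"unknown hyperparameter: {key}")
        params[key] = value
    return params


def effective_log_transform(values, is_log_transformed):
    """Hypothetical helper: the flag is ignored when any value is negative."""
    return bool(is_log_transformed) and all(v >= 0 for v in values)


params = resolve_hyper_params({"p": 5, "q": 1})
print(params)
print(effective_log_transform([3.0, -1.0, 4.0], params["is_log_transformed"]))
```

Unspecified keys keep their default values, so only the settings that differ from the defaults need to be listed.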

class luminaire.model.lad_structural.LADStructuralModel(hyper_params: {'include_holidays_exog': True, 'p': 2, 'q': 2, 'is_log_transformed': True, 'max_ft_freq': 3}, freq, min_ts_length=None, max_ts_length=None, min_ts_mean=None, min_ts_mean_window=None, **kwargs)[source]

A LAD structural time series model.

Parameters:
  • hyper_params (dict) – Hyper parameters for Luminaire structural modeling. See luminaire.model.lad_structural.LADStructuralHyperParams for detailed information.

  • freq (str) – The frequency of the time series, given as a pandas offset alias. Luminaire currently supports the following pandas frequency types: ‘H’, ‘D’, ‘W’, ‘W-SUN’, ‘W-MON’, ‘W-TUE’, ‘W-WED’, ‘W-THU’, ‘W-FRI’, ‘W-SAT’.

  • min_ts_length (int) – The minimum required length of the time series for training.

  • max_ts_length (int) – The maximum allowed length of the time series for training. The input time series will be truncated if the length is greater than this value.

  • min_ts_mean (float) – Minimum average value in the most recent window of the time series. This optional parameter can be used to avoid over-alerting from noisy low volume time series.

  • min_ts_mean_window (int) – Size of the most recent window to calculate min_ts_mean.
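The length and volume constraints above can be pictured with a small pre-training check. This is only a sketch under stated assumptions: check_series is a hypothetical helper, it works on a plain list rather than a DataFrame, and it assumes that truncation keeps the most recent observations, which the parameter description does not spell out.

```python
# Frequency types listed as supported in the freq parameter description.
SUPPORTED_FREQS = {"H", "D", "W", "W-SUN", "W-MON", "W-TUE",
                   "W-WED", "W-THU", "W-FRI", "W-SAT"}


def check_series(values, freq, min_ts_length=None, max_ts_length=None,
                 min_ts_mean=None, min_ts_mean_window=None):
    """Hypothetical pre-training checks; not Luminaire's implementation."""
    if freq not in SUPPORTED_FREQS:
        raise ValueError(f"unsupported frequency: {freq}")
    if min_ts_length is not None and len(values) < min_ts_length:
        raise ValueError("time series is too short for training")
    if max_ts_length is not None and len(values) > max_ts_length:
        # Assumption: truncation keeps the most recent observations.
        values = values[-max_ts_length:]
    if min_ts_mean is not None and min_ts_mean_window is not None:
        # Average over the most recent window only.
        window = values[-min_ts_mean_window:]
        if sum(window) / len(window) < min_ts_mean:
            raise ValueError("recent mean is below min_ts_mean")
    return values


series = list(range(1, 11))
kept = check_series(series, "D", min_ts_length=5, max_ts_length=7,
                    min_ts_mean=5.0, min_ts_mean_window=3)
print(kept)
```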

Note

This class should be used to manually configure the structural model. Exact configuration parameters can be found in luminaire.model.lad_structural.LADStructuralHyperParams. Optimal configuration can be obtained by using Luminaire hyperparameter optimization.

>>> from luminaire.model.lad_structural import LADStructuralModel
>>> hyper = {"include_holidays_exog": 0, "is_log_transformed": 1, "max_ft_freq": 2, "p": 5, "q": 1}
>>> lad_struct_model = LADStructuralModel(hyper_params=hyper, freq='D')
>>> lad_struct_model
<luminaire.model.lad_structural.LADStructuralModel object at 0x103efe320>

score(observed_value, pred_date, **kwargs)[source]

This function scores a value observed at a data date given a trained LAD structural model object.

Parameters:
  • observed_value (float) – Observed time series value on the prediction date.

  • pred_date (str) – Prediction date. Needs to be in yyyy-mm-dd or yyyy-mm-dd hh:mm:ss format.

Returns:

Anomaly flag, anomaly probability, prediction and other related metrics.

Return type:

dict

>>> model
<luminaire.model.lad_structural.LADStructuralModel object at 0x11c1c3550>
>>> model._params['training_end_date'] # Last data date for training time series
'2020-06-07 00:00:00'
>>> model.score(2000, '2020-06-08')
{'Success': True, 'IsLogTransformed': 0, 'AdjustedActual': 2000, 'Prediction': 1943.20426163425,
'StdErr': 93.084646777553, 'CILower': 1785.519523590432, 'CIUpper': 2100.88899967807, 'ConfLevel': 90.0,
'ExogenousHolidays': 0, 'IsAnomaly': False, 'IsAnomalyExtreme': False, 'AnomalyProbability': 0.42671448831719605,
'DownAnomalyProbability': 0.286642755841402, 'UpAnomalyProbability': 0.713357244158598, 'ModelFreshness': 0.1}
>>> model.score(2500, '2020-06-09')
{'Success': True, 'IsLogTransformed': 0, 'AdjustedActual': 2500, 'Prediction': 2028.989933854948,
'StdErr': 93.6623172459385, 'CILower': 1861.009403637476, 'CIUpper': 2186.97046407242, 'ConfLevel': 90.0,
'ExogenousHolidays': 0, 'IsAnomaly': True, 'IsAnomalyExtreme': True, 'AnomalyProbability': 0.9999987021695071,
'DownAnomalyProbability': 6.489152464261849e-07, 'UpAnomalyProbability': 0.9999993510847536,
'ModelFreshness': 0.2}
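To help read the returned dictionary, the sketch below re-checks the two example outputs above with plain Python. Two things are worth noting: the helper names are hypothetical, and the relationship between the three probability fields is read off these two examples rather than taken from Luminaire's source, so treat it as an observation, not a specification.

```python
# Fields copied from the two score() outputs shown above.
first = {
    "AdjustedActual": 2000, "CILower": 1785.519523590432, "CIUpper": 2100.88899967807,
    "AnomalyProbability": 0.42671448831719605,
    "DownAnomalyProbability": 0.286642755841402,
    "UpAnomalyProbability": 0.713357244158598, "IsAnomaly": False,
}
second = {
    "AdjustedActual": 2500, "CILower": 1861.009403637476, "CIUpper": 2186.97046407242,
    "AnomalyProbability": 0.9999987021695071,
    "DownAnomalyProbability": 6.489152464261849e-07,
    "UpAnomalyProbability": 0.9999993510847536, "IsAnomaly": True,
}


def outside_interval(result):
    """Hypothetical check: does the observed value fall outside the reported interval?"""
    return not (result["CILower"] <= result["AdjustedActual"] <= result["CIUpper"])


def probability_gap(result):
    """Absolute difference between the upward and downward anomaly probabilities."""
    return abs(result["UpAnomalyProbability"] - result["DownAnomalyProbability"])


for result in (first, second):
    print(outside_interval(result), result["IsAnomaly"],
          probability_gap(result), result["AnomalyProbability"])
```

In both examples the interval check agrees with IsAnomaly, and the gap between the upward and downward probabilities matches AnomalyProbability to within floating-point error.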

train(data, optimize=False, validate=False, **kwargs)[source]

This function trains a structural LAD model for a given time series.

Parameters:
  • data (pandas.DataFrame) – Input time series data

  • optimize (bool) – Flag indicating whether train is being called from hyperparameter optimization

  • validate (bool) – Flag to identify whether to run model validation after training

Returns:

Success flag, model date, and the trained LAD structural model object

Return type:

tuple[bool, str, LADStructuralModel object]

>>> data
               raw interpolated
2020-01-01  1326.0       1326.0
2020-01-02  1552.0       1552.0
2020-01-03  1432.0       1432.0
2020-01-04  1470.0       1470.0
2020-01-05  1565.0       1565.0
...            ...          ...
2020-06-03  1934.0       1934.0
2020-06-04  1873.0       1873.0
2020-06-05  1674.0       1674.0
2020-06-06  1747.0       1747.0
2020-06-07  1782.0       1782.0
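>>> from luminaire.exploration.data_exploration import DataExploration
>>> from luminaire.model.lad_structural import LADStructuralModel
>>> hyper = {"include_holidays_exog": 0, "is_log_transformed": 0, "max_ft_freq": 2, "p": 5, "q": 1}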
>>> de_obj = DataExploration(freq='D', is_log_transformed=0)
>>> data, pre_prc = de_obj.profile(data)
>>> pre_prc
{'success': True, 'trend_change_list': ['2020-04-01 00:00:00'], 'change_point_list': ['2020-03-16 00:00:00'],
'is_log_transformed': 0, 'min_ts_mean': None, 'ts_start': '2020-01-01 00:00:00',
'ts_end': '2020-06-07 00:00:00'}
>>> lad_struct_obj = LADStructuralModel(hyper_params=hyper, freq='D')
>>> model = lad_struct_obj.train(data=data, **pre_prc)
>>> model
(True, '2020-06-07 00:00:00', <luminaire.model.lad_structural.LADStructuralModel object at 0x126edf588>)
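Because train returns a tuple rather than the model itself, it can help to unpack the three elements before scoring. The sketch below shows one way to do that; unpack_train_result is a hypothetical helper, and the tuple passed to it is a stand-in with a placeholder object so that the snippet runs without Luminaire installed.

```python
def unpack_train_result(result):
    """Hypothetical helper: split the (success, model_date, model) tuple."""
    success, model_date, trained_model = result
    if not success:
        raise RuntimeError("training did not succeed")
    return model_date, trained_model


# Stand-in for the tuple returned by train(); object() is only a placeholder
# for the trained LADStructuralModel instance.
placeholder_model = object()
model_date, trained = unpack_train_result(
    (True, "2020-06-07 00:00:00", placeholder_model)
)
print(model_date)
```

With a real training result, the third element is the object on which score would be called.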

Luminaire Outlier Detection Models: Factoring holidays as exogenous

class luminaire.model.model_utils.LADHolidays(name=None, holiday_rules=None)[source]

A class that generates holiday calendars to be used as external features in the batch outlier detection model. By default, a list of common US holidays is included:

  • Memorial Day, plus the weekend leading into it

  • Veterans Day, plus the weekend leading into it

  • Labor Day

  • Presidents’ Day

  • Martin Luther King Jr. Day

  • Valentine’s Day

  • Mother’s Day

  • Father’s Day

  • Independence Day (actual and observed)

  • Halloween

  • Super Bowl

  • Easter

  • Thanksgiving, plus the following weekend

  • Christmas Eve, Christmas Day, and all dates up to New Year’s Day (actual and observed)
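To make the idea of a holiday calendar used as an exogenous feature concrete, the sketch below derives a few of the listed holidays with the standard library and turns them into a 0/1 indicator. It does not use LADHolidays and is not its implementation: all function names are hypothetical, only a handful of holidays are covered, and the observed-date rule (Saturday moves to Friday, Sunday moves to Monday) is the usual US convention rather than something stated above.

```python
from datetime import date, timedelta


def nth_weekday(year, month, weekday, n):
    """Date of the n-th given weekday (Monday=0) in a month."""
    first = date(year, month, 1)
    offset = (weekday - first.weekday()) % 7
    return first + timedelta(days=offset + 7 * (n - 1))


def last_weekday(year, month, weekday):
    """Date of the last given weekday (Monday=0) in a month."""
    # Step back from the first day of the following month.
    if month == 12:
        next_month = date(year + 1, 1, 1)
    else:
        next_month = date(year, month + 1, 1)
    last = next_month - timedelta(days=1)
    return last - timedelta(days=(last.weekday() - weekday) % 7)


def observed(day):
    """Shift a Saturday holiday to Friday and a Sunday holiday to Monday."""
    if day.weekday() == 5:
        return day - timedelta(days=1)
    if day.weekday() == 6:
        return day + timedelta(days=1)
    return day


def sample_holidays(year):
    """A small, illustrative subset of the holidays listed above."""
    july4 = date(year, 7, 4)
    return {
        last_weekday(year, 5, 0),     # Memorial Day: last Monday of May
        nth_weekday(year, 9, 0, 1),   # Labor Day: first Monday of September
        nth_weekday(year, 11, 3, 4),  # Thanksgiving: fourth Thursday of November
        july4, observed(july4),       # Independence Day (actual and observed)
    }


def holiday_indicator(days, holidays):
    """Return 1 for dates in the holiday set and 0 otherwise."""
    return [1 if d in holidays else 0 for d in days]


days = [date(2020, 7, 2) + timedelta(days=i) for i in range(4)]
print(holiday_indicator(days, sample_holidays(2020)))
```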

Luminaire Outlier Detection Models: Kalman Filter

class luminaire.model.lad_filtering.LADFilteringHyperParams(is_log_transformed=True)[source]

Hyperparameter class for Luminaire filtering anomaly detection model.

Parameters:

is_log_transformed (bool) – A flag to specify whether to take a log transform of the input data. If the data contain negative values, is_log_transformed is ignored even if it is set to True.

class luminaire.model.lad_filtering.LADFilteringModel(hyper_params: {'is_log_transformed': True}, freq, min_ts_length=None, max_ts_length=None, **kwargs)[source]

A Markovian state space model. This model detects anomalies based on the residual process obtained through Kalman filter based model estimation.

Parameters:
  • hyper_params (dict) – Hyper parameters for Luminaire filtering modeling. See luminaire.model.lad_filtering.LADFilteringHyperParams for detailed information.

  • freq (str) – The frequency of the time series, given as a pandas offset alias. Luminaire currently supports the following pandas frequency types: ‘H’, ‘D’, ‘W’, ‘W-SUN’, ‘W-MON’, ‘W-TUE’, ‘W-WED’, ‘W-THU’, ‘W-FRI’, ‘W-SAT’.

  • min_ts_length (int) – The minimum required length of the time series for training.

  • max_ts_length (int) – The maximum allowed length of the time series for training. The input time series will be truncated if the length is greater than this value.
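The residual-based detection described for LADFilteringModel can be illustrated with a minimal local-level Kalman filter. This is a generic textbook sketch, not Luminaire's state space specification: the random-walk level model, the variances process_var and obs_var, the threshold of 3, and the function names are all assumptions chosen for illustration.

```python
import math


def standardized_residuals(values, process_var=1.0, obs_var=1.0):
    """One-step-ahead standardized residuals from a local-level Kalman filter."""
    # Initialise the level from the first observation.
    level, level_var = values[0], obs_var
    residuals = []
    for y in values[1:]:
        # Predict: the level follows a random walk, so the forecast is the last level.
        pred_var = level_var + process_var
        innovation = y - level
        innovation_var = pred_var + obs_var
        residuals.append(innovation / math.sqrt(innovation_var))
        # Update: blend forecast and observation using the Kalman gain.
        gain = pred_var / innovation_var
        level = level + gain * innovation
        level_var = (1.0 - gain) * pred_var
    return residuals


def flag_anomalies(values, threshold=3.0, **variances):
    """Flag observations whose standardized residual exceeds the threshold."""
    return [abs(z) > threshold for z in standardized_residuals(values, **variances)]


series = [10.0, 10.0, 10.0, 10.0, 30.0]
print(flag_anomalies(series))
```

Only the final jump produces a large residual here, because the filter has learned a stable level from the preceding observations.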

Note

This class should be used to manually configure the filtering model. Exact configuration parameters can be found in luminaire.model.lad_filtering.LADFilteringHyperParams. Optimal configuration can be obtained by using Luminaire hyperparameter optimization.

>>> from luminaire.model.lad_filtering import LADFilteringModel
>>> hyper = {"is_log_transformed": 1}
>>> lad_filtering_model = LADFilteringModel(hyper_params=hyper, freq='D')
>>> lad_filtering_model
<luminaire.model.lad_filtering.LADFilteringModel object at 0x103efe320>

score(observed_value, pred_date, synthetic_actual=None, **kwargs)[source]

This function scores a value observed at a data date given a trained LAD filtering model object.

Parameters:
  • observed_value (float) – Observed time series value on the prediction date.

  • pred_date (str) – Prediction date. Needs to be in yyyy-mm-dd or yyyy-mm-dd hh:mm:ss format.

  • synthetic_actual (float) – Synthetic time series value. This is an artificial value used to optimize classification accuracy in Luminaire hyperparameter optimization.

Returns:

Model results and LAD filtering model object

Return type:

tuple[dict, LADFilteringModel object]

>>> model
<luminaire.model.lad_filtering.LADFilteringModel object at 0x11f0b2b38>
>>> model._params['training_end_date']
'2020-06-07 00:00:00'
>>> model.score(2000, '2020-06-08')
({'Success': True, 'AdjustedActual': 0.10110881711268949, 'ConfLevel': 90.0, 'Prediction': 1934.153554885343,
'PredStdErr': 212.4399633739204, 'IsAnomaly': False, 'IsAnomalyExtreme': False,
'AnomalyProbability': 0.4244056403219776, 'DownAnomalyProbability': 0.2877971798390112,
'UpAnomalyProbability': 0.7122028201609888, 'NonStationarityDiffOrder': 2, 'ModelFreshness': 0.1},
<luminaire.model.lad_filtering.LADFilteringModel object at 0x11f3c0860>)
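Unlike the structural model, score here returns a tuple of the result dictionary and a model object, which suggests carrying the returned object forward when scoring consecutive dates. The sketch below shows that calling pattern only. StandInFilteringModel is a placeholder that mimics the return shape and nothing else, its 'Prediction' logic is invented for the demonstration, and whether Luminaire requires the returned object for the next call is an assumption based on the return type.

```python
class StandInFilteringModel:
    """Placeholder that mimics only the (result, model) return shape of score()."""

    def __init__(self, level):
        self.level = level

    def score(self, observed_value, pred_date):
        # Invented logic: predict the last seen value; a real model does far more.
        result = {"Success": True, "Prediction": self.level}
        # Return a new object carrying the updated state.
        return result, StandInFilteringModel(observed_value)


def score_sequence(model, observations):
    """Score (pred_date, value) pairs in order, carrying the returned model forward."""
    results = []
    for pred_date, value in observations:
        result, model = model.score(value, pred_date)
        results.append(result)
    return results, model


results, final_model = score_sequence(
    StandInFilteringModel(1782.0),
    [("2020-06-08", 2000.0), ("2020-06-09", 2500.0)],
)
print([r["Prediction"] for r in results])
```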

train(data, **kwargs)[source]

This function trains a filtering LAD model for a given time series.

Parameters:

data (pandas.DataFrame) – Input time series data

Returns:

Success flag, model date, and the trained LAD filtering model object

Return type:

tuple[bool, str, LADFilteringModel object]

>>> data
               raw interpolated
2020-01-01  1326.0       1326.0
2020-01-02  1552.0       1552.0
2020-01-03  1432.0       1432.0
2020-01-04  1470.0       1470.0
2020-01-05  1565.0       1565.0
...            ...          ...
2020-06-03  1934.0       1934.0
2020-06-04  1873.0       1873.0
2020-06-05  1674.0       1674.0
2020-06-06  1747.0       1747.0
2020-06-07  1782.0       1782.0
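>>> from luminaire.exploration.data_exploration import DataExploration
>>> from luminaire.model.lad_filtering import LADFilteringModel
>>> hyper = {"is_log_transformed": 1}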
>>> de_obj = DataExploration(freq='D', is_log_transformed=1, fill_rate=0.95)
>>> data, pre_prc = de_obj.profile(data)
>>> pre_prc
{'success': True, 'trend_change_list': ['2020-04-01 00:00:00'], 'change_point_list': ['2020-03-16 00:00:00'],
'is_log_transformed': 1, 'min_ts_mean': None, 'ts_start': '2020-01-01 00:00:00',
'ts_end': '2020-06-07 00:00:00'}
>>> lad_filter_obj = LADFilteringModel(hyper_params=hyper, freq='D')
>>> model = lad_filter_obj.train(data=data, **pre_prc)
>>> model
(True, '2020-06-07 00:00:00', <luminaire.model.lad_filtering.LADFilteringModel object at 0x11b6c4f60>)

exception luminaire.model.lad_filtering.LADFilteringModelError(message)[source]

Exception class for Luminaire filtering anomaly detection model.