Luminaire Streaming Anomaly Detection Models: Window Density Model

class luminaire.model.window_density.WindowDensityHyperParams(freq=None, max_missing_train_prop=0.1, is_log_transformed=False, baseline_type='aggregated', detection_method=None, min_window_length=None, max_window_length=None, window_length=None, detrend_method='modeling')[source]

Hyperparameter class for the Luminaire window density model.

Parameters:
  • freq (str) – The frequency of the time series. Luminaire provides default configurations for ‘S’, ‘T’, ‘15T’, ‘H’, and ‘D’. Any other frequency should be specified as ‘custom’, with the configuration set manually.

  • max_missing_train_prop (float) – Maximum proportion of missing observations allowed in the training data.

  • is_log_transformed (bool) – A flag that specifies whether to take a log transform of the input data. If the data contain negative values, is_log_transformed is ignored even if it is set to True.

  • baseline_type (str) –

    A string flag that specifies whether to use the previous sub-window of the training data as the scoring baseline or to aggregate the overall training window into a baseline. Possible values:

    • “last_window”

    • “aggregated”

  • detection_method (str) –

    A string that selects between two window testing methods. Possible values:

    • “kldiv” (KL divergence). Recommended for high-frequency time series such as ‘S’ and ‘T’.

    • “sign_test” (Wilcoxon signed-rank test). Recommended for low-frequency time series such as ‘H’ and ‘D’.

  • min_window_length (int) –

    Minimum size of the scoring window / a stable training sub-window length.

    Note

    This is not the minimum size of the whole training window, which is a combination of stable sub-windows.

  • max_window_length (int) –

    Maximum size of the scoring window / a stable training sub-window length.

    Note

    This is not the maximum size of the whole training window, which is a combination of stable sub-windows.

  • window_length (int) –

    Size of the scoring window / a stable training sub-window length.

    Note

    This is not the size of the whole training window, which is a combination of stable sub-windows.

  • detrend_method (str) –

    A string that selects between two stationarizing methods. Possible values:

    • “ma” (moving average based)

    • “diff” (differencing based)
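The two stationarizing options named for detrend_method can be illustrated with a minimal standard-library sketch. The helper names, the trailing window, and the lag of 1 are assumptions made for illustration; this is not Luminaire's internal implementation.

```python
def moving_average_detrend(values, window):
    """Subtract a trailing moving average from each point (hypothetical helper)."""
    if window < 1:
        raise ValueError("window must be at least 1")
    detrended = []
    for i in range(len(values)):
        # Use as many points as are available at the start of the series.
        start = max(0, i - window + 1)
        chunk = values[start:i + 1]
        detrended.append(values[i] - sum(chunk) / len(chunk))
    return detrended


def difference_detrend(values, lag=1):
    """Return values[i] - values[i - lag]; the result is `lag` points shorter."""
    if lag < 1:
        raise ValueError("lag must be at least 1")
    return [values[i] - values[i - lag] for i in range(lag, len(values))]


series = [10.0, 12.0, 14.0, 16.0, 18.0]  # toy series with a linear trend
ma_result = moving_average_detrend(series, window=2)
diff_result = difference_detrend(series)
```

On this toy series both outputs are constant after the first point, which is the sense in which each method removes a linear trend.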

class luminaire.model.window_density.WindowDensityModel(hyper_params: {'freq': None, 'max_missing_train_prop': 0.1, 'is_log_transformed': False, 'baseline_type': 'aggregated', 'detection_method': None, 'min_window_length': None, 'max_window_length': None, 'window_length': None, 'detrend_method': 'modeling'}, **kwargs)[source]

This model detects anomalous windows using KL divergence (for high-frequency data) and the Wilcoxon signed-rank test (for low-frequency data). The default monitoring frequency is the pandas time frequency type ‘T’.
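To give a feel for the KL-divergence idea, the sketch below compares the histogram of a scoring window against the histogram of a baseline window. The bin edges, the additive smoothing constant, and the helper names are assumptions chosen for illustration; how Luminaire estimates densities and converts the divergence into an anomaly probability is not shown here.

```python
import math


def histogram(values, edges):
    """Count values into bins given sorted edges; values outside the edges are dropped."""
    counts = [0] * (len(edges) - 1)
    for v in values:
        for b in range(len(counts)):
            is_last = b == len(counts) - 1
            # Bins are half-open, except the last one, which includes its right edge.
            if edges[b] <= v < edges[b + 1] or (is_last and v == edges[b + 1]):
                counts[b] += 1
                break
    return counts


def kl_divergence(p_counts, q_counts, smoothing=1e-6):
    """KL(P || Q) between two histograms, with additive smoothing to avoid log(0)."""
    p_total = sum(p_counts) + smoothing * len(p_counts)
    q_total = sum(q_counts) + smoothing * len(q_counts)
    kl = 0.0
    for p_c, q_c in zip(p_counts, q_counts):
        p = (p_c + smoothing) / p_total
        q = (q_c + smoothing) / q_total
        kl += p * math.log(p / q)
    return kl


edges = [0, 10, 20, 30, 40]
baseline_window = [5, 12, 15, 18, 22, 25, 28, 35]
similar_window = [6, 11, 16, 19, 21, 26, 27, 36]
shifted_window = [31, 33, 35, 36, 37, 38, 39, 40]

baseline_hist = histogram(baseline_window, edges)
kl_similar = kl_divergence(histogram(similar_window, edges), baseline_hist)
kl_shifted = kl_divergence(histogram(shifted_window, edges), baseline_hist)
```

A window distributed like the baseline yields a divergence near zero, while a window whose values have shifted into a single bin yields a much larger one.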

Parameters:

hyper_params (dict) – Hyperparameters for the Luminaire window density model. See luminaire.model.window_density.WindowDensityHyperParams for detailed information.
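As a rough picture of what such a dictionary holds, the sketch below mirrors the default values from the signature above in a plain dict. The helper build_hyper_params and the RECOMMENDED_METHOD table are hypothetical and only make the documented recommendation for detection_method explicit; in practice the dictionary would normally come from WindowDensityHyperParams(...).params, and Luminaire may resolve a None detection_method differently.

```python
# Defaults copied from the documented WindowDensityModel signature.
DEFAULT_HYPER_PARAMS = {
    "freq": None,
    "max_missing_train_prop": 0.1,
    "is_log_transformed": False,
    "baseline_type": "aggregated",
    "detection_method": None,
    "min_window_length": None,
    "max_window_length": None,
    "window_length": None,
    "detrend_method": "modeling",
}

# Recommendation from the parameter docs: KL divergence for high-frequency
# series, sign test for low-frequency ones. Other frequencies are left unset.
RECOMMENDED_METHOD = {"S": "kldiv", "T": "kldiv", "H": "sign_test", "D": "sign_test"}


def build_hyper_params(freq, **overrides):
    """Hypothetical helper: merge overrides into the defaults and reject unknown keys."""
    unknown = set(overrides) - set(DEFAULT_HYPER_PARAMS)
    if unknown:
        raise KeyError(f"unknown hyperparameters: {sorted(unknown)}")
    params = dict(DEFAULT_HYPER_PARAMS, freq=freq, **overrides)
    if params["detection_method"] is None:
        # Assumption: fill in the documented recommendation when none is given.
        params["detection_method"] = RECOMMENDED_METHOD.get(freq)
    return params


hourly = build_hyper_params("H", baseline_type="last_window")
```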

Returns:

Anomaly probability for the execution window and other related model outputs

Return type:

list[dict]

score(data, **kwargs)[source]

Scores the input series for anomalies.

Parameters:

data (pandas.DataFrame) – Input time series to score

Returns:

Output dictionary with scoring summary.

Return type:

dict

>>> data
                        raw interpolated
index
2018-10-11 00:00:00  204800       204800
2018-10-11 01:00:00  222218       222218
2018-10-11 02:00:00  218903       218903
2018-10-11 03:00:00  190639       190639
2018-10-11 04:00:00  148214       148214
2018-10-11 05:00:00  106358       106358
2018-10-11 06:00:00   70081        70081
2018-10-11 07:00:00   47748        47748
2018-10-11 08:00:00   36837        36837
2018-10-11 09:00:00   33023        33023
2018-10-11 10:00:00   44432        44432
2018-10-11 11:00:00   72773        72773
2018-10-11 12:00:00  115180       115180
2018-10-11 13:00:00  157568       157568
2018-10-11 14:00:00  180174       180174
2018-10-11 15:00:00  190048       190048
2018-10-11 16:00:00  188391       188391
2018-10-11 17:00:00  189233       189233
2018-10-11 18:00:00  191703       191703
2018-10-11 19:00:00  189848       189848
2018-10-11 20:00:00  192685       192685
2018-10-11 21:00:00  196743       196743
2018-10-11 22:00:00  193016       193016
2018-10-11 23:00:00  196441       196441
>>> model
<luminaire.model.window_density.WindowDensityModel object at 0x7fcaab72fdd8>
>>> model.score(data)
{'Success': True, 'ConfLevel': 99.9, 'IsAnomaly': False, 'AnomalyProbability': 0.6963188902776808}
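
The scoring dictionary shown above can be consumed with a small helper such as the one sketched here. The function summarize_score is hypothetical; it relies only on the four keys visible in the example output and assumes that 'Success' should be checked before the other fields are read.

```python
def summarize_score(result):
    """Hypothetical helper: turn a scoring dictionary into a one-line summary."""
    if not result.get("Success", False):
        return "scoring failed"
    label = "ANOMALY" if result["IsAnomaly"] else "normal"
    probability = result["AnomalyProbability"]
    return (f"{label}: anomaly probability {probability:.3f} "
            f"at confidence level {result['ConfLevel']}")


# The dictionary from the example output above.
result = {"Success": True, "ConfLevel": 99.9, "IsAnomaly": False,
          "AnomalyProbability": 0.6963188902776808}
summary = summarize_score(result)
```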

train(data, **kwargs)[source]

Trains the model on the input time series.

Parameters:

data (pandas.DataFrame) – Input time series.

Returns:

Trained model with the training timestamp and a success flag

Return type:

tuple(bool, str, python model object)

>>> data
                        raw interpolated
index
2017-10-02 00:00:00  118870       118870
2017-10-02 01:00:00  121914       121914
2017-10-02 02:00:00  116097       116097
2017-10-02 03:00:00   94511        94511
2017-10-02 04:00:00   68330        68330
...                     ...          ...
2018-10-10 19:00:00  219908       219908
2018-10-10 20:00:00  219149       219149
2018-10-10 21:00:00  207232       207232
2018-10-10 22:00:00  198741       198741
2018-10-10 23:00:00  213751       213751
>>> from luminaire.model.window_density import WindowDensityHyperParams, WindowDensityModel
>>> hyper_params = WindowDensityHyperParams(freq='H').params
>>> wdm_obj = WindowDensityModel(hyper_params=hyper_params)
>>> success, model = wdm_obj.train(data)
>>> success, model
(True, "2018-10-10 23:00:00", <luminaire.model.window_density.WindowDensityModel object at 0x7fd7c5a34e80>)
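
Before training, it can be useful to check how many observations are missing relative to max_missing_train_prop. The sketch below is a standard-library illustration under stated assumptions: timestamps lie on a regular grid, the helper missing_proportion is hypothetical, and the strict comparison against the threshold is a guess rather than Luminaire's documented behavior.

```python
from datetime import datetime, timedelta


def missing_proportion(timestamps, step):
    """Fraction of expected grid points between the first and last timestamp that are absent."""
    if not timestamps:
        raise ValueError("timestamps must not be empty")
    observed = set(timestamps)
    start, end = min(observed), max(observed)
    # Number of grid points from start to end, inclusive.
    expected = int((end - start) / step) + 1
    return (expected - len(observed)) / expected


MAX_MISSING_TRAIN_PROP = 0.1  # documented default of max_missing_train_prop

start = datetime(2017, 10, 2)
hours = [start + timedelta(hours=h) for h in range(24)]
# Drop three hourly observations from the middle of the day.
observed_hours = [t for i, t in enumerate(hours) if i not in (5, 6, 7)]

prop = missing_proportion(observed_hours, timedelta(hours=1))
too_sparse = prop > MAX_MISSING_TRAIN_PROP
```

With 3 of 24 hourly points missing, the proportion is 0.125, which exceeds the default threshold of 0.1.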