Otsad Manual

User Manual:

Open the PDF directly: View PDF PDF.
Page Count: 40

DownloadOtsad-manual
Open PDF In BrowserView PDF
Package ‘otsad’
July 19, 2018
Type Package
Title Online Time Serie Anomaly Detectors
Version 0.1.0
Description Implements a set of online fault detector detectors
for time-series, called: EWMA, TSSD-EWMA, PEWMA and KNN-CAD and
KNN-LDCD. The first three algorithms belong to prediction-based
techniques and the last two belong to window-based techniques.
In addition, the SD-EWMA and PEWMA algorithms are algorithms
designed to work in stationary environments, while the other three
are algorithms designed to work in non-stationary environments.
Depends R (>= 3.4.0)
License MIT + file LICENSE
Encoding UTF-8
LazyData true
RoxygenNote 6.0.1
Suggests testthat,
stream,
knitr,
rmarkdown
Imports stats,
digest
VignetteBuilder knitr

R topics documented:
ContextCrosser.CAD . . . .
ContextualAnomalyDetector
CpKnnCad . . . . . . . . .
CpPewma . . . . . . . . . .
CpSdEwma . . . . . . . . .
CpTsSdEwma . . . . . . . .
GetAnomalyScore.CAD . .
GetContextByFacts.CAD . .
IntToBinarySens . . . . . .
IpKnnCad . . . . . . . . . .
IpPewma . . . . . . . . . .

.
.
.
.
.
.
.
.
.
.
.

.
.
.
.
.
.
.
.
.
.
.

.
.
.
.
.
.
.
.
.
.
.

.
.
.
.
.
.
.
.
.
.
.

.
.
.
.
.
.
.
.
.
.
.

.
.
.
.
.
.
.
.
.
.
.

.
.
.
.
.
.
.
.
.
.
.
1

.
.
.
.
.
.
.
.
.
.
.

.
.
.
.
.
.
.
.
.
.
.

.
.
.
.
.
.
.
.
.
.
.

.
.
.
.
.
.
.
.
.
.
.

.
.
.
.
.
.
.
.
.
.
.

.
.
.
.
.
.
.
.
.
.
.

.
.
.
.
.
.
.
.
.
.
.

.
.
.
.
.
.
.
.
.
.
.

.
.
.
.
.
.
.
.
.
.
.

.
.
.
.
.
.
.
.
.
.
.

.
.
.
.
.
.
.
.
.
.
.

.
.
.
.
.
.
.
.
.
.
.

.
.
.
.
.
.
.
.
.
.
.

.
.
.
.
.
.
.
.
.
.
.

.
.
.
.
.
.
.
.
.
.
.

.
.
.
.
.
.
.
.
.
.
.

.
.
.
.
.
.
.
.
.
.
.

.
.
.
.
.
.
.
.
.
.
.

.
.
.
.
.
.
.
.
.
.
.

.
.
.
.
.
.
.
.
.
.
.

.
.
.
.
.
.
.
.
.
.
.

.
.
.
.
.
.
.
.
.
.
.

.
.
.
.
.
.
.
.
.
.
.

.
.
.
.
.
.
.
.
.
.
.

.
.
.
.
.
.
.
.
.
.
.

2
3
4
5
7
8
10
10
11
12
15

2

ContextCrosser.CAD
IpSdEwma . . . . . . . . . . . . . .
IpTsSdEwma . . . . . . . . . . . .
OcpPewma . . . . . . . . . . . . .
OcpSdEwma . . . . . . . . . . . .
OcpTsSdEwma . . . . . . . . . . .
OipPewma . . . . . . . . . . . . . .
OipSdEwma . . . . . . . . . . . . .
OipTsSdEwma . . . . . . . . . . .
Step.CAD . . . . . . . . . . . . . .
UpdateContextsAndGetActive.CAD

.
.
.
.
.
.
.
.
.
.

.
.
.
.
.
.
.
.
.
.

.
.
.
.
.
.
.
.
.
.

.
.
.
.
.
.
.
.
.
.

.
.
.
.
.
.
.
.
.
.

.
.
.
.
.
.
.
.
.
.

.
.
.
.
.
.
.
.
.
.

.
.
.
.
.
.
.
.
.
.

.
.
.
.
.
.
.
.
.
.

.
.
.
.
.
.
.
.
.
.

.
.
.
.
.
.
.
.
.
.

.
.
.
.
.
.
.
.
.
.

.
.
.
.
.
.
.
.
.
.

.
.
.
.
.
.
.
.
.
.

.
.
.
.
.
.
.
.
.
.

.
.
.
.
.
.
.
.
.
.

.
.
.
.
.
.
.
.
.
.

.
.
.
.
.
.
.
.
.
.

.
.
.
.
.
.
.
.
.
.

.
.
.
.
.
.
.
.
.
.

.
.
.
.
.
.
.
.
.
.

.
.
.
.
.
.
.
.
.
.

.
.
.
.
.
.
.
.
.
.

.
.
.
.
.
.
.
.
.
.

.
.
.
.
.
.
.
.
.
.

.
.
.
.
.
.
.
.
.
.

.
.
.
.
.
.
.
.
.
.

Index

.
.
.
.
.
.
.
.
.
.

18
21
24
25
27
29
32
34
38
39
40

ContextCrosser.CAD

Context Crosser for Contextual Anomaly Detector (ContextualAD)

Description
ContextCrosser.CAD is an auxiliar method that makes the necessary changes in the context.operator
environment, including in the fact.list and the corresponding dictionary of facts the new facts.
Usage
ContextCrosser.CAD(left.or.right, fact.list, new.ctxt.flag = FALSE,
potential.new.contexts = list(), context.operator)
Arguments
left.or.right

Situation from which the method have been called. It says if the new facts have
been seen or if they are completely new.

fact.list

New fact list.

new.ctxt.flag Flag that indicates if a new context is required.
potential.new.contexts
Potential new contexts that might be added.
context.operator
Environment with the current status for the context operator.
Details
left.or.right must be 0 or 1 to indicate left or right respectively.
Value
If left.or.right, number of new contex added tho the context.operator. Otherwise, the output
from UpdateContextsAndGetActive.CAD.
References
Smirnov, M. (2018). CAD: Contextual Anomaly Detector. https://github.com/smirmik/CAD

ContextualAnomalyDetector

3

ContextualAnomalyDetector
Contextual Anomaly Detector - Open Source (CAD)

Description
ContextualAnomalyDetector calculates the anomaly score of a dataset using the notion of contexts conformed by facts and provides probabilistic abnormality scores.

Usage
ContextualAnomalyDetector(data, rest.period = max(min(150, round(nrow(data) *
0.03), 1)), max.left.semicontexts = 7, max.active.neurons = 15,
num.norm.value.bits = 3, base.threshold = 0.75)
Arguments
data

Numerical vector with training and test dataset.

rest.period

Training period after an anomaly.

max.left.semicontexts
Number of semicontexts that should be maintained in memory.
max.active.neurons
Number of neurons of the model.
num.norm.value.bits
Granularity of the transformation into discrete values
base.threshold Threshold to be considered an anomaly.
Details
data must be a numerical vector without NA values. threshold must be a numeric value between 0 and 1. If the anomaly score obtained for an observation is greater than the threshold, the
observation will be considered abnormal.

Value
Anomaly score of data.
References
Smirnov, M. (2018). CAD: Contextual Anomaly Detector. https://github.com/smirmik/CAD

4

CpKnnCad

CpKnnCad

Classic processing KNN based Conformal Anomaly Detector (KNNCAD)

Description
CpKnnCad calculates the anomalies of a dataset using classical processing based on the KNN-CAD
algorithm. KNN-CAD is a model-free anomaly detection method for univariate time-series which
adapts itself to non-stationarity in the data stream and provides probabilistic abnormality scores
based on the conformal prediction paradigm.
Usage
CpKnnCad(data, n.train, threshold = 1, l = 19, k = 27,
ncm.type = "ICAD", reducefp = TRUE)
Arguments
data

Numerical vector with training and test dataset.

n.train

Number of points of the dataset that correspond to the training set.

threshold

Anomaly threshold.

l

Window length.

k

Number of neighbours to take into account.

ncm.type

Non Conformity Measure to use "ICAD" or "LDCD"

reducefp

If TRUE reduces false positives.

Details
data must be a numerical vector without NA values. threshold must be a numeric value between 0
and 1. If the anomaly score obtained for an observation is greater than the threshold, the observation will be considered abnormal. l must be a numerical value between 1 and 1/n; n being the length
of the training data. Take into account that the value of l has a direct impact on the computational
cost, so very high values will make the execution time longer. k parameter must be a numerical
value less than the n.train value. ncm.type determines the non-conformity measurement to be
used. ICAD calculates dissimilarity as the sum of the distances of the nearest k neighbours and
LDCD as the average.
Value
dataset conformed by the following columns:
is.anomaly

1 if the value is anomalous, 0 otherwise.

anomaly.score

Probability of anomaly.

References
V. Ishimtsev, I. Nazarov, A. Bernstein and E. Burnaev. Conformal k-NN Anomaly Detector for
Univariate Data Streams. ArXiv e-prints, jun. 2017.

CpPewma

5

Examples
## Generate data
set.seed(100)
n <- 500
x <- sample(1:100, n, replace = TRUE)
x[70:90] <- sample(110:115, 21, replace = TRUE)
x[25] <- 200
x[320] <- 170
df <- data.frame(timestamp=1:n,value=x)
## Set parameters
params.KNN <- list(threshold = 1, n.train = 50, l = 19, k = 17)
## Calculate anomalies
result <- CpKnnCad(
data = df$value,
n.train = params.KNN$n.train,
threshold = params.KNN$threshold,
l = params.KNN$l,
k = params.KNN$k,
ncm.type = "ICAD",
reducefp = TRUE
)
## Plot results
res <- cbind(df[(params.KNN$n.train + 1):n,],
is.anomaly = result$is.anomaly[(params.KNN$n.train + 1):n])
y.limits <- c(-150,250)
plot(x = res$timestamp, y = res$value, type = "l", ylim = y.limits,
xlab = "timestamp", ylab = "value", main = "KNN-CAD ANOMALY DETECTOR")
points(x = res[res$is.anomaly == TRUE, "timestamp"],
y = res[res$is.anomaly == TRUE, "value"], pch=4, col="red", lwd = 2)

CpPewma

Classic Processing Probabilistic-EWMA (PEWMA).

Description
CpPewma calculates the anomalies of a dataset using classical processing based on the PEWMA
algorithm. This algorithm is a probabilistic method of EWMA which dynamically adjusts the parameterization based on the probability of the given observation. This method produces dynamic,
data-driven anomaly thresholds which are robust to abrupt transient changes, yet quickly adjust to
long-term distributional shifts. See also OcpPewma, the optimized and faster function of the this
function.
Usage
CpPewma(data, n.train = 5, alpha0 = 0.8, beta = 0.3, l = 3)
Arguments
data

Numerical vector with training and test dataset.

6

CpPewma
n.train

Number of points of the dataset that correspond to the training set.

alpha0

Maximal weighting parameter.

beta

Weight placed on the probability of the given observation.

l

Control limit multiplier.

Details
data must be a numerical vector without NA values. alpha0 must be a numeric value where 0
< alpha0 < 1. If a faster adjustment to the initial shift is desirable, simply lowering alpha0 will
suffice. beta is the weight placed on the probability of the given observation. It must be a numeric
value where 0 <= beta <= 1. Note that if beta equals 0, PEWMA converges to a standard EWMA.
Finally l is the parameter that determines the control limits. By default, 3 is used.
Value
dataset conformed by the following columns:
is.anomaly

1 if the value is anomalous 0, otherwise.

ucl

Upper control limit.

lcl

Lower control limit.

References
M. Carter, Kevin y W. Streilein. Probabilistic reasoning for streaming anomaly detection. 2012
IEEE Statistical Signal Processing Workshop (SSP), pp. 377-380, Aug 2012.
Examples
## Generate data
set.seed(100)
n <- 500
x <- sample(1:100, n, replace = TRUE)
x[70:90] <- sample(110:115, 21, replace = TRUE)
x[25] <- 200
x[320] <- 170
df <- data.frame(timestamp=1:n,value=x)
## Calculate anomalies
result <- CpPewma(
data = df$value,
n.train = 5,
alpha0 = 0.8,
beta = 0.1,
l = 3
)
## Plot results
res <- cbind(df, result)
res <- res[1:500,]
y.limits <- c(-150,250)
plot(x = res$timestamp, y = res$value, type = "l", ylim = y.limits,
xlab = "timestamp", ylab = "value", main = "PEWMA ANOMALY DETECTOR")
points(x = res[res$is.anomaly == 1, "timestamp"],
y = res[res$is.anomaly == 1, "value"], pch=4, col="red", lwd = 2)

CpSdEwma

7

par(new=TRUE)
plot(x = res$timestamp, y = res$ucl, type="l", col="green", xaxt="n",
ylim = y.limits, xlab = "", ylab = "")
par(new=TRUE)
plot(x = res$timestamp, y = res$lcl, type="l", col="green", xaxt="n",
ylim = y.limits, xlab = "", ylab = "")

CpSdEwma

Classic Processing Shift-Detection based on EWMA (SD-EWMA).

Description
CpSdEwma calculates the anomalies of a dataset using classical processing based on the SD-EWMA
algorithm. This algorithm is a novel method for covariate shift-detection tests based on a twostage structure for univariate time-series. It works in an online mode and it uses an exponentially
weighted moving average (EWMA) model based control chart to detect the covariate shift-point in
non-stationary time-series. See also OcpSdEwma, the optimized and faster function of this function.
Usage
CpSdEwma(train.data, test.data, threshold = 0.01, l = 3)
Arguments
train.data

Numerical vector with the training set.

test.data

Numerical vector with the test set.

threshold

Error smoothing constant.

l

Control limit multiplier.

Details
train.data and test.data must be numerical vectors without NA values. threshold must be
a numeric value between 0 and 1. It is recommended to use low values such as 0.01 or 0.05. By
default, 0.01 is used. Finally, l is the parameter that determines the control limits. By default, 3 is
used.
Value
dataset conformed by the following columns:
is.anomaly

1 if the value is anomalous 0, otherwise.

ucl

Upper control limit.

lcl

Lower control limit.

References
Raza, H., Prasad, G., & Li, Y. (03 de 2015). EWMA model based shift-detection methods for
detecting covariate shifts in non-stationary environments. Pattern Recognition, 48(3), 659-669.

8

CpTsSdEwma

Examples
## Generate data
set.seed(100)
n <- 500
x <- sample(1:100, n, replace = TRUE)
x[70:90] <- sample(110:115, 21, replace = TRUE)
x[25] <- 200
x[320] <- 170
df <- data.frame(timestamp=1:n,value=x)
## Calculate anomalies
result <- CpSdEwma(
train.data = df[1:5,"value"],
test.data = df[6:n,"value"],
threshold = 0.01,
l = 3
)
res <- cbind(df[6:n,], result)
rownames(res) <- 1:(n-5)
## Plot results
y.limits <- c(-150,250)
plot(x = res$timestamp, y = res$value, type = "l", ylim = y.limits,
xlab = "timestamp", ylab = "value", main = "SD-EWMA ANOMALY DETECTOR")
points(x = res[res$is.anomaly == 1, "timestamp"],
y = res[res$is.anomaly == 1, "value"], pch=4, col="red", lwd = 2)
par(new=TRUE)
plot(x = res$timestamp, y = res$ucl, type="l", col="red", xaxt="n",
ylim = y.limits, xlab = "", ylab = "")
par(new=TRUE)
plot(x = res$timestamp, y = res$lcl, type="l", col="red", xaxt="n",
ylim = y.limits, xlab = "", ylab = "")

CpTsSdEwma

Classic Processing Two-Stage Shift-Detection based on EWMA

Description
CpTsSdEwma calculates the anomalies of a dataset using classical processing based on the SDEWMA algorithm. This algorithm is a novel method for covariate shift-detection tests based on
a two-stage structure for univariate time-series. This algorithm works in two phases. In the first
phase, it detects anomalies using the SD-EWMA CpSdEwma algorithm. In the second phase, it
checks the veracity of the anomalies using the Kolmogorov-Simirnov test to reduce false alarms.
See also OcpTsSdEwma, the optimized and faster function of this function.
Usage
CpTsSdEwma(train.data, test.data, threshold, l = 3, m = 5)
Arguments
train.data

Numerical vector with the training set.

test.data

Numerical vector with the test set.

CpTsSdEwma

9

threshold

Error smoothing constant.

l

Control limit multiplier.

m

Length of the subsequences for applying the Kolmogorov-Smirnov test.

Details
train.data and test.data must be numerical vectors without NA values. threshold must be
a numeric value between 0 and 1. It is recommended to use low values such as 0.01 or 0.05. By
default, 0.01 is used. Finally, l is the parameter that determines the control limits. By default, 3 is
used. m is the length of the subsequences for applying the Kolmogorov-Smirnov test. By default, 5
is used. It should be noted that the last m values will not been verified because another m values are
needed to be able to perform the verification.
Value
dataset conformed by the following columns:
is.anomaly

1 if the value is anomalous, 0 otherwise.

ucl

Upper control limit.

lcl

Lower control limit.

References
Raza, H., Prasad, G., & Li, Y. (03 de 2015). EWMA model based shift-detection methods for
detecting covariate shifts in non-stationary environments. Pattern Recognition, 48(3), 659-669.
Examples
## Generate data
set.seed(100)
n <- 500
x <- sample(1:100, n, replace = TRUE)
x[70:90] <- sample(110:115, 21, replace = TRUE)
x[25] <- 200
x[320] <- 170
df=data.frame(timestamp=1:n,value=x)
## Calculate anomalies
result <- CpTsSdEwma(
train.data = df[1:5,"value"],
test.data = df[6:n,"value"],
threshold = 0.01,
l = 3,
m = 20
)
res <- cbind(df[6:n,], result)
rownames(res) <- 1:(n-5)
## Plot results
res <- res[1:500,]
y.limits <- c(-150,250)
plot(x = res$timestamp, y = res$value, type = "l", ylim = y.limits,
xlab = "timestamp", ylab = "value", main = "TSSD-EWMA ANOMALY DETECTOR")
points(x = res[res$is.anomaly == 1, "timestamp"],
y = res[res$is.anomaly == 1, "value"], pch=4, col="red", lwd = 2)

10

GetContextByFacts.CAD
par(new=TRUE)
plot(x = res$timestamp, y = res$ucl, type="l", col="red", xaxt="n",
ylim = y.limits, xlab = "", ylab = "")
par(new=TRUE)
plot(x = res$timestamp, y = res$lcl, type="l", col="red", xaxt="n",
ylim = y.limits, xlab = "", ylab = "")

GetAnomalyScore.CAD

Get Anomaly Score for Contextual Anomaly Detector

Description
GetAnomalyScore.CAD calculates the anomaly score of a single datum using the notion of contexts
conformed by facts and provides probabilistic abnormality scores.
Usage
GetAnomalyScore.CAD(datum, local.environment)
Arguments
datum
Integer vector of activated bits.
local.environment
Local environment with the variables of the Context Anomaly Detector.
Details
datum must be a numerical vector of the transformation made by IntToBinarySens.
Value
Anomaly score of datum.
References
Smirnov, M. (2018). CAD: Contextual Anomaly Detector. https://github.com/smirmik/CAD

GetContextByFacts.CAD Get Context By Facts for Contextual Anomaly Detector

Description
GetContextByFacts.CAD is an auxiliary method that determine by the complete facts list whether
the context is already saved to the memory. If the context is not found the function immediately
creates such. To optimize speed and volume of the occupied memory the contexts are divided into
semi-contexts as several contexts can contain the same facts set in its left and right parts.
Usage
GetContextByFacts.CAD(context.list, context.operator, zero.level = 0)

IntToBinarySens

11

Arguments
context.list

List of potentially new contexts

context.operator
Environment with the current status for the context operator.
zero.level

Flag indicating the context type in transmitted list

Value
Depending on the type of potentially new context transmitted as an input parameters the function
returns either: a) flag indicating that the transmitted zero-level context is a new/existing one; or: b)
number of the really new contexts that have been saved to the context memory.
References
Smirnov, M. (2018). CAD: Contextual Anomaly Detector. https://github.com/smirmik/CAD

IntToBinarySens

Int To Binary Sens

Description
IntToBinarySens Auxiliar function that transforms integer to binary bins.
Usage
IntToBinarySens(x, num.norm.value.bits)
Arguments
x

input data

num.norm.value.bits
Expected number of binary bins
Value
Vector with activated numeric bits.
Examples
otsad:::IntToBinarySens(2, 3)
otsad:::IntToBinarySens(10, 5)

12

IpKnnCad

IpKnnCad

Incremental processing KNN based Conformal Anomaly Detector
(KNN-CAD).

Description
IpKnnCad allows the calculation of anomalies using SD-EWMA in an incremental processing mode.
KNN-CAD is a model-free anomaly detection method for univariate time-series which adapts itself
to non-stationarity in the data stream and provides probabilistic abnormality scores based on the
conformal prediction paradigm.
Usage
IpKnnCad(data, n.train, threshold = 1, l = 19, k = 27,
ncm.type = "ICAD", reducefp = TRUE, to.next.iteration = NULL)
Arguments
data

Numerical vector with training and test dataset.

n.train

Number of points of the dataset that correspond to the training set.

threshold

Anomaly threshold.

l

Window length.

k

Number of neighbours to take into account.

ncm.type

Non Conformity Measure to use "ICAD" or "LDCD"

reducefp

If TRUE reduces false positives.

to.next.iteration
list with the necessary parameters to execute in the next iteration.
Details
data must be a numerical vector without NA values. threshold must be a numeric value between 0
and 1. If the anomaly score obtained for an observation is greater than the threshold, the observation will be considered abnormal. l must be a numerical value between 1 and 1/n; n being the length
of the training data. Take into account that the value of l has a direct impact on the computational
cost, so very high values will make the execution time longer. k parameter must be a numerical
value less than the n.train value. ncm.type determines the non-conformity measurement to be
used. ICAD calculates dissimilarity as the sum of the distances of the nearest k neighbours and
LDCD as the average. to.next.iteration is the last result returned by some previous execution
of this algorithm. The first time the algorithm is executed its value is NULL. However, to run a new
batch of data without having to include it in the old dataset and restart the process, this parameter
returned by the last run is only needed.
This algorithm can be used for both classical and incremental processing. It should be noted that
in case of having a finite dataset, the CpKnnCad algorithm is faster. Incremental processing can be
used in two ways. 1) Processing all available data and saving calibration.alpha and last.data
for future runs with new data. 2) Using the stream library for when there is much data and it does
not fit into the memory. An example has been made for this use case.

IpKnnCad

13

Value
dataset conformed by the following columns:
is.anomaly

1 if the value is anomalous 0, otherwise.

anomaly.score Probability of anomaly.
to.next.iteration
Last result returned by the algorithm. It is a list containing the following items.
• training.set Last training set values used in the previous iteration and required for the next
run.
• calibration.set Last calibration set values used in the previous iteration and required for
the next run.
• sigma Last covariance matrix calculated in the previous iteration and required for the next
run.
• alphas Last calibration alpha values calculated in the previous iteration and required for the
next run.
• last.data Last values of the dataset converted into multi-dimensional vectors..
• pred Parameter that is used to reduce false positives. Only necessary in case of reducefp is
TRUE.
• record.count Number of observations that have been processed up to the last iteration.
References
V. Ishimtsev, I. Nazarov, A. Bernstein and E. Burnaev. Conformal k-NN Anomaly Detector for
Univariate Data Streams. ArXiv e-prints, jun. 2017.
Examples
## EXAMPLE 1: ---------------------## It can be used in the same way as with CpKnnCad passing the whole dataset as
## an argument.
## Generate data
set.seed(100)
n <- 500
x <- sample(1:100, n, replace = TRUE)
x[70:90] <- sample(110:115, 21, replace = TRUE)
x[25] <- 200
x[320] <- 170
df <- data.frame(timestamp=1:n,value=x)
## Set parameters
params.KNN <- list(threshold = 1, n.train = 50, l = 19, k = 17)
## Calculate anomalies
result <- IpKnnCad(
data = df$value,
n.train = params.KNN$n.train,
threshold = params.KNN$threshold,
l = params.KNN$l,
k = params.KNN$k,
ncm.type = "ICAD",

14

IpKnnCad

)

reducefp = TRUE

## Plot results
res <- cbind(df[(params.KNN$n.train + 1):n,],
is.anomaly = result$is.anomaly[(params.KNN$n.train + 1):n])
y.limits <- c(-150,250)
plot(x = res$timestamp, y = res$value, type = "l", ylim = y.limits,
xlab = "timestamp", ylab = "value", main = "KNN-CAD ANOMALY DETECTOR")
points(x = res[res$is.anomaly == TRUE, "timestamp"],
y = res[res$is.anomaly == TRUE, "value"], pch=4, col="red", lwd = 2)
## EXAMPLE 2: ---------------------## You can use it in an incremental way. This is an example using the stream
## library. This library allows the simulation of streaming operation.
# install.packages("stream")
library("stream")
## Generate data
set.seed(100)
n <- 500
x <- sample(1:100, n, replace = TRUE)
x[70:90] <- sample(110:115, 21, replace = TRUE)
x[25] <- 200
x[320] <- 170
df=data.frame(timestamp=1:n,value=x)
dsd_df <- DSD_Memory(df)
## Initialize parameters for the loop
last.res <- NULL
res <- NULL
nread <- 100
numIter <- n%/%nread
## Set parameters
params.KNN <- list(threshold = 1, n.train = 50, l = 19, k = 17)
## Calculate anomalies
for(i in 1:numIter) {
# read new data
newRow <- get_points(dsd_df, n = nread, outofpoints = "ignore")
# calculate if it's an anomaly
last.res <- IpKnnCad(
data = newRow$value,
n.train = params.KNN$n.train,
threshold = params.KNN$threshold,
l = params.KNN$l,
k = params.KNN$k,
ncm.type = "ICAD",
reducefp = TRUE,
to.next.iteration = last.res$to.next.iteration
)
# prepare the result
if(!is.null(last.res$is.anomaly)){
res <- rbind(res, cbind(newRow, is.anomaly = last.res$is.anomaly))
}

IpPewma

15

}
## Plot results
res <- res[(params.KNN$n.train + 1):n,]
y.limits <- c(-150,250)
plot(x = res$timestamp, y = res$value, type = "l", ylim = y.limits,
xlab = "timestamp", ylab = "value", main = "KNN-CAD ANOMALY DETECTOR")
points(x = res[res$is.anomaly == 1, "timestamp"],
y = res[res$is.anomaly == 1, "value"], pch=4, col="red", lwd = 2)

IpPewma

Incremental Processing Probabilistic-EWMA (PEWMA).

Description
IpPewma allows the calculation of anomalies using PEWMA in an incremental processing mode.
See also OipPewma, the optimized and faster function of this function This algorithm is a probabilistic method of EWMA which dynamically adjusts the parameterization based on the probability of
the given observation. This method produces dynamic, data-driven anomaly thresholds which are
robust to abrupt transient changes, yet quickly adjust to long-term distributional shifts.
Usage
IpPewma(data, n.train = 5, alpha0 = 0.8, beta = 0, l = 3,
last.res = NULL)
Arguments
data

Numerical vector with training and test dataset.

n.train

Number of points of the dataset that correspond to the training set.

alpha0

Maximal weighting parameter.

beta

Weight placed on the probability of the given observation.

l

Control limit multiplier.

last.res

Last result returned by the algorithm.

Details
data must be a numerical vector without NA values. alpha0 must be a numeric value where 0
< alpha0 < 1. If a faster adjustment to the initial shift is desirable, simply lowering alpha0 will
suffice. beta is the weight placed on the probability of the given observation. it must be a numeric
value where 0 <= beta <= 1. Note that beta equals 0, PEWMA converges to a standard EWMA.
Finally l is the parameter that determines the control limits. By default, 3 is used. last.res is the
last result returned by some previous execution of this algorithm. The first time the algorithm is
executed its value is NULL. However, to run a new batch of data without having to include it in the
old dataset and restart the process, the two parameters returned by the last run are only needed.

16

IpPewma
This algorithm can be used for both classical and incremental processing. It should be noted that in
case of having a finite dataset the CpPewma or OcpPewma algorithms are faster. Incremental processing can be used in two ways. 1) Processing all available data and saving last.res for future runs
in which there is new data. 2) Using the stream library for when there is too much data and it does
not fit into the memory. An example has been made for this use case.

Value
A list of the following items.
result

dataset conformed by the following columns.

• is.anomaly 1 if the value is anomalous 0, otherwise.
• ucl Upper control limit.
• lcl Lower control limit.
last.res

Last result returned by the algorithm. Is a dataset containing the parameters
calculated in the last iteration and necessary for the next one.

References
M. Carter, Kevin y W. Streilein. Probabilistic reasoning for streaming anomaly detection. 2012
IEEE Statistical Signal Processing Workshop (SSP), pp. 377-380, Aug 2012.
Examples
## EXAMPLE 1: ---------------------## It can be used in the same way as with CpPewma passing the whole dataset as
## an argument.
## Generate data
set.seed(100)
n <- 500
x <- sample(1:100, n, replace = TRUE)
x[70:90] <- sample(110:115, 21, replace = TRUE)
x[25] <- 200
x[320] <- 170
df=data.frame(timestamp=1:n,value=x)
## Calculate anomalies
result <- IpPewma(
data = df$value,
alpha0 = 0.8,
beta = 0,
n.train = 5,
l = 3,
last.res = NULL
)
res <- cbind(df, result$result)
## Plot results
y.limits <- c(-150,250)
plot(x = res$timestamp, y = res$value, type = "l", ylim = y.limits,
xlab = "timestamp", ylab = "value", main = "PEWMA ANOMALY DETECTOR")
points(x = res[res$is.anomaly == 1, "timestamp"],
y = res[res$is.anomaly == 1, "value"], pch=4, col="red", lwd = 2)

IpPewma
par(new=TRUE)
plot(x = res$timestamp, y = res$ucl, type="l", col="green", xaxt="n",
ylim = y.limits, xlab = "", ylab = "")
par(new=TRUE)
plot(x = res$timestamp, y = res$lcl, type="l", col="green", xaxt="n",
ylim = y.limits, xlab = "", ylab = "")
## EXAMPLE 2: ---------------------## You can use it in an incremental way. This is an example using the stream
## library. This library allows the simulation of streaming operation.
# install.packages("stream")
library("stream")
## Generate data
set.seed(100)
n <- 500
x <- sample(1:100, n, replace = TRUE)
x[70:90] <- sample(110:115, 21, replace = TRUE)
x[25] <- 200
x[320] <- 170
df=data.frame(timestamp=1:n,value=x)
dsd_df <- DSD_Memory(df)
## Initialize parameters for the loop
last.res <- NULL
res <- NULL
nread <- 100
numIter <- n%/%nread
## Calculate anomalies
for(i in 1:numIter) {
# read new data
newRow <- get_points(dsd_df, n = nread, outofpoints = "ignore")
# calculate if it's an anomaly
last.res <- IpPewma(
data = newRow$value,
n.train = 5,
alpha0 = 0.8,
beta = 0,
l = 3,
last.res = last.res$last.res
)
# prepare the result
if(!is.null(last.res$result)){
res <- rbind(res, cbind(newRow, last.res$result))
}
}
## Plot results
res <- res[6:500,]
y.limits <- c(-150,250)
plot(x = res$timestamp, y = res$value, type = "l", ylim = y.limits,
xlab = "timestamp", ylab = "value", main = "PEWMA ANOMALY DETECTOR")
points(x = res[res$is.anomaly == 1, "timestamp"],
y = res[res$is.anomaly == 1, "value"], pch=4, col="red", lwd = 2)
par(new=TRUE)

17

18

IpSdEwma
plot(x = res$timestamp, y = res$ucl, type="l", col="green", xaxt="n",
ylim = y.limits, xlab = "", ylab = "")
par(new=TRUE)
plot(x = res$timestamp, y = res$lcl, type="l", col="green", xaxt="n",
ylim = y.limits, xlab = "", ylab = "")

IpSdEwma

Incremental Processing Shift-Detection based on EWMA (SDEWMA).

Description
IpSdEwma allows the calculation of anomalies using SD-EWMA in an incremental processing mode.
See also OipSdEwma, the optimized and faster function of this function SD-EWMA algorithm is a
novel method for covariate shift-detection tests based on a two-stage structure for univariate timeseries. It works in an online mode and it uses an exponentially weighted moving average (EWMA)
model based control chart to detect the covariate shift-point in non-stationary time-series.
Usage
IpSdEwma(data, n.train, threshold = 0.01, l = 3, last.res = NULL)
Arguments
data

Numerical vector with training and test dataset.

n.train

Number of points of the dataset that correspond to the training set.

threshold

Error smoothing constant.

l

Control limit multiplier.

last.res

Last result returned by the algorithm.

Details
data must be a numerical vector without NA values. threshold must be a numeric value between
0 and 1. It is recommended to use low values such as 0.01 or 0.05. By default, 0.01 is used. l
is the parameter that determines the control limits. By default, 3 is used. Finally last.res is the
last result returned by some previous execution of this algorithm. The first time the algorithm is
executed its value is NULL. However, to run a new batch of data without having to include it in the
old dataset and restart the process, the two parameters returned by the last run are only needed.
This algorithm can be used for both classical and incremental processing. It should be noted that
in case of having a finite dataset the CpSdEwma or OcpSdEwma algorithms are faster. Incremental
processing can be used in two ways. 1) Processing all available data and saving last.res for
future runs in which there is new data. 2) Using the stream library for when there is too much data
and it does not fit into memory. An example has been made for this use case.

IpSdEwma

19

Value
A list of the following items.
result

dataset conformed by the following columns.

• is.anomaly 1 if the value is anomalous 0 otherwise.
• ucl Upper control limit.
• lcl Lower control limit.
last.res

Last result returned by the algorithm. Is a dataset containing the parameters
calculated in the last iteration and necessary for the next one.

References
Raza, H., Prasad, G., & Li, Y. (03 de 2015). EWMA model based shift-detection methods for
detecting covariate shifts in non-stationary environments. Pattern Recognition, 48(3), 659-669.
Examples
## EXAMPLE 1: ---------------------## It can be used in the same way as with CpSdEwma passing the whole dataset as
## an argument.
## Generate data
set.seed(100)
n <- 500
x <- sample(1:100, n, replace = TRUE)
x[70:90] <- sample(110:115, 21, replace = TRUE)
x[25] <- 200
x[320] <- 170
df=data.frame(timestamp=1:n,value=x)
## Calculate anomalies
result <- IpSdEwma(
data = df$value,
n.train = 5,
threshold = 0.01,
l = 3
)
res <- cbind(df[6:n,], result$result)
rownames(res) <- 1:(n-5)
## Plot results
y.limits <- c(-150,250)
plot(x = res$timestamp, y = res$value, type = "l", ylim = y.limits,
xlab = "timestamp", ylab = "value", main = "SD-EWMA ANOMALY DETECTOR")
points(x = res[res$is.anomaly == 1, "timestamp"],
y = res[res$is.anomaly == 1, "value"], pch=4, col="red", lwd = 2)
par(new=TRUE)
plot(x = res$timestamp, y = res$ucl, type="l", col="red", xaxt="n",
ylim = y.limits, xlab = "", ylab = "")
par(new=TRUE)
plot(x = res$timestamp, y = res$lcl, type="l", col="red", xaxt="n",
ylim = y.limits, xlab = "", ylab = "")

20

IpSdEwma
## EXAMPLE 2: ---------------------## You can use it in an incremental way. This is an example using the stream
## library. This library allows the simulation of streaming operation.
# install.packages("stream")
library("stream")
## Generate data
set.seed(100)
n <- 500
x <- sample(1:100, n, replace = TRUE)
x[70:90] <- sample(110:115, 21, replace = TRUE)
x[25] <- 200
x[320] <- 170
df=data.frame(timestamp=1:n,value=x)
dsd_df <- DSD_Memory(df)
## Initialize parameters for the loop
last.res <- NULL
res <- NULL
nread <- 100
numIter <- n%/%nread
## Calculate anomalies
for(i in 1:numIter) {
# read new data
newRow <- get_points(dsd_df, n = nread, outofpoints = "ignore")
# calculate if it's an anomaly
last.res <- IpSdEwma(
data = newRow$value,
n.train = 5,
threshold = 0.01,
l = 3,
last.res = last.res$last.res
)
# prepare the result
if(!is.null(last.res$result)){
res <- rbind(res, cbind(newRow[(nread-nrow(last.res$result)+1):nread,],
last.res$result))
}
}
print(res)
## Plot results
n.train <- 5
rownames(res) <- 1:(n-n.train)
y.limits <- c(-150,250)
plot(x = res$timestamp, y = res$value, type = "l", ylim = y.limits,
xlab = "timestamp", ylab = "value", main = "SD-EWMA ANOMALY DETECTOR")
points(x = res[res$is.anomaly == 1, "timestamp"],
y = res[res$is.anomaly == 1, "value"], pch=4, col="red", lwd = 2)
par(new=TRUE)
plot(x = res$timestamp, y = res$ucl, type="l", col="red", xaxt="n",
ylim = y.limits, xlab = "", ylab = "")
par(new=TRUE)
plot(x = res$timestamp, y = res$lcl, type="l", col="red", xaxt="n",
ylim = y.limits, xlab = "", ylab = "")

IpTsSdEwma

IpTsSdEwma

21

Incremental Processing Two-Stage Shift-Detection based on EWMA

Description
IpTsSdEwma allows the calculation of anomalies using TSSD-EWMA in an incremental processing
mode. See also OipTsSdEwma, the optimized and faster function of this function. This algorithm
is a novel method for covariate shift-detection tests based on a two-stage structure for univariate
time-series. TSSD-EWMA works in two phases. In the first phase, it detects anomalies using the
SD-EWMA CpSdEwma algorithm. In the second phase, it checks the veracity of the anomalies using
the Kolmogorov-Simirnov test to reduce false alarms.
Usage
IpTsSdEwma(data, n.train, threshold, l = 3, m = 5,
to.next.iteration = list(last.res = NULL, to.check = NULL, last.m = NULL))
Arguments
data

Numerical vector with training and test dataset.

n.train

Number of points of the dataset that correspond to the training set.

threshold

Error smoothing constant.

l

Control limit multiplier.

m
Length of the subsequences for applying the Kolmogorov-Smirnov test.
to.next.iteration
list with the necessary parameters to execute in the next iteration
Details
data must be a numerical vector without NA values. threshold must be a numeric value between
0 and 1. It is recommended to use low values such as 0.01 or 0.05. By default, 0.01 is used. Finally,
l is the parameter that determines the control limits. By default, 3 is used. m is the length of the
subsequences for applying the Kolmogorov-Smirnov test. By default, 5 is used. It should be noted
that the last m values have not been verified because you need other m values to be able to perform
the verification. Finally to.next.iteration is the last result returned by some previous execution
of this algorithm. The first time the algorithm is executed its value is NULL. However, to run a
new batch of data without having to include it in the old dataset and restart the process, the two
parameters returned by the last run are only needed.
Value
A list of the following items.
last.data.checked
Anomaly results of the last m results of the previous iteration. dataset conformed
by the following columns.
• is.anomaly 1 if the value is anomalous 0 otherwise.

22

IpTsSdEwma
• ucl Upper control limit.
• lcl Lower control limit.
checked.results

Anomaly results of the dataset excluding the last m values because they could
not be verified. dataset conformed by the following columns: is.anomaly, ucl,
lcl.
to.next.iteration
Last result returned by the algorithm. It is a list containing the following items.
• last.res Last result returned by the aplicaction of SD-EWMA function with the calculations
of the parameters of the last run . These are necessary for the next run.
• to.check Subsequence of the last remaining unchecked values to be checked in the next
iteration. dataset conformed by the following columns: is.anomaly, ucl, lcl, value.
• last.m Subsequence of the m values prior to the to.check subsecuence necessary to verify the
values in to.check.

References
Raza, H., Prasad, G., & Li, Y. (03 de 2015). EWMA model based shift-detection methods for
detecting covariate shifts in non-stationary environments. Pattern Recognition, 48(3), 659-669.
Examples
## EXAMPLE 1: ---------------------## It can be used in the same way as with CpTsSdEwma passing the whole dataset
## as an argument.
## Generate data
set.seed(100)
n <- 500
x <- sample(1:100, n, replace = TRUE)
x[70:90] <- sample(110:115, 21, replace = TRUE)
x[25] <- 200
x[320] <- 170
df=data.frame(timestamp=1:n,value=x)
## Calculate anomalies
result <- IpTsSdEwma(
data = df$value,
n.train = 5,
threshold = 0.01,
l = 3,
m = 20
)
res <- cbind(df[6:n,], rbind(result$last.data.checked, result$checked.results,
result$to.next.iteration$to.check[, -4]))
rownames(res) <- 1:(n-5)
## Plot results
res <- res[1:500,]
y.limits <- c(-150,250)
plot(x = res$timestamp, y = res$value, type = "l", ylim = y.limits,
xlab = "timestamp", ylab = "value", main = "TSSD-EWMA ANOMALY DETECTOR")
points(x = res[res$is.anomaly == 1, "timestamp"],

IpTsSdEwma
y = res[res$is.anomaly == 1, "value"], pch=4, col="red", lwd = 2)
par(new=TRUE)
plot(x = res$timestamp, y = res$ucl, type="l", col="red", xaxt="n",
ylim = y.limits, xlab = "", ylab = "")
par(new=TRUE)
plot(x = res$timestamp, y = res$lcl, type="l", col="red", xaxt="n",
ylim = y.limits, xlab = "", ylab = "")
## EXAMPLE 2: ---------------------## You can use it in an incremental way. This is an example using the stream
## library. This library allows the simulation of streaming operation.
# install.packages("stream")
library("stream")
## Generate data
set.seed(100)
n <- 500
x <- sample(1:100, n, replace = TRUE)
x[70:90] <- sample(110:115, 21, replace = TRUE)
x[25] <- 200
x[320] <- 170
df=data.frame(timestamp=1:n,value=x)
dsd_df <- DSD_Memory(df)
## Initialize parameters for the loop
last.res <- NULL
res <- NULL
nread <- 100
numIter <- n%/%nread
m <- 20
## Calculate anomalies
for(i in 1:numIter) {
# read new data
newRow <- get_points(dsd_df, n = nread, outofpoints = "ignore")
# calculate if it's an anomaly
last.res <- IpTsSdEwma(
data = newRow$value,
n.train = 5,
threshold = 0.01,
l = 3,
m = m,
to.next.iteration = last.res$to.next.iteration
)
# prepare result
if(!is.null(last.res$last.data.checked)){
res <- rbind(res, cbind(last.timestamp, last.res$last.data.checked))
}
if(!is.null(last.res$checked.results)){
init <- nread - (nrow(last.res$checked.results) +
nrow(last.res$to.next.iteration$to.check)) + 1
end <- init + nrow(last.res$checked.results) - 1
res <- rbind(res, cbind(newRow[init:end,], last.res$checked.results))
}
if(i == numIter){
res <- rbind(res,

23

24

OcpPewma
cbind(timestamp = newRow[
(nread - nrow(last.res$to.next.iteration$to.check)
+ 1):nread, "timestamp"],
last.res$to.next.iteration$to.check))

}

}
last.timestamp <- newRow[(nread-m+1):nread,]

## Plot results
rownames(res) <- 1:(n-5)
y.limits <- c(-150,250)
plot(x = res$timestamp, y = res$value, type = "l", ylim = y.limits,
xlab = "timestamp", ylab = "value", main = "TSSD-EWMA ANOMALY DETECTOR")
points(x = res[res$is.anomaly == 1, "timestamp"],
y = res[res$is.anomaly == 1, "value"], pch=4, col="red", lwd = 2)
par(new=TRUE)
plot(x = res$timestamp, y = res$ucl, type="l", col="red", xaxt="n",
ylim = y.limits, xlab = "", ylab = "")
par(new=TRUE)
plot(x = res$timestamp, y = res$lcl, type="l", col="red", xaxt="n",
ylim = y.limits, xlab = "", ylab = "")

OcpPewma

Optimized Classic Processing Probabilistic-EWMA (PEWMA).

Description
OcpPewma calculates the anomalies of a dataset using an optimized version of classical processing
Probabilistic-EWMA algorithm. It Is an optimized implementation of the CpPewma algorithm using
environmental variables. It has been shown that in long datasets it can reduce runtime by up to 50%.
TThis algorithm is a probabilistic method of EWMA which dynamically adjusts the parameterization based on the probability of the given observation. This method produces dynamic, data-driven
anomaly thresholds which are robust to abrupt transient changes, yet quickly adjust to long-term
distributional shifts.
Usage
OcpPewma(data, alpha0 = 0.2, beta = 0, n.train = 5, l = 3)
Arguments
data
alpha0
beta
n.train
l

Numerical vector with training and test datasets.
Maximal weighting parameter.
Weight placed on the probability of the given observation.
Number of points of the dataset that correspond to the training set.
Control limit multiplier.

Details
data must be a numerical vector without NA values. alpha0 must be a numeric value where 0
< alpha0 < 1. If a faster adjustment to the initial shift is desirable, simply lowering alpha0 will
suffice. beta is the weight placed on the probability of the given observation. It must be a numeric
value where 0 <= beta <= 1. Note that if beta equals 0, PEWMA converges to a standard EWMA.
Finally l is the parameter that determines the control limits. By default, 3 is used.

OcpSdEwma

25

Value
dataset conformed by the following columns:
is.anomaly

1 if the value is anomalous 0, otherwise.

ucl

Upper control limit.

lcl

Lower control limit.

References
M. Carter, Kevin y W. Streilein. Probabilistic reasoning for streaming anomaly detection. 2012
IEEE Statistical Signal Processing Workshop (SSP), pp. 377-380, Aug 2012.
Examples
## Generate data
set.seed(100)
n <- 500
x <- sample(1:100, n, replace = TRUE)
x[70:90] <- sample(110:115, 21, replace = TRUE)
x[25] <- 200
x[320] <- 170
df <- data.frame(timestamp=1:n,value=x)
## Calculate anomalies
result <- OcpPewma(
data = df$value,
n.train = 5,
alpha0 = 0.8,
beta = 0.1,
l = 3
)
## Plot results
res <- cbind(df, result)
res <- res[1:500,]
y.limits <- c(-150,250)
plot(x = res$timestamp, y = res$value, type = "l", ylim = y.limits,
xlab = "timestamp", ylab = "value", main = "PEWMA ANOMALY DETECTOR")
points(x = res[res$is.anomaly == 1, "timestamp"],
y = res[res$is.anomaly == 1, "value"], pch=4, col="red", lwd = 2)
par(new=TRUE)
plot(x = res$timestamp, y = res$ucl, type="l", col="green", xaxt="n",
ylim = y.limits, xlab = "", ylab = "")
par(new=TRUE)
plot(x = res$timestamp, y = res$lcl, type="l", col="green", xaxt="n",
ylim = y.limits, xlab = "", ylab = "")

OcpSdEwma

Optimized Classic Processing Shift-Detection based on EWMA (SDEWMA).

26

OcpSdEwma

Description
OcpSdEwma calculates the anomalies of a dataset using an optimized version of classical processing
based on the SD-EWMA algorithm. It is an optimized implementation of the CpSdEwma algorithm
using environment variables. It has been shown that in long datasets it can reduce runtime by up
to 50%. SD-EWMA algorithm is a novel method for covariate shift-detection tests based on a twostage structure for univariate time-series. It works in an online mode and it uses an exponentially
weighted moving average (EWMA) model based control chart to detect the covariate shift-point in
non-stationary time-series.
Usage
OcpSdEwma(train.data, test.data, threshold, l = 3)
Arguments
train.data

Numerical vector with the training set.

test.data

Numerical vector with the test set.

threshold

Error smoothing constant.

l

Control limit multiplier.

Details
train.data and test.data must be numerical vectors without NA values. threshold must be
a numeric value between 0 and 1. It is recommended to use low values such as 0.01 or 0.05. By
default, 0.01 is used. Finally, l is the parameter that determines the control limits. By default, 3 is
used.
Value
dataset conformed by the following columns:
is.anomaly

1 if the value is anomalous 0, otherwise.

ucl

Upper control limit.

lcl

Lower control limit.

References
Raza, H., Prasad, G., & Li, Y. (03 de 2015). EWMA model based shift-detection methods for
detecting covariate shifts in non-stationary environments. Pattern Recognition, 48(3), 659-669.
Examples
## Generate data
set.seed(100)
n <- 500
x <- sample(1:100, n, replace = TRUE)
x[70:90] <- sample(110:115, 21, replace = TRUE)
x[25] <- 200
x[320] <- 170
df <- data.frame(timestamp = 1:n, value = x)
## Calculate anomalies
result <- OcpSdEwma(

OcpTsSdEwma

27

train.data = df[1:5,"value"],
test.data = df[6:n,"value"],
threshold = 0.01,
l = 3

)
res <- cbind(df[6:n,], result)
rownames(res) <- 1:(n-5)

## Plot results
y.limits <- c(-150,250)
plot(x = res$timestamp, y = res$value, type = "l", ylim = y.limits,
xlab = "timestamp", ylab = "value", main = "SD-EWMA ANOMALY DETECTOR")
points(x = res[res$is.anomaly == 1, "timestamp"],
y = res[res$is.anomaly == 1, "value"], pch=4, col="red", lwd = 2)
par(new=TRUE)
plot(x = res$timestamp, y = res$ucl, type="l", col="red", xaxt="n",
ylim = y.limits, xlab = "", ylab = "")
par(new=TRUE)
plot(x = res$timestamp, y = res$lcl, type="l", col="red", xaxt="n",
ylim = y.limits, xlab = "", ylab = "")

OcpTsSdEwma

Optimized Classic Processing Two-Stage Shift-Detection based on
EWMA

Description
OcpTsSdEwma calculates the anomalies of a dataset using an optimized verision of classical processing based on the SD-EWMA algorithm. It is an optimized implementation of the CpTsSdEwma
algorithm using environment variables. It has been shown that in long datasets it can reduce runtime
by up to 50%. This algorithm is a novel method for covariate shift-detection tests based on a twostage structure for univariate time-series. This algorithm works in two phases. In the first phase,
it detects anomalies using the SD-EWMA CpSdEwma algorithm. In the second phase, it checks the
veracity of the anomalies using the Kolmogorov-Simirnov test to reduce false alarms.

Usage
OcpTsSdEwma(train.data, test.data, threshold, l = 3, m = 5)
Arguments
train.data

Numerical vector with the training set.

test.data

Numerical vector with the test set.

threshold

Error smoothing constant.

l

Control limit multiplier.

m

Length of the subsequences for applying the Kolmogorov-Smirnov test.

28

OcpTsSdEwma

Details
train.data and test.data must be numerical vectors without NA values. threshold must be
a numeric value between 0 and 1. It is recommended to use low values such as 0.01 or 0.05. By
default, 0.01 is used. Finally, l is the parameter that determines the control limits. By default, 3 is
used. m is the length of the subsequences for applying the Kolmogorov-Smirnov test. By default, 5
is used. It should be noted that the last m values will not been verified because another m values are
needed to be able to perform the verification.
Value
dataset conformed by the following columns:
is.anomaly

1 if the value is anomalous 0, otherwise.

ucl

Upper control limit.

lcl

Lower control limit.

References
Raza, H., Prasad, G., & Li, Y. (03 de 2015). EWMA model based shift-detection methods for
detecting covariate shifts in non-stationary environments. Pattern Recognition, 48(3), 659-669.
Examples
## Generate data
set.seed(100)
n <- 500
x <- sample(1:100, n, replace = TRUE)
x[70:90] <- sample(110:115, 21, replace = TRUE)
x[25] <- 200
x[320] <- 170
df=data.frame(timestamp=1:n,value=x)
## Calculate anomalies
result <- OcpTsSdEwma(
train.data = df[1:5,"value"],
test.data = df[6:n,"value"],
threshold = 0.01,
l = 3,
m = 20
)
res <- cbind(df[6:n,], result)
rownames(res) <- 1:(n-5)
## Plot results
res <- res[1:500,]
y.limits <- c(-150,250)
plot(x = res$timestamp, y = res$value, type = "l", ylim = y.limits,
xlab = "timestamp", ylab = "value", main = "TSSD-EWMA ANOMALY DETECTOR")
points(x = res[res$is.anomaly == 1, "timestamp"],
y = res[res$is.anomaly == 1, "value"], pch=4, col="red", lwd = 2)
par(new=TRUE)
plot(x = res$timestamp, y = res$ucl, type="l", col="red", xaxt="n",
ylim = y.limits, xlab = "", ylab = "")
par(new=TRUE)
plot(x = res$timestamp, y = res$lcl, type="l", col="red", xaxt="n",

OipPewma

29

ylim = y.limits, xlab = "", ylab = "")

OipPewma

Optimized Incremental Processing Probabilistic-EWMA (PEWMA).

Description
OipPewma is the optimized implementation of the IpPewma function using environmental variables.
It has been shown that in long datasets it can reduce runtime by up to 50%. This function allows
the calculation of anomalies using PEWMA in an incremental processing mode. This algorithm is a
probabilistic method of EWMA which dynamically adjusts the parameterization based on the probability of the given observation. This method produces dynamic, data-driven anomaly thresholds
which are robust to abrupt transient changes, yet quickly adjust to long-term distributional shifts.
Usage
OipPewma(data, alpha0 = 0.2, beta = 0, n.train = 5, l = 3,
last.res = NULL)
Arguments
data

Numerical vector with training and test dataset.

alpha0

Maximal weighting parameter.

beta

Weight placed on the probability of the given observation.

n.train

Number of points of the dataset that correspond to the training set.

l

Control limit multiplier.

last.res

Last result returned by the algorithm.

Details
data must be a numerical vector without NA values. alpha0 must be a numeric value where 0
< alpha0 < 1. If a faster adjustment to the initial shift is desirable, simply lowering alpha0 will
suffice. beta is the weight placed on the probability of the given observation. it must be a numeric
value where 0 <= beta <= 1. Note that beta equals 0, PEWMA converges to a standard EWMA.
Finally l is the parameter that determines the control limits. By default, 3 is used. last.res is the
last result returned by some previous execution of this algorithm. The first time the algorithm is
executed its value is NULL. However, to run a new batch of data without having to include it in the
old dataset and restart the process, the two parameters returned by the last run are only needed.
This algorithm can be used for both classical and incremental processing. It should be noted that in
case of having a finite dataset the CpPewma or OcpPewma algorithms are faster. Incremental processing can be used in two ways. 1) Processing all available data and saving last.res for future runs
in which there is new data. 2) Using the stream library for when there is too much data and it does
not fit into the memory. An example has been made for this use case.

30

OipPewma

Value
A list of the following items.
result

dataset conformed by the following columns.

• is.anomaly 1 if the value is anomalous 0, otherwise.
• ucl Upper control limit.
• lcl Lower control limit.
last.res

Last result returned by the algorithm. Is a dataset containing the parameters
calculated in the last iteration and necessary for the next one.

References
M. Carter, Kevin y W. Streilein. Probabilistic reasoning for streaming anomaly detection. 2012
IEEE Statistical Signal Processing Workshop (SSP), pp. 377-380, Aug 2012.
Examples
## EXAMPLE 1: ---------------------## It can be used in the same way as with OcpPewma passing the whole dataset as
## an argument.
## Generate data
set.seed(100)
n <- 500
x <- sample(1:100, n, replace = TRUE)
x[70:90] <- sample(110:115, 21, replace = TRUE)
x[25] <- 200
x[320] <- 170
df <- data.frame(timestamp=1:n,value=x)
## Calculate anomalies
result <- OipPewma(
data = df$value,
alpha0 = 0.8,
beta = 0,
n.train = 5,
l = 3,
last.res = NULL
)
res <- cbind(df, result$result)
## Plot results
res <- res[6:500,]
y.limits <- c(-150,250)
plot(x = res$timestamp, y = res$value, type = "l", ylim = y.limits,
xlab = "timestamp", ylab = "value", main = "PEWMA ANOMALY DETECTOR")
points(x = res[res$is.anomaly == 1, "timestamp"],
y = res[res$is.anomaly == 1, "value"], pch=4, col="red", lwd = 2)
par(new=TRUE)
plot(x = res$timestamp, y = res$ucl, type="l", col="green", xaxt="n",
ylim = y.limits, xlab = "", ylab = "")
par(new=TRUE)
plot(x = res$timestamp, y = res$lcl, type="l", col="green", xaxt="n",

OipPewma
ylim = y.limits, xlab = "", ylab = "")
## EXAMPLE 2: ---------------------## You can use it in an incremental way. This is an example using the stream
## library. This library allows the simulation of streaming operation.
# install.packages("stream")
library("stream")
## Generate data
set.seed(100)
n <- 500
x <- sample(1:100, n, replace = TRUE)
x[70:90] <- sample(110:115, 21, replace = TRUE)
x[25] <- 200
x[320] <- 170
df=data.frame(timestamp=1:n,value=x)
dsd_df <- DSD_Memory(df)
## Initialize parameters for the loop
last.res <- NULL
res <- NULL
nread <- 100
numIter <- n%/%nread
## Calculate anomalies
for(i in 1:numIter) {
# read new data
newRow <- get_points(dsd_df, n = nread, outofpoints = "ignore")
# calculate if it's an anomaly
last.res <- OipPewma(
data = newRow$value,
n.train = 5,
alpha0 = 0.8,
beta = 0,
l = 3,
last.res = last.res$last.res
)
# prepare the result
if(!is.null(last.res$result)){
res <- rbind(res, cbind(newRow, last.res$result))
}
}
## Plot results
res <- res[6:500,]
y.limits <- c(-150,250)
plot(x = res$timestamp, y = res$value, type = "l", ylim = y.limits,
xlab = "timestamp", ylab = "value", main = "PEWMA ANOMALY DETECTOR")
points(x = res[res$is.anomaly == 1, "timestamp"],
y = res[res$is.anomaly == 1, "value"], pch=4, col="red", lwd = 2)
par(new=TRUE)
plot(x = res$timestamp, y = res$ucl, type="l", col="green", xaxt="n",
ylim = y.limits, xlab = "", ylab = "")
par(new=TRUE)
plot(x = res$timestamp, y = res$lcl, type="l", col="green", xaxt="n",
ylim = y.limits, xlab = "", ylab = "")

31

32

OipSdEwma

OipSdEwma

Optimized Incremental Processing Shift-Detection based on EWMA
(SD-EWMA).

Description
OipSdEwma is the optimized implementation of the IpSdEwma function using environmental variables. This function allows the calculation of anomalies using SD-EWMA alogrithm in an incremental processing mode. It has been shown that in long datasets it can reduce runtime by up to
50%. SD-EWMA algorithm is a novel method for covariate shift-detection tests based on a twostage structure for univariate time-series. It works in an online mode and it uses an exponentially
weighted moving average (EWMA) model based control chart to detect the covariate shift-point in
non-stationary time-series.
Usage
OipSdEwma(data, n.train, threshold, l = 3, last.res = NULL)
Arguments
data

Numerical vector with training and test datasets.

n.train

Number of points of the dataset that correspond to the training set.

threshold

Error smoothing constant.

l

Control limit multiplier.

last.res

Last result returned by the algorithm.

Details
data must be a numerical vector without NA values. threshold must be a numeric value between
0 and 1. It is recommended to use low values such as 0.01 or 0.05. By default, 0.01 is used. l
is the parameter that determines the control limits. By default, 3 is used. Finally last.res is the
last result returned by some previous execution of this algorithm. The first time the algorithm is
executed its value is NULL. However, to run a new batch of data without having to include it in the
old dataset and restart the process, the two parameters returned by the last run are only needed.
This algorithm can be used for both classical and incremental processing. It should be noted that
in case of having a finite dataset the CpSdEwma or OcpSdEwma algorithms are faster. Incremental
processing can be used in two ways. 1) Processing all available data and saving last.res for
future runs in which there is new data. 2) Using the stream library for when there is too much data
and it does not fit into memory. An example has been made for this use case.
Value
A list of the following items.
result

dataset conformed by the following columns.

• is.anomaly 1 if the value is anomalous 0, otherwise.

OipSdEwma

33

• ucl Upper control limit.
• lcl Lower control limit.
last.res

Last result returned by the algorithm. Is a dataset containing the parameters
calculated in the last iteration and necessary for the next one.

References
Raza, H., Prasad, G., & Li, Y. (03 de 2015). EWMA model based shift-detection methods for
detecting covariate shifts in non-stationary environments. Pattern Recognition, 48(3), 659-669.
Examples
## EXAMPLE 1: ---------------------## It can be used in the same way as with OcpSdEwma passing the whole dataset as
## an argument.
## Generate data
set.seed(100)
n <- 500
x <- sample(1:100, n, replace = TRUE)
x[70:90] <- sample(110:115, 21, replace = TRUE)
x[25] <- 200
x[320] <- 170
df=data.frame(timestamp=1:n,value=x)
## Calculate anomalies
result <- OipSdEwma(
data = df$value,
n.train = 5,
threshold = 0.01,
l = 3
)
res <- cbind(df[6:n,], result$result)
rownames(res) <- 1:(n-5)
## Plot results
res <- res[1:500,]
y.limits <- c(-150,250)
plot(x = res$timestamp, y = res$value, type = "l", ylim = y.limits,
xlab = "timestamp", ylab = "value", main = "SD-EWMA ANOMALY DETECTOR")
points(x = res[res$is.anomaly == 1, "timestamp"],
y = res[res$is.anomaly == 1, "value"], pch=4, col="red", lwd = 2)
par(new=TRUE)
plot(x = res$timestamp, y = res$ucl, type="l", col="red", xaxt="n",
ylim = y.limits, xlab = "", ylab = "")
par(new=TRUE)
plot(x = res$timestamp, y = res$lcl, type="l", col="red", xaxt="n",
ylim = y.limits, xlab = "", ylab = "")
## EXAMPLE 2: ---------------------## You can use it in an incremental way. This is an example using the stream
## library. This library allows the simulation of streaming operation.
# install.packages("stream")
library("stream")

34

OipTsSdEwma
## Generate data
set.seed(100)
n <- 500
x <- sample(1:100, n, replace = TRUE)
x[70:90] <- sample(110:115, 21, replace = TRUE)
x[25] <- 200
x[320] <- 170
df=data.frame(timestamp=1:n,value=x)
dsd_df <- DSD_Memory(df)
## Initialize parameters for the loop
last.res <- NULL
res <- NULL
nread <- 100
numIter <- n%/%nread
## Calculate anomalies
for(i in 1:numIter) {
# read new data
newRow <- get_points(dsd_df, n = nread, outofpoints = "ignore")
# calculate if it's an anomaly
last.res <- OipSdEwma(
data = newRow$value,
n.train = 5,
threshold = 0.01,
l = 3,
last.res = last.res$last.res
)
# prepare the result
if(!is.null(last.res$result)){
res <- rbind(res, cbind(newRow[(nread-nrow(last.res$result)+1):nread,],
last.res$result))
}
}
print(res)
# plot
n.train <- 5
rownames(res) <- 1:(n-n.train)
res <- res[1:500,]
y.limits <- c(-150,250)
plot(x = res$timestamp, y = res$value, type = "l",
ylim = y.limits, xlab = "timestamp", ylab = "value",
main = "SD-EWMA ANOMALY DETECTOR")
points(x = res[res$is.anomaly == 1, "timestamp"],
y = res[res$is.anomaly == 1, "value"], pch=4, col="red", lwd = 2)
par(new=TRUE)
plot(x = res$timestamp, y = res$ucl, type="l", col="red", xaxt="n",
ylim = y.limits, xlab = "", ylab = "")
par(new=TRUE)
plot(x = res$timestamp, y = res$lcl, type="l", col="red", xaxt="n",
ylim = y.limits, xlab = "", ylab = "")

OipTsSdEwma
OipTsSdEwma

35

Optimized Incremental Processing Two-Stage Shift-Detection based
on EWMA

Description
OipTsSdEwma is the optimized implementation of the IpTsSdEwma function using environmental
variables. This function allows the calculation of anomalies using TSSD-EWMA in an incremental
processing mode. It has been shown that in long datasets it can reduce runtime by up to 50%. This
algorithm is a novel method for covariate shift-detection tests based on a two-stage structure for
univariate time-series. TSSD-EWMA works in two phases. In the first phase, it detects anomalies using the SD-EWMA CpSdEwma algorithm. In the second phase, it checks the veracity of the
anomalies using the Kolmogorov-Simirnov test to reduce false alarms.
Usage
OipTsSdEwma(data, n.train, threshold, l = 3, m = 5,
to.next.iteration = list(last.res = NULL, to.check = NULL, last.m = NULL))
Arguments
data

Numerical vector with training and test dataset.

n.train

Number of points of the dataset that correspond to the training set.

threshold

Error smoothing constant.

l

Control limit multiplier.

m
Length of the subsequences for applying the Kolmogorov-Smirnov test.
to.next.iteration
list with the necessary parameters to execute in the next iteration
Details
data must be a numerical vector without NA values. threshold must be a numeric value between
0 and 1. It is recommended to use low values such as 0.01 or 0.05. By default, 0.01 is used. Finally,
l is the parameter that determines the control limits. By default, 3 is used. m is the length of the
subsequences for applying the Kolmogorov-Smirnov test. By default, 5 is used. It should be noted
that the last m values have not been verified because you need other m values to be able to perform
the verification. Finally to.next.iteration is the last result returned by some previous execution
of this algorithm. The first time the algorithm is executed its value is NULL. However, to run a
new batch of data without having to include it in the old dataset and restart the process, the two
parameters returned by the last run are only needed.
Value
A list of the following items.
last.data.checked
Anomaly results of the last m results of the previous iteration. dataset conformed
by the following columns.
• is.anomaly 1 if the value is anomalous 0 otherwise.
• ucl Upper control limit.

36

OipTsSdEwma
• lcl Lower control limit.
checked.results

Anomaly results of the dataset excluding the last m values because they could
not be verified. dataset conformed by the following columns: is.anomaly, ucl,
lcl.
to.next.iteration
Last result returned by the algorithm. It is a list containing the following items.
• last.res Last result returned by the aplicaction of SD-EWMA function with the calculations
of the parameters of the last run . These are necessary for the next run.
• to.check Subsequence of the last remaining unchecked values to be checked in the next
iteration. dataset conformed by the following columns: is.anomaly, ucl, lcl, value.
• last.m Subsequence of the m values prior to the to.check subsecuence necessary to verify the
values in to.check.
References
Raza, H., Prasad, G., & Li, Y. (03 de 2015). EWMA model based shift-detection methods for
detecting covariate shifts in non-stationary environments. Pattern Recognition, 48(3), 659-669.
Examples
## EXAMPLE 1: ---------------------## It can be used in the same way as with OcpTsSdEwma passing the whole dataset
## as an argument.
## Generate data
set.seed(100)
n <- 500
x <- sample(1:100, n, replace = TRUE)
x[70:90] <- sample(110:115, 21, replace = TRUE)
x[25] <- 200
x[320] <- 170
df=data.frame(timestamp=1:n,value=x)
## Calculate anomalies
result <- OipTsSdEwma(
data = df$value,
n.train = 5,
threshold = 0.01,
l = 3,
m = 20
)
res <- cbind(df[6:n,], rbind(result$last.data.checked, result$checked.results,
result$to.next.iteration$to.check[, -4]))
rownames(res) <- 1:(n-5)
## Plot results
res <- res[1:500,]
y.limits <- c(-150,250)
plot(x = res$timestamp, y = res$value, type = "l", ylim = y.limits,
xlab = "timestamp", ylab = "value", main = "TSSD-EWMA ANOMALY DETECTOR")
points(x = res[res$is.anomaly == 1, "timestamp"],
y = res[res$is.anomaly == 1, "value"], pch=4, col="red", lwd = 2)
par(new=TRUE)

OipTsSdEwma
plot(x = res$timestamp,
ylim = y.limits, xlab
par(new=TRUE)
plot(x = res$timestamp,
ylim = y.limits, xlab

37
y = res$ucl, type="l", col="red", xaxt="n",
= "", ylab = "")
y = res$lcl, type="l", col="red", xaxt="n",
= "", ylab = "")

## EXAMPLE 2: ---------------------## You can use it in an incremental way. This is an example using the stream
## library. This library allows the simulation of streaming operation.
# install.packages("stream")
library("stream")
## Generate data
set.seed(100)
n <- 500
x <- sample(1:100, n, replace = TRUE)
x[70:90] <- sample(110:115, 21, replace = TRUE)
x[25] <- 200
x[320] <- 170
df=data.frame(timestamp=1:n,value=x)
dsd_df <- DSD_Memory(df)
## Initialize parameters for the loop
last.res <- NULL
res <- NULL
nread <- 100
numIter <- n%/%nread
m <- 20
## Calculate anomalies
for(i in 1:numIter) {
# read new data
newRow <- get_points(dsd_df, n = nread, outofpoints = "ignore")
# calculate if it's an anomaly
last.res <- OipTsSdEwma(
data = newRow$value,
n.train = 5,
threshold = 0.01,
l = 3,
m = m,
to.next.iteration = last.res$to.next.iteration
)
# prepare result
if(!is.null(last.res$last.data.checked)){
res <- rbind(res, cbind(last.timestamp, last.res$last.data.checked))
}
if(!is.null(last.res$checked.results)){
init <- nread - (nrow(last.res$checked.results) +
nrow(last.res$to.next.iteration$to.check)) + 1
end <- init + nrow(last.res$checked.results) - 1
res <- rbind(res, cbind(newRow[init:end,], last.res$checked.results))
}
if(i == numIter){
res <- rbind(res,
cbind(timestamp = newRow[
(nread - nrow(last.res$to.next.iteration$to.check)

38

Step.CAD
+ 1):nread, "timestamp"],
last.res$to.next.iteration$to.check))

}

}
last.timestamp <- newRow[(nread-m+1):nread,]

## Plot results
rownames(res) <- 1:(n-5)
y.limits <- c(-150,250)
plot(x = res$timestamp, y = res$value, type = "l", ylim = y.limits,
xlab = "timestamp", ylab = "value", main = "TSSD-EWMA ANOMALY DETECTOR")
points(x = res[res$is.anomaly == 1, "timestamp"],
y = res[res$is.anomaly == 1, "value"], pch=4, col="red", lwd = 2)
par(new=TRUE)
plot(x = res$timestamp, y = res$ucl, type="l", col="red", xaxt="n",
ylim = y.limits, xlab = "", ylab = "")
par(new=TRUE)
plot(x = res$timestamp, y = res$lcl, type="l", col="red", xaxt="n",
ylim = y.limits, xlab = "", ylab = "")

Step.CAD

Step operator for Contextual Anomaly Detector (ContextualAD)

Description
Step.CAD is an auxiliar method that makes the necessary changes in the local environment of the
anomaly detector
Usage
Step.CAD(datum, local.environment)
Arguments
datum

New datum in form of numeric vector corresponding to the bits that have been
activated.
local.environment
Local environment with the variables of the Contextual Anomaly Detector.
Details
datum must be a numeric vector.
Value
If left.or.right, number of new contex added tho the context.operator. Otherwise, the output
from UpdateContextsAndGetActive.CAD.
References
Smirnov, M. (2018). CAD: Contextual Anomaly Detector. https://github.com/smirmik/CAD

UpdateContextsAndGetActive.CAD

39

UpdateContextsAndGetActive.CAD
Update Contexts and Get Active - Contextual Anomaly Detector

Description
UpdateContextsAndGetActive.CAD is an auxiliar method for the Contextual Anomaly Detector
that reviews the list of previously selected left semi-contexts, creates the list of potentially new contexts resulted from intersection between zero-level contexts, determines the contexts that coincide
with the input data and require activation.
Usage
UpdateContextsAndGetActive.CAD(new.ctxt.flag, context.operator)
Arguments
new.ctxt.flag

flag indicating that a new zero-level context is not recorded at the current step,
which means that all contexts already exist and there is no need to create new
ones.
context.operator
Environment with the current status for the context operator.
Value
List of:
activeContexts List of identifiers of the contexts which completely coincide with the input
stream, should be considered active and be recorded to the input stream of “neurons”
potentialNewContextList
List of contexts based on intersection between the left and the right zero-level
semi-contexts, which are potentially new contexts requiring saving to the context
memory
References
Smirnov, M. (2018). CAD: Contextual Anomaly Detector. https://github.com/smirmik/CAD

Index
ContextCrosser.CAD, 2
ContextualAnomalyDetector, 3
CpKnnCad, 4, 12
CpPewma, 5, 16, 24, 29
CpSdEwma, 7, 8, 18, 21, 26, 27, 32, 35
CpTsSdEwma, 8, 27
GetAnomalyScore.CAD, 10
GetContextByFacts.CAD, 10
IntToBinarySens, 11
IpKnnCad, 12
IpPewma, 15
IpSdEwma, 18
IpTsSdEwma, 21
OcpPewma, 5, 16, 24, 29
OcpSdEwma, 7, 18, 25, 32
OcpTsSdEwma, 8, 27
OipPewma, 15, 29
OipSdEwma, 18, 32
OipTsSdEwma, 21, 34
Step.CAD, 38
UpdateContextsAndGetActive.CAD, 39

40



Source Exif Data:
File Type                       : PDF
File Type Extension             : pdf
MIME Type                       : application/pdf
PDF Version                     : 1.5
Linearized                      : No
Page Count                      : 40
Page Mode                       : UseOutlines
Author                          : 
Title                           : 
Subject                         : 
Creator                         : LaTeX with hyperref package
Producer                        : pdfTeX-1.40.19
Create Date                     : 2018:07:19 17:17:14+02:00
Modify Date                     : 2018:07:19 17:17:14+02:00
Trapped                         : False
PTEX Fullbanner                 : This is pdfTeX, Version 3.14159265-2.6-1.40.19 (TeX Live 2018/Arch Linux) kpathsea version 6.3.0
EXIF Metadata provided by EXIF.tools

Navigation menu