Applies DBSCAN clustering to anomalous points based on time and observed value.

Usage

run_dbscan(
  anom.df,
  time_col = "GPS",
  val_col = "observed",
  eps = 0.01,
  minPts = 1,
  cluster.col = "cluster",
  ...
)

Arguments

anom.df

A data.frame or tibble containing anomaly detection results. It must include at least a time column, an observed-value column, and a binary flag column named anomaly (1 marks an anomalous row).

time_col

A character string. Name of the time column (default: "GPS").

val_col

A character string. Name of the observed value column (default: "observed").

eps

A numeric. Maximum neighborhood radius for DBSCAN (default: 0.01).

minPts

An integer. Minimum number of points to form a cluster (default: 1).

cluster.col

A character string. Column name for cluster ID assignment (default: "cluster").

...

Additional arguments passed to dbscan::dbscan().
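Because eps is a single radius applied to the 2D (time, value) space, its useful range depends on the units of both columns. The sketch below illustrates this with base R only, assuming the default Euclidean distance used by dbscan::dbscan() on a numeric matrix. The two points are made up for illustration.

```r
# Two hypothetical anomalous points, given as (time, value) pairs
pts <- rbind(
  a = c(GPS = 0.10, observed = 0.50),
  b = c(GPS = 0.11, observed = 0.52)
)

# Euclidean distance between them: sqrt(0.01^2 + 0.02^2)
d <- as.numeric(dist(pts))

# Compare the distance against two candidate radii
within_default <- d <= 0.01  # FALSE: too far apart at the default eps
within_larger  <- d <= 0.03  # TRUE: neighbours at a larger eps
```

If the time and value columns are on very different scales, one of them will dominate the distance, so it may be worth rescaling the columns before choosing eps.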

Value

A copy of anom.df with an added column, named by cluster.col, that holds the cluster ID of each row. Anomalous points that DBSCAN does not assign to any cluster (noise points) receive 0. Non-anomalous points receive NA.

Details

Only rows where anomaly == 1L are included in the clustering. Clustering is performed on a 2D space defined by the specified time and value columns.
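The behaviour described above can be approximated as follows. This is a minimal sketch, not the package's source: sketch_run_dbscan() is a hypothetical name, and treating NA flags as non-anomalous is an assumption. It relies only on dbscan::dbscan(), which returns an object whose cluster element labels noise points 0.

```r
# Hypothetical re-implementation for illustration; not the package's code.
sketch_run_dbscan <- function(anom.df,
                              time_col = "GPS",
                              val_col = "observed",
                              eps = 0.01,
                              minPts = 1,
                              cluster.col = "cluster",
                              ...) {
  # Rows flagged as anomalies (assumption: NA flags count as non-anomalous)
  is_anom <- !is.na(anom.df$anomaly) & anom.df$anomaly == 1L

  # Non-anomalous rows keep NA in the cluster column
  anom.df[[cluster.col]] <- NA_integer_

  if (any(is_anom)) {
    # 2D matrix of (time, value) for the anomalous rows only
    pts <- as.matrix(anom.df[is_anom, c(time_col, val_col)])
    fit <- dbscan::dbscan(pts, eps = eps, minPts = minPts, ...)
    # dbscan labels noise points 0 and clusters 1, 2, ...
    anom.df[[cluster.col]][is_anom] <- fit$cluster
  }
  anom.df
}

set.seed(1)  # reproducible random anomaly flags
gps <- seq(0, 1, length.out = 100)
df <- data.frame(
  GPS = gps,
  observed = sin(2 * pi * gps * 5),
  anomaly = sample(0:1, 100, replace = TRUE)
)
out <- sketch_run_dbscan(df, eps = 0.02)
table(out$cluster, useNA = "ifany")
```

In dbscan::dbscan(), minPts counts the point itself, so with minPts = 1 every anomalous point qualifies as a core point and none should be labelled 0. A larger minPts is needed before isolated anomalies are reported as noise.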

Examples

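if (FALSE) { # \dontrun{
set.seed(42)  # make the random anomaly flags reproducible
df <- tibble::tibble(
    GPS = seq(0, 1, length.out = 100),
    observed = sin(2 * pi * GPS * 5),
    anomaly = sample(0:1, 100, replace = TRUE)
)
# Cluster the flagged rows; IDs are written to the default "cluster" column
clustered <- run_dbscan(df, eps = 0.02)
} # }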