Detects outliers in one numeric column of a data frame using either the IQR method or the robust GESD method. Returns the original data frame augmented with an anomaly indicator and the lower and upper detection thresholds.

Usage

anomalize2(
  data,
  target,
  method = c("iqr", "gesd"),
  alpha = 0.05,
  max_anoms = 0.2,
  verbose = FALSE
)

Arguments

data

A data.frame or tibble. Input dataset containing a numeric column to be analyzed.

target

Unquoted column name. Target column to apply anomaly detection to.

method

A character string. Outlier detection method: either "iqr" or "gesd".

alpha

A numeric (default: 0.05). Significance level used in the detection threshold.

max_anoms

A numeric between 0 and 1 (default: 0.2). Maximum fraction of data points allowed to be anomalous.

verbose

Logical (default: FALSE). If TRUE, returns a list containing the augmented data frame and the full detection output instead of the data frame alone.

Value

If verbose = FALSE, returns the input data frame with the following columns added (<target> stands for the name of the target column):

<target>_l1

Lower bound of anomaly detection range.

<target>_l2

Upper bound of anomaly detection range.

anomaly

Binary indicator: 1 for anomaly, 0 otherwise.

If verbose = TRUE, returns a list with:

anomalized_tbl

Augmented data frame as described above.

anomaly_details

Full output from iqr2() or gesd2().
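
As a rough illustration of the columns described above, the following base R sketch builds the same three columns by hand for the IQR case. It assumes the limits follow the convention of anomalize::iqr(), where the IQR is multiplied by 0.15 / alpha (3 times the IQR at the default alpha); whether iqr2() does exactly this is an assumption. The helper iqr_limits() and the toy data are hypothetical, and the max_anoms cap is not applied here.

```r
# Hypothetical helper (not part of the package): IQR limits using the
# anomalize::iqr() convention, where the IQR multiplier is 0.15 / alpha.
iqr_limits <- function(x, alpha = 0.05) {
  q <- stats::quantile(x, probs = c(0.25, 0.75), na.rm = TRUE, names = FALSE)
  q + (q[2] - q[1]) * (0.15 / alpha) * c(-1, 1)
}

# Toy data, invented for illustration only.
dat <- data.frame(value = c(10, 11, 9, 10, 12, 11, 10, 50, 9, 11))

lim <- iqr_limits(dat$value)

# Mirror the documented output columns for target = value.
dat$value_l1 <- lim[1]   # lower bound of the detection range
dat$value_l2 <- lim[2]   # upper bound of the detection range
dat$anomaly  <- as.integer(dat$value < lim[1] | dat$value > lim[2])

# Rows flagged as anomalous.
dat[dat$anomaly == 1, ]
```

Because the flag is numeric rather than "Yes"/"No", it can be summed or averaged directly, for example sum(dat$anomaly) for the anomaly count.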

Details

This function is adapted from anomalize::anomalize() and internally uses the custom implementations iqr2() and gesd2() for outlier detection.

Unlike the original version, this implementation:

  • Returns binary flags (0 or 1) for anomalies instead of "Yes"/"No" strings.

  • Is implemented without pipe operators for simplicity and compatibility.

The two available methods are:

  • IQR: Anomaly detection based on interquartile range thresholds.

  • GESD: Generalized Extreme Studentized Deviate test with robust statistics (median, MAD).
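
To make the GESD description more concrete, here is a minimal base R sketch of a robust GESD pass. It is not the package's gesd2(): the function name gesd_flags() is hypothetical, and the sketch assumes the usual Rosner critical values with the mean and standard deviation swapped for the median and MAD. It returns only 0/1 flags, does not compute detection limits, and does not handle a MAD of zero.

```r
# Hypothetical helper (not the package's gesd2()): robust GESD that scores
# points with the median and MAD and compares them to Rosner's critical values.
gesd_flags <- function(x, alpha = 0.05, max_anoms = 0.2) {
  n <- length(x)
  r <- floor(n * max_anoms)   # maximum number of candidate outliers
  idx <- seq_len(n)           # positions still in the working sample
  removed <- integer(r)       # positions removed at each step
  R <- numeric(r)             # test statistic at each step
  lambda <- numeric(r)        # critical value at each step

  for (i in seq_len(r)) {
    xi <- x[idx]
    # Robust score: distance from the median in MAD units.
    z <- abs(xi - stats::median(xi)) / stats::mad(xi)
    k <- which.max(z)
    R[i] <- z[k]
    removed[i] <- idx[k]
    idx <- idx[-k]

    # Critical value for step i, based on the t distribution.
    p <- 1 - alpha / (2 * (n - i + 1))
    t_crit <- stats::qt(p, df = n - i - 1)
    lambda[i] <- (n - i) * t_crit /
      sqrt((n - i - 1 + t_crit^2) * (n - i + 1))
  }

  # The number of outliers is the largest step whose statistic exceeds
  # its critical value.
  hits <- which(R > lambda)
  n_out <- if (length(hits) > 0) max(hits) else 0L

  flags <- integer(n)
  flags[removed[seq_len(n_out)]] <- 1L
  flags
}

# Toy data, invented for illustration only.
x <- c(10, 11, 9, 10, 12, 11, 10, 50, 9, 11)
gesd_flags(x)
```

In this sketch max_anoms bounds the number of removal steps, so at most floor(n * max_anoms) points can ever be flagged, and a smaller alpha raises the critical values and makes flagging less likely.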

References

Adapted from anomalize::anomalize(): https://business-science.github.io/anomalize/reference/anomalize.html

See also

anomalize, iqr, gesd