Applies the Generalized Extreme Studentized Deviate (GESD) test for detecting one or more outliers in a univariate numeric vector, using a robust formulation with median and MAD.

Usage

gesd2(x, alpha = 0.05, max_anoms = 0.2, verbose = FALSE)

Arguments

x

A numeric vector.

alpha

A numeric value (default: 0.05). Significance level used when testing for outliers.

max_anoms

A numeric value between 0 and 1 (default: 0.2). Maximum fraction of the values in x that may be flagged as outliers.

verbose

Logical (default: FALSE). If TRUE, returns a detailed outlier report.

Value

If verbose = FALSE, returns an integer vector (same length as x) with 0 for normal values and 1 for outliers.

If verbose = TRUE, returns a list with:

outlier

Binary vector of 0/1 flags.

outlier_idx

Indices of detected outliers.

outlier_vals

Values of detected outliers.

outlier_direction

Direction of anomaly ("Up" or "Down").

critical_limits

Named vector with lower and upper bounds.

outlier_report

A tibble summarizing the detection statistics.
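The relationship between the flag vector and the index and value fields can be illustrated without calling gesd2() at all. The sketch below uses made-up data and a hand-written flag vector; it only shows how such fields relate to each other and is not taken from the function's source.

```r
# Illustrative input and a hypothetical 0/1 flag vector of the same length.
x <- c(10, 11, 9, 10, 12, 10, 11, 9, 10, 50)
flags <- c(0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 1L)

# Positions of the flagged values (analogous to outlier_idx).
outlier_idx <- which(flags == 1L)

# The flagged values themselves (analogous to outlier_vals).
outlier_vals <- x[outlier_idx]

outlier_idx
outlier_vals
```

Because the flag vector has the same length as x, it can be attached to a data frame as an extra column or used directly for logical subsetting, for example x[flags == 0] to keep only the unflagged values.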

Details

This function implements a robust version of the GESD procedure, replacing mean and standard deviation with median and MAD (median absolute deviation), as commonly used in anomaly detection for heavy-tailed or skewed data.
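The effect of swapping mean and standard deviation for median and MAD can be seen on a small example. The sketch below uses only base R and made-up data with one extreme value; the variable names are illustrative and are not part of gesd2().

```r
# Illustrative data: nine ordinary values and one extreme value.
x <- c(10, 11, 9, 10, 12, 10, 11, 9, 10, 50)

# Classical z-scores: the extreme value inflates both the mean and the
# standard deviation, which shrinks its own score.
z_classic <- abs(x - mean(x)) / sd(x)

# Robust z-scores: the median and MAD are barely moved by the extreme value.
# stats::mad() multiplies by 1.4826 by default, so it is comparable to the
# standard deviation for normally distributed data.
z_robust <- abs(x - median(x)) / mad(x)

# Compare the scores of the extreme value (position 10).
round(c(classic = z_classic[10], robust = z_robust[10]), 2)
```

The extreme value gets a much larger score under the robust formulation, which is why a single large outlier is less likely to hide itself or others.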

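At each iteration, the function identifies the most extreme remaining observation (the one with the highest robust z-score), takes that score as the test statistic, removes the observation, and compares the statistic to a critical value recomputed for that iteration from alpha and the sample size. Observations are reported as outliers only if their test statistics exceed the corresponding critical value.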
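The iteration described above can be sketched in base R as follows. This is a minimal illustration under stated assumptions, not the source of gesd2(): the helper name robust_gesd_sketch is hypothetical, the cap on candidates is assumed to be trunc(n * max_anoms), the critical value uses the formula from Rosner (1983), and the stopping rule is the standard GESD one (the number of outliers is the largest step at which the statistic exceeds its critical value).

```r
# Hypothetical helper for illustration only; not the package implementation.
robust_gesd_sketch <- function(x, alpha = 0.05, max_anoms = 0.2) {
  n <- length(x)
  r <- trunc(n * max_anoms)      # assumed cap on the number of candidates
  idx_left <- seq_len(n)         # original positions still under test
  removed <- integer(r)          # positions removed, in removal order
  m <- 0                         # number of outliers declared so far

  for (i in seq_len(r)) {
    x_left <- x[idx_left]
    # Robust z-scores; the tiny constant guards against a MAD of zero.
    z <- abs(x_left - median(x_left)) / (mad(x_left) + .Machine$double.eps)
    k <- which.max(z)            # first maximum if there are ties
    R_i <- z[k]                  # test statistic for step i
    removed[i] <- idx_left[k]
    idx_left <- idx_left[-k]

    # Critical value for step i, based on Student's t distribution.
    p <- 1 - alpha / (2 * (n - i + 1))
    t_crit <- qt(p, df = n - i - 1)
    lambda_i <- (n - i) * t_crit /
      sqrt((n - i - 1 + t_crit^2) * (n - i + 1))

    # Keep the largest step at which the statistic exceeds its critical value.
    if (!is.na(lambda_i) && R_i > lambda_i) m <- i
  }

  flags <- integer(n)
  flags[removed[seq_len(m)]] <- 1L
  flags
}

x <- c(10, 11, 9, 10, 12, 10, 11, 9, 10, 50)
robust_gesd_sketch(x)
```

With this made-up input, at most two candidates are examined (20% of ten values) and only the value 50 at position 10 is flagged. Tracking original positions in idx_left, rather than matching removed values back against x, avoids ambiguity when x contains duplicates.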

This implementation is adapted from anomalize's gesd() method. The return type has been simplified to use binary flags (1 = outlier, 0 = normal), and the function is implemented without using pipe operators for clarity and compatibility.

References

Adapted from anomalize::gesd(): https://business-science.github.io/anomalize/reference/gesd.html

Original method: Rosner, B. (1983). “Percentage points for a generalized ESD many-outlier procedure.” Technometrics.