Identify outliers using a robust IQR method

Detects anomalous observations in a numeric vector using an IQR-based rule with enhanced robustness and optional verbose reporting. Each cutoff lies $(0.15/\alpha)$ interquartile ranges beyond its quartile, so the default alpha = 0.05 gives a multiplier of 3.

Usage

iqr2(x, alpha = 0.05, max_anoms = 0.2, verbose = FALSE)

Arguments

x: A numeric vector.
alpha: A numeric value (default: 0.05). Controls the sensitivity of outlier detection through the multiplier 0.15/alpha. Smaller values widen the limits, so fewer points are flagged.
max_anoms: A numeric value between 0 and 1 (default: 0.2). Maximum fraction of data points that may be flagged as outliers.
verbose: Logical (default: FALSE). If TRUE, returns a detailed outlier report; otherwise returns a binary vector.
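
As a rough illustration of how alpha sets the width of the limits, the snippet below tabulates the multiplier 0.15/alpha for a few values. It does not call iqr2(); the variable names are chosen for this example only. Note that alpha = 0.1 corresponds to the conventional 1.5 × IQR rule, while the default alpha = 0.05 doubles that width.

```r
# Multiplier applied to the IQR for a few alpha values (illustrative only).
alphas <- c(0.10, 0.05, 0.025)
multipliers <- 0.15 / alphas

# Smaller alpha -> larger multiplier -> wider limits -> fewer flagged points.
print(data.frame(alpha = alphas, multiplier = multipliers))
```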

Value

If verbose = FALSE, returns an integer vector of 0s and 1s (same length as x), where 1 indicates an outlier.

If verbose = TRUE, returns a list with the following elements:

outlier: Binary vector of outlier flags (1 = outlier).
outlier_idx: Indices of detected outliers.
outlier_vals: Values of detected outliers.
outlier_direction: Direction of each detected outlier: "Up" for values above the upper limit, "Down" for values below the lower limit.
critical_limits: Named vector with lower and upper bounds used for outlier detection.
outlier_report: Tibble containing values, limits, and direction annotations.
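
To make the shape of the verbose result easier to picture, here is a hand-built mock-up of a list with the same element names. It is a sketch only: it does not call iqr2(), the limit values are made up for illustration, the max_anoms cap is ignored, and the outlier_report tibble is omitted to keep the example free of package dependencies.

```r
# Mock-up of the verbose output structure (assumed layout, not iqr2() itself).
x <- c(10, 11, 50, -30, 12)
critical_limits <- c(lower = 4.75, upper = 17)  # illustrative values

# Flag points outside the limits as 1, everything else as 0.
outlier <- as.integer(x < critical_limits[["lower"]] |
                      x > critical_limits[["upper"]])
outlier_idx <- which(outlier == 1L)
outlier_vals <- x[outlier_idx]

# "Up" for values above the upper limit, "Down" for values below the lower one.
outlier_direction <- ifelse(outlier_vals > critical_limits[["upper"]],
                            "Up", "Down")

mock_result <- list(
  outlier = outlier,
  outlier_idx = outlier_idx,
  outlier_vals = outlier_vals,
  outlier_direction = outlier_direction,
  critical_limits = critical_limits
)
str(mock_result)
```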

Details

This function follows the IQR-based approach used in anomalize, but returns binary flags (1 = outlier, 0 = normal) instead of string labels ("Yes", "No"). It also removes the dependency on pipe operators and expresses the logic with explicit data manipulation functions, for clarity and standalone usage.

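The detection threshold is defined as: $$[Q1 - (0.15 / \alpha) \cdot IQR, \; Q3 + (0.15 / \alpha) \cdot IQR]$$ where Q1 and Q3 are the first and third quartiles of x and IQR = Q3 - Q1. Points outside this interval are outlier candidates. At most max_anoms × length(x) of them are flagged; when there are more candidates than that, those with the largest magnitude of deviation are kept.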
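
The rule above can be sketched in base R as follows. This is a minimal illustration, not the package's implementation: the helper name iqr_flags_sketch is hypothetical, quartiles come from stats::quantile() with its default settings, deviation is measured as distance beyond the nearest limit, and the cap is taken as floor(max_anoms * length(x)). Those last three choices are assumptions made for the example, and the input is assumed to contain no missing values.

```r
# Hypothetical helper illustrating the thresholding rule; not iqr2() itself.
iqr_flags_sketch <- function(x, alpha = 0.05, max_anoms = 0.2) {
  # Quartiles and interquartile range.
  q <- stats::quantile(x, probs = c(0.25, 0.75), names = FALSE)
  iqr_x <- q[2] - q[1]

  # Limits: Q1 - (0.15 / alpha) * IQR and Q3 + (0.15 / alpha) * IQR.
  limits <- c(lower = q[1] - (0.15 / alpha) * iqr_x,
              upper = q[2] + (0.15 / alpha) * iqr_x)

  # Distance beyond the nearest limit; 0 for points inside the interval.
  excess <- pmax(limits[["lower"]] - x, x - limits[["upper"]], 0)

  # Rank candidates by excess (largest first) and keep at most max_n of them.
  max_n <- floor(max_anoms * length(x))  # assumed rounding rule
  candidates <- which(excess > 0)
  ranked <- candidates[order(excess[candidates], decreasing = TRUE)]
  keep <- ranked[seq_len(min(max_n, length(ranked)))]

  flags <- integer(length(x))
  flags[keep] <- 1L
  list(outlier = flags, critical_limits = limits)
}

x <- c(10, 11, 12, 11, 10, 12, 11, 50, 10, -30)
res <- iqr_flags_sketch(x)
res$critical_limits  # lower and upper limits for this sample
res$outlier          # 1 marks the flagged positions
```

Lowering max_anoms in this sketch, for example iqr_flags_sketch(x, max_anoms = 0.1), leaves the limits unchanged and only reduces how many of the candidates are reported.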

References

This implementation is adapted from the anomalize::iqr() function: https://business-science.github.io/anomalize/index.html