Detect anomalies in a data frame column using robust statistical methods
Applies outlier detection to a specified column of a data frame using either the IQR-based method or the robust GESD-based method. Returns the original data frame augmented with anomaly indicators and detection thresholds.
Usage
anomalize2(
  data,
  target,
  method = c("iqr", "gesd"),
  alpha = 0.05,
  max_anoms = 0.2,
  verbose = FALSE
)
Arguments
- data
A data.frame or tibble. Input dataset containing a numeric column to be analyzed.
- target
Unquoted column name. Target column to apply anomaly detection to.
- method
A character. Outlier detection method. One of "iqr" or "gesd".
- alpha
A numeric (default: 0.05). Significance level used in the detection threshold.
- max_anoms
A numeric between 0 and 1 (default: 0.2). Maximum fraction of data points allowed to be anomalous.
- verbose
Logical (default: FALSE). If TRUE, returns a list including detailed results.
Value
If verbose = FALSE, returns the input data frame with:
- <target>_l1
Lower bound of the anomaly detection range.
- <target>_l2
Upper bound of the anomaly detection range.
- anomaly
Binary indicator: 1 for anomaly, 0 otherwise.
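The shape of this output can be illustrated with a minimal base R sketch. It is not the package's code: flag_iqr() is a hypothetical helper that takes the column name as a string (anomalize2() itself takes an unquoted name), and the multiplier 0.15 / alpha is the convention used by iqr() in the original anomalize package, which iqr2() is only assumed to follow. The max_anoms cap is omitted, and returning the flag as an integer is also an assumption.

```r
# Hypothetical helper for illustration; not the package's anomalize2() or iqr2().
flag_iqr <- function(data, target, alpha = 0.05) {
  x <- data[[target]]
  q <- stats::quantile(x, probs = c(0.25, 0.75), na.rm = TRUE, names = FALSE)
  k <- 0.15 / alpha                      # assumed multiplier: 3 at alpha = 0.05
  lower <- q[1] - k * (q[2] - q[1])      # Q1 - k * IQR
  upper <- q[2] + k * (q[2] - q[1])      # Q3 + k * IQR
  data[[paste0(target, "_l1")]] <- lower # lower bound, repeated on every row
  data[[paste0(target, "_l2")]] <- upper # upper bound, repeated on every row
  data$anomaly <- as.integer(x < lower | x > upper)  # 1 = anomaly, 0 = normal
  data
}

df <- data.frame(value = c(10, 11, 9, 10, 12, 10, 11, 9, 10, 50))
out <- flag_iqr(df, "value")
names(out)
```

Here the quartiles are 10 and 11, so the bounds are roughly 7 and 14 and only the last observation (50) is flagged. With the package loaded, the corresponding call would presumably be anomalize2(df, value, method = "iqr").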
If verbose = TRUE, returns a list with:
Details
This function is adapted from anomalize() in the anomalize package and internally uses
the custom implementations iqr2() and gesd2() for robust outlier detection.
Unlike the original version, this implementation:
- Returns binary flags (0 or 1) for anomalies instead of "Yes"/"No" strings.
- Is implemented without pipe operators for simplicity and compatibility.
The two available methods are:
- IQR: Anomaly detection based on interquartile range thresholds.
- GESD: Generalized Extreme Studentized Deviate test with robust statistics (median, MAD).
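The GESD procedure is less familiar than the IQR rule, so a rough base R sketch may help. It assumes the standard GESD critical values with the mean and standard deviation replaced by the median and MAD, as in the original anomalize package. gesd_flags() is a hypothetical helper, not the package's gesd2(), and it returns only the 0/1 flags, not the detection bounds.

```r
# Hypothetical helper for illustration; not the package's gesd2().
gesd_flags <- function(x, alpha = 0.05, max_anoms = 0.2) {
  n <- length(x)
  r <- floor(n * max_anoms)     # maximum number of outliers to test for
  idx <- seq_len(n)             # original positions of the points still in play
  removed <- integer(r)         # positions removed at each step
  n_out <- 0L                   # number of outliers finally declared

  for (i in seq_len(r)) {
    # Robust z-scores of the remaining points (median and MAD)
    centre <- stats::median(x[idx])
    spread <- stats::mad(x[idx]) + .Machine$double.eps
    z <- abs(x[idx] - centre) / spread
    j <- which.max(z)           # most extreme remaining point

    # GESD critical value for step i
    p <- 1 - alpha / (2 * (n - i + 1))
    t_crit <- stats::qt(p, df = n - i - 1)
    lambda <- (n - i) * t_crit / sqrt((n - i - 1 + t_crit^2) * (n - i + 1))

    removed[i] <- idx[j]
    if (z[j] > lambda) n_out <- i  # keep the largest i with R_i > lambda_i
    idx <- idx[-j]
  }

  flags <- integer(n)
  flags[removed[seq_len(n_out)]] <- 1L
  flags
}

x <- c(10, 11, 9, 10, 12, 10, 11, 9, 10, 50)
flags <- gesd_flags(x)
```

With n = 10 and max_anoms = 0.2, at most two points are tested. The first (50) far exceeds its critical value and is flagged; the second candidate (12) does not, so it stays unflagged.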
References
Adapted from anomalize::anomalize():
https://business-science.github.io/anomalize/reference/anomalize.html