Anomaly-based detection

How it works

An anomaly-detection engine observes a stream of telemetry over time and constructs a baseline of typical behavior for the entities it sees. The baseline can be statistical (a distribution of login times, packet sizes, or process counts), machine-learned (clustering, Isolation Forests, autoencoders), or a hybrid of the two. Once the baseline is stable, the engine flags any observation that falls far enough outside the baseline to be considered an outlier.
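As a minimal sketch of the statistical case, the baseline below is a median plus a median absolute deviation (MAD), and an observation is flagged when its modified z-score exceeds a cutoff. The function names, the sample data, and the 3.5 cutoff are illustrative assumptions, not the method of any particular product.

```python
from statistics import median

def build_baseline(history):
    """Summarize past observations as (median, median absolute deviation)."""
    center = median(history)
    mad = median(abs(x - center) for x in history)
    return center, mad

def is_outlier(value, baseline, threshold=3.5):
    """Flag a value whose modified z-score exceeds the threshold (assumed cutoff)."""
    center, mad = baseline
    if mad == 0:
        # A perfectly flat history: any different value is a deviation.
        return value != center
    score = 0.6745 * abs(value - center) / mad
    return score > threshold

# Hypothetical daily outbound volume in MB for one server.
daily_mb = [4, 6, 5, 7, 6, 8, 5, 6, 7, 6]
baseline = build_baseline(daily_mb)

print(is_outlier(7, baseline))     # an ordinary day
print(is_outlier(2100, baseline))  # a day far outside the baseline
```

Median and MAD are used here instead of mean and standard deviation because a single extreme day in the history distorts them far less.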

The defining property of the family is that it does not need a prior example of the threat. Signature detection needs to have seen the malware before. Anomaly detection only needs to know what your environment usually does. That makes anomaly detection the strongest family against truly novel attacks, and the noisiest family during periods of organizational change.

What anomaly detection catches

🕒 Unusual authentication times

An administrative login at 03:00 from an account that historically logs in between 09:00 and 17:00. The deviation is from this user's historical pattern, not a global rule.
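One way to picture a per-user baseline is a count of past logins by hour of day, with a login flagged when its hour accounts for almost none of that user's history. The sketch below assumes hypothetical function names, a made-up history, and an arbitrary 1% cutoff.

```python
from collections import Counter

def hour_profile(login_hours):
    """Count how often this user has logged in during each hour (0-23)."""
    return Counter(login_hours)

def is_unusual_hour(hour, profile, min_share=0.01):
    """Flag an hour that makes up less than min_share of the user's history."""
    total = sum(profile.values())
    if total == 0:
        return False  # no history yet, so there is nothing to deviate from
    return profile[hour] / total < min_share

# Hypothetical history: 30 working days of logins between 09:00 and 17:00.
history = [hour for _ in range(30) for hour in range(9, 18)]
profile = hour_profile(history)

print(is_unusual_hour(10, profile))  # inside the usual working hours
print(is_unusual_hour(3, profile))   # 03:00 has never been observed
```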

📤 Atypical data transfer volumes

A file server moves 2 GB outbound when its baseline is single-digit megabytes per day. The traffic itself is over an allowed channel; the volume is the deviation.

⚙️ Abnormal process behavior

powershell.exe launching python.exe on an endpoint where that combination has never been observed. Both binaries are legitimate; the relationship is what is new.
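The "relationship is new" idea can be sketched as a per-endpoint set of parent/child process pairs seen during the baseline period. The class name, method names, and the host name ws-042 are hypothetical; a real engine would also account for paths, signers, and command lines.

```python
from collections import defaultdict

class ProcessPairBaseline:
    """Tracks which (parent, child) process pairs each endpoint has shown."""

    def __init__(self):
        self._seen = defaultdict(set)

    def learn(self, host, parent, child):
        # Normalize case because Windows image names are case-insensitive.
        self._seen[host].add((parent.lower(), child.lower()))

    def is_new(self, host, parent, child):
        # Use .get so that querying an unknown host does not create an entry.
        pairs = self._seen.get(host, set())
        return (parent.lower(), child.lower()) not in pairs

baseline = ProcessPairBaseline()
baseline.learn("ws-042", "explorer.exe", "powershell.exe")
baseline.learn("ws-042", "explorer.exe", "python.exe")

# Both binaries are known on the host, but this pairing is not.
print(baseline.is_new("ws-042", "powershell.exe", "python.exe"))
```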

🚪 Authentication pattern shifts

Rapid lateral authentication attempts across multiple endpoints within seconds. Each attempt is legitimate-looking; the rate and breadth are anomalous.
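Rate and breadth can be approximated with a sliding time window that counts distinct target hosts for one account. In the sketch below the 10-second window and the limit of 5 hosts are fixed, illustrative values; an anomaly engine would instead derive them from the account's own history. The function name and host names are hypothetical.

```python
from collections import deque

def lateral_burst(events, window_seconds=10, max_hosts=5):
    """Return True if one account touches more than max_hosts distinct hosts
    within any window_seconds span.

    events: (timestamp_seconds, target_host) tuples, sorted by timestamp.
    """
    window = deque()
    for ts, host in events:
        window.append((ts, host))
        # Drop events that have aged out of the window.
        while ts - window[0][0] > window_seconds:
            window.popleft()
        if len({h for _, h in window}) > max_hosts:
            return True
    return False

# Eight different endpoints in eight seconds versus one endpoint per minute.
fast = [(i, f"ws-{i:03d}") for i in range(8)]
slow = [(i * 60, f"ws-{i:03d}") for i in range(8)]

print(lateral_burst(fast))
print(lateral_burst(slow))
```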

Representative platforms: Microsoft Defender XDR (UEBA), Darktrace, Exabeam Advanced Analytics, Elastic Security ML, Securonix.


Example: a 2 AM file transfer

Walk through a typical anomaly alert

A UEBA platform produces the following alert:

Alert: Data transfer volume deviation
Severity: Medium
Confidence: 64%
Entity: srv-fileshare-01
Observation: 2.1 GB outbound to ext-archive[.]example over HTTPS
Baseline: median 6 MB/day, 99th percentile 80 MB/day
Window: 02:14 - 02:31 UTC

The analyst's first question is which family fired. The "Confidence: 64%" and the baseline statistics make it obvious this is an anomaly engine. The engine is reporting a real deviation, not a verdict on whether the deviation is malicious.
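To put a size on the deviation in this alert, the observed volume can be compared with the two baseline figures it reports. The short calculation below assumes decimal units (2.1 GB taken as 2100 MB); the function name is hypothetical.

```python
def deviation_ratio(observed_mb, reference_mb):
    """How many times larger the observation is than a baseline figure."""
    return observed_mb / reference_mb

observed_mb = 2100  # 2.1 GB from the alert, assuming 1 GB = 1000 MB
median_mb = 6       # baseline median from the alert
p99_mb = 80         # baseline 99th percentile from the alert

print(deviation_ratio(observed_mb, median_mb))  # 350.0
print(deviation_ratio(observed_mb, p99_mb))     # 26.25
```

The transfer is about 350 times the median day and about 26 times the 99th-percentile day. That confirms the deviation is large, but it still says nothing about intent.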

What the analyst checks next:

  • Is the destination legitimate? Look up ext-archive[.]example in DNS history, threat intel, and asset inventory. A long-standing partner is one story; a recently registered domain is another.
  • Is there a corresponding business reason? Check change management, scheduled backup jobs, and ITSM tickets for the 02:14 window. A documented backup window neutralizes the alert.
  • Is the process source consistent? The transfer must have been initiated by a process on the file server. Correlating process telemetry from that window will identify whether a known backup agent or an unexpected process started the transfer.
  • Are there sibling anomalies? Did the same server also show unusual authentication or process activity in the same window? Multiple anomalies converging on one entity is a stronger signal than any single one.
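The sibling-anomaly check in the last bullet can be sketched as grouping alerts by entity and looking for different anomaly types close together in time. The alert tuple layout, the type labels, the 30-minute window, and the host ws-114 are all assumptions made for illustration.

```python
from collections import defaultdict

def converging_entities(alerts, window_minutes=30, min_kinds=2):
    """Return entities with at least min_kinds distinct anomaly types
    inside any window_minutes span.

    alerts: (entity, anomaly_type, minute_of_day) tuples.
    """
    by_entity = defaultdict(list)
    for entity, kind, minute in alerts:
        by_entity[entity].append((minute, kind))

    flagged = set()
    for entity, items in by_entity.items():
        items.sort()
        for start, _ in items:
            # Collect the distinct anomaly types in the window opening at `start`.
            kinds = {k for m, k in items if start <= m <= start + window_minutes}
            if len(kinds) >= min_kinds:
                flagged.add(entity)
                break
    return flagged

alerts = [
    ("srv-fileshare-01", "transfer_volume", 134),  # 02:14
    ("srv-fileshare-01", "new_process", 140),      # 02:20
    ("ws-114", "login_time", 600),                 # unrelated, 10:00
]
print(converging_entities(alerts))
```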

Anomaly detection is rarely decisive on its own. The job is to gather the corroborating context that either explains the deviation away or escalates it.


Strengths and limitations

Strengths

  • Catches zero-day attacks and novel techniques the signature world has no name for yet.
  • Adapts to environment-specific patterns. Every organization's 'normal' is different, and anomaly engines learn that on the ground.
  • Identifies subtle or gradual changes that single-event detection would miss.
  • Resilient to evasion tactics that target static signatures or static rules.

Limitations

  • High false-positive rate during baseline-learning periods, environment shifts, mergers, migrations, or new tool rollouts.
  • Computationally expensive. Sustained ML inference on streaming telemetry has real cost.
  • Requires ongoing tuning. A baseline that has not been refreshed in six months starts to lie about what normal is.
  • Decision logic can be opaque. ML-driven alerts can be hard to explain to stakeholders or auditors.

Operational considerations

  • Allow a meaningful baseline period before trusting alerts. Several weeks of stable telemetry is a reasonable starting point. Less than that and the baseline is noise.
  • Monitor for baseline drift during operational change. A merger, a new SaaS rollout, or a major upgrade will all invalidate prior baselines temporarily.
  • Tune sensitivity continuously. Anomaly detection is never "set and forget." Treat the engine like a living asset that needs review.
  • Pair with corroborating data sources. Anomaly alerts are most useful when they can be joined to identity, process, and network telemetry inside a single investigation surface.
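Two of the points above, waiting out a learning period and keeping the baseline fresh, can be sketched with a fixed-length rolling window. The class name and the 30-day and 21-day defaults are illustrative assumptions, not recommendations from any vendor.

```python
from collections import deque
from statistics import median

class RollingBaseline:
    """Keeps only the most recent max_days values and reports readiness."""

    def __init__(self, max_days=30, min_days=21):
        # deque(maxlen=...) silently discards the oldest value once full,
        # so stale observations age out of the baseline on their own.
        self._values = deque(maxlen=max_days)
        self._min_days = min_days

    def add(self, value):
        self._values.append(value)

    def ready(self):
        """True once enough history exists to start trusting alerts."""
        return len(self._values) >= self._min_days

    def center(self):
        return median(self._values) if self._values else None

rb = RollingBaseline(max_days=5, min_days=3)
for mb in (6, 5):
    rb.add(mb)
print(rb.ready())   # still in the learning period

for mb in (7, 40, 42, 41, 43, 40):
    rb.add(mb)      # a new workload shifts what normal looks like
print(rb.ready(), rb.center())
```

A rolling window follows legitimate change, but it will also absorb slow malicious change, which is one reason periodic human review is still needed.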

Next up

Rule-based detection

The family that encodes institutional knowledge: if-then logic combining multiple indicators into scenarios worth flagging.