Schema normalization

Why alert fatigue is a schema problem

Analysts face a flood of alerts from multiple systems, each using distinct field names, timestamp conventions, and nested structures. Without normalization:

🔊

Signal buried in noise

Critical indicators get lost in irrelevant fields and inconsistent naming.

⏱️

Investigations slow down

Analysts spend cycles reconciling field formats instead of investigating.

📈

MTTR inflates

Mean time to respond grows roughly in step with alert volume.

A malicious PowerShell execution detected by an EDR generates an alert with proprietary fields. The related network connection logged by a firewall or cloud proxy appears in a completely different format. Linking them manually requires effort that scales with volume, and volume keeps going up.


The schemas that matter

Three open standards are worth knowing. Implementations differ; the goal is the same.

Rising standard

OCSF

Open Cybersecurity Schema Framework. Vendor-neutral, cloud-friendly, increasingly the default for cross-tool integration. If your platform supports it, prioritize it.

Mature

STIX 2.1

Structured Threat Information Expression. Designed for sharing threat intelligence. Strong for indicators and TTPs. Less ergonomic for raw telemetry.

Action-oriented

OpenC2

Open Command and Control. Designed for expressing response actions in a portable way. Pairs well with STIX for indicators and OCSF for events.
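To make the STIX entry concrete, here is a minimal sketch of a file-hash indicator built as a plain Python dictionary. The property names reflect the STIX 2.1 Indicator object as commonly documented; the helper names (`stix_timestamp`, `sha256_indicator`) and the sample values are hypothetical. Treat it as an illustration and validate real objects against the specification or a dedicated library before sharing them.

```python
import json
import uuid
from datetime import datetime, timezone


def stix_timestamp(dt):
    """Format a timezone-aware datetime as UTC with millisecond precision."""
    utc = dt.astimezone(timezone.utc)
    return utc.strftime("%Y-%m-%dT%H:%M:%S.") + f"{utc.microsecond // 1000:03d}Z"


def sha256_indicator(sha256_hex, name, now=None):
    """Build a dict shaped like a STIX 2.1 indicator for one SHA-256 hash.

    Hypothetical helper for illustration; it performs no schema validation.
    """
    ts = stix_timestamp(now or datetime.now(timezone.utc))
    return {
        "type": "indicator",
        "spec_version": "2.1",
        "id": f"indicator--{uuid.uuid4()}",
        "created": ts,
        "modified": ts,
        "name": name,
        "pattern": f"[file:hashes.'SHA-256' = '{sha256_hex}']",
        "pattern_type": "stix",
        "valid_from": ts,
    }


# Sample values are invented for the example.
fixed_time = datetime(2024, 5, 1, 12, 0, 0, tzinfo=timezone.utc)
indicator = sha256_indicator("a" * 64, "Suspicious PowerShell payload", now=fixed_time)
print(json.dumps(indicator, indent=2))
```

The pattern string is where the observable lives, which is why STIX suits indicators better than high-volume raw telemetry.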


Core normalization principles

Whatever schema you adopt, normalization maps diverse vendor-specific fields into four buckets:

01

Detection metadata

Alert name, detection logic, severity, confidence. Timestamps standardized to UTC in ISO 8601.

02

Affected entities

Hostnames, IPs, user accounts, cloud assets, containers. Anything the alert references.

03

Primary observables

File hashes, domains, URLs, registry keys, and other IoCs (indicators of compromise). The artifacts to correlate against.

04

Contextual data

MITRE ATT&CK mappings, enrichment tags, kill-chain stage, threat-intelligence annotations.
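The four buckets can be sketched as a simple field map. Everything vendor-specific below is an assumption made for illustration: the raw field names (`RuleName`, `Sev`, `DetectTimeEpoch`, and so on), the normalized names, and the bucket keys are hypothetical and do not come from any real product or from OCSF.

```python
from datetime import datetime, timezone

# Hypothetical EDR payload; field names are invented for this example.
RAW_EDR_ALERT = {
    "RuleName": "Suspicious PowerShell",
    "Sev": "High",
    "DetectTimeEpoch": 1700000000,
    "DeviceName": "WS-0142",
    "AccountName": "jdoe",
    "FileSha256": "a" * 64,
    "AttackTechnique": "T1059.001",
}

# vendor field -> (bucket, normalized field name)
EDR_FIELD_MAP = {
    "RuleName": ("detection", "alert_name"),
    "Sev": ("detection", "severity"),
    "DeviceName": ("entities", "hostname"),
    "AccountName": ("entities", "user"),
    "FileSha256": ("observables", "file_sha256"),
    "AttackTechnique": ("context", "attack_technique"),
}


def to_utc_iso8601(epoch_seconds):
    """Convert Unix epoch seconds to an ISO 8601 string in UTC."""
    return datetime.fromtimestamp(epoch_seconds, tz=timezone.utc).isoformat()


def normalize_edr_alert(raw):
    """Sort known vendor fields into the four normalization buckets."""
    normalized = {"detection": {}, "entities": {}, "observables": {}, "context": {}}
    for vendor_field, (bucket, name) in EDR_FIELD_MAP.items():
        if vendor_field in raw:
            normalized[bucket][name] = raw[vendor_field]
    # Timestamps need conversion, not just renaming.
    if "DetectTimeEpoch" in raw:
        normalized["detection"]["timestamp"] = to_utc_iso8601(raw["DetectTimeEpoch"])
    # Harmonize severity casing so "High" and "high" compare equal downstream.
    if "severity" in normalized["detection"]:
        normalized["detection"]["severity"] = normalized["detection"]["severity"].lower()
    return normalized


event = normalize_edr_alert(RAW_EDR_ALERT)
print(event["detection"]["timestamp"])  # 2023-11-14T22:13:20+00:00
```

Keeping the map as data rather than code means a new source usually needs a new dictionary, not a new function.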

Normalization also requires flattening nested structures where useful and harmonizing identifiers so automated cross-tool correlation works.
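A minimal sketch of both steps, flattening and identifier harmonization, follows. The record shapes and the hostname convention (lowercase short name, domain suffix dropped) are assumptions chosen for the example; your environment may need a different canonical form.

```python
def flatten(record, parent_key="", sep="."):
    """Collapse nested dicts into a single level with dotted keys."""
    flat = {}
    for key, value in record.items():
        full_key = f"{parent_key}{sep}{key}" if parent_key else key
        if isinstance(value, dict):
            flat.update(flatten(value, full_key, sep))
        else:
            flat[full_key] = value
    return flat


def canonical_host(name):
    """Assumed convention: lowercase short hostname without domain suffix.

    Apply this to hostnames only; it would mangle an IP address.
    """
    return name.strip().lower().split(".")[0]


# Hypothetical records from two tools describing the same machine.
edr = {"device": {"name": "WS-0142.corp.example.com", "os": {"family": "Windows"}}}
fw = {"src": {"host": "ws-0142"}}

flat_edr = flatten(edr)
flat_fw = flatten(fw)

same_host = canonical_host(flat_edr["device.name"]) == canonical_host(flat_fw["src.host"])
print(flat_edr)
print(same_host)  # True
```

Without the canonical form, the two records would never join, even though they describe the same workstation.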


Operational benefits

✓ Holistic visibility. Events from endpoint, network, and cloud correlate into a unified timeline.

✓ Faster response. Standardized structures let SOAR triage, enrich, and escalate at speed.

✓ Advanced analytics. Consistent fields make ML, anomaly detection, and threat hunting possible.

✓ Improved scoring. Standardized observables get reliable threat-intelligence enrichment.

✓ Simpler reporting. Predictable fields streamline dashboards and compliance reports.
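The unified-timeline benefit can be illustrated with a short sketch. It assumes events have already been normalized to share a `hostname` field and an ISO 8601 UTC `timestamp`; the events themselves and the function name `timeline_for` are invented for the example.

```python
from datetime import datetime

# Hypothetical, already-normalized events from three sources.
events = [
    {"source": "firewall", "timestamp": "2024-05-01T12:00:07+00:00",
     "hostname": "ws-0142", "summary": "Outbound connection to rare domain"},
    {"source": "edr", "timestamp": "2024-05-01T12:00:02+00:00",
     "hostname": "ws-0142", "summary": "PowerShell launched with encoded command"},
    {"source": "cloud", "timestamp": "2024-05-01T12:03:41+00:00",
     "hostname": "ws-0142", "summary": "New API token created"},
    {"source": "edr", "timestamp": "2024-05-01T12:01:00+00:00",
     "hostname": "ws-0999", "summary": "Unrelated alert on another host"},
]


def timeline_for(host, all_events):
    """Return one host's events from every source in chronological order."""
    matching = [e for e in all_events if e["hostname"] == host]
    return sorted(matching, key=lambda e: datetime.fromisoformat(e["timestamp"]))


timeline = timeline_for("ws-0142", events)
for e in timeline:
    print(e["timestamp"], e["source"], e["summary"])
```

The filter and the sort are each one line only because the fields were harmonized first; with raw vendor formats, each source would need its own parsing branch.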


Governance and sustainability

Normalization is not a one-time project. As new data sources arrive, as vendors update their formats, as new schemas emerge, the normalization layer needs maintenance.
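One lightweight way to notice when a vendor changes its format is to flag fields the mapping does not know about. The sketch below assumes a hypothetical set of known vendor fields and an invented incoming record; in practice the check would run inside whatever pipeline tool you use.

```python
# Hypothetical set of vendor fields the current mapping handles.
KNOWN_EDR_FIELDS = {"RuleName", "Sev", "DetectTimeEpoch", "DeviceName", "AccountName"}


def unmapped_fields(raw, known_fields):
    """Return vendor fields present in a record but absent from the mapping."""
    return sorted(set(raw) - set(known_fields))


def missing_fields(raw, known_fields):
    """Return mapped fields the record no longer carries (possible rename)."""
    return sorted(set(known_fields) - set(raw))


# Invented record simulating a vendor update that renamed "Sev" to "Severity".
incoming = {
    "RuleName": "Suspicious PowerShell",
    "Severity": "High",
    "DetectTimeEpoch": 1700000000,
    "DeviceName": "WS-0142",
    "AccountName": "jdoe",
}

print(unmapped_fields(incoming, KNOWN_EDR_FIELDS))  # ['Severity']
print(missing_fields(incoming, KNOWN_EDR_FIELDS))   # ['Sev']
```

A new field appearing at the same time an old one disappears is a strong hint of a rename, which is worth alerting on before detections silently lose data.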

Tools that help: Logstash (the Elastic Stack's server-side pipeline for ingesting, transforming, and forwarding data), Fluentd (an open-source, plugin-based collector that unifies log collection across distributed systems), Cribl (data routing and processing that filters, enriches, and reduces log data before forwarding), and the native normalization capabilities in modern SIEMs.

Within ASSURED, prioritize normalization of detection metadata, affected entities, and primary observables. Those are the fields downstream phases depend on.
