Data sources for evidence gathering
Four principles before any source
Before picking a source, four properties shape whether the evidence gathered will be usable later.
Comprehensive coverage
Visibility across endpoints, networks, identities, and cloud is what captures the full attack lifecycle. Gaps in any layer become blind spots that adversaries exploit.
Contextual analysis
Sources matter most when combined. Endpoint telemetry plus identity logs plus network flows produces a story none of them tell alone.
Historical depth
Sophisticated attacks unfold over weeks or months. Sources that retain only days of data cannot answer "did this start before the alert?" Retention shapes what is even possible.
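The retention question can be phrased as a simple date comparison. The following is a minimal sketch, not a feature of any particular product; the function name, parameters, and the 60-day lookback are illustrative assumptions.

```python
from datetime import datetime, timedelta

def can_answer_lookback(retention_days, alert_time, lookback_days, now):
    """Return True if data from `lookback_days` before the alert is still retained."""
    oldest_retained = now - timedelta(days=retention_days)
    earliest_needed = alert_time - timedelta(days=lookback_days)
    return earliest_needed >= oldest_retained

# Hypothetical dates: an alert two days ago, and a wish to look back 60 days.
now = datetime(2024, 6, 30)
alert = datetime(2024, 6, 28)
print(can_answer_lookback(7, alert, 60, now))   # 7-day retention
print(can_answer_lookback(90, alert, 60, now))  # 90-day retention
```

Running the same check per source before an investigation starts shows which questions each source can still answer.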
Data integrity
Evidence that may end up in legal or regulatory proceedings has to be tamper-evident and chain-of-custody preserved. Integrity is what makes findings durable.
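One common way to make a custody log tamper-evident is a hash chain, where each entry's hash covers the previous entry's hash. The sketch below assumes a simple in-memory list of records with made-up field names; it illustrates the idea and is not a substitute for a real evidence-management system.

```python
import hashlib
import json

GENESIS = "0" * 64  # placeholder hash for the first entry (an assumption of this sketch)

def chain_entry(prev_hash, record):
    """Hash the record together with the previous entry's hash."""
    payload = json.dumps(record, sort_keys=True)
    digest = hashlib.sha256((prev_hash + payload).encode("utf-8")).hexdigest()
    return {"record": record, "prev_hash": prev_hash, "hash": digest}

def build_chain(records):
    chain, prev = [], GENESIS
    for record in records:
        entry = chain_entry(prev, record)
        chain.append(entry)
        prev = entry["hash"]
    return chain

def verify_chain(chain):
    """Recompute every hash; any edited record breaks the chain from that point on."""
    prev = GENESIS
    for entry in chain:
        expected = chain_entry(prev, entry["record"])["hash"]
        if entry["prev_hash"] != prev or entry["hash"] != expected:
            return False
        prev = entry["hash"]
    return True

custody_log = build_chain([
    {"item": "disk-image-01", "action": "collected", "by": "analyst-a"},
    {"item": "disk-image-01", "action": "transferred", "by": "analyst-b"},
])
print(verify_chain(custody_log))
```

Changing any field in an earlier record makes `verify_chain` return False, which is the tamper-evidence property the principle asks for.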
The sources at a glance
Security event logs
SIEM-collected events from across the environment.
EDR logs
Process creation, command lines, file changes at the endpoint level.
Network traffic
NetFlow, DNS, proxy, firewall. What the wire saw.
Identity and access
Authentication, SSO, role assumption, privilege grants.
Cloud activity
CloudTrail, Azure Activity, GCP audit. API and configuration.
Threat intelligence
IOCs and TTPs from external sources. Hypothesis fuel.
Vulnerability & patch
What is exposed and what is patched.
File integrity (FIM)
Tampering signals for critical files and configs.
Email security
Inbound/outbound mail metadata, attachment scanning, URL rewriting.
Application & database
Row-level access, query history, business-logic evidence.
Asset & configuration
What exists, who owns it, what it does.
Data loss prevention
What sensitive data moved where.
Deception
Honeypot interactions: high-confidence adversary signal.
Backup & snapshot
When systems were last clean. Sabotage detection too.
Physical security
Badge swipes, camera events. Insider correlation.
Security event and incident logs
Logs from firewalls, IDS/IPS, web proxies, and SIEM platforms. They give the high-level view of network-wide security activity and are often the first indicator of anomalous behavior. They capture port scans, policy violations, malware alerts, unauthorized access attempts, and correlation outputs.
When to use
- Establishing the initial alert trail and identifying upstream/downstream events.
- Mapping relationships across systems based on alert correlation.
- Reviewing alert fidelity and triage metadata.
- Creating comprehensive timelines of security events across the environment.
- Identifying patterns that suggest coordinated attack campaigns.
Limitations
- Alert volume includes significant noise and false positives.
- Limited context without enrichment from other sources.
- Ingestion lag and misconfigured sources create timing gaps.
- Missing events when logging levels are not properly configured.
- Normalization challenges across heterogeneous security products.
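Building a cross-source timeline mostly comes down to normalizing timestamps and sorting. The sketch below assumes each source has already been parsed into dictionaries with an ISO 8601 `time` field that carries a UTC offset; the field names and sample events are made up for illustration.

```python
from datetime import datetime, timezone

def to_utc(ts):
    """Parse an ISO 8601 timestamp with an explicit offset and convert it to UTC."""
    return datetime.fromisoformat(ts).astimezone(timezone.utc)

def build_timeline(*sources):
    """Merge events from any number of sources into one list ordered by UTC time."""
    events = [event for source in sources for event in source]
    return sorted(events, key=lambda e: to_utc(e["time"]))

# Hypothetical events; note the EDR source reports local time with a +02:00 offset.
firewall = [{"time": "2024-05-01T10:05:00+00:00", "source": "firewall", "event": "outbound connection blocked"}]
edr = [{"time": "2024-05-01T12:01:30+02:00", "source": "edr", "event": "powershell.exe spawned by winword.exe"}]
idp = [{"time": "2024-05-01T09:58:00+00:00", "source": "idp", "event": "login from new location"}]

timeline = build_timeline(firewall, edr, idp)
for e in timeline:
    print(to_utc(e["time"]).isoformat(), e["source"], e["event"])
```

Sorting on the normalized value rather than the raw string matters here: the EDR event looks latest as text but falls between the other two once converted to UTC.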
Endpoint detection and response logs
EDR platforms monitor endpoint activity in real time. Process creation, parent-child relationships, registry changes, file system activity, network connections, memory execution. This data lets analysts reconstruct attacker behavior at a granular level, from initial execution to lateral movement and persistence.
Key EDR data points
- Process execution with full command-line arguments
- Parent-child process relationships and process-tree visualization
- File creation, modification, and deletion events
- Registry modifications and persistence-mechanism creation
- Network connection attempts with destination information
- Memory manipulation and code-injection attempts
- User account activity and privilege-elevation events
When to use
- Identifying execution chains and suspicious child processes.
- Detecting unauthorized file modifications or malicious payloads.
- Investigating persistence mechanisms or privilege misuse.
- Reconstructing attack timelines at the endpoint level.
- Validating suspicious network connections seen in other logs.
Limitations
- Requires agent deployment and tuning for full visibility.
- May not flag or fully record legitimate LOLBins (wmic, rundll32) when they are used maliciously.
- Limited or no coverage for unmanaged or BYOD endpoints.
- Performance impact on endpoints if not optimized.
- Storage and retention constraints for high-volume telemetry.
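Execution chains can be reconstructed from process-creation events by linking each process to its parent. This is a minimal sketch that assumes events have already been exported as dictionaries with `pid`, `ppid`, and `image` keys (hypothetical field names) and that PIDs are unique within the sample; real telemetry usually needs a process GUID or start time to handle PID reuse.

```python
from collections import defaultdict

def build_tree(events):
    """Index events by PID and group child PIDs under their parent PID."""
    by_pid = {e["pid"]: e for e in events}
    children = defaultdict(list)
    for e in events:
        children[e["ppid"]].append(e["pid"])
    return by_pid, children

def render(by_pid, children, pid, depth=0):
    """Return the subtree rooted at `pid` as indented text lines."""
    lines = ["  " * depth + f'{by_pid[pid]["image"]} ({pid})']
    for child in children.get(pid, []):
        lines.extend(render(by_pid, children, child, depth + 1))
    return lines

# Hypothetical events: a document handler spawning a shell.
events = [
    {"pid": 100, "ppid": 1, "image": "explorer.exe"},
    {"pid": 200, "ppid": 100, "image": "winword.exe"},
    {"pid": 300, "ppid": 200, "image": "powershell.exe"},
]
by_pid, children = build_tree(events)
print("\n".join(render(by_pid, children, 100)))
```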
Network traffic and communication logs
Network telemetry covers how data moves across internal and external environments: NetFlow, DNS, full packet capture (PCAP), proxy and web gateway logs. Analysts use it to trace lateral movement, identify C2 channels, detect beaconing, and examine data exfiltration. Network logs are essential for anomalies that endpoints cannot see, such as DGA traffic or external callbacks from stealthy malware.
When to use
- Tracing the source or destination of suspicious traffic.
- Detecting exfiltration techniques (DNS tunneling, HTTPS uploads).
- Correlating DNS or proxy logs to identify attacker infrastructure.
- Mapping lateral movement patterns across the network.
- Identifying beaconing or periodic communication indicative of C2.
Limitations
- Full PCAP is storage-intensive; retention is often short.
- TLS encryption obscures payload content without decryption.
- Requires baselining to distinguish normal from abnormal.
- Cloud and remote-work environments create visibility gaps.
- High-volume environments may require sampling.
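One simple heuristic for beaconing is to measure how regular the gaps between connections are. The sketch below computes the coefficient of variation of inter-arrival times for one source/destination pair; the function name and any threshold applied to the score are assumptions, and real beacons often add jitter that this naive measure will partly miss.

```python
from statistics import mean, pstdev

def beacon_score(timestamps):
    """Coefficient of variation of gaps between connections; lower means more regular."""
    if len(timestamps) < 3:
        return None  # too few connections to judge regularity
    ordered = sorted(timestamps)
    gaps = [later - earlier for earlier, later in zip(ordered, ordered[1:])]
    average = mean(gaps)
    if average == 0:
        return None
    return pstdev(gaps) / average

# Timestamps in seconds (hypothetical): one host calls out every 60 s, another is irregular.
regular = [0, 60, 120, 180, 240]
irregular = [0, 5, 200, 210, 900]
print(beacon_score(regular))
print(beacon_score(irregular))
```

A score near zero is a lead worth baselining against known periodic software (update checkers, monitoring agents) before treating it as C2.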
User identity and access management logs
Logs from identity providers (Active Directory, Okta, Azure AD (now Microsoft Entra ID)), SSO platforms, MFA systems, and directory services. Login attempts, access denials, role escalations, password resets, token activity. Critical for account compromise, password spraying, session hijacking, and unauthorized access.
User authentication
Login attempts, successes, failures, lockouts across identity systems. Reveals brute force or credential stuffing.
Privilege management
Role assignments, permission changes, elevation requests. The path of any privilege-escalation attempt.
MFA activities
Enrollment, verification, bypass attempts. The trail of sophisticated account-takeover tactics.
Session management
Token issuance, validation, expiration. Helps detect token theft and session manipulation.
When to use
- Validating whether access to sensitive systems was legitimate.
- Detecting anomalies (logins at odd hours or unfamiliar locations).
- Investigating failed authentication, lockouts, or MFA bypass.
- Tracking privilege escalation across user accounts.
- Correlating identity events with endpoint and network activity.
Limitations
- Often no detail on activity inside applications post-authentication.
- Federated/hybrid environments fragment session tracking.
- Inconsistent logging formats across identity platforms.
- Limited historical context for behavior baselines.
- Hard to distinguish legitimate from malicious in some scenarios.
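Password spraying tends to show up as one source failing against many distinct accounts, the inverse of brute force against a single account. A rough sketch, assuming failed-login events have already been filtered to a short time window and carry `src_ip` and `username` fields (hypothetical names); the threshold of 10 accounts is arbitrary.

```python
from collections import defaultdict

def spray_candidates(failed_logins, min_accounts=10):
    """Return source IPs that failed against at least `min_accounts` distinct users."""
    accounts_by_ip = defaultdict(set)
    for event in failed_logins:
        accounts_by_ip[event["src_ip"]].add(event["username"])
    return {
        ip: len(users)
        for ip, users in accounts_by_ip.items()
        if len(users) >= min_accounts
    }

# Hypothetical data: one IP touches 12 accounts, another retries a single account.
failed = [{"src_ip": "203.0.113.5", "username": f"user{i}"} for i in range(12)]
failed += [{"src_ip": "198.51.100.7", "username": "alice"}] * 5
suspects = spray_candidates(failed)
print(suspects)
```

Shared egress points such as VPN concentrators and NAT gateways can trip this check legitimately, which is one instance of the "hard to distinguish" limitation above.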
Cloud activity and configuration logs
Cloud-native logs: AWS CloudTrail, Azure Activity Logs, GCP Audit Logs. API calls, configuration changes, login activity, resource provisioning. Foundational for detecting cloud misconfigurations, account abuse, privilege escalation, and exposure of cloud storage or compute.
When to use
- Tracking IAM changes (user/role creation, key usage, permission escalation).
- Auditing bucket and compute resource policies.
- Investigating provisioning of new cloud resources.
- Detecting exfiltration through cloud storage or automation.
- Reconstructing actions across multiple cloud accounts.
Limitations
- Logging may not be enabled by default; needs explicit configuration.
- Interpretation requires familiarity with each provider's services.
- Rapid provisioning/de-provisioning complicates timeline reconstruction.
- Logging granularity varies across services.
- Cost considerations limit retention and collection scope.
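Tracking IAM changes in CloudTrail can start with a filter on event source and event name. The sketch below assumes a CloudTrail log file already downloaded as JSON text with a top-level `Records` array; the short list of event names is a starting point chosen for illustration rather than a complete set, and the sample records are fabricated.

```python
import json

# A small, non-exhaustive set of IAM API calls that change identities or permissions.
IAM_CHANGE_EVENTS = {
    "CreateUser", "CreateRole", "CreateAccessKey",
    "AttachUserPolicy", "AttachRolePolicy", "PutUserPolicy",
}

def iam_changes(log_text):
    """Return (time, event name, caller ARN) for IAM change events in a CloudTrail file."""
    records = json.loads(log_text).get("Records", [])
    return [
        (r["eventTime"], r["eventName"], r.get("userIdentity", {}).get("arn", "unknown"))
        for r in records
        if r.get("eventSource") == "iam.amazonaws.com"
        and r.get("eventName") in IAM_CHANGE_EVENTS
    ]

# Fabricated sample records for illustration only.
log_text = json.dumps({"Records": [
    {"eventTime": "2024-05-01T10:00:00Z", "eventSource": "iam.amazonaws.com",
     "eventName": "CreateAccessKey",
     "userIdentity": {"arn": "arn:aws:iam::111122223333:user/example-admin"}},
    {"eventTime": "2024-05-01T10:01:00Z", "eventSource": "s3.amazonaws.com",
     "eventName": "GetObject",
     "userIdentity": {"arn": "arn:aws:iam::111122223333:user/example-admin"}},
]})
changes = iam_changes(log_text)
print(changes)
```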
Threat intelligence and IOC feeds
Structured data on known malicious indicators: domains, IPs, hashes, URLs. Commercial, community, or internal. Matching internal activity against these indicators lets analysts assess credibility quickly and link activity to known malware families or threat actors.
When to use
- Enriching indicators (hash, domain) with threat-actor attribution.
- Prioritizing activity involving known malicious infrastructure.
- Correlating internal observations with campaign data or bulletins.
- Expanding searches via related indicators in the same campaign.
- Supporting executive communication on threat context and impact.
Limitations
- Includes outdated or overused indicators (sandbox IPs, expired domains).
- False positives from benign reuse of known infrastructure.
- Presence of an IOC does not confirm compromise; requires validation.
- Coverage gaps for emerging threats or sophisticated actors.
- Hard to operationalize at scale without automation.
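Indicator matching at its simplest is a lookup of observed values against a feed. This sketch assumes the feed has been loaded into a dictionary mapping a lowercase indicator to some context string, and that events expose `domain`, `dst_ip`, or `sha256` fields; all names and sample values are hypothetical.

```python
def match_iocs(events, iocs):
    """Return one hit per event field whose value appears in the indicator set."""
    hits = []
    for event in events:
        for field in ("domain", "dst_ip", "sha256"):
            value = event.get(field)
            if value and value.lower() in iocs:
                hits.append({
                    "event": event,
                    "field": field,
                    "indicator": value.lower(),
                    "context": iocs[value.lower()],
                })
    return hits

# Hypothetical feed and events, using reserved example domains and addresses.
iocs = {"bad.example": "campaign-x C2 domain", "203.0.113.9": "campaign-x staging server"}
events = [
    {"host": "ws-01", "domain": "BAD.example"},
    {"host": "ws-02", "domain": "intranet.example", "dst_ip": "192.0.2.10"},
]
hits = match_iocs(events, iocs)
print(hits)
```

A hit here is a lead to validate against endpoint and network evidence, not a confirmation of compromise.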
Vulnerability and patch management logs
Logs from scanners (Qualys, Tenable, Nexpose) and patch management (Microsoft Configuration Manager (MECM, formerly SCCM), WSUS). Exposed vulnerabilities, patch application status, security baselines. Paired with threat intelligence and exploit telemetry, they help identify at-risk systems and likely exploit vectors.
Key vulnerability data points
- CVE identifiers and vulnerability descriptions
- CVSS scores and severity ratings
- Exploit availability and complexity assessment
- Affected software versions and components
- Remediation status and patch application history
- Asset exposure duration (time since vulnerability discovery)
- Compensating controls and mitigation status
When to use
- Determining if exploited CVEs are present in the environment.
- Verifying which systems are unpatched or missing mitigations.
- Supporting risk-based prioritization of remediation.
- Identifying likely attack vectors based on known vulnerabilities.
- Assessing exploitability of systems involved in incidents.
Limitations
- Scan coverage may be incomplete or outdated.
- Vulnerability databases produce false positives.
- Patch status does not always equal exploit prevention.
- Limited visibility into zero-day or freshly disclosed vulns.
- Hard to correlate vulnerability data with actual exploitation evidence.
File integrity monitoring (FIM)
FIM tools record unauthorized or unexpected changes to files, directories, registries, and configurations. They are used to detect tampering with critical system files, injection of malicious binaries, or modification of application code. They work by comparing files against known baselines and alerting on deviations.
When to use
- Detecting unauthorized changes to system binaries or config files.
- Identifying tampering attempts post-intrusion.
- Confirming presence of malware implants or backdoors.
- Tracking modifications to sensitive application code.
- Validating whether critical security controls were disabled.
Limitations
- Noisy if baselines are not properly tuned.
- Does not always distinguish legitimate from malicious changes.
- Real-time monitoring can be resource-intensive.
- Blind to memory-only or fileless attacks.
- Hard to maintain in dynamic environments with frequent change.
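The baseline comparison described above can be sketched with file hashes. This is a simplified illustration that hashes every file under a directory and diffs two snapshots; the function names are made up, and production FIM tools also track permissions, ownership, and registry keys, which this sketch does not.

```python
import hashlib
import tempfile
from pathlib import Path

def snapshot(root):
    """Map each file's path (relative to root) to the SHA-256 of its contents."""
    root = Path(root)
    return {
        str(p.relative_to(root)): hashlib.sha256(p.read_bytes()).hexdigest()
        for p in sorted(root.rglob("*")) if p.is_file()
    }

def diff(baseline, current):
    """Report files added, removed, or modified relative to the baseline."""
    return {
        "added": sorted(set(current) - set(baseline)),
        "removed": sorted(set(baseline) - set(current)),
        "modified": sorted(k for k in baseline.keys() & current.keys()
                           if baseline[k] != current[k]),
    }

# Demonstration in a throwaway directory with hypothetical file names.
with tempfile.TemporaryDirectory() as d:
    cfg = Path(d) / "sshd_config"
    cfg.write_text("PermitRootLogin no\n")
    baseline = snapshot(d)
    cfg.write_text("PermitRootLogin yes\n")       # config tampering
    (Path(d) / "dropper.bin").write_bytes(b"\x00")  # new unexpected binary
    changes = diff(baseline, snapshot(d))
print(changes)
```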
Email security logs
Logs from Secure Email Gateways (SEGs), spam filters, and phishing-detection platforms. Sender domains, attachment hashes, headers, click-through behavior on malicious links. Email is still one of the top initial-access vectors; these logs are invaluable for tracing the origin of many incidents.
Key email security data points
- Sender information (email, domain, IP, geolocation)
- Authentication results (SPF, DKIM, DMARC)
- Subject lines and content snippets
- Attachment details (filename, type, size, hash)
- Embedded URL information and reputation scores
- User interaction data (opens, clicks, forwards, replies)
- Detection verdicts and quarantine actions
When to use
- Tracing suspected phishing campaigns or credential harvesting.
- Reviewing user interaction with malicious links or attachments.
- Identifying targeted users or high-risk email behaviors.
- Correlating email-based attacks with other security events.
- Assessing scope and impact of email-based threats.
Limitations
- Limited visibility into content without sandboxing or deep inspection.
- Missed threats in encrypted attachments or spoofed headers.
- Logs may not track behavior outside the gateway (forwarding).
- Sophisticated social engineering can lack obvious indicators.
- Personal email on corporate devices is often invisible.
Application and database logs
Application and database logs offer visibility into use and abuse of business-critical systems. Transaction anomalies, privilege escalations, SQL injection attempts, unauthorized queries. Particularly useful for investigating data breaches or access to sensitive records.
Authentication events
Login attempts, session management, role and permission changes, account lockouts, password resets.
Data access activities
Query execution, record CRUD operations, bulk exports, access to sensitive fields.
System operations
Configuration changes, backups and restores, schema modifications, integration API calls.
When to use
- Investigating activity within custom or sensitive applications.
- Detecting unauthorized queries or privilege escalation.
- Confirming access to regulated or sensitive data sets.
- Tracing the business impact of technical compromises.
- Establishing evidence for breach notification or compliance.
Limitations
- Logging is inconsistent and application-specific.
- Volume and format may need specialized parsing.
- Often not integrated into centralized SIEM workflows.
- Variable logging levels omit critical security events.
- Limited historical retention for performance reasons.
Configuration management and asset inventory
ServiceNow CMDB, Lansweeper, AWS Config. Current and historical asset state: ownership, software versions, hardware specs, business functions. Analysts use these to assess asset criticality, validate baselines, and map IPs/hostnames to physical or virtual systems.
Key asset data elements
- System identifiers (hostname, IP, MAC, asset tag)
- Hardware specifications and physical location
- Installed software inventory and versions
- Operating system details and patch levels
- Ownership and responsible teams
- Business criticality and data classification
- Network segment and security zone information
- Relationships and dependencies with other systems
When to use
- Identifying asset owners for follow-up or scope assessment.
- Verifying whether a system is production, test, or abandoned.
- Understanding intended vs. actual configuration state.
- Determining system criticality and potential business impact.
- Mapping network addresses to physical or virtual assets.
Limitations
- CMDB entries become stale without regular maintenance.
- Gaps in asset tagging significantly reduce effectiveness.
- Often disconnected from real-time security event data.
- Limited visibility into cloud or shadow IT resources.
- Difficult to maintain in highly dynamic environments.
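Mapping an address to an asset is only as good as the record's freshness, so a lookup can report staleness alongside ownership. A small sketch, assuming the inventory has been exported to a dictionary keyed by IP with a `last_verified` date; the field names, sample records, and 30-day threshold are all hypothetical.

```python
from datetime import date

def resolve_asset(ip, inventory, today, stale_after_days=30):
    """Look up an IP and label the record current, stale, or unknown."""
    record = inventory.get(ip)
    if record is None:
        return {"ip": ip, "status": "unknown"}
    age_days = (today - record["last_verified"]).days
    status = "stale" if age_days > stale_after_days else "current"
    return {**record, "ip": ip, "status": status}

# Hypothetical inventory export.
inventory = {
    "10.0.0.5": {"hostname": "db-prod-01", "owner": "data-platform", "last_verified": date(2024, 6, 20)},
    "10.0.0.9": {"hostname": "test-vm-17", "owner": "unknown", "last_verified": date(2023, 11, 2)},
}
today = date(2024, 6, 30)
for ip in ("10.0.0.5", "10.0.0.9", "10.0.0.77"):
    print(resolve_asset(ip, inventory, today))
```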
Data loss prevention (DLP)
DLP platforms monitor and control sensitive data movement across endpoints, email, cloud, and network. They log when protected information (PII, IP, source code, financial data) is accessed, transferred, or potentially exfiltrated.
Email DLP
Outbound email and attachment monitoring. Detects sensitive content leaving via corporate or personal mail channels.
Endpoint DLP
File operations, clipboard use, device transfers on workstations. Identifies unauthorized data handling at the user level.
Network DLP
Inspects traffic for sensitive content. Catches unauthorized transfers over web upload, FTP, or other protocols.
Cloud DLP
Monitors data movement to and between cloud services. Detects inappropriate sharing or storage of sensitive information.
When to use
- Confirming whether sensitive data was accessed or exfiltrated.
- Reviewing policy violations related to data movement.
- Investigating unusual outbound transfers or clipboard use.
- Assessing the scope and impact of data breaches.
- Identifying unauthorized cloud-service usage.
Limitations
- High false-positive rate from generic rule matching.
- Can be bypassed by encrypted or obfuscated channels.
- Requires well-tuned policies to avoid alert fatigue.
- Limited visibility into unmanaged devices or networks.
- Performance impact when scanning large volumes.
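The false-positive problem with generic rule matching is easy to see with payment card numbers: a bare digit pattern matches any 16-digit reference number. The sketch below pairs a pattern with a Luhn checksum to cut that noise. It is an illustration of the tuning idea, not how any specific DLP product works, and the pattern is deliberately simple.

```python
import re

# 13 to 19 digits, optionally separated by single spaces or hyphens.
CARD_PATTERN = re.compile(r"\b\d(?:[ -]?\d){12,18}\b")

def luhn_valid(number):
    """Apply the Luhn checksum to the digits in `number`."""
    digits = [int(c) for c in number if c.isdigit()]
    total = 0
    for index, digit in enumerate(reversed(digits)):
        if index % 2 == 1:  # double every second digit from the right
            digit *= 2
            if digit > 9:
                digit -= 9
        total += digit
    return total % 10 == 0

def find_card_numbers(text):
    """Return pattern matches that also pass the Luhn check."""
    return [m.group() for m in CARD_PATTERN.finditer(text) if luhn_valid(m.group())]

# 4111 1111 1111 1111 is a widely used test number; the order reference is arbitrary.
text = "Order ref 1234 5678 9012 3456, card 4111 1111 1111 1111."
found = find_card_numbers(text)
print(found)
```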
Deception and honeypot logs
Deception platforms (Canary, Proofpoint Identity Threat Defense (formerly Illusive), Commvault ThreatWise (formerly TrapX)) deploy decoys throughout the environment: fake credentials, deceptive files, memory lures, simulated services. Because these decoys have no business use, interaction with them strongly indicates malicious activity. High-fidelity, low false-positive alerts.
Credential honeypots
Fake user accounts, tokens, or access keys. Triggered alerts on authentication use reveal credential harvesting or brute force.
System decoys
Simulated servers and workstations that appear legitimate. Detect lateral movement and reconnaissance with high confidence.
File lures
Deceptive documents with enticing names placed strategically. Identify data theft attempts and reveal attacker interests.
Application honeypots
Fake services that emulate production. Capture attacker techniques without risk to real assets.
When to use
- Detecting lateral movement and credential harvesting.
- Triggering high-fidelity alerts on credential reuse or scanning.
- Assessing attacker intent through decoy interactions.
- Gathering intelligence on attacker TTPs.
- Creating detection for stealthy threats.
Limitations
- Requires realistic integration to avoid detection by attackers.
- Rarely deployed comprehensively due to cost or complexity.
- Attackers may avoid honeypots if they suspect their presence.
- Maintenance overhead to keep decoys believable.
- Limited effect on fully automated or scripted attacks.
Backup and snapshot metadata
Veeam, Druva, AWS Backup, EBS snapshots, Azure disk snapshots. Backup metadata is both a forensic resource and a recovery lifeline. Investigators use it to establish historical context, detect sabotage attempts (deleted or encrypted backups, a common ransomware tactic), and check whether backup repositories were accessed.
Key backup metadata elements
- Backup job execution times and status
- Systems and data included in each backup
- Backup storage locations and retention periods
- Verification and validation results
- Access logs for backup management interfaces
- Restoration testing activity
- Modifications to backup policies or schedules
When to use
- Confirming whether clean backups exist pre-compromise.
- Detecting attempts to delete or encrypt backups.
- Assessing recovery timelines.
- Identifying forensic artifacts in historical snapshots.
- Determining the earliest known clean state for affected systems.
Limitations
- Visibility limited to protected systems and storage.
- May not include granular file-level logs.
- Often managed by separate IT/Ops teams.
- Coverage gaps for certain systems.
- Incomplete metadata for cloud-based or offsite backups.
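Finding the earliest known clean state usually starts from the last successful backup that finished before the estimated compromise time. A minimal sketch, assuming job metadata has been exported as dictionaries with `status` and `finished` fields; these names and the sample jobs are hypothetical, and a successful job is only a candidate until its contents are verified.

```python
from datetime import datetime

def last_clean_backup(jobs, compromise_time):
    """Return the most recent successful job that finished before the compromise."""
    candidates = [
        job for job in jobs
        if job["status"] == "success" and job["finished"] < compromise_time
    ]
    return max(candidates, key=lambda job: job["finished"], default=None)

# Hypothetical job history for one system.
jobs = [
    {"id": "job-101", "status": "success", "finished": datetime(2024, 5, 1, 2, 0)},
    {"id": "job-102", "status": "failed", "finished": datetime(2024, 5, 2, 2, 0)},
    {"id": "job-103", "status": "success", "finished": datetime(2024, 5, 3, 2, 0)},
    {"id": "job-104", "status": "success", "finished": datetime(2024, 5, 5, 2, 0)},
]
compromise = datetime(2024, 5, 4, 14, 30)
best = last_clean_backup(jobs, compromise)
print(best["id"] if best else "no clean backup found")
```

If the compromise estimate later moves earlier, rerunning the same query shows immediately whether the chosen restore point is still safe.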
Physical security and access control logs
Badge swipes, biometric authentication, visitor records, video surveillance, secure-zone access. These logs add a dimension digital telemetry cannot. By establishing the physical location of individuals at specific times, they help validate or refute hypotheses about who performed certain actions.
When to use
- Correlating login times with building entry logs.
- Investigating physical theft or insider access.
- Validating whether activity was local vs. remote.
- Confirming presence or absence during security events.
- Identifying unauthorized physical access to secure areas.
Limitations
- Limited integration with SOC tools.
- May not be real-time or centralized.
- Access policies vary widely across facilities.
- Badge sharing and tailgating compromise accuracy.
- Disparate systems across geographies.
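Correlating logins with building entry can be sketched as a join on user and time. The example below flags on-site logins with no badge entry by the same user in the preceding hours. The `on_site` flag, field names, and 12-hour window are assumptions, and given badge sharing and tailgating a flagged login is a question to ask, not a finding.

```python
from datetime import datetime, timedelta

def logins_without_badge(logins, badge_events, window=timedelta(hours=12)):
    """Return on-site logins with no badge entry by the same user within `window` before."""
    flagged = []
    for login in logins:
        if not login["on_site"]:
            continue  # remote logins are out of scope for this check
        badged_in = any(
            b["user"] == login["user"]
            and timedelta(0) <= login["time"] - b["time"] <= window
            for b in badge_events
        )
        if not badged_in:
            flagged.append(login)
    return flagged

# Hypothetical records.
badge_events = [{"user": "alice", "time": datetime(2024, 5, 1, 8, 55)}]
logins = [
    {"user": "alice", "time": datetime(2024, 5, 1, 9, 10), "on_site": True},
    {"user": "bob", "time": datetime(2024, 5, 1, 9, 30), "on_site": True},
    {"user": "carol", "time": datetime(2024, 5, 1, 9, 45), "on_site": False},
]
flagged = logins_without_badge(logins, badge_events)
print([entry["user"] for entry in flagged])
```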