Data sources for evidence gathering

Four principles before any source

Before picking a source, four properties shape whether the evidence gathered will be usable later.

Comprehensive coverage

Visibility across endpoints, networks, identities, and cloud is what captures the full attack lifecycle. Gaps in any layer become blind spots that adversaries exploit.

Contextual analysis

Sources matter most when combined. Endpoint telemetry, identity logs, and network flows together tell a story that none of them tells alone.
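As a rough illustration of combining sources, the sketch below merges three event lists into one per-host timeline. The field names (`ts`, `source`, `host`, `detail`) and the sample events are hypothetical, and it assumes each source is already sorted and uses the same UTC ISO 8601 timestamp format so that string comparison orders events correctly.

```python
import heapq

# Hypothetical, pre-sorted event lists; field names are illustrative only.
edr = [
    {"ts": "2024-05-02T23:41:10Z", "source": "edr", "host": "ws-17",
     "detail": "powershell.exe spawned by winword.exe"},
]
identity = [
    {"ts": "2024-05-02T23:40:02Z", "source": "identity", "host": "ws-17",
     "detail": "login from a previously unseen country"},
]
network = [
    {"ts": "2024-05-02T23:41:45Z", "source": "network", "host": "ws-17",
     "detail": "outbound TLS to 203.0.113.50"},
    {"ts": "2024-05-02T23:50:00Z", "source": "network", "host": "ws-09",
     "detail": "routine DNS lookup"},
]

def host_timeline(host, *sources):
    """Merge already-sorted sources by timestamp and keep one host's events."""
    merged = heapq.merge(*sources, key=lambda e: e["ts"])
    return [e for e in merged if e["host"] == host]

timeline = host_timeline("ws-17", edr, identity, network)
for event in timeline:
    print(event["ts"], event["source"], event["detail"])
```

Read in order, the identity event, the endpoint event, and the network event form a sequence that no single source shows on its own.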

Historical depth

Sophisticated attacks unfold over weeks or months. Sources that retain only days of data cannot answer “did this start before the alert?” Retention shapes what is even possible.

Data integrity

Evidence that may end up in legal or regulatory proceedings has to be tamper-evident, with its chain of custody preserved. Integrity is what makes findings durable.
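One common way to make a record set tamper-evident is a hash chain, where each entry's hash covers the previous entry's hash. The sketch below is a minimal illustration of that idea, not a substitute for a real evidence-management system; the record fields and the all-zero starting value are assumptions made for the example.

```python
import hashlib
import json

GENESIS = "0" * 64  # assumed fixed starting value for the first entry

def chain_records(records):
    """Link each record to the previous one by hashing prev_hash + record."""
    chained, prev_hash = [], GENESIS
    for record in records:
        payload = json.dumps(record, sort_keys=True)
        entry_hash = hashlib.sha256((prev_hash + payload).encode()).hexdigest()
        chained.append({"record": record, "prev_hash": prev_hash, "hash": entry_hash})
        prev_hash = entry_hash
    return chained

def verify_chain(chained):
    """Return the index of the first entry that fails verification, or None."""
    prev_hash = GENESIS
    for i, entry in enumerate(chained):
        payload = json.dumps(entry["record"], sort_keys=True)
        expected = hashlib.sha256((prev_hash + payload).encode()).hexdigest()
        if entry["prev_hash"] != prev_hash or entry["hash"] != expected:
            return i
        prev_hash = entry["hash"]
    return None

# Hypothetical custody log entries.
evidence = chain_records([
    {"ts": "2024-05-01T10:00:00Z", "event": "disk image acquired"},
    {"ts": "2024-05-01T10:20:00Z", "event": "image hash recorded"},
    {"ts": "2024-05-01T11:00:00Z", "event": "image handed to analyst"},
])
print(verify_chain(evidence))
```

Changing any earlier record changes its hash, so verification fails at that entry; the chain only shows that something changed, not who changed it.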

The sources at a glance

📋 Security event logs

SIEM-collected events from across the environment.

💻 EDR logs

Process creation, command lines, file changes at the endpoint level.

🌐 Network traffic

NetFlow, DNS, proxy, firewall. What the wire saw.

🔑 Identity and access

Authentication, SSO, role assumption, privilege grants.

☁️ Cloud activity

CloudTrail, Azure Activity, GCP audit. API and configuration.

🧠 Threat intelligence

IOCs and TTPs from external sources. Hypothesis fuel.

📁 File integrity (FIM)

Tampering signals for critical files and configs.

📧 Email security

Inbound/outbound mail metadata, attachment scanning, URL rewriting.

🗄️ Application & database

Row-level access, query history, business-logic evidence.

🍯 Deception

Honeypot interactions: high-confidence adversary signal.

💾 Backup & snapshot

When systems were last clean. Sabotage detection too.

🏢 Physical security

Badge swipes, camera events. Insider correlation.

📋 Security event and incident logs

Logs from firewalls, IDS/IPS, web proxies, and SIEM platforms. They give the high-level view of network-wide security activity and are often the first indicator of anomalous behavior. They capture scans, policy violations, alerts, attempts, and correlation outputs.

✓ When to use

Establishing the initial alert trail and identifying upstream/downstream events.
Mapping relationships across systems based on alert correlation.
Reviewing alert fidelity and triage metadata.
Creating comprehensive timelines of security events across the environment.
Identifying patterns that suggest coordinated attack campaigns.

✗ Limitations

Alert volume includes significant noise and false positives.
Limited context without enrichment from other sources.
Ingestion lag and misconfigured sources create timing gaps.
Missing events when logging levels are not properly configured.
Normalization challenges across heterogeneous security products.

💻 Endpoint detection and response logs

EDR platforms monitor endpoint activity in real time: process creation, parent-child relationships, registry changes, file system activity, network connections, and memory manipulation. This data lets analysts reconstruct attacker behavior at a granular level, starting from initial execution.

Key EDR data points

Process execution with full command-line arguments
Parent-child relationships and process-tree visualization
File creation, modification, and deletion events
Registry modifications and persistence-mechanism creation
Network connection attempts with destination information
Memory manipulation and code-injection attempts
Privilege-elevation events
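The parent-child data points above are what make execution chains recoverable. The sketch below rebuilds a process ancestry from process-creation events; the event schema (`pid`, `ppid`, `image`, `cmdline`) and the sample events are hypothetical, since real EDR schemas differ by vendor, and it ignores PID reuse, which a real implementation would have to handle with timestamps or unique process IDs.

```python
from collections import defaultdict

# Hypothetical, simplified process-creation events.
events = [
    {"pid": 100, "ppid": 1, "image": "explorer.exe", "cmdline": "explorer.exe"},
    {"pid": 200, "ppid": 100, "image": "winword.exe", "cmdline": "winword.exe invoice.docm"},
    {"pid": 300, "ppid": 200, "image": "powershell.exe", "cmdline": "powershell.exe -File stage.ps1"},
    {"pid": 400, "ppid": 300, "image": "rundll32.exe", "cmdline": "rundll32.exe payload.dll,Start"},
]

def build_tree(events):
    """Index events by parent PID so children can be listed per process."""
    children = defaultdict(list)
    for event in events:
        children[event["ppid"]].append(event)
    return children

def ancestry(events, pid):
    """Walk from a process up to its earliest recorded ancestor."""
    by_pid = {e["pid"]: e for e in events}
    chain, seen = [], set()
    while pid in by_pid and pid not in seen:  # 'seen' guards against loops
        seen.add(pid)
        chain.append(by_pid[pid]["image"])
        pid = by_pid[pid]["ppid"]
    return chain[::-1]

print(" -> ".join(ancestry(events, 400)))
```

An office application spawning a shell that then launches rundll32 is the kind of chain an analyst would look at more closely.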

✓ When to use

Identifying execution chains and suspicious child processes.
Detecting unauthorized file modifications or malicious payloads.
Investigating persistence mechanisms or privilege misuse.
Reconstructing attack timelines at the endpoint level.
Validating suspicious network connections seen in other logs.

✗ Limitations

Requires agent deployment and tuning for full visibility.
May not flag legitimate LOLBins used maliciously (wmic, rundll32).
Limited or no coverage for unmanaged or BYOD endpoints.
Performance impact on endpoints if not optimized.
Storage and retention constraints for high-volume telemetry.

🌐 Network traffic and communication logs

Network telemetry covers how data moves across internal and external environments: NetFlow, DNS, full packet capture (PCAP), proxy and web gateway logs. Analysts use it to trace lateral movement, identify C2 channels, and detect exfiltration. Network logs are essential for anomalies that endpoints cannot see, such as domain generation algorithm (DGA) lookups or external callbacks from stealthy malware.

✓ When to use

Tracing the source or destination of suspicious traffic.
Detecting exfiltration techniques (DNS tunneling, HTTPS uploads).
Correlating DNS or proxy logs to identify attacker infrastructure.
Mapping lateral movement patterns across the network.
Identifying beaconing or periodic communication indicative of C2.
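Beaconing shows up as unusually regular gaps between connections to the same destination. A minimal sketch of one way to score that regularity uses the coefficient of variation of the inter-arrival times. The sample timestamps (seconds) are made up, and the interpretation of "low" is an assumption that needs baselining; malware that adds jitter will score higher.

```python
from statistics import mean, pstdev

def beacon_score(timestamps):
    """Coefficient of variation of inter-arrival gaps; near 0 means very regular."""
    ts = sorted(timestamps)
    gaps = [b - a for a, b in zip(ts, ts[1:])]
    if len(gaps) < 3 or mean(gaps) == 0:
        return None  # too few connections to judge
    return pstdev(gaps) / mean(gaps)

# Hypothetical connection times in seconds for two host/destination pairs.
regular = [0, 60, 121, 180, 241, 300]      # roughly every 60 seconds
irregular = [0, 5, 200, 210, 900, 905]     # bursty, human-like browsing

print(beacon_score(regular))
print(beacon_score(irregular))
```

A score close to zero is a reason to look at the destination, not a verdict: software update checks and monitoring agents are also periodic.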

✗ Limitations

Full PCAP is storage-intensive; retention is often short.
TLS encryption obscures payload content without decryption.
Requires baselining to distinguish normal from abnormal.
Cloud and remote-work environments create visibility gaps.
High-volume environments may require sampling.

🔑 User identity and access management logs

Logs from identity providers (Active Directory, Microsoft Entra ID (formerly Azure AD)), SSO platforms, MFA systems, and directory services. Login attempts, access denials, role escalations, password resets. Critical for investigating account compromise, session hijacking, and unauthorized access.

User authentication

Privilege management

Role assignments, permission changes, elevation requests. The path of any privilege-escalation attempt.

MFA activities

Enrollment, verification, bypass attempts. The trail of sophisticated account-takeover tactics.

Session management

Token issuance, validation, expiration. Helps detect token theft and session manipulation.

✓ When to use

Validating whether access to sensitive systems was legitimate.
Detecting anomalies (logins at odd hours or unfamiliar locations).
Investigating failed authentication, lockouts, or MFA bypass.
Tracking privilege escalation across user accounts.
Correlating identity events with endpoint and network activity.
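As one example of the anomaly detection listed above, the sketch below flags a successful login from a country the user has not logged in from before. The record fields (`ts`, `user`, `country`, `result`) are hypothetical, and the approach assumes geolocation has already been resolved upstream; VPNs and travel will produce legitimate hits.

```python
from collections import defaultdict

def flag_new_locations(logins):
    """Flag successful logins from a country not previously seen for that user."""
    seen = defaultdict(set)
    flagged = []
    # ISO 8601 UTC strings in one format sort correctly as plain strings.
    for login in sorted(logins, key=lambda l: l["ts"]):
        if login["result"] != "success":
            continue
        user, country = login["user"], login["country"]
        if seen[user] and country not in seen[user]:
            flagged.append(login)
        seen[user].add(country)  # first-ever login only builds the baseline
    return flagged

# Hypothetical authentication records.
logins = [
    {"ts": "2024-05-01T08:00:00Z", "user": "alice", "country": "DE", "result": "success"},
    {"ts": "2024-05-02T08:05:00Z", "user": "alice", "country": "DE", "result": "success"},
    {"ts": "2024-05-02T23:40:00Z", "user": "alice", "country": "BR", "result": "success"},
    {"ts": "2024-05-03T09:00:00Z", "user": "bob", "country": "US", "result": "success"},
]

for hit in flag_new_locations(logins):
    print(hit["ts"], hit["user"], hit["country"])
```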

✗ Limitations

Often no detail on activity inside applications post-authentication.
Federated/hybrid environments fragment session tracking.
Inconsistent logging formats across identity platforms.
Limited historical context for behavior baselines.
Hard to distinguish legitimate from malicious in some scenarios.

☁️ Cloud activity and configuration logs

Cloud-native logs: AWS CloudTrail, Azure Activity Logs, GCP Audit Logs. API calls, configuration changes, login activity, resource provisioning. Foundational for detecting cloud misconfigurations, account abuse, and exposure of cloud storage or compute.

✓ When to use

Tracking IAM changes (user/role creation, key usage, permission escalation).
Auditing bucket and compute resource policies.
Investigating provisioning of new cloud resources.
Detecting exfiltration through cloud storage or automation.
Reconstructing actions across multiple cloud accounts.
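For the IAM-tracking use case, a sketch of filtering a CloudTrail log file for identity changes might look like the following. It relies on the standard CloudTrail record fields `eventSource`, `eventName`, and `eventTime` and the top-level `Records` array; the short list of event names is an illustrative subset rather than a complete one, and the sample records are invented.

```python
import json

# Illustrative subset of IAM API actions that change users, roles, or keys.
IAM_CHANGE_EVENTS = {
    "CreateUser", "CreateRole", "CreateAccessKey",
    "AttachUserPolicy", "AttachRolePolicy", "PutUserPolicy",
}

def iam_changes(log_text):
    """Return IAM change records from a CloudTrail log file, oldest first."""
    records = json.loads(log_text).get("Records", [])
    hits = [
        r for r in records
        if r.get("eventSource") == "iam.amazonaws.com"
        and r.get("eventName") in IAM_CHANGE_EVENTS
    ]
    return sorted(hits, key=lambda r: r["eventTime"])

# Hypothetical, heavily trimmed log content.
sample_log = json.dumps({"Records": [
    {"eventTime": "2024-05-02T23:45:10Z", "eventSource": "iam.amazonaws.com",
     "eventName": "AttachUserPolicy", "sourceIPAddress": "203.0.113.50"},
    {"eventTime": "2024-05-02T23:42:00Z", "eventSource": "ec2.amazonaws.com",
     "eventName": "DescribeInstances", "sourceIPAddress": "203.0.113.50"},
    {"eventTime": "2024-05-02T23:44:30Z", "eventSource": "iam.amazonaws.com",
     "eventName": "CreateUser", "sourceIPAddress": "203.0.113.50"},
]})

for record in iam_changes(sample_log):
    print(record["eventTime"], record["eventName"], record["sourceIPAddress"])
```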

✗ Limitations

Logging may not be enabled by default; needs explicit configuration.
Interpretation requires familiarity with each provider’s services.
Rapid provisioning/de-provisioning complicates timeline reconstruction.
Logging granularity varies across services.
Cost considerations limit retention and collection scope.

🧠 Threat intelligence and IOC feeds

Structured data on known malicious indicators: domains, IPs, file hashes, URLs. Commercial, community, or internal. Matching internal activity against these indicators lets analysts assess credibility quickly and link activity to known malware families or actors.
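Indicator matching can be sketched as a set lookup per field. The field names (`domain`, `dest_ip`, `sha256`) and the indicator values below are hypothetical, drawn from reserved documentation ranges, and a match is only a lead to validate, not proof of compromise.

```python
def match_iocs(events, iocs):
    """Return (host, field, value) for every event field that matches an indicator."""
    hits = []
    for event in events:
        for field in ("domain", "dest_ip", "sha256"):
            value = event.get(field)
            if value and value.lower() in iocs.get(field, set()):
                hits.append((event["host"], field, value.lower()))
    return hits

# Hypothetical indicator set, stored lowercase.
iocs = {
    "domain": {"bad.example.net"},
    "dest_ip": {"203.0.113.50"},
}

# Hypothetical internal events.
events = [
    {"host": "ws-17", "domain": "Bad.Example.NET", "dest_ip": "203.0.113.50"},
    {"host": "ws-09", "domain": "intranet.example.com", "dest_ip": "198.51.100.7"},
]

for hit in match_iocs(events, iocs):
    print(hit)
```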

✓ When to use

Enriching indicators (hash, domain) with threat-actor attribution.
Prioritizing activity involving known malicious infrastructure.
Correlating internal observations with campaign data or bulletins.
Expanding searches via related indicators in the same campaign.
Supporting executive communication on threat context and impact.

✗ Limitations

Includes outdated or overused indicators (sandbox IPs, expired domains).
False positives from benign reuse of known infrastructure.
Presence of an IOC does not confirm compromise; requires validation.
Coverage gaps for emerging threats or sophisticated actors.
Hard to operationalize at scale without automation.

🛡️ Vulnerability and patch management logs

Logs from vulnerability scanners (such as Nexpose) and patch management tools (Microsoft Configuration Manager (MECM, formerly SCCM), WSUS). Exposed vulnerabilities, patch application status, security baselines. Paired with telemetry from other sources, they help identify at-risk systems and likely exploit vectors.

Key vulnerability data points

CVE identifiers and descriptions
CVSS scores and severity ratings
Exploit availability and complexity assessment
Affected software versions and components
Remediation status and patch application history
Asset exposure duration (time since vulnerability discovery)
Compensating controls and their status
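Exposure duration and severity from the list above can be combined into a simple ordering of open findings. This is a sketch under stated assumptions: the finding fields, the placeholder CVE identifiers, and the choice to sort by CVSS first and exposure second are all illustrative, not a recommended scoring model.

```python
from datetime import date

# Hypothetical findings; the CVE identifiers are placeholders, not real entries.
findings = [
    {"host": "web01", "cve": "CVE-0000-0001", "cvss": 9.8,
     "first_seen": date(2024, 1, 10), "patched_on": None},
    {"host": "db01", "cve": "CVE-0000-0001", "cvss": 9.8,
     "first_seen": date(2024, 2, 20), "patched_on": None},
    {"host": "app01", "cve": "CVE-0000-0002", "cvss": 5.3,
     "first_seen": date(2023, 12, 1), "patched_on": None},
    {"host": "mail01", "cve": "CVE-0000-0002", "cvss": 5.3,
     "first_seen": date(2024, 2, 1), "patched_on": date(2024, 2, 15)},
]

def exposure_days(finding, as_of):
    """Days between discovery and patching, or until 'as_of' if still open."""
    end = finding["patched_on"] or as_of
    return (end - finding["first_seen"]).days

def open_findings_by_priority(findings, as_of):
    """Unpatched findings, highest CVSS first, longest exposure breaking ties."""
    still_open = [f for f in findings if f["patched_on"] is None]
    return sorted(still_open,
                  key=lambda f: (f["cvss"], exposure_days(f, as_of)),
                  reverse=True)

as_of = date(2024, 3, 10)
for f in open_findings_by_priority(findings, as_of):
    print(f["host"], f["cve"], f["cvss"], exposure_days(f, as_of))
```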

✓ When to use

Determining if exploited CVEs are present in the environment.
Verifying which systems are unpatched or missing mitigations.
Supporting risk-based prioritization of remediation.
Identifying likely attack vectors based on known vulnerabilities.
Assessing exploitability of systems involved in incidents.

✗ Limitations

Scan coverage may be incomplete or outdated.
Vulnerability databases produce false positives.
Patch status does not always equal exploit prevention.
Limited visibility into zero-day or freshly disclosed vulns.
Hard to correlate vulnerability data with actual exploitation evidence.

📁 File integrity monitoring (FIM)

FIM tools record unauthorized or unexpected changes to files, directories, registries, and configurations. They are used to detect tampering with critical system files, injection of malicious binaries, or modification of application code. They work by comparing files against known baselines and alerting on deviations.
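The baseline comparison described above can be sketched with file hashes. The example below snapshots a directory, changes it, and reports what was added, removed, or modified. It runs against a temporary directory with made-up files, and it is only an illustration of the comparison step; real FIM tools also track permissions, ownership, and who made the change.

```python
import hashlib
import tempfile
from pathlib import Path

def snapshot(root):
    """Map each file's relative path to the SHA-256 of its contents."""
    root = Path(root)
    return {
        p.relative_to(root).as_posix(): hashlib.sha256(p.read_bytes()).hexdigest()
        for p in root.rglob("*") if p.is_file()
    }

def compare(baseline, current):
    """Report paths added, removed, or modified relative to the baseline."""
    shared = baseline.keys() & current.keys()
    return {
        "added": sorted(current.keys() - baseline.keys()),
        "removed": sorted(baseline.keys() - current.keys()),
        "modified": sorted(k for k in shared if baseline[k] != current[k]),
    }

# Demo on a throwaway directory with hypothetical files.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "sshd_config").write_text("PermitRootLogin no\n")
    (root / "hosts").write_text("127.0.0.1 localhost\n")
    baseline = snapshot(root)

    (root / "sshd_config").write_text("PermitRootLogin yes\n")  # tampered config
    (root / "dropper.bin").write_bytes(b"\x00\x01")             # new binary
    (root / "hosts").unlink()                                    # deleted file
    changes = compare(baseline, snapshot(root))

print(changes)
```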

✓ When to use

Detecting unauthorized changes to system binaries or config files.
Identifying tampering attempts post-intrusion.
Confirming presence of malware implants or backdoors.
Tracking modifications to sensitive application code.
Validating whether critical security controls were disabled.

✗ Limitations

Noisy if baselines are not properly tuned.
Does not always distinguish legitimate from malicious changes.
Real-time monitoring can be resource-intensive.
Blind to memory-only or fileless attacks.
Hard to maintain in dynamic environments with frequent change.

📧 Email security logs

Logs from Secure Email Gateways (SEGs), spam filters, and phishing-detection platforms. Sender domains, attachment hashes, headers, click-through behavior on malicious links. Email is still one of the top initial-access vectors; these logs are invaluable for tracing the origin of many incidents.

Key email security data points

Sender information (email, domain, IP)
Authentication results (SPF, DKIM, DMARC)
Subject lines and content snippets
Attachment details (filename, type, size, hash)
Embedded URL information and reputation scores
User interaction data (opens, clicks, forwards, replies)
Detection verdicts and quarantine actions
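Authentication results usually reach the analyst through the `Authentication-Results` header that the receiving mail server adds. The sketch below pulls the SPF, DKIM, and DMARC verdicts out of a raw message with the standard library and a regular expression. The sample message is invented, and the regex is a simplification that keeps only the first verdict per mechanism and does not implement the full header grammar.

```python
import re
from email import message_from_string

# Hypothetical raw message with a folded Authentication-Results header.
raw_message = (
    "Authentication-Results: mx.example.net;\n"
    " spf=pass smtp.mailfrom=bounce.example.org;\n"
    " dkim=fail header.d=example.org;\n"
    " dmarc=fail header.from=example.org\n"
    "From: billing@example.org\n"
    "Subject: Invoice overdue\n"
    "\n"
    "Please see the attached invoice.\n"
)

def auth_verdicts(raw):
    """Return the first spf/dkim/dmarc result found in Authentication-Results."""
    msg = message_from_string(raw)
    verdicts = {}
    for header in msg.get_all("Authentication-Results", []):
        for mech, result in re.findall(r"\b(spf|dkim|dmarc)=(\w+)", header, flags=re.I):
            verdicts.setdefault(mech.lower(), result.lower())
    return verdicts

print(auth_verdicts(raw_message))
```

A message that passes SPF but fails DKIM and DMARC, as in this sample, is the kind of mixed result worth checking against the sender domain and attachment details.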

✓ When to use

Tracing suspected phishing campaigns or credential harvesting.
Reviewing user interaction with malicious links or attachments.
Identifying targeted users or high-risk email behaviors.
Correlating email-based attacks with other security events.
Assessing scope and impact of email-based threats.

✗ Limitations

Limited visibility into content without sandboxing or deep inspection.
Missed threats in encrypted attachments or spoofed headers.
Logs may not track behavior outside the gateway (forwarding).
Sophisticated social engineering can lack obvious indicators.
Personal email on corporate devices is often invisible.

🗄️ Application and database logs

Application and database logs offer visibility into use and abuse of business-critical systems. Transaction anomalies, privilege escalations, unauthorized queries. Particularly useful for investigating data breaches or access to sensitive records.

Authentication events

Data access activities

Query execution, record CRUD operations, bulk exports, access to sensitive fields.

System operations

Configuration changes, backups and restores, schema modifications, integration API calls.

✓ When to use

Investigating activity within custom or sensitive applications.
Detecting unauthorized queries or privilege escalation.
Confirming access to regulated or sensitive data sets.
Tracing the business impact of technical compromises.
Establishing evidence for breach notification or compliance.

✗ Limitations

Logging is inconsistent and application-specific.
Volume and format may need specialized parsing.
Often not integrated into centralized SIEM workflows.
Variable logging levels omit critical security events.
Limited historical retention for performance reasons.

📚 Configuration management and asset inventory

ServiceNow CMDB, AWS Config. Current and historical asset state: ownership, software versions, hardware specs, business functions. Analysts use these to validate baselines and map IPs/hostnames to physical or virtual systems.

Key asset data elements

System identifiers (hostname, IP, MAC, asset tag)
Hardware specifications and physical location
Installed software inventory and versions
Operating system details and patch levels
Ownership and responsible teams
Business criticality and data classification
Network segment and security zone information
Relationships and dependencies with other systems

✓ When to use

Identifying asset owners for follow-up or scope assessment.
Verifying whether a system is production, test, or abandoned.
Understanding intended vs. actual configuration state.
Determining system criticality and potential business impact.
Mapping network addresses to physical or virtual assets.

✗ Limitations

CMDB entries become stale without regular maintenance.
Gaps in asset tagging significantly reduce effectiveness.
Often disconnected from real-time security event data.
Limited visibility into cloud or shadow IT resources.
Difficult to maintain in highly dynamic environments.

🚫 Data loss prevention (DLP)

DLP platforms monitor and control the movement of sensitive data across endpoints, email, cloud, and network. They alert when protected information (PII, IP, source code, financial data) is accessed, transferred, or potentially exfiltrated.

📧 Email DLP

Outbound email and attachment monitoring. Detects sensitive content leaving via corporate or personal mail channels.

💻 Endpoint DLP

File operations, clipboard use, device transfers on workstations. Identifies unauthorized data handling at the user level.

🌐 Network DLP

Inspects traffic for sensitive content. Catches unauthorized transfers over web upload, FTP, or other protocols.

☁️ Cloud DLP

Monitors data movement to and between cloud services. Detects inappropriate sharing or storage of sensitive information.
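Across all four channels, content inspection often comes down to pattern matching plus a validation step that cuts false positives. A minimal sketch for payment card numbers is shown below: a regular expression finds candidate digit runs and the Luhn checksum filters out most random numbers. The pattern and sample text are illustrative assumptions, not how any particular DLP product works.

```python
import re

# Candidate: 13 to 19 digits, optionally separated by single spaces or hyphens.
CARD_RE = re.compile(r"\b\d(?:[ -]?\d){12,18}\b")

def luhn_valid(number):
    """Luhn checksum: double every second digit from the right, sum, check mod 10."""
    total = 0
    for i, ch in enumerate(reversed(number)):
        d = int(ch)
        if i % 2 == 1:
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0

def find_card_numbers(text):
    """Return digit strings that look like card numbers and pass the Luhn check."""
    hits = []
    for match in CARD_RE.finditer(text):
        digits = re.sub(r"\D", "", match.group())
        if luhn_valid(digits):
            hits.append(digits)
    return hits

# 4111 1111 1111 1111 is a widely used test number; the order number is made up.
sample = "Refund to card 4111 1111 1111 1111, order 1234567890123456."
print(find_card_numbers(sample))
```

The second number matches the pattern but fails the checksum, which is the kind of filtering that keeps generic rule matching from flooding analysts.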

✓ When to use

Confirming whether sensitive data was accessed or exfiltrated.
Reviewing policy violations related to data movement.
Investigating unusual outbound transfers or clipboard use.
Assessing the scope and impact of data breaches.
Identifying unauthorized cloud-service usage.

✗ Limitations

High false-positive rate from generic rule matching.
Can be bypassed by encrypted or obfuscated channels.
Requires well-tuned policies to avoid alert fatigue.
Limited visibility into unmanaged devices or networks.
Performance impact when scanning large volumes.

🍯 Deception and honeypot logs

Deception platforms (Canary, Proofpoint Identity Threat Defense (formerly Illusive), Commvault ThreatWise (formerly TrapX)) deploy decoys throughout the environment: fake credentials, deceptive files, memory lures, simulated services. Because these decoys have no business use, interaction with them strongly indicates malicious activity. High-fidelity, low-noise alerts.

🔑 Credential honeypots

Fake user accounts, tokens, or access keys. Triggered alerts on authentication use reveal credential harvesting or brute force.

🖥️ System decoys

Simulated servers and workstations that appear legitimate. Detect lateral movement and reconnaissance with high confidence.

📄 File lures

Deceptive documents with enticing names placed strategically. Identify data theft attempts and reveal attacker interests.

⚙️ Application honeypots

Fake services that emulate production. Capture attacker techniques without risk to real assets.

✓ When to use

Detecting lateral movement and credential harvesting.
Triggering high-fidelity alerts on credential reuse or scanning.
Assessing attacker intent through decoy interactions.
Gathering intelligence on attacker TTPs.
Creating detection for stealthy threats.

✗ Limitations

Requires realistic integration to avoid detection by attackers.
Rarely deployed comprehensively due to cost or complexity.
Attackers may avoid honeypots if they suspect their presence.
Maintenance overhead to keep decoys believable.
Limited effect on fully automated or scripted attacks.

💾 Backup and snapshot metadata

AWS Backup, EBS snapshots, Azure disk snapshots. Backup metadata is both a forensic resource and a recovery lifeline. Investigators use it to establish historical context, detect sabotage attempts (deleted or encrypted backups, a common tactic), and check whether backup repositories were accessed.

Key backup metadata elements

Backup job execution times and status
Systems and data included in each backup
Backup storage locations and retention periods
Verification and validation results
Access logs for backup management interfaces
Restoration testing activity
Modifications to backup policies or schedules

✓ When to use

Confirming whether clean backups exist pre-compromise.
Detecting attempts to delete or encrypt backups.
Assessing recovery timelines.
Identifying forensic artifacts in historical snapshots.
Determining the earliest known clean state for affected systems.
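Finding a usable restore point can be sketched as picking the most recent successful backup that completed before the estimated compromise time. The job records and timestamps below are hypothetical, and the result is only as good as the compromise estimate: if the attacker was present earlier than assumed, the selected backup may not be clean.

```python
from datetime import datetime

# Hypothetical backup job metadata.
backups = [
    {"job": "db01-nightly", "completed": datetime(2024, 4, 28, 2, 0), "status": "success"},
    {"job": "db01-nightly", "completed": datetime(2024, 4, 29, 2, 0), "status": "failed"},
    {"job": "db01-nightly", "completed": datetime(2024, 4, 30, 2, 0), "status": "success"},
    {"job": "db01-nightly", "completed": datetime(2024, 5, 2, 2, 0), "status": "success"},
]

def last_clean_backup(backups, compromise_time):
    """Most recent successful backup completed before the estimated compromise."""
    candidates = [
        b for b in backups
        if b["status"] == "success" and b["completed"] < compromise_time
    ]
    return max(candidates, key=lambda b: b["completed"], default=None)

estimated_compromise = datetime(2024, 5, 1, 14, 0)
restore_point = last_clean_backup(backups, estimated_compromise)
print(restore_point)
```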

✗ Limitations

Visibility limited to protected systems and storage.
May not include granular file-level logs.
Often managed by separate IT/Ops teams.
Coverage gaps for certain systems.
Incomplete metadata for cloud-based or offsite backups.

🏢 Physical security and access control logs

Badge swipes, biometric authentication, visitor records, video surveillance, secure-zone access. These logs add a dimension digital telemetry cannot. By establishing the physical location of individuals at specific times, they help validate or refute hypotheses about who performed certain actions.

✓ When to use

Correlating login times with building entry logs.
Investigating physical theft or insider access.
Validating whether activity was local vs. remote.
Confirming presence or absence during security events.
Identifying unauthorized physical access to secure areas.
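Correlating logins with building entry can be sketched as a time-window check. The records below are hypothetical, the 10-hour window is an arbitrary assumption, and the `site` on a login is assumed to come from the workstation's known location. Badge sharing and tailgating mean a match supports a hypothesis but does not prove who was at the keyboard.

```python
from datetime import datetime, timedelta

# Hypothetical badge and login records.
badge_events = [
    {"user": "alice", "site": "HQ", "ts": datetime(2024, 5, 2, 8, 55)},
]
logins = [
    {"user": "alice", "site": "HQ", "ts": datetime(2024, 5, 2, 9, 10)},
    {"user": "alice", "site": "HQ", "ts": datetime(2024, 5, 2, 23, 40)},
]

def badge_supports_login(login, badge_events, window=timedelta(hours=10)):
    """True if the user badged into the same site within 'window' before the login."""
    return any(
        b["user"] == login["user"]
        and b["site"] == login["site"]
        and timedelta(0) <= login["ts"] - b["ts"] <= window
        for b in badge_events
    )

for login in logins:
    print(login["ts"], badge_supports_login(login, badge_events))
```

The late-night login has no supporting badge entry, which is a prompt to check whether the session was remote or the account was used by someone else.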

✗ Limitations

Limited integration with SOC tools.
May not be real-time or centralized.
Access policies vary widely across facilities.
Badge sharing and tailgating compromise accuracy.
Disparate systems across geographies.


