Domain 7 · Lesson 3 of 6

Security Monitoring, SIEM & Threat Hunting

SIEM — Security Information and Event Management

Log Sources → Collect & Normalize → Correlate Events → Alert on Patterns → SOC Response
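
As an illustration of the normalize step in the pipeline above, here is a minimal sketch in Python. The two raw event shapes, their field names, and the common schema are all hypothetical; real auth servers and WAFs emit their own formats, and a SIEM would use its own parsers.

```python
from datetime import datetime, timezone

# Hypothetical raw events: each source uses its own field names and time format.
RAW_AUTH = {"ts": "2024-05-01T03:00:05+00:00", "user": "alice",
            "src": "203.0.113.7", "result": "FAIL"}
RAW_WAF = {"time_epoch": 1714532410, "client_ip": "203.0.113.7",
           "action": "blocked", "rule": "sqli-01"}


def normalize_auth(event: dict) -> dict:
    """Map an auth-server event onto the (assumed) common schema."""
    return {
        "timestamp": datetime.fromisoformat(event["ts"]).astimezone(timezone.utc),
        "source": "auth",
        "src_ip": event["src"],
        "action": "login",
        "outcome": "failure" if event["result"] == "FAIL" else "success",
    }


def normalize_waf(event: dict) -> dict:
    """Map a WAF event onto the same schema so it can be correlated with auth events."""
    return {
        "timestamp": datetime.fromtimestamp(event["time_epoch"], tz=timezone.utc),
        "source": "waf",
        "src_ip": event["client_ip"],
        "action": "http_request",
        "outcome": "blocked" if event["action"] == "blocked" else "allowed",
    }


# Once normalized, events from different sources share keys and a UTC timeline.
events = sorted([normalize_auth(RAW_AUTH), normalize_waf(RAW_WAF)],
                key=lambda e: e["timestamp"])
```

With a shared `src_ip` field and UTC timestamps, a correlation rule can join events from both sources without knowing where each one came from.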

SIEM Use Cases

Brute force: many failed logins from one IP in 5 minutes
Privilege escalation: admin role added at 3am by service account
Lateral movement: service accessing unusual internal resources
Data exfiltration: large outbound data transfer to unknown IP
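
The brute force use case above can be sketched as a sliding-window correlation rule. This is an illustrative Python sketch, not how any particular SIEM implements it; the threshold of 10 failures and the tuple layout of the events are assumptions chosen for the example.

```python
from collections import defaultdict, deque
from datetime import datetime, timedelta

WINDOW = timedelta(minutes=5)   # window taken from the use case above
THRESHOLD = 10                  # assumed value; tune per environment


def detect_brute_force(events, window=WINDOW, threshold=THRESHOLD):
    """Return source IPs with at least `threshold` failed logins inside `window`.

    `events` is an iterable of (timestamp, src_ip, outcome) tuples,
    already sorted by timestamp.
    """
    recent = defaultdict(deque)  # src_ip -> timestamps of recent failures
    flagged = set()
    for ts, ip, outcome in events:
        if outcome != "failure":
            continue
        failures = recent[ip]
        failures.append(ts)
        # Drop failures that have aged out of the window.
        while ts - failures[0] > window:
            failures.popleft()
        if len(failures) >= threshold:
            flagged.add(ip)
    return flagged


# Hypothetical data: one IP fails 12 times in two minutes, another fails 3 times over 20 minutes.
start = datetime(2024, 5, 1, 3, 0)
sample = [(start + timedelta(seconds=10 * i), "203.0.113.7", "failure") for i in range(12)]
sample += [(start + timedelta(minutes=10 * i), "198.51.100.4", "failure") for i in range(3)]
sample.sort(key=lambda e: e[0])
flagged = detect_brute_force(sample)
```

No single failed login is suspicious on its own; the rule only fires on the pattern, which is the point of correlation.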

Log Sources to Ingest

Authentication events (login, MFA)
Authorization decisions (access denied)
Network flows and DNS queries
Application errors and WAF blocks
IDS/IPS alerts

Alert Fatigue

Too many false positive alerts → analysts become desensitized → real threats are ignored. Tune SIEM detection rules carefully. Target false positive rate below 20%. Alert fatigue is a security risk, not just an operational nuisance.

SOC (Security Operations Center) Tiers

Tier	Role	Activity
Tier 1	Alert Analyst	Triage and initial classification of alerts — real vs false positive, priority assignment
Tier 2	Incident Responder	Deep investigation, log correlation, containment actions, escalate to T3 if needed
Tier 3	Threat Hunter / Expert	Proactive threat hunting, advanced malware analysis, custom detection rule development, APT investigation

Key Operational Metrics

Metric	Full Name	Definition	Target
MTTD	Mean Time to Detect	From when the incident STARTED to when it was first detected	As low as possible
MTTR	Mean Time to Respond/Recover	From DETECTION to full resolution/recovery	As low as possible
FP Rate	False Positive Rate	Percentage of alerts that are not real threats	<20% ideal
Alert Vol	Alert Volume	Number of alerts per day requiring analyst attention	Manageable by team size

MTTD vs MTTR — Critical Distinction

The MTTD clock starts when the incident begins, not when the alert fires. If an attacker compromised a system 3 days ago and was detected only today, the time to detect for that incident is 3 days; MTTD is the mean of that value across incidents. The MTTR clock starts at detection and ends at full recovery.
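
To make the two clocks concrete, here is a small Python sketch that computes both metrics from incident records. The two incidents and their timestamps are invented for illustration.

```python
from datetime import datetime, timedelta
from statistics import mean

# Hypothetical incident records with the three timestamps the metrics need.
incidents = [
    {"started": datetime(2024, 5, 6, 9, 0),
     "detected": datetime(2024, 5, 9, 9, 0),     # detected 3 days after start
     "recovered": datetime(2024, 5, 10, 9, 0)},  # recovered 24 hours after detection
    {"started": datetime(2024, 5, 13, 9, 0),
     "detected": datetime(2024, 5, 14, 9, 0),    # detected 1 day after start
     "recovered": datetime(2024, 5, 14, 21, 0)}, # recovered 12 hours after detection
]


def mean_delta(records, start_key, end_key):
    """Mean elapsed time between two timestamps across all records."""
    seconds = mean((r[end_key] - r[start_key]).total_seconds() for r in records)
    return timedelta(seconds=seconds)


mttd = mean_delta(incidents, "started", "detected")     # incident start -> detection
mttr = mean_delta(incidents, "detected", "recovered")   # detection -> recovery
```

Here `mttd` works out to 2 days (the mean of 3 days and 1 day) and `mttr` to 18 hours (the mean of 24 and 12 hours). Note that MTTD depends on knowing when the incident actually started, which usually has to be reconstructed during investigation.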

Threat Hunting & Threat Intelligence

Threat Hunting — PROACTIVE

Search for threats already in the environment without waiting for alerts. Assumes breach has already occurred.

Hypothesis-driven: "Are there indicators of an APT in our Platform C network right now?" → search the data to find out.

Uses: MITRE ATT&CK framework — matrix of known adversary tactics and techniques
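
A hunt can be as simple as a script run over existing telemetry. The sketch below assumes a hypothesis of "someone is running base64-encoded PowerShell on our hosts" (ATT&CK technique T1059.001, PowerShell) and searches process-creation events for it. The event shape (`host`, `cmdline`) is hypothetical, and the pattern covers only a few common spellings of the `-EncodedCommand` flag, so it is a starting point rather than complete coverage.

```python
import base64
import re

# Matches a PowerShell command line that passes a base64 payload via
# -EncodedCommand or a few common abbreviations of it.
ENCODED_PS = re.compile(
    r"powershell(?:\.exe)?\b.*\s-(?:encodedcommand|enc|ec|e)\s+([A-Za-z0-9+/=]{8,})",
    re.IGNORECASE,
)


def hunt_encoded_powershell(process_events):
    """Return one finding per event whose command line runs encoded PowerShell."""
    findings = []
    for event in process_events:
        match = ENCODED_PS.search(event["cmdline"])
        if not match:
            continue
        try:
            # PowerShell expects the payload to be base64 of UTF-16LE text.
            decoded = base64.b64decode(match.group(1), validate=True).decode("utf-16-le")
        except ValueError:  # covers invalid base64 and invalid UTF-16
            decoded = None
        findings.append({"host": event["host"],
                         "technique": "T1059.001",
                         "decoded": decoded})
    return findings


# Hypothetical telemetry: one encoded command, one ordinary script run.
payload = base64.b64encode("Write-Output hi".encode("utf-16-le")).decode("ascii")
telemetry = [
    {"host": "ws-01", "cmdline": f"powershell.exe -NoProfile -enc {payload}"},
    {"host": "ws-02", "cmdline": "powershell.exe -ExecutionPolicy Bypass -File backup.ps1"},
]
findings = hunt_encoded_powershell(telemetry)
```

A hit is evidence to investigate, not proof of compromise: administrators also use encoded commands legitimately. If the hunt keeps producing useful results, it can be turned into a detection rule.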

Incident Response — REACTIVE

Responds to an alert or report that something bad has happened. Triggered by detection.

Key difference: IR starts with a known alert; threat hunting starts with a hypothesis and searches for evidence.

IR = reactive to what happened. Threat hunting = proactive search for what might already be happening.

Threat Intelligence Types

Type	Audience	Example
Strategic	Executives / CISO	"Nation-state actors targeting fintech companies in Southeast Asia"
Operational	SOC managers	"APT group X is running a campaign targeting Philippine banks this quarter"
Tactical	SOC analysts	"This malware family uses PowerShell with base64 encoding for C2 communication"
Technical	Systems / Tools (SIEM rules)	IOC: IP 1.2.3.4, hash abc123, malicious domain evil.com

IOC — Indicators of Compromise

Evidence that a compromise has already occurred: known bad IP, malware hash, suspicious domain. Retrospective — looks at past activity.

IOA — Indicators of Attack

Behavioral patterns suggesting an attack is in progress: unusual privilege use, abnormal process spawning. Proactive: can detect active attacks before any IOCs exist.
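
The difference between the two shows up clearly in code: IOC matching is a lookup against a list of known-bad values, while IOA detection is a rule about behavior. In this Python sketch the IOC values reuse the placeholder examples from the table above; the suspicious parent/child process pairs and the event fields are assumptions for illustration only.

```python
# IOC feed: known-bad values (placeholders from the intelligence table above).
KNOWN_BAD_IPS = {"1.2.3.4"}
KNOWN_BAD_DOMAINS = {"evil.com"}

# IOA rule (assumed): office applications should not normally spawn shells.
SUSPICIOUS_SPAWNS = {("winword.exe", "powershell.exe"), ("excel.exe", "cmd.exe")}


def ioc_hits(events):
    """Events that touch a value already known to be malicious."""
    return [e for e in events
            if e.get("dst_ip") in KNOWN_BAD_IPS or e.get("domain") in KNOWN_BAD_DOMAINS]


def ioa_hits(events):
    """Events whose behavior looks like an attack, whatever IPs or hashes are involved."""
    return [e for e in events
            if (e.get("parent", "").lower(), e.get("process", "").lower()) in SUSPICIOUS_SPAWNS]


# Hypothetical events: the second uses infrastructure that no feed knows about yet.
observed = [
    {"id": 1, "dst_ip": "1.2.3.4"},
    {"id": 2, "parent": "WINWORD.EXE", "process": "powershell.exe", "dst_ip": "198.51.100.9"},
    {"id": 3, "parent": "explorer.exe", "process": "notepad.exe"},
]
ioc_ids = [e["id"] for e in ioc_hits(observed)]
ioa_ids = [e["id"] for e in ioa_hits(observed)]
```

Event 2 is missed by the IOC lookup because its destination IP is not in any feed, but the behavioral rule still catches it.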

Key Terms

SIEM, SOC, Alert Fatigue, MTTD, MTTR, Threat Hunting, MITRE ATT&CK, IOC, IOA, Strategic/Tactical Intelligence

Exam Tips

SIEM correlates events from multiple sources — correlation detects PATTERNS, not individual events
MTTD: measured from when the incident STARTED (not when alert fired). An undetected breach of 3 days = MTTD of 3 days.
MTTR: measured from DETECTION to RECOVERY (not from incident start)
Threat hunting is PROACTIVE (assume breach, go searching); IR is REACTIVE (respond to alert)
IOC = past evidence of compromise; IOA = current behavior suggesting active attack — IOA is more useful for catching attacks in progress
Alert fatigue: too many false positive alerts → real alerts are ignored → security risk. Tune detection rules aggressively.

Work Application — FinTech Company X Datadog Monitoring Gaps Audit

Gap audit — missing Datadog alerts:

Brute force: Is there an alert for auth_failure_rate > 50/min? Target MTTD <2 minutes from attack start.
Kafka lag: Is there an alert for consumer lag > 10,000 messages? Indicates processing backlog affecting loan decisions — potential SLA breach.
Vault secret access: Is there an alert for Vault secret access by an unknown service or outside business hours? Indicates possible credential theft.
Anomalous outbound transfer: Is there an alert for large data transfer from any Platform C pod to an external IP? Indicates possible exfiltration.
Off-hours deployment: Is there an ArgoCD alert when a deployment happens outside the approved window? No deployments 10pm–6am VN time without P1 justification.
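
The off-hours deployment check in the list above reduces to a time-window test. This Python sketch shows only the decision logic; it does not call ArgoCD or Datadog, and the function names are hypothetical. It assumes VN time is a fixed UTC+7 offset and treats 06:00 exactly as inside the approved window, since the lesson does not say which side the boundary falls on.

```python
from datetime import datetime, time, timedelta, timezone

VN_TZ = timezone(timedelta(hours=7))  # Vietnam time, fixed UTC+7
FREEZE_START = time(22, 0)            # 10pm
FREEZE_END = time(6, 0)               # 6am


def is_off_hours(deployed_at: datetime) -> bool:
    """True if a timezone-aware deployment time falls in the 10pm-6am VN window."""
    local = deployed_at.astimezone(VN_TZ).time()
    # The window wraps past midnight, so it is the union of two ranges.
    return local >= FREEZE_START or local < FREEZE_END


def should_alert(deployed_at: datetime, has_p1_justification: bool = False) -> bool:
    """Alert on off-hours deployments unless a P1 justification is recorded."""
    return is_off_hours(deployed_at) and not has_p1_justification


# 16:00 UTC is 23:00 in Vietnam, so this deployment is inside the freeze window.
late_deploy = datetime(2024, 5, 1, 16, 0, tzinfo=timezone.utc)
alert = should_alert(late_deploy)
```

Pass timezone-aware datetimes only: `astimezone` interprets a naive datetime as local system time, which would silently shift the window on a server not running in UTC+7.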

MTTD target for Platform C P1: <15 minutes from incident start to Datadog alert firing. Current baseline: unknown — measure and track.

OpenTelemetry: Distributed traces across Platform C microservices create an audit trail for security analysis — unusual call patterns (e.g., a loan service calling the PII service 1000x in a minute) are visible in traces before showing up in SIEM.
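
One way to surface that kind of call pattern is to count calls per caller, callee, and minute from exported span data. The sketch below is plain Python over simplified span dictionaries; the `start`, `caller`, and `callee` fields are a hypothetical flattening, not the OpenTelemetry span schema, and the limit of 1000 calls per minute is taken from the example above.

```python
from collections import Counter
from datetime import datetime

CALLS_PER_MINUTE_LIMIT = 1000  # from the example above; tune per service pair


def unusual_call_patterns(spans, limit=CALLS_PER_MINUTE_LIMIT):
    """Return {(caller, callee, minute): count} for pairs at or above `limit`.

    `spans` is an iterable of dicts with 'start' (datetime), 'caller', 'callee'.
    """
    counts = Counter(
        # Truncate the start time to the minute to get a per-minute bucket.
        (s["caller"], s["callee"], s["start"].replace(second=0, microsecond=0))
        for s in spans
    )
    return {key: n for key, n in counts.items() if n >= limit}
```

A fixed limit is the simplest possible rule. In practice a per-pair baseline (what is normal for this caller and callee) would produce fewer false positives than one global threshold.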

Practice Quiz

Q1. What does a SIEM do with logs from multiple sources?

A SIEM collects logs from multiple sources, normalizes them into a common format, correlates events across sources to identify patterns, and generates alerts when those patterns match known attack behaviors. It detects patterns that no single log source could reveal on its own.

The power of SIEM is correlation — a failed login might be noise, but 500 failed logins from the same IP against 20 different accounts in 5 minutes is a brute force attack pattern that SIEM rules can detect. Individual log sources (auth server, WAF, network) each see a piece; the SIEM sees the whole picture. This is why Datadog at FinTech Company X is valuable even if it's not a traditional SIEM — the principle of centralized log collection and correlation is the same.

Q2. An attacker compromised Platform C on Monday, but the SOC only detected it Friday. What is the MTTD?

MTTD = 4 days. The clock starts from when the incident began (Monday), not when the alert fired (Friday). A 4-day MTTD means the attacker had undetected access for 4 days — significant time to exfiltrate data or establish persistence.

MTTD measures the gap between when an incident started and when it was detected. It is one of the most important security metrics because it directly measures how long an attacker operates undetected while the defenders are in the dark. Industry reports have typically put average breach detection time at weeks to months, so an MTTD of 4 days is better than average, but it still means 4 days of potential data exposure. Reducing MTTD requires better detection rules, more comprehensive log coverage, and proactive threat hunting.

Q3. Threat hunting vs incident response — which is proactive and which is reactive?

Threat hunting is PROACTIVE — analysts hypothesize that a threat exists and actively search for evidence without waiting for an alert. Incident response is REACTIVE — it begins when an alert or report triggers the response process.

The key distinction is the trigger. IR trigger = an alert has fired. Threat hunting trigger = an analyst decides to look. Threat hunting operates on the "assume breach" model: even if no alerts have fired, an attacker might already be present. Threat hunters use MITRE ATT&CK to generate hypotheses about what attack techniques adversaries would use and then search logs/telemetry for evidence of those techniques. This is how you find the sophisticated attackers who know how to evade SIEM rules.

Q4. Alert fatigue — what is the security risk it creates?

Alert fatigue occurs when analysts receive so many false positive alerts that they start ignoring all alerts — including real threats. The security risk is that a genuine attack alert gets dismissed as another false positive, and the breach goes undetected and unresponded to.

Alert fatigue is one of the biggest operational security failures in practice. If a SOC generates 10,000 alerts per day and 9,800 are false positives, analysts will develop a bias to dismiss everything. When the real attack alert appears, it looks like just another false positive. The solution is rigorous tuning: detect meaningful patterns, suppress noisy rules, and maintain a manageable alert volume. A well-tuned SIEM with 100 high-quality alerts per day is safer than a poorly tuned one with 10,000 noisy alerts.
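
The arithmetic behind that example is worth doing explicitly. This short Python sketch uses the lesson's definition of false positive rate (the share of alerts that are not real threats) and its 20% target; the function name is just for illustration.

```python
FP_TARGET = 0.20  # lesson target: false positive rate below 20%


def false_positive_rate(total_alerts: int, false_positives: int) -> float:
    """Share of alerts that turned out not to be real threats."""
    if total_alerts <= 0:
        raise ValueError("total_alerts must be positive")
    return false_positives / total_alerts


# The noisy SOC from the example: 10,000 alerts per day, 9,800 of them false.
rate = false_positive_rate(10_000, 9_800)   # 0.98, i.e. 98%
needs_tuning = rate > FP_TARGET
```

At 98%, an analyst who dismisses every alert is right 49 times out of 50, which is exactly how the dismissal habit forms.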

Q5. IOC vs IOA — which is more useful for detecting an active, ongoing attack?

IOA (Indicators of Attack) — because they detect behavioral patterns that indicate an attack is currently in progress, even before any IOCs (like known bad IPs or hashes) are available. IOAs are forward-looking; IOCs are backward-looking.

IOCs are signatures of past compromise: "this IP is in our threat feed," "this hash is known malware." IOCs are valuable for detecting known threats. IOAs are behavioral patterns: "this service is making an unusual number of authentication attempts," "a process is spawning child processes in an unusual sequence." IOAs can catch attacks that use novel techniques or zero-day exploits where no IOCs exist yet. For sophisticated attackers targeting Platform C, IOAs (behavioral anomaly detection in Datadog) are likely to flag the activity earlier than signature-based IOC matching would.