Domain 7 · Lesson 3 of 6
Security Monitoring, SIEM & Threat Hunting
SIEM: Security Information and Event Management
SIEM Use Cases
- Brute force: many failed logins from one IP in 5 minutes
- Privilege escalation: admin role added at 3am by service account
- Lateral movement: service accessing unusual internal resources
- Data exfiltration: large outbound data transfer to unknown IP
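The brute-force use case above can be sketched as a sliding-window correlation rule. This is a minimal illustration, not a real SIEM rule language: the event tuple shape, the function name `detect_brute_force`, and the threshold of 10 (standing in for "many") are assumptions chosen for the example.

```python
from collections import defaultdict, deque
from datetime import datetime, timedelta

WINDOW = timedelta(minutes=5)   # window taken from the use case above
THRESHOLD = 10                  # assumed value for "many"; tune per environment

def detect_brute_force(events, threshold=THRESHOLD, window=WINDOW):
    """Return source IPs with at least `threshold` failed logins inside one window.

    events: iterable of (timestamp, source_ip, success) tuples, sorted by timestamp.
    """
    recent = defaultdict(deque)   # source_ip -> timestamps of recent failures
    flagged = set()
    for ts, ip, success in events:
        if success:
            continue
        failures = recent[ip]
        failures.append(ts)
        # Drop failures that have fallen out of the sliding window.
        while ts - failures[0] > window:
            failures.popleft()
        if len(failures) >= threshold:
            flagged.add(ip)
    return flagged

# Hypothetical sample data: 12 failures in under two minutes from one IP.
start = datetime(2024, 1, 1, 3, 0, 0)
events = [(start + timedelta(seconds=10 * i), "203.0.113.7", False) for i in range(12)]
events.append((start + timedelta(seconds=30), "198.51.100.2", False))
events.sort(key=lambda e: e[0])

print(detect_brute_force(events))
```

The point of the sketch is that the rule fires on a pattern across events (count per IP per window), which no single log line reveals on its own.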
Log Sources to Ingest
- Authentication events (login, MFA)
- Authorization decisions (access denied)
- Network flows and DNS queries
- Application errors and WAF blocks
- IDS/IPS alerts
Alert Fatigue
Too many false positive alerts → analysts become desensitized → real threats are ignored. Tune SIEM detection rules carefully. Target a false positive rate below 20%. Alert fatigue is a security risk, not just an operational nuisance.
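As a rough illustration of tracking the 20% target, the false positive rate can be computed from analyst triage outcomes. The disposition labels (`true_positive`, `false_positive`, `open`) and the sample numbers are hypothetical; a real SIEM or ticketing system will use its own field names.

```python
def false_positive_rate(dispositions):
    """Share of closed alerts that analysts marked as false positives (0.0 to 1.0)."""
    closed = [d for d in dispositions if d in ("true_positive", "false_positive")]
    if not closed:
        return 0.0
    return closed.count("false_positive") / len(closed)

# Hypothetical week of triage outcomes for one detection rule.
outcomes = ["false_positive"] * 30 + ["true_positive"] * 70 + ["open"] * 5

rate = false_positive_rate(outcomes)
needs_tuning = rate > 0.20   # 20% target from the text above

print(f"FP rate: {rate:.0%}, needs tuning: {needs_tuning}")
```

Computing the rate per detection rule, rather than across all alerts, tends to show which rules are worth tuning first.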
SOC (Security Operations Center) Tiers
| Tier | Role | Activity |
|---|---|---|
| Tier 1 | Alert Analyst | Triage and initial classification of alerts: real vs false positive, priority assignment |
| Tier 2 | Incident Responder | Deep investigation, log correlation, containment actions, escalate to T3 if needed |
| Tier 3 | Threat Hunter / Expert | Proactive threat hunting, advanced malware analysis, custom detection rule development, APT investigation |
Key Operational Metrics
| Metric | Full Name | Definition | Target |
|---|---|---|---|
| MTTD | Mean Time to Detect | From when the incident STARTED to when it was first detected | As low as possible |
| MTTR | Mean Time to Respond/Recover | From DETECTION to full resolution/recovery | As low as possible |
| FP Rate | False Positive Rate | Percentage of alerts that are not real threats | <20% ideal |
| Alert Vol | Alert Volume | Number of alerts per day requiring analyst attention | Manageable by team size |
MTTD vs MTTR: Critical Distinction
The MTTD clock starts when the incident begins, not when the alert fires. If an attacker compromised a system 3 days ago and was detected only today, the time to detect for that incident is 3 days; MTTD is the mean of this value across incidents. The MTTR clock starts at detection and ends at full recovery.
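The two clocks can be made concrete with a short calculation. The incident records below are invented for illustration, and `mean_delta` is a helper name introduced here, not part of any monitoring tool.

```python
from datetime import datetime, timedelta

def mean_delta(pairs):
    """Average of (end - start) over a list of (start, end) datetime pairs."""
    total = sum((end - start for start, end in pairs), timedelta())
    return total / len(pairs)

# Hypothetical incident records: (started, detected, recovered)
incidents = [
    (datetime(2024, 3, 1, 9, 0), datetime(2024, 3, 4, 9, 0), datetime(2024, 3, 4, 15, 0)),
    (datetime(2024, 3, 10, 12, 0), datetime(2024, 3, 11, 12, 0), datetime(2024, 3, 11, 14, 0)),
]

mttd = mean_delta([(s, d) for s, d, _ in incidents])   # incident start -> detection
mttr = mean_delta([(d, r) for _, d, r in incidents])   # detection -> full recovery

print("MTTD:", mttd)
print("MTTR:", mttr)
```

The first incident took 3 days to detect and the second took 1 day, so the MTTD here is 2 days; recovery took 6 hours and 2 hours, so the MTTR is 4 hours.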
Threat Hunting & Threat Intelligence
Threat Hunting: PROACTIVE
Search for threats already in the environment without waiting for alerts. Assumes breach has already occurred.
Hypothesis-driven: "Are there indicators of an APT in our Platform C network right now?" Then search the data to find out.
Uses: the MITRE ATT&CK framework, a matrix of known adversary tactics and techniques
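A hypothesis-driven hunt can be sketched as a query over existing process logs. The example below assumes the hypothesis "someone is running base64-encoded PowerShell on our hosts" (ATT&CK technique T1059.001, PowerShell). The event shape, host names, and the regular expression are illustrative assumptions and will miss many real-world variants.

```python
import base64
import re

# Hypothesis: encoded PowerShell is in use (MITRE ATT&CK T1059.001).
# The pattern covers a few common spellings of -EncodedCommand only.
ENCODED_PS = re.compile(
    r"powershell(?:\.exe)?\b.*?\s-(?:e|ec|enc|encodedcommand)\s+([A-Za-z0-9+/=]{16,})",
    re.IGNORECASE,
)

def hunt_encoded_powershell(process_events):
    """Yield (host, decoded_command) for command lines that match the hypothesis."""
    for event in process_events:
        match = ENCODED_PS.search(event["cmdline"])
        if match:
            try:
                # PowerShell expects the payload as base64 over UTF-16LE text.
                decoded = base64.b64decode(match.group(1)).decode("utf-16-le")
            except (ValueError, UnicodeDecodeError):
                decoded = "<undecodable payload>"
            yield event["host"], decoded

# Hypothetical process-creation events.
payload = base64.b64encode("Write-Output 'hello'".encode("utf-16-le")).decode()
process_events = [
    {"host": "build-01", "cmdline": "powershell.exe -NoProfile -ExecutionPolicy Bypass -File deploy.ps1"},
    {"host": "web-03", "cmdline": f"powershell.exe -NoP -enc {payload}"},
]

hits = list(hunt_encoded_powershell(process_events))
print(hits)
```

No alert triggered this search; the hunter starts from the hypothesis and lets the data confirm or refute it.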
Incident Response: REACTIVE
Responds to an alert or report that something bad has happened. Triggered by detection.
Key difference: IR starts with a known alert; threat hunting starts with a hypothesis and searches for evidence.
IR = reactive to what happened. Threat hunting = proactive search for what might already be happening.
Threat Intelligence Types
| Type | Audience | Example |
|---|---|---|
| Strategic | Executives / CISO | "Nation-state actors targeting fintech companies in Southeast Asia" |
| Operational | SOC managers | "APT group X is running a campaign targeting Philippine banks this quarter" |
| Tactical | SOC analysts | "This malware family uses PowerShell with base64 encoding for C2 communication" |
| Technical | Systems / Tools (SIEM rules) | IOC: IP 1.2.3.4, hash abc123, malicious domain evil.com |
IOC β Indicators of Compromise
Evidence that a compromise has already occurred: known bad IP, malware hash, suspicious domain. Retrospective β looks at past activity.
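IOC matching is essentially a lookup of log fields against lists of known-bad values. The sketch below reuses the placeholder indicators from the Technical row of the table above; the log record fields (`ip`, `hash`, `domain`) and the function name `match_iocs` are assumptions for illustration.

```python
# Placeholder indicators copied from the threat intelligence table (not real intel).
IOCS = {
    "ip": {"1.2.3.4"},
    "hash": {"abc123"},
    "domain": {"evil.com"},
}

def match_iocs(log_records, iocs=IOCS):
    """Return (record, field, value) for every field whose value is a known indicator."""
    hits = []
    for record in log_records:
        for field, bad_values in iocs.items():
            value = record.get(field)
            if value in bad_values:
                hits.append((record, field, value))
    return hits

# Hypothetical historical log records.
logs = [
    {"ts": "2024-05-01T02:14:00Z", "ip": "10.0.0.5", "domain": "example.org"},
    {"ts": "2024-05-01T02:15:30Z", "ip": "1.2.3.4", "domain": "evil.com"},
]

for record, field, value in match_iocs(logs):
    print(record["ts"], field, value)
```

Because the lookup runs over activity that has already been logged, it can only confirm a compromise after the fact, which is why IOCs are described as retrospective.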
IOA β Indicators of Attack
Behavioral patterns suggesting an attack is in progress: unusual privilege use, abnormal process spawning. Proactive: detects active attacks before IOCs exist.
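An IOA check looks at behavior rather than known-bad values. As a minimal sketch of "abnormal process spawning", the code below flags parent/child process pairs that are unusual for a fleet. The pair list, event fields, and host names are hypothetical; a real baseline would be derived from the environment's own telemetry.

```python
# Assumed baseline: parent -> child pairs considered unusual for this hypothetical fleet.
SUSPICIOUS_SPAWNS = {
    ("winword.exe", "powershell.exe"),
    ("excel.exe", "cmd.exe"),
    ("nginx", "sh"),
}

def find_ioas(process_events, suspicious=SUSPICIOUS_SPAWNS):
    """Return events whose (parent, child) pair matches a suspicious spawn pattern."""
    return [
        e for e in process_events
        if (e["parent"].lower(), e["child"].lower()) in suspicious
    ]

# Hypothetical process-creation events.
events = [
    {"host": "hr-laptop-7", "parent": "WINWORD.EXE", "child": "powershell.exe"},
    {"host": "hr-laptop-7", "parent": "explorer.exe", "child": "winword.exe"},
]

alerts = find_ioas(events)
print(alerts)
```

Nothing in the flagged event is a known-bad hash or IP, so an IOC lookup would not catch it; the behavior itself is the signal.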
Key Terms
- SIEM correlates events from multiple sources; correlation detects PATTERNS, not individual events
- MTTD: measured from when the incident STARTED (not when alert fired). An undetected breach of 3 days = MTTD of 3 days.
- MTTR: measured from DETECTION to RECOVERY (not from incident start)
- Threat hunting is PROACTIVE (assume breach, go searching); IR is REACTIVE (respond to alert)
- IOC = past evidence of compromise; IOA = current behavior suggesting active attack. IOA is more useful for catching attacks in progress
- Alert fatigue: too many false positive alerts → real alerts are ignored → security risk. Tune detection rules aggressively.
Gap audit (missing Datadog alerts):
- Brute force: Is there an alert for auth_failure_rate > 50/min? Target MTTD <2 minutes from attack start.
- Kafka lag: Is there an alert for consumer lag > 10,000 messages? Indicates processing backlog affecting loan decisions and a potential SLA breach.
- Vault secret access: Is there an alert for Vault secret access by an unknown service or outside business hours? Indicates possible credential theft.
- Anomalous outbound transfer: Is there an alert for large data transfer from any Platform C pod to an external IP? Indicates possible exfiltration.
- Off-hours deployment: Is there an ArgoCD alert if a deployment happens outside the approved window (no deployments 10pm–6am VN time without P1 justification)?
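The off-hours deployment item from the list above reduces to a time-window check. The sketch below is plain Python and does not use any ArgoCD or Datadog API; the function name `violates_freeze` and the `has_p1_justification` flag are assumptions. It relies on Vietnam being UTC+7 with no daylight saving time.

```python
from datetime import datetime, time, timedelta, timezone

VN_TZ = timezone(timedelta(hours=7))             # Vietnam: UTC+7, no daylight saving
FREEZE_START, FREEZE_END = time(22, 0), time(6, 0)

def violates_freeze(deployed_at_utc, has_p1_justification=False):
    """True if a deployment landed in the 10pm-6am VN window without a P1 justification."""
    local = deployed_at_utc.astimezone(VN_TZ).time()
    # The window wraps past midnight, so it is an OR of two ranges.
    in_window = local >= FREEZE_START or local < FREEZE_END
    return in_window and not has_p1_justification

late = datetime(2024, 6, 1, 16, 30, tzinfo=timezone.utc)   # 23:30 VN time
day = datetime(2024, 6, 1, 3, 0, tzinfo=timezone.utc)      # 10:00 VN time

print(violates_freeze(late))                               # inside the freeze window
print(violates_freeze(late, has_p1_justification=True))    # allowed with P1 justification
print(violates_freeze(day))                                # normal working hours
```

Converting to local time before comparing avoids the common mistake of applying the window to UTC timestamps.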
MTTD target for Platform C P1: <15 minutes from incident start to Datadog alert firing. Current baseline: unknown; measure and track.
OpenTelemetry: Distributed traces across Platform C microservices create an audit trail for security analysis. Unusual call patterns (e.g., a loan service calling the PII service 1000x in a minute) are visible in traces before showing up in SIEM.
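The call-pattern example can be sketched as a per-minute count of caller/callee pairs. The span records here are simplified dictionaries invented for illustration, not the OpenTelemetry SDK data model, and the service names other than the loan and PII services are hypothetical.

```python
from collections import Counter

def noisy_call_pairs(spans, limit=1000):
    """Return {(caller, callee, minute): count} for pairs with at least `limit` calls in a minute."""
    counts = Counter(
        (s["caller"], s["callee"], s["start_unix"] // 60) for s in spans
    )
    return {key: n for key, n in counts.items() if n >= limit}

# Hypothetical, simplified span records; real spans carry many more fields.
BASE_MINUTE = 28_333_334                       # arbitrary minute bucket (Unix time // 60)
spans = [
    {"caller": "loan-service", "callee": "pii-service",
     "start_unix": BASE_MINUTE * 60 + i % 60}
    for i in range(1200)
]
spans.append({"caller": "loan-service", "callee": "scoring-service",
              "start_unix": BASE_MINUTE * 60 + 10})

print(noisy_call_pairs(spans))
```

Fixed one-minute buckets keep the sketch short; a burst that straddles a bucket boundary would be split across two counts, so a sliding window is the safer choice in practice.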
Practice Quiz
Q1. What does a SIEM do with logs from multiple sources?
Q2. An attacker compromised Platform C on Monday, but the SOC only detected it Friday. What is the MTTD?
Q3. Threat hunting vs incident response β which is proactive and which is reactive?
Q4. Alert fatigue β what is the security risk it creates?
Q5. IOC vs IOA β which is more useful for detecting an active, ongoing attack?