Domain 7 · Lesson 1 of 6

Incident Response Lifecycle (NIST 800-61)

Quy trình Ứng phó Sự cố (Incident Response Process)

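NIST SP 800-61 Rev 2: the lifecycle in 6 steps

NIST SP 800-61 Rev 2 itself defines four phases: Preparation; Detection & Analysis; Containment, Eradication & Recovery; and Post-Incident Activity. This lesson splits the third phase into its three parts, which gives the six phases listed below.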

Critical Order: Containment BEFORE Eradication

You must stop the damage (contain) before you can clean it up (eradicate). Acting out of order risks further spread.

| # | Phase | Tiếng Việt | Key Activities |
|---|-------|------------|----------------|
| 1 | Preparation | Chuẩn bị | IR plan, team (CIRT/CSIRT), runbooks, communication plan, tools ready, tabletop exercises |
| 2 | Detection & Analysis | Phát hiện & Phân tích | Alert triage, log correlation, severity classification (P1–P4), scope determination |
| 3 | Containment | Ngăn chặn | Short-term: isolate/block to stop the bleeding. Long-term: patch/harden for extended operation while investigation continues |
| 4 | Eradication | Loại bỏ | Remove malware, close vulnerabilities, reset compromised credentials, check for persistence mechanisms |
| 5 | Recovery | Phục hồi | Restore from clean backup, validate integrity, monitor closely, gradual return to production |
| 6 | Post-Incident Activity | Bài học kinh nghiệm | Root cause analysis, lessons learned report, update IR plan, implement improvements |
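
The ordering rule can be sketched in a few lines of Python. This is a simplified illustration, not part of NIST SP 800-61: the `Phase` enum and the `can_start` helper are hypothetical names, and the sketch assumes a strictly sequential model, whereas real incidents often loop back between phases.

```python
from enum import IntEnum


class Phase(IntEnum):
    """The six phases from the table above, numbered in order."""
    PREPARATION = 1
    DETECTION_AND_ANALYSIS = 2
    CONTAINMENT = 3
    ERADICATION = 4
    RECOVERY = 5
    POST_INCIDENT_ACTIVITY = 6


def can_start(phase: Phase, completed: set) -> bool:
    """Return True only if every earlier phase is in the completed set."""
    return all(earlier in completed for earlier in Phase if earlier < phase)


done = {Phase.PREPARATION, Phase.DETECTION_AND_ANALYSIS}
print(can_start(Phase.CONTAINMENT, done))  # True
print(can_start(Phase.ERADICATION, done))  # False: containment is not finished
```

The second call fails because Containment is missing from the completed set, which is the "contain before you eradicate" rule expressed as a check.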

Incident Severity Levels

| Level | Description | Response Time |
|-------|-------------|---------------|
| P1 Critical | Service down, data breach in progress | Immediate (24/7) |
| P2 High | Degraded service, security control bypassed | Within 1 hour |
| P3 Medium | Minor impact, potential threat | Within 4 hours |
| P4 Low | Informational, no immediate impact | Next business day |

P1 vs P2 Distinction

P1 = service is DOWN or breach is ACTIVE — existential risk. P2 = service is degraded or a security control has been bypassed but the situation is still containable without 24/7 mobilization.
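
As a rough illustration, the severity table can be written as a first-match triage function. The `Signals` fields and the `classify` function are hypothetical names chosen for this sketch; a real triage process would weigh more context than six booleans.

```python
from dataclasses import dataclass
from datetime import timedelta


@dataclass(frozen=True)
class Signals:
    """Hypothetical triage inputs, one flag per condition in the table."""
    service_down: bool = False
    breach_in_progress: bool = False
    service_degraded: bool = False
    control_bypassed: bool = False
    minor_impact: bool = False
    potential_threat: bool = False


# Response windows from the table. P4 ("next business day") depends on the
# business calendar, so it is left as None instead of a fixed duration.
RESPONSE_WINDOW = {
    "P1": timedelta(0),
    "P2": timedelta(hours=1),
    "P3": timedelta(hours=4),
    "P4": None,
}


def classify(s: Signals) -> str:
    """Check the most severe conditions first and return the first match."""
    if s.service_down or s.breach_in_progress:
        return "P1"
    if s.service_degraded or s.control_bypassed:
        return "P2"
    if s.minor_impact or s.potential_threat:
        return "P3"
    return "P4"


print(classify(Signals(control_bypassed=True)))                           # P2
print(classify(Signals(control_bypassed=True, breach_in_progress=True)))  # P1
```

Checking P1 conditions first means a bypassed control that has turned into an active breach is escalated instead of staying at P2.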

Breach Notification Requirements

Timer starts from AWARENESS — not discovery

Awareness is the point at which your organization knew, or should have known, about the breach. "We found out last week but didn't report" is not a valid defense.

| Regulation | Notification Window | Who to Notify |
|------------|---------------------|---------------|
| GDPR (EU) | 72 hours | Supervisory authority (DPA); individuals if high risk |
| Philippines DPA | 72 hours | National Privacy Commission (NPC) |
| PCI-DSS | Immediately | Card brands (Visa, Mastercard) and acquiring bank |
| Vietnam | As soon as possible | Relevant government authority (Ministry of Public Security / VNISA) |
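
A minimal sketch of the table as a lookup, assuming the windows exactly as listed above. `NOTIFICATION_WINDOW` and `notification_deadline` are hypothetical names, and the timestamp is made up for illustration. Regulations with no fixed number of hours ("immediately", "as soon as possible") are stored as `None`.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

# Windows taken from the table above; None means no fixed hour count.
NOTIFICATION_WINDOW = {
    "GDPR": timedelta(hours=72),
    "Philippines DPA": timedelta(hours=72),
    "PCI-DSS": None,   # "immediately"
    "Vietnam": None,   # "as soon as possible"
}


def notification_deadline(regulation: str, aware_at: datetime) -> Optional[datetime]:
    """Deadline counted from awareness, or None when no fixed window applies."""
    window = NOTIFICATION_WINDOW[regulation]
    return None if window is None else aware_at + window


aware = datetime(2024, 3, 4, 8, 0, tzinfo=timezone.utc)  # hypothetical timestamp
print(notification_deadline("GDPR", aware))     # 2024-03-07 08:00:00+00:00
print(notification_deadline("PCI-DSS", aware))  # None
```

Using timezone-aware timestamps avoids ambiguity when the team and the regulator sit in different time zones.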

IR Team Composition & Key Terms

CIRT / CSIRT Roles

  • Security: technical investigation lead
  • Legal: regulatory obligations, evidence handling
  • HR: insider threat, employee actions
  • PR/Communications: external messaging
  • Executive sponsor: business decisions (shutdown?)

Runbook vs Playbook

  • Runbook: detailed step-by-step technical procedure for a specific scenario (e.g., "PII breach in Platform C — exact commands to run")
  • Playbook: higher-level response flow and decision trees — the strategic guide
  • Runbooks live inside playbooks as technical annexes
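
One way to picture the relationship is as a small data model in which a playbook holds the decision flow and carries its runbooks as annexes. The class names, fields, and example entries below are hypothetical, and the steps are placeholders, not real commands.

```python
from dataclasses import dataclass, field


@dataclass
class Runbook:
    """Step-by-step technical procedure for one specific scenario."""
    scenario: str
    steps: list


@dataclass
class Playbook:
    """Higher-level response flow with runbooks attached as annexes."""
    name: str
    decisions: list
    runbooks: dict = field(default_factory=dict)

    def attach(self, runbook: Runbook) -> None:
        """File the runbook under its scenario name."""
        self.runbooks[runbook.scenario] = runbook


playbook = Playbook(
    name="Data breach response",
    decisions=["Is PII involved?", "Is the exposure still ongoing?"],
)
playbook.attach(Runbook("PII breach in Platform C", ["<step 1>", "<step 2>"]))

print(list(playbook.runbooks))  # ['PII breach in Platform C']
```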

Key Terms

CIRT · CSIRT · Incident Response Plan · Runbook · Playbook · Containment · Eradication · Recovery · Post-Incident · P1/P2/P3/P4 · Breach Notification

Exam Tips
  1. Phase order: Preparation → Detection → Containment → Eradication → Recovery → Lessons Learned. CONTAINMENT before ERADICATION — always.
  2. "First step after discovering a breach" → CONTAIN (not eradicate, not notify, not fully investigate)
  3. Breach notification timer starts from AWARENESS (when organization knew or should have known)
  4. Runbooks = detailed technical steps for a specific scenario; Playbooks = higher-level response flows
  5. IR team must include: security, legal, HR, PR/communications, executive sponsor — not just the security team
  6. Lessons Learned phase is REQUIRED for a complete IR process — it drives continuous improvement

Work Application: FinTech Company X PII Incident Mapped to IR Phases

1. Preparation: Did an IR plan exist before the incident? Was there a runbook for PII exposure? Gap: if not, create one now for top 5 Platform C incident types.

2. Detection: How was the unencrypted PII discovered? Was it a Datadog monitoring alert, an InfoSec audit, or a customer report? The detection method largely determines the mean time to detect (MTTD).

3. Containment: The production shutdown was the correct call. It is short-term containment: halting the service stops the bleeding, so Platform C processes no new PII during the investigation.

4. Eradication: Implement AES-256-CTR encryption for all PII fields in Platform C. Audit all Platform A tables for any other unencrypted PII. Patch the root cause (missing encryption-at-rest enforcement).

5. Recovery: Requiring InfoSec sign-off before redeployment is a formal validation gate and correct practice. Follow it with a gradual rollout and close monitoring in Datadog.

6. Lessons Learned: Was a post-incident report written? Was "encryption by default" added to Platform C architecture standards? Were Vault policies updated to prevent unencrypted storage?

Action: Create runbooks for top 5 Platform C incident types: PII breach, Kafka outage, eKYC service down, bank H2H failure, DDoS on public API.
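
The MTTD mentioned in the Detection step is the average gap between when an incident started and when it was detected. A minimal sketch of the arithmetic follows; the timestamps are made-up sample data, not figures from the incident, and `mean_time_to_detect` is a hypothetical helper.

```python
from datetime import datetime, timedelta


def mean_time_to_detect(incidents):
    """Average of (detected_at - occurred_at) over a list of timestamp pairs."""
    delays = [detected - occurred for occurred, detected in incidents]
    return sum(delays, timedelta()) / len(delays)


# Hypothetical sample data: (occurred_at, detected_at)
incidents = [
    (datetime(2024, 1, 10, 9, 0), datetime(2024, 1, 10, 11, 0)),  # 2 hours
    (datetime(2024, 2, 3, 14, 0), datetime(2024, 2, 4, 14, 0)),   # 24 hours
    (datetime(2024, 3, 7, 8, 0), datetime(2024, 3, 7, 12, 0)),    # 4 hours
]

print(mean_time_to_detect(incidents))  # 10:00:00
```

A single slow detection (the 24-hour case) dominates the average, which is why the detection channel matters: an alert typically surfaces a problem far sooner than a periodic audit or a customer report.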

Practice Quiz

Q1. Which IR phase comes immediately after detecting a breach — Containment or Eradication?

Containment comes first. You must stop the spread of the incident before you can clean it up (Eradication). Trying to eradicate while the threat is still active is like mopping the floor with the tap still running.
NIST 800-61 order: Detection → Containment → Eradication → Recovery. Containment isolates the affected systems so the damage doesn't grow. Only once the perimeter is secured can you safely remove the root cause (eradication). This is one of the most common trick questions on the CISSP exam — they will ask what comes "first" or "immediately after" detection.

Q2. The production shutdown in the FinTech Company X PII incident — which IR phase does this represent?

Containment — specifically Short-term Containment. The production shutdown stops Platform C from processing any further unencrypted PII while the investigation and eradication proceed.
Short-term containment = immediate action to stop the bleeding (isolate, shut down, block). Long-term containment = keeping systems stable while the full fix is developed (may mean running a reduced-functionality version). Production shutdown is a classic short-term containment response to a PII breach — it's aggressive but correct when data is actively at risk.

Q3. What is the GDPR breach notification timeline, and when does the clock start?

72 hours to notify the supervisory authority (Data Protection Authority). The clock starts from AWARENESS — when the organization became aware (or should have become aware) of the breach, not from initial system discovery.
This is a critical distinction: "discovery" means a tool flagged something; "awareness" means a responsible person in the organization understood a breach had occurred. An automated alert at 11pm that was reviewed and confirmed at 8am the next day means the 72-hour clock started at 8am (when humans became aware), not 11pm. The Philippines DPA also sets a 72-hour window, with notification going to the NPC: same window, different authority.
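
The 11pm/8am example works out as follows. The calendar date is hypothetical; only the times of day come from the example above.

```python
from datetime import datetime, timedelta

GDPR_WINDOW = timedelta(hours=72)

alert_fired = datetime(2024, 6, 3, 23, 0)      # automated alert at 11pm
breach_confirmed = datetime(2024, 6, 4, 8, 0)  # reviewed and confirmed at 8am

deadline = breach_confirmed + GDPR_WINDOW      # clock runs from awareness
gap = breach_confirmed - alert_fired           # time between alert and awareness

print(deadline)  # 2024-06-07 08:00:00
print(gap)       # 9:00:00
```

Under this reading the deadline falls at 8am three days later, nine hours after the point it would have reached had the clock started when the alert fired.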

Q4. What distinguishes a P1 Critical from a P2 High incident?

P1 = service is completely down OR a breach is actively in progress — requiring immediate 24/7 response. P2 = service is degraded or a security control has been bypassed, but the situation is contained enough for a 1-hour response window.
The key differentiator is severity of business impact and time sensitivity. P1 warrants waking up on-call engineers at 3am; P2 means paging the on-call but with a 1-hour SLA. For Bank A/Partner A lending SLAs, Platform C being completely unavailable is always P1 — every minute of downtime has a measurable revenue and SLA breach cost.

Q5. Is the Lessons Learned / Post-Incident Activity phase required, or optional?

Required for a complete IR process. The Lessons Learned phase is what prevents the same incident from recurring. Without it, you fix the immediate problem but never address the systemic weakness. Most compliance frameworks (ISO 27001, SOC 2) require documented post-incident reviews.
The CISSP exam treats Lessons Learned as a mandatory phase. In practice, it often gets skipped because teams are exhausted after containment/recovery — but this is an anti-pattern. For the FinTech Company X PII incident, a Lessons Learned report should document: what the root cause was, why encryption wasn't enforced by default, what controls failed, and what architecture standards were updated as a result.