Domain 6 · Lesson 5 of 5

Security Audits, SOC 2 & Compliance Testing

Security Audits & SOC 2

Audit Types

Internal (First-Party)

Self-assessment. Lowest trust level. Useful for preparation and gap identification but cannot satisfy external compliance requirements.

External (Third-Party)

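Independent auditor. Higher trust. Required for a SOC 2 attestation, a PCI DSS Report on Compliance, and ISO 27001 certification. The auditor is paid for the engagement but must be independent: no ownership stake, no role in designing or operating the controls under review, and no fee that depends on the outcome.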

Regulatory (Government Mandate)

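Government-required. Not optional. Examples: BSP VAPT (Philippines), SBV inspection (Vietnam). A PCI DSS assessment by a QSA for card payments is mandated by the card brands rather than a government, but is equally non-negotiable. Required before go-live.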

SOC Reports (AICPA)

Report	Focus	Audience	Coverage
SOC 1	Financial reporting controls (ICFR)	Financial auditors	Internal Controls over Financial Reporting
SOC 2 Type 1	Security, Availability, Processing Integrity, Confidentiality, Privacy controls (design)	Customers, prospects	Controls are suitably designed at a point in time (snapshot)
SOC 2 Type 2 ⭐	Same Trust Services Criteria — but OPERATING over time	Customers, prospects	Controls operated effectively over 6–12 months — gold standard
SOC 3	Public summary of SOC 2 findings	General public	Trust seal only — no detailed testing results; used for marketing

SOC 2 Type 2 = Gold Standard

Type 1 proves controls were designed correctly at one point in time. Type 2 proves they actually worked in practice over 6–12 months. Enterprise customers and regulators typically expect Type 2, so ask vendors for Type 2 first.

BCP/DR Test Types — Least to Most Disruptive

Checklist Review

Review the DR plan on paper. No systems involved. Confirms plan exists and is complete. Zero disruption.

Disruption: None

Tabletop Exercise

Discussion-based scenario walkthrough with managers and key staff. No systems activated. Teams discuss what they would do step by step. Cheapest and least disruptive real test.

Disruption: None

Simulation

Formal practice walkthrough — more structured than tabletop but still no actual failover. Teams simulate activating the plan without touching production systems.

Disruption: Low

Parallel Test

Recovery systems are activated while production continues running simultaneously. Both operate in parallel. Validates recovery capability without risking production. No user-visible disruption.

Disruption: Medium

Full Interruption Test

Production is ACTUALLY failed over to recovery systems. Most realistic proof of RTO/RPO. Production goes down for the duration. Requires executive approval and change management. Use rarely and with careful planning.

Disruption: High

Security Metrics — MTTD & MTTR

MTTD — Mean Time to Detect

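Average time, across incidents, from incident start to when the security team first detects it. Measures quality of monitoring and alerting. Lower = better visibility.

For a single incident, the figure is simply that incident's time to detect: if the attack starts at 14:00 and the alert fires at 14:23 → MTTD = 23 minutes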

MTTR — Mean Time to Recover

Average time, across incidents, from detection to full system recovery/service restoration. Measures response process and tooling effectiveness. Lower = better response.

For a single incident: if detected at 14:23 and recovered at 15:45 → MTTR = 82 minutes
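Because both metrics are means, they are normally computed over every incident in a reporting period rather than one. The following is a minimal sketch, assuming each incident record carries three timestamps; the `Incident` class and its field names are illustrative and not taken from any particular tool. The first record reuses the worked example above, and the second is a hypothetical incident added to show the averaging.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class Incident:
    started: datetime    # when the incident actually began
    detected: datetime   # when the first alert fired
    recovered: datetime  # when service was fully restored


def mean_minutes(deltas: list[timedelta]) -> float:
    """Average a list of durations and return the result in minutes."""
    total = sum(deltas, timedelta())
    return total.total_seconds() / 60 / len(deltas)


def mttd(incidents: list[Incident]) -> float:
    """Mean of (detected - started) across incidents, in minutes."""
    return mean_minutes([i.detected - i.started for i in incidents])


def mttr(incidents: list[Incident]) -> float:
    """Mean of (recovered - detected) across incidents, in minutes."""
    return mean_minutes([i.recovered - i.detected for i in incidents])


day = datetime(2025, 1, 15)  # arbitrary date, only the times matter
incidents = [
    # Worked example: 14:00 start, 14:23 alert, 15:45 recovery.
    Incident(day.replace(hour=14),
             day.replace(hour=14, minute=23),
             day.replace(hour=15, minute=45)),
    # Hypothetical second incident: 10:00 start, 10:08 alert, 10:52 recovery.
    Incident(day.replace(hour=10),
             day.replace(hour=10, minute=8),
             day.replace(hour=10, minute=52)),
]

print(mttd(incidents))  # (23 + 8) / 2 = 15.5
print(mttr(incidents))  # (82 + 44) / 2 = 63.0
```

Keeping the start, detection, and recovery timestamps separate on each record is what makes both metrics recoverable later; storing only a total outage duration would lose the split between detection and response.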

Continuous Monitoring (NIST SP 800-137)

Ongoing automated assessment of security controls — not just periodic audits. Automated scanning, log analysis, alert thresholds, drift detection. Reduces MTTD by catching issues before an attacker can cause significant damage.
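Drift detection, one of the techniques listed above, can be illustrated as a comparison between an approved baseline and the configuration currently observed. This is only a sketch under stated assumptions: the setting names and values are invented for the example, and real tooling would pull the current state from the platform rather than from a literal dictionary.

```python
MISSING = "<missing>"  # marker for a setting present on only one side


def detect_drift(baseline: dict, current: dict) -> dict:
    """Return {setting: (expected, actual)} for every value that differs."""
    drift = {}
    for key in sorted(baseline.keys() | current.keys()):
        expected = baseline.get(key, MISSING)
        actual = current.get(key, MISSING)
        if expected != actual:
            drift[key] = (expected, actual)
    return drift


# Hypothetical approved baseline and observed configuration.
baseline = {"mfa_required": True, "tls_min_version": "1.2", "public_bucket": False}
current = {"mfa_required": True, "tls_min_version": "1.0",
           "public_bucket": False, "debug_mode": True}

for setting, (expected, actual) in detect_drift(baseline, current).items():
    print(f"DRIFT {setting}: expected {expected!r}, found {actual!r}")
```

Running a check like this on a schedule, and alerting on any non-empty result, is what turns a periodic audit finding into a continuously monitored control.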

Key Terms

SOC 1 · SOC 2 Type 1 · SOC 2 Type 2 · SOC 3 · Trust Services Criteria · Tabletop Exercise · Parallel Test · Full Interruption Test · MTTD · MTTR · Continuous Monitoring · BSP VAPT

Exam Tips

SOC 2 Type 2 > Type 1 — Type 2 proves controls operated effectively over 6–12 months. Type 1 is a point-in-time design review only.
Tabletop exercise = cheapest and least disruptive DR test — discussion only, no systems activated.
Full interruption test = most realistic DR test but causes actual production downtime — use with caution and executive approval.
MTTD = time from incident START to detection (lower = better monitoring). MTTR = time from detection to recovery (lower = better response).
BSP VAPT = regulatory audit — not optional, required before go-live for new financial products in the Philippines.

Work Application — FinTech Company X Vendor SOC 2 & Monitoring

Vendor SOC 2 requirement: Require SOC 2 Type 2 from all Tier 1 processors annually — eKYC Vendor, AML Vendor, Card Processor, eSign Vendor. SOC 2 Type 1 from a vendor is acceptable for initial onboarding but require Type 2 within 12 months. Vendor without any SOC 2 = higher due diligence required (detailed questionnaire + right-to-audit clause in contract).
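The vendor rule above reduces to a small decision function. The sketch below is one possible encoding, assuming the report type is recorded as `"type2"`, `"type1"`, or `None`; the function names and status strings are hypothetical, and the check deliberately ignores whether a Type 2 report is recent enough to meet the annual requirement.

```python
from datetime import date
from typing import Optional


def one_year_after(d: date) -> date:
    """Same calendar day one year later (29 Feb maps to 28 Feb)."""
    try:
        return d.replace(year=d.year + 1)
    except ValueError:  # 29 Feb has no counterpart in a non-leap year
        return d.replace(year=d.year + 1, day=28)


def vendor_soc2_status(report_type: Optional[str], onboarded: date, today: date) -> str:
    """Classify a Tier 1 vendor against the SOC 2 policy (illustrative only)."""
    if report_type == "type2":
        return "compliant"
    if report_type == "type1":
        # Type 1 is tolerated only for the first 12 months after onboarding.
        if today <= one_year_after(onboarded):
            return "grace period"
        return "type2 overdue"
    # No SOC 2 at all: detailed questionnaire plus right-to-audit clause.
    return "enhanced due diligence"


today = date(2025, 6, 1)
print(vendor_soc2_status("type2", date(2024, 1, 1), today))   # compliant
print(vendor_soc2_status("type1", date(2025, 1, 10), today))  # grace period
print(vendor_soc2_status("type1", date(2024, 1, 10), today))  # type2 overdue
print(vendor_soc2_status(None, date(2025, 3, 1), today))      # enhanced due diligence
```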

Platform C SOC 2 target: SOC 2 Type 2 report by 2027. SOC 2 is an attestation issued by an independent CPA firm, not a certification. It requires 6+ months of evidence gathering from when controls are documented and operating. Start now: document all controls (Vault rotation, ArgoCD SoD, access recertification) so the observation period begins.

MTTD targets: P1 incidents must alert within 15 minutes of the event. Configure Datadog monitors along these lines:
auth_failure_rate > 50/min on any endpoint → PagerDuty P1 alert within 60 seconds.
Kafka consumer lag > 10,000 messages → P2 alert.
Error rate > 5% on any Platform C service → P1 alert.
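The three thresholds above can be written down as data and evaluated against a metrics sample. This is a plain-Python illustration of the logic, not Datadog monitor syntax or the PagerDuty API; the metric keys and the `Rule` class are assumed names chosen for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    metric: str       # hypothetical metric key in the sample
    threshold: float  # alert fires when the value is strictly greater
    priority: str     # paging priority to raise


RULES = [
    Rule("auth_failure_rate_per_min", 50, "P1"),
    Rule("kafka_consumer_lag_messages", 10_000, "P2"),
    Rule("error_rate_percent", 5, "P1"),
]


def evaluate(sample: dict[str, float]) -> list[tuple[str, str]]:
    """Return (priority, metric) for every rule whose threshold is exceeded."""
    fired = []
    for rule in RULES:
        value = sample.get(rule.metric)
        if value is not None and value > rule.threshold:
            fired.append((rule.priority, rule.metric))
    return sorted(fired)


sample = {
    "auth_failure_rate_per_min": 72,       # above 50, so P1 fires
    "kafka_consumer_lag_messages": 4_200,  # below 10,000, no alert
    "error_rate_percent": 5.0,             # exactly 5 is not "greater than 5"
}
print(evaluate(sample))  # [('P1', 'auth_failure_rate_per_min')]
```

Note the boundary case: with a strict "greater than" comparison, an error rate of exactly 5% does not page. Whether the threshold should be inclusive is a decision worth making explicitly when the real monitors are configured.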

Practice Quiz

Q1. A SaaS vendor presents a SOC 2 Type 1 report from 3 months ago. Your CISO asks why you're requesting a SOC 2 Type 2. What is your answer?

SOC 2 Type 2 proves that controls operated effectively over 6–12 months. Type 1 only proves controls were designed appropriately at a single point in time — it doesn't tell us whether they worked in practice. A vendor can design perfect controls and then fail to enforce them. Type 2 is the evidence that controls actually ran.

Type 1 = "Our controls look good on paper as of [date]." Type 2 = "Our controls actually worked for the past year." From a vendor risk perspective, Type 2 is significantly more valuable. It proves consistent operation — not just good design. For Tier 1 vendors (eKYC Vendor, Card Processor) who process our customer data or financial transactions, Type 2 is the minimum acceptable standard. Type 1 is acceptable for low-risk or new vendor relationships while Type 2 is being prepared.

Q2. Which DR test type requires actually failing over production systems, and what risk does this carry?

Full interruption test — production is actually failed over to recovery systems. The risk: if the recovery systems fail or the failover process has issues, real users experience downtime. This is the most realistic DR test but carries the highest disruption risk. It requires executive approval, change management, and is usually done during a maintenance window.

The full interruption test is the only way to truly validate RTO (Recovery Time Objective) and RPO (Recovery Point Objective) numbers — because you actually measure how long failover takes and how much data was lost. Parallel test runs recovery alongside production (safer — production keeps running). Full interruption cuts over production — no safety net. Use it only when you have high confidence in your DR plan (after successful parallel tests) or when the business truly needs to prove RTO/RPO compliance to a regulator.
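Measuring RTO and RPO from a failover test comes down to three timestamps: when production went down, when service was restored on the recovery site, and the last write that had been replicated before the cutover. The sketch below assumes those timestamps are captured during the test; the function name, the example times, and the 60-minute RTO and 5-minute RPO targets are all hypothetical.

```python
from datetime import datetime, timedelta


def assess_failover(outage_start: datetime, service_restored: datetime,
                    last_replicated_write: datetime,
                    rto: timedelta, rpo: timedelta) -> dict:
    """Compare measured recovery time and data-loss window with RTO and RPO."""
    recovery_time = service_restored - outage_start
    # Writes made after the last replicated one are lost at cutover.
    data_loss_window = outage_start - last_replicated_write
    return {
        "recovery_time": recovery_time,
        "data_loss_window": data_loss_window,
        "rto_met": recovery_time <= rto,
        "rpo_met": data_loss_window <= rpo,
    }


result = assess_failover(
    outage_start=datetime(2025, 3, 2, 2, 0, 0),
    service_restored=datetime(2025, 3, 2, 2, 47, 0),
    last_replicated_write=datetime(2025, 3, 2, 1, 58, 30),
    rto=timedelta(minutes=60),  # hypothetical target
    rpo=timedelta(minutes=5),   # hypothetical target
)
print(result["recovery_time"], result["rto_met"])     # 0:47:00 True
print(result["data_loss_window"], result["rpo_met"])  # 0:01:30 True
```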

Q3. An incident begins at 10:00 AM. The Datadog alert fires at 10:08. The team restores service at 10:52. What are the MTTD and MTTR?

MTTD = 8 minutes (10:00 incident start to 10:08 detection). MTTR = 44 minutes (10:08 detection to 10:52 service restored).

MTTD runs from when the incident actually started (not when we noticed) to when we detected it. 8 minutes is good: Datadog alerting is catching issues quickly. MTTR runs from detection (when we knew about it) to full recovery. 44 minutes for a P1 is acceptable but could be improved; a typical target is under 30 minutes to service restoration and under 2 hours to a documented root cause. Common exam trap: the R in MTTR is also expanded as "Repair", "Resolve", or "Respond". The terms are often used interchangeably, but their start and end points can differ, so check which definition the question uses.

Q4. The Philippines BSP requires VAPT before launching a new lending product via Partner E. Is this a first-party, third-party, or regulatory audit?

Regulatory audit — it is mandated by the Bangko Sentral ng Pilipinas (BSP), the Philippine financial regulator. It is not optional. BSP requires external (third-party) pen testing — an internal VA/pen test does not satisfy the BSP requirement. Failure to complete it before go-live is a regulatory violation.

BSP Circular 982 (and subsequent guidance) requires regulated financial institutions and their technology partners to conduct regular VAPT, with external assessment before major new product launches. This is a compliance obligation, not a best practice choice. The penalty for non-compliance includes delays in product approval, fines, and potential regulatory sanctions. BSP also requires closure evidence — the pen test report plus confirmation that findings were remediated.

Q5. During a tabletop exercise for Platform C production failure, the CTO asks: "What are we NOT doing during a tabletop that we'd do in a real incident?" What is the correct answer?

In a tabletop, NO systems are activated — there is no actual failover, no real traffic switching, no database backup restore, no actual incident communication sent to customers. The team only discusses what they would do. System interactions, actual tool execution, and real-world delays are not tested.

Tabletop = verbal walk-through of the plan. It reveals gaps in the plan itself (who is responsible for step X? what happens if person Y is unavailable?) but doesn't test execution capability. A team can perfectly describe the failover process in a tabletop but discover in a parallel test that the actual failover script hasn't been updated in 18 months and fails. The tabletop is the starting point — it validates planning. Parallel and full interruption tests validate execution.
