Domain 1 · Lesson 7 of 10

Business Continuity & Business Impact Analysis

Liên tục Kinh doanh & Phân tích Tác động

BCP vs DRP

BCP: Business Continuity Plan
Kế hoạch Liên tục Kinh doanh · PARENT PLAN

Keep business operations running during a disruption. Business-focused: may include manual workarounds, staff redeployment, and alternate sites.

Scope: Entire business; includes IT and non-IT functions

DRP: Disaster Recovery Plan
Kế hoạch Phục hồi Thảm họa · SUBSET OF BCP

Recover IT systems and data after a disaster. Technically focused: covers backup restoration, failover, and system recovery procedures.

Scope: IT systems and data only; supports BCP execution

Relationship: BCP is the parent plan. DRP is a subset of BCP focused on IT recovery. An organization cannot fully execute a BCP for IT-dependent business functions without the DRP component.

Business Impact Analysis (BIA) — Key Metrics

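The BIA is the foundation of BCP. It identifies critical business functions and defines the recovery parameters for each. The business owns these targets: MTD and RPO are set by business executives, RTO is agreed jointly between the business and IT, and WRT comes from IT's technical assessment.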

| Metric | Vietnamese | Definition | Who Sets It |
| --- | --- | --- | --- |
| MTD (Maximum Tolerable Downtime) | Thời gian ngừng hoạt động tối đa | Maximum time the business can survive without the function. If exceeded, permanent damage occurs. | Business (executive) |
| RTO (Recovery Time Objective) | Mục tiêu thời gian phục hồi | Target time to restore the system/function after an incident. Must be ≤ MTD. | Business + IT jointly |
| RPO (Recovery Point Objective) | Mục tiêu điểm phục hồi | Maximum acceptable data loss expressed in time. "How old can the restored data be?" | Business (executive) |
| WRT (Work Recovery Time) | Thời gian phục hồi công việc | Time to validate and restore data integrity after systems are back online. | IT (technical assessment) |

Critical Relationship (Must Memorize)
RTO + WRT ≤ MTD
If RTO + WRT exceeds MTD, the function is still unavailable when the MTD expires, so the business suffers permanent damage before operations are fully restored.
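The inequality can be checked mechanically for every function identified in the BIA. The following is a minimal sketch: the function names `fits_within_mtd` and `mtd_shortfall` and the sample figures are illustrative assumptions, not part of any standard tool.

```python
def fits_within_mtd(rto_hours: float, wrt_hours: float, mtd_hours: float) -> bool:
    """Return True when recovery (RTO) plus validation (WRT) fits inside the MTD."""
    return rto_hours + wrt_hours <= mtd_hours


def mtd_shortfall(rto_hours: float, wrt_hours: float, mtd_hours: float) -> float:
    """Hours by which RTO + WRT overshoots the MTD (0.0 when the plan fits)."""
    return max(0.0, float(rto_hours + wrt_hours - mtd_hours))


# Hypothetical figures, for illustration only.
print(fits_within_mtd(rto_hours=4, wrt_hours=1, mtd_hours=8))  # True
print(mtd_shortfall(rto_hours=4, wrt_hours=3, mtd_hours=6))    # 1.0
```

A non-zero shortfall means one of the three values has to change: a shorter RTO, a shorter WRT, or an MTD renegotiated with the business.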
RPO Example: If RPO = 1 hour, backups must be taken at least every 1 hour. If a failure occurs, the most data that can be lost is 1 hour of transactions. For a fintech lending platform, an RPO of 15 minutes means backups at least every 15 minutes, or continuous replication.
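The link between RPO and backup frequency can be expressed as a small check. This is a sketch under a simplifying assumption: the worst-case data loss is taken to equal the gap between two consecutive backups, ignoring how long a backup takes to complete. The helper names `meets_rpo` and `min_backups_per_day` are hypothetical.

```python
from datetime import timedelta


def meets_rpo(backup_interval: timedelta, rpo: timedelta) -> bool:
    """Worst-case loss is one full interval, so the interval must not exceed the RPO."""
    return backup_interval <= rpo


def min_backups_per_day(rpo: timedelta) -> int:
    """Smallest number of evenly spaced daily backups that keeps the gap within the RPO."""
    day = timedelta(days=1)
    # Ceiling division: timedelta // timedelta yields an int, so -(-a // b) rounds up.
    return -(-day // rpo)


print(meets_rpo(timedelta(hours=1), timedelta(hours=1)))        # True
print(meets_rpo(timedelta(minutes=30), timedelta(minutes=15)))  # False
print(min_backups_per_day(timedelta(minutes=15)))               # 96
```

At 96 backups per day, a 15-minute RPO is usually easier to meet with continuous replication than with discrete backup jobs.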

Recovery Site Types

| Site Type | Vietnamese | Readiness | Cost | Best For |
| --- | --- | --- | --- | --- |
| Hot Site | Trung tâm dự phòng nóng | Immediate failover: fully operational, data mirrored in real time | Highest | RTO < 1 hour; mission-critical systems |
| Warm Site | Trung tâm dự phòng ấm | Hours: hardware pre-provisioned, data restored from recent backup | Medium | RTO 1–12 hours; important business systems |
| Cold Site | Trung tâm dự phòng lạnh | Days: empty facility with power/connectivity; everything must be installed | Lowest | RTO > 24 hours; low-priority systems |
| Cloud/Elastic Site | Dự phòng đám mây | Minutes to hours: infrastructure provisioned on demand via IaC | Pay-per-use | Variable RTO; modern cloud-native systems |
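The RTO bands in the table can be read as a rough selection rule. The sketch below encodes that rule with two labeled assumptions that go beyond the table: an RTO of exactly 1 hour is treated as requiring a hot site, and the unlisted 12–24 hour band is assigned to the faster (warm) tier as the conservative choice. The cloud/elastic tier is left out because its RTO is variable. The function name `suggest_site_type` is hypothetical.

```python
def suggest_site_type(rto_hours: float) -> str:
    """Map an RTO target to the cheapest traditional site tier likely to meet it."""
    if rto_hours <= 0:
        raise ValueError("RTO must be positive")
    if rto_hours <= 1:
        # Assumption: an RTO of exactly 1 hour is treated as needing a hot site.
        return "hot"
    if rto_hours <= 24:
        # Assumption: the 12-24 hour gap in the table falls back to the warm tier.
        return "warm"
    return "cold"


for rto in (0.5, 4, 18, 48):
    print(rto, "->", suggest_site_type(rto))
```

A real site decision also weighs cost, regulatory requirements, and RPO, so this mapping is only a starting point.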

BCP Testing Types (Least → Most Disruptive)

  1. Checklist / Document Review: Review the plan documents to verify completeness and currency. No systems affected.
  2. Structured Walkthrough / Tabletop Exercise: Team discusses scenarios and responses verbally. No systems affected. Cheapest active test. Most commonly used for initial DR validation.
  3. Simulation Test: Simulate a disaster scenario; teams respond as if it were real, but production systems are not actually interrupted.
  4. Parallel Test: Recovery systems are activated and tested while production continues running normally. Validates failover without interrupting production.
  5. Full Interruption Test: Actual production systems are shut down, with full failover to the recovery site. Most realistic: proves the plan actually works. High risk if the plan has gaps.

Key Terms

| Term | Vietnamese | Definition |
| --- | --- | --- |
| BCP | Kế hoạch liên tục kinh doanh | Parent plan for maintaining business operations during disruption |
| DRP | Kế hoạch phục hồi thảm họa | Subset of BCP focused on IT system recovery |
| BIA | Phân tích tác động kinh doanh | Foundation of BCP; identifies critical functions and recovery parameters |
| MTD | Thời gian ngừng tối đa | Maximum time before permanent business damage; set by business |
| RTO | Mục tiêu thời gian phục hồi | Target restoration time; must be ≤ MTD |
| RPO | Mục tiêu điểm phục hồi | Maximum acceptable data loss in time (drives backup frequency) |
| WRT | Thời gian phục hồi công việc | Time to verify data integrity after systems restored |
| Hot Site | Dự phòng nóng | Immediate failover; most expensive; fully mirrored |
| Tabletop Exercise | Bài tập bàn | Discussion-based DR test; cheapest active test; no systems affected |

Exam Tips
  1. RTO must be ≤ MTD (and, once validation time is counted, RTO + WRT ≤ MTD): If recovery takes longer than the MTD, the business suffers permanent damage before operations are restored. This relationship is critical and frequently tested.
  2. BCP = business (keeps running); DRP = IT recovery. BCP is the parent plan. "Which plan governs overall business operations during a disaster?" → BCP, not DRP.
  3. Tabletop = cheapest active DR test, with no systems affected; only a checklist review is less disruptive. Full interruption = most realistic and most disruptive (highest risk if gaps exist).
  4. Hot site = immediate failover (most expensive); Cold site = empty building (slowest to activate, cheapest).
  5. RPO measures data loss in time, not in records. RPO = 1 hour means the system can tolerate losing at most 1 hour of transactions. This drives your backup frequency (backups must be taken at least as often as the RPO interval).

Work Application: FinTech Company X Platform C BCP

Platform C BCP parameters by product:

| Product/Partner | MTD | RTO | RPO | Site Type |
| --- | --- | --- | --- | --- |
| Partner A VN (live loans) | 8 hrs | 4 hrs | 15 min | Warm (GCP multi-region) |
| Bank A VN (Platform B cards) | 4 hrs | 2 hrs | 5 min | Warm + active-active |
| Partner E PH (card, planned) | 2 hrs | 1 hr | 5 min | Hot standby required |
| Partner D PH (live loans) | 12 hrs | 6 hrs | 30 min | Warm (GCP) |
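The table lists MTD and RTO for each product but no WRT. Because RTO + WRT ≤ MTD, each row implies an upper bound on the WRT the team can afford. The sketch below derives that bound from the MTD and RTO figures in the table; the names `PLANS` and `wrt_budget` are illustrative assumptions.

```python
# (product, MTD hours, RTO hours) taken from the Platform C parameters table.
PLANS = [
    ("Partner A VN", 8, 4),
    ("Bank A VN", 4, 2),
    ("Partner E PH", 2, 1),
    ("Partner D PH", 12, 6),
]


def wrt_budget(mtd_hours: float, rto_hours: float) -> float:
    """Largest WRT that still satisfies RTO + WRT <= MTD."""
    budget = mtd_hours - rto_hours
    if budget < 0:
        raise ValueError("RTO already exceeds MTD")
    return float(budget)


for product, mtd, rto in PLANS:
    print(f"{product}: WRT must not exceed {wrt_budget(mtd, rto)} h")
```

With these figures the budgets come out to 4, 2, 1, and 6 hours respectively, so the planned Partner E card product leaves only one hour for data validation after systems are restored.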

Recommended action: Conduct a tabletop exercise with the Platform A/Platform C engineering team to walk through three scenarios: (1) Kafka broker failure: verify that consumer lag alerts fire within 5 min; (2) PostgreSQL primary failover: verify that replica promotion completes within the RTO; (3) eKYC vendor outage: verify that the manual fallback KYC procedure is documented and that the team knows it.

Practice Questions

Q1. A bank's loan disbursement system has an MTD of 6 hours. The IT team's RTO is 4 hours and the WRT is 3 hours. What is the problem with this plan?

A) RTO + WRT (4 + 3 = 7 hours) exceeds MTD (6 hours) — the business will fail before recovery is complete
Rationale: RTO + WRT must be ≤ MTD. In this case, 4 + 3 = 7 hours, which exceeds the MTD of 6 hours. The IT team must either reduce RTO (faster restoration), reduce WRT (faster data validation), or negotiate with the business to extend the MTD. This is the most important BCP metric relationship on the exam.

Q2. The Platform C team meets to discuss how they would respond if the primary GCP region went down, talking through their runbook without actually activating any systems. What type of DR test is this?

A) Tabletop exercise (structured walkthrough) — discussion-based test with no systems affected
Rationale: A tabletop exercise involves key stakeholders talking through their response to a disaster scenario without actually activating systems. It is the cheapest active form of DR testing and affects no systems, making it ideal for initial validation and training. The next step would be a simulation test (practice without production impact) or a parallel test.

Q3. Partner E's planned card processing requires a 1-hour RTO. Which recovery site type is most appropriate?

A) Hot site — immediate failover with data already mirrored; only hot sites reliably achieve sub-1-hour RTO
Rationale: Hot sites have systems fully operational and data mirrored in real-time, enabling immediate failover measured in minutes, not hours. Warm sites require 1–12 hours for data restoration. Cold sites require days. For a 1-hour RTO on a card processing system, only a hot site (or cloud active-active) reliably meets this target.

Q4. A disaster destroys Partner A's loan servicing systems. The CISO activates both the BCP and DRP. Which plan covers resuming loan officer manual activities while IT is restored?

A) BCP — covers all business operations including non-IT processes like manual workarounds
Rationale: The BCP is the parent plan covering all business operations during disruption — including non-IT activities like manual processing, staff redeployment, and customer communication. The DRP is a subset focused specifically on IT system recovery. Manual loan officer activities are a business continuity measure, not an IT disaster recovery measure.

Q5. Platform C has an RPO of 30 minutes for its PostgreSQL lending database. What does this mean for backup frequency?

A) Backups must be taken at least every 30 minutes — the maximum acceptable data loss is 30 minutes of transactions
Rationale: RPO defines the maximum acceptable data loss in time. An RPO of 30 minutes means the organization can tolerate losing at most 30 minutes of data. To guarantee this, backups must occur at least as frequently as the RPO interval. For Platform C's lending database, this likely means continuous WAL archiving or PostgreSQL streaming replication rather than periodic snapshots.