Business Continuity & Business Impact Analysis
Liên tục Kinh doanh & Phân tích Tác động
BCP vs DRP
BCP (Business Continuity Plan): Keeps business operations running during a disruption. Business-focused; may include manual workarounds, staff redeployment, and alternate sites.
DRP (Disaster Recovery Plan): Recovers IT systems and data after a disaster. Technically focused; covers backup restoration, failover, and system recovery procedures.
Business Impact Analysis (BIA) — Key Metrics
The BIA is the foundation of BCP. It identifies critical business functions and defines the recovery parameters for each. The business, not IT, owns the tolerance limits (MTD and RPO); RTO is agreed jointly with IT, and WRT comes from IT's technical assessment.
| Metric | Tiếng Việt | Definition | Who Sets It |
|---|---|---|---|
| MTD (Maximum Tolerable Downtime) | Thời gian ngừng hoạt động tối đa | Maximum time the business can survive without the function. If exceeded, permanent damage occurs. | Business (executive) |
| RTO (Recovery Time Objective) | Mục tiêu thời gian phục hồi | Target time to restore the system/function after an incident. Must be ≤ MTD. | Business + IT jointly |
| RPO (Recovery Point Objective) | Mục tiêu điểm phục hồi | Maximum acceptable data loss expressed in time. "How old can the restored data be?" | Business (executive) |
| WRT (Work Recovery Time) | Thời gian phục hồi công việc | Time to validate and restore data integrity after systems are back online. | IT (technical assessment) |
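As a rough illustration of how these metrics constrain one another, the sketch below checks whether system restoration plus data verification fits inside the tolerable downtime. The `RecoveryTargets` class and its field names are hypothetical, and the rule RTO + WRT ≤ MTD is the usual convention assumed here, not a platform-specific requirement.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RecoveryTargets:
    """Hypothetical container for one business function's BIA metrics (hours)."""
    mtd_hours: float
    rto_hours: float
    wrt_hours: float = 0.0

    def total_recovery_hours(self) -> float:
        # System restoration (RTO) followed by data verification (WRT).
        return self.rto_hours + self.wrt_hours

    def is_feasible(self) -> bool:
        # The plan only works if full recovery finishes within the MTD.
        return self.total_recovery_hours() <= self.mtd_hours


# Illustrative numbers: MTD 6 h, RTO 4 h, WRT 3 h.
plan = RecoveryTargets(mtd_hours=6.0, rto_hours=4.0, wrt_hours=3.0)
print(plan.total_recovery_hours(), plan.is_feasible())
```

With these numbers the total recovery time is 7 hours against a 6-hour MTD, so the plan is flagged as infeasible even though the RTO alone is below the MTD.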
Recovery Site Types
| Site Type | Tiếng Việt | Readiness | Cost | Best For |
|---|---|---|---|---|
| Hot Site | Trung tâm dự phòng nóng | Immediate failover — fully operational, data mirrored in real-time | Highest | RTO < 1 hour; mission-critical systems |
| Warm Site | Trung tâm dự phòng ấm | Hours — hardware pre-provisioned, data restored from recent backup | Medium | RTO 1–12 hours; important business systems |
| Cold Site | Trung tâm dự phòng lạnh | Days — empty facility with power/connectivity; everything must be installed | Lowest | RTO > 24 hours; low-priority systems |
| Cloud/Elastic Site | Dự phòng đám mây | Minutes to hours — infrastructure provisioned on demand via IaC | Pay-per-use | Variable RTO; modern cloud-native systems |
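The "Best For" column can be read as a simple lookup from RTO to site type. The sketch below is one possible encoding, with two assumptions that the table does not settle: an RTO of exactly 1 hour is treated as needing a hot site, and the unlisted 12 to 24 hour range is mapped to a warm site as the safer choice. The function name `suggest_site_type` is hypothetical, and cloud/elastic sites are left out because their RTO is variable.

```python
def suggest_site_type(rto_hours: float) -> str:
    """Map a target RTO (in hours) to a recovery site type.

    Thresholds follow the table above; the handling of exactly 1 hour
    and of the 12-24 hour gap are assumptions made for this sketch.
    """
    if rto_hours < 0:
        raise ValueError("RTO cannot be negative")
    if rto_hours <= 1:
        return "hot"   # immediate failover, data mirrored in real time
    if rto_hours <= 24:
        return "warm"  # hardware ready, data restored from recent backup
    return "cold"      # empty facility, everything must be installed


for rto in (0.5, 4, 48):
    print(rto, "->", suggest_site_type(rto))
```

In practice cost, data sensitivity, and RPO also influence the choice, so a lookup like this is only a starting point for discussion.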
BCP Testing Types (Least → Most Disruptive)
Key Terms
| Term | Tiếng Việt | Definition |
|---|---|---|
| BCP | Kế hoạch liên tục kinh doanh | Parent plan for maintaining business operations during disruption |
| DRP | Kế hoạch phục hồi thảm họa | Subset of BCP focused on IT system recovery |
| BIA | Phân tích tác động kinh doanh | Foundation of BCP; identifies critical functions and recovery parameters |
| MTD | Thời gian ngừng tối đa | Maximum time before permanent business damage; set by business |
| RTO | Mục tiêu thời gian phục hồi | Target restoration time; must be ≤ MTD |
| RPO | Mục tiêu điểm phục hồi | Maximum acceptable data loss in time (drives backup frequency) |
| WRT | Thời gian phục hồi công việc | Time to verify data integrity after systems restored |
| Hot Site | Dự phòng nóng | Immediate failover; most expensive; fully mirrored |
| Tabletop Exercise | Bài tập bàn | Discussion-based DR test; cheapest; no systems affected |
1. RTO must be ≤ MTD: If RTO exceeds MTD, the business will fail before IT can restore operations. This relationship is critical and frequently tested.
2. BCP = business (keeps running); DRP = IT recovery. BCP is the parent plan. "Which plan governs overall business operations during a disaster?" → BCP, not DRP.
3. Tabletop = cheapest and least disruptive DR test. Full interruption = most realistic and most disruptive (highest risk if gaps exist).
4. Hot site = immediate failover (most expensive); Cold site = empty building (slowest to activate, cheapest).
5. RPO measures data loss in time, not in records. RPO = 1 hour means the system can tolerate losing at most 1 hour of transactions. This drives backup frequency: backups must be taken at least as often as the RPO interval.
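The link between RPO and backup frequency in the last point can be sketched as a small calculation. The helper names below are hypothetical, and the sketch assumes periodic backups, so the worst-case data loss equals one full backup interval; continuous replication would behave differently.

```python
import math
from datetime import timedelta


def backup_interval_meets_rpo(backup_interval: timedelta, rpo: timedelta) -> bool:
    # Worst case, a failure happens just before the next backup,
    # so up to one full interval of data is lost.
    return backup_interval <= rpo


def min_backups_per_day(rpo: timedelta) -> int:
    # Smallest number of evenly spaced backups per day that keeps
    # the interval at or below the RPO.
    if rpo <= timedelta(0):
        raise ValueError("RPO must be positive")
    return math.ceil(timedelta(days=1) / rpo)


rpo = timedelta(hours=1)
print(backup_interval_meets_rpo(timedelta(minutes=30), rpo))  # half-hourly backups
print(backup_interval_meets_rpo(timedelta(hours=4), rpo))     # four-hourly backups
print(min_backups_per_day(rpo))
```

For a 1-hour RPO, half-hourly backups satisfy the target, four-hourly backups do not, and at least 24 backups per day are needed.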
Platform C BCP parameters by product:
| Product/Partner | MTD | RTO | RPO | Site Type |
|---|---|---|---|---|
| Partner A VN (live loans) | 8 hrs | 4 hrs | 15 min | Warm (GCP multi-region) |
| Bank A VN (Platform B cards) | 4 hrs | 2 hrs | 5 min | Warm + active-active |
| Partner E PH (card, planned) | 2 hrs | 1 hr | 5 min | Hot standby required |
| Partner D PH (live loans) | 12 hrs | 6 hrs | 30 min | Warm (GCP) |
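The table lists MTD and RTO but no WRT, so one useful derived figure is how much time remains for data verification once systems are restored. The sketch below computes MTD minus RTO for each row, assuming the RTO + WRT ≤ MTD convention; the dictionary layout and the `wrt_budget_hours` helper are hypothetical.

```python
# (MTD hours, RTO hours) copied from the table above.
TARGETS = {
    "Partner A VN (live loans)": (8, 4),
    "Bank A VN (Platform B cards)": (4, 2),
    "Partner E PH (card, planned)": (2, 1),
    "Partner D PH (live loans)": (12, 6),
}


def wrt_budget_hours(mtd_hours: float, rto_hours: float) -> float:
    """Return the time left for verification (WRT) after restoration."""
    if rto_hours > mtd_hours:
        raise ValueError("RTO exceeds MTD: the plan cannot succeed")
    return mtd_hours - rto_hours


budgets = {name: wrt_budget_hours(mtd, rto) for name, (mtd, rto) in TARGETS.items()}
for name, hours in budgets.items():
    print(f"{name}: up to {hours} h available for WRT")
```

Every row passes the RTO ≤ MTD check, and the tightest budget is Partner E PH with 1 hour left for verification.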
Recommended action: Conduct a tabletop exercise with the Platform A/Platform C engineering team to walk through three scenarios: (1) Kafka broker failure: verify that consumer lag alerts fire within 5 min; (2) PostgreSQL primary failover: verify that replica promotion completes within the RTO; (3) eKYC vendor outage: verify that the manual fallback KYC procedure is documented and that the team knows it.
Practice Questions
Q1. A bank's loan disbursement system has an MTD of 6 hours. The IT team's RTO is 4 hours and the WRT is 3 hours. What is the problem with this plan?
A) RTO + WRT (4 + 3 = 7 hours) exceeds MTD (6 hours), so the business will fail before recovery is complete.

Q2. The Platform C team meets to discuss how they would respond if the primary GCP region went down, talking through their runbook without actually activating any systems. What type of DR test is this?
A) Tabletop exercise (structured walkthrough): a discussion-based test with no systems affected.

Q3. Partner E's planned card processing requires a 1-hour RTO. Which recovery site type is most appropriate?
A) Hot site: immediate failover with data already mirrored; only hot sites reliably achieve an RTO of 1 hour or less.

Q4. A disaster destroys Partner A's loan servicing systems. The CISO activates both the BCP and DRP. Which plan covers resuming loan officer manual activities while IT is restored?
A) BCP: it covers all business operations, including non-IT processes such as manual workarounds.

Q5. Platform C has an RPO of 30 minutes for its PostgreSQL lending database. What does this mean for backup frequency?
A) Backups must be taken at least every 30 minutes, because the maximum acceptable data loss is 30 minutes of transactions.