Domain 7 · Lesson 4 of 6

Change, Patch & Configuration Management


Change Management & Change Advisory Board (CAB)

Change Advisory Board (CAB)

Reviews and approves significant changes to production systems. Includes representatives from IT, security, and business stakeholders. Ensures changes are risk-assessed before deployment.

Change Types

Type	Description	Approval Required
Standard	Pre-approved, low-risk, routine change with a well-known procedure	No CAB — pre-authorized. Execute per documented procedure.
Normal	Planned change, assessed for risk and impact	Full CAB review required before implementation
Emergency	Urgent — cannot wait for regular CAB cycle. Unplanned, time-critical.	Emergency CAB or expedited approval — NOT "no approval"
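The approval rule in the table can be sketched in Go. This is a minimal illustration that assumes a hypothetical RequiredApproval function, with approver descriptions taken from the table; it is not the API of any ITIL tool. The key property it encodes is that no change type maps to "no approval".

```go
package main

import "fmt"

// ChangeType mirrors the three rows of the change-type table.
type ChangeType int

const (
	Standard ChangeType = iota
	Normal
	Emergency
)

func (t ChangeType) String() string {
	switch t {
	case Standard:
		return "Standard"
	case Normal:
		return "Normal"
	case Emergency:
		return "Emergency"
	}
	return fmt.Sprintf("ChangeType(%d)", int(t))
}

// Approval records who must sign off before a change is implemented.
type Approval struct {
	Approver      string // who (or what) authorizes the change
	PreAuthorized bool   // true only when no per-change review is needed
}

// RequiredApproval maps a change type to its approval path (hypothetical helper).
// No branch returns an empty approver: emergency changes are expedited, not exempt.
func RequiredApproval(t ChangeType) (Approval, error) {
	switch t {
	case Standard:
		return Approval{Approver: "pre-authorized documented procedure", PreAuthorized: true}, nil
	case Normal:
		return Approval{Approver: "full CAB review"}, nil
	case Emergency:
		return Approval{Approver: "emergency CAB or designated emergency approver"}, nil
	}
	return Approval{}, fmt.Errorf("unknown change type %v", t)
}

func main() {
	for _, t := range []ChangeType{Standard, Normal, Emergency} {
		a, err := RequiredApproval(t)
		if err != nil {
			fmt.Println(err)
			continue
		}
		fmt.Printf("%-9s -> %s (pre-authorized: %v)\n", t, a.Approver, a.PreAuthorized)
	}
}
```

Keeping Emergency as its own branch, rather than folding it into Standard, makes it hard to accidentally ship an unapproved emergency path.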

1. Request → 2. Assess Impact/Risk → 3. Approve (CAB) → 4. Implement → 5. Test → 6. Close

GitOps as Modern Change Management

Every deployment is a Git commit: auditable, reversible, and peer-reviewed via Pull Request. ArgoCD deploys only what is in the approved Git state, so Git history IS the change log. This satisfies CAB requirements for standard changes, because the automation pipeline itself is the pre-approved procedure.

Configuration Management

Key Concepts

Configuration baseline: documented known-good state of a system — the approved reference
Configuration item (CI): any component managed by CM (server, config file, Kubernetes manifest)
CMDB: Configuration Management Database — authoritative inventory of all CIs and their relationships

Configuration Drift

Unauthorized deviation from the established baseline, typically manual changes made directly to production that bypass the CM process. It creates security risk because the system's actual state is unknown and unvalidated.

Detection: ArgoCD detects drift between the Git state and the live cluster. Chef InSpec compliance checks and Terraform plan output are also used to detect drift.

Configuration Drift = Security Risk

If a system has drifted from its approved baseline, you don't know its actual security posture. The drift could have introduced a vulnerability, opened a port, or weakened a security control, and no one would know. This is why drift detection and automatic correction (ArgoCD self-heal) are critical.

Patch Management Process & SLAs

1. Inventory (SBOM, CMDB) → 2. Scan (Nessus, govulncheck) → 3. Assess (CVSS + context) → 4. Test (non-prod) → 5. Deploy (with CM approval) → 6. Verify

Severity	CVSS Score	Patch SLA	Action if Can't Patch
Critical	9.0 – 10.0	24–48 hours	Implement compensating control (WAF rule) + management sign-off
High	7.0 – 8.9	7 days	Risk acceptance with CTO/CISO sign-off
Medium	4.0 – 6.9	30 days	Track in backlog, resolve in current quarter
Low	0.1 – 3.9	90 days	Fix in next sprint cycle

Virtual Patching

A WAF rule that blocks exploitation of a known vulnerability before the real patch is deployed. It is a temporary compensating control only and does NOT replace real patching. It bridges the gap between vulnerability disclosure and deployment of the real patch, whether that patch is not yet available or is still being tested.

SBOM — Software Bill of Materials

Inventory of all software components and dependencies in a system. Required for patch management: you can't patch what you don't know exists. Tools such as govulncheck (Go module graph) and Dependabot (dependency manifests) work from this kind of dependency inventory to identify vulnerable dependencies.

Key Terms

CAB · Standard Change · Normal Change · Emergency Change · Configuration Baseline · CMDB · Configuration Drift · GitOps · Virtual Patching · CVSS · SLA

Exam Tips

Emergency change: bypass the regular CAB cycle BUT still requires expedited approval from designated emergency approvers — "no approval" is never correct.
Virtual patching is a temporary compensating control (WAF rule) — it does NOT replace real patching. Never accept virtual patching as a permanent solution.
GitOps (ArgoCD): every change is traceable via Git commit history = audit trail. This satisfies change management auditability requirements.
Configuration drift: unauthorized changes from baseline create unknown security posture — detect and remediate automatically where possible.
Patch SLA for Critical (CVSS 9+) = 24–48 hours. If you cannot patch in time, implement a compensating control AND get management sign-off documenting the accepted risk.
CMDB is the authoritative source of what systems exist — without it, patch management is impossible (can't patch what you don't know about).

Work Application — Platform C Change & Patch Management with ArgoCD

Standard changes via GitOps: All production changes via Pull Request → code review approval (minimum 1 engineer) → ArgoCD auto-sync. This satisfies CAB requirements for standard changes — the process is documented, repeatable, and auditable via Git history.

Emergency change process: PR with 'EMERGENCY' label → on-call engineer approval (no waiting for next CAB meeting) → immediate ArgoCD sync → post-hoc review within 24hrs. Approval is still required — just expedited.

Patch management in CI: govulncheck runs on every CI build for Go modules. Critical CVE in any Platform C dependency → auto-create GitHub Issue → assigned engineer must resolve within 48hrs or add a WAF compensating rule with CTO sign-off. The 48-hour SLA is non-negotiable.

Configuration drift detection: ArgoCD continuously compares Kubernetes manifests in Git vs live cluster state. Any drift (e.g., someone did kubectl edit in production) triggers an alert → auto-revert to Git state (or manual review if auto-revert is disabled).

CMDB for Platform C: Terraform state files + Kubernetes manifests + Helm chart values = the "living CMDB." Version-controlled, auditable, and the single source of truth for all infrastructure configuration items.

Practice Quiz

Q1. A critical security patch needs immediate deployment. Can the team skip all approvals since it's an emergency?


No. Emergency changes bypass the regular CAB meeting cycle but still require expedited approval from a designated emergency approver (e.g., on-call manager, CISO, or pre-designated emergency CAB member). "Emergency" never means "no approval."

The purpose of change management is to prevent unauthorized changes and ensure changes are risk-assessed. Emergency changes compress the timeline but don't eliminate accountability. ITIL defines an Emergency CAB (ECAB), a subset of CAB members available 24/7 for urgent approvals. The change is still documented, approved, and reviewed post-deployment. Skipping all approvals would mean a single engineer could deploy anything to production without oversight, which is a serious internal control failure.

Q2. How does GitOps (ArgoCD) satisfy change management requirements?


GitOps satisfies change management by making every change traceable to a Git commit (who changed what, when, why via PR description), requiring peer review before merge (equivalent to CAB approval), and making changes reversible (git revert). The Git history is the audit trail.

Traditional CAB approval = a meeting where humans review and approve changes. GitOps = a Pull Request where engineers review and approve via code review. Both achieve the same outcome: changes are reviewed by a second party before deployment. ArgoCD deploys only what is in the approved Git state, and RBAC that blocks direct cluster edits rules out ad-hoc manual deployments. For auditors (SOC 2, ISO 27001), the GitHub PR history with approvals plus the ArgoCD deployment log typically serves as change management evidence. In ITIL terms, this maps the GitOps pipeline to the standard change process.

Q3. An engineer directly edits a Kubernetes deployment in production ("kubectl edit") bypassing GitOps. What security problem does this create?


This is configuration drift: the live system has deviated from the approved baseline (Git state). The change is unreviewed, has no audit trail in the change process, and is potentially unauthorized. It creates an unknown security posture: did the change introduce a vulnerability? Weaken a security control? Open an unintended port?

Configuration drift is dangerous because it creates an "unknown unknowns" problem: you don't know what's different, you don't know if it's a security risk, and you can't confidently rely on your security controls anymore. The baseline assumption (approved Git state = production state) is now broken. This is why ArgoCD's drift detection and self-heal capability is a security control, not just an operational convenience. The manual kubectl edit violates change management policy; it should be blocked by RBAC and detected by ArgoCD.

Q4. A WAF rule is deployed to block exploitation of Log4Shell (CVE-2021-44228) while the team prepares the real patch. Is virtual patching sufficient long-term?


No. Virtual patching (WAF rule) is a temporary compensating control to buy time until the real patch is deployed. Attackers can sometimes bypass WAF rules, and WAF rules require constant updates. The real patch (upgrading Log4j to 2.17.1+) must still be deployed within the patch SLA.

Virtual patching has limitations: WAF bypass techniques exist, WAF rules may not cover all attack vectors, and the underlying vulnerability remains. For Log4Shell (CVSS 10.0, Critical), virtual patching is appropriate for the first hours while the real fix is prepared, but the 24–48 hour patch SLA still applies. Relying on virtual patching as a permanent solution means accepting indefinite risk with a compensating control that can fail; this requires documented management sign-off and periodic risk review. Never treat virtual patching as equivalent to real patching.

Q5. govulncheck finds a CVSS 9.5 vulnerability in a Platform C Go dependency. What is the patch SLA and what happens if the team can't meet it?


24–48 hours for a Critical (CVSS 9.0+) vulnerability. If the real patch cannot be deployed in time: (1) implement a compensating control (WAF rule blocking the vulnerable code path), AND (2) get management sign-off documenting the accepted risk and the compensating control in place. Continue working on the real patch.

The 24–48 hour SLA for Critical vulnerabilities is a common internal target and is stricter than most compliance minimums: PCI DSS, for example, requires critical security patches within one month of release, while SOC 2 and ISO 27001 require a defined, consistently followed process rather than a fixed deadline. A CVSS 9.5 score indicates a severe vulnerability, typically easy to exploit with significant impact, so every hour without a patch is exposure. If a real patch requires testing and can't be done within 48 hours, the compensating control + management sign-off is the correct process, not "we'll get to it next sprint." The risk is documented, mitigated with a compensating control, and owned by management, not left unaddressed.
