DDoS Readiness
A CISO brief for proving availability under attack
DDoS protection is invisible when it works and gets noticed only when something breaks. "Readiness" means you can withstand an attack, degrade gracefully, and recover quickly, and that you can prove it with evidence.
What "DDoS resilience" means
DDoS resilience is not just "blocking traffic." It is your ability to keep critical services usable under hostile conditions.
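Resilience has three parts:
- Withstand: keep critical services up while attack traffic is absorbed or filtered.
- Degrade gracefully: when you cannot absorb everything, shed lower-priority features before critical ones.
- Recover quickly: return to normal service fast once the attack subsides or mitigation takes hold.
Good DDoS resilience is measurable. You define what "acceptable impact" looks like and validate that you can stay within it.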
Common failure modes CISOs should expect
Most DDoS-related incidents are not a matter of "we were hit with a big number." They come down to architectural and operational gaps.
Volumetric saturation (bandwidth / packets)
- Links saturate, upstream devices drop traffic, or the provider starts scrubbing too late.
- Protection exists, but nobody has prepared the escalation path to activate it.
Protocol & state exhaustion (L3/L4)
- SYN floods, TCP state-table exhaustion, or SSL/TLS renegotiation abuse overwhelm stateful devices (firewalls, NAT gateways, load balancers) before bandwidth runs out.
- As a result, stateful infrastructure often fails before attack traffic ever reaches the application layer.
Application-layer bottlenecks (L7)
- Low request rates can drive heavy backend load on expensive operations (login, search, checkout, APIs).
- Rate limits are too loose (ineffective) or too strict (self-inflicted outage).
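To make the "too loose vs. too strict" trade-off concrete, here is a minimal token-bucket sketch. The token bucket is one common rate-limiting technique, not necessarily what your WAF or CDN implements; the `TokenBucket` class, the `rejected` helper, and the example rates are hypothetical. Replaying a recorded burst of real user traffic through a candidate limit before deploying it is one way to catch a self-inflicted outage early.

```python
import time
from typing import Optional


class TokenBucket:
    """Hypothetical per-client limiter: refills `rate` tokens/second, holds at most `capacity`."""

    def __init__(self, rate: float, capacity: float, now: Optional[float] = None) -> None:
        self.rate = rate            # sustained requests per second allowed
        self.capacity = capacity    # burst allowance above the sustained rate
        self.tokens = capacity      # start full so a normal first burst is not penalized
        self.last = time.monotonic() if now is None else now

    def allow(self, now: Optional[float] = None) -> bool:
        now = time.monotonic() if now is None else now
        # Refill in proportion to elapsed time, capped at capacity.
        elapsed = max(0.0, now - self.last)
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.last = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False


def rejected(bucket: TokenBucket, arrivals: list[float]) -> int:
    """Count how many requests at the given timestamps (seconds) the bucket would reject."""
    return sum(not bucket.allow(t) for t in arrivals)


# A page load that fires 20 requests within one second (hypothetical replay data).
user_burst = [i * 0.05 for i in range(20)]
# A single client sustaining 100 requests per second for one second.
attacker = [i * 0.01 for i in range(100)]
```

With `rate=1, capacity=2`, the legitimate burst above has 18 of its 20 requests rejected. With `rate=10, capacity=20` the same burst passes untouched, while the 100-requests-per-second client is still mostly throttled.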
False positives (you block real users)
- Controls trigger during real spikes (product launches, news coverage, emergencies).
- Legitimate traffic looks "bot-like" without proper baselines.
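A baseline gives detection something to compare against. The sketch below keeps request counts per hour of the week and flags only volumes well above the median for that hour, so a normal Monday-morning peak is not mistaken for an attack. The `TrafficBaseline` class, the 3x multiplier, and the `planned_event` headroom are hypothetical choices; commercial bot-management tools use far richer signals, and this only illustrates the principle.

```python
from collections import defaultdict
from statistics import median


class TrafficBaseline:
    """Hypothetical per-hour-of-week baseline of request volume."""

    def __init__(self, multiplier: float = 3.0) -> None:
        self.multiplier = multiplier  # how far above the usual median counts as anomalous
        self.history: dict[int, list[int]] = defaultdict(list)

    def record(self, hour_of_week: int, requests: int) -> None:
        """Store one observed request count for a given hour of the week (0-167)."""
        self.history[hour_of_week].append(requests)

    def is_anomalous(self, hour_of_week: int, requests: int, planned_event: bool = False) -> bool:
        past = self.history.get(hour_of_week)
        if not past:
            # No baseline yet: report "not anomalous" so automation never blocks on guesswork.
            return False
        limit = median(past) * self.multiplier
        if planned_event:
            limit *= 2  # hypothetical extra headroom for announced launches or campaigns
        return requests > limit
```

Marking planned launches in advance is one way to keep a known spike from tripping automated blocking.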
Upstream constraints (your providers become the bottleneck)
- CDN/WAF protects the edge but origin, DNS, or third-party integrations fail.
- Provider capacity and rate limits remain unknown until peak stress exposes them.
DNS / API / dependency fragility
- Attacks target what's easiest to break: DNS, auth, payment gateways, identity providers, or critical APIs.
- Your "main site" survives, but the user journey fails.
A practical readiness checklist
This checklist is designed to be actionable and auditable.
A. Define service objectives (SLOs)
- What must stay available? (portals, APIs, login, payments, DNS)
- What impact is acceptable? (latency, error rate, partial feature loss)
- What is the recovery objective? (RTO/RPO, failover time)
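Writing the objectives down as data makes "acceptable impact" testable during drills and real incidents. This is a minimal sketch, assuming you can measure p95 latency and error counts per time window. The `Slo` and `Measurement` types, the `evaluate` function, and the example thresholds are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Slo:
    """Hypothetical 'acceptable impact' targets for one service under attack."""
    service: str
    max_p95_latency_ms: float
    max_error_rate: float  # fraction of requests, e.g. 0.02 means 2%


@dataclass(frozen=True)
class Measurement:
    """Observed values for one measurement window."""
    p95_latency_ms: float
    requests: int
    errors: int


def evaluate(slo: Slo, m: Measurement) -> list[str]:
    """Return human-readable SLO breaches for one window; an empty list means within SLO."""
    breaches = []
    if m.p95_latency_ms > slo.max_p95_latency_ms:
        breaches.append(
            f"{slo.service}: p95 latency {m.p95_latency_ms:.0f} ms > {slo.max_p95_latency_ms:.0f} ms"
        )
    # A window with no served requests is treated as a total failure, not as "no errors".
    error_rate = m.errors / m.requests if m.requests else 1.0
    if error_rate > slo.max_error_rate:
        breaches.append(f"{slo.service}: error rate {error_rate:.1%} > {slo.max_error_rate:.1%}")
    return breaches
```

For example, `evaluate(Slo("login", 800, 0.02), Measurement(450, 10000, 120))` returns an empty list, because 1.2% errors at 450 ms is within the target. The breach messages can feed straight into incident timelines and drill reports.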
B. Map ownership and escalation
- Named owners for: edge, DNS, WAF/CDN, app services, incident commander.
- Documented escalation paths to providers (including emergency contacts).
- Clear "who decides" on blocking, geo rules, and emergency mitigations.
- Communication plan: who notifies customers, regulators, and media—and when.
C. Validate controls and capacity
- Edge controls: WAF/CDN rules, rate limits, bot controls.
- Network controls: ACLs, upstream filtering, protection activation steps.
- Origin resilience: caching strategy, queueing/backpressure, autoscaling thresholds.
- Cost controls: autoscaling budgets, spend alerts, and circuit-breaker thresholds to prevent cloud bill runaway during sustained attacks.
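Autoscaling absorbs L7 load, but during a sustained attack it also turns attack traffic into spend. One way to express a cost circuit-breaker threshold is to cap replicas at what an hourly budget allows and report when that cap is hit. This is a minimal sketch: `ScalingGuard`, the per-replica cost, and the budget are hypothetical, and real platforms enforce such limits through their own autoscaler settings.

```python
import math
from dataclasses import dataclass


@dataclass
class ScalingGuard:
    """Hypothetical guard that caps autoscaling by spend during a sustained attack."""
    cost_per_replica_hour: float
    hourly_budget: float
    min_replicas: int = 2

    def max_affordable(self) -> int:
        """Largest replica count the hourly budget covers (never below the minimum)."""
        return max(self.min_replicas, math.floor(self.hourly_budget / self.cost_per_replica_hour))

    def decide(self, desired_replicas: int) -> tuple[int, bool]:
        """Return (replicas to run, breaker_tripped).

        This function only reports the trip; the caller is expected to alert a human,
        who decides whether to raise the budget or switch to safe mode.
        """
        cap = self.max_affordable()
        if desired_replicas > cap:
            return cap, True
        return max(desired_replicas, self.min_replicas), False
```

The breaker deliberately does not scale down below demand on its own. It holds the line at the budget and makes the trade-off between cost and availability an explicit, owned decision.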
D. Runbooks and drills
- Incident runbook with: detection → triage → mitigation → recovery → comms.
- Tabletop exercise for your top scenarios (e.g., simultaneous volumetric + L7 + dependency failure).
- A "safe-mode" plan (reduced features, protected endpoints, prioritized flows).
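A safe-mode plan is easier to execute under pressure if the priorities are decided in advance and encoded where the edge or application can read them. The sketch below uses hypothetical routes, priority tiers, and safe-mode levels. Each level raises the minimum priority that is still served, so critical flows such as login and checkout keep capacity while optional features are shed first.

```python
from enum import IntEnum


class Priority(IntEnum):
    OPTIONAL = 1   # e.g. search suggestions, recommendations
    IMPORTANT = 2  # e.g. account pages
    CRITICAL = 3   # e.g. login, payments


# Hypothetical route-to-priority map; unknown routes default to OPTIONAL.
ROUTE_PRIORITY = {
    "/login": Priority.CRITICAL,
    "/checkout": Priority.CRITICAL,
    "/account": Priority.IMPORTANT,
    "/search": Priority.OPTIONAL,
}

# Minimum priority still served at each safe-mode level (0 = normal operation).
SAFE_MODE_FLOOR = {0: Priority.OPTIONAL, 1: Priority.IMPORTANT, 2: Priority.CRITICAL}


def admit(path: str, safe_mode_level: int) -> bool:
    """Decide whether to serve a request at the current safe-mode level."""
    priority = ROUTE_PRIORITY.get(path, Priority.OPTIONAL)
    return priority >= SAFE_MODE_FLOOR[safe_mode_level]
```

Defaulting unknown routes to the lowest tier means new endpoints are shed first unless someone explicitly classifies them, which keeps the plan current as APIs change.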
E. Testing cadence (don't rely on hope)
- Regular scenario-based tests (at least quarterly for high-criticality services).
- Retest after major changes (new CDN/WAF rules, new APIs, new regions).
- Post-incident re-validation (prove fixes worked).
What "evidence" looks like
When DDoS readiness is real, you can show evidence that's useful to executives, auditors, and incident reviews.
If you can't show evidence, you have opinions—not readiness.