Kubernetes DDoS Testing: Ingress, Autoscaling, and What to Validate

A Kubernetes DDoS test answers a question the cloud console will not: when attack-shaped traffic hits the cluster, does its elasticity protect you, tip over, or quietly hand you a five-figure bill?

Kubernetes is sold on elasticity. Replicas scale out, nodes join the pool, the system absorbs load. That story is true right up to the point where an adversary turns it against you.

Under a real flood, a cluster's autoscaling does one of three things. It keeps up. It tips over, because pods cannot schedule against a ceiling nobody set on purpose. Or it keeps up so well that it converts an availability attack into a financial one.

Which outcome you get is not luck. It is a configuration property, and a structured DDoS test is how you find out which one your cluster has before someone else does.

This post is the Kubernetes-specific instance of the DDoS-testing discipline, with sibling posts for the managed clouds the control plane usually runs on: AWS (EKS), Azure (AKS), and GCP (GKE). Kubernetes adds a layer those posts do not cover: an internal scheduler whose reaction time and resource ceilings are themselves part of the attack surface.

The thing Kubernetes does not have

Start with what is not in the cluster: a DDoS edge.

Kubernetes has no equivalent of a scrubbing center or an always-on volumetric filter. Layer 3/4 floods, the SYN floods and UDP reflection that saturate a pipe, are absorbed (or not) by whatever sits in front of the cluster: a cloud load balancer, a CDN, an anycast edge. Those are the systems the sibling posts above test. The cluster's own defenses begin at Layer 7, and they begin inside a pod.

So a Kubernetes DDoS test is really a test of the internal response: what happens after a flood has cleared the edge (or walked around it) and is now landing on your ingress controller and your application pods. That is a different question from "does the cloud's managed protection work," and it is the question this post is about.

At a glance: what a Kubernetes DDoS test validates

Layer under test	What it is supposed to do	What the test actually verifies
Upstream edge (cloud LB / CDN)	Absorb L3/L4 volumetric floods before they reach the cluster	Whether one exists at all, and whether any pod, NodePort, or external IP is reachable around it
Ingress controller (nginx / Envoy)	Terminate TLS and route L7 traffic to services	The worker-connection, file-descriptor, and CPU ceiling of a single pod; whether request limits and rate limits are even configured
Horizontal Pod Autoscaler	Add replicas when a metric crosses target	Reaction latency end to end, and whether the target metric reflects the attack at all
Cluster Autoscaler / Karpenter	Add nodes when pods cannot schedule	Node-provision latency, the quota ceiling where it stops, and the cost trajectory while it runs
Pod requests/limits + probes	Bound consumption; restart unhealthy pods	Whether load-induced probe timeouts evict healthy pods and start a cascade
Node kernel (conntrack, accept queue)	Track connections and accept sockets	The connection-table ceiling that fails before CPU or bandwidth does
NetworkPolicy / service exposure	Restrict who can reach which pods	Whether a NodePort or LoadBalancer service exposes pods directly, past the ingress

The pattern repeats at every layer: the platform supplies an elastic mechanism, and the resilience lives in the limits, targets, and timing wired around it.

Kubernetes DDoS request path: traffic crosses an external cloud load balancer or CDN edge into the cluster, hits the ingress controller pod, then the service and application pods, with the Horizontal Pod Autoscaler and Cluster Autoscaler reacting on separate timescales. The cluster itself has no L3/L4 DDoS edge.

The protection surface you are testing

Each layer has a specific failure mode. Be precise about them before generating any load.

The ingress controller is a single L7 chokepoint

Almost all north-south traffic enters through one ingress controller, an nginx or Envoy reverse proxy running as a handful of pods. It terminates TLS, matches host and path rules, and proxies to backend services.

That makes it the first thing a Layer 7 flood meets inside the cluster, and it is a pod with finite resources like any other. nginx has a worker_connections ceiling and a file-descriptor limit. Envoy has connection and concurrency budgets. TLS termination is CPU-bound, and a flood of new handshakes (or an HTTP/2 rapid reset pattern that opens and cancels streams) burns that CPU fast.

If ingress rate limiting and connection limits are unset, which is a common default, the controller has no way to shed load other than falling over. The test drives HTTP floods and slow-connection attacks at it specifically and watches where it breaks: connection table, descriptors, CPU, or upstream timeouts.
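
A back-of-the-envelope way to see which ingress limit to watch first, assuming an nginx-based controller. nginx caps connections per worker at worker_connections, that count includes upstream connections, and it cannot exceed the worker's open-file limit. The function name, the two-connections-per-client assumption, and the sample numbers are illustrative, not measured values.

```python
def nginx_client_ceiling(worker_processes: int,
                         worker_connections: int,
                         fd_limit_per_worker: int) -> int:
    """Estimate max concurrent proxied clients for one nginx pod.

    Assumption: each proxied client holds two connections (client side
    and upstream side), and each connection consumes one file descriptor.
    """
    # The effective per-worker ceiling is whichever limit is lower.
    per_worker = min(worker_connections, fd_limit_per_worker)
    return (worker_processes * per_worker) // 2


if __name__ == "__main__":
    # Hypothetical pod: 4 workers, generous descriptor limit.
    print(nginx_client_ceiling(4, 16384, 65535))
    # Same config, but a low open-file limit becomes the real ceiling.
    print(nginx_client_ceiling(4, 16384, 1024))
```

The estimate only says which ceiling is likely to bind. The degradation point measured under load is the number that belongs in the report.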

The Horizontal Pod Autoscaler reacts on a delay

The HPA is the inner elasticity loop: when a metric (CPU, or a custom request-rate metric) crosses its target, it adds replicas.

The word that matters is reacts. The loop is not instant. It runs on a chain of delays: the metrics pipeline scrape interval, the HPA sync period, a stabilization window that deliberately damps flapping, then pod scheduling, image pull, container start, and the readiness probe passing before the new replica takes traffic. Each step takes seconds, sometimes tens of seconds; together they add up to a window during which the existing pods absorb the entire attack alone.

This is the cluster's version of a detection-to-mitigation gap, except the thing arriving late is not a mitigation rule, it is capacity. The autoscaler exposure window is where short, sharp pulse-wave attacks do their damage: each pulse ends before the cluster finishes scaling up for it.
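
The delay chain can be written down as a simple sum, which makes it easy to compare against an attack's pulse length. This is a minimal sketch: the class, the helper, and every duration in the example are hypothetical placeholders to be replaced with values measured on the cluster under test.

```python
from dataclasses import dataclass, astuple


@dataclass
class ScaleOutDelays:
    """One field per step of the HPA reaction chain, in seconds."""
    metrics_scrape_s: float
    hpa_sync_s: float
    stabilization_s: float
    scheduling_s: float
    image_pull_s: float
    container_start_s: float
    readiness_s: float

    def window_s(self) -> float:
        # The exposure window is the sum of every step in the chain.
        return sum(astuple(self))


def pulse_ends_before_capacity(pulse_s: float, delays: ScaleOutDelays) -> bool:
    """True if a pulse of this length is over before new replicas serve."""
    return pulse_s < delays.window_s()


if __name__ == "__main__":
    # Illustrative numbers only, not defaults to rely on.
    delays = ScaleOutDelays(15, 15, 0, 2, 20, 3, 10)
    print(delays.window_s())
    print(pulse_ends_before_capacity(30, delays))
```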

The Cluster Autoscaler is a slower, harder ceiling

When new pods cannot fit on existing nodes, they sit Pending until the Cluster Autoscaler (or Karpenter) provisions a node. That outer loop is slower still: cloud instance request, boot, kubelet registration, then the pod schedule and readiness chain on top.

It also has a hard stop. Node groups have maximum sizes. Cloud accounts have instance quotas. The CNI has a finite pool of pod IPs per node and per cluster. When the autoscaler hits any of those, scaling does not slow down, it stops, and the surplus pods stay Pending indefinitely. That ceiling is almost never set deliberately with an attack in mind, which is exactly why a test should find it.
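
The hard stop is whichever of these limits binds first, and it can be estimated before the test so a tip-over is read correctly. A rough sketch, assuming the per-node pod limit and the cluster-wide pod IP pool are known; the function name and the sample figures are hypothetical.

```python
def scaling_ceiling(node_group_max: int, instance_quota: int,
                    pods_per_node: int, cluster_pod_ips: int) -> tuple[int, str]:
    """Return (max schedulable pods, name of the limit that binds first)."""
    node_limits = {"node-group max": node_group_max,
                   "instance quota": instance_quota}
    # The lower of the two node-count limits caps how many nodes can exist.
    limit_name, max_nodes = min(node_limits.items(), key=lambda kv: kv[1])
    pods_by_nodes = max_nodes * pods_per_node
    # The CNI's pod IP pool can run out before the nodes do.
    if cluster_pod_ips < pods_by_nodes:
        return cluster_pod_ips, "CNI pod IP pool"
    return pods_by_nodes, limit_name


if __name__ == "__main__":
    print(scaling_ceiling(node_group_max=30, instance_quota=20,
                          pods_per_node=30, cluster_pod_ips=1000))
```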

Pods, probes, and the self-inflicted cascade

Resource requests and limits bound what each pod consumes. Liveness and readiness probes restart or remove pods that stop responding. Under normal conditions these keep the cluster healthy. Under load they can do the opposite.

When pods are saturated, their probe endpoints get slow. A readiness probe that times out marks a healthy-but-busy pod as not-ready, pulling it out of the service endpoints and dumping its share of traffic onto the remaining pods, which are now closer to the edge themselves. A liveness probe timeout goes further and restarts the pod, briefly removing capacity during an attack and risking CrashLoopBackOff. The defense mechanism becomes the amplifier. A test has to confirm that probe timeouts and thresholds are tuned so load alone does not trigger this spiral.
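
A toy model makes the spiral concrete. It assumes traffic is split evenly across ready pods and that a pod fails its readiness probe once its share passes a fixed request rate. Both are simplifications, and the function name and numbers are hypothetical. Real pods also rejoin once they recover, which this sketch ignores.

```python
def ready_pods_after_cascade(total_rps: float, ready_pods: int,
                             probe_fail_rps: float) -> int:
    """Return how many pods remain ready once the cascade settles.

    Toy model: whenever the per-pod share exceeds probe_fail_rps, one pod
    fails readiness and its traffic lands on the remaining pods.
    """
    while ready_pods > 0 and total_rps / ready_pods > probe_fail_rps:
        ready_pods -= 1
    return ready_pods


if __name__ == "__main__":
    # Ten pods at 100 rps each sit just under the probe-failure point.
    print(ready_pods_after_cascade(1000, 10, 110))
    # Lose one pod and every remaining pod is pushed over the threshold.
    print(ready_pods_after_cascade(1000, 9, 110))
```

The point of the model is the cliff: the same load is stable at ten pods and unrecoverable at nine.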

The node kernel fails before the bandwidth does

Below the application, every node is a Linux box with kernel limits. The conntrack table tracks every connection through the node; a connection flood fills it, and once it is full new connections are dropped regardless of how much CPU or bandwidth is idle. The accept queue and ephemeral port range have their own ceilings, as does kube-proxy's iptables or IPVS path.

These kernel limits are the layer of first failure on many clusters: a connection-oriented flood exhausts a kernel table at a traffic rate that looks like nothing on a bandwidth graph. A datasheet never tells you which table goes first. A test does.
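
During a test, conntrack headroom can be sampled on each node from the kernel's own counters. A minimal sketch, assuming a Linux node with the nf_conntrack module loaded so that nf_conntrack_count and nf_conntrack_max exist under /proc/sys/net/netfilter; the function names are hypothetical.

```python
from pathlib import Path

CONNTRACK_DIR = Path("/proc/sys/net/netfilter")


def conntrack_utilization(count: int, maximum: int) -> float:
    """Fraction of the connection-tracking table currently in use."""
    if maximum <= 0:
        raise ValueError("conntrack maximum must be positive")
    return count / maximum


def read_conntrack(directory: Path = CONNTRACK_DIR) -> tuple[int, int]:
    """Read (current entries, table size) from the node's procfs."""
    count = int((directory / "nf_conntrack_count").read_text())
    maximum = int((directory / "nf_conntrack_max").read_text())
    return count, maximum


if __name__ == "__main__":
    # Only meaningful when run on the node (or a privileged pod on it).
    if (CONNTRACK_DIR / "nf_conntrack_max").exists():
        print(conntrack_utilization(*read_conntrack()))
```

Sampling this alongside CPU and bandwidth shows whether the table fills while the other graphs still look idle.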

Service exposure: the path around the ingress

The ingress controller only protects traffic that goes through it. A NodePort service opens a port on every node. A LoadBalancer service can provision a public endpoint straight to pods. A pod with a public IP, a debug service left exposed, or a permissive NetworkPolicy can all create a path that skips the ingress, the WAF, and the rate limits entirely.

This is the cluster-internal cousin of origin IP exposure: a carefully defended front door with a side door propped open behind it. The test enumerates every externally reachable service and confirms that nothing answers except the intended ingress path.
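
The outside-in scan can be paired with an inside-out inventory. The sketch below assumes the input is the items list from `kubectl get services --all-namespaces -o json` and flags every Service whose type or externalIPs can expose pods outside the intended ingress. The function name and the allow-list convention are hypothetical.

```python
def externally_exposed(services: list[dict], allowed: set[str]) -> list[str]:
    """Return 'namespace/name' for Services exposed outside the allow-list."""
    flagged = []
    for svc in services:
        meta = svc.get("metadata", {})
        spec = svc.get("spec", {})
        name = f"{meta.get('namespace', 'default')}/{meta.get('name')}"
        # NodePort and LoadBalancer types, or explicit externalIPs,
        # can all be reached without passing through the ingress.
        exposed = (spec.get("type") in ("NodePort", "LoadBalancer")
                   or bool(spec.get("externalIPs")))
        if exposed and name not in allowed:
            flagged.append(name)
    return flagged


if __name__ == "__main__":
    sample = [
        {"metadata": {"namespace": "ingress-nginx", "name": "controller"},
         "spec": {"type": "LoadBalancer"}},
        {"metadata": {"namespace": "shop", "name": "debug"},
         "spec": {"type": "NodePort"}},
    ]
    print(externally_exposed(sample, {"ingress-nginx/controller"}))
```

An inventory like this does not replace probing from outside the cluster. It only tells the tester where to look.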

What Kubernetes DDoS testing actually surfaces

A useful test is organized around the failures that recur on real clusters. On Kubernetes the first one is not a tuning detail. It is the elasticity itself, and it leads.

Autoscaler tip-over versus cost detonation

The headline finding on most clusters is that autoscaling has two failure modes, and teams have usually planned for neither.

The first is tip-over. Under sustained load the HPA requests more replicas, the Cluster Autoscaler requests more nodes, and then one of them hits a ceiling: node-group max, instance quota, or CNI IP exhaustion. Pods pile up Pending. The cluster stops absorbing and starts failing, often abruptly, because everything worked until the moment it didn't.

The second is cost detonation, sometimes called economic denial of sustainability. Here the autoscaler succeeds. It scales nodes to meet the attack, the site stays up, and the attacker has converted a traffic flood into a billing event. An attack that costs the adversary almost nothing can scale your infrastructure spend for hours, and a cluster configured for unbounded elasticity has no natural backstop against it.

These are two ends of one dial. A low ceiling tips over; a high ceiling detonates the bill; the right setting is a deliberate trade-off you can only make if you know where both edges are. Finding them is a core test objective.

Under a sustained Layer 7 flood the autoscaler forks two ways: a low ceiling leaves pods stuck Pending and the cluster tips into failure, while an unbounded ceiling keeps scaling nodes and converts the traffic attack into a runaway cost event. The test locates both edges of the dial.
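
The cost edge of the dial is simple arithmetic once the ceiling is known. A worst-case sketch, assuming the autoscaler reaches its node ceiling immediately and holds it for the whole attack; the function name, node counts, and hourly price are hypothetical.

```python
def attack_cost_usd(baseline_nodes: int, max_nodes: int,
                    node_hourly_usd: float, hours: float) -> float:
    """Upper bound on extra spend while scaled to the node ceiling."""
    # Only nodes above the steady-state baseline are attributable to the attack.
    extra_nodes = max(0, max_nodes - baseline_nodes)
    return extra_nodes * node_hourly_usd * hours


if __name__ == "__main__":
    # Illustrative: 10 baseline nodes, ceiling of 200, held for 12 hours.
    print(attack_cost_usd(10, 200, 0.50, 12))
```

Running the same numbers with different ceilings shows what each setting is worth: a lower ceiling trades spend for an earlier tip-over.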

The scale-out window: capacity arrives late

Even when the ceilings are set sensibly, the time to reach them matters. Between attack onset and enough capacity being Ready, the existing pods carry everything.

The chart below shows the shape: successful requests dip as the running pods saturate, bottom out, then climb back as the HPA (and, if needed, the Cluster Autoscaler) brings replicas online across the scale-out window. The depth and width of that dip are the real, measured exposure. A short pulse-wave attack that fits inside the window never gives the cluster a chance to catch up.

Successful-request rate under a simulated Layer 7 flood: it dips as running pods saturate, troughs, then recovers as the autoscaler brings new replicas Ready across the scale-out window. The dip is the measured exposure, not a number from a datasheet.

The number worth capturing is end-to-end: from the request rate crossing the HPA target to the moment added capacity is actually serving. Measured, not quoted from the sync-period setting, because the readiness chain and image-pull time dominate it in practice.
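
The depth and width of the dip can be extracted from the test's own time series. A minimal sketch, assuming samples of (seconds since attack onset, successful-request ratio) in time order; the function name and the 0.99 service-level threshold are hypothetical choices.

```python
def exposure_window(samples: list[tuple[float, float]],
                    slo: float = 0.99) -> tuple[float, float]:
    """Return (width_seconds, depth) of the success-rate dip.

    width: time from the first sample below the SLO to the first sample
           back at or above it after the last violation.
    depth: 1 minus the lowest success ratio observed.
    """
    below = [t for t, ok in samples if ok < slo]
    if not below:
        return 0.0, 0.0
    start = below[0]
    recovered = [t for t, ok in samples if t > below[-1] and ok >= slo]
    # If the series never recovers, the window runs to the last sample.
    end = recovered[0] if recovered else samples[-1][0]
    depth = 1.0 - min(ok for _, ok in samples)
    return end - start, depth


if __name__ == "__main__":
    series = [(0.0, 1.0), (10.0, 0.9), (20.0, 0.5), (30.0, 0.6),
              (40.0, 0.8), (50.0, 1.0), (60.0, 1.0)]
    print(exposure_window(series))
```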

The HPA scaling on the wrong signal

An HPA keyed on CPU assumes the attack burns CPU. Many Layer 7 attacks do not. A slowloris or slow-read flood holds connections open while sending almost nothing; CPU stays low, the metric never crosses target, and the HPA never scales while the connection table fills. The pods are dying and the autoscaler sees a quiet cluster.

The test verifies that the scaling signal actually correlates with the attack classes you care about. If it does not, the elasticity is connected to the wrong nerve.
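
The mismatch is visible in the HPA's documented replica calculation, which scales the current replica count by the ratio of the observed metric to its target. The sketch below applies that ratio to two signals during a slow-connection flood. The connections-per-pod metric, the function name, and all sample values are hypothetical, and the HPA's tolerance band is ignored for brevity.

```python
import math


def hpa_desired_replicas(current_replicas: int, current_value: float,
                         target_value: float, min_replicas: int,
                         max_replicas: int) -> int:
    """desired = ceil(current * observed / target), clamped to min/max."""
    desired = math.ceil(current_replicas * current_value / target_value)
    return max(min_replicas, min(desired, max_replicas))


if __name__ == "__main__":
    # Slow-connection flood: CPU sits at 20% against a 70% target.
    print(hpa_desired_replicas(4, 20, 70, min_replicas=4, max_replicas=50))
    # Same moment, scaling on open connections per pod instead.
    print(hpa_desired_replicas(4, 9000, 2000, min_replicas=4, max_replicas=50))
```

Keyed on CPU, the HPA holds at its minimum while the pods fill up. Keyed on a signal the attack actually moves, it asks for more capacity.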

Probe-induced cascades under load

The readiness-probe spiral described above is worth provoking deliberately in a controlled test, because it is invisible until it happens. Drive the pods to the saturation point and watch whether probe timeouts start pulling healthy pods out of rotation. If they do, the failure accelerates on its own, and the fix (more generous probe timeouts, separate probe endpoints, decoupling health from load) is cheap once you know it is needed.

Rate limiting that lives in the wrong place, or nowhere

Many clusters have no rate limiting at the ingress at all, on the assumption that the upstream WAF or the application handles it. The test checks where the limit actually is. A limit only at the application means every abusive request still costs a full proxy hop and a backend connection. A limit keyed only on source IP misses a distributed flood and punishes users behind shared egress. As with any edge, the key and the threshold are hypotheses about traffic, and the test is how you learn whether they hold.
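
The effect of the key choice shows up in a few lines. This is a generic token-bucket sketch, not any particular ingress controller's implementation; the class name, rates, and addresses (taken from documentation ranges) are hypothetical.

```python
class TokenBucketLimiter:
    """Per-key token bucket: `burst` tokens, refilled at `rate_per_s`."""

    def __init__(self, rate_per_s: float, burst: float):
        self.rate = rate_per_s
        self.burst = burst
        self._state: dict[str, tuple[float, float]] = {}  # key -> (tokens, last_ts)

    def allow(self, key: str, now: float) -> bool:
        tokens, last = self._state.get(key, (self.burst, now))
        # Refill for the elapsed time, never beyond the burst size.
        tokens = min(self.burst, tokens + (now - last) * self.rate)
        allowed = tokens >= 1.0
        self._state[key] = (tokens - 1.0 if allowed else tokens, now)
        return allowed


if __name__ == "__main__":
    limiter = TokenBucketLimiter(rate_per_s=5, burst=10)
    # One source sending 100 requests at once: only the burst gets through.
    single = sum(limiter.allow("203.0.113.7", now=0.0) for _ in range(100))
    # 100 sources sending 10 requests each: every request gets through.
    spread = sum(limiter.allow(f"198.51.100.{i}", now=0.0)
                 for i in range(100) for _ in range(10))
    print(single, spread)
```

The same limiter that stops a single noisy client passes a distributed flood ten times its size, which is why the key is a hypothesis to test rather than a setting to trust.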

Authorization: testing through someone else's infrastructure

A Kubernetes test almost never touches only your code. The traffic crosses the cloud provider's load balancer and network before it reaches the cluster, and that changes the authorization picture.

High-volume simulated traffic toward an EKS, AKS, or GKE-fronted endpoint crosses the provider's infrastructure and falls under its simulated-DDoS and acceptable-use terms, the same gate covered in the AWS, Azure, and GCP posts. Thresholds and process change over time, so the durable instruction is procedural: read the current published policy for your provider and coordinate before generating load.

There is also a blast-radius dimension specific to clusters. Many clusters are multi-tenant: several applications share the same nodes. A flood aimed at one namespace can evict pods, exhaust a node's conntrack table, or trigger node scaling that affects every workload on those nodes. The noisy-neighbor effect is itself a finding, and it is also a reason to scope carefully. Keeping the test contained matters more here than on a single-tenant origin, because the collateral surface is larger.

Underneath all of it sits the non-negotiable: written authorization from the cluster's owner. Provider terms are a gate on top of that consent, never a substitute for it.

Designing the test: environment, scope, and measurement

The discipline that keeps a cluster test informative without taking down production is the same that governs any production-adjacent test, with a Kubernetes contour.

Environment selection

The cleanest target is a staging cluster built from the same source of truth as production: the same Helm charts or manifests, the same HPA targets, the same ingress configuration, the same node-group and autoscaler settings, applied through the same GitOps pipeline. Kubernetes makes this unusually reproducible, which is one of the few places the platform helps the tester.

The trap is that a staging cluster sized differently from production has different ceilings. If staging has a node-group max of three and production has thirty, the tip-over point you measure is fiction. Match the limits that govern the failure modes, or measure against production directly under tight scope.
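
A quick pre-test check is to diff the limits that govern failure between the two environments. A minimal sketch, assuming each environment's ceilings have been collected into a flat dictionary; the function name and key names are hypothetical.

```python
def ceiling_mismatches(staging: dict, production: dict) -> dict:
    """Return {limit: (staging_value, production_value)} where they differ."""
    keys = staging.keys() | production.keys()
    return {k: (staging.get(k), production.get(k))
            for k in sorted(keys)
            if staging.get(k) != production.get(k)}


if __name__ == "__main__":
    staging = {"node_group_max": 3, "hpa_max_replicas": 50}
    production = {"node_group_max": 30, "hpa_max_replicas": 50}
    print(ceiling_mismatches(staging, production))
```

Any key that appears in the result marks a layer where a staging measurement will not transfer to production.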

Scope as a bounding document

The scope names the cluster and namespaces in play, the ingress hosts and services under test, the maximum request rates and source counts, the autoscaler ceilings as they stand at test time (so a tip-over is interpreted correctly), the test windows aligned to change control, and the kill switch: the ability to stop generation instantly and to cap or freeze autoscaling so a cost-detonation test cannot run away.
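
Part of the scope can be made machine-checkable so the load generator refuses anything outside it. The sketch below covers only a subset of the items above (hosts, rate, source count, window). The class and field names are hypothetical, and the kill switch and autoscaler freeze remain operational controls outside this code.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class EngagementScope:
    """Bounds agreed in the written scope document."""
    ingress_hosts: frozenset
    max_rps: int
    max_sources: int
    window_start: datetime
    window_end: datetime

    def permits(self, host: str, rps: int, sources: int, now: datetime) -> bool:
        """True only if every bound in the scope is respected."""
        return (host in self.ingress_hosts
                and rps <= self.max_rps
                and sources <= self.max_sources
                and self.window_start <= now <= self.window_end)


if __name__ == "__main__":
    scope = EngagementScope(frozenset({"staging.example.com"}), 5000, 200,
                            datetime(2025, 1, 10, 2, 0),
                            datetime(2025, 1, 10, 4, 0))
    print(scope.permits("staging.example.com", 4000, 150,
                        datetime(2025, 1, 10, 3, 0)))
```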

Measurement per layer

Each layer gets its own measured outcome. Ingress: the rate and connection count at which the controller degrades, and how. HPA: the end-to-end scale-out time and whether the signal tracked the attack. Cluster Autoscaler: node-provision latency and the ceiling reached. Pods: whether probes induced a cascade. Node: which kernel table saturated first. Cost: the spend delta across the test window.

The deliverable is not "the cluster stayed up." It is a per-layer characterization: which mechanism engaged, how fast, against which ceiling, and what a real user experienced while it did.

Mapping attack classes to Kubernetes controls

A thorough test exercises each class against the layer meant to handle it. A cluster hardened against one class can fall to another at a fraction of the volume.

Mapping DDoS attack classes to the Kubernetes layer that should handle each: L3/L4 volumetric floods to the upstream cloud edge rather than the cluster, L7 floods to the ingress controller's rate and connection limits, connection-table exhaustion to the node kernel, application-logic abuse to the app and bot management, and the direct-to-pod path to NetworkPolicy and service exposure.

L3/L4 volumetric and protocol floods are not a cluster control at all. They are the job of the upstream cloud load balancer, CDN, or scrubbing layer. The cluster test confirms that layer exists and that nothing is reachable around it.
L7 HTTP floods (request floods, HTTP/2 rapid reset, slow-read) are the ingress controller's domain: connection limits, request rate limits, timeouts, and the worker/CPU budget under sustained volume.
Connection and state exhaustion target the node kernel and the ingress connection table, the layer of first failure on many clusters, and pass straight through any control that only inspects request volume.
Application-logic abuse (credential stuffing that stays under the rate limit, expensive search queries, checkout floods) is the hardest class because the requests are valid. Only application-aware rules and a tuned bot-management score catch it, and both need validation against legitimate and adversarial traffic side by side.
Direct-to-pod is defended, or not, by NetworkPolicy and disciplined service exposure. It is the path that skips the ingress, and a test must look for it from outside the cluster.

Procurement note: subscription versus project engagement

One consideration sits at the procurement layer rather than the technical one. A cluster's protection is assembled from platform primitives, and the validation of that assembly is a separate discipline procured on its own terms.

DDoS testing services are sold both as an ongoing subscription and as a discrete project engagement, and the two suit different cadences. Change-triggered testing (after an ingress migration, an autoscaler retune, a CNI change, a node-group resize, or ahead of a launch) maps naturally onto project engagements: the test happens when the change happens. A cluster under constant churn may prefer a standing capability. Neither is universally correct, and the annual-subscription commitments some vendors require can price out a team whose real need is a handful of well-scoped tests a year. Match the engagement model to the cadence the cluster actually warrants.

FAQ

What is Kubernetes DDoS testing?

Controlled, authorized generation of attack-shaped traffic against a Kubernetes-hosted application, to verify how the cluster behaves under pressure. It validates the ingress controller's capacity, whether the Horizontal Pod Autoscaler and Cluster Autoscaler react in time and against the right signal, whether autoscaling tips over or runs up cost, whether probes induce a cascade, which node kernel limit fails first, and whether any service is reachable around the ingress.

Does Kubernetes protect against DDoS by default?

No. Kubernetes has no L3/L4 DDoS edge of its own; that protection comes from the cloud load balancer or CDN in front of the cluster. Inside the cluster, autoscaling is reactive and bounded, ingress rate limiting is often unset, and node kernel tables have finite ceilings. The elasticity helps, but it has a latency and a limit, both of which an attack will find.

Can autoscaling stop a DDoS attack?

It can absorb some attacks and it makes others worse. Reactive scaling has a window during which existing pods carry the whole load, it scales on a metric that some L7 attacks never move, and successful scaling can turn an availability attack into a cost attack. Autoscaling is a capacity mechanism, not a security control, and a test shows whether it helps or hurts on your cluster.

What should a Kubernetes DDoS test validate?

At minimum: the ingress controller's degradation point and behavior; the end-to-end HPA scale-out time and whether its signal tracks the attack; the Cluster Autoscaler's node-provision latency and ceiling; whether probe timeouts evict healthy pods under load; which node kernel table (conntrack, accept queue, ports) saturates first; the cost trajectory under sustained load; and that no NodePort, LoadBalancer service, or pod IP is reachable past the ingress.

Is testing a staging cluster good enough?

For most layers, yes, provided the staging cluster shares the limits that govern failure: HPA targets, node-group maximums, quotas, and CNI capacity. A staging cluster with smaller ceilings reports a tip-over point that does not exist in production. Service-exposure checks are the exception: they only mean something when run against the real public endpoints, so they belong in production scope.

What a Kubernetes test is really auditing

Five years from now, Kubernetes will still schedule pods onto nodes, the HPA will still react to metrics on a delay, ingress will still be a proxy with finite connections, and the cluster will still have no DDoS edge of its own. Those properties are stable.

What drifts, and so has to be re-verified, is everything configured around them:

the autoscaler ceilings, set for a steady-state bill and never revisited for an attack
the HPA target metric, which may not move when the dangerous attacks arrive
ingress rate and connection limits, often still at their unset defaults
probe timeouts tuned for a healthy cluster, not a saturated one
the list of services quietly exposed past the ingress, which grows every sprint

Kubernetes does not ship a cluster that is open to attack. It ships a set of elastic mechanisms whose protection is entirely a function of the limits and timings wired around them. Elasticity does not remove the failure. It defers it, and moves it to whichever ceiling you forgot you had. The test is how you find that ceiling while it is still cheap to move.