DNS Infrastructure Under DDoS: Testing Authoritative and Recursive Resolution

Every other layer you test is a destination. DNS is the thing that tells the internet where the destination is.

That inversion is the whole reason DNS deserves its own test. Your origin can be perfectly healthy, your WAF tuned, your autoscaler warm, and none of it matters if a name cannot be resolved into an address. When resolution fails, you are not slow. You are gone. The servers are running and answering their own health checks, and the outside world simply cannot find them.

Worse, the failure is quiet. Every dashboard you own looks at traffic that already arrived, which means it looks at the users who successfully resolved you. The ones who got a SERVFAIL never show up in your logs at all. You can lose a third of your users to a DNS problem and watch your request graphs stay flat.

So a DNS DDoS test is not a capacity test of your servers. It is a test of a distributed resolution path, most of which you do not operate, whose most important property is that it can fail totally and invisibly.

What DNS DDoS testing actually validates

DNS DDoS testing is the practice of verifying that name resolution for your zone continues to work under attack, across both the authoritative servers you are responsible for and the recursive resolvers your users rely on. It measures whether clients can still turn your domain into a routable address while an adversary floods, or weaponizes, the resolution path, and how long any disruption takes to appear and to clear.

It is a spoke of a complete DDoS testing methodology, and it sits upstream of everything else in that methodology. The reflection and amplification mechanics it touches are catalogued in the DDoS attack vectors deep dive. This post is about the other half: not how DNS is abused as a weapon, but how you validate that your own resolution survives being on the receiving end.

The single most useful thing to hold in your head is that "DNS" names two different systems with two different owners and two different threat models.

	Authoritative resolution	Recursive resolution
What it is	The nameservers that hold your zone and answer for it	The resolvers that query on a user's behalf and cache the answer
Who runs it	You, or your managed DNS provider	ISPs, enterprises, public resolvers (you do not own these)
The attack	Flood the authoritative servers so they cannot answer	Launder a flood through resolvers so it lands on your authoritative tier
What the test validates	Query-rate headroom, anycast distribution, provider diversity	Resolution success and latency from the client's side of the path

Get the two straight and the rest of the discipline follows. Confuse them and you will test the half you own and declare victory over the half that actually breaks.

Why DNS is a different kind of test

Failure is total and binary

Most DDoS outcomes are graded. Latency climbs, a fraction of requests fail, goodput sags, and the service degrades along a curve you can measure. DNS is closer to a switch. Either the resolver returns an address for your name or it does not, and if it does not, there is no partial service to measure. The user's browser never opens a connection to anything.

This is why "our origin stayed up" is not an answer to "did we survive." Origin health and name resolution are independent failure domains. The whole point of the test is the domain your application monitoring cannot see.

The cache is a buffer that hides the onset

DNS answers carry a TTL, and recursive resolvers cache them. While a record is cached, the resolver answers from memory and never touches your authoritative servers. That is a performance win most of the time and a measurement trap under attack.

When an attack begins saturating your authoritative tier, users whose resolvers hold a warm cache entry keep resolving you perfectly. Nothing looks wrong. Then TTLs expire, one resolver population at a time, and each cache miss now has to reach an authoritative server that cannot answer. The outage does not arrive at attack onset. It seeps in as caches drain, which can be minutes or hours later depending on your TTLs.

The pillar guide notes this from the mitigation side: long TTLs pin clients to a pre-change address for the full TTL after you update a record. The same physics that delays your fix delays the appearance of the damage. A DNS test that runs for five minutes against a zone with hour-long TTLs has measured almost nothing.
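
The size of that buffer can be estimated with a rough model. The sketch below assumes resolver caches were refreshed at uniformly random times before onset, that no cache can refresh once the authoritative tier is saturated, and that resolvers do not serve stale answers; the function name and parameters are illustrative, not taken from any tool.

```python
def warm_cache_fraction(seconds_since_onset: float, ttl_seconds: float) -> float:
    """Estimate the fraction of resolver caches still holding a valid entry.

    Assumes remaining TTLs at onset are spread uniformly over [0, ttl_seconds]
    and that every cache miss after onset fails to refresh.
    """
    if ttl_seconds <= 0:
        return 0.0
    remaining = 1.0 - seconds_since_onset / ttl_seconds
    return max(0.0, min(1.0, remaining))


# Compare a five-minute test against a one-hour TTL and a five-minute TTL.
for ttl in (3600, 300):
    share = warm_cache_fraction(300, ttl)
    print(f"TTL {ttl}s: {share:.0%} of caches still warm after 300s")
```

Under these assumptions, a five-minute test against a one-hour TTL ends with roughly nine in ten caches still warm, so most users were never exposed to the saturated tier at all.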

Two resolution paths, two threat models

Diagram of the DNS resolution path from client through a recursive resolver the operator does not own to the authoritative nameservers the operator is responsible for, with the ownership boundary marked and two attack entry points: a direct flood on the authoritative tier and a pseudo-random subdomain flood laundered through the recursive resolver

The resolution path has a client at one end, your authoritative nameservers at the other, and a recursive resolver in the middle that belongs to someone else. An attack can enter at either end, and the two entry points are tested differently.

Authoritative: the servers that answer for your zone

Your authoritative nameservers are the ones you can actually harden, so start there. The direct threat is a query flood aimed straight at them: enough well-formed queries per second that legitimate lookups are crowded out.

The defenses are structural. Anycast advertises the same nameserver IPs from many locations so the flood is split across points of presence instead of concentrating on one. Raw query-handling capacity at each of those locations sets the ceiling. Multiple nameservers, ideally across more than one provider, remove the single chokepoint.

What the test validates is whether those structural properties are real or nominal. How many queries per second can the authoritative set actually answer before the success rate bends? Is the anycast advertisement genuinely pulling queries to more than one location, or is one point of presence quietly taking most of the load? Do all the listed NS records point at infrastructure that fails independently, or do two of your four nameservers sit in the same failure domain?
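
The independence question can be checked mechanically once you have an inventory of where each nameserver lives. The following is a minimal sketch; the inventory, provider names, and AS numbers are hypothetical placeholders (the ASNs come from the range reserved for documentation), and in practice you would fill them in from your own records.

```python
from collections import defaultdict


def shared_failure_domains(nameservers: dict, key: str) -> dict:
    """Group nameservers by one attribute and keep groups with more than one member."""
    groups = defaultdict(list)
    for name, attrs in nameservers.items():
        groups[attrs[key]].append(name)
    return {value: sorted(names) for value, names in groups.items() if len(names) > 1}


# Hypothetical inventory: four NS records that look diverse on paper.
NS_INVENTORY = {
    "ns1.example.com": {"provider": "provider-a", "asn": "AS64500"},
    "ns2.example.com": {"provider": "provider-a", "asn": "AS64500"},
    "ns3.example.com": {"provider": "provider-b", "asn": "AS64501"},
    "ns4.example.com": {"provider": "provider-b", "asn": "AS64500"},
}

for attribute in ("provider", "asn"):
    print(attribute, shared_failure_domains(NS_INVENTORY, attribute))
```

Grouping by more than one attribute matters: in this invented inventory the nameservers split evenly across two providers, yet three of the four share a network.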

Recursive: the resolvers you depend on but do not own

The recursive layer is the uncomfortable one, because you cannot patch it. Your users resolve through their ISP's resolver, their employer's, or a public one. You do not control its capacity, its caching policy, or its behavior under load.

Two things can go wrong here, and both are yours to test even though the infrastructure is not yours to fix. The first is that the recursive layer becomes the delivery mechanism for an attack on your authoritative tier, which is the pseudo-random subdomain case below. The second is a resolution failure that lives entirely on the resolver side: if a resolver cannot reach your authoritative servers, its users cannot resolve you, even though the same servers are answering another resolver's queries fine. Resolution is only as good as the specific path between a given resolver and your zone, and that path is different for every resolver population.

This is the DNS analog of the origin-exposure problem in the CDN-bypass discussion: the control you rely on is only meaningful if the path to it holds, and the path runs through infrastructure outside your administrative boundary.

The attack that defines DNS testing: pseudo-random subdomain flooding

If you test one thing against your authoritative tier, test this one, because it defeats the caching that protects you against every other query flood.

A DNS water torture attack, also called a pseudo-random subdomain or PRSD flood, generates queries for names that do not exist and never will: a8f3k2.example.com, zq71x9.example.com, one unique random label after another under your real domain.

The elegance, from the attacker's side, is that these queries are uncacheable by construction. A resolver has no cached answer for a name it has never seen, so it cannot short-circuit the lookup. Every single query is a cache miss, which means every single query is forwarded to your authoritative servers. The recursive resolvers of the entire internet become an involuntary, distributed amplifier pointed directly at your zone, and they do it while following the protocol exactly.

Diagram contrasting a cacheable query for a real hostname that a recursive resolver answers from cache without touching the authoritative tier, against a pseudo-random subdomain query for a name that has never existed, which misses cache by construction, is forwarded to the authoritative nameservers, and forces NXDOMAIN work under load

The damage is worse than a plain query flood for two reasons. Each query forces your authoritative server to do real work: a lookup that ends in NXDOMAIN, and if any part of your DNS is backed by a database or a wildcard record, potentially much more than a static-zone lookup would cost. And because every answer is a negative response for a name that will never be asked for again, whatever a resolver caches is never reused, so there is no relief valve anywhere in the path.

Testing it means generating the pattern deliberately: a stream of unique, non-existent labels under your zone at a controlled rate, measured for what it does to the success rate of legitimate lookups happening concurrently. A stack that shrugs off a million repeated queries for www.example.com, because the resolver answered them all from cache, can fall over at a small fraction of that rate when every query is a unique cache miss. If you have not specifically tested the uncacheable case, you have tested the easy one.
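
The query pattern itself is simple to produce. The sketch below only generates the names and a pacing schedule for an authorized test against a zone you own; it does not send anything. The label length, alphabet, and function names are assumptions chosen for illustration.

```python
import secrets
import string

ALPHABET = string.ascii_lowercase + string.digits


def random_label(length: int = 6) -> str:
    """Return one random lowercase alphanumeric DNS label."""
    return "".join(secrets.choice(ALPHABET) for _ in range(length))


def prsd_names(zone: str, count: int, length: int = 6):
    """Yield `count` distinct pseudo-random names under `zone`."""
    if count > len(ALPHABET) ** length:
        raise ValueError("not enough distinct labels of this length")
    seen = set()
    while len(seen) < count:
        label = random_label(length)
        if label not in seen:  # uniqueness is what keeps every query a cache miss
            seen.add(label)
            yield f"{label}.{zone}"


def send_times(rate_per_second: float, count: int) -> list:
    """Evenly spaced send offsets, in seconds, for a controlled query rate."""
    interval = 1.0 / rate_per_second
    return [i * interval for i in range(count)]


for offset, name in zip(send_times(10, 5), prsd_names("example.com", 5)):
    print(f"t+{offset:.1f}s  {name}")
```

Keeping the rate explicit is the point: the finding is the rate at which concurrent legitimate lookups start failing, so the schedule has to be something you can state and repeat.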

DNS is also a weapon: reflection and amplification

DNS testing has a second axis that no other layer has: you are not only a potential victim, you are a potential weapon. The same infrastructure that answers your users can, if misconfigured, be conscripted to attack someone else, and that is both a liability and a signal about your own exposure.

Diagram showing DNS in two roles: on one side the authoritative and resolver infrastructure as the victim of a query flood, on the other side the same infrastructure abused as a reflector, turning a small spoofed request into a large response aimed at a third-party victim, with response-rate limiting and source-address validation marked as the controls that break the reflector role

A DNS amplification attack sends a small query with a spoofed source address to a resolver or authoritative server, which sends a much larger answer to the spoofed victim. A short ANY query, or a query against a DNSSEC-signed zone with large response records, can yield an amplification factor in the tens. The attacker spends a little bandwidth and the reflector spends a lot, aimed wherever the attacker chose.
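
The arithmetic behind an amplification factor is a ratio of bytes out to bytes in. In the sketch below the packet sizes and the attacker's bandwidth are invented for illustration; measure your own servers' actual response sizes rather than relying on these numbers.

```python
def amplification_factor(request_bytes: int, response_bytes: int) -> float:
    """Bytes delivered to the victim per byte the attacker sends to the reflector."""
    if request_bytes <= 0:
        raise ValueError("request_bytes must be positive")
    return response_bytes / request_bytes


def reflected_mbps(attacker_mbps: float, factor: float) -> float:
    """Traffic arriving at the victim for a given attacker send rate."""
    return attacker_mbps * factor


# Illustrative sizes only: a 64-byte query that draws a 3,000-byte response.
factor = amplification_factor(64, 3000)
print(f"amplification factor: {factor:.1f}")
print(f"victim sees {reflected_mbps(10, factor):.0f} Mbps from 10 Mbps of queries")
```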

Part of a thorough DNS assessment is confirming you are not that reflector. Are your resolvers open to the internet when they should be restricted to your own users? Does your authoritative server hand out large responses to spoofed queries without any response-rate limiting? Source-address validation at the network edge, the BCP 38 discipline that refuses to forward packets with forged source addresses, is the structural defense against the whole reflection class, and its uneven deployment is why the class persists.

The dual role also shapes how you read your own results. An authoritative server that emits large answers cheaply is both a reflection risk to others and, under a query flood, a server doing expensive work per packet against itself. Trimming response sizes and rate-limiting identical responses helps you in both directions at once.

Designing the test

Measure from the resolver's side, not the server's

The instinct is to measure at your authoritative servers: queries received, queries answered, CPU. Those tell you the server's story. They do not tell you the user's story, which is the only one that matters.

Resolution success is a property of the whole path, so measure it from where the user sits. Query your domain through multiple independent recursive resolvers, from multiple regions and networks, and record whether each returns a correct answer and how long it took. A single vantage point resolving fine while another times out is exactly the partial, path-specific failure that server-side metrics hide.
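
A client-side probe needs very little machinery. The following is a minimal sketch using only the Python standard library: it sends one A query over UDP to a given resolver and reports the response code and latency. It treats any NOERROR response with at least one answer as success and does not verify that the returned address is the correct one, which a real probe should. The function names and result fields are illustrative.

```python
import secrets
import socket
import struct
import time


def build_query(name: str, query_id: int, qtype: int = 1) -> bytes:
    """Encode a single-question DNS query (QTYPE 1 = A, QCLASS 1 = IN)."""
    header = struct.pack("!HHHHHH", query_id, 0x0100, 1, 0, 0, 0)  # RD bit set
    qname = b"".join(
        bytes([len(label)]) + label.encode("ascii")
        for label in name.rstrip(".").split(".")
    ) + b"\x00"
    return header + qname + struct.pack("!HH", qtype, 1)


def parse_header(response: bytes) -> tuple:
    """Return (query_id, rcode, answer_count) from a DNS response header."""
    query_id, flags, _qd, ancount, _ns, _ar = struct.unpack("!HHHHHH", response[:12])
    return query_id, flags & 0x000F, ancount


def probe(resolver_ip: str, name: str, timeout: float = 2.0) -> dict:
    """Send one A query to a recursive resolver and record success and latency."""
    failed = {"resolver": resolver_ip, "ok": False, "rcode": None, "latency_s": None}
    query_id = secrets.randbelow(65536)
    started = time.monotonic()
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        try:
            sock.sendto(build_query(name, query_id), (resolver_ip, 53))
            data, _addr = sock.recvfrom(4096)
        except OSError:  # timeouts and network errors both count as failed resolution
            return failed
    latency = time.monotonic() - started
    if len(data) < 12:
        return failed
    response_id, rcode, ancount = parse_header(data)
    ok = response_id == query_id and rcode == 0 and ancount > 0
    return {"resolver": resolver_ip, "ok": ok, "rcode": rcode, "latency_s": latency}
```

Run the same probe from several networks against several resolvers you are permitted to query, for example `probe("192.0.2.53", "www.example.com")` with the placeholder address replaced, and keep the per-resolver results separate rather than averaging them away.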

The TTL trap: test the cold cache

Because caching masks the onset, a short test against warm caches measures nothing. Design around the buffer explicitly. Know your record TTLs, and either run long enough for caches to drain naturally or measure the cold-cache path directly by querying resolvers that hold no cached entry for the names under test.

The number worth capturing is not just whether resolution eventually failed, but the delay between attack onset and visible impact. That delay is your TTL buffer, and it is a genuine, if double-edged, defensive asset: it buys time before an attack bites, and it delays the effect of any fix you push during one.

Anycast: confirm it actually distributes

Anycast is the load-spreader that makes authoritative infrastructure survivable, and it is easy to assume rather than verify. Confirm that queries from different regions actually land on different points of presence, and that losing one location degrades rather than fails resolution. A misconfigured advertisement, or a route that pulls a disproportionate share of the world into one PoP, turns a distributed system back into a single target without anyone noticing until the flood arrives.
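
Many authoritative servers can identify which instance answered, for example through the NSID option or an id.server CH TXT query, where the operator has enabled them. Assuming you have collected one such identifier per vantage point, the sketch below summarizes how the vantage points spread across points of presence. The vantage names, PoP identifiers, and the 50 percent threshold are hypothetical.

```python
from collections import Counter


def pop_shares(observations: dict) -> dict:
    """Map each PoP identifier to the share of vantage points that landed on it."""
    counts = Counter(observations.values())
    total = sum(counts.values())
    return {pop: n / total for pop, n in counts.items()}


def concentrated_pops(observations: dict, max_share: float = 0.5) -> list:
    """Return PoPs that attract more than `max_share` of vantage points."""
    shares = pop_shares(observations)
    return sorted(pop for pop, share in shares.items() if share > max_share)


# Hypothetical observations: vantage point -> PoP identifier seen in the response.
OBSERVED = {
    "eu-west-probe": "fra",
    "eu-north-probe": "fra",
    "us-east-probe": "fra",
    "ap-south-probe": "sin",
}

print(pop_shares(OBSERVED))
print(concentrated_pops(OBSERVED))
```

Vantage points are only a proxy for real query volume, so treat a skewed result as a prompt to investigate routing rather than as a measurement of load.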

DNSSEC: bigger responses, and a self-inflicted outage to test for

DNSSEC is worth deploying, and it changes two things a DNS test has to account for. Signed responses are larger, which raises both the amplification factor you present to the world and the per-packet cost you pay under a flood. And validation introduces a failure mode that has nothing to do with volume: an expired signature, a botched key rollover, or a broken chain of trust makes validating resolvers return failure for a perfectly reachable zone.

That last one is the DNS version of the self-inflicted outage that shows up at every layer: the defense, misconfigured, becomes the outage. A signing mistake takes you down for every user behind a validating resolver, no attacker required. Testing DNSSEC means testing the operational discipline around it, key rollover and signature freshness, as much as the cryptography.
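
Signature freshness is the easiest part of that discipline to automate. RRSIG records carry an expiration time that is written as YYYYMMDDHHmmSS in UTC in presentation format. The sketch below assumes you have already collected those expiration strings per record set by some other means; the record-set names and the warning window are illustrative.

```python
from datetime import datetime, timedelta, timezone


def rrsig_time(text: str) -> datetime:
    """Parse an RRSIG timestamp in YYYYMMDDHHmmSS presentation format (UTC)."""
    return datetime.strptime(text, "%Y%m%d%H%M%S").replace(tzinfo=timezone.utc)


def expiring_signatures(expirations: dict, now: datetime, warn_within: timedelta) -> dict:
    """Return record sets whose signature expires within `warn_within` of `now`.

    Already-expired signatures are included, with a negative remaining time.
    """
    flagged = {}
    for rrset, text in expirations.items():
        remaining = rrsig_time(text) - now
        if remaining <= warn_within:
            flagged[rrset] = remaining
    return flagged


# Hypothetical expirations gathered from the signed zone.
EXPIRATIONS = {
    "example.com. SOA": "20300102000000",
    "www.example.com. A": "20300201000000",
}

now = datetime(2030, 1, 1, tzinfo=timezone.utc)
for rrset, remaining in expiring_signatures(EXPIRATIONS, now, timedelta(days=3)).items():
    print(f"{rrset}: signature expires in {remaining}")
```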

Provider diversity: single DNS is a single point of failure

The cleanest structural finding in DNS is often the simplest to state: if every one of your nameservers is operated by one provider, that provider is a single point of failure for your entire online presence. When it has a bad day, so do you, regardless of how healthy your own infrastructure is.

Secondary DNS on a second, independent provider removes that dependency, and validating the failover is part of the test. Does resolution actually survive one provider going dark, or is the secondary a configuration that has never been exercised and has quietly drifted out of sync with the primary zone?
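
One cheap drift signal is the SOA serial each nameserver reports. The sketch below assumes the serials were collected separately and that all providers are expected to serve the same serial, which holds for zone-transfer setups but not for every multi-provider arrangement. It also ignores serial-number wraparound. The nameserver names and values are hypothetical.

```python
def serial_drift(serials: dict) -> dict:
    """Return nameservers whose SOA serial is behind the highest one seen, with the gap."""
    if not serials:
        return {}
    newest = max(serials.values())
    return {ns: newest - serial for ns, serial in serials.items() if serial != newest}


# Hypothetical serials reported by each authoritative nameserver.
SERIALS = {
    "ns1.provider-a.example": 2030010105,
    "ns2.provider-a.example": 2030010105,
    "ns1.provider-b.example": 2029112001,
}

print(serial_drift(SERIALS))
```

A matching serial does not prove the zones are identical, so a fuller check would also compare the answers each provider returns for the records that matter.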

Authorization and scope

DNS tests reach across boundaries you do not own, which raises the coordination burden rather than lowering it. If your authoritative DNS is managed, generating flood traffic against it involves your provider's acceptable-use terms and their own protective mitigations, which can act on your test traffic. The recursive resolvers in a realistic test are third-party infrastructure you must not target directly at all; you model their behavior, you do not attack them.

Scope the blast radius before running anything, and confirm the current published policy of whichever provider holds your zone. The specifics change; the discipline of coordinating first does not. Owner authorization is mandatory regardless of who operates the servers.

What to measure

Resolution success rate, not server uptime

The headline metric is the fraction of legitimate resolution attempts that succeed, measured from the client side of the path, while the attack runs. It captures both failure modes DNS has: the authoritative tier too saturated to answer, and the resolver-to-authoritative path failing for a given population. Server uptime captures neither.
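
Computed per vantage point rather than as one global figure, the metric also exposes path-specific failures. A minimal sketch, assuming each probe result is a dictionary with an `ok` flag plus whatever grouping fields you recorded; the field names and sample data are invented.

```python
from collections import defaultdict


def success_rate_by_group(probes: list, group_key: str) -> dict:
    """Fraction of successful resolutions for each value of `group_key`."""
    totals = defaultdict(int)
    successes = defaultdict(int)
    for result in probes:
        group = result[group_key]
        totals[group] += 1
        if result["ok"]:
            successes[group] += 1
    return {group: successes[group] / totals[group] for group in totals}


# Invented probe results from two resolver populations.
PROBES = [
    {"resolver": "isp-a", "ok": True},
    {"resolver": "isp-a", "ok": True},
    {"resolver": "isp-b", "ok": False},
    {"resolver": "isp-b", "ok": True},
]

print(success_rate_by_group(PROBES, "resolver"))
```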

The cache-masked onset

Simulated chart of successful resolutions over time under a DNS flood: after onset the success rate does not drop instantly but declines as record TTLs expire across resolver populations and cache misses reach a saturated authoritative tier, reaching a trough before response-rate limiting and anycast absorption restore service

The chart is illustrative and the numbers are invented; the shape is the point. After the flood begins, resolution does not fail all at once. It declines as caches expire one resolver population at a time, each miss reaching a tier that cannot answer, so the slope of the fall is set by your TTLs as much as by the attack's own ramp. Recovery lags for the same reason: caches have to re-warm after mitigation engages.

That cache-shaped curve is unique to DNS, and it is the reason a DNS test has to be read on a clock. The instantaneous success rate at any single moment tells you less than the trajectory: how quickly impact spread after onset, how deep the trough went, and how the recovery tracked TTL expiry rather than the attack's own timeline.

Time to unreachable

The most decision-useful single number is the interval between attack onset and the point where resolution success crosses whatever threshold you consider an outage. It is your real warning window, and it is set by your TTLs as much as by your capacity. A zone with generous TTLs and thin authoritative capacity can look invincible for twenty minutes and then go dark; a zone with short TTLs feels the same attack almost immediately but recovers just as quickly when you fix it. Neither is simply better. The trade-off is the thing to understand, and you only understand it by watching the clock during a controlled test.
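
Given a series of client-side success-rate samples, the interval can be read off directly. The sketch below assumes samples are (seconds, success rate) pairs on the same clock as the recorded onset time; the sample values and the 50 percent outage threshold are invented for illustration.

```python
from typing import Optional


def time_to_unreachable(samples: list, onset: float, threshold: float) -> Optional[float]:
    """Seconds from attack onset to the first sample below `threshold`.

    Returns None if the success rate never crossed the threshold.
    """
    for timestamp, success_rate in sorted(samples):
        if timestamp >= onset and success_rate < threshold:
            return timestamp - onset
    return None


# Invented samples: (seconds since test start, resolution success rate).
SAMPLES = [(0, 1.0), (60, 1.0), (120, 0.98), (600, 0.90), (1200, 0.40)]

print(time_to_unreachable(SAMPLES, onset=60, threshold=0.5))
```

The result is only as fine-grained as the sampling interval, so sample more often than the warning window you hope to find.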

Frequently asked questions

How is DNS DDoS testing different from testing my web servers?

Web-server testing measures capacity and behavior once traffic arrives. DNS testing measures whether traffic can find you at all. They are independent failure domains: your servers can be completely healthy while resolution fails, and no application-level metric will show it, because the affected users never reach the application. DNS sits upstream of every other test in a full methodology, which is why it is worth isolating.

What is a DNS water torture attack and why is it hard to defend?

A water torture attack, also called a pseudo-random subdomain flood, sends queries for an endless stream of unique, non-existent names under your domain. Because each name is new, no resolver has it cached, so every query is forced through to your authoritative servers instead of being absorbed by the caching that stops ordinary query floods. The recursive resolvers of the internet become an unwitting distributed amplifier aimed at your zone, all while following the DNS protocol correctly.

Do I need to test recursive resolvers if I do not operate any?

You test resolution through them, not the resolvers themselves. You cannot harden third-party resolvers, but they are the path your users take, so success and latency have to be measured from their side. The recursive layer is also how an authoritative-tier attack is delivered, so understanding its behavior is part of characterizing your own exposure even though the infrastructure is not yours.

Does anycast solve DNS DDoS on its own?

Anycast is the most important structural defense for authoritative DNS because it splits a flood across many locations, but it is not automatic. Its value depends on the advertisement genuinely distributing queries and on each location having real capacity. A test confirms the distribution is working rather than assuming it. Anycast also does nothing at all against a resolver-side path failure or a DNSSEC misconfiguration.

Can DNSSEC make a DDoS worse?

It changes the math in two ways. Signed responses are larger, which raises the amplification factor you expose and the per-packet cost you pay under a flood. And validation adds a non-volumetric failure mode: an expired signature or a broken key rollover makes validating resolvers reject a reachable zone, which is a self-inflicted outage no attacker had to cause. DNSSEC is still worth running; it just adds operational discipline that belongs in the test.

Up and unreachable

Every layer of DDoS defense downstream of this one is about surviving traffic that has already found you. DNS is about whether it can find you in the first place, and that makes it the one layer where the failure mode is not degradation but disappearance.

The property that makes DNS worth its own test is that you can be entirely up and entirely unreachable at the same time, and none of your instruments will tell you, because they only see the users who resolved. The attack is often laundered through resolvers you do not own, its impact is buffered and delayed by caches you do not control, and its recovery lags for the same reason. It is the layer where the least of the machinery is yours and the most of the consequence is.

So the durable output of a DNS test is not a query-per-second ceiling, which drifts with every capacity change. It is a map of your resolution path: who answers for you and how independently, which of your names are cacheable and which an attacker can force uncacheable, where your anycast actually lands, how long your TTLs will hide an attack before it bites, and whether your zone survives losing a provider. That map is what you still know a year from now. The breaking rate is what you re-measure.

The traffic that takes you offline at this layer will not overwhelm your servers. It will make sure nobody ever asks them anything.