Critical OT Operator Validates IT/OT Boundary Resilience Without Impact to Real-Time Control

A critical-infrastructure operator running power-generation and transmission OT systems engaged BlackNeuron for a DDoS resilience validation across the IT/OT boundary. The operator's threat model includes capable adversaries — state-aligned actors, supply-chain compromise vectors, and opportunistic ransomware groups — but the operational constraints differ fundamentally from IT-only environments. Real-time control signals between SCADA systems and field devices cannot tolerate latency beyond defined thresholds without affecting grid stability. Any defensive control introduced into the OT communication path must demonstrably not impact those thresholds. The validation needed to confirm both that the IT/OT boundary held under adversarial pressure and that the OT side's operational continuity was preserved.

The operator's pre-engagement architecture followed the Purdue Model with layered segmentation: enterprise IT (Level 4), DMZ for IT/OT data exchange (Level 3.5), supervisory OT (Level 3), control systems (Level 2), basic control (Level 1), and physical processes (Level 0). DDoS pressure was historically considered an IT problem. The engagement scope expanded the threat model: an IT-side DDoS that exhausts the IT/OT DMZ's east-west capacity could disrupt operational data flows even when the control systems themselves remained reachable from the OT side.

The threat profile

OT-relevant DDoS attack patterns differ from typical IT patterns. Volumetric attacks against IT-side egress capacity matter primarily as availability issues for engineering workstations and remote-access infrastructure. More consequentially, attacks against the IT/OT DMZ's data-historian endpoints and operational-dashboard infrastructure can degrade the operations team's situational awareness without touching the control systems. The operational risk in this class of attack comes from reducing the operators' ability to observe, not from disrupting control directly. Targeted Slowloris-class attacks against the data-historian's HTTP interface, for example, can saturate connection pools at low traffic volume while the underlying control plane remains nominally functional.

A second class of threat: attacks against the remote-engineering-access infrastructure (typically VPN gateways or jump hosts in the Level 3.5 DMZ). Degrading access disrupts the engineering team's ability to respond to operational events, including events triggered by the attack itself. The attack pattern enables operational impact while the control systems remain technically reachable.

The engagement excluded testing of any control system from the OT side. Any OT-side test would have required outage-window scheduling with grid operators — a process the operator was not prepared to authorize for a validation engagement. The validation focused on IT-side resilience and IT/OT boundary integrity.

Engagement structure

The validation was structured over twelve weeks, with three testing windows scheduled around operator change-control windows. All testing was performed against the operator's IT-side infrastructure and the IT/OT DMZ. The OT-side infrastructure was monitored for operational metrics during testing windows but was not directly targeted. Pre-engagement, the operator's defensive runbook for DDoS scenarios was reviewed and aligned with operational continuity requirements; the validation included exercise of the runbook itself, not just the technical infrastructure.

Particular attention was paid to telemetry: confirming that operational data flows from OT to IT-side historians remained within latency thresholds during attack windows, that the SOC's IT-side detection capabilities engaged without triggering operations-side false alarms, and that the engineering team's remote-access path remained operational under sustained pressure.
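
The latency check described above can be sketched as a per-window percentile comparison. This is a minimal illustration, assuming latency samples in milliseconds are already collected for each testing window; the function name, the 250 ms threshold, and the sample values are hypothetical and are not figures from the engagement.

```python
import statistics


def window_within_threshold(samples_ms, threshold_ms, percentile=99):
    """Return (ok, observed), where observed is the given percentile of the window."""
    if len(samples_ms) < 2:
        raise ValueError("need at least two samples")
    # quantiles(n=100) returns 99 cut points; index p-1 is the p-th percentile.
    cuts = statistics.quantiles(samples_ms, n=100, method="inclusive")
    observed = cuts[percentile - 1]
    return observed <= threshold_ms, observed


# Hypothetical samples: a quiet window and a window with latency spikes.
quiet_window = [40, 42, 41, 45, 43, 44, 40, 39, 41, 42]
spiky_window = [40, 42, 300, 310, 305, 41, 43, 320, 315, 44]

print(window_within_threshold(quiet_window, threshold_ms=250))
print(window_within_threshold(spiky_window, threshold_ms=250))
```

Comparing a high percentile rather than the mean keeps short latency spikes from being averaged away, which matters when the threshold is a hard operational limit.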

Attack vectors exercised

L3/L4 volumetric against the operator's IT-side public-facing infrastructure (corporate-VPN endpoints, remote-engineering-access gateways, operational-dashboard endpoints). Multi-vector, multi-source traffic peaking at 40 Gbps was directed at these endpoints. Edge absorption was provided by an upstream scrubbing service. Observable impact on IT-side infrastructure was within tolerance; OT-side operational data flows showed no measurable degradation, confirming segmentation effectiveness at the network layer.

L7 attacks against IT/OT DMZ endpoints. HTTP traffic at sustained 2,500 RPS was directed at the data-historian API endpoint. The historian's API was rate-limited at 1,000 requests per minute per IP — adequate against per-IP threats but not against distributed attacks. Connection-pool exhaustion on the historian was observed within four minutes. The historian's degradation did not affect the OT-side data acquisition (which continues even when the IT-side historian is unreachable) but did interrupt the operations team's IT-side dashboards. The finding: operational situational awareness degraded under attack even though control systems remained operational.
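
The gap between the per-IP limit and the test load can be made concrete with a quick calculation. The sketch below uses the two figures stated above (2,500 RPS and 1,000 requests per minute per IP) and assumes, for illustration only, that the attack traffic is spread evenly across its sources; the function name is hypothetical.

```python
import math


def min_sources_under_limit(attack_rps, per_ip_limit_per_min):
    """Smallest number of source IPs that can deliver attack_rps
    while each source stays at or below the per-IP limit."""
    attack_per_min = attack_rps * 60
    return math.ceil(attack_per_min / per_ip_limit_per_min)


# 2,500 RPS is 150,000 requests per minute in aggregate.
print(min_sources_under_limit(2500, 1000))  # 150
```

Under that assumption, 150 sources are enough to sustain the full test load without any single source tripping the per-IP limit, which is why a per-IP control does not bound the aggregate.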

Slow-attack family against remote-engineering-access infrastructure. 3,000 Slowloris connections against the engineering jump host's HTTPS interface saturated its connection slots within five minutes. The jump host's idle-timeout configuration had been set during initial deployment six years prior and never re-evaluated. The engineering team's ability to respond remotely to operational events during attack windows was demonstrably degraded.
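
Slowloris-style clients hold a slot open by sending a few bytes of an unfinished request at intervals, so each trickle resets an idle timer. One common mitigation pairs the idle timeout with an absolute deadline for completing the request. The sketch below illustrates that general technique; the class, the function, and both timeout values are hypothetical and do not describe the operator's jump-host configuration.

```python
from dataclasses import dataclass


@dataclass
class Conn:
    opened_at: float        # seconds, when the connection was accepted
    last_byte_at: float     # seconds, when the client last sent any data
    request_complete: bool  # True once a full request has been received


def should_drop(conn, now, idle_timeout_s=10.0, request_deadline_s=30.0):
    """Decide whether to close a connection that is still sending its request."""
    if conn.request_complete:
        return False
    # Idle check: catches clients that go completely silent.
    if now - conn.last_byte_at > idle_timeout_s:
        return True
    # Absolute deadline: catches clients that trickle bytes to stay "active".
    return now - conn.opened_at > request_deadline_s


# A trickling client: last byte 2 s ago, but the request has been open for 60 s.
trickle = Conn(opened_at=0.0, last_byte_at=58.0, request_complete=False)
print(should_drop(trickle, now=60.0))  # True
```

In this sketch the idle check alone would keep the trickling connection alive indefinitely; the absolute deadline is what frees the slot.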

Authentication endpoint pressure. Credential-stuffing patterns were directed at the engineering-access SSO endpoint at a sustained 600 requests per minute. The SSO infrastructure absorbed the volume without rate-limit engagement (the configured rate limit was higher than the test load). The finding: SSO logs already surfaced thousands of failed authentication attempts daily as baseline noise, and the attack-condition pattern was indistinguishable from that noise in the operator's existing log analytics. The detection gap was not technical but observational.

Findings

Five findings, ordered by operational risk:

1. Remote-access infrastructure resilience. The engineering jump host's idle-timeout and connection-pool configuration was inadequate against slow-attack family vectors. During the attack window, engineering response capability was degraded, a higher-order operational risk than direct control-system disruption.
2. IT-side situational awareness degradation. Operational dashboards' resilience to L7 DDoS was insufficient. Even though OT-side data acquisition continued, IT-side visualization degraded, reducing the operations team's ability to observe operational state during a coincident operational event.
3. SSO log noise masking attack patterns. Credential-stuffing attempts were operationally invisible against background-noise volume of failed authentications. Threshold-based detection had not been calibrated against attack-condition baseline shift.
4. OT-side latency margins preserved. Positive finding: control-plane communication latency between Level 3 supervisory systems and Level 2 controllers remained within operational thresholds throughout all testing windows. The IT/OT segmentation effectively isolated OT-side performance from IT-side pressure.
5. Runbook exercise findings. The operator's DDoS response runbook had not been exercised in eighteen months. Several procedures referenced personnel and contact information that had changed. Vendor escalation procedures referenced support relationships that had been re-negotiated. The runbook was technically present but operationally stale.

Remediation

The engineering jump host's connection-pool capacity was increased and idle timeouts tuned to defeat slow-attack family vectors. Operational dashboards were moved to a higher-availability tier with independent rate-limit configuration. The SSO infrastructure's log analytics were tuned to detect rate-shift patterns in failed authentication volume rather than relying on absolute thresholds. The DDoS response runbook was updated, exercised in tabletop format with the affected teams, and added to the operator's quarterly tabletop schedule.
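
Rate-shift detection on failed authentication volume, as opposed to an absolute threshold, can be sketched as a comparison of the current window against a rolling baseline. This is a minimal illustration, assuming per-minute failure counts are available from the SSO logs; the function name, the baseline values, and the factor of 3 are hypothetical, and only the 600-per-minute figure comes from the test described earlier.

```python
import statistics


def rate_shift(baseline_counts, current_count, factor=3.0):
    """Flag when the current window's failed-auth count is at least
    `factor` times the median of the baseline windows."""
    if not baseline_counts:
        raise ValueError("baseline must not be empty")
    baseline = statistics.median(baseline_counts)
    # Guard against a zero baseline so a quiet history does not divide by zero.
    ratio = current_count / max(baseline, 1)
    return ratio >= factor, ratio


# Hypothetical per-minute failure counts under normal conditions.
normal_minutes = [4, 6, 5, 7, 5, 6]

print(rate_shift(normal_minutes, current_count=8))    # ordinary fluctuation
print(rate_shift(normal_minutes, current_count=600))  # sustained test load
```

Using the median as the baseline keeps a single noisy minute from moving the reference point, and the ratio makes the alert relative to what is normal for this endpoint instead of a fixed count.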

The validation engagement was documented and submitted as part of the operator's NERC CIP compliance evidence package, supporting the operator's claim of having tested defensive controls against the documented threat model.

Outcome

The operator now possesses validated, auditable evidence that IT-side DDoS pressure of meaningful magnitude (multi-vector traffic peaking at 40 Gbps) does not measurably affect OT-side operational continuity or control-system reachability. More importantly for the operator's risk posture, the IT-side findings, particularly the engineering-access resilience and dashboard-degradation findings, surfaced operational risks that had not previously been articulated in the operator's risk register. These are now tracked and have remediation owners.

The instructive part

OT-environment DDoS resilience surfaces a class of findings that pure-IT environments rarely face: the operational risk is not just service unavailability but degradation of the operations team's ability to observe and respond to coincident events. A DDoS attack that does not directly disrupt control systems but degrades the operators' visibility into those systems creates higher-order risk than a direct control-system attack. The relevant defensive measure is not just whether the control systems remain reachable, but whether the operators retain situational awareness and response capability throughout the attack window. This is a property of the entire IT-side infrastructure, not just the immediate attack target — and it cannot be verified except by exercising the conditions under which it would be tested in production.
