Why AI and deterministic offensive security tools should be used together

"Practitioners do not experience their work as a choice between paradigms. They experience it as a sequence of problems."

Share
Why AI and deterministic offensive security tools should be used together

The current conversation about AI in offensive security has collapsed into a false binary: either AI is about to take over the entire workflow, or it is an overhyped distraction from the tools that actually work. Both framings are wrong, and both do real damage to practitioners trying to make sensible decisions between them.

The more useful question is where each approach earns its place. AI improves precision at scale and expands coverage in ways that deterministic tools cannot. Deterministic tools produce the reproducible, auditable proof that offensive security has always demanded.

In a well-designed workflow, these are complementary: AI cleans the signal and broadens the surface area; deterministic tools validate exposure with evidence that can withstand scrutiny.

Getting the combination wrong produces either noise or missed findings, and as AI-accelerated discovery raises the volume of findings teams must process, getting it right becomes more consequential.

The binary framing serves vendors, not practitioners

The "AI versus traditional tools" conversation is largely a vendor conversation. Practitioners do not experience their work as a choice between paradigms. They experience it as a sequence of problems: too many assets, too many potential findings, not enough time to validate everything, and a standard of proof that does not lower just because the queue is growing.

Deterministic tools exist precisely because offensive security operates under an evidentiary standard: a finding must be reproducible under specific conditions, and an attacker must demonstrably be able to exploit it in a specific way.

That standard matters because findings do not stay inside the security team. They reach developers, clients, compliance functions, and executive leadership, and each of those audiences will ask the same question: how do you know? The answer cannot be a confidence score.

AI-based approaches handle pattern recognition at scale, tolerate ambiguity, and solve problems that defeat rule-based logic. These are real capabilities.

The mistake is deploying them where the evidence bar is highest and then inheriting the consequences when findings cannot be defended.

Where deterministic tools remain irreplaceable

Exploit validation is the clearest case. The output of a well-executed validation is proof: a specific flaw exists, under specific conditions, and can be exploited in a specific way.

That proof is what moves remediation decisions forward: in developer backlogs, in board-level risk conversations, and in regulatory contexts where "the scanner flagged it" is not a sufficient answer.

This matters more as discovery volume grows. AI-assisted discovery is already compressing the front end of the vulnerability lifecycle faster than the back end can absorb. Remediation capacity is not scaling at the same rate.

When the queue grows faster than it drains, the quality of evidence attached to each finding determines whether it gets acted on or buried. Teams best positioned for this environment can attach defensible proof to the findings that matter most, and that requires deterministic tools doing what they do reliably.

The same applies to scan orchestration. When a practitioner defines the logic and the tool executes it faithfully and transparently, the output is auditable. The practitioner knows exactly what ran, in what order, under what conditions, and that auditability is the foundation of a defensible engagement.
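As an illustrative sketch of that auditability, consider an orchestrator that records every step it executes: which tool ran, with which arguments, in what order, and a hash of the evidence it produced. The class and tool names here are hypothetical, and the scanner is stubbed; the point is the append-only log, not the tooling.

```python
import hashlib
import json
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class AuditedPipeline:
    """Hypothetical orchestrator: the practitioner declares the steps,
    and every execution lands in an append-only audit log."""
    log: list = field(default_factory=list)

    def run_step(self, tool: str, args: dict, runner) -> dict:
        entry = {
            "tool": tool,
            "args": args,
            "started": datetime.now(timezone.utc).isoformat(),
        }
        result = runner(**args)  # the deterministic tool itself
        entry["finished"] = datetime.now(timezone.utc).isoformat()
        # Hash the output so the attached evidence can be verified later.
        entry["result_sha256"] = hashlib.sha256(
            json.dumps(result, sort_keys=True).encode()
        ).hexdigest()
        self.log.append(entry)
        return result


pipeline = AuditedPipeline()
# Stub runner standing in for a real port scanner.
ports = pipeline.run_step(
    "port_scan", {"target": "198.51.100.7"},
    runner=lambda target: {"target": target, "open_ports": [22, 443]},
)
print([e["tool"] for e in pipeline.log])  # what ran, in order
```

The log answers "what ran, in what order, under what conditions" directly, which is exactly the question a defensible engagement has to survive.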

Where AI genuinely improves the workflow

AI earns its place where deterministic approaches hit structural limits: where signal-to-noise, rather than the evidence bar, becomes the dominant problem.

False positive management is the most concrete example. In Dynamic Application Security Testing (DAST), HTTP responses at scale produce a classification problem that exhausts rule-based logic. A machine learning classifier that handles this semantically, distinguishing genuine findings from soft-404 pages, instrumentation noise, and ambiguous responses, can meaningfully reduce the false positive rate while maintaining high detection precision.
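To make the soft-404 problem concrete, here is a toy stand-in for such a classifier. A production system would learn its features and weights from labelled responses; the markers and thresholds below are invented purely to show the shape of the decision.

```python
import re

# Illustrative only: features a real classifier might learn.
# The patterns and weights are made up for this sketch.
SOFT_404_MARKERS = re.compile(
    r"(page not found|doesn't exist|no longer available|try searching)",
    re.IGNORECASE,
)


def classify_response(status: int, body: str, baseline_len: int) -> str:
    """Decide whether an HTTP 200 response is a genuine page or a
    soft-404 that would otherwise become a false positive."""
    score = 0.0
    if SOFT_404_MARKERS.search(body):
        score += 0.6  # error language despite a 200 status
    if abs(len(body) - baseline_len) < 50:
        score += 0.3  # same size as a known error page
    if status == 200 and score >= 0.5:
        return "soft-404"  # suppress: likely false positive
    return "candidate-finding"  # pass through to deterministic validation


print(classify_response(200, "Sorry, page not found.", baseline_len=25))
# → soft-404
```

Note the asymmetry: the filter only suppresses noise; anything it passes through still goes to deterministic validation, so the evidentiary standard is untouched.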

The AI here is filtering input so the deterministic scanner's outputs are more reliable. In a high-volume discovery environment, that filtering function becomes load-bearing: the difference between a scanner whose results practitioners trust and one whose results they have learned to second-guess.

Coverage expansion is a second area. AI can map logical flows within web applications, surfacing hidden endpoints and expanding what a deterministic scanner can reach, reducing the gap between what an attacker would find and what the assessment actually covers.

Triage overhead is a third. When the queue grows faster than teams can drain it, reducing the cognitive load attached to each finding matters. AI that enriches vulnerability descriptions, contextualising severity and suggesting remediation paths, addresses a capacity problem that will only intensify as discovery accelerates.

In each case, AI improves the conditions under which deterministic tools operate, without displacing what those tools produce.

The capacity problem reframes the entire question

There is a broader shift underway that makes this argument more urgent. Discovery is becoming a commodity. The scarce resource is no longer the ability to find vulnerabilities; it is the ability to validate, prioritise, and remediate them faster than they accumulate.

Systems like Mythos, the AI bug-hunting platform that recently demonstrated the ability to identify vulnerabilities at a scale and pace no human team could match, illustrate the direction of travel.

The front end of the vulnerability lifecycle is accelerating. CVE and NVD infrastructure was designed for human-paced discovery, and AI-assisted discovery compresses the timeline without compressing the remediation process. The gap between the two is widening.

This is pushing security teams toward something that resembles chronic disease management more than a traditional patch cycle. Not all vulnerabilities can be fixed in the timeframe they are found. Some will be knowingly managed rather than remediated, assessed and accepted by design rather than ignored by default. The governance challenge is maintaining that distinction between calculated risk acceptance and capacity failure dressed up as a decision.

Tooling that blurs the line between an AI-flagged hypothesis and a deterministically validated finding makes that governance challenge harder, producing dashboards that cannot distinguish between "we assessed this and accepted it" and "we have not gotten to it yet." In a high-volume environment, that ambiguity has real consequences.

Human oversight as a design choice, not a disclaimer

Products that describe human oversight as a feature while designing it out of actual execution are making a consequential mistake. Genuine human-in-the-loop means the practitioner is a required gate between AI intent and tool action, not a downstream recipient of AI decisions who is nominally invited to review outputs after the fact.

Real oversight in practice means every tool call requires explicit human approval, and produces outputs that are predictable and inspectable.
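Structurally, that gate is simple to express. In this sketch (function and action names are hypothetical), the AI can propose a tool call, but nothing executes until an approval callback, which in practice would be an interactive practitioner prompt, says yes.

```python
from typing import Callable


def human_gate(
    action: str,
    payload: dict,
    approve: Callable[[str, dict], bool],
) -> dict:
    """Hypothetical gate: the AI may propose a tool call, but the
    practitioner is a required step between intent and execution."""
    if not approve(action, payload):
        return {"status": "rejected", "action": action}
    # Only after explicit approval does the deterministic tool run.
    return {"status": "executed", "action": action}


# Stubbed approval: in reality an interactive prompt; here it simply
# rejects anything outside an agreed engagement scope.
def in_scope(action: str, payload: dict) -> bool:
    return payload.get("target", "").endswith(".example.com")


print(human_gate("exploit_check", {"target": "app.example.com"}, in_scope))
print(human_gate("exploit_check", {"target": "evil.test"}, in_scope))
```

The design point is that approval sits on the execution path itself, not in a review queue downstream of it.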

Offensive security works the way it does because the practitioner owns the finding, understands its context, and stands behind it. Good AI and good automation shorten the time to reach that moment. They do not eliminate it.

Complementary by design

Precision about what each approach provides, and where each belongs, is more useful than picking a side in a debate that was never really about the technology.

AI improves the early and mid-stages: filtering signal, expanding coverage, reducing triage overhead, and handling the volume problems that rule-based approaches cannot absorb at scale. Deterministic tools provide what the back end demands: reproducible, auditable proof that survives the scrutiny of every audience who ultimately acts on a finding.

Together, with clear boundaries between them, that combination is what a mature offensive security practice looks like in an environment where discovery is outpacing remediation, and where every finding needs to earn its place in the queue.

Adrian Furtuna is Founder and CEO of Pentest-Tools.com