Markus Vervier
Markus Vervier is CEO of Persistent Security and Director at X41 D-Sec GmbH, a specialized application security, penetration testing, and red/purple-teaming provider. Over the past 18 years he has worked as a security researcher, code auditor, and penetration tester. His work includes security analysis and reverse engineering of embedded firmware for mobile devices, discovering vulnerabilities in Signal Private Messenger (with JP Aumasson), and finding a remote vulnerability in libOTR. He is currently
active in the development of offensive security tooling and platforms that break AI security defenses.
Session
AI-powered pentesting is the latest hype. Slap an LLM agent on top of well-known offensive
tools built by humans in their free time, run it in YOLO mode, and call it autonomous security
testing. Valuations are going through the roof!
Here is the thing though: these agents consume untrusted input from the very targets they are
testing by design.
Current discourse around AI agent security focuses on prompt injection through direct
interaction. But what about the agent's environment itself? What happens when the attack
surface the agent is exploring has been prepared by an adversary? What if the authentication
service referenced in that one GitHub issue is actually a honeypot?
In this presentation we will demonstrate a complete attack framework against AI pentesting
agents and release it as open source. We show how to inject tracking payloads at scale into any
platform with user-generated content, operate fake services that capture credentials from AI
agents, and turn every future AI pentest engagement against a sprayed target into a passive
credential harvesting fest. No ongoing effort required, no exploits needed. The AI leaks to us,
fully automated!
The attacker does not need to talk to the agent. They just leave breadcrumbs where the agent
will find them during reconnaissance. A hint about a backup authentication endpoint in a GitHub
issue. A debug configuration in a support ticket. SSO metadata in a user profile bio. The agent
discovers these reasons they are worth investigating, and acts on them with whatever
credentials and access it was given.
SSO authentication is a particularly brutal example because determining if they are in-scope is
difficult: When logging in, anyone must follow OAuth/OIDC redirects to external domains to test
authenticated applications, and they need to be told how do distinguish a legitimate Identity
Provider from a fake one we planted in user content.
But SSO is just one instance of the fundamental problem: the AI makes decisions based on
content it should not trust, and no amount of prompt engineering changes unless you know in
advance what the target will look like. We want to shed some light on the complications that
arise when putting AI literally to the test!