Security & Reverse Engineering (3)
The security conversation is shifting from LLM prompt injection to the more complex attack surface of agent orchestration and tool interaction: Anthropic has published a disclosure write-up for a patched Claude jailbreak chain, and Google DeepMind has released a red team framework for prompt injection in autonomous agents.
Major labs are disclosing agent jailbreaks and releasing red-teaming frameworks, while practitioners are testing agent-based pentesting in the wild.
Anthropic @AnthropicAI (rising): Responsible disclosure on a Claude jailbreak chain we patched last week. Full write-up including our red team timeline.
Google DeepMind @GoogleDeepMind (rising): New red team framework for prompt injection in autonomous agents. Covers cross-tool leakage, scanner evasion, and sandbox escape patterns.
MalwareTech @MalwareTechBlog (repeated): Autonomous agent running pentest flows against a real SaaS. First real-world run: fewer false positives than I expected on the vulnerability surface.