Security & Reverse Engineering (3)
Red-teaming focus is shifting from simple, single-turn prompt injection to complex, stateful attacks on agent frameworks, as suggested by Anthropic's jailbreak-chain disclosure and DeepMind's new red team framework.
Major labs are publicly disclosing vulnerabilities and building frameworks for agent security, with a focus on weaknesses in orchestration and multi-tool systems.
Anthropic (@AnthropicAI, rising): Responsible disclosure on a Claude jailbreak chain we patched last week. Full write-up including our red team timeline.
Google DeepMind (@GoogleDeepMind, rising): New red team framework for prompt injection in autonomous agents. Covers cross-tool leakage, scanner evasion, and sandbox escape patterns.
MalwareTech (@MalwareTechBlog, repeated): Autonomous agent running pentest flows against a real SaaS. First real-world run: fewer false positives than I expected on the vulnerability surface.