Security & Reverse Engineering (3)
Security discourse is shifting from model-level exploits toward vulnerabilities in the agent orchestration and tool-use layers, as illustrated by Anthropic's jailbreak analysis and @MalwareTechBlog's pentesting agent.
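Posts from Anthropic and Google DeepMind both center on red teaming: Anthropic disclosed a patched Claude jailbreak chain with its red team timeline, and Google DeepMind released a framework for prompt injection in autonomous agents that goes beyond single-prompt attacks to multi-step patterns such as cross-tool leakage, scanner evasion, and sandbox escape.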
Anthropic (@AnthropicAI) [rising]: Responsible disclosure on a Claude jailbreak chain we patched last week. Full write-up including our red team timeline.
Google DeepMind (@GoogleDeepMind) [rising]: New red team framework for prompt injection in autonomous agents. Covers cross-tool leakage, scanner evasion, and sandbox escape patterns.
MalwareTech (@MalwareTechBlog) [repeated]: Autonomous agent running pentest flows against a real SaaS. First real-world run: fewer false positives than I expected on the vulnerability surface.