2026-05-28

Denoise · Twitter

Autonomous agents move from research to production, with a new infrastructure stack and attack surface taking shape.

Pay attention to the convergence on terminal-native coding agents and the new orchestration protocols from OpenAI and Anthropic defining the agent infrastructure layer.

2026-05-28T12:35:15Z · rules twitter-v1 · Healthy · tweets 25 · signals 25

Top 3 changes

AnthropicAI / AI Coding Agents: The release of Claude Code 1.5, a terminal-native agent, marks a structural shift in developer tooling away from IDE-centric assistants.
OpenAI / AI Infrastructure: A new agent SDK with protocol-level primitives signals a push to standardize the agent orchestration layer.
AnthropicAI / Security: The responsible disclosure of a complex agent jailbreak highlights the new, critical attack surface emerging around agent orchestration.

Strategic insights

#01 A new developer-tool paradigm is emerging around terminal-native agents, with Anthropic's Claude Code 1.5 directly challenging the incumbent IDE/Copilot model, a shift @karpathy calls underrated.

#02 The battle to define the agent infrastructure layer is heating up. OpenAI's new agent SDK and Anthropic's Model Context Protocol (MCP) represent competing standards for orchestration, while Vercel and Replit are building the serverless runtimes to host them.

#03 As autonomous agents become production-ready, security is shifting from theoretical prompt injection to practical exploits of the orchestration layer. Anthropic's jailbreak disclosure and Google DeepMind's red-team framework show this is now a primary concern.

#04 The discussion on LLM memory is maturing beyond simple RAG. Practitioners like @GregKamradt and dedicated tools like @mem0ai are promoting "context engineering", a more sophisticated approach to memory management that combines caching, retrieval strategies, and tiered memory stores.

Categories

Security & Reverse Engineering (3)

The security conversation is rapidly shifting from LLM prompt injection to the more complex attack surface of agent orchestration and tool interaction. Anthropic has published a formal jailbreak disclosure, and Google DeepMind has published a red-team framework.

Major labs are disclosing agent jailbreaks and releasing red-teaming frameworks, while practitioners are trying autonomous pentesting agents against real SaaS targets.

Anthropic @AnthropicAI · rising
Responsible disclosure on a Claude jailbreak chain we patched last week. Full write-up including our red team timeline.
♥ 5.2k · ↻ 910 · " 160 · ⟲ 220 · score 7.5k · +1 related
Google DeepMind @GoogleDeepMind · rising
New red team framework for prompt injection in autonomous agents. Covers cross-tool leakage, scanner evasion, and sandbox escape patterns.
♥ 880 · ↻ 140 · " 18 · ⟲ 38 · score 1.2k
MalwareTech @MalwareTechBlog · repeated
Autonomous agent running pentest flows against a real SaaS. First real-world run: fewer false positives than I expected on the vulnerability surface.
♥ 180 · ↻ 28 · " 3 · ⟲ 15 · score 245

AI Coding Tools & Agents (5)

The developer-tool landscape is re-centering around terminal agents. Anthropic's Claude Code 1.5 competes directly with the Copilot/IDE paradigm, and @karpathy has publicly endorsed the shift.

Anthropic released Claude Code 1.5, a terminal-native agent, prompting discussion on its performance against existing tools and the broader shift away from IDE-centric workflows.

Anthropic @AnthropicAI · rising
Claude Code 1.5 is live. Terminal-native coding agent with full Claude Opus reasoning, file-ops sandbox, and session replay.
♥ 4.8k · ↻ 820 · " 140 · ⟲ 190 · score 6.9k · +1 related
Andrej Karpathy @karpathy · rising
The developer-experience shift from IDE to terminal agent is underrated. Coding workflows are about to look nothing like 2024.
♥ 3.4k · ↻ 510 · " 30 · ⟲ 140 · score 4.5k
swyx @swyx · rising
Codex vs Claude Code terminal agent benchmarks. Pass@1 diverges more than I expected on the long-context editor tasks.
♥ 1.1k · ↻ 180 · " 22 · ⟲ 60 · score 1.6k
DSPy @dspy_ai · rising
DSPy 3.0: prompt optimization via compile-time search over system prompt variations. Benchmarks inside.
♥ 960 · ↻ 150 · " 12 · ⟲ 42 · score 1.3k
@levelsio @levelsio · rising
Switched my whole editor setup to Claude Code this week. Shipping faster than when I used Cursor + Copilot.
♥ 580 · ↻ 40 · " 6 · ⟲ 80 · score 678

AI Infra & Protocols (5)

A battle for the standard agent protocol is beginning. OpenAI's new SDK and Anthropic's Model Context Protocol represent competing visions for multi-agent orchestration, and LangChain is already wiring LangGraph agents into MCP.

OpenAI, Vercel, and Replit released new primitives for agent orchestration and deployment, focusing on SDKs, protocols, and serverless runtimes.

OpenAI @OpenAI · rising
New agent SDK: protocol-level tool calling, deployment harness, and multi-worker orchestration primitives. Docs live.
♥ 4.2k · ↻ 680 · " 75 · ⟲ 180 · score 5.8k
LangChain @LangChainAI · rising
MCP protocol integration thread. How to wire existing LangGraph agents into the Anthropic Model Context Protocol server spec.
♥ 920 · ↻ 145 · " 14 · ⟲ 48 · score 1.3k
Vercel @vercel · rising
Edge runtime for agent workers is live. Spawn durable background agents from any serverless deployment.
♥ 540 · ↻ 80 · " 6 · ⟲ 22 · score 718
Alex Albert @AlexAlbert__ · rising
When your security scanner finds nothing scary on an agent deploy, check the orchestration layer again. That's usually where the jailbreak sneaks through.
♥ 420 · ↻ 60 · " 8 · ⟲ 35 · score 564
Replit @replit · rising
New agent deployment harness. One command to go from local orchestration to hosted agent worker.
♥ 380 · ↻ 55 · " 5 · ⟲ 18 · score 505

On-device & Multimodal AI (1)

Foundation model labs continue to release high-quality open datasets such as Mistral AI's OCR data. These releases let the broader community train specialized models and commoditize a key part of the data pipeline.

Mistral AI released a 100M-row web OCR dataset, cleaned and licensed, for training multimodal models.

Mistral AI @MistralAI · rising
Open dataset release: 100M-row web OCR dataset. Cleaned, licensed, ready to train.
♥ 2.6k · ↻ 390 · " 30 · ⟲ 88 · score 3.5k

Memory, RAG & Context (4)

The naive approach of 'just use RAG' is being replaced by more complex memory architectures, with @GregKamradt, @mem0ai, and LlamaIndex all proposing different layers of memory management beyond simple vector retrieval.

The focus shifts from simply increasing context window size to sophisticated 'context engineering,' including caching strategies and specialized memory layers for agents.

Vaibhav Srivastav @reach_vb · rising
Tested the new 10M context memory window end to end. Surprising failure modes around rag retrieval cache invalidation, thread below.
♥ 1.9k · ↻ 260 · " 22 · ⟲ 75 · score 2.5k
Greg Kamradt @GregKamradt · rising
RAG is dead, long live context engineering. My framework for when to cache, when to retrieve, and when to just dump memory into the prompt.
♥ 820 · ↻ 130 · " 16 · ⟲ 54 · score 1.1k
mem0 @mem0ai · rising
Memory layer for agents: differentiating working memory from the subconscious store. Vector index isn't enough anymore.
♥ 480 · ↻ 72 · " 5 · ⟲ 25 · score 639
LlamaIndex @llamaindex · repeated
Knowledge graph retrieval walkthrough: when semantic vector search misses, graph hop beats it every time.
♥ 290 · ↻ 40 · " 2 · ⟲ 11 · score 376

Other (4)

The concept of orchestration is becoming mainstream, with workflow engines like Temporal positioning themselves for AI agents, while productivity tools like Notion and Linear implement similar, albeit simpler, automation patterns.

Workspace automation features are launching in Notion and Linear, while Temporal discusses durable workflows for orchestrating agents.

Notion @NotionHQ · rising
Notion workspace automation is out of beta. Auto-fill tables, chained updates across databases, and a new audit log surface.
♥ 820 · ↻ 125 · " 12 · ⟲ 38 · score 1.1k
Linear @linear · rising
Linear now auto-triages incoming issues. Quiet launch, but already our favorite workspace feature of the year.
♥ 460 · ↻ 70 · " 6 · ⟲ 24 · score 618
Temporal @temporalio · repeated
Orchestrating agents with durable workflows: replayable, resumable, and multi-worker by default. Walkthrough from our infra team.
♥ 310 · ↻ 48 · " 4 · ⟲ 14 · score 418
James Clear @jamesclear · repeated
The best habit tracker is the one you actually open. Three open-source alternatives worth trying.
♥ 280 · ↻ 42 · " 3 · ⟲ 18 · score 373

Prompt & Skill Libraries (2)

Prompt engineering is evolving into a systematic, data-driven discipline, with efforts from Weights & Biases and individual engineers demonstrating a move from anecdotal tricks to scalable, benchmark-driven optimization.

Practitioners are sharing empirical results from large-scale system prompt benchmarking and reviews of production prompts.

dotey @dotey · rising
Five prompt tricks learned this week from reviewing 200 production prompts. Short thread.
♥ 510 · ↻ 88 · " 8 · ⟲ 30 · score 710
Weights & Biases @weights_biases · rising
System prompt benchmarking at scale: we ran 40k variants across 6 frontier models. The efficient frontier is not where you think.
♥ 420 · ↻ 55 · " 6 · ⟲ 20 · score 548

ML & GPU Infrastructure (1)

As agent training relies more on synthetic data, curating and filtering that data is becoming a critical and non-obvious part of the MLOps stack, as @jerryjliu0 highlights. Synthetic samples that look good can still poison generalization.

A practitioner shared insights on the critical process of curating and filtering synthetic data to avoid poisoning agent generalization during training.

Jerry Liu @jerryjliu0 · repeated
Dataset curation for agent training: how we filter synthetic data that looks good but poisons generalization.
♥ 260 · ↻ 36 · " 2 · ⟲ 11 · score 338