2026-05-21

Denoise · Twitter

AI agents are moving from chat to the terminal, with new SDKs and coding tools reshaping developer workflows.

Pay attention to the convergence on terminal-native AI agents and the underlying orchestration protocols, as major players like Anthropic and OpenAI release new tools.

2026-05-21 · 2026-05-21T12:20:50Z · rules twitter-v1 · Healthy · tweets 25 · signals 25

Top 3 changes

AnthropicAI / Coding Agents: Released Claude Code 1.5, a terminal-native coding agent, signaling a major push away from IDE-centric AI assistants.
OpenAI / Agent Infrastructure: Launched a new agent SDK with protocol-level primitives, aiming to standardize how autonomous agents are built and orchestrated.
karpathy / Developer Experience: Articulated the underrated shift from IDEs to terminal agents, framing the new wave of coding tools as a fundamental workflow change.

Strategic insights

#01 The primary interface for AI coding assistants is shifting from the IDE to the terminal. Anthropic's Claude Code 1.5 release and commentary from @karpathy and @levelsio signal a convergence on this new developer workflow.

#02 Agent orchestration is becoming the key infrastructure battleground. OpenAI's SDK, Vercel's edge workers, and Replit's deployment harness all provide primitives for managing multi-worker, durable agents, moving beyond single-shot execution.

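#03 As agent capabilities expand, the security focus is shifting to the orchestration layer. Anthropic's disclosure of a patched jailbreak chain, @GoogleDeepMind's red team framework for autonomous agents, and @AlexAlbert__'s warning about agent deploys all point to vulnerabilities that emerge from how agents and tools interact (cross-tool leakage, sandbox escape), not only from a single malicious prompt.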

#04 The concept of RAG is being replaced by more sophisticated 'context engineering.' Discussions by @GregKamradt, @reach_vb, and @mem0ai show a move towards multi-layered memory systems and complex caching strategies beyond simple vector retrieval.

Categories

Security & Reverse Engineering (3)

The security conversation is moving up the stack from simple prompt injection to exploits in the agent orchestration layer, a point @AlexAlbert__ makes directly, while @MalwareTechBlog is already running autonomous agents in real pentest flows.

Red teaming now targets autonomous agents: Anthropic disclosed a patched Claude jailbreak chain, and Google DeepMind published a red team framework for prompt injection in agents.

Anthropic @AnthropicAI · rising
Responsible disclosure on a Claude jailbreak chain we patched last week. Full write-up including our red team timeline.
♥ 5.2k · ↻ 910 · " 160 · ⟲ 220 · score 7.5k · +1 related
Google DeepMind @GoogleDeepMind · rising
New red team framework for prompt injection in autonomous agents. Covers cross-tool leakage, scanner evasion, and sandbox escape patterns.
♥ 880 · ↻ 140 · " 18 · ⟲ 38 · score 1.2k
MalwareTech @MalwareTechBlog · repeated
Autonomous agent running pentest flows against a real SaaS. First real-world run: fewer false positives than I expected on the vulnerability surface.
♥ 180 · ↻ 28 · " 3 · ⟲ 15 · score 245

AI Coding Tools & Agents (5)

The competition in coding agents, highlighted by @swyx's benchmarks, is now centered on terminal-based workflows and long-context reasoning, with developers like @levelsio already migrating.

Anthropic's launch of Claude Code 1.5, a terminal-native agent, validates the trend of moving AI coding tools out of the IDE.

Anthropic @AnthropicAI · rising
Claude Code 1.5 is live. Terminal-native coding agent with full Claude Opus reasoning, file-ops sandbox, and session replay.
♥ 4.8k · ↻ 820 · " 140 · ⟲ 190 · score 6.9k · +1 related
Andrej Karpathy @karpathy · rising
The developer-experience shift from IDE to terminal agent is underrated. Coding workflows are about to look nothing like 2024.
♥ 3.4k · ↻ 510 · " 30 · ⟲ 140 · score 4.5k
swyx @swyx · rising
Codex vs Claude Code terminal agent benchmarks. Pass@1 diverges more than I expected on the long-context editor tasks.
♥ 1.1k · ↻ 180 · " 22 · ⟲ 60 · score 1.6k
DSPy @dspy_ai · rising
DSPy 3.0: prompt optimization via compile-time search over system prompt variations. Benchmarks inside.
♥ 960 · ↻ 150 · " 12 · ⟲ 42 · score 1.3k
@levelsio @levelsio · rising
Switched my whole editor setup to Claude Code this week. Shipping faster than when I used Cursor + Copilot.
♥ 580 · ↻ 40 · " 6 · ⟲ 80 · score 678

AI Infra & Protocols (5)

There's a clear convergence on standardizing agent interaction: OpenAI is shipping protocol-level tool calling in its agent SDK, and LangChain is showing how to wire existing LangGraph agents into Anthropic's Model Context Protocol, indicating a move towards an interoperable agent ecosystem.

Major infrastructure providers like OpenAI, Vercel, and Replit are releasing SDKs and runtimes for orchestrating durable, multi-worker agents.

OpenAI @OpenAI · rising
New agent SDK: protocol-level tool calling, deployment harness, and multi-worker orchestration primitives. Docs live.
♥ 4.2k · ↻ 680 · " 75 · ⟲ 180 · score 5.8k
LangChain @LangChainAI · rising
MCP protocol integration thread. How to wire existing LangGraph agents into the Anthropic Model Context Protocol server spec.
♥ 920 · ↻ 145 · " 14 · ⟲ 48 · score 1.3k
Vercel @vercel · rising
Edge runtime for agent workers is live. Spawn durable background agents from any serverless deployment.
♥ 540 · ↻ 80 · " 6 · ⟲ 22 · score 718
Alex Albert @AlexAlbert__ · rising
When your security scanner finds nothing scary on an agent deploy, check the orchestration layer again. That's usually where the jailbreak sneaks through.
♥ 420 · ↻ 60 · " 8 · ⟲ 35 · score 564
Replit @replit · rising
New agent deployment harness. One command to go from local orchestration to hosted agent worker.
♥ 380 · ↻ 55 · " 5 · ⟲ 18 · score 505

On-device & Multimodal AI (1)

While the agent ecosystem is the focus, Mistral AI's release shows that foundational dataset work remains a critical, if less visible, driver of progress for models that need to parse the visual web.

Mistral AI released a large-scale, cleaned, and licensed web OCR dataset to support multimodal model training.

Mistral AI @MistralAI · rising
Open dataset release: 100M-row web OCR dataset. Cleaned, licensed, ready to train.
♥ 2.6k · ↻ 390 · " 30 · ⟲ 88 · score 3.5k

Memory, RAG & Context (4)

The simple RAG pattern is proving insufficient; @mem0ai and @llamaindex are pushing towards multi-layered memory and knowledge graphs to cover the cases where vector search alone misses.

Developers are exploring the failure modes of massive context windows and evolving RAG into more complex context engineering frameworks.

Vaibhav Srivastav @reach_vb · rising
Tested the new 10M context memory window end to end. Surprising failure modes around rag retrieval cache invalidation, thread below.
♥ 1.9k · ↻ 260 · " 22 · ⟲ 75 · score 2.5k
Greg Kamradt @GregKamradt · rising
RAG is dead, long live context engineering. My framework for when to cache, when to retrieve, and when to just dump memory into the prompt.
♥ 820 · ↻ 130 · " 16 · ⟲ 54 · score 1.1k
mem0 @mem0ai · rising
Memory layer for agents: differentiating working memory from the subconscious store. Vector index isn't enough anymore.
♥ 480 · ↻ 72 · " 5 · ⟲ 25 · score 639
LlamaIndex @llamaindex · repeated
Knowledge graph retrieval walkthrough: when semantic vector search misses, graph hop beats it every time.
♥ 290 · ↻ 40 · " 2 · ⟲ 11 · score 376

Other (4)

Durable orchestration patterns from the AI agent world, such as the replayable, resumable workflows discussed by @temporalio, are being mirrored in SaaS productivity tools, which suggests this style of automation is spreading beyond agent-specific tooling.

Workspace automation tools like Notion and Linear are releasing features for auto-triaging and chained updates, mirroring patterns seen in AI agents.

Notion @NotionHQ · rising
Notion workspace automation is out of beta. Auto-fill tables, chained updates across databases, and a new audit log surface.
♥ 820 · ↻ 125 · " 12 · ⟲ 38 · score 1.1k
Linear @linear · rising
Linear now auto-triages incoming issues. Quiet launch, but already our favorite workspace feature of the year.
♥ 460 · ↻ 70 · " 6 · ⟲ 24 · score 618
Temporal @temporalio · repeated
Orchestrating agents with durable workflows: replayable, resumable, and multi-worker by default. Walkthrough from our infra team.
♥ 310 · ↻ 48 · " 4 · ⟲ 14 · score 418
James Clear @jamesclear · repeated
The best habit tracker is the one you actually open. Three open-source alternatives worth trying.
♥ 280 · ↻ 42 · " 3 · ⟲ 18 · score 373

Prompt & Skill Libraries (2)

The focus is shifting from anecdotal 'prompt tricks' (@dotey) to data-driven optimization, with Weights & Biases benchmarking 40k system prompt variants across 6 frontier models.

Production-level prompt engineering is being systematized through large-scale benchmarking and the extraction of reusable patterns.

dotey @dotey · rising
Five prompt tricks learned this week from reviewing 200 production prompts. Short thread.
♥ 510 · ↻ 88 · " 8 · ⟲ 30 · score 710
Weights & Biases @weights_biases · rising
System prompt benchmarking at scale: we ran 40k variants across 6 frontier models. The efficient frontier is not where you think.
♥ 420 · ↻ 55 · " 6 · ⟲ 20 · score 548

ML & GPU Infrastructure (1)

@jerryjliu0 highlights a key challenge in the agent development lifecycle: filtering out synthetic data that appears useful but harms a model's ability to generalize.

The conversation centers on the crucial but difficult task of curating high-quality synthetic training data for agents.

Jerry Liu @jerryjliu0 · repeated
Dataset curation for agent training: how we filter synthetic data that looks good but poisons generalization.
♥ 260 · ↻ 36 · " 2 · ⟲ 11 · score 338