2026-05-12

Denoise · Twitter

AI agents are graduating from chat to terminal-native coding tools, with Anthropic's Claude Code leading a new developer-experience (DevEx) paradigm.

Pay attention to the shift from web UIs to integrated terminal agents, a move that redefines developer workflows and creates new battlegrounds in orchestration and security.

2026-05-12 · 2026-05-12T11:24:28Z · rules twitter-v1 · Healthy · tweets 25 · signals 25

Top 3 changes

@AnthropicAI / AI Coding: The launch of Claude Code 1.5 marks a major push for terminal-native AI agents with filesystem access.
@karpathy / Developer Experience: Articulates the structural shift underway, framing the move from IDEs to terminal agents as a fundamental change in coding workflows.
@OpenAI / AI Infrastructure: Releases a new agent SDK, signaling a competitive focus on providing the orchestration primitives and protocols necessary for deploying agents.

Strategic insights

#01 The agent infrastructure stack is standardizing. OpenAI, Vercel, Replit, and LangChain are all building primitives for agent orchestration, tool-calling protocols, and deployment, shifting the problem from model capabilities to reliable execution.

#02 The developer's terminal is the new frontier for AI. Anthropic's Claude Code, which @levelsio reports switching his whole setup to, and the IDE-to-terminal shift @karpathy describes signal a major move away from IDE extensions and web playgrounds toward deeply integrated, stateful command-line agents.

#03 Agent security is now a primary concern. As agents gain filesystem and tool access, a red-teaming framework from Google DeepMind, a jailbreak-chain disclosure from Anthropic, and a real-world autonomous pentest run by @MalwareTechBlog show the focus shifting to securing the orchestration layer.

#04 The RAG vs. large context debate is maturing into 'context engineering'. Thinkers like @GregKamradt and tools like @mem0ai are moving beyond simple retrieval, focusing on sophisticated memory architectures for agents with massive context windows.

Categories

Security & Reverse Engineering (3)

The security discourse is moving up the stack from the model to the agent's orchestration layer, where tool-chaining and sandbox escapes present new attack surfaces.

Major AI labs and security researchers are focusing on red-teaming autonomous agents, publishing frameworks and disclosures for vulnerabilities beyond simple prompt injection.

Anthropic @AnthropicAI · rising
Responsible disclosure on a Claude jailbreak chain we patched last week. Full write-up including our red team timeline.
♥ 5.2k · ↻ 910 · " 160 · ⟲ 220 · score 7.5k · +1 related
Google DeepMind @GoogleDeepMind · rising
New red team framework for prompt injection in autonomous agents. Covers cross-tool leakage, scanner evasion, and sandbox escape patterns.
♥ 880 · ↻ 140 · " 18 · ⟲ 38 · score 1.2k
MalwareTech @MalwareTechBlog · repeated
Autonomous agent running pentest flows against a real SaaS. First real-world run: fewer false positives than I expected on the vulnerability surface.
♥ 180 · ↻ 28 · " 3 · ⟲ 15 · score 245
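To make the orchestration-layer point concrete, here is a minimal sketch of a policy check an orchestrator might run before executing any agent tool call. Everything in it is hypothetical: `ToolPolicy`, `check`, and the tool names are illustrative and are not taken from any of the frameworks or disclosures above. It shows two guards: a tool allowlist, and a path check that resolves `..` segments and symlinks before confirming the target stays inside the sandbox root (requires Python 3.9+ for `Path.is_relative_to`).

```python
from __future__ import annotations

from dataclasses import dataclass, field
from pathlib import Path


@dataclass
class ToolPolicy:
    """Hypothetical policy the orchestrator enforces before any tool runs."""

    sandbox_root: Path
    allowed_tools: set[str] = field(default_factory=set)

    def check(self, tool: str, path: str | None = None) -> None:
        """Raise PermissionError if the call violates the policy."""
        if tool not in self.allowed_tools:
            raise PermissionError(f"tool not allowed: {tool}")
        if path is not None:
            # Resolve '..' and symlinks first so 'src/../../etc/passwd'
            # (or an absolute path, which replaces the root on join)
            # cannot slip past a naive string-prefix comparison.
            root = self.sandbox_root.resolve()
            target = (root / path).resolve()
            if not target.is_relative_to(root):
                raise PermissionError(f"path escapes sandbox: {path}")


if __name__ == "__main__":
    policy = ToolPolicy(Path("/tmp/agent-sandbox"), {"read_file", "write_file"})
    for tool, path in [("read_file", "src/main.py"),
                       ("read_file", "../../etc/passwd"),
                       ("shell", None)]:
        try:
            policy.check(tool, path)
            print("allowed:", tool, path)
        except PermissionError as exc:
            print("denied:", exc)
```

Checking each call in the orchestrator, rather than trusting the model's own refusal, is what keeps a chained tool sequence from reaching outside the sandbox even when a prompt injection succeeds upstream.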

AI Coding Tools & Agents (5)

Competition among AI coding assistants is shifting from IDE integrations like Copilot to stateful, terminal-based agents like Claude Code, a change that fundamentally alters the developer workflow.

Anthropic's Claude Code 1.5 launch dominates, presenting a terminal-native agent that is already seeing developer adoption and performance comparisons against established tools.

Anthropic @AnthropicAI · rising
Claude Code 1.5 is live. Terminal-native coding agent with full Claude Opus reasoning, file-ops sandbox, and session replay.
♥ 4.8k · ↻ 820 · " 140 · ⟲ 190 · score 6.9k · +1 related
Andrej Karpathy @karpathy · rising
The developer-experience shift from IDE to terminal agent is underrated. Coding workflows are about to look nothing like 2024.
♥ 3.4k · ↻ 510 · " 30 · ⟲ 140 · score 4.5k
swyx @swyx · rising
Codex vs Claude Code terminal agent benchmarks. Pass@1 diverges more than I expected on the long-context editor tasks.
♥ 1.1k · ↻ 180 · " 22 · ⟲ 60 · score 1.6k
DSPy @dspy_ai · rising
DSPy 3.0: prompt optimization via compile-time search over system prompt variations. Benchmarks inside.
♥ 960 · ↻ 150 · " 12 · ⟲ 42 · score 1.3k
@levelsio · rising
Switched my whole editor setup to Claude Code this week. Shipping faster than when I used Cursor + Copilot.
♥ 580 · ↻ 40 · " 6 · ⟲ 80 · score 678
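Pass@1 figures in coding benchmarks are commonly computed with the unbiased pass@k estimator from the Codex paper (Chen et al., 2021): draw n samples per task, count the c that pass the tests, and estimate pass@k = 1 - C(n - c, k) / C(n, k), averaged over tasks. The tweet does not say which procedure its benchmarks use, so treat this as a sketch of the common definition; `pass_at_k` and `benchmark_pass_at_k` are illustrative names.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k for one task: n samples drawn, c of them pass."""
    if not (0 <= c <= n and 1 <= k <= n):
        raise ValueError("need 0 <= c <= n and 1 <= k <= n")
    if n - c < k:
        # Every size-k subset contains at least one passing sample.
        return 1.0
    # 1 - P(all k chosen samples fail); for k = 1 this is simply c / n.
    return 1.0 - comb(n - c, k) / comb(n, k)


def benchmark_pass_at_k(results: list[tuple[int, int]], k: int) -> float:
    """Mean pass@k over tasks, each given as (n_samples, n_passing)."""
    return sum(pass_at_k(n, c, k) for n, c in results) / len(results)


if __name__ == "__main__":
    # Two toy tasks with 10 samples each: 3 and 7 passing.
    print(benchmark_pass_at_k([(10, 3), (10, 7)], k=1))  # about 0.5
```

Because pass@1 averages per-task pass rates, two agents can diverge sharply on a task subset (such as long-context edits) while their overall scores stay close.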

AI Infra & Protocols (5)

There is a clear convergence on building a standardized infrastructure layer for agents, focusing on primitives for tool-calling, orchestration, and durable execution across different platforms.

Major infrastructure providers including OpenAI, Vercel, and Replit are releasing SDKs and runtimes specifically for deploying and orchestrating AI agents.

OpenAI @OpenAI · rising
New agent SDK: protocol-level tool calling, deployment harness, and multi-worker orchestration primitives. Docs live.
♥ 4.2k · ↻ 680 · " 75 · ⟲ 180 · score 5.8k
LangChain @LangChainAI · rising
MCP protocol integration thread. How to wire existing LangGraph agents into the Anthropic Model Context Protocol server spec.
♥ 920 · ↻ 145 · " 14 · ⟲ 48 · score 1.3k
Vercel @vercel · rising
Edge runtime for agent workers is live. Spawn durable background agents from any serverless deployment.
♥ 540 · ↻ 80 · " 6 · ⟲ 22 · score 718
Alex Albert @AlexAlbert__ · rising
When your security scanner finds nothing scary on an agent deploy, check the orchestration layer again. That's usually where the jailbreak sneaks through.
♥ 420 · ↻ 60 · " 8 · ⟲ 35 · score 564
Replit @replit · rising
New agent deployment harness. One command to go from local orchestration to hosted agent worker.
♥ 380 · ↻ 55 · " 5 · ⟲ 18 · score 505
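The "tool-calling primitives" these releases describe all wrap the same loop: the model emits a tool name plus JSON arguments, the runtime runs the matching function, and the result or error goes back to the model as data. The sketch below assumes a call shaped like `{"name": ..., "arguments": "<JSON string>"}`; the `TOOLS` registry, `tool` decorator, and `dispatch` function are hypothetical and are not the OpenAI agent SDK, LangGraph, or MCP APIs.

```python
from __future__ import annotations

import json
from typing import Any, Callable

# Hypothetical registry mapping a tool name to its Python implementation.
TOOLS: dict[str, Callable[..., Any]] = {}


def tool(fn: Callable[..., Any]) -> Callable[..., Any]:
    """Register fn under its own name so the model can call it."""
    TOOLS[fn.__name__] = fn
    return fn


@tool
def add(a: int, b: int) -> int:
    return a + b


def dispatch(call: dict[str, str]) -> str:
    """Run one model-issued call and return a JSON string for the model."""
    fn = TOOLS.get(call["name"])
    if fn is None:
        return json.dumps({"error": f"unknown tool {call['name']!r}"})
    try:
        args = json.loads(call["arguments"])
        return json.dumps({"result": fn(**args)})
    except (json.JSONDecodeError, TypeError) as exc:
        # Malformed JSON or wrong argument names are reported back to the
        # model instead of crashing the orchestrator.
        return json.dumps({"error": str(exc)})


if __name__ == "__main__":
    print(dispatch({"name": "add", "arguments": '{"a": 2, "b": 3}'}))  # {"result": 5}
    print(dispatch({"name": "rm_rf", "arguments": "{}"}))
```

Returning errors as structured results, rather than raising, lets the model retry with corrected arguments, which is the main job of a protocol layer like this.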

On-device & Multimodal AI (1)

Mistral AI's dataset release is a valuable data contribution, but the day's focus remains squarely on agent-based systems, leaving multimodal developments in the background.

Mistral AI released a 100M-row web OCR dataset, a significant new resource for training multimodal models.

Mistral AI @MistralAI · rising
Open dataset release: 100M-row web OCR dataset. Cleaned, licensed, ready to train.
♥ 2.6k · ↻ 390 · " 30 · ⟲ 88 · score 3.5k

Memory, RAG & Context (4)

With 10M context windows becoming available, the challenge shifts from data retrieval to context management, where actors like @GregKamradt and @mem0ai are proposing more sophisticated memory hierarchies.

Discussion is moving beyond simple RAG toward 'context engineering': managing massive context windows and architecting memory systems for agents.

Vaibhav Srivastav @reach_vb · rising
Tested the new 10M context memory window end to end. Surprising failure modes around rag retrieval cache invalidation, thread below.
♥ 1.9k · ↻ 260 · " 22 · ⟲ 75 · score 2.5k
Greg Kamradt @GregKamradt · rising
RAG is dead, long live context engineering. My framework for when to cache, when to retrieve, and when to just dump memory into the prompt.
♥ 820 · ↻ 130 · " 16 · ⟲ 54 · score 1.1k
mem0 @mem0ai · rising
Memory layer for agents: differentiating working memory from the subconscious store. Vector index isn't enough anymore.
♥ 480 · ↻ 72 · " 5 · ⟲ 25 · score 639
LlamaIndex @llamaindex · repeated
Knowledge graph retrieval walkthrough: when semantic vector search misses, graph hop beats it every time.
♥ 290 · ↻ 40 · " 2 · ⟲ 11 · score 376
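The cache-invalidation failure mode @reach_vb reports and the cache/retrieve/dump split @GregKamradt describes can both be illustrated with a toy context builder. This is a hypothetical sketch, not either author's framework: it puts the whole corpus in the prompt when it fits a token budget, otherwise ranks documents by keyword overlap (a stand-in for a real vector or graph retriever) and caches results keyed on a corpus version, so any update makes stale cache entries unreachable. Token counts use a whitespace split as a crude proxy for a tokenizer.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class ContextBuilder:
    """Hypothetical policy: dump everything into the prompt when it fits,
    otherwise retrieve the top-k documents and cache the result."""

    token_budget: int
    docs: dict[str, str] = field(default_factory=dict)
    version: int = 0
    # Keyed on (query, corpus version); entries for old versions are never
    # hit again (a production cache would also evict them).
    _cache: dict[tuple[str, int], list[str]] = field(
        default_factory=dict, init=False, repr=False
    )

    def upsert(self, doc_id: str, text: str) -> None:
        self.docs[doc_id] = text
        self.version += 1  # invalidates every cached retrieval

    @staticmethod
    def tokens(text: str) -> int:
        return len(text.split())  # crude stand-in for a real tokenizer

    def retrieve(self, query: str, k: int = 2) -> list[str]:
        key = (query, self.version)
        if key not in self._cache:
            terms = set(query.lower().split())
            # Rank by keyword overlap; sorted() is stable, so ties keep
            # insertion order.
            ranked = sorted(
                self.docs,
                key=lambda d: -len(terms & set(self.docs[d].lower().split())),
            )
            self._cache[key] = ranked[:k]
        return self._cache[key]

    def build(self, query: str) -> list[str]:
        """Return the ids of the documents to place in the prompt."""
        total = sum(self.tokens(t) for t in self.docs.values())
        if total <= self.token_budget:
            return sorted(self.docs)  # everything fits: skip retrieval
        return self.retrieve(query)


if __name__ == "__main__":
    cb = ContextBuilder(token_budget=5)
    cb.upsert("a", "cats purr softly")
    print(cb.build("bark"))  # fits the budget: ['a']
    cb.upsert("b", "dogs bark loudly at night")
    print(cb.build("bark"))  # over budget, retrieved: ['b', 'a']
    cb.upsert("c", "bark bark")
    print(cb.build("bark"))  # new version, fresh retrieval: ['b', 'c']
```

Keying the cache on a version rather than on the query alone is what prevents the last call from silently returning the pre-update result.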

Other (4)

The concept of 'orchestration' is a common thread, linking mainstream SaaS automation (Notion, Linear) with the more technical problem of running multi-step AI agents (Temporal).

Workspace automation sees new releases from Notion and Linear, while Temporal highlights its durable workflow engine as a fit for orchestrating agents.

Notion @NotionHQ · rising
Notion workspace automation is out of beta. Auto-fill tables, chained updates across databases, and a new audit log surface.
♥ 820 · ↻ 125 · " 12 · ⟲ 38 · score 1.1k
Linear @linear · rising
Linear now auto-triages incoming issues. Quiet launch, but already our favorite workspace feature of the year.
♥ 460 · ↻ 70 · " 6 · ⟲ 24 · score 618
Temporal @temporalio · repeated
Orchestrating agents with durable workflows: replayable, resumable, and multi-worker by default. Walkthrough from our infra team.
♥ 310 · ↻ 48 · " 4 · ⟲ 14 · score 418
James Clear @jamesclear · repeated
The best habit tracker is the one you actually open. Three open-source alternatives worth trying.
♥ 280 · ↻ 42 · " 3 · ⟲ 18 · score 373
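The "replayable, resumable" property Temporal highlights rests on event sourcing: each completed step is recorded, and a restarted workflow replays recorded results instead of re-running side effects. The sketch below is a hypothetical single-process illustration of that idea using a JSON-lines log; `DurableRun` and `step` are illustrative names, not Temporal's SDK, and the sketch ignores the determinism checks, timers, and multi-worker coordination a real engine provides.

```python
from __future__ import annotations

import json
from pathlib import Path
from typing import Any, Callable


class DurableRun:
    """Hypothetical event-sourced runner backed by a JSON-lines log."""

    def __init__(self, log_path: Path) -> None:
        self.log_path = log_path
        self.history: dict[str, Any] = {}
        if log_path.exists():
            # Rebuild state from the log left by a previous (crashed) run.
            for line in log_path.read_text().splitlines():
                event = json.loads(line)
                self.history[event["step"]] = event["result"]

    def step(self, name: str, fn: Callable[[], Any]) -> Any:
        """Run fn once per step name; later runs replay the logged result.

        Step names must be stable across restarts and results must be
        JSON-serializable. A crash between fn() and the log write re-runs
        the step, so steps should be idempotent.
        """
        if name in self.history:
            return self.history[name]  # replay: no side effects re-run
        result = fn()
        with self.log_path.open("a") as f:
            f.write(json.dumps({"step": name, "result": result}) + "\n")
        self.history[name] = result
        return result
```

For an agent, each model call or tool invocation becomes a named step, so a worker that dies mid-task resumes at the first unlogged step instead of repeating expensive or side-effecting calls.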

Prompt & Skill Libraries (2)

Prompt engineering is maturing into a data-driven optimization discipline, with platforms like Weights & Biases providing the infrastructure for scaled experimentation.

The focus is on moving from anecdotal prompt tricks to systematic, large-scale benchmarking of system prompts to find optimal configurations.

dotey @dotey · rising
Five prompt tricks learned this week from reviewing 200 production prompts. Short thread.
♥ 510 · ↻ 88 · " 8 · ⟲ 30 · score 710
Weights & Biases @weights_biases · rising
System prompt benchmarking at scale: we ran 40k variants across 6 frontier models. The efficient frontier is not where you think.
♥ 420 · ↻ 55 · " 6 · ⟲ 20 · score 548
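Assuming "efficient frontier" here means the usual Pareto frontier over accuracy and cost, choosing among benchmarked prompt variants, whether from a large sweep or a compile-time search like DSPy's, comes down to discarding dominated variants. The `Variant` type, `efficient_frontier` function, and the numbers in the example are illustrative only and do not come from the W&B benchmark.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Variant:
    name: str
    accuracy: float  # higher is better
    cost: float      # e.g. dollars per 1k requests; lower is better


def efficient_frontier(variants: list[Variant]) -> list[Variant]:
    """Return the variants no other variant beats on both accuracy and cost.

    Exact duplicates (same accuracy and cost) are collapsed to one entry.
    """
    # Cheapest first; at equal cost, the most accurate comes first.
    ordered = sorted(variants, key=lambda v: (v.cost, -v.accuracy))
    frontier: list[Variant] = []
    best_acc = float("-inf")
    for v in ordered:
        # Everything seen so far is at least as cheap, so v survives only
        # if it is strictly more accurate than all of them.
        if v.accuracy > best_acc:
            frontier.append(v)
            best_acc = v.accuracy
    return frontier


if __name__ == "__main__":
    variants = [
        Variant("terse", accuracy=0.70, cost=1.0),
        Variant("cot", accuracy=0.82, cost=3.0),
        Variant("verbose", accuracy=0.78, cost=4.0),
        Variant("few-shot", accuracy=0.80, cost=2.0),
    ]
    # 'verbose' drops out: 'cot' is both cheaper and more accurate.
    print([v.name for v in efficient_frontier(variants)])
```

With thousands of variants per model, the frontier is usually a small set, and the interesting finding is often which cheap variants land on it.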

ML & GPU Infrastructure (1)

As agent training relies more on synthetic data, the subtle but critical challenge of data quality and filtering becomes a key bottleneck for model performance.

The primary concern highlighted is the difficulty of curating synthetic data for training agents without introducing biases that harm generalization.

Jerry Liu @jerryjliu0 · repeated
Dataset curation for agent training: how we filter synthetic data that looks good but poisons generalization.
♥ 260 · ↻ 36 · " 2 · ⟲ 11 · score 338
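The tweet does not spell out which filters are used, so the following shows only one generic technique, not that team's pipeline: greedy near-duplicate removal by Jaccard similarity over word trigrams, which keeps templated synthetic samples from crowding out the rest of the training distribution. `shingles`, `jaccard`, `dedup`, and the 0.8 threshold are illustrative choices; the pairwise loop is O(n²), so large corpora typically use MinHash with locality-sensitive hashing instead.

```python
from __future__ import annotations


def shingles(text: str, n: int = 3) -> set[tuple[str, ...]]:
    """Lower-cased word n-grams; short texts yield a single shingle."""
    toks = text.lower().split()
    return {tuple(toks[i:i + n]) for i in range(max(len(toks) - n + 1, 1))}


def jaccard(a: set, b: set) -> float:
    return len(a & b) / len(a | b) if (a or b) else 1.0


def dedup(samples: list[str], threshold: float = 0.8) -> list[str]:
    """Keep a sample only if it is below `threshold` Jaccard similarity
    to every sample kept so far (first occurrence wins)."""
    kept: list[str] = []
    kept_shingles: list[set] = []
    for s in samples:
        sh = shingles(s)
        if all(jaccard(sh, k) < threshold for k in kept_shingles):
            kept.append(s)
            kept_shingles.append(sh)
    return kept


if __name__ == "__main__":
    data = [
        "Add a retry loop around the HTTP call",
        "add a retry loop around the HTTP call",  # same text, different case
        "Write a unit test for the parser",
    ]
    print(dedup(data))  # keeps the first and third samples
```

Deduplication addresses only over-represented patterns; samples that are fluent but wrong need separate checks, such as executing generated code or verifying answers against a reference.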