2026-05-10

Denoise · Twitter

Autonomous agents are the new primitive, with terminal-based coding and dedicated infrastructure emerging as the stack.

Today's releases from Anthropic and OpenAI show the agent stack is solidifying, moving coding into the terminal and forcing a new look at orchestration security.

2026-05-10 · 2026-05-10T10:20:08Z · rules twitter-v1 · Healthy · tweets 25 · signals 25

Top 3 changes

AnthropicAI / AI Coding: Claude Code 1.5 released as a terminal-native agent, proposing a new developer workflow outside the traditional IDE.
OpenAI / AI Infra: A new agent SDK provides protocol-level primitives for tool calling and orchestration, pushing towards standardization.
karpathy / Developer Experience: Frames the shift from IDE to terminal agent as a fundamental, underrated change in how developers will work.

Strategic insights

#01 The agent infrastructure layer is rapidly maturing, with OpenAI, Vercel, and Replit all shipping distinct orchestration and deployment tools, signaling a race to build the standard platform.

#02 The terminal is re-emerging as the primary AI-native developer interface. Anthropic's Claude Code, validated by @karpathy and early adopters like @levelsio, challenges the dominance of IDE-based tools like Copilot.

#03 Agent security is now a distinct discipline. Disclosures from Anthropic, red team frameworks from Google DeepMind, and real-world tests by @MalwareTechBlog show the attack surface has moved to the orchestration layer.

#04 "Context Engineering" is replacing simple RAG. Practitioners like @GregKamradt and @reach_vb are moving beyond basic retrieval to build sophisticated memory hierarchies, treating context management as a core engineering problem.

#05 Workflow automation is a convergent trend across the stack. SaaS tools like Notion and Linear are building business-logic automation, while infrastructure like Temporal provides primitives for orchestrating technical workflows, including AI agents.

Categories

Security & Reverse Engineering (3)

The conversation is shifting from theoretical prompt injection to securing complex, multi-tool autonomous agent systems, with red teaming becoming a standard practice.

Anthropic published a disclosure of a patched Claude jailbreak chain and Google DeepMind released a red team framework for agent prompt injection, while practitioners such as @MalwareTechBlog are running autonomous agents in penetration tests.

Anthropic @AnthropicAI · rising
Responsible disclosure on a Claude jailbreak chain we patched last week. Full write-up including our red team timeline.
♥ 5.2k · ↻ 910 · " 160 · ⟲ 220 · score 7.5k · +1 related
Google DeepMind @GoogleDeepMind · rising
New red team framework for prompt injection in autonomous agents. Covers cross-tool leakage, scanner evasion, and sandbox escape patterns.
♥ 880 · ↻ 140 · " 18 · ⟲ 38 · score 1.2k
MalwareTech @MalwareTechBlog · repeated
Autonomous agent running pentest flows against a real SaaS. First real-world run: fewer false positives than I expected on the vulnerability surface.
♥ 180 · ↻ 28 · " 3 · ⟲ 15 · score 245

AI Coding Tools & Agents (5)

A split is emerging in developer tools between IDE plugins like Copilot and terminal-native agents like Claude Code, with the latter claiming higher developer velocity.

Anthropic's release of Claude Code 1.5, a terminal-native coding agent, is driving discussion, with early benchmarks and user testimonials appearing.

Anthropic @AnthropicAI · rising
Claude Code 1.5 is live. Terminal-native coding agent with full Claude Opus reasoning, file-ops sandbox, and session replay.
♥ 4.8k · ↻ 820 · " 140 · ⟲ 190 · score 6.9k · +1 related
Andrej Karpathy @karpathy · rising
The developer-experience shift from IDE to terminal agent is underrated. Coding workflows are about to look nothing like 2024.
♥ 3.4k · ↻ 510 · " 30 · ⟲ 140 · score 4.5k
swyx @swyx · rising
Codex vs Claude Code terminal agent benchmarks. Pass@1 diverges more than I expected on the long-context editor tasks.
♥ 1.1k · ↻ 180 · " 22 · ⟲ 60 · score 1.6k
DSPy @dspy_ai · rising
DSPy 3.0: prompt optimization via compile-time search over system prompt variations. Benchmarks inside.
♥ 960 · ↻ 150 · " 12 · ⟲ 42 · score 1.3k
@levelsio @levelsio · rising
Switched my whole editor setup to Claude Code this week. Shipping faster than when I used Cursor + Copilot.
♥ 580 · ↻ 40 · " 6 · ⟲ 80 · score 678

AI Infra & Protocols (5)

A convergence towards protocol-level standards is visible, with OpenAI's SDK and LangChain's integration work showing interoperability is a key concern.

OpenAI, Vercel, and Replit all shipped tools for agent orchestration and deployment, indicating a race to define the standard infrastructure layer.

OpenAI @OpenAI · rising
New agent SDK: protocol-level tool calling, deployment harness, and multi-worker orchestration primitives. Docs live.
♥ 4.2k · ↻ 680 · " 75 · ⟲ 180 · score 5.8k
LangChain @LangChainAI · rising
MCP protocol integration thread. How to wire existing LangGraph agents into the Anthropic Model Context Protocol server spec.
♥ 920 · ↻ 145 · " 14 · ⟲ 48 · score 1.3k
Vercel @vercel · rising
Edge runtime for agent workers is live. Spawn durable background agents from any serverless deployment.
♥ 540 · ↻ 80 · " 6 · ⟲ 22 · score 718
Alex Albert @AlexAlbert__ · rising
When your security scanner finds nothing scary on an agent deploy, check the orchestration layer again. That's usually where the jailbreak sneaks through.
♥ 420 · ↻ 60 · " 8 · ⟲ 35 · score 564
Replit @replit · rising
New agent deployment harness. One command to go from local orchestration to hosted agent worker.
♥ 380 · ↻ 55 · " 5 · ⟲ 18 · score 505

On-device & Multimodal AI (1)

While agent orchestration is the focus, major labs like Mistral are still addressing foundational data bottlenecks for core multimodal skills like OCR.

Mistral AI released a large-scale, 100M-row web OCR dataset, providing a foundational resource for training multimodal models.

Mistral AI @MistralAI · rising
Open dataset release: 100M-row web OCR dataset. Cleaned, licensed, ready to train.
♥ 2.6k · ↻ 390 · " 30 · ⟲ 88 · score 3.5k

Memory, RAG & Context (4)

The field is advancing from basic retrieval to complex memory management, with tools like mem0 proposing architectures that separate working memory from a longer-lived store.

The discussion highlights the failure modes of simple RAG in large context windows and argues for more sophisticated "context engineering" and memory architectures.

Vaibhav Srivastav @reach_vb · rising
Tested the new 10M context memory window end to end. Surprising failure modes around rag retrieval cache invalidation, thread below.
♥ 1.9k · ↻ 260 · " 22 · ⟲ 75 · score 2.5k
Greg Kamradt @GregKamradt · rising
RAG is dead, long live context engineering. My framework for when to cache, when to retrieve, and when to just dump memory into the prompt.
♥ 820 · ↻ 130 · " 16 · ⟲ 54 · score 1.1k
mem0 @mem0ai · rising
Memory layer for agents: differentiating working memory from the subconscious store. Vector index isn't enough anymore.
♥ 480 · ↻ 72 · " 5 · ⟲ 25 · score 639
LlamaIndex @llamaindex · repeated
Knowledge graph retrieval walkthrough: when semantic vector search misses, graph hop beats it every time.
♥ 290 · ↻ 40 · " 2 · ⟲ 11 · score 376
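
The split between working memory and a longer-lived store described in this category can be sketched in a few lines. This is a minimal illustration only, not mem0's API or architecture: the `TwoTierMemory` class, its method names, the spill-on-overflow policy, and the word-overlap scoring (a stand-in for a real retriever) are all assumptions made for the example.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class TwoTierMemory:
    """Hypothetical two-tier store: small working memory plus a long-term archive."""
    working_size: int = 3
    working: deque = field(default_factory=deque)
    long_term: list = field(default_factory=list)

    def remember(self, text: str) -> None:
        # New items enter working memory; the oldest item spills to long-term.
        self.working.append(text)
        if len(self.working) > self.working_size:
            self.long_term.append(self.working.popleft())

    def recall(self, query: str, k: int = 2) -> list:
        # Toy relevance: number of shared lowercase words (assumed scoring).
        q = set(query.lower().split())
        scored = [(len(q & set(t.lower().split())), t) for t in self.long_term]
        hits = [t for s, t in sorted(scored, key=lambda p: -p[0]) if s > 0]
        return hits[:k]

    def build_context(self, query: str) -> list:
        # Recalled long-term items first, then everything in working memory.
        return self.recall(query) + list(self.working)


mem = TwoTierMemory(working_size=2)
for note in ["user prefers Rust", "deploy target is edge",
             "ticket 12 is open", "standup at 10"]:
    mem.remember(note)

print(mem.build_context("which deploy target"))
# ['deploy target is edge', 'ticket 12 is open', 'standup at 10']
```

Only the long-term tier is searched here; working memory is always included in full, which is one plausible reading of why a vector index alone is described as insufficient.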

Other (4)

SaaS tools like Notion and infrastructure like Temporal are converging on workflow orchestration, providing automation primitives at different layers of the stack.

Workspace automation is a prominent theme, with Notion and Linear releasing features to automate internal workflows, mirroring the broader agent trend.

Notion @NotionHQ · rising
Notion workspace automation is out of beta. Auto-fill tables, chained updates across databases, and a new audit log surface.
♥ 820 · ↻ 125 · " 12 · ⟲ 38 · score 1.1k
Linear @linear · rising
Linear now auto-triages incoming issues. Quiet launch, but already our favorite workspace feature of the year.
♥ 460 · ↻ 70 · " 6 · ⟲ 24 · score 618
Temporal @temporalio · repeated
Orchestrating agents with durable workflows: replayable, resumable, and multi-worker by default. Walkthrough from our infra team.
♥ 310 · ↻ 48 · " 4 · ⟲ 14 · score 418
James Clear @jamesclear · repeated
The best habit tracker is the one you actually open. Three open-source alternatives worth trying.
♥ 280 · ↻ 42 · " 3 · ⟲ 18 · score 373

Prompt & Skill Libraries (2)

Prompt engineering is maturing from anecdotal tricks to an industrial process, with platforms treating system prompt selection as a formal hyperparameter tuning problem.

The conversation covers both tactical prompt tricks from production reviews and strategic, large-scale system prompt benchmarking by platforms like Weights & Biases.

dotey @dotey · rising
Five prompt tricks learned this week from reviewing 200 production prompts. Short thread.
♥ 510 · ↻ 88 · " 8 · ⟲ 30 · score 710
Weights & Biases @weights_biases · rising
System prompt benchmarking at scale: we ran 40k variants across 6 frontier models. The efficient frontier is not where you think.
♥ 420 · ↻ 55 · " 6 · ⟲ 20 · score 548
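
Treating prompt selection as a tuning problem usually means picking variants on an efficient (Pareto) frontier rather than by a single metric. The sketch below shows one way to compute such a frontier. The two axes (lower cost, higher score), the `Variant` type, and the sample numbers are assumptions for illustration; the source post does not say which axes or data it used.

```python
from typing import NamedTuple


class Variant(NamedTuple):
    name: str
    cost: float   # assumed axis, e.g. mean tokens or dollars per call
    score: float  # assumed axis, e.g. accuracy on an eval set


def efficient_frontier(variants):
    """Return the variants not dominated on (lower cost, higher score)."""
    frontier = []
    best_score = float("-inf")
    # Walk from cheapest to most expensive; on cost ties, try the higher score first.
    for v in sorted(variants, key=lambda v: (v.cost, -v.score)):
        # A variant is kept only if it beats every cheaper variant's score.
        if v.score > best_score:
            frontier.append(v)
            best_score = v.score
    return frontier


# Made-up sample data, not benchmark results.
variants = [
    Variant("terse", 1.0, 0.70),
    Variant("verbose", 3.0, 0.72),
    Variant("few-shot", 2.0, 0.80),
    Variant("chain", 4.0, 0.85),
]

print([v.name for v in efficient_frontier(variants)])
# ['terse', 'few-shot', 'chain']
```

In this made-up data, "verbose" drops out because "few-shot" is both cheaper and higher scoring, which is the kind of result a frontier view surfaces and a single leaderboard column hides.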

ML & GPU Infrastructure (1)

As agent capabilities scale, the bottleneck is shifting from model architecture to high-quality, poison-resistant training data, a classic ML pattern now repeating itself.

A key challenge highlighted is the curation of datasets for agent training, specifically how to filter synthetic data that harms model generalization.

Jerry Liu @jerryjliu0 · repeated
Dataset curation for agent training: how we filter synthetic data that looks good but poisons generalization.
♥ 260 · ↻ 36 · " 2 · ⟲ 11 · score 338
