2026-05-31

Denoise · Twitter

The terminal agent is the new IDE, and the race to build its supporting infrastructure for orchestration, deployment, and security is on.

Watch the developer workflow shift from IDEs to terminal-native agents, exemplified by Anthropic's Claude Code, with OpenAI's Agent SDK supplying orchestration and deployment primitives underneath.

2026-05-31 · 2026-05-31T11:03:25Z · rules twitter-v1 · Healthy · tweets 25 · signals 25

Top 3 changes

AnthropicAI / Claude Code 1.5: A major release of a terminal-native coding agent, shifting the developer experience away from traditional IDEs.
OpenAI / Agent SDK: The release of a foundational SDK for agent orchestration and deployment, signaling a new layer of infrastructure.
karpathy / DevEx Shift: Articulation of the structural shift from IDE plugins to terminal agents as the primary coding interface.

Strategic insights

#01 A shared infrastructure layer for agents is emerging. OpenAI, Vercel, and Replit are all shipping orchestration and deployment primitives, indicating a new battleground for developer platforms.

#02 The primary developer interface is shifting from the IDE to the terminal. Anthropic's Claude Code is the key artifact, with figures like @karpathy providing the conceptual framing and users like @levelsio offering early validation.

#03 Agent security is becoming a parallel field of practice. As agent capabilities expand, disclosures from @AnthropicAI and red-teaming frameworks from @GoogleDeepMind show that securing orchestration and tool use is a non-trivial, distinct discipline.

#04 The conversation on context is evolving from 'retrieval' to 'engineering.' With 10M-token context windows tested by practitioners like @reach_vb, the focus is shifting to memory management and caching strategies, as articulated by @GregKamradt.

Categories

Security & Reverse Engineering (3)

Agent security is moving beyond simple prompt injection to complex vulnerabilities in orchestration and tool-use, a focus shared by Anthropic's disclosure and Google DeepMind's framework.

Major model providers are publicly disclosing agent jailbreaks and releasing red-teaming frameworks, while independent researchers test autonomous pentesting.

Anthropic @AnthropicAI · rising
Responsible disclosure on a Claude jailbreak chain we patched last week. Full write-up including our red team timeline.
♥ 5.2k ↻ 910 " 160 ⟲ 220 · score 7.5k · +1 related
Google DeepMind @GoogleDeepMind · rising
New red team framework for prompt injection in autonomous agents. Covers cross-tool leakage, scanner evasion, and sandbox escape patterns.
♥ 880 ↻ 140 " 18 ⟲ 38 · score 1.2k
MalwareTech @MalwareTechBlog · repeated
Autonomous agent running pentest flows against a real SaaS. First real-world run: fewer false positives than I expected on the vulnerability surface.
♥ 180 ↻ 28 " 3 ⟲ 15 · score 245
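
The digest does not define how each entry's score is derived. The figures on the three entries above are consistent with a weighted sum of the first three counts (the ♥ count, the ↻ count, and the count after the " glyph, weighted 1, 2, and 3), with the ⟲ count not contributing. This is an inference from the displayed numbers, which are themselves rounded, not a documented formula; the function and parameter names below are hypothetical.

```python
def engagement_score(likes: int, reposts: int, quote_glyph_count: int) -> int:
    """Inferred weighting (an assumption, not a documented formula)."""
    return likes + 2 * reposts + 3 * quote_glyph_count


def format_count(n: int) -> str:
    """Render a count the way the digest displays it, e.g. 7500 -> '7.5k'."""
    return f"{n / 1000:.1f}k" if n >= 1000 else str(n)


# The three entries above: (♥ count, ↻ count, count after the " glyph).
entries = {
    "@AnthropicAI": (5200, 910, 160),
    "@GoogleDeepMind": (880, 140, 18),
    "@MalwareTechBlog": (180, 28, 3),
}

# Recompute each displayed score from the raw counts.
scores = {who: format_count(engagement_score(*counts)) for who, counts in entries.items()}
```

Under this assumed weighting, the recomputed values match the scores displayed on those three entries.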

AI Coding Tools & Agents (5)

The frontier of AI coding assistants is shifting from IDE extensions like Copilot to standalone terminal agents like Claude Code, a paradigm change articulated by @karpathy.

Anthropic's release of Claude Code 1.5, a terminal-native agent, dominates the conversation, spurring immediate benchmarks and declarations of workflow shifts.

Anthropic @AnthropicAI · rising
Claude Code 1.5 is live. Terminal-native coding agent with full Claude Opus reasoning, file-ops sandbox, and session replay.
♥ 4.8k ↻ 820 " 140 ⟲ 190 · score 6.9k · +1 related
Andrej Karpathy @karpathy · rising
The developer-experience shift from IDE to terminal agent is underrated. Coding workflows are about to look nothing like 2024.
♥ 3.4k ↻ 510 " 30 ⟲ 140 · score 4.5k
swyx @swyx · rising
Codex vs Claude Code terminal agent benchmarks. Pass@1 diverges more than I expected on the long-context editor tasks.
♥ 1.1k ↻ 180 " 22 ⟲ 60 · score 1.6k
DSPy @dspy_ai · rising
DSPy 3.0: prompt optimization via compile-time search over system prompt variations. Benchmarks inside.
♥ 960 ↻ 150 " 12 ⟲ 42 · score 1.3k
@levelsio @levelsio · rising
Switched my whole editor setup to Claude Code this week. Shipping faster than when I used Cursor + Copilot.
♥ 580 ↻ 40 " 6 ⟲ 80 · score 678

AI Infra & Protocols (5)

A new infrastructure stack for agents is standardizing, with OpenAI's SDK and LangChain's protocol integrations suggesting a move towards interoperable orchestration.

OpenAI, Vercel, and Replit all released primitives for agent orchestration and deployment, indicating a rapid build-out of this new infrastructure layer.

OpenAI @OpenAI · rising
New agent SDK: protocol-level tool calling, deployment harness, and multi-worker orchestration primitives. Docs live.
♥ 4.2k ↻ 680 " 75 ⟲ 180 · score 5.8k
LangChain @LangChainAI · rising
MCP protocol integration thread. How to wire existing LangGraph agents into the Anthropic Model Context Protocol server spec.
♥ 920 ↻ 145 " 14 ⟲ 48 · score 1.3k
Vercel @vercel · rising
Edge runtime for agent workers is live. Spawn durable background agents from any serverless deployment.
♥ 540 ↻ 80 " 6 ⟲ 22 · score 718
Alex Albert @AlexAlbert__ · rising
When your security scanner finds nothing scary on an agent deploy, check the orchestration layer again. That's usually where the jailbreak sneaks through.
♥ 420 ↻ 60 " 8 ⟲ 35 · score 564
Replit @replit · rising
New agent deployment harness. One command to go from local orchestration to hosted agent worker.
♥ 380 ↻ 55 " 5 ⟲ 18 · score 505

On-device & Multimodal AI (1)

A quiet day, with Mistral's dataset release being the only signal, reinforcing that progress in multimodality remains fundamentally gated by large, clean datasets.

Mistral AI released a large-scale, 100M-row web OCR dataset, providing a key resource for training future multimodal models.

Mistral AI @MistralAI · rising
Open dataset release: 100M-row web OCR dataset. Cleaned, licensed, ready to train.
♥ 2.6k ↻ 390 " 30 ⟲ 88 · score 3.5k

Memory, RAG & Context (4)

As context windows expand to 10M tokens, the problem shifts from retrieval to management, with practitioners like @GregKamradt and tools like LlamaIndex exploring more complex memory and graph-based systems.

Discussion is shifting from simple RAG towards 'context engineering': managing large memory stores and complex retrieval strategies for agents.

Vaibhav Srivastav @reach_vb · rising
Tested the new 10M context memory window end to end. Surprising failure modes around rag retrieval cache invalidation, thread below.
♥ 1.9k ↻ 260 " 22 ⟲ 75 · score 2.5k
Greg Kamradt @GregKamradt · rising
RAG is dead, long live context engineering. My framework for when to cache, when to retrieve, and when to just dump memory into the prompt.
♥ 820 ↻ 130 " 16 ⟲ 54 · score 1.1k
mem0 @mem0ai · rising
Memory layer for agents: differentiating working memory from the subconscious store. Vector index isn't enough anymore.
♥ 480 ↻ 72 " 5 ⟲ 25 · score 639
LlamaIndex @llamaindex · repeated
Knowledge graph retrieval walkthrough: when semantic vector search misses, graph hop beats it every time.
♥ 290 ↻ 40 " 2 ⟲ 11 · score 376
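
To make the graph-hop idea in the LlamaIndex post concrete, here is a toy sketch in plain Python. It does not use the LlamaIndex API and is not taken from the walkthrough: the graph contents, the `seed_entities` keyword match (a stand-in for vector search), and the `graph_hop` helper are all hypothetical. It shows how expanding from a matched entity along graph edges can surface related entities that never appear in the query text.

```python
from collections import deque

# Toy knowledge graph: entity -> directly related entities (hypothetical data).
GRAPH = {
    "claude code": ["anthropic", "terminal agent"],
    "anthropic": ["claude code", "mcp"],
    "mcp": ["anthropic", "langgraph"],
    "langgraph": ["mcp"],
    "terminal agent": ["claude code"],
}


def seed_entities(query, graph):
    """Stand-in for vector search: entities whose name appears in the query."""
    q = query.lower()
    return [entity for entity in graph if entity in q]


def graph_hop(seeds, graph, max_hops=1):
    """Breadth-first expansion from the seeds; returns entity -> hop distance."""
    dist = {s: 0 for s in seeds}
    queue = deque(seeds)
    while queue:
        node = queue.popleft()
        if dist[node] == max_hops:
            continue  # do not expand beyond the hop budget
        for neighbour in graph.get(node, []):
            if neighbour not in dist:
                dist[neighbour] = dist[node] + 1
                queue.append(neighbour)
    return dist


seeds = seed_entities("How do I wire LangGraph agents into a server?", GRAPH)
one_hop = graph_hop(seeds, GRAPH, max_hops=1)
two_hops = graph_hop(seeds, GRAPH, max_hops=2)
```

In this toy graph the query only mentions LangGraph, yet one hop reaches "mcp" and two hops reach "anthropic". A real system would rank the expanded set rather than return everything within the hop budget.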

Other (4)

The agentic automation pattern seen in developer tools is being mirrored in business applications by Notion and Linear, automating triage and database updates.

Productivity tools like Notion and Linear are shipping agent-like automation features, suggesting these patterns are moving into mainstream SaaS.

Notion @NotionHQ · rising
Notion workspace automation is out of beta. Auto-fill tables, chained updates across databases, and a new audit log surface.
♥ 820 ↻ 125 " 12 ⟲ 38 · score 1.1k
Linear @linear · rising
Linear now auto-triages incoming issues. Quiet launch, but already our favorite workspace feature of the year.
♥ 460 ↻ 70 " 6 ⟲ 24 · score 618
Temporal @temporalio · repeated
Orchestrating agents with durable workflows: replayable, resumable, and multi-worker by default. Walkthrough from our infra team.
♥ 310 ↻ 48 " 4 ⟲ 14 · score 418
James Clear @jamesclear · repeated
The best habit tracker is the one you actually open. Three open-source alternatives worth trying.
♥ 280 ↻ 42 " 3 ⟲ 18 · score 373

Prompt & Skill Libraries (2)

Prompting is shifting from a craft of individual tricks (@dotey) to a science of large-scale benchmarking, exemplified by Weights & Biases' 40k-variant study across six frontier models.

The focus is on systematizing prompt optimization by running large-scale benchmarks to find the most effective system prompt variations.

dotey @dotey · rising
Five prompt tricks learned this week from reviewing 200 production prompts. Short thread.
♥ 510 ↻ 88 " 8 ⟲ 30 · score 710
Weights & Biases @weights_biases · rising
System prompt benchmarking at scale: we ran 40k variants across 6 frontier models. The efficient frontier is not where you think.
♥ 420 ↻ 55 " 6 ⟲ 20 · score 548
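
The Weights & Biases post refers to an "efficient frontier" without stating its axes. As a minimal sketch, assuming the trade-off is prompt length (a cost proxy) against benchmark accuracy, the frontier is the set of variants that no other variant beats on both axes at once. The `Variant` type, the variant names, and every number below are hypothetical illustrations, not results from the study.

```python
from typing import NamedTuple


class Variant(NamedTuple):
    name: str
    prompt_tokens: int  # cost proxy: lower is better (assumed axis)
    accuracy: float     # benchmark result: higher is better (assumed axis)


def efficient_frontier(variants):
    """Return the variants not dominated on (fewer tokens, higher accuracy)."""
    frontier = []
    best_accuracy = float("-inf")
    # Sweep by ascending cost; on cost ties, consider the more accurate first.
    for v in sorted(variants, key=lambda v: (v.prompt_tokens, -v.accuracy)):
        if v.accuracy > best_accuracy:
            frontier.append(v)
            best_accuracy = v.accuracy
    return frontier


# Made-up benchmark results for four system prompt variants.
variants = [
    Variant("terse", 50, 0.71),
    Variant("role+rules", 120, 0.78),
    Variant("verbose", 400, 0.77),
    Variant("few-shot", 300, 0.83),
]
frontier_names = [v.name for v in efficient_frontier(variants)]
```

With this sample data the "verbose" variant drops off the frontier, because "few-shot" is both shorter and more accurate. Sorting first keeps the sweep at O(n log n), which matters at the scale of tens of thousands of variants.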

ML & GPU Infrastructure (1)

For advanced agent training, data quality is now a more significant bottleneck than data quantity, requiring sophisticated filtering of synthetic data as noted by @jerryjliu0.

The key challenge discussed is the curation of high-quality synthetic data for agent training, focusing on avoiding data that harms generalization.

Jerry Liu @jerryjliu0 · repeated
Dataset curation for agent training: how we filter synthetic data that looks good but poisons generalization.
♥ 260 ↻ 36 " 2 ⟲ 11 · score 338