2026-05-04

Denoise · Twitter

The AI agent stack is standardizing around terminal-native workflows, orchestration protocols, and new security models as key players release foundational tools.

Attention is shifting from LLM chat to autonomous agents, with new tools from Anthropic and OpenAI defining terminal-based workflows and orchestration.

2026-05-04T11:09:09Z · rules twitter-v1 · Healthy · tweets 25 · signals 25

Top 3 changes

AnthropicAI / Claude Code 1.5: A major release of a terminal-native coding agent, signaling a UX shift away from IDEs.
OpenAI / Agent SDK: A new SDK with protocol-level tool calling, a deployment harness, and multi-worker orchestration primitives, paralleling Anthropic's move up the stack.
karpathy / Developer Experience: Named the day's core pattern, the migration of coding workflows from IDEs to terminal agents.

Strategic insights

#01 A de facto agent stack is emerging. OpenAI's Agent SDK and Anthropic's Model Context Protocol (MCP, here via LangChain's integration) are setting standards that platforms like Vercel and Replit are building runtimes around.

#02 The primary developer interface is shifting from the IDE to the terminal. Karpathy's observation is backed by Anthropic's Claude Code release and by developers such as levelsio reporting that they have left IDE-centric tools.

#03 Agent security is now a first-class concern. Anthropic and Google DeepMind are publishing red-teaming frameworks and vulnerability disclosures, treating agent security as a track parallel to capability development.

#04 The concept of 'memory' is evolving beyond RAG. The conversation, led by figures like GregKamradt and startups like mem0ai, now centers on 'context engineering' and tiered memory systems, a sign that RAG alone is insufficient for complex agents.

Categories

Security & Reverse Engineering (3)

The security discourse has matured from simple prompt injection to complex agent orchestration vulnerabilities, with Anthropic and Google DeepMind leading the public research.

Major labs are publicly disclosing red-teaming frameworks and patched vulnerabilities for autonomous agents, focusing on multi-step attack surfaces.

Anthropic @AnthropicAI · rising
Responsible disclosure on a Claude jailbreak chain we patched last week. Full write-up including our red team timeline.
♥ 5.2k · ↻ 910 · " 160 · ⟲ 220 · score 7.5k · +1 related
Google DeepMind @GoogleDeepMind · rising
New red team framework for prompt injection in autonomous agents. Covers cross-tool leakage, scanner evasion, and sandbox escape patterns.
♥ 880 · ↻ 140 · " 18 · ⟲ 38 · score 1.2k
MalwareTech @MalwareTechBlog · repeated
Autonomous agent running pentest flows against a real SaaS. First real-world run: fewer false positives than I expected on the vulnerability surface.
♥ 180 · ↻ 28 · " 3 · ⟲ 15 · score 245

AI Coding Tools & Agents (5)

A consensus is forming around the terminal as the new UI for coding agents, with Anthropic's launch directly competing with the established Copilot/IDE paradigm.

Anthropic's release of Claude Code 1.5, a terminal-native agent, is driving a broader conversation about moving development workflows out of the IDE.

Anthropic @AnthropicAI · rising
Claude Code 1.5 is live. Terminal-native coding agent with full Claude Opus reasoning, file-ops sandbox, and session replay.
♥ 4.8k · ↻ 820 · " 140 · ⟲ 190 · score 6.9k · +1 related
Andrej Karpathy @karpathy · rising
The developer-experience shift from IDE to terminal agent is underrated. Coding workflows are about to look nothing like 2024.
♥ 3.4k · ↻ 510 · " 30 · ⟲ 140 · score 4.5k
swyx @swyx · rising
Codex vs Claude Code terminal agent benchmarks. Pass@1 diverges more than I expected on the long-context editor tasks.
♥ 1.1k · ↻ 180 · " 22 · ⟲ 60 · score 1.6k
DSPy @dspy_ai · rising
DSPy 3.0: prompt optimization via compile-time search over system prompt variations. Benchmarks inside.
♥ 960 · ↻ 150 · " 12 · ⟲ 42 · score 1.3k
@levelsio @levelsio · rising
Switched my whole editor setup to Claude Code this week. Shipping faster than when I used Cursor + Copilot.
♥ 580 · ↻ 40 · " 6 · ⟲ 80 · score 678
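swyx's comparison is framed in Pass@1. As background only, and not a description of his harness: pass@k is usually defined as the probability that at least one of k sampled completions passes a task's tests. A widely used unbiased estimator (introduced with OpenAI's HumanEval benchmark) draws n >= k samples per task, counts the c that pass, and computes 1 - C(n-c, k) / C(n, k); for k = 1 this reduces to c / n. A minimal sketch assuming that estimator, with illustrative function names:

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k for one task: n samples drawn, c of them passed the tests."""
    if n < k:
        raise ValueError("need at least k samples per task")
    if n - c < k:
        # Every size-k subset of the samples contains at least one passing sample.
        return 1.0
    # 1 - P(all k chosen samples fail) when choosing k of n without replacement.
    return 1.0 - comb(n - c, k) / comb(n, k)


def mean_pass_at_k(results: list[tuple[int, int]], k: int) -> float:
    """Average pass@k over tasks, given (n, c) pairs per task."""
    return sum(pass_at_k(n, c, k) for n, c in results) / len(results)
```

For example, hypothetical per-task results [(10, 3), (10, 0), (10, 10)] give a mean Pass@1 of about 0.433. Drawing several samples per task, rather than scoring a single run, reduces the run-to-run noise a one-sample Pass@1 would carry.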

AI Infra & Protocols (5)

A new infrastructure layer for agents is solidifying. OpenAI's SDK and Anthropic's MCP (via LangChain) are setting protocol standards that compute platforms like Vercel are racing to support.

OpenAI, Vercel, and Replit released new infrastructure for deploying and orchestrating AI agents, focusing on SDKs, protocols, and edge runtimes.

OpenAI @OpenAI · rising
New agent SDK: protocol-level tool calling, deployment harness, and multi-worker orchestration primitives. Docs live.
♥ 4.2k · ↻ 680 · " 75 · ⟲ 180 · score 5.8k
LangChain @LangChainAI · rising
MCP protocol integration thread. How to wire existing LangGraph agents into the Anthropic Model Context Protocol server spec.
♥ 920 · ↻ 145 · " 14 · ⟲ 48 · score 1.3k
Vercel @vercel · rising
Edge runtime for agent workers is live. Spawn durable background agents from any serverless deployment.
♥ 540 · ↻ 80 · " 6 · ⟲ 22 · score 718
Alex Albert @AlexAlbert__ · rising
When your security scanner finds nothing scary on an agent deploy, check the orchestration layer again. That's usually where the jailbreak sneaks through.
♥ 420 · ↻ 60 · " 8 · ⟲ 35 · score 564
Replit @replit · rising
New agent deployment harness. One command to go from local orchestration to hosted agent worker.
♥ 380 · ↻ 55 · " 5 · ⟲ 18 · score 505
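The LangChain thread is about wiring agents into Anthropic's Model Context Protocol. At the wire level, MCP messages are JSON-RPC 2.0, and a client invokes a server-side tool with a `tools/call` request whose params carry the tool `name` and its `arguments`. The sketch below only builds and parses such messages with Python's standard library; the tool `search_issues` and its arguments are hypothetical, and transport (stdio or HTTP), the initialization handshake, and capability negotiation are left out. Treat it as an illustration and check the current MCP specification for exact field details.

```python
import json
from itertools import count

# Request ids must be unique within a session; a per-process counter is a simplification.
_ids = count(1)


def make_request(method: str, params: dict) -> str:
    """Serialize a JSON-RPC 2.0 request."""
    return json.dumps({"jsonrpc": "2.0", "id": next(_ids), "method": method, "params": params})


def call_tool(name: str, arguments: dict) -> str:
    """Build an MCP-style tools/call request for a named tool."""
    return make_request("tools/call", {"name": name, "arguments": arguments})


def parse_response(raw: str):
    """Return the result of a JSON-RPC response, raising on an error object."""
    msg = json.loads(raw)
    if "error" in msg:
        err = msg["error"]
        raise RuntimeError(f"{err.get('code')}: {err.get('message')}")
    return msg["result"]
```

A real client would send `call_tool("search_issues", {"query": "flaky test"})` over its transport and match the reply to the request by `id`.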

On-device & Multimodal AI (1)

While agents dominate the conversation, MistralAI's dataset release underscores the continued importance of foundational data collection for advancing open-source model capabilities.

Mistral AI released a 100M-row web OCR dataset, cleaned and licensed, giving open-source multimodal model training a significant new resource.

Mistral AI @MistralAI · rising
Open dataset release: 100M-row web OCR dataset. Cleaned, licensed, ready to train.
♥ 2.6k · ↻ 390 · " 30 · ⟲ 88 · score 3.5k

Memory, RAG & Context (4)

The limits of vector search are pushing thought leaders like GregKamradt and tools like mem0ai to propose new primitives for agent memory, differentiating working memory from long-term storage.

Discussion is moving beyond simple RAG towards 'context engineering' and more complex memory architectures for agents with large context windows.

Vaibhav Srivastav @reach_vb · rising
Tested the new 10M context memory window end to end. Surprising failure modes around rag retrieval cache invalidation, thread below.
♥ 1.9k · ↻ 260 · " 22 · ⟲ 75 · score 2.5k
Greg Kamradt @GregKamradt · rising
RAG is dead, long live context engineering. My framework for when to cache, when to retrieve, and when to just dump memory into the prompt.
♥ 820 · ↻ 130 · " 16 · ⟲ 54 · score 1.1k
mem0 @mem0ai · rising
Memory layer for agents: differentiating working memory from the subconscious store. Vector index isn't enough anymore.
♥ 480 · ↻ 72 · " 5 · ⟲ 25 · score 639
LlamaIndex @llamaindex · repeated
Knowledge graph retrieval walkthrough: when semantic vector search misses, graph hop beats it every time.
♥ 290 · ↻ 40 · " 2 · ⟲ 11 · score 376
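The posts above contrast a small working memory with a larger long-term store. A minimal sketch of that split, assuming a hypothetical `TieredMemory` class (not mem0's API or any other vendor's): recent items live in a bounded buffer, items that fall out of it are demoted to a long-term list, and context for a query is assembled from a few recalled long-term items plus the recent buffer. Word overlap stands in for embedding similarity to keep the example dependency-free.

```python
from collections import deque


class TieredMemory:
    """Hypothetical two-tier agent memory: bounded working buffer plus long-term store."""

    def __init__(self, working_size: int = 4):
        if working_size < 1:
            raise ValueError("working_size must be at least 1")
        self.working = deque(maxlen=working_size)  # most recent items only
        self.long_term = []                        # items demoted from working memory

    def add(self, text: str) -> None:
        # If the buffer is full, its oldest item is about to be evicted: demote it first.
        if len(self.working) == self.working.maxlen:
            self.long_term.append(self.working[0])
        self.working.append(text)

    def recall(self, query: str, k: int = 2) -> list:
        # Word overlap is a stand-in for embedding similarity (assumption).
        q = set(query.lower().split())
        scored = [(len(q & set(t.lower().split())), t) for t in self.long_term]
        hits = [pair for pair in scored if pair[0] > 0]
        hits.sort(key=lambda pair: pair[0], reverse=True)  # stable: ties keep insertion order
        return [t for _, t in hits[:k]]

    def build_context(self, query: str, k: int = 2) -> str:
        # Recalled long-term items first, then the recent working buffer.
        lines = ["# recalled", *self.recall(query, k), "# recent", *self.working]
        return "\n".join(lines)
```

With `working_size=2` and four notes added, the first two are demoted, and a query about the project's database recalls "project uses postgres" ahead of the recent items. Swapping `recall` for a vector or graph lookup changes retrieval quality but not the tiering itself.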

Other (4)

Agent-style workflow automation is becoming a standard feature of collaboration software, with Notion and Linear building in capabilities that previously required third-party tools.

SaaS tools like Notion and Linear are shipping agent-like automation features, such as auto-triaging issues and chaining database updates, directly into their platforms.

Notion @NotionHQ · rising
Notion workspace automation is out of beta. Auto-fill tables, chained updates across databases, and a new audit log surface.
♥ 820 · ↻ 125 · " 12 · ⟲ 38 · score 1.1k
Linear @linear · rising
Linear now auto-triages incoming issues. Quiet launch, but already our favorite workspace feature of the year.
♥ 460 · ↻ 70 · " 6 · ⟲ 24 · score 618
Temporal @temporalio · repeated
Orchestrating agents with durable workflows: replayable, resumable, and multi-worker by default. Walkthrough from our infra team.
♥ 310 · ↻ 48 · " 4 · ⟲ 14 · score 418
James Clear @jamesclear · repeated
The best habit tracker is the one you actually open. Three open-source alternatives worth trying.
♥ 280 · ↻ 42 · " 3 · ⟲ 18 · score 373

Prompt & Skill Libraries (2)

The practice of prompt engineering is maturing from an art to a science, with organizations like Weights & Biases investing in large-scale studies to find empirically optimal system prompts.

Engineers are sharing increasingly systematic approaches to prompt optimization, from practical heuristics to large-scale benchmark-driven analysis.

dotey @dotey · rising
Five prompt tricks learned this week from reviewing 200 production prompts. Short thread.
♥ 510 · ↻ 88 · " 8 · ⟲ 30 · score 710
Weights & Biases @weights_biases · rising
System prompt benchmarking at scale: we ran 40k variants across 6 frontier models. The efficient frontier is not where you think.
♥ 420 · ↻ 55 · " 6 · ⟲ 20 · score 548
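The Weights & Biases post refers to an efficient frontier without giving details here. One common reading, assumed for this sketch, is a Pareto frontier over two axes: prompt cost (token count, lower is better) and benchmark accuracy (higher is better). A variant is on the frontier if no other variant is at least as cheap and at least as accurate while being strictly better on one axis. The `Variant` type, the variant names, and the numbers below are made up for illustration.

```python
from typing import NamedTuple


class Variant(NamedTuple):
    name: str
    tokens: int      # prompt length, used as a cost proxy (lower is better)
    accuracy: float  # benchmark score (higher is better)


def efficient_frontier(variants: list[Variant]) -> list[Variant]:
    """Return the non-dominated variants, cheapest first (exact duplicates keep one copy)."""
    # Sort by cost ascending; at equal cost, the more accurate variant comes first.
    ordered = sorted(variants, key=lambda v: (v.tokens, -v.accuracy))
    frontier, best = [], float("-inf")
    for v in ordered:
        # A variant survives only if it beats every cheaper (or equal-cost) variant seen so far.
        if v.accuracy > best:
            frontier.append(v)
            best = v.accuracy
    return frontier


SAMPLE = [
    Variant("terse", 120, 0.71),
    Variant("cot", 480, 0.78),
    Variant("verbose", 900, 0.76),
    Variant("persona", 300, 0.70),
    Variant("fewshot", 650, 0.82),
]
```

With the made-up `SAMPLE`, "persona" and "verbose" drop out because a cheaper variant scores higher, leaving terse, cot, and fewshot. A single sort-and-sweep keeps this O(n log n), which matters when the candidate pool runs to tens of thousands of variants.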

ML & GPU Infrastructure (1)

As agent training becomes more widespread, data quality, not just quantity, is becoming the bottleneck, with a focus on avoiding subtle 'poisoning' from low-quality synthetic data.

A key challenge highlighted is the curation of synthetic data for agent training, specifically filtering out data that harms model generalization.

Jerry Liu @jerryjliu0 · repeated
Dataset curation for agent training: how we filter synthetic data that looks good but poisons generalization.
♥ 260 · ↻ 36 · " 2 · ⟲ 11 · score 338
