2026-05-12

Denoise · Twitter

AI agents are graduating from chat to terminal-native coding tools, with Anthropic's Claude Code leading a new developer-experience (DevEx) paradigm.

Pay attention to the shift from web UIs to integrated terminal agents, a move that redefines developer workflows and creates new battlegrounds in orchestration and security.

2026-05-12 · 2026-05-12T11:24:28Z · rules twitter-v1 · Healthy · tweets 25 · signals 25

Top 3 changes

  • @AnthropicAI / AI Coding: The launch of Claude Code 1.5 marks a major push for terminal-native AI agents with filesystem access.
  • @karpathy / Developer Experience: Articulates the structural shift underway, framing the move from IDEs to terminal agents as a fundamental change in coding workflows.
  • @OpenAI / AI Infrastructure: Releases a new agent SDK, signaling a competitive focus on providing the orchestration primitives and protocols necessary for deploying agents.

Strategic insights

#01 The agent infrastructure stack is standardizing. Actors like OpenAI, Vercel, Replit, and LangChainAI are all building primitives for agent orchestration, tool-calling protocols, and deployment, moving the problem from model capabilities to reliable execution.
#02 The developer's terminal is the new frontier for AI. Anthropic's Claude Code, lauded by figures like @karpathy and @levelsio, signals a major shift away from IDE extensions and web playgrounds toward deeply integrated, stateful command-line agents.
#03 Agent security is now a primary concern. With agents gaining filesystem and tool access, red-teaming frameworks from Google DeepMind, disclosures from Anthropic, and real-world tests by @MalwareTechBlog show the focus shifting to securing the orchestration layer.
#04 The RAG vs. large context debate is maturing into 'context engineering'. Thinkers like @GregKamradt and tools like @mem0ai are moving beyond simple retrieval, focusing on sophisticated memory architectures for agents with massive context windows.

Categories

Security & Reverse Engineering (3)

The security discourse is moving up the stack from the model to the agent's orchestration layer, where tool-chaining and sandbox escapes present new attack surfaces.

Major AI labs and security researchers are focusing on red-teaming autonomous agents, publishing frameworks and disclosures for vulnerabilities beyond simple prompt injection.

  • Anthropic @AnthropicAI · rising

    Responsible disclosure on a Claude jailbreak chain we patched last week. Full write-up including our red team timeline.

    5.2k · 910 · 160 · 220 · score 7.5k · +1 related
  • Google DeepMind @GoogleDeepMind · rising

    New red team framework for prompt injection in autonomous agents. Covers cross-tool leakage, scanner evasion, and sandbox escape patterns.

    880 · 140 · 18 · 38 · score 1.2k
  • MalwareTech @MalwareTechBlog · repeated

    Autonomous agent running pentest flows against a real SaaS. First real-world run: fewer false positives than I expected on the vulnerability surface.

    180 · 28 · 3 · 15 · score 245

AI Coding Tools & Agents (5)

The competition in AI coding assistants is shifting from IDE integrations like Copilot to stateful, terminal-based agents like Claude Code, which fundamentally alters the developer workflow.

Anthropic's Claude Code 1.5 launch dominates, presenting a terminal-native agent that is already seeing developer adoption and performance comparisons against established tools.

  • Anthropic @AnthropicAI · rising

    Claude Code 1.5 is live. Terminal-native coding agent with full Claude Opus reasoning, file-ops sandbox, and session replay.

    4.8k · 820 · 140 · 190 · score 6.9k · +1 related
  • Andrej Karpathy @karpathy · rising

    The developer-experience shift from IDE to terminal agent is underrated. Coding workflows are about to look nothing like 2024.

    3.4k · 510 · 30 · 140 · score 4.5k
  • swyx @swyx · rising

    Codex vs Claude Code terminal agent benchmarks. Pass@1 diverges more than I expected on the long-context editor tasks.

    1.1k · 180 · 22 · 60 · score 1.6k
  • DSPy @dspy_ai · rising

    DSPy 3.0: prompt optimization via compile-time search over system prompt variations. Benchmarks inside.

    960 · 150 · 12 · 42 · score 1.3k
  • @levelsio @levelsio · rising

    Switched my whole editor setup to Claude Code this week. Shipping faster than when I used Cursor + Copilot.

    580 · 40 · 6 · 80 · score 678
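
The Pass@1 figure in the benchmark comparison above is the k = 1 case of the pass@k metric commonly used to score code-generation tasks. The following is a minimal sketch of the widely used unbiased estimator, assuming each task is sampled `n` times and `c` of those samples pass the tests. The function name and the sample counts are illustrative assumptions, not numbers from the thread.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Estimate pass@k as 1 - C(n - c, k) / C(n, k).

    n: samples generated for the task
    c: samples that passed the tests
    k: number of attempts the metric allows
    """
    if not (0 <= c <= n) or not (1 <= k <= n):
        raise ValueError("require 0 <= c <= n and 1 <= k <= n")
    if n - c < k:
        # Fewer than k failing samples: any draw of k contains a pass.
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)


if __name__ == "__main__":
    # Hypothetical task: 20 samples, 5 passing. For k = 1 this is just c / n.
    print(pass_at_k(n=20, c=5, k=1))  # 0.25
```

For k = 1 the estimator reduces to the plain fraction of passing samples, which is why Pass@1 is often read as "first-try success rate".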

AI Infra & Protocols (5)

There is a clear convergence on building a standardized infrastructure layer for agents, focusing on primitives for tool-calling, orchestration, and durable execution across different platforms.

Major infrastructure providers including OpenAI, Vercel, and Replit are releasing SDKs and runtimes specifically for deploying and orchestrating AI agents.

  • OpenAI @OpenAI · rising

    New agent SDK: protocol-level tool calling, deployment harness, and multi-worker orchestration primitives. Docs live.

    4.2k · 680 · 75 · 180 · score 5.8k
  • LangChain @LangChainAI · rising

    MCP protocol integration thread. How to wire existing LangGraph agents into the Anthropic Model Context Protocol server spec.

    920 · 145 · 14 · 48 · score 1.3k
  • Vercel @vercel · rising

    Edge runtime for agent workers is live. Spawn durable background agents from any serverless deployment.

    540 · 80 · 6 · 22 · score 718
  • Alex Albert @AlexAlbert__ · rising

    When your security scanner finds nothing scary on an agent deploy, check the orchestration layer again. That's usually where the jailbreak sneaks through.

    420 · 60 · 8 · 35 · score 564
  • Replit @replit · rising

    New agent deployment harness. One command to go from local orchestration to hosted agent worker.

    380 · 55 · 5 · 18 · score 505

On-device & Multimodal AI (1)

Mistral AI's dataset release is a valuable data contribution, but the day's focus remains squarely on agent-based systems, leaving multimodal developments in the background.

Mistral AI released a 100M-row web OCR dataset, providing a significant new resource for training multimodal models.

  • Mistral AI @MistralAI · rising

    Open dataset release: 100M-row web OCR dataset. Cleaned, licensed, ready to train.

    2.6k · 390 · 30 · 88 · score 3.5k

Memory, RAG & Context (4)

With 10M context windows becoming available, the challenge shifts from data retrieval to context management, where actors like @GregKamradt and @mem0ai are proposing more sophisticated memory hierarchies.

Discussion moves beyond simple RAG toward 'context engineering': managing massive context windows and architecting agent memory systems.

  • Vaibhav Srivastav @reach_vb · rising

    Tested the new 10M context memory window end to end. Surprising failure modes around rag retrieval cache invalidation, thread below.

    1.9k · 260 · 22 · 75 · score 2.5k
  • Greg Kamradt @GregKamradt · rising

    RAG is dead, long live context engineering. My framework for when to cache, when to retrieve, and when to just dump memory into the prompt.

    820 · 130 · 16 · 54 · score 1.1k
  • mem0 @mem0ai · rising

    Memory layer for agents: differentiating working memory from the subconscious store. Vector index isn't enough anymore.

    480 · 72 · 5 · 25 · score 639
  • LlamaIndex @llamaindex · repeated

    Knowledge graph retrieval walkthrough: when semantic vector search misses, graph hop beats it every time.

    290 · 40 · 2 · 11 · score 376
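
The "graph hop" idea in the last item can be illustrated with a toy retriever: rank documents by vector similarity, then follow knowledge-graph edges from the top hits to pull in linked documents that the embedding alone would rank poorly. This is a rough sketch under stated assumptions, not the method from the walkthrough: the document names, two-dimensional embeddings, edge map, and function names are all hypothetical.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def vector_top_k(query, embeddings, k):
    """Return the k document ids most similar to the query vector."""
    ranked = sorted(embeddings, key=lambda d: cosine(query, embeddings[d]), reverse=True)
    return ranked[:k]


def retrieve_with_graph_hop(query, embeddings, edges, k=1):
    """Vector search first, then add one-hop graph neighbours of each hit."""
    seeds = vector_top_k(query, embeddings, k)
    results = list(seeds)
    for seed in seeds:
        for neighbour in edges.get(seed, []):
            if neighbour not in results:
                results.append(neighbour)
    return results


# Hypothetical toy corpus: the CEO bio is linked to the overview in the graph
# but sits far from the query in embedding space.
EMBEDDINGS = {
    "acme_overview": [1.0, 0.0],
    "acme_ceo_bio": [0.0, 1.0],
    "unrelated": [-1.0, 0.0],
}
EDGES = {"acme_overview": ["acme_ceo_bio"]}

if __name__ == "__main__":
    query = [0.9, 0.1]
    print(vector_top_k(query, EMBEDDINGS, k=1))
    print(retrieve_with_graph_hop(query, EMBEDDINGS, EDGES, k=1))
```

In this toy setup, pure vector search with k = 1 returns only the overview, while the one-hop expansion also surfaces the linked bio. A real system would need to bound the hop count and re-rank the expanded set.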

Other (4)

The concept of 'orchestration' is a common thread, linking mainstream SaaS automation (Notion, Linear) with the more technical problem of running multi-step AI agents (Temporal).

Workspace automation sees new releases from Notion and Linear, while Temporal highlights its durable workflow engine as a fit for orchestrating agents.

  • Notion @NotionHQ · rising

    Notion workspace automation is out of beta. Auto-fill tables, chained updates across databases, and a new audit log surface.

    820 · 125 · 12 · 38 · score 1.1k
  • Linear @linear · rising

    Linear now auto-triages incoming issues. Quiet launch, but already our favorite workspace feature of the year.

    460 · 70 · 6 · 24 · score 618
  • Temporal @temporalio · repeated

    Orchestrating agents with durable workflows: replayable, resumable, and multi-worker by default. Walkthrough from our infra team.

    310 · 48 · 4 · 14 · score 418
  • James Clear @jamesclear · repeated

    The best habit tracker is the one you actually open. Three open-source alternatives worth trying.

    280 · 42 · 3 · 18 · score 373

Prompt & Skill Libraries (2)

Prompt engineering is maturing into a data-driven optimization discipline, with platforms like Weights & Biases providing the infrastructure for scaled experimentation.

The focus is on moving from anecdotal prompt tricks to systematic, large-scale benchmarking of system prompts to find optimal configurations.

  • dotey @dotey · rising

    Five prompt tricks learned this week from reviewing 200 production prompts. Short thread.

    510 · 88 · 8 · 30 · score 710
  • Weights & Biases @weights_biases · rising

    System prompt benchmarking at scale: we ran 40k variants across 6 frontier models. The efficient frontier is not where you think.

    420 · 55 · 6 · 20 · score 548

ML & GPU Infrastructure (1)

As agent training relies more on synthetic data, the subtle but critical challenge of data quality and filtering becomes a key bottleneck for model performance.

The primary concern highlighted is the difficulty of curating synthetic data for training agents without introducing biases that harm generalization.

  • Jerry Liu @jerryjliu0 · repeated

    Dataset curation for agent training: how we filter synthetic data that looks good but poisons generalization.

    260 · 36 · 2 · 11 · score 338
