2026-05-10

Denoise · Twitter

Autonomous agents are the new primitive, with terminal-based coding and dedicated infrastructure emerging as the stack.

Today's releases from Anthropic and OpenAI show the agent stack is solidifying, moving coding into the terminal and forcing a new look at orchestration security.

2026-05-10 · 2026-05-10T10:20:08Z · rules twitter-v1 · Healthy · tweets 25 · signals 25

Top 3 changes

  • AnthropicAI / AI Coding: Claude Code 1.5 released as a terminal-native agent, proposing a new developer workflow outside the traditional IDE.
  • OpenAI / AI Infra: A new agent SDK provides protocol-level primitives for tool calling and orchestration, pushing towards standardization.
  • karpathy / Developer Experience: Frames the shift from IDE to terminal agent as a fundamental, underrated change in how developers will work.

Strategic insights

#01 The agent infrastructure layer is rapidly maturing, with OpenAI, Vercel, and Replit all shipping distinct orchestration and deployment tools, signaling a race to build the standard platform.
#02 The terminal is re-emerging as the primary AI-native developer interface. Anthropic's Claude Code, validated by @karpathy and early adopters like @levelsio, challenges the dominance of IDE-based tools like Copilot.
#03 Agent security is now a distinct discipline. Disclosures from Anthropic, red team frameworks from Google DeepMind, and real-world tests by @MalwareTechBlog show the attack surface has moved to the orchestration layer.
#04 "Context Engineering" is replacing simple RAG. Practitioners like @GregKamradt and @reach_vb are moving beyond basic retrieval to build sophisticated memory hierarchies, treating context management as a core engineering problem.
#05 Workflow automation is a convergent trend across the stack. SaaS tools like Notion and Linear are building business-logic automation, while infrastructure like Temporal provides primitives for orchestrating technical workflows, including AI agents.

Categories

Security & Reverse Engineering (3)

The conversation is shifting from theoretical prompt injection to securing complex, multi-tool autonomous agent systems, with red teaming becoming a standard practice.

Anthropic published a disclosure of a patched jailbreak chain and Google DeepMind released a red team framework for autonomous agents, while practitioners are trialing autonomous agents in real penetration tests.

  • Anthropic @AnthropicAI · rising

    Responsible disclosure on a Claude jailbreak chain we patched last week. Full write-up including our red team timeline.

    5.2k910" 160220· score 7.5k· +1 related
  • Google DeepMind @GoogleDeepMind · rising

    New red team framework for prompt injection in autonomous agents. Covers cross-tool leakage, scanner evasion, and sandbox escape patterns.

    880140" 1838· score 1.2k
  • MalwareTech @MalwareTechBlog · repeated

    Autonomous agent running pentest flows against a real SaaS. First real-world run: fewer false positives than I expected on the vulnerability surface.

    18028" 315· score 245

AI Coding Tools & Agents (5)

A split is emerging in developer tools between IDE plugins like Copilot and terminal-native agents like Claude Code, with the latter claiming higher developer velocity.

Anthropic's release of Claude Code 1.5, a terminal-native coding agent, is driving discussion, with early benchmarks and user testimonials appearing.

  • Anthropic @AnthropicAI · rising

    Claude Code 1.5 is live. Terminal-native coding agent with full Claude Opus reasoning, file-ops sandbox, and session replay.

    4.8k820" 140190· score 6.9k· +1 related
  • Andrej Karpathy @karpathy · rising

    The developer-experience shift from IDE to terminal agent is underrated. Coding workflows are about to look nothing like 2024.

    3.4k510" 30140· score 4.5k
  • swyx @swyx · rising

    Codex vs Claude Code terminal agent benchmarks. Pass@1 diverges more than I expected on the long-context editor tasks.

    1.1k180" 2260· score 1.6k
  • DSPy @dspy_ai · rising

    DSPy 3.0: prompt optimization via compile-time search over system prompt variations. Benchmarks inside.

    960150" 1242· score 1.3k
  • @levelsio @levelsio · rising

    Switched my whole editor setup to Claude Code this week. Shipping faster than when I used Cursor + Copilot.

    58040" 680· score 678

AI Infra & Protocols (5)

A convergence towards protocol-level standards is visible, with OpenAI's SDK and LangChain's integration work showing interoperability is a key concern.

OpenAI, Vercel, and Replit all shipped tools for agent orchestration and deployment, indicating a race to define the standard infrastructure layer.

  • OpenAI @OpenAI · rising

    New agent SDK: protocol-level tool calling, deployment harness, and multi-worker orchestration primitives. Docs live.

    4.2k680" 75180· score 5.8k
  • LangChain @LangChainAI · rising

    MCP protocol integration thread. How to wire existing LangGraph agents into the Anthropic Model Context Protocol server spec.

    920145" 1448· score 1.3k
  • Vercel @vercel · rising

    Edge runtime for agent workers is live. Spawn durable background agents from any serverless deployment.

    54080" 622· score 718
  • Alex Albert @AlexAlbert__ · rising

    When your security scanner finds nothing scary on an agent deploy, check the orchestration layer again. That's usually where the jailbreak sneaks through.

    42060" 835· score 564
  • Replit @replit · rising

    New agent deployment harness. One command to go from local orchestration to hosted agent worker.

    38055" 518· score 505

On-device & Multimodal AI (1)

While agent orchestration is the focus, major labs like Mistral are still addressing foundational data bottlenecks for core multimodal skills like OCR.

Mistral AI released a large-scale, 100M-row web OCR dataset, providing a foundational resource for training multimodal models.

  • Mistral AI @MistralAI · rising

    Open dataset release: 100M-row web OCR dataset. Cleaned, licensed, ready to train.

    2.6k390" 3088· score 3.5k

Memory, RAG & Context (4)

The field is advancing from basic retrieval to complex memory management, with tools like mem0.ai proposing new architectures that separate working and long-term memory.

The discussion highlights the failure modes of simple RAG in large context windows and argues for more sophisticated "context engineering" and memory architectures.

  • Vaibhav Srivastav @reach_vb · rising

    Tested the new 10M context memory window end to end. Surprising failure modes around rag retrieval cache invalidation, thread below.

    1.9k260" 2275· score 2.5k
  • Greg Kamradt @GregKamradt · rising

    RAG is dead, long live context engineering. My framework for when to cache, when to retrieve, and when to just dump memory into the prompt.

    820130" 1654· score 1.1k
  • mem0 @mem0ai · rising

    Memory layer for agents: differentiating working memory from the subconscious store. Vector index isn't enough anymore.

    48072" 525· score 639
  • LlamaIndex @llamaindex · repeated

    Knowledge graph retrieval walkthrough: when semantic vector search misses, graph hop beats it every time.

    29040" 211· score 376
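
The split between working memory and a long-term store that this category describes can be sketched roughly as follows. This is only an illustration, not mem0's API or any poster's actual design: the class name `TieredMemory`, the `working_capacity` parameter, and the keyword-overlap ranking (a stand-in for vector or graph retrieval) are all assumptions made for brevity.

```python
from collections import deque


class TieredMemory:
    """Hypothetical two-tier agent memory: a small working set plus a long-term store."""

    def __init__(self, working_capacity=3):
        self.working_capacity = working_capacity
        self.working = deque()  # most recent items, always placed in the prompt
        self.long_term = []     # evicted items, retrieved only on demand

    def remember(self, text):
        """Add an item; the oldest working items spill into the long-term store."""
        self.working.append(text)
        while len(self.working) > self.working_capacity:
            self.long_term.append(self.working.popleft())

    def recall(self, query, k=2):
        """Rank long-term items by word overlap with the query (stand-in for vector search)."""
        query_words = set(query.lower().split())
        scored = [
            (len(query_words & set(item.lower().split())), item)
            for item in self.long_term
        ]
        scored = [pair for pair in scored if pair[0] > 0]
        scored.sort(key=lambda pair: pair[0], reverse=True)
        return [item for _, item in scored[:k]]

    def build_context(self, query):
        """Retrieved long-term items first, then the full working set."""
        return self.recall(query) + list(self.working)
```

The design choice illustrated here is that recent items are always included while older ones must earn their place through retrieval, which is one plausible reading of "when to cache, when to retrieve, and when to just dump memory into the prompt."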

Other (4)

SaaS tools like Notion and infrastructure like Temporal are converging on workflow orchestration, providing automation primitives at different layers of the stack.

Workspace automation is a prominent theme, with Notion and Linear releasing features to automate internal workflows, mirroring the broader agent trend.

  • Notion @NotionHQ · rising

    Notion workspace automation is out of beta. Auto-fill tables, chained updates across databases, and a new audit log surface.

    820125" 1238· score 1.1k
  • Linear @linear · rising

    Linear now auto-triages incoming issues. Quiet launch, but already our favorite workspace feature of the year.

    46070" 624· score 618
  • Temporal @temporalio · repeated

    Orchestrating agents with durable workflows: replayable, resumable, and multi-worker by default. Walkthrough from our infra team.

    31048" 414· score 418
  • James Clear @jamesclear · repeated

    The best habit tracker is the one you actually open. Three open-source alternatives worth trying.

    28042" 318· score 373

Prompt & Skill Libraries (2)

Prompt engineering is maturing from anecdotal tricks to an industrial process, with platforms treating system prompt selection as a formal hyperparameter tuning problem.

The conversation covers both tactical prompt tricks from production reviews and strategic, large-scale system prompt benchmarking by platforms like Weights & Biases.

  • dotey @dotey · rising

    Five prompt tricks learned this week from reviewing 200 production prompts. Short thread.

    51088" 830· score 710
  • Weights & Biases @weights_biases · rising

    System prompt benchmarking at scale: we ran 40k variants across 6 frontier models. The efficient frontier is not where you think.

    42055" 620· score 548
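
One way to read "efficient frontier" when prompt selection is treated as a tuning problem is as a Pareto frontier over cost and quality. The post does not say which axes were used, so the `cost` and `score` fields, the variant names, and every number below are made up purely for illustration.

```python
def pareto_frontier(variants):
    """Return the variants that no other variant beats on both lower cost and higher score."""
    # Sort by ascending cost; on equal cost, put the higher score first.
    ordered = sorted(variants, key=lambda v: (v["cost"], -v["score"]))
    frontier = []
    best_score = float("-inf")
    for variant in ordered:
        # Keep a variant only if it improves on every cheaper variant's score.
        if variant["score"] > best_score:
            frontier.append(variant)
            best_score = variant["score"]
    return frontier


# Hypothetical prompt variants: cost in prompt tokens, score as task accuracy.
variants = [
    {"name": "terse", "cost": 120, "score": 0.71},
    {"name": "few_shot", "cost": 900, "score": 0.83},
    {"name": "verbose", "cost": 1500, "score": 0.80},
    {"name": "cot", "cost": 600, "score": 0.83},
]

frontier_names = [v["name"] for v in pareto_frontier(variants)]
print(frontier_names)
```

In this toy data only `terse` and `cot` survive: `few_shot` matches the score of `cot` at higher cost, and `verbose` is both costlier and weaker.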

ML & GPU Infrastructure (1)

As agent capabilities scale, the bottleneck is shifting from model architecture to high-quality, poison-resistant training data, a classic ML pattern now repeating itself.

A key challenge highlighted is the curation of datasets for agent training, specifically how to filter synthetic data that harms model generalization.

  • Jerry Liu @jerryjliu0 · repeated

    Dataset curation for agent training: how we filter synthetic data that looks good but poisons generalization.

    26036" 211· score 338

Recent reports