2026-05-28

Denoise · Twitter

Autonomous agents move from research to production, with a new infrastructure stack and attack surface taking shape.

Pay attention to the convergence on terminal-native coding agents and the new orchestration protocols from OpenAI and Anthropic defining the agent infrastructure layer.

2026-05-28 · 2026-05-28T12:35:15Z · rules twitter-v1 · Healthy · tweets 25 · signals 25

Top 3 changes

  • AnthropicAI / AI Coding Agents: The release of Claude Code 1.5, a terminal-native agent, marks a structural shift in developer tooling away from IDE-centric assistants.
  • OpenAI / AI Infrastructure: A new agent SDK with protocol-level primitives signals a push to standardize the agent orchestration layer.
  • AnthropicAI / Security: The responsible disclosure of a complex agent jailbreak highlights the new, critical attack surface emerging around agent orchestration.

Strategic insights

#01 A new developer tool paradigm is emerging around terminal-native agents: Anthropic's Claude Code 1.5 directly challenges the incumbent IDE/Copilot model, a shift endorsed by influential voices like @karpathy.
#02 The battle to define the agent infrastructure layer is heating up. OpenAI's new agent SDK and Anthropic's Model Context Protocol (MCP) represent competing standards for orchestration, while Vercel and Replit are building the serverless runtimes to host them.
#03 As autonomous agents become production-ready, security is shifting from theoretical prompt injection to practical exploits of the orchestration layer. Disclosures from Anthropic and frameworks from Google DeepMind show this is now a primary concern.
#04 The discussion on LLM memory is maturing beyond simple RAG. Practitioners like @GregKamradt and dedicated tools like @mem0ai are promoting 'context engineering', a more sophisticated approach to memory management involving caching, retrieval strategies, and tiered memory stores.

Categories

Security & Reverse Engineering (3)

The security conversation is rapidly shifting from LLM prompt injection to the more complex attack surface of agent orchestration and tool interaction, with Anthropic publishing a jailbreak disclosure and Google DeepMind a red-team framework.

Major labs are disclosing agent jailbreaks and releasing red-teaming frameworks, while practitioners are testing agent-based pentesting in the wild.

  • Anthropic · @AnthropicAI · rising

    Responsible disclosure on a Claude jailbreak chain we patched last week. Full write-up including our red team timeline.

    5.2k · 910 · 160 · 220 · score 7.5k · +1 related
  • Google DeepMind · @GoogleDeepMind · rising

    New red team framework for prompt injection in autonomous agents. Covers cross-tool leakage, scanner evasion, and sandbox escape patterns.

    880 · 140 · 18 · 38 · score 1.2k
  • MalwareTech · @MalwareTechBlog · repeated

    Autonomous agent running pentest flows against a real SaaS. First real-world run: fewer false positives than I expected on the vulnerability surface.

    180 · 28 · 3 · 15 · score 245

AI Coding Tools & Agents (5)

The developer tool landscape is re-centering around terminal agents, with Anthropic's Claude Code 1.5 directly competing with the Copilot/IDE paradigm, a shift endorsed by influential figures like @karpathy.

Anthropic released Claude Code 1.5, a terminal-native agent, prompting discussion on its performance against existing tools and the broader shift away from IDE-centric workflows.

  • Anthropic · @AnthropicAI · rising

    Claude Code 1.5 is live. Terminal-native coding agent with full Claude Opus reasoning, file-ops sandbox, and session replay.

    4.8k · 820 · 140 · 190 · score 6.9k · +1 related
  • Andrej Karpathy · @karpathy · rising

    The developer-experience shift from IDE to terminal agent is underrated. Coding workflows are about to look nothing like 2024.

    3.4k · 510 · 30 · 140 · score 4.5k
  • swyx · @swyx · rising

    Codex vs Claude Code terminal agent benchmarks. Pass@1 diverges more than I expected on the long-context editor tasks.

    1.1k · 180 · 22 · 60 · score 1.6k
  • DSPy · @dspy_ai · rising

    DSPy 3.0: prompt optimization via compile-time search over system prompt variations. Benchmarks inside.

    960 · 150 · 12 · 42 · score 1.3k
  • @levelsio · @levelsio · rising

    Switched my whole editor setup to Claude Code this week. Shipping faster than when I used Cursor + Copilot.

    580 · 40 · 6 · 80 · score 678

AI Infra & Protocols (5)

A battle for the standard agent protocol is beginning, with OpenAI's new SDK and Anthropic's Model Context Protocol (via LangChain) representing competing visions for multi-agent orchestration.

OpenAI, Vercel, and Replit released new primitives for agent orchestration and deployment, focusing on SDKs, protocols, and serverless runtimes.

  • OpenAI · @OpenAI · rising

    New agent SDK: protocol-level tool calling, deployment harness, and multi-worker orchestration primitives. Docs live.

    4.2k · 680 · 75 · 180 · score 5.8k
  • LangChain · @LangChainAI · rising

    MCP protocol integration thread. How to wire existing LangGraph agents into the Anthropic Model Context Protocol server spec.

    920 · 145 · 14 · 48 · score 1.3k
  • Vercel · @vercel · rising

    Edge runtime for agent workers is live. Spawn durable background agents from any serverless deployment.

    540 · 80 · 6 · 22 · score 718
  • Alex Albert · @AlexAlbert__ · rising

    When your security scanner finds nothing scary on an agent deploy, check the orchestration layer again. That's usually where the jailbreak sneaks through.

    420 · 60 · 8 · 35 · score 564
  • Replit · @replit · rising

    New agent deployment harness. One command to go from local orchestration to hosted agent worker.

    380 · 55 · 5 · 18 · score 505

On-device & Multimodal AI (1)

Foundation model labs continue to release high-quality open datasets, such as Mistral AI's OCR data, which let the broader community train specialized models and commoditize a key part of the data pipeline.

Mistral AI released a large-scale, cleaned OCR dataset for training multimodal models.

  • Mistral AI · @MistralAI · rising

    Open dataset release: 100M-row web OCR dataset. Cleaned, licensed, ready to train.

    2.6k · 390 · 30 · 88 · score 3.5k

Memory, RAG & Context (4)

The naive approach of 'just use RAG' is being replaced by more complex memory architectures, with @GregKamradt, @mem0ai, and LlamaIndex all proposing different layers of memory management beyond simple vector retrieval.

The focus shifts from simply increasing context window size to sophisticated 'context engineering,' including caching strategies and specialized memory layers for agents.

  • Vaibhav Srivastav · @reach_vb · rising

    Tested the new 10M context memory window end to end. Surprising failure modes around rag retrieval cache invalidation, thread below.

    1.9k · 260 · 22 · 75 · score 2.5k
  • Greg Kamradt · @GregKamradt · rising

    RAG is dead, long live context engineering. My framework for when to cache, when to retrieve, and when to just dump memory into the prompt.

    820 · 130 · 16 · 54 · score 1.1k
  • mem0 · @mem0ai · rising

    Memory layer for agents: differentiating working memory from the subconscious store. Vector index isn't enough anymore.

    480 · 72 · 5 · 25 · score 639
  • LlamaIndex · @llamaindex · repeated

    Knowledge graph retrieval walkthrough: when semantic vector search misses, graph hop beats it every time.

    290 · 40 · 2 · 11 · score 376
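
As a rough illustration of the cache / retrieve / inline decision that the posts above describe, here is a minimal sketch. The `choose_strategy` function, the `Strategy` names, the `MemoryRequest` fields, and the rules themselves are all hypothetical, chosen for illustration; none of them is taken from @GregKamradt's framework, mem0, or LlamaIndex.

```python
from dataclasses import dataclass
from enum import Enum


class Strategy(Enum):
    INLINE = "inline"      # put the whole memory into the prompt
    CACHE = "cache"        # reuse a previously prepared context
    RETRIEVE = "retrieve"  # fetch only the relevant pieces


@dataclass
class MemoryRequest:
    memory_tokens: int   # size of the candidate memory, in tokens
    context_budget: int  # tokens available in the prompt for memory
    seen_before: bool    # True if this exact memory was prepared earlier


def choose_strategy(req: MemoryRequest) -> Strategy:
    """Pick a memory strategy using simple, hypothetical rules."""
    fits = req.memory_tokens <= req.context_budget
    # Reuse prepared context first, as long as it still fits the budget.
    if req.seen_before and fits:
        return Strategy.CACHE
    # Memories that fit the budget are simplest to include verbatim.
    if fits:
        return Strategy.INLINE
    # Anything larger than the budget has to be narrowed by retrieval.
    return Strategy.RETRIEVE
```

For example, `choose_strategy(MemoryRequest(memory_tokens=500, context_budget=2000, seen_before=False))` selects `Strategy.INLINE`, while a memory larger than the budget falls through to `Strategy.RETRIEVE`. A real system would likely weigh cost, latency, and staleness as well; this sketch only shows the shape of the decision.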

Other (4)

The concept of orchestration is becoming mainstream, with workflow engines like Temporal positioning themselves for AI agents, while productivity tools like Notion and Linear implement similar, albeit simpler, automation patterns.

Workspace automation features are launching in Notion and Linear, while Temporal discusses durable workflows for orchestrating agents.

  • Notion · @NotionHQ · rising

    Notion workspace automation is out of beta. Auto-fill tables, chained updates across databases, and a new audit log surface.

    820 · 125 · 12 · 38 · score 1.1k
  • Linear · @linear · rising

    Linear now auto-triages incoming issues. Quiet launch, but already our favorite workspace feature of the year.

    460 · 70 · 6 · 24 · score 618
  • Temporal · @temporalio · repeated

    Orchestrating agents with durable workflows: replayable, resumable, and multi-worker by default. Walkthrough from our infra team.

    310 · 48 · 4 · 14 · score 418
  • James Clear · @jamesclear · repeated

    The best habit tracker is the one you actually open. Three open-source alternatives worth trying.

    280 · 42 · 3 · 18 · score 373

Prompt & Skill Libraries (2)

Prompt engineering is evolving into a systematic, data-driven discipline, with efforts from Weights & Biases and individual engineers demonstrating a move from anecdotal tricks to scalable, benchmark-driven optimization.

Practitioners are sharing empirical results from large-scale system prompt benchmarking and reviews of production prompts.

  • dotey · @dotey · rising

    Five prompt tricks learned this week from reviewing 200 production prompts. Short thread.

    510 · 88 · 8 · 30 · score 710
  • Weights & Biases · @weights_biases · rising

    System prompt benchmarking at scale: we ran 40k variants across 6 frontier models. The efficient frontier is not where you think.

    420 · 55 · 6 · 20 · score 548
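
To make "benchmark-driven optimization" concrete, the following is a minimal sketch of scoring prompt variants across several models and keeping the best one by mean score. It is not the method used by Weights & Biases or DSPy; `best_prompt`, `toy_score`, and the model names are hypothetical, and the scorer is a stand-in so the example runs offline.

```python
from itertools import product
from statistics import mean
from typing import Callable


def best_prompt(
    variants: list[str],
    models: list[str],
    score: Callable[[str, str], float],
) -> tuple[str, float]:
    """Return the variant with the highest mean score across models."""
    if not variants or not models:
        raise ValueError("need at least one variant and one model")
    # Collect one score per (variant, model) pair.
    scores: dict[str, list[float]] = {v: [] for v in variants}
    for variant, model in product(variants, models):
        scores[variant].append(score(variant, model))
    # Average over models, then pick the top variant (first one wins ties).
    means = {v: mean(s) for v, s in scores.items()}
    winner = max(means, key=lambda v: means[v])
    return winner, means[winner]


def toy_score(variant: str, model: str) -> float:
    """Stand-in scorer (hypothetical): shorter prompts score higher."""
    return 1.0 / (1 + len(variant))


winner, winner_mean = best_prompt(
    ["Be brief.", "Be brief and cite sources."],
    ["model-a", "model-b"],
    toy_score,
)
```

In practice `score` would call an evaluation harness for each model, and a single mean hides per-model variance, so a real benchmark would probably report the full score table rather than one winner.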

ML & GPU Infrastructure (1)

As agent training relies more on synthetic data, curation and filtering are becoming a critical, non-obvious part of the MLOps stack for keeping models robust, as @jerryjliu0 highlights.

A practitioner shared insights on the critical process of curating and filtering synthetic data to avoid poisoning agent generalization during training.

  • Jerry Liu · @jerryjliu0 · repeated

    Dataset curation for agent training: how we filter synthetic data that looks good but poisons generalization.

    260 · 36 · 2 · 11 · score 338
