2026-05-13

Denoise · Twitter

AI agents are moving from demos to deployable infrastructure, with new SDKs, protocols, and security frameworks from major labs solidifying the stack.

Pay attention to the emergence of a standardized agent infrastructure layer, as OpenAI, Anthropic, and Vercel release competing tools for orchestration and deployment.

2026-05-13 · 2026-05-13T11:33:31Z · rules twitter-v1 · Healthy · tweets 25 · signals 25

Top 3 changes

@AnthropicAI / Claude Code 1.5: A terminal-native coding agent release signals a major shift in developer workflows, moving complex reasoning into the command line.
@OpenAI / Agent SDK: The release of a new SDK with protocol-level primitives points toward standardization in how developers build and orchestrate agents.
@karpathy / Developer Experience: His observation that agent-native workflows will make current IDEs unrecognizable captures the fundamental user experience shift underway.

Strategic insights

#01 A new infrastructure layer for AI agents is rapidly solidifying. OpenAI's Agent SDK, Anthropic's Claude Code, Vercel's edge runtimes, and Replit's deployment harness all point to a race to define the standards for agent orchestration and deployment.

#02 The primary developer interface is shifting from the IDE to the terminal agent. @karpathy's framing is validated by Anthropic's terminal-native release and immediate developer adoption reports from users like @levelsio, suggesting a fundamental change in coding workflows.

#03 As agents become production-ready, security has become the next critical frontier. Disclosures from @AnthropicAI on jailbreaks and new red-teaming frameworks from @GoogleDeepMind show the focus moving from model safety to vulnerabilities in the agent's orchestration and tool-use layers.

#04 The context management paradigm is evolving from simple RAG to 'context engineering'. Practitioners like @GregKamradt and startups like @mem0ai are moving beyond basic vector retrieval to more sophisticated systems involving memory layers, caching, and graph traversal.

Categories

Security & Reverse Engineering (3)

The security focus is shifting from model-level safety to vulnerabilities in the agent orchestration and tool-use layers, a concern echoed by Anthropic, Google DeepMind, and practitioners.

Major labs are publicly disclosing agent jailbreaks and releasing red-teaming frameworks as agents begin to see real-world pentesting and deployment.

Anthropic · @AnthropicAI · rising
Responsible disclosure on a Claude jailbreak chain we patched last week. Full write-up including our red team timeline.
♥ 5.2k · ↻ 910 · " 160 · ⟲ 220 · score 7.5k · +1 related
Google DeepMind · @GoogleDeepMind · rising
New red team framework for prompt injection in autonomous agents. Covers cross-tool leakage, scanner evasion, and sandbox escape patterns.
♥ 880 · ↻ 140 · " 18 · ⟲ 38 · score 1.2k
MalwareTech · @MalwareTechBlog · repeated
Autonomous agent running pentest flows against a real SaaS. First real-world run: fewer false positives than I expected on the vulnerability surface.
♥ 180 · ↻ 28 · " 3 · ⟲ 15 · score 245

AI Coding Tools & Agents (5)

A new product category is maturing: Claude Code is being benchmarked directly against OpenAI's Codex and compared by users with Cursor and GitHub Copilot, signaling intense competition in agentic coding assistants.

Anthropic's release of Claude Code 1.5, a terminal-native agent, is driving a conversation about a fundamental shift in developer workflows away from traditional IDEs.

Anthropic · @AnthropicAI · rising
Claude Code 1.5 is live. Terminal-native coding agent with full Claude Opus reasoning, file-ops sandbox, and session replay.
♥ 4.8k · ↻ 820 · " 140 · ⟲ 190 · score 6.9k · +1 related
Andrej Karpathy · @karpathy · rising
The developer-experience shift from IDE to terminal agent is underrated. Coding workflows are about to look nothing like 2024.
♥ 3.4k · ↻ 510 · " 30 · ⟲ 140 · score 4.5k
swyx · @swyx · rising
Codex vs Claude Code terminal agent benchmarks. Pass@1 diverges more than I expected on the long-context editor tasks.
♥ 1.1k · ↻ 180 · " 22 · ⟲ 60 · score 1.6k
DSPy · @dspy_ai · rising
DSPy 3.0: prompt optimization via compile-time search over system prompt variations. Benchmarks inside.
♥ 960 · ↻ 150 · " 12 · ⟲ 42 · score 1.3k
@levelsio · @levelsio · rising
Switched my whole editor setup to Claude Code this week. Shipping faster than when I used Cursor + Copilot.
♥ 580 · ↻ 40 · " 6 · ⟲ 80 · score 678

AI Infra & Protocols (5)

A convergence on agent protocols is underway, with OpenAI and Anthropic releasing specs while frameworks like LangChain adapt to them and platforms like Vercel provide the compute fabric.

Major platforms including OpenAI, Vercel, and Replit are shipping new SDKs and runtimes for deploying and orchestrating agents, solidifying a new infrastructure layer.

OpenAI · @OpenAI · rising
New agent SDK: protocol-level tool calling, deployment harness, and multi-worker orchestration primitives. Docs live.
♥ 4.2k · ↻ 680 · " 75 · ⟲ 180 · score 5.8k
LangChain · @LangChainAI · rising
MCP protocol integration thread. How to wire existing LangGraph agents into the Anthropic Model Context Protocol server spec.
♥ 920 · ↻ 145 · " 14 · ⟲ 48 · score 1.3k
Vercel · @vercel · rising
Edge runtime for agent workers is live. Spawn durable background agents from any serverless deployment.
♥ 540 · ↻ 80 · " 6 · ⟲ 22 · score 718
Alex Albert · @AlexAlbert__ · rising
When your security scanner finds nothing scary on an agent deploy, check the orchestration layer again. That's usually where the jailbreak sneaks through.
♥ 420 · ↻ 60 · " 8 · ⟲ 35 · score 564
Replit · @replit · rising
New agent deployment harness. One command to go from local orchestration to hosted agent worker.
♥ 380 · ↻ 55 · " 5 · ⟲ 18 · score 505

On-device & Multimodal AI (1)

While the agent conversation dominates, foundational dataset releases like Mistral's remain critical for enabling new model capabilities, particularly in document and image understanding.

Mistral AI released a large-scale, cleaned 100M-row web OCR dataset for public use in model training.

Mistral AI · @MistralAI · rising
Open dataset release: 100M-row web OCR dataset. Cleaned, licensed, ready to train.
♥ 2.6k · ↻ 390 · " 30 · ⟲ 88 · score 3.5k

Memory, RAG & Context (4)

The dialogue is shifting from 'RAG vs. long context' to 'context engineering', a more nuanced approach combining caching, retrieval strategies, and memory architectures from actors like @GregKamradt and @mem0ai.

Developers are exploring new frameworks beyond simple RAG to manage state and memory in agents with massive context windows.

Vaibhav Srivastav · @reach_vb · rising
Tested the new 10M context memory window end to end. Surprising failure modes around rag retrieval cache invalidation, thread below.
♥ 1.9k · ↻ 260 · " 22 · ⟲ 75 · score 2.5k
Greg Kamradt · @GregKamradt · rising
RAG is dead, long live context engineering. My framework for when to cache, when to retrieve, and when to just dump memory into the prompt.
♥ 820 · ↻ 130 · " 16 · ⟲ 54 · score 1.1k
mem0 · @mem0ai · rising
Memory layer for agents: differentiating working memory from the subconscious store. Vector index isn't enough anymore.
♥ 480 · ↻ 72 · " 5 · ⟲ 25 · score 639
LlamaIndex · @llamaindex · repeated
Knowledge graph retrieval walkthrough: when semantic vector search misses, graph hop beats it every time.
♥ 290 · ↻ 40 · " 2 · ⟲ 11 · score 376
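
The cache / retrieve / dump decision mentioned in this category can be illustrated with a small routing function. This is only a sketch under stated assumptions: it is not @GregKamradt's framework, and the names `ContextRequest`, `choose_strategy`, `fill_limit`, and `reuse_threshold`, along with their default values, are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class ContextRequest:
    corpus_tokens: int  # size of the candidate knowledge, in tokens
    window_tokens: int  # model context window, in tokens
    reuse_count: int    # number of upcoming calls expected to share this context


def choose_strategy(
    req: ContextRequest,
    fill_limit: float = 0.5,   # hypothetical: use at most half the window for context
    reuse_threshold: int = 3,  # hypothetical: cache once the context is reused this often
) -> str:
    """Return 'retrieve', 'cache', or 'dump' for a given request."""
    fits = req.corpus_tokens <= req.window_tokens * fill_limit
    if not fits:
        # Too large to place in the prompt: select relevant pieces instead.
        return "retrieve"
    if req.reuse_count >= reuse_threshold:
        # Fits and is reused often: keep the shared prefix cached.
        return "cache"
    # Fits and is used rarely: place everything in the prompt directly.
    return "dump"
```

With these illustrative defaults, a 2M-token corpus against a 200k-token window routes to retrieval, while a 50k-token corpus routes to caching or direct inclusion depending on how often it is reused.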

Other (4)

Agentic automation principles are being integrated directly into SaaS products, as seen with Notion and Linear, to automate user workflows within their existing platforms.

Workspace automation features are becoming standard, with Notion and Linear releasing tools for auto-triaging and chained database updates.

Notion · @NotionHQ · rising
Notion workspace automation is out of beta. Auto-fill tables, chained updates across databases, and a new audit log surface.
♥ 820 · ↻ 125 · " 12 · ⟲ 38 · score 1.1k
Linear · @linear · rising
Linear now auto-triages incoming issues. Quiet launch, but already our favorite workspace feature of the year.
♥ 460 · ↻ 70 · " 6 · ⟲ 24 · score 618
Temporal · @temporalio · repeated
Orchestrating agents with durable workflows: replayable, resumable, and multi-worker by default. Walkthrough from our infra team.
♥ 310 · ↻ 48 · " 4 · ⟲ 14 · score 418
James Clear · @jamesclear · repeated
The best habit tracker is the one you actually open. Three open-source alternatives worth trying.
♥ 280 · ↻ 42 · " 3 · ⟲ 18 · score 373

Prompt & Skill Libraries (2)

The art of prompt engineering is becoming a data-driven science, with platforms like Weights & Biases enabling optimization across thousands of variants, replacing anecdotal tricks with empirical evidence.

Practitioners and labs are moving towards systematic, large-scale benchmarking of system prompts to identify what works reliably across different models.

dotey · @dotey · rising
Five prompt tricks learned this week from reviewing 200 production prompts. Short thread.
♥ 510 · ↻ 88 · " 8 · ⟲ 30 · score 710
Weights & Biases · @weights_biases · rising
System prompt benchmarking at scale: we ran 40k variants across 6 frontier models. The efficient frontier is not where you think.
♥ 420 · ↻ 55 · " 6 · ⟲ 20 · score 548
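
One common reading of "efficient frontier" in prompt benchmarking is the set of variants that no other variant beats on both cost and accuracy. The sketch below assumes that reading; it is not the Weights & Biases methodology, and the `Variant` fields and `efficient_frontier` helper are hypothetical names introduced for illustration.

```python
from typing import List, NamedTuple


class Variant(NamedTuple):
    name: str
    cost: float      # e.g. average prompt tokens per call (lower is better)
    accuracy: float  # e.g. fraction of eval tasks passed (higher is better)


def efficient_frontier(variants: List[Variant]) -> List[Variant]:
    """Return the variants not dominated on (lower cost, higher accuracy)."""
    frontier: List[Variant] = []
    best_accuracy = float("-inf")
    # Walk from cheapest to most expensive; on a cost tie, try the more accurate first.
    for v in sorted(variants, key=lambda v: (v.cost, -v.accuracy)):
        # A variant earns a place only if it beats every cheaper variant on accuracy.
        if v.accuracy > best_accuracy:
            frontier.append(v)
            best_accuracy = v.accuracy
    return frontier
```

Any variant left off the returned list is matched or beaten by a cheaper (or equally priced) variant, so it can be dropped from further comparison.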

ML & GPU Infrastructure (1)

As agent training scales, the key infrastructure challenge becomes less about raw data volume and more about building sophisticated pipelines for data curation, a point raised by @jerryjliu0.

The focus in agent training is shifting toward data quality, particularly techniques for curating and filtering synthetic data to avoid harming model generalization.

Jerry Liu · @jerryjliu0 · repeated
Dataset curation for agent training: how we filter synthetic data that looks good but poisons generalization.
♥ 260 · ↻ 36 · " 2 · ⟲ 11 · score 338
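
The score shown on each card appears consistent with a weighted sum of the first three counters: the ♥ count, plus twice the ↻ count, plus three times the " count, with the ⟲ count not contributing. This is inferred from the figures on this page rather than from any documented formula, and reading the icons as likes, reposts, and quotes is an assumption. A minimal sketch, with hypothetical function names:

```python
def parse_count(text: str) -> float:
    """Convert a display count such as '5.2k' or '910' into a number."""
    text = text.strip().lower()
    if text.endswith("k"):
        return float(text[:-1]) * 1000
    return float(text)


def engagement_score(likes: str, reposts: str, quotes: str) -> int:
    """Inferred weighting: likes + 2 * reposts + 3 * quotes."""
    total = parse_count(likes) + 2 * parse_count(reposts) + 3 * parse_count(quotes)
    return round(total)


# Checked against two cards on this page:
# @MalwareTechBlog: 180 + 2*28 + 3*3 = 245
# @levelsio:        580 + 2*40 + 3*6 = 678
```

Cards whose counters are abbreviated (for example 5.2k) reproduce the displayed score only after the same rounding, so small differences are expected there.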