Episode 54: Claude Code 2.1.144, Cursor Composer 2.5

AgentStack Daily EP054 — Claude Code 2.1.144, Cursor Composer 2.5, Stainless, Notion, Vercel AI SDK, and Cloudflare Mesh

Release Coverage Check

OpenClaw — Latest stable/verified release 2026.5.18 from https://api.github.com/repos/openclaw/openclaw/releases?per_page=10 with prereleases skipped. Newer tags in the feed are prerelease betas and are skipped. The latest stable release is present in the recent version scan, so no new OpenClaw host release qualifies for this episode.
Hermes Agent — Latest stable/verified release 2026.5.16 / Hermes Agent 0.14.0 from https://api.github.com/repos/NousResearch/hermes-agent/releases?per_page=10 with prereleases skipped. The latest stable release is present in the recent version scan, so no new Hermes Agent release qualifies for this episode.
OpenAI Codex app/CLI — Latest stable/verified release rust-v0.131.0 from https://api.github.com/repos/openai/codex/releases?per_page=10 with prereleases skipped; newer feed entries are alpha builds. The latest stable release is present in the recent version scan, so no new Codex release qualifies for this episode.
Claude Code CLI — Latest verified npm package version 2.1.144 from npm view @anthropic-ai/claude-code version; concrete changes verified in Anthropic's Claude Code changelog at https://code.claude.com/docs/en/changelog. Recent episode version tags detected: 2.1.141, 2.1.142, 2.1.143. Selected release: 2.1.144.
Candidate verification — OpenClaw, Hermes Agent, and Codex each stop because their latest verified stable release is present in the recent version scan. Claude Code CLI has a latest-contiguous uncovered block starting at its latest verified version. The selected agent-stack release for this episode is therefore Claude Code CLI 2.1.144, which leads the episode.

Episode Title

Claude Code 2.1.144, Cursor Composer 2.5, Stainless, Notion, Vercel AI SDK, and Cloudflare Mesh

Tagline

Harden background coding sessions, run a cheaper long-horizon coding model, watch SDK code generation move in-house, turn a workspace into an agent runtime, bridge LangGraph to the AI SDK, and put a zero-trust network under the agent lifecycle.

Feed Description

AgentStack Daily EP054 opens on concrete release work: Claude Code CLI 2.1.144 stabilizes background and detached agent sessions, fixes a long startup hang when the API endpoint is unreachable, repairs MCP pagination and unsupported-image handling, adds background-session resume and a session-scoped model picker, and tightens read-before-edit and search-error behavior. Then five more builder-relevant moves: Cursor Composer 2.5, a Kimi K2.5-based coding model at roughly a tenth of frontier per-token cost; Anthropic acquiring Stainless and pulling SDK code generation in-house; Notion's Developer Platform turning the workspace into a hosted agent runtime with Workers and an External Agent API; the Vercel AI SDK rewriting its LangChain and LangGraph adapter; and Cloudflare Mesh putting zero-trust networking and identity under the agent lifecycle.

Story Slate

1. Agent-stack release readout: Claude Code CLI 2.1.144 hardens background sessions, startup, and MCP behavior

Claude Code CLI 2.1.144 is the agent-stack release this episode and it concentrates on the surfaces long-running agents actually break on: background and detached sessions, startup resilience, MCP transport, and tool-call hygiene. It fixes a startup hang of up to 75 seconds when api.anthropic.com is unreachable behind a captive portal, firewall, or VPN by timing out side-channel calls after 15 seconds; fixes MCP servers with paginated tools/list responses only returning the first page; saves MCP images with unsupported MIME types to disk instead of breaking the conversation; adds /resume for background sessions and elapsed-duration completion notices; makes /model session-scoped with a separate default; and stops head/tail views and empty grep/git diff results from registering as tool failures. It also repairs background-session crashes under macOS Full Disk Access folders, scrolling and Ctrl+C handling in attached background sessions on Windows, resumed sessions inheriting the wrong model, and false startup-crash marking on wake. Technical depth angle: explain the background-session lifecycle (spawn, detach, retire, wake, respawn, resume), worktree isolation guards, MCP tools/list pagination and image MIME handling, the read-before-edit invariant that search tools must satisfy, side-channel API timeout behavior on degraded networks, and the failure modes each fix removes from unattended agent runs. Primary link: https://code.claude.com/docs/en/changelog

2. Cursor Composer 2.5 ships a cheaper long-horizon coding-agent model on a Kimi K2.5 base

Cursor released Composer 2.5 on May 18, a coding-agent model built on a Kimi K2.5 base with heavier post-training, aimed at longer autonomous coding sessions. Cursor reports SWE-Bench Multilingual rising from 73.7 to 79.8 percent and Terminal-Bench from 61.7 to 69.3 percent, tying Opus 4.7 on Terminal-Bench 2.0 while trailing GPT-5.5, at standard pricing of 50 cents per million input tokens and 2.50 dollars per million output tokens — roughly a tenth of Opus 4.7 per token. The reported training shifts are the interesting part: textual-feedback reinforcement learning that returns localized hints at failed tool calls instead of only end-of-run rewards, 25x more synthetic tasks including feature-deletion rebuild puzzles, and MoE-scale training infrastructure using sharded Muon optimizers and dual-mesh HSDP, with RL run inside real Cursor sessions using the deployed harness. Technical depth angle: explain reward shaping for tool-using coding agents, why localized textual feedback changes long-horizon credit assignment, synthetic-task generation and the rebuild-puzzle objective, MoE optimizer sharding and HSDP meshes, harness-faithful RL, and how fully-loaded cost per completed task plus tiered model routing changes model selection for agent builders. Primary links: https://the-decoder.com/cursors-composer-2-5-matches-opus-4-7-and-gpt-5-5-benchmarks-at-a-fraction-of-the-cost/ and https://winbuzzer.com/2026/05/18/cursor-releases-composer-25-saying-its-better-at-s-xcxwbn/

3. Anthropic acquires Stainless and pulls SDK code generation in-house

Anthropic announced on May 18 that it acquired Stainless, the developer-tools company whose service converts API specifications into production-ready, auto-maintained SDKs across Python, TypeScript, Go, Kotlin, and Java, and which was used by OpenAI, Google, Cloudflare, and others. Anthropic plans to wind down the hosted Stainless products including the SDK generator; existing customers keep already-generated SDKs but lose future access to the hosted service. This is more than an acquisition headline for agent builders, because the SDK is the typed surface an agent uses to call an external API, and a code-generation pipeline that drifts from the spec produces tool clients that silently mismatch the live API. Technical depth angle: explain spec-to-SDK code generation, why typed clients are the real tool-call boundary for agents, SDK/spec drift as a silent agent failure mode, the maintenance cost of hand-written multi-language clients, what teams that relied on the hosted generator should evaluate now (OpenAPI generators, vendor SDKs, MCP-wrapped clients), spec-pinning and drift monitoring, and the supply-concentration risk when one lab owns a shared toolchain dependency. Primary link: https://techcrunch.com/2026/05/18/anthropic-has-acquired-the-dev-tools-startup-used-by-openai-google-and-cloudflare/

4. Notion's Developer Platform turns the workspace into a hosted agent runtime

Notion launched its Developer Platform on May 13, turning the workspace into a place agents run, not just a place they read. The platform adds Workers, a hosted code sandbox with no servers to provision; an External Agent API that lets third-party agents such as Claude Code, Cursor, and Codex operate as first-class workspace participants; database sync that keeps external systems of record fresh in Notion without infrastructure; bidirectional webhooks where a Worker receives an event, runs logic, and acts back in Notion or calls other APIs; and a CLI for authenticating, deploying Workers, and automating from the terminal. Custom agent tools can be implemented as deterministic Workers that agents invoke for predictable, token-efficient execution instead of LLM-mediated tool calls. Technical depth angle: explain hosted sandbox runtimes for agent code, deterministic Worker tools versus LLM-mediated tool calls, the External Agent API as a multi-vendor agent integration surface, webhook-driven agent triggering, sync-as-Workers data freshness, and the governance and trust-boundary implications of running third-party agents inside a workspace. Primary links: https://www.notion.com/blog/introducing-developer-platform and https://techcrunch.com/2026/05/13/notion-just-turned-its-workspace-into-a-hub-for-ai-agents/

5. Vercel AI SDK rewrites its LangChain and LangGraph adapter

The @ai-sdk/langchain adapter has been rewritten to bridge LangChain and LangGraph cleanly into the AI SDK, which matters because most teams do not run a single framework end to end. New APIs include toBaseMessages and convertModelMessages for converting AI SDK message objects into LangChain BaseMessage format, toUIMessageStream for transforming LangChain model streams, LangGraph output, and streamEvents() results into the AI SDK UIMessageStream, and LangSmithDeploymentTransport, a ChatTransport that connects a browser client directly to a LangSmith or LangGraph deployment without a custom backend route. Technical depth angle: explain message-format interop across agent frameworks, streaming event normalization including granular streamEvents() observability, typed custom data parts, the transport abstraction that removes backend glue between a UI and a deployed graph, and why framework-bridging adapters are load-bearing infrastructure for mixed-stack agent builders. Primary link: https://ai-sdk.dev/providers/adapters/langchain

6. Cloudflare Mesh puts zero-trust networking and identity under the agent lifecycle

Cloudflare's agent-cloud push includes Mesh, which applies zero-trust private networking and identity to how agents reach services and each other, alongside dated developer-tooling changes such as the May 18 removal of the legacy wrangler dev --remote flag for KV-backed Durable Objects. The point for builders is that as agents move from one process on a laptop to many sandboxed workers calling internal and external services, the network between them stops being an implementation detail and becomes an attack surface and a policy boundary. Technical depth angle: explain zero-trust networking for agent-to-service and agent-to-agent calls, per-agent identity and scoped credentials versus shared keys, why the agent lifecycle (spawn, act, retire) needs network policy attached to identity, Durable Object state and local-versus-remote dev parity, and the operational risk of agents acting with broad ambient network access. Primary links: https://www.cloudflare.com/press/press-releases/2026/cloudflare-launches-mesh-to-secure-the-ai-agent-lifecycle/ and https://blog.cloudflare.com/agents-week-in-review/

Extra Research Candidates

Databricks ships Unity AI Gateway as a governance layer for agentic AI — Primary source: https://www.databricks.com/blog/ai-gateway-governance-layer-agentic-ai. Technical depth angle: explain on-behalf-of-user execution for MCP calls, full request/response audit to Delta tables, LLM-judge guardrails for PII, prompt injection and exfiltration, per-endpoint rate limits, and dollar-cost accounting to Unity Catalog system tables as a single governed model-and-tool access plane.
Subquadratic ships SubQ with subquadratic sparse attention and a 12M-token context — Primary source: https://llm-stats.com/llm-updates. Technical depth angle: explain subquadratic sparse attention versus dense attention at long context, the compute and KV-cache scaling implications, and why the attention architecture, not just the context number, is the prerequisite for durable long-horizon agents.
Cloudflare Agents Week ships Dynamic Workers and Sandboxes GA for agent execution — Primary source: https://blog.cloudflare.com/agents-week-in-review/. Technical depth angle: explain isolate-based dynamic sandboxes versus container cold starts, persistent Linux sandboxes for stateful agent execution, and the latency and isolation tradeoffs of running untrusted agent-generated code at the edge.

Show Notes

[00:00] Open on the Claude Code CLI changes
Claude Code CLI 2.1.144 is the release to inspect first because it targets the exact surfaces unattended agents fail on: background and detached sessions, startup behavior on degraded networks, MCP transport, and tool-call hygiene. The headline fixes are concrete. A startup hang of up to seventy-five seconds when the API endpoint is unreachable behind a captive portal, firewall, or VPN is gone, because side-channel calls now time out after fifteen seconds. When an MCP server paginates its tools list, the CLI no longer stops at the first page. MCP images with unsupported MIME types are saved to disk instead of breaking the conversation. Background sessions get resume support and elapsed-duration completion notices, and the model picker is now session-scoped with a separate default.

This is a maintenance release, and that is the point. The work is in the failure modes: a detached agent that should keep running, a tool list that was silently truncated, an image that broke a conversation, a resumed session that picked up the wrong model. After the release readout, the episode covers five more builder-relevant moves: Cursor Composer 2.5 as a cheaper long-horizon coding model, Anthropic acquiring Stainless and pulling SDK code generation in-house, Notion turning its workspace into a hosted agent runtime, the Vercel AI SDK rewriting its LangChain and LangGraph adapter, and Cloudflare Mesh putting zero-trust networking under the agent lifecycle.

[02:30] Agent-stack release readout — Claude Code CLI 2.1.144
Start with the startup hang, because it is the clearest example of a fix that matters more for agents than for interactive users. When the API endpoint was unreachable, the CLI could block for up to seventy-five seconds before doing anything useful. A human notices and waits. An unattended agent run, a scheduled job, or a background session on a flaky network turns that into a stall, a timeout, or a missed window. The fix caps side-channel calls at fifteen seconds. The lesson for builders is that startup resilience on degraded networks is an agent reliability property, not a cosmetic one.
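To make the startup point concrete, here is a minimal sketch of bounding a non-essential startup call with a deadline. It is not how the CLI implements the fix. The function names, the fallback value, and the simulated probe are all hypothetical, and only the fifteen-second figure comes from the release notes.

```python
import threading
import time

# The 15-second budget is the figure from the release notes; the rest of this
# sketch (names, fallback behavior) is an illustrative assumption.
SIDE_CHANNEL_TIMEOUT_S = 15.0


def call_with_deadline(fn, timeout_s=SIDE_CHANNEL_TIMEOUT_S, fallback=None):
    """Run a non-essential call, returning `fallback` if it is slow or fails."""
    result = {}

    def runner():
        try:
            result["value"] = fn()
        except Exception as exc:  # a failed side-channel call must not block startup
            result["error"] = exc

    # A daemon thread never keeps the process alive if the call hangs forever.
    worker = threading.Thread(target=runner, daemon=True)
    worker.start()
    worker.join(timeout_s)

    if worker.is_alive() or "error" in result:
        return fallback
    return result["value"]


def unreachable_probe():
    """Stands in for a request that gets no answer behind a captive portal."""
    time.sleep(0.5)
    return "reachable"
```

Calling `call_with_deadline(unreachable_probe, timeout_s=0.05, fallback="offline")` returns the fallback after the short deadline instead of waiting for the probe. The design choice is that startup proceeds with a degraded default rather than blocking on a call the session does not strictly need.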

The MCP fixes are the second important block. When an MCP server paginated its tools-list response, the CLI previously picked up only the first page, which means an agent could be silently missing tools it was supposed to have. That is a quiet correctness bug: nothing errors, the agent simply cannot do something it should be able to do, and the run looks like a reasoning failure instead of a transport bug. The release also stops MCP images with unsupported MIME types, such as SVG, from breaking the conversation; the image is saved to disk and referenced instead. And the MCP list command now reports the real problem when a config file cannot be parsed, instead of silently showing no servers.
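The pagination bug is easiest to see in code. The sketch below follows the MCP convention of an opaque `cursor` request parameter and a `nextCursor` field in the result. The `request` callable, the page cap, and the fake server are assumptions for illustration, not the CLI's client code.

```python
from typing import Callable, Dict, List

# A request function takes (method, params) and returns the JSON-RPC result.
# This signature is hypothetical; a real client would wrap its transport.
Request = Callable[[str, Dict], Dict]


def list_all_tools(request: Request, max_pages: int = 100) -> List[Dict]:
    """Follow nextCursor until the server stops returning one."""
    tools: List[Dict] = []
    cursor = None
    for _ in range(max_pages):
        params = {"cursor": cursor} if cursor else {}
        result = request("tools/list", params)
        tools.extend(result.get("tools", []))
        cursor = result.get("nextCursor")
        if not cursor:
            return tools
    # Guard against a server that never terminates its cursor chain.
    raise RuntimeError("tools/list pagination did not terminate")


def fake_server(method: str, params: Dict) -> Dict:
    """Three-page server used only to exercise the loop."""
    pages = {
        None: {"tools": [{"name": "read"}], "nextCursor": "p2"},
        "p2": {"tools": [{"name": "write"}], "nextCursor": "p3"},
        "p3": {"tools": [{"name": "search"}]},
    }
    return pages[params.get("cursor")]
```

A client that reads only the first response would see `read` and nothing else. The loop returns all three tools, which is the difference between an agent that has `search` and one that quietly does not.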

Background and detached sessions get the most individual fixes, which tells you where the real operational pain has been. Background sessions now support resume and show elapsed duration on completion. A crash in background sessions on macOS when the project lives under a Full Disk Access-protected folder is fixed. Scrolling, mouse wheel, and navigation in attached background sessions on Windows now work, and closing the terminal while attached no longer crashes. Resumed sessions keep the model they were using instead of inheriting another session's choice. Edit and Write no longer refuse with a worktree-isolation error right after detaching. Respawn no longer misreports a running session as stopped, and a brief failure to wake is no longer permanently marked as a startup crash. Together these make the detach, run, wake, respawn, and resume lifecycle something a builder can put a supervisor around.
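One way to reason about putting a supervisor around that lifecycle is an explicit state machine. The sketch below is an assumption throughout: the state names, the transition table, and the wake-failure threshold are hypothetical and do not describe Claude Code internals. It illustrates two properties from the fixes: a session keeps its model across transitions, and a brief wake failure is not treated as a permanent crash.

```python
from dataclasses import dataclass
from enum import Enum


class State(Enum):
    ATTACHED = "attached"
    DETACHED = "detached"
    RETIRED = "retired"
    STOPPED = "stopped"
    CRASHED = "crashed"


# Hypothetical transition table: (current state, event) -> next state.
TRANSITIONS = {
    (State.ATTACHED, "detach"): State.DETACHED,
    (State.DETACHED, "resume"): State.ATTACHED,
    (State.DETACHED, "retire"): State.RETIRED,
    (State.DETACHED, "stop"): State.STOPPED,
    (State.RETIRED, "wake"): State.DETACHED,
    (State.STOPPED, "respawn"): State.DETACHED,
}

MAX_WAKE_FAILURES = 3  # assumed threshold before a session counts as crashed


@dataclass
class Session:
    model: str  # stays with the session across every transition
    state: State = State.ATTACHED
    wake_failures: int = 0

    def apply(self, event: str) -> State:
        key = (self.state, event)
        if key not in TRANSITIONS:
            raise ValueError(f"{event!r} is not valid from {self.state.value}")
        self.state = TRANSITIONS[key]
        self.wake_failures = 0  # any successful transition clears the counter
        return self.state

    def wake_failed(self) -> State:
        """Record a failed wake; only repeated failures mark a crash."""
        if self.state is not State.RETIRED:
            raise ValueError("wake can only fail from the retired state")
        self.wake_failures += 1
        if self.wake_failures >= MAX_WAKE_FAILURES:
            self.state = State.CRASHED
        return self.state
```

A supervisor built this way rejects impossible transitions loudly, which makes a misreported state (running shown as stopped, or a transient failure shown as a crash) visible as a bug rather than as odd agent behavior.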

Tool-call hygiene is the fourth area. Head and tail views now satisfy the read-before-edit check, and empty results from grep, git grep, or git diff are no longer reported as tool failures. False tool errors are not free: an agent that thinks a successful no-match search failed will retry, second-guess, or take a worse path. Removing spurious failures removes spurious agent behavior and wasted turns. The model picker is now session-scoped, with a separate default for new sessions, so changing the model for one task does not silently change it everywhere, including for Bedrock and Vertex users selecting a long-context Opus option.
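The false-failure problem comes down to how a harness interprets exit status. By convention `grep` exits 0 when it selects lines, 1 when it selects none, and greater than 1 on a real error, and `git grep` likewise exits non-zero when nothing matches. The classifier below is a hypothetical sketch of separating "empty" from "error", not the CLI's logic. The outcome labels and the command table are assumptions.

```python
# Commands where exit status 1 with no stderr conventionally means "no match".
# The table is illustrative; extend it only for tools whose convention you know.
NO_MATCH_EXIT_ONE = {"grep", "git grep"}


def classify(command: str, exit_code: int, stdout: str, stderr: str = "") -> str:
    """Return "ok", "empty", or "error" for a finished tool call."""
    if exit_code == 0:
        # A successful command with no output (for example an empty diff)
        # is a valid answer, not a failure.
        return "ok" if stdout.strip() else "empty"
    if command in NO_MATCH_EXIT_ONE and exit_code == 1 and not stderr.strip():
        return "empty"
    return "error"
```

With this split, `classify("grep", 1, "")` reports `"empty"` while `classify("grep", 2, "", "grep: missing: No such file or directory")` reports `"error"`. The agent sees "nothing matched" as information and reserves retries for calls that actually broke.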

The practical upgrade posture is to install 2.1.144 and then exercise the changed surfaces rather than assume them. Start a background session, detach it, wake it, resume it, and confirm it keeps its model. Run an MCP server that paginates its tool list and confirm the full set is visible. Feed an unsupported image type through an MCP tool. Run on a network where the API endpoint is briefly unreachable and confirm startup no longer stalls. The release is only valuable if the failure modes it removes are the ones your agents were actually hitting.

[18:00] Cursor Composer 2.5 — a cheaper long-horizon coding-agent model
Cursor released Composer 2.5 on May 18, built on a Kimi K2.5 base with heavier post-training and aimed at longer autonomous coding sessions. Cursor reports SWE-Bench Multilingual rising from 73.7 to 79.8 percent and Terminal-Bench from 61.7 to 69.3 percent, which ties Opus 4.7 on Terminal-Bench 2.0 while trailing GPT-5.5. Standard pricing is fifty cents per million input tokens and two dollars fifty per million output tokens. The headline is price: roughly a tenth of Opus 4.7 per token at comparable coding-benchmark performance.

The training method is the part worth explaining. Cursor reports three shifts. First, textual-feedback reinforcement learning: instead of only an end-of-run reward, the model gets localized hints at failed tool calls. For a long-horizon coding agent that is a credit-assignment change. A single pass-or-fail signal at the end of a long session tells the model very little about which of fifty tool calls was the mistake. Localized textual feedback at the point of failure gives a much sharper learning signal. Second, twenty-five times more synthetic tasks, including feature-deletion rebuild puzzles with exact ground truth. Third, MoE-scale training infrastructure using sharded Muon optimizers and dual-mesh HSDP, with the reinforcement learning run inside real Cursor sessions using the same harness the deployed model uses.

That last detail, harness-faithful RL, is the one builders should not skim. A coding agent's behavior is shaped as much by the harness as by the weights: how tools are presented, how errors come back, how context is trimmed, how retries work. Training the model in a different harness than the one it ships in introduces a distribution gap that shows up as the model feeling worse in production than in evaluation. Running RL inside the deployed harness closes that gap. The builder takeaway is economic: when a model reaches frontier-adjacent coding benchmarks at a tenth of the per-token cost, the math on running many long sessions changes, and a cheap-default-with-frontier-escalation routing pattern becomes attractive. The caution is that benchmark parity is not workflow parity; the real test is fully-loaded cost per completed task in your own harness on your own long-session distribution.
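The fully-loaded cost argument can be worked through with simple arithmetic. The sketch below uses the Composer 2.5 prices quoted above. Everything else is an assumption for illustration: the token counts, both success rates, and the "frontier" profile, which is priced at a flat ten times Composer 2.5 to reflect the "roughly a tenth" claim rather than any published price list. It also assumes independent retries until success, so expected attempts are one over the success rate.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class ModelProfile:
    name: str
    usd_per_m_input: float
    usd_per_m_output: float
    success_rate: float  # fraction of attempts that complete the task


def cost_per_attempt(m: ModelProfile, input_tokens: int, output_tokens: int) -> float:
    return (input_tokens / 1e6) * m.usd_per_m_input + (output_tokens / 1e6) * m.usd_per_m_output


def cost_per_completed_task(m: ModelProfile, input_tokens: int, output_tokens: int) -> float:
    """Expected spend per success, assuming independent retries until success."""
    if not 0 < m.success_rate <= 1:
        raise ValueError("success_rate must be in (0, 1]")
    return cost_per_attempt(m, input_tokens, output_tokens) / m.success_rate


def cheapest(models: Iterable[ModelProfile], input_tokens: int, output_tokens: int) -> ModelProfile:
    return min(models, key=lambda m: cost_per_completed_task(m, input_tokens, output_tokens))


# Prices for "composer" come from the segment above; success rates are made up.
composer = ModelProfile("composer-2.5", 0.50, 2.50, success_rate=0.60)
# Hypothetical frontier profile: 10x the per-token price, higher success rate.
frontier = ModelProfile("frontier-10x", 5.00, 25.00, success_rate=0.75)
```

For a hypothetical long session of 2,000,000 input tokens and 200,000 output tokens, an attempt costs about 1.50 dollars on the cheaper profile and 15 dollars on the assumed frontier profile. Dividing by the assumed success rates gives about 2.50 dollars versus 20 dollars per completed task. The comparison only flips if the cheaper model's success rate on your own tasks collapses, which is why the measurement has to happen in your own harness.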

[28:00] Anthropic acquires Stainless and brings SDK code generation in-house
Anthropic announced on May 18 that it acquired Stainless, the developer-tools company whose service turns API specifications into production-ready, auto-maintained SDKs across Python, TypeScript, Go, Kotlin, and Java. Stainless was used by a long list of AI labs and infrastructure companies. Anthropic plans to wind down the hosted Stainless products, including the SDK generator; existing customers keep the SDKs already generated but lose future access to the hosted service.

The reason this is an agent-stack story is what an SDK actually is in an agent system. The SDK is the typed boundary an agent crosses every time it calls an external API. When an agent invokes a tool that wraps a service, the correctness of that call depends on the client matching the live API: the right endpoints, request and response shapes, error types, and pagination behavior. A code-generation pipeline that converts a spec into that client, and keeps it in sync as the spec changes, is infrastructure directly under the agent's tool layer. The failure mode is spec-to-SDK drift: a client that compiles and looks fine but silently mismatches the live API. For a human that surfaces as a bug report; for an autonomous agent it surfaces as a tool call returning something unexpected that the agent then reasons around incorrectly. Teams that relied on the hosted generator have three options: open-source OpenAPI generators, vendor SDKs, or wrapping the API behind a stable internal contract such as an MCP server. Whichever they choose, the builder pattern is to pin the spec version, diff the live spec on a schedule, and treat drift as an alert rather than a discovery made when an agent starts behaving strangely.
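The pin-and-diff pattern can be sketched in a few lines. The example assumes the spec is an OpenAPI document already parsed into a dictionary, and it compares only the set of operations under `paths`. The function names and the report shape are hypothetical. A production check would also compare schemas, parameters, and error types.

```python
import hashlib
import json
from typing import Dict, Set, Tuple

HTTP_METHODS = {"get", "put", "post", "delete", "patch", "head", "options", "trace"}


def spec_fingerprint(spec: Dict) -> str:
    """Stable hash of a parsed spec, independent of key order."""
    canonical = json.dumps(spec, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def operations(spec: Dict) -> Set[Tuple[str, str]]:
    """Collect (METHOD, path) pairs from an OpenAPI-style `paths` object."""
    return {
        (method.upper(), path)
        for path, item in spec.get("paths", {}).items()
        for method in item
        if method in HTTP_METHODS
    }


def drift_report(pinned: Dict, live: Dict) -> Dict:
    """Compare the spec the client was generated from with the live spec."""
    old, new = operations(pinned), operations(live)
    return {
        "changed": spec_fingerprint(pinned) != spec_fingerprint(live),
        "added": sorted(new - old),
        "removed": sorted(old - new),
    }
```

Run on a schedule, a non-empty `removed` list is the high-priority alert, because it means a generated client may still expose a call the live API no longer serves. A `changed` flag with no added or removed operations points at shape-level drift that needs a deeper schema diff.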

[36:00] Notion's Developer Platform turns the workspace into a hosted agent runtime
Notion launched its Developer Platform on May 13. The shift is that the workspace becomes a place agents run, not just a place they read. Workers are a hosted code sandbox with no servers to provision. The External Agent API lets third-party agents such as Claude Code, Cursor, and Codex act as first-class workspace participants. Database sync keeps external systems of record fresh inside Notion without infrastructure. Bidirectional webhooks let a Worker receive an event, run logic, and act back in Notion or call other APIs. A CLI handles auth, Worker deploy, and automation from the terminal.

The technically interesting piece for builders is deterministic Worker tools. Instead of an LLM-mediated tool call, a custom agent can invoke a Worker that runs predictable code with token-efficient execution. That is the right pattern when a step needs determinism and custom logic that a model-mediated call cannot guarantee. The tradeoff to reason about is the trust boundary: running third-party agents and custom code inside a workspace that holds real company data means the governance model — progressive trust, human review, sandboxed execution, unified activity visibility — is doing load-bearing work, not decoration. Builders should treat the External Agent API as a multi-vendor integration surface and the trust boundary as something to design, not inherit.

[42:00] Vercel AI SDK rewrites its LangChain and LangGraph adapter
The `@ai-sdk/langchain` adapter rewrite matters because most teams do not run one framework end to end. They prototype in one, deploy in another, and need the message and stream formats to interoperate without hand-written glue. The new adapter provides `toBaseMessages` and `convertModelMessages` to convert AI SDK message objects into LangChain `BaseMessage` format, and `toUIMessageStream` to transform LangChain model streams, LangGraph output, and `streamEvents()` results into the AI SDK `UIMessageStream`. `LangSmithDeploymentTransport` is a `ChatTransport` that connects a browser client directly to a LangSmith or LangGraph deployment with no custom backend route.

The builder lens is interop as infrastructure. Streaming event normalization, including granular `streamEvents()` for observability and typed custom data parts, is what lets a UI built on one stack render an agent built on another without lossy translation. The transport abstraction removing backend glue between a browser and a deployed graph is a real reduction in moving parts. Framework-bridging adapters are not a convenience; for mixed-stack agent builders they are the seam that keeps a heterogeneous stack from fragmenting into bespoke connectors.

[46:00] Cloudflare Mesh puts zero-trust networking under the agent lifecycle
Cloudflare's agent-cloud push includes Mesh, which applies zero-trust private networking and identity to how agents reach services and each other, plus developer-tooling changes with specific effective dates, such as the May 18 removal of the legacy `wrangler dev --remote` flag for KV-backed Durable Objects. The builder point is that as agents move from one process on a laptop to many sandboxed workers calling internal and external services, the network between them stops being an implementation detail. It becomes an attack surface and a policy boundary. Per-agent identity with scoped credentials beats shared ambient keys, network policy should attach to identity across the spawn-act-retire lifecycle, and local-versus-remote dev parity for Durable Object state is the kind of detail that decides whether an agent behaves the same in development and production. The recommendation is to treat the agent network as something you design with identity and scoped policy, not something agents inherit with broad ambient access.
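The identity-scoped policy idea can be illustrated without any vendor API. The sketch below is generic and hypothetical: it is not Cloudflare Mesh configuration, and the type names, host names, and functions are assumptions. It shows deny-by-default access tied to a per-agent identity, with access ending when the agent is retired.

```python
from dataclasses import dataclass, replace
from typing import FrozenSet


@dataclass(frozen=True)
class AgentIdentity:
    agent_id: str
    allowed_hosts: FrozenSet[str]  # explicit allowlist; anything else is denied
    retired: bool = False


def may_connect(identity: AgentIdentity, host: str) -> bool:
    """Deny by default: only a live agent may reach hosts in its own scope."""
    if identity.retired:
        return False
    return host in identity.allowed_hosts


def retire(identity: AgentIdentity) -> AgentIdentity:
    """Retiring an agent revokes its network scope along with its identity."""
    return replace(identity, retired=True, allowed_hosts=frozenset())


# Hypothetical agent with a narrow scope.
builder = AgentIdentity("build-agent-1", frozenset({"git.internal", "artifacts.internal"}))
```

Here `may_connect(builder, "git.internal")` is allowed and `may_connect(builder, "payroll.internal")` is not, and after `retire(builder)` both are denied. The contrast with a shared ambient key is that nothing here grants access merely because the process is inside the network.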

[50:00] Closing: upgrade priorities
For Claude Code, install 2.1.144 and validate the changed surfaces directly: background-session detach, wake, respawn, and resume; MCP tool-list pagination and unsupported-image handling; startup behavior on a degraded network; and that resumed sessions keep the right model. For model selection, benchmark Composer 2.5 inside your own harness on your own long-session tasks and compare fully-loaded cost per completed task, not headline pass rates. For tool clients, audit where SDK or client generation comes from and treat spec-to-client drift as a monitored failure mode. For workspace agents, treat Notion's External Agent API as a multi-vendor surface and design the trust boundary deliberately. For mixed stacks, use the rewritten Vercel adapter to bridge LangGraph and the AI SDK instead of hand-rolling connectors. For agent networking, attach identity and scoped policy to the agent lifecycle rather than relying on broad ambient access.