
OpenClaw v2026.5.7, Agents SDK Sandboxes, and Gemini CLI Safety Runtime
Release Coverage Check
- GitHub stable-release list, latest first: v2026.5.7, v2026.5.6, v2026.5.5, v2026.5.4, v2026.5.3-1, v2026.5.3.
- Recent-file tag set from the last five show-note files includes v2026.5.6, v2026.5.5, v2026.5.4, v2026.5.3-1, v2026.5.3, v2026.5.2, v2026.4.29, and earlier stable tags.
- Candidate verification: the walk starts at the latest stable release, v2026.5.7, and stops immediately at the first already-covered stable release, v2026.5.6.
- Result: include OpenClaw release coverage for v2026.5.7 only.
Episode Title
OpenClaw v2026.5.7, Agents SDK Sandboxes, and Gemini CLI Safety Runtime
Tagline
OpenClaw v2026.5.7 tightens plugin publishing, channel routing, cron status, auth boundaries, memory controls, delivery reporting, voice capture, and Codex approvals; then the episode dives into JavaScript agent sandboxes and Gemini CLI runtime hardening.
Feed Description
OpenClaw Daily covers OpenClaw v2026.5.7, focusing on ClawHub publish recovery, openai/chat-latest, cron JSON status, channel listing, native-command owner enforcement, Active Memory admin scope, skills snapshot refresh, before-tool authorization for inline skill dispatch, SecretRef-backed Tavily credentials, context cache invalidation, Discord target parsing, compaction token clamping, delivery failure reporting, Discord voice probes, Telegram poller liveness, WhatsApp routing, and Codex approval handling. The episode then explains OpenAI Agents JS sandbox workspace contracts, realtime defaults, tool concurrency, MCP tool naming, and local artifact boundaries, before closing with Gemini CLI changes around shell safety evals, A2A approval races, compression queues, OAuth hangs, and Auto Memory patch allowlists.
Story Slate
1. OpenClaw v2026.5.7 Makes Publishing, Channels, Cron, Auth, Memory, Delivery, Voice, WhatsApp, and Codex Approvals More Observable and Safer
OpenClaw v2026.5.7 is the valid release block for EP047 and should lead the episode. The release spans plugin publish verification, openai/chat-latest, cron status JSON, cleaner channel CLI output, native command owner enforcement, admin-only global memory toggles, skills snapshot invalidation, inline skill authorization hooks, Tavily SecretRef credential resolution, managed plugin lifecycle shell consistency, context cache invalidation, Discord channel target parsing, compaction token clamping, cron repair, Telegram allowlists and poller liveness, delivery success reporting, Discord voice permission probes and capture tuning, WhatsApp LID routing, Codex approval semantics, and provider transcript/media fixes. Technical depth angle: explain publish retry and post-publish package verification, moving operational auth state out of channel listing, computed cron status as an API contract, before-tool-call authorization for auto-reply inline skills, runtime config snapshots for SecretRef-backed tools, context-view cache invalidation after resets, provider-prefixed Discord target parsing, max-token clamping against model output limits, fail-fast delivery before token spend, voice capture silence-grace tuning, Codex native PermissionRequest decision caching, and tool-call transcript sanitization.
2. OpenAI Agents JS v0.9 through v0.11 Turns Sandboxes into Workspace Manifests, Runtime Limits, Tool Scheduling, and Artifact Boundaries
OpenAI’s Agents SDK for JavaScript moved quickly this week: sandbox agents landed, the default model changed, turn limits and local function-tool concurrency became explicit runtime settings, MCP tool naming gained server-prefix isolation, RealtimeAgent now defaults to gpt-realtime-2, and sandbox local source materialization now requires paths outside the base directory to be covered by Manifest.extraPathGrants. Technical depth angle: explain SandboxAgent, Manifest, mounts, local files and directories, snapshots, resume behavior, maxTurns=null, toolExecution.maxFunctionToolConcurrency, the separation between SDK-side tool scheduling and provider-side parallel tool calls, MCP server-name collision avoidance, realtime model default migration, and why local artifact materialization needs a base-directory boundary plus explicit read grants.
3. Gemini CLI v0.42 Nightlies Harden Agent Runtime Edges: Shell Safety Evals, Approval Races, Compression Queues, OAuth Hangs, and Auto Memory Patches
The Gemini CLI May 6 and May 7 nightlies are a strong operator story because they touch the failure modes that make command-line agents brittle: tool approval races, shell-command safety evaluation, context compression, premature stream closes, headless OAuth, sandbox naming, unsafe redirection behavior, and Auto Memory patch scope. Technical depth angle: explain A2A server approval state machines, waiting for tool completion versus reporting status, JSON output for stopped non-interactive runs, queuing user messages during compression, retrying premature-close stream failures, command safety evals for shell use, random sandbox container names, invalid plans-directory handling, untrusted-folder MCP list UX, OAuth liveness on headless Linux, and patch allowlists for private Auto Memory updates.
Extra Research Candidates
- Hugging Face Transformers v5.8.0 Adds DeepSeek-V4, Gemma 4 Assistant, and Granite Speech Plus — Primary source: https://github.com/huggingface/transformers/releases/tag/v5.8.0. Technical depth angle: explain DeepSeek-V4’s hybrid local plus long-range attention, Manifold-Constrained Hyper-Connections, static token-id to expert-id bootstrap routing, Gemma 4 Assistant MTP speculative decoding with KV sharing, and Granite Speech Plus projector inputs for timestamped speech transcription.
- OpenAI Developers Plugin for Codex Connects Codex Workflows to Platform Key Setup and API Troubleshooting — Primary source: https://developers.openai.com/learn/developers-codex-plugin. Technical depth angle: explain plugin-mediated OpenAI Platform access, Default Org and Default project key creation, Codex plugin permission surfaces, automatic versus explicit @invocation, and the risk boundaries around agents that can create project API keys and wire them into local environments.
- MCP Python SDK v1.27.0 Tightens Streamable HTTP and OAuth Resource Boundaries — Primary source: https://github.com/modelcontextprotocol/python-sdk/releases. Technical depth angle: explain StreamableHTTP idle timeouts, RFC 8707 resource validation, conformance-test backports, command-injection hardening in examples, non-UTF-8 stdio handling, and why MCP tool servers need explicit transport lifecycle and authorization-resource checks.
Show Notes
[00:00] v2026.5.7 gives operators new ways to see whether publishing, cron, channels, auth, delivery, voice, memory, and approvals are actually behaving. ClawHub publishing now retries transient dependency-install failures, keeps preview-passing plugin publishes moving when one preview cell flakes, and verifies every expected package version after publish. Cron JSON now exposes computed status directly. Channel listing is channel-focused instead of mixing in model auth details. Auto-reply inline skill dispatch now passes through before-tool-call authorization hooks. Tavily tools resolve SecretRef-backed credentials from the active runtime config snapshot. Discord channel targets with a provider prefix route as channel sends instead of misleading direct-message attempts. That is the shape of the release: fewer hidden assumptions, fewer stale caches, fewer false successes, and more explicit operator contracts.
[02:30] STORY 1 — OpenClaw v2026.5.7 Makes Publishing, Channels, Cron, Auth, Memory, Delivery, Voice, WhatsApp, and Codex Approvals More Observable and Safer
Start with release and plugin publishing. A maintenance release should not leave operators wondering whether a plugin was actually published everywhere it was supposed to land. v2026.5.7 adds retry handling for transient ClawHub CLI dependency install failures, keeps a plugin publishable when a single preview cell flakes even though the overall preview passed, and verifies every expected ClawHub package version after publish. That final verification step is the important one. A publish pipeline is not complete when one command exits; it is complete when the expected package versions are visible where downstream installers will fetch them.
That matters for a plugin ecosystem because partial publishes are worse than clean failures. A package may be available in one place, missing in another, or published at the wrong version. Operators then chase install errors that look like local configuration problems but are really release distribution problems. Post-publish verification turns the release process into a reconciliation loop: declare the expected package set, attempt the publish, and check that the registry state matches. If it does not, the failure is visible at maintenance time rather than discovered by users.
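The reconciliation loop can be sketched in a few lines. This is an illustrative model, not OpenClaw's actual publish code: the function name, the flat name-to-version maps, and the registry lookup shape are all assumptions.

```typescript
// Hypothetical sketch of post-publish verification as reconciliation:
// declare the expected package set, then diff it against the registry's
// observed state. The registry is abstracted as a plain map here.

type PackageSet = Record<string, string>; // package name -> version

function verifyPublished(expected: PackageSet, registry: PackageSet): string[] {
  // Returns the packages that are missing or published at the wrong version.
  const problems: string[] = [];
  for (const [name, version] of Object.entries(expected)) {
    const observed = registry[name];
    if (observed === undefined) {
      problems.push(`${name}: missing from registry`);
    } else if (observed !== version) {
      problems.push(`${name}: expected ${version}, found ${observed}`);
    }
  }
  return problems; // an empty array means the publish reconciled cleanly
}
```

The key property is that success is defined by registry state, not by the exit code of the publish command.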
The OpenAI model surface gets one targeted addition: `openai/chat-latest` is supported as an explicit direct API-key model override for trying the moving ChatGPT Instant API alias without changing the stable default model. The key phrase is explicit override. A moving alias is useful for testing behavior from the current instant-chat lane, but it is a bad default for production agents because its underlying model snapshot can change. Keeping it behind an explicit model id lets builders test the alias while preserving stable defaults for scheduled jobs, production workflows, and reproducible evaluations.
Cron becomes easier to automate. `cron list --json` and `cron show --json` now include computed `status`, so external tooling can read disabled, running, ok, error, skipped, or idle state without reimplementing OpenClaw’s status derivation. This is a good API contract change. If a CLI gives machines JSON, the JSON should include the state that humans see. Otherwise every dashboard, monitor, and wrapper script has to reverse-engineer status from timestamps, last error fields, enable flags, and run markers. That creates inconsistent automation and false alarms.
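A minimal status derivation makes the point concrete. The field names and precedence below are assumptions for illustration, not OpenClaw's actual schema; the takeaway is that this logic should live once, server-side, instead of being re-implemented by every dashboard.

```typescript
// Hypothetical derivation of a computed cron status from raw job fields.

interface CronJobState {
  enabled: boolean;
  running: boolean;
  lastError?: string;
  lastRunAt?: number;    // epoch ms of the last completed run
  lastSkipped?: boolean;
}

type CronStatus = "disabled" | "running" | "error" | "skipped" | "ok" | "idle";

function computeStatus(job: CronJobState): CronStatus {
  if (!job.enabled) return "disabled";
  if (job.running) return "running";
  if (job.lastError) return "error";
  if (job.lastSkipped) return "skipped";
  if (job.lastRunAt !== undefined) return "ok";
  return "idle"; // enabled but never run
}
```

Exposing the computed `status` field in `--json` output means wrapper scripts consume one value instead of guessing at this precedence order.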
Channel inspection also gets a cleaner contract. `openclaw channels list` is now channel-only, adds `--all` for bundled and catalog channels, renders installed, configured, and enabled state, and moves model auth and usage details to `openclaw models auth list`, `openclaw status`, and `openclaw models list`. The split is important because channel readiness and model authentication are different operational domains. A Slack or Discord channel can be installed but disabled, configured but lacking permissions, or enabled but not currently healthy. Model auth belongs to provider credentials and runtime selection. Mixing those in one command makes it harder for automation to answer simple questions.
Security boundaries tighten in several places. Native command handlers now honor owner enforcement. Active Memory requires admin scope for global memory toggles. Auto-reply gates inline skill tool dispatch through before-tool-call authorization hooks. Those are all authorization fixes around surfaces that can change system state. Native commands may run privileged control paths. Global memory toggles can affect what future sessions remember. Inline skills can trigger tools from an auto-reply path, where the user may not be watching every internal step. The release pushes these actions back through explicit authorization checks.
The inline skill dispatch fix is worth calling out. Skill systems are convenient because they let an agent route a task to packaged behavior. But if auto-reply can dispatch an inline skill tool without passing through before-tool-call authorization, then the convenient path becomes the weaker path. The right design is that every tool entry point, whether invoked by a visible chat turn, an inline skill, a plugin, or a background auto-reply, flows through the same policy hook before execution. That gives administrators one place to implement allow, deny, prompt, log, or rate-limit behavior.
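The single-policy-hook design can be sketched as follows. The entry-point names, decision values, and hook signature are illustrative assumptions, not OpenClaw's API; the point is that every path funnels through one check.

```typescript
// Sketch: every tool entry point, including auto-reply inline skills,
// passes through the same before-tool-call hook before execution.

type Entry = "chat" | "inline-skill" | "plugin" | "auto-reply";
type Decision = "allow" | "deny" | "prompt";

interface ToolCall { tool: string; entry: Entry; }

type BeforeToolHook = (call: ToolCall) => Decision;

function dispatch(call: ToolCall, hook: BeforeToolHook): string {
  // Policy lives in exactly one place: the hook decides before any
  // entry point, visible or background, can execute a tool.
  const decision = hook(call);
  if (decision !== "allow") return `blocked:${decision}`;
  return `executed:${call.tool}`;
}

// Example policy: deny shell access from background auto-reply paths.
const policy: BeforeToolHook = (c) =>
  c.tool === "shell" && c.entry === "auto-reply" ? "deny" : "allow";
```

With this shape, closing the inline-skill gap is a routing fix, not a new policy system: the fix is making the auto-reply path call `dispatch` like everything else.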
Credentials get a similar runtime-boundary fix. Dedicated `tavily_search` and `tavily_extract` tools now resolve credentials from the active runtime config snapshot, so `exec` SecretRef-backed API keys do not reach the tools unresolved. SecretRef systems only work if every tool reads secrets through the same resolved runtime view. If one tool receives the literal unresolved marker, the failure looks like bad credentials even though the secret exists. Worse, developers may work around it by hardcoding a key. Resolving from the active snapshot keeps per-run config, SecretRefs, and tool credentials aligned.
Managed plugin install paths get more robust as well. OpenClaw now uses the same absolute POSIX npm lifecycle shell for managed plugin install, rollback, repair, and uninstall npm operations as staged package updates. The failure mode here is subtle: a restricted PATH shell can break cleanup or rollback even if the original install path works. Using the same absolute lifecycle shell across install and repair actions reduces environment drift. Operators should be able to trust that rollback uses the same toolchain assumptions as install.
Session and context correctness gets attention. Gateway sessions clear cached skills snapshots during `/new` and `sessions.reset`, so long-lived channel sessions rebuild visible skill lists after skills change. Agents context engines invalidate cached assembled context views when source history shrinks or assembly fails, preventing stale pre-reset history from being reused. These are cache invalidation bugs, but the user-facing impact is severe. If a reset does not clear the skills snapshot, the agent may advertise stale capabilities. If a context view survives a history shrink, the model can see old conversation state after a reset.
Compaction gets a model-limit guard. Summary reserve tokens are clamped to each model’s output limit so high-context compaction no longer requests invalid `max_tokens` values. This is a practical multi-model issue. A context engine can plan a large summary reserve, but the selected model may not be able to emit that many tokens. If the request goes out anyway, compaction fails at the provider boundary. Clamping reserve tokens to model output limits turns an invalid request into a bounded summary plan.
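The clamp itself is tiny, which is part of the point: a one-line bound turns a provider-rejected request into a valid one. This is a generic sketch of the described behavior, not OpenClaw's implementation.

```typescript
// Bound the compaction planner's desired summary reserve by the selected
// model's output-token limit before it becomes a max_tokens request.

function clampSummaryReserve(desiredReserve: number, modelOutputLimit: number): number {
  // Never request more output tokens than the model can emit, and never
  // request a non-positive budget.
  return Math.max(1, Math.min(desiredReserve, modelOutputLimit));
}
```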
Delivery semantics become more honest. Agent delivery now reports `deliverySucceeded=false` when outbound delivery returns no adapter result, so claimed or empty delivery paths no longer masquerade as successful sends. Cron isolated runs also fail implicit announce delivery before model execution when `delivery.channel=last` has no previous route, so recurring jobs do not spend tokens before hitting a permanent delivery-target error. Both changes follow the same principle: do not run expensive work if the output route is impossible, and do not report success when no adapter confirmed delivery.
Discord target parsing gets a concrete fix for cross-channel agent messaging. Provider-prefixed targets like `discord:channel:<id>` now parse as channel sends instead of legacy Discord direct-message targets. This prevents cross-channel agent `message(action="send")` calls from misrouting channel ids into misleading `Unknown Channel` failures. Namespaces matter here. A provider-prefixed channel target is not a user id, and a generic messaging layer should preserve that intent all the way down to the adapter.
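The namespace distinction can be sketched as a parser. The target shape `discord:channel:<id>` comes from the release note; the result type and the bare-id fallback are illustrative assumptions.

```typescript
// Sketch: a provider-prefixed channel target resolves to a channel send,
// not a legacy direct-message target.

interface SendTarget { kind: "channel" | "dm"; id: string; }

function parseDiscordTarget(target: string): SendTarget {
  const channelPrefix = "discord:channel:";
  if (target.startsWith(channelPrefix)) {
    // Preserve the channel namespace all the way down to the adapter.
    return { kind: "channel", id: target.slice(channelPrefix.length) };
  }
  // Bare ids keep the legacy interpretation: a direct-message user target.
  return { kind: "dm", id: target };
}
```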
Telegram hardening spans authorization and liveness. Telegram now honors `accessGroup:*` sender allowlists for DMs, groups, native commands, and callback authorization before applying numeric sender-ID checks. The polling watchdog remains tied to `getUpdates` liveness so unrelated outbound Bot API calls cannot mask a wedged inbound poller. That second fix is a classic health-check problem. A bot can still send messages while its inbound polling loop is stuck. If outbound calls reset the watchdog, the system looks healthy even though it cannot receive user input.
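The watchdog distinction is easy to model. This is an illustrative state machine, not Telegram adapter code: only inbound poll activity feeds the liveness clock, so outbound sends cannot mask a stalled poller.

```typescript
// Sketch: a poller watchdog where only getUpdates-style inbound activity
// resets the liveness clock.

class PollerWatchdog {
  private lastPollMs: number;

  constructor(private timeoutMs: number, nowMs: number) {
    this.lastPollMs = nowMs;
  }

  onInboundPoll(nowMs: number): void {
    this.lastPollMs = nowMs; // inbound polling is the only health signal
  }

  onOutboundCall(_nowMs: number): void {
    // Deliberately a no-op: successful sends must not count as liveness.
  }

  isWedged(nowMs: number): boolean {
    return nowMs - this.lastPollMs > this.timeoutMs;
  }
}
```

The failure the release fixes is the version of this class where `onOutboundCall` also reset `lastPollMs`.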
Voice operations improve on Discord. `channels capabilities` and `channels status --probe` now audit Discord voice-channel permissions, including auto-join targets, so missing Connect, Speak, and Read Message History permissions show up before `/vc join`. Voice capture is also less choppy: the default post-speech silence grace extends to 2.5 seconds, `voice.captureSilenceGraceMs` is available for noisy sessions, and the spoken-output prompt is tightened around live STT fragments. Voice agents live or die on turn segmentation. Too little silence grace clips speakers. Too much grace makes the agent sluggish. Making the grace configurable gives operators a tuning knob for room noise and speaking style.
WhatsApp fixes are operationally specific but important. Proactive phone-number sends now route through Baileys LID forward mappings when available, so LID-addressed contacts receive agent messages instead of creating sender-only ghost chats. Captioned `MEDIA:` directive auto-replies now send once instead of emitting an empty media message before the captioned media reply. Both are examples of adapter correctness. Messaging providers often have multiple identity forms and media primitives. The agent platform needs to map them correctly or users see duplicate, missing, or ghost messages.
Codex approval behavior changes in a way that should reduce approval fatigue without bypassing review. In Codex approval modes, OpenClaw stops installing the pre-guardian native `PermissionRequest` hook by default, allowing Codex’s reviewer to approve safe commands before OpenClaw surfaces an approval. It remembers `allow-always` decisions for identical Codex native `PermissionRequest` payloads within the active session window, and plugin approval requests validate and render their actual allowed decisions so Telegram and other native approval UIs cannot offer stale actions. The detail to explain is sequencing: let the inner reviewer handle safe commands, surface approvals when needed, and make sure the UI actions match the current allowed decision set.
Provider and media compatibility also get repairs. OpenClaw normalizes APNG sniffed PNG uploads, preserves Gemini 3 tool-call thought-signature replay with fallback signatures, accepts legacy `__env__:VAR` custom-provider keys, and repairs snake_case tool-call transcript sanitization. These are the edges that make transcripts and tools portable. If tool-call signatures do not replay, provider state can reject follow-up turns. If transcript sanitization mishandles snake_case tool calls, replay and audit surfaces diverge from what actually happened.
The practical release rating is high for operators. v2026.5.7 is not one flashy feature; it is a set of reliability contracts. Publishing verifies registry state. Cron exposes computed status. Channels and model auth are separated. Inline skills pass through authorization. SecretRefs resolve at runtime. Context caches invalidate after reset. Delivery fails before wasting tokens. Voice permissions and capture timing are visible. Codex approvals line up with actual decisions. These are the changes that make an agent system less mysterious when it runs all day.
[28:00] STORY 2 — OpenAI Agents JS v0.9 through v0.11 Turns Sandboxes into Workspace Manifests, Runtime Limits, Tool Scheduling, and Artifact Boundaries
OpenAI’s Agents SDK for JavaScript has been moving fast this week, and the most important change is the sandbox agent surface. Sandbox agents build on the existing Agent, Runner, and run flow, but add persistent workspaces, workspace manifests, sandbox sessions, capabilities, snapshots, memory, and resume support. In practical terms, this is the difference between an agent that can call a function and an agent that can work in a controlled computer environment over time.
The central abstraction is the `Manifest`. A manifest describes what the agent’s workspace contains: synthetic files and directories, local files and directories, Git repositories, environment variables, users, groups, permissions, mounts, and output locations. That turns sandbox setup into a declarative workspace contract. Instead of a task saying “somehow give this agent a repo and a place to write artifacts,” the manifest says exactly what is mounted, where it appears, which files are writable, and which external paths are allowed.
That matters because file and command agents need boundaries. If an agent can inspect files, run commands, apply patches, and generate artifacts, the sandbox has to define what is in scope. A persistent workspace is powerful for long-horizon tasks because the agent can continue from prior state, but persistence also means state can accumulate. Snapshots and resume behavior let builders decide when to preserve, restore, or discard that state. The operational question becomes: which files are task inputs, which are generated outputs, and which are persistent memory across runs?
v0.11.0 adds an artifact-boundary change that is easy to miss but important. Sandbox local source materialization now keeps `LocalFile.src` and `LocalDir.src` within the materialization `baseDir` unless the source path is covered by `Manifest.extraPathGrants`. The base directory is the SDK process current working directory when the manifest is applied. Relative sources resolve from that directory. Absolute sources must already be inside it or under an explicit grant. This closes a local artifact boundary issue.
The migration implication is direct. If an application intentionally copies trusted host files from outside the base directory into a sandbox workspace, it now has to declare that with an extra path grant. That is a better contract. It makes the difference between “the agent can read arbitrary host paths because a source path pointed there” and “the application explicitly granted this external directory, possibly read-only, with a description.” For agent systems that handle customer data, proprietary repositories, or credentials-adjacent files, explicit grants are the right default.
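The boundary rule reduces to a path containment check. This is a simplified string-level sketch of the described contract, not the SDK's implementation; a real version would also normalize and resolve symlinked paths before comparing.

```typescript
// Sketch: a local source path is allowed only if it sits inside the base
// directory or under an explicitly granted external root.

function isPathAllowed(src: string, baseDir: string, extraGrants: string[]): boolean {
  // Containment check that avoids the "/work/app" vs "/work/application"
  // prefix trap by requiring a path-separator boundary.
  const within = (p: string, root: string) =>
    p === root || p.startsWith(root.endsWith("/") ? root : root + "/");
  if (within(src, baseDir)) return true;
  return extraGrants.some((grant) => within(src, grant));
}
```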
The SDK default model changed in v0.10.0 from `gpt-4.1` to `gpt-5.4-mini` when no model is set. That affects agents and runs that relied on implicit defaults. Because the new default is a GPT-5 model, implicit default model settings now include GPT-5 defaults such as `reasoning.effort="none"` and `verbosity="low"`. The builder recommendation is simple: production agents should set the model explicitly. Defaults are useful for examples and prototypes, but agent behavior, cost, latency, reasoning effort, and verbosity should not shift just because a dependency was upgraded.
Turn limits become explicit. `maxTurns=null` can disable the Agents SDK run turn limit, while leaving the default at `DEFAULT_MAX_TURNS`, currently ten, when the setting is omitted. This is a runtime safety valve. A hard turn cap prevents runaway loops and surprise cost. Disabling the cap can be necessary for long-horizon workflows, but it should be paired with other controls: wall-clock timeouts, tool allowlists, budget limits, checkpointing, and progress visibility. The useful distinction is omitted means safe default; null means the developer intentionally removed the limit.
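The omitted-versus-null distinction can be sketched directly. The loop shape and the local `DEFAULT_MAX_TURNS` constant are illustrative, echoing the semantics described above rather than reproducing the SDK's runner.

```typescript
// Sketch: "omitted" keeps the safe default, "null" deliberately removes
// the cap, and an explicit number is an explicit finite cap.

const DEFAULT_MAX_TURNS = 10;

function effectiveTurnLimit(maxTurns?: number | null): number {
  if (maxTurns === null) return Infinity;                // cap removed on purpose
  if (maxTurns === undefined) return DEFAULT_MAX_TURNS;  // safe default
  return maxTurns;                                       // explicit cap
}

function runTurns(limit: number, isDone: (turn: number) => boolean): number {
  let turn = 0;
  while (turn < limit && !isDone(turn)) turn++;
  return turn; // turns actually consumed
}
```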
Tool scheduling also becomes more controllable. `toolExecution.maxFunctionToolConcurrency` on `RunConfig` controls SDK-side local function-tool execution concurrency. This is separate from provider-side `ModelSettings.parallelToolCalls`. That separation matters. The model may be allowed to propose multiple tool calls in parallel, but the local runtime still needs to decide how many to execute at once. A filesystem tool, database tool, shell tool, and HTTP tool have different side effects and resource profiles. Local concurrency is a runtime scheduling policy, not just a model-generation preference.
MCP tool naming gets a collision-avoidance option. The SDK can include the MCP server name in tool names, aligning with the Python SDK. This prevents conflicts when multiple MCP servers expose tools with the same local name. Without server prefixes, two different servers might both expose `search`, `read`, or `query`. The agent then sees ambiguous tool names, and logs become harder to audit. Server-prefixed naming makes the tool namespace more explicit, which is especially important when agents connect to multiple internal and external tool servers.
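Server-prefixed naming is a small namespacing step. The `server.tool` qualification scheme below is an illustrative assumption; the mechanism, not the separator, is the point.

```typescript
// Sketch: qualify each MCP tool with its server name so two servers that
// both expose `search` stay distinguishable in the agent's tool namespace.

interface McpServer { name: string; tools: string[]; }

function qualifiedToolNames(servers: McpServer[]): Map<string, string> {
  // Maps qualified tool name -> owning server; rejects residual collisions.
  const names = new Map<string, string>();
  for (const server of servers) {
    for (const tool of server.tools) {
      const qualified = `${server.name}.${tool}`;
      if (names.has(qualified)) {
        throw new Error(`duplicate tool: ${qualified}`);
      }
      names.set(qualified, server.name);
    }
  }
  return names;
}
```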
v0.11.0 also changes RealtimeAgent’s default model to `gpt-realtime-2`. Realtime defaults are not cosmetic. A realtime model affects audio latency, turn-taking, transcription behavior, barge-in expectations, and voice interaction quality. If a product uses RealtimeAgent without pinning a model, upgrading the SDK can change the underlying realtime behavior. Again, the recommendation is to pin production defaults and test new realtime models under representative audio conditions before moving them into customer-facing sessions.
The release series also includes fixes that show where agent runtimes fail. v0.9.1 preserves duplicate-name agent identity in run-state serialization, reconciles streamed function calls when server-managed runs abort, and avoids replaying assistant conversation item IDs. v0.10.1 restores session history when responses compaction replacement fails and validates hosted MCP approval policies. These are state-machine fixes. Agent systems have local state, server state, streamed deltas, compaction replacement, and approval policy. If those drift, the next turn can replay the wrong item, lose history, or approve the wrong action.
The practical takeaway for JavaScript builders is to treat the Agents SDK as a runtime, not just a wrapper around model calls. Define workspace manifests narrowly. Use `extraPathGrants` for host paths on purpose. Pin production models. Decide whether turn limits should stay at the default, be raised, or be disabled with other guardrails. Set local tool concurrency based on side effects. Prefix MCP tool names when multiple servers are attached. Test realtime model changes as product behavior changes, not just dependency updates.
[40:00] STORY 3 — Gemini CLI v0.42 Nightlies Harden Agent Runtime Edges: Shell Safety Evals, Approval Races, Compression Queues, OAuth Hangs, and Auto Memory Patches
The Gemini CLI May 6 and May 7 nightlies are useful because they show the unglamorous runtime work required for command-line agents. The changes are not a single model launch; they are a list of edge cases around approval, shell safety, context compression, streaming failures, OAuth, sandbox naming, Auto Memory, and non-interactive output. Those are exactly the places where CLI agents become brittle.
Start with approvals. The A2A server fixes resolve a tool approval race condition, improve status reporting, and resolve a race in tool completion waiting. In an agent runtime, approval is a state machine. A tool call is proposed, approval is requested, a human or policy decision arrives, the tool executes or is denied, and the runtime reports completion. If the system reports status before the approval result is committed, or waits on the wrong completion signal, the UI can show a tool as done while it is still pending, or execute after a denial path should have stopped it.
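The state machine framing can be made explicit. The states and transition table below are an illustrative model of the approval lifecycle described above, not Gemini CLI's internals; the property that matters is that racy transitions are rejected rather than silently accepted.

```typescript
// Sketch: a tool-approval state machine where execution is only legal
// after an approval has been committed.

type ApprovalState = "proposed" | "pending" | "approved" | "denied" | "done";

const transitions: Record<ApprovalState, ApprovalState[]> = {
  proposed: ["pending"],
  pending: ["approved", "denied"],
  approved: ["done"],
  denied: [],   // terminal: a denied call never executes
  done: [],     // terminal
};

function advance(state: ApprovalState, next: ApprovalState): ApprovalState {
  if (!transitions[state].includes(next)) {
    // e.g. reporting "done" while still pending is the race being closed.
    throw new Error(`illegal transition ${state} -> ${next}`);
  }
  return next;
}
```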
Shell-command safety evals are a strong addition. Shell tools are powerful because they let agents inspect files, run tests, install dependencies, and operate developer workflows. They are dangerous for the same reason. A shell safety eval suite gives maintainers a way to test whether the model and runtime correctly identify risky commands, redirections, destructive file operations, credential exposure, and sandbox escape attempts. The important part is that safety is evaluated as behavior, not merely documented as policy.
The nightlies also add JSON output for `AgentExecutionStopped` in non-interactive mode. That matters for CI and wrapper scripts. A human TUI can show a stopped state in prose; automation needs a structured object. If non-interactive runs stop because of a policy, limit, error, or approval state, JSON output lets calling systems distinguish a clean stop from a crash or timeout. CLI agents increasingly run inside scheduled jobs and orchestration systems, so structured terminal semantics are part of the product surface.
Context compression gets a queueing fix: messages can be queued during compression. Compression is necessary because long agent sessions exceed context limits, but it is also a concurrency hazard. If a user sends a message while the runtime is compressing history, the system has to decide whether to block input, drop it, append it to the old context, or queue it for the compressed context. Queueing preserves the user’s intent while letting the context manager finish a coherent rewrite. Without that, chat corruption or lost turns become likely.
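The queueing choice can be sketched as a small context manager. The class shape and method names are illustrative assumptions; the invariant is that mid-compression input is buffered and appended to the compressed history, never to the stale one.

```typescript
// Sketch: messages arriving during compression are queued, then flushed
// onto the compressed history once the rewrite is coherent.

class ContextManager {
  private history: string[] = [];
  private compressing = false;
  private queue: string[] = [];

  accept(message: string): void {
    if (this.compressing) this.queue.push(message);
    else this.history.push(message);
  }

  beginCompression(): void {
    this.compressing = true;
  }

  endCompression(summary: string): string[] {
    // Replace history with the summary, then flush queued input onto it.
    this.history = [summary, ...this.queue];
    this.queue = [];
    this.compressing = false;
    return this.history;
  }
}
```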
The CLI also retries `ERR_STREAM_PREMATURE_CLOSE` errors. Streaming agent runs depend on long-lived connections across model output, tool progress, and terminal rendering. A premature close can be a transient network or transport failure rather than a logical failure. Retrying carefully can make the system more robust, but the runtime must avoid duplicating side effects. The safe pattern is to retry idempotent or pre-execution stream paths and be more conservative once a tool has mutated state.
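That retry discipline can be sketched with the side-effect guard made explicit. The error name follows Node's `ERR_STREAM_PREMATURE_CLOSE` convention; the attempt model and function shape are illustrative assumptions, not Gemini CLI code.

```typescript
// Sketch: retry a premature stream close only while no side effect has
// been committed; otherwise surface the failure rather than risk
// duplicate mutations.

interface Attempt<T> { value?: T; error?: string; mutated: boolean; }

function runWithRetry<T>(attempts: Attempt<T>[], maxRetries: number): T {
  for (let i = 0; i <= maxRetries && i < attempts.length; i++) {
    const attempt = attempts[i];
    if (attempt.error === undefined) return attempt.value as T;
    const transient = attempt.error === "ERR_STREAM_PREMATURE_CLOSE";
    if (!transient || attempt.mutated) {
      // Non-transient failures, or failures after state changed,
      // surface immediately instead of being retried.
      throw new Error(attempt.error);
    }
  }
  throw new Error("retries exhausted");
}
```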
Several fixes are about headless and nonstandard environments. The CLI prevents a silent hang during OAuth auth on headless Linux. It rejects numeric project IDs in `GOOGLE_CLOUD_PROJECT`. It improves MCP list UX in untrusted folders. These are operator-quality fixes. Many agents run on remote machines, containers, CI workers, or servers without a browser. Authentication and project resolution failures need to fail with actionable status, not hang waiting for a UI that does not exist.
Sandbox behavior gets smaller but important changes. The CLI randomizes sandbox container names, and allows redirection in YOLO and AUTO_EDIT modes without sandboxing. Randomized container names reduce collisions when multiple runs execute on the same host. Redirection behavior matters because shell syntax is part of normal developer work: redirecting output to files, piping logs, and writing patches. But any change around redirection also needs safety evaluation because redirection can overwrite files or exfiltrate content. That is why shell safety evals and shell behavior changes belong in the same discussion.
Auto Memory gets a private patch allowlist tightening and clearer documentation that Auto Memory proposes memory updates and skills. Persistent memory is useful only if updates are scoped and reviewable. A memory patch allowlist controls what kinds of private memory updates can be applied automatically. Without that boundary, an agent might persist sensitive, incorrect, or overbroad facts. With too strict a boundary, memory becomes useless. The right operator model is proposal, scope, allowlist, and review.
The May 7 release also moves tool explanation from the thought stream to tool call content. That may sound like UI cleanup, but it affects privacy and observability. Thought streams and tool-call content have different audiences and retention rules in many systems. If a tool explanation belongs with the tool call, putting it there makes audit logs clearer and avoids mixing user-visible tool rationale with internal reasoning channels. Agent runtimes need these content boundaries to stay consistent.
The practical recommendation for Gemini CLI operators is to watch these nightlies as runtime-hardening signals. Approval state machines need race tests. Shell tools need safety evals. Compression needs input queueing. Streaming needs retry rules that avoid duplicate side effects. OAuth needs headless failure modes. Memory patches need allowlists. Non-interactive runs need JSON states. These are not optional polish for a CLI agent; they are the difference between a demo terminal and a reliable automation surface.
[50:00] Closing
The practical takeaway from EP047 is operational: OpenClaw v2026.5.7 makes system state more visible and safer across publishing, cron, channels, memory, delivery, voice, messaging, and approvals. OpenAI Agents JS shows how sandboxed agents are becoming workspace-manifest runtimes with explicit path grants, turn limits, tool concurrency, and MCP namespaces. Gemini CLI shows the runtime edges that matter when command-line agents run in real workflows: approval races, shell safety, compression queues, transport retries, OAuth liveness, and memory patch scope. For builders, the question is not just what the agent can do; it is which boundaries, statuses, and recovery paths exist when it does the work.
Verified Links
- OpenClaw — Release v2026.5.7: https://github.com/openclaw/openclaw/releases/tag/v2026.5.7
- OpenAI Agents JS — Release v0.11.0: https://github.com/openai/openai-agents-js/releases/tag/v0.11.0
- OpenAI Agents JS — Release v0.10.0: https://github.com/openai/openai-agents-js/releases/tag/v0.10.0
- OpenAI Agents JS — Release v0.9.0: https://github.com/openai/openai-agents-js/releases/tag/v0.9.0
- OpenAI API Changelog — May 2026 Agents SDK and Developers plugin entries: https://developers.openai.com/api/docs/changelog
- Google Gemini CLI — Release v0.42.0-nightly.20260507.ga809bc7c5: https://github.com/google-gemini/gemini-cli/releases/tag/v0.42.0-nightly.20260507.ga809bc7c5
- Google Gemini CLI — Release v0.42.0-nightly.20260506.g80d269054: https://github.com/google-gemini/gemini-cli/releases/tag/v0.42.0-nightly.20260506.g80d269054
- Hugging Face Transformers — Release v5.8.0: https://github.com/huggingface/transformers/releases/tag/v5.8.0
- OpenAI Developers Plugin for Codex: https://developers.openai.com/learn/developers-codex-plugin
- MCP Python SDK — Releases: https://github.com/modelcontextprotocol/python-sdk/releases
Chapters
- [00:00] Hook — OpenClaw v2026.5.7 Leads
- [02:30] OpenClaw v2026.5.7 Makes Publishing, Channels, Cron, Auth, Memory, Delivery, Voice, WhatsApp, and Codex Approvals More Observable and Safer
- [28:00] OpenAI Agents JS v0.9 through v0.11 Turns Sandboxes into Workspace Manifests, Runtime Limits, Tool Scheduling, and Artifact Boundaries
- [40:00] Gemini CLI v0.42 Nightlies Harden Agent Runtime Edges: Shell Safety Evals, Approval Races, Compression Queues, OAuth Hangs, and Auto Memory Patches
- [50:00] Closing