Episode 78: क्लॉड फेबल 5 की वापसी: अमेरिकी प्रतिबंध हटाए

एपिसोड 078 — 01 जुलाई, 2026

[00:00] एपिसोड हुक

Claude Fable 5 फिर से सामान्य रूप से उपलब्ध है — वाशिंगटन ने 30 जून को Anthropic के Mythos और Fable मॉडल पर प्रतिबंध हटा दिए, और OpenRouter पर anthropic/claude-fable-5 लिस्टिंग 1,000,000-टोकन कॉन्टेक्स्ट विंडो के साथ लाइव है। रिलीज के दृष्टिकोण से, इस चक्र में OpenClaw v2026.6.11 और OpenAI Codex rust-v0.142.5 दोनों आए: OpenClaw चैनल डिलीवरी और सेशन रिकवरी में विश्वसनीयता पास के साथ, Codex में ट्रेस-लॉग डेटा-हाइजीन फिक्स के साथ। Anthropic का Claude Sonnet 5 भी OpenRouter पर 1M-टोकन कॉन्टेक्स्ट विंडो और चार-स्तरीय रीज़निंग-एफर्ट डायल के साथ सामने आया, और Google ने Nano Banana 2 Lite को — जिसका ब्रांड नाम Gemini 3.1 Flash Lite Image है — अपना सबसे तेज़, सबसे लागत-प्रभावी Gemini इमेज मॉडल के रूप में लिस्ट किया। शोध के दृष्टिकोण से, Orca पेपर HuggingFace Daily Papers पर ट्रेंड कर रहा है जिसमें 161 अपवोट मिले हैं — यह मल्टीमॉडल नेक्स्ट-स्टेट प्रेडिक्शन के माध्यम से बनाई गई यूनिफाइड वर्ल्ड लैटेंट स्पेस प्रस्तावित करता है — और InternScience का Agents-A1 35-अरब-पैरामीटर मिक्सचर-ऑफ-एक्सपर्ट्स स्टूडेंट से ट्रिलियन-पैरामीटर-क्लास एजेंट परफॉर्मेंस का दावा करता है।

[02:00] एजेंट स्टैक रिलीज रीडआउट: OpenClaw v2026.6.11; OpenAI Codex rust-v0.142.5

इस चक्र में दो स्थिर रिलीज आए। OpenClaw v2026.6.11 एक विश्वसनीयता रिलीज है: टीम इसे उन खुरदरी धारियों के बारे में फीडबैक का सीधा जवाब बताती है जो हार्नेस को कम विश्वसनीय बनाती हैं, जिसमें गलत जगह पर रिप्लाई, अटके हुए सेंड, गिरे हुए रीकनेक्ट, मॉडल सेटअप फेलियर, और सुरक्षित एडमिन डिफॉल्ट के लिए फिक्स शामिल हैं। सबसे बड़ा काम चैनल डिलीवरी विश्वसनीयता का है, Telegram, WhatsApp, Matrix, Google Chat, iMessage, Feishu, Mattermost, WebChat, Control UI, और टर्मिनल UI में डिलीवरी और रीकनेक्ट फिक्स के साथ। यहां ठोस तंत्र मायने रखते हैं: नए Google Chat डायरेक्ट मैसेज अब ग्रुप कन्वर्सेशन की तरह व्यवहार नहीं करते और सही वन-टू-वन चैट तक पहुंचते हैं; Telegram webhook यूजर्स चैनल रीस्टार्ट, कॉन्फ़िगरेशन रीलोड, और रिकवरी साइकिल के दौरान बिना अस्थायी ब्लैकआउट के DMs और ग्रुप मैसेज प्राप्त करना जारी रखते हैं; Matrix एंड-टू-एंड-एन्क्रिप्टेड गेटवे लंबे चलने वाले उपयोग के दौरान ऑनलाइन रहते हैं इसके बजाय कि धीरे-धीरे मेमोरी खाते रहें जब तक कि क्रैश चैनल और इन-फ्लाइट काम को नीचे न ले आए; और रीज़निंग-कैपेबल मॉडल पर हार्टबीट चेक अब सहायक के इरादे से जवाब को सतह पर लाते हैं Telegram और WhatsApp में आंतरिक रीज़निंग लीक करने के बजाय। एजेंट-रनटाइम की दृष्टि से, रिलीज स्पष्ट कॉन्फ़िगरेशन का सम्मान करते हुए डिफॉल्ट कंपैक्शन टाइमआउट को 180 सेकंड तक कम करती है, Codex कॉन्टेक्स्ट-इंजीन कंपैक्शन स्वामित्व को संरक्षित करती है, और प्रोवाइडर-फेल्योर टर्मिनल लाइफसाइकिल स्टेट को सही रखती है। OpenAI Codex rust-v0.142.5 एक फोकस्ड पैच है जिसका वास्तविक ऑपरेशनल वेट है: यह पूर्ण Responses WebSocket रिक्वेस्ट पेलोड को ट्रेस लॉग में लिखने से रोकता है, एक डेटा-हाइजीन फिक्स जो उन सभी के लिए मायने रखता है जो Codex ट्रेस को साझा ऑब्ज़र्वेबिलिटी इन्फ्रास्ट्रक्चर में भेज रहे हैं, जिसे जानबूझकर release/0.142 लाइन में बैकपोर्ट किया गया। बिल्डर्स के लिए, व्यावहारिक सवाल यह है कि क्या कोई भी रिलीज किसी डिफॉल्ट को बदलती है जिस पर आप वर्तमान में निर्भर हैं: अपने पिन किए गए वर्शन के खिलाफ चेंजलॉग को डिफ करें, एक प्रतिनिधि एजेंट सेशन को रीप्ले करें, और प्रोडक्शन में नए डिफॉल्ट को प्रमोट करने से पहले रीकनेक्ट व्यवहार देखें।

[03:05] Claude Fable 5 वापस है: वाशिंगटन ने Anthropic के फ्रंटियर टियर पर होल्ड हटाया

आज की सुर्खी: Claude Fable 5 फिर से सामान्य रूप से उपलब्ध है। अमेरिकी सरकार ने 30 जून को Anthropic के Mythos और Fable मॉडल पर प्रतिबंध हटा दिए, जिससे उस एक्सपोर्ट रेजीम को समाप्त किया गया जो हफ्तों तक Anthropic के फ्रंटियर टियर को बंद रखती थी। Fable 5 Mythos-क्लास टियर का सामान्य रूप से उपलब्ध चेहरा है — एक टियर जो Anthropic की लाइनअप में Opus से ऊपर बैठता है। यह Claude Mythos 5 के समान अंतर्निहित मॉडल को साझा करता है; अंतर डिप्लॉयमेंट सतह में है। Fable दोहरे-उपयोग क्षमताओं के लिए अतिरिक्त सुरक्षा उपायों के साथ आता है, जबकि Mythos 5 उन उपायों के बिना केवल स्वीकृत संगठनों को ही परोसा जाता है। OpenRouter लिस्टिंग पर anthropic/claude-fable-5, जो पहली बार 9 जून को पोस्ट की गई थी, ठोस क्षमताओं को दिखाती है: 1,000,000-टोकन कॉन्टेक्स्ट विंडो, टेक्स्ट, इमेज, और फाइल इनपुट टेक्स्ट आउटपुट के साथ, रीज़निंग सपोर्ट, और स्वायत्त ज्ञान कार्य और कोडिंग के लिए पोजिशनिंग। हार्नेस-साइड सपोर्ट पहले से मौजूद है — OpenClaw ने मध्य जून में Claude Fable 5 प्रोवाइडर सपोर्ट वायर कर दिया था, इसलिए एजेंट सेशन को मॉडल पर रूट करना एक मॉडल-स्ट्रिंग बदलाव है, इंटीग्रेशन प्रोजेक्ट नहीं। बिल्डर्स के लिए तत्काल कदम एक बेक-ऑफ है: anthropic/claude-fable-5 के माध्यम से एक प्रतिनिधि कोडिंग या लॉन्ग-होराइज़न एजेंट टास्क चलाएं वर्तमान Opus-क्लास डिफॉल्ट के खिलाफ और मापें कि Mythos-क्लास दावे कहां रखते हैं। आगे देखें: जैसे-जैसे उपलब्धता स्थिर होती है मूल्य निर्धारण और दर सीमाएं, क्या दोहरे-उपयोग सुरक्षा उपाय सुरक्षा-संबंधित वर्कलोड पर देखे जा सकते हैं, और नीति वातावरण कितनी जल्दी बसता है — वही प्रशासन जिसने ये प्रतिबंध हटाए थे, उनका पहले भी रुख बदल चुका है।

[04:10] Claude Sonnet 5 OpenRouter पर 1M कॉन्टेक्स्ट के साथ आता है

Anthropic ने OpenRouter पर Claude Sonnet 5 को एक नए मॉडल लिस्टिंग के रूप में सामने लाया है, जिसे कोडिंग, एजेंट, और पेशेवर कार्य में अब तक का सबसे सक्षम Sonnet-क्लास मॉडल बताया गया है। मॉडल Anthropic द्वारा स्वयं परोसा जाता है और anthropic/claude-sonnet-5 आइडेंटिफायर पर रजिस्टर होता है। बिल्डर्स के लिए दो विवरण अलग दिखते हैं। पहला, कॉन्टेक्स्ट विंडो 1,000,000 टोकन है, जो Sonnet 5 को हाल के फ्रंटियर रिलीज के समान लॉन्ग-कॉन्टेक्स्ट टियर में रखता है और एकल कॉल में पर्याप्त रिपॉजिटरी या मल्टी-सेशन एजेंट ट्रेस रखने के लिए पर्याप्त बड़ा है। दूसरा, एडाप्टिव थिंकिंग एक चयन योग्य पैरामीटर के रूप में उजागर की गई है जिसमें चार रीज़निंग एफर्ट स्तर हैं — कम, मध्यम, उच्च, और अधिकतम — जो कॉलर्स को एक निश्चित मोड पर प्रतिबद्ध होने के बजाय प्रति रिक्वेस्ट कंप्यूट बढ़ाने या घटाने की अनुमति देता है। यह संयोजन एक Sonnet-क्लास एंडपॉइंट को एजेंट लूप के लिए ट्यूनेबल कॉस्ट-और-क्वालिटी सतह के रूप में फिर से परिभाषित करता है। आगे देखें: OpenRouter अपने यूनिफाइड API में एफर्ट पैरामीटर को कैसे सतह पर लाता है, और क्या Anthropic-नेटिव SDK उसी चार-स्टेप डायल को मिरर करता है।

[05:08] Google OpenRouter पर Nano Banana 2 Lite इमेज मॉडल भेजता है

Google ने हाल ही में Nano Banana 2 Lite को OpenRouter पर google/gemini-3.1-flash-lite-image के रूप में लॉन्च किया है, जिससे पब्लिक मॉडल कैटलॉग में एक Flash-Lite इमेज एंडपॉइंट जुड़ गया है। इस लिस्टिंग में इसे Google's fastest, most cost-efficient Gemini image model के तौर पर प्रस्तुत किया गया है, जो high-velocity developer pipelines और rapid-fire visual exploration के लिए तैयार है। Context length 65536 टोकन की है, जो long, structured prompts और negative constraints को बिना mid-call trimming के absorb करने के लिए काफी है। जिस मैकेनिज्म पर ध्यान देना चाहिए वह Flash-Lite tier itself है: text-to-image generation जो low latency और high call volume के लिए tuned है, जहां Google आमतौर पर frontier fidelity की जगह throughput और unit economics को प्राथमिकता देता है। बिल्डर्स के लिए, practical effect यह है कि आपके पास एक Google-native image path है जिसे आप bulk asset pipelines, variant sweeps, और ideation loops में hammer कर सकते हैं बिना Pro-tier per-image rates चुकाए। OpenRouter entry एक router-friendly endpoint signal करती है, तो existing image-agent stacks provider switch कर सकते हैं सिर्फ model-string change से। देखें कि 65536 context window इमेज conditioning के लिए fully usable है या capped है, और pricing sustained production load के तहत कैसी रहती है।

[06:06] Orca Paper Proposes Unified World Latent Space Through Next-State Prediction

HuggingFace Daily Papers पर 161 upvotes के साथ trending एक नया पेपर Orca, multimodal next-state-prediction modeling के माध्यम से एक unified world latent space प्रस्तावित करता है। यह काम, जो orca-wm.github.io पर hosted है और arXiv 2606.30534 के रूप में published है, world modeling को नए सिरे से परिभाषित करता है: प्रति डोमेन अलग model train करने के बजाय, Orca world dynamics को एक shared latent में compress करता है और इसे downstream tasks में transfer करता है, जहां इसके authors का दावा है कि यह specialized baselines को beat करता है। यह generality headline capability है, और यही कारण है कि community इसे पढ़ रही है। Concrete mechanism multimodal next-state prediction है, वही pre-training objective जो recent agent और embodied-AI work को power करता है, अब एक single shared latent में scaled है न कि per-domain heads में। बिल्डर्स के लिए, practical signal यह है कि general world-model pre-training task-specific stacks का credible alternative बन रहा है, तो teams जो agentic या embodied pipelines plan कर रही हैं उनके पास अब एक नया architectural option है जिसे उनकी current SFT-only approaches के مقابل evaluate करना चाहिए। आगे देखें: eval suite और latent paper के benchmarks से beyond transfer करता है या नहीं।

[07:04] Agents-A1: 35B MoE Agent Hits Trillion-Parameter-Class Performance

Agents-A1, InternScience से एक 35-billion-parameter mixture-of-experts agentic model, trillion-parameter cost के बिना trillion-parameter-class performance का दावा करता है। Team का contribution दो scaling levers और एक three-stage distillation pipeline है, raw parameter count नहीं।

Long-horizon trajectory scaling model जिस multi-turn action sequences पर train करता है उसे expand करता है, single-step prompts से आगे extended tool-use traces में push करता है। Heterogeneous agent ability scaling specialist capabilities को coding, tool use, और retrieval domains में mix करता है। Training supervised fine-tuning on long agent traces के रूप में चलती है, फिर per-domain teacher models जो task family द्वारा specialize करते हैं, फिर multi-teacher distillation जो उन्हें एक 35B student में fuse करता है।

Cost-sensitive pipelines चलाने वाले बिल्डर्स के लिए, implication स्पष्ट है: frontier agent performance अब exclusively parameter count पर gated नहीं है, क्योंकि distillation recipes जो specialist teachers को absorb करती हैं वे अपने weight class से above punch कर सकती हैं। Open weights release और independent benchmark replication देखें; अगर long-horizon gains authors के eval harness के बाहर hold करती हैं, तो recipe इस तरह से reshape करती है कि teams serving budgets size करती हैं और open-weight students pick करती हैं।

[08:02] OmniRoute Turns One Endpoint Into 231 Model Providers

OmniRoute, developer diegosouzapw का एक open-source AI gateway, इस हफ्ते GitHub Trending पर आया। Project एक single OpenAI-compatible endpoint expose करता है और इसे 231 model providers की तरफ point करता है, लगभग 50 free tiers के साथ, coding agent को provider-specific client wiring के बिना Claude, GPT, या Gemini तक पहुंचने देता है। Claude Code, Codex, Cursor, Cline, या Copilot के आगे drop करें और gateway routing handle करती है। Notable mechanism एक stacked compression pass है — RTK plus Caveman mode — prompts box छोड़ने से पहले applied, claimed token usage को workload के आधार पर 15% से 95% तक cut करने के लिए। एक smart auto-fallback layer failed या rate-limited requests को अगले available provider पर reroute करती है, MCP और A2A support tool-calling और agent-to-agent flows intact रखते हुए। Self-hosted routing plane के लिए जो provider outages और free-tier churn survive करता है, यह बिल्डर्स के लिए means। Compression path पर latency overhead और fallback priority configuration देखें जब multiple free providers wired in हों।

[09:00] BlockPilot Picks Block Sizes Live for Diffusion Speculative Decoding

BlockPilot, एक पेपर जो HuggingFace Daily Papers पर 64 अपवोट के साथ ट्रेंड कर रहा है, डिफ्यूज़न-आधारित स्पेक्युलेटिव डिकोडिंग के लिए इंस्टेंस-एडाप्टिव पॉलिसी लर्निंग प्रस्तावित करता है। यह काम AMAP-ML ग्रुप से आता है और GitHub पर open-source है, arXiv preprint के साथ। मुख्य विचार है एक निश्चित ब्लॉक साइज़ को बदलना — डिफ्यूज़न ड्राफ्टर प्रति स्टेप कितने टोकन प्रोड्यूस करता है — एक छोटी पॉलिसी के साथ जो प्रॉम्प्ट के prefilling representations को पढ़ती है और ऑन-द-फ्लाई प्रति-रिक्वेस्ट ब्लॉक साइज़ चुनती है। लेखक रिपोर्ट करते हैं कि न्यूनतम पॉलिसी ओवरहेड के साथ स्टैटिक ब्लॉक-साइज़ शेड्यूल की तुलना में महत्वपूर्ण स्पीडअप मिला, और अपवोट काउंट reflect करता है कि इंफरेंस community कितनी सक्रिय रूप से adaptive drafting के साथ engaged है। बिल्डर्स के लिए, इसका मतलब है कि ब्लॉक साइज़ अब एक deploy-time knob नहीं रहा जिसे एक बार tune करते हैं; यह एक learned, prompt-conditioned decision है जो existing speculative-decoding pipelines में बिना target model को retrain किए fit हो सकता है। अगली चीज़ देखने की यह है कि क्या released policy model families में generalize करता है या सिर्फ paper के training distribution के अंदर ही holds करता है।

[09:58] Generative Skill Composition Tackles LLM Agent Skill Bottleneck

Xinyu Zhao, Zhen Tan, और Vaishnav Tadiparthi ने इस month arXiv 2606.32025 posted किया, skill composition को central bottleneck के रूप में framing करते हुए जैसे-जैसे agent skill libraries tasks और domains में scale होती हैं। Skills modular procedural knowledge को bundle करती हैं — sandboxing environments, test suites चलाना, multi-file refactors — और current approaches या तो पूरी library को agent की reasoning context में dump करती हैं या embeddings के through retrieve करती हैं। दोनों ही libraries बड़ी होने पर degrade होते हैं: full-context tokens burn करता है, retrieval compositions miss करता है। Paper generative skill composition प्रस्तावित करता है, जहां model एक fixed pool में से picking करने के बजाय ऑन-द-फ्लाई skill combinations synthesize करता है। Mechanism selection को retrieval से synthesis में reframes करता है, agent reasoning करता है कि task के लिए skills को कैसे combine करना है। बिल्डर्स के लिए, यह matter करता है क्योंकि skill libraries agents में reuse की natural unit हैं, और composition strategy shape करती है कि agent कितना procedural memory बिना context rot के carry करता है। Paper के full benchmark results के लिए देखें जो generative composition की तुलना standard agent suites पर retrieval baselines से करते हैं।

[10:56] TRIAGE Paper Proposes Role-Typed Credit Assignment for Agentic RL

TRIAGE agentic reinforcement learning के लिए एक role-typed credit assignment scheme है जो GRPO के flat outcome advantage के ऊपर एक semantic role axis जोड़ता है, ताकि search, click, edit, navigation, और object-interaction tokens एक learning signal share न करें। Authors Yuanda Xu, Zhengze Zhou, और Hejian Sang, arXiv 2606.32017 में, problem को directly frame करते हैं: GRPO का verifier-only reward everything को conflate करता है जो एक rollout ने produce किया, तो एक failed rollout में useful exploration step को wasted one की तरह punish किया जाता है, जबकि successful rollout में redundant steps को reinforce किया जाता है। TRIAGE एक structured judge insert करता है जो advantage compute होने से पहले प्रत्येक segment को role से classify करता है, और role label update को modulate करता है। Reported gains वहां concentrate होते हैं जहां rollouts dense tool use पर lean करते हैं। RL के साथ agent policies train करने वाले बिल्डर्स के लिए, result अगले optimization lever को stronger verifier से better credit-assignment layer की ओर reframes करता है। Judge model itself के लिए देखें, क्योंकि role classification quality नया bottleneck बन जाता है।

[11:54] Practical queue

आज की stories से: बिल्डर्स के लिए, release readout stack पर depend करने वाली चीज़ों को shift करता है — new default promote करने से पहले changelog को अपने pinned version के against diff करें। Claude Fable 5 का return एक frontier tier restore करता है Opus के above जिसे agent stacks एक router-friendly slug through reach कर सकते हैं, और immediate move एक bake-off है current Opus-class default के against। Sonnet 5 के लिए इसका क्या मतलब है: एक single Sonnet-class endpoint अब binary thinking toggle के बजाय एक tunable reasoning dial expose करता है। Image pipelines के लिए इसका क्या मतलब है: यदि आपका image-agent work Pro tier पर per-image cost या rate limits पर bottlenecked है, तो Flash-Lite endpoint high call volume के लिए purpose-built है। बिल्डर्स के लिए, Orca से practical signal यह है कि general world-model pre-training task-specific stacks का credible alternative बन रहा है। Agents-A1 matter करता है क्योंकि frontier agent performance को trillion-parameter serving budgets की अब आवश्यकता नहीं हो सकती — specialist-teacher distillation recipes frontier capability को deployable sizes में compress कर सकती हैं। OmniRoute एक self-hosted routing layer है जो एक coding agent और upstream model APIs के बीच बैठता है, ताकि एक single OpenAI-compatible base URL provider-by-provider client config की जगह ले। BlockPilot तर्क देता है कि block size एक learned, per-request decision होना चाहिए जो prompt के prefilling representations से driven हो। बढ़ती हुई skill libraries के साथ agent stacks चलाने वाले बिल्डर्स के लिए, generative skill composition brute-force context stuffing और embedding retrieval से generation-based composition की ओर shift signal करता है। RL के साथ agent policies train करने वाले बिल्डर्स के लिए, TRIAGE credit assignment को, verifier quality को नहीं, अगले optimization lever के रूप में reframes करता है।