    NVIDIA Releases Dynamo v0.9.0: A Massive Infrastructure Overhaul Featuring FlashIndexer, Multi-Modal Support, and the Removal of NATS and etcd

    By Naveed Ahmad, February 20, 2026

    NVIDIA has released Dynamo v0.9.0, the most significant infrastructure upgrade to its distributed inference framework to date. The update simplifies how large-scale models are deployed and managed, focusing on removing heavy dependencies and improving how GPUs handle multi-modal data.

    The Great Simplification: Removing NATS and etcd

    The biggest change in v0.9.0 is the removal of NATS and etcd. In previous versions, these two systems handled service discovery and messaging. However, they imposed an ‘operational tax’ by requiring developers to run and manage extra clusters.

    NVIDIA replaced these with a new Event Plane and a Discovery Plane. The system now uses ZMQ (ZeroMQ) for high-performance transport and MessagePack for data serialization. For teams using Kubernetes, Dynamo now supports Kubernetes-native service discovery. This change makes the infrastructure leaner and easier to maintain in production environments.
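    To make the new transport concrete, here is a minimal, stdlib-only sketch of the two ideas named above: events serialized with MessagePack, and delivered by ZMQ-style PUB/SUB, where a subscriber receives every message whose topic starts with the byte prefix it subscribed to. It assumes a hand-written encoder for a small MessagePack subset and an in-process stand-in for sockets. The class name InProcessEventPlane, the "kv_events." topic, and the event fields are hypothetical illustrations, not Dynamo's actual wire format or API.

```python
import struct
from typing import Any


def msgpack_encode(obj: Any) -> bytes:
    """Encode a small MessagePack subset: nil, bool, non-negative int,
    str, list and dict (lengths up to 65535). Illustrative only."""
    if obj is None:
        return b"\xc0"
    if obj is True:                      # check bools before ints (bool subclasses int)
        return b"\xc3"
    if obj is False:
        return b"\xc2"
    if isinstance(obj, int):
        if obj < 0:
            raise ValueError("negative ints are outside this sketch's subset")
        if obj <= 0x7F:
            return bytes([obj])                        # positive fixint
        if obj <= 0xFF:
            return b"\xcc" + struct.pack(">B", obj)    # uint 8
        if obj <= 0xFFFF:
            return b"\xcd" + struct.pack(">H", obj)    # uint 16
        if obj <= 0xFFFFFFFF:
            return b"\xce" + struct.pack(">I", obj)    # uint 32
        return b"\xcf" + struct.pack(">Q", obj)        # uint 64
    if isinstance(obj, str):
        data = obj.encode("utf-8")
        n = len(data)
        if n <= 31:
            head = bytes([0xA0 | n])                   # fixstr
        elif n <= 0xFF:
            head = b"\xd9" + struct.pack(">B", n)      # str 8
        elif n <= 0xFFFF:
            head = b"\xda" + struct.pack(">H", n)      # str 16
        else:
            raise ValueError("string too long for this sketch")
        return head + data
    if isinstance(obj, (list, tuple)):
        n = len(obj)
        if n <= 15:
            head = bytes([0x90 | n])                   # fixarray
        elif n <= 0xFFFF:
            head = b"\xdc" + struct.pack(">H", n)      # array 16
        else:
            raise ValueError("array too long for this sketch")
        return head + b"".join(msgpack_encode(x) for x in obj)
    if isinstance(obj, dict):
        n = len(obj)
        if n <= 15:
            head = bytes([0x80 | n])                   # fixmap
        elif n <= 0xFFFF:
            head = b"\xde" + struct.pack(">H", n)      # map 16
        else:
            raise ValueError("map too large for this sketch")
        return head + b"".join(msgpack_encode(k) + msgpack_encode(v) for k, v in obj.items())
    raise TypeError(f"unsupported type: {type(obj).__name__}")


class InProcessEventPlane:
    """Hypothetical stand-in for a ZMQ PUB/SUB pair: a subscriber registers a
    byte prefix and receives every (topic, payload) whose topic starts with it."""

    def __init__(self) -> None:
        self._subscribers: list = []  # (prefix, inbox) pairs

    def subscribe(self, prefix: bytes) -> list:
        inbox: list = []
        self._subscribers.append((prefix, inbox))
        return inbox

    def publish(self, topic: bytes, event: dict) -> int:
        payload = msgpack_encode(event)
        delivered = 0
        for prefix, inbox in self._subscribers:
            if topic.startswith(prefix):  # ZMQ SUB filtering is a byte-prefix match
                inbox.append((topic, payload))
                delivered += 1
        return delivered


# Usage: a router-like consumer listens only for (hypothetical) KV-cache events.
plane = InProcessEventPlane()
router_inbox = plane.subscribe(b"kv_events.")
plane.publish(b"kv_events.worker-0", {"type": "stored", "blocks": [17, 300]})
plane.publish(b"metrics.worker-0", {"active": 3})  # filtered out by prefix
```

    The compact binary payload and prefix-based filtering are what make this pairing attractive: no separate broker cluster is needed, since publishers and subscribers connect directly. A production system would use a full MessagePack library and real sockets rather than this sketch.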

    Multi-Modal Support and the E/P/D Split

    Dynamo v0.9.0 expands multi-modal support across its three main backends (vLLM, SGLang, and TensorRT-LLM), allowing models to process text, images, and video more efficiently.

    A key feature of this update is the E/P/D (Encode/Prefill/Decode) split. In standard setups, a single GPU often handles all three stages, which can cause bottlenecks during heavy image or video processing. v0.9.0 introduces Encoder Disaggregation: you can now run the encoder on a separate set of GPUs from the prefill and decode workers, and scale each pool according to the specific needs of your model.
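    The sketch below illustrates the two consequences of disaggregation described above: text-only requests skip the encoder pool entirely, and each pool can be sized on its own as ceil(demand / per-GPU capacity). The Request fields, stage names, and all capacity numbers are hypothetical assumptions chosen for illustration; they are not Dynamo configuration values.

```python
from dataclasses import dataclass
from math import ceil


@dataclass
class Request:
    request_id: str
    prompt_tokens: int
    media_items: int = 0  # images or video clips that need an encoder pass


def stages_for(req: Request) -> list:
    """Disaggregated pipeline: only requests with media visit the encode pool."""
    return (["encode"] if req.media_items > 0 else []) + ["prefill", "decode"]


def pool_sizes(demand: dict, per_gpu_capacity: dict) -> dict:
    """Size each pool independently as ceil(demand / per-GPU capacity);
    a stage with no demand gets zero GPUs."""
    sizes = {}
    for stage, load in demand.items():
        sizes[stage] = ceil(load / per_gpu_capacity[stage]) if load > 0 else 0
    return sizes


# Hypothetical workload, in arbitrary "work units per second".
requests = [
    Request("r1", 512),
    Request("r2", 2048, media_items=4),
    Request("r3", 128, media_items=1),
]
routes = {r.request_id: stages_for(r) for r in requests}
sizes = pool_sizes(
    demand={"encode": 900.0, "prefill": 400.0, "decode": 1200.0},
    per_gpu_capacity={"encode": 200.0, "prefill": 150.0, "decode": 500.0},
)
```

    With these made-up numbers the encode pool needs 5 GPUs while prefill and decode need 3 each; in a coupled setup, every GPU would have to be provisioned for the heaviest mix of all three stages.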

    Sneak Preview: FlashIndexer

    This release includes a sneak preview of FlashIndexer, a component designed to reduce latency in distributed KV cache management.

    With large context windows, moving key-value (KV) cache data between GPUs is slow. FlashIndexer improves how the system indexes and retrieves these cached tokens, with the goal of lowering Time to First Token (TTFT). Although it is still a preview, it is intended as a step toward making distributed inference feel as fast as local inference.
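    The article does not describe FlashIndexer's internals, so the following is only a generic sketch of the kind of problem a KV cache index solves: given a new request, find which workers already hold the longest matching prefix of its tokens. It assumes the common approach of hashing fixed-size token blocks, chaining each block's hash with its parent so that one hash identifies the whole prefix. BLOCK_SIZE, PrefixIndex, and the worker names are hypothetical.

```python
import hashlib
from collections import defaultdict

BLOCK_SIZE = 4  # tokens per KV block (hypothetical; small to keep the example readable)


def block_hashes(tokens: list, block_size: int = BLOCK_SIZE) -> list:
    """Hash each full block of token ids, chaining in the parent hash so
    that one hash identifies the entire prefix up to that block."""
    hashes, parent = [], ""
    usable = len(tokens) - len(tokens) % block_size  # ignore a trailing partial block
    for start in range(0, usable, block_size):
        block = ",".join(str(t) for t in tokens[start:start + block_size])
        digest = hashlib.sha256(f"{parent}|{block}".encode()).hexdigest()
        hashes.append(digest)
        parent = digest
    return hashes


class PrefixIndex:
    """Maps block hash -> workers whose KV cache currently holds that block."""

    def __init__(self) -> None:
        self._owners = defaultdict(set)

    def stored(self, worker: str, hashes: list) -> None:
        for h in hashes:
            self._owners[h].add(worker)

    def removed(self, worker: str, hashes: list) -> None:
        for h in hashes:
            if h in self._owners:
                self._owners[h].discard(worker)

    def overlap(self, hashes: list) -> dict:
        """Return, per worker, how many leading blocks of the request it already caches."""
        scores = {}
        candidates = None
        for depth, h in enumerate(hashes, start=1):
            owners = self._owners.get(h, set())
            candidates = set(owners) if candidates is None else candidates & owners
            if not candidates:
                break
            for worker in candidates:
                scores[worker] = depth
        return scores


index = PrefixIndex()
index.stored("worker-a", block_hashes(list(range(8))))            # tokens 0-3 and 4-7
index.stored("worker-b", block_hashes([0, 1, 2, 3]))              # only the first block
index.stored("worker-c", block_hashes([9, 9, 9, 9, 4, 5, 6, 7]))  # different first block
request = block_hashes(list(range(10)))  # 2 full blocks; tokens 8-9 form a partial block
scores = index.overlap(request)
```

    Because hashes are chained, worker-c gets no credit for its tokens 4-7 even though they match: a cached block is only reusable if everything before it matches too. A router can then prefer the worker with the deepest overlap, since that worker has the least prefill work left before producing its first token.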

    Smart Routing and Load Estimation

    Managing traffic across hundreds of GPUs is difficult. Dynamo v0.9.0 introduces a smarter Planner that uses predictive load estimation.

    The Planner uses a Kalman filter to forecast upcoming load from past observations. It also supports routing hints from the Kubernetes Gateway API Inference Extension (GAIE), which lets the network layer communicate directly with the inference engine. If a specific GPU group is overloaded, the system can route new requests to idle workers more precisely.
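    As a rough illustration of Kalman-filter load estimation, here is a one-dimensional filter that smooths noisy load samples per GPU group, followed by a router that picks the group with the lowest forecast. The article does not say which state model Dynamo's Planner uses; this sketch assumes the simplest one (a random walk), and ScalarKalman, pick_worker, the group names, and the noise parameters are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class ScalarKalman:
    """1-D Kalman filter with a random-walk load model (an assumption)."""
    x: float = 0.0  # current load estimate (hypothetical unit, e.g. active requests)
    p: float = 1.0  # variance of the estimate
    q: float = 0.1  # process noise: how much true load may drift per interval
    r: float = 1.0  # measurement noise of each observed sample

    def update(self, z: float) -> float:
        # Predict: under a random walk the expected load is unchanged,
        # but uncertainty grows by the process noise.
        self.p += self.q
        # Correct: the gain k weighs the new sample z against the prediction.
        k = self.p / (self.p + self.r)
        self.x += k * (z - self.x)
        self.p *= 1.0 - k
        return self.x

    def forecast(self) -> float:
        # One-step-ahead prediction; for a random walk this equals the estimate.
        return self.x


def pick_worker(filters: dict) -> str:
    """Route to the worker group with the lowest forecast load."""
    return min(filters, key=lambda w: filters[w].forecast())


filters = {"gpu-group-0": ScalarKalman(), "gpu-group-1": ScalarKalman()}
for busy, idle in [(8.0, 2.0), (9.0, 1.0), (12.0, 3.0)]:
    filters["gpu-group-0"].update(busy)
    filters["gpu-group-1"].update(idle)
target = pick_worker(filters)
```

    The ratio of q to r controls the trade-off: a larger q lets the estimate follow sudden spikes faster, while a larger r treats individual samples as noisier and smooths them more. A planner that wants to anticipate trends rather than just smooth them would add a rate-of-change term to the state.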

    The Technical Stack at a Glance

    The v0.9.0 release updates several core components to their latest stable versions. Here is the breakdown of the supported backends and libraries:

    Component       Version
    vLLM            v0.14.1
    SGLang          v0.5.8
    TensorRT-LLM    v1.3.0rc1
    NIXL            v0.9.0
    Rust core       dynamo-tokens crate

    The dynamo-tokens crate, written in Rust, is designed to keep token handling fast. For data transfer between GPUs, Dynamo continues to use NIXL (NVIDIA Inference Xfer Library) for RDMA-based communication.

    Key Takeaways

    1. Infrastructure Decoupling (Goodbye NATS and etcd): The release completes the modernization of the communication architecture. By replacing NATS and etcd with a new Event Plane (using ZMQ and MessagePack) and Kubernetes-native service discovery, the system removes the ‘operational tax’ of managing external clusters.
    2. Full Multi-Modal Disaggregation (E/P/D Split): Dynamo now supports a complete Encode/Prefill/Decode (E/P/D) split across all three backends (vLLM, SGLang, and TensorRT-LLM). This lets you run vision or video encoders on separate GPUs, so compute-heavy encoding does not bottleneck text generation.
    3. FlashIndexer Preview for Lower Latency: The ‘sneak preview’ of FlashIndexer introduces a specialized component for distributed KV cache management. It is designed to make indexing and retrieval of cached context significantly faster, with the aim of further reducing Time to First Token (TTFT).
    4. Smarter Scheduling with Kalman Filters: The Planner now uses predictive load estimation powered by Kalman filters. This lets it forecast GPU load more accurately and handle traffic spikes proactively, supported by routing hints from the Kubernetes Gateway API Inference Extension (GAIE).

    Check out the GitHub Release for full details.


    © 2026 TheNews92.com. All Rights Reserved. Unauthorized reproduction or redistribution of content is strictly prohibited.