    Meet ‘Kani-TTS-2’: A 400M Param Open Source Text-to-Speech Model that Runs in 3GB VRAM with Voice Cloning Support

    By Naveed Ahmad, February 15, 2026
    The landscape of generative audio is shifting toward efficiency. A new open-source contender, Kani-TTS-2, has been released by the team at nineninesix.ai. This model marks a departure from heavy, compute-expensive TTS systems. Instead, it treats audio as a language, delivering high-fidelity speech synthesis with a remarkably small footprint.

    Kani-TTS-2 offers a lean, high-performance alternative to closed-source APIs. It is currently available on Hugging Face in both English (EN) and Portuguese (PT) versions.

    The Architecture: LFM2 and NanoCodec

    Kani-TTS-2 follows the ‘Audio-as-Language’ philosophy. The model does not use traditional mel-spectrogram pipelines. Instead, it converts raw audio into discrete tokens using a neural codec.

    The system relies on a two-stage process:

    1. The Language Backbone: The model is built on LiquidAI’s LFM2 (350M) architecture. This backbone generates ‘audio intent’ by predicting the next audio tokens. Because LFMs (Liquid Foundation Models) are designed for efficiency, they offer a faster alternative to standard transformers.
    2. The Neural Codec: It uses the NVIDIA NanoCodec to turn those tokens into 22kHz waveforms.
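
    The two stages above can be sketched with toy stand-ins. This is a minimal illustration of the data flow only, not the Kani-TTS-2 code: `toy_backbone`, `toy_codec_decode`, the frame rate, and the codebook size are all hypothetical, and "22kHz" is read here as 22.05 kHz, which is an assumption.

```python
import math
import random

SAMPLE_RATE_HZ = 22_050    # assumption: "22kHz" taken to mean 22.05 kHz
FRAMES_PER_SECOND = 12.5   # hypothetical codec frame rate, for illustration only
CODEBOOK_SIZE = 1024       # hypothetical number of distinct audio tokens


def toy_backbone(text: str, n_frames: int, seed: int = 0) -> list[int]:
    """Stage 1 stand-in: emit audio tokens one at a time.

    Each new token depends on the text (via the seed) and on the previous
    token. A real backbone would sample from a learned distribution instead.
    """
    rng = random.Random(f"{seed}:{text}")
    tokens: list[int] = []
    for _ in range(n_frames):
        previous = tokens[-1] if tokens else 0
        tokens.append((previous + rng.randrange(CODEBOOK_SIZE)) % CODEBOOK_SIZE)
    return tokens


def toy_codec_decode(tokens: list[int]) -> list[float]:
    """Stage 2 stand-in: expand every token into a fixed run of samples.

    Here each token simply selects the pitch of a sine burst. A neural codec
    decoder would reconstruct speech from the tokens instead.
    """
    samples_per_frame = int(SAMPLE_RATE_HZ / FRAMES_PER_SECOND)
    waveform: list[float] = []
    for token in tokens:
        freq_hz = 100.0 + token
        for i in range(samples_per_frame):
            waveform.append(math.sin(2 * math.pi * freq_hz * i / SAMPLE_RATE_HZ))
    return waveform


tokens = toy_backbone("hello world", n_frames=25)
waveform = toy_codec_decode(tokens)
print(len(tokens), "tokens ->", len(waveform) / SAMPLE_RATE_HZ, "seconds of audio")
```

    The point of the sketch is the interface between the stages: the backbone only ever sees and produces integer tokens, and the waveform appears only at the codec decoding step.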

    By using this architecture, the model captures human-like prosody—the rhythm and intonation of speech—without the ‘robotic’ artifacts found in older TTS systems.

    Efficiency: 10,000 Hours in 6 Hours

    The training metrics for Kani-TTS-2 are a masterclass in optimization. The English model was trained on 10,000 hours of high-quality speech data.

    While that scale is impressive, the speed of training is the real story. The research team trained the model in only 6 hours using a cluster of 8 NVIDIA H100 GPUs. This suggests that large speech datasets do not necessarily require weeks of compute time when paired with efficient architectures like LFM2.
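
    To put those figures in perspective, the reported numbers can be turned into a rough throughput estimate. The arithmetic below assumes the 6 hours cover a single pass over the 10,000 hours of audio; the article does not state the number of epochs, so treat the result as a lower bound on audio processed per GPU-hour.

```python
# Figures reported in the article.
AUDIO_HOURS = 10_000
WALL_CLOCK_HOURS = 6
NUM_GPUS = 8

# Total compute spent, in GPU-hours.
gpu_hours = WALL_CLOCK_HOURS * NUM_GPUS

# Hours of speech consumed per hour of training, assuming one pass over the data.
audio_hours_per_wall_hour = AUDIO_HOURS / WALL_CLOCK_HOURS
audio_hours_per_gpu_hour = AUDIO_HOURS / gpu_hours

print(f"GPU-hours: {gpu_hours}")                                      # 48
print(f"Audio hours per wall-clock hour: {audio_hours_per_wall_hour:.1f}")  # 1666.7
print(f"Audio hours per GPU-hour: {audio_hours_per_gpu_hour:.1f}")          # 208.3
```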

    Zero-Shot Voice Cloning and Performance

    The standout feature for developers is zero-shot voice cloning. Unlike traditional models that require fine-tuning for new voices, Kani-TTS-2 uses speaker embeddings.

    • How it works: You provide a short reference audio clip.
    • The result: The model extracts the unique characteristics of that voice and applies them to the generated text instantly.
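
    The article does not describe how Kani-TTS-2 computes its speaker embeddings. As a generic illustration of the idea, the sketch below mean-pools per-frame feature vectors into one fixed-length vector, so that reference clips of different lengths from the same voice land close together. The function names and the toy feature values are hypothetical.

```python
import math


def toy_speaker_embedding(frames: list[list[float]]) -> list[float]:
    """Mean-pool per-frame features into one unit-length vector.

    The output size depends only on the feature dimension, not on how many
    frames (how much audio) the reference clip contains.
    """
    dim = len(frames[0])
    mean = [sum(frame[d] for frame in frames) / len(frames) for d in range(dim)]
    norm = math.sqrt(sum(x * x for x in mean)) or 1.0
    return [x / norm for x in mean]


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Dot product of two unit-length vectors."""
    return sum(x * y for x, y in zip(a, b))


# Toy 2-dimensional "features": two clips from speaker A, one from speaker B.
short_clip_a = [[1.0, 0.1], [0.9, 0.2], [1.1, 0.0]]
long_clip_a = [[1.0, 0.1]] * 5
clip_b = [[0.1, 1.0], [0.0, 0.9]]

emb_a_short = toy_speaker_embedding(short_clip_a)
emb_a_long = toy_speaker_embedding(long_clip_a)
emb_b = toy_speaker_embedding(clip_b)

print("same speaker:     ", round(cosine_similarity(emb_a_short, emb_a_long), 3))
print("different speaker:", round(cosine_similarity(emb_a_short, emb_b), 3))
```

    In a zero-shot setup, a vector like this is passed to the model as conditioning at inference time, which is why no fine-tuning step is needed for a new voice.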

    From a deployment perspective, the model is highly accessible:

    • Parameter Count: 400M (0.4B) parameters.
    • Speed: It features a Real-Time Factor (RTF) of 0.2. This means it can generate 10 seconds of speech in roughly 2 seconds.
    • Hardware: It requires only 3GB of VRAM, making it compatible with consumer-grade GPUs like the RTX 3060 or 4050.
    • License: Released under the Apache 2.0 license, allowing for commercial use.
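
    The speed and memory figures above follow from simple arithmetic. The helper names below are illustrative, and the bytes-per-parameter values are assumptions: the article does not say which numeric precision the 3GB figure refers to.

```python
def synthesis_seconds(audio_seconds: float, rtf: float = 0.2) -> float:
    """Real-Time Factor = processing time / audio duration, so time = duration * RTF."""
    return audio_seconds * rtf


def weight_memory_gb(num_params: float, bytes_per_param: int) -> float:
    """Memory needed for the weights alone, in gigabytes (1 GB = 1e9 bytes)."""
    return num_params * bytes_per_param / 1e9


print(synthesis_seconds(10.0))        # 10 s of speech at RTF 0.2 -> 2.0 s
print(weight_memory_gb(400e6, 2))     # 400M params at 16-bit precision -> 0.8 GB
print(weight_memory_gb(400e6, 4))     # 400M params at 32-bit precision -> 1.6 GB
```

    Under either precision assumption the weights fit well within 3GB. The remainder of the budget would plausibly go to the codec, activations, and runtime overhead, though the article gives no breakdown.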

    Key Takeaways

    • Efficient Architecture: The 400M-parameter model is built on LiquidAI’s LFM2 (350M) backbone. This ‘Audio-as-Language’ approach treats speech as discrete tokens, allowing for faster processing and more human-like intonation compared to traditional architectures.
    • Rapid Training at Scale: Kani-TTS-2-EN was trained on 10,000 hours of high-quality speech data in just 6 hours using 8 NVIDIA H100 GPUs.
    • Instant Zero-Shot Cloning: There is no need for fine-tuning to replicate a specific voice. By providing a short reference audio clip, the model uses speaker embeddings to instantly synthesize text in the target speaker’s voice.
    • High Performance on Edge Hardware: With a Real-Time Factor (RTF) of 0.2, the model can generate 10 seconds of audio in approximately 2 seconds. It requires only 3GB of VRAM, making it fully functional on consumer-grade GPUs like the RTX 3060.
    • Developer-Friendly Licensing: Released under the Apache 2.0 license, Kani-TTS-2 is ready for commercial integration. It offers a local-first, low-latency alternative to expensive closed-source TTS APIs.



    Michal Sutter is a data science professional with a Master of Science in Data Science from the University of Padova. With a solid foundation in statistical analysis, machine learning, and data engineering, Michal excels at transforming complex datasets into actionable insights.





