    AI & Tech

    A New Google AI Research Proposes Deep-Thinking Ratio to Improve LLM Accuracy While Cutting Total Inference Costs by Half

By Naveed Ahmad | February 22, 2026 | 4 Mins Read


For the last few years, the AI world has followed a simple rule: if you want a Large Language Model (LLM) to solve a harder problem, make its Chain-of-Thought (CoT) longer. But new research from the University of Virginia and Google shows that ‘thinking long’ is not the same as ‘thinking hard’.

The research team shows that simply adding more tokens to a response can actually make an AI less accurate. Instead of counting words, the researchers introduce a new measurement: the Deep-Thinking Ratio (DTR).

    https://arxiv.org/pdf/2602.13517

The Failure of ‘Token Maxing’

Engineers often use token count as a proxy for the effort an AI puts into a task. However, the researchers found that raw token count has an average correlation of r = -0.59 with accuracy.

    This negative number means that as the model generates more text, it is more likely to be wrong. This happens because of ‘overthinking,’ where the model gets stuck in loops, repeats redundant steps, or amplifies its own mistakes. Relying on length alone wastes expensive compute on uninformative tokens.
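To make the reported trend concrete, here is a minimal sketch of how a length-accuracy correlation like the paper's r = -0.59 would be computed. The data below is synthetic and purely illustrative; only the direction of the relationship, not the values, reflects the paper.

```python
import numpy as np

# Illustrative only: synthetic (token count, accuracy) pairs where longer
# responses tend to be wrong, mimicking the negative trend the paper reports.
token_counts = np.array([1200, 1500, 2100, 3400, 4800, 6100, 7500, 9000], dtype=float)
accuracy = np.array([0.92, 0.90, 0.85, 0.74, 0.60, 0.55, 0.48, 0.40])

# Pearson correlation between response length and accuracy.
r = np.corrcoef(token_counts, accuracy)[0, 1]
print(f"length-accuracy correlation: r = {r:.2f}")  # negative on this toy data
```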

    What are Deep-Thinking Tokens?

    The research team argued that real ‘thinking’ happens inside the layers of the model, not just in the final output. When a model predicts a token, it processes data through a series of transformer layers (L).

    1. Shallow Tokens: For easy words, the model’s prediction stabilizes early. The ‘guess’ doesn’t change much from layer 5 to layer 36.
    2. Deep-Thinking Tokens: For difficult logic or math symbols, the prediction shifts significantly in the deeper layers.

    How to Measure Depth

To identify these tokens, the research team uses a technique to peek at the model’s internal ‘drafts’ at every layer. They project the intermediate hidden states (h_{t,l}) into the vocabulary space using the model’s unembedding matrix (W_U). This produces a probability distribution (p_{t,l}) for every layer.
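The projection described above can be sketched in a few lines. All tensors here are random stand-ins: in practice h would come from a transformer's residual stream and W_U would be the model's actual unembedding matrix.

```python
import numpy as np

rng = np.random.default_rng(0)
L, d, V = 36, 64, 100            # layers, hidden size, vocab size (toy values)
h = rng.normal(size=(L, d))      # stand-in for hidden states h_{t,l}, one token, all layers
W_U = rng.normal(size=(d, V))    # stand-in for the unembedding matrix

def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

# p_{t,l}: a probability distribution over the vocabulary at every layer.
p = softmax(h @ W_U)             # shape (L, V); each row sums to 1
```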

They then calculate the Jensen-Shannon Divergence (JSD) between each intermediate layer’s distribution and the final layer’s distribution (p_{t,L}):

    D_{t,l} := JSD(p_{t,L} || p_{t,l})

A token is a deep-thinking token if its prediction only settles in the ‘late regime’, defined by a depth fraction (ρ). In their tests, they set ρ = 0.85, meaning the token only stabilized in the final 15% of the layers.
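A minimal sketch of this classification rule, assuming per-layer distributions are already available. The divergence threshold tau is an illustrative assumption, not a value from the paper.

```python
import numpy as np

def jsd(p, q, eps=1e-12):
    """Jensen-Shannon divergence between two distributions (natural log)."""
    p, q = np.asarray(p, dtype=float) + eps, np.asarray(q, dtype=float) + eps
    m = 0.5 * (p + q)
    kl = lambda a, b: float(np.sum(a * np.log(a / b)))
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)

def is_deep_thinking(layer_dists, rho=0.85, tau=0.1):
    """Classify one token from its per-layer distributions p_{t,l}.

    The token counts as deep-thinking if, at the depth-fraction cutoff rho,
    its distribution still diverges from the final layer's by more than tau,
    i.e. the prediction only settles in the last (1 - rho) of the layers.
    """
    L = len(layer_dists)
    cutoff = int(rho * L)
    # D_{t,l} = JSD(p_{t,L} || p_{t,l}) at the cutoff layer
    return jsd(layer_dists[-1], layer_dists[cutoff]) > tau
```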

    The Deep-Thinking Ratio (DTR) is the percentage of these ‘hard’ tokens in a full sequence. Across models like DeepSeek-R1-70B, Qwen3-30B-Thinking, and GPT-OSS-120B, DTR showed a strong average positive correlation of r = 0.683 with accuracy.
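Given per-token classifications, the DTR itself is just a fraction. The flags below are hardcoded for illustration; in practice each would come from the per-token depth test.

```python
# Aggregating per-token classifications into the Deep-Thinking Ratio:
# the fraction of deep-thinking tokens in the full sequence.
deep_flags = [False, True, False, False, True, False, False, False]

dtr = sum(deep_flags) / len(deep_flags)
print(f"DTR = {dtr:.2%}")  # 25.00%
```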


    Think@n: Better Accuracy at 50% the Cost

The research team built on this metric to create Think@n, a new way to scale AI performance at inference time.

Most developers use Self-Consistency (Cons@n), sampling 48 different answers and using majority voting to pick the best one. This is very expensive because every single token of every answer has to be generated.
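For reference, the Cons@n baseline boils down to a majority vote over sampled answers. A minimal sketch with hardcoded stand-in answers:

```python
from collections import Counter

# Cons@n / self-consistency: sample n full answers, take the majority.
answers = ["42", "42", "17", "42", "17", "42"]  # stand-in for n sampled answers
winner, count = Counter(answers).most_common(1)[0]
print(winner)  # "42"
```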

    Think@n changes the game by using ‘early halting’:

    • The model starts generating multiple candidate answers.
    • After just 50 prefix tokens, the system calculates the DTR for each candidate.
    • It immediately stops generating the ‘unpromising’ candidates with low DTR.
    • It only finishes the candidates with high deep-thinking scores.
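The steps above can be sketched as follows. The function and scoring below are illustrative stand-ins: `prefixes` and `scores` replace a real decoder and the paper's prefix-based DTR estimator.

```python
import numpy as np

rng = np.random.default_rng(1)

def think_at_n(prompt, n=8, prefix_len=50, keep=2):
    """Sketch of Think@n-style early halting (illustrative, not the paper's code)."""
    # 1. Start n candidate generations, but only up to prefix_len tokens each.
    prefixes = [f"{prompt}-candidate-{i}" for i in range(n)]  # placeholder prefixes
    # 2. Score each 50-token prefix by its Deep-Thinking Ratio (random stand-in).
    scores = rng.random(n)
    # 3. Halt the low-DTR candidates; only the top `keep` survive.
    survivors = sorted(range(n), key=lambda i: scores[i], reverse=True)[:keep]
    # 4. Only the survivors are generated to completion.
    return [prefixes[i] for i in survivors]

finished = think_at_n("problem", n=8, keep=2)
print(len(finished))  # 2 of 8 candidates were finished
```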

    The Results on AIME 2025

    Method                          Accuracy    Avg. Cost (k tokens)
    Cons@n (Majority Vote)          92.7%       307.6
    Think@n (DTR-based Selection)   94.7%       155.4

    On the AIME 25 math benchmark, Think@n achieved higher accuracy than standard voting while reducing the inference cost by 49%.
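The 49% figure follows directly from the table above:

```python
# Cost reduction implied by the reported table: (307.6 - 155.4) / 307.6.
cons_cost, think_cost = 307.6, 155.4
savings = (cons_cost - think_cost) / cons_cost
print(f"cost reduction: {savings:.0%}")  # 49%
```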

    Key Takeaways

    • Token count is a poor predictor of accuracy: Raw output length has an average negative correlation (r = -0.59) with performance, meaning longer reasoning traces often signal ‘overthinking’ rather than higher quality.
    • Deep-thinking tokens define true effort: Unlike simple tokens that stabilize in early layers, deep-thinking tokens are those whose internal predictions undergo significant revision in deeper model layers before converging.
    • The Deep-Thinking Ratio (DTR) is a superior metric: DTR measures the proportion of deep-thinking tokens in a sequence and exhibits a robust positive correlation with accuracy (average r = 0.683), consistently outperforming length-based or confidence-based baselines.
    • Think@n enables efficient test-time scaling: By prioritizing and finishing only the samples with high deep-thinking ratios, the Think@n strategy matches or exceeds the performance of standard majority voting (Cons@n).
    • Massive cost reduction via early halting: Because DTR can be estimated from a short prefix of just 50 tokens, unpromising generations can be rejected early, reducing total inference costs by approximately 50%.

Check out the Paper.



