    Physical Intelligence Team Unveils MEM for Robots: A Multi-Scale Memory System Giving Gemma 3-4B VLAs 15-Minute Context for Complex Tasks

    By Naveed Ahmad | March 4, 2026

    Current end-to-end robotic policies, specifically Vision-Language-Action (VLA) models, typically operate on a single observation or a very short history. This ‘lack of memory’ makes long-horizon tasks, such as cleaning a kitchen or following a complex recipe, computationally intractable or prone to failure. To address this, researchers from Physical Intelligence, Stanford, UC Berkeley, and MIT have introduced Multi-Scale Embodied Memory (MEM).

    https://www.pi.website/download/Mem.pdf

    The Dual-Scale Memory Architecture

    MEM factorizes robotic memory into two distinct scales to balance semantic context with real-time control constraints.

    (1) Short-Term Video Memory

    Tasks that need fine-grained spatial awareness, such as resolving self-occlusions or adapting a grasp, require dense visual data. MEM uses an efficient video encoder that extends standard Vision Transformers (ViTs). To keep inference real-time (under the 380 ms ‘real-time barrier’), the architecture avoids joint attention over all patches from all frames. Instead, it uses Space-Time Separable Attention, interleaving spatial attention within frames with causal temporal attention across frames every fourth layer.
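
    As a rough illustration of the separable pattern described above, the sketch below applies spatial attention inside each frame and, at every fourth layer, causal attention across frames at the same patch position. It is a minimal sketch under stated assumptions, not the paper's encoder: the function names, the identity query/key/value projections, the single head, and the choice to add the temporal step after the spatial one are all hypothetical simplifications.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(query, keys, values):
    """Single-head scaled dot-product attention for one query vector.

    Assumption: identity projections, so tokens act as their own q/k/v.
    """
    dim = len(query)
    scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(dim) for key in keys]
    weights = softmax(scores)
    return [sum(w * v[i] for w, v in zip(weights, values)) for i in range(dim)]


def spatial_attention(frames):
    """Each patch attends to every patch in its own frame only."""
    return [[attend(tok, frame, frame) for tok in frame] for frame in frames]


def causal_temporal_attention(frames):
    """Each patch attends to the same patch position in frames up to its own."""
    out = []
    for t, frame in enumerate(frames):
        new_frame = []
        for p, tok in enumerate(frame):
            history = [frames[s][p] for s in range(t + 1)]  # no future frames
            new_frame.append(attend(tok, history, history))
        out.append(new_frame)
    return out


def encode(frames, num_layers=8, temporal_every=4):
    """frames[t][p] is the feature vector of patch p at timestep t."""
    for layer in range(1, num_layers + 1):
        frames = spatial_attention(frames)
        if layer % temporal_every == 0:  # hypothetical schedule: layers 4, 8, ...
            frames = causal_temporal_attention(frames)
    return frames[-1]  # simplification: hand on only the latest frame's tokens


if __name__ == "__main__":
    # K = 3 frames, n = 2 patches, 2-dimensional features (toy values).
    clip = [
        [[1.0, 0.0], [0.0, 1.0]],
        [[0.5, 0.5], [0.2, 0.8]],
        [[0.9, 0.1], [0.3, 0.7]],
    ]
    current = encode(clip)
    print(len(current), "tokens passed on, each of dim", len(current[0]))
```

    Because the temporal step only looks backward, the first frame attends to nothing but itself, and the number of tokens returned equals the number of patches in a single frame regardless of how many frames were fed in.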

    The computational complexity is reduced from $O(n^2K^2)$ to $O(Kn^2 + nK^2)$, where $n$ is the number of spatial patches and $K$ is the number of timesteps. By dropping tokens from past timesteps in upper layers, the model passes only the current observation’s representation to the VLA backbone, keeping the token count invariant compared to single-frame models.
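
    To make the two complexity terms concrete, the short sketch below counts query-key pairs for joint versus separable attention. The helper names are hypothetical, and the patch count of 256 is an assumed value chosen for illustration; only the 16-frame figure comes from the article.

```python
def joint_attention_pairs(n: int, k: int) -> int:
    """Joint attention: all n*k tokens attend to all n*k tokens."""
    tokens = n * k
    return tokens * tokens


def separable_attention_pairs(n: int, k: int) -> int:
    """Separable attention: spatial within frames plus temporal across frames.

    Spatial: k frames, each with n*n pairs.
    Temporal: n patch positions, each with k*k pairs (an upper bound; a
    causal mask would roughly halve this term without changing its order).
    """
    spatial = k * n * n
    temporal = n * k * k
    return spatial + temporal


if __name__ == "__main__":
    n, k = 256, 16  # n is an assumed patch count; k = 16 frames as in the article
    joint = joint_attention_pairs(n, k)
    separable = separable_attention_pairs(n, k)
    print(f"joint:     {joint:,} pairs")
    print(f"separable: {separable:,} pairs")
    print(f"ratio:     {joint / separable:.2f}x")
```

    Under these assumed sizes the separable scheme needs about 15 times fewer attention pairs per layer, and the gap widens as either $n$ or $K$ grows.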

    (2) Long-Term Language Memory

    To handle tasks spanning up to 15 minutes, MEM uses a language-based representation for semantic events. The system decomposes the action prediction as:

    $$\pi(a_{t:t+H},l_{t+1},m_{t+1}|o_{t-T:t},m_{t},g) \approx\pi_{LL}(a_{t:t+H}|o_{t-K:t},l_{t+1},g)\pi_{HL}(l_{t+1},m_{t+1}|o_{t},m_{t},g)$$

    Here, a high-level policy ($\pi_{HL}$) maintains a running language summary ($m_t$) of past events and generates subtask instructions ($l_{t+1}$) for a low-level policy ($\pi_{LL}$). This language memory is trained using LLM-generated summaries that compress information (e.g., ‘I placed three bowls’ instead of individual attributes), reducing the risk of training-inference distribution shifts.
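
    The factorization can be read as a two-stage control step: the high-level policy sees only the current observation and the running summary, while the low-level policy sees a short window of recent frames plus the new subtask. The sketch below mirrors that data flow with toy stand-ins. The class `MemoryPolicy`, both toy functions, and the string-typed observations and actions are hypothetical and exist only to show which inputs reach which policy.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence, Tuple

Observation = str  # stand-in for camera frames
Action = str       # stand-in for a motor command


@dataclass
class MemoryPolicy:
    # (o_t, m_t, g) -> (l_{t+1}, m_{t+1})
    high_level: Callable[[Observation, str, str], Tuple[str, str]]
    # (o_{t-K:t}, l_{t+1}, g) -> a_{t:t+H}
    low_level: Callable[[Sequence[Observation], str, str], List[Action]]
    window: int  # K, the number of past frames kept alongside the current one

    def step(self, history: Sequence[Observation], memory: str, goal: str):
        """One decision step following the factorized form of the policy."""
        # The high-level policy never sees old frames, only the summary m_t.
        subtask, new_memory = self.high_level(history[-1], memory, goal)
        # The low-level policy sees o_{t-K:t}: the current frame plus K past ones.
        recent = history[-(self.window + 1):]
        actions = self.low_level(recent, subtask, goal)
        return actions, subtask, new_memory


def toy_high_level(obs: Observation, memory: str, goal: str) -> Tuple[str, str]:
    """Toy summarizer: appends one sentence per step (not an LLM)."""
    new_memory = f"{memory} Saw {obs}.".strip()
    return f"handle {obs}", new_memory


def toy_low_level(recent: Sequence[Observation], subtask: str, goal: str) -> List[Action]:
    """Toy controller: emits one placeholder action."""
    return [f"{subtask} using {len(recent)} frames"]


if __name__ == "__main__":
    policy = MemoryPolicy(toy_high_level, toy_low_level, window=2)
    history: List[Observation] = []
    memory = ""
    for obs in ["o0", "o1", "o2", "o3"]:
        history.append(obs)
        actions, subtask, memory = policy.step(history, memory, "clean kitchen")
        print(subtask, "|", actions, "|", memory)
```

    Note that the summary string is the only channel through which events older than the video window can influence later actions, which is why its compactness matters for tasks lasting many minutes.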

    Implementation and Performance

    The research team integrated MEM into the π0.6 VLA, which is initialized from a pre-trained Gemma 3-4B model. The model was pre-trained on a diverse mixture of robot demonstrations, vision-language tasks, and internet video data.

    Key Results:

    • In-Context Adaptation: MEM enables robots to adapt manipulation strategies based on recent failures. In evaluation, this led to a +62% success rate increase in opening refrigerators with unknown hinge directions and a +11% increase in picking up chopsticks at variable heights.
    • Long-Horizon Tasks: The model successfully performed 15-minute tasks like ‘Recipe Setup’ (retrieving ingredients from multiple locations) and ‘Kitchen Cleaning’ (washing dishes and wiping counters). Memory-less VLAs failed these tasks significantly more often.
    • Efficiency: The video encoder allows the model to process up to 16 observation frames (spanning ~1 minute) while remaining under critical real-time inference thresholds on a single NVIDIA H100 GPU.

    MEM demonstrates that combining dense, short-term visual tokens with compressed, long-term language summaries allows VLAs to scale their ‘working memory’ without incurring prohibitive computational costs.

