MiniMax Releases MiniMax M3 with MSA Architecture Supporting 1M-Token Context, Native Multimodality, and Agentic Coding

MiniMax officially released MiniMax M3 on June 1, 2026. The model introduces MSA (MiniMax Sparse Attention), a new sparse attention architecture that gives M3 a 1M-token context window. M3 also supports image and video input and desktop computer operation natively. The API is live now.

MiniMax M3 is available today via MiniMax Code, the MiniMax Token Plan, and the MiniMax API. It is the next model in the M-series line after M2.7. MiniMax positions M3 as an open-weight model combining frontier-level coding performance, a 1M-token context window, and native multimodal input in a single architecture — the first to do so, per MiniMax. The corresponding model weights and technical report are scheduled for release within 10 days of launch.

MSA: MiniMax Sparse Attention

The central architectural change in MiniMax M3 is MSA (MiniMax Sparse Attention). Standard full attention has quadratic computational complexity: as context length grows, compute cost grows as the square of the sequence length. MSA is designed to address this.
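To make the quadratic growth concrete, here is a rough back-of-the-envelope sketch. It counts only the two large matrix products in one attention head and assumes a head dimension of 128; that value and the function name are illustrative assumptions, not M3's configuration.

```python
def attention_matmul_ops(seq_len: int, head_dim: int = 128) -> int:
    """Approximate multiply-adds for one head of full attention.

    Counts QK^T (seq_len x seq_len x head_dim) and the attention-weighted
    sum over V (the same size again). head_dim=128 is an assumed value.
    """
    return 2 * seq_len * seq_len * head_dim


baseline = attention_matmul_ops(8_000)
for n in (8_000, 64_000, 1_000_000):
    ratio = attention_matmul_ops(n) / baseline
    print(f"{n:>9} tokens -> {ratio:,.0f}x the cost at 8K tokens")
```

Because the cost depends on the square of the length, the head dimension cancels out of the ratio: growing the context 125-fold (8K to 1M tokens) multiplies this term by 125 squared.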

Sparse attention mechanisms generally add a pre-filtering stage that selects which parts of the context each query attends to, so the full quadratic cost is never paid. The MiniMax team states that, compared to approaches like DSA (DeepSeek Sparse Attention) and MoBA (Mixture of Block Attention), MSA partitions the KV cache into blocks more precisely, achieving higher effective context coverage.
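The pre-filtering idea can be sketched in a few lines. The article does not describe MSA's actual selection rule, so the example below uses a generic stand-in in the style of block-selection schemes such as MoBA: score each key block by its mean key, keep the top-scoring blocks, and run ordinary softmax attention over only those. All names and parameters (select_blocks, sparse_attention, block_size, top_k) are hypothetical, and the code handles a single query without causal masking.

```python
import math


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def select_blocks(q, keys, block_size, top_k):
    """Pre-filter: score each key block by its mean key, keep the top_k blocks.

    Keys past the last full block are ignored in this simplified sketch.
    """
    n_blocks = len(keys) // block_size
    scored = []
    for b in range(n_blocks):
        block = keys[b * block_size:(b + 1) * block_size]
        mean_key = [sum(col) / block_size for col in zip(*block)]
        scored.append((dot(mean_key, q), b))
    best = sorted(scored, reverse=True)[:top_k]
    return sorted(b for _, b in best)


def sparse_attention(q, keys, values, block_size, top_k):
    """Softmax attention for one query, restricted to the selected blocks."""
    idx = [i
           for b in select_blocks(q, keys, block_size, top_k)
           for i in range(b * block_size, (b + 1) * block_size)]
    scale = math.sqrt(len(q))
    logits = [dot(keys[i], q) / scale for i in idx]
    peak = max(logits)  # subtract the max for numerical stability
    weights = [math.exp(l - peak) for l in logits]
    total = sum(weights)
    dim = len(values[0])
    return [sum(w * values[i][d] for w, i in zip(weights, idx)) / total
            for d in range(dim)]
```

With top_k fixed, the per-query cost depends on top_k times block_size rather than on the full context length, which is where the savings come from. How precisely the blocks are chosen determines how much relevant context survives the filter, which is the property MiniMax claims to have improved.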

At the operator level, MSA uses a “KV outer gather Q” approach: KV blocks serve as the outer loop and gather the queries that hit them, so each block is read only once and memory access is contiguous. The MiniMax team reports that this is more than 4× faster than open-source implementations such as Flash-Sparse-Attention and flash-moba under MiniMax M3’s head configuration.
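The loop order described above can be illustrated on the CPU. This is a sketch of the general idea only, not MiniMax's kernel: it inverts the query-to-block selection so that each KV block is visited once, and it merges per-block partial results with a running (online) softmax so the final output matches ordinary attention over the selected blocks. The function name, the `selected` argument, and the data layout are assumptions made for illustration.

```python
import math
from collections import defaultdict


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def kv_outer_attention(queries, keys, values, selected, block_size):
    """selected[i] lists the KV block ids chosen for query i (at least one each)."""
    # Invert the mapping: for each KV block, which queries hit it?
    hits = defaultdict(list)
    for qi, blocks in enumerate(selected):
        for b in blocks:
            hits[b].append(qi)

    dim = len(values[0])
    scale = math.sqrt(len(queries[0]))
    # Per-query running softmax state: max logit, denominator, weighted sum.
    run_max = [-math.inf] * len(queries)
    denom = [0.0] * len(queries)
    acc = [[0.0] * dim for _ in queries]

    for b in sorted(hits):  # outer loop: each KV block is loaded once
        k_blk = keys[b * block_size:(b + 1) * block_size]  # contiguous slice
        v_blk = values[b * block_size:(b + 1) * block_size]
        for qi in hits[b]:  # inner loop: the queries gathered for this block
            logits = [dot(k, queries[qi]) / scale for k in k_blk]
            new_max = max(run_max[qi], max(logits))
            rescale = math.exp(run_max[qi] - new_max)  # 0.0 on first visit
            denom[qi] *= rescale
            acc[qi] = [a * rescale for a in acc[qi]]
            for logit, v in zip(logits, v_blk):
                w = math.exp(logit - new_max)
                denom[qi] += w
                acc[qi] = [a + w * x for a, x in zip(acc[qi], v)]
            run_max[qi] = new_max
    return [[a / d for a in row] for row, d in zip(acc, denom)]
```

In a query-outer loop, a block selected by many queries would be fetched once per query. Putting the block on the outside trades that repeated, scattered read for a gather of queries, which is the memory-access pattern the article attributes to MSA.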

The result: at a context length of 1 million tokens, MiniMax M3’s per-token compute is 1/20th that of the previous-generation M2 models. The MiniMax team reports speedups of more than 9× in the prefill stage and more than 15× in the decoding stage at 1M-token context. Across multiple ablation studies, MiniMax says, MSA matched full attention on the majority of capabilities.

Coding and Agentic Benchmarks

Coding and agentic capabilities are key areas of improvement for M3. The benchmark results below are reported by the MiniMax team. Several evaluations were run on MiniMax internal infrastructure, while some comparison scores were taken from official leaderboards or external benchmark sources, as noted in MiniMax’s methodology. SWE-Bench Verified was tested on internal infrastructure using Claude Code scaffolding and averaged over 4 runs. SWE-Bench Pro was also tested on internal infrastructure using Claude Code scaffolding, with testing logic aligned to the official evaluation.

SWE-Bench Pro: 59.0% (surpasses GPT-5.5 and Gemini 3.1 Pro; approaches Opus 4.7)
Terminal-Bench 2.1: 66.0%
SWE-fficiency: 34.8%
KernelBench Hard: 28.8% (evaluated on NVIDIA Blackwell GPUs, CUDA capability sm_120)
MCP Atlas: 74.2%
Claw-Eval: highest score among models evaluated (General Task Group, 161 tasks)
SVG-Bench: surpasses Opus 4.7

On OmniDocBench, a multimodal document understanding benchmark, M3 scores above Gemini 3.1 Pro. On OSWorld-Verified (361 samples), M3 achieves a 70.06% task completion rate for computer use (Max Steps = 200).

MiniMax also built an interactive user simulator framework for training and evaluation. It simulates multi-turn developer collaboration: requirement elaboration, solution discussion, feedback-based correction, continuous task switching, and multi-round project iteration. This is intended to reduce the gap between single-turn benchmark performance and real-world, multi-turn developer workflows.

Native Multimodality

MiniMax M3 underwent mixed-modality training from step 0. Text, images, and video are trained together from the beginning rather than added post-training. The MiniMax team reports that interleaved data (sequences where text and images are naturally intermixed) is more critical to model performance than commonly assumed. After rebuilding the entire data pipeline for interleaved formats, MiniMax scaled the training data to the order of 100 trillion tokens.

MiniMax M3 supports image and video input and can operate a desktop computer.

Real-World Task Examples from MiniMax

MiniMax documents three internal tasks in the release post:

Paper reproduction: MiniMax gave MiniMax M3 the paper “Learning Dynamics of LLM Finetuning”, winner of an ICLR 2025 Outstanding Paper Award, and asked it to reproduce the experiments independently. M3 ran autonomously for nearly 12 hours, produced 18 commits and 23 experimental figures, and completed the core experiments without human intervention. The task required multimodal capability to read curves and formulas, long context to hold the paper and experiment logs simultaneously, and coding capability to carry out the reproduction across a long thread.

CUDA kernel optimization: MiniMax asked MiniMax M3 to optimize an FP8 matrix multiplication (GEMM) kernel on NVIDIA Hopper architecture GPUs. The model started with only a task description, a benchmark evaluation script, and a non-functional Triton skeleton — no reference implementation was provided. Over approximately 24 hours, MiniMax M3 made 147 benchmark submissions and 1,959 tool calls. It progressed through baseline implementation, autotune configuration generation, performance bottleneck diagnosis, CUDA Graph integration, persistent kernel rewriting, and host-side scheduling optimization. After six landmark rounds of optimization, MiniMax M3 improved Hopper FP8 hardware peak utilization from 7.6% to 71.3%, a 9.4× speedup. The best solution appeared on the 145th submission. MiniMax notes that most other models stopped making new progress within the first 30 submissions; only Opus 4.7 and M3 continued beyond that point.

PostTrainBench (autonomous model training): MiniMax gave MiniMax M3 four base models that had completed pretraining only. MiniMax M3 autonomously ran the full data synthesis → training → evaluation → iteration cycle over 12 hours with no human intervention. The target was for the base models to acquire capabilities across mathematical reasoning (AIME2025), tool calling (BFCL), scientific knowledge reasoning (GPQA Main), arithmetic reasoning (GSM8K), and code generation (HumanEval). MiniMax M3 scored 0.37, below Opus 4.7 (0.42) and GPT-5.5 (0.39), but ahead of the other models tested.
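For the CUDA kernel task above, the reported 9.4× figure is consistent with the ratio of the two utilization numbers, assuming both are measured against the same Hopper FP8 hardware peak (the article does not state this explicitly):

```latex
\[
\text{speedup} \approx \frac{71.3\%}{7.6\%} \approx 9.38 \approx 9.4\times
\]
```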

Marktechpost’s Visual Explainer

Overview

MiniMax M3: Frontier Coding, 1M-Token Context, Native Multimodality

MiniMax officially released M3 on June 1, 2026. The API is live now. Model weights and technical report will be open-sourced within 10 days.

M3 is the next model in the M-series line after M2.7. MiniMax positions it as the first open-weight model to combine frontier-level coding, a 1M-token context window, and native multimodal input in a single architecture. Headline figures:

Token Context Window: 1M
SWE-Bench Pro Score: 59.0%
Sparse Attention Architecture: MSA
OSWorld-Verified (Computer Use): 70.06%

Architecture

MSA: MiniMax Sparse Attention

Standard full attention has quadratic computational complexity. As context length grows, compute cost grows as the square of the sequence length. MSA is designed to solve this at the operator level.

Compared to approaches like DSA and MoBA, MSA partitions the KV cache into blocks more precisely, achieving higher effective context coverage.

MSA uses a “KV outer gather Q” approach — each KV block is read only once, memory access is contiguous, and arithmetic intensity is significantly better than common methods.

Prefill Speedup at 1M ctx: >9×
Decoding Speedup at 1M ctx: >15×
Per-token compute vs M2 at 1M: 1/20
Faster than Flash-Sparse-Attn: >4×

Benchmarks

Coding and Agentic Performance

Results reported by MiniMax. SWE-Bench Verified used Claude Code scaffolding, averaged over 4 runs. SWE-Bench Pro used Claude Code scaffolding, aligned to official evaluation.

SWE-Bench Pro: 59.0% — surpasses GPT-5.5 and Gemini 3.1 Pro; approaches Opus 4.7
Terminal-Bench 2.1: 66.0%
SWE-fficiency: 34.8%
KernelBench Hard: 28.8% — evaluated on NVIDIA Blackwell GPUs (sm_120)
MCP Atlas: 74.2%
Claw-Eval: Highest score among models evaluated (161 tasks)
SVG-Bench: Surpasses Opus 4.7
OmniDocBench: Above Gemini 3.1 Pro
OSWorld-Verified: 70.06% — 361 samples, Max Steps = 200

Multimodality

Native Multimodal Training from Step 0

M3 underwent mixed-modality training from step 0. Text, images, and video are trained together from the start — not added as a post-training capability.

MiniMax reports that interleaved data — sequences where text and images are naturally intermixed — is more critical to model performance than commonly assumed.

After rebuilding the entire data pipeline for interleaved formats, training data was scaled to the order of 100 trillion tokens.

Image input
Video input
Desktop computer operation (computer use)

Real-World Tasks

Three Internal Tasks Documented by MiniMax

Paper Reproduction — M3 reproduced the ICLR 2025 paper Learning Dynamics of LLM Finetuning autonomously over ~12 hours, producing 18 commits and 23 experimental figures with no human intervention.
CUDA Kernel Optimization — M3 optimized an FP8 GEMM kernel on NVIDIA Hopper GPUs over ~24 hours: 147 benchmark submissions, 1,959 tool calls, 6 landmark optimization rounds. Improved Hopper FP8 peak utilization from 7.6% → 71.3% (9.4× speedup). Best solution appeared on submission 145.
PostTrainBench — M3 autonomously ran data synthesis → training → evaluation → iteration for 4 base models over 12 hours. Scored 0.37, below Opus 4.7 (0.42) and GPT-5.5 (0.39), but ahead of other evaluated models. Targets: AIME2025, BFCL, GPQA Main, GSM8K, HumanEval.

MiniMax Code

MiniMax Code: Agent Product Built and Trained with M3

MiniMax Code is an agent product built and trained together with M3. Available at agent.minimaxi.com/download. Works with MiniMax Token Plans.

Agent Teams — multiple agents run concurrent, multi-stage, dynamically adjustable workflows
Producer + Verifier loop — adversarial harness enables continuous self-correction during execution
Computer use — M3’s native multimodal capability enables cross-application desktop automation
Built on OpenCode and Pi — MiniMax states it plans to open-source MiniMax Code in the future

// Example use case
User (on phone): “Open the local ERP client and batch-enter invoice data from this Excel file.”
→ MiniMax Code handles operations across applications, files, and systems on desktop.
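The Producer + Verifier loop listed above is described only at a high level, so the following is a generic sketch of the pattern rather than MiniMax Code's harness. One component proposes a result, a second tries to find fault with it, and the critique is fed back until the verifier accepts or a round budget runs out. Every name here (producer_verifier_loop, toy_producer, toy_verifier, max_rounds) is hypothetical.

```python
from typing import Callable, Optional, Tuple


def producer_verifier_loop(
    produce: Callable[[Optional[str]], int],
    verify: Callable[[int], Tuple[bool, Optional[str]]],
    max_rounds: int = 5,
) -> Tuple[int, int]:
    """Run produce/verify rounds until the verifier accepts or rounds run out."""
    feedback: Optional[str] = None
    for round_no in range(1, max_rounds + 1):
        candidate = produce(feedback)            # producer sees the last critique
        accepted, feedback = verify(candidate)   # verifier looks for a fault
        if accepted:
            return candidate, round_no
    raise RuntimeError(f"no accepted result after {max_rounds} rounds")


# Toy stand-ins: the "task" is to propose an integer that is at least 3.
state = {"guess": 0}


def toy_producer(feedback: Optional[str]) -> int:
    if feedback is not None:  # revise only when the verifier objected
        state["guess"] += 1
    return state["guess"]


def toy_verifier(candidate: int) -> Tuple[bool, Optional[str]]:
    if candidate >= 3:
        return True, None
    return False, "too small"


result, rounds = producer_verifier_loop(toy_producer, toy_verifier)
print(result, rounds)
```

In an agent setting the producer and verifier would be model calls with different prompts or tools; the bounded round count keeps a disagreement between them from looping forever.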

API & Pricing

API Details and Token Plan Tiers

The M3 API is live at platform.minimax.io.

Pricing by input length: Calls ≤512K tokens → standard rate. Calls >512K → higher long-context rate.

Thinking mode: Toggle on/off at request time. Both modes share the same pricing.

Service tiers: standard (default) and priority (service_tier=priority) — priority available via sales, opening to all users soon.
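The billing and tier rules above amount to a small decision, sketched here under stated assumptions. The article gives the threshold as "512K" without saying whether that means 512,000 or 524,288 tokens, so the constant below is a guess, and the function names are hypothetical. Only the service_tier=priority setting comes from the article; no actual rates are published there, so the code returns a band name rather than a price.

```python
LONG_CONTEXT_THRESHOLD = 512_000  # assumption: "512K" read as 512,000 tokens


def pricing_band(input_tokens: int) -> str:
    """Return which rate applies to a call, based on its input length."""
    if input_tokens < 0:
        raise ValueError("input_tokens must be non-negative")
    if input_tokens <= LONG_CONTEXT_THRESHOLD:
        return "standard"
    return "long-context"


def service_tier_option(priority: bool = False) -> dict:
    """Build the service-tier setting; 'standard' is the stated default."""
    return {"service_tier": "priority" if priority else "standard"}


for tokens in (4_000, 512_000, 900_000):
    print(tokens, pricing_band(tokens))
```

Since thinking mode is billed at the same rate whether on or off, it does not enter the band calculation.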

Plus: ~1.7B tokens/mo, $20/mo
Max: ~5.1B tokens/mo, $50/mo
Ultra: ~9.8B tokens/mo, $120/mo

Text, image, speech, and music usage all draw from the same token pool.
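As a quick comparison of the tiers, the listed prices and quotas imply an effective cost per million tokens. This is simple arithmetic on the approximate figures above, assuming the full monthly quota is consumed; it says nothing about pay-as-you-go API rates, which the article does not list.

```python
# (monthly price in USD, approximate monthly tokens), taken from the tier list
PLANS = {
    "Plus": (20, 1.7e9),
    "Max": (50, 5.1e9),
    "Ultra": (120, 9.8e9),
}


def usd_per_million_tokens(price_usd: float, tokens: float) -> float:
    """Effective price per one million tokens if the quota is fully used."""
    return price_usd / tokens * 1e6


for name, (price, tokens) in PLANS.items():
    cost = usd_per_million_tokens(price, tokens)
    print(f"{name}: about ${cost:.4f} per 1M tokens")
```

On these numbers all three tiers land at roughly one cent per million tokens, with Max slightly cheaper per token than Plus or Ultra.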

Key Takeaways

What Engineers and Researchers Need to Know

MiniMax M3 launched June 1, 2026. API is live. Open model weights and technical report committed within 10 days.
MSA delivers >9× prefill and >15× decoding speedup at 1M-token context vs M2, at 1/20th the per-token compute.
M3 scores 59.0% on SWE-Bench Pro, surpassing GPT-5.5 and Gemini 3.1 Pro.
Natively multimodal from step 0: supports image and video input, and scores 70.06% on OSWorld-Verified for computer use.
Thinking mode toggleable at request time. Token Plan starts at $20/month (~1.7B M3 tokens).

Key Takeaways

MiniMax M3 launched June 1, 2026; API is live now. MiniMax has committed to releasing open model weights and a technical report within 10 days.
MSA (MiniMax Sparse Attention) delivers more than 9× prefill and more than 15× decoding speedup at 1M-token context versus M2, at 1/20th the per-token compute.
M3 scores 59.0% on SWE-Bench Pro, surpassing GPT-5.5 and Gemini 3.1 Pro.
M3 is natively multimodal from step 0, supporting image and video input, and achieves 70.06% on OSWorld-Verified for computer use.

Introducing MiniMax M3: The First Open-Weights Model to Combine Three Frontier Capabilities
– Coding & Agentic Frontier: 59.0% SWE-Bench Pro, 66.0% Terminal Bench 2.1, 34.8% SWE-fficiency, 28.8% KernelBench Hard, 74.2% MCP Atlas
– MiniMax Sparse Attention scales context to 1M
-… pic.twitter.com/TF891iJukF
— MiniMax (official) (@MiniMax_AI) June 1, 2026

CEO & Founder

Moiz Ahmad
