MiniMax officially released MiniMax M3 on June 1, 2026. The model introduces MSA (MiniMax Sparse Attention), a new sparse attention architecture that gives M3 a 1M-token context window. M3 also supports image and video input and desktop computer operation natively. The API is live now.
MiniMax M3 is available today via MiniMax Code, the MiniMax Token Plan, and the MiniMax API. It is the next model in the M-series line after M2.7. MiniMax positions M3 as an open-weight model combining frontier-level coding performance, a 1M-token context window, and native multimodal input in a single architecture — the first to do so, per MiniMax. The corresponding model weights and technical report are scheduled for release within 10 days of launch.
MSA: MiniMax Sparse Attention
The central architectural change in MiniMax M3 is MSA (MiniMax Sparse Attention). Standard full attention has quadratic computational complexity: as context length grows, compute cost grows as the square of the sequence length. MSA is designed to address this.
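To make the quadratic scaling concrete, the sketch below counts query-key score computations for causal full attention and for a generic block-sparse scheme in which each query attends to a fixed budget of KV blocks. The function names, the block size, and the per-query block budget are illustrative assumptions; MiniMax has not published M3's actual configuration in this announcement.

```python
def full_attention_scores(seq_len: int) -> int:
    """Query-key pairs scored by causal full attention: n * (n + 1) / 2."""
    return seq_len * (seq_len + 1) // 2


def block_sparse_scores(seq_len: int, block_size: int, blocks_per_query: int) -> int:
    """Upper bound on pairs scored when each query attends to at most
    `blocks_per_query` KV blocks of `block_size` tokens (hypothetical values)."""
    per_query = block_size * blocks_per_query
    if seq_len <= per_query:
        # The budget covers the whole causal prefix, so nothing is saved.
        return full_attention_scores(seq_len)
    # Early queries see their whole prefix; later ones are capped at the budget.
    return per_query * (per_query + 1) // 2 + (seq_len - per_query) * per_query


if __name__ == "__main__":
    # Assumed settings for illustration only: 64-token blocks, 32 blocks per query.
    for n in (8_192, 131_072, 1_000_000):
        dense = full_attention_scores(n)
        sparse = block_sparse_scores(n, block_size=64, blocks_per_query=32)
        print(f"n={n:>9,}  dense/sparse score ratio = {dense / sparse:.1f}")
```

With a fixed budget, the sparse count grows linearly in sequence length while the dense count grows quadratically, so the gap widens as context gets longer.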
Sparse attention mechanisms generally add a pre-filtering stage that selects which keys each query will attend to before attention is computed, which avoids the full quadratic cost. The MiniMax team states that, compared to approaches like DSA and MoBA, MSA partitions the KV cache into blocks more precisely and achieves higher effective context coverage.
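The pre-filtering idea can be sketched in a few lines of plain Python. This is a generic block-selection example, not MSA: it scores each KV block by the dot product between the query and the block's mean key, keeps the top-scoring blocks, and runs softmax attention only over the keys in those blocks. The scoring rule and the names `select_blocks` and `sparse_attention` are assumptions made for illustration; the announcement does not describe MSA's actual selection rule.

```python
import math


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def select_blocks(query, keys, block_size, top_k):
    """Pre-filter: rank KV blocks by query . mean(keys in block), keep top_k."""
    scored = []
    for start in range(0, len(keys), block_size):
        block = keys[start:start + block_size]
        mean_key = [sum(col) / len(block) for col in zip(*block)]
        scored.append((dot(query, mean_key), start // block_size))
    scored.sort(reverse=True)
    return sorted(block_idx for _, block_idx in scored[:top_k])


def sparse_attention(query, keys, values, block_size, top_k):
    """Softmax attention restricted to the keys of the selected blocks."""
    chosen = select_blocks(query, keys, block_size, top_k)
    idxs = [
        i
        for b in chosen
        for i in range(b * block_size, min((b + 1) * block_size, len(keys)))
    ]
    scale = math.sqrt(len(query))
    logits = [dot(query, keys[i]) / scale for i in idxs]
    peak = max(logits)  # subtract the max for numerical stability
    weights = [math.exp(l - peak) for l in logits]
    total = sum(weights)
    dim = len(values[0])
    return [
        sum(w * values[i][d] for w, i in zip(weights, idxs)) / total
        for d in range(dim)
    ]
```

Only `top_k * block_size` keys are scored exactly per query; the cheap block-level pass decides which ones.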
At the operator level, MSA uses a “KV outer gather Q” approach: the kernel iterates over KV blocks in the outer loop and, for each block, gathers the queries that selected it. Each block is therefore read only once, and memory access is contiguous. The MiniMax team reports that this is more than 4× faster than open-source implementations such as Flash-Sparse-Attention and flash-moba under MiniMax M3’s head configuration.
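The loop ordering described above can be illustrated with a small scheduling sketch. It takes each query's list of selected blocks, inverts it into a block-to-queries map, and returns a schedule that visits every KV block exactly once. This shows the ordering idea only; the real operator is a GPU kernel whose internals are not given in the announcement, and the function names here are hypothetical.

```python
from collections import defaultdict


def gather_queries_by_block(selected_blocks):
    """Invert a query -> blocks mapping into block -> queries."""
    hits = defaultdict(list)
    for q_idx, blocks in enumerate(selected_blocks):
        for block_idx in blocks:
            hits[block_idx].append(q_idx)
    return dict(hits)


def kv_outer_schedule(selected_blocks):
    """Return (block, queries) pairs in ascending block order.

    Iterating this list loads each KV block once and processes every
    query that selected it before moving on to the next block.
    """
    hits = gather_queries_by_block(selected_blocks)
    return [(block_idx, hits[block_idx]) for block_idx in sorted(hits)]


if __name__ == "__main__":
    # Query 0 picked block 0; query 1 picked blocks 0 and 1; query 2 picked 1 and 2.
    for block_idx, queries in kv_outer_schedule([[0], [0, 1], [1, 2]]):
        print(f"load KV block {block_idx} once, serve queries {queries}")
```

A query-outer loop would instead reload a popular block once per query that selected it, which is the repeated, scattered memory traffic the block-outer ordering avoids.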
The result: at a context length of 1 million tokens, MiniMax M3’s per-token compute is 1/20th that of the previous-generation M2 models. The MiniMax team reports a speedup of more than 9× in the prefill stage and more than 15× in the decoding stage at 1M-token context. According to MiniMax, MSA matched full attention on the majority of capabilities across multiple ablation studies.
Coding and Agentic Benchmarks
Coding and agentic capabilities are key areas of improvement for M3. The benchmark results below are reported by the MiniMax team. Several evaluations were run on MiniMax internal infrastructure, while some comparison scores were taken from official leaderboards or external benchmark sources, as noted in MiniMax’s methodology. SWE-Bench Verified was tested on internal infrastructure using Claude Code scaffolding and averaged over 4 runs. SWE-Bench Pro was also tested on internal infrastructure using Claude Code scaffolding, with testing logic aligned to the official evaluation.
- SWE-Bench Pro: 59.0% (surpasses GPT-5.5 and Gemini 3.1 Pro; approaches Opus 4.7)
- Terminal-Bench 2.1: 66.0%
- SWE-fficiency: 34.8%
- KernelBench Hard: 28.8% (evaluated on NVIDIA Blackwell GPUs, CUDA capability sm_120)
- MCP Atlas: 74.2%
- Claw-Eval: highest score among models evaluated (General Task Group, 161 tasks)
- SVG-Bench: surpasses Opus 4.7
On OmniDocBench, a multimodal document understanding benchmark, M3 scores above Gemini 3.1 Pro. On OSWorld-Verified (361 samples), M3 achieves a 70.06% task completion rate for computer use (Max Steps = 200).
MiniMax also built an interactive user simulator framework for training and evaluation. It simulates multi-turn developer collaboration: requirement elaboration, solution discussion, feedback-based correction, continuous task switching, and multi-round project iteration. This is intended to reduce the gap between single-turn benchmark performance and real-world, multi-turn developer workflows.
Native Multimodality
MiniMax M3 underwent mixed-modality training from step 0. Text, images, and video are trained together from the beginning rather than added post-training. The MiniMax team reports that interleaved data (sequences where text and images are naturally intermixed) is more critical to model performance than commonly assumed. After the team rebuilt the entire data pipeline for interleaved formats, training data was scaled to the order of 100 trillion tokens.
MiniMax M3 supports image and video input and can operate a desktop computer.
Real-World Task Examples from MiniMax
MiniMax documents three internal tasks in the release post:
Paper reproduction: MiniMax gave MiniMax M3 the ICLR 2025 Outstanding Paper Award-winning paper Learning Dynamics of LLM Finetuning and asked it to reproduce the experiments independently. M3 ran autonomously for nearly 12 hours, produced 18 commits and 23 experimental figures, and completed the core experiments without human intervention. The task required multimodal capability to read curves and formulas, long context to hold the paper and experiment logs simultaneously, and coding capability to execute the reproduction across a long thread.
CUDA kernel optimization: MiniMax asked MiniMax M3 to optimize an FP8 matrix multiplication (GEMM) kernel on NVIDIA Hopper architecture GPUs. The model started with only a task description, a benchmark evaluation script, and a non-functional Triton skeleton — no reference implementation was provided. Over approximately 24 hours, MiniMax M3 made 147 benchmark submissions and 1,959 tool calls. It progressed through baseline implementation, autotune configuration generation, performance bottleneck diagnosis, CUDA Graph integration, persistent kernel rewriting, and host-side scheduling optimization. After six landmark rounds of optimization, MiniMax M3 improved Hopper FP8 hardware peak utilization from 7.6% to 71.3%, a 9.4× speedup. The best solution appeared on the 145th submission. MiniMax notes that most other models stopped making new progress within the first 30 submissions; only Opus 4.7 and M3 continued beyond that point.
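The 9.4× figure follows directly from the two utilization numbers. The short check below assumes the workload and the hardware peak are the same before and after, so that runtime is inversely proportional to peak utilization; the function name is illustrative.

```python
def utilization_speedup(before_pct: float, after_pct: float) -> float:
    """Speedup implied by a change in hardware peak utilization.

    Assumes a fixed workload on the same hardware, so runtime scales
    as 1 / utilization.
    """
    return after_pct / before_pct


speedup = utilization_speedup(7.6, 71.3)
print(f"{speedup:.1f}x")  # prints "9.4x"
```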
PostTrainBench (autonomous model training): MiniMax gave MiniMax M3 four base models that had completed pretraining only. MiniMax M3 autonomously ran the full data synthesis → training → evaluation → iteration cycle over 12 hours with no human intervention. The target was for the base models to acquire capabilities across mathematical reasoning (AIME2025), tool calling (BFCL), scientific knowledge reasoning (GPQA Main), arithmetic reasoning (GSM8K), and code generation (HumanEval). MiniMax M3 scored 0.37, below Opus 4.7 (0.42) and GPT-5.5 (0.39), but ahead of the other models tested.
Key Takeaways
- MiniMax M3 launched June 1, 2026; API is live now. MiniMax has committed to releasing open model weights and a technical report within 10 days.
- MSA (MiniMax Sparse Attention) delivers more than 9× prefill and more than 15× decoding speedup at 1M-token context versus M2, at 1/20th the per-token compute.
- M3 scores 59.0% on SWE-Bench Pro, surpassing GPT-5.5 and Gemini 3.1 Pro.
- M3 is natively multimodal from step 0, supporting image and video input, and achieves 70.06% on OSWorld-Verified for computer use.
