Liquid AI has released LFM2.5-1.2B-Thinking, a 1.2 billion parameter reasoning model that runs entirely on device and fits in about 900 MB on a modern phone. What needed a data center 2 years ago can now run offline on consumer hardware, with a focus on structured reasoning traces, tool use, and math, rather than general chat.
Position in the LFM2.5 family and core specifications
LFM2.5-1.2B-Thinking is part of the LFM2.5 family of Liquid Foundation Models, which extends the earlier LFM2 architecture with additional pre-training and multi-stage reinforcement learning for edge deployment.
The model is text only and general purpose, with the following configuration:
- 1.17B parameters, reported as a 1.2B-class model
- 16 layers, with 10 double-gated LIV convolution blocks and 6 GQA blocks
- Training budget of 28T tokens
- Context length of 32,768 tokens
- Vocabulary size of 65,536
- 8 languages: English, Arabic, Chinese, French, German, Japanese, Korean, Spanish
Reasoning-first behavior and thinking traces
The ‘Thinking’ variant is trained specifically for reasoning. At inference time it produces internal thinking traces before the final answer. These traces are chains of intermediate steps that the model uses to plan tool calls, verify partial results, and work through multi-step instructions.
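The exact chat template should be checked on the model card, but reasoning models commonly wrap the trace in explicit delimiters. Below is a minimal sketch, assuming the trace is delimited by `<think>...</think>` tags (an assumption, not something the article confirms), that splits the trace from the user-facing answer:

```python
import re

def split_thinking(output: str) -> tuple[str, str]:
    """Separate an assumed <think>...</think> trace from the final answer.

    The tag format is an assumption borrowed from other reasoning models;
    verify it against the LFM2.5-1.2B-Thinking chat template before relying on it.
    """
    match = re.search(r"<think>(.*?)</think>", output, flags=re.DOTALL)
    if match is None:
        # No trace found, so treat the whole output as the answer.
        return "", output.strip()
    trace = match.group(1).strip()
    answer = output[match.end():].strip()  # text after the closing tag is the user-facing reply
    return trace, answer

trace, answer = split_thinking("<think>17 * 23 = 391</think>The answer is 391.")
print(trace)   # 17 * 23 = 391
print(answer)  # The answer is 391.
```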
The Liquid AI team recommends this model for agentic tasks, data extraction pipelines, and retrieval-augmented generation flows where you want explicit reasoning and verifiable intermediate steps. A practical way to think about it: use LFM2.5-1.2B-Thinking as the planning brain inside agents and tools, and use other models when you need broad world knowledge or code-heavy workflows.
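To make the "planning brain" idea concrete, here is a minimal sketch of a single plan-and-act step. Everything in it is hypothetical: the tool name, the prompt wording, and the `call_model` callable are illustration only, not an API documented by Liquid AI.

```python
import json

# Hypothetical tool registry; names and signatures are illustrative only.
TOOLS = {
    "lookup_order": lambda order_id: {"order_id": order_id, "status": "shipped"},
}

def plan_and_act(call_model, user_request: str) -> dict:
    """One planning step: ask the model for a JSON tool call, then execute it.

    `call_model` is any callable that sends a prompt to LFM2.5-1.2B-Thinking
    (via llama.cpp, vLLM, MLX, ...) and returns the text after the thinking
    trace, i.e. the model's final answer.
    """
    prompt = (
        "You can call this tool: lookup_order(order_id). "
        'Reply with one JSON object of the form {"tool": "...", "arguments": {...}}.\n'
        "User request: " + user_request
    )
    # A well-formed reply looks like: {"tool": "lookup_order", "arguments": {"order_id": "A123"}}
    plan = json.loads(call_model(prompt))
    return TOOLS[plan["tool"]](**plan["arguments"])  # verifiable intermediate result
```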
Benchmarks versus other 1B-class models
The Liquid AI team evaluates LFM2.5-1.2B-Thinking against models around 1B parameters on a set of reasoning and instruction-following benchmarks.

Compared to LFM2.5-1.2B-Instruct, three metrics improve strongly: math reasoning rises from about 63 to 88 on MATH 500, instruction following rises from about 61 to 69 on Multi-IF, and tool use rises from about 49 to 57 on BFCLv3.
LFM2.5-1.2B-Thinking competes with Qwen3-1.7B in thinking mode on most reasoning benchmarks while using around 40 percent fewer parameters and fewer output tokens on average. It also outperforms other 1B-class baselines such as Granite-4.0-H-1B, Granite-4.0-1B, Gemma-3-1B-IT, and Llama-3.2-1B-Instruct on many of these tasks.
Training recipe and doom loop mitigation
Reasoning models often suffer from doom looping, where the model repeats fragments of its chain of thought instead of finishing the answer. LFM2.5-1.2B-Thinking uses a multi-stage training pipeline to reduce this.
The process begins with mid-training that includes reasoning traces, so the model learns a ‘reason first, then answer’ pattern. Supervised fine-tuning on synthetic chains then improves chain-of-thought generation. After that, preference alignment and RLVR are applied. In preference alignment, the research team generates 5 temperature-sampled candidates and 1 greedy candidate per prompt and uses an LLM judge to pick preferred and rejected outputs, while also labeling looping outputs explicitly. During RLVR they add an n-gram repetition penalty early in training. This reduces the doom loop rate from 15.74 percent at mid-training to 0.36 percent after RLVR on a set of representative prompts.
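The article does not give the exact form of the repetition penalty, so the following is only an illustrative reconstruction: a simple n-gram repetition score that grows as a completion starts looping and that could be subtracted from an RLVR reward. The choice of n, the weight, and the normalization are all assumptions.

```python
from collections import Counter

def ngram_repetition_penalty(token_ids: list[int], n: int = 4, weight: float = 1.0) -> float:
    """Penalty that grows with the fraction of repeated n-grams in a completion.

    Illustrative reconstruction only, not Liquid AI's exact RLVR reward term:
    the true n, weight, and normalization are not published in the article.
    """
    if len(token_ids) < n:
        return 0.0
    ngrams = [tuple(token_ids[i:i + n]) for i in range(len(token_ids) - n + 1)]
    counts = Counter(ngrams)
    repeated = sum(c - 1 for c in counts.values())  # occurrences beyond the first
    return weight * repeated / len(ngrams)          # 0.0 = no repetition, approaches 1.0 as loops dominate

# A looping completion such as [1, 2, 3, 4] * 3 scores far higher than a
# non-repetitive one, so subtracting this from the reward discourages doom loops.
print(ngram_repetition_penalty([1, 2, 3, 4] * 3))   # ~0.56
print(ngram_repetition_penalty(list(range(12))))    # 0.0
```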
The result is a small reasoning model that can produce thinking traces without getting stuck in long repetitive outputs, which matters for interactive agents and on-device UX.
Inference performance and hardware footprint
A key design objective is fast inference with a small memory footprint on CPUs and NPUs. LFM2.5-1.2B-Thinking can decode at about 239 tokens per second on an AMD CPU and about 82 tokens per second on a mobile NPU, while running in under 1 GB of memory, with broad day-one support for llama.cpp, MLX, and vLLM.
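A minimal sketch of on-device inference through llama.cpp's Python bindings is shown below. The repository id and GGUF filename pattern are assumptions; check Liquid AI's Hugging Face organization for the actual GGUF upload and quantization names.

```python
from llama_cpp import Llama  # pip install llama-cpp-python huggingface-hub

# Hypothetical repo id and quantization pattern; verify on huggingface.co/LiquidAI.
llm = Llama.from_pretrained(
    repo_id="LiquidAI/LFM2.5-1.2B-Thinking-GGUF",
    filename="*Q4_0.gguf",
    n_ctx=32768,  # the model's full context length
)

out = llm.create_chat_completion(
    messages=[{"role": "user", "content": "A train travels 120 km in 1.5 hours. What is its average speed?"}],
    max_tokens=512,
)
# The completion includes the thinking trace before the final answer.
print(out["choices"][0]["message"]["content"])
```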
The detailed hardware table, measured with 1K prefill tokens and 100 decode tokens, shows that the model fits comfortably under 1 GB on phones and embedded devices while sustaining useful throughput even at long contexts.
Key Takeaways
- LFM2.5-1.2B-Thinking is a 1.17B parameter reasoning model with a 32,768-token context length that runs in under 1 GB on phones and laptops.
- The model is optimized for explicit thinking traces, agentic workflows, data extraction, and RAG.
- It reaches strong scores for a 1B-class model, for example 87.96 on MATH 500 and 85.60 on GSM8K, and competitive performance with Qwen3-1.7B in thinking mode with fewer parameters.
- The training pipeline uses mid-training with reasoning traces, supervised fine-tuning, preference alignment with 5 sampled candidates plus 1 greedy candidate, and RLVR with n-gram penalties, which reduces doom loops from 15.74 percent to 0.36 percent.
- The model runs well on AMD and Qualcomm CPUs and NPUs with runtimes like llama.cpp, FastFlowLM, and NexaML, is available in GGUF, ONNX, and MLX formats, and can be loaded easily from Hugging Face for on-device deployment.
Hosting Providers/Deployment
You can access or host the model through the following providers and platforms:
Cloud & API Providers
Model Repositories (Self-Hosting)
If you want to run the model locally or on your own infrastructure, the weights are available in multiple formats, including GGUF, ONNX, and MLX.
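A minimal sketch of pulling the weights locally with the huggingface_hub client follows. The repository id is an assumption based on Liquid AI's naming; confirm the exact id on the Liquid AI organization page before use.

```python
from huggingface_hub import snapshot_download  # pip install huggingface-hub

# Hypothetical repo id; verify the exact name on huggingface.co/LiquidAI.
local_dir = snapshot_download(repo_id="LiquidAI/LFM2.5-1.2B-Thinking")
print("Weights downloaded to:", local_dir)
```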

